# Difference between revisions of "R-factors"

(Enclosed the equations in dashed boxes to make things clearer - let me know if this doesn't work well!)

## Revision as of 12:06, 15 February 2008

Historically, R-factors were introduced by ... ???

## Definitions

### Data quality indicators

In the following, all sums over hkl extend only over unique reflections with more than one observation!

- R_{sym} and R_{merge}: the formula for both is

```
[math]\displaystyle{
R = \frac{\sum_{hkl} \sum_{j} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
}[/math]
```

where [math]\displaystyle{ \langle I_{hkl}\rangle }[/math] is the average of symmetry- (or Friedel-) related observations of a unique reflection.

It can be shown that this formula results in higher R-factors when the redundancy is higher. In other words, low-redundancy datasets appear better than high-redundancy ones, which obviously violates the intention of having an indicator of data quality!
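The redundancy dependence can be made explicit with a short calculation. This is only a sketch, under the assumption (not stated in the text above) that the n observations of a reflection are independent and Gaussian with a common standard deviation [math]\displaystyle{ \sigma }[/math]:

```
[math]\displaystyle{
\mathrm{Var}\left(I_{hkl,j}-\langle I_{hkl}\rangle\right) = \sigma^2\,\frac{n-1}{n}
\quad\Rightarrow\quad
E\,\vert I_{hkl,j}-\langle I_{hkl}\rangle\vert = \sigma\,\sqrt{\frac{2}{\pi}}\,\sqrt{\frac{n-1}{n}}
}[/math]
```

Under this assumption the expected absolute deviation per observation is smaller by a factor of about 0.71 for n = 2 and approaches its full value only for large n, so the same measurement error gives a lower R-factor at low redundancy.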

- Redundancy-independent version of the above:

```
[math]\displaystyle{
R_{meas} = \frac{\sum_{hkl} \sqrt{\frac{n}{n-1}} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
}[/math]
```

which unfortunately results in higher (but more realistic) numerical values than R_{sym} / R_{merge}, because the factor [math]\displaystyle{ \sqrt{n/(n-1)} }[/math] is always greater than 1.
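As an illustration, the two indicators can be computed side by side. This is a minimal sketch, not taken from any data-processing program; the function name `r_merge_and_r_meas` and the dictionary layout (Miller index mapped to a list of intensities) are assumptions made for the example.

```python
from math import sqrt


def r_merge_and_r_meas(observations):
    """Return (R_merge, R_meas) for a dict mapping (h, k, l) -> list of intensities.

    The dict layout is a hypothetical choice for this sketch.
    """
    num_merge = 0.0
    num_meas = 0.0
    denom = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            # Sums extend only over reflections with more than one observation.
            continue
        mean = sum(intensities) / n
        abs_dev = sum(abs(i - mean) for i in intensities)
        num_merge += abs_dev
        # R_meas weights each reflection by sqrt(n / (n - 1)).
        num_meas += sqrt(n / (n - 1)) * abs_dev
        denom += sum(intensities)
    return num_merge / denom, num_meas / denom


# Made-up intensities, for illustration only.
data = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 40.0, 60.0],
    (0, 0, 1): [70.0],  # single observation, skipped
}
r_merge, r_meas = r_merge_and_r_meas(data)
print(f"R_merge = {r_merge:.4f}, R_meas = {r_meas:.4f}")
```

For any dataset with a non-zero spread, R_meas comes out larger than R_merge, and the gap is widest for reflections measured only twice.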

- measuring quality of averaged intensities/amplitudes:

for intensities use

```
[math]\displaystyle{
R_{p.i.m.} \ (\text{or } R_{mrgd-I}) = \frac{\sum_{hkl} \sqrt{\frac{1}{n-1}} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
}[/math]
```

and similarly for amplitudes:

```
[math]\displaystyle{
R_{mrgd-F} = \frac{\sum_{hkl} \sqrt{\frac{1}{n-1}} \sum_{j=1}^{n} \vert F_{hkl,j}-\langle F_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}F_{hkl,j}}
}[/math]
```

with [math]\displaystyle{ \langle F_{hkl}\rangle }[/math] defined analogously to [math]\displaystyle{ \langle I_{hkl}\rangle }[/math].
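The precision of the averaged values can be sketched in the same way. The example below assumes the prefactor [math]\displaystyle{ \sqrt{1/(n-1)} }[/math], which is the definition of the precision-indicating merging R-factor in Weiss (2001). The function name `r_pim` and the input layout are hypothetical. The same function applies to amplitudes if lists of F values are passed instead of intensities.

```python
from math import sqrt


def r_pim(observations):
    """Precision-indicating R-factor for a dict mapping (h, k, l) -> list of values.

    Assumes the sqrt(1 / (n - 1)) prefactor; the dict layout is hypothetical.
    """
    num = 0.0
    denom = 0.0
    for values in observations.values():
        n = len(values)
        if n < 2:
            # Only reflections with more than one observation contribute.
            continue
        mean = sum(values) / n
        num += sqrt(1.0 / (n - 1)) * sum(abs(v - mean) for v in values)
        denom += sum(values)
    return num / denom


# Made-up intensities, for illustration only.
data = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 40.0, 60.0],
}
value = r_pim(data)
print(f"R_pim = {value:.4f}")
```

Because the prefactor shrinks as n grows, this indicator goes down when more observations are averaged, which is the expected behaviour for a measure of the precision of the mean.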

### Model quality indicators

- R and R_{free}: the formula for both is

```
[math]\displaystyle{
R=\frac{\sum_{hkl}\vert F_{hkl}^{obs}-F_{hkl}^{calc}\vert}{\sum_{hkl} F_{hkl}^{obs}}
}[/math]
```

where [math]\displaystyle{ F_{hkl}^{obs} }[/math] and [math]\displaystyle{ F_{hkl}^{calc} }[/math] have to be scaled w.r.t. each other. R and R_{free} differ in the set of reflections they are calculated from: R is calculated for the working set, whereas R_{free} is calculated for the test set.
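A minimal sketch of the calculation follows. It assumes a single linear scale factor, sum of F_obs over sum of F_calc in the working set, which is a simplification chosen for illustration; refinement programs generally use more elaborate scaling. The names `r_factor`, `r_work_and_r_free` and `is_test` are hypothetical.

```python
def r_factor(f_obs, f_calc, scale):
    """R = sum |F_obs - scale * F_calc| / sum F_obs over the given reflections."""
    num = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return num / sum(f_obs)


def r_work_and_r_free(f_obs, f_calc, is_test):
    """Split reflections by the test-set flag and return (R_work, R_free)."""
    work_obs = [fo for fo, t in zip(f_obs, is_test) if not t]
    work_calc = [fc for fc, t in zip(f_calc, is_test) if not t]
    test_obs = [fo for fo, t in zip(f_obs, is_test) if t]
    test_calc = [fc for fc, t in zip(f_calc, is_test) if t]
    # Assumption: one linear scale factor, determined from the working set only.
    scale = sum(work_obs) / sum(work_calc)
    return (r_factor(work_obs, work_calc, scale),
            r_factor(test_obs, test_calc, scale))


# Made-up amplitudes, for illustration only.
f_obs = [10.0, 20.0, 30.0, 40.0]
f_calc = [5.5, 9.5, 15.0, 21.0]
is_test = [False, False, False, True]
r_work, r_free = r_work_and_r_free(f_obs, f_calc, is_test)
print(f"R = {r_work:.4f}, R_free = {r_free:.4f}")
```

Taking the scale from the working set alone keeps the test set out of every fitted quantity, which is the point of computing R_{free} separately.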

## what do R-factors try to measure, and how to interpret their values?

- relative deviation of

### Data quality

- typical values: ...

### Model quality

## what kinds of problems exist with these indicators?

- (R_{sym} / R_{merge} ) should not be used, R_{meas} should be used instead (explain why ?)

- R/R_{free} and NCS: reflections in work and test set are not independent