# R-factors

Historically, R-factors were introduced by ... ???

## Definitions

### Data quality indicators

In the following, all sums over hkl extend only over unique reflections with more than one observation!

• Rsym and Rmerge - the formula for both is:
$\displaystyle{ R = \frac{\sum_{hkl} \sum_{j} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}} }$


where $\displaystyle{ \langle I_{hkl}\rangle }$ is the average of symmetry- (or Friedel-) related observations of a unique reflection.

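It can be shown that this formula yields higher R-factors as the redundancy increases. With only a few observations, the average $\displaystyle{ \langle I_{hkl}\rangle }$ lies close to each individual observation, so for n observations with independent random errors the deviations are underestimated by a factor of about $\displaystyle{ \sqrt{(n-1)/n} }$. In other words, low-redundancy datasets appear better than high-redundancy ones, which defeats the purpose of an indicator of data quality!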
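The redundancy dependence described above can be checked numerically. The following is a minimal sketch, not taken from any data-processing program: the function names `r_merge` and `simulate`, the dictionary layout, and the noise model (a constant true intensity with Gaussian errors) are all assumptions made for illustration.

```python
import random
from math import fsum


def r_merge(observations):
    """Rsym/Rmerge for a dict mapping a unique reflection -> list of intensities."""
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:  # sums run only over reflections observed more than once
            continue
        mean = fsum(intensities) / n
        numerator += fsum(abs(i - mean) for i in intensities)
        denominator += fsum(intensities)
    return numerator / denominator


def simulate(n_obs, n_refl=2000, true_i=1000.0, sigma=50.0, seed=0):
    """Hypothetical data: every reflection measured n_obs times with Gaussian noise.

    Integer keys stand in for hkl triples.
    """
    rng = random.Random(seed)
    return {
        key: [rng.gauss(true_i, sigma) for _ in range(n_obs)]
        for key in range(n_refl)
    }


# Small hand-checkable example: (10 + 10) / (210 + 150); (0, 0, 1) is skipped.
toy = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 45.0, 55.0], (0, 0, 1): [20.0]}
r_toy = r_merge(toy)

# Same noise level, different redundancy.
r_low = r_merge(simulate(2))
r_high = r_merge(simulate(8))
print(f"toy: {r_toy:.4f}  redundancy 2: {r_low:.4f}  redundancy 8: {r_high:.4f}")
```

Although both simulated datasets have exactly the same measurement error, the one with redundancy 8 gives the larger Rmerge.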

• Redundancy-independent version of the above:
$\displaystyle{ R_{meas} = \frac{\sum_{hkl} \sqrt{\frac{n}{n-1}} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}} }$


Here n is the number of observations of the unique reflection hkl. Rmeas unfortunately yields higher (but more realistic) numerical values than Rsym / Rmerge (M.S. Weiss and R. Hilgenfeld (1997) On the use of the merging R-factor as a quality indicator for X-ray data. J. Appl. Crystallogr. 30, 203-205).
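As an illustration of the Rmeas formula, the sketch below applies the per-reflection factor sqrt(n/(n-1)) to a small hand-checkable dataset. The function name `r_meas` and the dictionary layout (unique hkl mapped to a list of observed intensities) are assumptions made for this example only.

```python
from math import fsum, sqrt


def r_meas(observations):
    """Redundancy-independent merging R-factor.

    observations: dict mapping a unique reflection -> list of intensities.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:  # n - 1 would be zero; such reflections are excluded
            continue
        mean = fsum(intensities) / n
        deviations = fsum(abs(i - mean) for i in intensities)
        numerator += sqrt(n / (n - 1)) * deviations
        denominator += fsum(intensities)
    return numerator / denominator


toy = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 45.0, 55.0]}
# numerator: sqrt(2/1) * 10 + sqrt(3/2) * 10, denominator: 210 + 150
r = r_meas(toy)
print(f"Rmeas = {r:.4f}")
```

Because sqrt(n/(n-1)) is always greater than 1, Rmeas is never smaller than Rsym / Rmerge computed from the same observations; the difference is largest at low redundancy.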

• Measuring the quality of averaged intensities/amplitudes:

for intensities, use (M.S. Weiss (2001) Global indicators of X-ray data quality. J. Appl. Crystallogr. 34, 130-135)

$\displaystyle{ R_{p.i.m.} = \frac{\sum_{hkl} \sqrt{\frac{1}{n-1}} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}} }$


$\displaystyle{ R_{mrgd-I} }$ is similarly defined in Diederichs and Karplus.

Similarly, one should use Rmrgd-F as a quality indicator for amplitudes, which may be calculated as:

$\displaystyle{ R_{mrgd-F} = \frac{\sum_{hkl} \sqrt{\frac{1}{n-1}} \sum_{j=1}^{n} \vert F_{hkl,j}-\langle F_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}F_{hkl,j}} }$


with $\displaystyle{ \langle F_{hkl}\rangle }$ defined analogously to $\displaystyle{ \langle I_{hkl}\rangle }$.
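The precision-indicating statistics can be sketched in the same way. The example below uses the factor sqrt(1/(n-1)) given by Weiss (2001) for Rp.i.m. and applies the same expression to amplitudes. The function name `r_precision` is hypothetical, and taking amplitudes as the square roots of (positive) intensities is a simplifying assumption; the amplitude variant is only the analogous formula and is not claimed to reproduce the original Diederichs and Karplus procedure.

```python
from math import fsum, sqrt


def r_precision(observations):
    """Precision of averaged measurements (Rp.i.m.-style statistic).

    observations: dict mapping a unique reflection -> list of repeated
    measurements (intensities or amplitudes).
    """
    numerator = 0.0
    denominator = 0.0
    for values in observations.values():
        n = len(values)
        if n < 2:  # only reflections with more than one observation
            continue
        mean = fsum(values) / n
        deviations = fsum(abs(v - mean) for v in values)
        numerator += sqrt(1.0 / (n - 1)) * deviations
        denominator += fsum(values)
    return numerator / denominator


intensities = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 45.0, 55.0]}

# Assumption for illustration: amplitude = sqrt(intensity), all intensities > 0.
amplitudes = {hkl: [sqrt(i) for i in obs] for hkl, obs in intensities.items()}

r_pim = r_precision(intensities)    # (1 * 10 + sqrt(1/2) * 10) / 360
r_mrgd_f = r_precision(amplitudes)  # same formula applied to amplitudes
print(f"Rp.i.m. = {r_pim:.4f}  amplitude analogue = {r_mrgd_f:.4f}")
```

Unlike Rmeas, these values decrease as more observations are averaged, which is the intended behaviour for an indicator of the precision of the merged data.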

### Model quality indicators

• R and Rfree - the formula for both is:
$\displaystyle{ R=\frac{\sum_{hkl}\vert F_{hkl}^{obs}-F_{hkl}^{calc}\vert}{\sum_{hkl} F_{hkl}^{obs}} }$


where $\displaystyle{ F_{hkl}^{obs} }$ and $\displaystyle{ F_{hkl}^{calc} }$ must be brought onto a common scale. R and Rfree differ only in the set of reflections they are calculated from: R is calculated for the working set, whereas Rfree is calculated for the test set.
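A minimal sketch of this calculation follows. The names `linear_scale`, `r_factor` and `r_work_and_free` are hypothetical, and the scaling is deliberately simplistic: a single overall scale factor, sum(Fobs)/sum(Fcalc), determined from the working set and applied to both sets. Refinement programs use more elaborate scaling, so this is an illustration of the formula rather than of any program's method.

```python
from math import fsum


def linear_scale(f_obs, f_calc):
    """Single overall scale factor putting Fcalc on the scale of Fobs (assumption)."""
    return fsum(f_obs) / fsum(f_calc)


def r_factor(f_obs, f_calc, scale):
    """sum |Fobs - scale * Fcalc| / sum Fobs over the given reflections."""
    differences = fsum(abs(o - scale * c) for o, c in zip(f_obs, f_calc))
    return differences / fsum(f_obs)


def r_work_and_free(f_obs, f_calc, is_test):
    """Return (R, Rfree); is_test flags the reflections of the test set."""
    work = [(o, c) for o, c, t in zip(f_obs, f_calc, is_test) if not t]
    test = [(o, c) for o, c, t in zip(f_obs, f_calc, is_test) if t]
    work_obs, work_calc = zip(*work)
    test_obs, test_calc = zip(*test)
    scale = linear_scale(work_obs, work_calc)  # scale from the working set only
    return (
        r_factor(work_obs, work_calc, scale),
        r_factor(test_obs, test_calc, scale),
    )


# Made-up amplitudes; the last reflection belongs to the test set.
f_obs = [100.0, 80.0, 60.0, 40.0, 50.0]
f_calc = [55.0, 35.0, 30.0, 20.0, 30.0]
is_test = [False, False, False, False, True]

r_work, r_free = r_work_and_free(f_obs, f_calc, is_test)
print(f"R = {r_work:.4f}  Rfree = {r_free:.4f}")
```

In this toy example the scale factor is 280/140 = 2, giving R = 20/280 for the working set and Rfree = 10/50 for the single test reflection.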

## What do R-factors try to measure, and how should their values be interpreted?

• relative deviation of

### Data quality

• typical values: ...

### Model quality

#### Relation between R and Rfree as a function of resolution

References:

• Tickle IJ, Laskowski RA and Moss DS. Rfree and the Rfree Ratio. I. Derivation of Expected Values of Cross-Validation Residuals Used in Macromolecular Least-Squares Refinement. Acta Cryst. (1998). D54, 547-557 
• Tickle IJ, Laskowski RA and Moss DS. Rfree and the Rfree ratio. II. Calculation of the expected values and variances of cross-validation statistics in macromolecular least-squares refinement. Acta Cryst. (2000). D56, 442-450 

- formula from that paper: Rfree = 1.065*R + 0.036

- plot with empirical data: http://xray.bmc.uu.se/gerard/supmat/rfree2000/rfminusr_vs_resolution.gif

- many more plots: http://xray.bmc.uu.se/gerard/supmat/rfree2000

- harry plotter (java): http://xray.bmc.uu.se/gerard/supmat/rfree2000/plotter.html
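The linear relation quoted in the list above can be evaluated directly. This sketch only restates that formula; the function name `expected_r_free` is hypothetical, and the coefficients are simply the ones given on this page, with no claim about the conditions under which they hold.

```python
def expected_r_free(r_work, slope=1.065, intercept=0.036):
    """Evaluate Rfree = 1.065 * R + 0.036, the relation quoted on this page."""
    return slope * r_work + intercept


for r_work in (0.15, 0.20, 0.25):
    r_free = expected_r_free(r_work)
    print(f"R = {r_work:.2f}  expected Rfree = {r_free:.3f}  gap = {r_free - r_work:.3f}")
```

A model whose Rfree lies far above the value obtained this way may deserve a closer look, although the empirical plots linked above show considerable scatter around any such line.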

## What kinds of problems exist with these indicators?

- Rsym / Rmerge should not be used, because their values rise with redundancy (see Definitions); Rmeas should be used instead

- R/Rfree and NCS: in the presence of non-crystallographic symmetry, reflections in the work and test sets are not independent