Difference between revisions of "R-factors"

From CCP4 wiki

Revision as of 17:52, 17 February 2008

Historically, R-factors were introduced by ... ???

Definitions

Data quality indicators

In the following, all sums over hkl extend only over unique reflections with more than one observation!

  • Rsym and Rmerge - the formula for both is:
[math]
 R = \frac{\sum_{hkl} \sum_{j} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
 [/math]


where [math]\langle I_{hkl}\rangle[/math] is the average of symmetry- (or Friedel-) related observations of a unique reflection.

It can be shown that this formula results in higher R-factors when the redundancy is higher. In other words, low-redundancy datasets appear better than high-redundancy ones, which obviously violates the intention of having an indicator of data quality!
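One way to see this is a sketch under a simplifying assumption that is not part of the text above: suppose the n observations of a reflection are independent and have Gaussian errors with standard deviation [math]\sigma[/math]. Each deviation from the mean then has variance [math]\sigma^2 (n-1)/n[/math], so its expected absolute value is

[math]
 E\left[\vert I_{hkl,j}-\langle I_{hkl}\rangle\vert\right] = \sigma \sqrt{\frac{2}{\pi}} \sqrt{\frac{n-1}{n}}
 [/math]

Under this assumption the numerator contribution per observation grows with n, from about 0.71 of its limiting value at n = 2 towards the full value [math]\sigma \sqrt{2/\pi}[/math] at high redundancy, while the denominator contribution per observation stays the same. Multiplying by [math]\sqrt{n/(n-1)}[/math] cancels this dependence on n.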

  • Redundancy-independent version of the above:
[math]
 R_{meas} = \frac{\sum_{hkl} \sqrt{\frac{n}{n-1}} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
 [/math]


which unfortunately results in higher (but more realistic) numerical values than Rsym / Rmerge.
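The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `merging_r_factors`, the dictionary layout (unique hkl mapped to a list of observed intensities) and the toy numbers are assumptions made here, not the output or API of any data-processing program.

```python
from math import sqrt


def merging_r_factors(observations):
    """Return (R_merge, R_meas) for a mapping: unique hkl -> list of intensities.

    Reflections with fewer than two observations are skipped, because the
    sums extend only over reflections with more than one observation.
    """
    num_merge = 0.0
    num_meas = 0.0
    denom = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        mean = sum(intensities) / n
        abs_dev = sum(abs(i - mean) for i in intensities)
        num_merge += abs_dev
        # R_meas weights each reflection by sqrt(n / (n - 1))
        num_meas += sqrt(n / (n - 1)) * abs_dev
        denom += sum(intensities)
    if denom == 0.0:
        raise ValueError("no reflections with more than one observation")
    return num_merge / denom, num_meas / denom


# Hypothetical toy data, for illustration only
obs = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 40.0, 60.0],
    (0, 0, 1): [75.0],  # single observation: ignored
}
r_merge, r_meas = merging_r_factors(obs)
print(f"R_merge = {r_merge:.4f}, R_meas = {r_meas:.4f}")
```

Because the factor sqrt(n/(n-1)) is always greater than 1, R_meas from this sketch is never smaller than R_merge for the same data.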

  • measuring quality of averaged intensities/amplitudes:

for intensities use

[math]
 R_{p.i.m.} (or R_{mrgd-I}) = \frac{\sum_{hkl} \sqrt{\frac{1}{n-1}} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
 [/math]



and similarly for amplitudes:

[math]
 R_{mrgd-F} = \frac{\sum_{hkl} \sqrt{\frac{1}{n-1}} \sum_{j=1}^{n} \vert F_{hkl,j}-\langle F_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}F_{hkl,j}}
 [/math]


with [math]\langle F_{hkl}\rangle[/math] defined analogously to [math]\langle I_{hkl}\rangle[/math].

Model quality indicators

  • R and Rfree : the formula for both is
[math]
 R=\frac{\sum_{hkl}\vert F_{hkl}^{obs}-F_{hkl}^{calc}\vert}{\sum_{hkl} F_{hkl}^{obs}}
 [/math]



where [math]F_{hkl}^{obs}[/math] and [math]F_{hkl}^{calc}[/math] are structure factor amplitudes that have to be scaled with respect to each other. R and Rfree differ in the set of reflections they are calculated from: R is calculated for the working set, whereas Rfree is calculated for the test set.
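As a minimal sketch of the calculation, the Python below computes R and Rfree from lists of amplitudes and a test-set flag. The function names, the toy amplitudes and the scaling are assumptions made for illustration: here a single overall scale factor k = sum(Fobs) / sum(Fcalc) is taken from the working set and applied to both sets, whereas refinement programs generally use more elaborate scaling.

```python
def scale_factor(f_obs, f_calc):
    """Single overall scale k such that sum(k * Fcalc) == sum(Fobs) (an assumption)."""
    return sum(f_obs) / sum(f_calc)


def r_factor(f_obs, f_calc, k):
    """R = sum |Fobs - k * Fcalc| / sum Fobs over the reflections given."""
    return sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)


def r_work_and_r_free(f_obs, f_calc, is_free):
    """Split the reflections by the test-set flag and return (R, Rfree)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    work_obs, work_calc = zip(*work)
    test_obs, test_calc = zip(*test)
    # The scale is derived from the working set only, then reused for the test set
    k = scale_factor(work_obs, work_calc)
    return r_factor(work_obs, work_calc, k), r_factor(test_obs, test_calc, k)


# Hypothetical toy amplitudes; the last reflection belongs to the test set
f_obs = [100.0, 50.0, 80.0, 20.0]
f_calc = [45.0, 30.0, 40.0, 12.0]
is_free = [False, False, False, True]

r_work, r_free = r_work_and_r_free(f_obs, f_calc, is_free)
print(f"R = {r_work:.4f}, Rfree = {r_free:.4f}")
```

The only difference between the two numbers is which reflections enter the sums; the formula itself is the same.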

what do R-factors try to measure, and how to interpret their values?

  • relative deviation of

Data quality

  • typical values: ...

Model quality

Relation between R and Rfree as a function of resolution

References:

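  • Tickle IJ, Laskowski RA and Moss DS. Rfree and the Rfree Ratio. I. Derivation of Expected Values of Cross-Validation Residuals Used in Macromolecular Least-Squares Refinement. Acta Cryst. (1998). D54, 547-557 [http://dx.doi.org/10.1107/S0907444997013875]
  • Tickle IJ, Laskowski RA and Moss DS. Rfree and the Rfree ratio. II. Calculation of the expected values and variances of cross-validation statistics in macromolecular least-squares refinement. Acta Cryst. (2000). D56, 442-450 [http://dx.doi.org/10.1107/S0907444999016868]
  • Kleywegt GJ and Jones TA. Homo Crystallographicus - Quo vadis? Structure (2002). 10, 465-472 (reprint from http://xray.bmc.uu.se/cgi-bin/gerard/reprint_mailer.pl?pref=65)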

- plot: http://xray.bmc.uu.se/gerard/supmat/rfree2000/rfminusr_vs_resolution.gif

- many more plots: http://xray.bmc.uu.se/gerard/supmat/rfree2000

- harry plotter (java): http://xray.bmc.uu.se/gerard/supmat/rfree2000/plotter.html

what kinds of problems exist with these indicators?

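- Rsym / Rmerge should not be used; Rmeas should be used instead, because Rsym / Rmerge rise with increasing redundancy (see the definitions above), so they make low-redundancy data look better than high-redundancy data.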

- R/Rfree and NCS: reflections in the working set and the test set are not independent