R-factors: Difference between revisions

Revision as of 17:09, 18 February 2008

Historically, R-factors were introduced by ... ???

Definitions

Data quality indicators

In the following, all sums over hkl extend only over unique reflections with more than one observation!

R_sym and R_merge - the formula for both is:

[math]\displaystyle{ 
 R = \frac{\sum_{hkl} \sum_{j} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
  }[/math]

where [math]\displaystyle{ \langle I_{hkl}\rangle }[/math] is the average of symmetry- (or Friedel-) related observations of a unique reflection.

It can be shown that this formula results in higher R-factors when the redundancy is higher ^[1]. In other words, low-redundancy datasets appear better than high-redundancy ones, which obviously violates the intention of having an indicator of data quality!
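As a minimal sketch of the formula above, the following Python function computes R_merge from a list of observations. The input layout (pairs of a unique-reflection index and an intensity) and the name `r_merge` are assumptions chosen for illustration, not the interface of any data-processing program.

```python
from collections import defaultdict

def r_merge(observations):
    """R_sym / R_merge for (hkl, intensity) pairs.

    hkl is assumed to be the already symmetry-reduced (unique) index,
    so symmetry-related observations share the same key.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            continue  # sums only cover reflections observed more than once
        mean = sum(intensities) / n
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator

# Made-up intensities: two reflections with repeats, one observed only once.
obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
       ((0, 1, 0), 50.0), ((0, 1, 0), 40.0), ((0, 1, 0), 45.0),
       ((0, 0, 1), 7.0)]
print(r_merge(obs))
```

The singly observed reflection (0, 0, 1) contributes to neither sum, in line with the restriction stated at the top of this section.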

Redundancy-independent version of the above:

[math]\displaystyle{ 
 R_{meas} = \frac{\sum_{hkl} \sqrt \frac{n}{n-1} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
  }[/math]

which unfortunately results in higher (but more realistic) numerical values than R_sym / R_merge ^[1] (see also M.S. Weiss and R. Hilgenfeld (1997). On the use of the merging R-factor as a quality indicator for X-ray data. J. Appl. Crystallogr. 30, 203-205, http://dx.doi.org/10.1107/S0021889897003907).
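The redundancy dependence can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not part of the text above: every reflection has the same true intensity, the noise is Gaussian with the same sigma for every observation, and all function and parameter names are hypothetical.

```python
import math
import random

def merging_r_factors(groups):
    """Return (R_merge, R_meas) for a list of per-reflection intensity lists."""
    num_merge = num_meas = denom = 0.0
    for intensities in groups:
        n = len(intensities)
        if n < 2:
            continue  # sums only cover reflections observed more than once
        mean = sum(intensities) / n
        abs_dev = sum(abs(i - mean) for i in intensities)
        num_merge += abs_dev
        num_meas += math.sqrt(n / (n - 1)) * abs_dev  # redundancy correction
        denom += sum(intensities)
    return num_merge / denom, num_meas / denom

def simulate(redundancy, n_reflections=20000, true_i=100.0, sigma=10.0, seed=0):
    """Synthetic data set: identical true intensity and identical Gaussian
    noise for every observation (assumptions made for illustration only)."""
    rng = random.Random(seed)
    groups = [[rng.gauss(true_i, sigma) for _ in range(redundancy)]
              for _ in range(n_reflections)]
    return merging_r_factors(groups)

low = simulate(redundancy=2)
high = simulate(redundancy=8)
print("n=2: R_merge=%.3f R_meas=%.3f" % low)
print("n=8: R_merge=%.3f R_meas=%.3f" % high)
```

Both runs use the same noise level, so the underlying data quality is identical. For these parameters R_merge is expected to come out near 0.056 at redundancy 2 and near 0.075 at redundancy 8, whereas R_meas should stay close to 0.080 in both cases.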

Measuring the quality of averaged intensities/amplitudes:

For intensities, use R_p.i.m. (M.S. Weiss. Global indicators of X-ray data quality. J. Appl. Cryst. (2001). 34, 130-135 [3]):

[math]\displaystyle{ 
 R_{p.i.m.} = \frac{\sum_{hkl} \sqrt{\frac{1}{n-1}} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
  }[/math]

[math]\displaystyle{ R_{mrgd-I} }[/math] is similarly defined in Diederichs and Karplus ^[1].
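The three intensity-based indicators differ only in the per-reflection weight applied to the sum of absolute deviations, which the following sketch makes explicit. It assumes the weight sqrt(1/(n-1)) for R_p.i.m., following Weiss (2001); this is the R_meas weight divided by sqrt(n). The function name, the `kind` argument and the input layout are hypothetical.

```python
import math
from collections import defaultdict

# Per-reflection weight as a function of the number of observations n.
WEIGHTS = {
    "merge": lambda n: 1.0,
    "meas": lambda n: math.sqrt(n / (n - 1)),
    "pim": lambda n: math.sqrt(1 / (n - 1)),  # assumed form, Weiss (2001)
}

def merging_r(observations, kind):
    """R_merge, R_meas or R_p.i.m. for (unique hkl, intensity) pairs."""
    weight = WEIGHTS[kind]
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            continue  # avoids n - 1 = 0 and matches the stated restriction
        mean = sum(intensities) / n
        numerator += weight(n) * sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator

# Made-up intensities for two unique reflections.
obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
       ((0, 1, 0), 50.0), ((0, 1, 0), 40.0), ((0, 1, 0), 45.0)]
for kind in ("merge", "meas", "pim"):
    print(kind, merging_r(obs, kind))
```

Because the R_p.i.m. weight shrinks as n grows, this indicator rewards additional observations, which is the intended behaviour for a measure of the precision of the averaged intensity.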

Similarly, one should use R_mrgd-F as a quality indicator for amplitudes ^[1], which may be calculated as:

[math]\displaystyle{ 
 R_{mrgd-F} = \frac{\sum_{hkl} \sqrt{\frac{1}{n-1}} \sum_{j=1}^{n} \vert F_{hkl,j}-\langle F_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}F_{hkl,j}}
  }[/math]

with [math]\displaystyle{ \langle F_{hkl}\rangle }[/math] defined analogously to [math]\displaystyle{ \langle I_{hkl}\rangle }[/math].

Model quality indicators

R and R_free : the formula for both is

[math]\displaystyle{ 
 R=\frac{\sum_{hkl}\vert F_{hkl}^{obs}-F_{hkl}^{calc}\vert}{\sum_{hkl} F_{hkl}^{obs}}
  }[/math]

where [math]\displaystyle{ F_{hkl}^{obs} }[/math] and [math]\displaystyle{ F_{hkl}^{calc} }[/math] have to be scaled w.r.t. each other. R and R_free differ in the set of reflections they are calculated from: R is calculated for the working set, whereas R_free is calculated for the test set.
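A minimal sketch of the calculation follows. It assumes a single overall scale factor k = sum(F_obs) / sum(F_calc) determined from the working set and applied to both sets; refinement programs generally use more elaborate scaling, so this choice, the function names and the made-up amplitudes are illustrative assumptions only.

```python
def r_factor(f_obs, f_calc, scale):
    """sum |F_obs - scale * F_calc| / sum F_obs over one set of reflections."""
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)

def r_work_and_r_free(reflections):
    """reflections: list of (f_obs, f_calc, is_free) tuples.

    The overall scale is taken from the working set only (an assumption
    of this sketch) and then applied to both sets.
    """
    work = [(fo, fc) for fo, fc, is_free in reflections if not is_free]
    free = [(fo, fc) for fo, fc, is_free in reflections if is_free]
    scale = sum(fo for fo, _ in work) / sum(fc for _, fc in work)
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work], scale)
    r_free = r_factor([fo for fo, _ in free], [fc for _, fc in free], scale)
    return r_work, r_free

# Made-up amplitudes; True marks test-set reflections.
data = [(100.0, 52.0, False), (50.0, 24.0, False),
        (80.0, 40.0, False), (30.0, 14.0, False),
        (60.0, 33.0, True), (40.0, 18.0, True)]
r_work, r_free = r_work_and_r_free(data)
print(r_work, r_free)
```

The same function `r_factor` serves both indicators; only the subset of reflections passed to it changes.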

What do R-factors try to measure, and how should their values be interpreted?

relative deviation of

Data quality

typical values: ...

Model quality

Relation between R and R_free as a function of resolution

References:

Tickle IJ, Laskowski RA and Moss DS. Rfree and the Rfree Ratio. I. Derivation of Expected Values of Cross-Validation Residuals Used in Macromolecular Least-Squares Refinement. Acta Cryst. (1998). D54, 547-557 [4]

Tickle IJ, Laskowski RA and Moss DS. Rfree and the Rfree ratio. II. Calculation of the expected values and variances of cross-validation statistics in macromolecular least-squares refinement. Acta Cryst. (2000). D56, 442-450 [5]

GJ Kleywegt and TA Jones (2002). Homo Crystallographicus - Quo vadis? Structure 10, 465-472. (reprint from http://xray.bmc.uu.se/cgi-bin/gerard/reprint_mailer.pl?pref=65)

- formula from that paper: R_free = 1.065*R + 0.036

- plot with empirical data: http://xray.bmc.uu.se/gerard/supmat/rfree2000/rfminusr_vs_resolution.gif

- many more plots: http://xray.bmc.uu.se/gerard/supmat/rfree2000

- harry plotter (java): http://xray.bmc.uu.se/gerard/supmat/rfree2000/plotter.html
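For orientation, the linear relation quoted above can be evaluated directly. The sketch below simply applies R_free = 1.065*R + 0.036 as listed; the function name is hypothetical, and the result is an empirical expectation rather than a validation criterion.

```python
def expected_r_free(r_work):
    """Empirical relation quoted above: R_free = 1.065 * R + 0.036."""
    return 1.065 * r_work + 0.036

for r in (0.15, 0.20, 0.25):
    r_free = expected_r_free(r)
    print("R = %.3f  expected R_free = %.3f  gap = %.3f" % (r, r_free, r_free - r))
```

According to this relation, the expected gap between R_free and R grows slowly with R, from about 0.046 at R = 0.15 to about 0.052 at R = 0.25.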

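What kinds of problems exist with these indicators?

- R_sym / R_merge should not be used; R_meas should be used instead, because R_sym / R_merge increases with redundancy even when the data quality is unchanged ^[1].

- R/R_free and NCS: when non-crystallographic symmetry (NCS) is present, reflections in the work and test sets are not independent.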

Notes

1. K. Diederichs and P.A. Karplus (1997). Improved R-factors for diffraction data analysis in macromolecular crystallography. Nature Struct. Biol. 4, 269-275 (http://strucbio.biologie.uni-konstanz.de/strucbio/files/nsb-1997.pdf)

