Difference between revisions of "R-factors"

From CCP4 wiki

Revision as of 12:06, 15 February 2008

Historically, R-factors were introduced by ... ???

Definitions

Data quality indicators

In the following, all sums over hkl extend only over unique reflections with more than one observation!

  • Rsym and Rmerge - the formula for both is:
[math]\displaystyle{ 
 R = \frac{\sum_{hkl} \sum_{j} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
  }[/math]


where [math]\displaystyle{ \langle I_{hkl}\rangle }[/math] is the average of symmetry- (or Friedel-) related observations of a unique reflection.

It can be shown that this formula results in higher R-factors when the redundancy is higher. In other words, low-redundancy datasets appear better than high-redundancy ones, which obviously violates the intention of having an indicator of data quality!
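A minimal Python sketch of the formula above may make the bookkeeping explicit. It is an illustration only, not code from any data-processing program; the function name `r_merge`, the dictionary layout and the intensities are made up for this example.

```python
from statistics import mean


def r_merge(observations):
    """R_merge (= R_sym) for a dict mapping (h, k, l) -> list of intensities.

    Only unique reflections with more than one observation enter the sums.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        if len(intensities) < 2:
            continue  # singly measured reflections are excluded
        avg = mean(intensities)  # <I_hkl>
        numerator += sum(abs(i - avg) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Made-up intensities, for illustration only
obs = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 40.0, 60.0],
    (0, 0, 1): [30.0],  # single observation: ignored
}
print(r_merge(obs))
```

Here the numerator is 10 + 20 = 30 and the denominator is 210 + 150 = 360; the reflection with a single observation contributes to neither sum.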

  • Redundancy-independent version of the above:
[math]\displaystyle{ 
 R_{meas} = \frac{\sum_{hkl} \sqrt{\frac{n}{n-1}} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
  }[/math]


which unfortunately results in higher (but more realistic) numerical values than Rsym / Rmerge.
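The redundancy dependence can be checked numerically. The following sketch simulates reflections with Gaussian measurement errors and compares both indicators for different numbers of observations n per reflection. The error model, the parameter values and the helper names `r_factors` and `simulate` are assumptions made for this illustration.

```python
import math
import random


def r_factors(groups):
    """Return (R_merge, R_meas) for a list of per-reflection intensity lists."""
    num_merge = num_meas = denom = 0.0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue  # need more than one observation
        avg = sum(obs) / n
        dev = sum(abs(i - avg) for i in obs)
        num_merge += dev
        num_meas += math.sqrt(n / (n - 1)) * dev  # redundancy correction
        denom += sum(obs)
    return num_merge / denom, num_meas / denom


def simulate(n, n_reflections=5000, true_i=100.0, sigma=10.0, seed=0):
    """n observations per reflection, each with Gaussian error (assumed model)."""
    rng = random.Random(seed)
    return [[rng.gauss(true_i, sigma) for _ in range(n)]
            for _ in range(n_reflections)]


for n in (2, 4, 16):
    r_merge, r_meas = r_factors(simulate(n))
    print(f"n={n:2d}  R_merge={r_merge:.3f}  R_meas={r_meas:.3f}")
```

Under these assumptions R_merge grows with n although the simulated data quality is identical, whereas R_meas stays close to constant.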

  • measuring quality of averaged intensities/amplitudes:

for intensities use

[math]\displaystyle{ 
 R_{p.i.m.} \ (\text{or } R_{mrgd-I}) = \frac{\sum_{hkl} \sqrt{\frac{1}{n-1}} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
  }[/math]



and similarly for amplitudes:

[math]\displaystyle{ 
 R_{mrgd-F} = \frac{\sum_{hkl} \sqrt{\frac{1}{n-1}} \sum_{j=1}^{n} \vert F_{hkl,j}-\langle F_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}F_{hkl,j}}
  }[/math]


with [math]\displaystyle{ \langle F_{hkl}\rangle }[/math] defined analogously to [math]\displaystyle{ \langle I_{hkl}\rangle }[/math].
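As a sketch, the precision-indicating R-factor of the averaged values can be computed the same way for intensities or amplitudes. The prefactor used here is sqrt(1/(n-1)), i.e. the R_meas factor sqrt(n/(n-1)) divided by sqrt(n), which reflects that averaging n observations reduces the error of the mean by sqrt(n). The function name `r_pim` and the example values are hypothetical.

```python
import math


def r_pim(observations):
    """Precision-indicating merging R for a dict (h, k, l) -> list of values.

    The values may be intensities (R_p.i.m.) or amplitudes (R_mrgd-F).
    """
    numerator = 0.0
    denominator = 0.0
    for values in observations.values():
        n = len(values)
        if n < 2:
            continue  # only reflections with more than one observation
        avg = sum(values) / n
        numerator += math.sqrt(1.0 / (n - 1)) * sum(abs(v - avg) for v in values)
        denominator += sum(values)
    return numerator / denominator


# Made-up values, for illustration only
obs = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 40.0, 60.0],
}
print(r_pim(obs))
```

In contrast to R_meas, this quantity decreases as the number of observations per reflection grows, as expected for the error of an average.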

Model quality indicators

  • R and Rfree : the formula for both is
[math]\displaystyle{ 
 R=\frac{\sum_{hkl}\vert F_{hkl}^{obs}-F_{hkl}^{calc}\vert}{\sum_{hkl} F_{hkl}^{obs}}
  }[/math]



where [math]\displaystyle{ F_{hkl}^{obs} }[/math] and [math]\displaystyle{ F_{hkl}^{calc} }[/math] have to be scaled w.r.t. each other. R and Rfree differ in the set of reflections they are calculated from: R is calculated for the working set, whereas Rfree is calculated for the test set.
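The following sketch shows how R and Rfree could be evaluated from one list of reflections. The single linear scale factor k = ΣF_obs / ΣF_calc, determined on the working set only, is a simplifying assumption for this example; refinement programs use more elaborate scaling. All names and amplitudes are made up.

```python
def linear_scale(f_obs, f_calc):
    """Simple overall scale factor (assumed model): k = sum(F_obs) / sum(F_calc)."""
    return sum(f_obs) / sum(f_calc)


def r_factor(f_obs, f_calc, k):
    """R = sum |F_obs - k * F_calc| / sum F_obs."""
    numerator = sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def r_work_and_free(reflections, test_set):
    """reflections: dict (h, k, l) -> (F_obs, F_calc); test_set: set of (h, k, l)."""
    work = [v for hkl, v in reflections.items() if hkl not in test_set]
    test = [v for hkl, v in reflections.items() if hkl in test_set]
    work_obs, work_calc = zip(*work)
    test_obs, test_calc = zip(*test)
    k = linear_scale(work_obs, work_calc)  # scale from the working set only
    return r_factor(work_obs, work_calc, k), r_factor(test_obs, test_calc, k)


# Made-up amplitudes, for illustration only
reflections = {
    (1, 0, 0): (100.0, 45.0),
    (0, 1, 0): (60.0, 35.0),
    (0, 0, 1): (40.0, 20.0),
    (1, 1, 0): (80.0, 50.0),
}
free_set = {(1, 1, 0)}  # reflections set aside as the test set
r_work, r_free = r_work_and_free(reflections, free_set)
print(r_work, r_free)
```

The only difference between the two numbers is the subset of reflections entering the sums; the test-set reflections play no role in determining the scale factor here.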

What do R-factors try to measure, and how should their values be interpreted?

  • relative deviation of

Data quality

  • typical values: ...

Model quality

What kinds of problems exist with these indicators?

- Rsym / Rmerge should not be used; Rmeas should be used instead (explain why?)

- R/Rfree and NCS: reflections in the work and test sets are not independent