Data quality

From CCP4 wiki
Revision as of 02:20, 10 February 2008 by Wgscott (talk | contribs) (changed to greek letter sigma)

What is the resolution of my dataset?

First of all, it is limited by completeness. In practical terms this means that the highest resolution you can get is the resolution at the edge of the detector. If you collected enough frames, you may be able to squeeze out another 0.1 Å by processing data all the way to the corner. Usually, though, the detector is positioned close enough to the crystal that there is no diffraction at the edge, and the resolution limit should then be chosen based on the strength of the diffraction.
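To get a feeling for how much resolution sits between the edge and the corner of the detector, Bragg's law can be applied to the detector geometry. The following is a minimal sketch that assumes a flat, square detector perpendicular to the beam, with the direct beam at its centre and no 2θ offset. The function name `d_spacing` and all numbers (distance, detector size, wavelength) are illustrative assumptions, not values from any particular beamline.

```python
import math


def d_spacing(radius_mm: float, distance_mm: float, wavelength_a: float) -> float:
    """Resolution (in Å) of a spot at a given radius from the beam centre.

    Assumes a flat detector perpendicular to the beam, so that
    tan(2*theta) = radius / distance, and Bragg's law d = lambda / (2 sin(theta)).
    """
    two_theta = math.atan2(radius_mm, distance_mm)
    return wavelength_a / (2.0 * math.sin(two_theta / 2.0))


# Hypothetical setup: 300 mm square detector, 200 mm from the crystal, 1.0 Å X-rays.
half_width_mm = 150.0
distance_mm = 200.0
wavelength_a = 1.0

d_edge = d_spacing(half_width_mm, distance_mm, wavelength_a)
d_corner = d_spacing(half_width_mm * math.sqrt(2.0), distance_mm, wavelength_a)

print(f"resolution at the edge:   {d_edge:.2f} Å")
print(f"resolution at the corner: {d_corner:.2f} Å")
```

Keep in mind that only part of each resolution shell between the edge and the corner lands on the detector, which is why enough frames are needed for that region to be reasonably complete.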

This limit is commonly based on average [math]\displaystyle{ I/\sigma }[/math]. Examples of such choices are:

- [math]\displaystyle{ I/\sigma=1 }[/math] in the highest resolution shell

- [math]\displaystyle{ I/\sigma=2 }[/math] in the highest resolution shell

- at least 50% of reflections in the highest resolution shell have [math]\displaystyle{ I/\sigma }[/math] > 2


Some of these choices are more liberal than others (and so will give you higher resolution). It is probably not worthwhile to argue which choice is the best, since it is indeed a matter of personal preference.
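The criteria listed above can be compared on a table of per-shell statistics. The sketch below is only an illustration: the `Shell` record, the `resolution_cutoff` helper and the shell values are all made up for this example and do not come from any real dataset or processing program. Each criterion is read as "at least" the stated threshold, and the cutoff is taken as the last shell that still passes when walking from low to high resolution.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class Shell(NamedTuple):
    d_min: float              # high-resolution limit of the shell, in Å
    mean_i_over_sigma: float  # average I/sigma in the shell
    frac_above_2: float       # fraction of reflections with I/sigma > 2


def resolution_cutoff(
    shells: Sequence[Shell], accept: Callable[[Shell], bool]
) -> Optional[float]:
    """Return d_min of the last shell that passes before the first failure.

    Shells are visited from low to high resolution (large to small d_min).
    Returns None if even the lowest-resolution shell fails.
    """
    cutoff = None
    for shell in sorted(shells, key=lambda s: s.d_min, reverse=True):
        if not accept(shell):
            break
        cutoff = shell.d_min
    return cutoff


# Hypothetical shell statistics, invented for illustration only.
shells = [
    Shell(3.0, 25.0, 0.98),
    Shell(2.5, 14.0, 0.93),
    Shell(2.2, 7.5, 0.82),
    Shell(2.0, 3.8, 0.64),
    Shell(1.9, 2.3, 0.47),
    Shell(1.8, 1.4, 0.38),
    Shell(1.7, 0.8, 0.22),
]

criteria = {
    "mean I/sigma >= 1": lambda s: s.mean_i_over_sigma >= 1.0,
    "mean I/sigma >= 2": lambda s: s.mean_i_over_sigma >= 2.0,
    "at least 50% with I/sigma > 2": lambda s: s.frac_above_2 >= 0.5,
}

cutoffs = {name: resolution_cutoff(shells, test) for name, test in criteria.items()}

for name, d_min in cutoffs.items():
    print(f"{name}: cut at {d_min} Å")
```

With these invented numbers the three criteria give three different cutoffs, with the most liberal one reaching the highest resolution.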

There is probably not much reason to limit resolution by Rmerge. When the resolution limit is selected based on Rmerge being less than a certain cutoff, the argument is that in higher resolution shells the variation among independent measurements of the intensity of the same reflection is too high. But such variation is bound to be high for weak reflections. In addition, factors such as redundancy may significantly affect Rmerge. Rmerge may and should be used as a measure of the overall data quality (e.g. of two independent datasets, the one with the higher Rmerge is probably noisier).
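Both effects can be seen in a small simulation. Rmerge is usually defined as the sum over all measurements of |I_i − ⟨I⟩| divided by the sum of all I_i, where ⟨I⟩ is the mean of the repeated measurements of a given reflection. The sketch below assumes a deliberately crude noise model (every measurement has the same Gaussian error, all reflections have the same true intensity); the helper names `r_merge` and `simulate` and all numbers are assumptions made for illustration.

```python
import random
from typing import List


def r_merge(observations: List[List[float]]) -> float:
    """Rmerge = sum |I_i - <I>| / sum I_i over all reflections and measurements.

    Each inner list holds the repeated measurements of one unique reflection.
    """
    numerator = 0.0
    denominator = 0.0
    for measurements in observations:
        mean = sum(measurements) / len(measurements)
        numerator += sum(abs(i - mean) for i in measurements)
        denominator += sum(measurements)
    return numerator / denominator


def simulate(true_intensity: float, sigma: float, multiplicity: int,
             n_reflections: int, rng: random.Random) -> List[List[float]]:
    """Simulated measurements: true intensity plus Gaussian noise (an assumption)."""
    return [
        [rng.gauss(true_intensity, sigma) for _ in range(multiplicity)]
        for _ in range(n_reflections)
    ]


rng = random.Random(0)  # fixed seed so the run is reproducible

# Same strong reflections (I/sigma = 10), measured 2 times or 8 times each.
strong_low_mult = r_merge(simulate(100.0, 10.0, 2, 5000, rng))
strong_high_mult = r_merge(simulate(100.0, 10.0, 8, 5000, rng))

# Weak reflections (I/sigma = 2), measured 8 times each.
weak_high_mult = r_merge(simulate(20.0, 10.0, 8, 5000, rng))

print(f"strong, multiplicity 2: Rmerge = {strong_low_mult:.3f}")
print(f"strong, multiplicity 8: Rmerge = {strong_high_mult:.3f}")
print(f"weak,   multiplicity 8: Rmerge = {weak_high_mult:.3f}")
```

In this model Rmerge rises with redundancy even though the individual measurements are no worse and the merged intensities are in fact more precise, and it rises steeply for weak reflections simply because the denominator is small.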

One thing you do achieve by choosing the resolution limit based on Rmerge (which generally means that your [math]\displaystyle{ I/\sigma }[/math] in the highest resolution shell will be >4) is, of course, lower R-factors in refinement. It is perfectly OK to aspire to low R-factors, but achieving them by throwing away data probably isn't.