CC1/2: Difference between revisions

121 bytes added ,  28 September 2016


== number of reflection pairs ==
[[CORRECT.LP]] and XSCALE.LP do not explicitly state the ''number of reflection pairs'' that were used to calculate CC<sub>1/2</sub>.


However, this number can be derived, for each resolution shell, from three numbers that are printed: the NUMBER OF UNIQUE REFLECTIONS (X), the NUMBER OF OBSERVED REFLECTIONS (Y), and the number of COMPARED reflections (Z). The latter is the total number of unmerged observations that contributed to the CC<sub>1/2</sub> and the R-value calculations.


The ''number of reflection pairs'' that were used for the CC<sub>1/2</sub> calculation can therefore be obtained as follows: Y-Z gives the number of unique reflections that have only a single observation. The remaining X-(Y-Z) = X-Y+Z unique reflections have multiple observations, i.e. X-Y+Z reflection pairs went into CC<sub>1/2</sub>.
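The arithmetic above can be written as a small helper. This is only a sketch: the function name <code>cc_half_pair_count</code> and the example numbers are made up for illustration and are not taken from any real CORRECT.LP or XSCALE.LP.

```python
def cc_half_pair_count(unique, observed, compared):
    """Number of reflection pairs that entered CC1/2 in one resolution shell.

    unique   -- NUMBER OF UNIQUE REFLECTIONS (X)
    observed -- NUMBER OF OBSERVED REFLECTIONS (Y)
    compared -- number of COMPARED reflections (Z)
    """
    # Observations that were not compared belong to unique reflections
    # that were measured only once.
    singles = observed - compared
    if singles < 0 or singles > unique:
        raise ValueError("inconsistent input: expected 0 <= Y-Z <= X")
    # All other unique reflections have multiple observations and
    # contribute one pair each: X - (Y - Z) = X - Y + Z.
    return unique - singles


# Hypothetical shell: 1000 unique, 3500 observed, 3400 compared.
print(cc_half_pair_count(1000, 3500, 3400))
```

With these hypothetical numbers, 100 unique reflections have a single observation, so 900 pairs remain.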




== value of CC<sub>1/2</sub> at a resolution where the signal vanishes ==
At a resolution where the signal vanishes, CC<sub>1/2</sub> should be around zero. Empirically, however, we sometimes see negative values of CC<sub>1/2</sub> (down to about -0.4) when it is calculated with SFTOOLS or PHENIX.CC_STAR. In contrast, CC<sub>1/2</sub> as printed in CORRECT.LP does approach zero. How can this be understood?


The reason is that CORRECT does "alien" rejection (as documented in [[CORRECT.LP]]) ''after'' the final statistics table is printed. "Aliens" are reflections that are much stronger than expected in their resolution range, e.g. ice reflections. They are identified in the following way: the average intensity in a resolution range is calculated. Any (acentric) reflection whose intensity is larger than 10 times this average is considered suspicious/unexpected and is printed out at the bottom of CORRECT.LP (for centrics, the criterion is a bit different). By default, the parameter REJECT_ALIEN has a value of 20, which means that reflections with intensity > 20*average are marked as aliens (outliers) and are disregarded in downstream processing (e.g. [[XDSCONV]]).
This is useful for identifying ice/salt/cosmic ray reflections if the average intensity is high enough relative to the noise. However, in a resolution shell where the noise is much stronger than the signal (empirically, if the average I/sigma is less than 0.2), many reflections are considered aliens, namely those where the noise happens to be strongly positive. If these are rejected (i.e. if the default REJECT_ALIEN is applied), the average intensity may even become negative.


In addition, CC<sub>1/2</sub> becomes negative, as can be seen in a simulation that clarifies the principle. It employs normally distributed random numbers with a mean of 0.05 and a variance of one. In the figure below, each reflection is plotted at a position given by the intensities of its two subsets. Reflections with total intensity > 1 are rejected (red crosses), whereas reflections with total intensity < 1 are used for calculating CC<sub>1/2</sub> (green). The magenta line divides the plot into reflections with positive (total) intensity (upper right) and negative (total) intensity (lower left). The blue line is a least-squares fit to the "green" reflections; their correlation coefficient is -0.3 (while that of all reflections is close to 0.0).
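The simulation described above can be reproduced approximately with a few lines of Python. This is a minimal sketch, not the script that produced the figure: it assumes that the "total intensity" of a reflection is the sum of its two half-set values, and the function names, sample size and seed are arbitrary choices made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, mean=0.05, cutoff=1.0, seed=0):
    """Return (CC of all pairs, CC of pairs kept after rejection)."""
    rng = random.Random(seed)
    # Two independent half-set intensities per reflection: almost pure noise
    # (mean 0.05, variance 1).
    half1 = [rng.gauss(mean, 1.0) for _ in range(n)]
    half2 = [rng.gauss(mean, 1.0) for _ in range(n)]
    # Assumption: total intensity = sum of both halves; reject if > cutoff.
    kept = [(a, b) for a, b in zip(half1, half2) if a + b < cutoff]
    cc_all = pearson(half1, half2)
    cc_kept = pearson([a for a, _ in kept], [b for _, b in kept])
    return cc_all, cc_kept


if __name__ == "__main__":
    cc_all, cc_kept = simulate()
    print(f"CC1/2, all pairs:       {cc_all:+.2f}")
    print(f"CC1/2, after rejection: {cc_kept:+.2f}")
```

Under these assumptions, the correlation of all pairs should scatter around zero, while the correlation of the pairs that survive the cutoff should come out clearly negative. Rejecting on the sum removes variance along the diagonal of the plot but leaves the variance of the difference between the two halves untouched, which is what produces the anticorrelation.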


To ensure that this type of rejection does not take place, one can specify a very large threshold, e.g. REJECT_ALIEN=20000, in XDS.INP. To obtain the statistics ''after'' rejecting aliens, one could use [[XSCALE]].
[[File:Reject_aliens.png]]


== why CC<sub>1/2</sub> can be negative ==
There is a mathematical reason, explained in §4.1 of [https://cms.uni-konstanz.de/index.php?eID=tx_nawsecuredl&u=0&g=0&t=1475179096&hash=5cf64234a23a794a1894c5408384c57208d7b602&file=fileadmin/biologie/ag-strucbio/pdfs/Assman2016_JApplCryst.pdf Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography. J. Appl. Cryst. 49, 1021-1028.]