CC1/2: Difference between revisions

Jump to navigation Jump to search
3,539 bytes added ,  15 June 2020
→‎See also: link PDF
No edit summary
(→‎See also: link PDF)
(31 intermediate revisions by 2 users not shown)
Line 1: Line 1:


== number of reflection pairs ==
==CC<sub>1/2</sub> calculation==
[[CORRECT.LP]] and XSCALE.LP do not explicitly state the ''number of reflection pairs'' that were used to calculated CC<sub>1/2</sub>.


However, the number can be calculated from the numbers available, for each resolution shell: there is the NUMBER OF UNIQUE REFLECTIONS (X), the NUMBER OF OBSERVED REFLECTIONS (Y), and the number of COMPARED reflections (Z) - the latter number is the total number of unmerged observations that contributed to the CC<sub>1/2</sub> and the R-value calculations.
CC<sub>1/2</sub> can be calculated with the so-called σ-τ method ([https://cms.uni-konstanz.de/index.php?eID=tx_nawsecuredl&u=0&g=0&t=1475179096&hash=5cf64234a23a794a1894c5408384c57208d7b602&file=fileadmin/biologie/ag-strucbio/pdfs/Assman2016_JApplCryst.pdf Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography. J. Appl. Cryst. 49, 1021-1028.]) by:


The ''number of reflections pairs'' that were used for the CC<sub>1/2</sub> calculation can therefore be obtained as follows: Y-Z gives the number of unique reflections that have a single observation. The remaining (X-Y+Z) unique reflections have multiple observations, i.e. there were  (X-Y+Z) reflection pairs that went into CC<sub>1/2</sub>.
: <math>CC_{1/2}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\epsilon}} =\frac{\sigma^2_{y}- \frac{1}{2}\sigma^2_{\epsilon}}{\sigma^2_{y}+ \frac{1}{2}\sigma^2_{\epsilon}} </math>


This requires calculation of <math>\sigma^2_{y} </math>, the variance of the average intensities, and <math>\sigma^2_{\epsilon} </math>, the average of the variances of the averaged (merged) intensities.


== why CC<sub>1/2</sub> can be negative ==
== Method ==
There is a mathematical reason, explained in §4.1 of [https://cms.uni-konstanz.de/index.php?eID=tx_nawsecuredl&u=0&g=0&t=1475179096&hash=5cf64234a23a794a1894c5408384c57208d7b602&file=fileadmin/biologie/ag-strucbio/pdfs/Assman2016_JApplCryst.pdf Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography (2016) J. Appl. Cryst. 49, 1021-1028.]


===''' <math>\sigma^2_{\epsilon} </math>'''===


==CC<sub>1/2</sub> calculation==
With <math>x_{j,i} </math> , a single observation j of all n observations of one reflection i, the average of all sample variances of the mean across all unique reflections of a resolution shell is obtained by calculating the unbiased sample variance of the mean for every unique reflection i by:


CC<sub>1/2</sub> is calculated by:
<math>s^2_{\epsilon i} =  \frac{1}{n_{i}-1} \cdot \left ( \sum^{n_{i}}_{j} x^2_{j,i} - \frac{\left ( \sum^{n_{i}}_{j}x_{j,i} \right )^2}{n_{i}} \right )    / \frac{n_{i}}{2} </math>


: <math>CC_{1/2}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\epsilon}} =\frac{\sigma^2_{y}- \frac{1}{2}\sigma^2_{\epsilon}}{\sigma^2_{y}+ \frac{1}{2}\sigma^2_{\epsilon}} </math>
<math>s^2_{\epsilon i} </math> is divided by the factor  <math>\frac{n}{2} </math>, because the variance of the sample mean (intensities of the merged observations) is the quantity of interest. The division by '''n/2''' takes care of providing the variance of the mean ([https://en.wikipedia.org/wiki/Sample_mean_and_covariance#Variance_of_the_sample_mean ]) (merged) intensity of the '''half'''-datasets, as defined in [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3457925/ Karplus and Diederichs (2012)]. These "variances of means" are averaged over all unique reflections of the resolution shell:
<math>s^2_{\epsilon}=\sum^N_{i} s^2_{\epsilon i} / N </math>  


This requires calculation of <math>\sigma^2_{y} </math>, the variance of the average intensities across the unique reflections of a resolution shell, and <math>\sigma^2_{\epsilon} </math>, the average of all sample variances of the mean across all unique reflections of a resolution shell.


== Implementation ==
----


===''' <math>\sigma^2_{\epsilon} </math>'''===


The average of all sample variances of the mean across all unique reflections of a resolution shell is obtained by calculating the sample variance of the mean for every unique reflection i by:
If the standard deviations <math>\sigma_{int\_j,i} </math> for the single observations are considered as weights for the CC<sub>1/2</sub> calculation, with <math>w_{j,i}=\frac{1}{\sigma_{int\_j,i}^2} </math>, one way to obtain the unbiased '''weighted''' sample variance of the half-dataset mean for every unique reflection i is:  


<math>\sigma^2_{\epsilon i} =  \frac{1}{n-1} \cdot \left ( \sum^n_{j} x^2_{j,i} - \frac{\left ( \sum^n_{j}x_{j,i} \right )^2}{ n} \right )    / \frac{n}{2} </math>
<math>s^2_{\epsilon i\_w} =  \frac{n_{i}}{n_{i}-1} \cdot \left ( \frac{\sum^{n_{i}}_{j}w_{j,i} x^2_{j,i}}{\sum^{n_{i}}_{j}w_{j,i}} -\left ( \frac{ \sum^{n_{i}}_{j}w_{j,i}x_{j,i} }{\sum^{n_{i}}_{j}w_{j,i}}\right )^2 \right )    / \frac{n_{i}}{2} </math>


With <math>x_{j,i} </math> , a single observation j of all observations n of one reflection i. <math>\sigma^2_{\epsilon i} </math> is then divided by the factor  <math>\frac{n}{2} </math>, because the variance of the sample mean (intensities of the merged observations) is the quantity of interest. The division by n/2 takes care of providing the variance of the mean (merged) intensity of the half-datasets, as defined in [https://en.wikipedia.org/wiki/Sample_mean_and_covariance#Variance_of_the_sample_mean ]. These "variances of means" are averaged over all unique reflections of the resolution shell:
These " weighted variances of means" are averaged over all unique reflections of the resolution shell:


<math>\sum^N_{i} \sigma^2_{\epsilon i} / N </math>  
<math>s^2_{\epsilon_w}=\sum^N_{i} s^2_{\epsilon i\_w} / N </math>  


It should be noted that it is not straightforward to define the correct way to calculate a weighted variance (and the weighted variance of the mean). The formula <math>s^2_w =  \frac{n_{i}}{n_{i}-1} \cdot \left ( \frac{\sum^{n_{i}}_{j}w_{j,i} x^2_{j,i}}{\sum^{n_{i}}_{j}w_{j,i}} -\left ( \frac{ \sum^{n_{i}}_{j}w_{j,i}x_{j,i} }{\sum^{n_{i}}_{j}w_{j,i}}\right )^2 \right )</math> is - after some manipulation - the same as that found at [https://stats.stackexchange.com/questions/6534/how-do-i-calculate-a-weighted-standard-deviation-in-excel],[https://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weightsd.pdf]. Other ways of calculating the weighted variance of the mean ([https://en.wikipedia.org/wiki/Weighted_arithmetic_mean],[https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html]) should be considered.


----
----
Line 37: Line 38:
===''' <math>\sigma^2_{y} </math>'''===
===''' <math>\sigma^2_{y} </math>'''===


The unbiased sample variance from all averaged intensities of all unique reflections is calculated by:  
Let N be the number of reflections in a resolution shell. With <math>\overline{x}_{i}= \sum^n_{j} x_{j,i}</math> , the unbiased sample variance from all averaged intensities of all unique reflections is calculated by:  


<math>\sigma^2_{y} = \frac{1}{N-1} \cdot \left ( \sum^N_{i} \overline{x}_{i}^2 - \frac{\left ( \sum^N_{i} \overline{x}_{i} \right )^2}{ N} \right ) </math>
<math>s^2_{y} = \frac{1}{N-1} \cdot \left (\sum^N_{i} \overline{x}_{i}^2 - \frac{\left ( \sum^N_{i} \overline{x}_{i} \right )^2}{ N} \right ) </math>  


With <math>\overline{x}_{i}= \sum^n_{j} x_{j,i}</math> , average intensity of all observations from all frames/crystals of one unique reflection i. This is done for all reflections N in a resolution shell.
----
If the standard deviations <math>\sigma_{int\_j,i} </math> for the single observations are used for weighting, <math>s^2_{y}</math> is obtained from
<math>\overline{x}_{i_w} = \frac{\sum^n_{j} w_{j,i}  x_{j,i}} {\sum^n_{j}w_{j,i}}</math>:


<math>s^2_{y_w} = \frac{1}{N-1} \cdot \left (\sum^N_{i} \overline{x}_{i_w}^2 - \frac{\left ( \sum^N_{i} \overline{x}_{i_w} \right )^2}{ N} \right ) </math>


== Example ==
== Example ==
An example is shown for a very simplified data file (unmerged ASCII.HKL). Only two frames/crystals are looked at and the diffraction pattern also consists only of two unique reflections with each three observations for every unique reflection.  
An example is shown for a very simplified data file (unmerged ASCII.HKL).


<pre>
<pre>
First reflection with 6 observations:
First reflection with 6 observations:
     h    k    l      int    σ(int)  #datset
     h    k    l      int    σ(int)   
     2    0    0  9.156E+02  3.686E+00  1  
     2    0    0  9.156E+02  3.686E+00  1  
     0    2    0  5.584E+02  3.093E+00  1  
     0    2    0  5.584E+02  3.093E+00  1  
Line 57: Line 61:
     0    0    2  7.301E+02  2.405E+01  2  
     0    0    2  7.301E+02  2.405E+01  2  
</pre>
</pre>
<math>x_{i} </math> , the average intensity of all observations from all frames/crystals of this reflection = 669.6999
<math>\overline{x}_{1} </math> , the average intensity of all observations of this reflection = 669.6999


<math>\sigma^2_{\epsilon i} </math>, the unbiased sample variance of the mean of all observations of this unique reflection i = 20848.2198 (62544.6597/(n/2))
<math>\sigma^2_{\epsilon 1} </math>, the unbiased sample variance of the mean of all observations of this unique reflection = 62544.6597/(n/2) = 20848.2198


   
   
<pre>
<pre>
Second reflection with 6 observations:
Second reflection with 6 observations:
     h    k    l      int    σ(int)  #datset
     h    k    l      int    σ(int)   
     1    1    2  2.395E+01  8.932E+01  1   
     1    1    2  2.395E+01  8.932E+01  1   
     1    2    1  9.065E+01  7.407E+00  1   
     1    2    1  9.065E+01  7.407E+00  1   
Line 72: Line 76:
     2    1    1  1.608E+01  2.215E+01  2   
     2    1    1  1.608E+01  2.215E+01  2   
</pre>
</pre>
<math>x_{i} </math> , the average intensity of all observations from all frames/crystals of this reflection = 52.5150  
<math>\overline{x}_{2} </math> , the average intensity of all observations of this reflection = 52.5150  
 
<math>\sigma^2_{\epsilon i} </math>, the unbiased sample variance of the mean of all observations of this unique reflection i = 363.3267 (1089.9803/(n/2))


<math>\sigma^2_{\epsilon 2} </math>, the unbiased sample variance of the mean of all observations of this unique reflection = 1089.9803/(n/2) = 363.3267
1089.9803/(n/2)


<math>\sigma^2_{\epsilon} </math> , the average of all the <math>\sigma^2_{\epsilon i} </math> = 10605.7733
<math>\sigma^2_{\epsilon} </math> , the average of all the <math>\sigma^2_{\epsilon i} </math> = 10605.7733
Line 81: Line 85:
<math>\sigma^2_{y} </math>, the variance of all the averaged intensities = 190458.6533
<math>\sigma^2_{y} </math>, the variance of all the averaged intensities = 190458.6533


As a result of these calculations  CC<sub>1/2</sub> =.9458 ((190458.6533-(0.5*10605.7733))/(190458.6533+(0.5*10605.7733))
As a result of these calculations  CC<sub>1/2</sub> = (190458.6533-(0.5*10605.7733))/(190458.6533+(0.5*10605.7733)), which results in 0.9458.
 
The described calculation is implemented in [[XDSCC12]], and CC<sub>1/2</sub> and [[DeltaCC12|ΔCC<sub>1/2</sub>]] can be calculated for XDS_ASCII.HKL and XSCALE.HKL files.
 
==Fortran 95 code that assumes that all unique reflections have the same number of observations==
 
<pre>
sumibar=0
sumibar2=0
sumsig2eps=0
DO i=1,nref
  xbar=SUM(iobs(:,i))/nobs
  sumibar=sumi+xbar
  sumibar2=sumibar2+xbar**2
  sumsig2eps=sumeps + (SUM(iobs(:,i)**2)-xbar**2*nobs)/(nobs-1)/(nobs/2)
END DO
sig2y=(sumibar2-sumibar**2/nref)/(nref-1)
sig2eps=sumsig2eps/nref
print *,(sig2y-0.5*sig2eps)/(sig2y+0.5*sig2eps)
</pre>
 
 
== number of reflection pairs ==
[[CORRECT.LP]] and XSCALE.LP do not explicitly state the ''number of reflection pairs'' that were used to calculated CC<sub>1/2</sub>.
 
However, the number can be calculated from the numbers available, for each resolution shell: there is the NUMBER OF UNIQUE REFLECTIONS (X), the NUMBER OF OBSERVED REFLECTIONS (Y), and the number of COMPARED reflections (Z) - the latter number is the total number of unmerged observations that contributed to the CC<sub>1/2</sub> and the R-value calculations.
 
The ''number of reflections pairs'' that were used for the CC<sub>1/2</sub> calculation can therefore be obtained as follows: Y-Z gives the number of unique reflections that have a single observation. The remaining (X-Y+Z) unique reflections have multiple observations, i.e. there were  (X-Y+Z) reflection pairs that went into CC<sub>1/2</sub>.
 
 
== why CC<sub>1/2</sub> can be negative ==
If the numerator of the formula becomes negative, CC<sub>1/2</sub> is negative. This happens if the variance of the average intensities across the unique reflections of a resolution shell is low, but the individual measurements of each unique reflection vary strongly. This is discussed in §4.1 of [https://cms.uni-konstanz.de/index.php?eID=tx_nawsecuredl&u=0&g=0&t=1475179096&hash=5cf64234a23a794a1894c5408384c57208d7b602&file=fileadmin/biologie/ag-strucbio/pdfs/Assman2016_JApplCryst.pdf Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography (2016) J. Appl. Cryst. 49, 1021-1028.]
 
== Implementation ==
This way of calculating CC<sub>1/2</sub> is implemented in [[XDSCC12]] and in [http://www.desy.de/~twhite/crystfel/manual-partialator.html partialator] of [http://www.desy.de/~twhite/crystfel/relnotes-0.8.0 CrystFEL].
 
== See also ==
[https://www.youtube.com/watch?v=LirxJIcQ6T0 CC* - Linking crystallographic model and data quality.] Video recorded at SBGrid/NE-CAT workshop 2014; the PDF is [https://strucbio.biologie.uni-konstanz.de/pub/CC%20-%20Linking%20crystallographic%20model%20and%20data%20quality.pdf here]. The sound is poor.
2,652

edits

Cookies help us deliver our services. By using our services, you agree to our use of cookies.

Navigation menu