# CC1/2

## CC1/2 calculation

CC1/2 can be calculated with the so-called σ-τ method (Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography (2016) J. Appl. Cryst. 49, 1021-1028.) by:

$\displaystyle{ CC_{1/2}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\epsilon}} =\frac{\sigma^2_{y}- \frac{1}{2}\sigma^2_{\epsilon}}{\sigma^2_{y}+ \frac{1}{2}\sigma^2_{\epsilon}} }$

This requires calculation of $\displaystyle{ \sigma^2_{y} }$, the variance of the average intensities across the unique reflections of a resolution shell, and $\displaystyle{ \sigma^2_{\epsilon} }$, the average of all sample variances of the averaged (merged) intensities across all unique reflections of a resolution shell.

## Method

### $\displaystyle{ \sigma^2_{\epsilon} }$

With $\displaystyle{ x_{j,i} }$ , a single observation j of all observations n of one reflection i, the average of all sample variances of the mean across all unique reflections of a resolution shell is obtained by calculating the unbiased sample variance of the mean for every unique reflection i by:

$\displaystyle{ \sigma^2_{\epsilon i} = \frac{1}{n_{i}-1} \cdot \left ( \sum^{n_{i}}_{j} x^2_{j,i} - \frac{\left ( \sum^{n_{i}}_{j}x_{j,i} \right )^2}{n_{i}} \right ) / \frac{n_{i}}{2} }$

$\displaystyle{ \sigma^2_{\epsilon i} }$ is divided by the factor $\displaystyle{ \frac{n}{2} }$, because the variance of the sample mean (intensities of the merged observations) is the quantity of interest. The division by n/2 takes care of providing the variance of the mean ([1]) (merged) intensity of the half-datasets, as defined in Karplus and Diederichs (2012). These "variances of means" are averaged over all unique reflections of the resolution shell:

$\displaystyle{ \sigma^2_{\epsilon}=\sum^N_{i} \sigma^2_{\epsilon i} / N }$

If the standard deviations $\displaystyle{ \sigma_{int\_j,i} }$ for the single observations are considered as weights for the CC1/2 calculation, with $\displaystyle{ \sum^{n_{i}}_{j}w_{i}=\sum^{n_{i}}_{j}\frac{1}{\sigma_{int\_j,i}^2} }$ the unbiased weighted sample variance of the mean for every unique reflection i is obtained by:

$\displaystyle{ \sigma^2_{\epsilon i\_w} = \frac{n_{i}}{n_{i}-1} \cdot \left ( \frac{\sum^{n_{i}}_{j}w_{j,i} x^2_{j,i}}{\sum^{n_{i}}_{j}w_{j,i}} -\left ( \frac{ \sum^{n_{i}}_{j}w_{j,i}x_{j,i} }{\sum^{n_{i}}_{j}w_{j,i}}\right )^2 \right ) / \frac{n_{i}}{2} }$

These " weighted variances of means" are averaged over all unique reflections of the resolution shell:

$\displaystyle{ \sigma^2_{\epsilon_w}=\sum^N_{i} \sigma^2_{\epsilon i\_w} / N }$

### $\displaystyle{ \sigma^2_{y} }$

Let N be the number of reflections in a resolution shell. With $\displaystyle{ \overline{x}_{i}= \sum^n_{j} x_{j,i} }$ , the unbiased sample variance from all averaged intensities of all unique reflections is calculated by:

$\displaystyle{ \sigma^2_{y} = \frac{1}{N-1} \cdot \left (\sum^N_{i} \overline{x}_{i}^2 - \frac{\left ( \sum^N_{i} \overline{x}_{i} \right )^2}{ N} \right ) }$

## Example

An example is shown for a very simplified data file (unmerged ASCII.HKL).

First reflection with 6 observations:
h     k     l       int     σ(int)
2     0     0  9.156E+02  3.686E+00   1
0     2     0  5.584E+02  3.093E+00   1
0     0     2  6.301E+02  2.405E+01   1
2     0     0  9.256E+02  3.686E+00   2
0     2     0  2.584E+02  3.093E+00   2
0     0     2  7.301E+02  2.405E+01   2


$\displaystyle{ \overline{x}_{1} }$ , the average intensity of all observations of this reflection = 669.6999

$\displaystyle{ \sigma^2_{\epsilon 1} }$, the unbiased sample variance of the mean of all observations of this unique reflection = 62544.6597/(n/2) = 20848.2198

Second reflection with 6 observations:
h     k     l       int     σ(int)
1     1     2  2.395E+01  8.932E+01   1
1     2     1  9.065E+01  7.407E+00   1
2     1     1  5.981E+01  9.125E+00   1
1     1     2  3.395E+01  8.932E+01   2
1     2     1  9.065E+01  7.407E+00   2
2     1     1  1.608E+01  2.215E+01   2


$\displaystyle{ \overline{x}_{2} }$ , the average intensity of all observations of this reflection = 52.5150

$\displaystyle{ \sigma^2_{\epsilon 2} }$, the unbiased sample variance of the mean of all observations of this unique reflection = 1089.9803/(n/2) = 363.3267 1089.9803/(n/2)

$\displaystyle{ \sigma^2_{\epsilon} }$ , the average of all the $\displaystyle{ \sigma^2_{\epsilon i} }$ = 10605.7733

$\displaystyle{ \sigma^2_{y} }$, the variance of all the averaged intensities = 190458.6533

As a result of these calculations CC1/2 = (190458.6533-(0.5*10605.7733))/(190458.6533+(0.5*10605.7733), which results in 0.9458.

The described calculation is implemented in XDSCC12, and CC1/2 and ΔCC1/2 can be calculated for XDS_ASCII.HKL and XSCALE.HKL files.

## Fortran 95 code that assumes that all unique reflections have the same number of observations

sumibar=0
sumibar2=0
sumsig2eps=0
DO i=1,nref
xbar=SUM(iobs(:,i))/nobs
sumibar=sumi+xbar
sumibar2=sumibar2+xbar**2
sumsig2eps=sumeps + (SUM(iobs(:,i)**2)-xbar**2*nobs)/(nobs-1)/(nobs/2)
END DO
sig2y=(sumibar2-sumibar**2/nref)/(nref-1)
sig2eps=sumsig2eps/nref
print *,(sig2y-0.5*sig2eps)/(sig2y+0.5*sig2eps)


## number of reflection pairs

CORRECT.LP and XSCALE.LP do not explicitly state the number of reflection pairs that were used to calculated CC1/2.

However, the number can be calculated from the numbers available, for each resolution shell: there is the NUMBER OF UNIQUE REFLECTIONS (X), the NUMBER OF OBSERVED REFLECTIONS (Y), and the number of COMPARED reflections (Z) - the latter number is the total number of unmerged observations that contributed to the CC1/2 and the R-value calculations.

The number of reflections pairs that were used for the CC1/2 calculation can therefore be obtained as follows: Y-Z gives the number of unique reflections that have a single observation. The remaining (X-Y+Z) unique reflections have multiple observations, i.e. there were (X-Y+Z) reflection pairs that went into CC1/2.

## why CC1/2 can be negative

There is a mathematical reason, explained in §4.1 of Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography (2016) J. Appl. Cryst. 49, 1021-1028.