CORRECT.LP: Difference between revisions

From XDSwiki
Revision as of 12:38, 9 June 2015

Space group determination

The approach to space group determination is well explained in CORRECT.LP :

XDS adopts the following approach.
(1) it looks for possible symmetries of the crystal lattice
(2) it computes a redundancy independent R-factor for all enantiomorphous
    point groups compatible with the observed lattice symmetry.
(3) it selects the group which explains the intensity data at an acceptable,
    redundancy-independent R-factor (Rmeas, Rrim) using a minimum number of
    unique reflections.

This approach does not test for the presence of screw axes. Consequently,
orthorhombic cell axes will be specified in increasing length (following
conventions), despite the possibility that different assignments for the
cell axes could become necessary for space groups P222(1) and P2(1)2(1)2
containing one or two screw axes, respectively.

The user can always override the automatic decisions by specifying the
correct space group number and unit cell constants in XDS.INP and repeating
the CORRECT step of XDS. This provides a simple way to rename orthorhombic
cell constants if screw axes are present.

In addition, the user has the option to specify in XDS.INP
(a) a reference data set or
(b) a reindexing transformation or
(c) the three basis vectors (if known from processing a previous data set
taken at the same crystal orientation in a multi-wavelength experiment).
These features of XDS are useful for resolving the issue of alternative
settings of polar or rhombohedral cells (like P4, P6, R3).

Please note the sentence "This approach does not test for the presence of screw axes." The information about those reflections that may indicate screw axes is actually given in the table "REFLECTIONS OF TYPE H,0,0 0,K,0 0,0,L OR EXPECTED TO BE ABSENT (*)" (near the end of CORRECT.LP), but that table is not evaluated automatically, so no screw axes are assigned.

Therefore, space group determination in XDS only evaluates the point groups that are compatible with the lattice symmetry. XDS suggests one representative space group for each point group; the other space groups belonging to that point group are equally possible, but are not listed! For example, if XDS suggests space group 89 (P422), then any other space group of point group 422 (90, 91, 92, 93, 94, 95 or 96) is equally possible.
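The point about "one representative per point group" can be made concrete with a small lookup. This is only a minimal sketch: the helper name `candidate_space_groups` is hypothetical (it is not part of XDS), and the table covers just a few primitive lattices, with numbers taken from the space-group list that CORRECT.LP prints.

```python
# Hypothetical helper, not part of XDS: space groups that share a point group
# and lattice type. Deliberately partial; numbers follow the space-group
# list printed in CORRECT.LP.
SAME_POINT_GROUP = [
    (16, 17, 18, 19),                  # oP, point group 222
    (75, 76, 77, 78),                  # tP, point group 4
    (89, 90, 91, 92, 93, 94, 95, 96),  # tP, point group 422
    (195, 198),                        # cP, point group 23
]


def candidate_space_groups(suggested):
    """Return every space group sharing the point group of `suggested`."""
    for family in SAME_POINT_GROUP:
        if suggested in family:
            return list(family)
    raise KeyError(f"space group {suggested} is not in this partial table")


print(candidate_space_groups(89))
```

For the example in the text, the call lists 89 together with 90 to 96, all of which remain candidates until screw axes have been assessed.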

If the user wants a more automatic determination, it is suggested to run

echo SETTING SYMMETRY-BASED | pointless XDS_ASCII.HKL

Please note that SETTING SYMMETRY-BASED overrides a pointless default that would lead to ambiguity between space group numbers and space group symbols for space group numbers 5, 17 and 18. The mapping of numbers and names is:

****** LATTICE SYMMETRY IMPLICATED BY SPACE GROUP SYMMETRY ******

BRAVAIS-           POSSIBLE SPACE-GROUPS FOR PROTEIN CRYSTALS
 TYPE                     [SPACE GROUP NUMBER,SYMBOL]
 aP      [1,P1]
 mP      [3,P2] [4,P2(1)]
mC,mI    [5,C2]
 oP      [16,P222] [17,P222(1)] [18,P2(1)2(1)2] [19,P2(1)2(1)2(1)]
 oC      [21,C222] [20,C222(1)]
 oF      [22,F222]
 oI      [23,I222] [24,I2(1)2(1)2(1)]
 tP      [75,P4] [76,P4(1)] [77,P4(2)] [78,P4(3)] [89,P422] [90,P42(1)2]
         [91,P4(1)22] [92,P4(1)2(1)2] [93,P4(2)22] [94,P4(2)2(1)2]
         [95,P4(3)22] [96,P4(3)2(1)2]
 tI      [79,I4] [80,I4(1)] [97,I422] [98,I4(1)22]
 hP      [143,P3] [144,P3(1)] [145,P3(2)] [149,P312] [150,P321] [151,P3(1)12]
         [152,P3(1)21] [153,P3(2)12] [154,P3(2)21] [168,P6] [169,P6(1)]
         [170,P6(5)] [171,P6(2)] [172,P6(4)] [173,P6(3)] [177,P622]
         [178,P6(1)22] [179,P6(5)22] [180,P6(2)22] [181,P6(4)22] [182,P6(3)22]
 hR      [146,R3] [155,R32]
 cP      [195,P23] [198,P2(1)3] [207,P432] [208,P4(2)32] [212,P4(3)32]
         [213,P4(1)32]
 cF      [196,F23] [209,F432] [210,F4(1)32]
 cI      [197,I23] [199,I2(1)3] [211,I432] [214,I4(1)32]


Scaling information

Details about the error model

Statistics of reflections

Near the top of CORRECT.LP we find:

 531781 REFLECTIONS ON FILE "INTEGRATE.HKL"
      0 CORRUPTED REFLECTION RECORDS (IGNORED)
      0 REFLECTIONS INCOMPLETE OR OUTSIDE IMAGE RANGE       1 ...    1799
      0 OVERLOADED REFLECTIONS (IGNORED)
     81 REFLECTIONS OUTSIDE ACCEPTED RESOLUTION RANGES
               OR TOO CLOSE TO ROTATION AXIS (IGNORED)
 531700 REFLECTIONS ACCEPTED
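The bookkeeping in this excerpt can be checked by hand: the accepted count is what remains after the ignored categories are subtracted. A short sketch using the numbers above (the dictionary keys are just descriptive labels chosen here):

```python
# Counts copied from the CORRECT.LP excerpt above.
on_file = 531781
ignored = {
    "corrupted records": 0,
    "incomplete or outside image range": 0,
    "overloaded": 0,
    "outside resolution ranges or too close to rotation axis": 81,
}

accepted = on_file - sum(ignored.values())
print(accepted)  # matches "531700 REFLECTIONS ACCEPTED"
```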

Statistics of observations

XDS, like e.g. SCALA/aimless and d*TREK, gives statistics about unaveraged (individual observations) and averaged ("merged") quantities, but in different tables. The unaveraged values are in a table that is fine-grained in terms of resolution, at the beginning of CORRECT.LP. The Sigma values in that table are corrected to match the RMS scatter.

The table with information about the averaged (suitably weighted) data is repeated several times. It is less fine-grained in resolution (9 shells, plus overall). A user who wants this table in fine-grained form can use XSCALE.

Both types of tables are printed in the same way: first the definitions of the quantities in the table are given, and then the table itself is printed.

Specifically, the heading of the table which talks about the unaveraged data ("observations") looks like this:

 I/Sigma  = mean intensity/Sigma of a reflection in shell
 Chi2    = goodness of fit between sample variances of
            symmetry-related intensities and their errors
            (Chi2 = 1 for perfect agreement)
 R-FACTOR
 observed = (SUM(ABS(I(h,i)-I(h))))/(SUM(I(h,i)))
 expected = expected R-FACTOR derived from Sigma(I)

  NUMBER  = number of reflections in resolution shell
            used for calculation of R-FACTOR
 ACCEPTED = number of accepted reflections
 REJECTED = number of rejected reflections (MISFITS),
            recognized by comparison with symmetry-related
            reflections.

and then the table itself is:

RESOLUTION RANGE  I/Sigma  Chi2  R-FACTOR  R-FACTOR  NUMBER ACCEPTED REJECTED
                                  observed  expected

  48.268  17.853     9.63   0.97      5.06      6.10     865     868      44
  17.853  13.079    10.02   0.97      5.22      6.14    1301    1305      81
  13.079  10.812     9.83   1.10      5.56      5.94    1374    1388      99
  10.812   9.423     9.88   1.09      5.32      6.03    1820    1825     108
   9.423   8.460     9.56   1.07      6.03      6.21    2087    2101     167

.... (many resolution shells deleted for brevity)
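To illustrate the "R-FACTOR observed" definition given above, here is a minimal sketch in Python. It assumes that I(h) is the plain (unweighted) mean of the symmetry-related observations of h and that reflections observed only once are skipped; XDS itself may weight the mean differently. The function name and the intensities are made up for the example.

```python
from collections import defaultdict


def r_factor_observed(observations):
    """observations: iterable of (unique_hkl, intensity) pairs.

    Evaluates SUM(ABS(I(h,i) - I(h))) / SUM(I(h,i)), with I(h) taken as the
    unweighted mean of the observations of h (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:
            continue  # a single observation carries no scatter information
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    if denominator == 0.0:
        return float("nan")
    return numerator / denominator


# Made-up intensities for two unique reflections, each observed twice.
obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
       ((0, 2, 0), 50.0), ((0, 2, 0), 40.0)]
print(f"{100 * r_factor_observed(obs):.1f}%")
```

In CORRECT.LP the sums are accumulated per resolution shell, so in practice the function would be called once for the observations of each shell.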

Statistics of unique reflections

Later tables talk about the averaged intensities:


R-FACTOR
observed = (SUM(ABS(I(h,i)-I(h))))/(SUM(I(h,i)))
expected = expected R-FACTOR derived from Sigma(I)

COMPARED = number of reflections used for calculating R-FACTOR
I/SIGMA  = mean of intensity/Sigma(I) of unique reflections
           (after merging symmetry-related observations)
Sigma(I) = standard deviation of reflection intensity I
           estimated from sample statistics

R-meas   = redundancy independent R-factor (intensities)
           Diederichs & Karplus (1997), Nature Struct. Biol. 4, 269-275.

CC(1/2)  = percentage of correlation between intensities from
           random half-datasets. Correlation significant at
           the 0.1% level is marked by an asterisk.
           Karplus & Diederichs (2012), Science 336, 1030-33
Anomal   = percentage of correlation between random half-sets
 Corr      of anomalous intensity differences. Correlation
           significant at the 0.1% level is marked.
SigAno   = mean anomalous difference in units of its estimated
           standard deviation (|F(+)-F(-)|/Sigma). F(+), F(-)
           are structure factor estimates obtained from the
           merged intensity observations in each parity class.
 Nano    = Number of unique reflections used to calculate
           Anomal_Corr & SigAno. At least two observations
           for each (+ and -) parity are required.

and the table itself is

      NOTE:      Friedel pairs are treated as different reflections.

SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
  LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

    5.72       23750    7284      7488       97.3%       6.6%      6.6%    23666   14.59     7.9%    99.3*    33*   1.043    3033
    4.06       41574   12997     13384       97.1%      10.0%      8.3%    41476   11.40    12.1%    98.3*    45*   1.341    5775
    3.32       56679   16961     17336       97.8%      16.8%     15.4%    56494    6.49    20.1%    97.9*    31*   1.079    7697
    2.88       67173   20272     20497       98.9%      38.4%     39.0%    66875    2.91    45.9%    93.1*    19*   0.840    9333
    2.57       79365   23100     23197       99.6%      77.6%     85.3%    79063    1.46    92.1%    75.3*     5    0.701   10761
    2.35       86431   25554     25631       99.7%     128.9%    146.7%    86014    0.86   153.2%    54.7*     3    0.633   11894
    2.18       83863   27529     27946       98.5%     197.0%    230.0%    81669    0.49   237.7%    31.6*    -1    0.575   11422
    2.04       51338   23815     29966       79.5%     286.2%    343.0%    43478    0.26   361.1%    15.1*     0    0.526    5523
    1.92       25803   15877     31898       49.8%     483.3%    577.5%    17026    0.12   635.3%     3.8      2    0.519    1856
   total      515976  173389    197343       87.9%      27.8%     29.3%   495761    2.89    33.5%    98.2*    19*   0.781   67294


NUMBER OF REFLECTIONS IN SELECTED SUBSET OF IMAGES  531700
NUMBER OF REJECTED MISFITS                           15698
NUMBER OF SYSTEMATIC ABSENT REFLECTIONS                  0
NUMBER OF ACCEPTED OBSERVATIONS                     516002
NUMBER OF UNIQUE ACCEPTED REFLECTIONS               173398
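The R-meas column of the table differs from "R-FACTOR observed" by a multiplicity-dependent factor sqrt(n/(n-1)) applied to each unique reflection, following Diederichs & Karplus (1997). A minimal sketch, again assuming an unweighted mean and made-up intensities (the function name is hypothetical):

```python
import math
from collections import defaultdict


def r_meas(observations):
    """Redundancy-independent R-factor for (unique_hkl, intensity) pairs.

    Each unique reflection with n >= 2 observations contributes
    sqrt(n / (n - 1)) * SUM_i |I(h,i) - I(h)| to the numerator.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            continue  # the factor is undefined for a single observation
        mean = sum(intensities) / n
        numerator += math.sqrt(n / (n - 1)) * sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    if denominator == 0.0:
        return float("nan")
    return numerator / denominator


# Made-up intensities for two unique reflections, each observed twice.
obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
       ((0, 2, 0), 50.0), ((0, 2, 0), 40.0)]
print(f"{100 * r_meas(obs):.1f}%")
```

Because the factor is never smaller than 1, R-meas is at least as large as the uncorrected R-factor computed from the same means, which agrees with every row of the table above (for example 7.9% versus 6.6% in the first shell).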

Why is there a discrepancy between "total 515976 173389" in the table and "NUMBER OF ACCEPTED OBSERVATIONS 516002" and "NUMBER OF UNIQUE ACCEPTED REFLECTIONS 173398" below it? The reason is that the (higher) numbers below the table include observations (and unique reflections) with I < -3*sigma(I), whereas the numbers in the table refer only to those reflections which should be used downstream (for phasing and refinement). Indeed, XDSCONV filters out those unique reflections which have I < -3*sigma(I).
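The numbers involved are small and can be worked out directly. The sketch below first reproduces the accepted-observation count (assuming misfits and systematic absences are simply subtracted), then the differences attributed to the -3 sigma cutoff, and finally shows the cutoff test itself on made-up (I, sigma) pairs; the function name `passes_cutoff` is hypothetical.

```python
# Counts copied from the CORRECT.LP excerpt above.
in_selected_images = 531700
rejected_misfits = 15698
systematic_absences = 0
accepted_observations = in_selected_images - rejected_misfits - systematic_absences
print(accepted_observations)  # 516002, as printed below the table

# Differences between the counts below the table and its "total" row.
weak_observations = 516002 - 515976  # observations with I < -3*sigma(I)
weak_unique = 173398 - 173389        # unique reflections with I < -3*sigma(I)
print(weak_observations, weak_unique)


def passes_cutoff(intensity, sigma, cutoff=-3.0):
    """True if the reflection satisfies SIGNAL/NOISE >= cutoff."""
    return intensity / sigma >= cutoff


# Made-up (I, sigma) pairs; only the second lies below the cutoff.
examples = [(120.0, 10.0), (-35.0, 10.0), (-20.0, 10.0)]
print([passes_cutoff(i, s) for i, s in examples])
```

So only 26 observations, belonging to 9 unique reflections, account for the discrepancy in this data set.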

It should also be noted that the alien rejection controlled by REJECT_ALIEN= (default 20) is performed after this table has been produced, so the number of reflections you get from XDSCONV is not the same as reported here. If you want to see the statistics of the reflections that XDSCONV will convert (and that will therefore be used in further processing), prepare a REMOVE.HKL file that explicitly lists the reflections to be discarded, and run the CORRECT step again.

At the bottom of CORRECT.LP we find:

NUMBER OF UNIQUE ALIEN REFLECTIONS WITH A Z-SCORE ABOVE LIMIT       162
(ALIENS ABOVE LIMIT (REJECT_ALIEN=      20.0) ARE MARKED INVALID)

NUMBER OF REFLECTION RECORDS ON OUTPUT FILE "XDS_ASCII.HKL"      531700
NUMBER OF ACCEPTED OBSERVATIONS (INCLUDING SYSTEMATIC ABSENCES)  515712
NUMBER OF REJECTED MISFITS & ALIENS (marked by -1*SIGMA(IOBS))    15988

The file XDS_ASCII.HKL actually has all 531700 reflections.
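Since rejected records stay in the file and are only flagged by a negative SIGMA(IOBS), a downstream script has to look at the sign of sigma to separate them. A minimal sketch, assuming that header and trailer lines start with "!" and that SIGMA(IOBS) is the fifth whitespace-separated column; the function name and the sample records are made up for illustration.

```python
def count_records(lines):
    """Count accepted and rejected records in XDS_ASCII.HKL-style text.

    Assumes lines starting with '!' are header/trailer lines and that the
    fifth column is SIGMA(IOBS), negative for rejected misfits and aliens.
    """
    accepted = rejected = 0
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        sigma = float(line.split()[4])
        if sigma < 0:
            rejected += 1
        else:
            accepted += 1
    return accepted, rejected


# Made-up records: H K L IOBS SIGMA(IOBS)
sample = """!FORMAT=XDS_ASCII
 1 0 0  1.234E+02  5.6E+00
 0 2 0  9.870E+01 -4.3E+00
 0 0 3  4.500E+01  3.1E+00
!END_OF_DATA
""".splitlines()
print(count_records(sample))

# Consistency of the counts quoted above.
print(515712 + 15988)  # all records on XDS_ASCII.HKL
print(15988 - 15698)   # observations additionally rejected as aliens
```

For a real file, the same function can be fed an open file object, e.g. `with open("XDS_ASCII.HKL") as f: print(count_records(f))`. In the example above the accepted and rejected counts add up to the 531700 records on file, and the 290 observations rejected beyond the 15698 misfits correspond to the alien rejection.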