SSX

Revision as of 11:50, 5 August 2019 by Kay

This article describes how to process serial synchrotron crystallography (SSX) data.

The data processed here are artificial and were prepared by James Holton. The files Illuin_microfocus_minimal_00[1-3].tar.bz2 can be downloaded; the data and the problem are described on his microfocus challenge page and in a paper.

The challenges are

  1. partial data sets: each of the 100 data sets has only 3 good frames of 1° oscillation; later frames have strong radiation damage
  2. the crystals decay to about 1/2 within these 3 frames
  3. the b and c axes are the same length, but the simulated crystals are orthorhombic. This makes it difficult to index them consistently: it is wrong to simply merge them in an orthorhombic space group without resolving the indexing ambiguity, because that yields a pseudo-tetragonal, twinned merged data set.
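Why the cell dimensions alone cannot resolve this ambiguity can be seen from the orthorhombic d-spacing formula, 1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2. The following is only a minimal sketch and not part of the original processing: the helper name dspacing is hypothetical, and the cell 38.3, 79.1, 79.1 Å is the one reported in Round 1. The signs of the indices do not affect d, so the sign changes that a real reindexing operator needs in order to keep the axes right-handed are ignored here.

```shell
# d-spacing (in Å) of reflection h,k,l in an orthorhombic cell a,b,c:
#   1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2
# (hypothetical helper; cell constants taken from the text)
dspacing() {
  awk -v h="$1" -v k="$2" -v l="$3" -v a=38.3 -v b=79.1 -v c=79.1 \
    'BEGIN { printf "%.4f\n", 1 / sqrt((h/a)^2 + (k/b)^2 + (l/c)^2) }'
}

# Because b = c, exchanging k and l leaves the d-spacing unchanged.
dspacing 1 2 3
dspacing 1 3 2
```

Both calls print the same value: reflections h,k,l and h,l,k fall at the same resolution, so the lattice geometry cannot tell the two indexing modes apart. Only the intensities can.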

Round 1: processing the data, and determining the space group

In order to merge the data in XSCALE, we must ensure that all data sets are processed in the same space group, with similar cell parameters. Some exploratory processing (not shown) and averaging of cell parameters reveals that IDXREF finds a primitive lattice with one axis of 38.3 Å and two axes of 79.1 Å; all angles are 90°. The data extend to 1.8 Å; beyond that, the intensities suddenly drop to 0, presumably because James Holton simulated them only that far. Using the following as the processing script integrate.rc:

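#!/bin/bash
# process each of the 100 wedges in its own directory, in P1 with the averaged cell
for f in `seq 1 100`;
do
 export OUT=wedge0`printf "%03d" $f`
 export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
 rm -rf $OUT
 mkdir $OUT
 cd $OUT
 # quoted, so that the \? wildcard reaches generate_XDS.INP unexpanded
 generate_XDS.INP "$NAMES"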
 sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
 sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=1/" XDS.INP
 sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
 sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
 sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
 /usr/local/bin/xds_par
 cd ..
done
mkdir xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 1
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
FRIEDEL'S_LAW=TRUE
eof
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=1 CORRECTIONS=ALL"}' >> XSCALE.INP

we obtain in P1

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     8.03        3014     908       958       94.8%      44.5%     42.0%     2896    2.55     52.1%    65.0*     3    0.983     231
     5.68        5502    1679      1788       93.9%      46.8%     42.5%     5239    2.50     54.8%    50.3*     6    1.001     390
     4.64        6996    2164      2292       94.4%      47.5%     42.3%     6656    2.48     55.9%    68.4*     5    1.080     495
     4.01        8079    2580      2735       94.3%      48.7%     42.5%     7591    2.38     57.3%    50.0*     2    1.106     557
     3.59        9167    2904      3099       93.7%      52.1%     42.7%     8694    2.36     61.7%    43.6*    -6    1.017     599
     3.28       10276    3226      3397       95.0%      53.3%     43.3%     9728    2.35     62.8%    36.0*     1    1.104     708
     3.03       11040    3472      3687       94.2%      54.5%     44.3%    10500    2.17     64.2%    44.4*     2    1.044     728
     2.84       12022    3771      3977       94.8%      55.9%     47.2%    11424    1.97     65.8%    36.2*     3    0.999     835
     2.68       12705    3985      4227       94.3%      58.5%     51.0%    12065    1.78     68.8%    37.8*    -3    0.934     898
     2.54       13370    4252      4489       94.7%      59.5%     56.2%    12670    1.61     70.5%    30.1*     4    0.887     869
     2.42       14299    4505      4744       95.0%      62.4%     63.6%    13594    1.46     73.7%    30.2*    -2    0.824     979
     2.32       14835    4647      4915       94.5%      63.8%     70.0%    14083    1.35     75.1%    29.9*    -2    0.765    1041
     2.23       15599    4917      5181       94.9%      65.7%     72.6%    14809    1.31     77.5%    27.6*    -1    0.756    1075
     2.15       15888    4965      5272       94.2%      65.1%     78.6%    15117    1.28     76.9%    26.8*    -2    0.708    1115
     2.07       16872    5324      5601       95.1%      69.1%     88.1%    16035    1.14     81.6%    22.2*     3    0.687    1119
     2.01       16856    5349      5649       94.7%      73.4%     92.5%    15988    1.06     86.5%    19.7*    -3    0.673    1144
     1.95       17842    5666      5976       94.8%      76.7%    105.9%    16959    0.97     90.8%    20.7*    -8    0.606    1189
     1.89       18102    5767      6069       95.0%      84.4%    127.9%    17152    0.85     99.9%    15.1*    -1    0.590    1183
     1.84       18633    5933      6256       94.8%      92.8%    162.0%    17667    0.72    109.8%    17.6*     0    0.533    1236
     1.80       15519    5405      6479       83.4%     103.0%    194.1%    14280    0.58    122.7%    18.2*     1    0.503     940
    total      256616   81419     86791       93.8%      54.3%     51.3%   243147    1.43     64.0%    64.6*     0    0.788   17331

and feed this to pointless:

pointless xdsin temp.ahkl

which tells us

Scores for each symmetry element

Nelmt  Lklhd  Z-cc    CC        N  Rmeas    Symmetry & operator (in Lattice Cell)

  1   0.854   5.41   0.54     801  0.706     identity
  2   0.842   4.62   0.46     785  0.819 **  2-fold l ( 0 0 1) {-h,-k,l}
  3   0.867   5.13   0.51     746  0.912 **  2-fold k ( 0 1 0) {-h,k,-l}
  4   0.837   5.64   0.56     735  0.807 **  2-fold h ( 1 0 0) {h,-k,-l}
  5   0.869   4.96   0.50     742  0.757 **  2-fold   ( 1-1 0) {-k,-h,-l}
  6   0.846   5.52   0.55     719  0.789 **  2-fold   ( 1 1 0) {k,h,-l}
  7   0.852   5.44   0.54    1325  1.146 **  4-fold l ( 0 0 1) {-k,h,l}{k,-h,l}
...
...
Best Solution:    space group P 42 21 2

   Reindex operator:                   [k,l,h]                 
   Laue group probability:             0.989
   Systematic absence probability:     0.915
   Total probability:                  0.905
   Space group confidence:             0.874
   Laue group confidence               0.986

   Unit cell:   79.10  79.10  38.30     90.00  90.00  90.00

   79.10 to  13.70   - Resolution range used for Laue group search

   79.10 to   1.80   - Resolution range in file, used for systematic absence check

   Number of batches in file:      3

The data do not appear to be twinned, from the L-test

$$ <!--SUMMARY_END-->


HKLIN spacegroup: P 1  primitive triclinic

$TEXT:Warning:$$ $$

The input crystal system is primitive triclinic
 (Cell:   38.30  79.10  79.10     90.00  90.00  90.00)
The crystal system chosen for output is primitive tetragonal
 (Cell:   79.10  79.10  38.30     90.00  90.00  90.00)

Based on the P4(2)2(1)2 suggestion, we may try to modify the header of XSCALE.INP to

SPACE_GROUP_NUMBER= 94
UNIT_CELL_CONSTANTS= 79.1 79.1 38.3 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
FRIEDEL'S_LAW=TRUE
REIDX=0 1 0 0   0 0 1 0  1 0 0 0

where the last line reorders the indices to k,l,h (after all, the XDS_ASCII.HKL files are in P1 with a, b, c of 38.3, 79.1, 79.1 Å), and obtain

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     8.03        2978     167       167      100.0%      53.6%     45.8%     2978    5.94     55.1%    99.2*    22    1.190      76
     5.68        5488     274       274      100.0%      54.0%     46.1%     5488    6.12     55.4%    97.0*    20    0.915     175
     4.64        6976     338       338      100.0%      55.4%     46.1%     6976    6.25     57.0%    99.1*    15    0.983     237
     4.01        8069     390       390      100.0%      57.5%     46.3%     8069    6.01     59.0%    93.7*     8    0.991     294
     3.59        9191     440       440      100.0%      63.9%     46.7%     9191    5.80     65.5%    89.2*     3    1.071     338
     3.28       10239     474       474      100.0%      63.8%     47.0%    10239    5.85     65.4%    89.4*     4    1.119     375
     3.03       11037     511       511      100.0%      66.0%     47.5%    11037    5.33     67.6%    91.7*     3    1.068     412
     2.84       12014     547       547      100.0%      69.6%     49.1%    12014    4.80     71.2%    82.2*    -1    1.092     447
     2.68       12698     580       580      100.0%      72.2%     51.0%    12698    4.34     73.9%    83.8*    -7    0.969     478
     2.54       13360     612       612      100.0%      73.5%     54.1%    13360    3.98     75.3%    73.4*     4    1.025     511
     2.42       14299     642       642      100.0%      76.8%     58.2%    14299    3.59     78.6%    57.0*     6    1.016     545
     2.32       14827     667       667      100.0%      77.8%     62.3%    14827    3.38     79.6%    70.3*     1    0.924     563
     2.23       15588     698       698      100.0%      79.5%     64.6%    15588    3.22     81.3%    64.9*    -1    0.914     597
     2.15       15888     705       705      100.0%      79.3%     68.0%    15888    3.23     81.1%    52.5*    -5    0.882     614
     2.07       16867     754       754      100.0%      82.7%     74.7%    16867    2.92     84.6%    50.1*     3    0.920     647
     2.01       16847     754       754      100.0%      86.1%     77.3%    16847    2.73     88.1%    47.6*    -3    0.839     658
     1.95       17842     799       799      100.0%      90.4%     86.7%    17842    2.47     92.4%    49.3*     1    0.822     696
     1.89       18095     810       811       99.9%      96.8%    101.2%    18095    2.21     99.1%    44.6*    -4    0.773     707
     1.84       18633     829       829      100.0%     106.4%    126.3%    18633    1.90    108.9%    39.6*    -6    0.730     736
     1.80       15510     824       863       95.5%     118.1%    151.4%    15500    1.46    121.2%    32.3*     2    0.688     699
    total      256446   11815     11855       99.7%      64.9%     51.6%   256436    3.61     66.5%    97.9*     1    0.910    9805
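The REIDX= line used above can be read as three rows of four integers, each row giving one new index as a linear combination of the old h, k, l plus a constant. The sketch below applies such a line to a single reflection. It assumes the convention H' = REIDX(1)*H + REIDX(2)*K + REIDX(3)*L + REIDX(4) (and likewise for K' and L'); the helper name reidx is hypothetical, and since the constant terms are all zero here they play no role in this example.

```shell
# Apply a REIDX-style line (12 integers = three rows of four) to one reflection.
# usage: reidx "r1 ... r12" h k l     (hypothetical helper, for illustration only)
reidx() {
  awk -v m="$1" -v h="$2" -v k="$3" -v l="$4" 'BEGIN {
    split(m, r, " ")    # a single-space separator splits on any run of blanks
    print r[1]*h + r[2]*k  + r[3]*l  + r[4],
          r[5]*h + r[6]*k  + r[7]*l  + r[8],
          r[9]*h + r[10]*k + r[11]*l + r[12]
  }'
}

# the operator [k,l,h] from the pointless output, applied to reflection 1 2 3
reidx "0 1 0 0   0 0 1 0  1 0 0 0" 1 2 3    # prints: 2 3 1
```

The old index along the 38.3 Å axis (h) thus ends up as the new l, in agreement with the tetragonal cell 79.1, 79.1, 38.3 Å.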

Analysis with

xscale_isocluster -dim 2 -clu 2 temp.ahkl

yields an iso.pdb which does not show a compact cluster; instead, it is a single, severely elongated cloud:

 

(If the space group were correct, the result of xscale_isocluster should look similar to this:

 

which is from a lysozyme SSX data collection performed at the SLS; outliers are labelled. In this case, the data are truly tetragonal.)

We must now investigate whether the data have lower than tetragonal symmetry. XSCALEing with

SPACE_GROUP_NUMBER=16
UNIT_CELL_CONSTANTS=38.3 79.1 79.1 90 90 90

gives a new temp.ahkl, with orthorhombic symmetry.

xscale_isocluster -dim 2 -clu 2 temp.ahkl

gives

 psi=  0.1692468      nhalo=           0
cluster:  1 center:     2 elements:    51 core:    51 halo:     0
cluster:  2 center:     6 elements:    49 core:    49 halo:     0

and prepares XSCALE.1.INP (and XSCALE.2.INP) for further use. Each of these two files lists the XDS_ASCII.HKL files of one cluster; the two sets are indexed differently from each other, but consistently within each set.

coot iso.pdb 

shows

 

and thus reveals two well separated clouds, corresponding to the two possible indexing modes of the data in an orthorhombic space group.
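As a quick sanity check, the cluster lines printed by xscale_isocluster can be tallied to confirm that every data set was assigned to a cluster. This is a minimal sketch that assumes the output format shown above (the sixth whitespace-separated field of a "cluster:" line is the number of elements); the helper name cluster_sizes is hypothetical.

```shell
# Sum the "elements:" counts of all "cluster:" lines read from stdin.
cluster_sizes() {
  awk '$1 == "cluster:" { print "cluster", $2, "has", $6, "data sets"; n += $6 }
       END              { print "total", n + 0 }'
}

# the two lines reported in the text
cluster_sizes <<'eof'
 psi=  0.1692468      nhalo=           0
cluster:  1 center:     2 elements:    51 core:    51 halo:     0
cluster:  2 center:     6 elements:    49 core:    49 halo:     0
eof
```

For the output above the total is 100, so all wedges ended up in one of the two clusters.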

Using XSCALE.1.INP with its 51 XDS_ASCII.HKL, and FRIEDEL'S_LAW=TRUE, we get

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     8.03        1493     297       306       97.1%      11.8%     23.7%     1467    6.04     13.0%    98.2*    52*   0.662     123
     5.68        2829     514       521       98.7%      18.9%     24.2%     2796    5.98     20.9%    96.1*    26*   0.778     258
     4.64        3576     638       646       98.8%      23.3%     24.2%     3554    6.07     25.7%    93.3*    12    0.829     346
     4.01        4140     748       756       98.9%      28.2%     24.5%     4105    5.84     31.0%    89.4*    -5    0.818     418
     3.59        4735     838       852       98.4%      30.9%     25.0%     4709    5.72     33.9%    86.7*     5    0.983     470
     3.28        5268     912       921       99.0%      34.7%     25.8%     5228    5.52     38.0%    85.9*     0    1.005     533
     3.03        5664     982       994       98.8%      37.8%     27.4%     5634    4.90     41.4%    82.1*     4    1.031     563
     2.84        6114    1065      1068       99.7%      40.4%     31.7%     6082    4.13     44.4%    82.5*     5    0.963     613
     2.68        6486    1127      1133       99.5%      44.5%     37.2%     6450    3.54     48.9%    74.8*     1    0.824     644
     2.54        6819    1188      1197       99.2%      48.2%     44.6%     6784    3.01     53.0%    70.4*     1    0.816     709
     2.42        7278    1249      1259       99.2%      51.9%     54.7%     7249    2.56     56.9%    70.6*     4    0.751     756
     2.32        7595    1297      1304       99.5%      55.9%     63.4%     7555    2.26     61.5%    58.5*     4    0.729     809
     2.23        7943    1361      1371       99.3%      57.8%     66.4%     7903    2.16     63.3%    63.5*    -3    0.687     844
     2.15        8093    1375      1385       99.3%      60.1%     75.4%     8054    2.03     65.9%    66.7*     3    0.664     860
     2.07        8561    1476      1482       99.6%      64.8%     88.3%     8512    1.76     71.1%    53.0*     7    0.640     914
     2.01        8613    1473      1482       99.4%      68.3%     95.8%     8570    1.60     74.9%    60.6*    -1    0.628     928
     1.95        9048    1566      1571       99.7%      73.1%    112.2%     9004    1.41     80.2%    56.7*    -3    0.571     966
     1.89        9236    1580      1593       99.2%      82.6%    142.1%     9204    1.19     90.8%    56.3*    -5    0.504    1000
     1.84        9467    1618      1631       99.2%      92.8%    180.0%     9432    0.96    101.9%    43.2*     4    0.467    1007
     1.80        7927    1570      1701       92.3%     104.8%    225.2%     7811    0.70    116.1%    42.6*    -5    0.425     785
    total      130885   22874     23173       98.7%      38.3%     41.0%   130103    2.77     42.1%    92.0*     3    0.703   13546

At this point, we run

xdscc12 -w XSCALE.1.HKL | grep ^a | sort -nk6

and find that data sets 1 and 17 are wrongly included in the cloud of 51 data sets. Thus they are removed manually from XSCALE.INP.
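The manual removal can also be scripted. The following is a sketch under two assumptions that should be checked against the actual files: that the data set numbers count the INPUT_FILE= lines in their order of appearance, and that any lines following an INPUT_FILE= line (such as NBATCH=) belong to that data set. The helper name drop_datasets and the paths in the demonstration file are hypothetical.

```shell
# drop_datasets FILE N...: print FILE without the N-th INPUT_FILE= entries
# (each entry = the INPUT_FILE= line plus the lines up to the next INPUT_FILE=)
drop_datasets() {
  local file=$1; shift
  awk -v drop=" $* " '
    /^ *INPUT_FILE=/ { n++; skip = (index(drop, " " n " ") > 0) }
    !skip' "$file"
}

# small demonstration on a made-up input file
demo=$(mktemp)
cat > "$demo" <<'eof'
OUTPUT_FILE=XSCALE.1.HKL
INPUT_FILE= /data/wedge0001/XDS_ASCII.HKL
NBATCH=1 CORRECTIONS=ALL
INPUT_FILE= /data/wedge0002/XDS_ASCII.HKL
NBATCH=1 CORRECTIONS=ALL
INPUT_FILE= /data/wedge0003/XDS_ASCII.HKL
NBATCH=1 CORRECTIONS=ALL
eof
drop_datasets "$demo" 1 3
```

Header lines before the first INPUT_FILE= line are kept; in the demonstration only the entry for the second data set survives. The result is written to standard output, so the original file stays untouched until the output has been inspected.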

After xscale_isocluster -dim 2 -clu 1,

coot iso.pdb

now reveals a single cloud:

 

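We then re-run XSCALE with MERGE=TRUE. The resulting reflection output file XSCALE.1.HKL is then used as REFERENCE_DATA_SET for a second round of integration with XDS; the script in Round 2 expects it under the name reference.hkl in the directory that contains the wedge directories.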

pointless xdsin XSCALE.1.HKL

gives

   Spacegroup         TotProb SysAbsProb     Reindex         Conditions

    P 21 21 21 ( 19)    0.896  0.924                         h00: h=2n, 0k0: k=2n, 00l: l=2n (zones 1,2,3)
    ..........
     P 2 21 21 ( 18)    0.044  0.045                         0k0: k=2n, 00l: l=2n (zones 2,3)
    ..........
     P 21 21 2 ( 18)    0.015  0.015                         h00: h=2n, 0k0: k=2n (zones 1,2)
    ..........
     P 21 2 21 ( 18)    0.014  0.014                         h00: h=2n, 00l: l=2n (zones 1,3)


---------------------------------------------------------------


Space group confidence (= Sqrt(Score * (Score - NextBestScore))) =     0.87

Laue group confidence  (= Sqrt(Score * (Score - NextBestScore))) =     0.97

Selecting space group P 21 21 21 as there is a single space group with the highest score

<!--SUMMARY_BEGIN--> $TEXT:Result: $$ $$
Best Solution:    space group P 21 21 21

   Reindex operator:                   [h,k,l]                 
   Laue group probability:             0.970
   Systematic absence probability:     0.924
   Total probability:                  0.896
   Space group confidence:             0.874
   Laue group confidence               0.966

   Unit cell:   38.30  79.10  79.10     90.00  90.00  90.00

   79.10 to   2.47   - Resolution range used for Laue group search

   79.10 to   1.80   - Resolution range in file, used for systematic absence check

Thus we now know the space group: P 21 21 21 (number 19).

Round 2: using the REFERENCE_DATA_SET obtained from one cluster

The processing script integrate.rc is changed slightly, to a) use the REFERENCE_DATA_SET, b) prevent adjustment of variances by CORRECT, which is better done by XSCALE (MINIMUM_I/SIGMA=50), and c) allow some radiation damage correction in XSCALE (NBATCH=3):

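#!/bin/bash
# re-process each wedge in space group 19, indexed consistently with the reference
for f in `seq 1 100`;
do
 export OUT=wedge0`printf "%03d" $f`
 export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
 rm -rf $OUT
 mkdir $OUT
 cd $OUT
 # quoted, so that the \? wildcard reaches generate_XDS.INP unexpanded
 generate_XDS.INP "$NAMES"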
 echo REFERENCE_DATA_SET=../reference.hkl >> XDS.INP
 echo MINIMUM_I/SIGMA=50 >>XDS.INP
 sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
 sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=19/" XDS.INP
 sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
 sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
 sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
 /usr/local/bin/xds_par
 cd ..
done
mkdir -p xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 19
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
eof
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=3 CORRECTIONS=ALL"}' >> XSCALE.INP

and we get in XSCALE.LP:

       NOTE:      Friedel pairs are treated as different reflections.

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     8.04        2960     473       476       99.4%       6.2%      5.5%     2955   29.90      6.7%    99.8*    86*   2.824     166
     5.68        5486     890       894       99.6%       4.9%      5.9%     5478   27.38      5.3%    99.7*    86*   2.384     363
     4.64        6934    1136      1138       99.8%       4.9%      5.8%     6918   27.64      5.4%    99.8*    76*   1.829     480
     4.02        8066    1363      1367       99.7%       5.3%      5.9%     8045   26.67      5.9%    99.6*    57*   1.426     590
     3.59        9121    1535      1539       99.7%       6.1%      6.3%     9092   25.58      6.7%    99.6*    50*   1.298     666
     3.28       10222    1690      1694       99.8%       6.8%      6.8%    10203   24.69      7.5%    99.4*    36*   1.204     751
     3.04       10990    1831      1834       99.8%       8.5%      8.0%    10970   21.40      9.3%    99.3*    22*   1.086     827
     2.84       12065    1993      1999       99.7%      11.2%     11.1%    12038   17.68     12.2%    99.0*    24*   1.085     894
     2.68       12771    2120      2124       99.8%      14.7%     15.1%    12738   14.78     16.1%    98.4*    14*   0.960     952
     2.54       13054    2196      2198       99.9%      18.9%     20.2%    13026   12.53     20.8%    97.7*    13*   0.867     995
     2.42       14290    2372      2375       99.9%      24.9%     27.1%    14261   10.34     27.3%    96.1*     6    0.813    1083
     2.32       14704    2432      2438       99.8%      29.8%     32.5%    14676    9.21     32.6%    95.1*     8    0.843    1115
     2.23       15623    2582      2593       99.6%      33.0%     35.0%    15587    8.83     36.1%    93.0*     6    0.831    1180
     2.15       15732    2610      2613       99.9%      37.1%     39.2%    15697    8.10     40.6%    91.0*     8    0.818    1203
     2.08       16782    2788      2795       99.7%      44.1%     47.0%    16741    7.01     48.3%    88.3*     4    0.797    1276
     2.01       16783    2802      2809       99.8%      46.8%     48.7%    16747    6.54     51.2%    89.5*     3    0.807    1293
     1.95       18262    3043      3051       99.7%      56.5%     58.0%    18221    5.61     61.9%    85.9*     0    0.803    1402
     1.89       17810    2979      2988       99.7%      68.3%     69.8%    17769    4.63     74.8%    80.0*     7    0.864    1374
     1.84       18503    3112      3117       99.8%      87.5%     90.3%    18454    3.55     96.0%    69.6*     3    0.838    1435
     1.80       16130    2988      3185       93.8%     101.2%    110.5%    15959    2.77    111.7%    62.9*     2    0.798    1276
    total      256288   42935     43227       99.3%      13.4%     14.0%   255575   11.63     14.6%    99.6*    21*   0.975   19321
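To compare rounds at a glance, the overall statistics can be pulled from the "total" line of such a table. This is a minimal sketch that assumes the column layout shown above and simply takes the last line whose first field is "total"; the helper name xscale_total is hypothetical. Multiplicity is computed as observed divided by unique reflections.

```shell
# Summarize the last "total" line of an XSCALE.LP-style table (file or stdin).
xscale_total() {
  awk '$1 == "total" { line = $0 }
       END {
         if (line == "") exit 1          # no table found
         split(line, f, " ")
         printf "multiplicity %.2f  completeness %s  I/sigma %s  R-meas %s  CC(1/2) %s\n",
                f[2] / f[3], f[5], f[9], f[10], f[11]
       }' "$@"
}

# the total line of the Round 2 table
xscale_total <<'eof'
    total      256288   42935     43227       99.3%      13.4%     14.0%   255575   11.63     14.6%    99.6*    21*   0.975   19321
eof
```

With a real log, the call would be xscale_total XSCALE.LP; if the log contains several tables, only the last one is reported.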

The substructure (4 Se sites, located with anomalous data to 3 Å) and the structure (198 residues) can now easily be solved with hkl2map:

Result

SHELXC: anomalous CC1/2

 

SHELXD: CCall versus CCweak, and histogram

 

 

SHELXE: contrast versus cycle, and PDB with structure

 

 

Further optimization of processing may be possible, but is left as an exercise to the reader.