2VB1: Difference between revisions

From XDSwiki
Jump to navigation Jump to search
No edit summary
Line 33: Line 33:
=== Optimization ===
=== Optimization ===


The main target of optimization is the asympototic (i.e. best) I/sigma (ISa) (Diederichs (2010) [http://dx.doi.org/10.1107/S0907444910014836 Acta Cryst. D 66, 733-40]) as printed out by CORRECT. A higher ISa means better data. However: ISa also rises if more reflections are thrown out as outliers ("misfits") so it is not considered to be optimization if just WFAC1 is reduced.  
The main target of optimization is the asympototic (i.e. best) I/sigma (ISa) (Diederichs (2010) [http://dx.doi.org/10.1107/S0907444910014836 Acta Cryst. D 66, 733-40]) as printed out by CORRECT (and XSCALE). A higher ISa means better data.  
 
However: ISa also rises if more reflections are thrown out as outliers ("misfits") so it is not considered to be optimization if just WFAC1 is reduced. Please note that the default WFAC1 is 1; this should result in the rejection of about 1% of observations. If you feel that 1% is too much then just increase WFAC1, to, say, 1.5 - that should result in rejection of less than 0.1%. This will slightly increase completeness, but will reduce I/sigma and ISa, and increase R-factors.
 
The following quantities may be tested for their influence on ISa:
The following quantities may be tested for their influence on ISa:
* copying GXPARM.XDS to XPARM.XDS
* copying GXPARM.XDS to XPARM.XDS
Line 45: Line 48:


... and including the changes concerning ORGX= 3130 ORGY= 3040, MAXIMUM_NUMBER_OF_PROCESSORS=2
... and including the changes concerning ORGX= 3130 ORGY= 3040, MAXIMUM_NUMBER_OF_PROCESSORS=2
MAXIMUM_NUMBER_OF_JOBS=8, and TRUSTED_REGION=0.00 1.5 :
MAXIMUM_NUMBER_OF_JOBS=8, TRUSTED_REGION=0.00 1.5, and ROTATION_AXIS=-1 0 0 :


<pre>
<pre>
Line 100: Line 103:


=== [[CORRECT.LP]] main table; 1st pass ===
=== [[CORRECT.LP]] main table; 1st pass ===


=== [[XDS.INP]]; optimized ===
=== [[XDS.INP]]; optimized ===

Revision as of 13:21, 22 February 2011

This reports processing of triclinic hen egg-white lysozyme data @ 0.65Å resolution (PDB id 2VB1). Data (sweeps a to h, each comprising 60 to 360 frames of 72MB) were collected by Zbigniew Dauter at APS 19-ID and are available from here. Details of data collection, processing and refinement are published.

XDS processing

  1. use generate_XDS.INP to obtain a good starting point
  2. edit XDS.INP and change the following:
ORGX=3130 ORGY=3040  ! for ADSC, header values are subject to interpretation; better inspect the table in IDXREF.LP!
TRUSTED_REGION=0 1.5 ! we want the whole detector area
ROTATION_AXIS=-1 0 0 ! at this beamline the spindle goes backwards!
  1. for faster processing on a machine with many cores, use (e.g. for 16 cores):
MAXIMUM_NUMBER_OF_PROCESSORS=2
MAXIMUM_NUMBER_OF_JOBS=8

For all the sweeps, processing stopped with an error message after the IDXREF step. By inspecting IDXREF.LP, one should make sure that everything works as it should, i.e. that a large percentage of reflections was actually indexed nicely:

...
  63879 OUT OF   72321 SPOTS INDEXED.
...

***** DIFFRACTION PARAMETERS USED AT START OF INTEGRATION *****

REFINED VALUES OF DIFFRACTION PARAMETERS DERIVED FROM  63879 INDEXED SPOTS
REFINED PARAMETERS:   DISTANCE BEAM AXIS CELL ORIENTATION    
STANDARD DEVIATION OF SPOT    POSITION (PIXELS)     0.53
STANDARD DEVIATION OF SPINDLE POSITION (DEGREES)    0.12

It may be possible to adjust some parameters (for COLSPOT) so that the error message does not occur, but it is not worth the effort. So we just change

JOBS=XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT

to

JOBS=DEFPIX INTEGRATE CORRECT

and run "xds_par" again. It completes after about 5 minutes on a fast machine, and we may inspect CORRECT.LP .

Optimization

The main target of optimization is the asympototic (i.e. best) I/sigma (ISa) (Diederichs (2010) Acta Cryst. D 66, 733-40) as printed out by CORRECT (and XSCALE). A higher ISa means better data.

However: ISa also rises if more reflections are thrown out as outliers ("misfits") so it is not considered to be optimization if just WFAC1 is reduced. Please note that the default WFAC1 is 1; this should result in the rejection of about 1% of observations. If you feel that 1% is too much then just increase WFAC1, to, say, 1.5 - that should result in rejection of less than 0.1%. This will slightly increase completeness, but will reduce I/sigma and ISa, and increase R-factors.

The following quantities may be tested for their influence on ISa:

  • copying GXPARM.XDS to XPARM.XDS
  • including the information from the first integration pass into XDS.INP - just do "grep _E INTEGRATE.LP|tail -2" and get e.g.
BEAM_DIVERGENCE=   0.386  BEAM_DIVERGENCE_E.S.D.=   0.039
REFLECTING_RANGE=  0.669  REFLECTING_RANGE_E.S.D.=  0.096

copy these two lines into XDS.INP

Example: sweep e

XDS.INP; as generated by generate_XDS.INP

... and including the changes concerning ORGX= 3130 ORGY= 3040, MAXIMUM_NUMBER_OF_PROCESSORS=2 MAXIMUM_NUMBER_OF_JOBS=8, TRUSTED_REGION=0.00 1.5, and ROTATION_AXIS=-1 0 0 :

JOB= XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT
MAXIMUM_NUMBER_OF_PROCESSORS=2
MAXIMUM_NUMBER_OF_JOBS=8
ORGX= 3130 ORGY= 3040  ! check these values with adxv !
DETECTOR_DISTANCE= 99.9954
OSCILLATION_RANGE= 0.500
X-RAY_WAVELENGTH=   0.6525486
NAME_TEMPLATE_OF_DATA_FRAMES=../../APS/19-ID/2vb1/p1lyso_e.0???.img
! REFERENCE_DATA_SET=xxx/XDS_ASCII.HKL ! e.g. to ensure consistent indexing  
DATA_RANGE=1 360
SPOT_RANGE=1 180
! BACKGROUND_RANGE=1 10 ! rather use defaults (first 5 degree of rotation)

SPACE_GROUP_NUMBER=0                   ! 0 if unknown
UNIT_CELL_CONSTANTS= 70 80 90 90 90 90 ! put correct values if known
INCLUDE_RESOLUTION_RANGE=50 0  ! after CORRECT, insert high resol limit; re-run CORRECT


FRIEDEL'S_LAW=FALSE     ! This acts only on the CORRECT step
! If the anom signal turns out to be, or is known to be, very low or absent,
! use FRIEDEL'S_LAW=TRUE instead (or comment out the line); re-run CORRECT

! remove the "!" in the following line:
! STRICT_ABSORPTION_CORRECTION=TRUE
! if the anomalous signal is strong: in that case, in CORRECT.LP the three
! "CHI^2-VALUE OF FIT OF CORRECTION FACTORS" values are significantly> 1, e.g. 1.5
!
! exclude (mask) untrusted areas of detector, e.g. beamstop shadow :
! UNTRUSTED_RECTANGLE= 1800 1950 2100 2150 ! x-min x-max y-min y-max ! repeat
! UNTRUSTED_ELLIPSE= 2034 2070 1850 2240 ! x-min x-max y-min y-max ! if needed
!
! parameters with changes wrt default values:
TRUSTED_REGION=0.00 1.5  ! partially use corners of detectors; 1.41421=full use
VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS=7000. 30000. ! often 8000 is ok
MINIMUM_ZETA=0.05        ! integrate close to the Lorentz zone; 0.15 is default
STRONG_PIXEL=6           ! COLSPOT: only use strong reflections (default is 3)
MINIMUM_NUMBER_OF_PIXELS_IN_A_SPOT=3 ! default of 6 is sometimes too high
REFINE(INTEGRATE)=CELL BEAM ORIENTATION ! AXIS DISTANCE 

! parameters specifically for this detector and beamline:
DETECTOR= ADSC MINIMUM_VALID_PIXEL_VALUE= 1 OVERLOAD= 65000
NX= 6144 NY= 6144  QX= 0.051294  QY= 0.051294 ! to make CORRECT happy if frames are unavailable
DIRECTION_OF_DETECTOR_X-AXIS=1 0 0
DIRECTION_OF_DETECTOR_Y-AXIS=0 1 0
INCIDENT_BEAM_DIRECTION=0 0 1
ROTATION_AXIS=-1 0 0    ! at e.g. SERCAT ID-22 this needs to be -1 0 0
FRACTION_OF_POLARIZATION=0.98   ! better value is provided by beamline staff!
POLARIZATION_PLANE_NORMAL=0 1 0

CORRECT.LP main table; 1st pass

XDS.INP; optimized

CORRECT.LP main table; optimization pass

XSCALE results

a few sweeps were optimized by copying the two lines containing mosaicity and beam divergence values from INTEGRATE.LP to XDS.INP

main table

SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
  LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

    2.91       15799    2114      2147       98.5%       2.3%      2.5%    15787   73.42     2.6%     1.1%   -15%   0.705    1969
    2.06       39607    3830      3856       99.3%       2.5%      2.8%    39602   81.49     2.6%     0.9%   -11%   0.750    3794
    1.68       64423    5068      5087       99.6%       3.1%      3.3%    64415   82.27     3.3%     1.0%    -3%   0.843    5018
    1.45       72869    6147      6163       99.7%       3.2%      3.5%    72867   77.43     3.4%     1.0%     0%   0.833    6055
    1.30       71079    6652      6657       99.9%       3.3%      3.5%    71079   70.69     3.4%     1.1%     8%   0.865    6506
    1.19       74584    7287      7298       99.8%       3.2%      3.4%    74575   66.78     3.4%     1.2%     5%   0.870    7060
    1.10       84893    8268      8278       99.9%       3.5%      3.7%    84865   62.98     3.6%     1.3%     5%   0.858    7983
    1.03       87893    8585      8603       99.8%       4.2%      4.4%    87859   56.04     4.4%     1.5%     4%   0.828    8238
    0.97       92833    9457      9465       99.9%       5.2%      5.6%    92810   48.70     5.5%     1.7%     6%   0.802    9010
    0.92       83981    9911      9927       99.8%       5.7%      6.3%    83954   41.48     6.0%     2.1%     5%   0.785    9362
    0.88       74101    9620      9621      100.0%       6.3%      7.2%    74083   35.53     6.7%     2.6%     5%   0.785    9041
    0.84       81383   11511     11518       99.9%       6.8%      7.7%    81361   30.26     7.3%     3.3%     1%   0.760   10616
    0.81       67616   10240     10247       99.9%       7.1%      7.8%    67596   25.84     7.7%     4.2%     1%   0.782    9368
    0.78       74077   11807     11817       99.9%       7.2%      7.3%    74049   22.26     7.8%     5.2%     1%   0.797   10697
    0.75       86236   13831     13839       99.9%       8.5%      8.7%    86206   18.77     9.3%     6.7%     2%   0.809   12497
    0.73       64601   10481     10488       99.9%      10.4%     10.5%    64573   15.77    11.3%     8.2%     2%   0.810    9375
    0.71       71886   11727     11741       99.9%      12.8%     13.0%    71835   13.05    14.0%    10.6%     2%   0.800   10420
    0.69       80233   13156     13163       99.9%      16.5%     16.9%    80130   10.32    18.1%    13.7%     1%   0.796   11661
    0.67       84259   14746     14766       99.9%      22.0%     22.5%    84056    7.61    24.1%    19.6%     3%   0.789   12468
    0.65       60775   15579     16551       94.1%      27.5%     30.3%    59893    4.49    31.7%    32.3%     1%   0.723    8936
   total     1433128  190017    191232       99.4%       3.3%      3.5%  1431595   33.18     3.5%     3.5%     2%   0.801  170074


Comparison of data processing: published (2006) vs XDS results

resolution (highest resolution range) observations unique reflections Multiplicity Completeness (%) R merge (%) mean I/sigma
published(2006) 30-0.65Å (0.67-0.65Å) 1331953 (12764) 187165 (6353) 7.1 (2.7) 97.6 (67.3) 4.5 (18.4) 36.2 (4.2)
XDS 30-0.65Å (0.67-0.65Å) 1433128 (60775) 190017 (15579) 7.5 (3.9) 99.4 (94.1) 3.3 (27.5) 33.2 (4.5)

timings for processing sweep "e" as a function of MAXIMUM_NUMBER_OF_PROCESSORS and MAXIMUM_NUMBER_OF_JOBS

The following is going to be rather technical! If you are only interested in crystallography, skip this.

Using

MAXIMUM_NUMBER_OF_PROCESSORS=2
MAXIMUM_NUMBER_OF_JOBS=8

we observe for the INTEGRATE step:

total cpu time used               2063.6 sec
total elapsed wall-clock time      296.1 sec

Using

MAXIMUM_NUMBER_OF_PROCESSORS=1
MAXIMUM_NUMBER_OF_JOBS=16

the times are

total cpu time used               2077.1 sec
total elapsed wall-clock time      408.2 sec

Using

MAXIMUM_NUMBER_OF_PROCESSORS=4
MAXIMUM_NUMBER_OF_JOBS=4

the times are

total cpu time used               2102.8 sec
total elapsed wall-clock time      315.6 sec

Using

MAXIMUM_NUMBER_OF_PROCESSORS=16 ! the default for xds_par on a 16-core machine
MAXIMUM_NUMBER_OF_JOBS=1 ! the default

the times are

total cpu time used               2833.4 sec
total elapsed wall-clock time      566.5 sec

but please note that this actually only uses 10 processors, since the default DELPHI=5 and the OSCILLATION_RANGE is 0.5°.

Using

MAXIMUM_NUMBER_OF_PROCESSORS=4
MAXIMUM_NUMBER_OF_JOBS=8

(thus overcommitting the available cores by a factor of 2) the times are

total cpu time used               2263.5 sec
total elapsed wall-clock time      320.8 sec

Using

MAXIMUM_NUMBER_OF_PROCESSORS=4
MAXIMUM_NUMBER_OF_JOBS=6

(thus overcommitting the available cores, but less severely) the times are

total cpu time used               2367.6 sec
total elapsed wall-clock time      267.2 sec

Thus,

MAXIMUM_NUMBER_OF_PROCESSORS=4
MAXIMUM_NUMBER_OF_JOBS=6

performs best for a 2-Xeon X5570 (HT enabled, thus 24 cores) machine with 24GB of memory and a RAID1 consisting of 2 1TB SATA disks. It should be noted that the dataset has 27GB, and in 296 seconds this means 92 MB/s continuous reading. The processing time is thus limited by the disk access, not by the CPU. And no, the data are not simply read from RAM (tested by "echo 3 > /proc/sys/vm/drop_caches" before the XDS run).