Optimisation

General guidelines for obtaining a good result from XDS:
* read the article [[XDS.INP]]
* for good indexing, follow [[XDS.INP#Keywords which affect whether indexing will succeed]]  
* mask shaded areas of the detector using the UNTRUSTED_RECTANGLE, UNTRUSTED_ELLIPSE and UNTRUSTED_QUADRILATERAL keywords (a hypothetical example is shown below this list). This is very easy with the [[XDSGUI]] program.
* at the very least, use XDS-Viewer on FRAME.cbf to check the agreement between predicted and observed spots on the last frame of the dataset. It would be wise to also use XDS-Viewer on MODPIX.cbf and DECAY.cbf to get an impression of systematic effects in your data.
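As a sketch of the masking step from the list above: the keywords below use the [[XDS.INP]] syntax, but all pixel coordinates are made-up values; read the real ones off your own frames, e.g. in [[XDSGUI]].
 ! hypothetical masks - replace the numbers with coordinates taken from your own frames
 UNTRUSTED_RECTANGLE= 1000 1060     0 2527   ! x1 x2 y1 y2 of a shaded stripe
 UNTRUSTED_ELLIPSE=    940 1110  1130 1290   ! bounding box of a circular beamstop shadow
 UNTRUSTED_QUADRILATERAL= 800 1200  900 1200  900 1300  800 1300   ! x y of the four corners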
 
== Further optimization based on [[XDSSTAT]] output ==
 
* inspect the table of R_meas values (the lines ending with 'L') and decide whether you want to remove any specific frames (by appending .bad to the filenames and re-running INTEGRATE and CORRECT; a sketch of this is given below the list)
* inspect the table of R_d values (the lines ending with 'DIFFERENCE') and find out whether you have systematically rising R_d, which would be an indication of strong radiation damage. This works best in high-symmetry space groups.
* inspect the table of R_meas ''versus'' PEAK and ln(intensity) and consider adjusting MINPK (the threshold for rejecting overlaps) to a higher value. For better data, you want to raise MINPK to, say, 85, 90 or even 95, but of course this will reduce the completeness. Find the right compromise between completeness and data quality for your purposes! Experimental phasing relies on high accuracy (in particular of the strong reflections), whereas maps and refinement benefit from good completeness and high resolution.
* make sure that in the same table, the R-factors at high intensity are really low (around 2%). If they are not, then you should reevaluate the spacegroup, or reconsider the value of OVERLOAD=.
* inspect the .pck files written by XDSSTAT, in particular scales.pck, rf.pck and anom.pck, and decide if you are happy with e.g. your low-resolution cutoff! It is normal for scales.pck to have alternating white and black at high resolution, and it is also normal for rf.pck to be bright (high R-factors) at high resolution. But you definitely don't want such indications at low resolution.
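A minimal shell sketch of the frame-exclusion step from the first bullet (the file name and the frame numbers 100-110 are hypothetical; adapt them to your NAME_TEMPLATE_OF_DATA_FRAMES):
 # hide frames 100-110 from XDS by renaming them; XDS then treats them as missing
 for i in $(seq 100 110); do
   f=$(printf "xtal1_1_%04d.cbf" "$i")   # hypothetical frame name, match your template
   mv "$f" "$f.bad"
 done
 # afterwards set JOB=INTEGRATE CORRECT in XDS.INP and re-run xds_par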


== Final polishing ==


=== Re-INTEGRATEing with the correct spacegroup, refined geometry and fine-slicing of profiles ===
After running through all steps of XDS (including space group determination), one might want to
  cp GXPARM.XDS XPARM.XDS
  mv CORRECT.LP CORRECT.LP.old
  egrep -v 'JOB|REIDX' XDS.INP > XDS.INP.new
  echo "! JOB=XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT" > XDS.INP
  echo "! JOB=XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT" > XDS.INP
  echo "JOB=INTEGRATE CORRECT" >> XDS.INP
  echo "JOB=DEFPIX INTEGRATE CORRECT" >> XDS.INP
  echo "NUMBER_OF_PROFILE_GRID_POINTS_ALONG_ALPHA/BETA=13 ! default is 9" >> XDS.INP
  echo "NUMBER_OF_PROFILE_GRID_POINTS_ALONG_GAMMA=13      ! default is 9" >> XDS.INP
  cat XDS.INP.new >> XDS.INP
  xds_par
and thereby re-run the INTEGRATE and CORRECT steps. This has the advantage that the refined geometry parameters (from CORRECT) are recycled into INTEGRATE, which sometimes leads to better R-factors. It also results in the spacegroup's restraints on the unit cell parameters being used for the prediction of spot positions; these are therefore slightly more accurate. Fine-slicing of profiles has been found to be advantageous at least for Pilatus detectors ([http://dx.doi.org/10.1107/S0907444911049833 Müller, Wang and Schulze-Briese (2012), Acta Cryst D68, 42]), but this should not be specific to Pilatus.
 
You may also want to change the INCLUDE_RESOLUTION_RANGE= line in XDS.INP, in particular to adjust the high-resolution limit. A very good rule is to set this to the resolution value of the highest shell that still has a "*" appended to the CC1/2 value in [[CORRECT.LP]].
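For example (the 1.9 Å high-resolution limit is hypothetical; take the value from the CC1/2 table in your own [[CORRECT.LP]]):
 INCLUDE_RESOLUTION_RANGE= 50 1.9   ! low- and high-resolution limits in Angstrom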
 
=== Using the refined values for beam divergence and mosaicity for re-integration ===
If the beam divergence and mosaicity are fairly constant throughout the experiment, or there are too few reflections on each frame to estimate these parameters reliably, then it might be beneficial to use the average values of beam divergence and mosaicity for the second integration. The values from the first integration can be found in INTEGRATE.LP; the relevant two lines, e.g.
  BEAM_DIVERGENCE=   2.067  BEAM_DIVERGENCE_E.S.D.=   0.207
  REFLECTING_RANGE=  2.303  REFLECTING_RANGE_E.S.D.=  0.329
should be pasted into XDS.INP, and the INTEGRATE CORRECT jobs re-run (see [[Difficult datasets]]). This can be combined with the above optimization.
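A minimal shell sketch for recycling these values (it assumes that the last occurrence of each line in INTEGRATE.LP holds the numbers you want to reuse; check them before re-running):
 grep "BEAM_DIVERGENCE="  INTEGRATE.LP | tail -1 >> XDS.INP
 grep "REFLECTING_RANGE=" INTEGRATE.LP | tail -1 >> XDS.INP
 # then set JOB=INTEGRATE CORRECT in XDS.INP and re-run xds_par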


=== Wilson outliers (aliens) ===
* Look through the list of reflections labeled as "aliens" at the bottom of [[CORRECT.LP]]. Decide whether they follow a slowly decaying non-Wilson distribution (resulting in many reflections with Z > 10 instead of almost none in the case of a Wilson distribution), or whether the top ones are true outliers. The latter arise most often from ice reflections (these may be present even when no ice rings are visible).
* Only if you have a good reason to consider a reflection as an outlier should that reflection actually be discarded. It is not good practice to mechanically discard any reflection with Z>10.
* True outliers should be put (i.e. copied) into REMOVE.HKL, and [[CORRECT]] should then be re-run (a one-line sketch is given below this list).<br /> My personal rule of thumb is that when the integer parts of Z ("int(Z)") are the numbers 8, 9, ... n, but there are no aliens (or just a single one) with int(Z) = n+1, then I consider all aliens with Z > n+1 as outliers. <br /> A different rule of thumb would be to simply consider aliens with Z of 20 or more as outliers - this is the default since January 2010 (the cutoff may be modified with the REJECT_ALIEN keyword).
* Another way to judge Wilson outliers is to identify resolution ranges that deviate from 1. in the table '''HIGHER ORDER MOMENTS OF WILSON DISTRIBUTION OF ACENTRIC DATA''' in [[CORRECT.LP]]. "Aliens" that are put into REMOVE.HKL will lower the values in these resolution ranges!
* SCALEPACK users: don't confuse this process of rejecting Wilson outliers with the iterative procedure of rejecting scaling outliers that is usually done when using SCALEPACK. Scaling outliers are handled automatically in [[XDS]] (and [[XSCALE]]); the only way to influence [[XDS]] in this respect is by modifying [[FAQ#reducing_WFAC1_below_its_default_of_1_improves_my_data.2C_right.3F|WFAC1]].
* If CORRECT rejects many "aliens" in a very weak high-resolution shell because they have Z>20, then this is because those reflections do not obey Wilson statistics. If this happens, the REJECT_ALIEN parameter should be set much higher (e.g. 100).
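A one-line shell sketch for copying strong aliens into REMOVE.HKL (the Z > 30 threshold is an arbitrary example, and the command assumes that Z is the fifth field of the 'alien' lines in [[CORRECT.LP]]; check the file before relying on it):
 awk '/alien/ && $5+0 > 30' CORRECT.LP >> REMOVE.HKL
 # then set JOB=CORRECT in XDS.INP, re-run xds_par and inspect the new alien list;
 # this rejection process usually converges after one or two rounds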


=== Optimizing the anomalous signal ===
It may be helpful to increase WFAC1 from its default of 1.0 to 1.5, to avoid rejection of the strongest Bijvoet pairs (this is not necessary when STRICT_ABSORPTION_CORRECTION=TRUE, which is ''not'' the default).
Read [[Tips_and_Tricks#SAD.2FMAD_data_reduction]] concerning STRICT_ABSORPTION_CORRECTION.
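In XDS.INP this could look roughly as follows (a sketch only: it assumes you keep Friedel pairs separate with FRIEDEL'S_LAW=FALSE, and whether to raise WFAC1 or to use STRICT_ABSORPTION_CORRECTION=TRUE instead is your decision, as discussed above):
 JOB= CORRECT
 FRIEDEL'S_LAW=FALSE                   ! treat Friedel mates as distinct reflections
 WFAC1=1.5                             ! relaxed outlier rejection (default 1.0)
 ! STRICT_ABSORPTION_CORRECTION=TRUE   ! the alternative discussed in Tips and Tricks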
