# Optimisation

## General guidelines for obtaining a good result from XDS

• for good indexing, follow XDS.INP#Keywords which affect whether indexing will succeed
• for good completeness read MINIMUM_ZETA
• at least VIEW FRAME.pck to check the agreement between predicted and observed spots on the last frame of the dataset. It is also wise to VIEW MODPIX.pck and VIEW DECAY.pck to get an impression of systematic effects in your data. When doing this, take the scale in the bar on the right into account!

## Further optimization based on XDSSTAT output

• inspect the table of R_meas values (the lines ending with 'L') and decide whether you want to remove any specific frames (by appending .bad to the filenames and re-running INTEGRATE and CORRECT)
• inspect the table of R_d values (the lines ending with 'DIFFERENCE') and find out if you have systematically rising R_d which would be an indication of strong radiation damage. This works best in high-symmetry space groups.
• inspect the table of R_meas versus PEAK and ln(intensity) and consider adjusting MINPK (the threshold for rejecting overlaps) to a higher value. For better data, you want to raise MINPK to say 85, 90 or even 95, but of course this will reduce the completeness. Find the right compromise between completeness and data quality for your purposes! Experimental phasing relies on high accuracy (in particular of the strong reflections), whereas maps and refinement benefit from good completeness and high resolution.
• make sure that in the same table, the R-factors at high intensity are really low (around 2%). If they are not, then you should reevaluate the spacegroup, or reconsider the value of OVERLOAD=.
• inspect the .pck files written by XDSSTAT, in particular scales.pck, rf.pck and anom.pck, and decide if you are happy with e.g. your low-resolution cutoff! It is normal for scales.pck to have alternating white and black at high resolution, and it is also normal for rf.pck to be bright (high R-factors) at high resolution. But you definitely don't want such indications at low resolution.
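Removing frames by appending .bad, as mentioned for the R_meas table above, can be scripted. The following is only a minimal sketch: the function name `mark_bad` and the printf-style frame template are assumptions made for illustration, not part of XDS or XDSSTAT, and the template has to be adapted to your own frame names.

```shell
# Hypothetical helper (not part of XDS): mark_bad TEMPLATE FIRST LAST
# Appends ".bad" to the names of frames FIRST..LAST so that they are
# no longer found under their original names.
# TEMPLATE is a printf-style pattern such as 'frames/lys_%04d.cbf'
# (an assumed naming scheme). Pass FIRST and LAST as plain decimal
# numbers without leading zeros.
mark_bad() {
    template=$1
    first=$2
    last=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        frame=$(printf "$template" "$i")
        # Rename only frames that actually exist; silently skip the rest.
        if [ -e "$frame" ]; then
            mv -- "$frame" "$frame.bad"
        fi
        i=$((i + 1))
    done
}
```

For example, `mark_bad 'frames/lys_%04d.cbf' 181 185` would rename five frames; INTEGRATE and CORRECT then need to be re-run. The renaming is undone simply by removing the .bad suffix again.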

## Final polishing

### Re-INTEGRATEing with the correct spacegroup and refined geometry

After running through all steps of XDS (including space group determination), one might want to

cp GXPARM.XDS XPARM.XDS
mv CORRECT.LP CORRECT.LP.old
egrep -v 'JOB|REIDX' XDS.INP > XDS.INP.new
echo "! JOB=XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT" > XDS.INP
echo "JOB=INTEGRATE CORRECT" >> XDS.INP
cat XDS.INP.new >> XDS.INP
xds_par


and thereby re-run the INTEGRATE and CORRECT steps. This has the advantage that the refined geometry parameters (from CORRECT) are recycled into INTEGRATE, which sometimes leads to better R-factors. It also results in the spacegroup's restraints on the unit cell parameters being used for the prediction of spot positions; these are therefore slightly more accurate.

### Wilson outliers (aliens)

• Look through the list of reflections labeled as "aliens" at the bottom of CORRECT.LP. Decide whether they follow a slowly decaying non-Wilson distribution (resulting in many reflections with Z > 8 instead of almost none in the case of a Wilson distribution), or whether the top ones are true outliers. The latter arise most often from ice reflections (these may even be there when no ice rings are visible). Outliers should be put (i.e. copied) into REMOVE.HKL, and CORRECT then should be re-run.
My personal rule of thumb is that when the integer parts of Z ("int(Z)") are the numbers 8, 9, ... n, but there are no aliens (or just a single one) with int(Z) = n+1, then I consider all aliens with Z > n+1 as outliers.
A different rule of thumb would be to simply consider aliens with Z of 20 or more as outliers (see Wishlist). This may be accomplished by
awk '/alien/ { if ($5 + 0 >= 20) print $0 }' CORRECT.LP >> REMOVE.HKL


It is useful to inspect the list of aliens after re-running CORRECT; maybe a few more of those should be put into REMOVE.HKL. But this process of rejecting Wilson outliers usually converges very quickly.

• Another way to judge Wilson outliers is to identify resolution ranges whose values deviate from 1 in the table HIGHER ORDER MOMENTS OF WILSON DISTRIBUTION OF ACENTRIC DATA in CORRECT.LP. "Aliens" that are put into REMOVE.HKL will lower the values in these resolution ranges!
• SCALEPACK users: don't confuse this process of rejecting Wilson outliers with the iterative procedure of rejecting scaling outliers that is usually done when using SCALEPACK. Scaling outliers are handled non-iteratively in XDS; the only way to influence XDS in this respect is by modifying WFAC1.
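The personal rule of thumb for aliens given above (int(Z) values 8, 9, ... n are populated, but int(Z) = n+1 has no alien or just a single one) might be sketched as follows. This is one possible reading of that rule, not an XDS feature: the function name `list_gap_outliers` is made up for illustration, a bin is treated as "populated" when it holds two or more aliens, and Z is assumed to be in the fifth column of the alien lines, as in the awk one-liner above.

```shell
# Hypothetical helper (not part of XDS): list_gap_outliers [CORRECT.LP]
# Prints the alien lines whose Z lies above the first sparsely
# populated int(Z) bin, following the rule of thumb in the text.
list_gap_outliers() {
    awk '
        /alien/ {
            z = $5 + 0            # Z assumed in column 5; text becomes 0
            line[++count] = $0
            zval[count] = z
            bin[int(z)]++         # number of aliens per integer part of Z
        }
        END {
            # Walk upwards from int(Z) = 8 while a bin holds two or more aliens.
            gap = 8
            while (bin[gap] >= 2) gap++
            # gap is now n+1, the first bin with zero or one alien.
            for (i = 1; i <= count; i++)
                if (zval[i] > gap) print line[i]
        }
    ' "${1:-CORRECT.LP}"
}
```

It seems safer to inspect the output of `list_gap_outliers CORRECT.LP` first and only then append it to REMOVE.HKL and re-run CORRECT, since the rule is a heuristic and the decision between a non-Wilson distribution and true outliers remains a judgement call.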