SHELX C/D/E: Difference between revisions

From CCP4 wiki
m (SHELXC/D/E literature reference added)
No edit summary
(38 intermediate revisions by 4 users not shown)
Line 1: Line 1:
SHELXC, SHELXD and SHELXE are stand-alone executables that do not require environment variables or parameter files etc., so all that is needed to install them is to put them in a directory that is in the ‘path’ (e.g. /usr/local/bin or ~/bin under Linux). If you are interested in beta-testing the new version of SHELXE that includes autotracing of protein backbones, please request the password and downloading instructions by sending an email to gsheldr[at]shelx.uni-ac.gwdg.de. This version is described in the paper: <i>"Experimental phasing with SHELXC/D/E: combining chain tracing with density modification"</i>. Sheldrick, G.M. (2010). <i>Acta Cryst.</i> <b>D66</b>, 479-485. It is <a href="http://dx.doi.org/10.1107/S0907444909038360">available</a> as "Open Access" and should be cited when these programs are used.
SHELXC, SHELXD and SHELXE are stand-alone executables that do not require environment variables or parameter files etc., so all that is needed to install them is to put them in a directory that is in the ‘path’ (e.g. /usr/local/bin or ~/bin under Linux). There is a detailed description of these programs in the paper: <i>"Experimental phasing with SHELXC/D/E: combining chain tracing with density modification"</i>. Sheldrick, G.M. (2010). <i>Acta Cryst.</i> <b>D66</b>, 479-485. It is  
available as "Open Access" at http://dx.doi.org/10.1107/S0907444909038360 and should be cited whenever these programs are used.
 
[[hkl2map]] is a graphical user interface that makes it easy to use these programs.
 
[[xds:xdsgui|XDSGUI]] is a graphical user interface for XDS that also makes it easy to use these programs.
 




Line 6: Line 12:
'''SHELXC''' is designed to provide a simple and fast way of setting up the files for the programs '''SHELXD''' (heavy atom location) and '''SHELXE''' (phasing and density modification) for macromolecular phasing by the MAD, SAD, SIR and SIRAS methods. These three programs may be run in batch mode or called from a GUI such as [[CCP4i]] or (better) [[hkl2map]]. SHELXC is much less versatile than the Bruker AXS XPREP program for this purpose, but if you are sure of the space group and there are no problems with the indexing or twinning and the f’ and f” parts of the scattering factors do not need to be refined, SHELXC should be adequate.  


SHELXC can read either HKL2000 format .sca files or SHELX .hkl files (F<sup>2</sup> unless the -f switch is used to specify F). To transfer data from CCP4 it is advisable to generate .sca files using 'output unmerged polish' from SCALA or to use the program mtz2sca written by Tim Grüne and supplied with SHELX. The current version of SHELXC outputs extra useful diagnostic statistics if fed unmerged data.
In the simplified approach used by SHELXE, the starting phases for density modification are estimated as (heavy atom phase + &alpha;), where &alpha; is calculated by SHELXC from the anomalous and dispersive differences. For SAD, &alpha; is 90º (I<sub>+</sub> > I<sub>–</sub>) or 270º (I<sub>+</sub> < I<sub>–</sub>); for SIR and RIP, &alpha; is 0º or 180º; and for SIRAS or MAD, &alpha; may be anywhere in the range 0º to 360º.
 
SHELXC reads a filename stem (denoted here by 'xx') on the command line
plus some instructions from 'standard input'. It writes some statistics to  
'standard output' and prepares the three files needed to run SHELXD and
SHELXE. SHELXC can be called from a GUI by a command line such as:


SHELXC reads a filename stem on the command line plus some instructions from 'standard input'. It writes some statistics to 'standard output' and prepares the three files needed to run SHELXD and SHELXE. It can be called from a GUI using a single command line such as:
  shelxc xx <t
which would read the instructions from the file t and write the files xx.hkl (h,k,l,I,&sigma;(I) in SHELX HKLF4 format for density modification by SHELXE), xx_fa.ins (cell, symmetry etc. for heavy atom location using SHELXD) and xx_fa.hkl (h,k,l,F<sub>A</sub>,&sigma;(F<sub>A</sub>),&alpha; for both SHELXD and SHELXE). The starting phases for density modification are estimated as (heavy atom phase + &alpha;) in the simplified approach used by SHELXE, &alpha; is calculated by SHELXC from the anomalous and dispersive differences. For SAD &alpha; is 90º (I<sub>+</sub> > I<sub>–</sub>) or 270º (I<sub>+</sub> < I<sub>–</sub>), for SIR and RIP &alpha; is 0º or 180º and for SIRAS or MAD &alpha; may be anywhere in the range 0º to 360º.  
 
<p>The above command line could be used under UNIX or Windows; under UNIX the commands to run SHELXC, SHELXD and SHELXE and the instructions for SHELXC may also be combined into a single script file as shown in the following examples. In these scripts, the instructions start on the line after '<<EOF' and are terminated by 'EOF'. The instructions may be given in any order; CELL (unit-cell), SPAG (space group in PDB notation, spaces are ignored) and FIND (followed by the number of heavy atoms) must be given; the optional instructions SFAC, MIND, NTRY, SHEL, ESEL and DSUL, if present, are copied to the SHELXD input file. <br><br>
which would read the instructions from the file t, or (under most UNIX
systems) by a simple shell script that includes the instructions, e.g.
 
shelxc xx <<EOF
CELL 49.70 57.90 74.17 90 90 90
SPAG P212121
SAD elastase.sca
FIND 12
EOF
shelxd xx_fa
shelxe xx xx_fa -s0.37 -m20 -h -b
shelxe xx xx_fa -s0.37 -m20 -h -b -i
 
which would also run shelxd to locate the sulfur atoms and shelxe (for
both substructure enantiomers) to solve elastase by sulfur-SAD phasing.
The reflection data may be in SHELX (.hkl), HKL2000 (.sca) or XDS
XDS_ASCII.HKL format. Any names may be used for XDS reflection files,
SHELXC recognises them by reading the first record.
 
This script would read data from the .sca file and write the files
xx.hkl (h,k,l,I,sig(I) in SHELX HKLF4 format for density modification
by SHELXE or refinement with SHELXL), xx_fa.ins (cell, symmetry etc. for
heavy atom location by SHELXD) and xx_fa.hkl (h,k,l,FA,sig(FA),alpha
for both SHELXD and SHELXE). The starting phases for density
modification are estimated as given above.
 
For SIR or SIRAS, two input reflection files are specified by the
keywords NAT and SIR or SIRA; for MAD at least two of the reflection files
HREM, LREM, PEAK and INFL are required and NAT may also be given if higher
resolution native data are available (e.g. SMet for SeMet MAD). Reflection
data should be in SHELX .hkl or SCALEPACK .sca format; many other programs,
including SCALA and XPREP, can output .sca format too. The keywords CELL,
SPAG (space group) and FIND (number of heavy atoms) are
always required, SFAC, MIND, NTRY, SHEL, ESEL and DSUL may be given and are
written to the file xx_fa.ins for SHELXD. MAXM can be used to reserve
memory in units of 1M reflections. For RIP phasing, NAT (or BEFORE) denotes
the file before radiation damage and RIP (or AFTER) after radiation
damage. For RIPAS the 'after' file must be called 'RIPA' and a keyword RIPW
(default 0.6) gives the weight w to be assigned to the 'NAT' data in the
estimation of the anomalous signal (a weight of 1-w is applied to the 'RIPA'
data). DSCA (default 0.98) gives the factor to multiply the native data
for SIR and SIRAS or the 'after' data for RIP after the data have been
put on the same scale (this allows for the extra scattering power of the
heavy atoms etc.); this can be critical for RIP phasing.
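
As a hedged illustration of the MAD keywords described above, the following
sketch writes a SHELXC instruction file for a three-wavelength experiment and
runs SHELXC only if it is found on the path. The stem xx, the file names
peak.sca, infl.sca and hrem.sca, the space group and FIND 8 are placeholders
invented for this example, and the cell is simply copied from the elastase
script; none of these values come from a real MAD dataset.

```shell
# Write the SHELXC instructions for a hypothetical three-wavelength MAD job.
# All file names, the cell, the space group and the FIND value are placeholders.
cat > mad.inp <<'EOF'
CELL 49.70 57.90 74.17 90 90 90
SPAG P212121
PEAK peak.sca
INFL infl.sca
HREM hrem.sca
FIND 8
EOF

# Run SHELXC only if it is installed, so that the sketch can be dry-run.
if command -v shelxc >/dev/null 2>&1; then
    shelxc xx < mad.inp
else
    echo "shelxc not found; wrote mad.inp only"
fi
```

Adding a line containing only SMAD to mad.inp would, as described below, set
the dispersive term to zero for the same set of files.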
 
ASCA (default 1.0) is a scale factor applied to the anomalous signal in a
MAD experiment; to apply MAD to a small molecule, ASCA and DSCA should both
be between 0 and 1; the best values have to be found by trial and error.
Finally SMAD (without a number) sets the dispersive term to zero in a MAD
experiment, equivalent to SAD using weighted mean anomalous differences
from all the MAD datasets. This should always be tried whenever radiation
damage is suspected.
 
SHELXC also tests for and if necessary corrects the more common cases of
inconsistent indexing when more than one dataset is involved. In addition,
the mean value of |E^2-1| is calculated for each dataset to detect twinning.


== SHELXD ==


=== critical parameters ===


In general the critical parameters for locating heavy atoms with SHELXD are:


# The resolution cutoff. In the MAD case this is best determined by finding where the correlation coefficient between the signed anomalous differences for wavelengths with the highest anomalous signal (PEAK and HREM or PEAK and INFL) falls below about 30%. For SAD a less reliable guide is where the mean value of |&Delta;F|/&sigma;(&Delta;F) falls below about 1.2 (a value of 0.8 would indicate pure noise), and for S-SAD with CuK&alpha; the data can be truncated where I/&sigma; for the native data falls below 30. If unmerged data are used, SHELXC calculates a correlation coefficient between two randomly selected subsets of the signed anomalous differences; this is a better indicator because it does not require that the intensity esds are on an absolute scale, but it does require a reasonable redundancy and again the data can be truncated where it drops to below 30% (the CCP4 program SCALA prints a similar statistic).
=== Resolution cutoff (SHEL) ===
# The estimated number of sites (FIND) should be within about 20% of the true number. For SeMet or S-SAD phasing there should be a sharp drop in the occupancy after the last true site. For iodide soaks, a good rule of thumb is to start with a number of iodide sites equal to the number of amino-acids in the asymmetric unit divided by 15. If after SHELXD occupancy refinement the occupancy of the last site is more than 0.2 it might be worth increasing this number, and vice versa.
In the MAD case this is best determined by finding where the correlation coefficient between the signed anomalous differences for wavelengths with the highest anomalous signal (PEAK and HREM or PEAK and INFL) falls below about 30%. For SAD a less reliable guide is where the mean value of |&Delta;F|/&sigma;(&Delta;F) falls below about 1.2 (a value of 0.8 would indicate pure noise), and for S-SAD with CuK&alpha; the data can be truncated where I/&sigma; for the native data falls below 30. If unmerged data are used, SHELXC calculates a correlation coefficient between two randomly selected subsets of the signed anomalous differences; this is a better indicator because it does not require that the intensity esds are on an absolute scale, but it does require a reasonable redundancy and again the data can be truncated where it drops to below 30% (XDS and the CCP4 programs aimless/SCALA print a similar statistic).
# A common 'user error' is to set MIND -3.5 even though the distances between heavy atoms are less than 3.5 Å.  For example, in a Fe<sub>4</sub>S<sub>4</sub> cluster the Fe...Fe distance is about 2.7 Å, so MIND -2 would be appropriate. A disulfide bond has a length of 2.03 Å so then MIND -1.5 could be used to resolve the sulfur atoms, however if DSUL is used for this purpose MIND -3.5 is required.
# If heavy atoms can lie on special positions (as is the case with an iodide soak in a space group with twofold axes) the rejection of atoms on special positions should be switched off by giving the second MIND parameter as -0.1 (as in the above thaumatin example).
# In cubic space groups the Patterson seeding (PATS) is slow and less effective; it is recommended that 'PATS' be replaced by 'WEED 0.3'.<br>
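
The 30% correlation-coefficient criterion for the resolution cutoff can be applied mechanically once the statistics are in a simple table. The sketch below is only an illustration: it assumes a hypothetical two-column text file (resolution in Å, CC in %) ordered from low to high resolution, which is not the layout that SHELXC itself prints, so the numbers would first have to be transcribed or extracted. The function name cc_cutoff is likewise invented for this example.

```shell
# Print the first resolution (scanning from low to high resolution) at which
# the anomalous CC drops below a threshold (default 30%).
# Assumed input: two whitespace-separated columns, d(A) and CC(%).
cc_cutoff() {
    awk -v thr="${2:-30}" '
        # Only consider rows whose two fields are both plain numbers.
        $1 ~ /^[0-9.]+$/ && $2 ~ /^-?[0-9.]+$/ {
            if ($2 + 0 < thr + 0) { print $1; found = 1; exit }
        }
        # If the CC never drops below the threshold, say so.
        END { if (!found) print "none" }
    ' "$1"
}
```

The printed value would be a starting point for the second SHEL parameter rather than a definitive cutoff; a result of "none" means the CC stays above the threshold over the whole table.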


=== Number of sites (FIND) ===
The estimated number of sites (FIND) should be within about 20% of the true number. For SeMet or S-SAD phasing there should be a sharp drop in the occupancy after the last true site. For iodide soaks, a good rule of thumb is to start with a number of iodide sites equal to the number of amino-acids in the asymmetric unit divided by 15. If after SHELXD occupancy refinement the occupancy of the last site is more than 0.2 it might be worth increasing this number, and vice versa.
Note that SHELXD searches for 40% more sites than the number requested with FIND, because there are often additional minor sites arising from atoms such as Cl or Ca. If FIND is not adjusted downwards after an initial SHELXD run so that the last requested site in the .res file has occupancy > 0.2, you can either edit the .res file and remove the sites with occupancy < 0.2, or run SHELXE with -hN, where N is the number of the last site with occupancy > 0.2.
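
Counting the sites with occupancy above 0.2 can be scripted. The following is a minimal sketch, assuming that atom lines in the .res file have the usual SHELX layout (name, scattering-factor number, x, y, z, occupancy, U) and that the sites are listed in order of decreasing occupancy; the function name count_good_sites is invented for this example.

```shell
# Count substructure sites whose occupancy exceeds a threshold (default 0.2).
# Assumed atom-line layout: name sfac x y z occupancy [U]
count_good_sites() {
    awk -v thr="${2:-0.2}" '
        # Atom lines: integer scattering-factor number in field 2 and
        # decimal numbers for x, y, z and occupancy in fields 3 to 6.
        NF >= 6 && $2 ~ /^[0-9]+$/ &&
        $3 ~ /^-?[0-9]*\.[0-9]+$/ && $4 ~ /^-?[0-9]*\.[0-9]+$/ &&
        $5 ~ /^-?[0-9]*\.[0-9]+$/ && $6 ~ /^-?[0-9]*\.[0-9]+$/ {
            if ($6 + 0 > thr + 0) n++
        }
        END { print n + 0 }
    ' "$1"
}
```

Called as count_good_sites xx_fa.res, it prints a number N that could be used to lower FIND for the next SHELXD run or, if the sites are also present in the native structure, passed to SHELXE as -hN.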
=== Disulfides (DSUL) ===
If the resolution d (second parameter on SHEL card) is > 2.0Å the disulfide bonds may not be fully resolved, but in the range 2.8>d>2.0 the DSUL instruction may be used to fit S−S units to the density. This can dramatically improve the final phase quality. If DSUL is used, the first MIND parameter should be set to -3.5 (so that each disulfide is found once only) and disulfides should be counted as single (super-sulfur) atoms for FIND (i.e. each disulfide given in DSUL counts as two atoms for FIND).
=== Minimum distance between atoms (MIND) ===
A common 'user error' is to set MIND -3.5 even though the distances between heavy atoms are less than 3.5 Å.  For example, in a Fe<sub>4</sub>S<sub>4</sub> cluster the Fe...Fe distance is about 2.7 Å, so MIND -2 would be appropriate. A disulfide bond has a length of 2.03 Å so then MIND -1.5 could be used to resolve the sulfur atoms, however if DSUL is used for this purpose MIND -3.5 is required.
If heavy atoms can lie on special positions (as is the case with an iodide soak in a space group with twofold axes) the rejection of atoms on special positions should be switched off by giving the second MIND parameter as -0.1 (as in the above thaumatin example).
=== Interpretation of results ===
For MAD, a CC of 40 to 50% indicates a good solution, for SAD etc. values around 30% may well be correct, especially if the same solution or group of solutions has the highest values of CC, CC(Weak) and PATFOM, and they are well separated from the values for the non-solutions.  The CC values tend to increase as the resolution is lowered.  Heavy atom soaks truncated to low resolution often give spuriously high CC values, but these 'solutions' can be recognized as false by their low CC(weak) values.<br>


Line 33: Line 108:
== SHELXE ==


=== Modes of operation  ===
=== Usage ===
 
A typical SHELXE job for SAD, MAD, SIR or SIRAS phasing could be:
 
shelxe xx xx_fa -s0.5 -z -a3
 
where xx.hkl contains native data and xx_fa.hkl, which should have
been created by SHELXC or XPREP, contains FA and alpha. The heavy
atoms are read from xx_fa.res, which can be generated by SHELXD or
ANODE. 'xx' and 'xx_fa' may be replaced by any strings that make
legal file names. If these heavy atoms are present in the native
structure (e.g. for sulfur-SAD but not SIRAS for an iodide soak)
-h is required (or e.g. -h8 to use only the first 8). -z optimizes
the substructure at the start of the phasing. -z9 limits the number
of heavy atoms to 9. If -z is specified without a number,
no limit is imposed. Normally the heavy atom enantiomorph is not
known, so SHELXE should also be run with the -i switch to invert
the heavy atoms and if necessary the space group; this writes
files xx_i.phs instead of xx.phs etc., so may be run in parallel.
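
Because the two hands write to different output files, both jobs can be
started together. The sketch below is one possible way to do this from a
POSIX shell; the function name run_both_hands, the variable SHELXE_BIN and
the log file names are inventions of this example, not part of SHELXE.

```shell
# Executable to call; override SHELXE_BIN if shelxe is not on the path.
SHELXE_BIN=${SHELXE_BIN:-shelxe}

# Run SHELXE for the original and the inverted substructure in parallel.
# $1 = native stem (xx), $2 = substructure stem (xx_fa), rest = options.
run_both_hands() {
    stem=$1; fa=$2; shift 2
    # Original hand, in the background; screen output goes to a log file.
    "$SHELXE_BIN" "$stem" "$fa" "$@" > "${stem}.orig.log" 2>&1 &
    pid1=$!
    # Inverted hand (-i), also in the background.
    "$SHELXE_BIN" "$stem" "$fa" "$@" -i > "${stem}.inv.log" 2>&1 &
    pid2=$!
    # Wait for both jobs and succeed only if both exited with status 0.
    wait "$pid1"; s1=$?
    wait "$pid2"; s2=$?
    [ "$s1" -eq 0 ] && [ "$s2" -eq 0 ]
}
```

For the job above this would be called as run_both_hands xx xx_fa -s0.5 -z -a3.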
 
-a sets the number of global autotracing cycles. -n imposes NCS
during tracing, e.g. -n6 for six-fold NCS or -n if the number of
copies is not known.
 
To start from a MR model without other phase information, the PDB
file from MR should be renamed xx.pda and input to SHELXE, e.g.
 
shelxe xx.pda -s0.5 -a20
 
More tracing cycles are usually used here to reduce model
bias. -O enables local rigid group optimization of the domains
defined in the .pda file. If -O and/or -o (-O acts before -o) are
used to improve a model in xx.pda, the revised model is output to
xx.pdo. To refine rigid group domains separately with -O, insert
'REMARK DOMAIN N' records into the .pda file to split the model
into domains, where N (default 1) is the rigid group number of
the following atoms (until the next 'REMARK DOMAIN N'). -ON makes
N simplex trials with starting positions within a cube (edge set
by -Z) around the positions in xx.pda. The first search (the only
one for -O or -O1) starts from the initial position. If the MR
model is large but does not fit well, -o should be included to
prune it before density modification.
 
Tracing from an MR model requires a favorable combination of model
quality, solvent content and data resolution. If e.g. SAD phase
information is available, even if it is too weak for phasing on
its own, the two approaches may be combined:
 
shelxe xx.pda xx_fa -s0.5 -a10 -h -z


SHELXE has the following modes of action (xx and yy are filename stems):<br>
The phases from the MR model are used to generate the heavy atom
substructure. This is used to derive experimental phases that are
then combined with the phases from the MR model (MRSAD). The -h,
-O, -o and -z flags are often needed for this mode.


shelxe xx [reads xx.hkl and xx.ins, phases from atoms]
If approximate phases are available, SHELXE may be used to refine
shelxe xx yy [normal mode: reads xx.hkl, yy.hkl, yy.res]
them and make a poly-Ala trace:
shelxe xx yy zz.pdb [as above plus partial structure from zz.pdb]  [NEW!]
shelxe xx.phi [reads xx.phi, xx.hkl, xx.ins]
shelxe xx.phi yy.pdb [reads xx.phi, xx.hkl, xx.ins, partial structure yy.pdb]  [NEW!]
shelxe xx.fcf [reads only xx.fcf]
shelxe xx.phi yy [reads xx.phi, xx.hkl, xx.ins, yy.hkl]
shelxe xx.fcf yy [reads xx.fcf, yy.hkl, yy.res]


xx.hkl contains native data, yy.hkl contains F<sub>A</sub> and &alpha; and should have been created using SHELXC or XPREP. xx.phi has .phs format (h,k,l,F,fom,&phi; in free format) and can be made by renaming a .phs output file from SHELXE, but only the starting phases are read from it; if a .phi file is read, the cell and symmetry are read from xx.ins and the native F-values are read from xx.hkl. xx.fcf (from a SHELXL structure refinement) provides cell, symmetry and starting phases. The output phases are written to xx.phs, the log file is written to xx.lst and, if -b is set, improved substructure phases are output to xx.pha and revised heavy atoms to xx.hat.<br>
shelxe xx.zzz -s0.5 -a3


The first six modes provide density modification starting from atoms and/or phases, the seventh is an inverse cross-Fourier for finding heavy atoms for a second derivative (yy) with the same origin as the first (xx), and the last mode is useful to confirm the heavy atom substructure from the final refined phases. This is useful as a post-mortem if SAD or MAD phasing fails but the structure could be solved by other means. For these last two modes, the phases for the inverse Fourier are (&phi;<sub>nat</sub> – &alpha;), where &phi;<sub>nat</sub>  may be refined (-m etc.) and &alpha; is taken from yy.hkl. A few cycles of phase refinement may reduce the noise in such maps by improving the weights.<br>
where zzz is phi (phs file format), fcf (from SHELXL) or hlc
(Hendrickson-Lattman coefficients, e.g. from SHARP or BP3).
 
In all cases, native data are read from xx.hkl in SHELX format,
and the density modified phases are output to xx.phs (or xx_i.phs
if -i was set). The listing file is xx.lst (or xx_i.lst). If
xx_fa.hkl is read, substructure phases are output to xx.pha (or
xx_i.pha) and the revised substructure is written to xx.hat
(or xx_i.hat).
 
=== Full list of SHELXE options (defaults in brackets) ===
 
-aN - N cycles autotracing [off]
-AX - maximum random initial rotation in deg. for -O [-A3.0]
-bX - B-value to weight anomalous map (xx.pha and xx.hat) [-b5.0]
-cX - fraction of pixels in crossover region [-c0.4]
-dX - truncate reflection data to X Angstroms [off]
-eX - add missing 'free lunch' data up to X Angstroms [dmin+0.2]
-f  - read F rather than intensity from native .hkl file [off]
-FX - fract. weight for phases from previous global cycle [-F0.8]
-gX - solvent gamma flipping factor [-g1.1]
-GX - threshold for accepting new peptide when tracing [-G0.7]
-h or -hN - (N) heavy atoms also present in native structure [-h0]
-i  - invert space group and input (sub)structure or phases [off]
-IN - in cycle 1 only, do N cycles DM (free lunch if -e) [off]
-kX - minimum height/sigma for heavy atom sites in xx.hat [-k4.5]
-KN - keep starting fragment unchanged for N global cycles [off]
-K  - keep fragment unchanged throughout
-lN - reserve space for 1000000N reflections [-l2]
-LN - minimum chain length (at least 3 chains are retained) [-L6]
-mN - N iterations of density modification per global cycle [-m20]
-n or -nN - apply N-fold NCS to traces [off]
-O or -ON - N random-start rigid-group domain searches [off]
-o or -oN - prune up to N residues to optimize CC for xx.pda [off]
-q  - search for alpha-helices [off]
-rX - FFT grid set to X times maximum indices [-r3.0]
-sX - solvent fraction [-s0.45]
-tX - time factor for helix and peptide search [-t1.0]
-uX - allocable memory in MB for fragment optimization [-u500]
-UX - abort if less than X% of initial CA stay within 0.7A [-U0]
-vX - density sharpening [default set by resol., 0 if .pda read]
-wX - add experimental phases with weight X each iteration [-w0.2]
-x  - diagnostics, requires PDB reference file xx.ent [off]
-yX - highest resol. in Ang. for calc. phases from xx.pda [-y1.8]
-YX - SAD phase shift factor [-Y0.5]
-zN - substructure optimization for a maximum of N atoms [off]
-z - substructure optimization, number of atoms not limited [off]
-ZX - maximum shift in Ang. from initial position for -O [-Z1.0]
 
Meaning of additional output when using the -x option:
 
MPE and wMPE are given as two numbers, the one after the '/' is for centric reflections only.
 
The first nine numbers in the row after locating a strand or in the 'Global chain diagnostics' are the percentages of CA within 0-0.1, 0.1-0.2, 0.2-0.3Å etc from the nearest CA in the reference structure. The tenth number is the percentage further than 0.9Å from the nearest CA.
 
The next number is 100 times the number of CA found divided by the number expected for the whole structure. The last number is the mean distance of a CA atom from the nearest CA in the reference structure, whereby distances greater than 2.5Å are replaced by 2.5. One should always look at the second number from the right; for a good trace it should be as low as possible. If you are expanding from a MR solution the program also tells you the percentages of starting atoms retained.


=== Phasing and density modification ===
Line 58: Line 233:
would do 20 cycles density modification with a solvent content of 0.45, phasing from the first 8 heavy atoms in the yy.res file from SHELXD assuming that they are also present in the native structure (-h8), and then use the modified density to generate improved heavy atoms (-b). The switch -i may be added to invert the substructure (and if necessary the space group), this writes xx_i.phs instead of xx.phs etc., and so may be run in parallel. <br>


A big difference in the contrast between the two heavy-atom enantiomorphs usually indicates a good SHELXE solution. However in the case of SIR, both have the same contrast but one gives the inverted protein structure. The contrast is also the same for both if the heavy-atom substructure is centrosymmetric. In the case of SAD both heavy atom enantiomers then give the correct structure, for SIR the result is an uninterpretable double image. <br>
A big difference in the contrast between the two heavy-atom enantiomorphs usually indicates a good SHELXE solution. However in the case of SIR, both have the same contrast but one gives the inverted protein structure. The contrast is also the same for both if the heavy-atom substructure is centrosymmetric (there is a [http://cci.lbl.gov/cctbx/phase_o_phrenia.html server] to find out). In the case of SAD both heavy atom enantiomers then give the correct structure, for SIR the result is an uninterpretable double image. <br>


The pseudo-free correlation coefficient (based on the comparison of E<sub>o</sub> and E<sub>c</sub> for 10% of the data left out at random in the calculation of a map that is then density modified and Fourier back-transformed in the usual way) is now printed out before every Nth cycle (set by -j, the default is -j5); a value above 70% usually indicates an interpretable map. The pseudo-free CC (which is also reported in the [[hkl2map]] plot of contrast against cycle number) is also a good indication as to when the phase refinement has converged. <br>
Line 66: Line 241:


Good quality MAD data, a high solvent content and/or high resolution for the native data can lead to maps of high quality that can be autotraced (e.g. with wARP) immediately. The .phs files contain h, k, l, F, fom, &phi; and &sigma;(F) in free format and can be read directly into [[Coot]] or converted to CCP4 .mtz format using [[f2mtz]], e.g. for further density modification exploiting NCS using the CCP4 program [[ccp4dev:Automated phase improvement with Pirate|Pirate]]. Note that if the inverted heavy atom enantiomorph is the correct one, the corresponding phases are in the *_i.phs file and SHELXE may have inverted the space group (e.g. P4<sub>1</sub> to P4<sub>3</sub>), which should be taken into account when moving to other programs!<br>
A writeup for a case study, by GMS, as of Jan 13, 2013, is at [https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1301&L=ccp4bb&F=&S=&X=160F8F2598F868FF2A&P=83871].


=== The free lunch algorithm (FLA) ===


The new switch -e may be used to extrapolate the data to the specified resolution (the '''''free lunch algorithm'''''), based closely on work by the Bari group (Caliandro ''et al''., ''Acta Crystallogr''. (2005) '''D61''', 556-565) and independently implemented in the program [[Acorn]] (Yao ''et al''., (2005) ''Acta Crystallogr''. '''D61''', 1465-1475): -e1.0 can produce spectacular results when applied to data collected to 1.6 to 2.0 Å, but since a large number of cycles is required (-m400) and the 'contrast' and 'connectivity' become unreliable (the pseudo-free CC is the only reliable map quality indicator when the FLA is used), it may be best to establish the substructure enantiomorph and solvent content without -e first. The default setting when -e is not specified is to fill in missing low and medium resolution data but not to extrapolate to higher resolution than actually measured (to switch off this filling in, use -e999). The resolution requirements for the FLA still need to be explored, but so far there have been no reports of it causing a deterioration in map quality, and in a few cases the mean phase error was reduced by as much as 30º relative to density modification without it.<br>
The switch -e may be used to extrapolate the data to the specified resolution (the '''''free lunch algorithm'''''), based closely on work by the Bari group (Caliandro ''et al''., ''Acta Crystallogr''. (2005) '''D61''', 556-565) and independently implemented in the program [[Acorn]] (Yao ''et al''., (2005) ''Acta Crystallogr''. '''D61''', 1465-1475): -e1.0 can produce spectacular results when applied to data collected to 1.6 to 2.0 Å, but since a large number of cycles is required (-m400) and the 'contrast' and 'connectivity' become unreliable (the pseudo-free CC is the only reliable map quality indicator when the FLA is used), it may be best to establish the substructure enantiomorph and solvent content without -e first. The default setting when -e is not specified is to fill in missing low and medium resolution data but not to extrapolate to higher resolution than actually measured (to switch off this filling in, use -e999). The resolution requirements for the FLA still need to be explored, but so far there have been no reports of it causing a deterioration in map quality, and in a few cases the mean phase error was reduced by as much as 30º relative to density modification without it.<br>


=== notes about the beta-test version of SHELXE ===
=== How to find out if a molecular replacement solution is correct or wrong ===


There is a beta-test version of SHELXE available upon request from George Sheldrick.  
From a [https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1111&L=ccp4bb&F=&S=&P=41951 November 2011 posting of George Sheldrick on CCP4BB]: We have unintentionally discovered a very simple way of telling whether
an MR solution is correct or not, provided that (as in this case) native
data have been measured to about 2.1A or better. This uses the current
beta-test of SHELXE that does autotracing (available on email request).


Among other features and improvements, this version does autotracing: a poly-Ala trace will be written to the output .pdb file. In general the structure is solved if the CC for the trace is over 25% or the ratio of residues traced to the number of chains is greater than about 10.
First rename the PDB file from MR to name.pda and generate a SHELX
format file name.hkl, e.g. using Tim Gruene's mtz2hkl, where 'name' may
be chosen freely but should be the same for both input files. Then run
SHELXE with a large number of autotracing cycles (here 50), e.g.


For this version, the -a option sets the number of global autotracing cycles; -a on its own is equivalent to -a3.
shelxe name.pda -a50 -s0.5 -y2


Use with [[hkl2map]]: let hkl2map find which enantiomorph is probably correct and then run a final SHELXE job from the command line, using the options -a3, -q (unless it is known that there are no helices) and (if the resolution is 2A or better) -e1.0 (or highest resolution -0.8, if more).
-s sets the solvent content and -y a resolution limit for generating
starting phases. If the .hkl file contains F rather than intensity the
-f switch is also required.


NCS can be taken into account by adding the -n option.
If the model is wrong the CC value for the trace will gradually
decrease as the model disintegrates. If the model is good the CC will
increase, and if it reaches 30% or better the structure is solved. In
cases with a poor but not entirely wrong starting fragment, the CC may
vary erratically for 10-30 cycles before it locks in to the correct
solution and the CC increases over three or four cycles to the value
for a solved structure (25 to 50%). The solution with the best CC is
written to name.pdb and its phases to name.phs for input to e.g. Coot.


Use for RIP: the beta-test SHELXE inserts HKLF 4 and END before the first negative peak when writing the revised substructure to the .hat file. Normally this is a good way of finding where the noise begins, but for RIP if you want to recycle the negative peaks these lines should be removed.
=== How to tell SHELXE about NCS in a molecular replacement solution PDB file ===
 
(communicated by Isabel Uson) Insert a line
REMARK 299 NCS GROUP BEGIN
before the ATOM (or HETATM) lines of each NCS group (e.g. chain), and insert the line
REMARK 299 NCS GROUP END
after the last of these. The -n option is not needed then. The output of SHELXE should tell you about the fact that it understood the NCS specification.


== RIP with SHELXC/D/E ==
== RIP with SHELXC/D/E ==
Line 92: Line 290:
RIP (without using anomalous scattering) or RIPAS (like SIRAS, assuming that the anomalous atoms are also those most sensitive to radiation damage) can be capable of solving difficult structures. A typical procedure on a third generation synchrotron beamline is to collect the 'before' dataset with an attenuator in the beam, then to fry the crystal for a couple of minutes with the unattenuated beam, and finally to collect an 'after' dataset with the attenuator in. In the SHELXC instructions, the 'before' data are called 'NAT' or 'BEFORE' and the 'after' data are called 'RIP' or 'AFTER'. The critical parameter is the scale factor applied to the 'after' data after both datasets have been brought onto a common scale. This is set by the SHELXC instruction 'DSCA' and should usually be in the range 0.9 to 1.05. This scale factor may also be used for SIR and SIRAS, where it is applied to the native data, but it appears to be less critical than for RIP. For RIPAS, the 'after' data should be called 'RIPA' and the 'RIPW' instruction specifies the weight w (default 0.6) for the anomalous contribution from the 'before' dataset (a weight 1–w is applied to the 'after' data).
RIP (without using anomalous scattering) or RIPAS (like SIRAS, assuming that the anomalous atoms are also those most sensitive to radiation damage) can be capable of solving difficult structures. A typical procedure on a third generation synchrotron beamline is to collect the 'before' dataset with an attenuator in the beam, then to fry the crystal for a couple of minutes with the unattenuated beam, and finally to collect an 'after' dataset with the attenuator in. In the SHELXC instructions, the 'before' data are called 'NAT' or 'BEFORE' and the 'after' data are called 'RIP' or 'AFTER'. The critical parameter is the scale factor applied to the 'after' data after both datasets have been brought onto a common scale. This is set by the SHELXC instruction 'DSCA' and should usually be in the range 0.9 to 1.05. This scale factor may also be used for SIR and SIRAS, where it is applied to the native data, but it appears to be less critical than for RIP. For RIPAS, the 'after' data should be called 'RIPA' and the 'RIPW' instruction specifies the weight w (default 0.6) for the anomalous contribution from the 'before' dataset (a weight 1–w is applied to the 'after' data).


In RIP or RIPAS phase determination it is usually necessary to recycle the 'heavy atom' sites by renaming the output .hat (or _i.hat) file as .res and rerunning SHELXE. It is advisable to edit this file so as to retain the stronger negative sites, as these may well correspond to the new positions of displaced atoms. SHELXE can read negative occupancies but SHELXD can only search for positive atoms. 
In RIP or RIPAS phase determination it is usually necessary to recycle the 'heavy atom' sites by renaming the output .hat (or _i.hat) file as .res and rerunning SHELXE. It is advisable to edit this file so as to retain the stronger negative sites, as these may well correspond to the new positions of displaced atoms. SHELXE can read negative occupancies but SHELXD can only search for positive atoms. SHELXE inserts HKLF 4 and END before the first negative peak when writing the revised substructure to the .hat file. Normally this is a good way of finding where the noise begins, but for RIP if you want to recycle the negative peaks these lines should be removed.


It should be noted that in a pure RIP experiment, both hands of the radiation damage substructure will give the same figures of merit, but one will lead to an electron density map that is a mirror image of the true map (the helices will go the wrong way round). <br>
It should be noted that in a pure RIP experiment, both hands of the radiation damage substructure will give the same figures of merit, but one will lead to an electron density map that is a mirror image of the true map (the helices will go the wrong way round). <br>
Line 162: Line 360:


Here the resolution cutoff has been reduced from 2.1 Å (which SHELXC would have suggested) to 2.0 Å to improve the chances of resolving the sulfurs. The SHEL, FIND, MIND and NTRY instructions are transferred to the file thau_fa.ins for the sulfur atom location with SHELXD. Note that the phases can be improved further in this case by using more SHELXE cycles than the usual 20. <br><br>
Here the resolution cutoff has been reduced from 2.1 Å (which SHELXC would have suggested) to 2.0 Å to improve the chances of resolving the sulfurs. The SHEL, FIND, MIND and NTRY instructions are transferred to the file thau_fa.ins for the sulfur atom location with SHELXD. Note that the phases can be improved further in this case by using more SHELXE cycles than the usual 20. <br><br>
==  SAD/MAD with automatic backbone building ==
shelxe exp1 exp1_fa -a -q -h -s0.6 -m20 -b
will use exp1.hkl, exp1_fa.hkl, exp1.ins (as above) and will try 3 cycles of backbone building.


=== SIRAS ===
=== SIRAS ===
Line 180: Line 385:
  shelxe thaui thaui_fa -s0.5 -m20 -i
  shelxe thaui thaui_fa -s0.5 -m20 -i


== Obtaining the SHELX programs ==


== Obtaining the SHELX programs ==
SHELXC/D/E and test data may be downloaded from the [http://shelx.uni-goettingen.de/bin/ SHELX fileserver]. First fill the application form at  http://shelx.uni-goettingen.de/register.php  Password and downloading instructions will then be emailed to the address given on the form.  The programs are free to academics but a small license fee is required for 'for-profit' use. 


SHELXC/D/E and test data may be downloaded from the SHELX fileserver. The application form should be printed out from http://shelx.uni-ac.gwdg.de/SHELX/  This form should be completed and faxed to +49-551-392582.  Downloading instructions will then be emailed to the address given on the form, so please write the email address CLEARLY.  The programs are free to academics but a small license fee is required for 'for-profit' use. <br>
Beta-test versions are also available from time to time. They are announced by George Sheldrick and are available from the beta-test directory. The username and password for accessing these may be obtained from GS.


[[hkl2map]] can be downloaded from a website at EMBL Hamburg.


== References ==
== References ==
Line 190: Line 397:
If these programs prove useful, you may wish to cite (and read!):<br>
If these programs prove useful, you may wish to cite (and read!):<br>


Sheldrick, G.M. (2008). "A short history of SHELX", ''Acta Crystallogr''. '''D64''', 112-122 [''Standard reference for all SHELX... programs''].<br>
[http://scripts.iucr.org/cgi-bin/paper?sc5010 Sheldrick, G.M. (2008). "A short history of SHELX", ''Acta Crystallogr''. '''D64''', 112-122] [''Standard reference for all SHELX* programs''].<br>


Sheldrick, G.M., Hauptman, H.A., Weeks, C.M., Miller, R. & Usón, I. (2001). "Ab initio phasing". In ''International Tables for Crystallography'', Vol. F, Eds. Rossmann, M.G. & Arnold, E., IUCr and Kluwer Academic Publishers, Dordrecht pp. 333-351 [''Full background to the dual-space recycling used in SHELXD''].<br>
Sheldrick, G.M., Hauptman, H.A., Weeks, C.M., Miller, R. & Usón, I. (2001). "Ab initio phasing". In ''International Tables for Crystallography'', Vol. F, Eds. Rossmann, M.G. & Arnold, E., IUCr and Kluwer Academic Publishers, Dordrecht pp. 333-351 [''Full background to the dual-space recycling used in SHELXD''].<br>
Line 196: Line 403:
Schneider, T.R. & Sheldrick, G.M. (2002). "Substructure Solution with SHELXD", ''Acta Crystallogr''. '''D58''', 1772-1779 [''Heavy atom location with SHELXD''].<br>
Schneider, T.R. & Sheldrick, G.M. (2002). "Substructure Solution with SHELXD", ''Acta Crystallogr''. '''D58''', 1772-1779 [''Heavy atom location with SHELXD''].<br>


Sheldrick, G.M. (2002), "Macromolecular phasing with SHELXE", ''Z. Kristallogr''. '''217''', 644-650 [''The definitive reference for SHELXE, usually cited wrongly''].<br>
Sheldrick, G.M. (2002), "Macromolecular phasing with SHELXE", ''Z. Kristallogr''. '''217''', 644-650  


Nanao, M.H., Sheldrick, G.M. & Ravelli, R.B.G. (2005). "Improving radiation-damage substructures for RIP", ''Acta Crystallogr''. '''D61''', 1227-1237 [''Practical details of RIP phasing with SHELXC/D/E''].<br>
Nanao, M.H., Sheldrick, G.M. & Ravelli, R.B.G. (2005). "Improving radiation-damage substructures for RIP", ''Acta Crystallogr''. '''D61''', 1227-1237 [''Practical details of RIP phasing with SHELXC/D/E''].<br>


Uson, I., Stevenson, C.E.M., Lawson, D.M. & Sheldrick, G.M. (2007). "Structure determination of the O-methyltransferase NovP using the `free lunch algorithm' as implemented in SHELXE", ''Acta Crystallogr''. '''D63''', 1069-1074 [''Implementation of the FLA in SHELXE''].<br>
Uson, I., Stevenson, C.E.M., Lawson, D.M. & Sheldrick, G.M. (2007). "Structure determination of the O-methyltransferase NovP using the `free lunch algorithm' as implemented in SHELXE", ''Acta Crystallogr''. '''D63''', 1069-1074 [''Implementation of the FLA in SHELXE''].<br>
[http://dx.doi.org/10.1107/S0907444909038360 Sheldrick, G.M. (2010). "Experimental phasing with SHELXC/D/E: combining chain tracing with density modification", ''Acta Cryst'' '''D66''', 479-485.]
[https://doi.org/10.1107/S0907444913027534 A. Thorn and Sheldrick, G.M. (2013) Extending molecular-replacement solutions with SHELXE. ''Acta Cryst'' '''D69''', 2251-2256.]
<br>
<br>
See also the SHELX homepage at: http://shelx.uni-ac.gwdg.de/SHELX/
See also the [http://shelx.uni-goettingen.de/ SHELX homepage]
<br>
<br>

Revision as of 20:38, 30 January 2020

SHELXC, SHELXD and SHELXE are stand-alone executables that do not require environment variables or parameter files etc., so all that is needed to install them is to put them in a directory that is in the ‘path’ (e.g. /usr/local/bin or ~/bin under Linux). There is a detailed description of these programs in the paper: "Experimental phasing with SHELXC/D/E: combining chain tracing with density modification". Sheldrick, G.M. (2010). Acta Cryst. D66, 479-485. It is available as "Open Access" at http://dx.doi.org/10.1107/S0907444909038360 and should be cited whenever these programs are used.

hkl2map is a graphical user interface that makes it easy to use these programs.

XDSGUI is a graphical user interface for XDS that also makes it easy to use these programs.


SHELXC

SHELXC is designed to provide a simple and fast way of setting up the files for the programs SHELXD (heavy atom location) and SHELXE (phasing and density modification) for macromolecular phasing by the MAD, SAD, SIR and SIRAS methods. These three programs may be run in batch mode or called from a GUI such as CCP4i or (better) hkl2map. SHELXC is much less versatile than the Bruker AXS XPREP program for this purpose, but if you are sure of the space group and there are no problems with the indexing or twinning and the f’ and f” parts of the scattering factors do not need to be refined, SHELXC should be adequate.

In the simplified approach used by SHELXE, the starting phases for density modification are estimated as (heavy atom phase + α), where α is calculated by SHELXC from the anomalous and dispersive differences. For SAD α is 90º (I+ > I−) or 270º (I+ < I−), for SIR and RIP α is 0º or 180º, and for SIRAS or MAD α may be anywhere in the range 0º to 360º.
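As a minimal sketch of the SAD case described above (not the SHELXC source code), the phase shift and the resulting starting phase could be written as follows. The function names are hypothetical, and the handling of the case I+ = I− is an arbitrary choice that the text does not specify.

```python
def sad_alpha(i_plus: float, i_minus: float) -> float:
    """Return the phase shift alpha (degrees) for one SAD reflection."""
    # 90 degrees when I+ exceeds I-, otherwise 270 degrees.
    # Assumption: a tie (I+ == I-) is also mapped to 270 degrees here.
    return 90.0 if i_plus > i_minus else 270.0


def starting_phase(heavy_atom_phase: float, alpha: float) -> float:
    """Starting phase = heavy atom phase + alpha, wrapped into [0, 360)."""
    return (heavy_atom_phase + alpha) % 360.0


if __name__ == "__main__":
    alpha = sad_alpha(i_plus=10.0, i_minus=8.0)
    print(alpha, starting_phase(300.0, alpha))
```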

SHELXC reads a filename stem (denoted here by 'xx') on the command line plus some instructions from 'standard input'. It writes some statistics to 'standard output' and prepares the three files needed to run SHELXD and SHELXE. SHELXC can be called from a GUI by a command line such as:

shelxc xx <t

which would read the instructions from the file t, or (under most UNIX systems) by a simple shell script that includes the instructions, e.g.

shelxc xx <<EOF
CELL 49.70 57.90 74.17 90 90 90
SPAG P212121
SAD elastase.sca
FIND 12
EOF
shelxd xx_fa
shelxe xx xx_fa -s0.37 -m20 -h -b
shelxe xx xx_fa -s0.37 -m20 -h -b -i

which would also run shelxd to locate the sulfur atoms and shelxe (for both substructure enantiomers) to solve elastase by sulfur-SAD phasing. The reflection data may be in SHELX (.hkl), HKL2000 (.sca) or XDS XDS_ASCII.HKL format. Any names may be used for XDS reflection files; SHELXC recognises them by reading the first record.
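When SHELXC is driven from a script rather than a shell, the instruction block can be assembled as text first. The sketch below only builds the four instructions used in the elastase example; the function name and its arguments are hypothetical helpers, not part of SHELXC.

```python
def shelxc_instructions(cell, space_group, dataset_keyword, data_file, n_sites):
    """Build the text that SHELXC reads from standard input."""
    lines = [
        # Six cell parameters: a, b, c, alpha, beta, gamma.
        "CELL " + " ".join(f"{value:g}" for value in cell),
        f"SPAG {space_group}",
        # Dataset keyword (e.g. SAD, PEAK, NAT) followed by the reflection file.
        f"{dataset_keyword} {data_file}",
        f"FIND {n_sites}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = shelxc_instructions(
        (49.70, 57.90, 74.17, 90, 90, 90), "P212121", "SAD", "elastase.sca", 12
    )
    print(text, end="")
```

The returned text could then be passed to SHELXC on standard input, for example with subprocess.run(["shelxc", "xx"], input=text, text=True) from the Python standard library.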

This script would read data from the .sca file and write the files xx.hkl (h,k,l,I,sig(I) in SHELX HKLF4 format for density modification by SHELXE or refinement with SHELXL), xx_fa.ins (cell, symmetry etc. for heavy atom location by SHELXD) and xx_fa.hkl (h,k,l,FA,sig(FA),alpha for both SHELXD and SHELXE). The starting phases for density modification are estimated as given above.

For SIR or SIRAS, two input reflection files are specified by the keywords NAT and SIR or SIRA; for MAD at least two of the reflection files HREM, LREM, PEAK and INFL are required and NAT may also be given if higher resolution native data are available (e.g. SMet for SeMet MAD). Reflection data should be in SHELX .hkl or SCALEPACK .sca format; many other programs, including SCALA and XPREP, can output .sca format too. The keywords CELL, SPAG (space group) and FIND (number of heavy atoms) are always required; SFAC, MIND, NTRY, SHEL, ESEL and DSUL may be given and are written to the file xx_fa.ins for SHELXD. MAXM can be used to reserve memory in units of 1M reflections. For RIP phasing, NAT (or BEFORE) denotes the file before radiation damage and RIP (or AFTER) after radiation damage. For RIPAS the 'after' file must be called 'RIPA' and a keyword RIPW (default 0.6) gives the weight w to be assigned to the 'NAT' data in the estimation of the anomalous signal (a weight of 1-w is applied to the 'RIPA' data). DSCA (default 0.98) gives the factor to multiply the native data for SIR and SIRAS or the 'after' data for RIP after the data have been put on the same scale (this allows for the extra scattering power of the heavy atoms etc.); this can be critical for RIP phasing.

ASCA (default 1.0) is a scale factor applied to the anomalous signal in a MAD experiment; to apply MAD to a small molecule, ASCA and DSCA should both be between 0 and 1, and the best values have to be found by trial and error. Finally SMAD (without a number) sets the dispersive term to zero in a MAD experiment, equivalent to SAD using weighted mean anomalous differences from all the MAD datasets. This should always be tried whenever radiation damage is suspected.

SHELXC also tests for and if necessary corrects the more common cases of inconsistent indexing when more than one dataset is involved. In addition, the mean value of |E^2-1| is calculated for each dataset to detect twinning.

SHELXD

In general the critical parameters for locating heavy atoms with SHELXD are:

Resolution cutoff (SHEL)

In the MAD case this is best determined by finding where the correlation coefficient between the signed anomalous differences for wavelengths with the highest anomalous signal (PEAK and HREM or PEAK and INFL) falls below about 30%. For SAD a less reliable guide is where the mean value of |ΔF|/σ(ΔF) falls below about 1.2 (a value of 0.8 would indicate pure noise), and for S-SAD with CuKα the data can be truncated where I/σ for the native data falls below 30. If unmerged data are used, SHELXC calculates a correlation coefficient between two randomly selected subsets of the signed anomalous differences; this is a better indicator because it does not require that the intensity esds are on an absolute scale, but it does require a reasonable redundancy and again the data can be truncated where it drops to below 30% (XDS and the CCP4 programs aimless/SCALA print a similar statistic).
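The 30% rule of thumb can be applied to a table of correlation coefficients per resolution shell. This is a sketch only: the function name is hypothetical, the shell values in the example are invented, and the table would have to be taken by hand or by a separate parser from the SHELXC, XDS or aimless output.

```python
def anomalous_cutoff(shells, threshold=30.0):
    """Suggest a resolution cutoff from per-shell anomalous CC values.

    shells: list of (d_min_in_angstrom, cc_in_percent), ordered from low
    to high resolution. Returns the d_min of the last shell before the CC
    first falls below the threshold, or None if the first shell fails.
    """
    cutoff = None
    for d_min, cc in shells:
        if cc < threshold:
            # Stop at the first weak shell, even if a later one recovers.
            break
        cutoff = d_min
    return cutoff


if __name__ == "__main__":
    # Invented example values, for illustration only.
    table = [(8.0, 85.0), (4.0, 70.0), (3.0, 45.0), (2.5, 28.0), (2.2, 31.0)]
    print(anomalous_cutoff(table))
```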

Number of sites (FIND)

The estimated number of sites (FIND) should be within about 20% of the true number. For SeMet or S-SAD phasing there should be a sharp drop in the occupancy after the last true site. For iodide soaks, a good rule of thumb is to start with a number of iodide sites equal to the number of amino-acids in the asymmetric unit divided by 15. If after SHELXD occupancy refinement the occupancy of the last site is more than 0.2 it might be worth increasing this number, and vice versa.

Note that SHELXD searches for 40% more sites than the number requested with FIND, because there are often additional minor sites arising from heavy atoms such as Cl or Ca. If you do not adjust FIND downwards after an initial SHELXD run so that the Nth (last requested) site in the .res file has occupancy > 0.2, you can either edit the .res file to remove the sites with occupancy < 0.2, or run SHELXE with -hN, where N is the number of the last site with occupancy > 0.2.
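The two rules of thumb for the number of sites can be sketched as below. Both function names are hypothetical, and the occupancies are assumed to have been read from the .res file in the order in which SHELXD wrote them (strongest site first).

```python
def sites_to_keep(occupancies, min_occupancy=0.2):
    """Count the leading sites whose occupancy exceeds min_occupancy.

    The result is a candidate value of N for the SHELXE switch -hN.
    """
    n = 0
    for occupancy in occupancies:
        if occupancy <= min_occupancy:
            break
        n += 1
    return n


def iodide_find_estimate(n_residues):
    """Starting value of FIND for an iodide soak: residues in the ASU / 15."""
    return max(1, round(n_residues / 15))


if __name__ == "__main__":
    # Invented occupancies, for illustration only.
    print(sites_to_keep([1.0, 0.9, 0.55, 0.31, 0.18, 0.10]))
    print(iodide_find_estimate(207))
```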

Disulfides (DSUL)

If the resolution d (second parameter on SHEL card) is > 2.0Å the disulfide bonds may not be fully resolved, but in the range 2.8>d>2.0 the DSUL instruction may be used to fit S−S units to the density. This can dramatically improve the final phase quality. If DSUL is used, the first MIND parameter should be set to -3.5 (so that each disulfide is found once only) and disulfides should be counted as single (super-sulfur) atoms for FIND (i.e. each disulfide given in DSUL counts as two atoms for FIND).

Minimum distance between atoms (MIND)

A common 'user error' is to set MIND -3.5 even though the distances between heavy atoms are less than 3.5 Å. For example, in a Fe4S4 cluster the Fe...Fe distance is about 2.7 Å, so MIND -2 would be appropriate. A disulfide bond has a length of 2.03 Å, so MIND -1.5 could be used to resolve the sulfur atoms; however, if DSUL is used for this purpose MIND -3.5 is required.

If heavy atoms can lie on special positions (as is the case with an iodide soak in a space group with twofold axes) the rejection of atoms on special positions should be switched off by giving the second MIND parameter as -0.1 (as in the above thaumatin example).

Interpretation of results

For MAD, a CC of 40 to 50% indicates a good solution, for SAD etc. values around 30% may well be correct, especially if the same solution or group of solutions has the highest values of CC, CC(Weak) and PATFOM, and they are well separated from the values for the non-solutions. The CC values tend to increase as the resolution is lowered. Heavy atom soaks truncated to low resolution often give spuriously high CC values, but these 'solutions' can be recognized as false by their low CC(weak) values.

In difficult cases SHELXD can be run with different SHEL instructions, e.g. truncating the data in steps of 0.1 Å, and the CC values compared. This is especially convenient if a computer farm can be used to run the jobs in parallel. If the best CC is plotted against the resolution, a local maximum (when also observed for the CC(weak) values) may indicate a correct solution.
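A resolution scan of this kind can be prepared and evaluated with two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, the CC values in the example are invented, and each cutoff would still have to be written into its own .ins file as the second SHEL parameter before running SHELXD.

```python
def shel_cutoffs(d_high, d_low, step=0.1):
    """Return resolution cutoffs from d_high to d_low (inclusive) in Angstrom."""
    n_steps = int(round((d_low - d_high) / step))
    return [round(d_high + i * step, 2) for i in range(n_steps + 1)]


def local_maxima(cutoffs, cc_values):
    """Return the cutoffs whose CC is higher than that of both neighbours."""
    return [
        cutoffs[i]
        for i in range(1, len(cutoffs) - 1)
        if cc_values[i] > cc_values[i - 1] and cc_values[i] > cc_values[i + 1]
    ]


if __name__ == "__main__":
    cutoffs = shel_cutoffs(2.0, 2.5)
    best_cc = [22, 25, 34, 29, 27, 26]  # invented best CC per cutoff
    print(cutoffs)
    print(local_maxima(cutoffs, best_cc))
```

The same check should be repeated for the CC(weak) values, since the text treats a local maximum as meaningful only when both indicators show it.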

The default weights for the CC are 1/σ(E)². The presence of one or two reflections with very low esds can lead to unreasonably high values of the CC for wrong solutions. If the esds are unreliable it is advisable to use 'CCWT 0.1' in the .ins file for SHELXD. The precision of the heavy atom coordinates can be improved, at the cost of CPU time, by making the Fourier grid finer (e.g. FRES 4 instead of the default 2.5).

SHELXE

Usage

A typical SHELXE job for SAD, MAD, SIR or SIRAS phasing could be:

shelxe xx xx_fa -s0.5 -z -a3

where xx.hkl contains native data and xx_fa.hkl, which should have been created by SHELXC or XPREP, contains FA and alpha. The heavy atoms are read from xx_fa.res, which can be generated by SHELXD or ANODE. 'xx' and 'xx_fa' may be replaced by any strings that make legal file names. If these heavy atoms are present in the native structure (e.g. for sulfur-SAD but not SIRAS for an iodide soak) -h is required (or e.g. -h8 to use only the first 8). -z optimizes the substructure at the start of the phasing. -z9 limits the number of heavy atoms to 9. If -z is specified without a number, no limit is imposed. Normally the heavy atom enantiomorph is not known, so SHELXE should also be run with the -i switch to invert the heavy atoms and if necessary the space group; this writes files xx_i.phs instead of xx.phs etc., so the two jobs may be run in parallel.
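Because both enantiomorphs are normally tried, it can be convenient to generate the pair of command lines together. The helper below is a hypothetical sketch that only assembles argument lists from switches named in the text; it does not run SHELXE.

```python
def shelxe_commands(native="xx", fa="xx_fa", solvent=0.5, trace_cycles=3,
                    native_has_heavy_atoms=False):
    """Return SHELXE argument lists for the original and the inverted hand."""
    base = ["shelxe", native, fa, f"-s{solvent:g}", "-z", f"-a{trace_cycles}"]
    if native_has_heavy_atoms:
        # -h: the heavy atoms are also present in the native structure.
        base.append("-h")
    # The second job adds -i and therefore writes xx_i.* instead of xx.*.
    return [base, base + ["-i"]]


if __name__ == "__main__":
    for command in shelxe_commands(native_has_heavy_atoms=True):
        print(" ".join(command))
```

Since the two jobs write to different output files, each list could be handed to subprocess.Popen so that both run at the same time.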

-a sets the number of global autotracing cycles. -n imposes NCS during tracing, e.g. -n6 for six-fold NCS or -n if the number of copies is not known.

To start from a MR model without other phase information, the PDB file from MR should be renamed xx.pda and input to SHELXE, e.g.

shelxe xx.pda -s0.5 -a20

The number of tracing cycles is usually larger here to reduce model bias. -O enables local rigid group optimization of the domains defined in the .pda file. If -O and/or -o (-O acts before -o) are used to improve a model in xx.pda, the revised model is output to xx.pdo. To refine rigid group domains separately with -O, insert 'REMARK DOMAIN N' records into the .pda file to split the model into domains, where N (default 1) is the rigid group number of the following atoms (until the next 'REMARK DOMAIN N'). -ON makes N simplex trials with starting positions within a cube (edge set by -Z) around the positions in xx.pda. The first search (the only one for -O or -O1) starts from the initial position. If the MR model is large but does not fit well, -o should be included to prune it before density modification.

Tracing from an MR model requires a favorable combination of model quality, solvent content and data resolution. If e.g. SAD phase information is available, even if it is too weak for phasing on its own, the two approaches may be combined:

shelxe xx.pda xx_fa -s0.5 -a10 -h -z

The phases from the MR model are used to generate the heavy atom substructure. This is used to derive experimental phases that are then combined with the phases from the MR model (MRSAD). The -h, -O, -o and -z flags are often needed for this mode.

If approximate phases are available, SHELXE may be used to refine them and make a poly-Ala trace:

shelxe xx.zzz -s0.5 -a3

where zzz is phi (phs file format), fcf (from SHELXL) or hlc (Hendrickson-Lattman coefficients, e.g. from SHARP or BP3).

In all cases, native data are read from xx.hkl in SHELX format, and the density modified phases are output to xx.phs (or xx_i.phs if -i was set). The listing file is xx.lst (or xx_i.lst). If xx_fa.hkl is read, substructure phases are output to xx.pha (or xx_i.pha) and the revised substructure is written to xx.hat (or xx_i.hat).
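The naming scheme for the output files can be summarised in a few lines. The function name and the dictionary keys are hypothetical; the file extensions and the _i suffix are those given above.

```python
def shelxe_outputs(stem, inverted=False, reads_fa=False):
    """Return the names of the files SHELXE writes for a given job."""
    # With -i every output name carries the suffix _i.
    prefix = stem + ("_i" if inverted else "")
    files = {"phases": prefix + ".phs", "listing": prefix + ".lst"}
    if reads_fa:
        # These two are written only when xx_fa.hkl has been read.
        files["substructure_phases"] = prefix + ".pha"
        files["revised_substructure"] = prefix + ".hat"
    return files


if __name__ == "__main__":
    print(shelxe_outputs("xx", inverted=True, reads_fa=True))
```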

Full list of SHELXE options (defaults in brackets)

-aN - N cycles autotracing [off]
-AX - maximum random initial rotation in deg. for -O [-A3.0]
-bX - B-value to weight anomalous map (xx.pha and xx.hat) [-b5.0]
-cX - fraction of pixels in crossover region [-c0.4]
-dX - truncate reflection data to X Angstroms [off]
-eX - add missing 'free lunch' data up to X Angstroms [dmin+0.2]
-f  - read F rather than intensity from native .hkl file [off]
-FX - fract. weight for phases from previous global cycle [-F0.8]
-gX - solvent gamma flipping factor [-g1.1]
-GX - threshold for accepting new peptide when tracing [-G0.7]
-h or -hN - (N) heavy atoms also present in native structure [-h0]
-i  - invert space group and input (sub)structure or phases [off]
-IN - in cycle 1 only, do N cycles DM (free lunch if -e) [off]
-kX - minimum height/sigma for heavy atom sites in xx.hat [-k4.5]
-KN - keep starting fragment unchanged for N global cycles [off]
-K  - keep fragment unchanged throughout
-lN - reserve space for 1000000N reflections [-l2]
-LN - minimum chain length (at least 3 chains are retained) [-L6]
-mN - N iterations of density modification per global cycle [-m20]
-n or -nN - apply N-fold NCS to traces [off]
-O or -ON - N random-start rigid-group domain searches [off]
-o or -oN - prune up to N residues to optimize CC for xx.pda [off]
-q  - search for alpha-helices [off]
-rX - FFT grid set to X times maximum indices [-r3.0]
-sX - solvent fraction [-s0.45]
-tX - time factor for helix and peptide search [-t1.0]
-uX - allocable memory in MB for fragment optimization [-u500]
-UX - abort if less than X% of initial CA stay within 0.7A [-U0]
-vX - density sharpening [default set by resol., 0 if .pda read]
-wX - add experimental phases with weight X each iteration [-w0.2]
-x  - diagnostics, requires PDB reference file xx.ent [off]
-yX - highest resol. in Ang. for calc. phases from xx.pda [-y1.8]
-YX - SAD phase shift factor [-Y0.5]
-zN - substructure optimization for a maximum of N atoms [off]
-z - substructure optimization, number of atoms not limited [off]
-ZX - maximum shift in Ang. from initial position for -O [-Z1.0]

Meaning of additional output when using the -x option:

MPE and wMPE are given as two numbers, the one after the '/' is for centric reflections only.

The first nine numbers in the row after locating a strand or in the 'Global chain diagnostics' are the percentages of CA within 0-0.1, 0.1-0.2, 0.2-0.3Å etc from the nearest CA in the reference structure. The tenth number is the percentage further than 0.9Å from the nearest CA.

The next number is 100 times the number of CA found divided by the number expected for the whole structure. The last number is the mean distance of a CA atom from the nearest CA in the reference structure, whereby distances greater than 2.5Å are replaced by 2.5. One should always look at the second number from the right; for a good trace it should be as low as possible. If you are expanding from a MR solution the program also tells you the percentages of starting atoms retained.

Phasing and density modification

SHELXE normally requires a few command line switches, e.g.

shelxe xx yy -m20 -s0.45 -h8 -b

would do 20 cycles density modification with a solvent content of 0.45, phasing from the first 8 heavy atoms in the yy.res file from SHELXD assuming that they are also present in the native structure (-h8), and then use the modified density to generate improved heavy atoms (-b). The switch -i may be added to invert the substructure (and if necessary the space group), this writes xx_i.phs instead of xx.phs etc., and so may be run in parallel.

A big difference in the contrast between the two heavy-atom enantiomorphs usually indicates a good SHELXE solution. However in the case of SIR, both have the same contrast but one gives the inverted protein structure. The contrast is also the same for both if the heavy-atom substructure is centrosymmetric (there is a server to find out). In the case of SAD both heavy atom enantiomers then give the correct structure, for SIR the result is an uninterpretable double image.

The pseudo-free correlation coefficient (based on the comparison of Eo and Ec for 10% of the data left out at random in the calculation of a map that is then density modified and Fourier back-transformed in the usual way) is now printed out before every Nth cycle (set by -j, the default is -j5); a value above 70% usually indicates an interpretable map. The pseudo-free CC (which is also reported in the hkl2map plot of contrast against cycle number) is also a good indication as to when the phase refinement has converged.

The solvent content (-s) is by far the most critical parameter for SHELXE, and it is often worth varying it in steps of about 0.05 to maximize the difference in contrast between the two enantiomorphs and the 'pseudo-free CC' (another application for a computer farm!). Usually the optimal solvent content is higher than the calculated value at low resolution (disordered side-chains?) and lower at high resolution (ordered solvent?). Sometimes it is necessary to use many (several hundred) cycles (-m) if the starting phase information is weak but the resolution is very high. For low resolution data, the use of more than 20 refinement cycles is normally counter-productive. The current values of all parameters are output at the start of the SHELXE output; the default values of other parameters will rarely need changing.

The -b switch in SHELXE causes updated heavy atom positions to be written to the file name.hat (or name_i.hat). This file can be copied or renamed to the .res file (which should be saved first!) and used to recycle the heavy atoms. The graphics program Coot should be able to deduce the space group name from the symmetry operators in this file, and so a very convenient way to obtain a map after running SHELXE is to start Coot, read in 'coordinates' from the .hat or _i.hat file, and then input the phases from the .phs or _i.phs files and the phases of the heavy atom substructure from the .pha or _i.pha files. It is normally necessary to increase the σ level of the latter map (by hitting '+' several times). This procedure even works correctly when the space group has been inverted by SHELXE!
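The scan over solvent content in steps of 0.05 can be set up as a list of jobs, one per solvent value and enantiomorph. This is a sketch only: the function names and the scan half-width of 0.1 are assumptions, and the commands are assembled but not executed.

```python
def solvent_scan(center, half_width=0.1, step=0.05):
    """Return solvent fractions from center - half_width to center + half_width."""
    n = int(round(half_width / step))
    return [round(center + i * step, 2) for i in range(-n, n + 1)]


def scan_commands(native, fa, center, cycles=20):
    """Return one SHELXE argument list per solvent value and hand."""
    commands = []
    for solvent in solvent_scan(center):
        for hand in ([], ["-i"]):
            commands.append(
                ["shelxe", native, fa, f"-s{solvent:g}", f"-m{cycles}"] + hand
            )
    return commands


if __name__ == "__main__":
    for command in scan_commands("xx", "xx_fa", 0.45):
        print(" ".join(command))
```

Jobs that share the same stem and hand write to the same output files, so each solvent value would need its own working directory (or copies of the input files under different names) before the jobs are run in parallel.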

Good quality MAD data, a high solvent content and/or high resolution for the native data can lead to maps of high quality that can be autotraced (e.g. with wARP) immediately. The .phs files contain h, k, l, F, fom, φ and σ(F) in free format and can be read directly into Coot or converted to CCP4 .mtz format using f2mtz, e.g. for further density modification exploiting NCS using the CCP4 program Pirate. Note that if the inverted heavy atom enantiomorph is the correct one, the corresponding phases are in the *_i.phs file and SHELXE may have inverted the space group (e.g. P41 to P43), which should be taken into account when moving to other programs!
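Since the .phs file is free format, a single line can be split on whitespace. The parser below is a minimal sketch that assumes exactly the seven columns listed above (h, k, l, F, fom, φ, σ(F)); the function name, the dictionary keys and the example values are hypothetical.

```python
def parse_phs_line(line):
    """Parse one line of a SHELXE .phs file into a dictionary."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, found {len(fields)}")
    h, k, l = (int(value) for value in fields[:3])
    f_obs, fom, phi, sigma_f = (float(value) for value in fields[3:])
    return {"hkl": (h, k, l), "F": f_obs, "fom": fom, "phi": phi, "sigF": sigma_f}


if __name__ == "__main__":
    # Invented reflection, for illustration only.
    print(parse_phs_line("   1   0   2   123.40  0.85  271.5   2.10"))
```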

A writeup for a case study, by GMS, as of Jan 13, 2013, is at [1].

The free lunch algorithm (FLA)

The switch -e may be used to extrapolate the data to the specified resolution (the free lunch algorithm), based closely on work by the Bari group (Caliandro et al., Acta Crystallogr. (2005) D61, 556-565) and independently implemented in the program Acorn (Yao et al., (2005) Acta Crystallogr. D61, 1465-1475): -e1.0 can produce spectacular results when applied to data collected to 1.6 to 2.0 Å, but since a large number of cycles is required (-m400) and the 'contrast' and 'connectivity' become unreliable (the pseudo-free CC is the only reliable map quality indicator when the FLA is used), it may be best to establish the substructure enantiomorph and solvent content without -e first. The default setting when -e is not specified is to fill in missing low and medium resolution data but not to extrapolate to higher resolution than actually measured (to switch off this filling in, use -e999). The resolution requirements for the FLA still need to be explored, but so far there have been no reports of it causing a deterioration in map quality, and in a few cases the mean phase error was reduced by as much as 30º relative to density modification without it.

How to find out if a molecular replacement solution is correct or wrong

From a November 2011 posting of George Sheldrick on CCP4BB: We have unintentionally discovered a very simple way of telling whether an MR solution is correct or not, provided that (as in this case) native data have been measured to about 2.1 Å or better. This uses the current beta-test of SHELXE that does autotracing (available on email request).

First rename the PDB file from MR to name.pda and generate a SHELX format file name.hkl, e.g. using Tim Gruene's mtz2hkl, where 'name' may be chosen freely but should be the same for both input files. Then run SHELXE with a large number of autotracing cycles (here 50), e.g.

shelxe name.pda -a50 -s0.5 -y2

-s sets the solvent content and -y a resolution limit for generating starting phases. If the .hkl file contains F rather than intensity the -f switch is also required.

If the model is wrong the CC value for the trace will gradually decrease as the model disintegrates. If the model is good the CC will increase, and if it reaches 30% or better the structure is solved. In cases with a poor but not entirely wrong starting fragment, the CC may vary erratically for 10-30 cycles before it locks in to the correct solution and the CC increases over three or four cycles to the value for a solved structure (25 to 50%). The solution with the best CC is written to name.pdb and its phases to name.phs for input to e.g. Coot.

How to tell SHELXE about NCS in a molecular replacement solution PDB file

(communicated by Isabel Uson) Insert a line

REMARK 299 NCS GROUP BEGIN

before the ATOM (or HETATM) lines of each NCS group (e.g. chain), and insert the line

REMARK 299 NCS GROUP END

after the last of these. The -n option is then not needed. The SHELXE output should confirm that the NCS specification has been understood.
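For a molecular replacement model with many chains, the two remark lines can be inserted automatically. The following sketch makes a simplifying assumption that has to be checked for each case: it treats every run of consecutive ATOM/HETATM records with the same chain identifier (column 22 of the PDB format) as one NCS group. The helper name add_ncs_remarks and the file name mr_solution.pdb are hypothetical.

```shell
# Wrap each run of consecutive ATOM/HETATM records that share a chain ID
# (column 22) in the NCS group remarks understood by SHELXE.
add_ncs_remarks() {
    awk '
        function close_group() {
            if (ingrp) { print "REMARK 299 NCS GROUP END"; ingrp = 0 }
        }
        /^(ATOM  |HETATM)/ {
            id = substr($0, 22, 1)
            if (!ingrp || id != chain) {
                close_group()
                print "REMARK 299 NCS GROUP BEGIN"
                ingrp = 1; chain = id
            }
            print; next
        }
        { close_group(); print }     # any other record ends the current group
        END { close_group() }
    ' "$1"
}

# Example (file names are placeholders):
#   add_ncs_remarks mr_solution.pdb > name.pda
```

Chains that are not related by NCS, or records such as ANISOU that interrupt a chain, would need the remarks to be edited by hand afterwards.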

RIP with SHELXC/D/E

RIP (radiation damage induced phasing) can be regarded as a sort of isomorphous replacement where the 'after' dataset has lost a few atoms that are particularly susceptible to radiation damage. In fact, many structures have been solved unintentionally with a helping hand from RIP! In a MAD experiment, provided that the 'inflection point' dataset is collected last from the same crystal, the radiation damage has the effect of making f' for the MAD element at this wavelength even more negative than usual, enhancing the dispersive part of the MAD signal. This is especially true of bromine MAD on bromouracil derivatives, because the radiation near the bromine absorption edge appears to be particularly effective at breaking the bromine-carbon bonds irreversibly. Of course if the inflection data are collected first the RIP and dispersive component of the MAD signal will tend to cancel one another, causing the MAD analysis to fail, although SAD may still be able to solve the structure (also a common scenario).

RIP (without using anomalous scattering) or RIPAS (like SIRAS, assuming that the anomalous atoms are also those most sensitive to radiation damage) can be capable of solving difficult structures. A typical procedure on a third generation synchrotron beamline is to collect the 'before' dataset with an attenuator in the beam, then to fry the crystal for a couple of minutes with the unattenuated beam, and finally to collect an 'after' dataset with the attenuator in. In the SHELXC instructions, the 'before' data are called 'NAT' or 'BEFORE' and the 'after' data are called 'RIP' or 'AFTER'. The critical parameter is the scale factor applied to the 'after' data after both datasets have been brought onto a common scale. This is set by the SHELXC instruction 'DSCA' and should usually be in the range 0.9 to 1.05. This scale factor may also be used for SIR and SIRAS, where it is applied to the native data, but it appears to be less critical than for RIP. For RIPAS, the 'after' data should be called 'RIPA' and the 'RIPW' instruction specifies the weight w (default 0.6) for the anomalous contribution from the 'before' dataset (a weight 1–w is applied to the 'after' data).

In RIP or RIPAS phase determination it is usually necessary to recycle the 'heavy atom' sites by renaming the output .hat (or _i.hat) file as .res and rerunning SHELXE. It is advisable to edit this file so as to retain the stronger negative sites, because these may well correspond to the new positions of displaced atoms. SHELXE can read negative occupancies but SHELXD can only search for positive atoms. SHELXE inserts HKLF 4 and END before the first negative peak when writing the revised substructure to the .hat file. Normally this is a good way of finding where the noise begins, but for RIP these lines should be removed if the negative peaks are to be recycled.
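The recycling step can be sketched as a small shell function. This is an illustration under stated assumptions rather than a documented procedure: the helper name recycle_hat is hypothetical, and instead of simply deleting the HKLF 4 and END lines it moves them to the end of the file, on the assumption that the .res file should still be terminated by them. The weaker negative peaks still have to be pruned by hand.

```shell
# Rebuild NAME_fa.res from a SHELXE .hat file so that the negative peaks
# listed after the HKLF 4 / END pair are read as well.
# Usage: recycle_hat jia.hat jia_fa.res   (hypothetical helper)
recycle_hat() {
    hat=$1 res=$2
    [ -f "$hat" ] || { echo "missing $hat" >&2; return 1; }
    # Save the original SHELXD solution once; later recycles leave it alone.
    [ -f "$res" ] && [ ! -f "$res.orig" ] && cp "$res" "$res.orig"
    {
        grep -v -E '^(HKLF|END)' "$hat"   # everything except HKLF/END lines
        printf 'HKLF 4\nEND\n'            # terminate the file once, at the end
    } > "$res"
}
```

After this, SHELXE is rerun with the same command line as before.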

It should be noted that in a pure RIP experiment, both hands of the radiation damage substructure will give the same figures of merit, but one will lead to an electron density map that is a mirror image of the true map (the helices will go the wrong way round).

Examples

RIP

shelxc jia <<EOF
BEFORE jia_nat.hkl
AFTER jia_burnt.sca
CELL 96.00 120.00 166.13 90 90 90
SPAG C2221
FIND 8
DSCA 0.97
NTRY 1000 
EOF
shelxd jia_fa
shelxe jia jia_fa -h -s0.6 -m20 -b
shelxe jia jia_fa -h -s0.6 -m20 -b -i

The critical point for RIP is that you have to try many (about 100) different DSCA values in the range 0.9 to 1.05. The DSCA value that results in the highest CCweak should be chosen.
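Setting up about 100 runs by hand is tedious, so the DSCA scan is a natural candidate for a script. The sketch below is an assumption-laden illustration: the helper names dsca_values and write_dsca_jobs, the directory layout and the input file name shelxc.inp are all hypothetical, and it assumes that SHELXC accepts relative paths for the data files. The data file names, cell and space group are those of the RIP example above.

```shell
# Print N evenly spaced DSCA values between LO and HI (inclusive).
# Usage: dsca_values [N [LO [HI]]]
dsca_values() {
    awk -v n="${1:-100}" -v lo="${2:-0.90}" -v hi="${3:-1.05}" 'BEGIN {
        for (i = 0; i < n; i++)
            printf "%.4f\n", (n > 1) ? lo + i * (hi - lo) / (n - 1) : lo
    }'
}

# Write one SHELXC input file per DSCA value, each in its own directory so
# that the runs cannot overwrite each other's output files.
write_dsca_jobs() {
    for d in $(dsca_values "$@"); do
        mkdir -p "dsca_$d"
        cat > "dsca_$d/shelxc.inp" <<EOF
BEFORE ../jia_nat.hkl
AFTER ../jia_burnt.sca
CELL 96.00 120.00 166.13 90 90 90
SPAG C2221
FIND 8
DSCA $d
NTRY 1000
EOF
    done
}

# Dry run: show four evenly spaced values.
dsca_values 4

# To set up and run the full scan (placeholders, not executed here):
#   write_dsca_jobs 100
#   for dir in dsca_*; do (cd "$dir" && shelxc jia < shelxc.inp && shelxd jia_fa); done
```

The CCweak values reported by SHELXD for the individual runs can then be compared to pick the best DSCA value.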

The -h option is included for SHELXE because the native has heavy atoms. Recycling of the positive and difference peaks produced by -b is normally necessary (rename jia.hat or jia_i.hat to jia_fa.res).

MAD

shelxc jia <<EOF
NAT jia_nat.hkl
HREM jia_hrem.sca
PEAK jia_peak.sca
INFL jia_infl.sca
LREM jia_lrem.sca
CELL 96.00 120.00 166.13 90 90 90
SPAG C2221
FIND 8
NTRY 10 
EOF
shelxd jia_fa
shelxe jia jia_fa -s0.6 -m20
shelxe jia jia_fa -s0.6 -m20 -i

In this example (kindly donated by Zbigniew Dauter; Li et al., Nature Struct. Biol. 7 (2000) 555-559), Se-Met MAD data at four wavelengths are used to calculate the FA-values and phase shifts that are written to the file jia_fa.hkl. The native (S-Met) data are read from jia_nat.hkl and written to jia.hkl. The file jia_fa.ins is prepared using the given cell, space group, FIND and NTRY instructions as well as a suitable SHEL command to truncate the resolution. SHELXD then searches for 8 (FIND) selenium atoms using 10 attempts (NTRY), and SHELXE is run for 20 cycles (-m) of density modification for both heavy atom enantiomorphs (-i inverts) with a solvent content (-s) of 0.6. The protein phases are written to jia.phs and jia_i.phs respectively. If NAT is not specified, SHELXC would analyze the four MAD datasets to generate the (SeMet) native data jia.hkl, in which case -h should be specified for SHELXE since the selenium atoms are present in the ‘native’ structure. For MAD at least two wavelengths are required, at least one of which should be PEAK or INFL.

If the MAD experiment fails, one should insert the line 'SMAD' somewhere in the SHELXC input instructions and run the job again. This turns the MAD experiment into a SAD experiment in which a suitably weighted mean of the anomalous differences is employed and the dispersive differences are ignored. If the CC values in SHELXD come out better, this SAD approach is likely to give a better solution, but it may then be worth commenting out one or more of the PEAK, INFL, HREM and LREM commands to see if there is a further improvement (if just one remains, it should be renamed SAD).

SAD

This example of thaumatin phasing by means of the native sulfur anomalous signal (Debreczeni et al., Acta Crystallogr. D59 (2003) 688-696) uses 1.55 Å in-house CuKα data:

shelxc thau <<EOF
SAD thau-nat.hkl
CELL 58.036 58.036 151.29 90 90 90
SPAG P41212
FIND 9
DSUL 8
MIND -3.5
NTRY 100
EOF
shelxd thau_fa
shelxe thau thau_fa -h -s0.5 -m20
shelxe thau thau_fa -h -s0.5 -m20 -i

The anomalous differences are extracted from the native data so only one data file is required. The sites specified by FIND consist of one methionine and 8 super-sulfurs, which are then resolved into disulfides using the DSUL instruction that is passed on to SHELXD (Debreczeni et al., Acta Crystallogr. D59 (2003) 2125-2132). Alternatively one could try to find the individual sulfurs with:

SHEL 999 2.0
FIND 17
MIND -1.7

Here the resolution cutoff has been reduced from 2.1 Å (which SHELXC would have suggested) to 2.0 Å to improve the chances of resolving the sulfurs. The SHEL, FIND, MIND and NTRY instructions are transferred to the file thau_fa.ins for the sulfur atom location with SHELXD. Note that the phases can be improved further in this case by using more SHELXE cycles than the usual 20.


SAD/MAD with automatic backbone building

shelxe exp1 exp1_fa -a -q -h -s0.6 -m20 -b

This will use exp1.hkl, exp1_fa.hkl and exp1.ins (as above) and will try 3 cycles of backbone building.

SIRAS

This involves the solution of the thaumatin structure using the above 1.55 Å data as native and 2.0 Å CuKα data from a quick iodide soak. SIRAS usually gives the best results for iodide soaks, but it is also possible in this case to use SIR (change ‘SIRA’ to ‘SIR’) or iodine SAD (change ‘SIRA’ to ‘SAD’).

shelxc thaui <<EOF
NAT thau-nat.hkl
SIRA thau-iod.hkl
CELL 58.036 58.036 151.29 90 90 90
SPAG P41212
FIND 17
NTRY 10 
MIND -3.5 -0.1
EOF
shelxd thaui_fa
shelxe thaui thaui_fa -s0.5 -m20
shelxe thaui thaui_fa -s0.5 -m20 -i

Obtaining the SHELX programs

SHELXC/D/E and test data may be downloaded from the SHELX fileserver. First fill in the application form at http://shelx.uni-goettingen.de/register.php (a password and downloading instructions will then be emailed to the address given on the form). The programs are free to academics but a small license fee is required for 'for-profit' use.

Beta-test versions are also available from time to time. They are announced by George Sheldrick and are available from the beta-test directory. The username and password for accessing these may be obtained from GS.

hkl2map can be downloaded from a website at EMBL Hamburg.

References

If these programs prove useful, you may wish to cite (and read!):

Sheldrick, G.M. (2008). "A short history of SHELX", Acta Crystallogr. D64, 112-122 [Standard reference for all SHELX* programs].

Sheldrick, G.M., Hauptman, H.A., Weeks, C.M., Miller, R. & Usón, I. (2001). "Ab initio phasing". In International Tables for Crystallography, Vol. F, Eds. Rossmann, M.G. & Arnold, E., IUCr and Kluwer Academic Publishers, Dordrecht pp. 333-351 [Full background to the dual-space recycling used in SHELXD].

Schneider, T.R. & Sheldrick, G.M. (2002). "Substructure Solution with SHELXD", Acta Crystallogr. D58, 1772-1779 [Heavy atom location with SHELXD].

Sheldrick, G.M. (2002). "Macromolecular phasing with SHELXE", Z. Kristallogr. 217, 644-650 [Phasing and density modification with SHELXE].

Nanao, M.H., Sheldrick, G.M. & Ravelli, R.B.G. (2005). "Improving radiation-damage substructures for RIP", Acta Crystallogr. D61, 1227-1237 [Practical details of RIP phasing with SHELXC/D/E].

Usón, I., Stevenson, C.E.M., Lawson, D.M. & Sheldrick, G.M. (2007). "Structure determination of the O-methyltransferase NovP using the 'free lunch algorithm' as implemented in SHELXE", Acta Crystallogr. D63, 1069-1074 [Implementation of the FLA in SHELXE].

Sheldrick, G.M. (2010). "Experimental phasing with SHELXC/D/E: combining chain tracing with density modification", Acta Crystallogr. D66, 479-485.

Thorn, A. & Sheldrick, G.M. (2013). "Extending molecular-replacement solutions with SHELXE", Acta Crystallogr. D69, 2251-2256.

See also the SHELX homepage