- 1 Refinement of macromolecules with SHELXL
- 2 Input files for SHELXL
- 3 SHELXL Output files
- 4 Constraints and restraints
- 5 Chiral volumes
- 6 Least-squares refinement algebra
- 7 Full-matrix estimates of standard uncertainties
- 8 Refinement of anisotropic displacement parameters
- 9 Modeling disorder
- 10 Twinned crystals
- 11 Unstable refinements and other problems
- 12 Obtaining the SHELX programs
- 13 Installing of the multiprocessor version on a Mac
- 14 References and other sources of information
Refinement of macromolecules with SHELXL
SHELXL is a very general crystal structure refinement program that is equally suitable for the refinement of minerals, organometallic structures, oligonucleotides, or proteins (or any mixture thereof) against X-ray or neutron single (or twinned!) crystal data. The price of this generality is that it is somewhat slower than programs written specifically for protein structure refinement; on the other hand, a multiple-CPU version (adapted by Kay Diederichs) compensates for this. Any protein- (or DNA-) specific information must be input to SHELXL by the user in the form of refinement restraints, etc.
Despite this generality, it must be emphasized that SHELXL is not suitable for refinements at resolutions lower than about 2.0 Å because, unlike Refmac and phenix.refine, it does not provide (side-chain) torsion angle restraints, and because a least-squares refinement program such as SHELXL will suffer more from model bias than a program based on maximum likelihood. Also, the Babinet bulk solvent model used in SHELXL is in need of improvement. Almost always the initial refinement will have been performed with another program and SHELXL will be used for the final refinement, perhaps involving extension to very high resolution, modeling of disorder, anisotropic refinement and the least-squares estimation of parameter errors. Thus the starting point for a SHELXL refinement will usually be a PDB format file from the previous refinement. Even when SHELXL has to be used for the refinement of a non-merohedrally twinned structure at lower resolution, the starting model is likely to be in the form of a PDB file from a molecular replacement solution.
Input files for SHELXL
SHELXL usually requires two input files: an .ins file containing crystal data, instructions and atoms, and an .hkl file containing h, k, l, F² and σ(F²) in fixed ‘HKLF 4’ format [alternatively F and σ(F) may be input; this requires the instruction ‘HKLF 3’]. The .ins file will usually be generated from a PDB format file using the ‘I’ option in SHELXPRO. This sets up the TITL...UNIT instructions followed by standard refinement instructions, restraints, instructions for generating hydrogen atoms (commented out until needed) and atoms in crystal coordinates. For residues other than the 20 standard amino-acids, suitable restraints (see below) must be added by hand. The ‘I’ option in SHELXPRO provides a way of renumbering the residues; since SHELXL does not (currently) recognize chain identifiers, chains must be emulated by (for example) adding 1000, 2000 etc. to the residue numbers. SHELXPRO can also perform the reverse operation when preparing a PDB file for deposition (the ‘B’ option). After each refinement job, the output .res file is edited or renamed to a new .ins file that serves as the input for the next refinement job. The updating of the .res file to .ins may also be performed by the ‘U’ option in SHELXPRO; do not use the ‘I’ option and the .pdb file for this, because all the special instructions in the .ins file will be lost.
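The chain emulation described above amounts to simple arithmetic on residue numbers. The following is a minimal sketch of that mapping, not SHELXPRO's actual code; the function names and the rule that the first chain gets an offset of 1000, the second 2000, and so on are assumptions made for illustration.

```python
def shelxl_residue_number(chain_id, resseq, chain_order):
    """Map a PDB (chain, residue number) pair to one SHELXL residue number.

    chain_order is a list of chain identifiers; the first chain is shifted
    by 1000, the second by 2000, etc. (hypothetical convention).
    """
    if not 0 < resseq < 1000:
        raise ValueError("residue number must lie in 1..999 to stay in its block")
    return 1000 * (chain_order.index(chain_id) + 1) + resseq


def pdb_chain_and_number(shelxl_number, chain_order):
    """Reverse mapping, e.g. when preparing a PDB file for deposition."""
    block, resseq = divmod(shelxl_number, 1000)
    if block < 1 or block > len(chain_order):
        raise ValueError("residue number does not belong to a known chain block")
    return chain_order[block - 1], resseq


chains = ["A", "B"]
print(shelxl_residue_number("B", 42, chains))  # residue 42 of chain B
print(pdb_chain_and_number(1007, chains))      # back to (chain, number)
```

With this convention a restraint such as DFIX_2042 would refer to residue 42 of the second chain.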
The .hkl file contains the reflection intensity data. It is not necessary to sort the data, eliminate systematic absences or merge equivalents, because SHELXL can do this anyway. If it is desired to refine (using complex scattering factors) against separate F² values for h,k,l and -h,-k,-l, some care is needed: there are problems using data processing software (such as CCP4) that does not keep these measurements separate, and ‘MERG 2’ must be specified in the .ins file to prevent SHELXL from merging the Friedel opposites (and setting all f" values to zero). A further problem on continuing a refinement started with another program is to ensure consistent flagging of the free-R reflections. For this reason it is strongly recommended that Tim Grüne's program mtz2hkl is used for this conversion. The Bruker XPREP program provides general facilities for setting free-R flags and for transferring and extending free-R flags consistently from one reflection file to another, taking space group symmetry into account. When twinning or NCS is present, it is better to flag thin resolution shells; otherwise random reflections should be flagged.
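For readers who need to write an .hkl file by hand, the sketch below formats reflections as fixed-width ‘HKLF 4’ records, i.e. the Fortran format (3I4,2F8.2,I4), and terminates the list with a record that has h = k = l = 0. The function names are hypothetical, and the use of a batch number of -1 to mark free-R reflections is an assumption chosen to match the ‘CGLS’ second parameter of -1 discussed later in the text; in practice mtz2hkl or XPREP should do this job.

```python
def hklf4_line(h, k, l, f2, sigma, batch=1):
    """Format one reflection as an HKLF 4 record: 3I4, 2F8.2, I4."""
    line = f"{h:4d}{k:4d}{l:4d}{f2:8.2f}{sigma:8.2f}{batch:4d}"
    if len(line) != 32:
        # A field overflowed its fixed width; the data need rescaling.
        raise ValueError("value too large for the fixed-width HKLF 4 fields")
    return line


def write_hkl(path, reflections):
    """Write (h, k, l, F2, sigma, is_free) tuples to an .hkl file."""
    with open(path, "w") as handle:
        for h, k, l, f2, sigma, is_free in reflections:
            batch = -1 if is_free else 1  # assumed free-R flag convention
            handle.write(hklf4_line(h, k, l, f2, sigma, batch) + "\n")
        # A record with h = k = l = 0 ends the reflection list.
        handle.write(hklf4_line(0, 0, 0, 0.0, 0.0, 0) + "\n")


print(hklf4_line(1, -2, 3, 123.5, 7.25))
```

Because the fields are fixed-width, intensities of 100000 or more do not fit and must be scaled down before writing.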
SHELXL Output files
SHELXL writes an updated parameter file with the extension .res in the same format as the input .ins file, and an output .fcf file containing phased reflection data in CIF format. This file can be used for depositing the reflection data with the PDB, and both the .res and the .fcf file can be read by Coot to enable the refined atoms and σA-weighted maps to be displayed directly.
Constraints and restraints
In refining macromolecular structures, it is almost always necessary to supplement the diffraction data with chemical information in the form of restraints. A typical restraint is the condition that a bond length should approximate to a target value with a given estimated standard deviation; restraints are treated as extra experimental data items. Even if the crystal diffracts to 1.0 Å, there may well be poorly defined disordered regions for which restraints are essential to obtain a chemically sensible model (the same can be true of small molecules too!).
For some purposes (e.g., riding hydrogen atoms, rigid group refinement, or occupancies of atoms in disordered side-chains), constraints, i.e. exact conditions that lead to a reduction in the number of variable parameters, may be more appropriate than restraints; SHELXL allows such constraints and restraints to be mixed freely, i.e. an atom may be simultaneously subject to several different constraints and restraints. Riding hydrogen atoms (set using HFIX or AFIX instructions) are defined such that the C-H vector remains constant in magnitude and direction, but the carbon atom is free to move; the same shifts are applied to both atoms, and both atoms contribute to the least-squares derivative sums. This model may be combined with anti-bumping restraints that involve hydrogen atoms, which helps to avoid unfavorable side-chain conformations. SHELXL also provides, e.g., methyl groups that can rotate about their local three-fold axes; for small molecules the initial torsion angle may be found using a difference electron density synthesis calculated around the circle of possible hydrogen positions (HFIX 137). In macromolecules, methyl groups are rarely so well defined, so a staggered riding model is usually better (HFIX 33).
Restraints and constraints provide good examples of the way in which individual residues can be referenced by SHELXL. For example,
ANIS_* FE SG SD
makes atoms called FE, SD and SG in any residue anisotropic;
DFIX_1 1.329 C1 N
restrains a specific bond length (for the N-terminal formyl group). Note that when no esd is given, the default (here 0.02 Å from the DEFS instruction) is assumed.
DFIX_ALA 1.525 C CA
restrains the C-CA bond in all alanine residues.
SADI_54 0.04 FE SG_6 FE SG_9 FE SG_39 FE SG_42
restrains the bond lengths in the FeS4 unit to be equal, but without a target value, with an esd of 0.04 Å. The central iron atom is in residue number 54 and the four cysteine sulfurs are all in different residues. The SADI 'similar distance' restraints provide a convenient way of restraining all sulfate ions in the structure to be regular tetrahedra with approximately equal S-O distances:
SADI_SO4 S O1 S O2 S O3 S O4
SADI_SO4 O1 O2 O1 O3 O1 O4 O2 O3 O2 O4 O3 O4
For a disordered sulfate on a symmetry axis it may be necessary to use the EQIV instruction to enable symmetry equivalents to be included in such restraints (explained in Sheldrick (2008) Acta Crystallogr. A64, 112-122).
FLAT_* 0.3 O_- CA_- N C_- CA
restrains N and CA of each amino-acid and O, CA and C of the preceding residue to lie in a plane with a relatively large esd (0.3) (peptide planarity).
The PRODRG server (http://davapc1.bioch.dundee.ac.uk/programs/prodrg/) is recommended for generating restraints in SHELX format for ligands etc.; the 'J' option in SHELXPRO can also be useful for this if a model is already available. Files of DNA and RNA restraints are available from the SHELX download site.
Chiral volumes
SHELXL defines a chiral volume as the volume of the 'unit-cell' that can be constructed using the three interatomic vectors from the atom in question; this can be calculated as a determinant using orthogonal Cartesian coordinates. SHELXL restricts chiral volumes to cases where an atom makes exactly three bonds to other non-hydrogen atoms; hydrogen atoms are ignored. The sign is determined by evaluating the determinant with the rows representing the three vectors in the order of their ASCII codes, and so is independent of the order of the atoms in the input file. This means that the alpha carbon in the 19 standard chiral L-amino-acids will always have a chiral volume of about +2.5 Å³ (using the Cahn-Ingold-Prelog R and S convention would have required L-Cys to have the opposite sign). CB of Ile has a chiral volume of 2.495 but CB of Thr is -2.628. However the CHIV instruction in SHELXL also has other uses, e.g.
CHIV_VAL 2.516 CA
CHIV_VAL -2.622 CB
When CHIV is given without a target value, the chiral volume is restrained to zero (the default) with a default esd (0.1 Å³); applied to a carbonyl carbon, this restrains it to be planar. CB is not chiral for valine, but the CHIV_VAL restraint on CB above makes sure that CG1 and CG2 are named conventionally. The RCSB now uses this idea to check the naming of H-atoms in -CH2- groups, which is one of the reasons why the hydrogens should be removed before depositing the structure (they are always recalculated anyway before use, e.g. by MolProbity). If you want all the alpha-carbons of the alanines to have the same chiral volume but would like to refine its value, a SHELXL 'free-variable' can be used (here #3):
CHIV_ALA 31 CA
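(i.e. 1*fv(3)); if there is a D-Ala in the structure as well, its chiral volume can be tied to the same free variable with the opposite sign, because 29 = 30 - 1 is interpreted as -1*fv(3):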
CHIV_DAL 29 CA
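The determinant definition of the chiral volume can be illustrated with a few lines of Python. This is a sketch under stated assumptions rather than SHELXL's own code: the function name is hypothetical, the coordinates are taken to be orthogonal Cartesian coordinates in Å, and "in the order of their ASCII codes" is interpreted here as sorting the three neighbours by atom name.

```python
def chiral_volume(center, neighbours):
    """Signed volume spanned by the three vectors from center to its neighbours.

    center     -- (x, y, z) of the central atom
    neighbours -- dict mapping atom name to (x, y, z); exactly three
                  non-hydrogen atoms, as required for a chiral volume
    """
    if len(neighbours) != 3:
        raise ValueError("a chiral volume needs exactly three bonded atoms")
    rows = []
    for name in sorted(neighbours):  # name order fixes the sign (assumption)
        x, y, z = neighbours[name]
        rows.append((x - center[0], y - center[1], z - center[2]))
    (a, b, c), (d, e, f), (g, h, i) = rows
    # 3x3 determinant expanded along the first row
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Artificial geometry: three orthogonal unit vectors around the origin.
atoms = {"N": (0.0, 0.0, 1.0), "C": (1.0, 0.0, 0.0), "CB": (0.0, 1.0, 0.0)}
print(chiral_volume((0.0, 0.0, 0.0), atoms))
```

Because the rows are sorted before the determinant is evaluated, the result does not depend on the order in which the atoms are supplied; mirroring the coordinates changes the sign, and three coplanar vectors give zero, which is why a zero target enforces planarity.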
Least-squares refinement algebra
The original SHELX refinement algorithms were modeled closely on those described by Cruickshank (1970). For macromolecular refinement, an alternative to (blocked) full-matrix refinement is provided by the conjugate-gradient solution of the least-squares normal equations as described by Hendrickson & Konnert (1980), including preconditioning of the normal matrix that enables positional and displacement parameters to be refined in the same cycle. The structure factor derivatives contribute only to the diagonal elements of the normal matrix, but all restraints contribute fully to both the diagonal and non-diagonal elements, although neither the Jacobian nor the normal matrix itself are ever generated by SHELXL. The parameter shifts are modified by comparison with those in the previous cycle to accelerate convergence whilst reducing oscillations. Thus, a larger shift is applied to a parameter when the current shift is similar to the previous shift, and a smaller shift is applied when the current and previous shifts have opposite signs.
SHELXL refines against F² rather than F, which enables all data to be used in the refinement with weights that include contributions from the experimental uncertainties, rather than having to reject F-values below a preset threshold; there is a choice of appropriate weighting schemes. Provided that reasonable estimates of σ(F²) are available, this enables more experimental information to be employed in the refinement; it also facilitates refinement against data from twinned crystals.
Full-matrix estimates of standard uncertainties
Inversion of the full normal matrix (or of large matrix blocks, e.g., for all positional parameters) enables the precision of individual parameters to be estimated, either with or without the inclusion of the restraints in the matrix. The standard uncertainties in dependent quantities (e.g., torsion angles or distances from mean planes) are calculated in SHELXL using the full least-squares correlation matrix.
If high resolution data are available, it may be possible to obtain rigorous standard uncertainties by matrix inversion. The structure should first be refined to convergence with CGLS, setting the second parameter to -1 to keep the free-R data separate, then a further refinement should be performed against all data by deleting the second CGLS parameter, and finally a single full-matrix cycle should be performed (‘L.S. 1’) with zero damping and a zero shift multiplier (‘DAMP 0 0’). Often ‘BLOC 1’ will be used so that the (anisotropic) displacement parameters are fixed in this final cycle, which makes the matrix appreciably smaller and more stable on inversion, but still allows the estimation of realistic standard deviations on all geometrical parameters. BOND, RTAB, HTAB and MPLA instructions may be needed to define the dependent parameters for which esds are required, and the connectivity table used by BOND may need to have extra 'bonds' (e.g. to metal ions) added by BIND if they are not generated automatically (rare).
Given high-resolution data (better than say 1.5 Å) all restraints should be removed: lines beginning with SIMU, DELU, ISOR, BUMP, DFIX, DANG, CHIV, FLAT and NCSY should be deleted or preceded by "REM". Alternatively, one can determine standard uncertainties in the Bayesian sense that take all available knowledge into account by retaining all the restraints. This may be done at more modest resolution (say 2.5 Å or better). To obtain mean values and s.u. of e.g. distances or chiral volumes that occur several times in a structure, use DFIX or CHIV with "free variables" (see below).
Refinement of anisotropic displacement parameters
The motion of macromolecules is clearly anisotropic, but the data-to-parameter ratio rarely permits the refinement of the six independent anisotropic displacement parameters (ADPs) per atom; even for small molecules and data to atomic resolution, the anisotropic refinement of disordered regions requires the use of restraints. SHELXL employs three types of ADP restraint (Sheldrick 1993; Sheldrick & Schneider, 1997). The rigid bond restraint, first suggested by Rollett (1970), assumes that the components of the ADPs of two atoms connected via one (or two) chemical bonds are equal along the direction joining them, within a specified standard deviation. This has been shown to hold accurately (Hirshfeld, 1976; Trueblood & Dunitz, 1983) for precise structures of small molecules, so it can be applied as a 'hard' restraint with a small estimated standard deviation. The similar ADP restraint assumes that atoms that are spatially close (but not necessarily bonded because they may be different components of a disordered group) have similar Uij components. An approximately isotropic restraint is useful for isolated solvent molecules. These latter two restraints are only approximate and so should be applied with low weights, i.e., high estimated standard deviations.
The transition from isotropic to anisotropic roughly doubles the number of parameters and almost always results in an appreciable reduction in the R-factor. However, this represents an improvement in the model only when it is accompanied by a significant reduction in the free R-factor (Brünger, 1992). Since the free R-factor is itself subject to uncertainty because of the small sample used, a drop of at least 1% in the free R is needed to justify anisotropic refinement. There should also be a reduction in the goodness of fit, and the resulting thermal ellipsoids should make chemical sense and not be ‘non-positive-definite’!
Modeling disorder
There are many ways of modeling disorder using SHELXL, but for macromolecules the most convenient is to retain the same atom and residue names for the two or more components and assign a different "part number" (analogous to the PDB alternative site flag) to each component. With this technique, no change is required to the input restraints, etc. Atoms in the same component will normally have a common occupancy that is assigned to a free variable (fv). The starting values for the free variables are given, in order, on the FVAR instruction; note that there is no free variable number 1 (adding 10 fixes a parameter); the first FVAR parameter is the overall scale factor. As an example, suppose that residues Glu_12 and Cys_38 have disordered side-chains; their occupancies are tied to fv(2) (for the atoms in component [PART] 1) and to 1-fv(2) for the atoms in component 2 for Glu_12, and similarly fv(4) and 1-fv(4) for Cys_38. This ensures that the sum of occupancies for both components is held at unity. '21.0' is interpreted as 1.0 times fv(2), and '-21.0' as 1.0 times [1-fv(2)]. This notation is not very intuitive, but it is concise and very flexible. A common example is the use of a single free variable to describe the occupancies of all the atoms in both components of a disordered side-chain, e.g.
PART 1
CB 1 ... ... ... 31 ...
OG 4 ... ... ... 31 ...
PART 2
CB 1 ... ... ... -31 ...
OG 4 ... ... ... -31 ...
PART 0
for a disordered serine. The starting value of the occupancy p is given as the third FVAR parameter; the two components will be assigned occupancies p and 1-p. Note that it is desirable to split CB even if no splitting can be seen in the maps, so that when hydrogens are added later with e.g.
HFIX_SER 23 CB
(before the first atom) the correct disordered hydrogens will be generated fully automatically. If there are three or more disorder components, then each of the common occupancies must be assigned to a separate free variable (e.g. as 51, 61 and 71), and their sum can be restrained to unity by the use of a SUMP restraint, e.g.:
SUMP 1 0.01 1 5 1 6 1 7
Free variables may also be used in DFIX and CHIV restraints. Thus
CHIV_PRO 31 CA
would cause the chiral volumes of all proline CA atoms to be restrained to free variable number 3, which itself is allowed to refine. In this way reasonable geometrical restraints can be applied even when the target values are unknown. By restraining distances to be equal to a free variable using DFIX, a standard deviation of the mean distance may be calculated rigorously using full-matrix least-squares algebra.
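The free-variable coding used throughout this section can be made concrete with a small decoder. This is a sketch of the rules as described in the text, not SHELXL's implementation: the function name is hypothetical, fv(m) is taken to be the m-th value on the FVAR instruction (so fv(1) is the overall scale factor), and negative codes are treated as p times [1-fv(m)] by analogy with the -21.0 example.

```python
import math


def decode_parameter(code, fvar):
    """Return the numerical value that a coded parameter stands for.

    code -- the number given in the .ins file (e.g. 21.0, -31, 11.0, 0.5)
    fvar -- list of FVAR values; fvar[0] is the overall scale factor
    """
    magnitude = abs(code)
    if magnitude < 5:
        return code                      # ordinary, freely refined value
    m = math.floor(magnitude / 10 + 0.5)  # nearest multiple of ten
    p = magnitude - 10 * m                # multiplier, between -5 and +5
    if m == 1:
        if code < 0:
            raise ValueError("negative codes near -10 are not handled here")
        return p                         # "adding 10 fixes a parameter"
    fv = fvar[m - 1]
    if code > 0:
        return p * fv                    # e.g. 21.0 -> 1.0 * fv(2)
    return p * (1.0 - fv)                # e.g. -21.0 -> 1.0 * [1 - fv(2)]


fvar = [1.0, 0.7, 2.5]                   # scale factor, fv(2), fv(3)
for code in (21.0, -21.0, 31, 29, 11.0):
    print(code, decode_parameter(code, fvar))
```

With these FVAR values the two components of a side-chain coded 21.0 and -21.0 receive occupancies that sum to one, and the codes 31 and 29 give chiral volume targets of equal magnitude and opposite sign.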
Twinned crystals
SHELXL provides facilities for refining against data from merohedral, pseudo-merohedral, and non-merohedral twins (Herbst-Irmer & Sheldrick, 1998). Refinement against data from merohedrally twinned crystals is particularly straightforward, requiring only the twin law (a 3x3 matrix) and starting values for the volume fractions of the twin components. Failure to recognize such twinning not only results in high R-factors and poor quality maps, it can also lead to incorrect biochemical conclusions (Luecke, Richter & Lanyi, 1998). Twinning can often be detected by statistical tests (Yeates & Fam, 1999), and it is probably much more widespread in macromolecular crystals than is generally appreciated!
No changes are needed to the .hkl file for merohedral twinning, but the data should be merged in the lower of the two relevant Laue groups. For non-merohedral twinning a special (‘HKLF 5’) format is required for the intensity data file.
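To show what the twin law and the volume fraction mean in practice, the sketch below computes the intensity expected from a two-component merohedral twin as the fraction-weighted sum of the single-crystal intensities of a reflection and its twin-related partner. The function names and the example numbers are hypothetical; in SHELXL itself the matrix and the starting fraction would be supplied with the TWIN and BASF instructions.

```python
def apply_twin_law(hkl, twin_law):
    """Transform indices (h, k, l) with a 3x3 twin law given as three rows."""
    h, k, l = hkl
    return tuple(row[0] * h + row[1] * k + row[2] * l for row in twin_law)


def twinned_intensity(hkl, f2_single, twin_law, alpha):
    """Expected F2 for a two-component twin.

    f2_single -- dict mapping (h, k, l) to the untwinned F2
    alpha     -- volume fraction of the second twin component (0..1)
    """
    partner = apply_twin_law(hkl, twin_law)
    return (1.0 - alpha) * f2_single[hkl] + alpha * f2_single[partner]


# Example twin law: h, k, l -> k, h, -l
law = ((0, 1, 0), (1, 0, 0), (0, 0, -1))
f2 = {(1, 2, 3): 100.0, (2, 1, -3): 20.0}
print(twinned_intensity((1, 2, 3), f2, law, 0.25))
```

For a fraction close to 0.5 the two twin-related intensities become nearly equal, which is why such data can appear to have higher symmetry than the true Laue group.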
Unstable refinements and other problems
However much care is taken in setting up a refinement, it can happen that the refinement becomes unstable and diverges. Usually the program detects this in time but in extreme cases, especially when full-matrix refinement is performed with a poorly conditioned matrix, it can crash. It is much more difficult to identify the cause of such problems when a large number of changes have been made in updating a .res file to the .ins file for the next job, so it is often more effective to improve the model in small steps. The .lst file contains a great deal of useful diagnostic information (which can be increased by using MORE 3); however the best place to start looking for problems is the list of ‘disagreeable restraints’; these often pinpoint the atoms or restraints that need changing. Also the presence of unrestrained atoms (which are commented on by the program) is a common cause of instability. In general, the more parameters that are refined, the less stable the refinement becomes; typical examples are the inclusion of dubious solvent water molecules or making all atoms anisotropic when there are not enough data.
Anti-bumping restraints are very useful in maintaining a chemically sensible structure, especially at lower resolution, but can also set traps for the unwary. For example if two atoms that should be bonded are too far apart for the program to include them automatically in the connectivity array, an anti-bumping restraint may be generated automatically to push them apart and this will fight against a DFIX or DANG restraint that is trying to bring them together! The remedy is to join the two atoms by hand so that they are bonded in the connectivity array, e.g.
BIND CB_23 CG_23
Even if the side-chain of residue 23 in this example is disordered and the bond is only broken in one component, this will have the desired effect. An incorrect connectivity can also affect the operation of a CHIV instruction (which requires the specified atom to be bonded to three and only three non-hydrogen atoms) and the automatic generation of hydrogen atoms (HFIX). Superfluous bonds may be removed from the connectivity array using e.g.
FREE CB_23 CD_23
Usually if the connectivity array (included in the .lst file except for MORE 0) is correct, the restraints will ensure that a sensible geometry is obtained during the refinement.
A common problem is that if the automatic hydrogen atom placement puts two hydrogen atoms in the same hydrogen bond, they will repel each other through the anti-bumping restraint, and because of the way the riding model works, this can severely distort the structure. The remedy is not to include the O-H and histidine N-H hydrogens in the refinement; they usually contribute little to the total scattering anyway.
Obtaining the SHELX programs
SHELXC/D/E and test data may be downloaded from the SHELX fileserver. Users should register online at http://shelx.uni-ac.gwdg.de/SHELX/ . Downloading instructions will then be emailed. The programs are free to academics but a small license fee is required for 'for-profit' use.
Installing of the multiprocessor version on a Mac
The mp version of SHELXL runs on all 16 processors of a Mac (two quad core with hyperthreading). In a test case, the refinement with total processor time of 70.7 seconds was finished within less than six seconds:-)
The following packages need to be installed before the compilation:
- Xcode 312_2621_developerdvd.dmg (downloaded from Apple - 996 MB)
- Intel Fortran Compiler Professional (31 day evaluation version)
- m_cprof_p_11.0.059.dmg (downloaded from Intel - 343 MB)
- m_cprof_ifort_redist_p_11.0.059.dmg (downloaded from Intel - 20.3 MB)
The compilation works smoothly, but instead of the -static flag it is necessary to use the -static-intel flag. A 64 bit compilation is invoked with:
ifort -axPT -openmp -ip -static-intel shelxh_omp.f shelxlv_omp.f -o shelxl_omp.64bit
Update 6/2010: Problems exist with Xcode 3.2.2. The workaround is to add the -use-asm flag. See http://software.intel.com/en-us/articles/intel-fortran-for-mac-os-x-incompatible-with-xcode-322/
References and other sources of information
Sheldrick, G.M. (2008). "A short history of SHELX". Acta Crystallogr. A64, 112-122 [Standard reference for all SHELX... programs].
Gruene, T. et al. (2014). "Refinement of Macromolecular Structures against Neutron Data with SHELXL-2013". J. Appl. Cryst. 47, 462-466 [Reference for refinement against neutron data and for hydrogen restraints].
Sheldrick, G.M. & Schneider, T.R. (1997). Methods Enzymol. 277, 319-343 [Macromolecular refinement with SHELXL].
The following additional sources of information may be found via the SHELX homepage (http://shelx.uni-ac.gwdg.de/SHELX): "SHELX-97 Manual as PDF", "Mini-protein refinement tutorial", "P1-Lysozyme refinement tutorial", "Thomas Schneider's FAQs" and "FAQs: Macromolecules".