SHELXL: Difference between revisions

From CCP4 wiki

Revision as of 22:55, 14 March 2008

Refinement of macromolecules with SHELXL

SHELXL is a very general crystal structure refinement program that is equally suitable for the refinement of minerals, organometallic structures, oligonucleotides, or proteins (or any mixture thereof) against X-ray or neutron single (or twinned!) crystal data. The price of this generality is that it is somewhat slower than programs specifically written only for protein structure refinement. Any protein- (or DNA-) specific information must be input to SHELXL by the user in the form of refinement restraints, etc. Refinement of macromolecules using SHELXL has been discussed by Sheldrick & Schneider (1997).

Despite this generality, SHELXL is not well suited to refinement at resolutions lower than about 2.0 Å: unlike Refmac and phenix.refine, it does not provide (side-chain) torsion angle restraints, and a least-squares refinement program such as SHELXL suffers more from model bias than a program based on maximum likelihood. The Babinet bulk solvent model used in SHELXL is also in need of improvement. The initial refinement will therefore almost always have been performed with another program, and SHELXL will be used for the final refinement, perhaps involving extension to very high resolution, modeling of disorder, anisotropic refinement and the least-squares estimation of parameter errors. The starting point for a SHELXL refinement will thus usually be a PDB format file from the previous refinement. Even when SHELXL is used for the refinement of a twinned structure at lower resolution, the starting model is likely to be in the form of a PDB file from a molecular replacement solution.


Input files for SHELXL

SHELXL usually requires two input files: an .ins file containing crystal data, instructions and atoms, and an .hkl file containing h, k, l, F² and σ(F²) in fixed ‘HKLF 4’ format [F and σ(F) may also be used and require the instruction ‘HKLF 3’]. The .ins file will usually be generated from a PDB format file using the ‘I’ option in SHELXPRO. This sets up the TITL...UNIT instructions followed by standard refinement instructions, restraints, instructions for generating hydrogen atoms (commented out until needed) and atoms in crystal coordinates. For residues other than the 20 standard amino acids, suitable restraints (see below) must be added by hand (DNA and RNA restraints are provided in files on the SHELX ftp site). The ‘I’ option in SHELXPRO also provides a way of renumbering the residues; since SHELXL does not (currently) recognize chain identifiers, chains must be emulated by (for example) adding 1000, 2000 etc. to the residue numbers. SHELXPRO can perform the reverse operation when preparing a PDB file for deposition (the ‘B’ option). After each refinement job, the output .res file is edited or renamed to a new .ins file that serves as the input for the next refinement job. The updating of the .res file to .ins may also be performed with the ‘U’ option in SHELXPRO.
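
The chain emulation described above amounts to a simple offset on the residue number. The following Python sketch illustrates the mapping and its inverse. The function names, the 1-based chain index and the assumption that every chain has fewer than 1000 residues are illustrative only and are not part of SHELXPRO.

```python
CHAIN_OFFSET = 1000  # assumed spacing between chains ("adding 1000, 2000 etc.")


def to_shelxl_number(chain_index: int, residue_number: int) -> int:
    """Combine a 1-based chain index and a residue number into one number."""
    if chain_index < 1:
        raise ValueError("chain_index is 1-based")
    if not 0 < residue_number < CHAIN_OFFSET:
        raise ValueError("residue number must lie between 1 and 999 here")
    return chain_index * CHAIN_OFFSET + residue_number


def from_shelxl_number(shelxl_number: int) -> tuple[int, int]:
    """Recover (chain_index, residue_number), e.g. when preparing a PDB file."""
    return divmod(shelxl_number, CHAIN_OFFSET)


# Residue 54 of the second chain becomes residue 2054, and back again.
print(to_shelxl_number(2, 54))
print(from_shelxl_number(2054))
```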

The .hkl file may be generated directly by the data reduction programs or by CCP4. It is not necessary to sort the data, eliminate systematic absences or merge equivalents; SHELXL can do this anyway. If it is desired to refine (using complex scattering factors) against separate F² values for h,k,l and -h,-k,-l, some care is needed: there are problems with data processing software (such as CCP4) that does not keep these measurements separate, and ‘MERG 2’ must be specified in the .ins file to prevent SHELXL from merging the Friedel opposites (and setting all f″ values to zero). A further problem on continuing a refinement started with another program is to ensure consistent flagging of the free-R reflections. For this reason it is strongly recommended that Tim Grüne's program mtz2hkl (available from the SHELX download site) is used for this conversion. The Bruker XPREP program provides general facilities for setting free-R flags and for transferring and extending them consistently from one reflection file to another, taking space group symmetry into account. When twinning or NCS is present, it is better to flag thin resolution shells; otherwise random reflections should be flagged.
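
For orientation, a single ‘HKLF 4’ record can be written with a fixed-width format. The sketch below assumes the conventional layout of three 4-character integer fields for h, k, l followed by two 8-character fields with two decimals for F² and σ(F²). The function name is hypothetical, and real data are better converted with mtz2hkl or XPREP, which also take care of the free-R flags.

```python
def hklf4_record(h: int, k: int, l: int, f2: float, sigma: float) -> str:
    """Format one reflection as a fixed-width HKLF 4 style record."""
    fields = [f"{h:4d}", f"{k:4d}", f"{l:4d}", f"{f2:8.2f}", f"{sigma:8.2f}"]
    widths = [4, 4, 4, 8, 8]
    # A value that does not fit its field would shift every later column,
    # so refuse it here instead of writing a corrupt record.
    for field, width in zip(fields, widths):
        if len(field) != width:
            raise ValueError(f"value {field.strip()} does not fit in {width} columns")
    return "".join(fields)


print(hklf4_record(1, 0, -2, 1234.5, 12.25))
```

Intensities that overflow the 8-character field would have to be rescaled before writing; the sketch only detects this case.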


Output files for SHELXL

SHELXL writes an updated parameter file with the extension .res in the same format as the input .ins file, a .pdb file with the new atom coordinates (unfortunately the space group has to be added to the CRYST1 record before Coot can read this file) and an .fcf file containing phased reflection data in CIF format. The .fcf file can be used for depositing the reflection data with the PDB, and both the .res and the .fcf file can be read by Coot so that the refined atoms and σA-weighted maps can be displayed directly.
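
Adding the space group to the CRYST1 record can be scripted. The sketch below assumes the standard PDB column layout (cell parameters in columns 7 to 54, space group symbol left-justified in columns 56 to 66, Z in columns 67 to 70). The function name and the example cell are made up for illustration.

```python
from typing import Optional


def add_space_group(cryst1_line: str, space_group: str, z: Optional[int] = None) -> str:
    """Return a CRYST1 record with the space group written into columns 56-66."""
    if not cryst1_line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    if len(space_group) > 11:
        raise ValueError("space group symbol is longer than 11 characters")
    # Keep the six cell parameters (columns 1-54) and pad up to column 55.
    cell_part = cryst1_line.rstrip("\n")[:54].ljust(55)
    record = f"{cell_part}{space_group:<11}"
    if z is not None:
        record += f"{z:4d}"
    return record


line = "CRYST1   27.240   31.870   34.230  88.52 108.53 111.89"
print(add_space_group(line, "P 1", z=1))
```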


Constraints and restraints

In refining macromolecular structures, it is almost always necessary to supplement the diffraction data with chemical information in the form of restraints. A typical restraint is the condition that a bond length should approximate to a target value with a given estimated standard deviation; restraints are treated as extra experimental data items. Even if the crystal diffracts to 1.0Å, there may well be poorly defined disordered regions for which restraints are essential to obtain a chemically sensible model (the same can be true of small molecules too!).
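
The idea that a restraint acts as an extra experimental observation can be illustrated with a one-parameter toy problem. This is not SHELXL's actual algorithm: it simply minimizes the sum of two weighted squared residuals, one for a value suggested by the diffraction data and one for the restraint target, with weights 1/σ². All names and numbers are hypothetical.

```python
def restrained_estimate(data_value: float, data_sigma: float,
                        target: float, target_sigma: float) -> tuple[float, float]:
    """Least-squares estimate of one parameter from a data value plus a restraint.

    Minimizes w_d*(x - data_value)**2 + w_r*(x - target)**2 with w = 1/sigma**2,
    whose solution is the weighted mean of the two 'observations'.
    """
    w_data = 1.0 / data_sigma ** 2
    w_restraint = 1.0 / target_sigma ** 2
    value = (w_data * data_value + w_restraint * target) / (w_data + w_restraint)
    uncertainty = (w_data + w_restraint) ** -0.5
    return value, uncertainty


# A poorly determined bond length (1.60 with sigma 0.04) is pulled towards
# a target of 1.525 that carries a tighter sigma of 0.02.
value, su = restrained_estimate(1.60, 0.04, 1.525, 0.02)
print(f"{value:.3f} +/- {su:.3f}")
```

In a well-ordered region the data term dominates and the restraint has little effect; in a disordered region the restraint effectively determines the result.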

For some purposes (e.g., riding hydrogen atoms, rigid group refinement, or occupancies of atoms in disordered side-chains), constraints, i.e. exact conditions that lead to a reduction in the number of variable parameters, may be more appropriate than restraints. SHELXL allows constraints and restraints to be mixed freely, so an atom may be simultaneously subject to several different constraints and restraints. Riding hydrogen atoms (set using HFIX or AFIX instructions) are defined such that the C-H vector remains constant in magnitude and direction, but the carbon atom is free to move; the same shifts are applied to both atoms, and both atoms contribute to the least-squares derivative sums. This model may be combined with anti-bumping restraints that involve hydrogen atoms, which helps to avoid unfavorable side-chain conformations. SHELXL also provides, e.g., methyl groups that can rotate about their local three-fold axes; for small molecules the initial torsion angle may be found using a difference electron density synthesis calculated around the circle of possible hydrogen positions (HFIX 137). In macromolecules, methyl groups are rarely so well defined, so a staggered riding model is usually better (HFIX 33).

Restraints and constraints provide good examples of the way in which individual residues can be referenced by SHELXL. For example,

ANIS_* FE SG SD

makes atoms called FE, SD and SG in any residue anisotropic;

DFIX_1 1.329 C1 N

restrains a specific bond length (for the N-terminal formyl group). Note that when no esd is given, the default (here 0.02 Å from the DEFS instruction) is assumed.

DFIX_ALA 1.525 C CA

restrains the C-CA bond in all alanine residues.

SADI_54 0.04 FE SG_6 FE SG_9 FE SG_39 FE SG_42

restrains the four Fe-S bond lengths in the FeS4 unit to be equal, with an esd of 0.04 Å but without a target value. The central iron atom is in residue number 54 and the four cysteine sulfurs are all in different residues.

FLAT_* 0.3 O_- CA_- N C_- CA

restrains N and CA of each amino-acid and O, CA and C of the preceding residue to lie in a plane with a relatively large esd (0.3) (peptide planarity).
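
The residue addressing used in these examples can be summarized in a few lines of Python. The sketch below only covers the suffixes that appear above (`_*`, `_-`, a residue number, a residue class); the function name and the wording of the descriptions are illustrative and do not reproduce SHELXL's own parser.

```python
def split_qualifier(token: str) -> tuple[str, str]:
    """Split an instruction or atom name into its name and residue selection."""
    name, separator, qualifier = token.partition("_")
    if not separator:
        # No suffix: the atom belongs to the residue(s) named on the instruction.
        return name, "residue(s) selected by the instruction"
    if qualifier == "*":
        return name, "all residues"
    if qualifier == "-":
        return name, "preceding residue"
    if qualifier.isdigit():
        return name, f"residue number {int(qualifier)}"
    return name, f"all residues of class {qualifier}"


for token in ("ANIS_*", "DFIX_ALA", "SADI_54", "SG_6", "O_-", "FE"):
    print(token, "->", split_qualifier(token))
```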


  • go to http://shelx.uni-ac.gwdg.de/SHELX and read "SHELX-97 Manual as PDF", "Mini-protein refinement tutorial" as well as "P1-Lysozyme refinement tutorial", "Thomas Schneider's FAQs" and "FAQs: Macromolecules"
  • run option "I" in SHELXPRO to obtain an .ins file from the .pdb file; a ligand etc. may require the "J" option or PRODRG (http://davapc1.bioch.dundee.ac.uk/programs/prodrg/) to get restraints in SHELX format
  • use "CGLS x y" refinement until convergence; the last run should be "CGLS x" only.
  • a final job to get standard uncertainties (s.u., formerly e.s.d.) on all geometric parameters (see Q21 in "FAQs: Macromolecules"):
    • change CGLS x y to REM CGLS x y
    • insert lines L.S. 1, DAMP 0 0 and BLOC 1 (or e.g. BLOC N_1 > LAST )
    • remove all restraints: lines beginning with SIMU, DELU, ISOR, BUMP, DFIX, DANG, CHIV, FLAT and NCSY (from "Mini-protein refinement tutorial"). This is only useful for high-resolution work (say 1.4 Å or better). Alternatively, one can retain all the restraints and so determine standard uncertainties in the Bayesian sense, which take all available knowledge into account. This may be done at more modest resolution (say 2.5 Å or better). To obtain mean values and s.u. of e.g. distances or chiral volumes that occur several times in a structure, use DFIX or CHIV with "free variables". BOND, RTAB, HTAB and MPLA instructions may be needed to define the dependent parameters for which s.u. are required (from "FAQs: Macromolecules"). As an example, BIND FE_5001 NE2_123 together with BOND FE_5001 NE2_123 would enter the distance between FE_5001 and NE2_123 into the connectivity table and print the distance and its s.u. to the .lst file.
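
The edits for the final s.u. job listed above can be applied to an .ins file with a short script. This is a minimal sketch, not a SHELX tool: the function name is hypothetical, the restraint list is exactly the one quoted above, and the handling of continuation lines assumes that a trailing "=" continues an instruction onto the next line.

```python
RESTRAINT_KEYWORDS = ("SIMU", "DELU", "ISOR", "BUMP", "DFIX", "DANG",
                      "CHIV", "FLAT", "NCSY")


def prepare_su_job(ins_lines, drop_restraints=False):
    """Return .ins lines edited for a final full-matrix job that yields s.u."""
    out = []
    skip_continuation = False
    for line in ins_lines:
        if skip_continuation:
            # Still inside a dropped restraint; stop once no "=" follows.
            skip_continuation = line.rstrip().endswith("=")
            continue
        fields = line.split()
        # "DFIX_ALA" and "DFIX" are both recognized as the keyword DFIX.
        keyword = fields[0].split("_")[0].upper() if fields else ""
        if keyword == "CGLS":
            out.append("REM " + line)
            out.extend(["L.S. 1", "DAMP 0 0", "BLOC 1"])
        elif drop_restraints and keyword in RESTRAINT_KEYWORDS:
            skip_continuation = line.rstrip().endswith("=")
        else:
            out.append(line)
    return out


example = ["TITL test", "CGLS 20 -1", "DFIX_ALA 1.525 C CA", "HKLF 4", "END"]
print("\n".join(prepare_su_job(example, drop_restraints=True)))
```

With `drop_restraints=False` the restraints are kept, which corresponds to the Bayesian variant described in the last bullet.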