Refinement

From CCP4 wiki

Latest revision as of 12:19, 8 January 2020

Theory

See ccp4dev:Refinement. For an explanation of terms, see http://www.usm.maine.edu/~rhodes/ModQual/index.html#RefineXray

Programs

  • Refmac
  • CNS
  • phenix.refine
  • SHELXL
  • Buster

Restraints for ligands

All refinement programs come with a set of known ligands: the files describing the topology and parameters of these ligands are part of the distribution. Both Refmac and phenix.refine use one large file called mon_lib_list.cif; CNS uses files in the $CNS_TOPPAR directory.

If you have a ligand that is unknown to the refinement program, you can:

  • identify a similar ligand among the known ones and modify it
  • use the PRODRG server (http://davapc1.bioch.dundee.ac.uk/prodrg/index.html) to obtain the ligand description
  • use G. Kleywegt's HIC-Up (http://xray.bmc.uu.se/hicup) to obtain the ligand description
  • look for the ligand in the Chemical Component Dictionary of compounds occurring in PDB files, at http://www.wwpdb.org/ccd.html; it may be known under a different name than you expected, in which case you only need to adjust your PDB file

What can go wrong in refinement?

R-factor does not go down

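If the R-factor gets stuck in the range of 30-40%, here are some possible reasons:

  • twinning (this happens more often than you'd like; see A. A. Lebedev, A. A. Vagin and G. N. Murshudov (2006) Intensity statistics in twinned crystals with examples from the PDB. Acta Cryst. D62, 83-95, http://dx.doi.org/10.1107/S0907444905036759)
  • bad data: check the statistics of the data reduction program
  • model incomplete or wrong: remove suspicious parts (or just give them an occupancy of 0), refine everything else, and check whether these parts re-appear in the map
  • other refinement options not yet exploited, e.g. TLS refinement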

Help, my protein has high B-factors!

This is a frequently asked question on the CCP4BB. The answer is: there is probably nothing wrong with it. If your crystals diffract to 3 A at a synchrotron, the average B-factor is most likely on the order of 100 A^2; if they diffract to 2 A, it is most likely around 40 A^2. Use B. Rupp's calculator to see how scattering power depends on the B-factor.
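The dependence of scattering power on B can be sketched with the isotropic Debye-Waller factor, which scales an atom's scattering amplitude by exp(-B sin^2(theta)/lambda^2) = exp(-B/(4 d^2)) at resolution d. The minimal Python sketch below assumes this standard isotropic form only (it is not B. Rupp's calculator, and the function names are illustrative):

```python
import math


def debye_waller_amplitude(b_factor: float, d_spacing: float) -> float:
    """Fraction of an atom's scattering amplitude left at resolution d (in A).

    Isotropic Debye-Waller term exp(-B * sin^2(theta)/lambda^2), using
    sin(theta)/lambda = 1/(2d), i.e. exp(-B / (4 d^2)).
    """
    return math.exp(-b_factor / (4.0 * d_spacing ** 2))


def debye_waller_intensity(b_factor: float, d_spacing: float) -> float:
    """Intensities scale with the square of the amplitude factor."""
    return debye_waller_amplitude(b_factor, d_spacing) ** 2


if __name__ == "__main__":
    for b, d in [(100.0, 3.0), (40.0, 2.0), (20.0, 2.0)]:
        print(f"B = {b:5.1f} A^2, d = {d:.1f} A: "
              f"amplitude x {debye_waller_amplitude(b, d):.3f}, "
              f"intensity x {debye_waller_intensity(b, d):.4f}")
```

With this form, B = 100 A^2 at 3 A and B = 40 A^2 at 2 A both leave roughly 6-8% of the amplitude at the resolution limit, which illustrates why such average B-factors are unremarkable at those resolutions.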

R_free much higher than R

How large should the difference between R_free and R be?

See the R-factors page, section "Relation between R and Rfree as a function of resolution".
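For reference, R and R_free use the same formula, R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|); R is summed over the working reflections used in refinement, R_free over the test set excluded from it. A minimal Python sketch, assuming Fcalc is already on the Fobs scale and free-set membership is given as a list of booleans (real programs also fit the scale factor and bulk-solvent parameters); the function names are illustrative:

```python
from typing import Sequence, Tuple


def r_value(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), with Fcalc already scaled to Fobs."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs: Sequence[float],
                    f_calc: Sequence[float],
                    is_free: Sequence[bool]) -> Tuple[float, float]:
    """Split reflections by their free flag; return (R over work set, R_free over test set)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_value([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_value([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free
```

Because the test reflections never contribute to the refinement target, overfitting lowers R but not R_free, which is why a growing gap between the two is a warning sign.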

Wrong space group

Sometimes crystal symmetry combines with non-crystallographic symmetry (NCS) to produce a diffraction pattern resembling that of a higher-symmetry space group than the true one. In this case the NCS closely mimics crystallographic symmetry. If the resolution is not high enough, the differences in spot positions may be too small to cause any detectable problems in indexing, integration and scaling. Even phasing (e.g. molecular replacement) may succeed. But if your R-factor stays fairly high and you have problems building parts of your structure, it is worth checking other space groups. The most straightforward approach is to process the data in P1: if that does not bring the R-factor down significantly, other space-group choices will not solve the problem either.

This occurs most often at moderate resolution. However, the structure of ketosteroid isomerase (https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0040099) had to be refined in P1 at atomic resolution, although it refines well in C2221 at lower resolution (e.g. 1.5 A).

Refining low resolution structures

Maintaining the secondary structure of your model when refining against weak data can be really challenging. When building manually, you may end up with a fairly large number of Ramachandran plot outliers.

Try phenix.refine with the keyword "discard_psi_phi=False". The phi and psi dihedral angles should then be restrained according to the CCP4 monomer library definitions. This was discussed on the phenixbb in July 2007 (http://www.phenix-online.org/pipermail/phenixbb/2007-July/000357.html); also see the discussion on the ccp4bb from December 2006 (http://www.dl.ac.uk/list-archive-public/ccp4bb/msg19554.html).

Remember that phi-psi angles are excellent for validation purposes but only when they are unrestrained. If you restrain them, you lose this option!
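Since phi and psi are ordinary dihedral angles (phi: C(i-1)-N(i)-CA(i)-C(i); psi: N(i)-CA(i)-C(i)-N(i+1)), they are easy to recompute from coordinates as an independent check. A minimal Python sketch using the standard atan2 dihedral formula; the function name and the plain-tuple coordinate input are illustrative assumptions, not any program's API:

```python
import math
from typing import Sequence, Tuple

Vec3 = Sequence[float]


def _sub(a: Vec3, b: Vec3) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180].

    Positive when, looking along p1 -> p2, p3 is rotated clockwise from p0.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

For phi of residue i, pass the C(i-1), N(i), CA(i) and C(i) coordinates; for psi, pass N(i), CA(i), C(i) and N(i+1).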

You can also try restraining alpha-helical hydrogen bonds and beta-sheet cross-strand hydrogen bonds. This can be done in REFMAC (using ProSMART) and in phenix.refine (using a reference model).

If you are really desperate, another option could be to use harmonic restraints in CNS to keep your backbone fairly fixed in parts of the map where you believe the secondary structure is correct (most likely alpha-helices). You could also fix main-chain elements completely (in any refinement program), but it is definitely preferable to leave some room for change in the xyz positions, and harmonic restraints are a nice way of doing exactly that.
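The difference between fixing atoms and harmonically restraining them can be sketched with a generic harmonic penalty on deviations from reference positions, E = k * sum |r - r_ref|^2. This Python sketch illustrates the functional form only; it does not reproduce CNS input syntax, units or weighting, and the names are hypothetical:

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def harmonic_restraint(coords: Sequence[Vec3],
                       ref_coords: Sequence[Vec3],
                       k: float) -> Tuple[float, List[Vec3]]:
    """Return E = k * sum |r - r_ref|^2 and the gradient dE/dr for each atom."""
    if len(coords) != len(ref_coords):
        raise ValueError("coords and ref_coords must have the same length")
    energy = 0.0
    gradient: List[Vec3] = []
    for (x, y, z), (x0, y0, z0) in zip(coords, ref_coords):
        dx, dy, dz = x - x0, y - y0, z - z0
        energy += k * (dx * dx + dy * dy + dz * dz)
        # The restoring force grows linearly with the displacement
        gradient.append((2.0 * k * dx, 2.0 * k * dy, 2.0 * k * dz))
    return energy, gradient
```

Because the penalty grows smoothly with displacement, the X-ray term can still move atoms where the data support it, whereas fixed atoms cannot move at all; a larger k pulls atoms more strongly towards the reference positions.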

Bulk solvent correction produces difference density

Sometimes strong residual difference density is observed in a cavity of the protein. For example, a paper from Brian Matthews' group (Marcus D. Collins, Michael L. Quillin, Gerhard Hummer, Brian W. Matthews, Sol M. Gruner, Structural Rigidity of a Large Cavity-containing Protein Revealed by High-pressure Crystallography, Journal of Molecular Biology, Volume 367, Issue 3, 30 March 2007, Pages 752-763, http://dx.doi.org/10.1016/j.jmb.2006.12.021) described a high-pressure form of lysozyme with a large hydrophobic void. Bulk water could only be forced into the void by applying very high external pressure.

Bulk solvent mask artifacts can only occur at narrow channels, where the mask radius is too big to define the channel as belonging to the bulk solvent region, leaving it "empty" and thus resulting in negative difference density.

The following advice is specific to Refmac: switching from simple scaling to Babinet scaling is an important check to exclude bulk-solvent mask artifacts, but in that case you have to uncheck "calculate contribution from the solvent region", because the Babinet scaling already accounts for it.

Alternatively, you can optimise the solvent mask parameters by running Refmac with the keyword "solvent optimise". This writes out R and R-free for different combinations of VDW probe, ion probe and shrinkage sizes. For subsequent Refmac runs you can use the keywords "solvent vdwprobe $VDWPROBE ionprobe $IONPROBE rshrink $RSHRINK", replacing $VDWPROBE, $IONPROBE and $RSHRINK with the optimal values from the optimisation, or you can set these values in the GUI. If the peaks remain, try gradually reducing the size of the VDW probe.
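Picking the best combination from the "solvent optimise" output is a simple minimisation over R-free. The Python sketch below assumes you have already transcribed the trials from the Refmac log into a list (log parsing is not shown, since the log layout is not described here); the class and function names are hypothetical, and only the keywords vdwprobe, ionprobe and rshrink come from the text above. Ties in R-free are broken by the lower R:

```python
from typing import Iterable, NamedTuple


class MaskTrial(NamedTuple):
    """One (hypothetical) row transcribed from the optimisation output."""
    vdw_probe: float
    ion_probe: float
    r_shrink: float
    r_work: float
    r_free: float


def best_trial(trials: Iterable[MaskTrial]) -> MaskTrial:
    """Lowest R-free wins; equal R-free values are decided by the lower R."""
    return min(trials, key=lambda t: (t.r_free, t.r_work))


def solvent_keyword(trial: MaskTrial) -> str:
    """Format the keyword line for subsequent Refmac runs."""
    return (f"solvent vdwprobe {trial.vdw_probe:.2f} "
            f"ionprobe {trial.ion_probe:.2f} rshrink {trial.r_shrink:.2f}")
```

Selecting on R-free rather than R keeps the choice of mask parameters from being driven by overfitting to the working set.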

For phenix.refine, the bulk-solvent mask can be optimised with "phenix.refine data.hkl model.pdb optimize_mask=true"; see http://www.phenix-online.org/documentation/refinement.htm.

If you see negative difference density in a large hydrophobic cavity, one possible cause is underestimated |Fobs| at very low resolution, either because these reflections are weakened by the beam-stop (half-)shadow or because they are overloads that have been poorly extrapolated. A simple check for wrongly determined low-resolution |Fobs| is to apply a low-resolution cutoff during refinement at a somewhat higher resolution (say 20 A instead of 80 A) and see whether the negative difference density disappears. If it does, you should revisit your data processing.
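What such a cutoff does can be sketched by computing each reflection's d-spacing and dropping those with d above the chosen limit. This Python sketch assumes an orthogonal cell (alpha = beta = gamma = 90 degrees), for which 1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2, and reflections given as hypothetical (h, k, l, F) tuples; in practice you would set the low-resolution limit in your refinement program's input rather than filter the data yourself:

```python
import math
from typing import List, Sequence, Tuple

Reflection = Tuple[int, int, int, float]  # (h, k, l, F): an illustrative layout


def d_spacing_orthogonal(h: int, k: int, l: int,
                         a: float, b: float, c: float) -> float:
    """d-spacing in A for a cell with alpha = beta = gamma = 90 degrees."""
    if h == 0 and k == 0 and l == 0:
        raise ValueError("d is undefined for the 000 reflection")
    inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d2)


def apply_low_res_cutoff(reflections: Sequence[Reflection],
                         cell: Tuple[float, float, float],
                         d_max: float) -> List[Reflection]:
    """Keep only reflections with d <= d_max, e.g. d_max = 20 A instead of 80 A."""
    a, b, c = cell
    return [r for r in reflections
            if d_spacing_orthogonal(r[0], r[1], r[2], a, b, c) <= d_max]
```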

The other possibility, of course, is that the data are good, that this is an accurate experimental result, and that there really is a void, or at least a cavity where the mean bulk density is lower than in bulk water. One way to test the void hypothesis would be to fill the cavity with O atoms of zero (or very small, say 0.01) occupancy. Hopefully (!) that will prevent Refmac from filling the cavity with bulk solvent. One could then give these O atoms large B-factors, say 200, to smear them out, and then increase the occupancies to titrate the actual bulk density.
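Writing such low-occupancy O atoms into the model is mostly a matter of getting the fixed-column PDB HETATM format right. A hedged Python sketch, assuming you have already chosen coordinates inside the cavity; the residue name HOH, chain W and the numbering are hypothetical choices, and you should check how your refinement program treats these atoms (for example, whether it refines their occupancies or B-factors):

```python
from typing import List, Sequence, Tuple


def dummy_oxygen_records(coords: Sequence[Tuple[float, float, float]],
                         occupancy: float = 0.01,
                         b_factor: float = 200.0,
                         chain: str = "W",
                         first_resseq: int = 1,
                         first_serial: int = 1) -> List[str]:
    """Return PDB HETATM lines (columns 1-78) for dummy O atoms at the given positions."""
    lines = []
    for i, (x, y, z) in enumerate(coords):
        serial = first_serial + i
        resseq = first_resseq + i
        lines.append(
            f"HETATM{serial:5d}"            # cols 1-11: record name, serial
            f"  O   HOH "                    # cols 12-21: atom name ' O  ', altLoc, resName
            f"{chain:1s}{resseq:4d}    "     # cols 22-30: chain, resSeq, iCode, blanks
            f"{x:8.3f}{y:8.3f}{z:8.3f}"      # cols 31-54: orthogonal coordinates in A
            f"{occupancy:6.2f}{b_factor:6.2f}"  # cols 55-66: occupancy, B-factor
            f"           O"                  # cols 67-78: blanks, element symbol
        )
    return lines
```

Re-running it with the occupancy raised step by step gives the series of models needed to titrate the cavity density.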

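Since 2016, Polder maps in Phenix (https://www.phenix-online.org/documentation/reference/polder.html) have made it possible to calculate omit density without the omitted region being filled with bulk solvent, which could otherwise obscure a ligand.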

Model correctly placed, but difference density remains after refinement

  1. Fourier truncation ripples:
    • Anne Bochow, Alexandre Urzhumtsev, "On the Fourier series truncation peaks at subatomic resolution", CCP4 Newsletter 42 (http://www.ccp4.ac.uk/newsletters/newsletter42/content.html)
    • Pages 52-55 of http://www.phenix-online.org/presentations/latest/pavel_validation.pdf
    • Oliver Einsle et al. (2002) Science 297, 1696
    • Acta Cryst. (2004). D60, 260-274, Figure 4 on page 267
  2. Try:
    • refine individual anisotropic ADPs for these atoms (and isotropic ADPs for the rest);
    • refine occupancies;
    • define the charge in the input PDB file;
    • if it is an anomalous scatterer, use and refine f' and f''.
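The origin of truncation ripples can be seen in a one-dimensional toy model: summing the Fourier series of a single atom only up to a resolution limit produces alternating negative and positive ripples around it, and these shrink when the atom's B-factor already damps the terms near the cutoff. The Python sketch assumes a 1D cell of length a and a point atom whose amplitudes fall off only through an isotropic Debye-Waller factor; real maps are 3D with a spherical cutoff, so treat it as an illustration of the mechanism only:

```python
import math


def truncated_density(x: float, a: float, b_factor: float, h_max: int) -> float:
    """1D density of a point atom at x = 0, summed over reflections |h| <= h_max.

    The resolution limit of the series is d_min = a / h_max.
    """
    rho = 1.0  # F(0), with the atom's amplitude normalised to 1
    for h in range(1, h_max + 1):
        s = h / a  # 1/d of reflection h
        f_h = math.exp(-b_factor * s * s / 4.0)  # Debye-Waller damping of the amplitude
        rho += 2.0 * f_h * math.cos(2.0 * math.pi * h * x / a)  # +h and -h terms together
    return rho / a


def ripple_ratio(a: float, b_factor: float, h_max: int, n_points: int = 1000) -> float:
    """Lowest density on a grid over the cell, relative to the peak height."""
    values = [truncated_density(i * a / n_points, a, b_factor, h_max)
              for i in range(n_points)]
    return min(values) / max(values)


if __name__ == "__main__":
    # a = 10 A with h_max = 10 corresponds to d_min = 1 A
    for b in (0.0, 5.0, 20.0):
        print(f"B = {b:4.1f} A^2: deepest ripple = {ripple_ratio(10.0, b, 10):+.3f} x peak")
```

The sharper the atom relative to the resolution limit (small B, as for well-ordered heavy atoms at subatomic resolution), the deeper the ripples, which is why they show up as residual difference density around otherwise correctly placed atoms.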