From CCP4 wiki
Revision as of 06:25, 5 July 2012 by Kay (advanced usage)

PHENIX (Python-based Hierarchical ENvironment for Integrated Xtallography) is a software suite for the automated determination and refinement of macromolecular structures using X-ray crystallography and other methods. It integrates well with CCP4-formatted files for I/O, is highly automated, and is very straightforward to use.

The suite (Phenix home page; documentation) has a GUI program (phenix) which can be used to run the programs, but they also work from the command line.

All PHENIX command-line tools print a short help text (usage and options): just type phenix.TOOLNAME and hit Enter (or Return). Note that you can get a complete list of jiffies with


There is also version-specific documentation, e.g. for development version 572.

The documentation below focuses on the non-GUI command-line tools and may be incomplete, out of date, or even incorrect.


Crystallographic data

phenix.xtriage - assessing data quality

phenix.explore_metric_symmetry - investigate different settings

phenix.explore_metric_symmetry --unit_cell=145,44,67,90,110.5,90 --space_group=C2 --other_unit_cell=67,44,136,90,96,90 --other_space_group=p2

The CCP4 equivalent is "othercell".

phenix.reflection_statistics - compare datasets

There may be one or two data files.

phenix.xmanip - structure factor file manipulations

phenix.model_vs_data - statistics

Just use "phenix.model_vs_data model.pdb data.hkl" where data.hkl is a reflection file in almost any common format. phenix.model_vs_data can output the map defined as:


Examples: 2mFo-DFc, 3.2Fo-2.3Fc, Fc, anom, fo-fc_kick.

So, if you say

phenix.model_vs_data model.pdb data.mtz --map=fc

you will get an MTZ file with the desired structure factors (here Fc).

phenix.model_vs_data model.pdb data.mtz --comprehensive=true

will list (among other things) map CC for all atoms or per residue.

For PDB deposition, phenix.model_vs_data model.pdb data.mtz gives B-factor statistics. Look for lines like this in the output:

     ADP (min,max,mean):
       all           (136 atoms): 4.4    97.6   25.3  
       side chains   (48 atoms): 4.9    96.8   21.0  
       main chains   (64 atoms): 4.4    97.6   28.3  
       macromolecule (112 atoms): 4.4    97.6   25.2  
       ligands       (1 atoms): 6.6    6.6    6.6   
       solvent       (23 atoms): 8.8    44.1   26.8  
     mean bonded (Bi-Bj) : 27.91
     number_of_anisotropic            : 0      
     number_of_non_positive_definite  : 0
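For deposition it can be handy to pull these numbers out of a saved log automatically. The following is a minimal sketch, assuming the output was saved to a file and that the ADP rows look exactly like the excerpt above (other PHENIX versions may format them differently); adp_summary is a hypothetical helper, not a PHENIX tool.

```shell
# Print one tab-separated line "group  n_atoms  min  max  mean" for each row
# of the "ADP (min,max,mean):" block in a saved phenix.model_vs_data log.
# Assumes the row layout shown above; adp_summary is a hypothetical helper.
adp_summary() {
  awk '
    /ADP \(min,max,mean\):/ { inblock = 1; next }
    inblock && /\([0-9]+ atoms\):/ {
      label = $0
      sub(/\(.*$/, "", label)          # text before "(N atoms)"
      gsub(/^ +| +$/, "", label)       # trim surrounding spaces
      natoms = $0
      sub(/^[^(]*\(/, "", natoms)      # drop everything up to "("
      sub(/ atoms.*$/, "", natoms)     # keep only the number
      values = $0
      sub(/^[^:]*:/, "", values)       # the numbers after the colon
      split(values, v, " ")
      printf "%s\t%s\t%s\t%s\t%s\n", label, natoms, v[1], v[2], v[3]
      next
    }
    { inblock = 0 }                    # any other line ends the block
  ' "$@"
}
```

Typical use would be to save the output first (phenix.model_vs_data model.pdb data.mtz > model_vs_data.log, file name just an example) and then run adp_summary model_vs_data.log.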

phenix.real_space_correlation - statistics

Like phenix.model_vs_data, but gives you more options and control.

phenix.fmodel - calculate structure factors from model

phenix.cif_as_mtz - convert CIF to MTZ format


  • identifies suitable atom selections for TLS refinement
  • similar to TLSMD, but uses cross-validation to yield one unique solution


  • visualization of reciprocal-space reflection data (similar to 'hklview' in CCP4i)
  • 3D OpenGL view of all data, or 2D view of planes (pseudo-precession photograph)

Experimental phasing

phenix.autosol - experimental phasing "wizard"

phenix.autosol uses HYSS, SOLVE, Phaser, RESOLVE, xtriage and phenix.refine to solve a structure and generate experimental phases with the MAD, MIR, SIR, or SAD methods.

phenix.phaser - SAD phasing with Phaser

Phaser can do SAD phasing; in this mode it is called phaser_ep (ep stands for "experimental phasing"). The most recent version of Phaser is 2.3 in Phenix and 2.1 in CCP4. Note that the "Phaser" link on [1], in the sentence "consult the documentation for AutoSol or Phaser, or the Phaser WIKI", points to the 2.1 documentation. The keywords are documented concisely (if somewhat sparsely) at [2]. A script demonstrating the following features

  1. using a PDB file (with origin-centered coordinates) as a heavy atom cluster template
  2. using two different substructure atomtypes (the cluster, and Fe)
  3. using a PDB file of a preliminary protein model, for finding sites
  4. using an MTZ file of a preliminary protein model, for finding sites
  5. using known sites (from e.g. SHELXD or HYSS)
  6. items 3. and 5. are combined in this example

is shown below:

phenix.phaser <<eof

# name for output files:
ROOT XXXphaser


# file with F+ F- SIGF+ SIGF- from XDS/XDSCONV using filetype CCP4_F:
CRYSTAL unknown DATASET unknown &
        LABIN Fpos = F(+) SIGFpos = SIGF(+) Fneg = F(-) SIGFneg = SIGF(-)

# use a rough model of the protein to get phases:
PARTIAL PDB    rigid.1.pdb RMS 2.
# alternatively an MTZ file can be used, but the PDB should be preferred
# PARTIAL HKLIN  rigid.1.mtz RMS 2. 

# if sites are known, use them:
ATOM CRYSTAL unknown PDB knownsites.pdb
# the next keywords are documented at
# here they are commented out since the file knownsites.pdb is from an earlier phaser job.

# the W12 cluster was found in Hicup and put at the origin
# using moleman2's "xyz cen" command (I don't know if this is necessary!)
CLUSTER PDB keg-cen.pdb
# the scatterer XX is predefined and refers to the cluster! Where is this documented ??
# FP and FDP are just guesses; fortunately FDP is refined
# however it is not documented what F0 of the cluster is!
eof

Molecular replacement

phenix.automr - interface to Phaser and Resolve

This "wizard" provides an interface to Phaser molecular replacement and feeds the results of molecular replacement directly into the AutoBuild Wizard for automated model rebuilding.


  • automated molecular replacement using incremental exploration, with support for parallelization


Officially documented in the phaserwiki. It can be run from the command line (where it can also replace the Phaser distributed with CCP4, which is an older version!) and through the Phaser-MR GUI, which supports fine-tuning of parameters.

If you run this:

phenix.phaser params.eff

it will use the Phenix-style configuration file, but if you just run "phenix.phaser" with no arguments (or with input redirected from a file), it will use the CCP4-style keyword input.

This is an example of params.eff.

Another example, this time for just doing rigid-body refinement (say, for transferring a model to a crystal with slightly different cell parameters):

#!/bin/csh -f
rm -f rigid.1.pdb rigid.1.mtz
phenix.phaser << eof
TITLE rigid body
ROOT rigid
HKLIN myprotein.mtz
# use a preliminary refined model
ENSEMBLE ensemble1 PDB mybestmodel.pdb IDENT 1.5
eof

phenix.sculptor - automate selection and editing of molecular replacement (MR) models

phenix.ensembler - multiple superposition tool to automate construction of ensembles for MR


phenix.reel - restraints editor especially for ligands

phenix.elbow - electronic Ligand Builder and Optimisation Workbench

Model building and completion

phenix.autobuild - "wizard" for model rebuilding and completion

phenix.phase_and_build and phenix.build_one_model are fast ways to obtain results.

phenix.ligandfit - "wizard" carrying out fitting of flexible ligands to electron density maps

phenix.find_helices - rapid helix fitting to a map

phenix.fit_loops - fill short gaps using a loop library, and longer gaps (up to 15 residues) iteratively

phenix.assign_sequence - sequence assignment and linkage of neighboring segments


Refinement with phenix.refine

Example for use of phenix.refine

basic usage

phenix.refine model.pdb data.mtz

Here "data.mtz" is your reflection data file. PHENIX automatically recognizes most known file formats, so it can be MTZ, CNS or others.

advanced usage

phenix.refine model.pdb data.mtz strategy=rigid_body+individual_sites+individual_adp \
   simulated_annealing=true optimize_xyz_weight=true optimize_adp_weight=true main.number_of_macro_cycles=5 \

This will do the following:

  1. Rigid-body refinement in the first macro-cycle only (MZ protocol = very large convergence radius);
  2. Refinement of individual coordinates and B-factors in every macro-cycle with optimized weights (note: optimize_xyz_weight=true optimize_adp_weight=true makes the program take considerably longer!);
  3. Simulated annealing in the 2nd and the second-to-last macro-cycles;
  4. Finding (and, if necessary, removing) water molecules.


If some ligand in model.pdb is unknown, phenix.refine will complain:

Sorry: Fatal problems interpreting PDB file:
 Number of atoms with unknown nonbonded energy type symbols: 18
 Please edit the PDB file to resolve the problems and/or supply a
 CIF file with matching restraint definitions, along with
 apply_cif_modification and apply_cif_link parameter definitions
 if necessary (see phenix.refine documentation).
 Also note that phenix.elbow is available to create restraint
 definitions for unknown ligands.

In that case, just running

phenix.elbow model.pdb --do-all --output=all_ligands

will produce all_ligands.cif, which may be fed to phenix.refine by

phenix.refine model.pdb data.mtz all_ligands.cif ...

If no PDB file is available for a ligand, give its SMILES string to phenix.elbow instead. For a ligand that is covalently linked to the model (e.g. a non-natural amino acid that is part of the polypeptide chain), run phenix.ready_set with phenix.elbow's CIF file to generate the LINK records.

Constraints and restraints in real and reciprocal space


Use phenix.ready_set to add hydrogens to your PDB file and, except at ultra-high resolution, use the riding hydrogen model in phenix.refine (this is the default, so you do not have to specify anything). phenix.ready_set internally uses phenix.elbow for ligands and phenix.reduce for the protein. phenix.pdbtools can also add hydrogens (FIXME: what are the differences?). Hydrogens should not be used in NCS and TLS groups, so it might be a good idea to append "and not (element H or element D)" to all selection strings. See the phenix.refine documentation.
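Adding the exclusion by hand to many selections is error-prone. A minimal sketch of a shell helper (exclude_h is hypothetical, not part of PHENIX) that wraps the original selection in parentheses before appending the exclusion, so that an "or" inside it cannot change what the "and not" applies to, whatever the operator precedence:

```shell
# Append a hydrogen/deuterium exclusion to a PHENIX atom selection string.
# The original selection is wrapped in parentheses so that an "or" inside it
# cannot change the meaning of the appended "and not (...)".
exclude_h() {
  printf '(%s) and not (element H or element D)\n' "$1"
}

exclude_h "chain A or chain B"
# → (chain A or chain B) and not (element H or element D)
```

The printed string can then be pasted into a selection line of a .params file.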


Adding "occupancy" to the "strategy" options will refine the occupancies of those parts of the model that have alternate conformations.


occupancies {
  constrained_group {
    selection = "chain A and resseq 105 and altloc A"
    selection = "chain B and resseq 105 and altloc B"
  }
}
Essentially, the above selection says: "alternative conformation A of residue 105 in chain A is coupled with alternative conformation B of (NCS-related) residue 105 in chain B". The refined occupancies will sum to 1 in this case. It is essential that the altlocs in the two selections are different: this turns off the non-bonded interaction between them; otherwise the residues would get pushed apart.


  • Automatic detection of NCS groups:
phenix.refine data.hkl model.pdb main.ncs=True
  • Manual specification of NCS groups:
phenix.refine data.hkl model.pdb ncs_groups.params main.ncs=True

where ncs_groups.params contains e.g.:

refinement.ncs.restraint_group {
  reference = chain A
  selection = chain B
  selection = chain C
}
refinement.ncs.restraint_group {
  reference = chain E
  selection = chain F
}
  • switching to torsion-angle NCS:
  • switch off the restraints on NCS-related B-factors:

Secondary structure restraints

phenix.refine model.pdb data.mtz main.secondary_structure_restraints=true

You can find more information about secondary structure restraints in the PHENIX Newsletter (pages 12-17).

Low resolution refinement

Use an existing high resolution model (e.g. in a different spacegroup) for restraining the dihedrals:

  phenix.refine data.hkl model.pdb main.reference_model_restraints=True reference_model.file=reference.pdb

The behaviour can be modified with the keywords reference_model.limit (default 15 degrees) and reference_model.sigma (default probably 1 degree; the current documentation says 1 Angstrom, which is probably not right).

In the case where your working model has four chains (A, B, C, D) and your reference model has only chain A, the selections would look like this:

refinement.reference_model.reference_group {
    reference = chain A
    selection = chain A
}
refinement.reference_model.reference_group {
    reference = chain A
    selection = chain B
}
refinement.reference_model.reference_group {
    reference = chain A
    selection = chain C
}
refinement.reference_model.reference_group {
    reference = chain A
    selection = chain D
}
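With many chains, writing these blocks by hand gets tedious. A minimal sketch of a shell function that prints one reference_group block per working-model chain, all pointing at the same reference chain; write_reference_groups and the output file name are hypothetical:

```shell
# Print one refinement.reference_model.reference_group block per working-model
# chain, all restrained to the same reference chain (first argument).
# write_reference_groups is a hypothetical helper, not a PHENIX command.
write_reference_groups() {
  ref_chain=$1
  shift
  for chain in "$@"; do
    cat <<EOF
refinement.reference_model.reference_group {
    reference = chain $ref_chain
    selection = chain $chain
}
EOF
  done
}
```

For the four-chain case above, write_reference_groups A A B C D > reference_groups.params writes the same four blocks; the file can then be given on the phenix.refine command line together with main.reference_model_restraints=True and reference_model.file=reference.pdb, like the NCS .params file.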

See the documentation.

DEN refinement (similar to what is in CNS)

DEN restraints can be activated in phenix.refine from the command-line with the current version and latest nightly builds, and they are the same deformable elastic network restraints available in CNS. PHENIX developers have been working closely with Axel Brunger and Gunnar Schroder to implement DEN in Phenix.

The developers have not yet officially announced the DEN restraints, as they are still being tested and actively developed to get the implementation just right, and the parameterization is still very much in flux. It is hoped that DEN will become a stable feature by the next version, at which point it will be added as an option in the GUI.

These restraints have been shown to be particularly useful at low resolution, and they have been used successfully at 4-5 Å and below. It is unclear how useful they would be at relatively high resolution (say 2.5 Å or better), as other restraint methods work well in that resolution range and are far less computationally intensive.

In almost all cases it is best to optimize the gamma and weight parameters, which is quite time intensive but is most likely to give the best results. Currently this can be parallelized, but only on cores that share memory. If you do optimize the gamma and weight parameters, you cannot simultaneously optimize B factor weights, which is another limitation that will be overcome in the future.

Documentation will be available as soon as a stable version is announced with a new release.

To use DEN with the current release (1.7.3), you can use a parameterization such as this:

refinement {
 main {
  den_refinement = True
  number_of_macro_cycles = 1
  nproc = 8
 }
 refine {
  strategy = *individual_sites individual_sites_real_space rigid_body \
             *individual_adp group_adp tls occupancies group_anomalous
 }
 den {
  reference_file = reference.pdb
  optimize = True
  annealing_type = *torsion cartesian
  final_refinement_cycle = True
 }
}


  • run your model through TLSMD server to identify TLS domains (it will produce PHENIX friendly TLS groups selections);

for example:

phenix.refine model.pdb data.hkl strategy=individual_sites+individual_adp+tls  tls_selections.def

with tls_selections.def something like:

refinement.refine {
 adp {
  tls = chain 'A'
  tls = chain 'B'
 }
}

  • phenix.find_tls_groups can now find TLS groups automatically and generate a tls_selections.def file.

Rigid body

Example file rigid_body.def defining two rigid bodies:

refinement.refine.sites {
 rigid_body = chain 'A' or chain 'B'
 rigid_body = chain 'L' or chain 'M'
}

Fix His/Asn/Gln sidechain orientations


 phenix.refine data.hkl model.pdb main.nqh_flips=True

to automatically flip these sidechains to make them better fit the density and/or hydrogen bonding pattern.

Using a reference model

A good idea if refinement is done at low resolution but a high resolution model is available.

phenix.refine data.hkl model.pdb main.reference_model_restraints=True \

Use reference_model.sigma=0.5 to tighten the restraints (default 1.0 Angstrom), and use reference_model.limit=30 to enlarge the limit (default 15 degrees) up to which the reference torsion angle will be used.

Real-space refinement

There is a good writeup on this. In short, use

phenix.refine model.pdb data.hkl fix_rotamers=true 

It would probably be a good idea to also use main.nqh_flips=True (but maybe this is already included in fix_rotamers=true?)

Atom selection


phenix.refine model.pdb data.mtz refine.sites.individual="not (chain A and resseq 123:156)"


phenix.refine model.pdb data.mtz strategy=individual_adp adp.individual.iso="chain A and resseq 10:20"

The latter will refine only the B-factors of residues A10-A20. Note that the B-factors can still shift overall by a constant, because the trace of the overall anisotropic scale matrix is subtracted from that matrix and added to all atoms and to Bsol.

Another example:

sel = "chain A and resseq 123 and resname LIG and name C1 and altloc A"

where "resseq 123" and "resname LIG" are probably redundant.

Switching off specific interactions

  • In specific (rare!) situations one wants to exclude particular interactions. The pdb_interpretation.custom_nonbonded_symmetry_exclusion=<selection> command line keyword was designed for this purpose.
refinement.geometry_restraints.edits {
 zn_selection = chain X and resname ZN and resid 200 and name ZN
 his117_selection = chain X and resname HIS and resid 117 and name NE2
 bond {
   action = *add
   atom_selection_1 = $zn_selection
   atom_selection_2 = $his117_selection
   distance_ideal = 2.1
   sigma = 0.02
# use slack=None if you _want_ to restrain, use large slack if not
   slack = 1
 }
}
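To see why a large slack effectively switches the restraint off, here is an illustrative sketch of a flat-bottomed harmonic restraint, assuming the usual form in which deviations up to ±slack from distance_ideal cost nothing and the excess beyond that is penalised as (excess/sigma)². It is a toy calculation in awk (bond_residual is hypothetical), not PHENIX's own code:

```shell
# Illustrative only: residual of a flat-bottomed harmonic bond restraint.
# Arguments: distance, distance_ideal, sigma, slack.
# Deviations within +/- slack cost nothing; beyond that the excess is
# penalised as ((excess)/sigma)^2, i.e. with weight 1/sigma^2.
bond_residual() {
  awk -v d="$1" -v ideal="$2" -v sigma="$3" -v slack="$4" 'BEGIN {
    delta = d - ideal
    if (delta < 0) delta = -delta
    excess = delta - slack
    if (excess < 0) excess = 0
    printf "%.4f\n", (excess / sigma) ^ 2
  }'
}

bond_residual 2.1 2.1 0.02 0   # → 0.0000 (at the ideal distance)
bond_residual 2.5 2.1 0.02 1   # → 0.0000 (0.4 A off, but inside the slack)
bond_residual 3.2 2.1 0.02 1   # → 25.0000 (0.1 A beyond the slack)
```

With sigma = 0.02 and slack = 1 as in the example above, the Zn-NE2 distance can drift by up to 1 Å without any penalty.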

Using dummy atoms to keep bulk solvent out of a region

Fill the space where the ligand is supposed to be with dummy atoms (DA), e.g. waters, all with zero occupancy. When you run phenix.refine with these dummy atoms, make sure you use the "refinement.mask.ignore_zero_occupancy_atoms=False" keyword. Also make sure you exclude the DA from coordinate refinement (refine.sites.individual="not xxx") and from ADP refinement (either refine.adp.individual="not xxx" or refine.adp.individual.isotropic="not xxx").
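If the dummy atoms are kept in their own PDB file, their occupancies can be zeroed with a small awk filter. This is a sketch assuming standard fixed-column PDB records (occupancy in columns 55-60); zero_occupancy and the file names in the usage line are hypothetical:

```shell
# Set the occupancy field (columns 55-60 of ATOM/HETATM records) to 0.00 for
# every atom in a PDB file, e.g. a file containing only dummy atoms.
# Other record types (CRYST1, REMARK, ...) are passed through unchanged.
zero_occupancy() {
  awk '
    /^(ATOM  |HETATM)/ {
      line = sprintf("%-80s", $0)      # pad short lines so column 60 exists
      line = substr(line, 1, 54) sprintf("%6.2f", 0) substr(line, 61)
      sub(/ +$/, "", line)             # drop the padding again
      print line
      next
    }
    { print }
  ' "$@"
}
```

Usage: zero_occupancy dummies.pdb > dummies_occ0.pdb, then combine the result with your model before running phenix.refine with the keywords above.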

You can use phenix.grow_density to generate dummy atoms in spheres of defined radius placed in defined points.

An experimental feature currently being worked on

If a significant part of the model is missing, you can try the undocumented option "use_statistical_model_for_missing_atoms=true" (you need the latest version for this). For some details see pages 17-19 in

Refinement with mmtbx.lockit

From RWGK's posting to phenixbb on Nov 14, 2010:

We have a tool for quick real-space refinement that's geared towards making the geometry ideal in the end. I'm not sure it is useful in your situation, but may be worth a try. It works like this:

mmtbx.lockit your.pdb your_refine_001_map_coeffs.mtz \
      map.coeff_labels.f=2FOFCWT,PH2FOFCWT \
      atom_selection='resname LIG'

It works in two stages. First it attempts to maximize the real-space weight allowing for a significant (but not totally unreasonable) distortion of the geometry. This is meant to move the ligand into the density. In the second stage it scales down the "best" real-space weight and runs a number of real-space refinements until the selected atoms do not move anymore. The expected result is nearly ideal geometry.

The procedure is usually very quick. If it turns out to be useful we could integrate it into phenix.refine, to be run after reciprocal-space refinement.

The mmtbx.lockit command is not as user-friendly as phenix.refine. It only works with mtz files, you have to manually specify the mtz labels, and the error messages may be unhelpful. Also be sure there is a valid CRYST1 card in your pdb file.
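Since a missing CRYST1 card is easy to overlook, a pre-flight check can save a confusing error message. A minimal sketch (require_cryst1 is a hypothetical helper); it only checks that a CRYST1 record is present, not that the cell and space group are correct:

```shell
# Succeed only if the PDB file contains a CRYST1 record; otherwise print an
# error to stderr and return 1. Presence is checked, not correctness.
require_cryst1() {
  if grep -q '^CRYST1' "$1"; then
    return 0
  fi
  echo "ERROR: no CRYST1 record in $1" >&2
  return 1
}
```

Usage, with the command from the posting above:
require_cryst1 your.pdb && mmtbx.lockit your.pdb your_refine_001_map_coeffs.mtz map.coeff_labels.f=2FOFCWT,PH2FOFCWT atom_selection='resname LIG'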


phenix.maps - a command line tool to compute various maps

Seems to have no specific documentation. Can do B-factor sharpening for improving low-resolution maps.

phenix.real_space_correlation - compute correlation between two maps

Can work with ensembles of structures. Seems to have no specific documentation. Can also calculate map CC for all atoms or per residue.


phenix.fobs_minus_fobs_map - calculate difference density

Seems to have no specific documentation.


phenix.grow_density - local density improvement

As originally described in Acta Cryst. (1997). D53, 540-543 (in development). There is a PDF file (or [3]) to explain some parameters of phenix.grow_density. It is very sketchy and may not be 100% up-to-date.

Defining several spheres where the DA (dummy atoms) are going to be placed is better than defining one large sphere, although it depends on the region size and shape. For example:

sphere {
  center = 21.698   7.730  33.974
  radius = 5
}
sphere {
  center = 23.483  10.877  35.583
  radius = 5
}
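When there are many sphere centres, the blocks can be generated from a plain list of coordinates. A sketch assuming one "x y z radius" line per sphere (coordinates and radius in Å); spheres_from_xyz and the file names are hypothetical:

```shell
# Turn lines of "x y z radius" into sphere { } blocks like the ones above,
# one block per input line; lines without exactly four fields are skipped.
spheres_from_xyz() {
  awk 'NF == 4 {
    printf "sphere {\n  center = %s %s %s\n  radius = %s\n}\n", $1, $2, $3, $4
  }' "$@"
}
```

Usage: spheres_from_xyz centres.txt > spheres.params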


with output=xplor produces an X-PLOR style map. Adding a PDB file will result in a masked map.


computes various arrays such as Fcalc, Fmask, Fmodel, Fbulk, and more.


  • File with reflection data (Fobs or Iobs), R-free flags, and optionally HL coefficients. It can be in almost any common format and spread across multiple files;
  • label(s) selecting which reflection data arrays should be used (only needed if the input file offers multiple choices);
  • PDB file with input model.

Usage examples:

  1. phenix.reciprocal_space_arrays model.pdb data.hkl f_obs_label="IOBS"
  2. phenix.reciprocal_space_arrays model.pdb data.hkl r_free_flags_label="FREE"

Output: MTZ file with data arrays.

NCS usage

phenix.find_ncs - identification of NCS operators

from protein coordinates (chains), heavy atom coordinates, or a density map. Example:

 phenix.find_ncs my_8_molecules.pdb

to get the NCS relationships in your structure into find_ncs.ncs_spec.

phenix.superpose_maps - transforms maps following a molecular superposition

Seems to have no specific documentation.

phenix.apply_ncs - applying NCS to a molecule to generate all NCS copies


 phenix.apply_ncs find_ncs.ncs_spec chainA.pdb

and it will generate the copies based on the NCS operators in find_ncs.ncs_spec.

torsion NCS


mmtbx.find_torsion_angle_ncs_groups model.pdb

This command will output which NCS groups the torsion NCS routine finds by the automated method.

Model analysis and manipulation

phenix.pdbtools - PDB model manipulations and statistics


phenix.pdbtools your_model.pdb  model_statistics=True

will show you complete statistics about B-factors and stereochemistry.

phenix.pdbtools your_model.pdb set_b_iso=25.3 selection="chain A and resname ALA and name CA" 

will set B=25.3 for all CA atoms of ALA residues in chain A.

phenix.pdb_interpretation - PDB bonds, distances, dihedrals, ...

phenix.pdb_interpretation model_1.pdb ligand.cif 

will result in an output file model_1.pdb.geo which contains ALL geometry information (bonds, angles, torsions, planarity, non-bonded ...) for each and every atom in your model.

phenix.reduce - tool for adding hydrogens to a PDB model

phenix.superpose_pdbs - Superposition of models

phenix.superpose_ligands - Superposition of ligands

Example files at [4]

phenix.get_cc_mtz_pdb - shift model to find origin

Assuming map_coeffs1.mtz corresponds to model_1.pdb,

  phenix.get_cc_mtz_pdb  map_coeffs1.mtz model_2.pdb

will create offset.pdb, a copy of model_2.pdb moved (using space-group symmetry plus allowed origin shifts) to the origin of map_coeffs1.mtz, so that it superimposes on model_1.pdb. This will not change the hand, however.

secondary structure analysis

phenix.ksdssp model.pdb

will output HELIX and SHEET records which you can paste into the PDB header. You should verify the assignments yourself, however, as it occasionally runs adjacent helices together.



starts the GUI and runs calculations resulting in a POLYGON drawing of important characteristics of your PDB file in relation to the data

phenix.validate_model and phenix.validate

are also GUI-only

phenix.ramalyze, phenix.rotalyze, and phenix.cbetadev


Prints out the worst contacts. The clash score should be below 20.


prints out R, Rfree, R-Rfree histograms based on PDB structures. If run without parameters, prints out helpful text about its usage.

Other programs

phenix.tls - tool to convert between total and residual ADPs

It can recognize Refmac and phenix.refine formats of TLS records in PDB files.

phenix.tls model.pdb combine_tls=true

will combine TLS from PDB file header with 'residual' B from ATOM records.

phenix.tls model.pdb extract_tls=true

will split the total B-factor in ATOM records into TLS component and 'residual' part.
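The bookkeeping behind combine_tls and extract_tls can be summarised as follows, assuming the usual additive decomposition into a TLS part and a residual part (U are 3×3 anisotropic displacement tensors, B the isotropic-equivalent B-factors). Because the trace is linear, the same split holds for B:

```latex
U_{\mathrm{total}} = U_{\mathrm{TLS}} + U_{\mathrm{residual}}, \qquad
B = 8\pi^{2} U_{\mathrm{eq}}, \quad U_{\mathrm{eq}} = \tfrac{1}{3}\operatorname{tr}(U)
\;\Rightarrow\;
B_{\mathrm{total}} = B_{\mathrm{TLS}} + B_{\mathrm{residual}}
```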

Tips and Tricks

A handy tip: to check the syntax of a Phenix parameter file (for any program, not just phenix.refine), you can run this command (replacing params.eff with the file of interest):

libtbx.phil params.eff

If it works, it will just print out the parameters - if not, the error message should give some indication where the error occurred.
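A missing closing brace is the most common way to break a parameter file. Before calling libtbx.phil, a crude brace count can point at the offending line. This is a sketch only: check_braces is hypothetical, it ignores text after "#", and it does not understand quoted strings, so libtbx.phil remains the real test.

```shell
# Crude brace-balance check for a PHIL file: report a "}" without a matching
# "{" (with its line number) and the number of "{" still open at the end.
# Returns 0 if balanced, 1 otherwise. Quoted strings are not handled.
check_braces() {
  awk '
    {
      line = $0
      sub(/#.*/, "", line)             # ignore comments
      for (i = 1; i <= length(line); i++) {
        c = substr(line, i, 1)
        if (c == "{") depth++
        else if (c == "}") {
          depth--
          if (depth < 0) {
            printf "line %d: unmatched }\n", FNR
            bad = 1
            depth = 0
          }
        }
      }
    }
    END {
      if (depth > 0) { printf "%d unclosed {\n", depth; bad = 1 }
      exit (bad ? 1 : 0)
    }
  ' "$1"
}
```

Usage: check_braces params.eff && libtbx.phil params.eff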

See also

Phenix home page

Phenix mailing list

PHENIX Newsletter

  • 42 pages of general introduction to structure refinement: [5]
  • 45 pages of phenix.refine overview (including extended details about its use from the command line): [6]
  • 42 pages of "Some Facts About Maps": [7]
  • 50 pages of "Crystallographic Structure Validation": [8]
  • 31 pages of introduction to PHENIX: [9]

server producing custom RNA/DNA base pairing restraints


  • electronic Ligand Builder and Optimization Workbench (eLBOW): a tool for ligand coordinate and restraint generation. Nigel W. Moriarty, Ralf W. Grosse-Kunstleve and Paul D. Adams, Acta Cryst. (2009). D65, 1074-1080
  • phenix.model_vs_data: a high-level tool for the calculation of crystallographic model and data statistics. Afonine PV, Grosse-Kunstleve RW, Chen VB, Headd JJ, Moriarty NW, Richardson JS, Richardson DC, Urzhumtsev A, Zwart PH, Adams PD. (2010) J Appl Crystallogr. 43, 669-676. [10]