Although density modification (solvent flattening, non-crystallographic symmetry, phase extension, histogram matching, etc.) has been a very powerful tool, its potential is much greater than has been achieved so far. There are two reasons for this:
RESOLVE uses a statistical approach to density modification, while other methods use an approach in which a map is modified to meet expectations and the new phases are recombined with experimental phases. For the mathematical details, see the references for RESOLVE. You might also wish to see the discussion and extensions in Kevin Cowtan's article "Gaussian Likelihoods in real and reciprocal space" in the CCP4 newsletter.
What is the optimal relative weighting of modified and experimental phases? | Incorrect relative weighting means that the final results will not be optimal. Incorrect weighting terms mean that the final figures of merit are almost always inflated. |
When do you stop iterating? | In some approaches the maps initially get better, then get worse unless you stop. |
Density modification can be thought of as a way to adjust crystallographic phases (or amplitudes) to make them simultaneously consistent with the experimental data and with our expectations of what an electron density map should look like. The statistical approach is a mathematical way to formulate this statement. By using this formulation, the weighting factors and problems with convergence are taken care of automatically.
In RESOLVE, any set of structure factor amplitudes and phases has an associated probability composed of two simple parts:
The probability of the experimental phases | This is the probability that you would have observed your experimental data if this set of phases (and amplitudes) were correct |
The probability of the map | This is the probability that the electron density map calculated from this set of phases is drawn from the set of plausible electron density maps for this structure |
RESOLVE adjusts your crystallographic phases so as to maximize the total (posterior) probability of those phases. The mathematics is a little complicated but the idea is very simple. To see the mathematics in detail, have a look at T. C. Terwilliger (2000) "Maximum likelihood density modification," Acta Cryst. D56, 965-972.
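The idea of combining two sources of phase probability can be illustrated for a single reflection. The sketch below is a toy, not RESOLVE's implementation: it assumes the two parts combine as a simple product over a sampled phase circle, and the function names and the von Mises shapes used as inputs are hypothetical choices made only for illustration. The centroid phase and figure of merit are computed in the usual way, as the direction and length of the probability-weighted mean of exp(i*phase).

```python
import cmath
import math


def von_mises(center_deg, kappa, n=72):
    """Toy unimodal phase distribution sampled every 360/n degrees (assumed shape)."""
    step = 360.0 / n
    values = [math.exp(kappa * math.cos(math.radians(i * step - center_deg)))
              for i in range(n)]
    total = sum(values)
    return [v / total for v in values]


def combine_phase_probabilities(p_experimental, p_map):
    """Multiply two sampled phase distributions point by point and renormalize."""
    product = [a * b for a, b in zip(p_experimental, p_map)]
    total = sum(product)
    return [p / total for p in product]


def centroid_phase_and_fom(probabilities):
    """Return (centroid phase in degrees, figure of merit) of a sampled distribution."""
    step = 360.0 / len(probabilities)
    centroid = sum(p * cmath.exp(1j * math.radians(i * step))
                   for i, p in enumerate(probabilities))
    return math.degrees(cmath.phase(centroid)) % 360.0, abs(centroid)


p_exp = von_mises(40.0, 1.0)  # broad, hypothetical experimental phase information
p_map = von_mises(70.0, 3.0)  # sharper, hypothetical map-based phase information
posterior = combine_phase_probabilities(p_exp, p_map)
phase, fom = centroid_phase_and_fom(posterior)
print(f"phase = {phase:.1f} deg, figure of merit = {fom:.2f}")
```

In this toy case the combined phase falls between the two input centers and the figure of merit is higher than for the experimental distribution alone, which is the qualitative behavior expected when two consistent sources of information are combined.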
Note on terminology: The approach used by RESOLVE is now called "Statistical density modification," a name suggested by Kevin Cowtan. It used to be called "Maximum-likelihood density modification", using the term "likelihood" in a colloquial sense of probability. The old name (as pointed out by Gerard Bricogne and others) is confusing because the maximum-likelihood method is a specific technique that uses a specific definition of "likelihood" that is not used in this approach. Sorry to all for the confusion, and hoping that it will now be more clear. The mathematics remains exactly the same.
Density modification is usually thought of as a process that is carried out on an experimental electron density map prior to model building, but iterative model-building methods such as ARP/wARP can also be thought of as density modification techniques. With the statistical approach, partial model information can be seamlessly incorporated into the total expression for the probability of the phases. This allows a hierarchical approach to incorporating information about phase probability:
The current version of RESOLVE can incorporate all of these types of information.
RESOLVE carries out density modification on several levels:
Each "mask cycle":
RESOLVE carries out mask cycles (up to 5) until no further changes occur in the phases.
If NCS is present, then RESOLVE carries out an initial mask cycle, not including any NCS, to estimate uncertainties in density estimated from NCS copies. Then RESOLVE carries out another initial mask cycle, using NCS but not solvent flattening, to estimate "sigma", the overall error in the map.
If "use_input_solv" is not set and "hklstart" is not specified, then RESOLVE uses the R factor to estimate the solvent content of the crystal. Solvent contents from 0.1 to 0.9 are tested, and the value leading to the minimum R is chosen. This optimal solvent content is written to the file "resolve.solvent". Note: if "use_input_solv" is specified, then RESOLVE assumes that the solvent content is already known and reads it from "solvent_content" if specified, or else from "resolve.solvent" if present, or else uses the default (0.40).
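The selection logic described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the 0.1 step between trial solvent contents is an assumption (the text gives only the range 0.1 to 0.9), and the R-factor calculation itself is represented by a caller-supplied function rather than by anything RESOLVE actually exposes.

```python
def choose_solvent_content(r_factor_for, candidates=None):
    """Return the trial solvent fraction that gives the lowest R factor.

    r_factor_for is a caller-supplied function (hypothetical here) that maps a
    solvent fraction to an R factor.
    """
    if candidates is None:
        # Assumed grid: 0.1, 0.2, ..., 0.9 (the step size is not stated in the text).
        candidates = [round(0.1 * i, 1) for i in range(1, 10)]
    return min(candidates, key=r_factor_for)


def known_solvent_content(keyword_value=None, file_value=None, default=0.40):
    """Fallback order described for use_input_solv.

    keyword_value: value given with "solvent_content", if any.
    file_value: value already read from "resolve.solvent", if that file exists.
    """
    if keyword_value is not None:
        return keyword_value
    if file_value is not None:
        return file_value
    return default


# Toy R-factor curve with its minimum near 0.52 (made-up numbers).
best = choose_solvent_content(lambda s: 0.30 + (s - 0.52) ** 2)
print(best)
```

The same "try each option, keep the one with the lowest R" pattern applies to any discrete set of trial values.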
RESOLVE also uses the R-factor to identify which histogram of solvent densities and protein densities to use in density modification. The file "rho.list" in $SOLVEDIR/segments/ contains several histogram profiles, all based on model electron density maps. These are at resolutions from 1.2 A to 4 A. RESOLVE carries out a test of each histogram initially and chooses the one leading to the lowest R factor. The histogram can be set using "database". The optimal database entry is written to "resolve.database".
RESOLVE estimates the optimal smoothing radius using a simple formula. For cycles where no density modification has occurred yet (normally the first cycle, unless "phases_from_resolve" has been set), the radius R is set with the equation: R = 2.41 (dmin)**0.9 (fom)**-0.26. For all other cycles (after density modification has begun), the smoothing radius is 4 A. These values can also be set with "wang_radius", "wang_radius_cycle", "wang_radius_start", or "wang_radius_finish".
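The smoothing-radius rule above is simple enough to write down directly. In this minimal sketch the function name and the density_modified flag are hypothetical, and dmin is assumed to be the high-resolution limit in A, with fom the overall figure of merit.

```python
def smoothing_radius(dmin, fom, density_modified=False):
    """Smoothing radius in A, following the formula quoted in the text.

    Before any density modification: R = 2.41 * dmin**0.9 * fom**-0.26.
    After density modification has begun: a fixed 4 A.
    """
    if density_modified:
        return 4.0
    return 2.41 * dmin ** 0.9 * fom ** -0.26


# Hypothetical data set: 2.5 A resolution, starting figure of merit 0.5.
print(round(smoothing_radius(2.5, 0.5), 2))
print(smoothing_radius(2.5, 0.5, density_modified=True))
```

Because the exponent on fom is negative, poorer starting phases (lower fom) give a larger radius, and lower-resolution data (larger dmin) do the same.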
If "n_restore" is set by the user to be non-zero (default = 0), then after the phases have converged, the whole process is repeated again, starting with the original phases, but using the current probabilistic solvent mask. This allows an optimized mask to be used in the "first" cycle of density modification.
Electron density maps obtained using phases calculated from atomic models often show peaks at the coordinates of atoms in the models, even when those atoms are incorrectly placed. This effect can be reduced by careful weighting such as can be accomplished by Randy Read's SIGMAA approach, but it cannot be eliminated unless the phases are changed.
Prime-and-switch phasing is a way to remove model bias by using statistical density modification, but without including the phase information coming from the model once an initial map has been calculated.
The basic procedure is simple:
The initial biased phase information from the model is required to get the procedure going. The final phases are essentially unbiased by the model because they are based on the features of the map, not on the prior phase probabilities.
The final phases are generally improved the most when:
There are some ways that prime-and-switch phasing can have residual bias:
There are some cases where prime-and-switch phasing does not yield a nice-looking map:
Non-crystallographic symmetry is an important source of information about the probability of an electron density map. RESOLVE can begin with transformation matrices and an estimate of the center-of-mass of molecule 1 that you input. RESOLVE can also figure out the transformations and center-of-mass automatically from the NCS in heavy-atom sites in a PDB file (if the default file "ha.pdb" exists and you don't specify NCS transformations, RESOLVE will try to find the NCS in those sites). RESOLVE can figure out the region over which to apply the NCS relationships automatically. You can help it by restricting the region to search for NCS with the keyword "ncs_domain_pdb xxxx.pdb" and supplying a PDB file that contains dummy atoms in the region where NCS exists (all copies must be supplied in the PDB file).
See also the sample script at resolve_sample_scripts
NOTE: If there is more than one type of NCS relationship in the crystal, RESOLVE can carry out NCS averaging separately for each group of molecules related by an NCS relationship. If you use this, you should specify the maximum extent of the region over which to apply each NCS relationship with the ncs_domain_pdb command. In that case RESOLVE can refine the NCS operators using only the part of the molecule that you specify (and not be confused by some other region that is part of another NCS group). If the regions for several NCS groups overlap, the NCS group that will be used for the overlapping points in NCS averaging will be whichever NCS group has the higher NCS correlation near those points.
NOTE: If you want to input the NCS operators manually, then the keywords rota_matrix, tran_orth, center_orth (see the list of resolve keywords) are useful. For example:
(Mapping molecule j onto molecule 1)
(As input)
Operator # 1
New X-prime= 1.0000 X + 0.0000 Y + 0.0000 Z + 0.0000
New Y-prime= 0.0000 X + 1.0000 Y + 0.0000 Z + 0.0000
New Z-prime= 0.0000 X + 0.0000 Y + 1.0000 Z + 0.0000
Approximate center_of_mass of this object (from center_of_mass of
object 1 and NC symmetry) is -28.39 18.81 -16.36
Operator # 2
New X-prime= -1.0000 X + -0.0035 Y + -0.0040 Z + -4.1335
New Y-prime= 0.0043 X + -0.1070 Y + -0.9943 Z +-21.6192
New Z-prime= 0.0031 X + -0.9943 Y + 0.1070 Z +-19.2559
Approximate center_of_mass of this object (from center_of_mass of
object 1 and NC symmetry) is 24.44 -7.12 -39.79
would be input as:
rota_matrix 1.0000 0.0000 0.0000
rota_matrix 0.0000 1.0000 0.0000
rota_matrix 0.0000 0.0000 1.0000
tran_orth 0.0000 0.0000 0.0000
center_orth -28.3915 18.8125 -16.3621
rota_matrix -1.0000 -0.0035 -0.0040
rota_matrix 0.0043 -0.1070 -0.9943
rota_matrix 0.0031 -0.9943 0.1070
tran_orth -4.1335 -21.6192 -19.2559
center_orth 24.4419 -7.1171 -39.7930
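As a cross-check on the example above, the following sketch applies operator 2 to the center of mass of molecule 2 and confirms that it lands on the center of mass of molecule 1, as expected for an operator that maps molecule j onto molecule 1. It also formats the operator as keyword lines in the layout shown. The helper names are hypothetical and the numbers are copied from the example.

```python
def apply_operator(rotation, translation, point):
    """Return rotation * point + translation for a 3x3 matrix and 3-vectors."""
    return tuple(
        sum(r * x for r, x in zip(row, point)) + t
        for row, t in zip(rotation, translation)
    )


def operator_keywords(rotation, translation, center):
    """Format one NCS operator as rota_matrix / tran_orth / center_orth lines."""
    lines = ["rota_matrix " + " ".join(f"{v:.4f}" for v in row) for row in rotation]
    lines.append("tran_orth " + " ".join(f"{v:.4f}" for v in translation))
    lines.append("center_orth " + " ".join(f"{v:.4f}" for v in center))
    return lines


rotation_2 = [
    (-1.0000, -0.0035, -0.0040),
    (0.0043, -0.1070, -0.9943),
    (0.0031, -0.9943, 0.1070),
]
translation_2 = (-4.1335, -21.6192, -19.2559)
center_1 = (-28.3915, 18.8125, -16.3621)
center_2 = (24.4419, -7.1171, -39.7930)

# Operator 2 should carry the center of molecule 2 onto the center of molecule 1.
mapped = apply_operator(rotation_2, translation_2, center_2)
assert all(abs(m - c) < 0.01 for m, c in zip(mapped, center_1))

print("\n".join(operator_keywords(rotation_2, translation_2, center_2)))
```

A check of this kind is a quick way to catch a transposed matrix or an operator entered in the wrong direction before running NCS averaging.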
RESOLVE can use the local patterns of density in your electron density map in statistical density modification to improve crystallographic phases. The basic idea is that on a local level (within a sphere of radius 2 A) there are patterns of electron density that are associated with high density at the center of the pattern, and other patterns associated with low density at the center. RESOLVE goes through your electron density map, and at each point it compares the nearby density with a set of 20 templates (it does not use the density at the point of interest or right around it in this analysis). RESOLVE_PATTERN uses this analysis to come up with a new estimate of the density at each point in the map. This new estimate of density (the "image") has the remarkable property that errors in the image are almost uncorrelated with errors in the map used to create it. This means that phase information from the "image" can be combined with phase information from other sources in a simple way. You can see the details of all this in Terwilliger, T. C. (2003) Statistical density modification using local pattern matching. Acta Cryst. D59, 1688-1701.
The resolve_build script uses image-based phasing with pattern matching and fragment identification, alternating with model-building and standard density modification. Image-based phasing is the use of an electron density map that typically comes from either an atomic model or from pattern-matching or from NCS, along with observed values of FP, to estimate phases. The process results in phases and figures of merit similar to those obtained with Randy Read's SIGMAA, but the values come directly from map-probability phasing. The electron density map provided is used as a target for statistical density modification: crystallographic phases are found that, when combined with observed amplitudes, give a map that is as close as possible to the target map. The figures of merit reflect how precisely each phase can be determined using this approach. The phases from image-based phasing are not the same as those from an FC calculation and they are not always unimodal like FC, SIGMAA or Sim-weighted phases.
The resolve_pattern script also carries out image-based phasing, but it differs from the resolve_build script in that it does not alternate it with building a model, and in that it only uses patterns and not fragment identification.
RESOLVE can carry out an FFT-based search for fragments of structure (currently helices and strands), refine the locations of these fragments, and use them in density modification even if a complete model cannot be built. The approach to finding fragments ("Maximum-likelihood density modification with pattern recognition of structural motifs", Terwilliger, T., Acta Cryst. D57, 1755-1762; 2001) is very similar to Kevin Cowtan's FFT-based search (Cowtan, K., Acta Cryst. D54, 750-756, 1998). A template consisting of averaged helical density (or strand density) is rotated over a range of orientations designed to cover most possibilities within about 20 degrees, and an FFT convolution is carried out for each orientation to find locations where the template and map match. The best matches are identified and the orientations and positions are refined. Then a pseudo-map is constructed consisting of the original templates, oriented based on the refined positions found in the search, and weighted by the local correlation coefficient. This pseudo-map is used as a source of phase information through map-probability phasing ("Map-likelihood phasing", Terwilliger, T., Acta Cryst. D57, 1763-1775). This approach is similar to the one described in the original publication cited above but works much better than the original method.
Fragment identification is normally carried out right after model-building because the same FFT search can be used for both. The RESOLVE build script includes it.
After the completion of density modification, RESOLVE builds a model of your structure. For versions 2.02 and higher, model-building needs sequence information from you. You specify a file with the keyword "seq_file", and RESOLVE expects a sequence of amino acids in 1-letter format. If there is more than one type of chain, RESOLVE expects them separated by a line containing ">>>". Typically RESOLVE can build 70-90% of the residues for a good map at 2-3 A resolution. You can tell if the model is correct by noting how good the match is to the sequence and by noting the NCS correspondence among chains (if NCS exists). The PDB file that RESOLVE writes out will contain the model, followed at the end by HETATM records with the heavy-atom sites from the SOLVE output file ha.pdb.
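A sequence file of the kind described above could be assembled as in the following sketch. The function name and the example sequences are made up, and details not stated in the text (such as whether lower-case letters or wrapped lines are accepted) are assumptions: here each chain type is written in upper case on a single line, with a ">>>" line between chain types.

```python
def format_seq_file(chains):
    """Return seq_file text: one-letter sequences separated by '>>>' lines.

    chains: one sequence string per type of chain. Whitespace inside each
    string is removed and letters are upper-cased (an assumption, not a
    documented requirement).
    """
    cleaned = ["".join(chain.split()).upper() for chain in chains]
    return "\n>>>\n".join(cleaned) + "\n"


# Two hypothetical chain types (sequences invented for illustration).
text = format_seq_file(["MKTAYIAKQR QISFVKSHFS", "gshmledp"])
print(text, end="")
```

The resulting text would be written to a file whose name is then given with the "seq_file" keyword.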
You can read all the details about RESOLVE automated model-building in Terwilliger, T. C. (2003). Automated main-chain model-building by template-matching and iterative fragment extension. Acta Cryst. D59, 38-44 and Terwilliger, T. C. (2003). Automated side-chain model-building and sequence assignment by template-matching. Acta Cryst. D59, 45-49.
RESOLVE now has superquick model building! The standard RESOLVE model-building for version 2.05 and higher is about 3 times faster than earlier versions. This is made possible by a more selective choice of which fragments to consider extending (no need to work on a fragment that covers a region that is already built). Versions 2.05 and higher also have the option of "superquick_build" which is about 10 times faster than previous versions of RESOLVE model-building. For a very good map (one where RESOLVE can build >80% of the model) superquick_build typically gives almost the same model as the standard build. For a moderate-quality map, the standard build or even the "thorough_build" may give up to 10% more model built.
RESOLVE versions 2.05 and higher include cycles of model-building in which the thresholds for fit of the model to the map are sequentially lowered. This allows much more of the model to be built, while keeping the accuracy of most of the model high. You can use "aggressive_build" to try and build as much as possible, or "conservative_build" to build only the best parts.
RESOLVE versions 2.06 and higher include the capability of identifying fragments (helices and strands) in a map and including them in density modification.
RESOLVE builds a model in the following way.
RESOLVE (versions 2.06 and higher) can carry out pattern identification, fragment identification, density modification, and iterative model-building and refinement in combination with refmac5 (versions 5.1.24 and higher only!).
RESOLVE (versions 2.03 and higher) can also carry out iterative model-rebuilding. This is like model-building except that you start with just a model of some kind and measured amplitudes and RESOLVE does everything from there. This works much more slowly than model-building with experimental phases.
RESOLVE_BUILD (versions 2.06 and higher) can automatically evaluate a model, given a set of amplitudes FP (and phases PHIB and FOM if available). First RESOLVE will rebuild the model (to reduce any bias due to refinement). Then RESOLVE will calculate a prime-and-switch composite omit map (as used in rebuilding) based on the rebuilt model and any phase information you give it. Then RESOLVE will compare the original model to this map and summarize the fit for you.
RESOLVE (versions 2.08 and higher) can carry out fitting of FLEXIBLE LIGANDS to an electron density map. The only inputs needed are an electron density map (or difference map), and either just one (recommended) or else 5-10 copies (ok also) of the ligand in random but stereochemically ideal conformations in a PDB format file. The routine will figure out the allowed bond rotations from the copies of the ligand, and then will fit the ligand into the density starting with the biggest rigid part of the ligand. Parts of the ligand that do not fit are built as reasonably as possible, but may be built out of density or may be left off.
You can use the sample script resolve_ligand_fit.com, which allows you to find one or more copies of a ligand in a map.
You can even take a list of PDB files containing different ligands, fit each one to your map, and score them to identify which ligand may be bound, using the sample script resolve_ligand_id.com.
See the additional descriptions in resolve_sample_scripts too.
Also see the list of resolve keywords for additional options.
Thanks to Herb Klei for emphasizing the need for ligand fitting and for suggesting the idea of first finding the biggest fixed part of the ligand and then building the rest from this core!
RESOLVE (versions 2.08 and higher) will automatically merge NCS-related copies of your model during iterative model-building and refinement. The merging is done in the "extend_only" mode of model-building. An mtz file with FP PHIB FOM, a model (with >1 NCS copy) and a coordinate file with positions of atoms or pseudo-atoms (ha_file) used to deduce the NCS relationships are read in. The coordinates of each NCS-related copy are placed at all NCS-related positions, merged (if possible) and then are extended if possible into the density. If you do not want this to be done, use the flag no_merge_ncs_copies. You can merge models yourself with RESOLVE too: use the extend_only flag for model-building and include your model with: pdb_in your-current-model . Note that you need to supply an mtz file with a map to do this. You can specify the keywords trim or no_trim to tell RESOLVE to trim the resulting model back to the density or not.