The routine SOLVE: the core of automated structure determination by SOLVE
The SOLVE routine is an exceptionally powerful feature of this package that can find and evaluate the quality of heavy-atom sites in a MIR, SIR, or MIR-like dataset. The SOLVE routine treats MAD data almost exactly like MIR data, beginning with the output from MADMRG and MADBST.
Ordinarily SOLVE is called after SCALE_MAD and ANALYZE_MAD or SCALE_MIR and ANALYZE_MIR as part of automated structure determination. In this case you don't have to worry about all the keywords because the previous routines figure them out for you and write them to the script file solve_mad.script (or solve_mir.script).
You can, however, control much of what SOLVE does by setting keywords before running it. SOLVE can also be called using the solve_mad.script or solve_mir.script file written out by ANALYZE_MAD or ANALYZE_MIR, or an edited version of that file.
For MAD datasets, SOLVE uses a "compressed" form of MAD data that can be analyzed much more rapidly than the full n-wavelength data. This compressed dataset is generated by MADMRG in ANALYZE_MAD. It essentially consists of the SIR + anomalous scattering equivalent of the full MAD dataset, and it can be used to refine heavy-atom parameters and generate native phases more quickly than the full MAD dataset can. At the conclusion of SOLVE, phases are calculated with full Bayesian correlated MAD phasing.
The SOLVE routine operates by using a new version of HASSP to generate a few or many possible "seed" solutions for the anomalously scattering atoms in the structure. The heavy-atom parameters in each seed are first refined using the very fast refinement procedure in HEAVY (origin-removed Patterson refinement). The refined seed is then used in self-difference Fouriers to suggest possible additional sites. A number of solutions are scored based on each seed, each solution being evaluated based on both the difference Patterson and a "free" difference Fourier. Additionally, the non-randomness of the native Fourier is used to judge the quality of a solution and to identify the correct hand of the structure if anomalous data are present. The figure of merit of phasing is the final scoring criterion.
If desired, a solution may be read in and evaluated directly with ANALYZE_SOLVE. Also, a solution may be read in and used as a seed in generating additional sites and a more complete solution with ADDSOLVE.
Using SOLVE is quite easy, particularly since ANALYZE_MAD or ANALYZE_MIR writes out a script file (usually solve_mad.script or solve_mir.script) that has everything you need to run SOLVE.
The one non-obvious thing you need to know about running SOLVE on MAD data is that it requires two input data files. One is the compressed datafile from MADMRG, usually called "solve.data". The other is the full MAD dataset, usually called "mad_fpfm.scl". SOLVE uses "solve.data" for most of its analyses, then switches to the full MAD dataset at the very end.
The way you enter information on scattering factors in the SOLVE routine is a little different from the way it is entered in SCALE_MAD and ANALYZE_MAD. In the SOLVE routine you define an atom type for each wavelength and specify the scattering factors for that atom type. Then you tell SOLVE which atom type goes with which wavelength. In SCALE_MAD, in contrast, you specify scattering factors directly for each wavelength. The reason for the difference is that SOLVE has to deal with both MAD and MIR data, and defining atom types is a simple way to do that.
The solve_mad.script control file for MAD data
A sample SOLVE script file that will give you an idea of what you need to specify and what other things you can specify follows. This script is an edited version of a script file written out by the ANALYZE_MAD routine.
This script file is written out during automated SOLVE operation. You may wish to edit the one SOLVE has written out for you and use it if:
!------------------solve_mad.script: solve a MAD problem----------------------
@solve.setup
LOGFILE solve.logfile
INFILE solve.data !input file with MADMRG-compressed data
MADFPFMFILE mad_fpfm.scl !input file with full MAD dataset
JSTD 1 ! Lambda 1 is reference wavelength used in MADMRG
IMADPHASE 1 ! this is a MAD dataset, reference
! wavelength is #1 (should match jstd)
NNATF 1 ! Pseudo-native F is column 1 of solve.data
NNATS 2 ! sigma is column 2
! Atom definitions with f' and f" values for the 3 wavelengths:
NEWATOMTYPE LAM1
AVAL 17.0006 5.8196 3.9731 4.3543
BVAL 2.4098 .2726 15.2372 43.8163
CVAL 2.8409
FPRIMV -1.6
FPRPRV 3.4
NEWATOMTYPE LAM2
AVAL 17.0006 5.8196 3.9731 4.3543
BVAL 2.4098 .2726 15.2372 43.8163
CVAL 2.8409
FPRIMV -8.5
FPRPRV 4.8
NEWATOMTYPE LAM3
AVAL 17.0006 5.8196 3.9731 4.3543
BVAL 2.4098 .2726 15.2372 43.8163
CVAL 2.8409
FPRIMV -9.85
FPRPRV 2.86
LAMBDA 1 ! This is wavelength #1
LABEL Wavelength 1 from MADMRG ! label for lambda 1
NCOLFBAR 3 ! Ncolfbar...ncolsdelf are column #'s
NCOLSFBAR 4 ! in solve.data (MADMRG-compressed)
NCOLDELF 5 ! datafile
NCOLSDELF 6
INPHASE
INANO
NOREFINESCALE ! Don't refine overall scale factor
! because this is MADMRG data
! Information for MADPHASE:
NCOLFPLUS 1 ! these 4 column numbers refer to the
NCOLSIGPLUS 2 ! full MAD datafile (mad_fpfm.scl)
NCOLFMINUS 3
NCOLSIGMINUS 4
! Heavy atoms for this wavelength:
ATOMNAME LAM1 ! "LAM1" tells the program to use
OCCUPANCY .1 ! the scattering factors input above for
BVALUE 35.0 ! LAM1
REFINEALL ! the occupancy and b values are guesses
LAMBDA 2
LABEL Wavelength 2 from MADMRG
NCOLFBAR 3
NCOLSFBAR 4
NCOLDELF 5
NCOLSDELF 6
INPHASE
INANO
! Information for MADPHASE:
NCOLFPLUS 5
NCOLSIGPLUS 6
NCOLFMINUS 7
NCOLSIGMINUS 8
! Heavy atoms for this derivative/wavelength:
ATOMNAME LAM2
LAMBDA 3
LABEL Wavelength 3 from MADMRG
NCOLFBAR 3
NCOLSFBAR 4
NCOLDELF 5
NCOLSDELF 6
INPHASE
INANO
! Information for MADPHASE:
NCOLFPLUS 9
NCOLSIGPLUS 10
NCOLFMINUS 11
NCOLSIGMINUS 12
! Heavy atoms for this derivative/wavelength:
ATOMNAME LAM3
! Information for HASSP and SOLVE
NCOLFHCOS 9 ! column #s for <fh cos theta>
NCOLFHSIN 10 ! and <fh sin theta> in solve.data
PATTFFTFILE patterson.patt ! name of Bayesian patterson calculated
! by MADBST
SOLVE ! run SOLVE
!---------------------------------------------------------------------------
The solve_mir.script control file for MIR data
Using SOLVE is quite easy with MIR data too, particularly since ANALYZE_MIR writes out a script file that has everything you need to run SOLVE. A sample SOLVE script file that will give you an idea of what you need to specify and what other things you can specify follows. This script is an edited version of a script file written out by the ANALYZE_MIR routine.
This script file is written out during automated SOLVE operation. You may wish to edit the one SOLVE has written out for you and use it if:
!------------------solve_mir.script: solve an MIR problem----------------------
@solve.setup
LOGFILE solve.logfile
INFILE mir_fbar.scl !input file with Fnat,sig, and
!(fbar,sig,delano,sig) for each derivative..
NNATF 1 ! Native F is column 1 of mir_fbar.scl
NNATS 2 ! sigma is column 2
Derivative 1 ! begin information about derivative 1
LABEL deriv 1 HG ! label for deriv 1
NCOLFBAR 3 ! Ncolfbar...ncolsdelf are column #'s
NCOLSFBAR 4 ! in mir_fbar.scl datafile
NCOLDELF 5
NCOLSDELF 6
INANO ! include anomalous differences
! Heavy atoms for this derivative:
ATOMNAME HG ! the atom type is "HG"
OCCUPANCY .1 ! guess for occupancy
BVALUE 35.0 ! guess for bvalue
REFINEALL ! refine everything that is reasonable
Derivative 2 ! begin information about derivative 2
LABEL deriv 2 Iodine ! label for deriv 2
NCOLFBAR 7 ! Ncolfbar...ncolsdelf are column #'s
NCOLSFBAR 8 ! in mir_fbar.scl datafile
NCOLDELF 9
NCOLSDELF 10
INANO ! include anomalous differences
ATOMNAME I- ! the atom type is "I-"
OCCUPANCY .1 ! guess for occupancy
BVALUE 35.0 ! guess for bvalue
REFINEALL ! refine everything that is reasonable
SOLVE ! run SOLVE
!---------------------------------------------------------------------------
There are a lot of keywords that can affect what SOLVE does. Ordinarily you do not have to worry about most of them, because the solve_mad.script file written out by ANALYZE_MAD or the solve_mir.script file written by ANALYZE_MIR will have most of these keywords set for you. The keywords are listed here so that you can understand what they do and set them yourself if you want to.
Most of these keywords can be specified at the beginning of automated data analysis to control what happens when SOLVE is called. For example, typing "ntopsolve 2" in the keywords before running SCALE_MAD and ANALYZE_MAD will affect SOLVE when it is called by restricting the number of solutions analyzed at the end of the routine to 2.
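As a sketch of where such a keyword goes, the fragment below places "ntopsolve 2" ahead of the automated routines. It assumes that SCALE_MAD and ANALYZE_MAD are started by naming them on their own line, in the same way that SOLVE is started in the sample scripts above. The data and wavelength keywords those routines need are not covered in this section and are only marked by a placeholder comment.

```text
@solve.setup
! ... input-data keywords needed by SCALE_MAD and ANALYZE_MAD go here (not shown)
ntopsolve 2        ! restrict SOLVE to analyzing 2 solutions at the end
SCALE_MAD          ! assumed invocation, by analogy with the "SOLVE" line
ANALYZE_MAD        ! assumed invocation; writes solve_mad.script
SOLVE              ! run SOLVE; the ntopsolve setting above applies here
```

The keyword is set once, before the first routine runs, and is meant to carry through to SOLVE when it is called.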
SOLVE treats MAD phasing and MIR phasing in almost exactly the same way except at the very end of the routine. Consequently "derivative" and "lambda" have the same meaning to SOLVE. You can enter information about lambda 1 by typing "lambda 1" or "derivative 1". The keywords that are specific to MAD phasing are listed at the top of the list.
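To illustrate the equivalence, the two alternative spellings below should be read by SOLVE as the same thing. The first is copied from the sample solve_mad.script; the second only swaps the keyword. This is a pair of alternatives for comparison, not a script to be run as is.

```text
! As written in the sample solve_mad.script:
LAMBDA 2
LABEL Wavelength 2 from MADMRG

! Equivalent spelling, following the rule stated above:
DERIVATIVE 2
LABEL Wavelength 2 from MADMRG
```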
Keywords that have a meaning for MAD data but not for MIR data:
INFILE xxx.data       Principal input dorgbn-style file with compressed MAD data
                      from MADMRG and optional additional columns of data
                      (usual file name = "solve.data"). This file is usually
                      produced by ANALYZE_MAD.
MADFPFMFILE yyy.scl   Additional input file, yyy.scl, with (F+, sigma, F-,
                      sigma) for each wavelength. This file is used at the
                      very end of SOLVE for Bayesian correlated MAD phasing if
                      the keyword "bayes" is set in ANALYZE_MAD or the keyword
                      "imadphase n" is set in SOLVE. All the wavelengths have
                      "inphase" specified for this work.
                      (DEFAULT = "mad_fpfm.scl")
JSTD n                Wavelength to be used as reference (default = lowest
                      wavelength)
IMADPHASE n           This is a MAD dataset; n should match JSTD n
NOREFINESCALE         Usually include this for all wavelengths, because the
                      refinements in SOLVE are based on MADMRG output, which
                      should not be further refined.
If xx is not recognized by SOLVE you need to
specify instead:
Keywords that apply to both MAD and MIR data:
NNATF n               Column # in "infile" for native F (pseudo-native for MAD)
NNATS n               Column # in "infile" for sigma of native F
GOTODERIV n           Go to derivative (wavelength) n and get ready to read
                      some modifications of the parameters for this
                      derivative/wavelength
GOTOATOM n            Go to the n'th atom in this wavelength/derivative and get
                      ready to read some modifications of its parameters
LABEL xxxxxx          Label for this wavelength/derivative
NCOLFBAR n            Column # for Fbar for this wavelength/derivative.
                      For MAD data, this and the next three values are only
                      needed for the one wavelength defined by JSTD.
                      For MIR data, they are needed for all derivatives.
NCOLSFBAR n           Column # for sigma of Fbar
NCOLDELF n            Column # for delAno (if INANO is specified)
NCOLSDELF n           Column # for sigma of delAno
NCOLFHCOS xx          Column # in "infile" for the estimated heavy atom
                      structure factor component along the native structure
                      factor (output from MADBST for MAD data). This will be
                      used in calculation of heavy atom difference Fouriers
                      if NCOLFHSIN is also specified.
                      For MIR data, you can specify which derivative this
                      applies to by replacing the "1" in "ncolfhcos(1)" with
                      another derivative number. NCOLFHCOS is equivalent to
                      NCOLFHCOS(1).
NCOLFHSIN xx          Column # in "infile" for the estimated heavy atom
                      structure factor component perpendicular to the native
                      structure factor. See NCOLFHCOS. NCOLFHSIN is equivalent
                      to NCOLFHSIN(1).
PATTFFTFILE xxxxxx    MAD data: use previously calculated Patterson FFT
                      xxxxxx as the Patterson map for the anomalously
                      scattering atoms in this MAD structure.
                      MIR data: use Patterson FFT xxxxxx as the Patterson map
                      for derivative #1. PATTFFTFILE is equivalent to
                      PATTFFTFILE(1). (For other derivatives, change the "1"
                      to the appropriate derivative number.)
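The indexed forms and the GOTODERIV/GOTOATOM keywords described above might be used for a second MIR derivative roughly as follows. This is only a sketch: the column numbers and the file name deriv2.patt are made up for illustration and do not come from a real dataset. It also assumes that OCCUPANCY and BVALUE can be re-entered after GOTOATOM, based on how those keywords are used for atoms in the sample scripts.

```text
! Per-derivative forms: replace the "1" with the derivative number
NCOLFHCOS(2) 13              ! hypothetical column for <fh cos theta>, derivative 2
NCOLFHSIN(2) 14              ! hypothetical column for <fh sin theta>, derivative 2
PATTFFTFILE(2) deriv2.patt   ! hypothetical Patterson FFT file for derivative 2

! Go back to an atom that was already entered and change its starting guesses
GOTODERIV 2                  ! return to derivative 2
GOTOATOM 1                   ! select the first atom of that derivative
OCCUPANCY .2                 ! hypothetical new occupancy guess
BVALUE 30.0                  ! hypothetical new B-value guess
```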
Also see all the commonly-used keywords for SOLVE.