Solve routine

The routine SOLVE: the core of automated structure determination by SOLVE

The SOLVE routine is an exceptionally powerful feature of this package that can find and evaluate the quality of heavy-atom sites in a MIR, SIR, or MIR-like dataset. The SOLVE routine treats MAD data almost exactly like MIR data, beginning with the output from MADMRG and MADBST.

Ordinarily SOLVE is called after SCALE_MAD and ANALYZE_MAD or SCALE_MIR and ANALYZE_MIR as part of automated structure determination. In this case you don't have to worry about all the keywords because the previous routines figure them out for you and write them to the script file solve_mad.script (or solve_mir.script).

You can, however, control much of what SOLVE does by setting keywords before running it. SOLVE can also be called using the solve_mad.script or solve_mir.script file written out by ANALYZE_MAD or an edited version of this file.

For MAD datasets, SOLVE uses a "compressed" form of MAD data that can be analyzed much more rapidly than the full n-wavelength data. This compressed dataset is generated by MADMRG in ANALYZE_MAD. It essentially consists of the SIR+anomalous scattering equivalent of the full MAD dataset, and it can be used to refine heavy atom parameters and generate native phases more quickly than the full MAD dataset can. At the conclusion of SOLVE, phases are calculated with full Bayesian correlated MAD phasing.

The SOLVE routine operates by using a new version of HASSP to generate a few or many possible "seed" solutions for the anomalously scattering atoms in the structure. The heavy-atom parameters in each seed are first refined using the very fast refinement procedure in HEAVY (origin-removed Patterson refinement). The refined seed is then used in self-difference Fouriers to suggest possible additional sites. A number of solutions are scored based on each seed, each solution being evaluated based on both the difference Patterson and a "free" difference Fourier. Additionally, the non-randomness of the native Fourier is used to judge the quality of a solution and to identify the correct hand of the structure if anomalous data are present. The figure of merit of phasing is the final scoring criterion.

If desired, a solution may be read in and evaluated directly with ANALYZE_SOLVE. Also, a solution may be read in and used as a seed in generating additional sites and a more complete solution with ADDSOLVE.

Using SOLVE is quite easy, particularly since ANALYZE_MAD or ANALYZE_MIR writes out a script file (usually solve_mad.script or solve_mir.script) that has everything you need to run SOLVE.

The only really non-obvious thing you need to know about running SOLVE on MAD data is that it requires 2 input data files. One is the compressed datafile from MADMRG, usually called "solve.data". The other is the full MAD dataset, usually called "mad_fpfm.scl". SOLVE uses "solve.data" for most of its analyses, then switches to the full MAD dataset at the very end.

The way you enter information on scattering factors in the SOLVE routine is a little different from the way it is entered in SCALE_MAD and ANALYZE_MAD. In the SOLVE routine you define an atom type for each wavelength and specify the scattering factors for that atom type. Then you tell SOLVE which atom type goes with which wavelength. In SCALE_MAD, in contrast, you specify scattering factors directly for each wavelength. The reason for the difference is that SOLVE has to deal with both MAD and MIR data, and defining atom types is a simple way to do that.
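As an illustration of this atom-type bookkeeping, the following Python sketch writes one atom-type block per wavelength. The keywords (NEWATOMTYPE, AVAL, BVAL, CVAL, FPRIMV, FPRPRV) are the ones used in the sample script in this document; the function name and the LAM1, LAM2, ... naming are assumptions made for illustration and are not part of SOLVE.

```python
# Hypothetical helper (not part of SOLVE): build the atom-type section of a
# SOLVE script, one NEWATOMTYPE block per wavelength.

def atom_type_blocks(aval, bval, cval, fprime_fdoubleprime):
    """Return script lines defining atom types LAM1, LAM2, ...

    aval, bval, cval: values for the AVAL, BVAL and CVAL lines, repeated
                      unchanged for every wavelength (same element).
    fprime_fdoubleprime: list of (f', f") pairs, one per wavelength.
    """
    lines = []
    for i, (fp, fpp) in enumerate(fprime_fdoubleprime, start=1):
        lines.append(f"NEWATOMTYPE LAM{i}")
        lines.append("AVAL  " + " ".join(str(a) for a in aval))
        lines.append("BVAL  " + " ".join(str(b) for b in bval))
        lines.append(f"CVAL  {cval}")
        lines.append(f"FPRIMV  {fp}")    # f' at this wavelength
        lines.append(f"FPRPRV  {fpp}")   # f" at this wavelength
    return lines


if __name__ == "__main__":
    # Values copied from the sample MAD script in this document.
    blocks = atom_type_blocks(
        aval=[17.0006, 5.8196, 3.9731, 4.3543],
        bval=[2.4098, 0.2726, 15.2372, 43.8163],
        cval=2.8409,
        fprime_fdoubleprime=[(-1.6, 3.4), (-8.5, 4.8), (-9.85, 2.86)],
    )
    print("\n".join(blocks))
```

Each wavelength section of the script then refers to its atom type by name, for example with ATOMNAME LAM1 for wavelength 1.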

The solve_mad.script control file for MAD data

A sample SOLVE script file that will give you an idea of what you need to specify and what other things you can specify follows. This script is an edited version of a script file written out by the ANALYZE_MAD routine.

This script file is written out during automated SOLVE operation. You may wish to edit the one SOLVE has written out for you and use it if:

you want to change how the core SOLVE routine solves your structure
you want to use ANALYZE_SOLVE or ADDSOLVE to add sites to a solution you have found or to analyze a solution you have found

Sample MAD script file for SOLVE routine

                                
!------------------solve_mad.script: solve a MAD problem----------------------
@solve.setup
 LOGFILE solve.logfile

 INFILE solve.data             !input file with MADMRG-compressed data
 MADFPFMFILE mad_fpfm.scl      !input file with full MAD dataset
                        
 JSTD  1                       ! Lambda 1 is reference wavelength used in MADMRG
 IMADPHASE  1                  ! this is a MAD dataset, reference 
                               !   wavelength is #1 (should match jstd)

 NNATF  1                      ! Pseudo-native F is column 1 of solve.data
 NNATS  2                      ! sigma is column 2

 ! Atom definitions with f' and f" values for the 3 wavelengths:
 NEWATOMTYPE LAM1
 AVAL  17.0006 5.8196 3.9731 4.3543
 BVAL  2.4098 .2726 15.2372 43.8163
 CVAL  2.8409
 FPRIMV  -1.6
 FPRPRV  3.4
 NEWATOMTYPE LAM2
 AVAL  17.0006 5.8196 3.9731 4.3543
 BVAL  2.4098 .2726 15.2372 43.8163
 CVAL  2.8409
 FPRIMV  -8.5
 FPRPRV  4.8
 NEWATOMTYPE LAM3
 AVAL  17.0006 5.8196 3.9731 4.3543
 BVAL  2.4098 .2726 15.2372 43.8163
 CVAL  2.8409
 FPRIMV  -9.85
 FPRPRV  2.86


 LAMBDA  1                           !  This is wavelength #1
 LABEL Wavelength  1 from MADMRG     !  label for lambda 1
 NCOLFBAR  3                         ! Ncolfbar...ncolsdelf are column #'s
 NCOLSFBAR  4                        ! in solve.data (MADMRG-compressed)
 NCOLDELF  5                         ! datafile
 NCOLSDELF  6
 INPHASE
 INANO
 NOREFINESCALE                       ! Don't refine overall scale factor
                                     ! because this is MADMRG data

 ! Information for MADPHASE:
 NCOLFPLUS  1                        ! these 4 column numbers refer to the
 NCOLSIGPLUS  2                      ! full MAD datafile (mad_fpfm.scl)
 NCOLFMINUS  3
 NCOLSIGMINUS  4

 ! Heavy atoms for this wavelength:
 ATOMNAME LAM1                       ! "LAM1" tells the program to use
 OCCUPANCY  .1                       ! the scattering factors input above for
 BVALUE  35.0                        ! LAM1
 REFINEALL                           ! the occupancy and b values are guesses

 LAMBDA  2
 LABEL Wavelength  2 from MADMRG
 NCOLFBAR  3
 NCOLSFBAR  4
 NCOLDELF  5
 NCOLSDELF  6
 INPHASE
 INANO

 ! Information for MADPHASE:
 NCOLFPLUS  5
 NCOLSIGPLUS  6
 NCOLFMINUS  7
 NCOLSIGMINUS  8

 ! Heavy atoms for this derivative/wavelength:
 ATOMNAME LAM2

 LAMBDA  3
 LABEL Wavelength  3 from MADMRG
 NCOLFBAR  3
 NCOLSFBAR  4
 NCOLDELF  5
 NCOLSDELF  6
 INPHASE
 INANO

 ! Information for MADPHASE:
 NCOLFPLUS  9
 NCOLSIGPLUS  10
 NCOLFMINUS  11
 NCOLSIGMINUS  12

 ! Heavy atoms for this derivative/wavelength:
 ATOMNAME LAM3

 ! Information for HASSP and SOLVE
 NCOLFHCOS  9                      ! column #s for <fh cos theta>
 NCOLFHSIN  10                     ! and <fh sin theta> in solve.data
 PATTFFTFILE patterson.patt        ! name of Bayesian patterson calculated
                                      !  by MADBST
                           
 SOLVE                                ! run SOLVE
!---------------------------------------------------------------------------
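In the sample above, the MADPHASE column numbers advance by four per wavelength (1 to 4, 5 to 8, 9 to 12). Assuming the full MAD file stores (F+, sigma, F-, sigma) consecutively for each wavelength starting at column 1, as in this sample, the following Python sketch reproduces those numbers. The function name is hypothetical, and your own file may be laid out differently.

```python
# Hypothetical helper (not part of SOLVE): column numbers in the full MAD
# file for a given wavelength, assuming four consecutive columns per
# wavelength (F+, sigma, F-, sigma) starting at column 1.

def madphase_columns(wavelength):
    if wavelength < 1:
        raise ValueError("wavelength numbers start at 1")
    first = 4 * (wavelength - 1) + 1
    return {
        "NCOLFPLUS": first,
        "NCOLSIGPLUS": first + 1,
        "NCOLFMINUS": first + 2,
        "NCOLSIGMINUS": first + 3,
    }


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f" ! Information for MADPHASE, wavelength {n}:")
        for keyword, column in madphase_columns(n).items():
            print(f" {keyword}  {column}")
```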

The solve_mir.script control file for MIR data

Using SOLVE is quite easy with MIR data too, particularly since ANALYZE_MIR writes out a script file that has everything you need to run SOLVE. A sample SOLVE script file that will give you an idea of what you need to specify and what other things you can specify follows. This script is an edited version of a script file written out by the ANALYZE_MIR routine.

This script file is written out during automated SOLVE operation. You may wish to edit the one SOLVE has written out for you and use it if:

you want to change how the core SOLVE routine solves your structure
you want to use ANALYZE_SOLVE or ADDSOLVE to add sites to a solution you have found or to analyze a solution you have found

Sample script file for SOLVE (MIR data)

!------------------solve_mir.script: solve an MIR problem----------------------
@solve.setup

LOGFILE solve.logfile

INFILE mir_fbar.scl                ! input file with Fnat,sig, and
                                   ! (fbar,sig,delano,sig) for each derivative

NNATF 1                            ! Native F is column 1 of mir_fbar.scl
NNATS 2                            ! sigma is column 2

Derivative 1                       ! begin information about derivative 1
LABEL deriv 1 HG                   ! label for deriv 1
NCOLFBAR 3                         ! Ncolfbar...ncolsdelf are column #'s
NCOLSFBAR 4                        ! in mir_fbar.scl datafile
NCOLDELF 5
NCOLSDELF 6
INANO                              ! include anomalous differences

! Heavy atoms for this derivative:

ATOMNAME HG                        ! the atom type is "HG"
OCCUPANCY .1                       ! guess for occupancy
BVALUE 35.0                        ! guess for bvalue
REFINEALL                          ! refine everything that is reasonable

Derivative 2                       ! begin information about derivative 2
LABEL deriv 2 Iodine               ! label for deriv 2
NCOLFBAR 7                         ! Ncolfbar...ncolsdelf are column #'s
NCOLSFBAR 8                        ! in mir_fbar.scl datafile
NCOLDELF 9
NCOLSDELF 10
INANO                              ! include anomalous differences
 
ATOMNAME I-                        ! the atom type is "I-"
OCCUPANCY .1                       ! guess for occupancy
BVALUE 35.0                        ! guess for bvalue
REFINEALL                          ! refine everything that is reasonable

SOLVE                              ! run SOLVE
!---------------------------------------------------------------------------
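The derivative sections of the sample above follow a regular pattern: native F and sigma occupy columns 1 and 2, and each derivative adds four columns (Fbar, sigma, delAno, sigma). Assuming that layout, a derivative section could be generated with a Python sketch like the one below. The function name and its defaults are hypothetical; the occupancy and B value are only starting guesses, as in the sample.

```python
# Hypothetical helper (not part of SOLVE): write the script lines for one
# MIR derivative, assuming the layout of the sample file: native F and sigma
# in columns 1-2, then (Fbar, sigma, delAno, sigma) for each derivative.

def derivative_block(n, label, atom, occupancy=0.1, bvalue=35.0):
    if n < 1:
        raise ValueError("derivative numbers start at 1")
    first = 3 + 4 * (n - 1)  # first data column belonging to derivative n
    return [
        f"Derivative {n}",
        f"LABEL {label}",
        f"NCOLFBAR {first}",
        f"NCOLSFBAR {first + 1}",
        f"NCOLDELF {first + 2}",
        f"NCOLSDELF {first + 3}",
        "INANO",                    # include anomalous differences
        f"ATOMNAME {atom}",
        f"OCCUPANCY {occupancy}",   # guess for occupancy
        f"BVALUE {bvalue}",         # guess for B value
        "REFINEALL",
    ]


if __name__ == "__main__":
    for line in derivative_block(2, "deriv 2 Iodine", "I-"):
        print(line)
```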

Keywords for the solve_mad.script and solve_mir.script files

There are a lot of keywords that can affect what SOLVE does. Ordinarily you do not have to worry about most of them, because the solve_mad.script file written out by ANALYZE_MAD or the solve_mir.script file written out by ANALYZE_MIR will have most of these keywords set for you. The keywords are listed here so that you can understand what they do and so that you can set them if you want to.

Most of these keywords can be specified at the beginning of automated data analysis to control what happens when SOLVE is called. For example, typing "ntopsolve 2" in the keywords before running SCALE_MAD and ANALYZE_MAD will affect SOLVE when it is called by restricting the number of solutions analyzed at the end of the routine to 2.

SOLVE treats MAD phasing and MIR phasing in almost exactly the same way except at the very end of the routine. Consequently "derivative" and "lambda" have the same meaning to SOLVE. You can enter information about lambda 1 by typing "lambda 1" or "derivative 1". The keywords that are specific to MAD phasing are listed at the top of the list.

Keywords that have a meaning for MAD data but not for MIR data:

 
INFILE  xxx.data    Principal input dorgbn-style file with compressed MAD data
                    from MADMRG and optional additional columns
                    of data. (usual file name = "solve.data").  This file
                    is usually produced by ANALYZE_MAD.


MADFPFMFILE yyy.scl Additional input file (yyy.scl) with (F+,sigma,F-,sigma)
                    for each wavelength.
                    This file is used at the very end of SOLVE
                    for Bayesian correlated MAD phasing if the
                    keyword "bayes" is set in ANALYZE_MAD or the
                    keyword "imadphase n" is set in SOLVE.  All the
                    wavelengths have "inphase" specified for this
                    work. (DEFAULT="mad_fpfm.scl")

JSTD n              wavelength to be used as reference (default = lowest wavelength)

IMADPHASE n         This is a MAD dataset, n should match JSTD n

NOREFINESCALE       Usually included for all wavelengths, because the
                    refinements in SOLVE are based on MADMRG output, which
                    should not be further refined.
                    If xx is not recognized by SOLVE you need to
                    specify instead:


Keywords that apply to both MAD and MIR data:


NNATF n             column # in "infile" for native F (pseudo-native for MAD)
NNATS n             column # in "infile" for sigma of native f



 gotoderiv n        go to derivative (wavelength) n and get ready to read some
                    modifications of the parameters for this wavelength

 gotoatom n         go to the n'th atom in this wavelength/derivative  and get
                     ready to read some modifications of its parameters

 LABEL xxxxxx       label for this wavelength/derivative

 NCOLFBAR  n        column # for Fbar for this wavelength/derivative
                     For MAD data, this and the next three values are only 
                       needed for the one wavelength defined by JSTD
                     For MIR data, they are needed for all derivatives
 NCOLSFBAR  n       column # for sigma of Fbar
 NCOLDELF  n        column # for delAno (if INANO is specified)
 NCOLSDELF  n       column # for sig of delAno



NCOLFHCOS  xx     column # in "infile" for estimated heavy atom structure
                    factor component along native structure
                    factor. (Output from MADBST for MAD data). This will be
                    used in calculation of heavy atom difference
                    Fouriers if ncolfhsin is also specified.  
                    For MIR data, you can specify which derivative this applies
                    to by replacing the "1" in "ncolfhcos(1)" with another 
                    derivative number. NCOLFHCOS is equivalent to NCOLFHCOS(1)

NCOLFHSIN  xx      column # in "infile" for estimated heavy atom structure
                    factor component perpendicular to native structure
                    factor. See ncolfhcos. NCOLFHSIN is equivalent to 
                    NCOLFHSIN(1)

PATTFFTFILE xxxxxx   MAD data: use previously calculated Patterson FFT
                      xxxxxx as the Patterson map for the anomalously scattering
                      atoms in this MAD structure.
                      MIR data: use Patterson FFT xxxxxx as the Patterson map for
                      derivative #1. PATTFFTFILE is equivalent to
                      PATTFFTFILE(1). (For other derivatives, change the "1" to
                      the appropriate derivative number.)

Also see all the commonly-used keywords for SOLVE.
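
The rule that the n in IMADPHASE n should match the n in JSTD n is easy to break when editing a script by hand. The Python sketch below reads a script and reports a mismatch. It is not part of SOLVE and makes several assumptions: that "!" starts a comment (as in the sample scripts), that lines beginning with "@" can be skipped, and that keywords can be compared without regard to case.

```python
# Hypothetical checker (not part of SOLVE) for the JSTD / IMADPHASE rule.

def read_keywords(script_text):
    """Return a list of (KEYWORD, [arguments]) pairs from script text."""
    pairs = []
    for raw in script_text.splitlines():
        line = raw.split("!", 1)[0].strip()   # drop "!" comments (assumed)
        if not line or line.startswith("@"):  # skip blank and "@" lines
            continue
        parts = line.split()
        pairs.append((parts[0].upper(), parts[1:]))
    return pairs


def check_mad_reference(script_text):
    """Return a list of warnings; an empty list means no mismatch was found."""
    values = {}
    for keyword, args in read_keywords(script_text):
        if keyword in ("JSTD", "IMADPHASE") and args:
            values[keyword] = int(args[0])
    warnings = []
    if "JSTD" in values and "IMADPHASE" in values:
        if values["JSTD"] != values["IMADPHASE"]:
            warnings.append(
                f"IMADPHASE {values['IMADPHASE']} does not match "
                f"JSTD {values['JSTD']}"
            )
    return warnings


if __name__ == "__main__":
    sample = " JSTD  1        ! reference wavelength\n IMADPHASE  2\n"
    for warning in check_mad_reference(sample):
        print(warning)
```

If JSTD is absent the sketch reports nothing, since SOLVE then uses its default reference wavelength.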