Generating and solving model datasets with errors

GENERATE lets you construct an MIR or MAD dataset in which you specify the heavy atom locations and types. You can also specify the cell parameters for each derivative of an MIR dataset to simulate non-isomorphism. The output from GENERATE is suitable as input to SOLVE, so you can run both in one script file to generate, then solve, a dataset.

If you start with a PDB file in "coords.pdb" and specify "checksolve", then you can generate a dataset, solve it, and use "O" to display the "solve.ezd" electron density map that SOLVE produces. The map is automatically referred to the same origin as the coords.pdb structure, so you can overlay your map and the model to see how good the solution is. Please note that the EZD map covers the asymmetric unit only. You may need to put your model in the asymmetric unit, or else use Gerard Kleywegt's program mapman to manipulate your map (read it in to mapman as "NEWEZD") before you overlay the map.

Here are sample files that generate and solve MIR and MAD datasets. The keywords for generate_mir and generate_mad are listed after the samples.

If you specify "checksolve" when you run one of these command files then SOLVE will automatically compare all the solutions it is getting with the one that you started with.

!---------------------------------------------------------------
!gensolvemir.script
! command file to generate an MIR dataset and solve it

CELL 76 28 42 90 103 90
SYMFILE /usr/local/lib/solve/c2.sym
resolution 3.0 20.0
logfile gensolvemir.logfile
solvefile gensolvemir.prt
percent_error 3.0                       ! 3% error added to intensities
coordinatefile coords.pdb              ! coordinate file used to generate
                                        ! the starting I's (if none supplied,
                                        ! the routine makes up I's)
deriv 1
cell_derivative  77 28 41 90 103 90    ! Try cell params for derivatives that
                                        ! are about 1-2% different from native
inano
atom hg
occ 1.0 
bvalue 31.
xyz 0.15 0.25 0.35

deriv 2
cell_derivative  75 28 42 90 102.5 90
inano
atom au
occ 0.8
bvalue 25.
xyz 0.33 0.15 0.17

GENERATE_MIR                            ! generate the MIR dataset now.

! Now the data are in: native.intensities, der1.intensities, and der2.intensities

!...  now analyze this MIR dataset...

rawnativefile native.intensities        !file for native data

premerged
readformatted

gotoder 1                               
rawderivfile der1.intensities           ! We have to use "gotoder" because we're in the
                                        ! middle of SOLVE, not starting from the
                                        ! beginning, and we have already specified
                                        ! more than one derivative.
gotoder 2
rawderivfile der2.intensities

nres 87                                 ! approx # of residues in protein molecule
nsolsite 1                              ! one site per derivative
checksolve                              ! compare the solutions to the correct one
comparisonfile native.fft         ! get correlation coefficient of map 
                                 !calculated from each solution along the 
                                 !way with the true map in native.fft

scale_native
scale_mir
analyze_mir
solve
!---------------------------------------------------------------
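
To see how much non-isomorphism the cell_derivative lines in the sample above introduce, the derivative cells can be compared with the native cell parameter by parameter. The following is a minimal Python sketch, not part of SOLVE; the cell values are copied from gensolvemir.script, and the helper name percent_change is hypothetical.

```python
# Percent change of each derivative cell parameter relative to the native cell.
# Cell values are copied from the gensolvemir.script sample; the helper name
# percent_change is hypothetical and not part of SOLVE.

NATIVE = (76.0, 28.0, 42.0, 90.0, 103.0, 90.0)
DERIVATIVES = {
    1: (77.0, 28.0, 41.0, 90.0, 103.0, 90.0),
    2: (75.0, 28.0, 42.0, 90.0, 102.5, 90.0),
}
LABELS = ("a", "b", "c", "alpha", "beta", "gamma")


def percent_change(native, derivative):
    """Return {label: percent change} for the six cell parameters."""
    return {
        label: 100.0 * (d - n) / n
        for label, n, d in zip(LABELS, native, derivative)
    }


if __name__ == "__main__":
    for number, cell in DERIVATIVES.items():
        changes = percent_change(NATIVE, cell)
        # Report only the parameters that actually differ from the native cell.
        summary = ", ".join(
            f"{label} {value:+.2f}%" for label, value in changes.items() if value != 0.0
        )
        print(f"deriv {number}: {summary}")
```

For the sample cells, the changes are roughly 1.3% in a and 2.4% in c for derivative 1, and roughly 1.3% in a and 0.5% in beta for derivative 2.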

... and now for a MAD dataset:

!---------------------------------------------------------------
!gensolvemad.script
! command file to generate a MAD dataset and solve it
CELL 72 28 42 90 103 90 
SYMFILE /usr/local/lib/solve/c2.sym
resolution 3.0 20.0
logfile gensolvemad.logfile
solvefile gensolvemad.prt
percent_error 3.0                       ! 3% error added to intensities
coordinatefile coords.pdb              ! coordinate file used to generate
                                        ! the starting I's (if none supplied,
                                        ! the routine makes up I's)
mad_atom se                              ! define the scattering factors...
lambda 1
wavelength 0.90 
fprimv_mad -1.6
fprprv_mad 3.4
atomname se 
xyz 0.197 0.377 0.216 
occ 1.0 
bfactor 20
atomname se                             ! you only have to specify the coords for
                                        ! this one wavelength (they're copied to the
                                        ! others)
xyz 0.216 0.115 0.399
occ 1.0 
bfactor 20
lambda 2
wavelength 0.9794
fprimv_mad -8.5
fprprv_mad 4.8
lambda 3
wavelength 0.9797
fprimv_mad -9.85
fprprv_mad  2.9

GENERATE_MAD                            ! generate the MAD dataset now.

! Now the data are in: lam1.intensities, lam2.intensities, and lam3.intensities for
!  the 3 wavelengths of data

! solve the dataset

premerged
readformatted

gotoder 1                      
rawmadfile lam1.intensities 
gotoder 2
rawmadfile lam2.intensities 
gotoder  3
rawmadfile lam3.intensities

nres 87                  ! approx # of residues in protein molecule
nanomalous 2
checksolve
comparisonfile lambda_1.fft         ! get correlation coefficient of map 
                             !calculated from each solution along the 
                             !way with the true map in lambda_1.fft

scale_mad
analyze_mad
solve
!---------------------------------------------------------------
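
The fprimv_mad and fprprv_mad values in the sample above determine how much signal the generated MAD dataset carries: dispersive differences between wavelengths scale with the differences in f', and anomalous (Bijvoet) differences at a wavelength scale with f''. The following is a minimal Python sketch, not part of SOLVE, that tabulates both for the values copied from gensolvemad.script; the function names are hypothetical.

```python
from itertools import combinations

# wavelength number -> (wavelength in Angstrom, f', f''), copied from the
# gensolvemad.script sample. Function names below are hypothetical.
SCATTERING = {
    1: (0.9000, -1.6, 3.4),
    2: (0.9794, -8.5, 4.8),
    3: (0.9797, -9.85, 2.9),
}


def dispersive_differences(table):
    """Return {(i, j): |f'_i - f'_j|} for every pair of wavelengths."""
    return {
        (i, j): abs(table[i][1] - table[j][1])
        for i, j in combinations(sorted(table), 2)
    }


def peak_wavelength(table):
    """Return the wavelength number with the largest f''."""
    return max(table, key=lambda number: table[number][2])


if __name__ == "__main__":
    for (i, j), delta in dispersive_differences(SCATTERING).items():
        print(f"lambda {i} vs lambda {j}: |delta f'| = {delta:.2f}")
    print(f"largest f'' is at lambda {peak_wavelength(SCATTERING)}")
```

With the sample values, the largest dispersive difference is between lambda 1 and lambda 3, and the largest f'' is at lambda 2.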

Notes on using GENERATE_MAD

Your generated MAD dataset can contain more than one type of anomalously scattering atom. You input information on the first atom type in the usual way, as described above. For the second atom type, you need to do the following:

Input one NEWATOMTYPE for each wavelength of MAD data, giving the scattering factors at that wavelength. The NEWATOMTYPE names for the various wavelengths must be of the form PTL1, PTL2, PTL3, etc., for lambda 1, 2, 3.
Input the heavy atom parameters for this atom for lambda 1.
SOLVE will then generate heavy atom parameters for this atom for all the other wavelengths and will include it in the generate procedure.
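
The per-wavelength naming pattern can be sketched in Python as follows. This is only an illustration of the PTL1, PTL2, PTL3 pattern quoted above; the function name is hypothetical, and whether SOLVE accepts base labels other than "PT" is an assumption that the text above does not confirm.

```python
def newatomtype_names(base, n_wavelengths):
    """Return per-wavelength labels such as PTL1, PTL2, PTL3.

    base is the label prefix ("PT" in the example from the text); using any
    other prefix is an assumption, not something the manual states.
    """
    if n_wavelengths < 1:
        raise ValueError("need at least one wavelength")
    return [f"{base}L{i}" for i in range(1, n_wavelengths + 1)]


if __name__ == "__main__":
    # One label per wavelength of a three-wavelength MAD dataset.
    print(" ".join(newatomtype_names("PT", 3)))
```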

Keywords for GENERATE_MIR and GENERATE_MAD

coordinatefile          PDB file with coordinates.  Used to generate the starting
                        values of F and phase for the structure. Only C, N, O,
                        and S atoms are read in.

percent_error           % error added to intensities

cell_derivative a b c alpha beta gamma    (only for generate_mir) cell parameters
                                          for this derivative.

derivative nn
lambda nn               derivative or wavelength number

atomname   xx           name of an atom about to be specified
xyz x y z               coordinates of this atom
bvalue b                B-factor of this atom
occupancy occ           occupancy of this atom
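
When many model datasets are generated, the heavy atom blocks can be written by a small helper rather than by hand. The following is a minimal Python sketch, not part of SOLVE, that emits one derivative block using only the keywords listed above. The function name and the dictionary layout are hypothetical, and other keywords used in the samples (such as inano) are not emitted and would need to be added separately.

```python
def derivative_block(number, cell, atoms):
    """Format one GENERATE_MIR derivative block as text.

    number : derivative number
    cell   : (a, b, c, alpha, beta, gamma) for this derivative
    atoms  : list of dicts with keys name, xyz, bvalue, occupancy
             (this layout is a hypothetical convention for the sketch)
    """
    lines = [
        f"derivative {number}",
        "cell_derivative " + " ".join(f"{value:g}" for value in cell),
    ]
    for atom in atoms:
        x, y, z = atom["xyz"]
        lines += [
            f"atomname {atom['name']}",
            f"xyz {x:g} {y:g} {z:g}",
            f"bvalue {atom['bvalue']:g}",
            f"occupancy {atom['occupancy']:g}",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Values copied from derivative 1 of the gensolvemir.script sample.
    print(derivative_block(
        1,
        (77, 28, 41, 90, 103, 90),
        [{"name": "hg", "xyz": (0.15, 0.25, 0.35), "bvalue": 31.0, "occupancy": 1.0}],
    ))
```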