
Answers to SOLVE Frequently Asked Questions

What is the best way to tell if my data are good?
For MAD data, have a look at the correlation of anomalous differences between wavelengths. See the table in the beta-catenin dataset for an example. For MIR data, check that your R-factors between derivative and native start out large at low resolution, get smaller, and then finally get bigger again (the last rise is due to errors in measurement and indicates where to cut off the data).
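The MIR check above can be sketched in a few lines of Python. This is only an illustration, not part of SOLVE: the function name `shell_r_factors`, the input layout (d-spacing, native amplitude, derivative amplitude) and the R-factor definition used here (sum of |F_deriv - F_native| over sum of F_native) are all assumptions, and other programs define the isomorphous R-factor slightly differently.

```python
def shell_r_factors(reflections, shell_edges):
    """Isomorphous R-factor per resolution shell (illustrative only).

    reflections: list of (d_spacing, f_native, f_deriv) tuples, d in Angstrom.
    shell_edges: d-spacing limits in descending order, e.g. [20.0, 6.0, 3.0].
    Returns a list of (d_max, d_min, r_factor); r_factor is None for empty shells.
    """
    results = []
    for d_max, d_min in zip(shell_edges, shell_edges[1:]):
        num = 0.0  # sum of |F_deriv - F_native| in this shell
        den = 0.0  # sum of F_native in this shell
        for d, f_nat, f_der in reflections:
            if d_min < d <= d_max:
                num += abs(f_der - f_nat)
                den += f_nat
        results.append((d_max, d_min, num / den if den > 0 else None))
    return results


if __name__ == "__main__":
    # Made-up amplitudes, purely to show the call.
    data = [(10.0, 100.0, 120.0), (10.0, 100.0, 80.0), (3.5, 50.0, 55.0)]
    for d_max, d_min, r in shell_r_factors(data, [20.0, 6.0, 3.0]):
        print(f"{d_max:5.1f} - {d_min:4.1f} A   R = {r}")
```

Printing the shells from low to high resolution makes it easy to spot the point where the R-factor turns upward again.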

Can I input sites that I already know into SOLVE? Yes, you can. You just put them right in under the correct wavelength or derivative, add the keyword "addsolve" or "analyze_solve" before the scaling command, and SOLVE will use them to find new sites (addsolve) or to refine and calculate phases (analyze_solve). See the addsolve and analyze_solve instructions for examples.

Can I use a solution at low resolution to run SOLVE at high resolution? Yes, you can. The easiest way is with addsolve and analyze_solve.

What do "checksolve" and "comparisonfile" do? Checksolve tells SOLVE to compare all the solutions it gets with the one that you input. SOLVE finds the origin (and hand, if you do not have anomalous data) that best matches its trial solutions with the one you entered, and reports the solution relative to this origin and hand. Comparisonfile allows you to input an FFT that SOLVE has previously calculated (at the same resolution as SOLVE is working); in combination with checksolve, SOLVE will calculate the correlation coefficient of every map that it examines to the one you input. This is handy when you have used "generate" to create a dataset.

Will SOLVE give me the right hand for my structure? Usually if you have good anomalous differences, then yes, SOLVE will give you the correct hand. Sometimes your anomalous differences might be reversed (due to incorrect analysis of data or detector hooked up backwards). In that case you can use "swap_ano" to reverse the signs of the differences.

How do I get a bigger version of SOLVE? The distribution comes with the regular-sized SOLVE as well as solve_giant and solve_huge. Try these first. If you need an even bigger version, then email me at terwilliger@lanl.gov and I'll give you the source so you can compile a bigger version. You will need the CCP4 library file libccp4.a to compile SOLVE.

Do I need a new access file for a new version of SOLVE? No, the same access file is good for all versions from version 1.0 through 1.99.

Where do I get f' and f" scattering factors? The best place to get f' and f" values for your MAD experiment is from the beamline staff where you collected your data. They will usually have made careful measurements of these for standard settings on their beamline, so if you do a Se experiment, for example, their values should be very good. You can also measure X-ray fluorescence from your own crystal and use the Kramers-Kronig transformation to estimate these values with the same programs the beamline staff used for their standard cases.

SOLVE does use the f' and f" values and they are very important. The wavelength values are not used in any important way by SOLVE.

Where do I get scattering factors for atoms that SOLVE has not heard of? They are on pp. 500-501 of Volume C of the International Tables. For example (Nb):

NEWATOMTYPE NB
AVAL 17.6142 12.0144 4.04183 3.53346
BVAL 1.18865 11.7660 0.204785 69.7957
CVAL 3.75591
FPRIMV -.248
FPRPRV 2.48
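As a sanity check on values typed in from the tables, the coefficients can be evaluated directly. The sketch below assumes that AVAL, BVAL and CVAL correspond to the a, b and c terms of the usual four-Gaussian form tabulated in the International Tables, f0(s) = c + sum of a_i * exp(-b_i * s^2) with s = sin(theta)/lambda; the function name is hypothetical and the code is not part of SOLVE.

```python
import math

# Nb coefficients copied from the NEWATOMTYPE example above.
NB_AVAL = (17.6142, 12.0144, 4.04183, 3.53346)
NB_BVAL = (1.18865, 11.7660, 0.204785, 69.7957)
NB_CVAL = 3.75591


def scattering_factor(s, aval=NB_AVAL, bval=NB_BVAL, cval=NB_CVAL):
    """Normal scattering factor f0 at s = sin(theta)/lambda (1/Angstrom).

    Assumes the four-Gaussian-plus-constant parameterization.
    """
    return cval + sum(a * math.exp(-b * s * s) for a, b in zip(aval, bval))


if __name__ == "__main__":
    # At s = 0 the result should be close to the number of electrons (41 for Nb).
    print(round(scattering_factor(0.0), 3))
    print(round(scattering_factor(0.25), 3))
```

If the value at s = 0 is far from the atomic number, a coefficient has probably been mistyped or put in the wrong order.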

Why are the figures of merit in the solve.status file not quite the same as the final values? The final phases look better for MAD data than the ones reported in the solve.status file because SOLVE calculates phases at the very end using Bayesian correlated MAD phasing, which gives much better phases than the SIRAS-like phases used during the main part of the run (when the solve.status file is being written). The full phasing is not used all the time because it is very slow.

Should I use all my data, or just the good data? Though it would be nice to use all the data, it is far better to use just the good data. Unless your sigmas are perfect and the statistics were done perfectly, it is really hard to get rid of the interference caused by data containing noise and essentially no signal.

Will SOLVE use NCS? Regrettably, no.

Why should I use NO MERGE ORIGINAL INDEX in scalepack? You should use "no merge original index" in scalepack so that SOLVE can re-scale the data with local scaling. This flag tells scalepack to write out the place in reciprocal space that each reflection was measured. Then SOLVE can compare it to its neighbors in reciprocal space.

Can I compare Z-scores for SOLVE runs in different space groups? At different resolutions? No, Z-scores are relative and therefore cannot be compared for different space groups or resolutions.

Can I read in data in 2 different formats? Unfortunately not.

Can I convert SOLVE files like "mad_fbar.scl" into mtz files? Yes, you can. You will need to use "export" to export the data to a flat file, then use the CCP4 program f2mtz to import it into mtz.

Can I look at my Patterson maps? Yes, you can. SOLVE writes some of them out as ".ezd" files which you can read right into "O" or convert to anything else with "mapman". Others you can convert to ezd with "ffttoezd".

Why do I get an execution error with no output when I try to run SOLVE? On an SGI, if you run a version of SOLVE that does not match your computer, you get an "exec error". Try a version of SOLVE built for an older version of your machine (e.g., r5000 instead of r12000).

Why does SOLVE say "CELL DIMENSION <1 OR > 1000 FOUND"? This happens if you try to use a really huge unit cell that SOLVE didn't expect. You'll have to cut back on the resolution a bit if it happens.

Why does SOLVE say "/sbin/loader: Fatal Error: set_program_attributes failed to set heap start"? This is an error that your Compaq Alpha might give you if you don't have enough memory allocated to you. The solution is to add a line to your .cshrc file that just says: "unlimit". This tells the system to give you all available resources.

Why doesn't COMBINE_ALL work for me? For combine_all to work, you have to be sure to input two or more complete datasets, separated by "new_dataset".

Why can't SOLVE find 2 sites that are close together? SOLVE won't let you find sites that are closer than a specified number of grid units. The distance depends on the grid size, which is typically 1/3 the resolution. The default ("ntol_site") is 8 grid units, or about 2 to 3x the resolution. You can decrease it if you want, in which case SOLVE will have to consider more solutions and may have trouble identifying the best one.
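The arithmetic behind that estimate can be sketched as follows. This is a rough illustration only: the function name is hypothetical, the grid is taken to be exactly 1/3 of the resolution, and the tolerance is treated as a simple distance, which may not match how SOLVE applies it internally.

```python
def min_site_separation(resolution, ntol_site=8, grid_fraction=1.0 / 3.0):
    """Approximate closest allowed distance between two sites, in Angstrom.

    resolution: high-resolution limit of the data in Angstrom.
    ntol_site: tolerance in grid units (8 is the default quoted above).
    grid_fraction: grid spacing as a fraction of the resolution (assumed 1/3).
    """
    grid_spacing = resolution * grid_fraction
    return ntol_site * grid_spacing


if __name__ == "__main__":
    for res in (2.0, 3.0, 4.0):
        sep = min_site_separation(res)
        print(f"{res:.1f} A data: sites closer than about {sep:.1f} A are rejected")
```

With these assumptions the default works out to 8/3, or roughly 2.7 times the resolution; halving ntol_site to 4 would halve the minimum separation.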