Structure solution from weak anomalous data Phenix Workshop SBGrid-NE-CAT Computing School Harvard Medical School, Boston June 7, 2014 Gábor Bunkóczi, Airlie McCoy, Randy Read (Cambridge University) Nat Echols, Ralf Grosse-Kunstleve, Paul Adams (Lawrence Berkeley National Laboratory) Tom Terwilliger (Los Alamos National Laboratory)
Structure solution from weak anomalous data The problems with weak signal Quantifying the anomalous signal Solving the anomalous sub-structure with weak signal Solving structures with weak signal Estimating the anomalous signal from the data
Structure solution from weak anomalous data The problem: low anomalous signal-to-noise Reasons: few anomalous scatterers, sulfur SAD, weak diffraction, wavelength far from peak
Structure solution from weak anomalous data Consequences of low anomalous signal-to-noise Substructure identification is difficult Phasing is poor Iterative density modification, model-building and refinement works poorly
Measures of anomalous signal-to-noise Ratio of anomalous signal (<ΔI 2 >-<σ 2 >( to uncertainty in anomalous signal (<σ 2 >) (0 for no anomalous signal) Ratio of anomalous differences to differences among equivalent centric reflections (1 for no anomalous signal) Anomalous correlation (between wavelengths, between half-datasets) Measurability (fraction of anomalous differences measured with difference/sigma > 3)
Measures of anomalous signal-to-noise Problems with these measures: Most require estimates of uncertainties in the data No obvious relationship between the signal-to-noise and whether a structure can be solved End up with rules of thumb: data with anomalous CC > 0.3 will be useful )
Another measure of anomalous signal-to-noise Make a measure that is related to what we want to do with the anomalous signal How about: The signal-to-noise in an anomalous difference Fourier at the positions of the anomalous scatterers Related to: the signal-to-noise in the anomalous differences, the number of reflections, and to the number of sites
Signal-to-noise in an anomalous difference Fourier at the positions of the anomalous scatterers Advantage: Closely related to what we need to do with the anomalous differences Disadvantage: Can only be calculated directly if the structure is solved (Can be estimated )
Example of anomalous signal-to-noise Holton Challenge data http://bl831.als.lbl.gov/~jamesh/challenge/ Simulated diffraction data from 3dk0 to 1.8 Å (useful to 2.3 Å) 0% to 100% occupancy of Se in selenomethionine Impossible.mtz" is difficulty (fraction S) = 0.79
Difficulty: 0.78 22% SeMet incorporation http://bl831.als.lbl.gov/~jamesh/powerpoint/ anomalous_challenge.pptx
Difficulty: 0.79 21% SeMet incorporation http://bl831.als.lbl.gov/~jamesh/powerpoint/ anomalous_challenge.pptx
Example of anomalous signal-to-noise Holton Challenge data Anomalous signal (mean density at sites in anomalous different Fourier) 16 14 12 10 8 6 4 2 0 Anomalous signal 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Fraction Se
Finding the anomalous sub-structure with weak anomalous signal Current approaches Anomalous Difference Patterson seeding Direct methods (Rantan) Dual-space methods (Shelxd, HySS, Crunch2) Difference Fourier (Solve)
Finding the anomalous sub-structure with weak anomalous signal Most powerful source of information about substructure before phases are known is the SAD likelihood function: The likelihood of measuring the observed anomalous data given a partial model
Using the SAD likelihood function to find the anomalous sub-structure Start with guess about the anomalous sub-structure From anomalous difference Patterson Random Any other source Find additional sites that increase the likelihood LLG completion based on log-likelihood gradient maps* Iterative addition of sites Related to using a difference Fourier but much better *La Fortelle, E. de & Bricogne, G. (1997). Methods Enzymol. 276, 472-494 McCoy, A. J. & Read, R. J. (2010). Acta Cryst. D66, 458-469.
Using LLG completion in HySS Guess 2- site solutions Peaks from Patterson Extrapolation Direct methods Phaser LLG completion Scoring Correlation Phaser LLG Range of resolution Variable number of Patterson solutions Adjustable LLGC_SIGMA (cut-off for peak height) Use LLG score to compare solutions Terminate early if same solution found several times Run quick direct methods first
Using LLG completion in HySS Test cases 164 SAD datasets from PDB (largely JCSG MAD data) Using peak, remotes, inflection as available to include data with low anomalous signal
Setting up test data on 165 datasets phenix.fetch_pdb 2o7t phenix.python $PHENIX/phenix/phenix/ autosol/sad_data_from_pdb.py 2o7t Splits out each wavelength (peak, edge, remote etc) for MAD and run separately Run HySS with direct methods, with LLG completion
Direct methods vs LLG completion 164 SAD datasets from PDB
Direct methods vs LLG completion 164 SAD datasets from PDB
Holton Challenge data Choosing a high-resolution cutoff Anomalous signal Mean anomalous peak height at coordinates of Se atoms 10 8 6 4 2 0 1 2 3 4 5 6 7 High-resolution cutoff (Å )
Holton Challenge data Correct sites found vs anomalous signal-to-noise Correct sites 14 12 10 8 6 4 2 0 Anomalous signal needed to find sites HySS-LLG-brute-force HySS-LLG Shelxd (100000 tries) Shelxd (1000 tries) Crunch2 SOLVE HySS (direct methods) 0 2 4 6 8 10 12 14 16 Anomalous signal
CysZ multi-crystal sulfur-sad data Qun Liu, Tassadite Dahmane, Zhen Zhang, Zahra Assur, Julia Brasch, Lawrence Shapiro, Filippo Mancia, Wayne Hendrickson (2012). Science 336,1033-1037 Data from 7 crystals collected at 1.74 Å Only merged data could be solved What is the minimum number of crystals that could have been used?
CysZ multi-crystal sulfur-sad data Datasets Anomalous signal 5 5.22 1 5.66 4 5.81 2 5.87 6 6.23 7 6.63 3 6.77 56 7.13 561 7.94 67 8.22 273 9.02 2734 9.03 27345 9.07 27346 9.28 273456 9.41 2734561 9.63
CysZ multi-crystal sulfur-sad data 25 20 LLG (brute-force) Shelxd (100000 tries) Correct sites 15 10 5 0 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5 10 Anomalous signal
CysZ multi-crystal sulfur-sad data 25 20 Correct sites 15 10 5 0 0 1 2 3 4 5 6 7 Number of crystals included LLG (bruteforce) Shelxd (100000 tries)
CysZ multi-crystal sulfur-sad data 25 Merge 6-7 20 Correct sites 15 10 5 0 0 1 2 3 4 5 6 7 Number of crystals included LLG (bruteforce) Shelxd (100000 tries)
CysZ multi-crystal sulfur-sad data Merge of crystals 6, 7 AutoSol R/Rfree=0.22/0.26
CysZ multi-crystal sulfur-sad data Merge of crystals 6, 7 AutoSol R/Rfree=0.22/0.26
HySS: summary of new features LLG completion of anomalous substructure Initiation of search with Patterson solutions, input sites, or randomized input sites LLG completion from Patterson solutions or direct methods solutions Parallel execution of searches Automation of search over resolution, direct methods, and Phaser completion Termination if same solution is found from different Patterson seeds at same resolution
Structure determination with weak anomalous signal AutoSol: substructure solution, phasing, density modification, preliminary model-building AutoBuild Iterative model-building, refinement, density modification Parallel AutoBuild Parallel runs of AutoBuild with map averaging and picking best models
Structure solution with phenix.autosol Experimental data, sequence, anomalously-scattering atom, wavelength(s) Find heavy-atom sites with direct methods (HYSS dual-space) Calculate phases (Phaser) Improve phases, find NCS, build model
Structure solution with phenix.autosol: enhancements for weak SAD data Experimental data, sequence, anomalously-scattering atom, wavelength(s) Find heavy-atom sites with direct methods (HYSS LLG completion) Use map and model in LLG completion Calculate phases (Phaser) Improve phases, find NCS, build model
AutoSol structure solution 164 SAD datasets from PDB (including inflection/remote datasets not previously used as SAD data) 1.0 0.8 Map correlation 0.6 0.4 0.2 0.0-0.2 0 10 20 30 40 Anomalous signal
1.0 AutoSol structure solution 164 SAD datasets from PDB AutoSol (new) 0.8 Map correlation 0.6 0.4 0.2 0.0-0.2 0 10 20 30 40 Anomalous signal
1.0 AutoBuild model-building 164 SAD datasets from PDB AutoBuild 0.8 Map correlation 0.6 0.4 0.2 0.0-0.2 0 5 10 15 20 25 30 35 40 45 Anomalous signal
1.0 AutoBuild model-building 164 SAD datasets from PDB Parallel AutoBuild 0.8 Map correlation 0.6 0.4 0.2 0.0-0.2 0 5 10 15 20 25 30 35 40 45 Anomalous signal
Holton Challenge data Final map correlation vs anomalous signal-to-noise 1.0 0.8 Map correlation 0.6 0.4 0.2 0.0 AutoSol/Parallel AutoBuild AutoSol/AutoBuild Crank2 6 7 8 9 10 11 12 Anomalous signal
Estimating the anomalous signal from the data Gold standard: Mean peak height of anomalous difference Fourier at positions of anomalously-scattering atoms Estimates of anomalous signal: Signal-to-noise in anomalous differences Xtriage recommended resolution CC of half-dataset anomalous differences Skew of anomalous difference Patterson
Signal-to-noise estimated from anomalous differences and uncertainties (r 2 =0.13) Signal-to-noise in anomalous difference Fourier 45 40 35 30 25 20 15 10 5 0 0.00 0.50 1.00 1.50 2.00 Solve estimate of anomalous signal-to-noise
Signal-to-noise estimated from Xtriage recommended resolution (r 2 =-0.10) Signal-to-noise in anomalous difference Fourier 45 40 35 30 25 20 15 10 5 0 0 5 10 15 20 25 30 Xtriage recommended resolution (Å)
Signal-to-noise estimated from halfdataset anomalous correlation (r 2 =0.28) Signal-to-noise in anomalous difference Fourier 45 40 35 30 25 20 15 10 5 0 0 0.2 0.4 0.6 0.8 1 Half-dataset Anomalous CC
Signal-to-noise estimated from halfdataset anomalous correlation (corrected for numbers of reflections and sites) (r 2 =0.50) Signal-to-noise in anomalous difference Fourier 45 40 35 30 25 20 15 10 5 0-20 -10 0 10 20 30 40 50 60 Half-dataset Anomalous CC * (Nrefl/sites) 1/2
Signal-to-noise estimated from skew of origin-removed truncated anomalous difference Fourier (r 2 =0.39) Signal-to-noise in anomalous difference Fourier 45 40 35 30 25 20 15 10 5 0-0.05 0.00 0.05 0.10 0.15 0.20 0.25 Skew of anomalous difference Patterson
Signal-to-noise estimated from skew of origin-removed truncated anomalous difference Fourier (r 2 =0.60) Signal-to-noise in anomalous difference Fourier 45 40 35 30 25 20 15 10 5 0-20 0 20 40 60 Skew * Nrefl 1/2
Success of direct methods substructure solution as function of true signal-to-noise HySS Direct Methods Fraction of sites found 1.0 0.8 0.6 0.4 0.2 0.0 0 5 10 15 20 25 30 35 40 45 Anomalous signal
Success of direct methods substructure solution as function of estimated signal-to-noise HySS Direct Methods Fraction of sites found 1.0 0.8 0.6 0.4 0.2 0.0 0 5 10 15 20 25 30 35 40 45 Signal estimated from skew * Nrefl 1/2
Structure solution from weak anomalous data: Perspectives Signal-to-noise in at sites in anomalous difference Fourier is a useful measure of quality Signal-to-noise can be estimated Likelihood-based methods for finding the anomalous substructure are powerful even with weak signal Structures can be solved with weak signal
Lawrence Berkeley Laboratory The PHENIX Project Los Alamos National Laboratory Duke University Cambridge University An NIH/NIGMS funded Program Project