PHENIX Wizards and Tools

Similar documents
Structure solution from weak anomalous data

Charles Ballard (original GáborBunkóczi) CCP4 Workshop 7 December 2011

Phaser: Experimental phasing

The Crystallographic Process

SOLVE and RESOLVE: automated structure solution, density modification and model building

Pipelining Ligands in PHENIX: elbow and REEL

research papers Iterative-build OMIT maps: map improvement by iterative model building and refinement without model bias 1.

The Crystallographic Process

Likelihood and SAD phasing in Phaser. R J Read, Department of Haematology Cambridge Institute for Medical Research

Electronic Supplementary Information (ESI) for Chem. Commun. Unveiling the three- dimensional structure of the green pigment of nitrite- cured meat

Molecular replacement. New structures from old

Automated ligand fitting by core-fragment fitting and extension into density

Pathogenic C9ORF72 Antisense Repeat RNA Forms a Double Helix with Tandem C:C Mismatches

CCP4 Diamond 2014 SHELXC/D/E. Andrea Thorn

electronic reprint PHENIX: a comprehensive Python-based system for macromolecular structure solution

SHELXC/D/E. Andrea Thorn

Molecular Replacement. Airlie McCoy

research papers 1. Introduction Thomas C. Terwilliger a * and Joel Berendzen b

Supporting Information

Protein Data Bank Changes Guide. New Changes in Version 3.20 September 15, 2008

Web-based Auto-Rickshaw for validation of the X-ray experiment at the synchrotron beamline

Supplementary Information

Resolution Dependence of an Ab Initio Phasing Method in Protein X-ray Crystallography

MR model selection, preparation and assessing the solution

Joana Pereira Lamzin Group EMBL Hamburg, Germany. Small molecules How to identify and build them (with ARP/wARP)

Tools for Cryo-EM Map Fitting. Paul Emsley MRC Laboratory of Molecular Biology

Molecular Replacement (Alexei Vagin s lecture)

Direct Method. Very few protein diffraction data meet the 2nd condition

Direct Methods and Many Site Se-Met MAD Problems using BnP. W. Furey

GC376 (compound 28). Compound 23 (GC373) (0.50 g, 1.24 mmol), sodium bisulfite (0.119 g,

From x-ray crystallography to electron microscopy and back -- how best to exploit the continuum of structure-determination methods now available

research papers A general method for phasing novel complex RNA crystal structures without heavy-atom derivatives

Experimental phasing in Crank2

Ensemble refinement of protein crystal structures in PHENIX. Tom Burnley Piet Gros

Full wwpdb X-ray Structure Validation Report i

Crystal lattice Real Space. Reflections Reciprocal Space. I. Solving Phases II. Model Building for CHEM 645. Purified Protein. Build model.

Supporting Information

The SHELX approach to the experimental phasing of macromolecules. George M. Sheldrick, Göttingen University

Protein Structure Prediction, Engineering & Design CHEM 430

Full wwpdb X-ray Structure Validation Report i

Protein Crystallography

Statistical density modification with non-crystallographic symmetry

Protein crystallography. Garry Taylor

Full wwpdb X-ray Structure Validation Report i

Automated Protein Model Building with ARP/wARP

Electron Density at various resolutions, and fitting a model as accurately as possible.

Computational aspects of high-throughput crystallographic macromolecular structure determination

Experimental phasing, Pattersons and SHELX Andrea Thorn

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Full wwpdb X-ray Structure Validation Report i

The structure of Aquifex aeolicus FtsH in the ADP-bound state reveals a C2-symmetric hexamer

ACORN - a flexible and efficient ab initio procedure to solve a protein structure when atomic resolution data is available

Principles of Physical Biochemistry

Full wwpdb X-ray Structure Validation Report i

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Anomalous dispersion

Full wwpdb X-ray Structure Validation Report i

HTCondor and macromolecular structure validation

The structure of a nucleolytic ribozyme that employs a catalytic metal ion. Yijin Liu, Timothy J. Wilson and David M.J. Lilley

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Full wwpdb X-ray Structure Validation Report i

Orientational degeneracy in the presence of one alignment tensor.

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Axel T. Brunger, Debanu Das, Ashley M. Deacon, Joanna Grant, Thomas C. Terwilliger, Randy J. Read, Paul D. Adams, Michael Levitt and Gunnar F.

Garib N Murshudov MRC-LMB, Cambridge

Full wwpdb X-ray Structure Validation Report i

Week 10: Homology Modelling (II) - HHpred

Prediction and refinement of NMR structures from sparse experimental data

X-ray Crystallography

Assignment 2 Atomic-Level Molecular Modeling

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Rietveld Structure Refinement of Protein Powder Diffraction Data using GSAS

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

NMR, X-ray Diffraction, Protein Structure, and RasMol

Exploiting Protein Conformational Change to Optimize Adenosine-Derived Inhibitors of HSP70

The structure of a nucleolytic ribozyme that employs a catalytic metal ion Liu, Yijin; Wilson, Timothy; Lilley, David

TLS and all that. Ethan A Merritt. CCP4 Summer School 2011 (Argonne, IL) Abstract

Pymol Practial Guide

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

PAN-modular Structure of Parasite Sarcocystis muris Microneme Protein SML-2 at 1.95 Å Resolution and the Complex with 1-Thio-β-D-Galactose

Full wwpdb X-ray Structure Validation Report i

Experimental Phasing with SHELX C/D/E

wwpdb X-ray Structure Validation Summary Report

Supporting Information. Full and Partial Agonism of a Designed Enzyme Switch

Direct-method SAD phasing with partial-structure iteration: towards automation

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

Macromolecular Crystallography Part II

Analysis and Prediction of Protein Structure (I)

Better Bond Angles in the Protein Data Bank

Structural Mechanism for the Fidelity Modulation of DNA Polymerase λ. 128 Academia Road Sec. 2, Nankang, Taipei, 115, Taiwan

Acta Crystallographica Section F

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

research papers HKL-3000: the integration of data reduction and structure solution from diffraction images to an initial model in minutes

Automated identification of functional dynamic contact networks from X-ray crystallography

2013 Eric Pitman Summer Workshop in Computational Science....an introduction to R, statistics, programming, and getting to know datasets

Protein Structure Prediction

Molecular Biology Course 2006 Protein Crystallography Part II

Bio nformatics. Lecture 23. Saad Mneimneh

CRYSTALLOGRAPHY AND STORYTELLING WITH DATA. President, Association of Women in Science, Bethesda Chapter STEM Consultant

Transcription:

PHENIX Wizards and Tools Tom Terwilliger Los Alamos National Laboratory terwilliger@lanl.gov l The PHENIX project Computational Crystallography Initiative (LBNL) Paul Adams, Ralf Grosse-Kunstleve, Peter Zwart, Nigel Moriarty, Nicholas Sauter, Pavel Afonine Los Alamos National Lab (LANL) Tom Terwilliger, Li-Wei Hung Cambridge University Randy Read, Airlie McCoy, Gabor Bunkoczi, Rob Oeffner Duke University Jane Richardson, David Richardson, Jeff Headd, Vincent Chen Texas A&M University Tom Ioerger, Jim Sacchettini

PHENIX Wizards AutoSol Wizard: Structure solution (MIR/MAD/SAD) with HYSS/Phaser/Solve/Resolve AutoBuild Wizard: Iterative density modification, model- building and refinement with Resolve/phenix.refine/Elbow; model rebuilding in place; touch-up of model; simple OMIT; SA- OMIT; Iterative-build OMIT; OMIT around atoms in a PDB file; protein, RNA, DNA model-building LigandFit Wizard: automated fitting of flexible ligands AutoMR Wizard: Phaser molecular AutoMR Wizard: Phaser molecular replacement followed by automatic rebuilding

Determining a SAD structure with PHENIX Solve the structure: phenix.autosol sad.sca 12 se AutoBuild a model and improve phases: phenix.autobuild after_autosol=true Find ligands: phenix.ligandfit sad.sca model=partial.pdb p ligand=atp Refine the model carefully: phenix.refine exptl_fobs_freer_flags.mtz freer flags \ overall_best.pdb #and many more commands

Why automate structure determination? Automation makes straightforward cases accessible to a wider group of structural biologists makes difficult cases more feasible for experts can speed up the process can help reduce errors Automation also allows you to try more possibilities estimate uncertainties

Requirements for automation of structure determination of macromolecules by X-ray crystallography y (1) Software carrying out individual steps (2) Seamless connections between steps (3) A way to decide what is good (4) Strategies for structure determination and decisionmaking

Why we need good measures of the quality of an electrondensity map: Which h solution is best? Are we on the right track? If map is good: It is easy

Why we need good measures of the quality of an electrondensity map: Which h solution is best? Are we on the right track? Contiguous solvent region Flat solvent region If map is good: It is easy Connected density

Histogram of electron density values has a positive skew Typical histogram of electron density Low density: Points between atoms and in solvent region High density: Points on top of atoms Number of gr rid points 6000 5000 4000 3000 2000 1000 Histogram skewed to the right 0-0.5 0.0 0.5 1.0 1.5 Electron density

Evaluating electron density maps Basis Good map Random map Skew of density (Podjarny, 1977) Highly skewed (very positive at positions of Gaussian histogram atoms, zero elsewhere) Connectivity of regions of high density A few connected regions Many very short can trace entire molecule connected regions (Baker, Krukowski, & Agard, 1993) Correlation of local rms densities (Terwilliger, 1999) Neighboring regions in map have similar rms densities Map has uniform rms density R-factor in 1 st cycle of density modification (Cowtan, 1996) Low R-factor High R-factor

Which scoring criteria best reflect the quality of a map? Create real maps Score the maps with each criteria Compare the scores with the actual quality of the maps

Creating real maps 247 MAD, SAD, MIR datasets with final model available (PHENIX library and JCSG publicly-available data) Run AutoSol Wizard on each dataset. Calculate maps for each solution considered (opposing hands, additional sites, including various derivatives for MIR)

Score maps based on each criteria Calculate map correlation coefficient (CC) to model map (no density modification, shift origin if necessary) Model map 1VQB, 2.6 Å, SG C2 SOLVE MAD map CC=0.62 Inverse-hand map CC=0.55

Skew of electron density 0.8 density Skew of electron 0.6 0.4 0.2 0.0 1-1.99 A 2-2.99 A 3.0-3.5 A -0.2 0.0 0.2 0.4 0.6 0.8 1.0 CC to model map

Correlation of local RMS density (Solvent next to solvent, protein next to protein)

Using scoring criteria to estimate the quality of a map Skew depends on CC Estimate CC from skew 0.8 1.0 kew of electron de ensity 0.6 0.4 0.2 CC to model ma ap 0.8 0.6 0.4 CC=0.6-0.7 S 0.0 0.2 Skew=0.4-0.2 0.0 0.2 0.4 0.6 0.8 1.0 0.0-0.2 0.0 0.2 0.4 0.6 0.8 CC to model map Skew of electron density

How accurate are estimates of map quality? Bayesian estimates of CC using Skew 1.0 0.8 Actual quality CC to mod del map 0.6 0.4 0.2 00 0.0-0.2-0.2 0.0 0.2 0.4 0.6 0.8 1.0 Predicted CC to model map Estimated quality

Estimated map quality in practice Evaluating solutions to a 2-wavelength MAD experiment (JCSG Tm3681, 1VPM, SeMet 1.6 Å data) Data for HYSS Sites Estimated CC ± 2SD Actual CC Peak 12 0.73 ± 0.04 00 0.72 Peak (inverse hand) 12 0.11 ± 0.43 0.04 F A F A (inverse) 12 073± 0.73 ± 003 0.03 12 0.11 ± 0.42 072 0.72 0.04 Sites from diff Fourier 9 0.70 ± 0.17 0.69 What to do next: Follow up on all the solutions that MIGHT be the best (within 2 SD of the top)

Statistical density modification (RESOLVE) Principle: phase probability information from probability of the map and from experiment: P( )= P map probability ( ) P experiment ( ) Phases that lead to a believable map are more probable than those that do not A believable map is a map that has a relatively flat solvent region NCS (if appropriate) A distribution of densities like those of model proteins Method: calculate how map probability varies with electron density deduce how map probability varies with phase combine with experimental phase information

Map probability phasing: Getting a new probability distribution for each phase given estimates of all others 1. Identify expected features of map (flat far from center) 2. Calculate map with current estimates of all structure factors except one (k) 3. Test all possible phases for structure factor k (for each phase, calculate new map including k) 4. Probability of phase estimated from agreement of map with expectations 5. Phase probability of reflection k from map is independent of starting phase probability because reflection k is omitted from the map A function that is (relatively) flat far from the origin Function calculated from estimates of all structure factors but one (k) Test each possible phase of structure factor k. P( ) is high for phase that leads to flat region

A map-probability function allowing different weighting of information from different parts of the map Log-probability of the map is sum over all points in map of local log-probability A map with a flat (blank) solvent region is a likely map Local log-probability is believability bilit of the value of electron density ( (x)) found at this point If the point is in the PROTEIN region, most values of electron density ( (x)) are believable If the point is in the SOLVENT region, only values of electron density near zero are believable

Rapid building of models for regions containing regular secondary-structure Helices: Identification: rods of density at low resolution Strands: Identification: structure as nearly-parallel pairs of tubes Any protein chains (trace_chain): Identification: C positions consistent with density and geometry of protein chains RNA/DNA: Identification: match of density to averaged A or B-form template

Model -helix; 3 Å map

Model -helix; 7 Å map

Find points along tubes of density in 7 Å map

Trace along tubes of density in 7 Å map

Trace main-chain with ideal helix, allowing curvature 2 Å radius, 5.4 Å /turn ideal helix

Identify direction and C position from overlap with 4 Å radius helices offset +/- 1 Å from main-chain 4 Å radius, 5.4 Å /turn ideal helix offset +1 Å along x 4 Å radius, 5.4 Å /turn ideal helix offset -1 Å along x

Choose best-fitting helices; link together if necessary

Comparison with model helix

A real case: 1T5S SAD map (3.1 Å)

A real case: 1T5S SAD map (7 Å)

Finding helices in 1T5S SAD map (7 Å)

Finding helices in 1T5S SAD map (3.1 Å)

Helices from 1T5S SAD map compared with 1T5S (3.1 Å)

PHENIX AutoSol Wizard Scale and analyze X-ray data (SAD/MAD/MIR/Multiple datasets) HYSS heavy-atom search on each dataset t Initial phasing (2.5 Å) Score and rank solutions (Which hand? Which dataset gave best solution?) Final phasing and density modification Build secondary-structure-only model

AutoSol fully automatic tests with structure library (MAD datasets, HYSS search, SOLVE/RESOLVE phases)

AutoSol fully automatic tests with structure library (MAD datasets, HYSS search, Phaser phases)

AutoSol fully automatic tests with structure library (MIR datasets, HYSS search, SOLVE/RESOLVE phases)

RESOLVE model-building at moderate resolution FFT-based identification of helices and strands Extension with tripeptide libraries Probabilistic sequence alignment Automatic molecular assembly

Initial model-building strand fragments

Chain extension (result: many overlapping fragments)

Main-chain as a series of fragments (choosing the best fragment at each location) 3 1 4 2

Side-chain template matching to identify sequence alignment to map (IF5A data) Relative probability for each amino acid at each position (Correct amino acids in bold) # G A S V I L M C F Y K R W H E D Q N P T 1 6 5 4 18 18 6 1 1 1 2 6 2 2 1 9 6 1 0 1 4 2 4 11 14 37 5 2 0 2 0 0 2 3 0 0 1 2 0 0 0 6 3 11 23 5 12 5 3 2 0 1 3 7 3 1 0 5 3 2 0 2 2 4 7 9 6 16 8 5 2 0 1 3 8 4 1 0 7 6 2 0 3 4 5 31 7 3 7 4 2 1 0 1 3 5 4 1 0 6 2 2 0 11 1 6 1 3 3 41 14 8 0 0 0 0 2 1 0 0 2 4 0 0 1 9 7 0 0 0 0 0 0 0 0 15 63 1 0 17 1 0 0 0 0 0 0 8 2 3 6 23 10 6 2 1 0 1 4 3 0 0 5 16 1 0 1 6 9 96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Addition of side-chains to fixed main-chain positions 3 1 4 2

Iterative density modification, model-building and refinement with the PHENIX AutoBuild Wizard Fp, Fp, phases, HL HL coefficients Density modify (with (with NCS, NCS, density density histograms, histograms, solvent solvent flattening, flattening, fragment fragment ID, ID, local local pattern pattern ID) ID) Build and and score models Refine with with phenix.refine Density modify including model information Evaluate final final model

Generating one very good model with the PHENIX AutoBuild Wizard Near-final model Rebuild model in overlapping 6-residue segments Combine best parts of each model 5 rebuilt models Final model

AutoBuild tests with structure library Fully automated iterative model-building, final R/Rfree AutoBuild R and Fr reer valu ues (one e-good-m model) 0.6 0.5 0.4 0.3 0.2 0.1 0.0 R Rfree 1.0 1.5 2.0 2.5 3.0 3.5 4.0 Resolution (Å)

AutoBuild tests with structure library Final Rfree with one-good-model vs standard AutoBuild 0.6 05 0.5 FreeR d-model One-goo 0.4 0.3 0.2 01 0.1 0.0 00 0.0 01 0.1 02 0.2 03 0.3 04 0.4 05 0.5 06 0.6 AutoBuild Free R

What can you do with automated procedures for structure solution and model-building? If a task is modular and automated you can run it many times checking different space groups, datasets to use checking if your model is biasing your map checking if you always get the same model

Iterative-Build OMIT procedure 2mFo-DFc omit map After building outside OMIT region 10 cycles 1HP7 molecular replacement with 1AS4 R/Rfree after initial refinement: 0.41/0.48

Multiple-model representation of uncertainties 20 models built iltfor 1CQP, no waters, Dmin=2.6 26A R=0.19-0.20; 019020 Rfree=0.26-0.27 026027 The variation among models is a lower bound on their uncertainty

What else can you do with automated procedures for structure solution and model-building? If a task is modular and automated you can run it focusing on different parts of the structure build the RNA and then the protein build the helices in a low resolution map... use cross-crystal averaging in density modification build a protein model and then add ligands

Building RNA Group II intron at 3.5 Å. Data courtesy of J. Doudna

Finding helices Ca 2+ ATPase SAD map at 3.1 Å. Data courtesy of P. Nissen

Statistical density modification with cross-crystal averaging Cell receptor at 3.5/3.7 Å. Data courtesy of J. Zhu Crystal 1 (4 copies) Crystal 2 (2 copies) RESOLVE density modification PHENIX Multi-crystal averaging

Automated fitting of flexible ligands FAD (0.95 Å, 1N1P) 8-(2,5-DIMETHOXY-BENZYL)-2- FLUORO-9-PENT-9H-PURIN-6-YLAMINE (2.2 Å 1UYI)

phenix.find_all_ligands 1J4R (3 molecules of FKB12) FKB12 Peptide decoy Site 1 Site 2 Site 3

The future: many hard problems remain in macromolecular crystallography y Automatically identifying and building all ligands, metals, waters Building multiple conformers Building poorly-defined regions Building complexes of protein and nucleic acid Representation of uncertainties in models Choosing optimal data (multiple crystals, multiple soaks) to use Automatic analysis of radiation damage Optimal structure solution in the presence of twinning and many more

PHENIX AT ARGONNE CCP4 WORKSHOP Paul Adams Tom Terwilliger www.phenix-online.org org phenix.doc for help phenix.autosol, phenix.autobuild phenix.refine

The PHENIX project Computational Crystallography Initiative (LBNL) Paul Adams, Ralf Grosse-Kunstleve, Peter Zwart, Nigel Moriarty, Nicholas Sauter, Pavel Afonine Los Alamos National Lab (LANL) Tom Terwilliger, Li-Wei Hung Cambridge University Randy Read, Airlie McCoy, Gabor Bunkoczi, Rob Oeffner Duke University Jane Richardson, David Richardson, Jeff Headd, Vincent Chen Texas A&M University Tom Ioerger, Jim Sacchettini http://www.phenix-online.org