Molecular Replacement. Airlie McCoy

Similar documents
The Crystallographic Process

Molecular replacement. New structures from old

Structure solution from weak anomalous data

Phaser: Experimental phasing

PHENIX Wizards and Tools

The Crystallographic Process

Charles Ballard (original GáborBunkóczi) CCP4 Workshop 7 December 2011

Molecular Replacement (Alexei Vagin s lecture)

MR model selection, preparation and assessing the solution

Likelihood and SAD phasing in Phaser. R J Read, Department of Haematology Cambridge Institute for Medical Research

Ensemble refinement of protein crystal structures in PHENIX. Tom Burnley Piet Gros

Direct Method. Very few protein diffraction data meet the 2nd condition

Q1 current best prac2ce

research papers Iterative-build OMIT maps: map improvement by iterative model building and refinement without model bias 1.

Pseudo translation and Twinning

Twinning. Andrea Thorn

ACORN - a flexible and efficient ab initio procedure to solve a protein structure when atomic resolution data is available

Electronic Supplementary Information (ESI) for Chem. Commun. Unveiling the three- dimensional structure of the green pigment of nitrite- cured meat

Pipelining Ligands in PHENIX: elbow and REEL

research papers An introduction to molecular replacement 1. Introduction Philip Evans a * and Airlie McCoy b

SHELXC/D/E. Andrea Thorn

Twinning in PDB and REFMAC. Garib N Murshudov Chemistry Department, University of York, UK

Pathogenic C9ORF72 Antisense Repeat RNA Forms a Double Helix with Tandem C:C Mismatches

Tools for Cryo-EM Map Fitting. Paul Emsley MRC Laboratory of Molecular Biology

Supplementary Information

Maximum Likelihood. Maximum Likelihood in X-ray Crystallography. Kevin Cowtan Kevin Cowtan,

TLS and all that. Ethan A Merritt. CCP4 Summer School 2011 (Argonne, IL) Abstract

From x-ray crystallography to electron microscopy and back -- how best to exploit the continuum of structure-determination methods now available

The structure of Aquifex aeolicus FtsH in the ADP-bound state reveals a C2-symmetric hexamer

SOLVE and RESOLVE: automated structure solution, density modification and model building

GC376 (compound 28). Compound 23 (GC373) (0.50 g, 1.24 mmol), sodium bisulfite (0.119 g,

Crystal lattice Real Space. Reflections Reciprocal Space. I. Solving Phases II. Model Building for CHEM 645. Purified Protein. Build model.

CCP4 Diamond 2014 SHELXC/D/E. Andrea Thorn

Generalized Method of Determining Heavy-Atom Positions Using the Difference Patterson Function

Practical aspects of SAD/MAD. Judit É Debreczeni

Integra(ng and Ranking Uncertain Scien(fic Data

Protein Structure Prediction, Engineering & Design CHEM 430

Electron Density at various resolutions, and fitting a model as accurately as possible.

Joana Pereira Lamzin Group EMBL Hamburg, Germany. Small molecules How to identify and build them (with ARP/wARP)

X-ray Crystallography

Web-based Auto-Rickshaw for validation of the X-ray experiment at the synchrotron beamline

BIOINF 4371 Drug Design 1 Oliver Kohlbacher & Jens Krüger

Supporting Information

GG612 Lecture 3. Outline

Automated ligand fitting by core-fragment fitting and extension into density

PSD '17 -- Xray Lecture 5, 6. Patterson Space, Molecular Replacement and Heavy Atom Isomorphous Replacement

Full wwpdb X-ray Structure Validation Report i

Image Data Compression. Dirty-paper codes Alexey Pak, Lehrstuhl für Interak<ve Echtzeitsysteme, Fakultät für Informa<k, KIT

wwpdb X-ray Structure Validation Summary Report

Overview - Macromolecular Crystallography

Full wwpdb X-ray Structure Validation Report i

Experimental phasing in Crank2

Protein Crystallography

research papers 1. Introduction Thomas C. Terwilliger a * and Joel Berendzen b

Orientational degeneracy in the presence of one alignment tensor.

Full wwpdb X-ray Structure Validation Report i

Full wwpdb X-ray Structure Validation Report i

Supporting Information. Full and Partial Agonism of a Designed Enzyme Switch

Pedro Alexandrino Fernandes, Dep. Chemistry & Biochemistry, University of Porto, Portugal

Macromolecular Phasing with shelxc/d/e

Ensemble Data Assimila.on and Uncertainty Quan.fica.on

Full wwpdb X-ray Structure Validation Report i

Full wwpdb X-ray Structure Validation Report i

Folding Thermodynamics of Protein-Like Oligomers with Heterogeneous Backbones

Kirill Prokofiev (NYU)

Mul$- model ensemble challenge ini$al/model uncertain$es

Statistical density modification with non-crystallographic symmetry

Full wwpdb X-ray Structure Validation Report i

Biology III: Crystallographic phases

Full wwpdb X-ray Structure Validation Report i

Supporting Text 1. Comparison of GRoSS sequence alignment to HMM-HMM and GPCRDB

Full wwpdb X-ray Structure Validation Report i

One- factor ANOVA. F Ra5o. If H 0 is true. F Distribu5on. If H 1 is true 5/25/12. One- way ANOVA: A supersized independent- samples t- test

Protein Structure Determination from Pseudocontact Shifts Using ROSETTA

So I have an SD File What do I do next? Rajarshi Guha & Noel O Boyle NCATS & NextMove So<ware

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Data Envelopment Analysis (DEA) with an applica6on to the assessment of Academics research performance

Anisotropy in macromolecular crystal structures. Andrea Thorn July 19 th, 2012

CS 6140: Machine Learning Spring What We Learned Last Week. Survey 2/26/16. VS. Model

Networks. Can (John) Bruce Keck Founda7on Biotechnology Lab Bioinforma7cs Resource

CS 6140: Machine Learning Spring 2016

Axel T. Brunger, Debanu Das, Ashley M. Deacon, Joanna Grant, Thomas C. Terwilliger, Randy J. Read, Paul D. Adams, Michael Levitt and Gunnar F.

SUPPLEMENTARY INFORMATION

CS 6140: Machine Learning Spring What We Learned Last Week 2/26/16

research papers Detecting outliers in non-redundant diffraction data 1. Introduction Randy J. Read

Machine Learning and Data Mining. Linear regression. Prof. Alexander Ihler

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

ASTRO 310: Galac/c & Extragalac/c Astronomy Prof. Jeff Kenney

DART Tutorial Sec'on 1: Filtering For a One Variable System

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

Exploiting Protein Conformational Change to Optimize Adenosine-Derived Inhibitors of HSP70

Practice Paper Set 2 MAXIMUM MARK 100 FINAL. GCSE (9-1) MATHEMATICS J560/06 Paper 6 (Higher Tier) PRACTICE PAPER (SET 2) MARK SCHEME

Priors in Dependency network learning

Smith Chart The quarter-wave transformer

Parallelizing Gaussian Process Calcula1ons in R

Ultra-high resolution structures in validation

SUPPLEMENTARY INFORMATION

electronic reprint PHENIX: a comprehensive Python-based system for macromolecular structure solution

Computational engineering of cellulase Cel9A-68 functional motions through mutations in its linker region. WT 1TF4 (crystal) -90 ERRAT PROVE VERIFY3D

Transcription:

Molecular Replacement Airlie McCoy

Molecular Replacement Find orienta5on and posi5on where model overlies the target structure Borrow the phases Then it becomes a refinement problem the phases change known structure H K L F φ 0 0 1 12.6 120 0 0 2 2.1 10 0 0 3 69.9 280 etc... copy unknown structure H K L F φ 0 0 1 10.4 120 0 0 2 3.1 10 0 0 3 52.2 280 etc...

Known structure Molecular Replacement Unknown homologous structure (Φ,Ψ,Κ) Rotation (X,Y,Z) Translation origin

Known structure Molecular Replacement Unknown homologous structure (Φ,Ψ,Κ) Rotation (X,Y,Z) Translation origin

Contents of the Asymmetric Unit You have to find ALL the molecules in your asymmetric unit

MaBhew s coefficient First calculated by Brian MaBhews in 1968 (over 3500 cita5ons) Most crystals are 50% protein by volume Can be used to es5mate the contents of the asymmetric unit Figure 1: Kantardjieff and Rupp (2003)

(Φ,Ψ,Κ) (X,Y,Z) Programs differ in search method and scoring function

Molecular Replacement Issues 1. How to score each orienta5on and posi5on so as to find when the model best fits the target structure 2. How to search for solu5ons: strategies for exploring rota5ons and transla5ons MR can fail due to subop5mal choices in either Choice of search method and scoring func5on are not independent Different scoring func5ons allow different search methods

Search strategy Each molecule needs 6 parameters (6D) An exhaus5ve search is big All angles sampled at 2.5 ; N rot = 1.5 10 6 All posi5ons sampled at 1Å in a 100Å cubic cell; N tra = 1.0 10 6 6 dimensional search is N rot N tra = 1.5 10 12 points This is only ONE component of asymmetric unit MR search strategies can be divided into rota5on and transla5on separately (2x3D) N rot +N tra = 2.5 10 6 points

Scoring What is the best match between the observed and calculated structure factors Can use PaBerson methods/correla5on Coefficient Maximum Likelihood methods Can use Structure factor amplitudes Structure factor intensi5es

Sohware MolRep AMoRe CNS EPMR COMO SOMoRe Queen of Spades PaBerson RF/CC Amplitudes RF+TF, independent CC (TF) Amplitudes 6D (but not exhaus5ve) Maximum likelihood Intensities RF+TF, cumulative, amalgamation

Some Important Features of Phaser ML can take account of errors Model errors can be very large Data errors can be very large for weak reflec5ons ML allows solu5ons to be built up by addi5on Expected LLG allows ar5ficial intelligence to be built into resolu5on selec5on, search order, and termina5on criteria Bias free structure pruning Transla5onal NCS correc5on allows new classes of structures to be solved by MR

PaBerson Scoring PaBerson is the FT of the amplitude 2 and the phases set to zero Can be calculated from the intensi5es This is the vector map of the atoms Can be deconvoluted if the structure is small molecule Patterson Vector Map

Maximum Likelihood Scoring Use probability Probabili5es account for errors PaBerson methods cannot do this LLGI = h 2E log e 1 D 2 2 obs σ exp E 2 + D 2 e obs σ 2 2 A E C A 1 D 2 2 obs σ I 2E e D obs σ A E C 0 A 1 D 2 2 obs σ A E e and D obs are defined as in (Read & McCoy, 2016); E e is the effective E, representing information derived from E obs2, and D obs represents the reduction in correlation between observation and E e arising from experimental error; E obs 2 = I obs /(εσ N ) where ε and Σ N includes correction terms for anisotropy and tncs modulations

Brute rota5on func5on search Place model at orienta5ons and calculate probability of each orienta5on being correct

Euler Angles

Peak selec5on Must chose a selec5on criteria to carry poten5al solu5ons through to the next step By default, solu5ons over 75% of the difference between the top peak and the mean are selected

Brute transla5on func5on search Place model at points in unit cell and calculate probability that it is in each posi5on

When is a model correctly placed? TF Z-score Solved? < 5 no 5-6 unlikely 6-7 possibly 7-8 probably > 8 definitely

Origins Can only find the transla5on perpendicular to a rota5on axis no rota5on symmetry, no transla5on to find! If there are mul5ple symmetry axes of the same order of rota5on in a plane then the transla5on can be defined with respect to any one of these These are equivalent to different choices of origin Different MR solu5ons may be on different origins and look different when they are really the same

+ + + +

+ + + +

+ + + +

+ + + +

Packing Func5on Cα clash test Other components of solution Symmetry related copies of itself Symmetry related copies of other components of solution

Packing Func5on Hexagonal Grid clash test Other components of solution Symmetry related copies of itself Symmetry related copies of other components of solution

Packing Func5on Mixed Hexagonal Grid and Cα clash test Other components of solution Symmetry related copies of itself Symmetry related copies of other components of solution

Packing Func5on Small Medium Large all atoms Cα atoms Hexagonal Grid 28

Refinement Rota5on and Transla5on searches are scored on a grid

Figure: Martin Noble

Using Par5al Structure The MLRF and MLTF can use models that have already been placed in the asymmetric unit PaBerson RF cannot account for placed models The PaBerson Correla5on Coefficient can account for known posi5ons But most of the difficulty in MR is the rota5on func5on, because the signal to noise ra5o is much lower

Searches for mul5ple components It may not be possible to find each component in a separate search

Searches for mul5ple components ML includes par5al structure informa5on from previous placements Structure builds up by addi5on

Mul5-copy searches 34

Mul5-copy searches One RF finds all orienta5ons One TF for each orienta5on finds component Tree search generates a heavily branched search All solu5ons equivalent aher sequen5al RF/TF searches Naive search does more work than necessary

Fast Search Algorithm Phaser has a search algorithm that amalgamates more than one solu5on per RF/ TF pair Fixes origin with first, then reuses RF peaks to find other placements 1 (fix origin) 2 3 4 5 6 7

Transla5onal NCS If tncs is not accounted for then TFZ > 8 does not indicate a correct placement TFZ values are always higher TFZ > 12 can be wrong When TFZ is accounted for the TFZ values are those expected of data without tncs

Transla5onal NCS Perfect tncs Real tncs Three tncs parameters refined from data alone 1. Rota5on 2. Transla5on 3. RMSD between copies tncs correc5on factors used in MR and SAD Two classes of tncs cases accounted for These cover the majority of cases

Transla5onal NCS Pairs of molecules related by one vector One peak in PaBerson Molecules in pairs There can be any number of pairs of molecules related by the same tncs vector Molecules related by mul5ples of one vector Peaks in PaBerson are mul5ples of same vector Molecules in sets related by same vector There can be any number of sets of molecules related by the same tncs vector

Twin Detec5on Transla5onal NCS masks twinning Has opposite effect on intensity sta5s5cs Correc5ng the data for tncs unmasks twinning Phaser generates cumula5ve intensity plots for centric and acentric reflec5ons aher correc5on for tncs and anisotropy Phaser gives a P-value for there being twinning in the presence of tncs

Likelihood-based molecular-replacement solu5on for a highly pathological crystal with tetartohedral twinning and sevenfold tncs Sliwiak J, Jaskolski M, Dauter Z, McCoy AJ, Read RJ. Repeats 7/2 along c* Solved finding 56 copies of the monomer in P1 28 copies in C2 (true space group) Acta Cryst D70 2014: 471 480.

Will MR work for me? Is my model good enough? Is my data good enough? Do I need to place multiple models simultaneously to get a signal? Will fragment-based MR work? Will α-helices work? How big does my helix/fragment have to be? Will single-atom MR work? First, find the criteria for deciding that MR has worked!

Final LLG for MR solu5ons Unambiguous Enrichment Solved LLG Database of over 23000 MR problems R.D. Oeffner

When is a model correctly placed? TF Z-score LLG score Solved? < 5 < 25 no 5-6 25-36 unlikely 6-7 36 49 possibly 7-8 49 64 probably > 8 > 64 definitely

Predic5ng LLG of solu5on So if you can predict the LLG You know how easy/difficult will be MR You can priori5ze structure solu5on strategies Removes uncertainty in MR Knowing when to start/stop has always been the problem with MR You can minimize the time to structure solution

Expected LLG <LLG> Total number of reflec5ons σ A : Error in the calculated structure factors Frac5on of the scabering (completeness) RMS error in the coordinates Number of residues Sequence iden5ty

MR with Fragment/Atom Fragments or even atoms are just models with low completeness Low σ A Success of fragment/atom based MR relies on other contribu5ons to the <LLG> being favourable Lots of reflec5ons Low RMS Success does NOT depend on The resolution (strictly) The type of fragment

Devolu5on Homologous structures Domains Ab Initio models Fragments Helices Atoms

<LLG> and Resolu5on Example Data <LLG> target resolu5on HEWL 1.9 Å 120 5.6 Å Ribosome 3.6 Å 120 10.8 Å

Single Atom MR Aldose Reductase 36 kda, 0.78Šresolu5on 2525 non-h atoms in structure <LLG> for first N 0.1 For first S 4 For first S 16 if B-factor is 2 Ų < Wilson B

<LLG> Pruning Occupany refinement would normally overparameterize a model ellg can be used to determine number of atoms for significant change in LLG Window offset C Window offset B Window offset A 0 1 2 3 4 5 6 7 8 9 10 11 12

<LLG> Pruning C-Amp-Protein K (1ctp solved with 1atp) Clash due to conforma5onal change ellg guided pruning removes sec5ons of smaller domain Pruned structure passes packing tests

Gyre and Gimble ML replacement for PCrefinement

New approaches

The pathway of structure solu5on Historically, there has been a linear progression through structure solu5on You had to be sure each step is correct before progressing to the next When signal is low you cannot be sure (of anything) Find best model Molecular replacement Model building

New approaches Take mul5ple possibili5es for each step and uses subsequent steps to dis5nguish correct from incorrect solu5ons Enables structure solu5on when signal is low Find best model Molecular Replacement Model Building

phaser.mrage Fetches models and processes using sculptor For each par5al structure model MR is farmed out to a cluster in a highly parallel manner Calcula5ons are performed in the order of sequence iden5ty or LLG score at each stage Explora5on con5nues un5l a solu5on is found All alterna5ve models are superposed onto the solu5on and refined. This allows the quick evalua5on of model quality for a poten5ally large number of alterna5ve models. Phaser.MRage: automated molecular replacement Bunkoczi G, Echols N, McCoy AJ, Oeffner RD, Adams PD, Read RJ Acta Cryst. (2013). D69, 2276-2286

phenix.mr_roseba Find MR solu5ons with Phaser, rebuild them with ROSETTA using techniques from ab ini5o modelling (ROSETTA energy term) to bring the structures within the radius of convergence of standard rebuilding/ refinement in phenix.autobuild Correct solu5on must be in list passed to ROSETTA phenix.mr_roseba takes top 5 by default, regardless relies on enrichment Rebuilding in ROSETTA includes map informa5on through a density term in the ROSETTA energy Increasing the Radius of Convergence of Molecular Replacement by Density and Energy Guided Protein Structure Optimization DiMaio et al (2011) Nature, 473, 540-543.

Lawrence Berkeley Laboratory The Phenix Project Paul Adams, Pavel Afonine, Youval Dar, Nat Echols, Nigel Moriarty, Nader Morshed, Ian Rees, Oleg Sobolev Los Alamos National Laboratory Tom Terwilliger, Li-Wei Hung Randy Read, Airlie McCoy, Gabor Bunkoczi, Rob Oeffner, Richard Mifsud Cambridge University An NIH/NIGMS funded Program Project Duke University Jane & David Richardson, Chris Williams, Bryan Arendall, Bradley Hintze