X-ray Crystallography I. James Fraser, Macromolecular Interactions BP204


Key take-aways
1. X-ray crystallography results from an ensemble of billions and billions of molecules in the crystal.
2. Models in the PDB are often sub-optimal and can contain errors.
3. The intensity of the spots relates to the electron density (which relates to the molecules) in the unit cell.
4. The positions of the spots relate to the arrangement of unit cells in the crystal.
5. Every spot contains contributions from every part of the crystal. Every part of the map contains contributions from every spot.

Key outcomes
- Understand Table 1 in X-ray papers (now often Table S1).
- Understand the basic workflow of determining a crystal structure.
- Embrace the beauty and challenge of disorder at high and low resolution.

Today we are going to tackle crystallography in reverse. Texts begin with diffraction theory from a series of point atoms (e.g. Biomolecular Crystallography, Rupp; Principles of Protein X-ray Crystallography, Drenth; Crystallography Made Crystal Clear, Rhodes). Bob teaches a mini-course in the spring with this level of detail. Today: from model to reflections; tomorrow: phasing.

What is a protein structure?

What is a protein structure? Is it a:
- pretty cartoon...
- space-filling set of spheres...
- picture of the protein in the crystal...
- computational picture of the protein...
- representation of atoms that satisfies experimental constraints...
- PDB-formatted text file...
- model!!!

Moreover... a model of the crystal lattice...

Protein Data Bank (PDB) files are text: chemistry, sequence, position, certainty.

HEADER    HYDROLASE                               10-DEC-06   2O7A
TITLE     T4 LYSOZYME C-TERMINAL FRAGMENT
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: LYSOZYME;
REMARK   3  FIT TO DATA USED IN REFINEMENT (NO CUTOFF).
REMARK   3   R VALUE   (WORKING + TEST SET, NO CUTOFF) : NULL
REMARK   3   R VALUE          (WORKING SET, NO CUTOFF) : 0.090
REMARK   3   FREE R VALUE                  (NO CUTOFF) : 0.108
...
ATOM      1  N   VAL A   2     -19.742  -2.254 -19.976  1.00 54.44           N
ATOM      2  CA  VAL A   2     -19.867  -2.152 -18.529  1.00 54.48           C
ATOM      3  C   VAL A   2     -19.073  -0.927 -18.101  1.00 41.86           C
ATOM      4  O   VAL A   2     -19.367   0.178 -18.554  1.00 47.57           O
ATOM      5  CB  VAL A   2     -19.341  -3.411 -17.836  1.00 68.76           C
...
MASTER      287    0    3   10    0    0    0    6 1566    1   22   10
END
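Because PDB files are plain text with fixed-width columns, each ATOM record can be read by slicing. The sketch below is a minimal reader for a few fields only; the column ranges follow the wwPDB coordinate format, and the `Atom` class and `parse_atom_line` helper are hypothetical names, not part of any library. Real work should use an established parser.

```python
# Minimal sketch: read a few fixed-width fields of a PDB ATOM/HETATM record.
# Column ranges follow the wwPDB format (converted to 0-based Python slices).
from dataclasses import dataclass


@dataclass
class Atom:  # hypothetical container, for illustration only
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float  # q
    b_factor: float   # B, in Å^2
    element: str


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using fixed column positions."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


record = "ATOM      2  CA  VAL A   2     -19.867  -2.152 -18.529  1.00 54.48           C"
atom = parse_atom_line(record)
print(atom.name, atom.res_name, atom.x, atom.occupancy, atom.b_factor)
```

Note that fields are defined by column position, not by whitespace: splitting on spaces breaks as soon as two numbers touch (e.g. large negative coordinates).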

The universe of protein structures: our knowledge about protein structures is increasing.
- 65,271 protein structures are deposited in the PDB (2/15/2010).
- This number is growing by > ~7000 a year.
- Growing input from Structural Genomics high-throughput (HT) structure determination (>1000 structures a year).
(Slide: Robert M. Stroud 2012)

How do we tell if a model is good?
- Physically (packing, contacts)
- Chemically (bond lengths, bond angles, chirality, planarity, torsions)
- Crystallographically (real-space fits, B-factors, R-factor)
- Statistically (R-free, CC1/2)
Most of these statistics appear in Table 1.

Physical checks: steric clashes (figure: "Bad" vs. "Good" examples).
- A clash: a disallowed atom-pair overlap of ≥ 0.4 Å.
- Overall clashscore: the number of serious overlaps per 1000 atoms.
Reference: Davis et al., MolProbity: all-atom contacts and structure validation for proteins and nucleic acids, Nucleic Acids Research, 2007, Vol. 35.
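To make "overlaps per 1000 atoms" concrete, here is a deliberately simplified clash counter. It is not MolProbity's algorithm: MolProbity adds hydrogens, uses its own atom radii and all-atom contact surfaces, and identifies bonded pairs from chemistry. Here the radii are Bondi-style values used only for illustration, and bonded pairs must be supplied by hand.

```python
# Simplified sketch of a clashscore (NOT the MolProbity implementation).
import itertools
import math

VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52}  # Å, illustrative radii only


def overlap(a, b):
    """Overlap = sum of van der Waals radii minus interatomic distance (Å)."""
    (elem_a, xyz_a), (elem_b, xyz_b) = a, b
    return VDW_RADIUS[elem_a] + VDW_RADIUS[elem_b] - math.dist(xyz_a, xyz_b)


def clashscore(atoms, bonded_pairs, cutoff=0.4):
    """Number of non-bonded pairs overlapping by >= cutoff, per 1000 atoms.

    atoms: list of (element, (x, y, z)); bonded_pairs: set of (i, j) with i < j.
    """
    n_clash = 0
    for i, j in itertools.combinations(range(len(atoms)), 2):
        if (i, j) in bonded_pairs:
            continue  # covalently bonded atoms are expected to "overlap"
        if overlap(atoms[i], atoms[j]) >= cutoff:
            n_clash += 1
    return 1000.0 * n_clash / len(atoms)


# Toy model: a C and an O only 2.5 Å apart (overlap 0.72 Å) plus a distant N.
atoms = [("C", (0.0, 0.0, 0.0)), ("O", (2.5, 0.0, 0.0)), ("N", (10.0, 0.0, 0.0))]
print(clashscore(atoms, bonded_pairs=set()))
```

A real structure has thousands of atoms, so a handful of clashes gives a small clashscore; in this three-atom toy a single clash dominates.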

Chemical checks: bond lengths and angles
- Typical values (resolution ~1.5-2 Å): rmsd(bonds) ~0.02 Å, rmsd(angles) ~2°.
- These values can be smaller at lower resolution (~2.5-3 Å), approaching 0 at ~3 Å and lower resolution, and they can be larger at higher resolution (~1.5 Å and higher).
Reference: Engh and Huber, Acta Cryst., 1991.
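The rmsd(bonds) value in Table 1 is a root-mean-square deviation of model bond lengths from library targets. A minimal sketch follows; the four bond lengths and their "ideal" targets are illustrative numbers, not taken from a real structure or from the Engh and Huber tables.

```python
import math


def rmsd_from_ideal(observed, ideal):
    """Root-mean-square deviation between model values and ideal (library) values."""
    if len(observed) != len(ideal) or not observed:
        raise ValueError("need two equal-length, non-empty sequences")
    return math.sqrt(sum((o - i) ** 2 for o, i in zip(observed, ideal)) / len(observed))


# Illustrative bond lengths (Å) and illustrative target values.
model_bonds = [1.47, 1.51, 1.22, 1.35]
ideal_bonds = [1.458, 1.525, 1.231, 1.329]
print(f"rmsd(bonds) = {rmsd_from_ideal(model_bonds, ideal_bonds):.3f} Å")
```

The same function applies to angles (in degrees). Because refinement restrains these values, a very small rmsd mostly reflects the restraint weight rather than independent evidence from the data.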

Chemical checks: backbone and side-chain torsion angles
- Ramachandran plot (backbone φ/ψ)
- Rotamers: a set of conformers arising from restricted rotation about one single bond (figure labels: χ1, χ2)
Typically 1-3% outliers.
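Ramachandran angles and rotamer χ angles are all dihedral (torsion) angles defined by four consecutive atoms, e.g. φ by C(i-1), N, CA, C. A minimal NumPy sketch of the standard calculation, returning degrees in (-180, 180]:

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points (p1-p2 is the central bond)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# Cis (0 degrees), trans (180 degrees) and +90 degrees arrangements around the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Validation tools then compare each (φ, ψ) pair or χ set with distributions from high-quality structures; the outlier thresholds are a property of those reference distributions, not of this calculation.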

Are all outliers bad?

...not if justified by the fit to the electron density map. What forces might cause this?

...similarly for side chains (figure panels: "Flagged as rotamer outlier"; "Correct rotamer").

...each outlier should be explainable by examining the electron density AND by the forces acting in the context of the whole protein.

Phenix offers handy tools for looking at outliers: the PHENIX model-validation tools show outliers in graphs and can also recenter Coot on them.

What are Fobs-Fcalc electron density maps?

Density maps can offer a model-free view:
- 2mFo-DFc map (blue)
- mFo-DFc difference map (green = positive, red = negative)
m and D are de-biasing coefficients. Fobs = observed amplitude; Fcalc = model-based amplitude!
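Both maps are Fourier syntheses whose coefficients combine observed amplitudes with the model phase. A minimal sketch of the coefficients for acentric reflections, assuming the weights m and D are already available per reflection (programs estimate them, e.g. via sigmaA; centric reflections are handled differently in practice):

```python
import numpy as np


def map_coefficients(f_obs, f_calc, m, d):
    """Complex coefficients for 2mFo-DFc and mFo-DFc maps (acentric form).

    f_obs: observed amplitudes |Fo|; f_calc: complex model structure factors Fc;
    m, d: per-reflection weights (assumed given). Both maps use the model phase.
    """
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=complex)
    model_phase = np.exp(1j * np.angle(f_calc))
    amp_2fofc = 2 * m * f_obs - d * np.abs(f_calc)
    amp_fofc = m * f_obs - d * np.abs(f_calc)
    return amp_2fofc * model_phase, amp_fofc * model_phase


# If the model amplitudes already match the data (and m = D = 1),
# the difference map is empty and 2mFo-DFc reduces to Fc itself.
fc = np.array([3 + 4j, -2 + 0j])
c2, c1 = map_coefficients([5.0, 2.0], fc, m=1.0, d=1.0)
print(c2, c1)
```

Because the phase always comes from the model, these maps are only "model-free" to the extent that the amplitudes pull the map away from the model; the phasing lecture returns to this bias.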

Maps are contoured in units of sigma (rmsd): typically 1.0 for 2Fo-Fc and +/-3 for Fo-Fc.
Two ways of putting a map on a scale:
- Divide it by its standard deviation (map in sigmas).
- Include the F(000) reflection and divide the map by the unit-cell volume (map in e/Å³). The model should be complete to estimate F(000).
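The first option ("map in sigmas") is easy to sketch on a density grid held as a NumPy array. The helper names are hypothetical; real programs read the grid from a map file and may compute the rms over the asymmetric unit.

```python
import numpy as np


def to_sigma_units(rho):
    """Scale a density grid to 'sigma' units: subtract the mean, divide by the rms deviation."""
    rho = np.asarray(rho, dtype=float)
    return (rho - rho.mean()) / rho.std()  # undefined for a perfectly flat map (std = 0)


def contour_masks(rho_sigma, level):
    """Grid points at or above +level and at or below -level (e.g. level=3 for Fo-Fc)."""
    return rho_sigma >= level, rho_sigma <= -level


# Toy "difference map": flat except for one strong positive point.
grid = np.zeros(1000)
grid[123] = 100.0
positive, negative = contour_masks(to_sigma_units(grid), 3.0)
print(positive.sum(), negative.sum())
```

Sigma scaling is relative: the same feature can look weaker in a map with more strong features elsewhere, which is one reason the absolute e/Å³ scale is sometimes preferred.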

to Coot!!!

Computationally, it is very beneficial to approximate the electron density arising from each atom as a sum of Gaussian functions. The electron density at point r of an atom located at position r0, with B-factor B and occupancy q, is:

$$\rho_{\mathrm{atom}}(\mathbf{r}; \mathbf{r}_0, B, q) = q \sum_{k=1}^{5} a_k \left(\frac{4\pi}{b_k + B}\right)^{3/2} \exp\!\left(-\frac{4\pi^2 \lvert \mathbf{r} - \mathbf{r}_0 \rvert^2}{b_k + B}\right)$$

- The number of terms in the formula above depends on how accurately we want to model an atom.
- q and B are hard to separate (even at very high resolution).
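The sum-of-Gaussians formula above can be evaluated directly. In this sketch the five coefficients `A_K`, `B_K` are made-up placeholders (they sum to 6 electrons, like carbon, but are not real tabulated scattering-factor coefficients). It shows two things: the integral of the density is q times the sum of a_k whatever B is, and raising B lowers and broadens the peak.

```python
import numpy as np

# Hypothetical 5-term coefficients (NOT real tabulated values); sum(A_K) = 6 electrons.
A_K = [2.0, 1.5, 1.0, 0.8, 0.7]
B_K = [10.0, 1.0, 30.0, 0.5, 60.0]


def atom_density(r, r0, b, q, a_k=A_K, b_k=B_K):
    """Sum-of-Gaussians density (electrons/Å^3) of one atom at r0 with B-factor b and occupancy q."""
    r = np.atleast_2d(np.asarray(r, dtype=float))
    d2 = np.sum((r - np.asarray(r0, dtype=float)) ** 2, axis=-1)
    rho = np.zeros_like(d2)
    for a, bk in zip(a_k, b_k):
        s = bk + b
        rho += a * (4 * np.pi / s) ** 1.5 * np.exp(-4 * np.pi**2 * d2 / s)
    return q * rho


origin = (0.0, 0.0, 0.0)
for b in (10.0, 30.0):
    print(b, atom_density([origin], origin, b, q=1.0)[0])  # peak height drops as B grows
```

Since the peak height depends on both q and B while the integral depends only on q, a lower occupancy and a higher B-factor can produce similar-looking density at limited resolution, which is why the two are hard to separate.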

This looks very nice; in practice, however, we see densities more like this (figure):

What are B-factors? In the PDB file, B is the last numeric column of each ATOM record (e.g. 54.44, 54.48, 41.86, 47.57 and 68.76 Å² for the five Val A2 atoms in the file excerpt above).

$$F_{\mathrm{calc}}(\mathbf{h}) = \sum_j f_j \exp\!\left(-\tfrac{1}{4} B_j\, \mathbf{h}^{t}\mathbf{h}\right) \exp\!\left(2\pi i\, \mathbf{h}^{t}\mathbf{x}_j\right) \qquad (1)$$

i.e. three coordinates x_j = (x_j, y_j, z_j) and one isotropic B-factor B_j are refined for each atom j (f_j is the respective scattering factor, h a reciprocal lattice vector). The overall mean displacement of an atom originates from several sources:
- different conformations in different unit cells (internal static disorder)
- vibrations or dynamic transitions within molecules (internal dynamic disorder)
- lattice defects
- lattice vibrations (acoustic phonons)
- plus restraints, plus model errors!
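The isotropic expression for F_calc translates almost term by term into code. This sketch makes two simplifying assumptions, stated in the docstring: an orthogonal cell (so h^t h = (h/a)² + (k/b)² + (l/c)²) and scattering factors f_j that do not vary with resolution. It also converts B into an rms displacement using B = 8π²⟨u²⟩.

```python
import numpy as np


def f_calc(hkl, frac_xyz, f, b_iso, cell):
    """Isotropic F_calc for one reflection.

    Assumes an orthogonal cell (a, b, c in Å) and resolution-independent f_j,
    both simplifications of the real calculation.
    """
    hkl = np.asarray(hkl, dtype=float)
    frac_xyz = np.atleast_2d(np.asarray(frac_xyz, dtype=float))
    hth = float(np.sum((hkl / np.asarray(cell, dtype=float)) ** 2))  # h^t h = 1/d^2 (Å^-2)
    debye_waller = np.exp(-0.25 * np.asarray(b_iso, dtype=float) * hth)
    phase_terms = np.exp(2j * np.pi * (frac_xyz @ hkl))  # exp(2 pi i (hx + ky + lz))
    return complex(np.sum(np.asarray(f, dtype=float) * debye_waller * phase_terms))


def b_to_rms_displacement(b):
    """B = 8 pi^2 <u^2>: rms displacement (Å) along one direction for an isotropic B (Å^2)."""
    return float(np.sqrt(b / (8 * np.pi**2)))


print(f_calc([1, 0, 0], [[0.0, 0.0, 0.0]], [6.0], [20.0], (10.0, 10.0, 10.0)))
print(b_to_rms_displacement(54.44))  # B of the first atom in the PDB excerpt
```

A B-factor of about 54 Å² therefore corresponds to roughly 0.8 Å rms displacement, which is large compared with the spread of a well-ordered atom.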

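Going anisotropic means 6 parameters instead of 1 (a symmetric 3×3 tensor U_j replaces B_j):

$$F_{\mathrm{calc}}(\mathbf{h}) = \sum_j f_j \exp\!\left(-2\pi^2\, \mathbf{h}^{t}\mathbf{U}_j\mathbf{h}\right) \exp\!\left(2\pi i\, \mathbf{h}^{t}\mathbf{x}_j\right)$$

For an isotropic tensor, U_j = (B_j / 8π²)·I, this reduces to Eq. (1). When is this decision justified?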
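The reduction from the anisotropic to the isotropic case can be checked numerically. This sketch assumes h is given as a Cartesian reciprocal-space vector in Å⁻¹ and U in Å²; `u_from_six` is a hypothetical helper that builds the symmetric tensor from its 6 independent parameters.

```python
import numpy as np


def u_from_six(u11, u22, u33, u12, u13, u23):
    """Symmetric 3x3 displacement tensor (Å^2) from its 6 independent parameters."""
    return np.array([[u11, u12, u13],
                     [u12, u22, u23],
                     [u13, u23, u33]], dtype=float)


def debye_waller_aniso(h_cart, u):
    """exp(-2 pi^2 h^T U h) for a Cartesian reciprocal-space vector h (Å^-1)."""
    h_cart = np.asarray(h_cart, dtype=float)
    return float(np.exp(-2 * np.pi**2 * h_cart @ np.asarray(u, dtype=float) @ h_cart))


h = np.array([0.1, 0.2, 0.3])
b = 20.0
u = b / (8 * np.pi**2)
u_iso = u_from_six(u, u, u, 0.0, 0.0, 0.0)
print(debye_waller_aniso(h, u_iso), np.exp(-b * h @ h / 4))  # the two agree

# An atom that moves mostly along x attenuates reflections along x more strongly.
u_aniso = u_from_six(0.5, 0.1, 0.1, 0.0, 0.0, 0.0)
print(debye_waller_aniso([0.2, 0, 0], u_aniso), debye_waller_aniso([0, 0.2, 0], u_aniso))
```

Six parameters per atom instead of one increases the parameter count roughly from 4 to 9 per atom, which is why anisotropic refinement is usually reserved for high-resolution data.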

What is q (the occupancy)? Example: Ursula D. Ramirez, George Minasov, Pamela J. Focia, Robert M. Stroud, Peter Walter, Peter Kuhn and Douglas M. Freymann, "Structural Basis for Mobility in the 1.1 Å Crystal Structure of the NG Domain of Thermus aquaticus Ffh", J. Mol. Biol. (2002) 320, 783-799, doi:10.1016/s0022-2836(02)00476-x.

Difference maps (Fo-Fc) reveal model errors (figure panels: error in position; error in occupancy; error in B-factor). Note: in the figure, the positive difference density should be green, not blue.
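The three characteristic difference-map signatures can be reproduced in one dimension by subtracting a model atom from a "true" atom. This is a schematic sketch: the 1D Gaussian atom, its extra intrinsic width term (0.05 Å²) and the size of each error are hypothetical choices, not real scattering physics.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 601)  # 1D grid in Å, spacing 0.01 Å


def gaussian_atom(x, x0, q=1.0, b=20.0):
    """Schematic 1D atom: area = occupancy q, variance grows with B (hypothetical width term)."""
    sigma2 = b / (8 * np.pi**2) + 0.05
    return q * np.exp(-(x - x0) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)


true = gaussian_atom(x, 0.0)
diff_position = true - gaussian_atom(x, 0.3)          # model atom misplaced by 0.3 Å
diff_occupancy = true - gaussian_atom(x, 0.0, q=0.6)  # model occupancy too low
diff_bfactor = true - gaussian_atom(x, 0.0, b=40.0)   # model B-factor too high

# Position error: a positive/negative pair straddling the atom.
# Occupancy too low: positive density only. B too high: positive core, negative shell.
print(x[np.argmax(diff_position)], x[np.argmin(diff_position)])
print(diff_occupancy.min() >= 0, diff_bfactor[300] > 0, diff_bfactor.min() < 0)
```

Reading a real Fo-Fc map works the same way: a +/- pair suggests moving the atom toward the positive peak, while same-sign density centred on the atom points to occupancy or B-factor.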

(Figure panels: an anisotropic atom modeled as isotropic; added positional error.)

Ser residue needs a different rotamer

Fo-Fc maps identify everything ordered that is 'missing' from the model:
- eliminate bias
- half electron content
- see electrons
(Slide: Robert M. Stroud 2012)

Also useful in dynamic crystallography

Refinement is the process of minimizing Fo-Fc. We need to balance prior knowledge and data. It is an iterative process: difference maps are minimized, and 2Fo-Fc maps improve (phases... we are coming to this).

Structure refinement is a process of changing model parameters in order to optimize a goal (target) function:

T = F(experimental data, model parameters, a priori knowledge)

- Experimental data: a set of diffraction amplitudes Fobs (and phases, if available).
- Model parameters: coordinates, ADPs (atomic displacement parameters), occupancies, bulk solvent, ...
- A priori knowledge (restraints or constraints): additional information that may be introduced to compensate for the insufficiency of the experimental data (finite resolution, poor data-to-parameter ratio).

Typically T = T_DATA + w·T_RESTRAINTS, where
- T_DATA relates the model to the experimental data,
- T_RESTRAINTS represents a priori knowledge,
- w is a weight that balances the relative contributions of T_DATA and T_RESTRAINTS.
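The role of the weight w can be seen in a toy, one-parameter version of T = T_DATA + w·T_RESTRAINTS. Everything here is hypothetical: the "data" are direct noisy observations of a single parameter (say a bond length), T_DATA is a sum of squared residuals, and T_RESTRAINTS pulls toward an ideal value. Real refinement targets are likelihood-based and involve thousands of parameters.

```python
def refine_one_parameter(observations, ideal, w, steps=200):
    """Minimize T(x) = sum_i (x - obs_i)^2 + w * (x - ideal)^2 by gradient descent."""
    n = len(observations)
    lr = 0.25 / (n + w)  # for this quadratic target, each step halves the error
    x = ideal
    for _ in range(steps):
        grad = 2.0 * sum(x - o for o in observations) + 2.0 * w * (x - ideal)
        x -= lr * grad
    return x


obs = [1.60, 1.55, 1.58]  # hypothetical measurements (Å)
ideal = 1.50              # hypothetical library value (Å)
for w in (0.0, 3.0, 1e6):
    print(w, refine_one_parameter(obs, ideal, w))
```

With w = 0 the result is the mean of the data; with a very large w it is the ideal value; in between it is the weighted compromise (sum(obs) + w·ideal) / (n + w). Choosing w badly in either direction is one of the routes to a poor model.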

Optimization methods (figure: target-function profiles):
- Gradient-driven minimization: goes downhill and can stop in a local minimum instead of the global minimum.
- Simulated annealing (SA): can cross barriers into a deeper local minimum or the global minimum.
- Grid search: sample parameter space within a known range [X_MIN, X_MAX] and keep the best solution, even when there are many local minima.
- Hands & eyes (via Coot).
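The difference between a local method and a grid search shows up already on a one-dimensional target with several minima. The target function below is made up for illustration. Gradient descent started at x = 0.2 is outside the convergence radius of the global minimum and stops in a local one; started at x = 1.5 it converges; a grid search over [X_MIN, X_MAX] finds the global minimum regardless.

```python
import math


def target(x):
    """Hypothetical 1D target with several local minima; global minimum near x = 2.09."""
    return 0.1 * (x - 2.0) ** 2 - math.cos(3.0 * x)


def d_target(x):
    return 0.2 * (x - 2.0) + 3.0 * math.sin(3.0 * x)


def gradient_descent(x, lr=0.01, steps=5000):
    for _ in range(steps):
        x -= lr * d_target(x)
    return x


def grid_search(x_min, x_max, n=10001):
    xs = (x_min + (x_max - x_min) * i / (n - 1) for i in range(n))
    return min(xs, key=target)


x_local = gradient_descent(0.2)   # stuck near x = 0.04
x_near = gradient_descent(1.5)    # within the convergence radius
x_grid = grid_search(-3.0, 5.0)   # exhaustive but expensive
print(x_local, x_near, x_grid)
```

Grid search is only affordable for a few parameters at a time (e.g. one side-chain torsion), which is why it is used locally in real space rather than for the whole model.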

Hands and eyes are still important! (Figure columns: minimization, simulated annealing, real-space grid search.)
- Both minimization and SA can fix it.
- This is beyond the convergence radius for minimization.
- This is beyond the convergence radius for minimization and SA.

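Map correlation coefficient (CC), computed over grid points:

$$\mathrm{CC} = \frac{\sum_{\text{grid points}} \left(\rho_{\mathrm{OBS}} - \langle\rho_{\mathrm{OBS}}\rangle\right)\left(\rho_{\mathrm{CALC}} - \langle\rho_{\mathrm{CALC}}\rangle\right)}{\left[\sum_{\text{grid points}} \left(\rho_{\mathrm{OBS}} - \langle\rho_{\mathrm{OBS}}\rangle\right)^2 \sum_{\text{grid points}} \left(\rho_{\mathrm{CALC}} - \langle\rho_{\mathrm{CALC}}\rangle\right)^2\right]^{1/2}}$$

- Scale independent.
- Can be computed for the whole structure (not really interesting: you already have the R-factor) or locally (most interesting; typically computed per residue).
- Values greater than ~0.8 indicate good correlation.
- May give high correlation for weak densities.
- Map CC is correlated with B-factor: poorly defined regions typically have low map CC and high B-factors.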
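The map CC formula is a Pearson correlation over grid points, so a short NumPy function covers both the global and the local (masked) case. How the mask around a residue is chosen is left open here; any boolean array of the grid's shape works.

```python
import numpy as np


def map_cc(rho_obs, rho_calc):
    """Correlation coefficient between two density grids (or masked subsets of them)."""
    o = np.asarray(rho_obs, dtype=float).ravel()
    c = np.asarray(rho_calc, dtype=float).ravel()
    do = o - o.mean()
    dc = c - c.mean()
    return float(np.sum(do * dc) / np.sqrt(np.sum(do**2) * np.sum(dc**2)))


rng = np.random.default_rng(0)
a = rng.normal(size=(8, 8, 8))  # stand-in for a density grid
print(map_cc(a, 3.0 * a + 2.0))  # scale- and offset-independent: 1.0 (to rounding)

# Local CC: restrict both maps to the grid points around one residue (here, an arbitrary block).
mask = np.zeros(a.shape, dtype=bool)
mask[:4] = True
print(map_cc(a[mask], a[mask]))
```

The scale independence is also why CC can be high for weak density: a faint but correctly shaped blob correlates as well as a strong one.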

Although this emphasizes local adjustments, refinement is global. Every X-ray reflection (h,k,l) has a contributing wave from all atoms:

$$F(h,k,l) = \sum_j f_j\, e^{2\pi i (h x_j + k y_j + l z_j)}$$

$$\rho(x,y,z) = \frac{1}{V} \sum_{h,k,l} F(h,k,l)\, e^{-2\pi i (hx + ky + lz)} \quad\text{or}\quad \rho(x,y,z) = \frac{1}{V} \sum_{h,k,l} \lvert F(h,k,l)\rvert\, e^{-2\pi i (hx + ky + lz) + i\varphi_{hkl}}$$

Every point in the density map has contributions from every reflection.
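Both directions of the Fourier relations can be tried out on a one-dimensional toy crystal. The assumptions: a 1D unit cell of length 1 (so V = 1 and coordinates are fractional), three point-like atoms with made-up positions and scattering factors, and reflections up to |h| = 30.

```python
import numpy as np

# Hypothetical 1D crystal: three atoms at fractional positions with constant f_j.
atom_x = np.array([0.12, 0.40, 0.71])
atom_f = np.array([6.0, 8.0, 7.0])
hs = np.arange(-30, 31)  # reflections h = -30..30


def structure_factor(h):
    """F(h) = sum_j f_j exp(2 pi i h x_j): every atom contributes to every reflection."""
    return np.sum(atom_f * np.exp(2j * np.pi * h * atom_x))


F = np.array([structure_factor(h) for h in hs])


def density(x):
    """rho(x) = sum_h F(h) exp(-2 pi i h x) (V = 1): every reflection contributes to every point."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.real(np.exp(-2j * np.pi * np.outer(x, hs)) @ F)


xs = np.linspace(0.0, 1.0, 1000, endpoint=False)
rho = density(xs)
print(xs[np.argmax(rho)])  # the strongest peak sits on the heaviest atom
```

Changing one atom changes every F(h), and dropping any single F(h) changes the density everywhere, which is the sense in which refinement is global.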

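R-factor:

$$R = \frac{\sum_{\text{reflections}} \big\lvert F_{\mathrm{OBS}} - \lvert F_{\mathrm{MODEL}}\rvert \big\rvert}{\sum_{\text{reflections}} F_{\mathrm{OBS}}}, \qquad F_{\mathrm{MODEL}} = k_{\mathrm{OVERALL}}\, e^{-\mathbf{s}^{t}\mathbf{U}_{\mathrm{CRYSTAL}}\mathbf{s}} \left(F_{\mathrm{CALC\_ATOMS}} + k_{\mathrm{SOL}}\, e^{-B_{\mathrm{SOL}} s^2/4} F_{\mathrm{MASK}}\right)$$

R-factor values:
- Expected value for a random model: R ~ 59%
- You can see some of the model in the 2mFo-DFc map: R ~ 30%
- You can see most of the model in the 2mFo-DFc map: R < 20%
- Perfect model: R ~ 0%
Sometimes the R-factor looks very good (you would expect a good model) but the model-to-map fit is terrible: overfitting.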
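A minimal sketch of the R-factor for arrays of amplitudes. The single overall scale factor k stands in for k_OVERALL only; the anisotropic and bulk-solvent terms of F_MODEL are not modelled. By default k is the least-squares scale between |Fobs| and |Fmodel|, an assumption made for simplicity.

```python
import numpy as np


def r_factor(f_obs, f_model, scale=None):
    """R = sum | |Fo| - k|Fmodel| | / sum |Fo| (overall scale k only)."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_mod = np.abs(np.asarray(f_model))  # works for complex F_model too
    if scale is None:
        # Least-squares scale minimizing sum (|Fo| - k|Fm|)^2.
        scale = np.sum(f_obs * f_mod) / np.sum(f_mod**2)
    return float(np.sum(np.abs(f_obs - scale * f_mod)) / np.sum(f_obs))


print(r_factor([10.0, 20.0, 30.0], [1.0, 2.0, 3.0]))         # perfect up to scale
print(r_factor([10.0, 20.0], [10.0, 10.0], scale=1.0))      # 10 / 30
```

Because only amplitudes enter, R says nothing directly about the phases, and because the model is refined against the same reflections, R alone cannot detect overfitting.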

Let's suppose the model is y = ax + b (2 parameters: a and b), fitted to data points (figure: data and fitted lines in red, blue or green).
- Lots of data: one single correct model. The R-factor is good.
- Less data: more ambiguity, less certainty, a bunch of models. The R-factor may be good too.
- Little data: a variety of models, from good to completely wrong. R-factor = 0 for all models (including wrong ones).

Now suppose the same data are described by models with more and more parameters:
- y = ax + b (fewer parameters): the R-factor is good.
- y = ax^2 + bx + c (more parameters): the R-factor is better.
- y = a1·x^n + a2·x^(n-1) + ... (many more parameters): R = 0.
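The polynomial example can be run directly with NumPy. The six data points are hypothetical (a straight line plus seeded noise), and the residual used here is a least-squares, R-like ratio rather than the crystallographic R, chosen because it is guaranteed not to increase as parameters are added.

```python
import numpy as np

# Hypothetical noisy data from the line y = 2x + 1 (6 points).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 6)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)


def residual(degree):
    """R-like least-squares residual of a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    y_fit = np.polyval(coeffs, x)
    return float(np.sqrt(np.sum((y - y_fit) ** 2) / np.sum(y**2)))


for degree in (1, 2, 5):
    print(degree, residual(degree))  # degree 5 has 6 parameters for 6 points: residual ~ 0
```

The degree-5 curve passes through every point, noise included, so its perfect fit says nothing about whether it describes the underlying line.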

What leads to overfitting?
- Insufficient amount of data (low resolution, poor completeness)
- Ignoring data (cutting by resolution, sigma, anisotropy correction)
- Suboptimal parameterization
- Excess of imagination
- Bad weights
The choice of model parameterization depends on the amount of available data and its resolution (table: key resolution limits and corresponding features).

Solution: cross-validation (the R-free factor).
- At the beginning of structure solution, split the data (F_OBS) into two sets: a test set (~5-10% of randomly selected reflections) and a work set (the rest).
- From this point on, you look at two R-factors: R-work (computed using the work set) and R-free (computed using the test set).
- Work-set reflections are used for everything: model building, refinement, map calculation, ...
- Test-set reflections are never used for any model optimization, except the R-free calculation.
Rationale: a model that fits the ~90% of the data in the work set well should also fit the ~10% of excluded data (the test set). Since the test-set data do not participate in refinement, R-free > R-work. The gap R-free - R-work depends on resolution and ranges from 5-7% (at medium to low resolution) to ~1% (at ultra-high resolution, ~0.5 Å).
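The bookkeeping behind R-free is a fixed random partition of the reflections. This sketch uses hypothetical amplitudes and a hypothetical "model" with 10% random error; because that model was not refined against the work set, R-work and R-free come out similar here. The gap described above only appears once refinement uses the work set.

```python
import numpy as np


def split_work_test(n_reflections, test_fraction=0.05, seed=0):
    """Random, reproducible work/test partition; generate it once and keep it."""
    rng = np.random.default_rng(seed)
    is_test = rng.random(n_reflections) < test_fraction
    return ~is_test, is_test


def r_value(f_obs, f_model):
    f_obs = np.asarray(f_obs, dtype=float)
    f_model = np.abs(np.asarray(f_model))
    return float(np.sum(np.abs(f_obs - f_model)) / np.sum(f_obs))


# Hypothetical data: 20,000 amplitudes and a model that is off by ~10% at random.
rng = np.random.default_rng(1)
f_obs = rng.uniform(10.0, 100.0, size=20000)
f_model = f_obs * (1.0 + 0.1 * rng.normal(size=f_obs.size))

work, test = split_work_test(f_obs.size)
r_work = r_value(f_obs[work], f_model[work])
r_free = r_value(f_obs[test], f_model[test])
print(test.sum(), r_work, r_free)
```

The seed matters: if the test set were re-drawn later, reflections that had already influenced the model would leak into it and R-free would become optimistic.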

Why does R-free work so well? Every X-ray reflection (h,k,l) has a contributing wave from all atoms, and every point in the density map has contributions from every reflection (the Fourier relations above). A randomly chosen test set therefore still depends on every part of the model, just like the work set.

What are F_hkl: reflections, structure factors, amplitudes, spots?

We rotate the crystal to place a different set of reflections on the detector. (Slide: Robert M. Stroud 2012)

Ewald sphere construction
- Given: wavelength, angle, lattice, distance from detector, orientation of the lattice relative to the detector.
- Predicts: which diffracted waves satisfy Bragg's law.
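Bragg's law, λ = 2d sin θ, links a reflection's lattice spacing d to the angle at which it diffracts. The sketch assumes an orthogonal (orthorhombic) cell, for which 1/d² = (h/a)² + (k/b)² + (l/c)²; general cells need the full reciprocal metric.

```python
import math


def d_spacing_orthorhombic(h, k, l, a, b, c):
    """Lattice-plane spacing d (Å) for Miller indices hkl in an orthogonal cell."""
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


def bragg_theta_deg(d, wavelength):
    """Bragg angle theta in degrees, or None if lambda / (2d) > 1 (cannot diffract)."""
    s = wavelength / (2.0 * d)
    if s > 1.0:
        return None
    return math.degrees(math.asin(s))


d = d_spacing_orthorhombic(1, 0, 0, 10.0, 20.0, 30.0)
print(d, bragg_theta_deg(d, 1.0))
print(bragg_theta_deg(0.4, 1.0))  # d < lambda / 2: outside the limiting sphere
```

The Ewald construction adds the crystal orientation to this: of all reflections with a valid θ, only those whose reciprocal-lattice points currently touch the sphere are recorded, which is why the crystal is rotated during data collection.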

Each reflection is measured multiple times:
- F = sqrt(intensity)
- SigI = error in the intensity (estimated from the multiple observations)
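Merging repeated measurements of one reflection can be sketched with the standard library. This is a simplification: real data-reduction programs weight observations, apply scale factors and use the French-Wilson procedure to obtain amplitudes, especially for weak or negative mean intensities, which this sketch crudely sets to zero.

```python
import math
import statistics


def merge_observations(intensities):
    """Mean intensity, its standard error (SigI), and F = sqrt(<I>).

    Simplification: unweighted mean; negative <I> gives F = 0 instead of a
    proper French-Wilson estimate.
    """
    n = len(intensities)
    mean_i = statistics.fmean(intensities)
    sig_i = statistics.stdev(intensities) / math.sqrt(n) if n > 1 else float("nan")
    f = math.sqrt(mean_i) if mean_i > 0 else 0.0
    return mean_i, sig_i, f


print(merge_observations([100.0, 110.0, 90.0, 100.0]))  # hypothetical repeated measurements
```

More repeated observations shrink SigI roughly as 1/sqrt(n), which is why high multiplicity improves weak, high-resolution data.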

Where to cut the data?
- I/sigma: background
- Rmerge: consistency
- CC1/2, CC*: effect on refinement (Karplus and Diederichs, Science, 2012)
(Slide: Robert M. Stroud 2012)
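CC1/2 is the correlation between intensities merged from two random halves of the observations, and Karplus and Diederichs' CC* = sqrt(2·CC1/2 / (1 + CC1/2)) estimates the correlation of the full merged data with the (unknown) true signal. Rmerge sums the spread of repeated observations around their mean. A minimal sketch; the inputs are hypothetical, and real programs compute these per resolution shell.

```python
import numpy as np


def cc_half(i_half1, i_half2):
    """Pearson correlation between intensities merged from two half-datasets."""
    return float(np.corrcoef(i_half1, i_half2)[0, 1])


def cc_star(cc12):
    """CC* = sqrt(2 CC1/2 / (1 + CC1/2)) (Karplus and Diederichs, 2012)."""
    return float(np.sqrt(2.0 * cc12 / (1.0 + cc12)))


def r_merge(groups):
    """Rmerge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i (one group per reflection)."""
    num = sum(sum(abs(i - np.mean(obs)) for i in obs) for obs in groups)
    den = sum(sum(obs) for obs in groups)
    return float(num / den)


print(cc_star(0.5))                              # a weak shell still carries signal
print(r_merge([[100.0, 110.0, 90.0, 100.0]]))    # 20 / 400
```

A shell with CC1/2 = 0.5 gives CC* of about 0.82, one reason CC1/2-based cutoffs tend to keep more high-resolution data than I/sigma or Rmerge cutoffs.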

If you have too many overloads

If you throw out weak (low-resolution) data

If you randomly miss data (like R-free)

If you miss slices of data (bad strategy): why you need a whole dataset

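The scattering pattern is the Fourier transform (FT) of the structure:

$$F(\mathbf{S}) = \sum_j f_j\, e^{2\pi i\, \mathbf{r}_j \cdot \mathbf{S}}$$

The structure is the inverse Fourier transform (FT⁻¹) of the scattering pattern:

$$\rho(\mathbf{r}) = \int F(\mathbf{S})\, e^{-2\pi i\, \mathbf{r} \cdot \mathbf{S}}\, d\mathbf{S}$$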

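A crystal only samples the parts of the transform that satisfy Bragg's law. (Figure: a real-space lattice with spacings a and b, related by FT and FT⁻¹ to its reciprocal lattice, e.g. spacing 1/b.)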
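Why only Bragg positions survive can be shown in one dimension: the transform of a crystal is the molecular transform multiplied by a lattice sum over the unit cells, and that sum is large only where S is a multiple of 1/a. The cell length a = 10 Å and the number of cells are arbitrary choices for this sketch.

```python
import numpy as np


def lattice_sum(s, a=10.0, n_cells=50):
    """|sum_n exp(2 pi i n a s)| over n_cells unit cells of length a (Å); s in Å^-1."""
    n = np.arange(n_cells)
    return float(np.abs(np.sum(np.exp(2j * np.pi * n * a * s))))


print(lattice_sum(1 / 10.0))    # s = h/a: all cells scatter in phase, sum = n_cells
print(lattice_sum(0.5 / 10.0))  # halfway between: contributions cancel
```

With a real crystal the number of cells is enormous, so the peaks become extremely sharp: the diffraction pattern samples the molecular transform only on the reciprocal lattice.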

Waves have phase too... Next lecture (and paper): how model phases bias our maps, and how to solve the phase problem.

$$\rho(x,y,z) = \frac{1}{V} \sum_{h,k,l} \lvert F(h,k,l)\rvert\, e^{-2\pi i (hx + ky + lz) + i\varphi_{hkl}}$$

Every point in the density map has contributions from every reflection.
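A quick way to see that the positional information sits in the phases: translate a whole structure and recompute its structure factors. In this 1D sketch (hypothetical atoms, fractional coordinates) every amplitude stays the same, while each phase shifts by 2πh·t, so amplitudes alone cannot tell where the atoms are.

```python
import numpy as np


def structure_factors(x, f, hs):
    """F(h) = sum_j f_j exp(2 pi i h x_j) for each h (1D, fractional coordinates)."""
    return np.array([np.sum(f * np.exp(2j * np.pi * h * x)) for h in hs])


hs = np.arange(1, 6)
f = np.array([6.0, 8.0, 7.0])
x = np.array([0.12, 0.40, 0.71])

shift = 0.25  # move the whole structure by a quarter of the cell
F_original = structure_factors(x, f, hs)
F_shifted = structure_factors(x + shift, f, hs)

print(np.abs(F_original) - np.abs(F_shifted))          # amplitudes unchanged
print(np.angle(F_shifted) - np.angle(F_original))      # phases shifted by 2 pi h t (mod 2 pi)
```

The measured intensities give only |F|; recovering the φ_hkl in the synthesis above is the phase problem.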
