Computational aspects of structure determination by NMR


Alexandre Bonvin, Utrecht University
EMBO course, Il Ciocco, 2002

With contributions from
- Michael Nilges and Jens Linge (Institut Pasteur, Paris)
- Chris Spronk (CMBI Nijmegen)

The ABC of NMR structure determination

NMR structure determination steps
- NMR experiment
- Resonance assignment
- Structural restraints: distances (NOE assignment), torsion angles, orientation restraints, ...
- Structure calculations
- Structure validation

Structure determination by NMR
Lengthy process:
- Sample preparation ( months )
- Acquisition of experimental data (1-2 months)
- Chemical shift assignments
  - Backbone (few days)
  - Side-chains (few weeks)
- Analysis of NOE spectra
- Structure calculations (several months)

NMR experimental observables providing structural information
- Backbone conformation from secondary chemical shifts (Chemical Shift Index, CSI)
- Distance restraints from NOEs
- Backbone and side-chain dihedral angle restraints from scalar couplings
- Hydrogen bond restraints from cross-hydrogen bond scalar couplings
- Orientation restraints from residual dipolar couplings

Secondary chemical shifts
- Chemical shift deviations from random coil values contain information on secondary structure

Secondary chemical shifts: Chemical Shift Index
- Chemical shift index (CSI) based on Hα, Cα, Cβ and C' secondary chemical shifts (Wishart et al., J. Biomol. NMR 4, 171, 1994)
- CSI program of Wishart et al.: output readable in ARIA
- CSI analysis in NMRView5: choice of several reference databases and standards
[Figure: CSI plots for Hα, Cα, Cβ and C']

Chemical shifts analysis with TALOS
- TALOS: Torsion Angle Likelihood Obtained from Shift and sequence similarity (Cornilescu et al., J. Biomol. NMR 13, 289, 1999)
- Analysis of secondary chemical shift patterns in tripeptides by comparison with a chemical shift database of proteins with known 3D structures (X-ray, resolution < 2.2 Å, ~3000 triplets)

Chemical shifts analysis with TALOS
TALOS outputs good predictions if:
- 10 consistent matches
- 9 consistent matches out of 10 and φ < 0, and outlier with φ < 0
- 9 consistent matches out of 10 and φ > 0
Those predictions can be transformed into φ/ψ dihedral restraints for structure calculations:
- E.g. average angle ± 2*std, or a minimum of 10°
- A more conservative approach would be to accept only those predictions with 10 consistent matches
- A single wrong prediction does not, however, affect the calculated structures much (tested for CI2)

Secondary chemical shifts: application to CI2
- Consensus secondary structure from CSI and TALOS
- 32 good predictions transformed into φ/ψ dihedral angle restraints (average angle ± 2*std, or a minimum of 10°)

Distance information from NOEs
- NOEs are the result of through-space dipolar interactions, ~ $1/r^6$ --> max ~ 5 Å
- Derived from:
  - 1H-1H (homonuclear) spectra: 2D, 3D
  - 15N- or 13C-dispersed (heteronuclear) spectra: 3D, 4D

Scalar (J) couplings
- Can be used directly, or indirectly as derived dihedral angle restraints using the Karplus relationship
- Problem of conformational averaging!

Observation of hydrogen bonds by $^{3hb}J_{NC'}$ scalar couplings
- Based on HNCO experiments
- Provides medium-range (α-helices) to long-range (β-sheets) structural information
- Cornilescu et al., JACS 121, 2949 (1999); Cordier and Grzesiek, JACS 121, 1601 (1999)

Observation of hydrogen bonds by $^{3hb}J_{NC'}$ scalar couplings in CI2
- 600 MHz long-range CT-HNCO (Cordier & Grzesiek, JACS 121, 1601, 1999)
- 2 mM sample, pH 4.6, 303 K
- Evolution time T for couplings set to 64.5 ms
- 18 cross-hydrogen bond $^{3hb}J_{NC'}$ couplings detected
- 5 additional weak couplings
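The rule "average angle ± 2*std, or a minimum of 10°" can be sketched in a few lines of Python. This is only an illustration, not TALOS or ARIA code: the function name `dihedral_restraint` and its parameters are hypothetical, and the use of a circular mean (so that angles near ±180° average sensibly) is an assumption that the slides do not spell out.

```python
import math

def dihedral_restraint(angles_deg, n_std=2.0, min_half_width=10.0):
    """Return (center, half_width) in degrees for a set of matched angles.

    Hypothetical helper: center is the circular mean of the matches,
    half_width is n_std standard deviations with a floor of min_half_width.
    """
    rad = [math.radians(a) for a in angles_deg]
    mean_sin = sum(math.sin(a) for a in rad) / len(rad)
    mean_cos = sum(math.cos(a) for a in rad) / len(rad)
    center = math.degrees(math.atan2(mean_sin, mean_cos))
    # Signed deviations from the center, wrapped into [-180, 180)
    devs = [((a - center + 180.0) % 360.0) - 180.0 for a in angles_deg]
    std = math.sqrt(sum(d * d for d in devs) / len(devs))
    return center, max(n_std * std, min_half_width)

# Ten tightly clustered phi matches: the 10 degree floor applies
tight_center, tight_width = dihedral_restraint([-62.0, -58.0] * 5)
# Ten widely spread matches: 2*std is used instead
wide_center, wide_width = dihedral_restraint([-80.0, -40.0] * 5)
```

The resulting restraint would then be the interval center ± half_width.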

Protein structure from secondary chemical shifts and cross-hydrogen bond couplings: application to CI2
- φ/ψ dihedral angles for 32 residues derived from secondary chemical shifts
- 18 measured hydrogen bonds
- Rmsd from X-ray:
  - backbone ss/all: 1.3/2.0 Å
  - all heavy ss/all: 3.0/3.6 Å
[Figure: NMR structure closest to average; crystal structure (2CI2)]

Orientational restraints from dipolar (D) couplings
- Reports the angle of the internuclear vector relative to the magnetic field H0
- Must accommodate multiple solutions --> multiple orientations
[Figure: 1H-15N internuclear vectors relative to H0, with angles Φ1, Φ2, Φ3]

Structure calculations
The 3D structure has to satisfy:
- experimental restraints
- chemical knowledge
--> find the minimum of a target function that combines an empirical force field with experimental restraints

Empirical force fields
- Rather simple description of the forces in the system
- Energetic penalties associated with deviation from reference or equilibrium values

$V_{potential} = V_{bonds} + V_{angles} + V_{torsion} + V_{non-bonded} + V_{exp}$

where $V_{non-bonded}$ comprises the van der Waals and electrostatic terms.

Bond stretching
Various forms possible, e.g.:
- Morse potential: $V(l) = D_e \{1 - e^{-a(l - l_0)}\}^2$
  - Allows dissociation
  - Computationally more expensive
- Harmonic potential: $V(l) = \frac{k}{2}(l - l_0)^2$
  - Most commonly used

Angle bending
- Usually a harmonic potential, as for the bond stretching term: $V(\theta) = \frac{k}{2}(\theta - \theta_0)^2$

Torsional terms
- Describe rotation around bonds
- The torsion angle $\omega$ around the B-C bond is defined as the angle between the ABC and BCD planes
- $V_{torsion}$ should allow for multiple minima (rotameric states)
- Usually expressed as a cosine series expansion:
  $V(\omega) = \sum_{n=0}^{N} \frac{V_n}{2}[1 + \cos(n\omega - \gamma)]$
  where $V_n$ is the barrier height
- Single term: all minima equal
- Two terms: minima no longer equal (e.g. O-CH2-CH2-O)

Improper torsions and out-of-plane bending motions
- Defined to maintain planarity of aromatic rings or chirality of atoms, e.g.:
  - Torsion Cα-N-C-Cβ to maintain the tetrahedral conformation of Cα: 35° for L-amino acids, -35° for D-amino acids
  - Torsion Cδ1-Cγ-Cε1-Hδ1 to keep the aromatic hydrogen Hδ1 in the plane of the ring
- Usually implemented as harmonic potentials
[Figure: Tyr side chain with atoms N, Cα, Cβ, C, Cγ, Cδ1, Cε1, Hδ1]
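The bonded terms above are simple enough to write down directly. The following Python sketch is illustrative only: the function names and all numerical parameters are made up for the example and do not come from any particular force field.

```python
import math

def harmonic(x, x0, k):
    """Harmonic term, used for bonds and angles: V = k/2 (x - x0)^2."""
    return 0.5 * k * (x - x0) ** 2

def morse(l, l0, d_e, a):
    """Morse bond potential: V = D_e {1 - exp[-a (l - l0)]}^2."""
    return d_e * (1.0 - math.exp(-a * (l - l0))) ** 2

def torsion(omega_deg, terms):
    """Cosine series: sum over terms of V_n/2 [1 + cos(n*omega - gamma)].

    terms is an iterable of (V_n, n, gamma_deg) tuples.
    """
    w = math.radians(omega_deg)
    return sum(0.5 * v_n * (1.0 + math.cos(n * w - math.radians(g)))
               for v_n, n, g in terms)

# Single threefold term: the minima at 60 and 180 degrees are equal
single = [(2.0, 3, 0.0)]
# Adding a onefold term makes the minima unequal
double = [(2.0, 3, 0.0), (1.0, 1, 0.0)]
gap_single = abs(torsion(60.0, single) - torsion(180.0, single))
gap_double = abs(torsion(60.0, double) - torsion(180.0, double))
```

Close to the equilibrium length the Morse curve approaches a harmonic one with force constant k = 2 D_e a^2, which is one reason the cheaper harmonic form is usually sufficient.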

Electrostatic interactions
- Atoms carry small partial charges
- Electrostatic interactions are calculated as the sum of interactions between pairs of point charges using Coulomb's law
- Slow decay as a function of the distance between atoms (~ $1/r$): long-range contributions

$V_{elec} = \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}$

- $q$: partial charges
- $\varepsilon$: dielectric constant

Van der Waals interactions
- Repulsive short-range forces: repulsion between nuclei
- Attractive long-range forces: attraction between induced dipoles from fluctuations in electron clouds (London forces)
- Often expressed using the Lennard-Jones 12-6 function:

$V_{L-J} = 4\varepsilon \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right]$

- $\sigma$: collision diameter
- $\varepsilon$: well depth

Force field: chemical information
- topology: atom names, atom types, atom masses, connectivity
- parameters: energy constants, ideal values

Derivatives of the energy function
- Many molecular modelling techniques based on force fields require the derivative of the energy (i.e. the force) to be calculated with respect to the coordinates
- The derivative can be calculated using the chain rule:
  $\frac{\partial V(\theta)}{\partial x_i} = \frac{\partial V(\theta)}{\partial \theta} \frac{\partial \theta}{\partial r_{ij}} \frac{\partial r_{ij}}{\partial x_i}$
- The force on atom i due to $V(\theta)$ is given by:
  $F_i(\theta) = -\frac{\partial V(\theta)}{\partial x_i}$

Force field parameterization
- Parameterization in order to reproduce structural, vibrational or thermodynamic properties
- Parameterization possible against:
  - Structural data, e.g. crystallographic databases (CSD, PDB)
  - Spectroscopic data, e.g. IR, UV, NMR, ...
  - Thermodynamic data, e.g. heat of formation, density, heat of vaporization, heat capacity, viscosity, diffusion constants
  - Ab initio and semi-empirical calculations, e.g. equilibrium geometries, electrostatic potentials
- Empirical force field ≠ NMR force field!
- Always be aware of the purpose for which a force field has been parameterized!
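As a rough illustration of the non-bonded terms and of "the force is minus the derivative of the energy", here is a small Python sketch. The names are hypothetical, the unit system (kcal/mol, Å, elementary charges) is an assumption, and the pair sum runs over i < j so that each pair is counted once and self-interactions are excluded.

```python
import math

# 1/(4*pi*eps0) in kcal*Angstrom/(mol*e^2); assumed unit system for this sketch
COULOMB_CONSTANT = 332.06

def coulomb(qi, qj, r, eps_r=1.0):
    """Coulomb energy of two point charges at distance r."""
    return COULOMB_CONSTANT * qi * qj / (eps_r * r)

def lennard_jones(r, sigma, epsilon):
    """Lennard-Jones 12-6 energy: 4*eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_force(r, sigma, epsilon):
    """Radial force -dV/dr of the Lennard-Jones term (positive = repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r

def electrostatic_energy(coords, charges):
    """Sum of Coulomb terms over unique pairs i < j."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            total += coulomb(charges[i], charges[j], math.dist(coords[i], coords[j]))
    return total

sigma, epsilon = 3.4, 0.24           # illustrative values only
r_min = 2.0 ** (1.0 / 6.0) * sigma   # position of the energy minimum
```

At `r_min` the energy equals minus the well depth and the force vanishes; at shorter distances `lj_force` is positive (repulsive), at longer distances negative (attractive).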

Standard NOE distance restraint potential
- Flat-bottom harmonic potential
- No force if r lies between the lower and upper bounds
- Large forces for violations: may lead to distortions

Soft potential for NOE-derived distances
- Asymptotic behavior at longer distances
- Avoids excessive forces (the force becomes constant for large violations)

NMR structure calculation: simplified force field
- Covalent interactions: rigid, uniform force constants, ideal values from Engh & Huber
- vdW interaction: quartic potential, no attractive part
- No electrostatics
- Empirical force field ≠ NMR force field

Structure calculation methods
Locating the global minimum of the target function:
- Except for very simple systems, the potential energy is a very complicated function with 3N-6 degrees of freedom, where N is the number of atoms
- Difficult to visualize the potential and identify energy minima
- E.g. potential energy surface of pentane
- We need computational methods to explore the energy surface and locate minima

NMR structure calculation methods
- Energy minimization ("build-up method", DIANA)
- Metric matrix distance geometry (DISGEO, DG2)
- Molecular dynamics methods:
  - Simulated annealing in Cartesian space from random structures (X-PLOR, CNS)
  - Simulated annealing in torsion angle space from random structures (X-PLOR, CNS, DYANA)
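The difference between the two restraint potentials can be made concrete with a short Python sketch. This is a simplified illustration rather than the functional form used by X-PLOR or CNS: here the "soft" variant simply continues the parabola as a straight line once the violation exceeds an assumed switching distance `switch`, which gives the constant force for large violations described above. All names and default values are hypothetical.

```python
def flat_bottom_harmonic(r, lower, upper, k=1.0):
    """Zero between the bounds, harmonic penalty outside."""
    if r > upper:
        return k * (r - upper) ** 2
    if r < lower:
        return k * (lower - r) ** 2
    return 0.0

def soft_flat_bottom(r, lower, upper, k=1.0, switch=1.0):
    """Same as above, but linear beyond upper + switch (constant force)."""
    excess = r - upper
    if excess > switch:
        # Linear continuation with the slope of the parabola at the switch point
        return k * switch ** 2 + 2.0 * k * switch * (excess - switch)
    return flat_bottom_harmonic(r, lower, upper, k)

# A restraint with bounds 1.8-5.0 Angstrom, violated by 5 Angstrom
hard_penalty = flat_bottom_harmonic(10.0, 1.8, 5.0)
soft_penalty = soft_flat_bottom(10.0, 1.8, 5.0)
```

For a badly violated restraint (for instance a wrong assignment) the soft form gives a much smaller energy and a bounded force, so a single bad restraint distorts the structure less.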

Energy minimization (EM)
- At a minimum of the function f(x) the following is true:
  $\frac{\partial f}{\partial x_i} = 0; \quad \frac{\partial^2 f}{\partial x_i^2} > 0$
- EM will only locate the nearest minimum
- Cannot cross energy barriers

Energy minimization (EM): various minimization methods
- Non-derivative methods (only need the energy), e.g. simplex
- Derivative methods (need the gradient $g = dV/dx$), e.g. steepest descents, conjugate gradients, Newton-Raphson, ...
- Steepest descents method (derivative method):
  - Move the system in the direction parallel to the force
  - Direction given by $s_k = -g_k / |g_k|$
  - New coordinates obtained as $x_{k+1} = x_k + \lambda s_k$
  - $\lambda$ is the step size and can be adapted during EM

Multiple minima problem
- High energy barriers to fold proteins
- Standard minimization goes only "downhill" --> can only locate local minima

Distance geometry
- Avoids the "folding problem": direct conversion from distances to (approximate) coordinates

Distance geometry schedule
Complicated schedule:
- spreading of distance information to other atoms by the triangle inequality
- consistency check
- random choice of distances
- "embedding" by calculation of eigenvectors of the metric matrix
- "real space" optimisation necessary

Molecular dynamics (MD)
Generates successive configurations of the system by integrating Newton's laws of motion:
1. A body moves in a straight line unless a force acts on it
2. Force equals mass times acceleration
3. To every action there is an equal reaction
The MD trajectory is obtained by solving Newton's second law (F = ma):

$\frac{d^2 x_i}{dt^2} = \frac{F_i}{m_i} \quad \text{with} \quad F_i = -\frac{\partial V}{\partial x_i}$
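A minimal steepest-descent loop in Python shows both the update rule and the "nearest minimum only" limitation. The step-size adaptation used here (grow the step after an accepted move, halve it after a rejected one) is one common heuristic and an assumption of this sketch, as are the function names and the double-well test function.

```python
import math

def steepest_descent(energy, gradient, x0, step=0.1, n_steps=1000, tol=1e-8):
    """Minimize energy(x) for a coordinate list x by steepest descent."""
    x = list(x0)
    e = energy(x)
    for _ in range(n_steps):
        g = gradient(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm < tol or step < 1e-14:
            break
        s = [-gi / norm for gi in g]                # s_k = -g_k / |g_k|
        trial = [xi + step * si for xi, si in zip(x, s)]
        e_trial = energy(trial)
        if e_trial < e:                             # downhill: accept, enlarge step
            x, e = trial, e_trial
            step *= 1.2
        else:                                       # uphill: reject, shrink step
            step *= 0.5
    return x

# One-dimensional double well with minima at x = -1 and x = +1
def double_well(x):
    return (x[0] ** 2 - 1.0) ** 2

def double_well_gradient(x):
    return [4.0 * x[0] * (x[0] ** 2 - 1.0)]

right = steepest_descent(double_well, double_well_gradient, [0.5])
left = steepest_descent(double_well, double_well_gradient, [-0.5])
```

Starting on either side of the barrier at x = 0 leads to a different minimum: the minimizer never crosses the barrier, which is why dynamics-based methods are needed to search globally.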

Molecular dynamics
- Direction of motion depends on:
  - forces (derived from the force field and the experimental restraints)
  - momentum
- Molecular dynamics can overcome local energy barriers

MD: integration of the equations of motion
- Various methods available for the integration
- All based on Taylor expansions of positions and velocities:
  $r(t + \delta t) = r(t) + \delta t\, v(t) + \frac{1}{2}\delta t^2 a(t) + \frac{1}{6}\delta t^3 b(t) + \dots$
  $v(t + \delta t) = v(t) + \delta t\, a(t) + \frac{1}{2}\delta t^2 b(t) + \frac{1}{6}\delta t^3 c(t) + \dots$
- A few common algorithms: Verlet, leap-frog, Beeman, predictor-corrector methods

Temperature control in MD
- The temperature T is related to the kinetic energy $E_{kin}$ of the system and therefore to the velocities:
  $E_{kin} = \sum_{i=1}^{N_{atoms}} \frac{1}{2} m_i v_i^2 = \frac{k_B T}{2}(3N - N_c)$
  - $N_c$ = number of constraints in the system
  - $(3N - N_c)$ = number of degrees of freedom
- The temperature can be controlled by modifying the velocities $v_i$ of atoms and molecules in the system, e.g.:
  - Velocity scaling: velocities are scaled so that $T = T_{ref}$ at each step. No temperature fluctuations!
  - Weak coupling: velocities are scaled at a rate proportional to the temperature difference (exponential decay toward the reference temperature). Temperature fluctuations, more realistic

MD: choosing the time step
What is an appropriate time step for the integration?
- Too small: nothing happens
- Too large: instabilities

System | Types of motion | Appropriate NMR time step (e.g. in CNS)
Flexible molecules, rigid bonds and angles | translation, rotation, torsions | 20-40 fs
Flexible molecules, flexible bonds and angles | translation, rotation, torsions, vibrations | 2 to 5 fs

Simulated annealing
- Temperature control and variation

Simulated annealing with energy scaling
- More flexible annealing schemes
- Different variation of different energy terms, e.g.:
  - $E_{chem}$ / $E_{exp}$
  - $E_{covalent}$ / $E_{exp}$ / $E_{nonbond}$
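The integration and temperature-control ideas above can be sketched in Python as follows. The integrator shown is velocity Verlet, a variant of the Verlet algorithm chosen here for brevity; the weak-coupling scale factor follows the usual Berendsen form. Function names, the one-dimensional test system and the value of the Boltzmann constant in kcal/(mol K) are assumptions of the sketch.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); assumed unit system

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one coordinate with the velocity Verlet scheme."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # Taylor expansion of the position
        a_new = force(x) / mass
        v = v + 0.5 * (a + a_new) * dt       # average of old and new acceleration
        a = a_new
    return x, v

def temperature(e_kin, n_atoms, n_constraints=0):
    """Invert E_kin = k_B*T/2 * (3N - N_c) for T."""
    return 2.0 * e_kin / (K_B * (3 * n_atoms - n_constraints))

def weak_coupling_scale(t_now, t_ref, dt, tau):
    """Velocity scale factor for weak coupling with relaxation time tau."""
    return math.sqrt(1.0 + (dt / tau) * (t_ref / t_now - 1.0))

# Harmonic oscillator with k = m = 1: exact solution x(t) = cos(t)
x_end, v_end = velocity_verlet(1.0, 0.0, lambda x: -x, 1.0, 0.01, 1000)
total_energy = 0.5 * v_end ** 2 + 0.5 * x_end ** 2
```

With `tau` equal to `dt` the weak-coupling factor reduces to plain velocity scaling (the temperature is reset to `t_ref` in a single step); larger `tau` values give the slower exponential relaxation that lets the temperature fluctuate.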

Example of an NMR simulated annealing scheme
[Figures: SA example; SA example: IL8 dimer; SA example: BPTI]

Torsion angle dynamics
- Cartesian dynamics: time step dictated by bond stretching, a waste of CPU time
- Important motions are around torsions
- ~ 3 degrees of freedom per amino acid (vs $3 N_{atom}$ for Cartesian dynamics)
- Available in DYANA, X-PLOR, CNS, X-PLOR-NIH

Calculation of structure ensembles
- Repeat the calculation (20-200-xxx times)
- Random variation of initial conditions (starting structure / velocities)
- Obtain information on uniqueness / different folds / "dynamics"
- Structure selection problem!

Structural restraints from NOEs
- Selection of peaks
- Assignment to proton pair
- Calibration to distance

From NOEs to distances
The NOE intensity $A_{ij}$ is given by:

$A_{ij} = [\exp(-\tau_m R)]_{ij}$

with

$R = \begin{pmatrix} \rho_1 & \sigma_{12} & \cdots & \sigma_{1N} \\ \sigma_{21} & \rho_2 & \cdots & \sigma_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{N1} & \sigma_{N2} & \cdots & \rho_N \end{pmatrix}$ and $\sigma_{ij} \approx \frac{1}{r^6} f(\text{motion})$

Series expansion:

$A = \exp(-\tau_m R) = 1 - \tau_m R + \frac{\tau_m^2}{2} R^2 - \dots$

Two-spin approximation (ISPA: isolated spin-pair approximation)
From the series expansion and assuming:
- no internal dynamics
- no spin diffusion
- short mixing times

$d_{ij} \approx c_{cal} \left( \frac{1}{A_{ij}} \right)^{1/6}$

Get the calibration factor $c_{cal}$ from:
- reference distances
- averages over all distances

$d_{ij} \approx d_{ref} \left( \frac{A_{ref}}{A_{ij}} \right)^{1/6}$

Problems: approximate, introduces systematic errors

Standard treatment of errors: upper and lower bounds
- Loose upper and lower bounds are typically defined to account for errors such as peak integration, spin diffusion and internal dynamics, e.g. distance ± 10, 20%
- Strong-medium-weak classification:
  - Strong: 1.8-2.8 Å
  - Medium: 1.8-3.5 Å
  - Weak: 1.8-4.5 Å
  - (Very weak: 1.8-5.5 Å)
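The ISPA calibration against a reference distance amounts to one line of arithmetic. A small Python sketch, with hypothetical function names and made-up intensities:

```python
def ispa_distance(intensity, ref_intensity, ref_distance):
    """Two-spin estimate: d_ij = d_ref * (A_ref / A_ij)^(1/6)."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

def distance_bounds(distance, rel_error=0.2):
    """Loose lower/upper bounds as distance -/+ a relative error."""
    return distance * (1.0 - rel_error), distance * (1.0 + rel_error)

# A peak 64 times weaker than the reference peak (reference distance 2.0 Angstrom)
d = ispa_distance(1.0, 64.0, 2.0)
lower, upper = distance_bounds(d, rel_error=0.2)
```

Because of the sixth root, a 64-fold drop in intensity only doubles the estimated distance; this is also why fairly large intensity errors translate into modest distance errors, and why loose bounds are usually adequate.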

Consequence of bounds
- Bounds have to be large enough for the cumulative error
- Precise value not (too) important: even loose bounds restrict conformational space
- May affect:
  - precision of the structure
  - validation
  - noise peak recognition (see below)

Spin diffusion
- Spin diffusion is a major source of error in NOE-derived distances (in particular for long mixing times and large molecules)
- Indirect paths (---) are more efficient than the direct path ($1/r^6$) --> underestimation of distance --> wide error bounds necessary

Relaxation matrix approaches
- Back-calculate NOEs from the structure and compare with experimental data:
  $A_{ij} = (\exp[-R\tau_m])_{ij}$
  with
  $R_{ij} = \frac{2\pi}{5} \gamma^4 \hbar^2 \left( -J^0_{ij}(0) + 6 J^2_{ij}(2\omega_0) \right)$
  $J_{ij}(\omega) = \frac{1}{4\pi r_{ij}^6} \frac{\tau_c}{1 + (\omega\tau_c)^2}$
- Transform experimental NOEs back to distances using
  $R = -\ln(A) / \tau_m$
- E.g. IRMA, MARDIGRAS

Simple spin diffusion correction in ARIA
Calculate the NOE between protons i and j from the structures of the previous iteration:
- Calculate the distance matrix $d_{ij}$ from the ensemble
- Apply a cutoff criterion, simulate spin diffusion pathways
- Calculate the relaxation matrix
- Calculate the NOE spectrum
- Use the calculated NOE intensities as correction factors when determining the target distances
Direct NOE refinement:
$E_{NOE} = k (A_{calc} - A_{exp})^2$

Effect of relaxation matrix calculation
[Figure]

Outline
- From NOEs to distances
- (Automated) NOE assignment
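To see how spin diffusion arises from the relaxation matrix, the matrix exponential can be evaluated for a toy spin system. The Python sketch below uses a plain Taylor series for the exponential (adequate only for small matrices and short mixing times) and made-up relaxation rates; it is not how IRMA, MARDIGRAS or ARIA implement the calculation. With the sign convention of the formulas above, a negative cross-relaxation rate gives a positive off-diagonal intensity.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, n_terms=40):
    """Matrix exponential by Taylor series: sum of m^k / k!."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for order in range(1, n_terms):
        term = [[x / order for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def noe_intensities(rates, tau_m):
    """A = exp(-tau_m * R) for a relaxation matrix R."""
    return mat_exp([[-tau_m * x for x in row] for row in rates])

rho, sigma, tau_m = 2.0, -1.0, 0.1   # illustrative rates (1/s) and mixing time (s)

# Two isolated spins: analytic result A_12 = -exp(-rho*tau) * sinh(sigma*tau)
two_spin = noe_intensities([[rho, sigma], [sigma, rho]], tau_m)

# Linear chain 1-2-3 with no direct cross-relaxation between spins 1 and 3
three_spin = noe_intensities([[rho, sigma, 0.0],
                              [sigma, rho, sigma],
                              [0.0, sigma, rho]], tau_m)
indirect_peak = three_spin[0][2]
```

Although spins 1 and 3 have no direct cross-relaxation term, `indirect_peak` is non-zero: magnetization relayed through spin 2 produces a cross peak that ISPA would misread as a direct (and therefore too short) distance.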

NOE assignment
- Resonance assignment --> chemical shift table
- NOE peak list
- Assignment of NOE peaks

NOE assignment problem
- Standard structure calculation requires unambiguous NOE assignments
- Key problem: signal overlap
- Most NOEs are ambiguous

Manual iterative assignment
Iterative structure calculation and assignment:
- calculate structures based on unambiguous assignments
- analyse the model to find more assignments

NOE ambiguities
Possible solutions:
- don't use ambiguous data
- trial and error
- additional information (intra-residue, secondary structure, ...)
- choose the assignment consistent with the current structure
- simultaneous unambiguous restraint for each possibility
- one ambiguous restraint

Ambiguous NOEs
- Ambiguous NOEs contain structural information
- An ambiguous NOE corresponds to a sum of individual contributions:
  $NOE = \sum_{a=1}^{N} NOE_a$
- Use for direct NOE refinement or for ambiguous distance restraints (ADRs)
- An ambiguous NOE can be approximately calculated from a structure with the 6th power law:
  $NOE = \sum_{a=1}^{N_\delta} NOE_a \cong \sum_{a=1}^{N_\delta} \frac{1}{r_a^6}$
  where $N_\delta$ is the number of possible assignments of a peak for a frequency tolerance of $\delta$

Ambiguous distance restraints (ADR)
- Ambiguous distance restraint: define an effective ("summed") distance D:
  $D = \left( \sum_{a=1}^{N_\delta} d_a^{-6} \right)^{-1/6}$
- A "distance" between more than two points
- Note: the effective distance D is always shorter than the shortest of the contributing distances!

ADRs for equivalent protons
- Approximate expression for equivalent protons in:
  - aromatic rings (HD1/HD2, HE1/HE2)
  - methyl groups
  - diastereotopic methylene protons
  $D = \left( \sum_{a=1}^{2} d_a^{-6} \right)^{-1/6}$
- Small errors compared to the correct treatment
- Important:
  - no correction to the NOE volume
  - no distance ("pseudo atom") corrections

ADRs for other data
- Hydrogen bonds: from one donor to several acceptors
- Disulfide bridges: from one Cys SG to all others

Structure calculation using ADRs
- Metric matrix distance geometry cannot be used
- All "global" minimization approaches possible (Cartesian dynamics, torsion angle dynamics)
- Convergence more difficult

Network of ADRs
- Structure calculation possible without unambiguous NOEs

Iterative NOE assignment
- Principle of automated assignment with ARIA or CANDID
- In each iteration, partial assignment of ambiguities
- The network of ambiguous NOEs determines the structure
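The effective distance of an ambiguous restraint is easy to compute, and a quick numerical check confirms the note that D is always shorter than the shortest contributing distance. A minimal Python sketch with a hypothetical function name and made-up distances:

```python
def effective_distance(distances):
    """Summed distance D = (sum of d_a^-6)^(-1/6) over all contributions."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)

# One short and two long candidate proton pairs (Angstrom)
candidates = [3.0, 5.0, 6.0]
d_eff = effective_distance(candidates)

# Two equivalent ring protons at the same distance from a third proton
d_ring = effective_distance([4.0, 4.0])
```

Because of the -6 exponent the shortest contribution dominates: `d_eff` lies just below 3.0 Å, and for two equal distances D is reduced by a factor 2^(-1/6), about 0.89.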

NOE assignment using ARIA
- The contribution of each possibility is:
  $C_a \propto d_a^{-6}, \quad \sum_{a=1}^{N} C_a = 1$
  where $d_a$ is the average distance for one given assignment
- $C_a$ can include other factors (e.g. network-anchored assignments)
- Sort the contributions according to size (shortest d first)
- Only use the first $N_p$ contributions such that:
  $\sum_{a=1}^{N_p} C_a > p$
  with p varying between 1 and ~ 0.9 during the various iterations

Ambiguity cutoff p
- p is a distance cutoff for the partial assignment of NOEs
- p is typically reduced over the various iterations

Outline
- From NOEs to distances
- (Automated) NOE assignment
- Dealing with noise and errors

Structure calculation with noisy data
Origin of noise can be:
- Spectral artefacts
- Incorrect resonance assignments
- Missing proton assignments
- Insufficient frequency windows
- Underestimation of error bounds because of spin diffusion and internal dynamics

Error detection by simple violation statistics
- $R_{vio}$ is the fraction of structures in which a restraint is violated by more than a threshold $v_{tol}$:
  $R_{vio} = \frac{1}{S_{conv}} \sum \Theta(D - U - v_{tol})$
  - U = upper distance limit
  - $S_{conv}$ = number of converged structures
- If $R_{vio}$ exceeds a threshold (e.g. 0.5 or 0.75), remove the restraint from the list (qexclude=true)
- Optionally increase the distance bound ("fudge factor") (qmove=true)

Outline
- From NOEs to distances
- (Automated) NOE assignment
- Dealing with noise and errors
- Notes on ARIA
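The partial-assignment rule and the violation statistic can both be illustrated in a few lines of Python. This is a sketch of the formulas above, not ARIA code: function names, the dictionary layout and the example numbers are hypothetical, and the step function is taken as strictly positive violations only.

```python
def partial_assignment(avg_distances, p):
    """Keep the largest contributions C_a until their sum exceeds p.

    avg_distances maps an assignment label to its ensemble-averaged distance.
    """
    weights = {name: d ** -6 for name, d in avg_distances.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for name, weight in ranked:
        kept.append(name)
        cumulative += weight / total   # normalized contribution C_a
        if cumulative > p:
            break
    return kept

def violation_fraction(distances, upper, v_tol):
    """Fraction of converged structures violating the bound by more than v_tol."""
    violated = sum(1 for d in distances if d - upper - v_tol > 0.0)
    return violated / len(distances)

possibilities = {"a": 2.0, "b": 4.0, "c": 6.0}   # made-up average distances
loose = partial_assignment(possibilities, 0.999)
medium = partial_assignment(possibilities, 0.99)
tight = partial_assignment(possibilities, 0.9)

r_vio = violation_fraction([3.0, 3.6, 4.2, 3.1], upper=3.5, v_tol=0.5)
```

Lowering p from iteration to iteration discards the weakest contributions first, so that an ambiguous peak is gradually narrowed down to the few assignments supported by the current structures.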

Tasks of ARIA
- Automatically generates topology and template files from the sequence
- Generates restraint lists from peak lists in several formats
- Iterative NOE assignment
- Takes care of running the calculations on distributed and/or shared memory computing systems (e.g. DQS)
- Statistics of restraint violations
- Outputs peak lists in various formats

Structure determination with ARIA
[Figures: example of convergence; important parameters]

ARIA: restraints list generation
- Supports several formats: ANSIG, ARIA, NMRView, PIPP, XEASY, SPARKY, simple column-based format
- Allows up to five 4D spectra
- For each spectrum:
  - Generation of unambiguous and ambiguous restraints
  - Calibration of restraints, possibly with spin-diffusion correction (fast relaxation matrix calculation with matrix doubling procedure)
  - Noise filtering
- Merge restraints from all spectra and remove duplicates

ARIA: other restraints
- Distance restraints from sources other than NOE spectra
- H-bond restraints (get a lower weight in the structure calculations)
- J-couplings
- Dihedral angle restraints
- CSI data (from CSI and/or TALOS)
- Residual dipolar couplings (RDCs) and intervector projection angles derived from RDCs
- Chemical shifts, database restraints available in CNS

ARIA: force fields
- Various force fields are available
- Best results (personal experience):
  - PROLSQ: Engh & Huber parameter set derived from the CSD (Acta Cryst. A, 1991)
  - OPLS parameters (Jorgensen, Yale), typically used in the final refinement in explicit water
  - Reference: Linge & Nilges, J. Biomol. NMR 13, 51 (1999)
- Possibility to turn on the dihedral angle potential (significant improvement of χ1/χ2 side-chain angles) (recommended!)
- Improvement of dihedral potentials: topallhdg5.3.pro (future release; Utrecht: A. Bonvin / UCL: Marc Williams / Pasteur: Nilges, Linge)

ARIA: NOE analysis
- ARIA outputs for each spectrum an assignment file containing:
  - Information on the various contributions
  - Is this peak used or not for the calculations?
  - Is it violated?
- Overview script giving the evolution of NOE assignments as a function of the iteration number
- Listing of new and rejected assignments

(Semi-)automated ARIA usage
- ARIA can be used in a fully automated way
- Usually, however, it will be used in a semi-automated way:
  - Input as much information as possible from the beginning
  - Generate new assignments
  - User checks new and rejected assignments
  - Start a new ARIA run with partially assigned data
  - Repeat the entire process X times
- ARIA keeps track of every run

Validation of structures: why?
- Structures should be reliable:
  - Satisfy experimental data
  - Good local and overall quality
- Protein structures are a valuable source for:
  - understanding biology
  - structure-based drug design
  - homology modeling
--> Only good structures are typically used

Outline
- What should be analyzed/validated?

Structure analysis and validation
- Geometry (rmsd from idealized bonds, angles, ...)
- Energetics (non-bonded, restraint energies, ...)
- Violations analysis
- Rmsd: pairwise (useful e.g. for clustering), from average, per-residue rmsds
- Circular variance of φ/ψ dihedral angles
- Stereochemical quality with PROCHECK, WhatIf, Prosa, ...

Validation of structures
- Bonded geometry
- Rotamers
- Inter-atomic bumps
- Electrostatics and hydrogen bonding
- Backbone conformation: Ramachandran plot

Outline
- What should be analyzed/validated?
- Quality indicators

Validation of structures: the reference set
- Structure Z-scores:
  - Well refined X-ray structures (resolution < 2.0 Å, R-factor < 19%)
  - Continuously updated
- RMS Z-scores:
  - Cambridge small molecule database (CSD)
  - Well refined X-ray structures

Validation of structures: local and overall quality
- Overall quality: Ramachandran, rotamers, packing etc.
  - Indicators: Structure Z-scores
- Local geometry: bond lengths, angles, planarity etc.
  - Indicators: RMS Z-scores
- Others: inter-atomic bumps, buried hydrogen bonds etc.
  - Indicators: number of occurrences

Validation of structures: normal distributions and Z-scores
- Half of the points are above and half are below the average
- 68% of the points are within one standard deviation from the mean
- 95% of the points are within two standard deviations from the mean
- Less than 1 in 10000 points are further away than 4 standard deviations from the mean
[Figure: normal distribution as a function of Z-score]

Validation of structures: Z-scores and RMS Z-scores
- Structure Z-scores:
  - Z-scores > 0 --> better than average
  - Z-scores < 0 --> worse than average
  - However: a Z-score of -1 is equally likely as a Z-score of +1!!
- Local geometry RMS Z-scores:
  - Too tight restraining of geometry --> RMS Z-score < 1
  - Too loose restraining of geometry --> RMS Z-score > 1
  - Proper Gaussian distribution --> RMS Z-score ~ 1

Outline
- What should be analyzed/validated?
- Quality indicators
- The NMR structures at the PDB

Validation of structures: NMR structures at the PDB
[Figures]
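The two quality indicators are straightforward to compute once reference means and standard deviations are available. A minimal Python sketch; the function names are hypothetical and the bond-length numbers are made up for illustration rather than taken from any reference set.

```python
import math

def z_score(value, ref_mean, ref_std):
    """Number of reference standard deviations away from the reference mean."""
    return (value - ref_mean) / ref_std

def rms_z_score(values, ref_means, ref_stds):
    """Root-mean-square of the individual Z-scores."""
    zs = [z_score(v, m, s) for v, m, s in zip(values, ref_means, ref_stds)]
    return math.sqrt(sum(z * z for z in zs) / len(zs))

ideal, spread = 1.53, 0.02   # illustrative ideal bond length and spread (Angstrom)

# Bonds scattered by one reference standard deviation: RMS Z-score of 1
normal = rms_z_score([1.55, 1.51], [ideal, ideal], [spread, spread])
# Bonds held almost exactly at the ideal value: RMS Z-score well below 1
tight = rms_z_score([1.531, 1.529], [ideal, ideal], [spread, spread])
```

An RMS Z-score far below 1, as in the second case, signals geometry that was restrained more tightly than the natural spread in the reference set, which matches the interpretation given on the slide.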

Validation of structures: summary
- No consensus for restraining of geometry in structure calculation protocols
- High number of bad contacts
- Electrostatics are often not used
- Structure Z-scores indicate in general a lower quality of NMR structures compared to X-ray structures

Improving protein NMR structures
- Refinements in explicit or implicit water for final optimization:
  - Better packing (van der Waals, electrostatic)
  - Better outside of proteins
- Refinement in explicit water:
  - Solvate the protein in water
  - Run a restrained molecular dynamics simulation, including full electrostatics and van der Waals
  - Minimize the structures

A few references...
Molecular modelling:
- Molecular Modelling: Principles and Applications. Andrew R. Leach, Longman Limited 1996.
Automated NOE assignment and structure calculations:
- Automated Assignment of Ambiguous Nuclear Overhauser Effects with ARIA. J.P. Linge, S.I. O'Donoghue and M. Nilges. Methods in Enzymology (2001) 339, 71-90.
- Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. T. Herrmann, P. Güntert and K. Wüthrich. J. Mol. Biol. (2002) 319, 209-227.
Molecular dynamics and NMR:
- Molecular dynamics and NMR spin relaxation in proteins. D. A. Case. Acc. Chem. Res. (2002) 35, 325-331.

NMR in Utrecht
Beowulf cluster:
- 24 x 1.3 GHz
- linux
Spectrometers:
- 900 MHz (end of 2002)
- 1 x 750 MHz
- 1 x 700 MHz
- 2 x 600 MHz (one with cryoprobe)
- 3 x 500 MHz (one widebore)
- 1 x 360 MHz

The End. Thank you for your attention!