Prediction and refinement of NMR structures from sparse experimental data

Similar documents
Protein Structure Prediction

Contact map guided ab initio structure prediction

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Protein Structure Prediction, Engineering & Design CHEM 430

Protein Structure Prediction

Computer simulations of protein folding with a small number of distance restraints

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Orientational degeneracy in the presence of one alignment tensor.

Can a continuum solvent model reproduce the free energy landscape of a β-hairpin folding in water?

Basics of protein structure

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

Supporting Online Material for

TASSER: An Automated Method for the Prediction of Protein Tertiary Structures in CASP6

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

Presenter: She Zhang

Template-Based Modeling of Protein Structure

Protein Structure Prediction

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Human and Server CAPRI Protein Docking Prediction Using LZerD with Combined Scoring Functions. Daisuke Kihara

Protein Structure Determination from Pseudocontact Shifts Using ROSETTA

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

As of December 30, 2003, 23,000 solved protein structures

proteins Prediction Methods and Reports

Protein Folding Prof. Eugene Shakhnovich

Protein Structure Determination

Template Free Protein Structure Modeling Jianlin Cheng, PhD

proteins Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field

Introduction to" Protein Structure

NMR, X-ray Diffraction, Protein Structure, and RasMol

Outline. The ensemble folding kinetics of protein G from an all-atom Monte Carlo simulation. Unfolded Folded. What is protein folding?

Week 10: Homology Modelling (II) - HHpred

Analysis and Prediction of Protein Structure (I)

Pymol Practial Guide

A Method for the Improvement of Threading-Based Protein Models

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

PROTEIN STRUCTURE PREDICTION II

proteins Comparison of structure-based and threading-based approaches to protein functional annotation Michal Brylinski, and Jeffrey Skolnick*

Protein structures and comparisons ndrew Torda Bioinformatik, Mai 2008

Template Free Protein Structure Modeling Jianlin Cheng, PhD

TOUCHSTONE II: A New Approach to Ab Initio Protein Structure Prediction

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Nature Structural and Molecular Biology: doi: /nsmb.2938

Modeling for 3D structure prediction

TOUCHSTONE: A Unified Approach to Protein Structure Prediction

Tools for Cryo-EM Map Fitting. Paul Emsley MRC Laboratory of Molecular Biology

Chemical Shift Restraints Tools and Methods. Andrea Cavalli

Supporting Information

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

Development and Large Scale Benchmark Testing of the PROSPECTOR_3 Threading Algorithm

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

Improving the Physical Realism and Structural Accuracy of Protein Models by a Two-Step Atomic-Level Energy Minimization

Docking. GBCB 5874: Problem Solving in GBCB

CAP 5510 Lecture 3 Protein Structures

Computational Protein Design

Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins

Table 1. Crystallographic data collection, phasing and refinement statistics. Native Hg soaked Mn soaked 1 Mn soaked 2

Course Notes: Topics in Computational. Structural Biology.

Protein Folding by Robotics

Overview & Applications. T. Lezon Hands-on Workshop in Computational Biophysics Pittsburgh Supercomputing Center 04 June, 2015

Molecular Modeling lecture 2

SUPPLEMENTARY INFORMATION

Ligand Scout Tutorials

SUPPLEMENTARY INFORMATION

AbInitioProteinStructurePredictionviaaCombinationof Threading,LatticeFolding,Clustering,andStructure Refinement

Identification of correct regions in protein models using structural, alignment, and consensus information

Nitrogenase MoFe protein from Clostridium pasteurianum at 1.08 Å resolution: comparison with the Azotobacter vinelandii MoFe protein

Multiple Mapping Method: A Novel Approach to the Sequence-to-Structure Alignment Problem in Comparative Protein Structure Modeling

Universal Similarity Measure for Comparing Protein Structures

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Methods of Protein Structure Comparison. Irina Kufareva and Ruben Abagyan

Homologous proteins have similar structures and structural superposition means to rotate and translate the structures so that corresponding atoms are

3DRobot: automated generation of diverse and well-packed protein structure decoys

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Bioinformatics. Macromolecular structure

Bioengineering & Bioinformatics Summer Institute, Dept. Computational Biology, University of Pittsburgh, PGH, PA

fconv Tutorial Part 2

HOMOLOGY MODELING. The sequence alignment and template structure are then used to produce a structural model of the target.

All-atom ab initio folding of a diverse set of proteins

Protein structure analysis. Risto Laakso 10th January 2005

Full wwpdb NMR Structure Validation Report i

Supplementary Figure 1. Aligned sequences of yeast IDH1 (top) and IDH2 (bottom) with isocitrate

Protein Docking by Exploiting Multi-dimensional Energy Funnels

Protein Structures: Experiments and Modeling. Patrice Koehl

Protein Modeling. Generating, Evaluating and Refining Protein Homology Models

Monte Carlo simulation of proteins through a random walk in energy space

Protein Structure Prediction and Display

Goals. Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions

CHEM 463: Advanced Inorganic Chemistry Modeling Metalloproteins for Structural Analysis

ALL LECTURES IN SB Introduction

Using Phase for Pharmacophore Modelling. 5th European Life Science Bootcamp March, 2017

Visualization of Macromolecular Structures

Assignment 2 Atomic-Level Molecular Modeling

Transcription:

Prediction and refinement of NMR structures from sparse experimental data Jeff Skolnick Director Center for the Study of Systems Biology School of Biology Georgia Institute of Technology

Overview of talk General methodology Structure prediction on weakly homologous proteins with/without experimental restraints Refinement of NMR structures and comparison with X-ray structures Conclusions

General Methodology

TASSER:Threading/ASSEmbly/Refinement

Recipe for protein structure prediction Protein Model At what level of detail should the protein be represented? Potential What types of energy terms should be used? Conformational search scheme How does one search for the native state?

Protein Model

Potential: Generic terms that describe a protein such as hydrogen bonding and local conformational stiffness Predicted secondary structure Hydrophobic burial and pair interactions Predicted consensus side chain contacts from the template structures

Potential Energy Surface NATURE EXISTING POTENTIALS Energy Energy Configuration Configuration Native structure Non-native structure

CONFORMATIONAL SEARCH SCHEME: Replica Exchange Monte Carlo T 4 Temperature T 3 T 2 T 1 T 0 Simulation time

Use of sparse NMR restraints in TASSER

Large Scale Benchmark: 1375 non-homologous proteins between 40 and 200 residues that cover the PDB at 35 % sequence identity and which represent all secondary structure classes (α, β, α/β). Exclude all templates whose sequence identity to the target sequences >30%. Weakly homologous limit Compare TASSER runs with and without sparse NMR-derived constraints

Restraint Selection Randomly select N/8 restraints with N the number of residues in the protein such that their distribution is more or less uniform. Don t want to select N/8 restraints such that many are clustered in on region of the structure.

How are the restraints implemented? The strength of a restraint depends on the sequence separation, with those having the largest sequence separation twice that of those with the shortest separation in sequence. The strength of the restraints at the end of the simulation is double that at the beginning (to avoid kinetic traps due to the distant-insequence restraints)

How are restraints implemented? The strength of a restraint depends on the sequence separation, with those having the largest sequence separation twice that of those with the shortest separation in sequence. The strength of the restraints at the end of the simulation is double that at the beginning (to avoid kinetic traps due to the distant-in-sequence restraints)

Results:

Comparison of the best of top 5 clusters % of proteins With restraints Without restraints RMSD to native Addition of restraints slightly improves the quality of the results

Comparison of most populated cluster with and w/o restraints % of proteins With restraints Without restraints RMSD to native

Comparison of the RMSD of the first cluster: With restraints 1fxkC 1imuA No Restraints Restraints decease the RMSD to native on average by 0.5 Å Biggest effect is for the poorly predicted proteins w/o exact restraints (>6 Å) Green Zone See improvement in the 2-6 Å range of models Yellow Zone Little to no improvement for models <2 Å -resolution of TASSER models

1fxkC: Model with restraints is worse than without restraints A B C native A. Native State with the sparse restraints shown (dotted yellow lines). Green and yellow hairpins are far from each other. B First cluster structure after adding restraints (RMSD from native=17 Å). The structure satisfies all the restraints. The yellow and green hairpins make a beta sheet, which is packed against the two long helixes. The population of this structure increases after inclusion of exact restraints.

1imuA : No improvement with restraints added A B C A. Native State showing the selected sparse restraints (dotted yellow lines). C-terminus (in red) is highly unstructured. B. First cluster structure after adding restraints (RMSD=10 Å). The core is very similar to native, and most differences seem related to the C-terminal helix. C. Predicted structure with all the predicted restraints Thus, the predicted restraints incorrectly locate the C-terminal helix!

Cluster Rank Distribution Number of proteins No restraints With restraints Rank of top cluster Addition of restraints improves identification of top cluster as the first cluster (for 44% of all proteins, the first and top cluster coincide). In addition, the number of proteins with top cluster rank decreases monotonically

TASSER refinement of NMR structures S. Lee, Y. Zhang and J. Skolnick. TASSER-based refinement of NMR structures. Proteins 2006: 63: 451-456.

Benchmark set 61 nonhomologous proteins with both X- ray and NMR structures. Require that there be multiple NMR structures in the PDB file Basically the same set as used by Garbuzhnskiy et al, Proteins, 60, 139 (2005).

Method Identify the consensus region of the NMR models by superimposing models 2 to N onto the first NMR model. Calculate the average distance to residue j in all the models. Chose the cutoff so that 90% of the NMR structure is defined as the template.

Results: Table I. Comparison of final models from TASSER with X-ray structures RMSD NMR/X-ray RMSD to X-ray (ali) a RMSD to X-ray (all) b TM-score to X-ray (all) c NMR Model NMR Model NMR Model <1 0.664 0.814 0.860 1.014 0.940 0.915 <2 1.222 1.179 1.428 1.377 0.885 0.887 <3 1.495 1.384 1.797 1.601 0.850 0.861 <4 1.600 1.474 1.959 1.709 0.841 0.853 <5 1.731 1.556 2.080 1.785 0.828 0.846 a NMR and final TASSER models over the same aligned regions; b NMR and TASSER models over the entire chain. TM-score to X-ray structures: c NMR and TASSER models over the entire chain.

RMSD to X-ray structure Entire structure Aligned region

Representative example of TASSER refinement (a) (b) a. Superposition of NMR structure to NATIVE (RMSD = 4.3 Å) b. Superposition of TASSER model to Native (RMSD=1.4 Å). Red indicates residues <5 Å after superposition.

Histogram of better/worse RMSD relative to the initial X-ray/NMR RMSD Average RMSD of TASSER refined models to X-ray s 1.79 Å Average RMSD of NMR to X-ray structures = 2.1 Å

Conclusions

Sparse NMR Restraints Expect about 70% of nonhomologous proteins to have an acceptable model Effect of sparse restraints on best of top 5 predicted structures is very minor Sparse exact restraints DO improve the quality of the most populated structure, with the biggest effect for structures whose RMSD > 6 Å in the absence of exact restraints Also see improvement in the 2-4 Å range On average little improvement in structures whose RMSD < 2 Å in the absence of exact restraints- reflects resolution of the model

NMR-X-ray Studies If the NMR-X-ray structure has a RMSD >2 Å from native, TASSER models systematically move closer to the X-ray structure If the NMR-X-ray structure RMSD <2 Å, then 21/34 models (60%) move closer. This reflects the inherent resolution of TASSER.

Software dissemination TASSER is a mature suite of programs that is ready for wide dissemination to assist in the determination of protein structures by NMR Have a TASSER-Lite is available as a web based service http://cssb.biology.gatech.edu/skolnick/webservice/tasserlite/index.html Will be releasing a version for downloading for academic users

Acknowledgements Sparse NMR Restraints Jose Borreguero GA Tech Yang Zhang University of Kansas NMR to X-ray Refinement Seung Yup Lee GA Tech Yang Zhang University of Kansas