Protein Structure Prediction

Protein Structure Prediction
Michael Feig, MMTSB/CTBP 2009 Summer Workshop

From Sequence to Structure
SEALGDTIVKNA

Folding with All-Atom Models
Sequence: AAQAAAAQAAAAQAA
All-atom MD is in general not successful for real proteins because of insufficient sampling, but also because of force field issues.
CHARMM force field, implicit solvent, replica exchange simulations: 8 replicas, 10 ns/replica

Ab initio Structure Prediction Protocol
1. Amino acid sequence
2. Sampling, to generate native-like structures
3. Scoring & clustering, to select native-like structures
4. 3D structure

Improved Ab initio Prediction Protocol
1. Amino acid sequence
2. Secondary structure prediction (PSIPRED et al.)
3. Efficient sampling, to generate native-like structures (e.g. MONSSTER/SICHO)
4. All-atom reconstruction
5. Scoring & clustering, to select native-like structures (e.g. MMGB/SA, DFIRE)
6. 3D structure

Conformational Sampling vs. Scoring
[Figure: SICHO sampling of protein A. All-atom energy in kcal/mol vs. RMSD from native in Å.]
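The plots in this deck use RMSD from native as the measure of model quality. A minimal sketch of how a Cα RMSD after optimal superposition can be computed with the Kabsch algorithm is given here. It assumes NumPy is available and that both structures are supplied as equally long (N, 3) coordinate arrays; the function name `kabsch_rmsd` is illustrative and does not come from the slides.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the covariance matrix gives the optimal rotation.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))
```

A rotated and translated copy of a structure should give an RMSD of essentially zero, which is a quick sanity check before using the function on real decoys.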

Realistic Example from CASP5
[Figure: CHARMM22/GBMV energy vs. Cα RMSD in Å (62 residues).]

Sampling, Scoring, Clustering
[Figure: CHARMM22/GBMV energy vs. Cα RMSD in Å (62 residues).]
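One common way to combine scoring and clustering is to group decoys by mutual RMSD and then take the best-scoring member of the most populated cluster. The sketch below illustrates that idea only; the greedy clustering scheme, the 2 Å cutoff, and the function names are assumptions and are not taken from the slides.

```python
def greedy_cluster(rmsd, cutoff):
    """Greedy clustering on a symmetric pairwise RMSD matrix (list of lists)."""
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        def neighbours(i):
            # Decoys still unassigned and within the cutoff of decoy i (includes i).
            return [j for j in remaining if rmsd[i][j] <= cutoff]
        # The decoy with the most neighbours becomes the next cluster center.
        center = max(sorted(remaining), key=lambda i: len(neighbours(i)))
        members = sorted(neighbours(center))
        clusters.append((center, members))
        remaining -= set(members)
    return clusters

def select_model(rmsd, energies, cutoff=2.0):
    """Return the index of the lowest-energy decoy in the largest cluster."""
    clusters = greedy_cluster(rmsd, cutoff)
    _, members = max(clusters, key=lambda c: len(c[1]))
    return min(members, key=lambda i: energies[i])

# Toy data: decoys 0-2 form one cluster, decoys 3-4 another.
toy_rmsd = [
    [0.0, 1.0, 1.0, 6.0, 6.0],
    [1.0, 0.0, 1.0, 6.0, 6.0],
    [1.0, 1.0, 0.0, 6.0, 6.0],
    [6.0, 6.0, 6.0, 0.0, 1.0],
    [6.0, 6.0, 6.0, 1.0, 0.0],
]
toy_energies = [-10.0, -12.0, -11.0, -20.0, -5.0]
chosen = select_model(toy_rmsd, toy_energies)
```

In the toy data, decoy 3 has the lowest energy overall but sits in the smaller cluster, so the selection falls on decoy 1. This is the kind of case where clustering protects against a single misleading low-energy outlier.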

Ab initio Predictions
Target: DNase fragmentation factor
[Figure: NMR structure (1KOY) and best-scoring prediction, 7.4 Å.]
Problems with secondary structure prediction and sampling. Scoring luckily worked here.

Secondary Structure Prediction
GDPIVKNAKLDSRLANKEALRLL ?

Secondary Structure Propensities
[Table: per-residue propensities for α-helix, β-sheet, and turn. Chou & Fasman (1974)]

Secondary Structure Prediction Methods
SABLE         77.6%
PSIPRED       76.2%
PSSP          75.1%
SAM-T99-sec   76.1%
PHD           72.3%
C+F           50-60%
C+F: Chou & Fasman; GOR: Garnier, Osguthorpe, Robson
Improved accuracy is obtained via known structural information from homologous proteins.

Secondary structure prediction: Per-Residue Accuracy
<Q3> = 72.3%; sigma = 10.5%
[Figure: histogram of the number of protein chains vs. per-residue accuracy (Q3, 0-100%). Labeled chains: 1stu, 1psm, 3ifm, 1spf, 1bct.]
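Q3 is the percentage of residues whose three-state assignment (helix, strand, coil) is predicted correctly. A minimal sketch follows, assuming both the prediction and the reference are strings over the alphabet H/E/C of equal length; the function name `q3` is illustrative.

```python
def q3(predicted, observed):
    """Per-residue three-state accuracy in percent."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and reference must have the same length")
    if not observed:
        raise ValueError("empty input")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Toy example: one of ten residues is mispredicted (C instead of H).
example = q3("HHHCEEEECC", "HHHHEEEECC")
```

With one error in ten residues the toy example gives a Q3 of 90%.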

Sampling
Could we have reached the native state?
[Figure: CHARMM22/GBMV energy vs. Cα RMSD in Å (62 residues).]

Random Sampling
Random sampling of protein-like conformations: probability
$P(R) = \frac{1}{\sigma\sqrt{2\pi}} \int_0^R e^{-(r - \langle \mathrm{RMSD} \rangle)^2 / 2\sigma^2}\, dr$, with $\langle \mathrm{RMSD} \rangle = 3.333\, N_{res}^{1/3}$
For N=62: <RMSD> = 13.2 Å, so we are doing better than random sampling!
[Figure: distribution of RMSD from native.]
Reva BA, Finkelstein AV, Skolnick J. Folding & Design (1998) 3:141-147
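The random-sampling estimate can be checked numerically. The sketch below evaluates the mean random RMSD from the 3.333 N^(1/3) relation and, as a further illustration, the Gaussian probability of falling below a given RMSD by chance. The width sigma is not stated on the slide, so it is left as a required parameter here, and the value used in the usage line is a placeholder assumption, not a published number.

```python
import math

def mean_random_rmsd(n_res):
    """Mean RMSD between random protein-like conformations, in Å."""
    return 3.333 * n_res ** (1.0 / 3.0)

def prob_rmsd_below(r, n_res, sigma):
    """Gaussian probability of a random RMSD between 0 and r (sigma in Å)."""
    mu = mean_random_rmsd(n_res)

    def cdf(x):
        # Cumulative distribution of a normal with mean mu and width sigma.
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    return cdf(r) - cdf(0.0)

mean_62 = mean_random_rmsd(62)                # about 13.2 Å
p_chance = prob_rmsd_below(7.4, 62, sigma=2.0)  # sigma = 2.0 Å is a placeholder
```

For 62 residues the mean evaluates to about 13.2 Å, matching the slide. The chance probability depends strongly on sigma, so it should only be quoted once a justified width is available.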

Can we Reach the Native State?
Iterative refinement with an ideal scoring function: RMSD from native

Ideal Sampling with MD
1 ps MD w/ CHARMM19/EEF1, 1000 samples at each cycle
T0132: initial 6.7 Å; 2.6 Å at 50 cycles
Stumpff-Kane, Maksimiak, Lee, Feig: Proteins (2008) 70, 1345-1356
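The logic of the ideal-scoring experiment can be sketched as a loop: generate many samples from the current structure, keep the one closest to the native, and repeat. The toy version below replaces the MD sampling with random Gaussian displacements and computes RMSD without superposition, so it illustrates only the selection scheme, not the published protocol. All names and the step size are assumptions.

```python
import math
import random

def rmsd(a, b):
    """Coordinate RMSD without superposition (toy measure)."""
    total = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(total / len(a))

def perturb(coords, step, rng):
    """Stand-in for a short MD run: random Gaussian displacement of every atom."""
    return [tuple(x + rng.gauss(0.0, step) for x in p) for p in coords]

def iterative_refinement(start, native, cycles=50, samples=1000, step=0.1, seed=0):
    """Each cycle keeps the sample with the lowest RMSD from native."""
    rng = random.Random(seed)
    current = list(start)
    history = [rmsd(current, native)]
    for _ in range(cycles):
        candidates = [perturb(current, step, rng) for _ in range(samples)]
        best = min(candidates, key=lambda c: rmsd(c, native))
        # Accept only if the ideal score improves.
        if rmsd(best, native) < history[-1]:
            current = best
        history.append(rmsd(current, native))
    return current, history
```

Because selection uses the true RMSD, any plateau in `history` reflects a sampling limit rather than a scoring failure, which is the point the slide is making.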

Can we reach the native?
Resolution limit of about 2.5-3 Å
Sampling? Force field?
Stumpff-Kane, Maksimiak, Lee, Feig: Proteins (2008) 70, 1345-1356

How about coarse-grained sampling?
Target: T0132
- MD: all-atom/implicit solvent, 1 ps MD @ 300 K, 1000 samples/cycle
- SICHO: coarse-grained/lattice MC sampling, 1000 samples/cycle
Kolinski & Skolnick: Proteins 32, 475 (1998)

Can we reach the native?
SICHO: resolution limit of > 3 Å!
Stumpff-Kane, Maksimiak, Lee, Feig: Proteins (2008) 70, 1345-1356

Scoring
Can we pick out the native state?
[Figure: CHARMM22/GBMV energy vs. Cα RMSD in Å (62 residues).]

Scoring Functions
Knowledge-based/statistical:
- derived from known protein structures
- limited by training data
- usually fast
- e.g. DOPE, DFIRE, OPUS-PSP, RAPDF, ProSA II
Force field based:
- model physical energy landscape
- more robust and transferable
- often expensive (require minimization)
- e.g. MMPB(GB)/SA, UNRES
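Knowledge-based potentials are often built by inverse Boltzmann statistics: observed frequencies in known structures are compared with a reference distribution and converted to an energy, E = -kT ln(P_obs / P_ref). The sketch below shows that generic construction for binned counts. It is not the definition of DOPE, DFIRE, or any other named potential; the pseudocount and the kT value (about 0.593 kcal/mol near room temperature) are assumptions chosen for illustration.

```python
import math

def statistical_potential(observed_counts, reference_counts, kT=0.593, pseudocount=1.0):
    """Per-bin energies E = -kT * ln(P_obs / P_ref) from binned counts."""
    if len(observed_counts) != len(reference_counts):
        raise ValueError("count lists must have the same number of bins")
    # Pseudocounts avoid log(0) for bins that were never observed.
    obs = [c + pseudocount for c in observed_counts]
    ref = [c + pseudocount for c in reference_counts]
    obs_total, ref_total = sum(obs), sum(ref)
    return [
        -kT * math.log((o / obs_total) / (r / ref_total))
        for o, r in zip(obs, ref)
    ]

# Toy counts: the middle distance bin is enriched relative to the reference.
energies = statistical_potential([10, 30, 10], [20, 20, 20])
```

Bins that are enriched relative to the reference come out with negative (favorable) energy and depleted bins with positive energy. The dependence on the counts is also why such potentials are limited by their training data.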

Scoring Function Comparison: MMGB/SA vs. DFIRE
[Figure: three panels showing MMGB/SA score, DFIRE score, and radius of gyration, each plotted against Cα RMSD in Å.]

Reliable and accurate ab initio structure prediction may eventually succeed

but the solution may lie elsewhere.

Structure Prediction Approaches
Template-based modeling: Sequence -> Align to sequences of known structures -> Build model(s) using aligned template(s) -> Prediction
Ab initio prediction: Sequence -> Build extended chain -> Simulated ab initio folding -> Prediction

Sequence Homology
E. coli thioredoxin (1THO)  SDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVA
Human thioredoxin (1AUC)    MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKYSNVIFL-
E. coli thioredoxin (1THO)  KLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDAN---LA
Human thioredoxin (1AUC)    EVDVDDCQDVASECEVKCMPTFQFFKKGQ----KVGEFS-GANKEKLEATINELV
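Sequence identity for a pairwise alignment like this one can be computed directly from the two gapped rows. The sketch below counts identical residues over columns where neither row has a gap. That choice of denominator is an assumption (other conventions divide by the shorter sequence or the full alignment length), and `row_1` and `row_2` are simply the upper and lower rows of the alignment shown on the slide.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical residues over aligned (non-gap) columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns where both rows carry a residue.
    aligned = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not aligned:
        return 0.0
    matches = sum(a == b for a, b in aligned)
    return 100.0 * matches / len(aligned)

# Upper and lower rows of the thioredoxin alignment, both blocks concatenated.
row_1 = ("SDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVA"
         "KLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDAN---LA")
row_2 = ("MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKYSNVIFL-"
         "EVDVDDCQDVASECEVKCMPTFQFFKKGQ----KVGEFS-GANKEKLEATINELV")
identity = percent_identity(row_1, row_2)
```

The resulting percentage is the number usually compared against the accuracy-versus-identity curves used to judge how reliable a homology model is likely to be.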

Comparative Modeling
Assumption: Proteins with similar sequence have similar structure.
[Figure: Human thioredoxin (1AUC) and E. coli thioredoxin (1THO).]

Structural Templates from Homology
Challenges:
- Template identification
- Correct alignment
- Loop modeling
- All-atom reconstruction (side chain rebuilding)
Alignment excerpt:
PGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDAN---LA
QDVASECEVKCMPTFQFFKKGQ----KVGEFS-GANKEKLEATINELV

Template-Based Model Building
Direct template-based modeling:
- Use Cα trace from template to build backbone
- Add side chains to backbone (using rotamer libraries)
Restraint-based modeling:
- Use one or more templates to set up a list of restraints
- Sample Cα trace, CG model (SICHO), or all-atom model (Modeller) from extended chain with force field under template-derived restraints
- Reconstruct all-atom model
Restraint-based modeling allows use of multiple templates and can reflect structural variations for different sequences.
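The restraint set-up step can be illustrated with Cα-Cα distances: the alignment maps target residues onto template residues, and template distances between mapped pairs become target restraints. The sketch below is a simplified illustration under that assumption. It is not the procedure used by Modeller or SICHO, and the distance cutoff, minimum sequence separation, and function names are hypothetical.

```python
import math

def map_alignment(target_aln, template_aln, gap="-"):
    """Return {target_index: template_index} for columns without gaps."""
    mapping = {}
    ti = pi = 0
    for t, p in zip(target_aln, template_aln):
        if t != gap and p != gap:
            mapping[ti] = pi
        if t != gap:
            ti += 1
        if p != gap:
            pi += 1
    return mapping

def distance_restraints(target_aln, template_aln, template_ca,
                        max_dist=10.0, min_sep=3):
    """List of (i, j, distance) restraints between target residues i < j."""
    mapping = map_alignment(target_aln, template_aln)
    idx = sorted(mapping)
    restraints = []
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            if j - i < min_sep:
                continue  # skip near neighbours along the chain
            d = math.dist(template_ca[mapping[i]], template_ca[mapping[j]])
            if d <= max_dist:
                restraints.append((i, j, d))
    return restraints
```

Restraints from several templates can be pooled simply by concatenating the lists. A list of restraints, unlike a single copied backbone, can therefore combine information from more than one template.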

Accuracy of Predictions by Homology
[Figure: % of predictions within <2 Å RMSD and within <5 Å RMSD, plotted against % sequence identity (20-100%).]

Prediction through Fold Recognition
Assumption: Proteins with similar secondary structure share the same fold.
[Figure: 1N91 and 1JRM.]

Templates through Fold Recognition
Challenges:
- Wrong templates
- Alignment uncertain
- Fragment modeling
- Refinement needed

Extreme Homology Modeling: TASSER
- Find fragments of varying sizes (typically 20-100 residues) based on sequence alignment or fold recognition
- Jam fragments together in coarse-grained sampling procedure (Monte Carlo enhanced with replica exchange)
- Score resulting structures based on clustering
Method very successful, especially for hard cases when no overall template is available.
J. Skolnick, Y. Zhang
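Monte Carlo sampling enhanced with replica exchange rests on two acceptance rules: the Metropolis criterion for moves within a replica, and a swap criterion between replicas at neighbouring temperatures. The sketch below shows the standard textbook forms of both. It is not taken from the TASSER code, and the function names and the use of kT in energy units are assumptions.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng=random):
    """Accept a move with energy change delta_e at temperature kT."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def swap_probability(e_i, e_j, kT_i, kT_j):
    """Probability of exchanging configurations between replicas i and j.

    Standard criterion: min(1, exp[(1/kT_i - 1/kT_j) * (E_i - E_j)]).
    """
    delta = (1.0 / kT_i - 1.0 / kT_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

# A cold replica holding the higher energy always swaps with a hot one.
p_always = swap_probability(-90.0, -100.0, kT_i=1.0, kT_j=2.0)
# The reverse situation is accepted only with probability exp(-5).
p_rare = swap_probability(-100.0, -90.0, kT_i=1.0, kT_j=2.0)
```

The swaps let low-energy conformations found at high temperature migrate to the cold replicas, which is what helps the sampling escape local minima.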

Extreme Homology Modeling: Rosetta
- Find short fragments (9 and 3 residues) based on sequence alignment or fold recognition
- Sample fragment assembly at Cα level with Monte Carlo procedure
- Generate very large number of structures (10,000-1,000,000) and score with series of filtering functions
Method also very successful for hard cases.
D. Baker
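Fragment assembly can be pictured as Monte Carlo in torsion space: a random window of backbone angles is replaced by a fragment from a library, and the move is kept or rejected by the Metropolis criterion. The toy sketch below shows only that move set. It does not reproduce Rosetta's fragment picking or its scoring functions; the fragment library, the `toy_score` function, and all parameter values are hypothetical stand-ins.

```python
import math
import random

def fragment_assembly(n_res, fragments, score, steps=2000, kT=1.0, seed=0):
    """Toy fragment-insertion Monte Carlo over (phi, psi) angles.

    fragments maps a start position to a list of candidate fragments,
    each a list of (phi, psi) tuples.
    """
    rng = random.Random(seed)
    torsions = [(-150.0, 150.0)] * n_res  # extended-like starting chain
    current = score(torsions)
    best, best_score = list(torsions), current
    positions = sorted(fragments)
    for _ in range(steps):
        pos = rng.choice(positions)
        frag = rng.choice(fragments[pos])
        # Replace the window starting at pos with the fragment's angles.
        trial = (torsions[:pos] + list(frag) + torsions[pos + len(frag):])[:n_res]
        new = score(trial)
        if new <= current or rng.random() < math.exp(-(new - current) / kT):
            torsions, current = trial, new
            if current < best_score:
                best, best_score = list(torsions), current
    return best, best_score

HELIX = (-60.0, -45.0)

def toy_score(torsions):
    """Hypothetical score: squared deviation from helical angles."""
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2 for phi, psi in torsions)

toy_fragments = {p: [[HELIX] * 3, [(-120.0, 130.0)] * 3] for p in (0, 3, 6)}
model, model_score = fragment_assembly(9, toy_fragments, toy_score, steps=200)
```

In a real protocol the run would be repeated many times with different seeds to build the large decoy set that is later filtered and scored.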

State of the Art in Structure Prediction CASP8 Results

CASP8: Human/Server TBM Targets
Targets where template(s) could be (easily) found in PDB
Rank Group                ID    Score    N   Method/PI
1    DBAKER               489   1.39468  62  Rosetta+expert/Baker
2    Zhang                071   1.12254  63  TASSER+servers/Zhang
3    SAM-T08-human        046   1.11230  61  SAM+expert/Karplus
4    fams-ace2            434   1.10969  64  Meta/Umeyama
5    TASSER               057   1.08156  64  TASSER+expert?/Skolnick
6    MULTICOM             453   1.06672  64  HHSearch+MultiMod+Meta/Cheng
7    McGuffin             379   1.06355  62  Meta/McGuffin
8    ZicoFullSTPFullData  138   1.05079  63  Meta/Girgis
9    ZicoFullSTP          196   1.04905  63  Meta/Girgis
10   Zhang-Server         426s  1.04062  64  TASSER/Zhang
11   Zico                 299   1.03919  62  Meta/Girgis
12   IBT_LT               283   1.03000  54  COMA+expert/Venclovas
13   Chicken_George       081   1.02079  63  SimFold+use servers/chikenji
14   mufold               310   1.01067  60  Rosetta/Xu
15   pro-sp3-tasser       409s  0.96219  64  TASSER/Skolnick
16   GeneSilico           371   0.90565  62  Meta/Bujnicki
17   A-TASSER             149   0.89703  64  TASSER/Skolnick
18   Sternberg            202   0.88222  54  HHSearch+MultiMod+expert?/Sternberg
19   Jones-UCL            387   0.86397  58  Jones
20   BAKER-ROBETTA        425s  0.85422  64  Rosetta/Baker
21   fais@hgc             198   0.84538  52  Shirota
22   RAPTOR               438s  0.84459  61  Xu
23   LevittGroup          442   0.83579  57  Levitt
24   LEE                  407   0.82484  64  MultiMod+CSA?/Lee
25   keasar               114   0.82034  59  ?Keasar
26   Bates_BMM            178   0.81492  61  Bates
27   Phyre_de_novo        322s  0.81193  57  HHSearch+MultiMod+Poing/Sternberg
28   3DShot1              282   0.78625  64  Seth
29   POEMQA               124   0.76109  64  Wenzel
30   MidwayFolding        208   0.76065  62  Sosnick
31   METATASSER           182s  0.76047  64  TASSER/Skolnick
32   Elofsson             200   0.76016  63  Elofsson
33   MULTICOM-CLUSTER     020s  0.70844  64  HHSearch+MultiMod/Cheng
34   MUSTER               408s  0.70141  64  /Zhang
35   3DShotMQ             419   0.70125  64  Seth
36   Phyre2               235s  0.66549  51  HHSearch+MultiMod/Sternberg
37   MULTICOM-REFINE      013s  0.65078  64  HHSearch+COMPASS+MultiMod/Cheng
38   SAMUDRALA            034   0.60904  52  Samudrala
39   FEIG                 166s  0.60726  62  TASSER+Rosetta+MultiMod/Feig
40   PS2-manual           023   0.60649  57  Chen
Legend: Rosetta, TASSER, Meta-Method, Human expert

CASP8: Human/Server FM Results
Targets where template could not (easily) be found in PDB
Rank Group                ID    Score    N
1    DBAKER               489   2.22333  9
2    keasar               114   1.99333  9
3    MULTICOM             453   1.99222  9
4    MUFOLD-MD            404s  1.96667  9
5    BAKER-ROBETTA        425s  1.89889  9
6    Zico                 299   1.88889  9
7    ZicoFullSTPFullData  138   1.86889  9
8    ZicoFullSTP          196   1.86111  9
9    LevittGroup          442   1.84222  9
10   mufold               310   1.79222  9
11   Zhang                071   1.75667  9
12   POEMQA               124   1.74111  9
13   fams-ace2            434   1.73111  9
14   SAM-T08-human        046   1.63667  9
15   Zhang-Server         426s  1.61000  9
16   Jones-UCL            387   1.51444  9
17   Bates_BMM            178   1.51222  9
18   McGuffin             379   1.50556  9
19   PSI                  385s  1.48444  9
20   POEM                 207   1.47222  9
21   ABIpro               340   1.42778  9
22   FALCON               351s  1.39667  9
23   A-TASSER             149   1.39000  9
24   RBO-Proteus          479s  1.37889  9
25   TASSER               057   1.34444  9
26   Sternberg            202   1.33556  9
27   pro-sp3-tasser       409s  1.22222  9
28   Chicken_George       081   1.20556  9
29   FALCON_CONSENSUS     220s  1.16222  9
30   Bilab-UT             325   1.13667  9
31   Sasaki-Cetin-Sasai   461   1.12556  9
32   TJ_Jiang             384   0.98667  9
33   METATASSER           182s  0.95222  9
34   fais-server          116s  0.85111  9
35   3DShotMQ             419   0.77778  9
36   MULTICOM-CLUSTER     020s  0.71222  9
37   3DShot1              282   0.61111  9
38   MUProt               443s  0.60222  9
39   LEE                  407   0.59000  9
40   MULTICOM-REFINE      013s  0.55778  9
41   FEIG                 166s  0.49222  9
Legend: Rosetta, TASSER, Meta-Method, Human expert

Questions?