Protein Structure Prediction 11/11/05


11/11/05 Protein Structure Prediction & Modeling

Bioinformatics Seminars
Nov 11 Fri 12:10 BCB Seminar in E164 Lago Building
  Supertrees Using Distances
  Steve Willson, Dept of Mathematics
  http://www.bcb.iastate.edu/courses/bcb691-f2005.html
Next week - Baker Center/BCB Seminars (seminar abstracts available at above link):
  Nov 14 Mon 1:10 PM Doug Brutlag, Stanford
    Discovering transcription factor binding sites
  Nov 15 Tues 1:10 PM Ilya Vakser, Univ Kansas
    Modeling protein-protein interactions
  both seminars will be in Howe Hall Auditorium

11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 1

Protein Structure & Function: Analysis & Prediction
Mon   Protein structure: basics; classification, databases, visualization
Wed   Protein structure databases - cont.
Thurs Lab: Protein structure databases, Visualization software, Secondary structure prediction
Fri   Protein structure prediction; Protein-nucleic acid interactions

Reading Assignment (for Mon-Fri)
Mount Bioinformatics Chp 10 Protein classification & structure prediction
http://www.bioinformaticsonline.org/ch/ch10/index.html pp. 409-491
Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html

BCB 544 Additional Reading
Required: Gene Prediction: Burge & Karlin 1997 JMB 268:78
  Prediction of complete gene structures in human genomic DNA
Optional: Structure Prediction: Schueler-Furman et al. (Baker) 2005 Science 310:638
  Progress in modeling of protein structures and interactions

Review last lecture:
Protein Structure: Databases, Classification & Visualization

Protein sequence databases
- UniProt (SwissProt, PIR, EBI) http://www.pir.uniprot.org
- NCBI Protein http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=protein
More on these later: protein function prediction

Protein sequence & structure: analysis
- Diamond STING Millennium - many useful structure analysis tools, including Protein Dossier http://trantor.bioc.columbia.edu/sms/
- SwissProt (UniProt) protein knowledgebase http://us.expasy.org/sprot
- InterPRO sequence analysis tools http://www.ebi.ac.uk/interpro

Protein structure databases
- PDB Protein Data Bank (RCSB) - THE protein structure database
  http://www.rcsb.org/pdb/
- MMDB Molecular Modeling Database (NCBI Entrez) - has "added" value
  http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=structure
- MSD Molecular Structure Database - especially good for interactions, binding sites
  http://www.ebi.ac.uk/msd

Protein structure classification
- SCOP = Structural Classification of Proteins
  Levels reflect both evolutionary and structural relationships
  http://scop.mrc-lmb.cam.ac.uk/scop
- CATH = Classification by Class, Architecture, Topology & Homology
  http://cathwww.biochem.ucl.ac.uk/latest/
- DALI/FSSP (recently moved to EBI & reorganized) - fully automated structure alignments
  DALI server http://www.ebi.ac.uk/dali/index.html
  DALI Database (fold classification) http://ekhidna.biocenter.helsinki.fi/dali/start

Protein structure visualization
- Molecular Visualization Freeware: http://www.umass.edu/microbio/rasmol
- MolviZ.Org http://www.umass.edu/microbio/chime
- Protein Explorer http://www.umass.edu/microbio/chime/pe/protexpl/frntdoor.htm
- RASMOL (& many descendants: Protein Explorer, PyMol, MolMol, etc.) http://www.umass.edu/microbio/rasmol/index2.htm
- CHIME http://www.umass.edu/microbio/chime/getchime.htm
- Cn3D http://www.biosino.org/mirror/www.ncbi.nlm.nih.gov/structure/cn3d/
- Deep View = Swiss-PDB Viewer http://www.expasy.org/spdbv

Protein structure visualization
Superb interactive structure visualization software by Jane & Dave Richardson, Duke University
- KINEMAGE http://kinemage.biochem.duke.edu/
Fantastic research tools for structure analysis & refinement

RCSB PDB - Beta site
http://pdbbeta.rcsb.org/pdb/welcome.do

MMDB
http://www.ncbi.nlm.nih.gov/structure/mmdb/mmdb.shtml

Cn3D
http://www.ncbi.nlm.nih.gov/structure/cn3d/cn3d.shtml

Cn3D: Displaying 2° Structures

Cn3D: Structural Alignments

SCOP - Structure Classification
http://scop.mrc-lmb.cam.ac.uk/scop

6 main classes of protein structure
1) α Domains: Bundles of helices connected by loops
2) β Domains: Mainly antiparallel sheets, usually with 2 sheets forming sandwich
3) α/β Domains: Mainly parallel sheets with intervening helices, also mixed sheets
4) α+β Domains: Mainly segregated helices and sheets
5) Multidomain (α & β): Containing domains from more than one class
6) Membrane & cell-surface proteins

CATH - Structure Classification
http://cathwww.biochem.ucl.ac.uk/latest/

Structural Genomics
~ 30,000 "traditional" genes in human genome (not counting: ???)
~ 3,000 proteins in a typical cell
> 2 million sequences in UniProt
> 33,000 protein structures in the PDB
Experimental determination of protein structure lags far behind sequence determination!
Goal: Determine structures of "all" protein folds in nature, using combination of experimental structure determination methods (X-ray crystallography, NMR, mass spectrometry) & structure prediction

Structural Genomics Projects
TargetDB: database of structural genomics targets
http://targetdb.pdb.org

Protein Structure Prediction?

Protein Folding
"Major unsolved problem in molecular biology"
In cells:
- spontaneous
- assisted by enzymes
- assisted by chaperones
In vitro: many proteins fold spontaneously & many do not!

Steps in Protein Folding
1- "Collapse" - driving force is burial of hydrophobic aa's (fast - msecs)
2- Molten globule - helices & sheets form, but "loose" (slow - secs)
3- "Final" native folded state - compaction, some 2° structures rearranged
Native state?
- assumed to be lowest free energy
- may be an ensemble of structures

Protein Dynamics
- Protein in native state is NOT static
- Function of many proteins depends on conformational changes, sometimes large, sometimes small
- Globular proteins are inherently "unstable" (NOT evolved for maximum stability)
- Energy difference between native and denatured state is very small (5-15 kcal/mol) (this is equivalent to 1 or 2 H-bonds!)
- Folding involves changes in both entropy & enthalpy

Protein Structure Prediction
Structure is largely determined by sequence
BUT:
- Similar sequences can assume different structures
- Dissimilar sequences can assume similar structures
- Many proteins are multi-functional
Protein folding:
- determination of folding pathways
- prediction of tertiary structure
still largely unsolved problems

New today: Protein Structure Prediction
- Secondary structure (text focuses on this - I won't)
- Tertiary structure (let's do this instead!)

Deciphering the Protein Folding Code
- Protein Structure Prediction or "Protein Folding" Problem: given the amino acid sequence of a protein, predict its 3-dimensional structure (fold)
- "Inverse Folding" Problem: given a protein fold, identify every amino acid sequence that can adopt its 3-dimensional structure

Protein Structure Determination?
High-resolution structure determination
- X-ray crystallography (<1 Å)
- Nuclear magnetic resonance (NMR) (~1-2.5 Å)
Lower-resolution structure determination
- Cryo-EM (electron microscopy) ~10-15 Å
Theoretical Models?
- Highly variable - now, some equiv to X-ray!

Tertiary Structure Prediction
Fold or tertiary structure prediction problem can be formulated as a search for minimum energy conformation
- search space is defined by psi/phi angles of backbone and side-chain rotamers
- search space is enormous even for small proteins!
- number of local minima increases exponentially with the number of residues
Computationally it is an exceedingly difficult problem!
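
To give a feel for why the search space is "enormous even for small proteins", here is a minimal Python sketch of the usual back-of-the-envelope count. The numbers are assumptions chosen for illustration only, not values from the lecture: 3 discrete backbone states per residue and a hypothetical sampling rate of 1e13 conformations per second. Side-chain rotamers are ignored, which makes the estimate far too optimistic.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of backbone conformations if each residue independently
    adopts one of `states_per_residue` discrete phi/psi states
    (an assumed, deliberately coarse discretization)."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues: int,
                            states_per_residue: int = 3,
                            rate_per_second: float = 1e13) -> float:
    """Years needed to enumerate every conformation at an assumed
    (hypothetical) sampling rate."""
    seconds = conformation_count(n_residues, states_per_residue) / rate_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, conformation_count(n), f"{exhaustive_search_years(n):.3g} years")
```

The count grows as a power of the chain length, so adding residues multiplies the work rather than adding to it. This is why practical methods rely on heuristics or on known template structures instead of exhaustive enumeration.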

Ab Initio Prediction
1. Develop energy function
   - bond energy
   - bond angle energy
   - dihedral angle energy
   - van der Waals energy
   - electrostatic energy
2. Calculate structure by minimizing energy function (usually Molecular Dynamics or Monte Carlo methods)
Ab initio prediction - not practical in general
- Computationally? very expensive
- Accuracy? Usually poor for all but short peptides (but see Baker review!)
Provides both folding pathway & folded structure

Comparative Modeling
Two primary methods
1) Homology modeling
2) Threading (fold recognition)
Note: both rely on availability of experimentally determined structures that are "homologous" or at least structurally very similar to target
Provide folded structure only

Homology Modeling
1. Identify homologous protein sequences
   - PSI-BLAST
   - multiple sequence alignment (MSA)
2. Among those with available structures, choose closest sequence match for template
3. Build model by placing residues into corresponding positions of homologous structure models & refine by "tweaking"
Homology modeling - works "well"
- Computationally? not very expensive
- Accuracy? higher sequence identity → better model
- Requires >30% sequence identity

Threading - Fold Recognition
Identify best fit between target sequence & template structure
1. Develop energy function
2. Develop template library
3. Align target sequence with each template & score
4. Determine best score (1D to 3D alignment)
5. Build & refine structure as in homology modeling
Threading - works "sometimes"
- Computationally? Can be expensive or cheap, depends on energy function & whether "all atom" or "backbone only" threading
- Accuracy? In theory, should not depend on sequence identity (should depend on quality of template library & "luck"). But, usually higher sequence identity → better model

Threading - a "local" example
Target Sequence: ALKKGF HFDTSE
Structure Templates
1. Align target sequence with template structures (fold library) from the Protein Data Bank (PDB)
2. Calculate energy (score) to evaluate goodness of fit between target sequence & template structure
3. Rank models based on energy scores

Threading Goals & Issues
Find correct sequence-structure alignment of a target sequence with its native-like fold in PDB
- Structure database - must be complete: no decent model if no good template in library!
- Sequence-structure alignment algorithm: Bad alignment → Bad score!
- Energy function (scoring scheme):
  must distinguish correct sequence-fold alignment from incorrect sequence-fold alignments
  must distinguish correct fold from close decoys
- Prediction reliability assessment - how to determine whether predicted structure is correct (or even close)?
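
The three threading steps listed above (align, score, rank) can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration. Each template is reduced to a string of hypothetical environment labels ("B" = buried, "E" = exposed). The energy is a toy hydrophobic/polar match score, not a real statistical potential. Alignment is an ungapped slide of the target along the template. Real threading programs use gapped sequence-structure alignment and much richer scoring.

```python
# Toy hydrophobic set; which residues count as hydrophobic is an assumption.
HYDROPHOBIC = set("AVLIMFWC")


def position_energy(residue: str, environment: str) -> float:
    """Toy fit score: -1 if a hydrophobic residue is buried or a polar
    residue is exposed, +1 otherwise (lower is better)."""
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return -1.0 if buried == hydrophobic else 1.0


def thread(target: str, template_env: str) -> tuple[float, int]:
    """Step 1 + 2: slide the target along the template without gaps and
    return (best energy, best offset). Templates shorter than the target
    get infinite energy."""
    n, m = len(target), len(template_env)
    best = (float("inf"), -1)
    for offset in range(m - n + 1):
        energy = sum(position_energy(res, template_env[offset + i])
                     for i, res in enumerate(target))
        if energy < best[0]:
            best = (energy, offset)
    return best


def rank_templates(target: str, library: dict[str, str]) -> list[tuple[float, str]]:
    """Step 3: rank templates by their best threading energy."""
    return sorted((thread(target, env)[0], name) for name, env in library.items())


if __name__ == "__main__":
    # Hypothetical three-fold library; names and labels are made up.
    library = {"fold_a": "BBEEEB", "fold_b": "EEBBBE", "fold_c": "EBEEEEBE"}
    for energy, name in rank_templates("ALKKGF", library):
        print(f"{name}: {energy:+.1f}")
```

The sketch also illustrates the "structure database must be complete" issue. If no template in `library` resembles the native fold, the ranking still returns a best-scoring template, but that template is not a meaningful model.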

Threading - Structure database
- Build a template database (e.g., ASTRAL domain library derived from PDB)
- Supplement with additional decoys, e.g., generated using ab initio approach such as Rosetta (Baker)

Threading - Energy function
Two main methods (and combinations of these)
- Structural profile (environmental): physico-chemical properties of aa's
- Contact potential (statistical): based on contact statistics from PDB (Miyazawa & Jernigan - Jernigan now at ISU)

Protein Threading - typical energy function
MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
Total energy: E_p + E_s + E_g
- E_p: What is "probability" that two specific residues are in contact?
- E_s: How well does a specific residue fit structural environment?
- E_g: Alignment gap penalty?
Find a sequence-structure alignment that minimizes the energy function

A Rapid Threading Approach for Protein Structure Prediction
Kai-Ming Ho, Physics
Haibo Cao
Yungok Ihm
Zhong Gao
James Morris
Cai-zhuang Wang
Drena Dobbs, GDCB
Jae-Hyung Lee
Michael Terribilini
Jeff Sander

Performance Evaluation? "Blind Test"
CASP5 Competition (Critical Assessment of Protein Structure Prediction)
Given: Amino acid sequence
Goal: Predict 3-D structure (before experimental results published)

Typical Results: well, actually, BEST Results:
HO = #1 ranked CASP prediction for this target
Target 174, PDB ID = 1MG7
[Figure: Predicted Structure vs. Actual Structure for T174_1 and T174_2]
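
Returning to the typical threading energy function described above, the sum E_p + E_s + E_g can be written out as a short Python sketch. The data structures are assumptions introduced for illustration: `alignment` maps each target residue to a template position (or `None` for an unaligned residue), `contacts` lists pairs of template positions that touch in the template structure, and the two potential tables are hypothetical lookup dictionaries. The gap term counts only unaligned target residues, which is a simplification.

```python
def threading_energy(target, alignment, env, contacts,
                     e_single, e_pair, gap_penalty=2.0):
    """Total energy E_p + E_s + E_g of one sequence-structure alignment.

    target    : amino acid sequence (str)
    alignment : template position for each target residue, or None (gap)
    env       : environment label for each template position
    contacts  : iterable of (i, j) template positions in contact
    e_single  : {(residue, environment): energy}   (hypothetical table)
    e_pair    : {frozenset({res_a, res_b}): energy} (hypothetical table)
    """
    # Which residue sits at which template position.
    placed = {pos: res for res, pos in zip(target, alignment) if pos is not None}

    # E_s: how well each residue fits its structural environment.
    e_s = sum(e_single.get((res, env[pos]), 0.0) for pos, res in placed.items())

    # E_p: pairwise contact term over template contacts that are both occupied.
    e_p = sum(e_pair.get(frozenset((placed[i], placed[j])), 0.0)
              for i, j in contacts if i in placed and j in placed)

    # E_g: simplified gap penalty, one unit per unaligned target residue.
    e_g = gap_penalty * sum(pos is None for pos in alignment)

    return e_p + e_s + e_g


if __name__ == "__main__":
    env = "BEB"                      # buried, exposed, buried (made-up template)
    contacts = [(0, 2)]
    e_single = {("L", "B"): -1.0, ("F", "B"): -1.0, ("K", "E"): -0.5}
    e_pair = {frozenset("LF"): -0.8}
    print(threading_energy("LKF", [0, 1, 2], env, contacts, e_single, e_pair))
    print(threading_energy("LKF", [0, None, 2], env, contacts, e_single, e_pair))
```

A threading program would search over many candidate alignments and keep the one with the lowest total. In this toy case the fully aligned placement scores lower than the gapped one.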

Overall Performance in CASP5 Contest (M. Levitt, Stanford)
FR Fold Recognition (targets manually assessed by Nick Grishin)
-----------------------------------------------------------
Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
   1    24.26   9.00  12.00     9    12  Ginalski
   2    21.64   7.00  12.00     7    12  Skolnick Kolinski
   3    19.55   8.00  12.50     9    14  Baker
   4    16.88   6.00  10.00     6    10  BIOINFO.PL
   5    15.25   7.00   7.00     7     7  Shortle
   6    14.56   6.50  11.50     7    13  BAKER-ROBETTA
   7    13.49   4.00  11.00     4    11  Brooks
   8    11.34   3.00   6.00     3     6  Ho-Kai-Ming
   9    10.45   3.00   5.50     3     6  Jones-NewFold
-----------------------------------------------------------
FR NgNW - number of good predictions without weighting for multiple models
FR NpNW - number of total predictions without weighting for multiple models
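
As a small worked example of reading this table, the Python sketch below parses the rows and computes, for each group, the fraction of unweighted predictions that were judged good (NgNW / NpNW). This ratio is introduced here only as an illustration and is an assumption. It is not a CASP metric, and the ranking in the table is by Z-score.

```python
# Rows copied from the table above: Rank Z-Score Ngood Npred NgNW NpNW Group-name
CASP5_FR = """\
1 24.26 9.00 12.00 9 12 Ginalski
2 21.64 7.00 12.00 7 12 Skolnick Kolinski
3 19.55 8.00 12.50 9 14 Baker
4 16.88 6.00 10.00 6 10 BIOINFO.PL
5 15.25 7.00 7.00 7 7 Shortle
6 14.56 6.50 11.50 7 13 BAKER-ROBETTA
7 13.49 4.00 11.00 4 11 Brooks
8 11.34 3.00 6.00 3 6 Ho-Kai-Ming
9 10.45 3.00 5.50 3 6 Jones-NewFold
"""


def parse_rows(text: str) -> list[dict]:
    """Split each row into six numeric columns plus a group name
    (the name may contain spaces, hence maxsplit=6)."""
    rows = []
    for line in text.strip().splitlines():
        rank, z, ngood, npred, ngnw, npnw, group = line.split(maxsplit=6)
        rows.append({
            "rank": int(rank), "z": float(z),
            "ngood": float(ngood), "npred": float(npred),
            "ngnw": int(ngnw), "npnw": int(npnw),
            "group": group,
        })
    return rows


def good_fraction(row: dict) -> float:
    """Illustrative ratio: unweighted good predictions / unweighted total."""
    return row["ngnw"] / row["npnw"]


if __name__ == "__main__":
    for row in parse_rows(CASP5_FR):
        print(f'{row["rank"]:>2}  {row["group"]:<18} {good_fraction(row):.2f}')
```

The ratio and the Z-score rank need not agree, because a group that submitted few predictions can have a high fraction of good ones while ranking lower overall.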