Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Similar documents
CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1

Protein Structures. 11/19/2002 Lecture 24 1

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

Protein Structures. Sequences of amino acid residues 20 different amino acids. Quaternary. Primary. Tertiary. Secondary. 10/8/2002 Lecture 12 1

Basics of protein structure

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Analysis and Prediction of Protein Structure (I)

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Protein Structure: Data Bases and Classification Ingo Ruczinski

Introduction to" Protein Structure

CAP 5510 Lecture 3 Protein Structures

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

ALL LECTURES IN SB Introduction

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

Getting To Know Your Protein

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Biophysics 101: Genomics & Computational Biology. Section 8: Protein Structure S T R U C T U R E P R O C E S S. Outline.

Protein Structure Prediction, Engineering & Design CHEM 430

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Bioinformatics. Macromolecular structure

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Motif Prediction in Amino Acid Interaction Networks

Introduction to Computational Structural Biology

D Dobbs ISU - BCB 444/544X 1

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

Lecture 8: Protein structure analysis

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing

CS612 - Algorithms in Bioinformatics

Details of Protein Structure

Protein Structure Basics

Protein structure alignments

Supersecondary Structures (structural motifs)

Protein Structure Prediction 11/11/05

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Supporting Online Material for

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Intro Secondary structure Transmembrane proteins Function End. Last time. Domains Hidden Markov Models

Today. Last time. Secondary structure Transmembrane proteins. Domains Hidden Markov Models. Structure prediction. Secondary structure

Amino Acid Structures from Klug & Cummings. Bioinformatics (Lec 12)

Reconstructing Amino Acid Interaction Networks by an Ant Colony Approach

Protein Structures: Experiments and Modeling. Patrice Koehl

Bio nformatics. Lecture 23. Saad Mneimneh

Protein Structure Prediction

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

Protein Structure Prediction and Display

STRUCTURAL BIOINFORMATICS I. Fall 2015

Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix)

Presentation Outline. Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy

SUPPLEMENTARY MATERIALS

Structure to Function. Molecular Bioinformatics, X3, 2006

BCB 444/544 Fall 07 Dobbs 1

Protein Structure. Hierarchy of Protein Structure. Tertiary structure. independently stable structural unit. includes disulfide bonds

Week 10: Homology Modelling (II) - HHpred

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Study of Mining Protein Structural Properties and its Application

HIV protease inhibitor. Certain level of function can be found without structure. But a structure is a key to understand the detailed mechanism.

Announcements. Primary (1 ) Structure. Lecture 7 & 8: PROTEIN ARCHITECTURE IV: Tertiary and Quaternary Structure

Lecture 10 (10/4/17) Lecture 10 (10/4/17)

Physiochemical Properties of Residues

Protein Structure Prediction Using Multiple Artificial Neural Network Classifier *

From Amino Acids to Proteins - in 4 Easy Steps

NMR, X-ray Diffraction, Protein Structure, and RasMol

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Heteropolymer. Mostly in regular secondary structure

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Computational Molecular Biology. Protein Structure and Homology Modeling

Protein Secondary Structure Prediction

Ant Colony Approach to Predict Amino Acid Interaction Networks

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

Sequence analysis and comparison

STRUCTURAL BIOINFORMATICS. Barry Grant University of Michigan

Prediction and refinement of NMR structures from sparse experimental data

A General Model for Amino Acid Interaction Networks

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Bioinformatics: Secondary Structure Prediction

Pymol Practial Guide

Protein Structure Determination

1-D Predictions. Prediction of local features: Secondary structure & surface exposure

Bayesian Models and Algorithms for Protein Beta-Sheet Prediction

Protein structure analysis. Risto Laakso 10th January 2005

THE UNIVERSITY OF MANITOBA. PAPER NO: 409 LOCATION: Fr. Kennedy Gold Gym PAGE NO: 1 of 6 DEPARTMENT & COURSE NO: CHEM 4630 TIME: 3 HOURS

BCH 4053 Spring 2003 Chapter 6 Lecture Notes

Protein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1.

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

DATE A DAtabase of TIM Barrel Enzymes

Transcription:

CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 2/15/07 CAP5510 1

EM Algorithm Goal: Find θ, Z that maximize Pr (X, Z θ) Initialize: random profile E-step: Using profile, compute a likelihood value z ij for each m-window at position i in input sequence j. M-step: Build a new profile by using every m- window, but weighting each one with value z ij. Stop if converged 2/15/07 CAP5510 2 MEME [Bailey, Elkan 1994]

EM Method: Model Parameters Input: upstream sequences X = {X 1, X 2,, X n }, Motif profile: 4 k matrix θ = (θ rp ), r {A,C,G,T} 1 p k θ rp = Pr(residue r in position p of motif) Background distribution: θ r0 = Pr(residue r in background) 2/15/07 CAP5510 3

Z = {Z ij }, where Z ij = EM Method: Hidden Information 1, if motif instance starts at position i of X j 0, otherwise Iterate over probabilistic models that could generate X and Z, trying to converge on this solution, i.e., maximize Pr (X, Z θ). 2/15/07 CAP5510 4

Statistical Evaluation Z-score of a motif with a certain frequency: Information Content or Relative Entropy of an alignment or profile: Maximum a Posteriori (MAP) Score: Model Vs Background Score: Obs( w) Exp( w) z( w) = Var( w) IC( M) = MAP( M ) 4 m i= 1 j= 1 = 4 i= 1 j= 1 m mi, j log b m n Pr( w M ) L( w) = = Pr( w Bg ) i, log j m Π j = 1 m m i, j i b b i, j i i i, j 2/15/07 CAP5510 5 Counts Frequencies

Predicting Motifs in Whole Genome MEME: EM algorithm [ Bailey et al., 1994 ] AlignACE: Gibbs Sampling Approach [ Hughes et al., 2000 ] Consensus: Greedy Algorithm Based [ Hertz et al., 1990 ] ANN-Spec: Artificial Neural Network and a Gibbs sampling method [ Workman et al., 2000 ] YMF: Enumerative search [Sinha et al., 2003 ] 2/15/07 CAP5510 6

Gibbs Sampling for Motif Detection 2/15/07 CAP5510 7

Protein Structures Sequences of amino acid residues 20 different amino acids Primary Secondary Tertiary Quaternary 2/15/07 CAP5510 8

Proteins Primary structure is the sequence of amino acid residues of the protein, e.g., Flavodoxin: AKIGLFYGTQTGVTQTIAESIQQEFGGESIVDLNDIANADA Different regions of the sequence form local regular secondary structures, such as Alpha helix, beta strands, etc. AKIGLFYGTQTGVTQTIAESIQQEFGGESIVDLNDIANADA Secondary 2/15/07 CAP5510 9

α-helix More on Secondary Structures Main chain with peptide bonds Side chains project outward from helix Stability provided by H-bonds between CO and NH groups of residues 4 locations away. β-strand Stability provided by H-bonds with one or more β-strands, forming β-sheets. Needs a β-turn. 2/15/07 CAP5510 10

Proteins Tertiary structures are formed by packing secondary structural elements into a globular structure. Myoglobin Lambda Cro 2/15/07 CAP5510 11

Quaternary Structures in Proteins The final structure may contain more than one chain arranged in a quaternary structure. Quaternary Insulin Hexamer 2/15/07 CAP5510 12

Amino Acid Types Hydrophobic I,L,M,V,A,F,P Charged Basic Acidic Polar K,H,R E,D S,T,Y,H,C,N,Q,W Small A,S,T Very Small Aromatic A,G F,Y,W 2/15/07 CAP5510 13

Structure of a single amino acid All 3 figures are cartoons of an amino acid residue. 2/15/07 CAP5510 14

Chains of amino acids Amino acids vs Amino acid residues 2/15/07 CAP5510 15

Angles φ and ψ in the polypeptide chain 2/15/07 CAP5510 16

2/15/07 CAP5510 17

Amino Acid Structures from Klug & Cummings 2/15/07 CAP5510 18

Amino Acid Structures from Klug & Cummings 2/15/07 CAP5510 19

Amino Acid Structures from Klug & Cummings 2/15/07 CAP5510 20

Amino Acid Structures from Klug & Cummings 2/15/07 CAP5510 21

2/15/07 CAP5510 22

2/15/07 CAP5510 23

Alpha Helix 2/15/07 CAP5510 24

2/15/07 CAP5510 25

Beta Strand 2/15/07 CAP5510 26

Active Sites Active sites in proteins are usually hydrophobic pockets/crevices/troughs that involve sidechain atoms. 2/15/07 CAP5510 27

Active Sites Left PDB 3RTD (streptavidin) and the first site located by the MOE Site Finder. Middle 3RTD with complexed ligand (biotin). Right Biotin ligand overlaid with calculated alpha spheres of the first site. 2/15/07 CAP5510 28

Secondary Structure Prediction Software 2/15/07 CAP5510 29

PDB: Protein Data Bank Database of protein tertiary and quaternary structures and protein complexes. http://www.rcsb.org/pdb/ Over 29,000 structures as of Feb 1, 2005. Structures determined by NMR Spectroscopy X-ray crystallography Computational prediction methods Sample PDB file: Click here [ ] 2/15/07 CAP5510 30

Protein Folding Unfolded Molten Globule State Folded Native State Rapid (< 1s) Slow (1 1000 s) How to find minimum energy configuration? 2/15/07 CAP5510 31

Modular Nature of Protein Structures Example: Diphtheria Toxin 2/15/07 CAP5510 32

Protein Structures Most proteins have a hydrophobic core. Within the core, specific interactions take place between amino acid side chains. Can an amino acid be replaced by some other amino acid? Limited by space and available contacts with nearby amino acids Outside the core, proteins are composed of loops and structural elements in contact with water, solvent, other proteins and other structures. 2/15/07 CAP5510 33

SPDBV RASMOL CHIME Viewing Protein Structures 2/15/07 CAP5510 34

Structural Classification of Proteins Over 1000 protein families known Sequence alignment, motif finding, block finding, similarity search SCOP (Structural Classification of Proteins) Based on structural & evolutionary relationships. Contains ~ 40,000 domains Classes (groups of folds), Folds (proteins sharing folds), Families (proteins related by function/evolution), Superfamilies (distantly related proteins) 2/15/07 CAP5510 35

SCOP Family View 2/15/07 CAP5510 36

CATH: Protein Structure Classification Semi-automatic classification; ~36K domains 4 levels of classification: Class (C), depends on sec. Str. Content α class, β class, α/β class, α+β class Architecture (A), orientation of sec. Str. Topolgy (T), topological connections & Homologous Superfamily (H), similar str and functions. 2/15/07 CAP5510 37

DALI/FSSP Database Completely automated; 3724 domains Criteria of compactness & recurrence Each domain is assigned a Domain Classification number DC_l_m_n_p representing fold space attractor region (l), globular folding topology (m), functional family (n) and sequence family (p). 2/15/07 CAP5510 38

Structural Alignment What is structural alignment of proteins? 3-d superimposition of the atoms as best as possible, i.e., to minimize RMSD (root mean square deviation). Can be done using VAST and SARF Structural similarity is common, even among proteins that do not share sequence similarity or evolutionary relationship. 2/15/07 CAP5510 39

Other databases & tools MMDB contains groups of structurally related proteins SARF structurally similar proteins using secondary structure elements VAST Structure Neighbors SSAP uses double dynamic programming to structurally align proteins 2/15/07 CAP5510 40

5 Fold Space classes Attractor 1 can be characterized as alpha/beta, attractor 2 as all-beta, attractor 3 as all-alpha, attractor 5 as alpha-beta meander (1mli), and attractor 4 contains antiparallel beta-barrels e.g. OB-fold (1prtF). 2/15/07 CAP5510 41

Fold Types & Neighbors Structural neighbours of 1urnA (top left). 1mli (bottom right) has the same topology even though there are shifts in the relative orientation of secondary structure elements. 2/15/07 CAP5510 42

Sequence Alignment of Fold Neighbors 2/15/07 CAP5510 43

Frequent Fold Types 2/15/07 CAP5510 44

Protein Structure Prediction Holy Grail of bioinformatics Protein Structure Initiative to determine a set of protein structures that span protein structure space sufficiently well. WHY? Number of folds in natural proteins is limited. Thus a newly discovered proteins should be within modeling distance of some protein in set. CASP: Critical Assessment of techniques for structure prediction To stimulate work in this difficult field 2/15/07 CAP5510 45

PSP Methods homology-based modeling methods based on fold recognition Threading methods ab initio methods From first principles With the help of databases 2/15/07 CAP5510 46

Best method for PSP ROSETTA As proteins fold, a large number of partially folded, low-energy conformations are formed, and that local structures combine to form more global structures with minimum energy. Build a database of known structures (I-sites) of short sequences (3-15 residues). Monte Carlo simulation assembling possible substructures and computing energy 2/15/07 CAP5510 47

Threading Methods See p471, Mount http://www.bioinformaticsonline.org/links/ch_10_t_7.html 2/15/07 CAP5510 48

2/15/07 CAP5510 49