CAP 5510 Lecture 3 Protein Structures


CAP 5510 Lecture 3 Protein Structures Su-Shing Chen Bioinformatics CISE 8/19/2005 Su-Shing Chen, CISE 1

Protein Conformation

Protein Conformational Structures
Shaped by hydrophobicity (lack of affinity for water), hydrogen bonding, handedness, and the tension between hierarchy and interactions (electrostatic and van der Waals), a protein structure is a complex geometric pattern of the polypeptide backbone, the side chains, and the solvent environment.
The protein in solvent adopts the conformation of minimum free energy.
Molecular dynamics on the potential energy, which includes some nonlinear force terms, gives the conformational structure.

Geometric Features
[Figure: backbone geometry showing the bond distance d, the bond angle Φ, and the torsion angle Ψ among the carbonyl C and O, the α carbon Cα, the nitrogen N, and the hydrogen H, with the atoms C, O, and Cα of residue n-1 labeled.]

Basic structure of an amino acid
[Figure: a central α carbon Cα bonded to an amino group (H2N), a carboxyl group (COOH), a hydrogen (H), and a side chain (R).]

Secondary Structures
Alpha helix: repeated curvature (bond) and torsion angles φ, ψ; a repeating pattern of hydrogen bonding between the CO of residue n and the NH of residue n+4.
Beta sheet: repeating patterns of hydrogen bonding between distant parts of the backbone.
Random coil.

[Figure: Secondary Structures]

Structure Databases
3-D biomolecular structures of protein amino acid sequences.
3-D structures are determined by X-ray crystallography and nuclear magnetic resonance (NMR).
Protein folding is a grand challenge problem: a primary protein sequence determines its 3-D structure (Anfinsen et al., 1961).

How to Form 3-D Structures
Starting from the NH2 terminus, we identify each amino acid side chain by comparing the atomic structure of each residue with the chemical structure of the 20 amino acids.
Each atom has x, y, z coordinates; together the atoms form a ball-and-stick structure.
A chemical graph of chemical data is associated with the ball-and-stick model.

Atoms, Bonds and Energy
The bond length: the average length of a stable X-X bond is about ? angstroms.
The bond (curvature) angle: Φ = κ
The torsion angle: Ψ = τ
Potential energy:
\[
E = \frac{1}{2}\sum c_d\,(d - d_0)^2 + \frac{1}{2}\sum c_\kappa\,(\kappa - \kappa_0)^2 + \frac{1}{2}\sum c_\tau\,\bigl(1 + \cos(n\tau - \delta)\bigr) + \sum\left(\frac{A}{r^{12}} - \frac{B}{r^{6}} + \frac{q_1 q_2}{D r}\right)
\]
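
The potential energy on this slide can be evaluated term by term. The following is a minimal sketch, not a real force field: the function name `potential_energy`, the tuple layouts, and all numeric inputs are assumptions made for illustration, and the signs inside the torsion and non-bonded terms follow the conventional form of such energy functions.

```python
import math

def potential_energy(bonds, angles, torsions, pairs, D=1.0):
    """Sum the four groups of terms of the slide's potential energy.

    bonds:    iterable of (c_d, d, d0)          bond stretching
    angles:   iterable of (c_k, kappa, kappa0)  bond (curvature) angle bending
    torsions: iterable of (c_t, n, tau, delta)  torsional rotation
    pairs:    iterable of (A, B, q1, q2, r)     van der Waals + electrostatics
    D:        dielectric constant (hypothetical default)
    """
    e_bond = 0.5 * sum(c * (d - d0) ** 2 for c, d, d0 in bonds)
    e_angle = 0.5 * sum(c * (k - k0) ** 2 for c, k, k0 in angles)
    e_torsion = 0.5 * sum(c * (1 + math.cos(n * tau - delta))
                          for c, n, tau, delta in torsions)
    e_nonbonded = sum(A / r ** 12 - B / r ** 6 + q1 * q2 / (D * r)
                      for A, B, q1, q2, r in pairs)
    return e_bond + e_angle + e_torsion + e_nonbonded

# Illustrative call with made-up values: one stretched bond, nothing else.
energy = potential_energy(bonds=[(2.0, 1.5, 1.0)], angles=[], torsions=[], pairs=[])
```

A molecular dynamics or minimization routine would call such a function repeatedly while moving the atoms; that part is outside this sketch.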

RMSE (Root Mean Square Error)
Similarity measure of 3-D structures.
\[
X = \{(x_1, y_1, z_1), \ldots, (x_n, y_n, z_n)\}, \qquad X' = \{(x'_1, y'_1, z'_1), \ldots, (x'_n, y'_n, z'_n)\}
\]
In Cartesian coordinates:
\[
R(X, X') = \sqrt{\sum_i \left[ (x_i - x'_i)^2 + (y_i - y'_i)^2 + (z_i - z'_i)^2 \right]}
\]
In terms of bond distance, curvature, and torsion:
\[
R(X, X') = \sqrt{\sum_i \left[ (d_i - d'_i)^2 + (\kappa_i - \kappa'_i)^2 + (\tau_i - \tau'_i)^2 \right]}
\]
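
As a minimal sketch of the Cartesian form of this measure, the function below compares two equally long coordinate lists. The name `structure_distance` and the `normalize` flag are assumptions for illustration. With `normalize=False` it computes the plain square root of the summed squared differences as written on the slide; with `normalize=True` it divides by the number of points first, which is the usual "mean" in a root mean square. No superposition of the two structures is performed here.

```python
import math

def structure_distance(X, X_prime, normalize=False):
    """Square root of summed squared coordinate differences between X and X'.

    X, X_prime: lists of (x, y, z) tuples of equal length.
    normalize:  if True, divide the sum by the number of points (mean).
    """
    if len(X) != len(X_prime):
        raise ValueError("both structures need the same number of points")
    total = sum((x - xp) ** 2 + (y - yp) ** 2 + (z - zp) ** 2
                for (x, y, z), (xp, yp, zp) in zip(X, X_prime))
    if normalize:
        total /= len(X)
    return math.sqrt(total)

# Two made-up two-point structures that differ by 2 units in one z coordinate.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
r_sum = structure_distance(a, b)                    # 2.0
r_mean = structure_distance(a, b, normalize=True)   # sqrt(2)
```

The same function shape works for the second form on the slide if each tuple holds (d, κ, τ) instead of (x, y, z).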

Inverse Protein Folding - Threading
Find amino acid sequences that fold into a known 3-D structure.
Sequence similarities > 30%.
Profile method: compatible environments, described by the area of the residue that is buried (inaccessible to solvent), the side-chain area covered by polar O and N atoms, and the local secondary structure.

Protein Superfamilies & Domain Superfolds
Many protein structures are similar.
Protein domains with more than 30% sequence similarity adopt the same fold structure.
Some proteins with statistically insignificant sequence similarity have similar folds.
Dayhoff: families > 50% similarity, superfamilies > 30-40% similarity.


Geometric Features of Proteins
S. Chen, Characterizing and learning of protein conformations, 1993.
A set of points P(i) on the backbone.
A right-handed orthonormal basis {T_i, N_i, B_i}.
T_i is the (tangent) vector from P(i) to P(i+1).
The binormal vector is normal to the plane through P(i-1), P(i), P(i+1):
\[
B_i = \frac{T_{i-1} \times T_i}{\lVert T_{i-1} \times T_i \rVert}
\]
The normal is
\[
N_i = B_i \times T_i
\]
The curvature κ_i is the angle between T_{i-1} and T_i.
The torsion τ_i is the angle between B_i and B_{i+1}.
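
The frame construction above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the method of the cited paper: the helper names (`discrete_frames`, `unit`, `cross`, `angle`) are hypothetical, tangents are normalized to unit length so that the basis is orthonormal, and collinear triples of points (zero cross product) are not handled.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)  # fails for a zero vector (collinear points)

def angle(a, b):
    """Angle in radians between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))

def discrete_frames(points):
    """Return (frames, torsions) for backbone points P(0..n-1).

    frames[k] describes point P(k+1): unit tangent T, normal N, binormal B,
    and curvature kappa (angle between consecutive tangents).
    torsions[k] is the angle between consecutive binormals.
    """
    tangents = [unit(sub(points[i + 1], points[i])) for i in range(len(points) - 1)]
    frames = []
    for i in range(1, len(tangents)):
        B = unit(cross(tangents[i - 1], tangents[i]))
        N = cross(B, tangents[i])
        frames.append({"T": tangents[i], "N": N, "B": B,
                       "kappa": angle(tangents[i - 1], tangents[i])})
    torsions = [angle(frames[k]["B"], frames[k + 1]["B"])
                for k in range(len(frames) - 1)]
    return frames, torsions

# Four made-up points tracing three sides of a unit square in the xy-plane.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
frames, torsions = discrete_frames(square)
```

For this planar example every turn is a right angle and both binormals point along +z, so the curvature is π/2 at each interior point and the torsion is zero.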

[Figure: backbone points P(i), P(i+1), P(i+2) with the tangent vectors Ti and Ti+1 and the frame vectors Ni+1 and Bi+1.]

Motivations to Study Protein Structures
Proteins are interesting to look at!
Gene-sequencing projects are accumulating gene data and protein sequences at a rapid rate.
However, information about their structure is available for only a small fraction.
Understanding protein structures might help reduce this gap.

Secondary Structure Prediction
Protein structure prediction is one of the most significant tasks tackled in computational structural biology. Its aim is to determine the three-dimensional structure of proteins from their amino acid sequences. In more formal terms, this is the prediction of protein tertiary structure from primary structure.
Protein structure is a valuable resource in drug design, and its prediction is a highly active field of research.
The output of experimentally determined protein structures, typically by time-consuming and relatively expensive X-ray crystallography or NMR spectroscopy, is lagging far behind the output of protein sequences.

Chou-Fasman
Based on frequencies of residues in alpha helices, beta sheets, and turns.
Accuracy: 50-60%.

[Figure: Chou-Fasman]

Chou-Fasman: Assign Pij values
1. Assign all of the residues the appropriate set of parameters.

Residue    T    S    P    T    A    E    L    M    R    S    T    G
P(H)      69   77   57   69  142  151  121  145   98   77   69   57
P(E)     147   75   55  147   83   37  130  105   93   75  147   75
P(turn)  114  143  152  114   66   74   59   60   95  143  114  156

Chou-Fasman: Scan peptide for α-helix regions
Identify regions where 4 out of 6 contiguous residues have P(H) > 100: these form an alpha-helix nucleus.

Residue    T    S    P    T    A    E    L    M    R    S    T    G
P(H)      69   77   57   69  142  151  121  145   98   77   69   57

Chou-Fasman: Extend α-helix nucleus
Extend the helix in both directions until a set of four residues has an average P(H) < 100.
Repeat steps 1-3 for the entire peptide.

Residue    T    S    P    T    A    E    L    M    R    S    T    G
P(H)      69   77   57   69  142  151  121  145   98   77   69   57
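
The helix scan and extension can be sketched as follows, using the peptide and P(H) values exactly as listed on the slides. The function names and the precise extension rule (grow one residue at a time while the four-residue window ending at the new residue still averages at least 100) are one possible reading of the slide text and should be treated as assumptions.

```python
SEQ = "TSPTAELMRSTG"
P_H = [69, 77, 57, 69, 142, 151, 121, 145, 98, 77, 69, 57]

def find_helix_nuclei(p_h, window=6, min_formers=4, threshold=100):
    """Start indices of windows in which at least min_formers residues
    have P(H) above the threshold."""
    return [s for s in range(len(p_h) - window + 1)
            if sum(v > threshold for v in p_h[s:s + window]) >= min_formers]

def extend_helix(p_h, start, end, probe=4, threshold=100):
    """Extend the half-open region [start, end) in both directions until the
    probe-residue window at the growing edge averages below the threshold."""
    n = len(p_h)
    # Grow to the right: test the window that ends at the candidate residue.
    while end < n and sum(p_h[end - probe + 1:end + 1]) / probe >= threshold:
        end += 1
    # Grow to the left: test the window that starts at the candidate residue.
    while start > 0 and sum(p_h[start - 1:start - 1 + probe]) / probe >= threshold:
        start -= 1
    return start, end

nuclei = find_helix_nuclei(P_H)            # windows starting at 2, 3, and 4
start, end = extend_helix(P_H, 2, 2 + 6)   # extend the first nucleus
helix = SEQ[start:end]
```

Under this reading, the first nucleus (residues P T A E L M) grows by two residues on the right and not at all on the left. Overlapping nuclei are not merged here; a fuller implementation would do that before reporting helices.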

Chou-Fasman: Scan peptide for β-sheet regions
Identify regions where 3 out of 5 contiguous residues have P(E) > 100: these form a β-sheet nucleus.
Extend the β-sheet until 4 contiguous residues have an average P(E) < 100.
If the region average P(E) > 105 and the average P(E) > average P(H), then the region is predicted as β-sheet.

Residue    T    S    P    T    A    E    L    M    R    S    T    G
P(H)      69   77   57   69  142  151  121  145   98   77   69   57
P(E)     147   75   55  147   83   37  130  105   93   75  147   75
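
A minimal sketch of the β-sheet nucleus scan and the final decision rule, again using the values listed on the slide. The function names are hypothetical, and the extension step is left out to keep the example short.

```python
P_H = [69, 77, 57, 69, 142, 151, 121, 145, 98, 77, 69, 57]
P_E = [147, 75, 55, 147, 83, 37, 130, 105, 93, 75, 147, 75]

def find_sheet_nuclei(p_e, window=5, min_formers=3, threshold=100):
    """Start indices of windows in which at least min_formers residues
    have P(E) above the threshold."""
    return [s for s in range(len(p_e) - window + 1)
            if sum(v > threshold for v in p_e[s:s + window]) >= min_formers]

def is_sheet(p_h, p_e, start, end, min_avg=105):
    """Decision rule from the slide for the half-open region [start, end):
    average P(E) must exceed min_avg and also exceed average P(H)."""
    length = end - start
    avg_e = sum(p_e[start:end]) / length
    avg_h = sum(p_h[start:end]) / length
    return avg_e > min_avg and avg_e > avg_h

sheet_nuclei = find_sheet_nuclei(P_E)       # windows starting at 3 and 6
first_ok = is_sheet(P_H, P_E, 3, 8)         # average P(E) is 100.4, below 105
second_ok = is_sheet(P_H, P_E, 6, 11)       # average P(E) 110 vs average P(H) 102
```

The first nucleus fails the rule because its average P(E) is only 100.4 and its average P(H) is higher; the second passes both conditions.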

Chou-Fasman: Turns
To identify a bend at residue number j, calculate the value
p(t) = f(j) f(j+1) f(j+2) f(j+3)
where f(j+1) is the value for residue j+1, f(j+2) is the value for residue j+2, and f(j+3) is the value for residue j+3.
A beta-turn is predicted at that location if:
(1) p(t) > 0.000075;
(2) the average value of P(turn) over the tetrapeptide is > 1.00 (that is, 100 on the scale of the parameter table, whose values are multiplied by 100); and
(3) the averages for the tetrapeptide obey the inequality P(α-helix) < P(turn) > P(β-sheet).
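
The three turn conditions can be checked with a short function. This is a sketch under assumptions: the slides do not list the positional bend frequencies f, so the `f` values passed below are placeholders chosen only to exercise the code, not published parameters. The P values are those from the parameter table, so the average cutoff is 100 rather than 1.00.

```python
from math import prod  # available in Python 3.8+

P_H    = [69, 77, 57, 69, 142, 151, 121, 145, 98, 77, 69, 57]
P_E    = [147, 75, 55, 147, 83, 37, 130, 105, 93, 75, 147, 75]
P_TURN = [114, 143, 152, 114, 66, 74, 59, 60, 95, 143, 114, 156]

def window_average(values, j, size=4):
    return sum(values[j:j + size]) / size

def predict_turn(f, p_helix, p_sheet, p_turn, j,
                 p_t_cutoff=0.000075, avg_cutoff=100):
    """Apply the three conditions to the tetrapeptide starting at index j.

    f: four positional bend frequencies, f[k] being the value used for
       residue j+k (placeholder inputs in this sketch).
    """
    p_t = prod(f)
    avg_turn = window_average(p_turn, j)
    return (p_t > p_t_cutoff
            and avg_turn > avg_cutoff
            and window_average(p_helix, j) < avg_turn > window_average(p_sheet, j))

# Placeholder frequencies (hypothetical, for illustration only).
turn_at_8 = predict_turn([0.1, 0.1, 0.1, 0.1], P_H, P_E, P_TURN, j=8)
weak_f    = predict_turn([0.05, 0.05, 0.05, 0.05], P_H, P_E, P_TURN, j=8)
turn_at_4 = predict_turn([0.1, 0.1, 0.1, 0.1], P_H, P_E, P_TURN, j=4)
```

For the last four residues (R S T G) the average P(turn) is 127, above both the helix and sheet averages, so the outcome depends only on whether the product of the f values clears the 0.000075 cutoff.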

Comparative (Homology) Modeling
Homology modeling is based on the reasonable assumption that two homologous proteins will share very similar structures.
Given the amino acid sequence of an unknown structure and the solved structure of a homologous protein, each amino acid in the solved structure is mutated, computationally, into the corresponding amino acid from the unknown structure.

[Figure: Comparative (Homology) Modeling]

Homology Modeling
In homology modeling the overall fold of a protein is known. The goal is to predict the detailed conformation of a protein given a homologous protein.
Comparative ("homology") modeling approximates the 3D structure of a target protein for which only the sequence is available, provided an empirical 3D "template" structure is available with >30% sequence identity.
Suppose you want to know the 3D structure of a target protein that has not been solved empirically by X-ray crystallography or NMR. You have only the sequence. If an empirically determined 3D structure is available for a sufficiently similar protein (50% or better sequence identity would be good), you can use software that arranges the backbone of your sequence identically to this template. This is called "comparative modeling" or "homology modeling".
It is, at best, moderately accurate for the positions of alpha carbons in the 3D structure, in regions where the sequence identity is high. It is inaccurate for the details of side-chain positions, and for inserted loops with no matching sequence in the solved structure.

[Figure: SWISS-PDB Viewer]

Protein Threading
Protein threading scans the amino acid sequence of an unknown structure against a database of solved structures.
In each case, a scoring function is used to assess the compatibility of the sequence with the structure, thus yielding possible three-dimensional models.
It is possible for two proteins to have less than 25% pairwise sequence identity and yet have similar structures. In these cases remote homology modelling is required.

Protein Threading
The algorithm starts with the target protein sequence aligned with SWISS-PROT protein sequences.
The resulting multiple sequence alignment is converted into a 1D structural profile, so the amino acid sequence has now been translated into a 1D string of structure symbols.
The idea is then to find a 3D fold that is similar to our structure.
Finally, the predicted and observed 1D structure profiles are optimally aligned by a dynamic programming algorithm.
The best hit of the alignment procedure is recorded, and a 3D model is built from there.
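
The alignment step can be illustrated with a small global alignment by dynamic programming over 1D structure strings. This is a sketch under assumptions, not the scoring scheme of any particular threading program: the H/E/C alphabet, the match, mismatch, and gap scores, the function names, and the two-entry fold library are all hypothetical.

```python
def align_profiles(predicted, observed, match=1, mismatch=-1, gap=-2):
    """Global alignment score of two 1D structure strings
    (for example H = helix, E = strand, C = coil) by dynamic programming."""
    n, m = len(predicted), len(observed)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # predicted prefix aligned to gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap          # observed prefix aligned to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if predicted[i - 1] == observed[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,   # align the two symbols
                              score[i - 1][j] + gap,        # gap in observed
                              score[i][j - 1] + gap)        # gap in predicted
    return score[n][m]

def best_hit(predicted, fold_library):
    """Name of the library fold whose observed profile scores highest."""
    return max(fold_library, key=lambda name: align_profiles(predicted, fold_library[name]))

# Hypothetical fold library mapping fold names to observed 1D profiles.
library = {"fold_A": "HHHHCCEEEE", "fold_B": "EEEECCEEEE"}
hit = best_hit("HHHHCCEEEE", library)
```

A real threading method would use position-specific scores derived from the multiple alignment rather than a flat match and mismatch score, and would then build the 3D model on the coordinates of the best-scoring fold.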

[Figures: Protein Threading]

Ab Initio Folding
Researchers have pursued the problem of predicting three-dimensional protein structure from the amino acid sequence alone.
Ab initio folding is based on the global optimization of a potential energy function and in general does not use knowledge of experimentally determined protein structures.
Present ab initio folding methods require intense and exhaustive computing time, which increases as a function of the length of the protein.
This limitation is due in part to the assumption that the initial condition for ab initio folding is the linear sequence of residues comprising the protein, as encoded by the gene.
It is also due to optimizing on all-atom potential energy functions and the use of suboptimal global optimization techniques.

Prediction of Transmembrane Proteins
Transmembrane proteins: the polypeptide chain actually traverses the lipid bilayer.

Why Are They Important?
Membrane proteins are important for several processes and functions in all biological systems:
Receptors for neurotransmitters or hormones
Form ion channels
Serve as the respiratory chain
Nearly 30% of known proteins are membrane bound.

Why Is Prediction Of Transmembrane Regions Important?
Bad news: even though X-ray crystallography is becoming more popular, transmembrane proteins are very difficult to crystallize.
Good news: it is commonly accepted that topology prediction of transmembrane proteins is easier and yields higher accuracy than the prediction of the secondary structure of globular proteins.

Properties Of A Membrane Protein
Traverses the lipid bilayer once or several times
Generally possesses sequences of hydrophobic residues
α-helical transmembrane structure
Typically 17 to 25 residues in length

Brief Transmembrane Prediction History
The cell membrane is a nonpolar lipid layer.
First attempts used this information to label sequences of nonpolar residues as potential transmembrane regions.
Accuracy was increased by considering the charge distribution between the segments inside the cell and those outside the cell.
The environment inside the cell differs from that outside the cell.
Prediction using neural nets.
Prediction using HMMs (Hidden Markov Models).
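
The earliest approach, labeling runs of nonpolar residues, can be sketched as a sliding-window hydropathy scan. The slides do not name a specific scale, so the Kyte-Doolittle hydropathy values, the 19-residue window, and the 1.6 cutoff used here are assumptions brought in for illustration, as is the function name.

```python
# Kyte-Doolittle hydropathy index (assumed scale; not named on the slides).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(sequence, window=19, cutoff=1.6):
    """Start indices of windows whose mean hydropathy exceeds the cutoff,
    flagging them as candidate transmembrane segments."""
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(HYDROPATHY[residue] for residue in segment) / window
        if mean > cutoff:
            hits.append(start)
    return hits

# Made-up sequence: a 20-residue leucine stretch flanked by lysines.
toy = "K" * 10 + "L" * 20 + "K" * 10
candidates = hydrophobic_windows(toy)
```

In this toy sequence only windows that contain at least 14 of the leucines pass the cutoff, so the flagged windows cluster around the hydrophobic stretch. Later methods added the charge distribution on either side of the membrane, which this scan ignores.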

[Figure: Polar and nonpolar amino acids]

Neural Networks
The network attempts to determine the next state given the current state and input. This approach is recursive because the state calculated is used in the next step as the previous state for the network.
Neural networks were chosen as the empirical learning system on which to build for two reasons. One basic reason is that networks provide a very general mechanism for representing concepts: a neural network, given the proper number of hidden units and hidden layers, can learn almost any type of concept. A second reason is that they generally deal very well with noisy and incorrect data.
As for the limitations of neural networks, one basic problem is how to select the topology of the network.

[Figure: Example]

[Figure: Neural Network for Protein Structure Prediction]

Hidden Markov Model
Widely used in bioinformatics: sequence alignment, generating profiles for protein families, and database searching.
Can be tailored to particular problems.
Any known structural knowledge can be incorporated into the model's architecture in order to obtain a more accurate prediction.
An HMM consists of a set of states, rules for changing states, and probabilities of state transitions.

[Figure: HMM Architecture]

Parameters Of The Model
Fixed-length sequences
Helix length: min 17 and max 25 residues
Tail length: min 1 and max 15 residues
Train HMM

HMM
By defining states for transmembrane helix residues and other states for residues in loops and for residues on either side of the membrane, and by connecting them in a cycle, we can produce a model whose architecture closely resembles the biological system we are modelling.
If the model parameters are tuned to capture the biological reality, the highest-probability path of a protein sequence through the states should predict the true topology.
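
Finding the highest-probability path through the states is what the Viterbi algorithm does. The sketch below uses a deliberately tiny two-state model ("M" for membrane helix, "L" for loop) over a two-letter alphabet ("h" for hydrophobic, "p" for polar). All probabilities are made-up placeholders, and the model does not enforce the helix and tail length limits of the real architecture.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable state path for the observations, computed in log space."""
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
               for s in states}]
    back_pointers = []
    for symbol in observations[1:]:
        row, pointers = {}, {}
        for s in states:
            # Best previous state from which to enter state s.
            prev = max(states, key=lambda r: scores[-1][r] + math.log(trans_p[r][s]))
            row[s] = (scores[-1][prev] + math.log(trans_p[prev][s])
                      + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        scores.append(row)
        back_pointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))

# Hypothetical toy parameters (not trained on any data).
STATES = ("M", "L")
START = {"M": 0.5, "L": 0.5}
TRANS = {"M": {"M": 0.9, "L": 0.1}, "L": {"M": 0.1, "L": 0.9}}
EMIT = {"M": {"h": 0.8, "p": 0.2}, "L": {"h": 0.2, "p": 0.8}}

topology = viterbi("pppphhhhhhpppp", STATES, START, TRANS, EMIT)
isolated = viterbi("pphpp", STATES, START, TRANS, EMIT)
```

With these placeholder numbers a run of six hydrophobic residues is long enough to justify switching into the membrane state and back, while a single hydrophobic residue is not, so it stays labeled as loop.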

[Figure: HMM Results]

Problem Studied in Earlier Classes

No structures for Cellulose Synthase