Protein Secondary Structure Prediction

Similar documents
Protein structure. Protein structure. Amino acid residue. Cell communication channel. Bioinformatics Methods

PROTEIN SECONDARY STRUCTURE PREDICTION: AN APPLICATION OF CHOU-FASMAN ALGORITHM IN A HYPOTHETICAL PROTEIN OF SARS VIRUS

Proteins: Characteristics and Properties of Amino Acids

Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

PROTEIN STRUCTURE AMINO ACIDS H R. Zwitterion (dipolar ion) CO 2 H. PEPTIDES Formal reactions showing formation of peptide bond by dehydration:

Sequence comparison: Score matrices

Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

PROTEIN SECONDARY STRUCTURE PREDICTION USING NEURAL NETWORKS AND SUPPORT VECTOR MACHINES

Lecture 7. Protein Secondary Structure Prediction. Secondary Structure DSSP. Master Course DNA/Protein Structurefunction.

CHAPTER 29 HW: AMINO ACIDS + PROTEINS

C h a p t e r 2 A n a l y s i s o f s o m e S e q u e n c e... methods use different attributes related to mis sense mutations such as

The Select Command and Boolean Operators

Part 4 The Select Command and Boolean Operators

7 Protein secondary structure

Range of Certified Values in Reference Materials. Range of Expanded Uncertainties as Disseminated. NMI Service

Protein Secondary Structure Prediction

Protein Secondary Structure Prediction

8 Protein secondary structure

Periodic Table. 8/3/2006 MEDC 501 Fall

Protein Structure Prediction and Display

12 Protein secondary structure

Bioinformatics III Structural Bioinformatics and Genome Analysis Part Protein Secondary Structure Prediction. Sepp Hochreiter

Physiochemical Properties of Residues

Protein Structure Bioinformatics Introduction

Basics of protein structure

Lecture 14 Secondary Structure Prediction

Lecture 14 - Cells. Astronomy Winter Lecture 14 Cells: The Building Blocks of Life

Using Higher Calculus to Study Biologically Important Molecules Julie C. Mitchell

Translation. A ribosome, mrna, and trna.

Enzyme Catalysis & Biotechnology

Properties of amino acids in proteins

Viewing and Analyzing Proteins, Ligands and their Complexes 2

Improved Protein Secondary Structure Prediction

Proteome Informatics. Brian C. Searle Creative Commons Attribution

Chemistry Chapter 22

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Lecture 14 Secondary Structure Prediction

Introduction to Bioinformatics Lecture 13. Protein Secondary Structure Elements and their Prediction. Protein Secondary Structure

Protein Struktur (optional, flexible)

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

INTRODUCTION. Amino acids occurring in nature have the general structure shown below:

1. Amino Acids and Peptides Structures and Properties

A Model for Protein Secondary Structure Prediction Meta - Classifiers

Solutions In each case, the chirality center has the R configuration

Bioinformatics: Secondary Structure Prediction

Read more about Pauling and more scientists at: Profiles in Science, The National Library of Medicine, profiles.nlm.nih.gov

Protein Identification Using Tandem Mass Spectrometry. Nathan Edwards Informatics Research Applied Biosystems

All Proteins Have a Basic Molecular Formula

Presentation Outline. Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy

Protein Structure Marianne Øksnes Dalheim, PhD candidate Biopolymers, TBT4135, Autumn 2013

Amino Acids and Peptides

DATA MINING OF ELECTROSTATIC INTERACTIONS BETWEEN AMINO ACIDS IN COILED-COIL PROTEINS USING THE STABLE COIL ALGORITHM ANKUR S.

CAP 5510 Lecture 3 Protein Structures

Protein Secondary Structure Assignment and Prediction

Protein Secondary Structure Prediction Using a Vector-Valued Classifier

Hypergraphs, Metabolic Networks, Bioreaction Systems. G. Bastin

Lecture 15: Realities of Genome Assembly Protein Sequencing

Proteomics. November 13, 2007

Protein Secondary Structure Prediction using Pattern Recognition Neural Network

Patrick: An Introduction to Medicinal Chemistry 5e Chapter 03

Protein Structure. Role of (bio)informatics in drug discovery. Bioinformatics

Getting To Know Your Protein

Molecular Modelling. part of Bioinformatik von RNA- und Proteinstrukturen. Sonja Prohaska. Leipzig, SS Computational EvoDevo University Leipzig

Protein Struktur. Biologen und Chemiker dürfen mit Handys spielen (leise) go home, go to sleep. wake up at slide 39

Problem Set 1

EXAM 1 Fall 2009 BCHS3304, SECTION # 21734, GENERAL BIOCHEMISTRY I Dr. Glen B Legge

Basic Principles of Protein Structures

A rapid and highly selective colorimetric method for direct detection of tryptophan in proteins via DMSO acceleration

Steps in protein modelling. Structure prediction, fold recognition and homology modelling. Basic principles of protein structure

Bioinformatics: Secondary Structure Prediction

Discussion Section (Day, Time): TF:

4. The Michaelis-Menten combined rate constant Km, is defined for the following kinetic mechanism as k 1 k 2 E + S ES E + P k -1

Towards Understanding the Origin of Genetic Languages

Exam III. Please read through each question carefully, and make sure you provide all of the requested information.

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

Protein secondary structure prediction with a neural network

How did they form? Exploring Meteorite Mysteries

8 Grundlagen der Bioinformatik, SS 09, D. Huson, April 28, 2009

Classification of DNA Sequences by a MLP and SVM Network

Structures in equilibrium at point A: Structures in equilibrium at point B: (ii) Structure at the isoelectric point:

Evidence from Evolution Activity 75 Points. Fossils Use your textbook and the diagrams on the next page to answer the following questions.

Scoring Matrices. Shifra Ben-Dor Irit Orr

SUPPLEMENTARY MATERIALS

12/6/12. Dr. Sanjeeva Srivastava IIT Bombay. Primary Structure. Secondary Structure. Tertiary Structure. Quaternary Structure.

Modelinig and Simulation of Amino Acide

The Structure of Enzymes!

The Structure of Enzymes!

UNIT TWELVE. a, I _,o "' I I I. I I.P. l'o. H-c-c. I ~o I ~ I / H HI oh H...- I II I II 'oh. HO\HO~ I "-oh

Section Week 3. Junaid Malek, M.D.

Biophysical Society On-line Textbook

8 Grundlagen der Bioinformatik, SoSe 11, D. Huson, April 18, 2011

BCH 4053 Exam I Review Spring 2017

Studies Leading to the Development of a Highly Selective. Colorimetric and Fluorescent Chemosensor for Lysine

Proteins: Structure & Function. Ulf Leser

Discussion Section (Day, Time):

Generation Date: 12/07/2015 Generated By: Tristan Wiley Title: Bio I Winter Packet

Improved Prediction of Signal Peptides: SignalP 3.0

Dental Biochemistry EXAM I

Protein Secondary Structure Prediction using Feed-Forward Neural Network

Dana Alsulaibi. Jaleel G.Sweis. Mamoon Ahram

Transcription:

part of Bioinformatik von RNA- und Proteinstrukturen Computational EvoDevo University Leipzig Leipzig, SS 2011

the goal is the prediction of the secondary structure conformation which is local each amino acid will be assigned one of the secondary structure conformations the problem is expected to be easier then tertiary structure prediction afterwards: predict arrangement of secondary structure elements classify the fold according to SCOP (Structural Classification of Proteins)

Secondary Structure Conformations as easy as... H for helix E for extended β-sheet - for other as complicated as... H for α-helix (period 4) I for π-helix (period 5) G for 3 10 -helix (period 3) E for extended β-sheet B for β-bridge T for turn S for bend (region of high curvature) L for loop for other

Chou-Fasman Method Chou and Fasman developed a simple prediction algorithm in the 1970s. The parameter table is calculated as follow: it is based on the probability with which each an amino acid a of type i will appear in a helix H, strand E or turn T propensities are calculated from a set of proteins with known structure: p(a i, H) = 1+log f(a i H) E(a i H) = F(a i,h) F(H) F(a i ) 20 i=1 F(a i) f(a i H) is the observed count for amino acid a i in helices E(a i H) is the expected count for amino acid a i in helices F(a i, H) is the count for amino acid a i in helix conformation F(H) is the count for amino acids in helix conformation F(a i ) is the count for amino acid a i 20 i=1 F(a i) is the total number of amino acids

Chou-Fasman Method conformation parameters Amino acid p(a i, H) p(a i, E) p(a i, T) f(j) f(j + 1) f(j + 2) f(j + 3) Alanine 1.42 0.83 0.66 0.06 0.076 0.035 0.058 Arginine 0.98 0.93 0.95 0.070 0.106 0.099 0.085 Aspartic Acid 1.01 0.54 1.46 0.147 0.110 0.179 0.081 Asparagine 0.67 0.89 1.56 0.161 0.083 0.191 0.091 Cysteine 0.70 1.19 1.19 0.149 0.050 0.117 0.128 Glutamic Acid 1.39 1.17 0.74 0.056 0.060 0.077 0.064 Glutamine 1.11 1.10 0.98 0.074 0.098 0.037 0.098 Glycine 0.57 0.75 1.56 0.102 0.085 0.190 0.152 Histidine 1.00 0.87 0.95 0.140 0.047 0.093 0.054 Isoleucine 1.08 1.60 0.47 0.043 0.034 0.013 0.056 Leucine 1.41 1.30 0.59 0.061 0.025 0.036 0.070 Lysine 1.14 0.74 1.01 0.055 0.115 0.072 0.095 Methionine 1.45 1.05 0.60 0.068 0.082 0.014 0.055 Phenylalanine 1.13 1.38 0.60 0.059 0.041 0.065 0.065 Proline 0.57 0.55 1.52 0.102 0.301 0.034 0.068 Serine 0.77 0.75 1.43 0.120 0.139 0.125 0.106 Threonine 0.83 1.19 0.96 0.086 0.108 0.065 0.079 Tryptophan 1.08 1.37 0.96 0.077 0.013 0.064 0.167 Tyrosine 0.69 1.47 1.14 0.082 0.065 0.114 0.125 Valine 1.06 1.70 0.50 0.062 0.048 0.028 0.053 most frequent in helices: glutamate, methionine, and alanine most frequent in strands: valine and isoleucine most frequent in turns: proline and glycine

Chou-Fasman Method continued find nucleation region helix: 4 out of 6 contiguous residues with p(h) > 1.00 strand: 3 out of 5 contiguous residues with p(e) > 1.00 extend the nucleation regions in both directions helix: until four-residue window from position j to j + 3 has p j (H)+p j+1 (H)+p j+2 (H)+p j+3 (H) 4 < 1.0 strand: analogous minimum length for structure conformations helix: 5 residues strand: 3 residues resolve regions with helix and strand annotation helix: if average p(h) for the region > average p(e) strand: if average p(e) for the region > average p(h)

Chou-Fasman Method continued predict β-turns calculate p(t) = f(j) f(j + 1) f(j + 2) f(j + 3) where f(x) are the bend frequences in the position x of the β-turn (see parameter table) calculate p(s) = p j(s)+p j+1 (S)+p j+2 (S)+p j+3 (S) 4 for S {H, E, T} if p(t) > 0.000075 and if average p(t) > 1.0 and if p(h) < p(t) > p(e) for the same window then a β-turn is predicted

The GOR Method by Garnier, Osguthorpe and Robson (late 1970s) improves the secondary structure prediction by introducing the conditional probabilities of an amino acid to form a particular secondary structure element, given that its neighbors already possess that structure. works on windows of size 17, a j 8,..., a j,..., a j+8 based on data: collecting statistics for 20 17 sequences instead assume: the central residue depends on its neighbors but the neighbors are independent of each other four matrices, each 17 20, reflect the probabilities of the central residue being in a helical, sheet, turn, or coil conformation uses Bayesian statistics P(S a) = P(S, a)/p(a)

The GOR Method continued calculates the information for the joint occurence of the structure S and the amino acid a I(S, a) = log P(S a)/p(s) gives I(S, a) = log f S,a /f S hypothesis residue a is in structure S: I(S, a) hypothesis residue a is in a different conformation: I(notS, a) the difference is I( S, a) = I(S, a) I(notS, a) = log f S,a f S log 1 f S,a 1 f S The values I( S, a) are summed over a j 8,..., a j 1, a j+1,..., a j+8 to approximate the value for I( S j,(a j 8,..., a j+8 ))

Neural Networks the Perceptron inputs: x 1,..., x n weigths: w 1,..., w n the perceptron calculates the sum: a = n i=1 w ix i if a >= θ the output is +1 if a < θ the output is 1

Classification by Perceptrons y x 0 x+y <= 2 +1 y otherwise 1 1 x x + y < 2 +1 y otherwise 1 x 0 y x> 0 +1 otherwise 1 Σ i > 2 +1 otherwise 1 0 1 0 x y > 0 +1 otherwise 1

Goal and Training If the weigths are not know in advance, the perceptron(s) must be trained. Goal: find weigthes such that the perceptron returns the correct answere for all cases in the training set. take the input vector x and the weigths w multiply the vectors and use θ to categorize the output as positive or negative if the perceptron is wrong, update the weigths: calculate w x if false positive calculate w + x if false negative repeat until the perceptron classifies all examples in the training set correctly Proof that the training procedure converges under the assumption that a set of weigths exists that labels all examples correctly.

Step function or sigmoidal curve? Use sigmoidal curve instead of step function because it is differentiable. 1.00 0.50 0.00 0.50 step function 1.00 4.00 2.00 0.00 2.00 4.00 Θ(a) = { +1 if a >= θ -1 : otherwise σ(a) = tanh(a) The output perceptron therefore returns a real number between 1 and +1 instead of either 1 or +1.

Error Minimization Goal: Find a w that minimizes E( w). Let T be the set of trainings examples, and t( x) the target value of example x T E( w) = xint 1 2 (σ( w x) t( x)) 2 This is the formular for squared error which is used to quantify how close the observed value σ( w x) is to the target value t( x).

Overfitting Task: fit a line to the set of points with a neural network green line not a good fit blue line good fit purple line overfitted, no data points support the fancy shape between 0 and 4

Neuronal Network Method How are neuronal networks applied to predict protein secondary structures?

Secondary Structure Prediction from Multiple Alignments

How good are the methods? Chou & Fasman 55% GOR 65% neural network on multiple sequences (PHD) 72% weigthed consensus of many methods moderate improvement database search method (PSI-BLAST) 77%

Example

Literature