Improved Protein Secondary Structure Prediction


Secondary Structure Prediction

Given a protein sequence a1 a2 ... aN, secondary structure prediction aims at assigning to each amino acid ai a state: H (helix), E (extended = strand), or O (other). (Some methods use 4 states: H, E, T for turns, and O for other.)

The quality of secondary structure prediction is measured with a 3-state accuracy score, Q3: the percentage of residues whose predicted state matches reality (the X-ray structure).

Limitations of Q3

Amino acid sequence:         ALHEASGPSVILFGSDVTVPPASNAEQAK
Actual secondary structure:  hhhhhooooeeeeoooeeeooooohhhhh

Useful prediction:    ohhhooooeeeeoooooeeeooohhhhhh   Q3 = 22/29 = 76%
Terrible prediction:  hhhhhoooohhhhooohhhooooohhhhh   Q3 = 22/29 = 76%

Q3 for a random prediction is 33%. Secondary structure assignment in real proteins is itself uncertain to about 10%; therefore, a perfect prediction would have Q3 = 90%.
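The Q3 numbers above are easy to verify in code. A minimal sketch (the helper name `q3` is ours, not from the slides):

```python
def q3(predicted, actual):
    """3-state accuracy: fraction of residues whose predicted state matches."""
    assert len(predicted) == len(actual)
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

actual   = "hhhhhooooeeeeoooeeeooooohhhhh"
useful   = "ohhhooooeeeeoooooeeeooohhhhhh"
terrible = "hhhhhoooohhhhooohhhooooohhhhh"

# Both predictions match at 22 of 29 positions, about 76%,
# despite being of very different practical value.
print(round(100 * q3(useful, actual)))    # 76
print(round(100 * q3(terrible, actual)))  # 76
```

This is exactly the limitation the slide illustrates: Q3 counts per-residue matches and is blind to whether the predicted segments are useful.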

Accuracy

Both the Chou-Fasman and GOR methods have been assessed, and their accuracy is estimated to be Q3 = 60-65%. (Initially, higher scores were reported, but those experiments were flawed: the test sets included proteins that had been used to derive the propensities!)

Neural networks

The most successful methods for predicting secondary structure are based on neural networks. The overall idea is that neural networks can be trained to recognize amino acid patterns in known secondary structure units, and to use these patterns to distinguish between the different types of secondary structure. Neural networks classify input vectors, or examples, into two or more categories. They are loosely based on biological neurons.

The perceptron

Inputs X1 ... XN with weights w1 ... wN feed a threshold unit with threshold T:

  S = sum_{i=1..N} Xi wi,   output = 1 if S > T, 0 if S < T.

The perceptron classifies the input vector X into two categories. If the weights and threshold T are not known in advance, the perceptron must be trained. Ideally, the perceptron should return the correct answer on all training examples and perform well on examples it has never seen. The training set must contain both types of data (i.e. examples with output 1 and with output 0).

The perceptron

Notes:
- The input is a vector X, and the weights can be stored in another vector W.
- The perceptron computes the dot product S = X.W.
- The output F is a function of S: it is often discrete (i.e. 1 or 0), in which case F is the step function. For continuous output, a sigmoid is often used:

  F(x) = 1 / (1 + e^(-x))

  which rises from 0, through 1/2 at x = 0, toward 1.
- Not all problems can be learned by a perceptron (famous example: XOR, which is not linearly separable).
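The unit described above can be sketched in a few lines. The weights and threshold below are illustrative choices, not values from the slides:

```python
import math

def perceptron(x, w, threshold=0.0, discrete=True):
    # dot product S = X . W
    s = sum(xi * wi for xi, wi in zip(x, w))
    if discrete:
        return 1 if s > threshold else 0      # step function output
    return 1.0 / (1.0 + math.exp(-s))         # sigmoid output

# A perceptron with weights (1, 1) and threshold 1.5 computes logical AND:
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(x, [1, 1], threshold=1.5))
```

No choice of weights and threshold reproduces XOR with a single unit, which is the classic example of a task a lone perceptron cannot learn.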

The perceptron

Training a perceptron: find the weights W that minimize the error function

  E = sum_{i=1..P} ( F(W.Xi) - t(Xi) )^2

where P is the number of training data, Xi are the training vectors, F(W.Xi) is the output of the perceptron, and t(Xi) is the target value for Xi.

Use steepest descent:
- compute the gradient: grad E = ( dE/dw1, dE/dw2, dE/dw3, ..., dE/dwN )
- update the weight vector: W_new = W_old - e grad E   (e: learning rate)
- iterate.
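The steepest-descent recipe can be sketched for a single sigmoid unit. For the sigmoid, dF/dS = F(1 - F), so each weight's gradient contribution is 2 (F - t) F (1 - F) xj. The learning rate, iteration count, and the AND-gate training set below are our illustrative choices:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_perceptron(data, n_inputs, eps=0.5, n_iter=5000):
    # data: list of (input vector, target) pairs; a bias weight is
    # folded in by appending a constant 1.0 input to every example.
    w = [0.0] * (n_inputs + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for x, t in data:
            xb = list(x) + [1.0]
            f = sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
            # dE/dwj = 2 (F - t) F (1 - F) xj for the sigmoid unit
            for j, xj in enumerate(xb):
                grad[j] += 2.0 * (f - t) * f * (1.0 - f) * xj
        # steepest descent step: W_new = W_old - eps * grad E
        w = [wi - eps * gi for wi, gi in zip(w, grad)]
    return w

# AND is linearly separable, so the perceptron can learn it:
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_perceptron(data, 2)
for x, t in data:
    out = sigmoid(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])))
    print(x, round(out, 2))
```

Thresholding the trained sigmoid output at 0.5 recovers the correct AND classification on all four examples.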

Neural Network

A complete neural network is a set of perceptrons interconnected such that the outputs of some units become the inputs of other units. Many topologies are possible! Neural networks are trained just like perceptrons, by minimizing an error function:

  E = sum_{i=1..Ndata} ( NN(Xi) - t(Xi) )^2
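A minimal two-layer network in the same spirit: the outputs of one layer of sigmoid units become the inputs of the next. The layer sizes and the random weights below are placeholders; a real network would be trained by minimizing E as above:

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer(x, weights):
    # weights: one weight vector per unit; the last entry is a bias,
    # applied by appending a constant 1.0 to the layer input.
    return [sigmoid(sum(w * v for w, v in zip(wu, x + [1.0])))
            for wu in weights]

def feed_forward(x, hidden_w, output_w):
    # outputs of the hidden layer feed the output layer
    return layer(layer(x, hidden_w), output_w)

random.seed(0)
hidden_w = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]  # 3 inputs + bias
output_w = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(3)]  # 5 hidden + bias
print(feed_forward([0.2, 0.5, 0.9], hidden_w, output_w))
```

With 3 output units, the network emits one score per secondary-structure state (H, E, O), which is exactly the shape the PHD networks described next produce.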

Neural networks and secondary structure prediction

Experience with Chou-Fasman and GOR has shown that:
- In predicting the conformation of a residue, it is important to consider a window around it.
- Helices and strands occur in stretches.
- It is important to consider multiple sequences.

PHD: Secondary structure prediction using NN

PHD: Input

For each residue, consider a window of size 13 over the sequence profile: 13 x 20 = 260 input values.

PHD: Network 1 (sequence-to-structure)

Input: 13 x 20 = 260 values  ->  Network 1  ->  3 output values per residue: Pa(i), Pb(i), Pc(i).

PHD: Network 2 (structure-to-structure)

For each residue, consider a window of size 17 over the Network 1 outputs: 17 x 3 = 51 values  ->  Network 2  ->  3 refined values: Pa(i), Pb(i), Pc(i).

PHD

- Sequence-to-structure network: for each amino acid aj, a window of 13 residues aj-6 ... aj ... aj+6 is considered. The corresponding rows of the sequence profile are fed into the neural network, and the output is 3 probabilities for aj: P(aj, alpha), P(aj, beta) and P(aj, other).
- Structure-to-structure network: for each aj, PHD now considers a window of 17 residues; the probabilities P(ak, alpha), P(ak, beta) and P(ak, other) for k in [j-8, j+8] are fed into the second network, which again produces probabilities that residue aj is in each of the 3 possible conformations.
- Jury system: PHD has trained several neural networks with different training sets; all neural networks are applied to the test sequence, and results are averaged.
- Prediction: for each position, the secondary structure with the highest average score is output as the prediction.
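The windowing used by both PHD networks, and the jury averaging, can be sketched generically. The function names are ours, and we use zero vectors to pad positions beyond the chain termini (an assumption; implementations differ in how they encode the ends):

```python
def windows(rows, size):
    """For each residue i, concatenate the feature rows in a window of
    `size` centred on i; positions past the termini contribute zeros."""
    half = size // 2
    pad = [0.0] * len(rows[0])
    return [
        [v for k in range(i - half, i + half + 1)
           for v in (rows[k] if 0 <= k < len(rows) else pad)]
        for i in range(len(rows))
    ]

def jury(predictions):
    """Average per-residue (P_alpha, P_beta, P_other) over several
    networks and output the state with the highest mean score."""
    states = "HEO"
    n = len(predictions)
    out = []
    for per_residue in zip(*predictions):
        mean = [sum(p[s] for p in per_residue) / n for s in range(3)]
        out.append(states[mean.index(max(mean))])
    return "".join(out)

# A 13-residue window over 20-value profile rows gives 13 * 20 = 260 inputs:
profile = [[0.05] * 20 for _ in range(30)]   # dummy 30-residue profile
print(len(windows(profile, 13)[0]))          # 260
```

Network 2 reuses the same `windows` helper with size 17 over the 3-value rows from Network 1, giving 17 * 3 = 51 inputs per residue.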

PSIPRED

Jones. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292:195-202 (1999).

The profile values are converted to the range [0, 1] using 1 / (1 + e^(-x)). One extra value per row indicates whether the position is at the N-terminus or C-terminus.

Performances (monitored at CASP)

CASP    Year   # of targets   <Q3>   Group
CASP1   1994    6             63     Rost and Sander
CASP2   1996   24             70     Rost
CASP3   1998   18             75     Jones
CASP4   2000   28             80     Jones

Secondary Structure Prediction

Available servers:
- JPRED: http://www.compbio.dundee.ac.uk/~www-jpred/
- PHD: http://cubic.bioc.columbia.edu/predictprotein/
- PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/
- NNPREDICT: http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html
- Chou and Fasman: http://fasta.bioch.virginia.edu/fasta_www/chofas.htm

Interesting paper:
- Rost and Eyrich. EVA: large-scale analysis of secondary structure prediction. Proteins 5:192-199 (2001).