Protein Secondary Structure Prediction using Pattern Recognition Neural Network


P.V. Nageswara Rao¹ (nagesh@gitam.edu), T. Uma Devi¹, DSVGK Kaladhar¹, G.R. Sridhar², Allam Appa Rao³
¹ GITAM University; ² Endocrine and Diabetes Centre, Visakhapatnam; ³ JNTUK, Kakinada, India

ABSTRACT
Proteins are key biological molecules with diverse functions. Newer technologies (genomics, proteomics) produce more data than can be annotated manually, and in silico methods of predicting protein structure, and thereafter function, have been called the Holy Grail of structural bioinformatics. Successful secondary structure prediction provides a starting point for direct tertiary structure modeling; in addition, it improves sequence analysis and sequence-structure matching for structure and function determination. Using machine learning and data mining, we developed a statistical pattern recognition technique for predicting protein secondary structure from the amino acid sequence. This technique achieved a performance score of Q8 = 72.3%. This compares well with other established techniques, such as NN-I and GOR IV, which achieved Q3 scores of 64.05% and 63.19%, respectively, when predictions are made from a single sequence alone.

Key words: Secondary Structure, Pattern Recognition, Neural Network.

1. INTRODUCTION
The prediction of protein structure from amino acid sequence has been a goal of scientists since Anfinsen (1973) [1] showed that the information necessary for protein folding resides completely within the primary structure. The emergence of rapid DNA sequencing methods, together with the translation of the genetic code into protein sequences, has increased the need for automated methods that interpret these linear sequences in terms of three-dimensional structure [2].
Although advanced molecular biology laboratory techniques have reduced the time needed to determine a protein structure by X-ray crystallography, a crystal structure determination may still take many months. NMR techniques have also helped in determining protein structures, but NMR is costly and time-consuming, requires large amounts of highly soluble protein, and is severely limited by protein size [2]. Current experimental methods of structure determination therefore cannot keep pace with present and future demand.

2. RELATED WORK
There are two main approaches to determining protein structure theoretically. The first is the molecular mechanics approach, based on the assumption that a correctly folded protein occupies a minimum-energy conformation, most likely one near the global minimum of free energy. The potential energy is obtained by summing bonded and non-bonded terms estimated from force-field parameters; it can then be minimized as a function of the atomic coordinates to reach the nearest local minimum [3, 4]. This approach is very sensitive to the starting conformation of the molecule. One way to address this problem is to simulate how the molecule would move away from its initial state: molecular dynamics (integrating Newton's laws of motion) and Monte Carlo methods have been used to search for the global energy minimum. The molecular mechanics approach is nevertheless hampered by inaccurate force-field parameters and by the multiple-minima problem [2].

The second approach predicts protein structure from sequence alone, using data sets of known protein structures and sequences. It attempts to find common features in these data sets that can be generalized to provide structural models of other proteins.
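To make the dependence of energy minimization on the starting conformation concrete, the toy sketch below (not part of the authors' method) replaces a real force field with a hypothetical one-dimensional energy E(x) = (x² - 1)² + 0.3x, which has a shallow local minimum near x ≈ 0.96 and a deeper global minimum near x ≈ -1.04. Steepest descent ends in whichever minimum lies downhill from the start, while a Metropolis Monte Carlo walk can accept uphill moves and so cross the barrier. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def energy(x: float) -> float:
    """Hypothetical 1-D 'force field': a tilted double well with two minima."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x: float) -> float:
    """Analytical derivative dE/dx of the toy energy."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def minimize(x0: float, step: float = 0.01, iters: int = 5000) -> float:
    """Steepest descent: slides downhill to the nearest local minimum."""
    x = x0
    for _ in range(iters):
        x -= step * gradient(x)
    return x


def metropolis(x0: float, temperature: float = 0.3, steps: int = 20000,
               max_move: float = 0.5, seed: int = 0) -> float:
    """Metropolis Monte Carlo: sometimes accepts uphill moves, so it can
    cross energy barriers; returns the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.uniform(-max_move, max_move)
        e_new = energy(x_new)
        # Downhill moves are always accepted; uphill ones with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x


right = minimize(2.0)   # starts on the right: trapped in the local minimum
left = minimize(-2.0)   # starts on the left: reaches the global minimum
print(f"start  2.0 -> x = {right:.3f}, E = {energy(right):.3f}")
print(f"start -2.0 -> x = {left:.3f}, E = {energy(left):.3f}")
print(f"Monte Carlo from {right:.3f} -> best x = {metropolis(right):.3f}")
```

In a real molecular mechanics calculation x would be the full set of atomic coordinates and E a force field such as AMBER [3, 4], but the qualitative behaviour, a result that depends on where the search starts, is the same.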
Many statistical methods use the frequencies of the different amino acid types in helices, strands and loops to predict where these structures occur in a sequence [5-10]. The main idea is that a segment or motif of a target protein whose sequence is similar to that of a segment or motif of known structure is assumed to have the same structure.

ISSN: 0975-5462 1752
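Frequency-based methods start from residue propensities. A minimal sketch, in the spirit of the Chou-Fasman conformational parameters [5], computes P(a, s) = f(a | s) / f(a): the fraction of residues in state s that are amino acid a, divided by the overall fraction of residues that are a. The function name and the labelled toy sequence are hypothetical; real propensity tables are estimated from large sets of proteins of known structure.

```python
from collections import Counter


def propensities(seq: str, ss: str) -> dict:
    """Chou-Fasman-style propensity P(a, s) = f(a | s) / f(a).

    f(a | s): fraction of residues in state s that are amino acid a.
    f(a):     fraction of all residues that are amino acid a.
    Values above 1 mean residue a is over-represented in state s.
    """
    if len(seq) != len(ss):
        raise ValueError("sequence and structure strings must align")
    n = len(seq)
    aa_counts = Counter(seq)
    state_counts = Counter(ss)
    pair_counts = Counter(zip(seq, ss))
    table = {}
    for a, n_a in aa_counts.items():
        for s, n_s in state_counts.items():
            f_a_given_s = pair_counts[(a, s)] / n_s
            table[(a, s)] = f_a_given_s / (n_a / n)
    return table


# Hypothetical labelled data: H = helix, E = strand, C = coil.
seq = "AAAALGGGVVVV"
ss = "HHHHHCCCEEEE"
p = propensities(seq, ss)
print(round(p[("A", "H")], 2), round(p[("V", "E")], 2), round(p[("G", "C")], 2))
# -> 2.4 3.0 4.0
```

A simple predictor would then scan a new sequence and flag stretches whose average helix or strand propensity exceeds 1; this single-residue view is what the GOR method extends with position-dependent information from neighbouring residues.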

Protein secondary structure prediction is the prediction, from the amino acid sequence, of local structures such as α-helices, β-strands and coils. Solving the protein folding problem will pave the way for rapid progress in protein engineering and drug design. Because the number of known protein sequences is growing much faster than our ability to solve their structures experimentally in the molecular biology laboratory, in silico prediction methods can help narrow the gap between available sequences and structures. Previous research has shown that it is promising to derive general rules for predicting protein structure from existing data and then to apply them to proteins of unknown structure; several methods have used this approach [5, 11-14]. Unfortunately, for many proteins there is not enough homology to any protein of known structure for the structure of a similar segment to be transferred in this way.

The GOR method was first proposed by Garnier, Osguthorpe and Robson [15], after whom it is named. It takes into account information from a somewhat longer segment of the polypeptide chain: instead of considering the propensity of a single residue, it uses position-dependent propensities calculated for all residue types. The prediction is therefore influenced not only by the residue at the position being predicted but also, to some extent, by its neighbouring residues [16]. The propensity tables partly reflect the fact that positively charged residues are more often found at the C-terminal end of helices and negatively charged residues at the N-terminal end.

3. PROPOSED METHOD
The DSSP database (http://swift.cmbi.kun.nl/gv/dssp/) is an archive of protein sequences together with their secondary structures. Each file describes the primary structure of the protein and the secondary structure of each amino acid in columnar form. A set of 625 non-redundant proteins with more than 25% sequence similarity was extracted. A sniffer program was written to extract the sequence and its secondary structure from each .dssp file. A sample .dssp file is presented in Fig. 1.

[Fig. 1: DSSP file for PDB entry 3GUP (T4 phage lysozyme; DSSP reference: W. Kabsch and C. Sander, Biopolymers 22 (1983) 2577-2637), consisting of header statistics followed by one record per residue. Residues 1-30 of chain A: sequence MNIFEMLRIDEGLRLKIYKDaEGYYTIGIG; DSSP states --HHHHHHHHH--EEEEEE-TTS-EEEETT (a blank DSSP state is shown as '-'; the lowercase 'a' denotes a disulfide-bonded cysteine).]
Fig. 1. The DSSP file showing the primary structure and secondary structure of a protein (shown up to 30 residues only).

Methodology: To predict the secondary structure of a protein, a pattern recognition neural network is designed with one input layer, one hidden layer and one output layer. The protein sequence is presented as a sliding window of size W (varied from 15 to 29), and the prediction is made for the structural state of the central residue of the window. A protein segment of window size W is thus represented as a 20 × W matrix, so the input layer R consists of 20 × W input units, i.e., one group of 20 inputs for each of the W window positions. All proteins used to train the network are encoded in this way and stored as input vectors. Each target is represented as a Boolean array of size 8 that encodes one of the eight secondary structure states of the residue at that position. The secondary structure states defined according to DSSP are H, I, G, E, B, T, S and C: H is represented as 10000000, I as 01000000, and so on, up to C, which is represented as 00000001. The output layer therefore consists of eight units, one for each structural state (class), and the target matrix is prepared accordingly. The hidden layer has 2W + 1 units. The pattern recognition network is trained with the scaled conjugate gradient algorithm.
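The input and target encoding described in the Methodology can be sketched as follows. This is a minimal illustration, not the authors' code; it assumes that the 20 × W window is a one-hot (orthogonal) encoding using the alphabetical one-letter order ACDEFGHIKLMNPQRSTVWY, that window positions falling off either end of the chain are all-zero groups, that W is odd so a central residue exists, and that a blank DSSP state counts as C. All names (`encode_window`, `encode_state`, `layer_sizes`, `build_dataset`) are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues (assumed order)
STATES = "HIGEBTSC"                     # eight classes, in the order used above


def encode_window(seq: str, center: int, w: int) -> list:
    """One-hot encode the W residues centred on `center` as 20 * W inputs.

    Positions beyond either chain end, and non-standard letters, are left
    as all-zero groups (an assumption).
    """
    if w % 2 == 0:
        raise ValueError("window size must be odd so it has a central residue")
    half = w // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        group = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            group[AMINO_ACIDS.index(seq[pos])] = 1
        vector.extend(group)
    return vector


def encode_state(state: str) -> list:
    """8-bit target: H -> 10000000, I -> 01000000, ..., C -> 00000001.
    A blank DSSP state is treated as coil (C)."""
    state = "C" if state.strip() == "" else state
    target = [0] * len(STATES)
    target[STATES.index(state)] = 1
    return target


def layer_sizes(w: int) -> tuple:
    """Input (20W), hidden (2W + 1) and output (8) layer sizes."""
    return 20 * w, 2 * w + 1, len(STATES)


def build_dataset(seq: str, ss: str, w: int):
    """One input vector and one target per residue (sliding window)."""
    inputs = [encode_window(seq, i, w) for i in range(len(seq))]
    targets = [encode_state(s) for s in ss]
    return inputs, targets


# First ten residues of the Fig. 1 example (blank DSSP states written as spaces).
X, T = build_dataset("MNIFEMLRID", "  HHHHHHHH", w=15)
print(len(X), len(X[0]), layer_sizes(15))
# -> 10 300 (300, 31, 8)
```

Each row of `X` is one input vector for the network and the matching row of `T` is its 8-unit target, which is how the input and target matrices described above are assembled.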
In each training cycle, the training sequences are presented to the network through the sliding window defined above, one residue at a time. Each hidden unit transforms the signals received from the input layer using the log-sigmoid transfer function, producing an output between 0 and 1 that approaches 0 or 1 for large negative or positive inputs. The weights are adjusted to minimize the error between the actual output of each unit and the desired output specified by the target matrix. Overfitting, a common problem when training neural networks, is controlled by dividing the data into three subsets: (i) the training set, used to compute the gradient and update the network weights and biases; (ii) the validation set, whose error is monitored during training because it tends to rise once the network begins to overfit; and (iii) the test set (not seen by the network during training), whose error is used to assess generalization and the quality of the data division. Training stops automatically when any one of several conditions is met, such as reaching the maximum number of epochs, reaching the error goal, or a sustained rise in the validation error.
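The transfer function, data division and stopping rules described above can be sketched as follows. This is not the authors' training code: the 70/15/15 split, the limit of 1000 epochs, the error goal of 0 and the patience of six non-improving validation checks (`max_fail`) are illustrative assumptions, and the weight update itself (scaled conjugate gradient in the paper) is omitted.

```python
import math
import random


def logsig(x: float) -> float:
    """Log-sigmoid transfer function 1 / (1 + e^-x), mapping inputs into (0, 1).
    The two branches avoid overflow in math.exp for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_layer(inputs, weights, biases):
    """Outputs of the hidden units: logsig(w . x + b) for each unit."""
    return [logsig(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def split_indices(n, train=0.70, val=0.15, seed=0):
    """Randomly divide n samples into training, validation and test indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train * n)
    n_val = round(val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def stop_reason(epoch, train_error, val_errors,
                max_epochs=1000, goal=0.0, max_fail=6):
    """Return why training should stop, or None to keep going.

    max_fail: number of consecutive checks the validation error may fail
    to improve on its best value before training is halted (early stopping).
    """
    if epoch >= max_epochs:
        return "max_epochs"
    if train_error <= goal:
        return "goal"
    if val_errors:
        best = min(range(len(val_errors)), key=val_errors.__getitem__)
        if len(val_errors) - 1 - best >= max_fail:
            return "validation"
    return None


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 70 15 15
print(stop_reason(8, 0.2, [0.5, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46]))  # validation
```

Because the validation error reached its minimum at the second check and then rose for six checks in a row, the example stops with reason `validation`; the weights saved at that best check would be the ones kept.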

P.V. Nageswara Rao et. al. / International Journal of Engineering Science and Technology

4. RESULTS AND DISCUSSION
To analyze the network response, a confusion matrix is computed by comparing the outputs of the trained network with the expected results (targets); it is shown in Fig. 2.

Fig. 2. Confusion matrix showing the performance of the classifier.

The diagonal cells show the number of residue positions correctly classified for each structural class, and the off-diagonal cells show the number of misclassified residue positions (e.g. helix residues predicted as coil). The rightmost cell in the last row shows the overall percentage of correctly predicted residues (upper number) and of incorrectly predicted residues (lower number). With this technique, a performance score of Q8 = 72.3% is achieved. This compares well with state-of-the-art techniques such as NN-I and GOR IV, which achieved Q3 scores of 64.05% and 63.19%, respectively, when predictions are made from a single sequence alone; note that Q8 is computed over eight states and Q3 over three, so the two scores are not directly comparable. The receiver operating characteristic (ROC) curve, a plot of the true positive rate (sensitivity) against the false positive rate (1 - specificity), is also plotted and shown in Fig. 3.
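The evaluation quantities used above can be sketched in a few lines. In this minimal illustration (not the authors' code), rows of the confusion matrix are true states and columns are predicted states, which may be transposed relative to the layout of Fig. 2; the Q score is the diagonal sum divided by the total. The 8-to-3 mapping (H, G, I to helix; E, B to strand; the rest to coil) is one common convention and is an assumption here, not necessarily the one behind the NN-I and GOR IV figures. The ROC points are computed one class at a time (one-vs-rest) from the network's output scores for that class. All names are hypothetical.

```python
STATES = "HIGEBTSC"
# One common 8-to-3 reduction (others exist): H, G, I -> H; E, B -> E; T, S, C -> C.
TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E",
            "T": "C", "S": "C", "C": "C"}


def confusion_matrix(true, pred, states=STATES):
    """Rows = true (target) state, columns = predicted state."""
    index = {s: i for i, s in enumerate(states)}
    m = [[0] * len(states) for _ in states]
    for t, p in zip(true, pred):
        m[index[t]][index[p]] += 1
    return m


def accuracy(m):
    """Q score: correctly classified residues (diagonal) / all residues."""
    total = sum(map(sum, m))
    return sum(m[i][i] for i in range(len(m))) / total


def rates(m, k):
    """One-vs-rest true positive rate and false positive rate for class k."""
    tp = m[k][k]
    fn = sum(m[k]) - tp
    fp = sum(row[k] for row in m) - tp
    tn = sum(map(sum, m)) - tp - fn - fp
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def roc_points(scores, is_positive):
    """ROC points (FPR, TPR), lowering the threshold one score at a time.
    Tied scores are not merged into a single point."""
    pos = sum(is_positive)
    neg = len(is_positive) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative examples")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, positive in sorted(zip(scores, is_positive), key=lambda sp: -sp[0]):
        if positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


true = "HHHEEC"
pred = "HHGEEC"
q8 = accuracy(confusion_matrix(true, pred))
q3 = accuracy(confusion_matrix([TO_THREE[s] for s in true],
                               [TO_THREE[s] for s in pred], states="HEC"))
print(f"Q8 = {q8:.3f}, Q3 = {q3:.3f}")  # Q8 = 0.833, Q3 = 1.000
```

The toy example shows why an eight-state score is stricter than a three-state one: predicting a 3-10 helix (G) where the target is an α-helix (H) counts as an error in Q8 but as correct after the reduction to three states.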

Fig. 3. ROC curve showing the performance of the classifier.

5. CONCLUSION
The prediction accuracy may be improved by (i) increasing the number of training vectors, with an appropriate distribution of all the classes; (ii) increasing the window size or adding more relevant information, such as biochemical properties of the amino acids; and (iii) increasing the number of hidden layers and neurons.

ACKNOWLEDGEMENTS
The authors would like to thank Acharya Nagarjuna University and GITAM University for providing computational facilities and access to e-journals to carry out this research.

REFERENCES
1. Anfinsen, C.B. (1973). Principles That Govern The Folding Of Protein Chains. Science. 181:223-230.
2. Holbrook, S.R., Muskal, S.M. and Kim, S.-H. (1990). Predicting Protein Structural Features With Artificial Neural Networks. Artificial Intelligence and Molecular Biology.
3. Weiner, P.K. and Kollman, P.A. (1981). AMBER: Assisted Model Building With Energy Refinement. A General Program For Modeling Molecules and Their Interactions. Journal of Computational Chemistry. 2:287-303.
4. Weiner, S.J., Kollman, P.A., Case, D.A., Singh, U.C., Ghio, C., Alagona, G., Profeta, S. and Weiner, P.K. (1984). A New Force Field For Molecular Mechanical Simulation Of Nucleic Acids and Proteins. Journal of the American Chemical Society. 106:765-784.
5. Chou, P.Y. and Fasman, G.D. (1974). Prediction of Protein Conformation. Biochemistry. 13:222-245.
6. Garnier, J., Osguthorpe, D.J. and Robson, B. (1978). Analysis Of The Accuracy and Implications Of Simple Methods For Predicting The Secondary Structure Of Globular Proteins. Journal of Molecular Biology. 120:97-120.
7. Lim, V.I. (1974). Algorithms For the Prediction of Alpha-Helical and Beta-Structural Regions in Globular Proteins. Journal of Molecular Biology. 88:873-894.
8. Blundell, T., Sibanda, B.L. and Pearl, L. (1983). Three-Dimensional Structure, Specificity and Catalytic Mechanism Of Renin. Nature. 304:273-275.

9. Greer, J. (1981). Comparative Model-Building Of The Mammalian Serine Proteases. Journal of Molecular Biology. 153:1027-1042.
10. Warme, P.K., Momany, F.A., Rumball, S.V., Tuttle, R.W. and Scheraga, H.A. (1974). Computation Of Structures Of Homologous Proteins. Alpha-Lactalbumin From Lysozyme. Biochemistry. 13:768-782.
11. Richardson, J.S. (1981). The Anatomy and Taxonomy of Protein Structures. Advances in Protein Chemistry. 34:168-339.
12. Krigbaum, W.R. and Knutton, S.P. (1973). Prediction Of The Amount Of Secondary Structure In A Globular Protein From Its Amino Acid Composition. Proceedings of the National Academy of Sciences USA. 70(10):2809-2813.
13. Qian, N. and Sejnowski, T.J. (1988). Predicting The Secondary Structure Of Globular Proteins Using Neural Network Models. Journal of Molecular Biology. 202(4):865-884.
14. Crick, F. (1989). The Recent Excitement About Neural Networks. Nature. 337:129-132.
15. Garnier, J. and Robson, B. (1989). The GOR Method For Predicting Secondary Structure in Proteins. Prediction Of Protein Structure and The Principles Of Protein Conformation. New York: Plenum Press. 417-465.
16. Garnier, J. and Robson, B. (1989). The GOR Method For Predicting Secondary Structures in Proteins. Prediction of Protein Structure and The Principles of Protein Conformation. New York: Plenum Press. 417-465.