Protein Structure Prediction Using Multiple Artificial Neural Network Classifier *

Similar documents
Basics of protein structure

CAP 5510 Lecture 3 Protein Structures

Artificial Neural Network Method of Rock Mass Blastability Classification

Neural Networks and the Back-propagation Algorithm

ANN and Statistical Theory Based Forecasting and Analysis of Power System Variables

Keywords- Source coding, Huffman encoding, Artificial neural network, Multilayer perceptron, Backpropagation algorithm

From Amino Acids to Proteins - in 4 Easy Steps

Presentation Outline. Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Artificial Neural Network

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

What is the central dogma of biology?

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD

Ch 3: Chemistry of Life. Chemistry Water Macromolecules Enzymes

Improved Protein Secondary Structure Prediction

Intro Secondary structure Transmembrane proteins Function End. Last time. Domains Hidden Markov Models

Today. Last time. Secondary structure Transmembrane proteins. Domains Hidden Markov Models. Structure prediction. Secondary structure

BIOINFORMATICS: An Introduction

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

Application of Artificial Neural Networks in Evaluation and Identification of Electrical Loss in Transformers According to the Energy Consumption

Unit 1: Chemistry - Guided Notes

Application of Associative Matrices to Recognize DNA Sequences in Bioinformatics

Motif Prediction in Amino Acid Interaction Networks

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

HIV protease inhibitor. Certain level of function can be found without structure. But a structure is a key to understand the detailed mechanism.

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Protein Secondary Structure Prediction

Human Biology. The Chemistry of Living Things. Concepts and Current Issues. All Matter Consists of Elements Made of Atoms

Artifical Neural Networks

An Artificial Neural Network Classifier for the Prediction of Protein Structural Classes

Protein Secondary Structure Prediction

Protein Secondary Structure Prediction using Pattern Recognition Neural Network

The Structure and Functions of Proteins

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence

Radial Basis Function Neural Networks in Protein Sequence Classification ABSTRACT

Lecture 7 Artificial neural networks: Supervised learning

Protein Secondary Structure Prediction using Feed-Forward Neural Network

Protein Structure Prediction Using Neural Networks

BIOCHEMISTRY Unit 2 Part 4 ACTIVITY #6 (Chapter 5) PROTEINS

Computational Biology: Basics & Interesting Problems

Supersecondary Structures (structural motifs)

Introduction to" Protein Structure

Using Neural Networks for Identification and Control of Systems

Protein Structure Basics

Biophysics II. Key points to be covered. Molecule and chemical bonding. Life: based on materials. Molecule and chemical bonding

Convolutional Associative Memory: FIR Filter Model of Synapse

Protein Structure. Hierarchy of Protein Structure. Tertiary structure. independently stable structural unit. includes disulfide bonds

1. (5) Draw a diagram of an isomeric molecule to demonstrate a structural, geometric, and an enantiomer organization.

PROTEIN SECONDARY STRUCTURE PREDICTION: AN APPLICATION OF CHOU-FASMAN ALGORITHM IN A HYPOTHETICAL PROTEIN OF SARS VIRUS

Design Collocation Neural Network to Solve Singular Perturbed Problems with Initial Conditions

Protein Structure Prediction and Display

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

Analysis and Prediction of Protein Structure (I)

Proteins. Division Ave. High School Ms. Foglia AP Biology. Proteins. Proteins. Multipurpose molecules

Part 7 Bonds and Structural Supports

Bonds and Structural Supports

Combination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters

Biomolecules. Energetics in biology. Biomolecules inside the cell

Protein structure. Protein structure. Amino acid residue. Cell communication channel. Bioinformatics Methods

Protein Structures. Sequences of amino acid residues 20 different amino acids. Quaternary. Primary. Tertiary. Secondary. 10/8/2002 Lecture 12 1

Figure ) Letter E represents a nucleic acid building block known as a. Answer: nucleotide Diff: 3 Page Ref: 54

PROTEIN SECONDARY STRUCTURE PREDICTION USING NEURAL NETWORKS AND SUPPORT VECTOR MACHINES

Properties of amino acids in proteins

Learning and Memory in Neural Networks

AP Biology. Proteins. AP Biology. Proteins. Multipurpose molecules

Computational Biology 1

A New Similarity Measure among Protein Sequences

Bi 8 Midterm Review. TAs: Sarah Cohen, Doo Young Lee, Erin Isaza, and Courtney Chen

Biological Macromolecules

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks

ALL LECTURES IN SB Introduction

COMP-4360 Machine Learning Neural Networks

Automated Assignment of Backbone NMR Data using Artificial Intelligence

Bioinformatics: Secondary Structure Prediction

Reconstructing Amino Acid Interaction Networks by an Ant Colony Approach

Biomolecules: lecture 10

A General Method for Combining Predictors Tested on Protein Secondary Structure Prediction

EE04 804(B) Soft Computing Ver. 1.2 Class 2. Neural Networks - I Feb 23, Sasidharan Sreedharan

BIRKBECK COLLEGE (University of London)

4. Multilayer Perceptrons

Protein structure (and biomolecular structure more generally) CS/CME/BioE/Biophys/BMI 279 Sept. 28 and Oct. 3, 2017 Ron Dror

RNA and Protein Structure Prediction

BA, BSc, and MSc Degree Examinations

BME Engineering Molecular Cell Biology. Structure and Dynamics of Cellular Molecules. Basics of Cell Biology Literature Reading

Enzyme Catalysis & Biotechnology

Model Mélange. Physical Models of Peptides and Proteins

Bioinformatics: Secondary Structure Prediction

SUPPLEMENTARY MATERIALS

A Novel Activity Detection Method

1/23/2012. Atoms. Atoms Atoms - Electron Shells. Chapter 2 Outline. Planetary Models of Elements Chemical Bonds

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

An ANN based Rotor Flux Estimator for Vector Controlled Induction Motor Drive

Artificial Neural Network : Training

Rainfall Prediction using Back-Propagation Feed Forward Network

Analysis of Multilayer Neural Network Modeling and Long Short-Term Memory

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

Transcription:

Protein Structure Prediction Using Multiple Artificial Neural Network Classifier * Hemashree Bordoloi and Kandarpa Kumar Sarma Abstract. Protein secondary structure prediction is the method of extracting locally defined protein structures from the sequence of amino acids. It is a challenging and elucidating part of the field of bioinformatics. Several methods are attempting to meet these challenges. But the Artificial Neural Network (ANN) technique is turning out to be the most successful. In this work, an ANN based multi level classifier is designed for predicting secondary structure of the proteins. In this method ANNs are trained to make them capable of recognizing amino acids in a sequence following which from these amino acids secondary structures are derived. Then based on the majority of the secondary structure final structure is derived. This work shows the prediction of secondary structure of proteins employing ANNs though it is restricted initially to four structures only. Keywords: Artificial Neural network, Protein Structure Prediction. 1 Introduction Protein Secondary Structure Prediction (PSSP) is the most challenging and influencing area of research in the field of bioinformatics. It deals with the prediction and analysis of macromolecules i.e. DNA, RNA and protein. Proteins are the fundamental molecules of all organisms. They are unique chains of amino acids. They adopt unique three dimensional structures which allow them to carry out intricate biological functions. The specifications of each protein are given by Hemashree Bordoloi Deptt. of Electronics &Communication Technology Gauhati University, Guwahati e mail: hbmaini@gmail.com Kandarpa Kumar Sarma Deptt. of Electronics &Communication Technology Gauhati University, Guwahati e mail: kandarpaks@gmail.com S. Patnaik & Y.-M. Yang (Eds.): Soft Computing Techniques in Vision Sci., SCI 395, pp. 137 146. springerlink.com Springer-Verlag Berlin Heidelberg 2012

138 H. Bordoloi and K.K. Sarma the sequence of amino acids [1]. Proteins are the building blocks of all biological organisms. The basis of proteins is amino acids. There are 20 different amino acids which are made up of carbon, hydrogen, nitrogen, oxygen and sulphur. Basically proteins have four different structures Primary: Sequence of amino acids is the primary structure. Secondary: Locally defined highly regular sub structures. Alpha helix, beta sheets and coil are the three secondary structures of proteins. Tertiary: Three dimensional structure or spatial arrangement of secondary structure. Quaternary: Complex of several protein molecules or polypeptide chains. Fig. 1 Four protein Structures The general approach to predict the secondary structure of a protein is done by comparing the amino acid sequence of a particular protein to sequences of the known databases. In protein secondary structure prediction, amino acid sequences are inputs and the resulting output is conformation or the predicted structure which is the combination of alpha helices, beta sheets and loops. In this work an ANN is trained with Amino Acids to predict the protein sequence. After the prediction of the protein sequence, a second ANN classifier is configured to determine the secondary structures.

Protein Structure Prediction Using Multiple Artificial Neural Network Classifier 139 2 Basic Biological Concept Notion of homology is the basis of bioinformatics. In genomic bioinformatics function of a gene is predicted by the homology technique. It follows the rule that if the sequence of gene A, whose function is known, is homologous to the sequence of gene B, whose function is unknown, one could infer that B may share A's function. In the structural bioinformatics, homology is used to determine which parts of a protein are important in structure formation and interaction with other proteins. In homology modeling technique, this information is used to predict the structure of a protein once the structure of a homologous protein is known. Presently it remains the only way to predict protein structures reliably. By determining the structures of viral proteins it would enable researchers design drugs for specific viruses [2]. The secondary structure has 3 regular forms: helical (alpha (α) helices), extended (beta (β) sheets) and loops (also called reverse turns or coils). In the protein secondary structure prediction, the inputs are the amino acid sequences while the output is the predicted structure also called conformation, which is the combination of alpha helices, beta sheets and loops [2]. A typical protein sequence and its conformation class are shown below: ProteinSequence: ABABABABCCQQFFFAAAQQAQQA Conformation Class: HHHH EEEE HHHHHHHH where H means Helical, E means Extended, and blanks are the remaining coiled conformations. A typical protein contains about 32% alpha helices, 21% beta sheets and 47% loops or non-regular structure [2]. With a given protein sequence of amino acids a 1, a 2,...,a n the problem of secondary structure prediction is to predict whether each amino acid a i is in a α- helix, a β-sheet or neither. If the actual secondary structure for each amino acid is known by the studies of structural biology then the three state accuracy is the percent of residues for which the prediction matches reality. It is refers to as three state because each residue can be in one of three states: α, β or other (0) [3].Three state accuracy is given by Q 3 = [(P α +P β +P coil )/T] 100% where P α, P β, and P loop are number of residues predicted correctly in state alpha helix, beta strand and loop respectively and T is the total number of residues [2]. 3 Necessities of PSSP PSSP becomes the most influencing area of research due to the following: The basis of an organism that DNA, RNA and proteins, also called macromolecules can be predicted and analyzed by PSSP. Structure function relationship can also be provided by PSSP. It means that a particular protein structure is responsible for particular function which is

140 H. Bordoloi and K.K. Sarma to be known by PSSP. So by changing the structure of the proteins or by synthesizing new proteins, functions could be added or removed or desired functions could be obtained [5]. PSSP can determine the structure of the viral proteins which leads to the design of drugs for specific viruses. PSSP reduces the sequence structure gap [4]. One of the best example of the sequence structure gap is the large scale sequencing projects such as Human Genome Project. In such type of projects, protein sequences are produced at a very fast speed which results in a large gap between the number of known protein sequences (>150,000) and the no. of known protein structures (>4,000). This gap is called sequence structure gap and PSSP can successfully reduce this gap. Experimental techniques are not capable of structure determination of some proteins such as membrane proteins. So the prediction of protein structure using computational tool is of great interest [6]. 4 Basics of Artificial Neural Network An Artificial Neural Network (ANN) is a massively parallel distributed processor that has a natural propensity for storing experimental knowledge and making it available for use. They are made up of simple processing units called artificial neurons. It resembles the brain in two respects-- Knowledge is acquired by the network from its environment through a learning process. Interneuron connection strengths known as synaptic weights, are used to store the acquired knowledge[4]. Fig. 2 An Artificial Neural Network 5 Methodology This work consists of several steps as described below:

Protein Structure Prediction Using Multiple Artificial Neural Network Classifier 141 1. Data Set: In our work we have considered four proteins that are Hemoglobin, Sickle Cell Anemia and Myoglobin and Insulin. We have considered these four proteins because though their function is different, these three proteins are closely related to each other. 2. Coding of Proteins: Based on the chemical structure of the amino acids a coding scheme is generated. BCD codes are used for coding. There is a unique BCD code for each component or symbol in the chemical structure. Considering 20 amino acids, each amino acid is coded using the generated coding scheme. Then the four proteins i.e. Hemoglobin, Sickle Cell Anemia, Myoglobin and Insulin are coded with the help of coded amino acids. 3. Configuration of ANN: A fully connected Multi Layer Perceptron (MLP) feedforward neural network is used for our proposed work. Backpropagation algorithm is used to update the weight of the network. Network comprises of only one hidden layer. Three ANN classifiers are designed for our work. First network classifies the amino acids, the second network classifies the protein primary structure and the third network classifies the secondary structure of proteins. Table 1 Parameters of Proposed ANN ANN Data Set Size Training Type Maximum No.of Epochs Variance in Training Data MLP Training:1000 Testing:985 TRAINLM 2500 50% 4. Training and Testing of ANN: The network is trained with four coded protein structures. Then testing is done with the coded data to obtain the results. 6 System Model The system model as shown in Figure 3 comprises of three classifiers. The system is formed by two level classifiers. The first level classifier uses amino acids as inputs and provides the identification of the Amino Acids. These are applied to the second level classifiers which contain two MLPs. These classifiers receive identified Amino Acids for predicting alpha, beta and coils.

142 H. Bordoloi and K.K. Sarma Fig. 3 System Model for proposed work EPOCH TIME MSE % RATE 1233 31 secs 10-3 94% 1566 53 secs 10-4 95% 1750 65 secs 10-5 98% 2500 80 secs 10-6 100% 7 Results With the four protein structures the ANN during training shows 100% accuracy while validating the learning phase with the same set. The ANN is configured to handle an input of length 985 holding coded values of the protein structure. Four such samples representing the classes where initially taken during training. The ANN successfully recognizes the required parameters. The training is carried out with (error) back propagation algorithm with Levenberg-Marquardt optimization. The ANN is given a performance goal of around 10-6 which is attained after certain number of session though the time taken is around 80 seconds. With variations of upto ±20% in the training sequence the ANN handles the recognition without any variation in performance. It shows its robustness to variations after it is trained properly. The results also validate the coding scheme used for the work. Some problems, however, were observed due to the larger data sets which the ANN classifier was given to handle. It generates certain computational constrains. This shall be removed in subsequent stages of the work and extend it for the prediction of some unknown protein structure with subclasses. Further we are trying to go for GUI approach of our work.

Protein Structure Prediction Using Multiple Artificial Neural Network Classifier 143 Fig. 4 Performance graph for goal 0.01 Fig. 5 Performance graph for goal 0.001

144 H. Bordoloi and K.K. Sarma Fig. 6 Performance of Hemoglobin in percent Fig. 7 Performance of Insulin in percent Insulin

Protein Structure Prediction Using Multiple Artificial Neural Network Classifier 145 Fig. 8 Performance of Myoglobin in percent Fig. 9 Performance of Sickel Cell Anemia in percent 8 Conclusions This work shows how multi level ANN classifier can be configured for PSSP. The work may be extended to include more protein structures which shall make it a reliable set up for research in bioinformatics.

146 H. Bordoloi and K.K. Sarma References [1] Xiu-fen, Z., Zi-shu, P., Lishan, K., Chu-yu, Z.: The Evolutionary computation Techniques for Protein Structure Prediction A Survey. WU411 Wuhan University Journal of Natural Sciences 8(1B), Article ID: 1007-1202(2003)01Pr0297-06 (2003) [2] Akkaladevi, S., Katangur, A.K., Belkasim, S., Pan, Y.: GA 30303 Protein Secondary Structure Prediction using Neural Network and Simulated Annealing Algorithm. In: Proceedings of the 26th Annual International Conference of the IEEE EMBS, San Francisco, CA, USA, September 1-5 (2004) [3] Ghoting, A.: Protein Secondary Structure Prediction using Neural Networks [4] Haykin, S.: Neural Networks A Comprehensive Foundation, 2nd edn. Pearson Education, New Delhi (2003) [5] Kim, H., Park, H.: Protein secondary structure prediction based on an improved support vector machine approach. Protein Eng. 16(8), 553 560 (2003) [6] Afzal, A.: Applications of neural networks in protein structure prediction