Identification of Representative Protein Sequence and Secondary Structure Prediction Using SVM Approach
Prof. Dr. M. A. Mottalib, Md. Rahat Hossain
Department of Computer Science and Information Technology (CIT), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Dhaka, Bangladesh

Abstract
In recent years, representative sets of protein sequences drawn from the protein data bank (e.g. RS126, CB513) have been used to predict the secondary structure of proteins. Secondary structure describes the 3D structure of a protein, and the 3D structure determines the protein's function, so accurate structure prediction supports the discovery of new drugs. This paper proposes a new method for identifying representative protein sequences and uses an SVM (Support Vector Machine) to predict protein secondary structure more accurately. The CATH database serves as the protein data bank. Its protein sequences are divided into 1110 super-families, and the proposed method identifies 1110 representatives from these super-families using a partitioning algorithm.

Keywords: Protein, Secondary Structure, Representative, CATH database, SVM (Support Vector Machine)

1 Introduction
Predicting the secondary structure of proteins is very important for discovering new drugs. The secondary structure of a protein represents its 3D structure, which describes the characteristics of the protein. Based on these characteristics, proteins are used to discover new drugs, so it is important to know the secondary structure of an unknown protein. At present there is a severe information asymmetry between the researchers who sequence organism genomes and those who elucidate the 3D structure of biomolecules. On one hand, more and more genomes are being sequenced; on the other hand, protein structure information accumulates very slowly in the Protein Data Bank (PDB) (Berman et al., 2002).
This is because structure determination for the PDB relies heavily on experimental methods such as X-ray crystallography and NMR. These methods are accurate but expensive and time-consuming: determining the structure of a single unknown protein sequence takes a long time, so it is impractical to apply them to all sequences. Protein structure prediction by homology modeling or computer simulation is therefore emerging as an alternative or complementary approach, using a small representative subset of the sequences in a protein database.

The proteins in the CATH v3.0.0 database are divided into Classes and Architectures, and each Architecture contains topologies. Proteins that share a similar 3D structure are grouped into a single topology. The whole protein world is divided into 1110 such topologies, so it is easier to sample and work with these topologies than with the individual proteins [1].

This research proposes a method for finding a list of proteins that represent each of these topologies, namely a representative of each topology. To find a representative, the ideal characteristics of the topology, those that represent all the proteins under it, are identified first. Using those ideal characteristics, a model is built and matched against all the proteins to find the ones that hold the most ideal characteristics; these are considered candidates. The final representative is selected from the candidate list. Our algorithm carries out the whole process.

2 Existing Approaches
The Rost & Sander dataset [2], which has been used to train several early secondary structure prediction methods such as PHD [2] and PREDATOR [3], contains 126 proteins that share a pairwise sequence identity below 25%. However, as shown by Cuff and Barton [4], it contains some clear homologues when homology is not measured by simple sequence identity alone.
Based on their studies, Cuff and Barton proposed a set of 513 protein chains in 1999 [4] (referred to below as the CB513 set). This set contains the 117 non-homologous chains of the Rost and Sander set (RS117) as well as 396 additional protein chains (CB396), carefully selected to exclude homologues as far as possible. The structural resolution of all proteins in the set is 2.5 Å or better. The residues in the set have a helix content of 34.5%, a sheet content of 22.7%, and a coil content of 42.8%. For our studies we use the CB513 protein set for initial training and testing of our prediction method and for optimizing several parameters. To that end, the CB513 set is split into the CB396 set, used for training, and the RS117 set, used for testing.

Since the proteins in the CB513 set do not provide a comprehensive sampling of the fold space known today, our final prediction method is trained on a larger set of proteins based on the SCOP (Structural Classification of Proteins) database [5]. Following a method for compiling a protein dataset described by Cuff and Barton [4], we retrieved one representative protein chain for each of the 1290 SCOP superfamilies in release 1.65 (December 2003) from the ASTRAL database [6]. From this list, all superfamilies containing only members whose structure was resolved by NMR or with a resolution worse than 2.5 Å, as well as superfamilies belonging to the classes for transmembrane and multi-domain proteins (classes E and F), were removed. The result is a final dataset of 940 protein chains that form a representative subset of the fold space known today. The final database, named SCOP-SFR, contains 219 all-alpha, 202 all-beta, 190 α/β, 281 α+β and 48 small proteins, with a helix content of 36.79%, a sheet content of 22.78% and a coil content of 40.42%.

A fair comparison of our approach with other methods is difficult, since for most methods it is not published which proteins were used to train the method in its current state. Only a blind test therefore provides a fair setup for comparing our method against others. This need is met by the EVA server [7], which provides a set of recently published proteins together with blind-test secondary structure predictions of those proteins made by important prediction servers such as PSI-PRED and PHD. Since SCOP version 1.65 was released in December 2003, all proteins added to the EVA server after that time (01 January to November 2004) are blind test examples for our prediction method, which is trained only on proteins available in the SCOP 1.65 release. This results in a test set of 105 proteins (named EVA105), with 9837 residues in total, for which predictions of PSI-PRED, PHD and PROF-SEC are available.

3 Method and Data Preparation
Any learning method needs an authentic reference dataset. In this paper all data are collected from the CATH v3.0.0 database, in which 86,151 domains are divided into 1110 topologies. The CATH database is a hierarchical domain classification of protein structures in the Protein Data Bank [1]. It has four major levels of hierarchy: Class, Architecture, Topology (fold family) and Homologous superfamily [8].
To generate the training dataset for the SVM classifier, protein sequences must be collected so that each topology is represented by a sequence. The standard method for predicting secondary structure has three major parts:
1. Generating datasets, or representatives, of primary sequences from the databank.
2. The sequence-to-structure layer, including the detection of frequent amino acid patterns, the feature representation of those patterns, the selection of interesting features, and the use of those features to predict the secondary structure of a protein with unknown structure. The sequence-to-structure prediction method developed in this thesis employs Support Vector Machines in a similar way to Liu et al. [15] and Ward et al. [16].
3. The features and methods employed in the structure-to-structure layer.

3.1 Representative Protein Extraction
Identifying the protein sequences that hold, or represent, the characteristics of a given topology is the first step of the analysis. For this, a model is developed and matched against all the proteins to find the ones that hold the most ideal characteristics; these are considered candidates, and the final representatives are selected from the candidate list. The domains are divided into 1110 topologies. Each topology contains a certain number of proteins; let that number be N. Each protein sequence of length L is divided into three parts of equal or almost equal length, producing three segments Ψ1, Ψ2, Ψ3 with lengths L1, L2 and L3 respectively, where L = L1 + L2 + L3. Let R = L mod 3; then L1 = L2 = L3 - R, that is, the third segment absorbs the remainder.

Total Primary Protein Sequences in CATH database :
Total Topologies in CATH database : 1110
So the target is to find 1110 representatives from the primary protein sequences.

Total Frequency Matrix Generation (for each topology)
1. In each topology:
Topology: Helicase, Ruva Protein; domain 3
Total Primary Protein Sequences in this Topology : 370
i. Divide each primary sequence into 3 equal-length parts:
Primary Protein Sequence : PAPTPSSSPVPTLSPEQQEMLQAFSTQSGMNLEWSQKCLQDNNWDYTRSAQAFTHLKAKGEIPEVAFMK
Length : 69
Length of each part : 69/3 = 23
If the length is not divisible by 3 (for example, 14), then 14/3 = 4 with remainder 2, giving Part 1: 4, Part 2: 4, Part 3: 4 + remainder (2) = 6.
So the example sequence (length 69) is split into:
Part 1 (23): PAPTPSSSPVPTLSPEQQEMLQA (Ψ1)
Part 2 (23): FSTQSGMNLEWSQKCLQDNNWDY (Ψ2)
Part 3 (23): TRSAQAFTHLKAKGEIPEVAFMK (Ψ3)
ii. For each part, count the number of appearances of each of the 20 amino acids.
Part 1 : PAPTPSSSPVPTLSPEQQEMLQA

Table 1 Number of Appearances of 20 Amino Acids
No.    Amino acid  Count    No.     Amino acid  Count
i.     G           0        xi.     S           4
ii.    A           2        xii.    T           2
iii.   P           6        xiii.   C           0
iv.    V           1        xiv.    N           0
v.     L           2        xv.     Q           3
vi.    I           0        xvi.    K           0
vii.   M           1        xvii.   H           0
viii.  F           0        xviii.  R           0
ix.    Y           0        xix.    D           0
x.     W           0        xx.     E           2
Calculate this for the other two parts as well.
iii. Store the number of appearances of all 20 amino acids in a binary matrix. The binary characteristics matrix B for a segment (Ψ1, Ψ2 or Ψ3) of a particular sequence has rows indexed by i, the number of occurrences of an amino acid in that segment, and columns indexed by j, the 20 amino acids. B is defined as

$$B_{ij} = \begin{cases} 1 & \text{if the } j\text{th amino acid occurs } i \text{ times} \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

Thus there are three such matrices for a particular sequence. Combining the B matrices of each segment (Ψ1, Ψ2, Ψ3) separately over all sequences within a topology gives the total frequency matrices. There are three total frequency matrices T for the entire topology, with the same dimensions as B, defined as

$$T_{ij} = \sum_{k=1}^{N} B^{(k)}_{ij} \qquad (2)$$

where $B^{(k)}$ is the B matrix of the kth of the N sequences in the topology. A weight matrix W, with the same dimensions as T, is then generated for each of the three T matrices. It is defined as

$$W_{ij} = \frac{T_{ij}}{N} \times 100 \qquad (3)$$

Using equations (2) and (3), the total characteristics matrix TC is generated for each segment (Ψ1, Ψ2, Ψ3). The TC matrix holds all information about the characteristics of the topology. There are therefore three TC matrices for the entire topology, with the same dimensions as T, defined as

$$TC_{ij} = \begin{cases} T_{ij}\, W_{ij} & \text{for } i = 0 \\ T_{ij}\, W_{ij}\, i & \text{otherwise} \end{cases} \qquad (4)$$

where i is the number of occurrences. From these three TC matrices, three binary ideal characteristics matrices IC are generated for the topology; they represent its ideal, or representative, characteristics. The IC matrix is defined as

$$IC_{ij} = \begin{cases} 1 & \text{if } TC_{ij} \text{ holds the highest value of column } j \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

Finally, these IC matrices are used to select candidate proteins for the representative of the topology. Using the strict or loose matching technique, each of the three segments of every protein sequence is matched against its corresponding IC matrix.
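The segmentation and the matrices of equations (1) to (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, every matrix is given one more row than the longest segment (so that any count from 0 upward has a row), and a tie within a column of TC is broken in favour of the lowest occurrence count.

```python
AMINO_ACIDS = "GAPVLIMFYWSTCNQKHRDE"  # column order used in Table 1


def split_into_segments(sequence):
    """Split a sequence into Psi1, Psi2, Psi3; Psi3 absorbs R = L mod 3."""
    base = len(sequence) // 3
    return sequence[:base], sequence[base:2 * base], sequence[2 * base:]


def binary_matrix(segment, n_rows):
    """Equation (1): B[i][j] = 1 if amino acid j occurs exactly i times."""
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(n_rows)]
    for j, amino_acid in enumerate(AMINO_ACIDS):
        matrix[segment.count(amino_acid)][j] = 1
    return matrix


def ideal_characteristics(segments):
    """Build the IC matrix for one segment position of one topology.

    `segments` holds the same segment (e.g. Psi1) of all N sequences.
    """
    n = len(segments)
    n_rows = max(len(s) for s in segments) + 1  # assumption: rows 0..max length
    n_cols = len(AMINO_ACIDS)

    # Equation (2): T is the element-wise sum of all B matrices.
    total = [[0] * n_cols for _ in range(n_rows)]
    for segment in segments:
        b = binary_matrix(segment, n_rows)
        for i in range(n_rows):
            for j in range(n_cols):
                total[i][j] += b[i][j]

    # Equations (3) and (4): W = T / N * 100, TC = T * W (times i when i > 0).
    tc = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            weight = total[i][j] / n * 100
            tc[i][j] = total[i][j] * weight * (i if i > 0 else 1)

    # Equation (5): mark the row holding the highest TC value in each column.
    ideal = [[0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        best_row = max(range(n_rows), key=lambda i: tc[i][j])  # first max on ties
        ideal[best_row][j] = 1
    return ideal


example = "PAPTPSSSPVPTLSPEQQEMLQAFSTQSGMNLEWSQKCLQDNNWDYTRSAQAFTHLKAKGEIPEVAFMK"
psi1, psi2, psi3 = split_into_segments(example)
print(psi1, psi1.count("P"))  # P occurs 6 times in Part 1, as in Table 1
```

In a full run, `ideal_characteristics` would be called three times per topology, once for the Ψ1 segments of all its sequences, once for Ψ2 and once for Ψ3.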
The sequence that best matches the IC matrices is selected as a candidate representative of that topology. In the strict matching technique, the binary characteristics matrix B of each sequence is compared with the IC matrix to check whether they are identical, i.e. whether the mismatch error is 0. A mismatch counter is calculated, and the mismatch threshold is set to a maximum of six. Calculating the mismatch error for a protein sequence with strict matching requires comparing the B matrix with the IC matrix for each segment (e.g. the B matrix of Ψ1 with the IC matrix for Ψ1, and so on):
(6)
The mismatch for Ψ1 is then calculated by summing the values over all 20 amino acid columns j of the Ψ1 segment. The mismatches for Ψ2 and Ψ3 are calculated similarly.
(7)
The total mismatch error for a protein sequence is calculated across all three segments:
(8)
If only one sequence attains the minimum error, with min <= 6, it is taken as the representative of that topology. If more than one sequence shares the same minimum, a candidate list is generated, and one sequence from that list is selected as the representative. If no sequence falls within the threshold, the loose matching approach is followed.

In the loose matching approach the threshold is raised to sixty (3 matrices x 20 amino acids = 60). Calculating the mismatch error for a protein sequence with loose matching again requires comparing the B matrix with the IC matrix for each segment (e.g. the B matrix of Ψ1 with the IC matrix for Ψ1, and so on):
(9)
The mismatch for Ψ1 is then calculated by summing the values over all 20 amino acid columns j of the Ψ1 segment. The mismatches for Ψ2 and Ψ3 are calculated similarly.
(10)
The total mismatch error for a protein sequence is calculated across all three segments:
(11)
If only one sequence attains the minimum error, it is taken as the representative of that topology. If more than one sequence shares the same minimum, a candidate list is generated.
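The matching step can be illustrated with a short Python sketch. Because equations (6) to (11) did not survive in this transcription, the mismatch measure used here is an assumption: a segment scores one mismatch for every amino acid whose count differs from the ideal count marked in the IC matrix. This choice is consistent with the stated upper bound of 3 x 20 = 60, but it may differ from the authors' formula. The function names and the dictionary representation of the ideal counts are hypothetical.

```python
AMINO_ACIDS = "GAPVLIMFYWSTCNQKHRDE"
STRICT_THRESHOLD = 6   # strict matching limit stated in the text
LOOSE_THRESHOLD = 60   # loose matching limit: 3 matrices x 20 amino acids


def segment_mismatch(segment, ideal_counts):
    """Assumed measure: number of amino acids whose count is not the ideal one.

    `ideal_counts` maps an amino acid to the row index that holds the 1 in
    its IC column; amino acids that are absent from the dict have ideal count 0.
    """
    return sum(
        1 for amino_acid in AMINO_ACIDS
        if segment.count(amino_acid) != ideal_counts.get(amino_acid, 0)
    )


def total_mismatch(segments, ideals):
    """Sum the mismatch over the three segments of one protein."""
    return sum(segment_mismatch(s, ic) for s, ic in zip(segments, ideals))


def select_candidates(proteins, ideals):
    """Return the matching mode and the sequences sharing the minimum error.

    `proteins` maps a protein name to its (Psi1, Psi2, Psi3) segments.
    """
    errors = {name: total_mismatch(segs, ideals) for name, segs in proteins.items()}
    best = min(errors.values())
    if best <= STRICT_THRESHOLD:
        mode = "strict"
    elif best <= LOOSE_THRESHOLD:
        mode = "loose"
    else:
        return "none", []
    return mode, [name for name, error in errors.items() if error == best]


# Toy data: ideal counts for each of the three segment positions.
ideals = [{"A": 2}, {"G": 1}, {}]
proteins = {"p1": ("AA", "G", ""), "p2": ("AC", "G", "K")}
print(select_candidates(proteins, ideals))
```

When the returned list holds more than one name, it plays the role of the candidate list that the text goes on to refine.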
This candidate list is then refined. Refinement is an iterative process that repeats the whole procedure so far (calculating the IC matrices of a topology, then strict or loose matching) on the candidate list:
1. Let N be the number of proteins in the candidate list.
2. Repeat the whole process on the N proteins in the candidate list.
3. If the number of candidates decreases in an iteration, continue refining.
4. If the number of candidates does not decrease for 2 consecutive iterations, take the list as the final candidate list.
If the final candidate list contains only one sequence, it is the representative of that topology; if it contains more than one, one of them is selected as the representative.

3.2 Feature Extraction
According to the CATH database, the 1110 topologies fall into 4 divisions based on their structural type: Mainly Alpha (α), Mainly Beta (β), Alpha Beta (α-β) and Few Secondary Structures (fss). Because the representative sequences selected in the previous stage (Representative Protein Extraction) were originally collected from the CATH database, their structural information can also be obtained from the database itself, which helps in preparing the model for the SVM. All possible combinations of four amino acids drawn from the 20 amino acids are constructed, giving a total of 160,000 different quadramers (20*20*20*20). All representatives of a particular structural type form the Current list, while the representatives of the other 3 structural types are stored in the Other list. The frequency of each quadramer in both lists is then calculated. Let fc_{i,j} be the frequency of the ith quadramer in the jth candidate sequence from the Current list and fo_{i,j} be the frequency of the ith quadramer in the jth candidate sequence from the Other list. A difference vector Diff is computed from the absolute difference between the numbers of occurrences of each of the 160,000 possible quadramers in the Current and Other lists:

$$\mathrm{Diff}_i = \left| \sum_j fc_{i,j} - \sum_j fo_{i,j} \right|$$

where Diff_i is the absolute difference in the occurrence of the ith quadramer. Diff is sorted in descending order, and the first 4000 quadramers, those with the highest values, are selected as the feature set for SVM training. To normalize the values of these features, the minimum and maximum frequency of each of the top 4000 quadramers in both the Current and Other lists are taken into account, so that a normalization parameter norm_i can be obtained according to equation (12).

3.3 SVM Training
SVM, a supervised machine-learning technique, has been used for computational biology problems because it efficiently handles the computationally expensive and noisy data that occur very frequently in biology. It can also solve multi-class classification problems using the structural risk minimization principle. Given a training set in a vector space, SVM finds the best decision hyperplane separating two classes. The quality of the decision hyperplane depends on the margin between the two hyperplanes defined by the SVM [13, 14]. For the SVM training of the proposed method, Libsvm (version 2.84) is used [9]. Libsvm is free, simple, easy-to-use and efficient software for SVM classification and regression.
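Returning to the feature selection of Section 3.2, the quadramer ranking can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: quadramers are counted with overlapping windows, frequencies are summed over all sequences in a list before the absolute difference is taken, quadramers that appear in neither list (difference 0) are simply omitted, and ties are broken alphabetically. The function names are hypothetical.

```python
from collections import Counter


def quadramer_counts(sequences, k=4):
    """Count every overlapping k-mer (k = 4 for quadramers) across a list."""
    counts = Counter()
    for sequence in sequences:
        for start in range(len(sequence) - k + 1):
            counts[sequence[start:start + k]] += 1
    return counts


def select_features(current, other, n_features=4000):
    """Rank quadramers by |count in Current - count in Other|, keep the top ones."""
    fc = quadramer_counts(current)
    fo = quadramer_counts(other)
    # A Counter returns 0 for a missing key, so one-sided quadramers work too.
    diff = {q: abs(fc[q] - fo[q]) for q in set(fc) | set(fo)}
    ranked = sorted(diff, key=lambda q: (-diff[q], q))
    return ranked[:n_features]


# Toy lists standing in for the Current and Other representative lists.
current_list = ["ACDEACDE"]
other_list = ["ACDF"]
print(select_features(current_list, other_list, n_features=2))
```

The selected quadramers would then serve as the feature columns of the SVM training vectors, with each value scaled by the normalization parameter mentioned in the text.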
Libsvm is also a fast and memory-efficient implementation of an SVM. Training uses the 1110 representatives from the CATH database selected through the representative selection procedure discussed above. These representatives are classified into four secondary classes. Training and testing of the proposed method are done in two ways: one-against-one classification and one-against-others classification, or multi-class prediction [10].

Table 2 The training data set built by taking the representative protein sequences from each T-level fold family of the 1110 total topology families
Classes    Number of fold families    Range of chain length
α
β
α-β
fss

Result and Analysis
In one-against-one classification, a classifier is trained on data from two classes [11]. A binary classifier is constructed that maps examples of one class to +1 and examples of the other to -1. The prediction accuracy of each one-against-one classification using the proposed method is shown in Table 3. SVM parameters S=1 and T=2 are used for this test. The prediction accuracy of each one-against-one classification using the best-known existing SVM method [12] is also shown in Table 3.

Table 3 Prediction accuracy of one-against-one classification of the proposed method vs the existing method
Classifiers         Accuracy (%), Proposed Method    Accuracy (%), Existing Method [12]
α vs. β
α vs. α-β
α vs. fss
β vs. α-β
β vs. fss
α-β vs. fss
Average Accuracy

Table 4 The binary classification accuracies of class folds (one-against-others) of the proposed method vs the existing method
Classifier                                  Accuracies of dipeptide frequency (%), Proposed Method    Existing Method [12]
α vs. other (including β, α-β and fss)
β vs. other (including α, α-β and fss)
α-β vs. other (including α, β and fss)
fss vs. other (including α, β and α-β)
Average accuracies

Four one-against-others classifications are used in this proposal, and the prediction accuracy is promising for most of them. The results are presented in Table 4. Two pairs of optimized parameters are used for this test: (S=1, T=2) for α vs. other and α-β vs. other, and (S=0, T=2) for β vs. other and fss vs. other. From these two pairs the best accuracy for the multi-class is taken. The classifier fss vs. others gives the highest prediction accuracy, about 92%, and β vs. others also gives good accuracy, in the range of 82% to 83%, for the parameters (S=0, T=2). The classifier α vs. others gives only about 84% accuracy, and the classifier α-β vs. other gives around 88% accuracy for the parameters (S=0, T=2). Table 4 shows remarkable results for all four classes of protein. In addition, 150 random protein sequences were collected and tested using the proposed model, which is generated
during the SVM training process. Each class was tested against the proposed model of that respective class. Table 5 shows the test results for the 150 random sequences using the proposed model.

Table 5 Test result for 150 random sequences using proposed model
Classifier    Correctly Identified (Out of 150)    Accuracy (%)
Mainly α
Mainly β
α-β
Fss
Average Accuracy

Future Development
So far, for the characteristics extraction of each topology, we have divided each protein sequence into three equal parts. To improve the results, each protein sequence could instead be divided into three variable-length parts. For building the candidate list for each topology and choosing the representative from it, a more complex decision model could be created. For the feature extraction of each structural division, we created and compared against a list of all possible amino acid sequences of length four. Using length five may improve the quality of the results.

6 Conclusion
The results presented in this paper are the representatives of the topologies. From the result analysis, these 1110 topologies are found to represent the whole protein world more accurately. The representatives can be used as a benchmark like RS126 and CB513, so that each structure prediction method can be trained on them and the results compared with each other. For the CATH database using the SVM approach, no result or data was found regarding the prediction of random protein sequences, so it is not possible to compare the proposed prediction results with existing ones. So far it is found that, compared with other prediction methods, the proposed method will be able to generate the secondary structure of proteins accurately at a satisfactory level (above the existing accuracy of about 60 to 70%). In addition, the computation time will be reduced.

References
[1] Web URL:
[2] Rost B. and Sander C. (1993) Prediction of Protein Secondary Structure at Better than 70% Accuracy. J. Mol. Biol., 232.
[3] Frishman D. and Argos P. (1996) Incorporation of nonlocal interactions in protein secondary structure prediction from the amino acid sequence. Protein Engineering, 2.
[4] Cuff J. A. and Barton G. J. (1999) Evaluation and Improvement of Multiple Sequence Methods for Protein Secondary Structure Prediction. Proteins, 34.
[5] Murzin A. G., Brenner S. E., Hubbard T. J. P. and Chothia C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247.
[6] Brenner S. E., Koehl P. and Levitt M. (2000) The ASTRAL Compendium for Protein Structure and Sequence Analysis. Nucleic Acids Res., 28.
[7] Frishman D. and Argos P. (1995) Knowledge-based protein secondary structure assignment. Proteins, 23.
[8] Orengo C. A., Michie A. D., Jones S., Jones D. T., Swindells M. B. and Thornton J. M. (1997) CATH: A Hierarchic Classification of Protein Domain Structures. Structure, 5.
[9] Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines. Software available at
[10] Minh N. Nguyen and Jagath C. Rajapakse (2003) Prediction of Protein Secondary Structure with two-stage multi-class SVMs. Genome Informatics, 14.
[11] Kreßel U. (1999) Pairwise classification and support vector machines. In Advances in Kernel Methods: Support Vector Learning, Cambridge, MA: MIT Press.
[12] X.-D. Sun and R.-B. Huang (2006) Prediction of protein structural classes using support vector machines. Amino Acids, Volume 30, Number 4, June 2006.
[13] Rost B. and Sander C. (1993) Prediction of Protein Secondary Structure at Better than 70% Accuracy. J. Mol. Biol., 232.
[14] Frishman D. and Argos P. (1996) Incorporation of nonlocal interactions in protein secondary structure prediction from the amino acid sequence. Protein Engineering, 2.
[15] Liu Y., Carbonell J., Klein-Seetharaman J. and Gopalakrishnan V. (2004) Context Sensitive Vocabulary And its Application in Protein Secondary Structure Prediction. SIGIR 04, Sheffield, South Yorkshire, UK.
[16] Ward J. J., McGuffin L. J., Buxton B. F. and Jones D. T. (2003) Secondary structure prediction with support vector machines. Bioinformatics, 13.
More informationImproving Protein 3D Structure Prediction Accuracy using Dense Regions Areas of Secondary Structures in the Contact Map
American Journal of Biochemistry and Biotechnology 4 (4): 375-384, 8 ISSN 553-3468 8 Science Publications Improving Protein 3D Structure Prediction Accuracy using Dense Regions Areas of Secondary Structures
More informationProtein structure similarity based on multi-view images generated from 3D molecular visualization
Protein structure similarity based on multi-view images generated from 3D molecular visualization Chendra Hadi Suryanto, Shukun Jiang, Kazuhiro Fukui Graduate School of Systems and Information Engineering,
More informationALL LECTURES IN SB Introduction
1. Introduction 2. Molecular Architecture I 3. Molecular Architecture II 4. Molecular Simulation I 5. Molecular Simulation II 6. Bioinformatics I 7. Bioinformatics II 8. Prediction I 9. Prediction II ALL
More informationJessica Wehner. Summer Fellow Bioengineering and Bioinformatics Summer Institute University of Pittsburgh 29 May 2008
Journal Club Jessica Wehner Summer Fellow Bioengineering and Bioinformatics Summer Institute University of Pittsburgh 29 May 2008 Comparison of Probabilistic Combination Methods for Protein Secondary Structure
More information1-D Predictions. Prediction of local features: Secondary structure & surface exposure
1-D Predictions Prediction of local features: Secondary structure & surface exposure 1 Learning Objectives After today s session you should be able to: Explain the meaning and usage of the following local
More informationImproving Protein Secondary-Structure Prediction by Predicting Ends of Secondary-Structure Segments
Improving Protein Secondary-Structure Prediction by Predicting Ends of Secondary-Structure Segments Uros Midic 1 A. Keith Dunker 2 Zoran Obradovic 1* 1 Center for Information Science and Technology Temple
More informationHomology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB
Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded
More informationBetter Bond Angles in the Protein Data Bank
Better Bond Angles in the Protein Data Bank C.J. Robinson and D.B. Skillicorn School of Computing Queen s University {robinson,skill}@cs.queensu.ca Abstract The Protein Data Bank (PDB) contains, at least
More informationGenome Databases The CATH database
Genome Databases The CATH database Michael Knudsen 1 and Carsten Wiuf 1,2* 1 Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark 2 Centre for Membrane Pumps in Cells and Disease
More informationTMSEG Michael Bernhofer, Jonas Reeb pp1_tmseg
title: short title: TMSEG Michael Bernhofer, Jonas Reeb pp1_tmseg lecture: Protein Prediction 1 (for Computational Biology) Protein structure TUM summer semester 09.06.2016 1 Last time 2 3 Yet another
More informationPrediction of double gene knockout measurements
Prediction of double gene knockout measurements Sofia Kyriazopoulou-Panagiotopoulou sofiakp@stanford.edu December 12, 2008 Abstract One way to get an insight into the potential interaction between a pair
More informationPROTEIN SECONDARY STRUCTURE PREDICTION USING NEURAL NETWORKS AND SUPPORT VECTOR MACHINES
PROTEIN SECONDARY STRUCTURE PREDICTION USING NEURAL NETWORKS AND SUPPORT VECTOR MACHINES by Lipontseng Cecilia Tsilo A thesis submitted to Rhodes University in partial fulfillment of the requirements for
More informationTwo-Stage Multi-Class Support Vector Machines to Protein Secondary Structure Prediction. M.N. Nguyen and J.C. Rajapakse
Two-Stage Multi-Class Support Vector Machines to Protein Secondary Structure Prediction M.N. Nguyen and J.C. Rajapakse Pacific Symposium on Biocomputing 10:346-357(2005) TWO-STAGE MULTI-CLASS SUPPORT VECTOR
More informationAnalysis on sliding helices and strands in protein structural comparisons: A case study with protein kinases
Sliding helices and strands in structural comparisons 921 Analysis on sliding helices and strands in protein structural comparisons: A case study with protein kinases V S GOWRI, K ANAMIKA, S GORE 1 and
More informationProtein Secondary Structure Prediction using Feed-Forward Neural Network
COPYRIGHT 2010 JCIT, ISSN 2078-5828 (PRINT), ISSN 2218-5224 (ONLINE), VOLUME 01, ISSUE 01, MANUSCRIPT CODE: 100713 Protein Secondary Structure Prediction using Feed-Forward Neural Network M. A. Mottalib,
More informationHomology Modeling. Roberto Lins EPFL - summer semester 2005
Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,
More informationThe Homology Kernel: A Biologically Motivated Sequence Embedding into Euclidean Space
The Homology Kernel: A Biologically Motivated Sequence Embedding into Euclidean Space Eleazar Eskin Department of Computer Science Engineering University of California, San Diego eeskin@cs.ucsd.edu Sagi
More informationProtein tertiary structure prediction with new machine learning approaches
Protein tertiary structure prediction with new machine learning approaches Rui Kuang Department of Computer Science Columbia University Supervisor: Jason Weston(NEC) and Christina Leslie(Columbia) NEC
More informationProtein structure analysis. Risto Laakso 10th January 2005
Protein structure analysis Risto Laakso risto.laakso@hut.fi 10th January 2005 1 1 Summary Various methods of protein structure analysis were examined. Two proteins, 1HLB (Sea cucumber hemoglobin) and 1HLM
More informationSVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition Iain Melvin 1,2, Eugene Ie 3, Rui Kuang 4, Jason Weston 1, William Stafford Noble 5, Christina Leslie 2,6 1 NEC
More informationfrmsdalign: Protein Sequence Alignment Using Predicted Local Structure Information for Pairs with Low Sequence Identity
1 frmsdalign: Protein Sequence Alignment Using Predicted Local Structure Information for Pairs with Low Sequence Identity HUZEFA RANGWALA and GEORGE KARYPIS Department of Computer Science and Engineering
More informationProtein Science (1997), 6: Cambridge University Press. Printed in the USA. Copyright 1997 The Protein Society
1 of 5 1/30/00 8:08 PM Protein Science (1997), 6: 246-248. Cambridge University Press. Printed in the USA. Copyright 1997 The Protein Society FOR THE RECORD LPFC: An Internet library of protein family
More informationPROTEIN FOLD RECOGNITION USING THE GRADIENT BOOST ALGORITHM
43 1 PROTEIN FOLD RECOGNITION USING THE GRADIENT BOOST ALGORITHM Feng Jiao School of Computer Science, University of Waterloo, Canada fjiao@cs.uwaterloo.ca Jinbo Xu Toyota Technological Institute at Chicago,
More informationA Machine Text-Inspired Machine Learning Approach for Identification of Transmembrane Helix Boundaries
A Machine Text-Inspired Machine Learning Approach for Identification of Transmembrane Helix Boundaries Betty Yee Man Cheng 1, Jaime G. Carbonell 1, and Judith Klein-Seetharaman 1, 2 1 Language Technologies
More informationSupporting Online Material for
www.sciencemag.org/cgi/content/full/309/5742/1868/dc1 Supporting Online Material for Toward High-Resolution de Novo Structure Prediction for Small Proteins Philip Bradley, Kira M. S. Misura, David Baker*
More information#33 - Genomics 11/09/07
BCB 444/544 Required Reading (before lecture) Lecture 33 Mon Nov 5 - Lecture 31 Phylogenetics Parsimony and ML Chp 11 - pp 142 169 Genomics Wed Nov 7 - Lecture 32 Machine Learning Fri Nov 9 - Lecture 33
More informationProtein Fold Recognition Using Gradient Boost Algorithm
Protein Fold Recognition Using Gradient Boost Algorithm Feng Jiao 1, Jinbo Xu 2, Libo Yu 3 and Dale Schuurmans 4 1 School of Computer Science, University of Waterloo, Canada fjiao@cs.uwaterloo.ca 2 Toyota
More informationSupport Vector Machines
Support Vector Machines INFO-4604, Applied Machine Learning University of Colorado Boulder September 28, 2017 Prof. Michael Paul Today Two important concepts: Margins Kernels Large Margin Classification
More informationInterpolation and Polynomial Approximation I
Interpolation and Polynomial Approximation I If f (n) (x), n are available, Taylor polynomial is an approximation: f (x) = f (x 0 )+f (x 0 )(x x 0 )+ 1 2! f (x 0 )(x x 0 ) 2 + Example: e x = 1 + x 1! +
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationProtein structure alignments
Protein structure alignments Proteins that fold in the same way, i.e. have the same fold are often homologs. Structure evolves slower than sequence Sequence is less conserved than structure If BLAST gives
More informationDecember 2, :4 WSPC/INSTRUCTION FILE jbcb-profile-kernel. Profile-based string kernels for remote homology detection and motif extraction
Journal of Bioinformatics and Computational Biology c Imperial College Press Profile-based string kernels for remote homology detection and motif extraction Rui Kuang 1, Eugene Ie 1,3, Ke Wang 1, Kai Wang
More informationPredicting the Probability of Correct Classification
Predicting the Probability of Correct Classification Gregory Z. Grudic Department of Computer Science University of Colorado, Boulder grudic@cs.colorado.edu Abstract We propose a formulation for binary
More informationSCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like
SCOP all-β class 4-helical cytokines T4 endonuclease V all-α class, 3 different folds Globin-like TIM-barrel fold α/β class Profilin-like fold α+β class http://scop.mrc-lmb.cam.ac.uk/scop CATH Class, Architecture,
More informationProtein Structure Prediction and Display
Protein Structure Prediction and Display Goal Take primary structure (sequence) and, using rules derived from known structures, predict the secondary structure that is most likely to be adopted by each
More informationPROTEIN FUNCTION PREDICTION WITH AMINO ACID SEQUENCE AND SECONDARY STRUCTURE ALIGNMENT SCORES
PROTEIN FUNCTION PREDICTION WITH AMINO ACID SEQUENCE AND SECONDARY STRUCTURE ALIGNMENT SCORES Eser Aygün 1, Caner Kömürlü 2, Zafer Aydin 3 and Zehra Çataltepe 1 1 Computer Engineering Department and 2
More informationINDEXING METHODS FOR PROTEIN TERTIARY AND PREDICTED STRUCTURES
INDEXING METHODS FOR PROTEIN TERTIARY AND PREDICTED STRUCTURES By Feng Gao A Thesis Submitted to the Graduate Faculty of Rensselaer Polytechnic Institute in Partial Fulfillment of the Requirements for
More informationBioinformatics. Macromolecular structure
Bioinformatics Macromolecular structure Contents Determination of protein structure Structure databases Secondary structure elements (SSE) Tertiary structure Structure analysis Structure alignment Domain
More information2 Dean C. Adams and Gavin J. P. Naylor the best three-dimensional ordination of the structure space is found through an eigen-decomposition (correspon
A Comparison of Methods for Assessing the Structural Similarity of Proteins Dean C. Adams and Gavin J. P. Naylor? Dept. Zoology and Genetics, Iowa State University, Ames, IA 50011, U.S.A. 1 Introduction
More informationModel Accuracy Measures
Model Accuracy Measures Master in Bioinformatics UPF 2017-2018 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Variables What we can measure (attributes) Hypotheses
More informationIMPORTANCE OF SECONDARY STRUCTURE ELEMENTS FOR PREDICTION OF GO ANNOTATIONS
IMPORTANCE OF SECONDARY STRUCTURE ELEMENTS FOR PREDICTION OF GO ANNOTATIONS Aslı Filiz 1, Eser Aygün 2, Özlem Keskin 3 and Zehra Cataltepe 2 1 Informatics Institute and 2 Computer Engineering Department,
More informationNRProF: Neural response based protein function prediction algorithm
Title NRProF: Neural response based protein function prediction algorithm Author(s) Yalamanchili, HK; Wang, J; Xiao, QW Citation The 2011 IEEE International Conference on Systems Biology (ISB), Zhuhai,
More informationSupport Vector Machine. Industrial AI Lab. Prof. Seungchul Lee
Support Vector Machine Industrial AI Lab. Prof. Seungchul Lee Classification (Linear) Autonomously figure out which category (or class) an unknown item should be categorized into Number of categories /
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction
CMPS 6630: Introduction to Computational Biology and Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the
More informationProtein Complex Identification by Supervised Graph Clustering
Protein Complex Identification by Supervised Graph Clustering Yanjun Qi 1, Fernanda Balem 2, Christos Faloutsos 1, Judith Klein- Seetharaman 1,2, Ziv Bar-Joseph 1 1 School of Computer Science, Carnegie
More informationMSAT a Multiple Sequence Alignment tool based on TOPS
MSAT a Multiple Sequence Alignment tool based on TOPS Te Ren, Mallika Veeramalai, Aik Choon Tan and David Gilbert Bioinformatics Research Centre Department of Computer Science University of Glasgow Glasgow,
More informationSupport Vector Machine. Industrial AI Lab.
Support Vector Machine Industrial AI Lab. Classification (Linear) Autonomously figure out which category (or class) an unknown item should be categorized into Number of categories / classes Binary: 2 different
More informationAnalysis of N-terminal Acetylation data with Kernel-Based Clustering
Analysis of N-terminal Acetylation data with Kernel-Based Clustering Ying Liu Department of Computational Biology, School of Medicine University of Pittsburgh yil43@pitt.edu 1 Introduction N-terminal acetylation
More informationMotif Prediction in Amino Acid Interaction Networks
Motif Prediction in Amino Acid Interaction Networks Omar GACI and Stefan BALEV Abstract In this paper we represent a protein as a graph where the vertices are amino acids and the edges are interactions
More informationA new prediction strategy for long local protein. structures using an original description
Author manuscript, published in "Proteins Structure Function and Bioinformatics 2009;76(3):570-87" DOI : 10.1002/prot.22370 A new prediction strategy for long local protein structures using an original
More informationProtein Structure & Motifs
& Motifs Biochemistry 201 Molecular Biology January 12, 2000 Doug Brutlag Introduction Proteins are more flexible than nucleic acids in structure because of both the larger number of types of residues
More informationA Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier
A Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier Seiichi Ozawa, Shaoning Pang, and Nikola Kasabov Graduate School of Science and Technology, Kobe
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationPrediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines
Article Prediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines Yun-Fei Wang, Huan Chen, and Yan-Hong Zhou* Hubei Bioinformatics and Molecular Imaging Key Laboratory,
More informationRadial Basis Function Neural Networks in Protein Sequence Classification ABSTRACT
(): 195-04 (008) Radial Basis Function Neural Networks in Protein Sequence Classification Zarita Zainuddin and Maragatham Kumar School of Mathematical Sciences, University Science Malaysia, 11800 USM Pulau
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationBIOINFORMATICS ORIGINAL PAPER doi: /bioinformatics/btl642
Vol. 23 no. 9 2007, pages 1090 1098 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btl642 Structural bioinformatics A structural alignment kernel for protein structures Jian Qiu 1, Martial Hue
More informationCMPS 3110: Bioinformatics. Tertiary Structure Prediction
CMPS 3110: Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the laws of physics! Conformation space is finite
More informationMolecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007
Molecular Modeling Prediction of Protein 3D Structure from Sequence Vimalkumar Velayudhan Jain Institute of Vocational and Advanced Studies May 21, 2007 Vimalkumar Velayudhan Molecular Modeling 1/23 Outline
More informationHYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH
HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi
More informationBioinformatics III Structural Bioinformatics and Genome Analysis Part Protein Secondary Structure Prediction. Sepp Hochreiter
Bioinformatics III Structural Bioinformatics and Genome Analysis Part Protein Secondary Structure Prediction Institute of Bioinformatics Johannes Kepler University, Linz, Austria Chapter 4 Protein Secondary
More informationPredictive Analytics on Accident Data Using Rule Based and Discriminative Classifiers
Advances in Computational Sciences and Technology ISSN 0973-6107 Volume 10, Number 3 (2017) pp. 461-469 Research India Publications http://www.ripublication.com Predictive Analytics on Accident Data Using
More informationMeasuring quaternary structure similarity using global versus local measures.
Supplementary Figure 1 Measuring quaternary structure similarity using global versus local measures. (a) Structural similarity of two protein complexes can be inferred from a global superposition, which
More informationHomology and Information Gathering and Domain Annotation for Proteins
Homology and Information Gathering and Domain Annotation for Proteins Outline Homology Information Gathering for Proteins Domain Annotation for Proteins Examples and exercises The concept of homology The
More informationThe CATH Database provides insights into protein structure/function relationships
1999 Oxford University Press Nucleic Acids Research, 1999, Vol. 27, No. 1 275 279 The CATH Database provides insights into protein structure/function relationships C. A. Orengo, F. M. G. Pearl, J. E. Bray,
More informationSome Problems from Enzyme Families
Some Problems from Enzyme Families Greg Butler Department of Computer Science Concordia University, Montreal www.cs.concordia.ca/~faculty/gregb gregb@cs.concordia.ca Abstract I will discuss some problems
More informationA profile-based protein sequence alignment algorithm for a domain clustering database
A profile-based protein sequence alignment algorithm for a domain clustering database Lin Xu,2 Fa Zhang and Zhiyong Liu 3, Key Laboratory of Computer System and architecture, the Institute of Computing
More information