A profile-based protein sequence alignment algorithm for a domain clustering database
Lin Xu 1,2, Fa Zhang 1 and Zhiyong Liu 1,3
1 Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences
2 Graduate School of Chinese Academy of Sciences
3 National Natural Science Foundation of China

Abstract- To address two main shortcomings of homology modeling, we have designed and built a domain clustering database. Searching this database is a fundamental operation. However, current alignment algorithms are mainly sequence based and ignore the structural conservation within a domain. This paper proposes a profile-based alignment that incorporates structure information into the profile, exploiting the characteristics of our domain database. We designed an experiment within the database. The results show that both the quality and the sensitivity of our scheme are better than those of the pure Smith-Waterman and sequence-based profile algorithms. We believe that this work can help to improve protein structure prediction.

I. INTRODUCTION

Sequence alignment is a fundamental tool in computational biology and bioinformatics. It yields a great deal of useful information, such as which genes have the same function, which RNAs belong to the same class and which proteins share the same structural topology. Moreover, in protein structure prediction, obtaining the alignment between a protein sequence of unknown structure (the query) and its homologs of known structure (the templates) is the most fundamental step of the modeling process, and the quality of this alignment greatly affects the prediction result. Generally speaking, there are three categories of methods for creating an alignment: single-sequence-based methods, multiple sequence alignment and profile-based methods.
Single-sequence-based methods use the standard dynamic programming algorithm to generate the alignment, for example the Needleman-Wunsch algorithm [1] and the Smith-Waterman algorithm [2]. Since these methods use only sequence information, the quality of the alignment drops sharply when the sequence identity is less than 30%.

Multiple sequence alignment methods align three or more sequences. Since the simultaneous alignment of several sequences is an NP-hard computational problem, most methods use a heuristic algorithm, such as ClustalW [3], DIALIGN [4] and T-COFFEE [5]. However, alignment quality and computational cost are two critical problems for this kind of method.

Profile-based methods have advanced rapidly since the development of the PSI-BLAST program by Altschul et al. [6]. These methods improve alignment quality by using a profile to describe the characteristics of a set of similar sequences and by aligning a sequence or a profile with another profile. Because the profile accurately records the most relevant information from the multiple sequence alignment, the quality of these methods is better than that of the others. Several groups have published profile-based alignment methods, such as PSI-BLAST [6] and HMMER [7]. Most profile-based methods use the standard Smith-Waterman local alignment method, but they vary significantly in a number of important respects, such as scoring functions, gap penalties, weighting schemes and whether a secondary-structure substitution matrix is added.

Although these methods use different information and different methodologies, producing an accurate alignment remains a major challenge, especially when the sequence similarity falls into the twilight zone (<=25% sequence identity). This results largely from the difficulty of obtaining a scoring scheme that correctly accounts not only for mutations but, more importantly, for the insertions and deletions that occur during evolution. It is generally accepted that a structural alignment based only on three-dimensional coordinates accurately represents the corresponding residues as well as the boundary and site of any gaps.

As stated above, sequence alignment is a critical factor in protein structure prediction. Existing modeling techniques still use sequence alignment to select structural templates, and they suffer from several shortcomings that limit the practical applicability of comparative modeling:

1. Since the number of known structures is far smaller than the number of sequences, the lack of templates is a serious problem.
2. The quality of the query-template alignment drops sharply when the sequence identity falls below 30%.

To address the first problem, we designed and constructed a domain-based template database in our previous study. For each template in this database, we superimposed the corresponding structures and provided a multiple structure alignment based only on the three-dimensional information of the structures involved. The details are described in [8]. To address the second problem, we present a method to extract a profile from the multiple structure alignment of each template in our database, and we develop a query-template alignment method that uses this profile. Preliminary experimental results show that our profile-based alignment method significantly improves the accuracy of structural template selection.

The paper is organized as follows. The next section briefly reviews the construction of the domain-based template database. The subsequent section describes the profile-based alignment using our domain-based templates. We then present and discuss experimental results on several datasets. Finally, we present the main conclusions of the paper and discuss future work.

II. CONSTRUCTION OF THE DOMAIN-BASED TEMPLATE DATABASE

Since proteins evolve with their structural and functional domains as independent units, proteins and their structures can largely be described as combinations of conserved protein domains. This motivated us to construct a domain-based template database to increase the likelihood of finding widely applicable structural templates.

We first searched for all the InterPro [9] domains in the PDB [10] using the iprscan program [11], and mapped the corresponding protein structures in the PDB. All the PDB protein sequences in this project were parsed directly from the structural records reorganized by the MSD database. From the protein family and superfamily databases, such as Pfam [12], SCOP [13], SMART [14] and TIGRFAMs [15], we then used the program HMMER [7] to obtain the consensus sequence for each InterPro domain. Next, we partitioned the structural correspondences from the PDB for each InterPro domain and constructed a primary domain cluster. For each domain cluster, we compared the sequences of all the domains involved with the relevant consensus sequence on the basis of sequence similarity, and then refined the domain cluster by removing every structure whose sequence identity or structure similarity is less than a pre-defined threshold. Since all the domains in each cluster are conserved in both sequence and structure, we adopt the conserved structures as the template for the relevant domain cluster. The domain-based template database can be accessed on the web.

Fig. 1. Two typical structural ensembles (A and B) of conserved domain clusters

Two typical structure ensembles are illustrated in Fig. 1. Ensemble A shows the structure ensemble of domain cluster IPR00008, which includes 8 individual structures. All the RMSDs of the structures involved are less than Å. Ensemble B shows the structure ensemble of the PDZ domain cluster (IPR00478), which includes 4 structures. All the RMSDs of the structures involved are less than 3 Å.
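The cluster refinement step described above (dropping members that fall below an identity threshold relative to the consensus sequence) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the convention that members are already aligned to the consensus with `-` as the gap character, and the 30% threshold are all hypothetical, and the structure-similarity filter mentioned in the text is not shown.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def refine_cluster(consensus: str, members: dict, min_identity: float = 30.0) -> dict:
    """Keep only members whose identity to the consensus reaches the threshold.

    members maps a (hypothetical) domain name to its sequence, already aligned
    to the consensus. min_identity is an assumed value, not the paper's.
    """
    return {
        name: seq
        for name, seq in members.items()
        if percent_identity(consensus, seq) >= min_identity
    }


# Toy usage with made-up sequences.
toy_members = {"m1": "ACDEFGHIKL", "m2": "ACDE-GHIKV", "m3": "WWWWWGHWWW"}
kept = refine_cluster("ACDEFGHIKL", toy_members)
```

In this toy input the third member shares only two of ten residues with the consensus, so it is the one removed.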
To highlight the conserved structural regions in each domain cluster, we superimposed these conserved structures using the Dali [16] or CE [17] algorithm. Based on the conserved structural ensemble, we also generated a multiple structure alignment for each domain cluster, purely from the backbone coordinates of the residues. Since this structural alignment is independent of sequence similarity, it provides more sensitive and position-specific signatures than a sequence alignment. The details of the construction of the database are described in [8].

III. PROFILE-BASED ALIGNMENT USING THE DOMAIN-BASED TEMPLATE DATABASE

An accurate and complete query-template alignment is critical in comparative modeling. Most current modeling techniques generate the query-template alignment from sequence information alone and ignore structure information. Pairwise alignment algorithms such as Smith-Waterman, FASTA [18] and BLAST [19] cannot capture the full joint information content of the group, even when the consensus sequence of the multiple alignment is used as the query. Since Gribskov first introduced the idea of using profiles to search a database [20], sequence alignment profiles have been shown to be very powerful in creating accurate sequence alignments. In recent years, many strategies, including the use of structural information and surface accessibility, have been proposed to determine the profile or the position-specific gap penalties.

Since our template database provides a multiple structure alignment and a superimposed structure ensemble for each domain cluster (template), we believe that we can build more reliable and sensitive profiles by using both the multiple structure alignment and the relevant structure information. In order to build a profile for each domain, we analyzed the sequence and structure characteristics of the database statistically, on the assumption that most domains in the database are conserved in sequence and structure.
A profile for each domain cluster can then be built from the multiple structure alignment and the relevant structure information. The query sequence is then searched against the database using the position-specific scoring matrix of each domain cluster to validate our profile-based alignment.

A. Conservation Statistics from Sequence and Structure Information in Domain Database

In order to build the profile for each template from its sequence and structure information, we first analyzed statistically the relationship between the residues and the structural information in the template database. For each amino acid position in each template, we classified the coordinates and the residue types based on the superimposed structure ensemble. Here, the spatial location of the alpha carbon atom is taken to represent each amino acid. We extracted the alpha carbon atoms from each structure, aligned these positions and clustered them by calculating the distance between every pair of positions. We used a hierarchical clustering algorithm for this step, with Å as the distance cutoff.

Some statistical results for several domain clusters in the database are listed in Table I. In the table, column 1 is the domain cluster ID, column 2 is the percentage of coordinate clusters that contain only one amino acid type, and columns 3 to 7 are the percentages of coordinate clusters that contain two to six residue types, respectively. The table thus gives the distribution of the number of amino acid types in each coordinate cluster. It shows that, when the positions are clustered with a cutoff of Å, the amino acids in one coordinate cluster are most likely to be of a single residue type.

TABLE I. SOME STATISTICS RESULTS FROM THE DATABASE
(Columns: domain cluster ID; percentage of coordinate clusters containing one, two, three, four, five and six residue types.)

Fig. 2 shows the same analysis for the whole database. Over all domain clusters, 95.66% of the coordinate clusters contain one residue type, 3.63% contain two residue types, 0.49% contain three residue types, and only 0.23% contain four or more residue types. Therefore, an overwhelming majority of the amino acids at one position belong to the same residue type. This conservation shared by the sequence and structure information can help us to build a more accurate profile.

Fig. 2. The distribution of the number of residue types in one coordinate cluster

B. Building Profile from Sequence and Structure Information

Based on the sequence and structure information, we build a profile for each domain cluster. The profile is defined as a position-specific scoring matrix $M(p,a)$ composed of 21 columns and $m$ rows ($m$ = length of the alignment). The first 20 columns of each row specify the scores of the 20 amino acid residues, respectively. An additional column contains a penalty for insertions or deletions at that position.

At position $p$ of alignment $A$ ($N$ structures), $AA(a)$ is defined as the class of amino acid type $a$, $SS(i)$ is the class of alpha carbon coordinates in cluster $i$ (as described in the previous section), and $W(p,a)$ is the weight for the appearance of amino acid $a$ at position $p$.

For the sequence information, the weight of each amino acid type is determined as follows. Suppose that there are $n(a)$ items in residue class $AA(a)$; then the average weight for class $AA(a)$ is

$$W_1(p,a) = \frac{n(a)}{N}.$$

For the structure information, the weight of each class is determined as follows. Suppose that there are $n(s_i)$ items in class $SS(i)$; then the weight is

$$W_2(p,s_i) = \frac{n(s_i)}{N}.$$

$W(p,a)$ can then be calculated from $W_1$ and $W_2$:

$$W(p,a) = \Big[ W_1(p,a) \sum_{\text{all } SS(i)} n(a,i)\, W_2(p,s_i) \Big]\, \sigma$$

Here, $\sigma$ is a normalizing factor which ensures that $\sum_{a \in \{\text{amino acid types}\}} W(p,a) = 1$, and $n(a,i)$ is the number of members of class $SS(i)$ that have amino acid type $a$. The position-specific scoring matrix $M(p,a)$ is then given by

$$M(p,a) = \sum_{b=1}^{20} W(p,b)\, Y(a,b)$$

where $Y(a,b)$ is a scoring matrix, such as BLOSUM62.
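To make the two steps above concrete, the following sketch clusters the alpha carbon coordinates of one alignment column and then evaluates $W_1$, $W_2$, $W$ and one row of $M$. It is a minimal illustration, not the authors' implementation. Several things are assumptions: single-linkage clustering stands in for the unspecified hierarchical method, the cutoff of 1.0 is a placeholder, $n(a,i)$ is read as the number of residues of type $a$ in class $SS(i)$, and the three-letter alphabet with a +4/-1 matrix is a toy substitute for BLOSUM62.

```python
import math
from collections import Counter


def cluster_by_cutoff(coords, cutoff):
    """Single-linkage clustering: points within `cutoff` end up with one label."""
    parent = list(range(len(coords)))  # each point starts as its own cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                parent[find(j)] = find(i)  # merge the two clusters
    return [find(i) for i in range(len(coords))]


def position_weights(residues, coords, cutoff):
    """W(p, a) for one column: residues[k] and coords[k] belong to structure k."""
    n_total = len(residues)                 # N
    ss = cluster_by_cutoff(coords, cutoff)  # SS(i) label of each structure
    n_a = Counter(residues)                 # n(a)
    n_s = Counter(ss)                       # n(s_i)
    n_ai = Counter(zip(residues, ss))       # n(a, i), assumed interpretation
    raw = {}
    for a in n_a:
        w1 = n_a[a] / n_total
        structure_term = sum(n_ai[(a, i)] * (n_s[i] / n_total) for i in n_s)
        raw[a] = w1 * structure_term
    sigma = 1.0 / sum(raw.values())         # normalizing factor
    return {a: w * sigma for a, w in raw.items()}


def pssm_row(weights, subst, alphabet):
    """M(p, a) = sum over b of W(p, b) * Y(a, b), for every a in the alphabet."""
    return {a: sum(w * subst[(a, b)] for b, w in weights.items()) for a in alphabet}


# Toy column: four superimposed structures, three of them close together.
toy_residues = ["L", "L", "L", "V"]
toy_coords = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.1, 0.1, 0.0), (5.0, 0.0, 0.0)]
toy_alphabet = "LVA"
toy_subst = {(a, b): (4.0 if a == b else -1.0) for a in toy_alphabet for b in toy_alphabet}

toy_weights = position_weights(toy_residues, toy_coords, cutoff=1.0)
toy_row = pssm_row(toy_weights, toy_subst, toy_alphabet)
```

In this toy column the residue type that is both frequent and structurally clustered (L) receives almost all of the weight, which is the effect the combined profile is meant to capture.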
The position-dependent penalties for insertions and deletions in the profile can be set to a high value to prevent insertions at positions where no gaps occur, and to a low value to allow insertions in regions where insertions are observed in the alignment. The penalty applied for creating a gap during the match of the profile to the query, $\mathrm{gap}(l)$, is given by

$$\mathrm{gap}(l) = \mathrm{gap} \times [\,\mathrm{gap\_open} + \mathrm{gap\_ext} \times l\,]$$

in which gap is the penalty given in the last column of the profile, $l$ is the number of residue positions in the gap, and gap_open and gap_ext are the penalties for gap opening and gap extension, respectively.

C. Profile-based Alignment

Since our profile accurately records both sequence and structure properties, we can use the Smith-Waterman local alignment algorithm with the profile of each domain cluster to find the domain to which the query sequence most likely belongs. The major difference between our profile-based alignment and both the plain dynamic programming algorithm and other profile-based alignment algorithms lies in the scoring scheme. Our profile-based alignment uses not only the sequence information derived from the domain cluster but also the structure information extracted from the superimposed structure ensembles. In the plain dynamic programming algorithm, by contrast, the score is based on the comparison of the amino acids at corresponding positions in two sequences, and other profile-based alignment algorithms mostly use the sequence information derived from family sequences.

IV. RESULTS AND DISCUSSIONS

To evaluate the performance of the alignment scheme described in this paper, we tested it on the whole database. In each domain cluster, the reference sequence is the one whose structural distance to the other members is the smallest. We selected the sequence whose structure is the most distant from the reference as the benchmark. The benchmark sequence was then searched against the whole database with our profile-based alignment algorithm.

Using the statistics obtained from the database, we classified the domain clusters into four types: sequence and structure conserved, structure conserved, sequence conserved, and mixed. A position is considered conserved when the number of amino acid types, or of 3D coordinate clusters, is less than half of the total number at that position in the alignment. A sequence and structure conserved cluster is one in which both the amino acid types and the 3D coordinates are conserved; a structure conserved cluster is one in which only the 3D coordinates are conserved; a sequence conserved cluster is one in which only the amino acid types satisfy the conservation condition; and a mixed cluster consists of both parts conserved in amino acid type and parts conserved in 3D coordinates. The number of domain clusters of each type is listed in Table II.

TABLE II. THE NUMBER OF DOMAIN CLUSTERS IN EACH TYPE

Class type                            Number of domain clusters
Sequential and structural conserved   1,051
Structural conserved                  28
Sequential conserved                  784
Mixed type                            974

We picked some domain clusters from each type to evaluate the scoring scheme of our alignment algorithm. In each domain cluster, we selected a query and then aligned it against the whole database.

A. Sequence and Structure Conserved Domain

Domain clusters of this type are the most conserved, and our scoring scheme can reflect their conserved features. To evaluate the significance of the alignment, we compared our profile-based alignment with the pure Smith-Waterman sequence alignment and with a sequence-based profile alignment algorithm. The score matrix in the Smith-Waterman algorithm is BLOSUM62. The gap open and gap extension penalties are 2 and 2, respectively. The sequence-based profile alignment is an ordinary profile-based alignment, which builds the profile by counting the occurrences of each amino acid type and using the count as the weight for that type.

Using these three alignment methods, we compared the queries (1,051 datasets in total) with the entire database. Table III shows the number of hits and false results for each method. The table shows that the profile-based methods improve the significance of the alignment. Aligning the consensus sequences with the Smith-Waterman algorithm yields only 788 hits (about 75%) in this cluster type, so the false rate is high (about 25%). The sequence-based profile brings sequence information into the profile and the scoring scheme and improves the hit rate to 88%, but 122 false hits still remain. Our profile contains not only the sequence information but also the structure information, and it improves the hit rate to 91%.

TABLE III. THE NUMBER OF HITS AND FALSE FOR EACH METHOD

Method                             Hits         False
Smith-Waterman                     788 (75%)    263 (25%)
Sequence-based profile alignment   929 (88%)    122 (12%)
Combined profile alignment         952 (91%)    99 (9%)

Fig 2. (A) A segment of the multiple structure alignment in cluster IPR. (B) The relevant structure superposition. (C) The alignment between the query and cluster IPR by the Smith-Waterman algorithm.

Fig 3. Distribution of alignment scores (number of entries versus score) for comparing a query from IPR00029 with the whole database.
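The local alignment of a query against a profile (Section III.C), together with the position-dependent gap rule $\mathrm{gap}(l)$, can be sketched as below. This is a hedged illustration rather than the method as implemented by the authors: it uses the general-gap form of the Smith-Waterman recurrence (cubic time) so that $\mathrm{gap}(l)$ is applied exactly as written, it assumes the gap multiplier is taken from the profile row where the gap ends, and the default `gap_open` and `gap_ext` values and the toy profile are placeholders.

```python
def local_align_score(profile, gap_mult, query, gap_open=2.0, gap_ext=2.0):
    """Best local alignment score of `query` against a profile.

    profile  : list of dicts, profile[p][a] plays the role of M(p, a)
    gap_mult : list of floats, the per-position penalty from the profile's
               last column (one value per profile row)
    """
    m, n = len(profile), len(query)
    H = [[0.0] * (n + 1) for _ in range(m + 1)]
    best = 0.0
    for i in range(1, m + 1):
        mult = gap_mult[i - 1]  # assumption: multiplier of the row where the gap ends
        for j in range(1, n + 1):
            # Match or mismatch: extend the diagonal with the profile score.
            score = H[i - 1][j - 1] + profile[i - 1][query[j - 1]]
            # Gap of length l that skips l profile positions.
            for l in range(1, i + 1):
                score = max(score, H[i - l][j] - mult * (gap_open + gap_ext * l))
            # Gap of length l that skips l query residues.
            for l in range(1, j + 1):
                score = max(score, H[i][j - l] - mult * (gap_open + gap_ext * l))
            H[i][j] = max(0.0, score)  # local alignment: never drop below zero
            best = max(best, H[i][j])
    return best


def toy_profile(consensus, alphabet, match=4.0, mismatch=-1.0):
    """Hypothetical profile: +4 for the consensus residue, -1 otherwise."""
    return [{a: (match if a == c else mismatch) for a in alphabet} for c in consensus]


profile = toy_profile("AAACCC", "ACD")
permissive = local_align_score(profile, [1.0] * 6, "AAADCCC")
strict = local_align_score(profile, [10.0] * 6, "AAADCCC")
```

With the permissive multipliers the extra D in the query is bridged by a one-residue gap, whereas the strict multipliers make the gap too expensive and the best alignment stays on a single diagonal. Replacing the profile rows with rows built from a substitution matrix and a single sequence recovers ordinary pairwise Smith-Waterman scoring, which is the sense in which the compared methods differ mainly in their scoring scheme.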
Fig. 2 and Fig. 3 demonstrate an example in which our method is more sensitive than the other two methods. Here, we chose a query, labeled 2dln_248_276, from the domain cluster IPR. Fig. 2A and 2B show a segment of the multiple structure alignment and the relevant structure superposition in the cluster. These domains are very similar at the structure level but differ somewhat at the sequence level. We also note that a domain in cluster IPR has a segment whose sequence is identical to the query, as shown in Fig. 2C. As a result, both the Smith-Waterman and the sequence-based profile alignment assigned the query to cluster IPR. However, our profile-based method can distinguish the query from other clusters. Fig. 3 shows the alignment scores for comparing the query with the whole database. In this figure, the highest score is 60, which is the alignment score between the query and the profile of cluster IPR00029, the correct domain cluster. Using our profile therefore significantly improves the alignment sensitivity.

B. Structure Conserved Domain

The domains of this type are conserved only in structure. The structural topology within one domain cluster takes on the same shape, but the amino acid type at some positions is variable. In biology, the amino acid type can mutate while the structure and function stay the same. This phenomenon is difficult to handle with sequence alignment schemes, such as local, global or sequence-based profile alignment.

TABLE IV. THE NUMBER OF HITS AND FALSE FOR EACH METHOD

Method                             Hits        False
Smith-Waterman                     20 (71%)    8 (29%)
Sequence-based profile alignment   23 (82%)    5 (18%)
Combined profile alignment         27 (96%)    1 (4%)

In this cluster type, we tested the three kinds of alignment methods; the hits and false results are shown in Table IV. Although there are only 28 domain clusters of this type, the results still show that our profile-based method can improve the hit rate. Because the profile reflects the characteristics of a family, the sequence-based profile method improves the hit rate a little. The combined profile method improves the results further by using the structure information.

We chose a query, labeled blba_75, from the domain cluster IPR. Fig. 4A shows the structure superposition in the domain cluster. Since the amino acid type at some positions is variable, both the Smith-Waterman and the sequence-based profile alignment methods gave the highest score to the consensus of cluster IPR0024. Although some segments in the alignment were matched well, as shown in Fig. 4B, the result was wrong. Our profile-based alignment method, however, gives the highest score to the consensus of the correct domain cluster, as shown in Fig. 5. The highest score is 9, which is the alignment score between the query and the profile of cluster IPR.

Fig. 4. (A) The structure superposition in cluster IPR. (B) The alignment between the query and the consensus of cluster IPR0024.

Fig. 5. The distribution of alignment scores (number of entries versus score) for comparing the query from cluster IPR00064 with the whole database.

C. Sequential Conserved Domain

TABLE V. THE NUMBER OF HITS AND FALSE FOR EACH METHOD

Method                             Hits         False
Smith-Waterman                     665 (85%)    119 (15%)
Sequence-based profile alignment   735 (94%)    49 (6%)
Combined profile alignment         745 (95%)    39 (5%)

There are 784 domain clusters of this type in our database. Table V compares the results of the three kinds of alignment methods. Here we selected a query from domain cluster IPR. Fig. 6A shows that there are some variable regions in these domains, and Fig. 6B shows that the sequences are more conserved. Fig. 6C shows that the highest alignment score is 57, between the query and the profile from cluster IPR00356, whereas the other two methods implied that the query belongs to cluster IPR. Our scheme therefore again improves the alignment results and sensitivity.

Fig. 6. (A) The structure superposition and (B) the multiple structure alignment in domain cluster IPR. (C) The distribution of alignment scores (number of entries versus score) for comparing the query from cluster IPR00356 with the whole database.

D. Mixed Type

Table VI compares the results of the three methods on the mixed type datasets. Because domain clusters of this kind contain some regions that are variable at the sequence level, the Smith-Waterman algorithm performs much worse than the others. Our profile-based method can improve the accuracy by combining the sequence and structure information, although the improvement in hit rate over the sequence-based profile alignment is only one percent. We randomly selected a query from cluster IPR of this type. Fig. 7 shows the distribution of alignment scores for comparing the query with the whole database. The highest score implies that the query belongs to cluster IPR. This figure shows again that our method is more sensitive than the other two methods.

TABLE VI. THE NUMBER OF HITS AND FALSE FOR EACH METHOD
(Rows: Smith-Waterman, sequence-based profile alignment, combined profile alignment; columns: hits, false.)

Fig. 7. The distribution of alignment scores (number of entries versus score) for comparing the query from cluster IPR with the whole database.

V. CONCLUSION AND FUTURE WORK

In this paper we proposed a profile-based alignment algorithm for use with our domain-based template database.
The statistical analysis shows that most of the domain clusters in our database are conserved at both the structural and the sequence level, so each element in our profile combines the structural clustering information with the sequence information. With this profile, we developed a profile-based query-template alignment method. To validate whether our method is more accurate and sensitive than other query-template alignment methods, we divided our database into four types based on sequence and structure conservation, and ran experiments on each type. The results from each type show that our profile can accurately describe the features of the domain cluster, and that our profile-based method can align the query to the correct template with a low false rate. They show that our method is more sensitive than the other query-template alignment methods.

As described above, our final goal is protein structure prediction. How to use our domain-based template database and our profile-based query-template alignment method to improve the prediction of protein structure will be investigated in our future work.

ACKNOWLEDGMENT

This work was supported by a National Natural Science Foundation of China project and a key project.

REFERENCES

[1] S. Needleman and C. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, J Mol Biol, 1970, vol. 48.
[2] T. Smith and M. Waterman, Identification of common molecular subsequences, J Mol Biol, 1981, vol. 147.
[3] J. Thompson, D. Higgins and T. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res, 1994, vol. 22.
[4] M. Brudno, M. Chapman, B. Gottgens, S. Batzoglou and B. Morgenstern, Fast and sensitive multiple alignment of large genomic sequences, BMC Bioinformatics, 2003, vol. 4.
[5] C. Notredame, D. Higgins and J. Heringa, T-Coffee: a novel method for multiple sequence alignments, J Mol Biol, 2000, vol. 302.
[6] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res, 1997, vol. 25.
[7] S. R. Eddy, Profile hidden Markov models, Bioinformatics, 1998, vol. 14.
[8] F. Zhang, J. Chen, Z. Liu and B. Yuan, The construction of structural templates for the modeling of conserved protein domains, International Conference on Bioinformatics and its Applications (ICBA'04), Fort Lauderdale, Florida, USA.
[9] N. J. Mulder et al., InterPro, progress and status in 2005, Nucleic Acids Res, 2005, vol. 33 (Database issue).
[10] H. M. Berman et al., The Protein Data Bank, Acta Crystallogr D Biol Crystallogr, 2002, vol. 58 (Pt 6 No 1).
[11] E. M. Zdobnov and R. Apweiler, InterProScan - an integration platform for the signature-recognition methods in InterPro, Bioinformatics, 2001, vol. 17(9).
[12] A. Bateman et al., The Pfam protein families database, Nucleic Acids Res, 2004, vol. 32 (Database issue), p. D138-41.
[13] A. G. Murzin et al., SCOP: a structural classification of proteins database for the investigation of sequences and structures, J Mol Biol, 1995, vol. 247(4).
[14] I. Letunic et al., SMART 4.0: towards genomic data integration, Nucleic Acids Res, 2004, vol. 32 (Database issue).
[15] D. H. Haft, J. D. Selengut and O. White, The TIGRFAMs database of protein families, Nucleic Acids Res, 2003, vol. 31(1).
[16] L. Holm and C. Sander, Protein structure comparison by alignment of distance matrices, J Mol Biol, 1993, vol. 233(1).
[17] I. N. Shindyalov and P. E. Bourne, Protein structure alignment by incremental combinatorial extension (CE) of the optimal path, Protein Engineering, 1998, vol. 11(9).
[18] W. R. Pearson, Rapid and sensitive sequence comparison with FASTP and FASTA, Methods Enzymol, 1990, vol. 183.
[19] S. F. Altschul, W. Gish, W. Miller, E. W. Myers and D. J. Lipman, Basic local alignment search tool, J Mol Biol, 1990, vol. 215.
[20] M. Gribskov, A. D. McLachlan and D. Eisenberg, Profile analysis: detection of distantly related proteins, Proc Natl Acad Sci, 1987, vol. 84.
More informationCISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)
CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST
More informationSupporting Online Material for
www.sciencemag.org/cgi/content/full/309/5742/1868/dc1 Supporting Online Material for Toward High-Resolution de Novo Structure Prediction for Small Proteins Philip Bradley, Kira M. S. Misura, David Baker*
More informationSequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University
Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of
More informationMSAT a Multiple Sequence Alignment tool based on TOPS
MSAT a Multiple Sequence Alignment tool based on TOPS Te Ren, Mallika Veeramalai, Aik Choon Tan and David Gilbert Bioinformatics Research Centre Department of Computer Science University of Glasgow Glasgow,
More informationSara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)
Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline
More informationSome Problems from Enzyme Families
Some Problems from Enzyme Families Greg Butler Department of Computer Science Concordia University, Montreal www.cs.concordia.ca/~faculty/gregb gregb@cs.concordia.ca Abstract I will discuss some problems
More information2 Dean C. Adams and Gavin J. P. Naylor the best three-dimensional ordination of the structure space is found through an eigen-decomposition (correspon
A Comparison of Methods for Assessing the Structural Similarity of Proteins Dean C. Adams and Gavin J. P. Naylor? Dept. Zoology and Genetics, Iowa State University, Ames, IA 50011, U.S.A. 1 Introduction
More informationStatistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department
More informationInDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9
Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic
More informationfrmsdalign: Protein Sequence Alignment Using Predicted Local Structure Information for Pairs with Low Sequence Identity
1 frmsdalign: Protein Sequence Alignment Using Predicted Local Structure Information for Pairs with Low Sequence Identity HUZEFA RANGWALA and GEORGE KARYPIS Department of Computer Science and Engineering
More informationStatistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic
More informationCAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan
CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology
More informationBioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment
Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value
More informationGrouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids
Science in China Series C: Life Sciences 2007 Science in China Press Springer-Verlag Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids
More informationMultiple sequence alignment
Multiple sequence alignment Multiple sequence alignment: today s goals to define what a multiple sequence alignment is and how it is generated; to describe profile HMMs to introduce databases of multiple
More informationGrundlagen der Bioinformatik, SS 08, D. Huson, May 2,
Grundlagen der Bioinformatik, SS 08, D. Huson, May 2, 2008 39 5 Blast This lecture is based on the following, which are all recommended reading: R. Merkl, S. Waack: Bioinformatik Interaktiv. Chapter 11.4-11.7
More informationIntroduction to Comparative Protein Modeling. Chapter 4 Part I
Introduction to Comparative Protein Modeling Chapter 4 Part I 1 Information on Proteins Each modeling study depends on the quality of the known experimental data. Basis of the model Search in the literature
More informationDetecting Distant Homologs Using Phylogenetic Tree-Based HMMs
PROTEINS: Structure, Function, and Genetics 52:446 453 (2003) Detecting Distant Homologs Using Phylogenetic Tree-Based HMMs Bin Qian 1 and Richard A. Goldstein 1,2 * 1 Biophysics Research Division, University
More informationBioinformatics. Dept. of Computational Biology & Bioinformatics
Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS
More informationEffects of Gap Open and Gap Extension Penalties
Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See
More informationSEQUENCE alignment is an underlying application in the
194 IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 8, NO. 1, JANUARY/FEBRUARY 2011 Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific
More informationChapter 7: Rapid alignment methods: FASTA and BLAST
Chapter 7: Rapid alignment methods: FASTA and BLAST The biological problem Search strategies FASTA BLAST Introduction to bioinformatics, Autumn 2007 117 BLAST: Basic Local Alignment Search Tool BLAST (Altschul
More informationSequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5
Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5 Why Look at More Than One Sequence? 1. Multiple Sequence Alignment shows patterns of conservation 2. What and how many
More informationMATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME
MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:
More informationProtein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.
Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein
More informationProcheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.
Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics Iosif Vaisman Email: ivaisman@gmu.edu ----------------------------------------------------------------- Bond
More informationProtein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche
Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche The molecular structure of a protein can be broken down hierarchically. The primary structure of a protein is simply its
More informationResearch Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.
Research Proposal Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family. Name: Minjal Pancholi Howard University Washington, DC. June 19, 2009 Research
More informationProtein Structure Prediction
Page 1 Protein Structure Prediction Russ B. Altman BMI 214 CS 274 Protein Folding is different from structure prediction --Folding is concerned with the process of taking the 3D shape, usually based on
More informationEfficient Remote Homology Detection with Secondary Structure
Efficient Remote Homology Detection with Secondary Structure 2 Yuna Hou 1, Wynne Hsu 1, Mong Li Lee 1, and Christopher Bystroff 2 1 School of Computing,National University of Singapore,Singapore 117543
More informationSmall RNA in rice genome
Vol. 45 No. 5 SCIENCE IN CHINA (Series C) October 2002 Small RNA in rice genome WANG Kai ( 1, ZHU Xiaopeng ( 2, ZHONG Lan ( 1,3 & CHEN Runsheng ( 1,2 1. Beijing Genomics Institute/Center of Genomics and
More informationStructural Alignment of Proteins
Goal Align protein structures Structural Alignment of Proteins 1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE
More informationHMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder
HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding
More informationTools and Algorithms in Bioinformatics
Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and
More informationComputational Biology
Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,
More informationDo Aligned Sequences Share the Same Fold?
J. Mol. Biol. (1997) 273, 355±368 Do Aligned Sequences Share the Same Fold? Ruben A. Abagyan* and Serge Batalov The Skirball Institute of Biomolecular Medicine Biochemistry Department NYU Medical Center
More informationSimilarity searching summary (2)
Similarity searching / sequence alignment summary Biol4230 Thurs, February 22, 2016 Bill Pearson wrp@virginia.edu 4-2818 Pinn 6-057 What have we covered? Homology excess similiarity but no excess similarity
More informationBioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing
Bioinformatics Proteins II. - Pattern, Profile, & Structure Database Searching Robert Latek, Ph.D. Bioinformatics, Biocomputing WIBR Bioinformatics Course, Whitehead Institute, 2002 1 Proteins I.-III.
More informationBioinformatics and BLAST
Bioinformatics and BLAST Overview Recap of last time Similarity discussion Algorithms: Needleman-Wunsch Smith-Waterman BLAST Implementation issues and current research Recap from Last Time Genome consists
More informationMotivating the need for optimal sequence alignments...
1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use
More informationK-means-based Feature Learning for Protein Sequence Classification
K-means-based Feature Learning for Protein Sequence Classification Paul Melman and Usman W. Roshan Department of Computer Science, NJIT Newark, NJ, 07102, USA pm462@njit.edu, usman.w.roshan@njit.edu Abstract
More informationProtein sequence alignment with family-specific amino acid similarity matrices
TECHNICAL NOTE Open Access Protein sequence alignment with family-specific amino acid similarity matrices Igor B Kuznetsov Abstract Background: Alignment of amino acid sequences by means of dynamic programming
More informationA New Similarity Measure among Protein Sequences
A New Similarity Measure among Protein Sequences Kuen-Pin Wu, Hsin-Nan Lin, Ting-Yi Sung and Wen-Lian Hsu * Institute of Information Science Academia Sinica, Taipei 115, Taiwan Abstract Protein sequence
More informationStatistical Distributions of Optimal Global Alignment Scores of Random Protein Sequences
BMC Bioinformatics This Provisional PDF corresponds to the article as it appeared upon acceptance. The fully-formatted PDF version will become available shortly after the date of publication, from the
More informationSequence Database Search Techniques I: Blast and PatternHunter tools
Sequence Database Search Techniques I: Blast and PatternHunter tools Zhang Louxin National University of Singapore Outline. Database search 2. BLAST (and filtration technique) 3. PatternHunter (empowered
More informationIntroduction to sequence alignment. Local alignment the Smith-Waterman algorithm
Lecture 2, 12/3/2003: Introduction to sequence alignment The Needleman-Wunsch algorithm for global sequence alignment: description and properties Local alignment the Smith-Waterman algorithm 1 Computational
More informationSequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013
Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation
More informationSequence analysis and Genomics
Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute
More informationBLAST. Varieties of BLAST
BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database
More informationEBI web resources II: Ensembl and InterPro
EBI web resources II: Ensembl and InterPro Yanbin Yin http://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to http://www.ebi.ac.uk/interpro/training.htmland finish the second online training course
More informationBiochemistry 324 Bioinformatics. Pairwise sequence alignment
Biochemistry 324 Bioinformatics Pairwise sequence alignment How do we compare genes/proteins? When we have sequenced a genome, we try and identify the function of unknown genes by finding a similar gene
More informationWhole Genome Alignments and Synteny Maps
Whole Genome Alignments and Synteny Maps IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of
More informationCombining pairwise sequence similarity and support vector machines for remote protein homology detection
Combining pairwise sequence similarity and support vector machines for remote protein homology detection Li Liao Central Research & Development E. I. du Pont de Nemours Company li.liao@usa.dupont.com William
More informationTruncated Profile Hidden Markov Models
Boise State University ScholarWorks Electrical and Computer Engineering Faculty Publications and Presentations Department of Electrical and Computer Engineering 11-1-2005 Truncated Profile Hidden Markov
More informationMotif Prediction in Amino Acid Interaction Networks
Motif Prediction in Amino Acid Interaction Networks Omar GACI and Stefan BALEV Abstract In this paper we represent a protein as a graph where the vertices are amino acids and the edges are interactions
More informationIntroductory course on Multiple Sequence Alignment Part I: Theoretical foundations
Sequence Analysis and Structure Prediction Service Centro Nacional de Biotecnología CSIC 8-10 May, 2013 Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations Course Notes Instructor:
More informationCONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018
CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of
More informationPairwise sequence alignment
Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL
More informationSequence Comparison. mouse human
Sequence Comparison Sequence Comparison mouse human Why Compare Sequences? The first fact of biological sequence analysis In biomolecular sequences (DNA, RNA, or amino acid sequences), high sequence similarity
More informationSegment-based scores for pairwise and multiple sequence alignments
From: ISMB-98 Proceedings. Copyright 1998, AAAI (www.aaai.org). All rights reserved. Segment-based scores for pairwise and multiple sequence alignments Burkhard Morgenstern 1,*, William R. Atchley 2, Klaus
More informationPairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55
Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise
More informationCombining pairwise sequence similarity and support vector machines for remote protein homology detection
Combining pairwise sequence similarity and support vector machines for remote protein homology detection Li Liao Central Research & Development E. I. du Pont de Nemours Company li.liao@usa.dupont.com William
More informationPROTEIN CLUSTERING AND CLASSIFICATION
PROTEIN CLUSTERING AND CLASSIFICATION ori Sasson 1 and Michal Linial 2 1The School of Computer Science and Engeeniring and 2 The Life Science Institute, The Hebrew University of Jerusalem, Israel 1. Introduction
More informationLecture 5,6 Local sequence alignment
Lecture 5,6 Local sequence alignment Chapter 6 in Jones and Pevzner Fall 2018 September 4,6, 2018 Evolution as a tool for biological insight Nothing in biology makes sense except in the light of evolution
More informationNumber sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence
Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Naoto Morikawa (nmorika@genocript.com) October 7, 2006. Abstract A protein is a sequence
More informationMPIPairwiseStatSig: Parallel Pairwise Statistical Significance Estimation of Local Sequence Alignment
MPIPairwiseStatSig: Parallel Pairwise Statistical Significance Estimation of Local Sequence Alignment Ankit Agrawal, Sanchit Misra, Daniel Honbo, Alok Choudhary Dept. of Electrical Engg. and Computer Science
More informationReducing storage requirements for biological sequence comparison
Bioinformatics Advance Access published July 15, 2004 Bioinfor matics Oxford University Press 2004; all rights reserved. Reducing storage requirements for biological sequence comparison Michael Roberts,
More informationPage 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence
Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)
More informationModule: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment
Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand
More informationThe PRALINE online server: optimising progressive multiple alignment on the web
Computational Biology and Chemistry 27 (2003) 511 519 Software Note The PRALINE online server: optimising progressive multiple alignment on the web V.A. Simossis a,b, J. Heringa a, a Bioinformatics Unit,
More information2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.
Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. A global picture of the protein universe will help us to understand
More informationIMPORTANCE OF SECONDARY STRUCTURE ELEMENTS FOR PREDICTION OF GO ANNOTATIONS
IMPORTANCE OF SECONDARY STRUCTURE ELEMENTS FOR PREDICTION OF GO ANNOTATIONS Aslı Filiz 1, Eser Aygün 2, Özlem Keskin 3 and Zehra Cataltepe 2 1 Informatics Institute and 2 Computer Engineering Department,
More informationproteins Estimating quality of template-based protein models by alignment stability Hao Chen 1 and Daisuke Kihara 1,2,3,4 * INTRODUCTION
proteins STRUCTURE O FUNCTION O BIOINFORMATICS Estimating quality of template-based protein models by alignment stability Hao Chen 1 and Daisuke Kihara 1,2,3,4 * 1 Department of Biological Sciences, College
More informationGoals. Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions
Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions Jamie Duke 1,2 and Carlos Camacho 3 1 Bioengineering and Bioinformatics Summer Institute,
More informationBIOINFORMATICS ORIGINAL PAPER doi: /bioinformatics/btm017
Vol. 23 no. 7 2007, pages 802 808 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btm017 Sequence analysis PROMALS: towards accurate multiple sequence alignments of distantly related proteins
More informationSequence Analysis and Databases 2: Sequences and Multiple Alignments
1 Sequence Analysis and Databases 2: Sequences and Multiple Alignments Jose María González-Izarzugaza Martínez CNIO Spanish National Cancer Research Centre (jmgonzalez@cnio.es) 2 Sequence Comparisons:
More informationPhylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches
Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell
More information