The PRALINE online server: optimising progressive multiple alignment on the web


Computational Biology and Chemistry 27 (2003)

Software Note

V.A. Simossis a,b, J. Heringa a

a Bioinformatics Unit, Faculty of Sciences, Vrije Universiteit, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands
b Division of Mathematical Biology, The National Institute for Medical Research, The Ridgeway, NW7 1AA, London, UK

Received 9 September 2003; received in revised form 9 September 2003; accepted 9 September 2003

Abstract

We introduce the online server for PRALINE, an iterative, versatile progressive multiple sequence alignment (MSA) tool. PRALINE provides various MSA optimisation strategies, including weighted global and local profile pre-processing, secondary structure-guided alignment and a reliability measure for individual aligned residue positions. The latter can also be used to optimise the alignment when the profile pre-processing strategies are iterated. In addition, we have modelled the server output to enable comprehensive visualisation of the generated alignment and easy figure generation for publications. Depending on the options set, the alignment is represented in five default colour schemes based on residue type, position conservation, position reliability, residue hydrophobicity and secondary structure. We have also implemented a custom colour scheme that allows the user to select which colour will represent one or more amino acids in the alignment. The grouping of sequences on which the alignment is based can also be visualised as a dendrogram. The PRALINE algorithm is designed to work as a toolkit for MSA rather than as a one-step process.

Keywords: PRALINE; Multiple sequence alignment; Profile pre-processing; Secondary structure-guided MSA; Positional reliability

1. Introduction

Biological data processing tools are applied in many disciplines and vary widely in complexity and specificity.
One very important and complex problem is multiple sequence alignment (MSA), which is a cornerstone of bioinformatics. A wide range of disciplines in computational biology, such as phylogeny, function prediction, secondary and tertiary structure prediction, modelling, sequence analysis and many more, are all largely based on MSA. The MSA problem has been addressed in various ways, and many strategies have been developed over the last two decades to improve the quality and reliability of MSA over a vast number of alignment cases (for reviews see Heringa et al., 1997; Notredame, 2002; Simossis et al., 2003). One of the most successful alignment strategies is progressive alignment (Hogeweg and Hesper, 1984; Feng and Doolittle, 1987), which is implemented in most top performing MSA methods (Thompson et al., 1994; Heringa, 1999; Notredame et al., 2000; Holmes, 2003). Commonly in progressive alignment, a dendrogram is precompiled from sequence similarity scores and used to order the alignment steps, so that the most related, and thus least error-prone, sequences are aligned first. However, the main problem with progressive alignment is that once a sequence has been aligned into the growing MSA it cannot be altered, even if newly added sequences require it ("once a gap, always a gap"; Feng and Doolittle, 1987). Therefore, early alignment errors are carried into the successive alignment steps and can cause further, larger errors to arise (error propagation). Such error propagation becomes even more detrimental to the alignment quality when the progressive strategy is used iteratively (Heringa, 1999, 2000, 2002). To counteract this weakness of progressive alignment, various researchers have developed optimisation steps to minimise the probability of early errors.

Corresponding author. E-mail address: heringa@cs.vu.nl (V.A. Simossis).

Amongst the most successful progressive alignment methods using optimisation strategies are PRALINE, whose strategies will be briefly discussed (for elaborate accounts see Heringa, 1999, 2002), ClustalW with a number of heuristics (Thompson et al., 1994) and T-Coffee with the matrix extension strategy (Notredame et al., 2000). PRALINE follows a methodology similar to other progressive alignment methods but comprises three novel optimisation strategies: global profile pre-processing, local profile pre-processing, and secondary structure-guided alignment. These optimisation strategies can be used as single steps or in combination to construct an MSA, and can also be further optimised by iteration. PRALINE is a well characterised alignment method (Heringa, 1999, 2000, 2002) and has recently been parallelised to minimise its processing time when aligning large datasets (Kleinjung et al., 2002).

2. Profile pre-processing

The profile pre-processing philosophy (Heringa, 1999) is to use information from other, related sequences in the sequence set to be aligned. In combination with position-specific gap penalties, it allows increased matching of distant sequences and makes it likely that gaps are placed outside un-gapped core regions during progressive alignment (Heringa, 1999, 2002). Initially, a score representing the degree of similarity is calculated for every pair of sequences, by performing pairwise global alignments for the global strategy or pairwise local alignments for the local strategy. A global pre-alignment is then created for each sequence, which only includes sequences that have a similarity score higher than a user-specified threshold. Consequently, the pre-alignment only contains sequences that are as closely related as the user requires, which increases the information each sequence carries into the MSA and at the same time minimises the incorporation of misleading input arising from incorrect alignment (Heringa, 1999, 2002). Each pre-alignment is then converted to a global or local pre-profile, according to the strategy used, which represents the original sequence in the final MSA.
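
The conversion of a pre-alignment into a pre-profile can be illustrated with a minimal sketch. It assumes a profile is simply the per-column residue frequency table of the pre-alignment; the weighting PRALINE actually applies is not described here, and the function name `column_profile` and the toy sequences are hypothetical.

```python
from collections import Counter


def column_profile(pre_alignment):
    """Return per-column residue frequencies for equal-length aligned rows.

    Assumption: unweighted frequencies, with the gap symbol '-' counted
    like any other character. This is an illustration only, not the
    PRALINE profile definition.
    """
    length = len(pre_alignment[0])
    if any(len(row) != length for row in pre_alignment):
        raise ValueError("all aligned rows must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in pre_alignment)
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()})
    return profile


# Hypothetical pre-alignment of one sequence with two related sequences.
pre_alignment = ["MKT-A", "MKSLA", "MRT-A"]
profile = column_profile(pre_alignment)
for position, frequencies in enumerate(profile, start=1):
    print(position, frequencies)
```

Each column of the resulting table summarises what the related sequences contribute at that position, which is the extra information a pre-profile carries into the final alignment in place of the single original sequence.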
In effect, the final MSA is no longer an alignment of a set of sequences, but rather of a set of profiles, each of which contains more useful information than the individual sequence it represents. During profile pre-processing, a sequence can be included in more than one pre-alignment, depending on whether its similarity score exceeds the preset threshold. The consistency with which sequences align across all pre-alignments is used to generate a reliability measure for each aligned residue (Heringa, 1999): the more consistently a position is aligned, the higher its reliability score. Each pre-processed profile can contain information about the sequences related to it, unless the sequence is a distant outlier that has no other sequences in its pre-profile and, vice versa, appears in no other pre-profile. Each sequence in the final alignment can therefore be assessed in terms of the degree of consistency reached across the pre-profiles, which is translated into a reliability score for each amino acid in the final MSA.

3. Iteration

In addition, the consistency of pre-processed profiles can be used to optimise the alignment through iteration, by keeping the consistent pre-profile positions and realigning the inconsistent segments. Iteration is guided by the consistency scores obtained, which are used as weights in the construction of alignments during the next MSA step (Heringa, 1999, 2002). From the resulting set of iterative alignments, the one with the highest cumulative score over all pairwise matched amino acids in the alignment can be selected, as a safeguard to prevent alignments from wandering away to less optimal areas of the alignment space (Heringa, 2002).

4. Secondary structure-guided MSA

The conservation of secondary structure elements across related sequences is usually much higher than that of single residues ("structure is more conserved than sequence"; Chothia and Lesk, 1986; Sander and Schneider, 1991; Rost, 1999).
Therefore, in an alignment of related sequences, the secondary structure elements should align in the same regions. By taking the secondary structure state of each sequence position into consideration, we apply a local weight to the global alignment that keeps secondary structure element regions ungapped. The algorithm proceeds by initially constructing an MSA without information about the corresponding secondary structure. If the structure of a sequence is known, i.e. it has a PDB entry (Berman et al., 2000), its secondary structure is determined using DSSP (Kabsch and Sander, 1983); otherwise, the secondary structure is predicted by PREDATOR (Frishman and Argos, 1996, 1997) or the widely used PHD method (Rost and Sander, 1993), although in principle any available secondary structure prediction method could be used. A new alignment is then constructed, now using the corresponding secondary structure.

The initial alignment is constructed using a default residue exchange matrix (e.g. the BLOSUM62 matrix) and related gap penalties (Henikoff and Henikoff, 1992). Secondary structure prediction results in a tentative secondary structure for each sequence when a single sequence-based method is used, or in a single secondary structure when an MSA-reliant method is used. PRALINE then interchangeably uses three secondary structure-specific residue exchange matrices (Lüthy et al., 1994) and associated gap penalties. The residue exchange weights for matched sequence positions with identical secondary structure states are taken from the corresponding secondary structure-specific matrix, while matched sequence positions with non-identical secondary structure states are assigned the corresponding value from the default exchange matrix, e.g. the BLOSUM62 matrix (Heringa, 2002). If the PHD prediction method is used, this strategy can be iterated because PHD depends on the alignment: each iteration produces a better MSA that is passed on to the next iteration, guiding a more accurate secondary structure prediction, which in turn guides the alignment, and so on. Because the iteration cycles are supervised, convergence, divergence or a limit cycle is detected and reported back to the user.

In the remainder of this paper, we introduce the online server for the MSA method PRALINE. PRALINE is fully customisable and, with appropriate use of its options, can perform as well as or better than the current leading method T-Coffee and other popular methods such as ClustalW and Dialign (Morgenstern, 1999) on benchmarking standards such as BAliBASE (Thompson et al., 1999a,b). It has also been shown to perform much better on specific biological examples, such as the alignment of the flavodoxin family members (Heringa, 1999), where the other methods struggle. The most important aspect of PRALINE is that it allows the user to use different settings to optimise an alignment. As a result, although PRALINE can still be run automatically with its default settings, it allows purposeful tweaking and optimisation of alignment parameters for specific problems.

5. Online accessibility

The PRALINE server is accessible on the IBIVU website at the Free University of Amsterdam (URL: vu.nl/programs/pralinewww/) or at the mirror site on the Department of Mathematical Biology server at the National Institute of Medical Research in London (URL: vsimoss/pralinewww/).

6. The PRALINE server

The PRALINE server aims to provide both non-specialist and specialist users with a fast and informative approach to aligning protein sequences. We provide online help sections for each of the parameters that can be set in PRALINE, containing background information and examples, and an online documentation section describing how PRALINE uses this information.

Fig. 1. The PRALINE server standard user interface. (a) Text area for FASTA or PIR sequences, (b) path for uploading a FASTA or PIR file, (c) submit job for default run, (d) gap penalties and amino acid exchange weights matrix selection, (e) alignment method selection, (f) secondary structure information (no iteration at present), (g) select tree representation, (h) select user-defined colour scheme, (i) select final alignment file format.

The standard user interface

The standard user interface is targeted mainly towards non-specialist users. The sequences to be aligned must be in FASTA (Pearson, 1999) or PIR (Barker et al., 2000) format and can either be entered manually in the text field provided (see Fig. 1a) or uploaded as a file (see Fig. 1b). PRALINE can be run using its default settings, which perform a single global alignment of the sequences with a gap opening penalty of 12.0, a gap extension penalty of 1.0 and the amino acid substitution matrix BLOSUM62. Otherwise, a help section describes how the gap penalties work and gives some example combinations for standard amino acid substitution matrices. At present, the amino acid substitution matrices available are PAM250 (Dayhoff et al., 1983), BLOSUM50, BLOSUM62 (Henikoff and Henikoff, 1992) and GON250 (Gonnet et al., 1992). There is a help section to aid the choice of the most suitable matrix for the type of sequences the user wants to align (see Fig. 1d).

The PRALINE server provides three different alignment optimisation strategies: global, global with profile pre-processing and global with local profile pre-processing (see Fig. 1e). The profile pre-processing threshold values for the latter two methods are alignment-dependent, and it is therefore up to the user to decide on an optimal value. All pairwise scores are saved in a list that is available on the results page after an initial run. It is therefore sensible to first run an alignment using a threshold value of 0, which will include all sequences, then choose an optimal threshold value from the score list on the results page and re-run the alignment using that threshold value.
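
The threshold-selection workflow described above can be sketched as follows. This is a hedged illustration, not the server's code: the function `partners_above`, the sequence names and the scores are all hypothetical, and the only rule taken from the text is that a sequence joins a pre-alignment when its pairwise score is higher than the threshold.

```python
def partners_above(scores, threshold):
    """Map each sequence to the partners whose pairwise score exceeds threshold.

    `scores` maps an unordered pair (name_a, name_b) to a similarity score.
    The strict '>' mirrors the "higher than a user-specified threshold" rule.
    """
    names = sorted({name for pair in scores for name in pair})
    partners = {name: [] for name in names}
    for (a, b), score in scores.items():
        if score > threshold:
            partners[a].append(b)
            partners[b].append(a)
    return {name: sorted(found) for name, found in partners.items()}


# Hypothetical pairwise score list, as might be read off a results page.
scores = {
    ("seqA", "seqB"): 42.0,
    ("seqA", "seqC"): 11.5,
    ("seqB", "seqC"): 9.0,
}

# Inspect how pre-alignment membership shrinks as the threshold rises.
for threshold in (0.0, 10.0, 20.0):
    print(threshold, partners_above(scores, threshold))
```

Scanning a few candidate values in this way shows at which threshold distant sequences start to drop out of each pre-alignment, which is the information needed to pick a value for the second run.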
If a negative threshold value (-x) is used, the threshold scores are weighted each time according to the sequence lengths; otherwise, sequence length is not taken into consideration. Heringa (2002) recommends a magnitude of 9.5 for the length-dependent threshold value.

Fig. 2. The PRALINE server advanced user interface. (a) Text area for FASTA or PIR sequences, (b) path for uploading a FASTA or PIR file, (c) command line for PRALINE options, (d) path for uploading user-defined amino acid exchange weights matrix, (e) select user-defined colour scheme, (f) complete PRALINE options list.

In addition, the two profile pre-processing methods provide iteration capabilities, from 0 to 10 iterations. Finally, when a profile pre-processing method is used, PRALINE produces the alignment together with reliability scores for each amino acid position, as well as an average reliability for each alignment position, at each iteration. The reliability scores are represented in the position reliability colour scheme.

PRALINE can use either PREDATOR or PHD to predict the secondary structure of the input sequences, but not both together. It is also possible to search the PDB (Berman et al., 2000) for 3D structure information on the input sequences and to use the DSSP-derived secondary structure for the alignment. If both DSSP and a prediction method are selected, predictions will only be made for the sequences that do not have a PDB file (see Fig. 1f). The predicted structures are represented in the secondary structure colour scheme (vide infra).

PRALINE can also provide the grouping of the sequences on which the alignment is based as a dendrogram at each iteration (see Fig. 1g). The tree is available as one of the viewing formats on the results page (vide infra). PRALINE can currently save the final alignment to a file in either MSF (Genetics Computer Group, 1993) or FASTA format for possible further use (see Fig. 1h).

The advanced user interface

The advanced user interface is targeted mainly towards specialist users. As in the standard user interface, the input sequences must be in FASTA or PIR format and can either be entered in the text field provided or uploaded as a file (see Fig. 2a,b). Instead of selectable options, the advanced user interface has a command line, so that the user can manually enter more options than are provided in the standard interface (see Fig. 2c).
In addition, the user can supply a custom amino acid substitution matrix, which is uploaded in the same way as an input sequence file (see Fig. 2d). A sample amino acid substitution matrix is available for viewing in the format that PRALINE can read. Finally, the user can consult the reference options table, which lists all the options currently available in PRALINE with a short description of each option and the restrictions on the different combinations (see Fig. 2e).

Alignment representation methods: the results page

When a PRALINE job is submitted, the user is presented with a holding page that refreshes automatically and displays the results page once the job is complete.

Fig. 3. The results page headers.

Fig. 4. The default colour schemes.

Fig. 5. The user-defined colour table (left) and alignment representation (right).

The results page contains various parts depending on the options selected (see Fig. 3). Firstly, if the number of iterations selected is greater than 0, a subtitle informs the user which iteration cycle's results are presented on the page. The alignment from each iteration cycle is presented on a separate page and is accessible through the corresponding links. The page also reports the total time taken for the process to complete, whether all the iterations were completed or whether they halted due to alignment convergence or limit cycle convergence, and which iteration was the last. Secondly, if profile pre-processing is selected, the user has the option of viewing the profile pre-processing scores for all pairwise alignments in order to derive an optimum cut-off value. Finally, if selected, there is a link to download the alignment file in the selected format (MSF or FASTA) and to view the raw PRALINE output data.

Colour schemes

The default colour schemes are based on residue type, conservation by alignment position, reliability by alignment position and position average reliability, hydrophobicity and, finally, secondary structure (see Fig. 4). Each scheme has a short explanation of how to interpret the colours and a colour reference key at the top of the alignment. The default representation is the conservation scheme. Residue-specific colours follow the colouring scheme of ClustalX (Thompson et al., 1997), and hydrophobicity scaling has been assigned according to Eisenberg et al. (1984). The reliability colours are only available if profile pre-processing methods have been used. The secondary structure representation is in three states (H-helix, E-strand and blank-other) and is only available if secondary structure has been used to guide the alignment. Apart from the five default colour schemes, we also provide a user-defined colour scheme.
The user-defined colour scheme is optional. It enables the user to select from a table of eight pre-set colours and assign any of them to one or more amino acids, in any combination desired for viewing (see Fig. 5). This is particularly useful when a specific position or motif needs to stand out in the alignment, or when specific amino acids need to be highlighted for illustrative purposes.

7. Caveats

The PRALINE server has some limitations that need to be clear to the user. Firstly, PRALINE is not a DNA alignment program: it does not accept DNA sequences as input, nor does it translate them into protein. Secondly, profile pre-processing, secondary structure prediction and iteration greatly improve alignment quality and information feedback, but can make PRALINE slow, although a parallelised version has been made available (Kleinjung et al., 2002). Finally, all alignment methods will produce some sort of alignment, whether biologically meaningful or not. However, the ability to manually optimise parameters and the position reliability scores provided by PRALINE allow the user to make a reasonable assessment of the alignment quality and to choose the best resulting alignment.

8. Concluding remarks

The PRALINE server offers some unique features that make it a versatile and useful alignment tool. It provides the user with feedback about the quality of the alignment produced in an iterative scenario and, through its fully customisable parameters, enables the user to use this information to optimise the alignment. It also provides more than one alignment strategy and can use secondary structure input, thus covering a wide range of alignment cases. In addition, the multiple representations of the alignment offer a convenient and diverse way of illustrating the alignment according to the user's needs.
Apart from being an accurate method, the PRALINE server is a toolbox for protein sequence alignment that gives users the opportunity to learn more about their alignment problem, the means to find the best possible solution, and a way to present it in a more detailed and educational form.

Acknowledgements

This project was funded by the generous contributions of the Medical Research Council and the Free University Amsterdam.

References

Barker, W.C., Garavelli, J.S., Huang, H., McGarvey, P.B., Orcutt, B.C., Srinivasarao, G.Y., Xiao, C., Yeh, L.S., Ledley, R.S., Janda, J.F., Pfeiffer, F., Mewes, H.W., Tsugita, A., Wu, C., 2000. The Protein Information Resource (PIR). Nucleic Acids Res. 28.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E., 2000. The Protein Data Bank. Nucleic Acids Res. 28.
Chothia, C., Lesk, A.M., 1986. The relationship between the divergence of sequence and structure in proteins. EMBO J. 5.
Dayhoff, M.O., Barker, W.C., Hunt, L.T., 1983. Establishing homologies in protein sequences. Methods Enzymol. 91.
Eisenberg, D., Schwarz, E., Komaromy, M., Wall, R., 1984. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J. Mol. Biol. 179 (1).
Feng, D.F., Doolittle, R.F., 1987. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25.
Frishman, D., Argos, P., 1996. Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence. Protein Eng. 9 (2).
Frishman, D., Argos, P., 1997. Seventy-five percent accuracy in protein secondary structure prediction. Proteins 27 (3).
Genetics Computer Group, 1993. Program manual for the GCG package, version 8, 575 Science Drive, Madison, WI.

Gonnet, G.H., Cohen, M.A., Benner, S.A., 1992. Exhaustive matching of the entire protein sequence database. Science 256 (5062).
Henikoff, S., Henikoff, J.G., 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89.
Heringa, J., 1999. Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem. 23.
Heringa, J., 2000. Computational methods for protein secondary structure prediction using multiple sequence alignment. Curr. Protein Pept. Sci. 1.
Heringa, J., 2002. Local weighting schemes for protein multiple sequence alignments. Comput. Chem. 26.
Heringa, J., Frishman, D., Argos, P., 1997. Computational Methods Relating Sequence and Structure. In: Protein: A Comprehensive Treatise, Ch. 4, vol. 1. JAI Press Inc, Greenwich, CT.
Hogeweg, P., Hesper, B., 1984. The alignment of sets of sequences and the construction of phyletic trees: an integrated method. J. Mol. Evol. 20.
Holmes, I., 2003. Using guide trees to construct multiple-sequence evolutionary HMMs. Bioinformatics 19 (Suppl. 1), I147-I157.
Kabsch, W., Sander, C., 1983. A dictionary of protein secondary structure: pattern recognition of hydrogen bonded and geometrical features. Biopolymers 22.
Kleinjung, J., Douglas, N., Heringa, J., 2002. Parallelized multiple alignment. Bioinformatics 18 (9).
Lüthy, R., Xenarios, I., Bucher, P., 1994. Improving the sensitivity of the sequence profile method. Protein Sci. 3.
Morgenstern, B., 1999. DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15.
Notredame, C., 2002. Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 3 (1).
Notredame, C., Higgins, D.G., Heringa, J., 2000. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302.
Pearson, W.R., 1999. Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol. 132.
Rost, B., 1999. Twilight zone of protein sequence alignment. Protein Eng. 12.
Rost, B., Sander, C., 1993. Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232.
Sander, C., Schneider, R., 1991. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins: Struct. Funct. Genet. 9.
Simossis, V.A., Kleinjung, J., Heringa, J., 2003. An overview of multiple sequence alignment. In: Current Protocols in Bioinformatics, Wiley & Sons Inc., in press.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties, and weight matrix choices. Nucleic Acids Res. 22 (22).
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 24.
Thompson, J.D., Plewniak, F., Poch, O., 1999a. A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 27 (13).
Thompson, J.D., Plewniak, F., Poch, O., 1999b. BAliBASE: a benchmark alignments database for the evaluation of multiple sequence alignment programs. Bioinformatics 15.


More information

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT 5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT.03.239 03.10.2012 ALIGNMENT Alignment is the task of locating equivalent regions of two or more sequences to maximize their similarity. Homology:

More information

2 Dean C. Adams and Gavin J. P. Naylor the best three-dimensional ordination of the structure space is found through an eigen-decomposition (correspon

2 Dean C. Adams and Gavin J. P. Naylor the best three-dimensional ordination of the structure space is found through an eigen-decomposition (correspon A Comparison of Methods for Assessing the Structural Similarity of Proteins Dean C. Adams and Gavin J. P. Naylor? Dept. Zoology and Genetics, Iowa State University, Ames, IA 50011, U.S.A. 1 Introduction

More information

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Naoto Morikawa (nmorika@genocript.com) October 7, 2006. Abstract A protein is a sequence

More information

Multiple Sequence Alignment: A Critical Comparison of Four Popular Programs

Multiple Sequence Alignment: A Critical Comparison of Four Popular Programs Multiple Sequence Alignment: A Critical Comparison of Four Popular Programs Shirley Sutton, Biochemistry 218 Final Project, March 14, 2008 Introduction For both the computational biologist and the research

More information

Optimization of the Sliding Window Size for Protein Structure Prediction

Optimization of the Sliding Window Size for Protein Structure Prediction Optimization of the Sliding Window Size for Protein Structure Prediction Ke Chen* 1, Lukasz Kurgan 1 and Jishou Ruan 2 1 University of Alberta, Department of Electrical and Computer Engineering, Edmonton,

More information

Introducing Hippy: A visualization tool for understanding the α-helix pair interface

Introducing Hippy: A visualization tool for understanding the α-helix pair interface Introducing Hippy: A visualization tool for understanding the α-helix pair interface Robert Fraser and Janice Glasgow School of Computing, Queen s University, Kingston ON, Canada, K7L3N6 {robert,janice}@cs.queensu.ca

More information

Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and

More information

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement

More information

Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)

Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Alignment principles and homology searching using (PSI-)BLAST Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) http://ibivu.cs.vu.nl Bioinformatics Nothing in Biology makes sense except in

More information

Introduction to Bioinformatics Online Course: IBT

Introduction to Bioinformatics Online Course: IBT Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple

More information

Copyright 2000 N. AYDIN. All rights reserved. 1

Copyright 2000 N. AYDIN. All rights reserved. 1 Introduction to Bioinformatics Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Multiple Sequence Alignment Outline Multiple sequence alignment introduction to msa methods of msa progressive global alignment

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

More information

Large-Scale Genomic Surveys

Large-Scale Genomic Surveys Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction

More information

ChemAlign: Biologically Relevant Multiple Sequence Alignment Using Physicochemical Properties

ChemAlign: Biologically Relevant Multiple Sequence Alignment Using Physicochemical Properties Brigham Young University BYU ScholarsArchive All Faculty Publications 2009-11-01 ChemAlign: Biologically Relevant Multiple Sequence Alignment Using Physicochemical Properties Hyrum Carroll hyrumcarroll@gmail.com

More information

Scoring Matrices. Shifra Ben-Dor Irit Orr

Scoring Matrices. Shifra Ben-Dor Irit Orr Scoring Matrices Shifra Ben-Dor Irit Orr Scoring matrices Sequence alignment and database searching programs compare sequences to each other as a series of characters. All algorithms (programs) for comparison

More information

Some Problems from Enzyme Families

Some Problems from Enzyme Families Some Problems from Enzyme Families Greg Butler Department of Computer Science Concordia University, Montreal www.cs.concordia.ca/~faculty/gregb gregb@cs.concordia.ca Abstract I will discuss some problems

More information

E-SICT: An Efficient Similarity and Identity Matrix Calculating Tool

E-SICT: An Efficient Similarity and Identity Matrix Calculating Tool 2014, TextRoad Publication ISSN: 2090-4274 Journal of Applied Environmental and Biological Sciences www.textroad.com E-SICT: An Efficient Similarity and Identity Matrix Calculating Tool Muhammad Tariq

More information

Statistical Distributions of Optimal Global Alignment Scores of Random Protein Sequences

Statistical Distributions of Optimal Global Alignment Scores of Random Protein Sequences BMC Bioinformatics This Provisional PDF corresponds to the article as it appeared upon acceptance. The fully-formatted PDF version will become available shortly after the date of publication, from the

More information

Advanced topics in bioinformatics

Advanced topics in bioinformatics Feinberg Graduate School of the Weizmann Institute of Science Advanced topics in bioinformatics Shmuel Pietrokovski & Eitan Rubin Spring 2003 Course WWW site: http://bioinformatics.weizmann.ac.il/courses/atib

More information

Probalign: Multiple sequence alignment using partition function posterior probabilities

Probalign: Multiple sequence alignment using partition function posterior probabilities Sequence Analysis Probalign: Multiple sequence alignment using partition function posterior probabilities Usman Roshan 1* and Dennis R. Livesay 2 1 Department of Computer Science, New Jersey Institute

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013 Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

More information

Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2013 Week3: Blast Algorithm, theory and practice Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and Systems Biology

More information

Better Bond Angles in the Protein Data Bank

Better Bond Angles in the Protein Data Bank Better Bond Angles in the Protein Data Bank C.J. Robinson and D.B. Skillicorn School of Computing Queen s University {robinson,skill}@cs.queensu.ca Abstract The Protein Data Bank (PDB) contains, at least

More information

Overview Multiple Sequence Alignment

Overview Multiple Sequence Alignment Overview Multiple Sequence Alignment Inge Jonassen Bioinformatics group Dept. of Informatics, UoB Inge.Jonassen@ii.uib.no Definition/examples Use of alignments The alignment problem scoring alignments

More information

Protein Secondary Structure Prediction

Protein Secondary Structure Prediction Protein Secondary Structure Prediction Doug Brutlag & Scott C. Schmidler Overview Goals and problem definition Existing approaches Classic methods Recent successful approaches Evaluating prediction algorithms

More information

Local Alignment Statistics

Local Alignment Statistics Local Alignment Statistics Stephen Altschul National Center for Biotechnology Information National Library of Medicine National Institutes of Health Bethesda, MD Central Issues in Biological Sequence Comparison

More information

Do Aligned Sequences Share the Same Fold?

Do Aligned Sequences Share the Same Fold? J. Mol. Biol. (1997) 273, 355±368 Do Aligned Sequences Share the Same Fold? Ruben A. Abagyan* and Serge Batalov The Skirball Institute of Biomolecular Medicine Biochemistry Department NYU Medical Center

More information

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17: Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:50 5001 5 Multiple Sequence Alignment The first part of this exposition is based on the following sources, which are recommended reading:

More information

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics. Dept. of Computational Biology & Bioinformatics Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

More information

Quantifying sequence similarity

Quantifying sequence similarity Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

More information

Ch. 9 Multiple Sequence Alignment (MSA)

Ch. 9 Multiple Sequence Alignment (MSA) Ch. 9 Multiple Sequence Alignment (MSA) - gather seqs. to make MSA - doing MSA with ClustalW - doing MSA with Tcoffee - comparing seqs. that cannot align Introduction - from pairwise alignment to MSA -

More information

Motif Prediction in Amino Acid Interaction Networks

Motif Prediction in Amino Acid Interaction Networks Motif Prediction in Amino Acid Interaction Networks Omar GACI and Stefan BALEV Abstract In this paper we represent a protein as a graph where the vertices are amino acids and the edges are interactions

More information

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded

More information

Biologically significant sequence alignments using Boltzmann probabilities

Biologically significant sequence alignments using Boltzmann probabilities Biologically significant sequence alignments using Boltzmann probabilities P. Clote Department of Biology, Boston College Gasson Hall 416, Chestnut Hill MA 02467 clote@bc.edu May 7, 2003 Abstract In this

More information

Multiple Alignment using Hydrophobic Clusters : a tool to align and identify distantly related proteins

Multiple Alignment using Hydrophobic Clusters : a tool to align and identify distantly related proteins Multiple Alignment using Hydrophobic Clusters : a tool to align and identify distantly related proteins J. Baussand, C. Deremble, A. Carbone Analytical Genomics Laboratoire d Immuno-Biologie Cellulaire

More information

BIOINFORMATICS ORIGINAL PAPER doi: /bioinformatics/btm017

BIOINFORMATICS ORIGINAL PAPER doi: /bioinformatics/btm017 Vol. 23 no. 7 2007, pages 802 808 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btm017 Sequence analysis PROMALS: towards accurate multiple sequence alignments of distantly related proteins

More information

Comparing whole genomes

Comparing whole genomes BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will

More information

Protein sequence alignment with family-specific amino acid similarity matrices

Protein sequence alignment with family-specific amino acid similarity matrices TECHNICAL NOTE Open Access Protein sequence alignment with family-specific amino acid similarity matrices Igor B Kuznetsov Abstract Background: Alignment of amino acid sequences by means of dynamic programming

More information

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB) Protein structure databases; visualization; and classifications 1. Introduction to Protein Data Bank (PDB) 2. Free graphic software for 3D structure visualization 3. Hierarchical classification of protein

More information

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)

More information

PROMALS3D web server for accurate multiple protein sequence and structure alignments

PROMALS3D web server for accurate multiple protein sequence and structure alignments W30 W34 Nucleic Acids Research, 2008, Vol. 36, Web Server issue Published online 24 May 2008 doi:10.1093/nar/gkn322 PROMALS3D web server for accurate multiple protein sequence and structure alignments

More information

FlexSADRA: Flexible Structural Alignment using a Dimensionality Reduction Approach

FlexSADRA: Flexible Structural Alignment using a Dimensionality Reduction Approach FlexSADRA: Flexible Structural Alignment using a Dimensionality Reduction Approach Shirley Hui and Forbes J. Burkowski University of Waterloo, 200 University Avenue W., Waterloo, Canada ABSTRACT A topic

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology

More information

AlignmentsGrow,SecondaryStructurePrediction Improves

AlignmentsGrow,SecondaryStructurePrediction Improves PROTEINS: Structure, Function, and Genetics 46:197 205 (2002) AlignmentsGrow,SecondaryStructurePrediction Improves DariuszPrzybylskiandBurkhardRost* DepartmentofBiochemistryandMolecularBiophysics,ColumbiaUniversity,NewYork,NewYork

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program) Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program) Course Name: Structural Bioinformatics Course Description: Instructor: This course introduces fundamental concepts and methods for structural

More information

Multiple sequence alignment

Multiple sequence alignment Multiple sequence alignment Irit Orr Shifra Ben-Dor An example of Multiple Alignment VTISCTGSSSNIGAG-NHVKWYQQLPGQLPG VTISCTGTSSNIGS--ITVNWYQQLPGQLPG LRLSCSSSGFIFSS--YAMYWVRQAPG LSLTCTVSGTSFDD--YYSTWVRQPPG

More information

Phylogenetic analyses. Kirsi Kostamo

Phylogenetic analyses. Kirsi Kostamo Phylogenetic analyses Kirsi Kostamo The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among different groups (individuals, populations, species,

More information

Protein function prediction based on sequence analysis

Protein function prediction based on sequence analysis Performing sequence searches Post-Blast analysis, Using profiles and pattern-matching Protein function prediction based on sequence analysis Slides from a lecture on MOL204 - Applied Bioinformatics 18-Oct-2005

More information

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Introduction to Comparative Protein Modeling. Chapter 4 Part I Introduction to Comparative Protein Modeling Chapter 4 Part I 1 Information on Proteins Each modeling study depends on the quality of the known experimental data. Basis of the model Search in the literature

More information

BIO 285/CSCI 285/MATH 285 Bioinformatics Programming Lecture 8 Pairwise Sequence Alignment 2 And Python Function Instructor: Lei Qian Fisk University

BIO 285/CSCI 285/MATH 285 Bioinformatics Programming Lecture 8 Pairwise Sequence Alignment 2 And Python Function Instructor: Lei Qian Fisk University BIO 285/CSCI 285/MATH 285 Bioinformatics Programming Lecture 8 Pairwise Sequence Alignment 2 And Python Function Instructor: Lei Qian Fisk University Measures of Sequence Similarity Alignment with dot

More information

Evaluation Measures of Multiple Sequence Alignments. Gaston H. Gonnet, *Chantal Korostensky and Steve Benner. Institute for Scientic Computing

Evaluation Measures of Multiple Sequence Alignments. Gaston H. Gonnet, *Chantal Korostensky and Steve Benner. Institute for Scientic Computing Evaluation Measures of Multiple Sequence Alignments Gaston H. Gonnet, *Chantal Korostensky and Steve Benner Institute for Scientic Computing ETH Zurich, 8092 Zuerich, Switzerland phone: ++41 1 632 74 79

More information

Computational Analysis of the Fungal and Metazoan Groups of Heat Shock Proteins

Computational Analysis of the Fungal and Metazoan Groups of Heat Shock Proteins Computational Analysis of the Fungal and Metazoan Groups of Heat Shock Proteins Introduction: Benjamin Cooper, The Pennsylvania State University Advisor: Dr. Hugh Nicolas, Biomedical Initiative, Carnegie

More information

Similarity searching summary (2)

Similarity searching summary (2) Similarity searching / sequence alignment summary Biol4230 Thurs, February 22, 2016 Bill Pearson wrp@virginia.edu 4-2818 Pinn 6-057 What have we covered? Homology excess similiarity but no excess similarity

More information

Pairwise sequence alignments

Pairwise sequence alignments Pairwise sequence alignments Volker Flegel VI, October 2003 Page 1 Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs VI, October

More information

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment Molecular Modeling 2018-- Lecture 7 Homology modeling insertions/deletions manual realignment Homology modeling also called comparative modeling Sequences that have similar sequence have similar structure.

More information

Pairwise sequence alignments. Vassilios Ioannidis (From Volker Flegel )

Pairwise sequence alignments. Vassilios Ioannidis (From Volker Flegel ) Pairwise sequence alignments Vassilios Ioannidis (From Volker Flegel ) Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs Importance

More information

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding

More information

Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models

Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm Alignment scoring schemes and theory: substitution matrices and gap models 1 Local sequence alignments Local sequence alignments are necessary

More information

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics Jianlin Cheng, PhD Department of Computer Science University of Missouri, Columbia

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Jianlin Cheng, PhD Department of Computer Science Informatics Institute 2011 Topics Introduction Biological Sequence Alignment and Database Search Analysis of gene expression

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture

More information

Sequence Analysis '17- lecture 8. Multiple sequence alignment

Sequence Analysis '17- lecture 8. Multiple sequence alignment Sequence Analysis '17- lecture 8 Multiple sequence alignment Ex5 explanation How many random database search scores have e-values 10? (Answer: 10!) Why? e-value of x = m*p(s x), where m is the database

More information

Segment-based scores for pairwise and multiple sequence alignments

Segment-based scores for pairwise and multiple sequence alignments From: ISMB-98 Proceedings. Copyright 1998, AAAI (www.aaai.org). All rights reserved. Segment-based scores for pairwise and multiple sequence alignments Burkhard Morgenstern 1,*, William R. Atchley 2, Klaus

More information

Introduction to Structural Bioinformatics

Introduction to Structural Bioinformatics arxiv:1801.09442v1 [q-bio.bm] 29 Jan 2018 Introduction to Structural Bioinformatics K. Anton Feenstra Sanne Abeln Centre for Integrative Bioinformatics (IBIVU), and Department of Computer Science, Vrije

More information

Analysis and Prediction of Protein Structure (I)

Analysis and Prediction of Protein Structure (I) Analysis and Prediction of Protein Structure (I) Jianlin Cheng, PhD School of Electrical Engineering and Computer Science University of Central Florida 2006 Free for academic use. Copyright @ Jianlin Cheng

More information

Variable-length Intervals in Homology Search

Variable-length Intervals in Homology Search Variable-length Intervals in Homology Search Abhijit Chattaraj Hugh E. Williams School of Computer Science and Information Technology RMIT University, GPO Box 2476V Melbourne, Australia {abhijit,hugh}@cs.rmit.edu.au

More information

Computational Biology

Computational Biology Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,

More information

Protein Structures: Experiments and Modeling. Patrice Koehl

Protein Structures: Experiments and Modeling. Patrice Koehl Protein Structures: Experiments and Modeling Patrice Koehl Structural Bioinformatics: Proteins Proteins: Sources of Structure Information Proteins: Homology Modeling Proteins: Ab initio prediction Proteins:

More information

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1 Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with

More information

2 Spial. Chapter 1. Chapter 2 Chapter 3 Chapter 4 Chapter 5 Chapter 6. Pathway level. Atomic level. Cellular level. Proteome level.

2 Spial. Chapter 1. Chapter 2 Chapter 3 Chapter 4 Chapter 5 Chapter 6. Pathway level. Atomic level. Cellular level. Proteome level. 2 Spial Chapter Chapter 2 Chapter 3 Chapter 4 Chapter 5 Chapter 6 Spial Quorum sensing Chemogenomics Descriptor relationships Introduction Conclusions and perspectives Atomic level Pathway level Proteome

More information