Study and Implementation of Various Techniques Involved in DNA and Protein Sequence Analysis
Kumud Joseph Kujur, Sumit Pal Singh, O.P. Vyas, Ruchir Bhatia, Varun Singh*
Indian Institute of Information Technology - Allahabad, India
* rickky27@rediffmail.com

ABSTRACT

Various techniques and algorithms exist for pairwise sequence alignment, multiple sequence alignment and entropy analysis. This paper is the result of studying and implementing these techniques for DNA-to-protein translation, sequence alignment and comparison, multiple sequence alignment and entropy analysis.

DNA-to-protein translation is performed by detecting the open reading frame (ORF) in a DNA coding sequence (CDS) given as input. The sequence is converted into amino acids by reading three nucleotides (one codon) at a time; each codon specifies an amino acid. Three reading frames are obtained by shifting the starting position by one nucleotide at a time, and the other three frames are obtained in the same way from the complementary sequence. The codons are checked for start and stop codons, which delimit the possible proteins; the longest of these gives the final protein.

In pairwise sequence alignment and comparison, the input query sequence, which is a primary protein sequence, is compared with the subject sequences that exist in the database. Local alignment and global alignment are the two techniques used. The Smith-Waterman algorithm was implemented for local alignment and the Needleman-Wunsch algorithm for global alignment. Various scoring schemes can be used, including PAM and BLOSUM. Both techniques are discussed in detail later.

Pairwise comparison is fundamental to sequence analysis. However, analysis of groups of sequences that form gene families requires the ability to make connections between more than two members of the group, in order to reveal subtle conserved family characteristics.
The process of multiple alignment can be regarded as an exercise in enhancing the signal-to-noise ratio within a set of sequences, which ultimately facilitates the elucidation of biologically significant motifs.

Entropy analysis is carried out to detect the coding and noncoding regions in a DNA sequence. The sliding window method and the recursive segmentation method are applied to calculate the Jensen-Shannon divergence, which helps in distinguishing the homogeneous segments in a heterogeneous DNA sequence.

Keywords: Pairwise alignment, Multiple alignment, Phylogeny, Entropy, DNA sequences.

PROBLEM DEFINITION AND SOLUTION

1. DNA TO PROTEIN TRANSLATION
The conversion of DNA into protein is carried out through replication, transcription and translation.

[Figure: Transcription and Translation]

After transcription, the next step is to remove the unnecessary parts, called introns, so that the useful parts, the exons, are concatenated. The result is the coding sequence (CDS). Starting from the first nucleotide of the CDS, we form triplets (codons) up to the end. Given a CDS, and knowing the genetic code, it is possible to translate the DNA into protein by looking up successive codons in a genetic code table. However, this is only one case; the other cases arise when the starting nucleotide is shifted by one and by two positions, and similarly for the complement. Thus there are six frames in all, giving six different candidate proteins. The correct protein is taken to be the longest one among these six frames that starts at a start codon and ends at a stop codon. [1]

Detecting open reading frames: The ORF is normally deemed to be the longest frame uninterrupted by a stop codon. Finding the end of an ORF is easier than finding its beginning. Usually, the initial codon in the CDS is that for methionine (ATG); but methionine is also a common residue within the CDS, so its presence is not an absolute indicator of ORF initiation. Several features may be used as indicators of potential protein-coding regions in DNA. One of these is sufficient ORF length (based on the premise that long ORFs rarely occur by chance). Recognition of flanking Kozak sequences (CCGCCATGG) may also be helpful in pinpointing the start of the CDS. [1]

Understanding the effect of introns and exons: The genes of eukaryotes are made up of regions that contribute to the CDS, known as exons, and regions that do not, known as introns. One consequence of the presence of exons and introns in eukaryotic genes is that potential gene products can be of different lengths, because not all exons may be represented in the final transcribed mRNA.
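The six-frame procedure described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the standard genetic code, an input containing only A, C, G and T, and reverse frames read from the reverse complement. The function names (`six_frames`, `longest_orf`) are hypothetical, stop codons are written as `#` as in the paper's notation, and methionine is left as `M` rather than `*`.

```python
import re

# Standard genetic code, codons enumerated in T, C, A, G order.
# Stop codons are written as '#'.
BASES = "TCAG"
AMINO = "FFLLSSSSYY##CC#WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Complement every base and reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset):
    """Translate complete codons starting at the given offset (0, 1 or 2)."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the three forward and three reverse translations."""
    rc = reverse_complement(dna)
    frames = {}
    for k in range(3):
        frames[f"Forward {k}"] = translate(dna, k)
        frames[f"Reverse {k}"] = translate(rc, k)
    return frames


def longest_orf(frames):
    """Longest stretch that starts at M and ends at a stop, over all frames."""
    best = ("", "")
    for name, protein in frames.items():
        for match in re.finditer(r"M[^#]*#", protein):
            if len(match.group()) > len(best[1]):
                best = (name, match.group())
    return best


if __name__ == "__main__":
    # Query sequence taken from the worked example in this section.
    frames = six_frames("ACATGAGTCGTACGTAGCTGACTGATCGT")
    for name, protein in frames.items():
        print(name, protein)
    print(longest_orf(frames))  # ('Forward 2', 'MSRT#')
```

Selecting the longest start-to-stop stretch mirrors the rule stated above; a practical ORF finder would normally add a minimum-length cutoff and further signals such as the Kozak sequence.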
The whole process involved in DNA-to-protein translation can be illustrated as follows:

Query sequence (DNA): ACATGAGTCGTACGTAGCTGACTGATCGT

Six-frame amino-acid translation:
Forward 0: T#VVRS#LI
Forward 1: HESYVAD#S
Forward 2: *SRT#LTDR
Reverse 0: TISQLRTTH
Reverse 1: RSVSYVRL*
Reverse 2: DQSATYDSC

The start codon is represented by * and the stop codon by #. From the six frames above we can conclude that the protein is generated from the Forward 2 translation, and the possible protein is *SRT#, with length 5.

After obtaining this possible protein, the technique of alignment and comparison is applied to it with respect to the other protein sequences present in the databases. If we get an identical match or high
similarity, then the protein produced is related to a known gene family. If not, the translated protein sequence suggests the existence of a new gene family.

2. PAIRWISE SEQUENCE ALIGNMENT

Pairwise alignment is a fundamental process in sequence analysis, carried out to find the relationship between any two sequences (protein, DNA or RNA) based on their sequence properties. This section describes the comparison of two sequences, a query sequence (the properties of which need to be determined) and a subject sequence (the properties of which are already known), by searching for series of individual characters or character patterns that occur in the same order in both sequences. This helps in identifying any similarity (in functionality) or evolutionary relationship (homology) between the query sequence and a family of known genes. A correct alignment is needed in order to find which segment of the gene has been altered (for example by point mutation, insertion, deletion or duplication).

There are two types of sequence alignment technique, local and global. In global alignment, an attempt is made to align the entire sequence, using as many characters as possible, up to both ends of each sequence. Sequences that are quite similar and approximately the same length are suitable candidates for global alignment. In local alignment, stretches of sequence with the highest density of matches are aligned, thus generating one or more islands of matches, or sub-alignments, in the aligned sequences. Local alignments are more suitable for aligning sequences that are similar along some of their length but dissimilar elsewhere, sequences that differ in length, or sequences that share a conserved region or domain. Global alignment thus deals with similarity across the entire length of the sequences, and local alignment with regions of similarity in parts of the sequences (subsequences).
It is important to understand the difference between these two alignments, because sequences are not uniformly similar or identical. There is no point in performing a global alignment on sequences that share only local similarity. Each of these techniques is discussed in detail below.

SIMILARITY AND IDENTITY

Not only identity but also similarity is biologically significant. Many amino acids can be substituted by another with similar chemical properties, and the substituted amino acid remains compatible with protein structure and function. Hence we can also take into account different scoring matrices (e.g. PAM or BLOSUM). These scoring matrices assign different scores to the matches depending on similarity or dissimilarity. Some scoring matrices are superior to others at finding related proteins based on either sequence or structure. BLOSUM matrices take into account the full range of amino acid substitutions in families of related proteins. PAM matrices are based on variation in closely related proteins, extrapolated to produce matrices for more distantly related proteins.

GLOBAL ALIGNMENT:

As already discussed, this alignment technique finds the similarity across the full length. The algorithm used here is the Needleman-Wunsch algorithm (named after the scientists who proposed it), which is based on dynamic programming. The algorithm consists of three main parts, illustrated here for two example sequences.

1) We form a matrix representation of the query sequence and the subject sequence by placing them along the margins of the matrix. This matrix is a unitary matrix, which weights identical elements with value 1 and all others with value 0. The cells can also be scored according to a scoring matrix.
Table: Initial setup for Needleman-Wunsch

2) The next step is to assign a score to all the pathways. Here we start from the bottom right and end at the upper left of the matrix; the matrix can equally be filled in the opposite direction. The scoring (matrix fill) is done as:

\[ M(i,j) = M(i,j) + \max\left[\, M(k,\, j+1),\; M(i+1,\, l) \,\right] \]

where k is an integer greater than i and l is an integer greater than j.

Table: Half way through the second step

3) The final step is to trace back the whole path. The trace back starts from the highest value (in this case the top leftmost element). The alignment is traced
proceeding left to right, top to bottom, choosing the largest numbers available.

Table: Trace the alignment

LOCAL ALIGNMENT:

The Needleman-Wunsch algorithm works well for sequences that show similarity across their full length. However, sequences that are distantly related to each other might show small regions of local similarity rather than similarity across the full length. The Smith-Waterman algorithm handles this problem quite efficiently. It follows the same initial matrix-based technique as the Needleman-Wunsch algorithm. The main difference between the two algorithms is that, in the Smith-Waterman case, each element in the matrix defines the end point of a potential alignment (any element of the matrix can hold the highest value, not necessarily the terminal one). Only minimal changes to the Needleman-Wunsch algorithm are required. These are:

1) A negative score/weight must be given to mismatches; if a negative score would result for a cell, zero is substituted. The score at any matrix point is given by

\[ S_{ij} = \max \begin{cases} S_{i-1,\,j-1} + s(a_i b_j) \\ \max_{x \ge 1} \left( S_{i-x,\,j} - W_x \right) \\ \max_{y \ge 1} \left( S_{i,\,j-y} - W_y \right) \\ 0 \end{cases} \]

where S_{ij} is the score at position i in sequence A and position j in sequence B, s(a_i b_j) is the score for aligning the characters at positions i and j, W_x is the gap penalty for a gap of length x in sequence A and W_y is the penalty for a gap of length y in sequence B.

2) As explained above, the beginning and the end of an optimal path may be found anywhere in the matrix, not only at the endpoints.

In this example the penalty for a mismatch is 0.5 and the gap penalty is 0.
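The two dynamic programming schemes above can be sketched together in Python. This is an illustrative sketch, not the implementation used in the paper: it uses the commonly taught forward recurrence (matrix filled from the top left, trace back from the end) with a linear gap penalty, which differs from the bottom-right fill and length-dependent penalties quoted in this section. The function name `align`, its parameters and the default scores (match 1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def align(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Global (Needleman-Wunsch style) or local (Smith-Waterman style)
    alignment with a linear gap penalty. Integer scores are assumed so
    that the trace back can compare cells exactly."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, n + 1):
            S[i][0] = i * gap
        for j in range(1, m + 1):
            S[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cell = max(S[i - 1][j - 1] + sub(i, j),
                       S[i - 1][j] + gap,
                       S[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # negative scores are replaced by zero
            S[i][j] = cell
            if cell > best:
                best, best_pos = cell, (i, j)

    # Local: start at the highest cell. Global: start at the bottom right.
    i, j = best_pos if local else (n, m)
    score = S[i][j]
    top, bottom = [], []
    while i > 0 or j > 0:
        if local and S[i][j] == 0:
            break  # a local alignment ends where the score drops to zero
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(align("ACGT", "ACT"))                      # global alignment
    print(align("TTACGTT", "GGACGGG", local=True))   # local alignment
```

To score with PAM or BLOSUM instead of simple match/mismatch values, the `sub` helper would be replaced by a lookup in the chosen substitution matrix.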
Table: Smith-Waterman example

GAP AND GAP PENALTY

The inclusion of gaps and gap penalties is necessary in order to obtain the best alignment between any two sequences. Gaps are the result of changes (mutations) occurring in a sequence during evolution, so the task is to allow gaps at the right positions of the sequence to get a meaningful result. A gap penalty combines the gap opening penalty and the gap extension penalty. The total gap penalty in a sequence can be summarized as:

\[ W = (\text{number of gaps opened}) \times g_{\text{open}} + (\text{gap length}) \times g_{\text{ext}} \]

where g_open is the gap opening penalty and g_ext is the gap extension penalty. The values of these penalties are chosen so that they do not disturb the overall balance. If the gap penalty is too high compared with the matrix scores, gaps will never appear in the alignment. On the other hand, if the gap penalty is too low compared with the matrix scores, gaps will appear everywhere in the alignment in order to align as many identical characters as possible.

3. MULTIPLE SEQUENCE ALIGNMENT

Pairwise comparison is fundamental to sequence analysis. However, analysis of groups of sequences that form gene families requires the ability to make connections between more than two members of the group, in order to reveal subtle conserved family characteristics. The process of multiple alignment can be regarded as an exercise in enhancing the signal-to-noise ratio within a set of sequences, which ultimately facilitates the elucidation of biologically significant motifs.

The goal of multiple sequence alignment is to generate a concise, information-rich summary of sequence data in order to inform decision-making on the relatedness of the sequences to a gene family. Sometimes, indeed, multiple alignments may be used to express the dissimilarity between a set of sequences. Alignments should be regarded as models that can be used to test hypotheses.
Just as there is nothing inherently correct or incorrect about any particular pairwise alignment, the same maxim holds for multiple alignments. [2]
Definition of multiple sequence alignment: Here, a small alignment of 5 short sequences (I-V) is presented. The sequences have been arranged so that the most similar residues are brought into vertical register through the use of gaps, while the order of residues in each sequence is preserved.

MULTIPLE SEQUENCE ALIGNMENTS: USES

Just as the alignment of a pair of nucleic acid or protein sequences can reveal whether or not there is an evolutionary relationship between them, so can the alignment of three or more sequences reveal relationships among multiple sequences. A multiple sequence alignment of a set of sequences can show which regions in the set are most alike. In proteins, such regions may represent conserved functional or structural domains. If the structure of one or more members of the alignment is known, it may be possible to predict which amino acids occupy the same spatial position in the other proteins in the alignment. In nucleic acids, such alignments also reveal structural and functional relationships. For example, aligned promoters of a set of similarly regulated genes may reveal consensus binding sites for regulatory proteins.

Consensus: Another use for consensus information retrieved from a multiple sequence alignment is the prediction of specific probes for other members of the same group or family of similar sequences in the same or other organisms. There are both computational and molecular biological applications. Once a consensus pattern has been found, database searching programs may be used to find other sequences with a similar pattern.

MULTIPLE SEQUENCE ALIGNMENT (MSA) AND PHYLOGENETIC ANALYSIS: RELATIONSHIP

Once the MSA has been found, the number or types of changes in the aligned sequence residues may be used for a phylogenetic analysis. The alignment provides a prediction as to which sequence characters correspond.
Each column in the alignment predicts the mutations that occurred at one site during the evolution of the sequence family, as illustrated in Figure 1. Within a column are original characters that were present early, as well as derived characters that appeared later in evolutionary time. In some cases the position is so important for function that mutational changes are not observed. It is these conserved positions that are
useful for producing an alignment. In other cases, the position is less important, and substitutions are observed. Deletions and insertions may also be present in some regions of the alignment. Thus, starting with the alignment, one can hope to dissect the order of appearance of the sequences during evolution.

Seq A   A  .  N  Q  P
Seq B   A  .  N  -- P
Seq C   A  R  Y  Q  P
Seq D   A  .  Y  Q  P

Figure 1: The close relationship between MSA and evolutionary tree construction. Shown is a short section of one MSA of four protein sequences, including conserved and substituted positions, an insertion (of R) and a deletion (of Q).

PROGRESSIVE GLOBAL ALIGNMENT

The pairwise alignment technique can be extended to align multiple sequences. However, the number of sequences that can be aligned in this way is limited, because the number of computational steps and the amount of memory required grow exponentially with the number of sequences to be analyzed. Progressive alignment is the most commonly used method for aligning biological sequences. This heuristic approach is very rapid, requires little memory and offers good performance on relatively well-conserved, homologous sequences.

Description of progressive alignment methods: Progressive alignment builds a multiple alignment from pairwise alignments in three steps:

a) Compute the alignment scores (or distances) between all pairs of sequences.
b) Build a guide tree that reflects the similarities between sequences, using the pairwise alignment distances (as in Figure 1).
c) Align the sequences following the guide tree. At each node in the tree, the two sequences or alignments associated with its daughter nodes are aligned. The process is repeated, beginning from the tree leaves (the sequences) and ending at the tree root.

The problem with progressive alignment stems from the greedy nature of the algorithm: any mistake made during the early alignments cannot be corrected later as new sequence information is added.
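Steps (a) and (b) of the procedure above can be sketched as follows. This is a rough illustration under explicit assumptions, not the paper's method: the pairwise distance is a stand-in computed with Python's `difflib.SequenceMatcher` rather than from alignment scores, and the guide tree is built by average-linkage clustering, a choice the paper does not specify. The names `distance` and `guide_tree` are hypothetical, and step (c), aligning along the tree, is not shown.

```python
from difflib import SequenceMatcher
from itertools import combinations


def distance(a, b):
    """Stand-in pairwise distance: 1 minus a similarity ratio in [0, 1].
    A real pipeline would derive this from pairwise alignment scores."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def guide_tree(seqs):
    """Build a guide tree as nested tuples of sequence names.

    seqs maps a name to its sequence. The two closest clusters (by average
    pairwise distance between their members) are merged repeatedly until a
    single cluster, the root, remains."""
    # Step (a): distances between all pairs of sequences.
    d = {
        frozenset(pair): distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    # Each cluster key is a subtree; its value lists the member names.
    clusters = {name: [name] for name in seqs}

    def average(c1, c2):
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(d[frozenset(p)] for p in pairs) / len(pairs)

    # Step (b): merge the closest pair of clusters until one is left.
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average(*p))
        members = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = members
    return next(iter(clusters))


if __name__ == "__main__":
    # Ungapped residues of the four sequences shown in Figure 1.
    example = {"A": "ANQP", "B": "ANP", "C": "ARYQP", "D": "AYQP"}
    print(guide_tree(example))
```

The nested tuples give the merge order, so step (c) would align the innermost pairs first and work outwards to the root.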
4. SIGNIFICANCE OF SEQUENCE ALIGNMENT

Sequence alignment is useful for discovering functional, structural and evolutionary information in biological sequences. To discover this information we need the best possible (optimal) alignment. Sequences that are very similar probably have the same function, be it a regulatory role in the case of similar DNA molecules, or a similar biochemical function and three-dimensional structure in the case of proteins. In addition, if two sequences from different organisms are similar, they may share a common ancestor; in this case the sequences are defined as homologous. The alignment indicates the changes (mutations) that could have occurred between the two homologous sequences and a common ancestor sequence during evolution. Hence one can trace the origin of a new sequence that has arisen from these mutational changes. In other cases, similar regions in sequences may not have a common ancestor but might have arisen independently by two evolutionary pathways converging on the same function, which is called convergent evolution.

5. ENTROPY ANALYSIS

The entropy is measured in linear time as the number of distinctive segments occurring in the regions. It helps in locating the borders between coding and non-coding regions of any gene. The entropic segmentation process partitions a heterogeneous DNA sequence into homogeneous subsequences, which we term compositional domains. If we accept a domain picture of DNA sequences, it is natural to design computational approaches that segment a DNA sequence into homogeneous domains; computer algorithms that accomplish such a segmentation are commonly called segmentation algorithms. Two well-known examples of segmentation algorithms are the one based on the hidden Markov model by Churchill and the walking Markov model algorithm by Fickett et al. (1992).
In the biology community, however, most people still use the old-fashioned moving window approach. One advantage of the widely used sliding-window methods is that their implementation is straightforward: one calculates the density of a sequence feature of interest within a window, moves the window along the sequence, and recalculates the density. However, the choice of the window size and the moving distance is, in general, arbitrary. If the window size is too large, local fluctuations that contain significant biological information may be averaged out. If the moving distance is too long, one domain can be split between two windows and its distinctive feature may not be revealed. The moving window approach has other drawbacks as well. Another approach for detecting coding and non-coding borders is recursive segmentation. [5]

Detection of coding-noncoding borders

The coding potential measurement is obtained from within a coding or non-coding region (as opposed to from their borders). Such a measurement can either be learned from the data or be based on existing biological knowledge. However, current biological knowledge about coding potential is still mainly limited to the codon structure. The fact that coding regions, unlike noncoding regions, consist of three-base units, together with the fact that these units are not used with equal probability, provides a strong signal for coding potential. [4]
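As a hedged sketch of the Jensen-Shannon divergence computations that this section goes on to describe (evaluation at fixed steps and recursive segmentation), the following Python code uses the purine/pyrimidine entropy and the divergence formula given in the paper. The function names, the threshold value and the assumption that the input contains only A, C, G and T are illustrative choices; for clarity the code recounts bases at every cut instead of updating running counts.

```python
import math

PURINES = set("AG")


def counts(seq):
    """Return (purines, pyrimidines); assumes seq contains only A, C, G, T."""
    pur = sum(base in PURINES for base in seq)
    return pur, len(seq) - pur


def entropy(pur, pyr):
    """Two-symbol Shannon entropy in bits, taking 0 * log2(0) as 0."""
    total = pur + pyr
    h = 0.0
    for count in (pur, pyr):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def divergence(seq, cut):
    """D = H(W) - (nu/n) * H(U) - (nv/n) * H(V), with U = seq[:cut]."""
    n = len(seq)
    left, right = seq[:cut], seq[cut:]
    return (entropy(*counts(seq))
            - len(left) / n * entropy(*counts(left))
            - len(right) / n * entropy(*counts(right)))


def step_divergences(seq, step):
    """Divergence at every multiple of `step` strictly inside the sequence."""
    return [(cut, divergence(seq, cut)) for cut in range(step, len(seq), step)]


def segment(seq, threshold, start=0):
    """Recursive segmentation: split at the cut with maximal divergence
    while that maximum is at least `threshold`; returns (start, end) pairs."""
    if len(seq) < 2:
        return [(start, start + len(seq))]
    cut = max(range(1, len(seq)), key=lambda c: divergence(seq, c))
    if divergence(seq, cut) < threshold:
        return [(start, start + len(seq))]
    return (segment(seq[:cut], threshold, start)
            + segment(seq[cut:], threshold, start + cut))


if __name__ == "__main__":
    # 100-nucleotide query sequence from the paper's example, step size 20.
    query = ("TCCATTGAGCCTTATACCAGTAACATCTACACTCGAAGATCTTGTCAGGGGAATTTCAGATTG"
             "TGAATCCTCACTTACTGAAAGATCTTACTGAGCGGGG")
    print(counts(query))
    for cut, d in step_divergences(query, 20):
        print(cut, round(d, 4))
    print(segment(query, threshold=0.1))  # threshold chosen arbitrarily
```

The threshold decides how readily a sequence is split, so in practice it would be tied to a significance criterion rather than set by hand.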
The Jensen-Shannon divergence was implemented to carry out the entropy analysis.

Jensen-Shannon Divergence using the Sliding Window Method

For a given DNA sequence of length N, our code calculates the Jensen-Shannon divergence at multiples of the step size. First, it determines the total number of purines (A and G) and pyrimidines (C and T) in the query DNA sequence, then calculates the entropy of the whole sequence, H(W):

\[ H = -p \log_2 p - q \log_2 q \]  [4]

where p = pur/size, q = pyr/size and \log_2 q = \log q / \log 2.

Second, it computes the Jensen-Shannon divergence for the segments U = 1, ..., i*step and V = i*step+1, ..., n, where U is the left segment and V the right segment, moving from left to right in steps. At each step it determines the number of purines and pyrimidines in the left (U) and right (V) segments and calculates the entropy of each, H(U) and H(V). The divergence for that step is then calculated as

\[ D = H(W) - \frac{n_u}{n} H(U) - \frac{n_v}{n} H(V) \]

where n_u is the length of the left segment, n_v the length of the right segment and n the length of the whole sequence. A high divergence indicates that the left and right segments are each more homogeneous in themselves than the sequence is as a whole.

Example: Consider a step size of 20 for calculating the Jensen-Shannon divergence. The results for a query sequence are given below.

Query Sequence: TCCATTGAGCCTTATACCAGTAACATCTACACTCGAAGATCTTGTCAGGGGAATTTCAGATTGTGAATCCTCACTTACTGAAAGATCTTACTGAGCGGGG

FOR THE WHOLE SEQUENCE
Purines: 49
Pyrimidines: 51
Genome Length: 100
Entropy H(W):

AFTER STEP 1
Length of left segment (nu) = 20
Length of right segment (nv) = 80
Purines in U = 8
Pyrimidines in U = 12
Purines in V = 41
Pyrimidines in V = 39
DIVERGENCE =

AFTER STEP 2
Length of left segment (nu) = 40
Length of right segment (nv) = 60
Purines in U = 18
Pyrimidines in U = 22
Purines in V = 31
Pyrimidines in V = 29
DIVERGENCE =

AFTER STEP 3
Length of left segment (nu) = 60
Length of right segment (nv) = 40
Purines in U = 29
Pyrimidines in U = 31
Purines in V = 20
Pyrimidines in V = 20
DIVERGENCE =

AFTER STEP 4
Length of left segment (nu) = 80
Length of right segment (nv) = 20
Purines in U = 36
Pyrimidines in U = 44
Purines in V = 13
Pyrimidines in V = 7
DIVERGENCE =

Jensen-Shannon Divergence using Recursive Segmentation

For a sequence of length N, we calculate at each position i (0 < i < N) the entropy H(W) of the whole sequence, the entropy H(U) of the subsequence on the left side of the partition point and the entropy H(V) of the subsequence on the right side of the partition point, and then calculate the Jensen-Shannon divergence. As a measure of the heterogeneity of the sequence we choose the maximum Jensen-Shannon divergence over all positions, say maxdjs. If this is large enough, we say that the sequence is heterogeneous and should be segmented. We recursively apply the same procedure to both the left and the right subsequence; once maxdjs falls below the given threshold, the recursion along the current path is stopped. This recursive segmentation procedure is very similar to growing a binary tree: when the segmentation is continued, two branches of the tree are generated; if it is stopped, that branch becomes a leaf. [5]

CONCLUSION AND DISCUSSION

This paper mainly discusses the implementation of basic existing algorithms and techniques used in sequence analysis: Needleman-Wunsch, Smith-Waterman, progressive methods for MSA, entropy and Jensen-Shannon divergence, and translation of DNA into protein. The Needleman-Wunsch and Smith-Waterman algorithms (examples of dynamic programming) are considered highly accurate in finding the optimal alignment but are relatively slow when large amounts of data are involved. Dynamic programming has time and space complexity of O(n^2) in the case of pairwise alignment, where n is the length of the sequence.
If we generalize this algorithm to multiple sequence alignment, we add an extra dimension for each new sequence. Thus the complexity becomes O(n^d), where d is the number of sequences being aligned. New methods that combine dynamic programming with a heuristic approach are emerging, and these are much faster than the basic algorithms.

Entropy analysis is an open area, and research is still going on to obtain accurate results for coding and non-coding regions. Recursive segmentation is an interesting alternative to the traditional moving window approach. Admittedly, the moving window approach is simple, fast (O(N) computational complexity versus O(N log N) for recursion), and usually provides an answer to questions of interest to investigators. Nevertheless, recursive
segmentation can be more accurate, and it also avoids the common problem in the moving window approach of selecting a window size and a moving distance. We suggest using recursive segmentation as a refinement of the moving window approach, or as a second-stage analysis after a rough result has been obtained from the moving window approach.

BIBLIOGRAPHY

Books
1) T K Attwood, D J Parry-Smith. Introduction to Bioinformatics. Pearson Education Asia.
2) David W Mount. Bioinformatics: Sequence and Genome Analysis.
3) Cynthia Gibas, Per Jambeck. Developing Bioinformatics Computer Skills. O'Reilly Publications.

Papers
4) Pedro Bernaola-Galván, Ivo Grosse, Pedro Carpena, José L. Oliver, Ramón Roldán, and H. Eugene Stanley. Finding borders between coding and non-coding DNA regions by an entropic segmentation method. (Aug 1999)
5) Wentian Li, Pedro Bernaola-Galván, Fatameh Haghighi, Ivo Grosse. Application of recursive segmentation to the analysis of DNA sequences. (Nov 2001)
6) Aaron Davidson. A fast pruning algorithm for optimal sequence alignment.
Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm Alignment scoring schemes and theory: substitution matrices and gap models 1 Local sequence alignments Local sequence alignments are necessary
More information5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT
5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT.03.239 03.10.2012 ALIGNMENT Alignment is the task of locating equivalent regions of two or more sequences to maximize their similarity. Homology:
More informationSequence Alignment Techniques and Their Uses
Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this
More informationAlignment & BLAST. By: Hadi Mozafari KUMS
Alignment & BLAST By: Hadi Mozafari KUMS SIMILARITY - ALIGNMENT Comparison of primary DNA or protein sequences to other primary or secondary sequences Expecting that the function of the similar sequence
More informationBio 1B Lecture Outline (please print and bring along) Fall, 2007
Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution
More information(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.
1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the
More informationO 3 O 4 O 5. q 3. q 4. Transition
Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in
More informationBiochemistry 324 Bioinformatics. Pairwise sequence alignment
Biochemistry 324 Bioinformatics Pairwise sequence alignment How do we compare genes/proteins? When we have sequenced a genome, we try and identify the function of unknown genes by finding a similar gene
More informationSequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment
Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of
More informationPractical considerations of working with sequencing data
Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!
More informationSequence Alignment (chapter 6)
Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:
More informationMATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME
MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:
More informationA greedy, graph-based algorithm for the alignment of multiple homologous gene lists
A greedy, graph-based algorithm for the alignment of multiple homologous gene lists Jan Fostier, Sebastian Proost, Bart Dhoedt, Yvan Saeys, Piet Demeester, Yves Van de Peer, and Klaas Vandepoele Bioinformatics
More informationIMPLEMENTING HIERARCHICAL CLUSTERING METHOD FOR MULTIPLE SEQUENCE ALIGNMENT AND PHYLOGENETIC TREE CONSTRUCTION
IMPLEMENTING HIERARCHICAL CLUSTERING METHOD FOR MULTIPLE SEQUENCE ALIGNMENT AND PHYLOGENETIC TREE CONSTRUCTION Harmandeep Singh 1, Er. Rajbir Singh Associate Prof. 2, Navjot Kaur 3 1 Lala Lajpat Rai Institute
More informationComputational Biology
Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,
More informationBioinformatics Chapter 1. Introduction
Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!
More informationBioinformatics and BLAST
Bioinformatics and BLAST Overview Recap of last time Similarity discussion Algorithms: Needleman-Wunsch Smith-Waterman BLAST Implementation issues and current research Recap from Last Time Genome consists
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationLarge-Scale Genomic Surveys
Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction
More informationHomology Modeling. Roberto Lins EPFL - summer semester 2005
Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,
More informationIntroduction to sequence alignment. Local alignment the Smith-Waterman algorithm
Lecture 2, 12/3/2003: Introduction to sequence alignment The Needleman-Wunsch algorithm for global sequence alignment: description and properties Local alignment the Smith-Waterman algorithm 1 Computational
More informationPairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55
Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise
More information1.5 Sequence alignment
1.5 Sequence alignment The dramatic increase in the number of sequenced genomes and proteomes has lead to development of various bioinformatic methods and algorithms for extracting information (data mining)
More informationC E N T R. Introduction to bioinformatics 2007 E B I O I N F O R M A T I C S V U F O R I N T. Lecture 5 G R A T I V. Pair-wise Sequence Alignment
C E N T R E F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U Introduction to bioinformatics 2007 Lecture 5 Pair-wise Sequence Alignment Bioinformatics Nothing in Biology makes sense except in
More informationBLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010
BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for
More informationProtein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.
Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein
More informationBioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter
Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Institute of Bioinformatics Johannes Kepler University, Linz, Austria Sequence Alignment 2. Sequence Alignment Sequence Alignment 2.1
More information"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky
MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally
More informationMotifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1.
Motifs and Logos Six Discovering Genomics, Proteomics, and Bioinformatics by A. Malcolm Campbell and Laurie J. Heyer Chapter 2 Genome Sequence Acquisition and Analysis Sami Khuri Department of Computer
More informationBioinformatics Exercises
Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted
More informationBLAST. Varieties of BLAST
BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database
More informationAlignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)
Alignment principles and homology searching using (PSI-)BLAST Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) http://ibivu.cs.vu.nl Bioinformatics Nothing in Biology makes sense except in
More informationBasic Local Alignment Search Tool
Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses
More informationFirst generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences
First generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences 140.638 where do sequences come from? DNA is not hard to extract (getting DNA from a
More informationEffects of Gap Open and Gap Extension Penalties
Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See
More informationBIOINFORMATICS: An Introduction
BIOINFORMATICS: An Introduction What is Bioinformatics? The term was first coined in 1988 by Dr. Hwa Lim The original definition was : a collective term for data compilation, organisation, analysis and
More informationBioinformatics 2 - Lecture 4
Bioinformatics 2 - Lecture 4 Guido Sanguinetti School of Informatics University of Edinburgh February 14, 2011 Sequences Many data types are ordered, i.e. you can naturally say what is before and what
More informationMoreover, the circular logic
Moreover, the circular logic How do we know what is the right distance without a good alignment? And how do we construct a good alignment without knowing what substitutions were made previously? ATGCGT--GCAAGT
More informationTiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1
Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with
More informationWeek 10: Homology Modelling (II) - HHpred
Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1 2 Identify and align related structures by sequence methods is not an easy task All comparative
More informationMULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE
MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE Manmeet Kaur 1, Navneet Kaur Bawa 2 1 M-tech research scholar (CSE Dept) ACET, Manawala,Asr 2 Associate Professor (CSE Dept) ACET, Manawala,Asr
More informationMultiple Choice Review- Eukaryotic Gene Expression
Multiple Choice Review- Eukaryotic Gene Expression 1. Which of the following is the Central Dogma of cell biology? a. DNA Nucleic Acid Protein Amino Acid b. Prokaryote Bacteria - Eukaryote c. Atom Molecule
More informationWhole Genome Alignments and Synteny Maps
Whole Genome Alignments and Synteny Maps IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of
More informationStatistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department
More informationStatistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic
More informationAmira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut
Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological
More informationBLAST: Target frequencies and information content Dannie Durand
Computational Genomics and Molecular Biology, Fall 2016 1 BLAST: Target frequencies and information content Dannie Durand BLAST has two components: a fast heuristic for searching for similar sequences
More informationData Mining in Bioinformatics HMM
Data Mining in Bioinformatics HMM Microarray Problem: Major Objective n Major Objective: Discover a comprehensive theory of life s organization at the molecular level 2 1 Data Mining in Bioinformatics
More informationBioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre
Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement
More informationSimilarity or Identity? When are molecules similar?
Similarity or Identity? When are molecules similar? Mapping Identity A -> A T -> T G -> G C -> C or Leu -> Leu Pro -> Pro Arg -> Arg Phe -> Phe etc If we map similarity using identity, how similar are
More informationGenomics and bioinformatics summary. Finding genes -- computer searches
Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence
More informationComputational Molecular Biology (
Computational Molecular Biology (http://cmgm cmgm.stanford.edu/biochem218/) Biochemistry 218/Medical Information Sciences 231 Douglas L. Brutlag, Lee Kozar Jimmy Huang, Josh Silverman Lecture Syllabus
More informationDr. Amira A. AL-Hosary
Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological
More informationMultiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:
Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:50 5001 5 Multiple Sequence Alignment The first part of this exposition is based on the following sources, which are recommended reading:
More informationSequence Comparison. mouse human
Sequence Comparison Sequence Comparison mouse human Why Compare Sequences? The first fact of biological sequence analysis In biomolecular sequences (DNA, RNA, or amino acid sequences), high sequence similarity
More informationPairwise sequence alignment
Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL
More informationGraph Alignment and Biological Networks
Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12 Networks in molecular biology New large-scale
More information1. In most cases, genes code for and it is that
Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod
More informationLecture 5: September Time Complexity Analysis of Local Alignment
CSCI1810: Computational Molecular Biology Fall 2017 Lecture 5: September 21 Lecturer: Sorin Istrail Scribe: Cyrus Cousins Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes
More informationGCD3033:Cell Biology. Transcription
Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors
More informationBackground: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)
Sequence lignment (chapter ) he biological problem lobal alignment Local alignment Multiple alignment Background: comparative genomics Basic question in biology: what properties are shared among organisms?
More information08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega
BLAST Multiple Sequence Alignments: Clustal Omega What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments
More informationIntroduction to Bioinformatics Online Course: IBT
Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple
More informationOrganic Chemistry Option II: Chemical Biology
Organic Chemistry Option II: Chemical Biology Recommended books: Dr Stuart Conway Department of Chemistry, Chemistry Research Laboratory, University of Oxford email: stuart.conway@chem.ox.ac.uk Teaching
More informationMarkov Chains and Hidden Markov Models. = stochastic, generative models
Markov Chains and Hidden Markov Models = stochastic, generative models (Drawing heavily from Durbin et al., Biological Sequence Analysis) BCH339N Systems Biology / Bioinformatics Spring 2016 Edward Marcotte,
More informationBioinformatics. Part 8. Sequence Analysis An introduction. Mahdi Vasighi
Bioinformatics Sequence Analysis An introduction Part 8 Mahdi Vasighi Sequence analysis Some of the earliest problems in genomics concerned how to measure similarity of DNA and protein sequences, either
More informationBioinformatics for Biologists
Bioinformatics for Biologists Sequence Analysis: Part I. Pairwise alignment and database searching Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Bioinformatics Definitions The use of computational
More informationIntroduction to Bioinformatics
CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics
More informationCh. 9 Multiple Sequence Alignment (MSA)
Ch. 9 Multiple Sequence Alignment (MSA) - gather seqs. to make MSA - doing MSA with ClustalW - doing MSA with Tcoffee - comparing seqs. that cannot align Introduction - from pairwise alignment to MSA -
More informationCSE : Computational Issues in Molecular Biology. Lecture 6. Spring 2004
CSE 397-497: Computational Issues in Molecular Biology Lecture 6 Spring 2004-1 - Topics for today Based on premise that algorithms we've studied are too slow: Faster method for global comparison when sequences
More informationSequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir
Sequence Bioinformatics Multiple Sequence Alignment Waqas Nasir 2010-11-12 Multiple Sequence Alignment One amino acid plays coy; a pair of homologous sequences whisper; many aligned sequences shout out
More informationPairwise Sequence Alignment
Introduction to Bioinformatics Pairwise Sequence Alignment Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Outline Introduction to sequence alignment pair wise sequence alignment The Dot Matrix Scoring
More informationChapter 17. From Gene to Protein. Biology Kevin Dees
Chapter 17 From Gene to Protein DNA The information molecule Sequences of bases is a code DNA organized in to chromosomes Chromosomes are organized into genes What do the genes actually say??? Reflecting
More informationInferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT
Inferring phylogeny Constructing phylogenetic trees Tõnu Margus Contents What is phylogeny? How/why it is possible to infer it? Representing evolutionary relationships on trees What type questions questions
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang
More information