Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Bioinformatics II: Probability and Statistics. Universität Zürich and ETH Zürich, Spring Semester 2009. Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM). Dr Fraser Daly, adapted from a course by Dr N Pétrélis.

Last time: Theory of Markov chains and applications in different models. We often modeled the development across positions within one DNA sequence (states: nucleotides; index: position in the sequence).

This time: Use Markov chains to model how individual positions in a DNA or protein sequence develop over time (states: nucleotides or amino acids; index: time).

Objectives for today: study models for sequence evolution; derive good substitution matrices, i.e. matrices that give a realistic score to each possible substitution in a DNA or protein sequence.

Model for DNA sequence evolution

Each site of the DNA sequence evolves according to a Markov chain with state space {a, c, g, t}. The chains at different sites are independent of one another, and they all have the same transition probabilities.

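Simplest model for sequence evolution: Jukes-Cantor

$$P = \begin{pmatrix} p_{aa} & p_{ac} & p_{ag} & p_{at} \\ p_{ca} & p_{cc} & p_{cg} & p_{ct} \\ p_{ga} & p_{gc} & p_{gg} & p_{gt} \\ p_{ta} & p_{tc} & p_{tg} & p_{tt} \end{pmatrix} = \begin{pmatrix} \beta & \alpha & \alpha & \alpha \\ \alpha & \beta & \alpha & \alpha \\ \alpha & \alpha & \beta & \alpha \\ \alpha & \alpha & \alpha & \beta \end{pmatrix}, \qquad \beta = 1 - 3\alpha.$$

Stationary distribution: $\pi = (0.25, 0.25, 0.25, 0.25)$. We need $\alpha < 1/3$. The parameter $\alpha$ depends on the time scale: as the real time represented by one step increases, so does $\alpha$.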

The $n$ step transition probabilities can be calculated:

$$P(X_n = i \mid X_0 = i) = \tfrac{1}{4} + \tfrac{3}{4}(1 - 4\alpha)^n, \qquad P(X_n = j \mid X_0 = i) = \tfrac{1}{4} - \tfrac{1}{4}(1 - 4\alpha)^n,$$

for $i, j \in \{a, c, g, t\}$, $i \neq j$.

The Jukes-Cantor model is not entirely realistic: all types of substitution are equally likely to occur. A more complicated and more realistic model is the Kimura model...
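As a quick numerical check of the $n$ step probabilities, the sketch below raises a Jukes-Cantor matrix to the $n$th power and compares the result with the closed form $\tfrac14 + \tfrac34(1-4\alpha)^n$ (same state) and $\tfrac14 - \tfrac14(1-4\alpha)^n$ (different state). The values of `alpha` and `n` and all function names are illustrative assumptions, not part of the lecture.

```python
def jukes_cantor_matrix(alpha):
    """One-step Jukes-Cantor transition matrix over the states (a, c, g, t)."""
    beta = 1.0 - 3.0 * alpha
    return [[beta if i == j else alpha for j in range(4)] for i in range(4)]


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(P, n):
    """n-th power of P by repeated multiplication (fine for small n)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def closed_form(alpha, n):
    """Closed-form n-step probabilities: (stay in same state, move to a given other state)."""
    decay = (1.0 - 4.0 * alpha) ** n
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay


alpha, n = 0.01, 50  # illustrative values only
Pn = mat_pow(jukes_cantor_matrix(alpha), n)
same, different = closed_form(alpha, n)
print(Pn[0][0], same)       # the two numbers should agree
print(Pn[0][1], different)  # the two numbers should agree
```

As $n$ grows, both probabilities tend to 0.25, the stationary distribution.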

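With the states ordered (a, c, g, t):

$$P = \begin{pmatrix} 1-\alpha-2\beta & \beta & \alpha & \beta \\ \beta & 1-\alpha-2\beta & \beta & \alpha \\ \alpha & \beta & 1-\alpha-2\beta & \beta \\ \beta & \alpha & \beta & 1-\alpha-2\beta \end{pmatrix}.$$

We need $\alpha + 2\beta < 1$. Stationary distribution: $\pi = (0.25, 0.25, 0.25, 0.25)$.

$\alpha$: probability of a transition (pyrimidine to pyrimidine or purine to purine).
$\beta$: probability of a transversion (purine to pyrimidine or pyrimidine to purine).
(Purines: a, g; pyrimidines: c, t.)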
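The structure of the Kimura matrix can be sketched in a few lines: classify each pair of nucleotides as a match, a transition or a transversion, fill in the matrix, and check that the uniform distribution is stationary. The parameter values and helper names below are assumptions chosen only for illustration.

```python
STATES = "acgt"
PURINES = {"a", "g"}  # the pyrimidines are c and t


def is_transition(x, y):
    """True if x -> y stays within the purines or within the pyrimidines."""
    return x != y and ((x in PURINES) == (y in PURINES))


def kimura_matrix(alpha, beta):
    """One-step Kimura transition matrix with states ordered a, c, g, t."""
    P = []
    for x in STATES:
        row = []
        for y in STATES:
            if x == y:
                row.append(1.0 - alpha - 2.0 * beta)
            elif is_transition(x, y):
                row.append(alpha)
            else:
                row.append(beta)
        P.append(row)
    return P


P = kimura_matrix(alpha=0.02, beta=0.005)  # illustrative values only
pi = [0.25] * 4
# One step of the chain started from pi: (pi P)_j = sum_i pi_i * p_ij
pi_next = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
print(pi_next)  # stays (approximately) uniform
```

Because every column of the matrix sums to 1 as well as every row, the uniform distribution is unchanged by one step of the chain.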

There are even more realistic Markov models for DNA substitution (for example Hasegawa, Kishino and Yano, and many others). Most models assume that sites evolve independently, which is not entirely realistic. Some models allow different sites to evolve at different rates.

Why not use more realistic models? Because they are very difficult to handle! The more complicated the model, the harder it is to compute the probabilities of interest, and the more parameters we need to estimate. Simpler models seem to give sensible results. As always in mathematical modeling, we need a balance between realism and mathematical tractability.

Evolutionary models for proteins?

Similar, but here it is much more important to account for the wide range of different transition probabilities associated with amino acid substitutions. See the construction of substitution matrices.

Scoring systems

In bioinformatics we are very interested in comparing DNA or protein sequences, for example to infer molecular function by finding similarities to a sequence with a known function. This requires a good alignment of the given sequences, which in turn requires a measure for judging the quality of a given alignment against other possible alignments. This measure is a scoring system.

Additive scoring system: Look at each position of a given alignment and assign a score for the quality of the match at that position. Ignore gaps for now! The total (or cumulative) score is obtained by adding the scores for the individual positions.

Example: Two DNA sequences. Score +1 for a match, -1 for a mismatch.

a a g t t t c t t g
a a a c t c c c t g

Cumulative score: 6 - 4 = 2.

More realistic system? Score +1 for a match, -1/2 for a transition and -1 for a transversion. All four mismatches in the example are transitions, so the cumulative score in this case is 6 - 2 = 4.

Scoring matrices

The scores for individual positions can be displayed in a substitution matrix (also called a scoring matrix). This is usually a symmetric $4 \times 4$ (DNA) or $20 \times 20$ (protein) matrix which has as entry $(i, j)$ the score that we assign if the nucleotides (or amino acids) $i$ and $j$ are aligned.

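For example, with our scoring system on the previous slide, we get the scoring matrix $S$ given by

$$S = \begin{pmatrix} s_{aa} & s_{ac} & s_{ag} & s_{at} \\ s_{ca} & s_{cc} & s_{cg} & s_{ct} \\ s_{ga} & s_{gc} & s_{gg} & s_{gt} \\ s_{ta} & s_{tc} & s_{tg} & s_{tt} \end{pmatrix} = \begin{pmatrix} 1 & -1 & -1/2 & -1 \\ -1 & 1 & -1 & -1/2 \\ -1/2 & -1 & 1 & -1 \\ -1 & -1/2 & -1 & 1 \end{pmatrix}.$$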
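The two additive scoring systems can be tried out on the example alignment. The sketch below assumes the two sequences are `aagtttcttg` and `aaactccctg`, read off the slide as two rows of ten nucleotides; the function names are illustrative.

```python
PURINES = {"a", "g"}  # the pyrimidines are c and t


def simple_score(x, y):
    """+1 for a match, -1 for any mismatch."""
    return 1.0 if x == y else -1.0


def ts_tv_score(x, y):
    """+1 for a match, -1/2 for a transition, -1 for a transversion."""
    if x == y:
        return 1.0
    if (x in PURINES) == (y in PURINES):
        return -0.5  # purine <-> purine or pyrimidine <-> pyrimidine
    return -1.0


def alignment_score(s, t, score):
    """Additive score of an ungapped alignment of two equal-length sequences."""
    if len(s) != len(t):
        raise ValueError("ungapped alignment needs sequences of equal length")
    return sum(score(x, y) for x, y in zip(s, t))


s, t = "aagtttcttg", "aaactccctg"
print(alignment_score(s, t, simple_score))  # 2.0
print(alignment_score(s, t, ts_tv_score))   # 4.0
```

Passing the scoring rule as a function plays the same role as looking up entry $(i, j)$ in a scoring matrix.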

A biologically sensible scoring matrix?

For DNA sequences: simple scoring matrices (like the one presented) are often effective. Usually no need to worry!

For protein sequences: some substitutions are clearly more likely to occur than others (presumably due to the chemical properties of the amino acids), for example isoleucine for valine, or serine for threonine. These are conservative substitutions. We get better alignments if we take this into account, so we use scoring matrices that are derived by statistical analysis of protein data.

A biologically sensible scoring matrix for proteins

Identical amino acids should be given a greater score than any substitution.
Conservative substitutions should be given a greater score than non-conservative ones.
Different sets of values may be desired for comparing very similar sequences (e.g. homologies in mouse and rat) as opposed to highly divergent sequences (e.g. homologies in mouse and yeast). That is, we usually want our scoring matrix to take into account the evolutionary distance between our sequences.

Two frequently used approaches:

1. The PAM family of substitution matrices. Uses Markov chains and phylogenetic trees (to fit an evolutionary model) and log likelihood ratios (for obtaining a scoring matrix from an estimated transition matrix).

2. The BLOSUM family of substitution matrices. Uses log likelihood ratios (for obtaining a scoring matrix from a matrix of estimated substitution probabilities).

The PAM family (Dayhoff, Schwartz and Orcutt, 1978)

PAM = Point (or Percentage) Accepted Mutations.

Accepted point mutation: a substitution of one amino acid for another that is accepted by evolution. That is, within some given species, the mutation has (over time) spread to essentially the entire species.

Two types of matrices are involved: the PAM Markov transition matrix (the estimated transition matrix for the underlying model) and the PAM substitution matrix (which gives us our scores).

The underlying model: Each site in the sequence evolves according to a Markov chain, independently of the other sites. All Markov chains have the same ($20 \times 20$) transition matrix $P$, which is estimated from protein sequence data.

A PAM1 transition matrix is the Markov chain transition matrix applying for a time over which we expect 1% of the amino acids to undergo accepted point mutations.

To estimate the transition matrix:

Find reliable data and align protein sequences that are at least 85% identical;
Reconstruct phylogenetic trees and infer ancestral sequences;
Count the amino acid replacements that occurred along the trees (count mutations accepted by natural selection);
Use these counts to estimate probabilities of replacements.

Dayhoff et al (1978) use ungapped multiple alignments of well conserved regions from closely related proteins (71 groups of proteins, with 1572 changes in total). Within any block, no two sequences differed by more than 15%. This keeps low the number of sites that have undergone several changes.

These aligned regions are used to find the underlying evolutionary tree(s). We want the most parsimonious trees: those with the fewest substitutions. There may be more than one!

Why do we use trees? To avoid overcounting. Our count might be biased by closely related sequences that are overrepresented in our database.

Trees give sequences that are grouped in the right way. Very similar sequences tend to succeed each other in the tree. We mainly have transitions between these sequences, and only a few transitions to other, more different, sequences, so the corresponding substitutions are not given unnatural importance.

Example

Suppose we are given a block of three sequences: AA, AB and BB. There are 5 most parsimonious trees which lead to these three sequences as their leaves.

[Figure: the five most parsimonious trees]

We then count the number of amino acid substitutions of each type that occur in the trees...

A is substituted for B (or vice versa) twice in each of the five trees. This is an $A \to B$ total (and $B \to A$ total) of 10. Divide by the number of trees (5) to get the count 2.

A is aligned with A a total of 15 times over the five trees. Each $A \to A$ alignment gives a count of 2, so we get 30. Divide by the number of trees to get 6. A similar calculation for $B \to B$ also gives 6.

We can form a matrix:

$$\begin{pmatrix} A \to A & A \to B \\ B \to A & B \to B \end{pmatrix} = \begin{pmatrix} 6 & 2 \\ 2 & 6 \end{pmatrix}.$$

Suppose the amino acids are numbered 1 to 20. Just as in the example above, we can form a count matrix

$$A = \begin{pmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,20} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,20} \\ \vdots & \vdots & & \vdots \\ A_{20,1} & A_{20,2} & \cdots & A_{20,20} \end{pmatrix},$$

where $A_{j,k}$ is the $j \to k$ count. In general there will be more than one block: add the counts from each block to get the final count.

The count matrix $A$ is used to estimate the transition probabilities...

For any pair $(j, k)$ define

$$a_{j,k} = \frac{A_{j,k}}{\sum_{m=1}^{20} A_{j,m}}.$$

These are estimated probabilities. To get the transition matrix $P = (p_{jk})_{20 \times 20}$, we scale the $a_{j,k}$ in a certain way. Let $c$ be a positive scaling constant and set

$$p_{jk} = c\,a_{j,k} \quad (j \neq k), \qquad p_{jj} = 1 - \sum_{k \neq j} c\,a_{j,k}.$$

It follows that $\sum_k p_{jk} = 1$. We need to choose $c$ small enough that $p_{jj} \geq 0$ for all $j$.

Why the scaling factor $c$? To account for the evolutionary distance. We choose a value of $c$ which gives a transition matrix useful for short evolutionary periods. More precisely, choose a value of $c$ such that 1% of the amino acids are expected to undergo accepted point mutations during one time unit. This is an evolutionary distance of 1 PAM.

Consider a particular site in the sequence. Recall: we label the amino acids $1, \ldots, 20$. Let $Z_n$ be the amino acid present at the site at time $n$. The probability that the site will change after one time step is

$$P(Z_1 \neq Z_0) = \sum_{j=1}^{20} P(Z_0 = j, Z_1 \neq j) = \sum_{j=1}^{20} P(Z_1 \neq j \mid Z_0 = j)\,P(Z_0 = j) = \sum_{j=1}^{20} P(Z_1 \neq j \mid Z_0 = j)\,q_j,$$

where $q_j$ is the observed frequency of amino acid $j$ in the original block of aligned proteins.

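We want the probability of a change to be 0.01 (an average change of 1%):

$$\begin{aligned} 0.01 &= \sum_{j=1}^{20} P(Z_1 \neq j \mid Z_0 = j)\,q_j \\ &= \sum_{j=1}^{20} q_j \sum_{k \neq j} P(Z_1 = k \mid Z_0 = j) \\ &= \sum_{j=1}^{20} \sum_{k \neq j} p_{jk}\,q_j \\ &= \sum_{j=1}^{20} \sum_{k \neq j} c\,a_{j,k}\,q_j \\ &= c \sum_{j=1}^{20} \sum_{k \neq j} q_j\,a_{j,k}. \end{aligned}$$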

So, we take

$$c = \frac{0.01}{\sum_{j=1}^{20} \sum_{k \neq j} q_j\,a_{j,k}}.$$

How can we turn our transition matrix into a scoring matrix?
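The normalisation and scaling steps can be sketched on the two-letter toy example. The code below assumes the count matrix ((6, 2), (2, 6)) from the AA, AB, BB block and frequencies $q_A = q_B = 0.5$ (three A's and three B's in that block); the function names are hypothetical, and a real PAM matrix would of course use all 20 amino acids.

```python
def normalise_rows(A):
    """a_jk = A_jk / sum_m A_jm: row-normalised counts."""
    return [[x / sum(row) for x in row] for row in A]


def pam1_scaling(a, q, target=0.01):
    """Scaling constant c so that the expected fraction of changed sites is `target`."""
    size = len(a)
    unscaled_change = sum(q[j] * a[j][k]
                          for j in range(size) for k in range(size) if k != j)
    return target / unscaled_change


def pam_transition_matrix(a, c):
    """p_jk = c * a_jk off the diagonal; the diagonal makes each row sum to 1."""
    size = len(a)
    P = [[c * a[j][k] for k in range(size)] for j in range(size)]
    for j in range(size):
        P[j][j] = 1.0 - sum(P[j][k] for k in range(size) if k != j)
    return P


A = [[6.0, 2.0], [2.0, 6.0]]  # toy count matrix from the AA, AB, BB example
q = [0.5, 0.5]                # assumed observed frequencies of A and B
a = normalise_rows(A)
c = pam1_scaling(a, q)
P = pam_transition_matrix(a, c)
expected_change = sum(q[j] * (1.0 - P[j][j]) for j in range(len(q)))
print(c, P, expected_change)
```

For this toy input the row-normalised off-diagonal entries are 0.25, so $c = 0.01 / 0.25 = 0.04$ and each off-diagonal entry of $P$ is 0.01.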

Consider the two protein sequences $s = a_1 a_2 \cdots a_n$ and $s' = b_1 b_2 \cdots b_n$ (with an evolutionary distance of 1 PAM). The score for aligning $s$ with $s'$ is found by comparing the null and alternative hypotheses:

$H_0$: $s$ and $s'$ are not evolutionarily related (a chance alignment).
$H_1$: $s$ and $s'$ are evolutionarily related ($s'$ depends on $s$ via the Markov model).

Under $H_0$: We have a chance alignment. That is, all sites in both sequences are randomly generated, and all the sites are independent of each other. Suppose amino acid $j$ appears with probability $q_j$. The probability of getting this chance alignment is

$$P_{H_0}(\text{the alignment}) = \prod_{i=1}^{n} q_{a_i} \prod_{i=1}^{n} q_{b_i} = \prod_{i=1}^{n} \left( q_{a_i} q_{b_i} \right).$$

Under $H_1$: The sites in the sequence are dependent, according to the Markov model described earlier. For example,

$$P(\text{site 3 changes from A to B in 1 time step}) = p_{AB},$$

the one step transition probability. Then

$$P_{H_1}(\text{align A and B at site 3}) = q_A\,p_{AB}.$$

Since all sites behave independently:

$$P_{H_1}(\text{the alignment}) = \prod_{i=1}^{n} q_{a_i}\,p_{a_i b_i}.$$

We want our score to reflect the chance that with $s$ and $s'$ we have aligned evolutionarily related sequences. That is, we want the score to be high if the chance is high that we have aligned related sequences. A natural choice for the score is a comparison of the probability of the alignment under $H_0$ and $H_1$. The likelihood ratio:

$$\text{Score} = \frac{P_{H_1}(\text{the alignment})}{P_{H_0}(\text{the alignment})} = \prod_{i=1}^{n} \frac{q_{a_i}\,p_{a_i b_i}}{q_{a_i}\,q_{b_i}} = \prod_{i=1}^{n} \frac{p_{a_i b_i}}{q_{b_i}}.$$

Equivalently (better for theoretical reasons), use the log likelihood ratio:

$$\text{Score} = \log\left( \frac{P_{H_1}(\text{the alignment})}{P_{H_0}(\text{the alignment})} \right) = \log \prod_{i=1}^{n} \frac{p_{a_i b_i}}{q_{b_i}} = \sum_{i=1}^{n} \log\left( \frac{p_{a_i b_i}}{q_{b_i}} \right).$$

The entry at position $(a, b)$ in the PAM substitution matrix is then

$$S_{a,b} = \log\left( \frac{p_{ab}}{q_b} \right)$$

(or rounded to the nearest integer for convenience).

Using the logarithm, we have obtained our additive scoring system. With aligned sequences $s = a_1 a_2 \cdots a_n$ and $s' = b_1 b_2 \cdots b_n$ we get

$$S = \text{Total Score} = \sum_{i=1}^{n} S_{a_i, b_i}.$$

Adding the individual scores is equivalent to multiplying the probabilities, thanks to the logarithm:

$$S = \log\left( \frac{P_{H_1}(\text{the alignment})}{P_{H_0}(\text{the alignment})} \right) = \sum_{i=1}^{n} \log\left( \frac{p_{a_i b_i}}{q_{b_i}} \right) = \sum_{i=1}^{n} S_{a_i, b_i}.$$
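The equivalence between adding scores and multiplying probabilities can be checked numerically. The sketch below uses a made-up two-letter alphabet with assumed frequencies `Q` and one-step transition probabilities `P`; none of these numbers come from real protein data.

```python
import math

# Assumed toy model: two "amino acids" A and B
Q = {"A": 0.5, "B": 0.5}
P = {("A", "A"): 0.99, ("A", "B"): 0.01,
     ("B", "A"): 0.01, ("B", "B"): 0.99}


def log_odds(a, b):
    """Score for aligning a with b: log(p_ab / q_b)."""
    return math.log(P[(a, b)] / Q[b])


def total_score(s, t):
    """Additive score: the sum of the per-position log-odds scores."""
    return sum(log_odds(a, b) for a, b in zip(s, t))


def likelihood_ratio(s, t):
    """P_H1(alignment) / P_H0(alignment), computed directly from the products."""
    h1 = math.prod(Q[a] * P[(a, b)] for a, b in zip(s, t))
    h0 = math.prod(Q[a] * Q[b] for a, b in zip(s, t))
    return h1 / h0


s, t = "AABAB", "AABBB"  # arbitrary example sequences
print(total_score(s, t), math.log(likelihood_ratio(s, t)))  # should agree
```

Working with sums of logarithms also avoids the numerical underflow that a product of many small probabilities would cause for long sequences.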

Note that

$$S_{a,b} < 0 \iff \frac{p_{ab}}{q_b} < 1 \iff \frac{q_a\,p_{ab}}{q_a\,q_b} < 1 \iff q_a\,p_{ab} < q_a\,q_b,$$

that is, if we are more likely to see $a$ and $b$ aligned against each other in a random alignment than to see $a$ and $b$ aligned in a comparison of two related sequences (at PAM 1 distance). Otherwise,

$$S_{a,b} = \log\left( \frac{p_{ab}}{q_b} \right) \geq 0.$$

PAMn substitution matrices? For sequences having an evolutionary distance of $n$ PAM units. This does not mean that we expect $n$% of the amino acids to differ: substitutions can occur at the same site many times!

Let $P$ be the 1 PAM transition matrix. As always, the $n$ step transition probabilities $p^{(n)}_{ab}$ are the entries of the matrix $P^n$. The corresponding scores are

$$S^{(n)}_{a,b} = \log\left( \frac{p^{(n)}_{ab}}{q_b} \right).$$
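A minimal sketch of the PAMn construction, again on an assumed two-letter toy model rather than real data: raise the 1 PAM matrix to the $n$th power, take log-odds, and look at the expected fraction of sites that actually differ after $n$ steps. All names and numbers are illustrative.

```python
import math


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(P, n):
    """n-th power of P by repeated multiplication."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def pam_scores(Pn, q):
    """S_ab = log(p_ab^(n) / q_b) for an n-step transition matrix Pn."""
    size = len(q)
    return [[math.log(Pn[a][b] / q[b]) for b in range(size)] for a in range(size)]


def fraction_differing(Pn, q):
    """Expected fraction of sites whose state differs from the starting state."""
    return sum(q[a] * (1.0 - Pn[a][a]) for a in range(len(q)))


P1 = [[0.99, 0.01], [0.01, 0.99]]  # assumed toy 1 PAM matrix
q = [0.5, 0.5]                     # assumed background frequencies
n = 50
Pn = mat_pow(P1, n)
S1, Sn = pam_scores(P1, q), pam_scores(Pn, q)
print(fraction_differing(Pn, q))  # noticeably less than n% = 0.50
print(S1[0][1], Sn[0][1])         # the mismatch penalty shrinks as n grows
```

With only two letters, repeated substitutions at a site often restore the original letter, so the gap between $n$% and the observed difference is exaggerated compared with a 20-letter alphabet.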

The BLOSUM family (Henikoff and Henikoff, 1992)

BLOSUM = BLOcks SUbstitution Matrices.

Again the scores will be logarithms of likelihood ratios, but this time there are no evolutionary models, and so no Markov chains and no trees. The likelihoods are obtained by statistical analysis of blocks of aligned sequences.

Blocks of aligned sequences

The blocks needed stem from an ungapped multiple alignment of a relatively highly conserved region of a family of proteins. This is a different kind of data to that used in the PAM family.

H&H's data was far more extensive: several hundred groups of proteins, at least 2369 occurrences of any particular substitution.

Difference in concept: Dayhoff et al. used data from closely related proteins and extrapolated (PAM1 to PAMn). H&H directly used protein sequences regardless of their evolutionary distance.

How are the likelihoods obtained? We count

the proportion of times $p_a$ that the amino acid $a$ occurs somewhere in the blocks;
the proportion of times $p_{ab}$ that the amino acid pair $(a, b)$ occurs in the same column of any block.

NB: $a$ and $b$ are not necessarily distinct. There are $\binom{20}{2} + 20 = 210$ pairs of amino acids, and $N \binom{m}{2}$ pairs that have to be taken into account in all the blocks, if each block has the same number of rows. Here $N$ is the number of columns in all blocks together, and $m$ is the number of rows in each block.

$p_{ab}$ is the likelihood (or estimated probability) for the substitution $a \leftrightarrow b$ under the assumption that the sequences are related.

$p_a p_b$ is the likelihood (or estimated probability) for the substitution $a \to b$ (also for $b \to a$) under the assumption that the sequences are not related. Thus, for $a \neq b$, the likelihood for the substitution $a \leftrightarrow b$ is $2 p_a p_b$.

A scoring matrix is obtained using the same ideas as for the PAM matrix. Set the entry at position $(a, b)$ to be

$$S_{a,b} = \begin{cases} 2 \log_2\left( \dfrac{p_{ab}}{2 p_a p_b} \right) & \text{if } a \neq b, \\[2ex] 2 \log_2\left( \dfrac{p_{aa}}{p_a^2} \right) & \text{if } a = b. \end{cases}$$
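The counting behind these scores can be sketched on a tiny made-up block. The code below counts unordered pairs within each column, estimates $p_{ab}$ and $p_a$, and applies the $2\log_2$ formula. The block, the two-letter alphabet and the function names are assumptions for illustration; no clustering or iteration is performed here.

```python
import math
from collections import Counter
from itertools import combinations


def pair_and_residue_frequencies(block):
    """Estimate p_ab (unordered pairs within a column) and p_a from one block."""
    pair_counts = Counter()
    residue_counts = Counter()
    for column in zip(*block):  # iterate over the columns of the alignment
        residue_counts.update(column)
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total_pairs = sum(pair_counts.values())        # N * C(m, 2)
    total_residues = sum(residue_counts.values())  # N * m
    p_pair = {k: v / total_pairs for k, v in pair_counts.items()}
    p_res = {k: v / total_residues for k, v in residue_counts.items()}
    return p_pair, p_res


def blosum_score(a, b, p_pair, p_res):
    """2*log2(observed / expected); None if the pair was never observed."""
    observed = p_pair.get(tuple(sorted((a, b))))
    if observed is None:
        return None
    expected = p_res[a] ** 2 if a == b else 2 * p_res[a] * p_res[b]
    return 2 * math.log2(observed / expected)


block = ["AAB", "AAB", "ABB", "BAB"]  # toy block: m = 4 rows, N = 3 columns
p_pair, p_res = pair_and_residue_frequencies(block)
print(blosum_score("A", "A", p_pair, p_res))  # positive
print(blosum_score("A", "B", p_pair, p_res))  # negative
```

In this block each of the pairs AA, AB and BB accounts for 6 of the 18 column pairs and both letters have frequency 0.5, so the identical pairs score $2\log_2(4/3)$ and the mixed pair scores $2\log_2(2/3)$.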

H&H used $2 \log_2$; using $\log$ would give essentially the same scores (they differ only by a constant factor).

This is not yet the BLOSUM we are looking for... To obtain the initial blocks, a multiple alignment was found. A substitution matrix is needed for that! Solution...?

The circularity problem. Solution: iteration.

H&H used a simple unit matrix for this first alignment (1 for a match, 0 for a mismatch). With the BLOSUM matrix they obtained by the above procedure, they constructed a second set of blocks and a second BLOSUM matrix. Then, with this second matrix, a third matrix was constructed. This third matrix is the one recommended for use.

This gives a BLOSUM100 matrix, provided that we have eliminated all identical copies of sequences from the original blocks. The BLOSUM100 is not very useful!

The overcounting problem. Solution: clustering.

The problem of overcounting is solved by clustering those sequences in each block that are sufficiently close. That is, we combine them in a suitable way and regard them as a single sequence. The result is a BLOSUMx matrix, where x determines what we mean by sufficiently close: we cluster the sequences that have x% (or more) in common.

A commonly used middle-of-the-range BLOSUM is BLOSUM62 (the default when you do a BLAST search on NCBI).
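The clustering step can be sketched as follows. This is a minimal sketch that assumes a single-linkage rule (a sequence joins a cluster if it is at least x% identical to any member) and covers only the grouping; how the members of a cluster are then combined into "a single sequence" when counting pairs is not shown. The block and all names are made up for illustration.

```python
def percent_identity(s, t):
    """Percentage of positions at which two equal-length sequences agree."""
    return 100.0 * sum(x == y for x, y in zip(s, t)) / len(s)


def cluster_block(block, x):
    """Group the sequences of a block, linking any two that are >= x% identical."""
    parent = list(range(len(block)))  # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(block)):
        for j in range(i + 1, len(block)):
            if percent_identity(block[i], block[j]) >= x:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, seq in enumerate(block):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())


block = ["AAAAAAAAAA", "AAAAAAAAAB", "BBBBBBBBBB", "AAAAAAAABB"]
print(cluster_block(block, 90))   # three similar sequences grouped, one alone
print(cluster_block(block, 100))  # every sequence is its own cluster
```

Note that the first and last sequences are only 80% identical to each other but still end up in the same cluster at x = 90, because each is 90% identical to the second sequence.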

PAM vs. BLOSUM

The circularity problem.
PAM: not addressed.
BLOSUM: iterative procedure.

The overcounting problem.
PAM: inference of phylogenetic trees for each block. Substitutions are only counted along the edges of the trees.
BLOSUM: clustering by the x% rule in each block.

The evolutionary distance problem.
PAM: Markov chain theory. Distances are accounted for by n step transition matrices for different n (higher distance = higher n).
BLOSUM: clustering by the x% rule in each block (higher distance = lower x).

Note that the numbers n and x in PAMn and BLOSUMx play opposite roles. Higher values of n and lower values of x both correspond to longer evolutionary distances. The n counts the time steps in the Markov chain used for the evolutionary model. The x tells us up to what percentage of similarity two sequences in a block will be seen as different.

Advantages of BLOSUM

Simpler model;
Observation based and mostly independent of other models and concepts (e.g. Markov chains);
BLOSUM matrices seem to be better than PAM matrices at detecting biological relationships (even if the same amount of data is used).

Advantages of PAM

Gives an explicit evolutionary model as a by-product;
Helps to give a better understanding of biological relationships.


More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

More information

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36

More information

Single alignment: Substitution Matrix. 16 march 2017

Single alignment: Substitution Matrix. 16 march 2017 Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block

More information

Practical considerations of working with sequencing data

Practical considerations of working with sequencing data Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!

More information

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A GAGATC 3:G A 6:C T Common Ancestor ACGATC 1:A G 2:C A Substitution = Mutation followed 5:T C by Fixation GAAATT 4:A C 1:G A AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon AAAATT GAAATT GAGCTC ACGACC

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

EVOLUTIONARY DISTANCE MODEL BASED ON DIFFERENTIAL EQUATION AND MARKOV PROCESS

EVOLUTIONARY DISTANCE MODEL BASED ON DIFFERENTIAL EQUATION AND MARKOV PROCESS August 0 Vol 4 No 005-0 JATIT & LLS All rights reserved ISSN: 99-8645 wwwjatitorg E-ISSN: 87-95 EVOLUTIONAY DISTANCE MODEL BASED ON DIFFEENTIAL EUATION AND MAKOV OCESS XIAOFENG WANG College of Mathematical

More information

An Introduction to Sequence Similarity ( Homology ) Searching

An Introduction to Sequence Similarity ( Homology ) Searching An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,

More information

In-Depth Assessment of Local Sequence Alignment

In-Depth Assessment of Local Sequence Alignment 2012 International Conference on Environment Science and Engieering IPCBEE vol.3 2(2012) (2012)IACSIT Press, Singapoore In-Depth Assessment of Local Sequence Alignment Atoosa Ghahremani and Mahmood A.

More information

Bioinformatics and BLAST

Bioinformatics and BLAST Bioinformatics and BLAST Overview Recap of last time Similarity discussion Algorithms: Needleman-Wunsch Smith-Waterman BLAST Implementation issues and current research Recap from Last Time Genome consists

More information

Reading for Lecture 13 Release v10

Reading for Lecture 13 Release v10 Reading for Lecture 13 Release v10 Christopher Lee November 15, 2011 Contents 1 Evolutionary Trees i 1.1 Evolution as a Markov Process...................................... ii 1.2 Rooted vs. Unrooted Trees........................................

More information

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT Inferring phylogeny Constructing phylogenetic trees Tõnu Margus Contents What is phylogeny? How/why it is possible to infer it? Representing evolutionary relationships on trees What type questions questions

More information

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand

More information

Edward Susko Department of Mathematics and Statistics, Dalhousie University. Introduction. Installation

Edward Susko Department of Mathematics and Statistics, Dalhousie University. Introduction. Installation 1 dist est: Estimation of Rates-Across-Sites Distributions in Phylogenetic Subsititution Models Version 1.0 Edward Susko Department of Mathematics and Statistics, Dalhousie University Introduction The

More information

Sequence Database Search Techniques I: Blast and PatternHunter tools

Sequence Database Search Techniques I: Blast and PatternHunter tools Sequence Database Search Techniques I: Blast and PatternHunter tools Zhang Louxin National University of Singapore Outline. Database search 2. BLAST (and filtration technique) 3. PatternHunter (empowered

More information

Molecular Evolution and Phylogenetic Tree Reconstruction

Molecular Evolution and Phylogenetic Tree Reconstruction 1 4 Molecular Evolution and Phylogenetic Tree Reconstruction 3 2 5 1 4 2 3 5 Orthology, Paralogy, Inparalogs, Outparalogs Phylogenetic Trees Nodes: species Edges: time of independent evolution Edge length

More information

Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models

Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm Alignment scoring schemes and theory: substitution matrices and gap models 1 Local sequence alignments Local sequence alignments are necessary

More information

Biology 644: Bioinformatics

Biology 644: Bioinformatics A stochastic (probabilistic) model that assumes the Markov property Markov property is satisfied when the conditional probability distribution of future states of the process (conditional on both past

More information

Pairwise sequence alignments. Vassilios Ioannidis (From Volker Flegel )

Pairwise sequence alignments. Vassilios Ioannidis (From Volker Flegel ) Pairwise sequence alignments Vassilios Ioannidis (From Volker Flegel ) Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs Importance

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

Similarity or Identity? When are molecules similar?

Similarity or Identity? When are molecules similar? Similarity or Identity? When are molecules similar? Mapping Identity A -> A T -> T G -> G C -> C or Leu -> Leu Pro -> Pro Arg -> Arg Phe -> Phe etc If we map similarity using identity, how similar are

More information

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

More information

Understanding relationship between homologous sequences

Understanding relationship between homologous sequences Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

More information

Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter

Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Institute of Bioinformatics Johannes Kepler University, Linz, Austria Sequence Alignment 2. Sequence Alignment Sequence Alignment 2.1

More information

Basic Local Alignment Search Tool

Basic Local Alignment Search Tool Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses

More information

Moreover, the circular logic

Moreover, the circular logic Moreover, the circular logic How do we know what is the right distance without a good alignment? And how do we construct a good alignment without knowing what substitutions were made previously? ATGCGT--GCAAGT

More information

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters - "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf

More information

Pairwise sequence alignments

Pairwise sequence alignments Pairwise sequence alignments Volker Flegel VI, October 2003 Page 1 Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs VI, October

More information

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and

More information

Cladistics and Bioinformatics Questions 2013

Cladistics and Bioinformatics Questions 2013 AP Biology Name Cladistics and Bioinformatics Questions 2013 1. The following table shows the percentage similarity in sequences of nucleotides from a homologous gene derived from five different species

More information

进化树构建方法的概率方法 第 4 章 : 进化树构建的概率方法 问题介绍. 部分 lid 修改自 i i f l 的 ih l i

进化树构建方法的概率方法 第 4 章 : 进化树构建的概率方法 问题介绍. 部分 lid 修改自 i i f l 的 ih l i 第 4 章 : 进化树构建的概率方法 问题介绍 进化树构建方法的概率方法 部分 lid 修改自 i i f l 的 ih l i 部分 Slides 修改自 University of Basel 的 Michael Springmann 课程 CS302 Seminar Life Science Informatics 的讲义 Phylogenetic Tree branch internal node

More information

Pairwise & Multiple sequence alignments

Pairwise & Multiple sequence alignments Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

More information

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55 Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise

More information

Sequence Alignment Techniques and Their Uses

Sequence Alignment Techniques and Their Uses Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this

More information

Bioinformatics Exercises

Bioinformatics Exercises Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted

More information

Probabilistic modeling and molecular phylogeny

Probabilistic modeling and molecular phylogeny Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU) What is a model? Mathematical

More information

Phylogenetics. Andreas Bernauer, March 28, Expected number of substitutions using matrix algebra 2

Phylogenetics. Andreas Bernauer, March 28, Expected number of substitutions using matrix algebra 2 Phylogenetics Andreas Bernauer, andreas@carrot.mcb.uconn.edu March 28, 2004 Contents 1 ts:tr rate ratio vs. ts:tr ratio 1 2 Expected number of substitutions using matrix algebra 2 3 Why the GTR model can

More information

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

More information

Tutorial 4 Substitution matrices and PSI-BLAST

Tutorial 4 Substitution matrices and PSI-BLAST Tutorial 4 Substitution matrices and PSI-BLAST 1 Agenda Substitution Matrices PAM - Point Accepted Mutations BLOSUM - Blocks Substitution Matrix PSI-BLAST Cool story of the day: Why should we care about

More information

Copyright 2000 N. AYDIN. All rights reserved. 1

Copyright 2000 N. AYDIN. All rights reserved. 1 Introduction to Bioinformatics Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Multiple Sequence Alignment Outline Multiple sequence alignment introduction to msa methods of msa progressive global alignment

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded

More information

Computational Biology and Chemistry

Computational Biology and Chemistry Computational Biology and Chemistry 33 (2009) 245 252 Contents lists available at ScienceDirect Computational Biology and Chemistry journal homepage: www.elsevier.com/locate/compbiolchem Research Article

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

Markov Chains. Sarah Filippi Department of Statistics TA: Luke Kelly

Markov Chains. Sarah Filippi Department of Statistics  TA: Luke Kelly Markov Chains Sarah Filippi Department of Statistics http://www.stats.ox.ac.uk/~filippi TA: Luke Kelly With grateful acknowledgements to Prof. Yee Whye Teh's slides from 2013 14. Schedule 09:30-10:30 Lecture:

More information

Sequence Analysis, '18 -- lecture 9. Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene.

Sequence Analysis, '18 -- lecture 9. Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene. Sequence Analysis, '18 -- lecture 9 Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene. How can I represent thousands of homolog sequences in a compact

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

Evolutionary Analysis of Viral Genomes

Evolutionary Analysis of Viral Genomes University of Oxford, Department of Zoology Evolutionary Biology Group Department of Zoology University of Oxford South Parks Road Oxford OX1 3PS, U.K. Fax: +44 1865 271249 Evolutionary Analysis of Viral

More information

INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM. Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld

INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM. Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld University of Illinois at Chicago, Dept. of Electrical

More information

Stochastic processes and

Stochastic processes and Stochastic processes and Markov chains (part II) Wessel van Wieringen w.n.van.wieringen@vu.nl wieringen@vu nl Department of Epidemiology and Biostatistics, VUmc & Department of Mathematics, VU University

More information

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology

More information

Motivating the need for optimal sequence alignments...

Motivating the need for optimal sequence alignments... 1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use

More information

... and searches for related sequences probably make up the vast bulk of bioinformatics activities.

... and searches for related sequences probably make up the vast bulk of bioinformatics activities. 1 2 ... and searches for related sequences probably make up the vast bulk of bioinformatics activities. 3 The terms homology and similarity are often confused and used incorrectly. Homology is a quality.

More information