# Lecture 4. Models of DNA and protein change. Likelihood methods

1 Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/39

2 The Jukes-Cantor model (1969)

[Diagram: the four bases A, G, C, and T, with every change from one base to a different base occurring at rate u/3.]

The simplest symmetrical model of DNA evolution.

3 Transition probabilities under the Jukes-Cantor model

- All sites change independently.
- All sites have the same stochastic process working at them.
- Make up a fictional kind of event, such that when it happens the site changes to one of the 4 bases chosen at random (equiprobably).
- Assertion: having these events occur at rate $\frac{4}{3}u$ is the same as having the Jukes-Cantor model events occur at rate $u$. (A fictional event lands on a different base with probability $\frac{3}{4}$, so actual changes occur at rate $\frac{4}{3}u \times \frac{3}{4} = u$.)
- The probability that none of these fictional events happens in time $t$ is $\exp(-\frac{4}{3}ut)$.
- No matter how many of these fictional events occur, provided it is not zero, the chance of ending up at a particular base is $\frac{1}{4}$.

4 Jukes-Cantor transition probabilities, cont'd

Putting all this together, the probability of changing to C, given the site is currently at A, in time $t$ is

$$\mathrm{Prob}(C \mid A, t) = \frac{1}{4}\left(1 - e^{-\frac{4}{3}ut}\right)$$

while

$$\mathrm{Prob}(A \mid A, t) = e^{-\frac{4}{3}ut} + \frac{1}{4}\left(1 - e^{-\frac{4}{3}ut}\right)$$

or

$$\mathrm{Prob}(A \mid A, t) = \frac{1}{4}\left(1 + 3e^{-\frac{4}{3}ut}\right)$$

so that the total probability of change is

$$\mathrm{Prob}(\text{change} \mid t) = \frac{3}{4}\left(1 - e^{-\frac{4}{3}ut}\right)$$
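The Jukes-Cantor formulas can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names `jc_prob` and `jc_prob_change` are made up for illustration.

```python
import math

BASES = "ACGT"

def jc_prob(i, j, u, t):
    """Probability that a site now at base i is at base j after time t,
    under the Jukes-Cantor model with rate of change u."""
    decay = math.exp(-4.0 * u * t / 3.0)   # chance that no fictional event occurred
    if i == j:
        return 0.25 * (1.0 + 3.0 * decay)
    return 0.25 * (1.0 - decay)

def jc_prob_change(u, t):
    """Total probability that the base differs after time t."""
    return 0.75 * (1.0 - math.exp(-4.0 * u * t / 3.0))
```

For any starting base the four probabilities sum to 1, and the total probability of change is three times the probability of changing to one particular other base. As the branch gets long the probability of change approaches 3/4.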

5 Fraction of sites different, Jukes-Cantor

[Figure: differences per site (vertical axis) plotted against branch length (horizontal axis).]

Fraction of sites different after branches of different length, under the Jukes-Cantor model.

6 Kimura's (1980) K2P model of DNA change

[Diagram: A and G exchange at rate a, C and T exchange at rate a, and each of the four purine-pyrimidine pairs exchanges at rate b.]

This model allows for different rates of transitions and transversions.

7 Motoo Kimura

[Photo: Motoo Kimura, with family in Mishima, Japan in the 1960s.]

8 Transition probabilities for the K2P model

The model can be described with two kinds of events:

I. At rate α, if the site has a purine (A or G), choose one of the two purines at random and change to it. If the site has a pyrimidine (C or T), choose one of the two pyrimidines at random and change to it.

II. At rate β, choose one of the 4 bases at random and change to it.

By proper choice of α and β one can achieve the overall rate of change and the $T_s/T_n$ ratio $R$ you want. For rate of change 1, the transition probabilities (warning: terminological tangle) are

$$\mathrm{Prob}(\text{transition} \mid t) = \frac{1}{4} - \frac{1}{2}\exp\left(-\frac{2R+1}{R+1}\,t\right) + \frac{1}{4}\exp\left(-\frac{2}{R+1}\,t\right)$$

$$\mathrm{Prob}(\text{transversion} \mid t) = \frac{1}{2} - \frac{1}{2}\exp\left(-\frac{2}{R+1}\,t\right)$$

(the transversion probability is the sum of the probabilities of both kinds of transversions).
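A short Python sketch of the K2P probabilities follows. It assumes the standard K2P closed forms for overall rate of change 1 and transition/transversion ratio R; the function name `k2p_probs` is hypothetical.

```python
import math

def k2p_probs(t, R):
    """Return (P_transition, P_transversion, P_no_change) after branch
    length t, for overall rate of change 1 and Ts/Tn ratio R."""
    e1 = math.exp(-(2.0 * R + 1.0) / (R + 1.0) * t)
    e2 = math.exp(-2.0 / (R + 1.0) * t)
    p_ts = 0.25 - 0.5 * e1 + 0.25 * e2
    p_tv = 0.5 - 0.5 * e2          # both kinds of transversion together
    return p_ts, p_tv, 1.0 - p_ts - p_tv
```

Two sanity checks follow from the formulas. For very short branches the ratio of the two probabilities is close to R. With R = 1/2 all changes are equally frequent, so the total probability of change matches the Jukes-Cantor value.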

9 Transitions, transversions expected

[Figure: total differences, transitions, and transversions (vertical axis: differences) plotted against time (branch length), for R = 10.]

Transitions and transversions expected in different amounts of branch length under the K2P model, for $T_s/T_n = 10$.

10 Transitions, transversions expected

[Figure: total differences, transversions, and transitions (vertical axis: differences) plotted against time (branch length).]

Transitions and transversions expected in different amounts of branch length under the K2P model, for $T_s/T_n = 2$.

11 Other commonly used models include:

Two models that specify the equilibrium base frequencies (you provide the frequencies $\pi_A$, $\pi_C$, $\pi_G$, $\pi_T$ and they are set up to have an equilibrium which achieves them), and also let you control the transition/transversion ratio.

The Hasegawa-Kishino-Yano (1985) model:

| from \ to | A | G | C | T |
|---|---|---|---|---|
| A | - | $\alpha\pi_G + \beta\pi_G$ | $\alpha\pi_C$ | $\alpha\pi_T$ |
| G | $\alpha\pi_A + \beta\pi_A$ | - | $\alpha\pi_C$ | $\alpha\pi_T$ |
| C | $\alpha\pi_A$ | $\alpha\pi_G$ | - | $\alpha\pi_T + \beta\pi_T$ |
| T | $\alpha\pi_A$ | $\alpha\pi_G$ | $\alpha\pi_C + \beta\pi_C$ | - |

12 My F84 model

| from \ to | A | G | C | T |
|---|---|---|---|---|
| A | - | $\alpha\pi_G + \beta\pi_G/\pi_R$ | $\alpha\pi_C$ | $\alpha\pi_T$ |
| G | $\alpha\pi_A + \beta\pi_A/\pi_R$ | - | $\alpha\pi_C$ | $\alpha\pi_T$ |
| C | $\alpha\pi_A$ | $\alpha\pi_G$ | - | $\alpha\pi_T + \beta\pi_T/\pi_Y$ |
| T | $\alpha\pi_A$ | $\alpha\pi_G$ | $\alpha\pi_C + \beta\pi_C/\pi_Y$ | - |

where $\pi_R = \pi_A + \pi_G$ and $\pi_Y = \pi_C + \pi_T$ (the equilibrium frequencies of purines and pyrimidines).

Both of these models have formulas for the transition probabilities, and both are subcases of a slightly more general class of models, the Tamura-Nei model (1993).
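The HKY and F84 rate tables can be built programmatically and checked against the frequencies you provide. This is an illustrative Python sketch: `rate_matrix` and `equilibrium_flux` are hypothetical helper names, and the diagonal entries are filled in so that each row sums to zero, which is the usual convention for a rate matrix rather than something stated on the slides.

```python
ORDER = "AGCT"            # same row and column order as the tables
PURINES = set("AG")

def rate_matrix(pi, alpha, beta, model="HKY"):
    """Build a 4x4 rate matrix (rows: from, columns: to) for HKY or F84.

    pi maps each base to its equilibrium frequency; alpha is the general
    rate and beta the extra rate that applies only to transitions."""
    pi_r = pi["A"] + pi["G"]
    pi_y = pi["C"] + pi["T"]
    q = [[0.0] * 4 for _ in range(4)]
    for a, i in enumerate(ORDER):
        for b, j in enumerate(ORDER):
            if i == j:
                continue
            rate = alpha * pi[j]
            if (i in PURINES) == (j in PURINES):      # a transition
                if model == "HKY":
                    rate += beta * pi[j]
                elif model == "F84":
                    rate += beta * pi[j] / (pi_r if j in PURINES else pi_y)
                else:
                    raise ValueError("model must be 'HKY' or 'F84'")
            q[a][b] = rate
        q[a][a] = -sum(q[a])                          # rows sum to zero
    return q

def equilibrium_flux(pi, q):
    """Return the vector pi * Q; it is all zeros when pi is the equilibrium."""
    return [sum(pi[ORDER[a]] * q[a][b] for a in range(4)) for b in range(4)]
```

For any choice of frequencies, `equilibrium_flux` should return values that are zero up to rounding, which confirms that both models have the requested equilibrium.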

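13 Reversibility

[Diagram: two states i and j, with equilibrium frequencies $\pi_i$ and $\pi_j$, joined by arrows labeled $P_{ij}$ and $P_{ji}$.]

A reversible model satisfies detailed balance: $\pi_i P_{ij} = \pi_j P_{ji}$.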

14 The General Time-Reversible model (GTR)

It maintains "detailed balance" so that the probability of starting at (say) A and ending at (say) T in evolution is the same as the probability of starting at T and ending at A:

| from \ to | A | G | C | T |
|---|---|---|---|---|
| A | - | $\alpha\pi_G$ | $\beta\pi_C$ | $\gamma\pi_T$ |
| G | $\alpha\pi_A$ | - | $\delta\pi_C$ | $\epsilon\pi_T$ |
| C | $\beta\pi_A$ | $\delta\pi_G$ | - | $\upsilon\pi_T$ |
| T | $\gamma\pi_A$ | $\epsilon\pi_G$ | $\upsilon\pi_C$ | - |

And there is of course the general 12-parameter model, which has arbitrary rates for each of the 12 possible changes (from each of the 4 nucleotides to each of the 3 others).

(Neither of these has closed-form formulas for the transition probabilities, but those can be computed numerically.)
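One way to get GTR transition probabilities numerically is to exponentiate the rate matrix, P(t) = exp(Qt). The sketch below is an assumption-laden illustration in plain Python: it uses a Taylor series with scaling and squaring, which is only one of several possible numerical methods, and all function names are hypothetical.

```python
ORDER = "AGCT"

def gtr_rate_matrix(pi, alpha, beta, gamma, delta, epsilon, upsilon):
    """GTR rate matrix: Q[i][j] = s(i, j) * pi[j], with s symmetric."""
    s = {("A", "G"): alpha, ("A", "C"): beta, ("A", "T"): gamma,
         ("G", "C"): delta, ("G", "T"): epsilon, ("C", "T"): upsilon}
    q = [[0.0] * 4 for _ in range(4)]
    for a, i in enumerate(ORDER):
        for b, j in enumerate(ORDER):
            if i != j:
                rate = s[(i, j)] if (i, j) in s else s[(j, i)]
                q[a][b] = rate * pi[j]
        q[a][a] = -sum(q[a])               # rows sum to zero
    return q

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, squarings=10, terms=12):
    """exp(m): Taylor series on m / 2**squarings, then repeated squaring."""
    n = len(m)
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def transition_matrix(q, t):
    """P(t) = exp(Q t); entry [i][j] is Prob(end at j | start at i, t)."""
    return mat_exp([[x * t for x in row] for row in q])
```

With equal frequencies and all six rate parameters set to 4/3, every off-diagonal rate is 1/3, and the result should agree with the Jukes-Cantor formula for u = 1. For general parameters, detailed balance can be checked by comparing pi[i] * P[i][j] with pi[j] * P[j][i].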

15 Relation between models

There are many other models, but these are the most widely used ones. Here is a general scheme of which models are subcases of which other ones (number of parameters in parentheses); each model is a subcase of the one before it:

General 12-parameter model (12) → General time-reversible model (9) → Tamura-Nei (6) → HKY (5) or F84 (5) → Kimura K2P (2) → Jukes-Cantor (1)

16 Rate variation among sites

In reality, rates of evolution are not constant among sites. Fortunately, in the transition probability formulas, rates come in as simple multiples of times:

$$\mathrm{Prob}(i \mid j, u, t) = \mathrm{Prob}(i \mid j, 1, ut)$$

Thus if we know the rates at two sites, we can compute the probabilities of change by simply, for each site, multiplying all branch lengths by the appropriate rate.

17 (continued...)

If we don't know the rates, we can imagine averaging them over a distribution $f(u)$ of rates. Usually the Gamma distribution is used:

$$\mathrm{Prob}(i \mid j, t) = \int_0^\infty f(u)\,\mathrm{Prob}(i \mid j, u, t)\,du$$

In practice a discrete histogram of rates approximates the integration. (For the Gamma it seems best to use Generalized Laguerre Quadrature to pick the rates and frequencies in the histogram.)

Also, there are actually autocorrelations, with neighboring sites having similar rates of change. This can be handled by Hidden Markov Models, which we cover later.
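The discrete-histogram approximation can be sketched in a few lines of Python. This example uses the Jukes-Cantor probabilities and takes the rates and their weights as given; it does not perform the Laguerre quadrature itself, and the three-category histogram in the usage note is a made-up illustration.

```python
import math

def jc_prob(same, u, t):
    """Jukes-Cantor probability of ending at the same base (same=True)
    or at one particular different base (same=False), rate u, time t."""
    decay = math.exp(-4.0 * u * t / 3.0)
    return 0.25 * (1.0 + 3.0 * decay) if same else 0.25 * (1.0 - decay)

def prob_with_rate_variation(same, t, rates, weights):
    """Average the transition probability over a discrete histogram of
    rates: sum over categories of weight * Prob(. | rate, t)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * jc_prob(same, r, t) for r, w in zip(rates, weights))
```

For example, `prob_with_rate_variation(False, 0.5, (0.2, 1.0, 1.8), (0.25, 0.5, 0.25))` averages over three hypothetical rate categories whose mean rate is 1. Because the probability of change is a concave function of the rate, the averaged value is smaller than the value at the mean rate.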

18 A pioneer of protein evolution

[Photo: Margaret Dayhoff, about 1966.]

19 Models of amino acid change in proteins

There are a variety of models put forward since the mid-1960s:

1. Amino acid transition matrices
   - Dayhoff (1968) model. Tabulation of empirical changes in closely related pairs of proteins, normalized. The PAM100 matrix, for example, is the expected transition matrix given 1 substitution per position.
   - Jones, Taylor, and Thornton (1992) recalculated PAM matrices (the JTT matrix) from a much larger set of data.
   - Jones, Taylor, and Thornton (1994a, 1994b) have tabulated a separate mutation data matrix for transmembrane proteins.
   - Koshi and Goldstein (1995) have described the tabulation of further context-dependent mutation data matrices.
   - Henikoff and Henikoff (1992) have tabulated the BLOSUM matrix for conserved motifs in gene families.
2. Goldman and Yang (1994) pioneered codon-based models (see next screen).

20 Approaches to protein sequence models

Making a model for protein sequence evolution (a not very practical approach):

1. Use a good model of DNA evolution.
2. Use the appropriate genetic code.
3. When an amino acid changes, accept the change with a probability that declines as the amino acids become more different.
4. Fit this to empirical information on protein evolution.
5. Take into account variation of rate from site to site.
6. Take into account correlation of rate variation in adjacent sites.
7. How about protein structure? Secondary structure? 3-D structure?

21 Codon models (Muse & Gaut, MBE 1994; Goldman & Yang, MBE 1994)

[Figure: part of the genetic code table, indexed by the bases U, C, A, G, with codons labeled by amino acid (for example phe UUU, leu UUA, ser UCA, stop UAA, met AUG, val GUU).]

22 Covarion models? (Fitch and Markowitz, 1970)

[Figure: the same stretch of DNA sequence shown at several points on a tree.]

Which sites are available for substitutions changes as one moves along the tree.

23 Likelihoods and odds ratios

Bayes' Theorem relates prior and posterior probabilities of a hypothesis $H$:

$$\mathrm{Prob}(H \mid D) = \frac{\mathrm{Prob}(H \text{ and } D)}{\mathrm{Prob}(D)} = \frac{\mathrm{Prob}(D \mid H)\,\mathrm{Prob}(H)}{\mathrm{Prob}(D)}$$

The ratio of the posterior probabilities of two hypotheses, $H_1$ and $H_2$, can be written by putting this into its "odds ratio" form ($\mathrm{Prob}(D)$ cancels):

$$\frac{\mathrm{Prob}(H_1 \mid D)}{\mathrm{Prob}(H_2 \mid D)} = \frac{\mathrm{Prob}(D \mid H_1)}{\mathrm{Prob}(D \mid H_2)} \cdot \frac{\mathrm{Prob}(H_1)}{\mathrm{Prob}(H_2)}$$

Note that this says that the posterior odds in favor of $H_1$ over $H_2$ are the product of prior odds and a likelihood ratio.

The likelihood of the hypothesis $H$ is the probability of the observed data given it, $\mathrm{Prob}(D \mid H)$. This is not the same as the probability of the hypothesis given the data. That is the posterior probability of $H$ and requires that we also have a believable prior probability $\mathrm{Prob}(H)$.
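The odds-ratio form of Bayes' Theorem is easy to verify with numbers. The Python sketch below uses hypothetical function names and made-up priors and likelihoods purely for illustration.

```python
def posterior_probs(priors, likelihoods):
    """Bayes' Theorem for a finite set of hypotheses.

    priors[k] is Prob(H_k) and likelihoods[k] is Prob(D | H_k);
    returns the list of posterior probabilities Prob(H_k | D)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)                    # Prob(D)
    return [x / total for x in joint]

def posterior_odds(prior_h1, prior_h2, lik_h1, lik_h2):
    """Posterior odds of H1 over H2 = prior odds * likelihood ratio."""
    return (prior_h1 / prior_h2) * (lik_h1 / lik_h2)
```

With equal priors and likelihoods of 0.2 and 0.1, the posterior odds equal the likelihood ratio, and the ratio of the two posterior probabilities returned by `posterior_probs` gives the same number because Prob(D) cancels.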

24 Rationale of likelihood inference

If the data consist of $n$ items that are conditionally independent given the hypothesis $H_i$,

$$\mathrm{Prob}(D \mid H_i) = \mathrm{Prob}(D^{(1)} \mid H_i)\,\mathrm{Prob}(D^{(2)} \mid H_i) \cdots \mathrm{Prob}(D^{(n)} \mid H_i)$$

and we can then write the likelihood ratio as a product of ratios:

$$\frac{\mathrm{Prob}(D \mid H_1)}{\mathrm{Prob}(D \mid H_2)} = \prod_{j=1}^{n} \frac{\mathrm{Prob}(D^{(j)} \mid H_1)}{\mathrm{Prob}(D^{(j)} \mid H_2)}$$

If the amount of data is large, the likelihood ratio terms will dominate and push the result towards the correct hypothesis. This can console us somewhat for the lack of a believable prior.
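To see the likelihood ratio terms accumulate, here is a small Python sketch that works on the log scale, where the product over data items becomes a sum. The data are hypothetical coin tosses and the two hypotheses are hypothetical values of the heads probability; none of these names or numbers come from the slides.

```python
import math

def log_likelihood_ratio(tosses, p1, p2):
    """Sum over independent tosses of ln[Prob(D_j | H1) / Prob(D_j | H2)],
    where hypothesis H_k says the heads probability is p_k."""
    total = 0.0
    for toss in tosses:
        l1 = p1 if toss == "H" else 1.0 - p1
        l2 = p2 if toss == "H" else 1.0 - p2
        total += math.log(l1 / l2)
    return total

def log_posterior_odds(tosses, p1, p2, prior1, prior2):
    """ln(posterior odds) = ln(prior odds) + ln(likelihood ratio)."""
    return math.log(prior1 / prior2) + log_likelihood_ratio(tosses, p1, p2)
```

Doubling the data doubles the log likelihood ratio while the log prior odds stay fixed, so even a prior of 1 to 100 against the better-fitting hypothesis is eventually overwhelmed.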

25 Properties of likelihood inference

Likelihood inference (usually) has the properties of

- Consistency. As the number of data items $n$ gets large, we converge to the correct hypothesis with probability 1.
- Efficiency. Asymptotically, the likelihood estimate has the smallest possible variance (it need not be best for any finite number $n$ of data points).

26 A simple example: coin tossing

If we toss a coin which has heads probability $p$ and get HHTTHTHHTTT, the likelihood is

$$L = \mathrm{Prob}(D \mid p) = p\,p\,(1-p)(1-p)\,p\,(1-p)\,p\,p\,(1-p)(1-p)(1-p) = p^5 (1-p)^6$$

so that trying to maximize it we get

$$\frac{dL}{dp} = 5p^4 (1-p)^6 - 6p^5 (1-p)^5$$

27 Finding the ML estimate

and searching for a value of $p$ for which the slope is zero:

$$\frac{dL}{dp} = p^4 (1-p)^5 \left(5(1-p) - 6p\right) = 0$$

which has roots at 0, 1, and 5/11.

28 Log likelihoods

Alternatively, we could maximize not $L$ but its logarithm. This turns products into sums:

$$\ln L = 5 \ln p + 6 \ln(1-p)$$

whereby

$$\frac{d(\ln L)}{dp} = \frac{5}{p} - \frac{6}{1-p} = 0$$

so that finally $\hat{p} = 5/11$.
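The coin-tossing estimate can be reproduced numerically. The following Python sketch computes the log likelihood for the toss sequence from the example, returns the closed-form maximum, and confirms it with a crude grid search; the function names and the grid resolution are arbitrary choices for illustration.

```python
import math

def log_likelihood(p, heads, tails):
    """ln L = heads * ln(p) + tails * ln(1 - p)."""
    return heads * math.log(p) + tails * math.log(1.0 - p)

def ml_estimate(tosses):
    """Closed-form maximum likelihood estimate: the fraction of heads."""
    return tosses.count("H") / len(tosses)

def grid_maximum(tosses, steps=10000):
    """Find the p on an evenly spaced grid that maximizes ln L."""
    heads = tosses.count("H")
    tails = len(tosses) - heads
    grid = [k / steps for k in range(1, steps)]   # excludes p = 0 and p = 1
    return max(grid, key=lambda p: log_likelihood(p, heads, tails))
```

Calling `ml_estimate("HHTTHTHHTTT")` gives 5/11, and `grid_maximum` on the same string lands on the grid point nearest to it.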

29 Likelihood curve for coin tosses

[Figure: likelihood (vertical axis) plotted against p (horizontal axis).]

30 Likelihood on trees

[Figure: a tree with tips A, C, C, C, G, interior nodes x, y, z, w, and branch lengths $t_1$ through $t_8$.]

A tree, with branch lengths, and the data at a single site. This example is used to describe calculation of the likelihood.

Since the sites evolve independently on the same tree,

$$L = \mathrm{Prob}(D \mid T) = \prod_{i=1}^{m} \mathrm{Prob}\left(D^{(i)} \mid T\right)$$

31 Likelihood at one site on a tree

We can compute this by summing over all assignments of states $x$, $y$, $z$, and $w$ to the interior nodes:

$$\mathrm{Prob}\left(D^{(i)} \mid T\right) = \sum_x \sum_y \sum_z \sum_w \mathrm{Prob}(A, C, C, C, G, x, y, z, w \mid T)$$

32 Computing the terms

For each combination of states, the Markov process allows us to express it as a product of probabilities of a series of changes, with the probability that we start in state $x$:

$$\begin{aligned}
\mathrm{Prob}(A, C, C, C, G, x, y, z, w \mid T) = {} & \mathrm{Prob}(x)\,\mathrm{Prob}(y \mid x, t_6)\,\mathrm{Prob}(A \mid y, t_1)\,\mathrm{Prob}(C \mid y, t_2) \\
& \times \mathrm{Prob}(z \mid x, t_8)\,\mathrm{Prob}(C \mid z, t_3) \\
& \times \mathrm{Prob}(w \mid z, t_7)\,\mathrm{Prob}(C \mid w, t_4)\,\mathrm{Prob}(G \mid w, t_5)
\end{aligned}$$

33 Computing the terms

Summing this up, there are 256 terms in this case:

$$\begin{aligned}
\sum_x \sum_y \sum_z \sum_w \; & \mathrm{Prob}(x)\,\mathrm{Prob}(y \mid x, t_6)\,\mathrm{Prob}(A \mid y, t_1)\,\mathrm{Prob}(C \mid y, t_2) \\
& \times \mathrm{Prob}(z \mid x, t_8)\,\mathrm{Prob}(C \mid z, t_3) \\
& \times \mathrm{Prob}(w \mid z, t_7)\,\mathrm{Prob}(C \mid w, t_4)\,\mathrm{Prob}(G \mid w, t_5)
\end{aligned}$$
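The 256-term sum can be written out directly. The Python sketch below does so by brute force under several assumptions that are not on the slides: the Jukes-Cantor model with u = 1 supplies the transition probabilities, Prob(x) is taken to be the equilibrium frequency 1/4, and the branch lengths in `EXAMPLE_T` are invented for illustration.

```python
import math
from itertools import product

BASES = "ACGT"

# Hypothetical branch lengths t1..t8 (not from the slides).
EXAMPLE_T = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.1, 5: 0.2, 6: 0.15, 7: 0.05, 8: 0.25}

def jc(i, j, t, u=1.0):
    """Jukes-Cantor probability of going from base i to base j in time t."""
    decay = math.exp(-4.0 * u * t / 3.0)
    return 0.25 * (1.0 + 3.0 * decay) if i == j else 0.25 * (1.0 - decay)

def site_likelihood(tips, t):
    """Sum over all 4**4 = 256 assignments of states to x, y, z, w.

    tips holds the bases at the ends of branches t1..t5, in that order;
    t maps each branch number 1..8 to its length."""
    s1, s2, s3, s4, s5 = tips
    total = 0.0
    for x, y, z, w in product(BASES, repeat=4):
        total += (0.25                                   # Prob(x), assumed 1/4
                  * jc(x, y, t[6]) * jc(y, s1, t[1]) * jc(y, s2, t[2])
                  * jc(x, z, t[8]) * jc(z, s3, t[3])
                  * jc(z, w, t[7]) * jc(w, s4, t[4]) * jc(w, s5, t[5]))
    return total

def log_likelihood(sites, t):
    """Sites are independent, so ln L is the sum of per-site log likelihoods."""
    return sum(math.log(site_likelihood(tips, t)) for tips in sites)
```

Calling `site_likelihood(("A", "C", "C", "C", "G"), EXAMPLE_T)` evaluates the sum for the site pattern in the example. A useful check is that the likelihoods of all 4^5 possible tip patterns add up to 1. The brute-force sum grows exponentially with the number of interior nodes, so it is practical only for very small trees.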

34 References

- Barry, D., and J. A. Hartigan. Statistical analysis of hominoid molecular evolution. Statistical Science 2. [Early use of full 12-parameter model]
- Dayhoff, M. O., and R. V. Eck. Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, Silver Spring, Maryland. [Dayhoff's PAM model for proteins]
- Edwards, A. W. F., and L. L. Cavalli-Sforza. Reconstruction of evolutionary trees. In Phenetic and Phylogenetic Classification, ed. V. H. Heywood and J. McNeill. Systematics Association Publ. No. 6, London. [First paper on likelihood for phylogenies]
- Felsenstein, J. Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Systematic Zoology 22. [The "pruning" algorithm]
- Felsenstein, J. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17. [Made likelihood practical for n species]
- Felsenstein, J. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts. [Material is in chapters 13, 16]

35 References

- Fisher, R. A. On an absolute criterion for fitting frequency curves. Messenger of Mathematics 41. [First modern paper introducing likelihood]
- Fisher, R. A. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, A 222. [Likelihood in generality]
- Fitch, W. M., and E. Markowitz. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochemical Genetics 4. [The first suggestion of a covarion model]
- Goldman, N., and Z. Yang. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Molecular Biology and Evolution 11. [One of the two introductions of the codon model]
- Hasegawa, M., H. Kishino, and T. Yano. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22. [HKY model]

36 References

- Henikoff, S., and J. G. Henikoff. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences, USA 89. [BLOSUM protein model]
- Jones, D. T., W. R. Taylor, and J. M. Thornton. The rapid generation of mutation data matrices from protein sequences. Computer Applications in the Biosciences (CABIOS) 8. [JTT model for proteins]
- Jones, D. T., W. R. Taylor, and J. M. Thornton. 1994a. A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry 33. [JTT membrane protein model]
- Jones, D. T., W. R. Taylor, and J. M. Thornton. 1994b. A mutation data matrix for transmembrane proteins. FEBS Letters 339. [JTT membrane protein model]
- Jukes, T. H., and C. Cantor. Evolution of protein molecules. In Mammalian Protein Metabolism, ed. M. N. Munro. Academic Press, New York. [Jukes-Cantor model]

37 References

- Kashyap, R. L., and S. Subas. Statistical estimation of parameters in a phylogenetic tree using a dynamic model of the substitutional process. Journal of Theoretical Biology 47. [Second paper applying likelihood to molecular sequences]
- Kimura, M. A simple model for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16. [Kimura's 2-parameter model]
- Koshi, J. M., and R. A. Goldstein. Context-dependent optimal substitution matrices. Protein Engineering 8. [Generating other kinds of protein model matrices]
- Lanave, C., G. Preparata, C. Saccone, and G. Serio. A new method for calculating evolutionary substitution rates. Journal of Molecular Evolution 20. [General reversible model]
- Lockhart, P. J., M. A. Steel, M. D. Hendy, and D. Penny. Recovering evolutionary trees under a more realistic model of sequence evolution. Molecular Biology and Evolution 11. [The LogDet distance for correcting for changing base composition]

38 References

- Muse, S. V., and B. S. Gaut. A likelihood method for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Molecular Biology and Evolution 11. [One of the two introductions of the codon model]
- Neyman, J. Molecular studies of evolution: a source of novel statistical problems. In Statistical Decision Theory and Related Topics, ed. S. S. Gupta and J. Yackel. Academic Press, New York. [First application of likelihood to molecular sequences]
- Tamura, K., and M. Nei. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution 10. [Tamura-Nei model]

39 How it was done

This projection was produced using the prosper style in LaTeX, using LaTeX to make a .dvi file, using dvips to turn this into a PostScript file, using ps2pdf to make it into a PDF file, and displaying the slides in Adobe Acrobat Reader.

Result: nice slides using freeware.


How Molecules Evolve Guest Lecture: Principles and Methods of Systematic Biology 11 November 2013 Chris Simon Approaching phylogenetics from the point of view of the data Understanding how sequences evolve

### Understanding relationship between homologous sequences

Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

### Inferring Molecular Phylogeny

Dr. Walter Salzburger he tree of life, ustav Klimt (1907) Inferring Molecular Phylogeny Inferring Molecular Phylogeny 55 Maximum Parsimony (MP): objections long branches I!! B D long branch attraction

### Maximum Likelihood in Phylogenetics

Maximum Likelihood in Phylogenetics 29 July 2014 Workshop on Molecular Evolution Woods Hole, Massachusetts Paul O. Lewis Department of Ecology & Evolutionary Biology Paul O. Lewis (2014 Woods Hole Workshop

### In: P. Lemey, M. Salemi and A.-M. Vandamme (eds.). To appear in: The. Chapter 4. Nucleotide Substitution Models

In: P. Lemey, M. Salemi and A.-M. Vandamme (eds.). To appear in: The Phylogenetic Handbook. 2 nd Edition. Cambridge University Press, UK. (final version 21. 9. 2006) Chapter 4. Nucleotide Substitution

### Estimating Divergence Dates from Molecular Sequences

Estimating Divergence Dates from Molecular Sequences Andrew Rambaut and Lindell Bromham Department of Zoology, University of Oxford The ability to date the time of divergence between lineages using molecular
