Lecture 4. Models of DNA and protein change. Likelihood methods

Joe Felsenstein
Department of Genome Sciences and Department of Biology

The Jukes-Cantor model (1969)

[Diagram: the four bases A, G, C, and T, with arrows between every pair of bases, each labeled with rate u/3]

The simplest symmetrical model of DNA evolution.

Transition probabilities under the Jukes-Cantor model

All sites change independently.
All sites have the same stochastic process working at them.
Make up a fictional kind of event, such that when it happens the site changes to one of the 4 bases chosen at random (equiprobably).
Assertion: having these events occur at rate $\frac{4}{3}u$ is the same as having the Jukes-Cantor model events occur at rate $u$. (A fictional event lands on a different base three times out of four, so actual changes occur at rate $\frac{3}{4} \cdot \frac{4}{3}u = u$.)
The probability that none of these fictional events happens in time $t$ is $\exp(-\frac{4}{3}ut)$.
No matter how many of these fictional events occur, provided it is not zero, the chance of ending up at a particular base is $\frac{1}{4}$.

Jukes-Cantor transition probabilities, cont'd

Putting all this together, the probability of changing to C, given the site is currently at A, in time $t$ is

\[ \mathrm{Prob}(C \mid A, t) = \frac{1}{4}\left(1 - e^{-\frac{4}{3}ut}\right) \]

while

\[ \mathrm{Prob}(A \mid A, t) = e^{-\frac{4}{3}ut} + \frac{1}{4}\left(1 - e^{-\frac{4}{3}ut}\right) \]

or

\[ \mathrm{Prob}(A \mid A, t) = \frac{1}{4}\left(1 + 3e^{-\frac{4}{3}ut}\right) \]

so that the total probability of change is

\[ \mathrm{Prob}(\text{change} \mid t) = \frac{3}{4}\left(1 - e^{-\frac{4}{3}ut}\right) \]
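
The formulas above translate directly into a few lines of code. The following is a minimal sketch, not part of the original slides; the function names `jc_prob` and `jc_prob_change` are made up for illustration.

```python
import math

BASES = "ACGT"


def jc_prob(end, start, u, t):
    """Jukes-Cantor probability of being at base `end` after time t,
    given the site starts at base `start` and changes at rate u."""
    decay = math.exp(-4.0 * u * t / 3.0)
    if end == start:
        return 0.25 * (1.0 + 3.0 * decay)
    return 0.25 * (1.0 - decay)


def jc_prob_change(u, t):
    """Total probability that the site differs from its starting base."""
    return 0.75 * (1.0 - math.exp(-4.0 * u * t / 3.0))


# Illustrative branch lengths (arbitrary values chosen for this sketch).
for t in (0.1, 1.0, 10.0):
    print(t, jc_prob("C", "A", 1.0, t), jc_prob_change(1.0, t))
```

As the branch length grows, every entry approaches 1/4 and the probability of change approaches 3/4, which is the saturation behavior the model predicts.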

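Fraction of sites different, Jukes-Cantor

[Figure: differences per site (vertical axis) against branch length (horizontal axis). The curve rises from 0 toward an asymptote of 0.75; the marked point is branch length 0.7945, which gives 0.49 differences per site.]

Fraction of sites different after branches of different length, under the Jukes-Cantor model.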

Kimura's (1980) K2P model of DNA change, which allows for different rates of transitions and transversions.

[Diagram: the four bases, with transitions (between A and G, and between C and T) at rate a and the four transversions at rate b]

Motoo Kimura

[Photograph: Motoo Kimura, with family in Mishima, Japan, in the 1960s]

Transition probabilities for the K2P model

The model can be set up with two kinds of events:

I. At rate $\alpha$, if the site has a purine (A or G), choose one of the two purines at random and change to it. If the site has a pyrimidine (C or T), choose one of the pyrimidines at random and change to it.
II. At rate $\beta$, choose one of the 4 bases at random and change to it.

By proper choice of $\alpha$ and $\beta$ one can achieve the overall rate of change and $T_s/T_n$ ratio $R$ you want. For rate of change 1, the transition probabilities (warning: terminological tangle) are

\[ \mathrm{Prob}(\text{transition} \mid t) = \frac{1}{4} - \frac{1}{2}\exp\left(-\frac{2R+1}{R+1}\,t\right) + \frac{1}{4}\exp\left(-\frac{2}{R+1}\,t\right) \]

\[ \mathrm{Prob}(\text{transversion} \mid t) = \frac{1}{2} - \frac{1}{2}\exp\left(-\frac{2}{R+1}\,t\right) \]

(the transversion probability is the sum of the probabilities of both kinds of transversions).
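
The two K2P formulas can be checked numerically. The sketch below is an illustration added to the transcript, with a made-up function name `k2p_probs`; it assumes, as the slide does, an overall rate of change of 1.

```python
import math


def k2p_probs(R, t):
    """Return (P(transition), P(transversion), P(no difference)) under K2P
    after branch length t, for transition/transversion ratio R and
    overall rate of change 1."""
    e_tv = math.exp(-2.0 * t / (R + 1.0))
    e_ts = math.exp(-(2.0 * R + 1.0) * t / (R + 1.0))
    p_transition = 0.25 - 0.5 * e_ts + 0.25 * e_tv
    p_transversion = 0.5 - 0.5 * e_tv
    return p_transition, p_transversion, 1.0 - p_transition - p_transversion


# The two ratios plotted on the following slides.
for R in (10.0, 2.0):
    for t in (0.5, 1.5, 3.0):
        print(R, t, k2p_probs(R, t))
```

Two sanity checks follow from the formulas: for very short branches the ratio of the two probabilities is close to R, and for R = 1/2 the total probability of change reduces to the Jukes-Cantor value $\frac{3}{4}(1 - e^{-\frac{4}{3}t})$.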

Transitions, transversions expected

[Figure: differences (vertical axis, 0.00 to 0.60) against time (branch length, 0.0 to 3.0) for R = 10, with curves for total differences, transitions, and transversions]

Transitions and transversions expected in different amounts of branch length under the K2P model, for $T_s/T_n = 10$.

Transitions, transversions expected

[Figure: differences (vertical axis, 0.00 to 0.70) against time (branch length, 0.0 to 3.0) for R = 2, with curves for total differences, transversions, and transitions]

Transitions and transversions expected in different amounts of branch length under the K2P model, for $T_s/T_n = 2$.

Other commonly used models include:

Two models that specify the equilibrium base frequencies (you provide the frequencies $\pi_A$, $\pi_C$, $\pi_G$, $\pi_T$ and they are set up to have an equilibrium which achieves them), and also let you control the transition/transversion ratio.

The Hasegawa-Kishino-Yano (1985) model:

\[
\begin{array}{c|cccc}
\text{from} \backslash \text{to} & A & G & C & T \\ \hline
A & - & \alpha\pi_G + \beta\pi_G & \alpha\pi_C & \alpha\pi_T \\
G & \alpha\pi_A + \beta\pi_A & - & \alpha\pi_C & \alpha\pi_T \\
C & \alpha\pi_A & \alpha\pi_G & - & \alpha\pi_T + \beta\pi_T \\
T & \alpha\pi_A & \alpha\pi_G & \alpha\pi_C + \beta\pi_C & -
\end{array}
\]

My F84 model

\[
\begin{array}{c|cccc}
\text{from} \backslash \text{to} & A & G & C & T \\ \hline
A & - & \alpha\pi_G + \beta\frac{\pi_G}{\pi_R} & \alpha\pi_C & \alpha\pi_T \\
G & \alpha\pi_A + \beta\frac{\pi_A}{\pi_R} & - & \alpha\pi_C & \alpha\pi_T \\
C & \alpha\pi_A & \alpha\pi_G & - & \alpha\pi_T + \beta\frac{\pi_T}{\pi_Y} \\
T & \alpha\pi_A & \alpha\pi_G & \alpha\pi_C + \beta\frac{\pi_C}{\pi_Y} & -
\end{array}
\]

where $\pi_R = \pi_A + \pi_G$ and $\pi_Y = \pi_C + \pi_T$ (the equilibrium frequencies of purines and pyrimidines).

Both of these models have formulas for the transition probabilities, and both are subcases of a slightly more general class of models, the Tamura-Nei model (1993).
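
The claim that both models are "set up to have an equilibrium" at the frequencies you provide can be checked by building the rate tables and confirming that the net flow into each base is zero. This is a sketch added for illustration: the names `rate_matrix` and `equilibrium_flux` are hypothetical, and the diagonal entries (left blank on the slides) are filled in here with minus the row sum, which is an assumption of the sketch.

```python
BASES = "AGCT"          # same order as the tables above
PURINES = {"A", "G"}


def is_transition(i, j):
    """True when i and j are both purines or both pyrimidines."""
    return (i in PURINES) == (j in PURINES)


def rate_matrix(pi, alpha, beta, model="HKY"):
    """Rates q[i][j] of change from base i to base j for HKY or F84.
    pi maps each base to its equilibrium frequency."""
    if model not in ("HKY", "F84"):
        raise ValueError("model must be 'HKY' or 'F84'")
    pi_r = pi["A"] + pi["G"]
    pi_y = pi["C"] + pi["T"]
    q = {i: {} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i == j:
                continue
            rate = alpha * pi[j]                 # every change gets this term
            if is_transition(i, j):
                if model == "HKY":
                    rate += beta * pi[j]
                else:                            # F84 divides by the class frequency
                    rate += beta * pi[j] / (pi_r if j in PURINES else pi_y)
            q[i][j] = rate
        # Assumption of this sketch: diagonal = minus the sum of the row.
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    return q


def equilibrium_flux(q, pi):
    """Net rate of flow into each base when the frequencies are pi.
    All entries are zero when pi is the equilibrium distribution."""
    return {j: sum(pi[i] * q[i][j] for i in BASES) for j in BASES}


# Hypothetical frequencies and rates, chosen only for illustration.
pi = {"A": 0.1, "G": 0.2, "C": 0.3, "T": 0.4}
for model in ("HKY", "F84"):
    print(model, equilibrium_flux(rate_matrix(pi, 1.0, 2.0, model), pi))
```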

Reversibility

\[ \pi_i P_{ij} = \pi_j P_{ji} \]

The General Time-Reversible model (GTR)

It maintains "detailed balance" so that the probability of starting at (say) A and ending at (say) T in evolution is the same as the probability of starting at T and ending at A:

\[
\begin{array}{c|cccc}
\text{from} \backslash \text{to} & A & G & C & T \\ \hline
A & - & \alpha\pi_G & \beta\pi_C & \gamma\pi_T \\
G & \alpha\pi_A & - & \delta\pi_C & \epsilon\pi_T \\
C & \beta\pi_A & \delta\pi_G & - & \upsilon\pi_T \\
T & \gamma\pi_A & \epsilon\pi_G & \upsilon\pi_C & -
\end{array}
\]

And there is of course the general 12-parameter model which has arbitrary rates for each of the 12 possible changes (from each of the 4 nucleotides to each of the 3 others).

(Neither of these has formulas for the transition probabilities, but those can be computed numerically.)
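
One way to obtain the transition probabilities numerically is to exponentiate the rate matrix, $P(t) = e^{Qt}$. The sketch below is only an illustration, not the method of any particular program: it uses a plain scaling-and-squaring Taylor series written in pure Python, the function names are made up, the rate factors are keyed by base pair (for example `"AG"`), and the diagonal of Q is set to minus the row sum as an assumption. It is meant for modest values of the rates and branch length.

```python
BASES = "AGCT"          # same order as the table above


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, squarings=10, terms=12):
    """exp(m) by scaling m down, summing a short Taylor series,
    and squaring the result back up."""
    n = len(m)
    scale = 2.0 ** squarings
    small = [[x / scale for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def gtr_rate_matrix(pi, s):
    """pi: four frequencies in BASES order. s: symmetric rate factors keyed
    by base pair ("AG", "AC", "AT", "GC", "GT", "CT")."""
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                key = "".join(sorted(BASES[i] + BASES[j], key=BASES.index))
                q[i][j] = s[key] * pi[j]
        q[i][i] = -sum(q[i])    # assumption: diagonal = minus the row sum
    return q


def transition_matrix(q, t):
    return mat_exp([[x * t for x in row] for row in q])


# Hypothetical parameter values, for illustration only.
pi = [0.1, 0.2, 0.3, 0.4]
s = {"AG": 2.0, "AC": 0.5, "AT": 0.7, "GC": 1.1, "GT": 0.3, "CT": 1.9}
p = transition_matrix(gtr_rate_matrix(pi, s), 0.5)
print(pi[0] * p[0][3], pi[3] * p[3][0])   # the two sides of detailed balance
```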

Relation between models

There are many other models, but these are the most widely-used ones. Here is a general scheme of which models are subcases of which other ones (number of parameters in parentheses; each level is a subcase of the level above it):

General 12-parameter model (12)
  General time-reversible model (9)
    Tamura-Nei (6)
      HKY (5)    F84 (5)
        Kimura K2P (2)
          Jukes-Cantor (1)

Rate variation among sites

In reality, rates of evolution are not constant among sites. Fortunately, in the transition probability formulas, rates come in as simple multiples of times:

\[ \mathrm{Prob}(i \mid j, u, t) = \mathrm{Prob}(i \mid j, 1, ut) \]

Thus if we know the rates at two sites, we can compute the probabilities of change by simply, for each site, multiplying all branch lengths by the appropriate rate.

(continued...)

If we don't know the rates, we can imagine averaging them over a distribution $f(u)$ of rates. Usually the Gamma distribution is used:

\[ \mathrm{Prob}(i \mid j, t) = \int_0^\infty f(u)\, \mathrm{Prob}(i \mid j, u, t)\, du \]

In practice a discrete histogram of rates approximates the integration. (For the Gamma it seems best to use Generalized Laguerre Quadrature to pick the rates and frequencies in the histogram.)

Also, there are actually autocorrelations, with neighboring sites having similar rates of change. This can be handled by Hidden Markov Models, which we cover later.
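
To make the histogram idea concrete, here is a rough sketch that averages the Jukes-Cantor probability of no change over a Gamma distribution of rates. Note the substitutions: instead of Generalized Laguerre Quadrature it uses a plain midpoint rule on an evenly spaced grid, which is cruder and behaves poorly for shape parameters below 1; the Gamma is assumed to have mean 1; and all function names are made up. For the Jukes-Cantor model the integral has a closed form (the Gamma moment generating function), which the sketch uses as a check.

```python
import math


def gamma_density(u, shape):
    """Gamma density with mean 1 (rate parameter equal to the shape)."""
    return shape ** shape * u ** (shape - 1.0) * math.exp(-shape * u) / math.gamma(shape)


def jc_prob_same(u, t):
    """Jukes-Cantor probability of ending at the starting base, rate u, time t."""
    return 0.25 * (1.0 + 3.0 * math.exp(-4.0 * u * t / 3.0))


def rate_histogram(shape, bins=2000, upper=20.0):
    """Midpoint rates and normalized weights approximating f(u) du."""
    width = upper / bins
    rates = [(k + 0.5) * width for k in range(bins)]
    weights = [gamma_density(u, shape) * width for u in rates]
    total = sum(weights)
    return rates, [w / total for w in weights]


def averaged_prob_same(t, shape, bins=2000, upper=20.0):
    """Histogram approximation to the integral over rates."""
    rates, weights = rate_histogram(shape, bins, upper)
    return sum(w * jc_prob_same(u, t) for u, w in zip(rates, weights))


def exact_prob_same(t, shape):
    """Closed form: E[exp(-c u)] = (1 + c/shape)^(-shape) with c = 4t/3."""
    return 0.25 * (1.0 + 3.0 * (1.0 + 4.0 * t / (3.0 * shape)) ** (-shape))


# Shape 2 and branch length 0.5 are arbitrary illustrative values.
print(averaged_prob_same(0.5, 2.0), exact_prob_same(0.5, 2.0))
```

A production implementation would use far fewer, better-placed rate categories; the dense grid here only shows that a discrete set of rates and weights can stand in for the integral.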

A pioneer of protein evolution

[Photograph: Margaret Dayhoff, about 1966]

Models of amino acid change in proteins

There are a variety of models put forward since the mid-1960s:

1. Amino acid transition matrices
   Dayhoff (1968) model. Tabulation of empirical changes in closely related pairs of proteins, normalized. The PAM100 matrix, for example, is the expected transition matrix given 1 substitution per position.
   Jones, Taylor and Thornton (1992) recalculated PAM matrices (the JTT matrix) from a much larger set of data.
   Jones, Taylor, and Thornton (1994a, 1994b) have tabulated a separate mutation data matrix for transmembrane proteins.
   Koshi and Goldstein (1995) have described the tabulation of further context-dependent mutation data matrices.
   Henikoff and Henikoff (1992) have tabulated the BLOSUM matrix for conserved motifs in gene families.
2. Goldman and Yang (1994) pioneered codon-based models (see next screen).

Approaches to protein sequence models

Making a model for protein sequence evolution (a not very practical approach):

1. Use a good model of DNA evolution.
2. Use the appropriate genetic code.
3. When an amino acid changes, accept it with a probability that declines as the amino acids become more different.
4. Fit this to empirical information on protein evolution.
5. Take into account variation of rate from site to site.
6. Take into account correlation of rate variation in adjacent sites.
7. How about protein structure? Secondary structure? 3-D structure?

Codon models (Muse & Gaut, MBE 1994; Goldman & Yang, MBE 1994)

[Figure: the genetic code table, laid out by first, second, and third base (U, C, A, G). The entries that survive in this transcript are:]

UUU phe   UUC phe   UUA leu   UUG leu
CUU leu   CUC leu   CUA leu   CUG leu
AUU ile   AUC ile   AUA ile   AUG met
GUU val   GUC val   GUA val   GUG val
UCA ser   UAA stop  UGA stop
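
The cited codon-model papers work with the distinction between synonymous and nonsynonymous changes. As a small illustration of that distinction (not of either paper's actual model), the sketch below lists the nine single-base neighbors of a codon and sorts them by whether the amino acid stays the same, changes, or becomes a stop. It assumes the standard genetic code, written in the usual compact one-letter form; the function name is made up.

```python
BASES = "UCAG"
# Standard genetic code, one-letter amino acids, codons ordered UUU, UUC, UUA, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_neighbors(codon):
    """Sort the nine codons one base change away from `codon` into
    synonymous, nonsynonymous, and stop."""
    groups = {"synonymous": [], "nonsynonymous": [], "stop": []}
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            neighbor = codon[:position] + base + codon[position + 1:]
            if CODE[neighbor] == "*":
                groups["stop"].append(neighbor)
            elif CODE[neighbor] == CODE[codon]:
                groups["synonymous"].append(neighbor)
            else:
                groups["nonsynonymous"].append(neighbor)
    return groups


print(classify_neighbors("UUA"))
```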

Covarion models? (Fitch and Markowitz, 1970)

[Figure: a series of DNA sequences placed along a tree]

Which sites are available for substitutions changes as one moves along the tree.

Likelihoods and odds ratios

Bayes' Theorem relates prior and posterior probabilities of an hypothesis $H$:

\[ \mathrm{Prob}(H \mid D) = \frac{\mathrm{Prob}(H \text{ and } D)}{\mathrm{Prob}(D)} = \frac{\mathrm{Prob}(D \mid H)\, \mathrm{Prob}(H)}{\mathrm{Prob}(D)} \]

The ratio of posterior probabilities of two hypotheses, $H_1$ and $H_2$, can be written, putting this into its "odds ratio" form ($\mathrm{Prob}(D)$ cancels):

\[ \frac{\mathrm{Prob}(H_1 \mid D)}{\mathrm{Prob}(H_2 \mid D)} = \frac{\mathrm{Prob}(D \mid H_1)}{\mathrm{Prob}(D \mid H_2)} \cdot \frac{\mathrm{Prob}(H_1)}{\mathrm{Prob}(H_2)} \]

Note that this says that the posterior odds in favor of $H_1$ over $H_2$ are the product of prior odds and a likelihood ratio.

The likelihood of the hypothesis $H$ is the probability of the observed data given it, $\mathrm{Prob}(D \mid H)$. This is not the same as the probability of the hypothesis given the data. That is the posterior probability of $H$ and requires that we also have a believable prior probability $\mathrm{Prob}(H)$.
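
A short numerical sketch of the odds-ratio form, added for illustration. The priors and likelihoods below are invented numbers, and the function names are hypothetical; the point is only that computing the posterior odds from the prior odds and the likelihood ratio agrees with applying Bayes' Theorem to each hypothesis separately.

```python
def posterior_odds(prior_h1, prior_h2, lik_h1, lik_h2):
    """Posterior odds of H1 over H2 = prior odds times likelihood ratio."""
    prior_odds = prior_h1 / prior_h2
    likelihood_ratio = lik_h1 / lik_h2
    return prior_odds * likelihood_ratio


def posterior_probs(prior_h1, prior_h2, lik_h1, lik_h2):
    """Posterior probabilities of H1 and H2 when they are the only two
    hypotheses, so Prob(D) is the sum of the two joint probabilities."""
    joint_h1 = lik_h1 * prior_h1
    joint_h2 = lik_h2 * prior_h2
    prob_d = joint_h1 + joint_h2
    return joint_h1 / prob_d, joint_h2 / prob_d


# Invented numbers: H1 starts out less probable but explains the data better.
odds = posterior_odds(0.2, 0.8, 0.09, 0.03)
post_h1, post_h2 = posterior_probs(0.2, 0.8, 0.09, 0.03)
print(odds, post_h1 / post_h2)
```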

Rationale of likelihood inference

If the data consists of $n$ items that are conditionally independent given the hypothesis $H_i$,

\[ \mathrm{Prob}(D \mid H_i) = \mathrm{Prob}(D^{(1)} \mid H_i)\, \mathrm{Prob}(D^{(2)} \mid H_i) \cdots \mathrm{Prob}(D^{(n)} \mid H_i) \]

and we can then write the likelihood ratio as a product of ratios:

\[ \frac{\mathrm{Prob}(D \mid H_1)}{\mathrm{Prob}(D \mid H_2)} = \prod_{j=1}^{n} \frac{\mathrm{Prob}(D^{(j)} \mid H_1)}{\mathrm{Prob}(D^{(j)} \mid H_2)} \]

If the amount of data is large, the likelihood ratio terms will dominate and push the result towards the correct hypothesis. This can console us somewhat for the lack of a believable prior.
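
To see how the product of ratios grows with the amount of data, the following sketch accumulates the log of the likelihood ratio one item at a time. The setting is an assumption made for illustration: the data items are independent coin tosses, and the two hypotheses are two candidate heads probabilities (0.5 and 0.7, chosen arbitrarily).

```python
import math


def log_likelihood_ratio(tosses, p1, p2):
    """Sum over tosses of log[Prob(toss | H1) / Prob(toss | H2)], where
    H1 and H2 say the heads probability is p1 and p2 respectively."""
    total = 0.0
    for toss in tosses:
        prob1 = p1 if toss == "H" else 1.0 - p1
        prob2 = p2 if toss == "H" else 1.0 - p2
        total += math.log(prob1 / prob2)
    return total


tosses = "HHTTHTHHTTT"
for copies in (1, 10, 100):
    # Repeating the same pattern stands in for collecting more data.
    print(copies, log_likelihood_ratio(tosses * copies, 0.5, 0.7))
```

Because the log ratio grows in proportion to the number of items, a fixed prior odds term is eventually swamped.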

Properties of likelihood inference

Likelihood inference (usually) has the properties of

Consistency. As the number of data items $n$ gets large, we converge to the correct hypothesis with probability 1.
Efficiency. Asymptotically, the likelihood estimate has the smallest possible variance (it need not be best for any finite number $n$ of data points).

A simple example: coin tossing

If we toss a coin which has heads probability $p$ and get HHTTHTHHTTT, the likelihood is

\[ L = \mathrm{Prob}(D \mid p) = pp(1-p)(1-p)p(1-p)pp(1-p)(1-p)(1-p) = p^5 (1-p)^6 \]

so that trying to maximize it we get

\[ \frac{dL}{dp} = 5p^4 (1-p)^6 - 6p^5 (1-p)^5 \]

Finding the ML estimate

and searching for a value of $p$ for which the slope is zero:

\[ \frac{dL}{dp} = p^4 (1-p)^5 \left(5(1-p) - 6p\right) = 0 \]

which has roots at 0, 1, and 5/11.

Log likelihoods

Alternatively, we could maximize not $L$ but its logarithm. This turns products into sums:

\[ \ln L = 5 \ln p + 6 \ln(1-p) \]

whereby

\[ \frac{d(\ln L)}{dp} = \frac{5}{p} - \frac{6}{1-p} = 0 \]

so that finally $p = 5/11$.
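
The coin-tossing calculation can be reproduced in a few lines. This sketch (function names invented for illustration) evaluates the log likelihood, solves the slope equation in closed form, and confirms the answer with a simple grid search over $p$.

```python
import math


def log_likelihood(p, heads, tails):
    """ln L = heads * ln(p) + tails * ln(1 - p)."""
    return heads * math.log(p) + tails * math.log(1.0 - p)


def ml_estimate(heads, tails):
    """Root of heads/p - tails/(1 - p) = 0."""
    return heads / (heads + tails)


def grid_maximum(heads, tails, steps=10000):
    """Value of p on an evenly spaced grid with the highest log likelihood."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, heads, tails))


tosses = "HHTTHTHHTTT"
heads, tails = tosses.count("H"), tosses.count("T")
print(heads, tails, ml_estimate(heads, tails), grid_maximum(heads, tails))
```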

Likelihood curve for coin tosses

[Figure: likelihood (vertical axis) against p (horizontal axis, 0.0 to 1.0), with the maximum marked at p = 0.454]

Likelihood on trees

[Figure: a tree with tips A, C, C, C, G, interior nodes x, y, z, w, and branch lengths $t_1$ through $t_8$]

A tree, with branch lengths, and the data at a single site. This example is used to describe calculation of the likelihood.

Since the sites evolve independently on the same tree,

\[ L = \mathrm{Prob}(D \mid T) = \prod_{i=1}^{m} \mathrm{Prob}\left(D^{(i)} \mid T\right) \]

Likelihood at one site on a tree

We can compute this by summing over all assignments of states $x$, $y$, $z$ and $w$ to the interior nodes:

\[ \mathrm{Prob}\left(D^{(i)} \mid T\right) = \sum_x \sum_y \sum_z \sum_w \mathrm{Prob}(A, C, C, C, G, x, y, z, w \mid T) \]

Computing the terms

For each combination of states, the Markov process allows us to express it as a product of probabilities of a series of changes, with the probability that we start in state $x$:

\[
\begin{aligned}
\mathrm{Prob}(A, C, C, C, G, x, y, z, w \mid T) = {} & \mathrm{Prob}(x)\, \mathrm{Prob}(y \mid x, t_6)\, \mathrm{Prob}(A \mid y, t_1)\, \mathrm{Prob}(C \mid y, t_2) \\
& \mathrm{Prob}(z \mid x, t_8)\, \mathrm{Prob}(C \mid z, t_3) \\
& \mathrm{Prob}(w \mid z, t_7)\, \mathrm{Prob}(C \mid w, t_4)\, \mathrm{Prob}(G \mid w, t_5)
\end{aligned}
\]

Computing the terms

Summing this up, there are 256 terms in this case:

\[
\begin{aligned}
\sum_x \sum_y \sum_z \sum_w {} & \mathrm{Prob}(x)\, \mathrm{Prob}(y \mid x, t_6)\, \mathrm{Prob}(A \mid y, t_1)\, \mathrm{Prob}(C \mid y, t_2) \\
& \mathrm{Prob}(z \mid x, t_8)\, \mathrm{Prob}(C \mid z, t_3) \\
& \mathrm{Prob}(w \mid z, t_7)\, \mathrm{Prob}(C \mid w, t_4)\, \mathrm{Prob}(G \mid w, t_5)
\end{aligned}
\]
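
The 256-term sum can be written out directly as a brute-force loop over the four interior states. The sketch below is an illustration under stated assumptions: it uses the Jukes-Cantor transition probabilities with rate 1, takes $\mathrm{Prob}(x) = \frac{1}{4}$ for the starting state, and uses invented branch lengths, since the slides do not give numerical values. The function names are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"


def jc_prob(child, parent, t, u=1.0):
    """Jukes-Cantor probability of state `child` given `parent` after time t."""
    decay = math.exp(-4.0 * u * t / 3.0)
    if child == parent:
        return 0.25 * (1.0 + 3.0 * decay)
    return 0.25 * (1.0 - decay)


def site_likelihood(tips, t):
    """Likelihood of one site by summing over interior states x, y, z, w.

    tips: the five observed bases, in the order used in the formula
          (A, C, C, C, G on the slide).
    t:    dict mapping 1..8 to the branch lengths t_1..t_8.
    """
    tip1, tip2, tip3, tip4, tip5 = tips
    total = 0.0
    for x, y, z, w in product(BASES, repeat=4):      # 4^4 = 256 terms
        total += (0.25                                # assumed Prob(x)
                  * jc_prob(y, x, t[6]) * jc_prob(tip1, y, t[1]) * jc_prob(tip2, y, t[2])
                  * jc_prob(z, x, t[8]) * jc_prob(tip3, z, t[3])
                  * jc_prob(w, z, t[7]) * jc_prob(tip4, w, t[4]) * jc_prob(tip5, w, t[5]))
    return total


# Invented branch lengths, for illustration only.
branch_lengths = {1: 0.1, 2: 0.2, 3: 0.15, 4: 0.05, 5: 0.3, 6: 0.1, 7: 0.2, 8: 0.25}
print(site_likelihood("ACCCG", branch_lengths))
```

The cost of this direct sum grows as 4 raised to the number of interior nodes, which is why the references point to the "pruning" algorithm (Felsenstein 1973, 1981) for making the calculation practical.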

References

Barry, D., and J. A. Hartigan. 1987. Statistical analysis of hominoid molecular evolution. Statistical Science 2: 191-210. [Early use of full 12-parameter model]

Dayhoff, M. O. and R. V. Eck. 1968. Atlas of Protein Sequence and Structure 1967-1968. National Biomedical Research Foundation, Silver Spring, Maryland. [Dayhoff's PAM model for proteins]

Edwards, A. W. F., and L. L. Cavalli-Sforza. 1964. Reconstruction of evolutionary trees. pp. 67-76 in Phenetic and Phylogenetic Classification, ed. V. H. Heywood and J. McNeill. Systematics Association Publ. No. 6, London. [First paper on likelihood for phylogenies]

Felsenstein, J. 1973. Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Systematic Zoology 22: 240-249. [The "pruning" algorithm]

Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17: 368-376. [Made likelihood practical for n species]

Felsenstein, J. 2004. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts. [Material is in chapters 13, 16]

Fisher, R. A. 1912. On an absolute criterion for fitting frequency curves. Messenger of Mathematics 41: 155-160. [First modern paper introducing likelihood]

Fisher, R. A. 1922. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, A 222: 309-368. [Likelihood in generality]

Fitch, W. M. and E. Markowitz. 1970. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochemical Genetics 4: 579-593. [The first suggestion of a covarion model]

Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Molecular Biology and Evolution 11: 725-736. [One of the two introductions of the codon model]

Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22: 160-174. [HKY model]

Henikoff, S. and J. G. Henikoff. 1992. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences, USA 89: 10915-10919. [BLOSUM protein model]

Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Computer Applications in the Biosciences (CABIOS) 8: 275-282. [JTT model for proteins]

Jones, D. T., W. R. Taylor, and J. M. Thornton. 1994a. A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry 33: 3038-3049. [JTT membrane protein model]

Jones, D. T., W. R. Taylor, and J. M. Thornton. 1994b. A mutation data matrix for transmembrane proteins. FEBS Letters 339: 269-275. [JTT membrane protein model]

Jukes, T. H. and C. Cantor. 1969. Evolution of protein molecules. pp. 21-132 in Mammalian Protein Metabolism, ed. M. N. Munro. Academic Press, New York. [Jukes-Cantor model]

Kashyap, R. L., and S. Subas. 1974. Statistical estimation of parameters in a phylogenetic tree using a dynamic model of the substitutional process. Journal of Theoretical Biology 47: 75-101. [Second paper applying likelihood to molecular sequences]

Kimura, M. 1980. A simple model for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16: 111-120. [Kimura's 2-parameter model]

Koshi, J. M. and R. A. Goldstein. 1995. Context-dependent optimal substitution matrices. Protein Engineering 8: 641-645. [Generating other kinds of protein model matrices]

Lanave, C., G. Preparata, C. Saccone, and G. Serio. 1984. A new method for calculating evolutionary substitution rates. Journal of Molecular Evolution 20: 86-93. [General reversible model]

Lockhart, P. J., M. A. Steel, M. D. Hendy, and D. Penny. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Molecular Biology and Evolution 11: 605-612. [The LogDet distance for correcting for changing base composition]

Muse, S. V. and B. S. Gaut. 1994. A likelihood method for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Molecular Biology and Evolution 11: 715-724. [One of the two introductions of the codon model]

Neyman, J. 1971. Molecular studies of evolution: a source of novel statistical problems. pp. 1-27 in Statistical Decision Theory and Related Topics, ed. S. S. Gupta and J. Yackel. Academic Press, New York. [First application of likelihood to molecular sequences]

Tamura, K. and M. Nei. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution 10: 512-526. [Tamura-Nei model]

How it was done

This projection was produced using the prosper style in LaTeX, using LaTeX to make a .dvi file, using dvips to turn this into a PostScript file, using ps2pdf to make it into a PDF file, and displaying the slides in Adobe Acrobat Reader.

Result: nice slides using freeware.