Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

Why do we need models? [Figure: a plot with two fitted curves, a fifth-degree polynomial, $y = -1.5972x^5 + 23.167x^4 - 126.18x^3 + 319.17x^2 - 369.22x + 155.67$, and a straight line, $y = 0.6611x$.] Copyright 2007 Paul O. Lewis

Models. Models help us intelligently interpolate between our observations for purposes of making predictions. Adding parameters to a model generally increases its fit to the data. Underparameterized models lead to poor fit to observed data points. Overparameterized models lead to poor prediction of future observations. Criteria for choosing models include likelihood ratio tests, AIC, BIC, Bayes factors, etc.; all provide a way to choose a model that is neither underparameterized nor overparameterized.

The Poisson distribution. A probability distribution on the number of events when: 1. events are assumed to be independent, 2. the rate of events is some constant, $\mu$, and 3. the process continues for some duration of time, $t$. The expectation of the number of events is $\nu = \mu t$. Note that $\nu$ can be any non-negative number, but the Poisson is a discrete distribution: it gives the probabilities of the number of events (and this number will always be a non-negative integer).

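The Poisson distribution:

$$\Pr(k \text{ events} \mid \text{expected number is } \nu) = \frac{\nu^k e^{-\nu}}{k!}$$

$$\Pr(0 \text{ events}) = \frac{\nu^0 e^{-\nu}}{0!} = e^{-\nu} = e^{-\mu t}$$

$$\Pr(\geq 1 \text{ events}) = 1 - e^{-\nu} = 1 - e^{-\mu t}$$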
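The formulas above can be checked numerically with a short Python sketch. The rate `mu` and duration `t` below are hypothetical values chosen only for illustration; they do not come from the slides.

```python
import math

def poisson_pmf(k, nu):
    """Probability of exactly k events when the expected number is nu."""
    return nu ** k * math.exp(-nu) / math.factorial(k)

mu = 0.5      # hypothetical rate of events per unit time
t = 2.0       # hypothetical duration
nu = mu * t   # expected number of events

p_zero = poisson_pmf(0, nu)     # should equal exp(-mu * t)
p_at_least_one = 1.0 - p_zero   # should equal 1 - exp(-mu * t)

print(p_zero, p_at_least_one)
```

Note that `nu` is a real number while `k` is always a non-negative integer, which mirrors the remark that the Poisson is a discrete distribution.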

"Disruptions" vs. substitutions. When a disruption occurs, any base (A, C, G, or T) can appear in a sequence. Note: "disruption" is my term for this make-believe event. You will not see this term in the literature. If the base that appears is different from the base that was already there, then a substitution event has occurred. The rate at which any particular substitution occurs will be 1/4 the disruption rate (assuming equal base frequencies).

Probability of $T \to G$ over time $t$. Let $\mu$ be the rate of disruptions, and let a branch be $t$ units of time long. Let's use $\theta$ for the rate of any particular disruption: $\mu_{T \to A} = \mu_{T \to C} = \mu_{T \to G} = \mu_{T \to T} = \theta$, so $\mu = 4\theta$. Furthermore, given that there is a disruption, the chance of any particular change is $\frac{1}{4}$.

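Probability of $T \to G$ over time $t$:

$$\Pr(0 \text{ disruptions} \mid t) = e^{-\mu t}$$

$$\Pr(\text{at least 1 disruption} \mid t) = 1 - e^{-\mu t}$$

$$\Pr(\text{last disruption leads to } G) = 0.25$$

$$\Pr(T \to G \mid t) = 0.25\left(1 - e^{-\mu t}\right) = 0.25\left(1 - e^{-4\theta t}\right)$$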
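As a sanity check on this argument, the disruption process can be simulated directly. The following Python sketch is only an illustration: the function name `simulate_end_state`, the values of `theta` and `t`, and the number of replicates are all assumptions, not part of the slides.

```python
import math
import random

def simulate_end_state(start, mu, t, rng):
    """Simulate disruptions at rate mu along a branch of length t."""
    state = start
    elapsed = rng.expovariate(mu)        # waiting time to the first disruption
    while elapsed < t:
        state = rng.choice("ACGT")       # any base may appear, including the current one
        elapsed += rng.expovariate(mu)   # waiting time to the next disruption
    return state

theta = 0.1          # hypothetical rate of any particular disruption
mu = 4 * theta       # total disruption rate
t = 2.0              # hypothetical branch duration
rng = random.Random(1)
n = 100_000          # number of simulated branches

hits = sum(simulate_end_state("T", mu, t, rng) == "G" for _ in range(n))
simulated = hits / n
expected = 0.25 * (1 - math.exp(-mu * t))

print(simulated, expected)
```

The simulated frequency should agree with the formula up to sampling noise, because only the last disruption on the branch determines the final base.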

JC69 model. Bases are assumed to be equally frequent (all 0.25). Assumes the rate of substitution ($\alpha$) is the same for all possible substitutions. Usually described as a 1-parameter model (the parameter being $\alpha$). Remember, however, that each edge in a tree can have its own $\alpha$, so there are really as many parameters in the model as there are edges in the tree! Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pages 21-132 in H. N. Munro (ed.), Mammalian Protein Metabolism. Academic Press, New York.

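JC transition probabilities:

$$\Pr(T \to A \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$

$$\Pr(T \to C \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$

$$\Pr(T \to G \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$

$$\Pr(T \to T \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$

but this only adds up to $\left(1 - e^{-4\theta t}\right)$ instead of 1!

We left out the probability of no disruptions: $e^{-4\theta t}$. So:

$$\Pr(T \to A \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$

$$\Pr(T \to C \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$

$$\Pr(T \to G \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$

$$\Pr(T \to T \mid t) = e^{-4\theta t} + 0.25\left(1 - e^{-4\theta t}\right) = 0.25 + 0.75e^{-4\theta t}$$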

JC transition probabilities, for $i \neq j$:

$$\Pr(i \to j \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$

$$\Pr(i \to i \mid t) = 0.25 + 0.75e^{-4\theta t}$$

When $t = 0$, then $e^{-4\theta t} = 1$, and: $\Pr(i \to j \mid t) = 0$, $\Pr(i \to i \mid t) = 1$.

When $t = \infty$, then $e^{-4\theta t} = 0$, and: $\Pr(i \to j \mid t) = 0.25$, $\Pr(i \to i \mid t) = 0.25$.
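The two JC formulas and their limits can be sketched in a few lines of Python. The function name `jc_prob` and the value of `theta` are illustrative assumptions; a very large `t` stands in for $t = \infty$.

```python
import math

def jc_prob(i, j, theta, t):
    """JC probability of ending in base j after time t, starting from base i."""
    decay = math.exp(-4.0 * theta * t)
    if i == j:
        return 0.25 + 0.75 * decay
    return 0.25 * (1.0 - decay)

theta = 0.1  # hypothetical rate of any particular disruption

rows = {}
for t in (0.0, 1.0, 1000.0):
    # Probabilities of ending in A, C, G, T when starting from T.
    rows[t] = [jc_prob("T", b, theta, t) for b in "ACGT"]
    print(t, rows[t], sum(rows[t]))
```

Each row sums to 1 once the no-disruption term is included, and the row for the largest `t` is flat at 0.25.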

Probability of A present as a function of time. [Figure: two curves converging on 0.25.] The upper curve assumes we started with A at time 0. Over time, the probability of still seeing an A at this site drops because the rate of changing to one of the other three bases is $3\alpha$ (so the rate of staying the same is $-3\alpha$). The equilibrium relative frequency of A is 0.25. The lower curve assumes we started with some state other than A (T is used here). Over time, the probability of seeing an A at this site grows because the rate at which the current base will change into an A is $\alpha$.

Water analogy (time 0). Start with container A completely full and the others (C, G, and T) empty. Imagine that all containers are connected by tubes that allow the same rate of flow between any two. Initially, A will be losing water at 3 times the rate ($3\alpha$) that C (or G or T) gains water ($\alpha$).

Water analogy (after some time). A's level is not dropping as fast now because it is now also receiving water from C, G, and T.

Water analogy (after a very long time). Eventually, all containers are one fourth full and there is zero net volume change: stationarity (equilibrium) has been achieved. (Thanks to Kent Holsinger for this analogy.)
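The water analogy can be sketched as a small numerical simulation in Python. This is only an illustration under stated assumptions: the flow rate `alpha`, the step size `dt`, and the simple fixed-step update are choices made here, not something the slides specify.

```python
alpha = 0.1   # hypothetical flow rate between any two containers
dt = 0.01     # small time step for the fixed-step update

def step(levels, alpha, dt):
    """Advance all container levels by one small time step."""
    new_levels = {}
    for base, level in levels.items():
        inflow = alpha * sum(v for other, v in levels.items() if other != base)
        outflow = 3 * alpha * level   # each container drains toward three others
        new_levels[base] = level + dt * (inflow - outflow)
    return new_levels

levels = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}   # A full, others empty

first = step(levels, alpha, dt)
loss_from_a = 1.0 - first["A"]   # what A lost in the first step
gain_to_c = first["C"]           # what C gained in the first step
print(loss_from_a, gain_to_c)    # A loses three times what C gains

for _ in range(20000):           # run for a long time
    levels = step(levels, alpha, dt)
print(levels)                    # every container ends close to one fourth full
```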

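JC instantaneous rate matrix: the Q matrix for JC. The 1 parameter is $\alpha$ (sometimes parameterized in terms of $\mu$). This is the rate of replacements ("disruptions" that change the state). Rows are the "from" state and columns are the "to" state:

$$
Q = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & -3\alpha & \alpha & \alpha & \alpha \\
C & \alpha & -3\alpha & \alpha & \alpha \\
G & \alpha & \alpha & -3\alpha & \alpha \\
T & \alpha & \alpha & \alpha & -3\alpha
\end{array}
$$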

Change probabilities. We can calculate a transition probability matrix as a function of time by: $P(t) = e^{Qt}$. The important thing to note is that the rate matrix $Q$ is multiplied by the time. We can't separate rates and times, since we always see the effect of their product. Does a medium level of character divergence reflect: 1. a medium rate of change and a medium amount of time, 2. a high rate, but a short time period, or 3. a low rate, but a long time period?
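To illustrate that only the product of rate and time matters, here is a dependency-free Python sketch that computes $e^{Qt}$ for the JC matrix. The helper names (`mat_mul`, `mat_exp`, `transition_matrix`) and all numeric values are assumptions made for this example, and the scaled, truncated power series is just one simple way to approximate a matrix exponential; a numerical library routine would normally be preferred.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, terms=20, squarings=10):
    """Approximate exp(m): truncated power series on m / 2**squarings, then repeated squaring."""
    n = len(m)
    scale = 2 ** squarings
    small = [[x / scale for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]   # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, small)]   # small^k / k!
        result = [[r + s for r, s in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def transition_matrix(alpha, t):
    """P(t) = exp(Q t) for the JC rate matrix with parameter alpha."""
    qt = [[(-3.0 * alpha if i == j else alpha) * t for j in range(4)]
          for i in range(4)]
    return mat_exp(qt)

# Three hypothetical (rate, time) pairs with the same product alpha * t = 0.2.
p_medium = transition_matrix(alpha=0.1, t=2.0)
p_fast_short = transition_matrix(alpha=0.2, t=1.0)
p_slow_long = transition_matrix(alpha=0.05, t=4.0)

closed_form_same = 0.25 + 0.75 * math.exp(-4 * 0.2)
print(p_medium[0][0], p_fast_short[0][0], p_slow_long[0][0], closed_form_same)
```

All three matrices agree, so the data alone cannot distinguish the three scenarios.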

JC instantaneous rate matrix again. What if you do not know the length of time for a branch in the tree? We estimate branch lengths in terms of character divergence: the product of rate and time. What is important is that we know the relative rates of different types of substitutions, so JC can be expressed (rows are the "from" state, columns are the "to" state):

$$
\begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & -3 & 1 & 1 & 1 \\
C & 1 & -3 & 1 & 1 \\
G & 1 & 1 & -3 & 1 \\
T & 1 & 1 & 1 & -3
\end{array}
$$

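JC instantaneous rate matrix yet again. We estimate branch lengths in terms of the expected number of changes per site. To do this we standardize the total rate of divergence in the Q matrix and estimate $\nu = \mu t = 3\alpha t$ for each branch (rows are the "from" state, columns are the "to" state):

$$
\begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & -1 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\
C & \frac{1}{3} & -1 & \frac{1}{3} & \frac{1}{3} \\
G & \frac{1}{3} & \frac{1}{3} & -1 & \frac{1}{3} \\
T & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & -1
\end{array}
$$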
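The standardization step can be sketched in Python as follows. The function name `normalize_rate_matrix` and the values of `alpha` and `t` are hypothetical; the sketch rescales Q so that the expected rate of change, averaged over the base frequencies, is 1.

```python
def normalize_rate_matrix(q, freqs):
    """Rescale q so that the expected substitution rate per site is 1."""
    total_rate = -sum(pi * q[i][i] for i, pi in enumerate(freqs))
    return [[x / total_rate for x in row] for row in q]

alpha = 0.3   # hypothetical JC rate
q = [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]
q_std = normalize_rate_matrix(q, [0.25] * 4)

t = 0.5                # hypothetical time along one branch
nu = 3 * alpha * t     # branch length in expected changes per site

for row in q_std:
    print(row)         # diagonal near -1, off-diagonal near 1/3
print(nu)
```

With the standardized matrix, $P = e^{Q\nu}$ depends on the branch only through $\nu$.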

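K2P model or the K80 model. Transitions and transversions occur at different rates (rows are the "from" state, columns are the "to" state):

$$
\begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & -2\beta - \alpha & \beta & \alpha & \beta \\
C & \beta & -2\beta - \alpha & \beta & \alpha \\
G & \alpha & \beta & -2\beta - \alpha & \beta \\
T & \beta & \alpha & \beta & -2\beta - \alpha
\end{array}
$$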

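K2P model or the K80 model, reparameterized. Once again, we care only about the relative rates, so we can choose one rate to be the frame of reference. This turns the 2-parameter model into a 1-parameter form:

$$
\begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & -(2 + \kappa)\beta & \beta & \kappa\beta & \beta \\
C & \beta & -(2 + \kappa)\beta & \beta & \kappa\beta \\
G & \kappa\beta & \beta & -(2 + \kappa)\beta & \beta \\
T & \beta & \kappa\beta & \beta & -(2 + \kappa)\beta
\end{array}
$$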

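K2P model or the K80 model, reparameterized again:

$$
\begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & -2 - \kappa & 1 & \kappa & 1 \\
C & 1 & -2 - \kappa & 1 & \kappa \\
G & \kappa & 1 & -2 - \kappa & 1 \\
T & 1 & \kappa & 1 & -2 - \kappa
\end{array}
$$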

Kappa is the transition/transversion rate ratio: $\kappa = \frac{\alpha}{\beta}$ (if $\kappa = 1$ then we are back to JC).

What is the instantaneous probability of a particular transversion? $\Pr(A \to C) = \Pr(A)\,\Pr(\text{change to } C) = \frac{1}{4}(\beta\,dt)$

What is the instantaneous probability of a particular transition? $\Pr(A \to G) = \Pr(A)\,\Pr(\text{change to } G) = \frac{1}{4}(\kappa\beta\,dt)$

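There are four types of transitions: $A \to G$, $G \to A$, $C \to T$, $T \to C$, and eight types of transversions: $A \to C$, $A \to T$, $G \to C$, $G \to T$, $C \to A$, $C \to G$, $T \to A$, $T \to G$.

$$\text{Ti/Tv ratio} = \frac{\Pr(\text{any transition})}{\Pr(\text{any transversion})} = \frac{4\left(\frac{1}{4}\kappa\beta\,dt\right)}{8\left(\frac{1}{4}\beta\,dt\right)} = \frac{\kappa}{2}$$

For K2P, the instantaneous transition/transversion ratio is one-half the instantaneous transition/transversion rate ratio.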
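The counting argument can be reproduced with a short Python sketch that builds the K2P rate matrix and sums the transition and transversion rates. The names `k80_rate_matrix` and `titv_ratio`, and the value of `kappa`, are illustrative assumptions.

```python
BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def k80_rate_matrix(kappa, beta=1.0):
    """K2P rate matrix as a dict keyed by (from_base, to_base)."""
    q = {}
    for i in BASES:
        for j in BASES:
            if i != j:
                q[i, j] = kappa * beta if (i, j) in TRANSITIONS else beta
        q[i, i] = -(2 + kappa) * beta   # rows sum to zero
    return q

def titv_ratio(kappa, beta=1.0):
    """Instantaneous transition/transversion ratio with equal base frequencies."""
    q = k80_rate_matrix(kappa, beta)
    ti = sum(0.25 * q[pair] for pair in TRANSITIONS)
    tv = sum(0.25 * rate for (i, j), rate in q.items()
             if i != j and (i, j) not in TRANSITIONS)
    return ti / tv

kappa = 4.0   # hypothetical transition/transversion rate ratio
print(titv_ratio(kappa))   # half of kappa
```

Setting `kappa` to 1 gives a ratio of one half, which is the JC case: all twelve changes are equally likely, and only four of them are transitions.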

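Felsenstein 1981 model or F81 model (rows are the "from" state, columns are the "to" state; each diagonal entry, shown as $-$, is the negative sum of the other entries in its row):

$$
\begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & - & \pi_C & \pi_G & \pi_T \\
C & \pi_A & - & \pi_G & \pi_T \\
G & \pi_A & \pi_C & - & \pi_T \\
T & \pi_A & \pi_C & \pi_G & -
\end{array}
$$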

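HKY 1985 model (rows are the "from" state, columns are the "to" state; each diagonal entry, shown as $-$, is the negative sum of the other entries in its row):

$$
\begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & - & \pi_C & \kappa\pi_G & \pi_T \\
C & \pi_A & - & \pi_G & \kappa\pi_T \\
G & \kappa\pi_A & \pi_C & - & \pi_T \\
T & \pi_A & \kappa\pi_C & \pi_G & -
\end{array}
$$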

F84* vs. HKY85. F84 model: $\mu$ is the rate of the process generating all types of substitutions, and $k\mu$ is the rate of the process generating only transitions. It becomes the F81 model if $k = 0$. HKY85 model: $\beta$ is the rate of the process generating only transversions, and $\kappa\beta$ is the rate of the process generating only transitions. It becomes the F81 model if $\kappa = 1$. *First used in PHYLIP in 1984, first published by Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. Journal of Molecular Evolution 29: 170-179.

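General Time Reversible (GTR) model (rows are the "from" state, columns are the "to" state; each diagonal entry, shown as $-$, is the negative sum of the other entries in its row):

$$
\begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & - & a\pi_C & b\pi_G & c\pi_T \\
C & a\pi_A & - & d\pi_G & e\pi_T \\
G & b\pi_A & d\pi_C & - & f\pi_T \\
T & c\pi_A & e\pi_C & f\pi_G & -
\end{array}
$$

In PAUP, $f = 1$, indicating that $G \leftrightarrow T$ is the reference rate.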
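A minimal Python sketch of the GTR matrix is given below, mainly to show how the simpler models arise as special cases. The function name `gtr_rate_matrix`, the dictionary layout, and the base frequencies and `kappa` value are all hypothetical choices made for illustration; the exchangeabilities `a` to `f` are keyed by unordered base pair (AC, AG, AT, CG, CT, GT).

```python
BASES = "ACGT"

def gtr_rate_matrix(freqs, rates):
    """GTR rate matrix as a dict keyed by (from_base, to_base).

    freqs maps each base to its frequency; rates maps each unordered
    pair ("AC", "AG", "AT", "CG", "CT", "GT") to its exchangeability.
    """
    q = {}
    for i in BASES:
        for j in BASES:
            if i != j:
                pair = "".join(sorted(i + j))
                q[i, j] = rates[pair] * freqs[j]
        q[i, i] = -sum(q[i, j] for j in BASES if j != i)   # rows sum to zero
    return q

freqs = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}   # hypothetical base frequencies
kappa = 2.5                                        # hypothetical rate ratio

# HKY85 as a special case: b = e = kappa (transitions), all other rates 1.
hky_rates = {"AC": 1.0, "AG": kappa, "AT": 1.0, "CG": 1.0, "CT": kappa, "GT": 1.0}
q_hky = gtr_rate_matrix(freqs, hky_rates)

# Time reversibility: pi_i * q_ij equals pi_j * q_ji for every pair of bases.
reversible = all(
    abs(freqs[i] * q_hky[i, j] - freqs[j] * q_hky[j, i]) < 1e-12
    for i in BASES for j in BASES if i != j
)
print(q_hky["A", "G"], kappa * freqs["G"], reversible)
```

Setting every exchangeability to 1 gives F81, and additionally setting all frequencies to 0.25 gives JC up to a constant factor.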

References. Kimura, M. (1980). A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16:111-120.