How should we go about modeling this? Model parameters? Time. Substitution rate. Can we observe time or substitution rate? What can we observe?


How should we go about modeling this?

gorilla    GAAGTCCTTGAGAAATAAACTGCACACACTGG
orangutan  GGACTCCTTGAGAAATAAACTGCACACACTGG

Model parameters? Time. Substitution rate.

Can we observe time or substitution rate? What can we observe?

Maximum likelihood estimation

First 32 nucleotides of the ψη-globin gene of gorilla and orangutan:

gorilla    GAAGTCCTTGAGAAATAAACTGCACACACTGG
orangutan  GGACTCCTTGAGAAATAAACTGCACACACTGG

[Figure: plot of log-likelihood (ln L, axis from -60 to -54) as a function of branch length v (axis from 0.0 to 0.4).]

The maximum likelihood estimate (MLE) of v is 0.065259.

Knowing that v = rate × time: Why does the curve drop as you move left of the MLE? Why does the curve drop as you move right of the MLE?
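The curve on this slide can be reproduced numerically. The following is a minimal sketch, assuming the JC69 model for a pair of sequences and its standard closed-form distance; the function names are made up for illustration and are not from the slides.

```python
import math

gorilla = "GAAGTCCTTGAGAAATAAACTGCACACACTGG"
orangutan = "GGACTCCTTGAGAAATAAACTGCACACACTGG"

def count_differences(seq1, seq2):
    """Number of aligned sites at which the two sequences differ."""
    return sum(a != b for a, b in zip(seq1, seq2))

def jc69_log_likelihood(v, seq1, seq2):
    """Log-likelihood of branch length v (> 0) under JC69 for two aligned sequences."""
    decay = math.exp(-4.0 * v / 3.0)
    # Each site probability includes the 1/4 equilibrium frequency of the starting base.
    p_same = 0.25 * (0.25 + 0.75 * decay)
    p_diff = 0.25 * (0.25 - 0.25 * decay)
    n_diff = count_differences(seq1, seq2)
    n_same = len(seq1) - n_diff
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)

def jc69_mle(seq1, seq2):
    """Closed-form JC69 estimate of v from the proportion of differing sites."""
    p = count_differences(seq1, seq2) / len(seq1)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

v_hat = jc69_mle(gorilla, orangutan)
print(f"MLE of v: {v_hat:.6f}")
for v in (0.01, v_hat, 0.2, 0.4):
    print(f"v = {v:.4f}  lnL = {jc69_log_likelihood(v, gorilla, orangutan):.3f}")
```

Evaluating the function on either side of the estimate shows the drop the slide asks about: very small v makes the two observed differences improbable, and large v makes the 30 identical sites improbable.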

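JC69 rate matrix

Rows are the "from" state, columns are the "to" state:

$$
\begin{array}{c|cccc}
  & A & C & G & T \\ \hline
A & -3\alpha & \alpha & \alpha & \alpha \\
C & \alpha & -3\alpha & \alpha & \alpha \\
G & \alpha & \alpha & -3\alpha & \alpha \\
T & \alpha & \alpha & \alpha & -3\alpha
\end{array}
$$

1 parameter: α

What does this mean? Why is this number negative?

Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pages 21-132 in H. N. Munro (ed.), Mammalian Protein Metabolism. Academic Press, New York.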

Equilibrium Frequencies

[Figure: four connected rooms labeled A, C, G, and T.]

A sequence consisting only of A. Analogy: a perfume bottle is broken, and the perfume quickly fills the room.

If all doors are suddenly opened, perfume will spread by diffusion to the other rooms. The instant the doors open, the rate away from A is 3α (i.e. rate = -3α).

The sequence now contains a few Cs, Gs, and Ts:

AATAAAAAAAAAAAA
AAAACAAAAAATAAA
AAAAAAACGAAGAAA
AAATAAAAAAAAAAA

As perfume spreads by diffusion, the difference in concentration among rooms decreases.

The sequence contains a mixture of about equal quantities of A, C, G, and T:

CAGAATCGAGCAGCT
TGACTACGTCATGTG
GTTGCGCCGCAACGC
CATATACCGCCGACT
AGTTTGAGGGCGGTT
AGGGCTCGGTTCGTA
CATCGTATAAACATT

After a long time, equilibrium (= stationarity) is achieved.
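The perfume analogy can be checked against the JC69 transition probabilities. This is a small sketch, assuming the standard closed-form JC69 solution for a site that starts as A; the value of α and the time points are hypothetical, chosen only for illustration.

```python
import math

def jc69_probs_from_a(alpha, t):
    """Probability of each base at time t for a site that starts as A, under JC69."""
    decay = math.exp(-4.0 * alpha * t)
    p_stay = 0.25 + 0.75 * decay   # still A
    p_move = 0.25 - 0.25 * decay   # any one specific other base
    return {"A": p_stay, "C": p_move, "G": p_move, "T": p_move}

alpha = 1.0  # hypothetical rate, not a value from the slides
for t in (0.0, 0.1, 0.5, 2.0, 10.0):
    probs = jc69_probs_from_a(alpha, t)
    print(t, {base: round(p, 4) for base, p in probs.items()})
```

At t = 0 all of the probability sits on A, the initial slope of the A probability is -3α, and for large t every base approaches 1/4, which is the equilibrium the slides describe.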

Stationarity Assumed

Models assume stationarity, which means that the state frequencies do not change across the tree.

Models assume that the substitution process has reached equilibrium by this point (a point marked on the tree in the original figure).

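K80 (or K2P) rate matrix

Rows are the "from" state, columns are the "to" state:

$$
\begin{array}{c|cccc}
  & A & C & G & T \\ \hline
A & -(\alpha + 2\beta) & \beta & \alpha & \beta \\
C & \beta & -(\alpha + 2\beta) & \beta & \alpha \\
G & \alpha & \beta & -(\alpha + 2\beta) & \beta \\
T & \beta & \alpha & \beta & -(\alpha + 2\beta)
\end{array}
$$

2 parameters: α (transition rate) and β (transversion rate)

Kimura, M. 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16:111-120.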

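K80 rate matrix (looks different, but actually the same)

$$
\begin{array}{c|cccc}
  & A & C & G & T \\ \hline
A & -\beta(\kappa + 2) & \beta & \kappa\beta & \beta \\
C & \beta & -\beta(\kappa + 2) & \beta & \kappa\beta \\
G & \kappa\beta & \beta & -\beta(\kappa + 2) & \beta \\
T & \beta & \kappa\beta & \beta & -\beta(\kappa + 2)
\end{array}
$$

2 parameters: κ and β

All I've done is define κ to be the transition/transversion rate ratio: κ = α/β.

Note: the K80 model is identical to the JC69 model if κ = 1 (α = β).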

Likelihood Surface when K80 true

Based on simulated data: sequence length = 500 sites, true branch length = 0.15, true kappa = 5.0.

[Figure: likelihood surface with axes κ (trs/trv rate ratio) and ν (branch length). The K80 model covers the entire 2d space; the JC69 model is just a 1d line within it (κ = 1).]

Likelihood Surface when JC true

Based on simulated data: sequence length = 500 sites, true branch length = 0.15, true kappa = 1.0.

[Figure: likelihood surface with axes κ (trs/trv rate ratio) and ν (branch length). The K80 model covers the entire 2d space; the JC69 model is just a 1d line within it (κ = 1).]

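F81 rate matrix

$$
\begin{array}{c|cccc}
  & A & C & G & T \\ \hline
A & -\mu(1 - \pi_A) & \pi_C \mu & \pi_G \mu & \pi_T \mu \\
C & \pi_A \mu & -\mu(1 - \pi_C) & \pi_G \mu & \pi_T \mu \\
G & \pi_A \mu & \pi_C \mu & -\mu(1 - \pi_G) & \pi_T \mu \\
T & \pi_A \mu & \pi_C \mu & \pi_G \mu & -\mu(1 - \pi_T)
\end{array}
$$

4 parameters: μ, π_A, π_C, π_G

Note: the F81 model is identical to the JC69 model if all base frequencies are equal.

Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17:368-376.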

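HKY85 rate matrix

$$
\begin{array}{c|cccc}
  & A & C & G & T \\ \hline
A & - & \pi_C \beta & \pi_G \kappa\beta & \pi_T \beta \\
C & \pi_A \beta & - & \pi_G \beta & \pi_T \kappa\beta \\
G & \pi_A \kappa\beta & \pi_C \beta & - & \pi_T \beta \\
T & \pi_A \beta & \pi_C \kappa\beta & \pi_G \beta & -
\end{array}
$$

A dash means equal to the negative sum of the other elements on the same row.

5 parameters: κ, β, π_A, π_C, π_G

Note: the HKY85 model is identical to the F81 model if κ = 1. If, in addition, all base frequencies are equal, it is identical to JC69.

Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 21:160-174.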

GTR rate matrix

$$
\begin{array}{c|cccc}
  & A & C & G & T \\ \hline
A & - & \pi_C a\mu & \pi_G b\mu & \pi_T c\mu \\
C & \pi_A a\mu & - & \pi_G d\mu & \pi_T e\mu \\
G & \pi_A b\mu & \pi_C d\mu & - & \pi_T f\mu \\
T & \pi_A c\mu & \pi_C e\mu & \pi_G f\mu & -
\end{array}
$$

9 parameters: π_A, π_C, π_G, a, b, c, d, e, μ

Identical to the F81 model if a = b = c = d = e = f = 1. If, in addition, all the base frequencies are equal, GTR is identical to JC69. If a = c = d = f = β and b = e = κβ, GTR becomes the HKY85 model.

Lanave, C., G. Preparata, C. Saccone, and G. Serio. 1984. A new method for calculating evolutionary substitution rates. Journal of Molecular Evolution 20:86-93.
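The nesting of these models can be checked by building the GTR matrix and plugging in the special-case settings listed on the slide. This is a rough sketch: the function names, the base frequencies, and the values of β, κ, and α are hypothetical and chosen only to illustrate the relationships.

```python
ORDER = "ACGT"

def gtr_rate_matrix(pi, a, b, c, d, e, f, mu):
    """Build a GTR rate matrix (rows = from, columns = to, order A, C, G, T)."""
    exch = {("A", "C"): a, ("A", "G"): b, ("A", "T"): c,
            ("C", "G"): d, ("C", "T"): e, ("G", "T"): f}
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(ORDER):
        for j, y in enumerate(ORDER):
            if i != j:
                r = exch[(x, y)] if (x, y) in exch else exch[(y, x)]
                q[i][j] = pi[y] * r * mu
        # Diagonal is the negative sum of the other elements in the row.
        q[i][i] = -sum(q[i])
    return q

def max_abs_diff(m1, m2):
    """Largest absolute elementwise difference between two matrices."""
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))

pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}   # hypothetical base frequencies
equal = {base: 0.25 for base in ORDER}
beta, kappa, alpha = 0.5, 1.0, 0.1               # hypothetical values

# HKY85 via GTR: a = c = d = f = beta and b = e = kappa * beta (mu set to 1).
q_hky = gtr_rate_matrix(pi, beta, kappa * beta, beta, beta, kappa * beta, beta, mu=1.0)
# F81 via GTR: all six relative rates equal to 1 (mu set to beta to match scale).
q_f81 = gtr_rate_matrix(pi, 1, 1, 1, 1, 1, 1, mu=beta)
# JC69 via GTR: equal frequencies and equal rates; mu = 4 * alpha gives off-diagonals alpha.
q_jc = gtr_rate_matrix(equal, 1, 1, 1, 1, 1, 1, mu=4 * alpha)

print("HKY85 (kappa = 1) vs F81:", max_abs_diff(q_hky, q_f81))
print("JC69 diagonal:", q_jc[0][0])
```

With κ = 1 the HKY85 and F81 matrices coincide, and with equal base frequencies the diagonal reduces to -3α as in the JC69 matrix.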

So, how do you turn likelihoods into probabilities?

This is the likelihood surface: Pr(data | κ, ν)

We want the probability surface: Pr(κ, ν | data)

Kinds of probabilities

B = Black, S = Solid, W = White, D = Dotted

Marginal probabilities and joint probabilities [worked examples were shown as figures].

Kinds of probabilities (continued)

Conditional probability [worked example was shown as a figure].

Bayes' rule provides a way to calculate conditional probabilities.

Bayes' rule shows how to turn Pr(D | B) into Pr(B | D).

The marginal probability of D is the sum of all joint probabilities involving D.
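Written out for this example, and assuming Black and White are the only two colors (as the legend suggests), the three statements above combine as follows:

$$
\Pr(B \mid D) = \frac{\Pr(D \mid B)\,\Pr(B)}{\Pr(D)},
\qquad
\Pr(D) = \Pr(B, D) + \Pr(W, D)
$$

Each joint probability can itself be expanded as a conditional times a marginal, for example Pr(B, D) = Pr(D | B) Pr(B).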

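Bayes' rule in statistics

$$
\Pr(\theta \mid D) = \frac{\Pr(D \mid \theta)\,\Pr(\theta)}{\sum_{\theta} \Pr(D \mid \theta)\,\Pr(\theta)}
$$

Pr(D | θ): likelihood of hypothesis θ
Pr(θ): prior probability of hypothesis θ
Pr(θ | D): posterior probability of hypothesis θ
Denominator: marginal probability of the data (marginalizing over hypotheses)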
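To see the formula at work, here is a minimal sketch that treats a grid of branch lengths as the competing hypotheses for the gorilla and orangutan sequences. The JC69 likelihood, the grid spacing, and the flat prior are all assumptions made for illustration; none of them come from the slides.

```python
import math

gorilla = "GAAGTCCTTGAGAAATAAACTGCACACACTGG"
orangutan = "GGACTCCTTGAGAAATAAACTGCACACACTGG"
n_diff = sum(a != b for a, b in zip(gorilla, orangutan))
n_same = len(gorilla) - n_diff

def log_likelihood(v):
    """JC69 log-likelihood of branch length v (> 0) for the two sequences."""
    decay = math.exp(-4.0 * v / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * decay)
    p_diff = 0.25 * (0.25 - 0.25 * decay)
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)

# Hypotheses: branch lengths 0.005, 0.010, ..., 0.500 (hypothetical grid).
grid = [0.005 * k for k in range(1, 101)]
prior = [1.0 / len(grid)] * len(grid)          # flat prior (an assumption)

# Numerator of Bayes' rule on the log scale: log likelihood + log prior.
log_joint = [log_likelihood(v) + math.log(p) for v, p in zip(grid, prior)]

# Subtract the maximum before exponentiating to avoid underflow.
biggest = max(log_joint)
weights = [math.exp(x - biggest) for x in log_joint]
total = sum(weights)                            # proportional to the marginal probability of the data
posterior = [w / total for w in weights]

best_v = grid[posterior.index(max(posterior))]
print(f"posterior mode on the grid: v = {best_v:.3f}")
print(f"posterior probabilities sum to {sum(posterior):.6f}")
```

Because the prior here is flat, the posterior mode lands on the grid point nearest the maximum likelihood estimate; a different prior would shift the posterior away from it.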

Moving through treespace

The Larget-Simon move*

Step 1: Pick 3 contiguous edges randomly, defining two subtrees, X and Y.

Step 2: Shrink or grow the selected 3-edge segment by a random amount.

Step 3: Choose X or Y randomly, then reposition randomly.

Proposed new tree: 3 edge lengths have changed and the topology differs by one NNI rearrangement.

*Larget, B., and D. L. Simon. 1999. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Molecular Biology and Evolution 16: 750-759. See also: Holder et al. 2005. Syst. Biol. 54: 961-965.

Moving through treespace

Current tree: log-posterior = -34256

Proposed tree: log-posterior = -32519 (better, so accept)
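The accept decision can be sketched as a standard Metropolis-Hastings step. The slide only covers the case where the proposed tree is better; the rule for a worse proposal shown here is the usual MCMC rule, not something stated in the slides. The function name and the `log_hastings` parameter are hypothetical, and the default of 0 assumes a symmetric proposal, which need not hold for a particular tree move.

```python
import math
import random

def accept_proposal(log_post_current, log_post_proposed, log_hastings=0.0, rng=random):
    """Decide whether to accept a proposed state in a Metropolis-Hastings step."""
    log_ratio = log_post_proposed - log_post_current + log_hastings
    if log_ratio >= 0.0:
        # Proposed state is at least as good: always accept.
        return True
    # Worse state: accept with probability exp(log_ratio), which lies in [0, 1).
    return rng.random() < math.exp(log_ratio)

# Values from the slide: the proposed tree has the higher log-posterior.
print(accept_proposal(-34256, -32519))
```

Accepting some worse proposals is what lets the chain sample trees in proportion to their posterior probability instead of only climbing to the nearest peak.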