Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A

Similar documents
Mutation models I: basic nucleotide sequence mutation models

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

BMI/CS 776 Lecture 4. Colin Dewey

What Is Conservation?

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

Constructing Evolutionary/Phylogenetic Trees

Genetic distances and nucleotide substitution models

Molecular Evolution and Phylogenetic Tree Reconstruction

Lecture Notes: Markov chains

Molecular Evolution & Phylogenetics Traits, phylogenies, evolutionary models and divergence time between sequences

Taming the Beast Workshop

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood

Evolutionary Analysis of Viral Genomes

Reconstruire le passé biologique modèles, méthodes, performances, limites

Lecture 4. Models of DNA and protein change. Likelihood methods

Lie Markov models. Jeremy Sumner. School of Physical Sciences University of Tasmania, Australia

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

EVOLUTIONARY DISTANCES

In: M. Salemi and A.-M. Vandamme (eds.). To appear. The. Phylogenetic Handbook. Cambridge University Press, UK.

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Dr. Amira A. AL-Hosary

Phylogenetic Tree Reconstruction

Evolutionary Theory and Principles of Phylogenetics. Lucy Skrabanek ICB, WMC February 25, 2010

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Letter to the Editor. Department of Biology, Arizona State University

In: P. Lemey, M. Salemi and A.-M. Vandamme (eds.). To appear in: The. Chapter 4. Nucleotide Substitution Models

Evolutionary Models. Evolutionary Models

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Phylogenetics: Building Phylogenetic Trees

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Theory of Evolution Charles Darwin

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Constructing Evolutionary/Phylogenetic Trees

Phylogenetic Assumptions

Week 5: Distance methods, DNA and protein models

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences

Chapter 7: Models of discrete character evolution

Stochastic Errors vs. Modeling Errors in Distance Based Phylogenetic Reconstructions

Maximum Likelihood in Phylogenetics

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

How Molecules Evolve. Advantages of Molecular Data for Tree Building. Advantages of Molecular Data for Tree Building

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Preliminaries. Download PAUP* from: Tuesday, July 19, 16

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Lab 9: Maximum Likelihood and Modeltest

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

Reading for Lecture 13 Release v10

Inferring Complex DNA Substitution Processes on Phylogenies Using Uniformization and Data Augmentation

Inferring Molecular Phylogeny

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

进化树构建方法的概率方法 第 4 章 : 进化树构建的概率方法 问题介绍. 部分 lid 修改自 i i f l 的 ih l i

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used

Probabilistic modeling and molecular phylogeny

arxiv: v1 [q-bio.pe] 4 Sep 2013

Phylogenetic Inference and Hypothesis Testing. Catherine Lai (92720) BSc(Hons) Department of Mathematics and Statistics University of Melbourne

Minimum evolution using ordinary least-squares is less robust than neighbor-joining

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Markov Chains. Sarah Filippi Department of Statistics TA: Luke Kelly

7. Tests for selection

KaKs Calculator: Calculating Ka and Ks Through Model Selection and Model Averaging

Maximum Likelihood in Phylogenetics

Lecture 6 Phylogenetic Inference

Molecular Evolution, course # Final Exam, May 3, 2006

Concepts and Methods in Molecular Divergence Time Estimation

On Estimating Topology and Divergence Times in Phylogenetics

BINF6201/8201. Molecular phylogenetic methods

Algorithms in Bioinformatics

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

7.36/7.91 recitation CB Lecture #4

The Phylo- HMM approach to problems in comparative genomics, with examples.

A phylogenetic view on RNA structure evolution

Quantifying sequence similarity

Predicting the Evolution of two Genes in the Yeast Saccharomyces Cerevisiae

Estimating Divergence Dates from Molecular Sequences

Molecular Evolution and Comparative Genomics

Distance Corrections on Recombinant Sequences

Using algebraic geometry for phylogenetic reconstruction

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetics. BIOL 7711 Computational Bioscience

Lecture 10: Phylogeny

A Phylogenetic Gibbs Recursive Sampler for Locating Transcription Factor Binding Sites

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Phylogeny Jan 5, 2016

The Causes and Consequences of Variation in. Evolutionary Processes Acting on DNA Sequences

Transcription:

GAGATC 3:G A 6:C T Common Ancestor ACGATC 1:A G 2:C A Substitution = Mutation followed 5:T C by Fixation GAAATT 4:A C 1:G A AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon

AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon

Likelihood (Prob. of data given model & parameter values) = AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon Likelihood for Site 1 X AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon Likelihood for Site 2 X AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon... X... X... X Likelihood for Site 6 AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon

A G G AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon

Prob. of A at end of branch given A at beginning 0.4 0.6 0.8 1.0 Jukes-Cantor Transition Probabilities 0 1 2 3 4 Expected number of changes per site (branch length)

Jukes-Cantor Transition Probabilities Prob. of G at end of branch given A at beginning 0.0 0.05 0.10 0.15 0.20 0.25 0 1 2 3 4 Expected number of changes per site (branch length)

Probabilistic models of nucleotide change (independently and identically evolving sites) Let q ij be the instantaneous rate of change at a site from nucleotide type i to j Q is matrix of instantaneous rates (Q will have 4 rows and 4 columns because i and j can each by any of 4 nucleotide types) For nucleotide starting as type i at time 0, probability nucleotide is type j at time t is denoted p ij (t). p ij (t) is referred to as a transition probability.

Consider a very very small amount of evolutionary time t. When i j, p ij ( t). = q ij t p ii ( t). = 1 q ij t j,j i where p ii ( t). = 1 + q ii t q ii = j,j i q ij (in preceding equations, =. can be replaced by = when the limit as t approaches 0 is taken)

Jukes-Cantor model is simplest model of nucleotide substitution. It assumes sequence positions evolve independently and it assumes that all possible changes at a position are equally likely. Let π j be probability a residue is type j. π j is called the equilibrium probability of type j. p ij ( ) = π j For Jukes-Cantor model, π j = 1/4 for all 4 nucleotide types j.

General Time Reversible Model 3 subst. types (transitions, 2 transversion classes) Tamura-Nei HKY85 Felsenstein 84 Felsenstein 81 Equal Base Frequencies 2 subst. types (transitions vs. transversions) Single Substitution Type 3 subst. types (transitions, 2 transversion classes ) SYM 2 subst. types (transitions vs. transversions) K3ST Kimura 2 Param. Jukes-Cantor Equal Base Frequencies Single Substitution Type based on Figure 11 from Swofford et al. Chapter in Molecular Systematics (Sinauer, 2nd ed. 1996)

Rate Matrix for Jukes-Cantor Model F R To O M A C G T A 3µ µ µ µ C µ 3µ µ µ G µ µ 3µ µ T µ µ µ 3µ Note 1: Diagonal matrix elements multiplied by 1 are rate away from nucleotide type of that row. Note 2: In later slide on Jukes-Cantor model, we write s/3 rather than µ.

Rate Matrix for Kimura 2-Parameter Model F R To O M A C G T A α 2β β α β C β α 2β β α G α β α 2β β T β α β α 2β Changes involving only purines (i.e., A and G) or only pyrimidines (i.e., C and T) are transitions. Changes involving one purine and one pyrimidine are transversions.

Rate Matrix for Felsenstein 1981 Model F R To O M A C G T A µ(π C + π G + π T ) µπ C µπ G µπ T C µπ A µ(π A + π G + π T ) µπ G µπ T G µπ A µπ C µ(π A + π C + π T ) µπ T T µπ A µπ C µπ G µ(π A + π C + π G )

Rate Matrix for Hasegawa-Kishino-Yano (a.k.a. HKY or HKY85) Model F R To O M A C G T A µ(π C + κπ G + π T ) µπ C µκπ G µπ T C µπ A µ(π A + π G + κπ T ) µπ G µκπ T G µκπ A µπ C µ(κπ A + π C + π T ) µπ T T µπ A µκπ C µπ G µ(π A + κπ C + π G )

Rate Matrix for General Time Reversible Model F R To O M A C G T A µ(aπ C + bπ G + cπ T ) µaπ C µbπ G µcπ T C µaπ A µ(aπ A + dπ G + eπ T ) µdπ G µeπ T G µbπ A µdπ C µ(bπ A + dπ C + fπ T ) µfπ T T µcπ A µeπ C µfπ G µ(cπ A + eπ C + fπ G )

Time Reversibility is a common property of models of sequence evolution. Time reversibility means that π i p ij (t) = π j p ji (t) for all i, j, and t. π i q ij = π j q ji for all i and j. For phylogeny reconstruction, time reversibility means that we cannot (on the basis of sequence data alone) hope to distinguish which of two sequence is ancestral and which is the descendant.

The practical implication of time reversibility for phylogeny reconstruction is that maximum likelihood cannot infer the position of the root of the tree unless additional information information exists (e.g., which taxa are the outgroups) or additional assumptions are made (e.g., a molecular clock).

Q will represent matrix of instantaneous rates of change. For general time reversible model, entries of Q are: To From A C G T A (aπ C + bπ G + cπ T ) aπ c bπ G cπ T C aπ A (aπ A + dπ G + eπ T ) dπ G eπ T G bπ A dπ C (bπ A + dπ C + fπ T ) fπ T T cπ A eπ C fπ G (cπ A + eπ C + fπ G ) In above matrix: a, b, c, d, e, and f cannot be negative. With any rate matrix (including above), the transition probabilities P (t) can be determined from the rate matrix Q and the amount of evolution t via P (t) = e Qt = I + (Qt) 1! + (Qt)2 2! + (Qt)3 3! +..., where I is the identity matrix.

Computing p ij (t) for the Jukes-Cantor model The Jukes Cantor model assumes that this is how nucleotide substitution occurs: 0. π A = π G = π C = π T = 1 4. 1. For each site in the sequence, an event will occur with probability 4 3s per unit evolutionary time. 2. If no event occurs, the residue at the site does not change. 3. If an event occurs, the probability that a residue is type i after the event is π i.

What is the probability that no event occurs in t units of evolutionary time? (1 4 3 s) (1 4 3 s) (1 4 3 s)... (1 4 3 s) = (1 4 3 s)t. When 3 4 s is close to 0, 1 4 3 s. = e 4 3 s.

Pr (no event) = (1 4 3 s)t. = e 4 3 st. When s is redefined as an instantaneous rate per unit evolutionary time, the approximation becomes an equality: Pr (no event) = e 4 3 st. Pr (at least one event) = 1 Pr (no event) = 1 e 4 3 st.

If there have been no events, then the residue cannot possibly have changed after an amount of evolution t. If there has been at least one event, then the residue is type j with probability π j. p ii (t) = Pr (no events) + Pr (at least one event)π j = e 4 3 st + (1 e 4 3 st )π j.

For i j, p ij (t) = Pr (at least one event)π j = (1 e 4 3 st )π j. Notice that 4 3 s and t appear only as a product. 4 3 s and t cannot be separately estimated. Only their product can be estimated. Note: A generalization of the Jukes Cantor model, the Felsenstein 1981 model does not require π A = π G = π C = π T = 1 4.