Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History

Overview
1. Why do we date trees?
2. The molecular clock
3. Local clocks and autocorrelated rates
4. Bayesian inference using uncorrelated rates
5. Fossil calibrations and uncertainty
6. Innovations in molecular dating

Why do we need dates?
- Biogeographic hypothesis testing. Crisp et al. (2011) Trends Ecol Evol 26: 66-72.
- Testing hypotheses of co-diversification. Cruaud et al. (2012) Syst Biol 61: 1029-1047.
- Quantifying and characterizing rates of diversification. Rouse et al. (2013) Mol Phylogenet Evol 66: 161-181.
- Inferring the age of important evolutionary events. Timetree of Life Project (there's an app for that).

Why do we need molecular dates?
- The fossil record is incomplete
- The fossil record is biased taxonomically and taphonomically
- Fossil dates may not be precise
- At best, fossil dates are minimum age estimates

Zuckerkandl and Pauling (1965) J Theor Biol 8: 357-366.

K = number of substitutions per site
T = time
R = rate
$R = K / (2T)$
[Diagram: an ancestor diverging over time T into Descendent 1 and Descendent 2]

Strict molecular clocks
- Phylogeny with branch lengths
- One or more node estimates
- We can predict dates of other nodes in the tree
- Time for divergence of novel sequences: $T_{ij} = d_{ij} / (2r)$
- Assumption: probability of substitutions is constant over time
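The two strict-clock formulas can be chained: calibrate a rate from one dated node, then date a second pair of sequences. The sketch below is only an illustration; the function names and all numbers are hypothetical and do not come from the talk.

```python
def clock_rate(k, t):
    """Per-lineage rate R = K / (2T) from K substitutions per site and a known split time T."""
    if t <= 0:
        raise ValueError("calibration time must be positive")
    return k / (2.0 * t)


def divergence_time(d_ij, r):
    """Divergence time T_ij = d_ij / (2r) for a pairwise distance d_ij under rate r."""
    if r <= 0:
        raise ValueError("rate must be positive")
    return d_ij / (2.0 * r)


# Hypothetical calibration: 0.10 substitutions/site between two taxa that split 50 Myr ago.
rate = clock_rate(0.10, 50.0)

# Hypothetical uncalibrated pair separated by 0.04 substitutions/site.
t_new = divergence_time(0.04, rate)
print(rate, t_new)
```

The factor of 2 appears in both functions because substitutions accumulate independently along both lineages after the split.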

The theory of the molecular clock
- Rate constancy is an extension of the neutral theory of molecular evolution (Kimura, King and Jukes, 1968-1969)
- Strongly influenced early models of molecular evolution (Jukes-Cantor, 1969)
- A tree under a molecular clock does not have to be exactly ultrametric; only the probability of mutation per unit time is constant
- How do we test if a tree is ultrametric enough?

Testing the molecular clock
1. Likelihood ratio test
2. Relative rates test

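Likelihood ratio test
Procedure:
- Estimate a molecular phylogeny with, and without, a molecular clock
- Calculate $2[\log(L_1) - \log(L_2)]$, where $L_1$ is the likelihood without the clock and $L_2$ the likelihood with it
- $\chi^2$ test with df = $(n - 2)$, where $n$ = number of terminals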
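The procedure above can be sketched in a few lines. The log-likelihood values below are hypothetical, and `chi2_sf` is a standard-library stand-in (a closed-form series for integer degrees of freedom) for the chi-square tail function that a statistics package would normally supply.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    tail = math.erfc(chi / math.sqrt(2.0))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total


def clock_lrt(lnl_no_clock, lnl_clock, n_terminals):
    """Return (statistic, df, p-value) for the molecular clock likelihood ratio test."""
    stat = 2.0 * (lnl_no_clock - lnl_clock)
    df = n_terminals - 2
    return stat, df, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a 10-terminal data set.
stat, df, p = clock_lrt(-5230.4, -5241.9, 10)
print(stat, df, p)
```

A small p-value means the clock-constrained tree fits significantly worse, so the strict clock is rejected.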

Relative rates test
[Diagram: taxa T1 and T2 share the common ancestor T0; T3 is the outgroup. $K_{01}$, $K_{02}$, and $K_{03}$ are the branch lengths from T0 to T1, T2, and T3.]

Null hypothesis:
$H_0: K_{01} = K_{02}$, or $H_0: K_{01} - K_{02} = 0$

Pairwise distances in terms of branch lengths:
$K_{13} = K_{01} + K_{03}$
$K_{23} = K_{02} + K_{03}$
$K_{12} = K_{01} + K_{02}$

Solving for the branch lengths:
$K_{01} = (K_{13} + K_{12} - K_{23})/2$
$K_{02} = (K_{12} + K_{23} - K_{13})/2$
$K_{03} = (K_{13} + K_{23} - K_{12})/2$

Therefore:
$K_{01} - K_{02} = K_{13} - K_{23}$

Relative rates test
- It can be shown that this statistic is normally distributed for large samples
- Calculate for all triplets
- Calculate the Z score; the clock is not rejected at the 5% level when $|Z| < 1.96$
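As a worked sketch of the relative rates algebra, the code below recovers the three branch lengths from pairwise distances. The distances are the mouse, rat, and hamster Ks percentages quoted in the local clocks part of this talk, with hamster treated as the outgroup. The standard error passed to `z_score` is a hypothetical placeholder, since the talk does not give one.

```python
def branch_lengths(k12, k13, k23):
    """Branch lengths from ancestor T0 to ingroup taxa 1 and 2 and to outgroup 3."""
    k01 = (k13 + k12 - k23) / 2.0
    k02 = (k12 + k23 - k13) / 2.0
    k03 = (k13 + k23 - k12) / 2.0
    return k01, k02, k03


def z_score(k13, k23, std_error):
    """Z score of the test statistic K01 - K02 = K13 - K23."""
    return (k13 - k23) / std_error


def clock_not_rejected(z, critical=1.96):
    """True when |Z| stays below the two-sided 5% critical value."""
    return abs(z) < critical


# 1 = mouse, 2 = rat, 3 = hamster (outgroup); distances in percent.
k01, k02, k03 = branch_lengths(k12=18.0, k13=30.3, k23=31.3)

# Hypothetical standard error, for illustration only.
z = z_score(30.3, 31.3, std_error=2.0)
print(k01, k02, k03, z, clock_not_rejected(z))
```

Note that the test statistic needs only the two ingroup-to-outgroup distances, so the position of T0 in time never has to be known.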

Confidence limits for molecular clocks
- Substitutions occur as a linear function of time
- Probability of substitution per unit time is constant
- Rate variation must have a Poisson distribution
[Plot: substitutions against time]
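If substitutions follow a Poisson process, the expected count grows linearly with time and its variance equals its mean, which gives rough confidence limits. The sketch below uses a normal approximation to the Poisson; that choice, the function name, and the numbers are assumptions made for illustration.

```python
import math


def poisson_interval(rate, time, n_sites, z=1.96):
    """Expected substitution count and an approximate 95% interval (normal approximation)."""
    mean = rate * time * n_sites
    half_width = z * math.sqrt(mean)  # Poisson variance equals the mean
    return mean, max(0.0, mean - half_width), mean + half_width


# Hypothetical: 0.001 substitutions/site/Myr, 25 Myr, 1000 sites.
mean, lower, upper = poisson_interval(0.001, 25.0, 1000)
print(mean, lower, upper)
```

Observed counts well outside such limits suggest more rate variation than a single Poisson clock allows.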

Sources of rate heterogeneity
- Physiology/life history: generation time
- Demography: genetic drift affects small populations more strongly than large ones
- Selection/relaxation: increase or decrease in evolutionary rate
- Gene duplication: neofunctionalized paralogs evolve faster than copies retaining ancestral functions (Assis and Bachtrog, 2013)

Local clocks
[Tree: Mouse, Rat, Hamster]
- $K_s$ (mouse-rat) = 18.0%
- $K_s$ (mouse-hamster) = 30.3%
- $K_s$ (rat-hamster) = 31.3%
- Hamsters diverged 1.7 times earlier than the mouse-rat divergence
- There is a molecular clock for rodents
- But substitution rates are higher in rodents than in primates
O'hUigin and Li (1992) J Mol Evol 35: 377-384.
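The 1.7 figure can be reproduced from the three Ks values, assuming it is the mean of the two hamster distances divided by the mouse-rat distance (the talk does not state the calculation, so this is a reading of the numbers rather than the authors' documented method).

```python
# Synonymous distances (percent) quoted above.
ks = {
    ("mouse", "rat"): 18.0,
    ("mouse", "hamster"): 30.3,
    ("rat", "hamster"): 31.3,
}

# Under a local clock both hamster distances estimate the same divergence, so average them.
hamster_mean = (ks[("mouse", "hamster")] + ks[("rat", "hamster")]) / 2.0
ratio = hamster_mean / ks[("mouse", "rat")]

# Relative disagreement between the two hamster distances.
imbalance = abs(ks[("mouse", "hamster")] - ks[("rat", "hamster")]) / hamster_mean
print(round(ratio, 2), round(imbalance, 3))
```

The two hamster distances differ by only a few percent of their mean, which is what supports a shared clock within rodents.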

Local clocks
- Multiple molecular clocks occur in different parts of a tree
- Rate autocorrelation: substitution rates are heritable
- Descendent nodes inherit the substitution rate of their ancestor nodes
[Diagram: a tree with regions evolving at Rate 1, Rate 2, and Rate 3]

Methods using many rates, assuming rate autocorrelation
1. Non-parametric rate smoothing (Sanderson, 1997)
2. Penalized likelihood (Sanderson, 2002)

Non-parametric rate smoothing
[Diagram: node i with ancestral branch $b_0$ and descendant branches $b_1$ and $b_2$]
- Measure of rate roughness at node i: $R_i = (r_{b_0} - r_{b_1})^2 + (r_{b_0} - r_{b_2})^2$
- Adjust branching times in order to minimize overall roughness, $\sum_i R_i$
Drawbacks:
1. Assumes branch lengths are known with complete certainty
2. Attributes differences in sister branches exclusively to variation in rate of evolution
Sanderson (1997) Mol Biol Evol 14: 1218-1231
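A toy version of the smoothing step: with one internal node, the node age is slid between the root and the present until the roughness $R_i$ is smallest. The branch lengths, root age, and the grid search are all hypothetical choices for illustration; this is not Sanderson's optimizer.

```python
def roughness_at(node_age, root_age=10.0, b0=0.02, b1=0.03, b2=0.05):
    """Roughness R_i at one node whose age is node_age (present = 0)."""
    r0 = b0 / (root_age - node_age)  # rate on the ancestral branch
    r1 = b1 / node_age               # rates on the two descendant branches
    r2 = b2 / node_age
    return (r0 - r1) ** 2 + (r0 - r2) ** 2


# Candidate node ages strictly between the present and the root.
candidates = [i / 100.0 for i in range(1, 1000)]
best_age = min(candidates, key=roughness_at)
print(best_age, roughness_at(best_age))
```

Because the branch lengths are held fixed, the search illustrates the first drawback listed above: any error in the lengths is passed straight into the estimated age.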

Penalized likelihood
Find the set of branch lengths and rates that maximizes the function $\log(L) - \lambda P$, where:
- $\lambda$ is a user-defined smoothing parameter
- $P$ is a penalty function

Penalty function, $P$
- In Sanderson's formulation, the quadratic roughness function $R_i$ was used
- Alternative penalty functions:
1. Lognormal
2. Exponential
3. Ornstein-Uhlenbeck process

Smoothing parameter, $\lambda$
- Determined empirically through a cross-validation procedure
- Drawback: computationally expensive and difficult to implement
Sanderson (2002) Mol Biol Evol 19: 101-109
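The trade-off controlled by $\lambda$ can be seen in a small sketch of the objective $\log(L) - \lambda P$. It assumes a Poisson likelihood for substitution counts on each branch and a simplified penalty (squared rate differences between each branch and its parent branch); the tree, counts, and durations are hypothetical.

```python
import math


def poisson_loglik(counts, rates, durations):
    """Log-likelihood when branch k is expected to carry rates[k] * durations[k] substitutions."""
    total = 0.0
    for x, r, t in zip(counts, rates, durations):
        mu = r * t
        total += x * math.log(mu) - mu - math.lgamma(x + 1)
    return total


def roughness_penalty(rates, parent_of):
    """Sum of squared rate differences between each branch and its parent branch."""
    return sum((rates[child] - rates[parent]) ** 2 for child, parent in parent_of.items())


def penalized_loglik(counts, rates, durations, parent_of, lam):
    """The objective log(L) - lambda * P."""
    return poisson_loglik(counts, rates, durations) - lam * roughness_penalty(rates, parent_of)


# Hypothetical three-branch tree: branch 0 is the parent of branches 1 and 2.
counts = [20, 30, 50]
durations = [4.0, 6.0, 6.0]
parent_of = {1: 0, 2: 0}

rough = [5.0, 5.0, 50.0 / 6.0]  # one free rate per branch (best unpenalized fit)
smooth = [6.25, 6.25, 6.25]     # a single shared rate (clock-like)

for lam in (0.0, 1.0):
    print(lam,
          penalized_loglik(counts, rough, durations, parent_of, lam),
          penalized_loglik(counts, smooth, durations, parent_of, lam))
```

With $\lambda = 0$ the per-branch rates score higher; with $\lambda = 1$ the shared rate does, which is why $\lambda$ has to be chosen by cross-validation rather than fixed in advance.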

Relaxed clock methods
- What happens if evolutionary rates are not autocorrelated?
- Uncorrelated clock methods implement evolutionary rates as prior distributions: the rate on each branch is drawn independently of its ancestor's rate

Bayesian Evolutionary Analysis by Sampling Trees (BEAST)
- Implements strict clocks, autocorrelated rate models, and uncorrelated rate models
- MCMC procedure to derive posterior distributions of:
  - Tree topology
  - Rates
  - Divergence times
- Calibration points can be distributions, not point estimates
Drummond et al. (2006) PLoS Biol 4: e55

BEAST
$f(g, \Theta, \Phi, \Omega \mid D) = \frac{1}{Z} \Pr\{D \mid g, \Phi, \Omega\} \, f_G(g \mid \Phi) \, f_{\Theta\Phi\Omega}(\Theta, \Phi, \Omega)$
- $\Phi$: parameters of the relaxed clock model
- $\Omega$: parameters of the substitution model
- $\Theta$: hyperparameters of the tree prior
- $\Pr\{D \mid g, \Phi, \Omega\}$: standard term for likelihood, where $g$ is a tree with branch lengths in time units
- $f_G(g \mid \Phi)$: the tree prior (Yule, birth-death, or coalescent-based)
Drummond et al. (2006) PLoS Biol 4: e55

Properties of uncorrelated clock models
- Strict clocks and rate autocorrelation are special cases of uncorrelated rate models
- Uncorrelated lognormal distributions better account for cases where evolution is clock-like
- Uncorrelated exponential distributions have high variance (2-10x higher than uncorrelated lognormal)
Drummond et al. (2006) PLoS Biol 4: e55
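To make the variance comparison concrete, the sketch below draws independent branch rates from an exponential and from a lognormal with the same mean. It uses only Python's `random` module and is not BEAST's implementation; the function names and the value of `sigma` are hypothetical. For equal means, the exponential variance equals the squared mean, while the lognormal variance is $(e^{\sigma^2} - 1)$ times the squared mean.

```python
import math
import random


def draw_uncorrelated_rates(n_branches, mean_rate, model, sigma=0.3, rng=None):
    """Draw one rate per branch, independently of any ancestor's rate."""
    rng = rng or random.Random()
    if model == "exponential":
        return [rng.expovariate(1.0 / mean_rate) for _ in range(n_branches)]
    if model == "lognormal":
        mu = math.log(mean_rate) - sigma ** 2 / 2.0  # keeps the mean at mean_rate
        return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]
    raise ValueError("model must be 'exponential' or 'lognormal'")


def variance_ratio(sigma):
    """Exponential variance divided by lognormal variance at equal means."""
    return 1.0 / (math.exp(sigma ** 2) - 1.0)


rng = random.Random(1)  # fixed seed so the draw is repeatable
exp_rates = draw_uncorrelated_rates(8, 0.001, "exponential", rng=rng)
ln_rates = draw_uncorrelated_rates(8, 0.001, "lognormal", sigma=0.3, rng=rng)
print(exp_rates, ln_rates, variance_ratio(0.3))
```

The ratio depends entirely on `sigma`: a small `sigma` gives nearly clock-like lognormal rates, whereas the exponential has no parameter that can shrink its spread.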

Uncertainty is inherent to molecular dating
- Tree topology
- Branch lengths
- Rate variation
- Fossil calibration

Fossil calibrations
- How old is the fossil?
- Where does the fossil fit in the tree?
- What does the placement of the fossil mean for the calibration?

Fossil calibrations
[Diagram: a timeline from past to present showing a fossil taxon alongside extant taxa, with the fossil age and its uncertainty marked between points a and b]
- a: time of the fossil lineage's divergence
- b: time of the fossil lineage's extinction
- Alternative 1: constrain the preceding node using the fossil
- Alternative 2: constrain the subsequent node using the fossil
- a is never observed in molecular dating
- For this reason, fossil calibrations yield minimum age estimates

Closing the rocks and clocks gap
- Realistic priors for fossil calibrations
- Exponential and lognormal distribution priors
- Interval estimates on fossil ages
- Increasing sampling of fossils used for calibration
- Cross-validation of fossil calibrations
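One common way to turn a minimum-age fossil into a prior is to offset an exponential or lognormal density by the fossil's age, so that node ages younger than the fossil get zero probability. The sketch below shows that idea with hypothetical function names and parameter values; it is not tied to any particular dating program.

```python
import math


def offset_exponential_logpdf(node_age, fossil_age, mean_excess):
    """Log density of an exponential prior on how much older the node is than the fossil."""
    excess = node_age - fossil_age
    if excess < 0:
        return float("-inf")  # the node cannot be younger than its fossil
    return -math.log(mean_excess) - excess / mean_excess


def offset_lognormal_logpdf(node_age, fossil_age, mu, sigma):
    """Log density of a lognormal prior on the same excess age."""
    excess = node_age - fossil_age
    if excess <= 0:
        return float("-inf")
    z = (math.log(excess) - mu) / sigma
    return -math.log(excess * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


# Hypothetical 100 Myr fossil; node ages proposed at 90, 105, and 150 Myr.
for age in (90.0, 105.0, 150.0):
    print(age,
          offset_exponential_logpdf(age, 100.0, mean_excess=10.0),
          offset_lognormal_logpdf(age, 100.0, mu=2.0, sigma=0.5))
```

The exponential places its highest density right at the fossil age, while the lognormal peaks some distance before it, which suits fossils thought to postdate the true divergence by a while.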

Total evidence molecular dating
- 161 morphological characters
- RAG-1 (2652 bp)
Pyron (2011) Syst Biol 60: 466-481

Problems with integrating morphology in dating
- Model-based approaches to dating require a model for morphological data partitions
- The Lewis (2001) model
- Abundant evidence for rate heterogeneity in morphological evolution

Improving precision in molecular dating
[Figures after Shih and Matzke (2013)]
- 14-26% reduction in size of confidence intervals
Shih and Matzke (2013) Proc Natl Acad Sci USA 110: 12355-12360

Summary
1. Molecular dating is a matter of quantifying uncertainty
   a. Tree topology
   b. Branch length
   c. Rate variation
   d. Fossil age calibration
   e. Fossil placement
2. Implement with caution, interpret with skepticism