Bootstraps and testing trees. Alog-likelihoodcurveanditsconfidenceinterval

Similar documents
Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Week 8: Testing trees, Bootstraps, jackknifes, gene frequencies

Week 7: Bayesian inference, Testing trees, Bootstraps

= 1 = 4 3. Odds ratio justification for maximum likelihood. Likelihoods, Bootstraps and Testing Trees. Prob (H 2 D) Prob (H 1 D) Prob (D H 2 )

Inference of phylogenies, with some thoughts on statistics and geometry p.1/31

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Phylogenetic inference

Inferring Molecular Phylogeny

Constructing Evolutionary/Phylogenetic Trees

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Constructing Evolutionary/Phylogenetic Trees

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

EVOLUTIONARY DISTANCES

Dr. Amira A. AL-Hosary

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Phylogenetics Todd Vision Spring Some applications. Uncultured microbial diversity

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Why IQ-TREE? Bui Quang Minh Australian National University. Workshop on Molecular Evolution Woods Hole, 24 July 2018

Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Consensus Methods. * You are only responsible for the first two

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Hypothesis testing and phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Phylogenetic Tree Reconstruction

Principles of Phylogeny Reconstruction How do we reconstruct the tree of life? Basic Terminology. Looking at Trees. Basic Terminology.

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Evaluating phylogenetic hypotheses

Scientific ethics, tree testing, Open Tree of Life. Workshop on Molecular Evolution Marine Biological Lab, Woods Hole, MA.

A (short) introduction to phylogenetics

A Statistical Test of Phylogenies Estimated from Sequence Data

Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Estimating Divergence Dates from Molecular Sequences

PHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD

STAT440/840: Statistical Computing

Assessing Phylogenetic Hypotheses and Phylogenetic Data

Letter to the Editor. Department of Biology, Arizona State University

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A

Theory of Evolution Charles Darwin

Phylogenetics: Building Phylogenetic Trees

Molecular evolution. Joe Felsenstein. GENOME 453, Autumn Molecular evolution p.1/49

Concepts and Methods in Molecular Divergence Time Estimation

Bootstrap, Jackknife and other resampling methods

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Bootstrap Method for Dependent Data Structure and Measure of Statistical Precision

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

Probability Distribution of Molecular Evolutionary Trees: A New Method of Phylogenetic Inference

Inferring Molecular Phylogeny

Thanks to Paul Lewis and Joe Felsenstein for the use of slides

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

The Causes and Consequences of Variation in. Evolutionary Processes Acting on DNA Sequences

Hypothesis Testing with the Bootstrap. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Standard Errors, Prediction Error and Model Tests in Additive Trees 1

Assessing Congruence Among Ultrametric Distance Matrices

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap

How to read and make phylogenetic trees Zuzana Starostová


Emily Blanton Phylogeny Lab Report May 2009

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

The Nonparametric Bootstrap

Political Science 236 Hypothesis Testing: Review and Bootstrapping

Accuracy and Power of the Likelihood Ratio Test in Detecting Adaptive Molecular Evolution

Modern Phylogenetics. An Introduction to Phylogenetics. Phylogenetics and Systematics. Phylogenetic Tree of Whales

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

Lab 9: Maximum Likelihood and Modeltest

Preliminaries. Download PAUP* from: Tuesday, July 19, 16

Bayesian Models for Phylogenetic Trees

Better Bootstrap Confidence Intervals

DATING LINEAGES: MOLECULAR AND PALEONTOLOGICAL APPROACHES TO THE TEMPORAL FRAMEWORK OF CLADES

8/23/2014. Phylogeny and the Tree of Life

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Understanding relationship between homologous sequences

What Is Conservation?

Frequentist Properties of Bayesian Posterior Probabilities of Phylogenetic Trees Under Simple and Complex Substitution Models

On Improving the Output of. a Statistical Model

Modeling Trait Evolutionary Processes with More Than One Gene

7. Tests for selection

Estimation of species divergence dates with a sloppy molecular clock

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Phylogenetics in the Age of Genomics: Prospects and Challenges

RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG

Bayesian Phylogenetics:

Cladistics and Bioinformatics Questions 2013

Branch-Length Prior Influences Bayesian Posterior Probability of Phylogeny

Lecture 4. Models of DNA and protein change. Likelihood methods

Phylogenetics. BIOL 7711 Computational Bioscience

Bootstrapping phylogenies: Large deviations and dispersion effects

Transcription:

ootstraps and testing trees Joe elsenstein epts. of Genome Sciences and of iology, University of Washington ootstraps and testing trees p.1/20 log-likelihoodcurveanditsconfidenceinterval 2620 2625 ln L 2630 2635 2640 5 10 20 50 100 200 Transition / transversion ratio (This is for the 14-species primates data available for download). ootstraps and testing trees p.2/20

onstraints on a tree for a clock onstraints for a clock v 1 v 2 v 4 v 5 v 1 = v 2 v 6 v 3 v v 4 = 5 v 8 v 1 + v v 6 = 3 v 7 v v 3 + 7 = v 4 + v 8 oes not constrain the branch length on the unrooted tree ootstraps and testing trees p.3/20 Likelihood-ratio test of molecular clock Mouse Human himp Gorilla Orang Gibbon ovine log likelihood Without clock 1405.608 With clock 1407.085 parameters 11 6 ifference 1.477 5 χ 2 = 2.954 df = 5 (This is for this 7-species subset of the primates data). (non significant) ootstraps and testing trees p.4/20

Likelihood surface for three clocklike trees 204 ln Likelihood 205 206 x x x 0.10 bigger 0 0.10 0.20 x bigger (These are profile likelihoods" as they show the largest likelihood for that value of x,maximizingovertheotherbranchlengthinthetree.) ootstraps and testing trees p.5/20 Two trees to be tested using KHT test Mouse Tree I ovine Gibbon Orang Gorilla himp Human Mouse Tree II ovine Gibbon Orang Gorilla himp Human ootstraps and testing trees p.6/20

Table of differences in log-likelihood Tree I II site 1 2 3 4 5 6 231 232 ln L 2.971 4.483 5.673 5.883 2.691 8.003... 2.971 2.691 2.983 4.494 5.685 5.898 2.700 7.572 2.987 2.705... 1405.61 1408.80 iff... +3.19 +0.012 +0.111 +0.013 +0.015 +0.010 0.431 +0.012 +0.010 ootstraps and testing trees p.7/20 Histogram of those differences 0.50 0.0 0.50 1.0 1.5 2.0 ifference in log likelihood at site o sign test, or t-test, or similar nonparametric tests. ootstraps and testing trees p.8/20

ootstrap sampling (with mixtures of normals) estimate of θ (unknown) true value of θ empirical distribution of sample ootstrap replicates (unknown) true distribution istribution of estimates of parameters ootstraps and testing trees p.9/20 ootstrap sampling To infer the error in a quantity, θ, estimatedfromasampleofpoints x 1, x 2,...,x n we can o the following R times (R =1000or so) raw a bootstrap sample" by sampling n times with replacement from the sample. all these x 1, x 2,...,x n.notethatsomeofthe original points are represented more than once in the bootstrap sample, some once, some not at all. stimate θ from each of the bootstrap samples, call these ˆθ k (k = 1, 2,...,R) When all R bootstrap samples have been done, the distribution of ˆθ i estimates the distribution one would get if one were able to draw repeated samples of n data points from the unknown true distribution. ootstraps and testing trees p.10/20

ootstrap sampling of phylogenies Original ata sites sequences ootstrap sample #1 sites stimate of the tree sequences sample same number of sites, with replacement ootstrap sample #2 sequences sites sample same number of sites, with replacement ootstrap estimate of the tree, #1 (and so on) ootstrap estimate of the tree, #2 ootstraps and testing trees p.11/20 nalyzing bootstraps with phylogenies The sites are assumed to have evolved independently given the tree. They are the entities that are sampled (the x i ). The trees play the role of the parameter. One ends up with a cloud of R sampled trees. There are many possible ways. The one I will describe here is the most useful, but not the only way we could go. To summarize this cloud, we ask, for each branch in the tree, how frequently it appears among the cloud of trees. We make a tree that summarizes this for all the most frequently occurring branches. This is the majority rule consensus tree of the bootstrap estimates of the tree. ootstraps and testing trees p.12/20

Partitions from branches in an (unrooted) tree and so on for all the other external (tip) branches ootstraps and testing trees p.13/20 The majority-rule consensus tree Trees: How many times each (non tip) partition of species is found: 3 3 1 1 1 2 1 3 Majority rule consensus tree of the unrooted trees: 60 60 60 ootstraps and testing trees p.14/20

ootstrap sampling of a phylogeny ovine Mouse Squir Monk 35 42 80 84 72 74 himp Human Gorilla Orang 77 49 100 99 99 Gibbon Rhesus Mac Jpn Macaq rab.mac arbmacaq Tarsier Lemur In this example, parsimony was used to infer the tree. ootstraps and testing trees p.15/20 Potential problems with the bootstrap Sites may not evolve independently Sites may not come from a common distribution (but you can consider them to be sampled from a mixture of possible distributions) If do not know which branch is of interest at the outset, a multiple-tests" problem means that the most extreme P values are overstated Pvaluesarebiased(tooconservative) ootstrapping does not correct biases in phylogeny methods ootstraps and testing trees p.16/20

elete-half jackknife P values ovine Mouse 50 80 84 69 72 Squir Monk himp Human Gorilla Orang 32 80 100 98 99 Gibbon Rhesus Mac Jpn Macaq rab.mac arbmacaq 59 Tarsier Lemur In this example, parsimony was used to infer the tree. ootstraps and testing trees p.17/20 diagramoftheparametricbootstrap computer simulation estimation of tree data set #1 T 1 estimate of tree data set #2 T 2 original data data set #3 T 3 data set #100 T 100 ootstraps and testing trees p.18/20

References ootstraps etc. fron,. 1979. ootstrap methods: another look at the jackknife. nnals of Statistics 7: 1-26. [The original bootstrap paper] Margush, T. and. R. McMorris. 1981. onsensus n-trees. ulletin of Mathematical iology 43: 239-244i. [Majority-rule consensus trees] elsenstein, J. 1985. onfidence limits on phylogenies: an approach using the bootstrap. volution 39: 783-791. [The bootstrap first applied to phylogenies] Zharkikh,., and W.-H. Li. 1992. Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. I. our taxa with a molecular clock. Molecular iology and volution 9: 1119-1147. [iscovery and explanation of bias in P values] Künsch, H. R. 1989. The jackknife and the bootstrap for general stationary observations. nnals of Statistics 17: 1217-1241. [The block-bootstrap] Wu,.. J. 1986. Jackknife, bootstrap and other resampling plans in regression analysis. nnals of Statistics 14: 1261-1295. [The delete-half jackknife] fron,. 1985. ootstrap confidence intervals for a class of parametric problems. iometrika 72: 45-58. [The parametric bootstrap] ootstraps and testing trees p.19/20 (more references) Templeton,. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. volution 37: 221-224. [The first paper on the KHT test] Goldman, N. 1993. Statistical tests of models of N substitution. Journal of Molecular volution 36: 182-98. [Parametric bootstrapping for testing models] Shimodaira, H. and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Molecular iology and volution 16: 1114-1116. [orrection of KHT test for multiple hypothesis] Prager,. M. and.. Wilson. 1988. ncient origin of lactalbumin from lysozyme: analysis of N and amino acid sequences. Journal of Molecular volution 27: 326-335. [winning-sites test] Hasegawa, M. and H. Kishino. 1994. ccuracies of the simple methods for estimating the bootstrap probability of a maximum-likelihood tree. Molecular iology and volution 11: 142-145. [RLL probabilities] ootstraps and testing trees p.20/20