Hypothesis testing and phylogenetics

Hypothesis testing and phylogenetics. Woods Hole Workshop on Molecular Evolution, 2017. Mark T. Holder, University of Kansas. Thanks to Paul Lewis, Joe Felsenstein, and Peter Beerli for slides.

Motivation The pipeline of data collection, alignment, model selection, and tree estimation gives us a point estimate of the best tree. This lecture/lab covers: 1. How confident should we be that this tree is correct? 2. Can we be confident in some aspects of the tree (e.g. some clades)? 3. If our question of interest is about the evolution of characters, can we answer that question despite our uncertainty about the tree?

Conclusions 1 - confidence on trees 1. Non-parametric bootstrapping: useful for assessing sampling error, but a little hard to interpret precisely. Susko's abp gives 1 − abp ≈ P-value for the hypothesis that a recovered branch is not present in the true tree. 2. "How should we assign a P-value to a tree hypothesis?" is surprisingly complicated: the Kishino-Hasegawa test (KH test) if testing 2 (a priori) trees; Shimodaira's approximately unbiased test (AU test) for sets of trees; parametric bootstrapping (which can simulate under complex models).

Conclusions 2 - confidence about evolutionary hypotheses If H0 is about the evolution of a trait: 1. The P-value must consider uncertainty about the tree: one can take the largest P over a confidence set of trees. Bayesian methods (covered tomorrow) enable prior predictive or posterior predictive P-values.

Conclusions 3 - simulate your own null distributions (the focus of the lab) 1. In phylogenetics we often have to simulate data to approximate P-values. 2. Designing the simulations requires care to make a convincing argument.

Reasons phylogenetic inference might be wrong 1. Systematic error: our inference method might not be sophisticated enough. 2. Random error: we might not have enough data, so we are misled by sampling error.

The bootstrap [Diagram: the (unknown) true distribution yields the (unknown) true value of the parameter; the empirical distribution of the sample yields an estimate; bootstrap replicates of the sample yield a distribution of estimates of the parameter.] Slide from Joe Felsenstein

The bootstrap for phylogenies [Diagram: from the original data (an alignment of sites), which yields the estimate T̂, bootstrap sample #1 resamples the same number of sites with replacement and yields tree T(1); bootstrap sample #2 likewise yields T(2); and so on.] Slide from Joe Felsenstein
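The resampling step above can be sketched in a few lines of Python. This is a minimal illustration on a toy alignment; a real pipeline would then re-run the tree search on each pseudoreplicate dataset:

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    `alignment` maps taxon name -> sequence string; each bootstrap
    dataset keeps the same taxa and the same number of sites.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols)
            for taxon, seq in alignment.items()}

rng = random.Random(42)
aln = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACTTACGA"}
boot = bootstrap_alignment(aln, rng)
# boot has the same taxa and length; its columns are columns of aln
```

Because whole columns are resampled together, the covariation among taxa at each site is preserved; only the site composition of the dataset changes.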

The majority-rule consensus tree [Five input trees on taxa A-F.] How many times each partition of species is found: AE|BCDF 4; ACE|BDF 3; ACEF|BD 1; AC|BDEF 1; AEF|BCD 1; ADEF|BC 2; ABCE|DF 3. [Consensus tree with support values 0.8, 0.6, 0.6 on its internal branches.] Slide from Joe Felsenstein
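The partition-counting logic behind the majority-rule consensus can be sketched in Python. The splits below are a toy set chosen to reproduce the 0.8/0.6/0.6 support pattern, not the slide's actual five trees:

```python
from collections import Counter

def majority_rule_splits(trees, cutoff=0.5):
    """Keep the splits (bipartitions) found in more than `cutoff` of the trees.

    Each tree is represented simply as a set of splits; a split is a
    frozenset holding the taxa on one side of an internal branch.
    """
    counts = Counter(split for tree in trees for split in tree)
    n = len(trees)
    return {s: c / n for s, c in counts.items() if c / n > cutoff}

# five toy trees on taxa A-F, encoded only by their splits
trees = [
    {frozenset("AE"), frozenset("ACE"), frozenset("ABCE")},
    {frozenset("AE"), frozenset("ACE"), frozenset("ABCE")},
    {frozenset("AE"), frozenset("ACE"), frozenset("ABCE")},
    {frozenset("AE"), frozenset("ACEF")},
    {frozenset("AEF"), frozenset("ADEF")},
]
consensus = majority_rule_splits(trees)
# AE appears in 4/5 trees (0.8); ACE and ABCE in 3/5 (0.6); the rest are dropped
```

The surviving splits are guaranteed to be mutually compatible, so they can always be assembled into a single consensus tree.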

From Hasegawa's analysis of 232 sites of D-loop

http://phylo.bio.ku.edu/mephytis/boot-sample.html http://phylo.bio.ku.edu/mephytis/parsimony.html http://phylo.bio.ku.edu/mephytis/bootstrap.html

Bootstrapping for branch support Typically a few hundred bootstrap (pseudoreplicate) datasets are produced. Less thorough searching on each is faster, but will usually artificially lower bootstrap proportions (BP). However, Anisimova et al. (2011) report that RAxML's rapid bootstrap algorithm may inflate BP. Rogue taxa can lower support for many splits; you do not have to use the majority-rule consensus tree to summarize bootstrap confidence statements. See also Lemoine et al. (2017).

Bootstrap proportions have been characterized as providing: a measure of repeatability; an estimate of the probability that the tree is correct (bootstrapping has been criticized as being too conservative in this context); or the P-value for a tree or clade.

Frequentist hypothesis testing: coin-flipping example N = 100 and h = 60. Can we reject the fair-coin hypothesis, H0: Pr(heads) = 0.5? The recipe is: 1. Formulate null (H0) and alternative (HA) hypotheses. 2. Choose an acceptable Type-I error rate (significance level). 3. Choose a test statistic: fH = fraction of heads in the sample; here fH = 0.6. 4. Characterize the null distribution of the test statistic. 5. Calculate the P-value: the probability of a test statistic value at least as extreme as fH arising even if H0 is true. 6. Reject H0 if the P-value ≤ your Type-I error rate.

[Plot: the null hypothesis H0 marked at Pr(heads) = 0.5.]

[Plot: H0 at 0.5 and the observed fH = 0.6 on the Pr(heads) axis.]

[Plot: the null distribution of the test statistic, centered on 0.5.]

[Plot: the null distribution with its tails shaded; P-value ≈ 0.058.]

Making similar plots for tree inference is hard. Our parameter space is trees and branch lengths; our data is a matrix of characters. It is hard to put these objects in the same space. You can do this in pattern-frequency space. Some cartoons of projections of this space are posted at: http://phylo.bio.ku.edu/slides/pattern-freq-space-cartoons.pdf

Coin flipping N = 100 and h = 60. Can we reject the hypothesis of a fair coin? We can use simulation to generate the null distribution (we could actually use the binomial distribution to solve this one analytically)...

[Plot: a simulation of the null distribution of the number of heads; the shaded tail gives P-value ≈ 0.029.]
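That simulation takes only a few lines. A stdlib-only sketch that returns both the one-sided and two-sided Monte Carlo P-values for the 60-heads observation:

```python
import random

def coin_null_pvalues(n=100, heads=60, reps=20_000, seed=1):
    """Simulate the null (fair-coin) distribution of the number of heads
    and return (one-sided, two-sided) Monte Carlo P-values."""
    rng = random.Random(seed)
    null = [sum(rng.random() < 0.5 for _ in range(n)) for _ in range(reps)]
    one_sided = sum(h >= heads for h in null) / reps
    two_sided = sum(abs(h - n / 2) >= abs(heads - n / 2) for h in null) / reps
    return one_sided, two_sided

one_sided, two_sided = coin_null_pvalues()
# with 60/100 heads these land near 0.029 and 0.058, as on the slides
```

The exact binomial tails are Pr(H ≥ 60) ≈ 0.028 and Pr(|H − 50| ≥ 10) ≈ 0.057; the simulation recovers them to within Monte Carlo error.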

We discussed how bootstrapping gives us a sense of the variability of our estimate. It can also give a tail probability, Pr(fH(boot) ≤ 0.5). Amazingly (for many applications): Pr(f̂H ≥ 0.6 | null is true) ≈ Pr(fH(boot) ≤ 0.5). In other words, the P-value is approximated by the fraction of bootstrap replicates consistent with the null.
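For the coin example this bootstrap tail probability can be computed directly. A minimal sketch (resampling the observed flips, not simulating under the null):

```python
import random

rng = random.Random(7)
flips = [1] * 60 + [0] * 40          # the observed sample: 60 heads in 100 flips
n = len(flips)

# nonparametric bootstrap: resample the observed flips with replacement
reps = 20_000
boot_fracs = [sum(rng.choice(flips) for _ in range(n)) / n for _ in range(reps)]

BP = sum(f > 0.5 for f in boot_fracs) / reps   # bootstrap proportion for "coin favors heads"
approx_p = 1.0 - BP                            # fraction of replicates consistent with the null
# BP comes out near 0.97, so approx_p is near 0.03 (cf. 0.027 on the next slide)
```

Note that the bootstrap distribution is centered on the estimate (0.6), not on the null value (0.5); the approximation works because of the (near) symmetry discussed on the following slides.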

[Plot: distribution of the number of heads in bootstrap-resampled datasets. Pr(fH(boot) ≤ 0.5) ≈ 0.027; BP = fraction of fH(boot) > 0.5 ≈ 0.973; P-value ≈ 1 − BP.]

[Plots: the simulated null distribution of the number of heads, shown alongside the distribution of the number of heads in bootstrap-resampled datasets for comparison.]

When you decide between trees, the boundaries between tree hypotheses can be curved. When the boundary of the hypothesis space is curved, 1 − BP can be a poor approximation of the P-value. Efron et al. (1996)

In the straight-border case, symmetry implies that the actual P-value (blue region) ≈ 1 − BP. [Figure: 1 − BP shown as the blue region below.]

In the curved-border case, the symmetry breaks down: the actual P-value (blue region) ≠ 1 − BP. [Figure: 1 − BP shown as the blue region below.]

Efron et al. (1996) proposed a computationally expensive multi-level bootstrap (which has not been widely used). Shimodaira (2002) used the same theoretical framework to devise a (more feasible) approximately unbiased (AU) test of topologies. Multiple scales of bootstrap resampling (80% of the characters, 90%, 100%, 110%, ...) are used to detect and correct for curvature of the boundary. Implemented in the new versions of PAUP*.

Susko (2010) adjusted BP → abp Susko agrees with the curvature arguments of Efron et al. (1996) and Shimodaira (2002), but points out that they ignore the sharp point in parameter space around the polytomy. He corrects bootstrap proportions so that 1 − abp accurately estimates the P-value. The method uses multivariate normal distributions based on calculations about the curvature of the likelihood surface. You need to perform a different correction when you know the candidate tree a priori versus when you are putting BP on the ML tree. BP may not be conservative when you correct for selection bias.

[Plots: abp for each BP under 5 model conditions, without (left) and with (right) the selection-bias correction; both axes run from 0 to 100.]

Conclusions - bootstrapping 1. Non-parametric bootstrap proportions help us see which branches have so little support that they could plausibly be explained by sampling error. 2. BPs are a little hard to interpret precisely. 3. Susko has an adjustment (abp) so that 1 − abp ≈ P-value for the hypothesis that a recovered branch is not present in the true tree.

Can we test trees using the LRT? [Plot: ln-likelihood as a function of the internal branch length t for a three-taxon tree.] 1. Should we calculate the LRT as δi = 2 [ln L(t = t̂, Ti | X) − ln L(t = 0, Ti | X)]? 2. And can we use the χ²₁ distribution to get the critical value for δ? Slide from Joe Felsenstein

Can we test trees using the LRT? [Plot: ln-likelihood as a function of the internal branch length t.] 1. Should we calculate the LRT as δi = 2 [ln L(t = t̂, Ti | X) − ln L(t = 0, Ti | X)]? No. t = 0 might not yield the best alternative ln L. 2. And can we use the χ²₁ distribution to get the critical value for δ? No. Constraining parameters at boundaries leads to a mixture such as (1/2)χ²₀ + (1/2)χ²₁. See Ota et al. (2000). Slide from Joe Felsenstein

Can we test trees using the LRT? [Plot: ln-likelihood curves for the three possible resolutions of three taxa, plotted against the internal branch length t.] No, tree hypotheses are not nested! Slide from Joe Felsenstein

aLRT of Anisimova and Gascuel (2006) For a branch j, calculate δj as twice the difference in ln L between the optimal tree (which has the branch) and the best NNI neighbor. This is very fast. They argue that the null distribution for each LRT around the polytomy follows a (1/2)χ²₀ + (1/2)χ²₁ distribution. They introduce a Bonferroni correction appropriate for correcting for the selection of the best of the three resolutions. They find the aLRT to be accurate and powerful in simulations, but Anisimova et al. (2011) report that it rejects too often and is sensitive to model violation.
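The mixture-distribution tail probability is easy to compute with the standard library, since the χ²₁ survival function is erfc(√(x/2)). A sketch (the Bonferroni factor of 3 for picking the best resolution is noted but not applied):

```python
from math import erfc, sqrt

def alrt_pvalue(delta):
    """Tail probability of delta under the (1/2)chi2_0 + (1/2)chi2_1 mixture.

    chi2_0 is a point mass at zero, so for delta > 0 only the chi2_1 half
    contributes; the chi-square (1 df) survival function is erfc(sqrt(x/2)).
    A Bonferroni correction for choosing the best of the three NNI
    resolutions would multiply this by 3 (capped at 1).
    """
    if delta <= 0.0:
        return 1.0
    return 0.5 * erfc(sqrt(delta / 2.0))

p = alrt_pvalue(2.71)   # about 0.05: 2.71 is the mixture's 5% critical value
```

The halved tail is why the mixture's 5% critical value (about 2.71) is smaller than the χ²₁ value of 3.84.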

aLRT = 2 [ln l1 − ln L(T2 | X)], where l1 = L(T1 | X). Image from Anisimova and Gascuel (2006)

aBayes Anisimova et al. (2011) aBayes(T1 | X) = Pr(X | T1) / [Pr(X | T1) + Pr(X | T2) + Pr(X | T3)] Simulation studies of Anisimova et al. (2011) show it to have the best power of the methods that do not have an inflated probability of falsely rejecting the null. It is sensitive to model violation. This is similar to the likelihood-mapping of Strimmer and von Haeseler (1997).
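Since the three likelihoods are tiny numbers, the normalization is done on the log scale. A sketch with made-up log-likelihoods (roughly on the scale of the primate example later in the lecture):

```python
from math import exp

def abayes(lnL):
    """aBayes support: normalized likelihoods of the three NNI resolutions.

    `lnL` = [ln L(T1|X), ln L(T2|X), ln L(T3|X)]; subtracting the maximum
    before exponentiating keeps the exponentials from underflowing.
    """
    m = max(lnL)
    weights = [exp(v - m) for v in lnL]
    total = sum(weights)
    return [w / total for w in weights]

# hypothetical log-likelihoods for the three resolutions of one branch
support = abayes([-7361.7, -7363.3, -7365.0])
# the best tree gets the largest share; here support[0] is about 0.81
```

Without the max-subtraction trick, exp(-7361.7) underflows to 0.0 and the ratio is undefined; this log-sum-exp pattern is standard whenever likelihoods are normalized.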

Coin-flipping example (again, for inspiration) N = 100 and h = 60. Can we reject the hypothesis of a fair coin? We can use simulation to generate the null distribution (we could actually use the binomial distribution to solve this one analytically)...

[Plot: a simulation of the null distribution of the number of heads; the shaded tail gives P-value ≈ 0.029.]

The simplest phylogenetic test would compare two trees. Null: if we had no sampling error (infinite data), T1 and T2 would explain the data equally well. Test statistic: δ(T1, T2 | X) = 2 [ln L(T1 | X) − ln L(T2 | X)]. Expectation under the null: E_H0[δ(T1, T2 | X)] = 0.

Using 3000 sites of mtDNA sequence for 5 primates. T1 is ((chimp, gorilla), human).

[Plot: per-site values of ln L(T1 | Xi) − ln L(T2 | Xi) across the 3000 sites; the total ln L(T1) − ln L(T2) = −1.58134.]


Using 3000 sites of mtDNA sequence for 5 primates: T1 is ((chimp, gorilla), human), ln L(T1 | X) = −7363.296; T2 is ((chimp, human), gorilla), ln L(T2 | X) = −7361.707; δ(T1, T2 | X) = −3.18. [Plot: the observed δ marked relative to the null expectation E(δ) = 0.]

To get the P-value, we need to know the probability Pr(|δ(T1, T2 | X)| ≥ 3.18 | H0 is true). [Plot: the values δ = −3.18 and δ = +3.18 marked on either side of E(δ) = 0.]

KH test 1. Examine the difference in ln L for each site: δ(T1, T2 | Xi) for site i. 2. Note that the total difference is simply a sum: δ(T1, T2 | X) = Σ_{i=1}^{M} δ(T1, T2 | Xi). 3. The variance of δ(T1, T2 | X) will be a function of the variance in the per-site δ(T1, T2 | Xi) values.
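With per-site differences in hand, the statistic and a normality-based variance estimate are a few lines. The per-site values below are simulated stand-ins (in practice they come from the per-site log-likelihoods under the two trees):

```python
import random
from math import sqrt

# hypothetical per-site differences delta(T1, T2 | X_i)
rng = random.Random(0)
site_deltas = [rng.gauss(0.001, 0.3) for _ in range(3000)]

M = len(site_deltas)
delta_total = sum(site_deltas)                     # delta(T1, T2 | X): just the sum
mean = delta_total / M
var_site = sum((d - mean) ** 2 for d in site_deltas) / (M - 1)
sd_total = sqrt(M * var_site)                      # variance of a sum of M independent sites
z = delta_total / sd_total                         # normal-approximation z-score
```

Treating sites as independent is what lets the variance of the sum be M times the per-site variance; the bootstrap alternative on the next slides makes the same independence assumption without the normality one.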

[Plot: histogram of δ(T1, T2 | Xi) for each site i, roughly centered on 0 and ranging from about −4 to 4.]

KH test - the variance of δ(T1, T2 | X) To approximate the variance of δ(T1, T2 | X) under the null, we could: 1. use assumptions of Normality (by appealing to the Central Limit Theorem)¹, or 2. use bootstrapping to generate a cloud of pseudoreplicate δ(T1, T2 | X*) values, and look at their variance. ¹ Susko (2014) recently showed that this is flawed and too conservative.

[Plot: histogram of δ(T1, T2 | X*) for many (RELL) bootstrapped replicates of the data.]

The (RELL) bootstrapped sample of statistics: is this the null distribution for our δ test statistic? [Plot: the same histogram of δ(T1, T2 | X*) values.]

KH test - centering E_H0[δ(T1, T2 | X)] = 0. Bootstrapping gives us a reasonable guess of the variance under H0. Subtracting the mean of the bootstrapped δ(T1, T2 | X*) values gives the null distribution: for each bootstrap replicate j, we treat δ(T1, T2 | X*_j) − mean[δ(T1, T2 | X*)] as a draw from the null distribution.
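The RELL-plus-centering recipe can be sketched directly. RELL ("resampling estimated log-likelihoods") just resamples the stored per-site values instead of re-optimizing on each pseudoreplicate; the per-site values here are simulated stand-ins:

```python
import random

def rell_null(site_deltas, reps, rng):
    """RELL bootstrap of per-site lnL differences, centered on zero.

    Each pseudoreplicate resamples sites with replacement and sums the
    stored per-site values (no re-optimization); subtracting the mean of
    the replicate sums enforces E[delta] = 0, as the null requires.
    """
    M = len(site_deltas)
    sums = [sum(rng.choice(site_deltas) for _ in range(M)) for _ in range(reps)]
    mean = sum(sums) / reps
    return [s - mean for s in sums]

rng = random.Random(3)
site_deltas = [rng.gauss(0.2, 1.0) for _ in range(200)]   # toy per-site differences
null = rell_null(site_deltas, 1000, rng)

observed = sum(site_deltas)
p = sum(abs(d) >= abs(observed) for d in null) / len(null)   # two-sided KH P-value
```

The centering step is what turns a bootstrap distribution (centered on the observed δ) into an approximate null distribution (centered on 0).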

[Plot: histogram of δ(T1, T2 | X*_j) − mean[δ(T1, T2 | X*)] for many (RELL) bootstrapped replicates of the data.]

[Plot: the approximate null distribution with the tails (absolute value ≥ 3.18) shaded; P-value = 0.46.]

Other ways to assess the null distribution of the LR test statistic Bootstrapping-then-centering the LR, and using Normality assumptions, are both clever and cute solutions, but they are too conservative (Susko, 2014); Susko instead derives more complicated calculations from the Normal [KHns] or from mixtures of χ² distributions [chi-bar]. The simpler solutions do not match the null distribution under any model of sequence evolution.

Mini-summary δ(T1, T2 | X) = 2 [ln L(T1 | X) − ln L(T2 | X)] is a powerful statistic for discriminating between trees. We can assess confidence by considering the variance in signal between different characters. Bootstrapping helps us assess the variance in ln L that we would expect to result from sampling error.

Scenario 1. A (presumably evil) competing lab scoops you by publishing a tree, T1, for your favorite group of organisms. 2. You have just collected a new dataset for the group, and your ML estimate of the best tree, T2, differs from T1. 3. A KH test shows that your data significantly prefer T2 over T1. 4. You write a (presumably scathing) response article. Should Systematic Biology publish your response?

What if we start out with only one hypothesized tree, and we want to compare it to the ML tree? The KH test is NOT appropriate in this context (see Goldman et al., 2000, for discussion of this point). Multiple comparisons: considering many trees increases the variance of δ(T̂, T1 | X). Selection bias: picking the ML tree to serve as one of the hypotheses invalidates the centering procedure of the KH test.

Using the ML tree in your test introduces selection bias. Even when H0 is true, we do not expect 2 [ln L(T̂) − ln L(T1)] = 0. Imagine a competition in which a large number of equally skilled people compete, and you compare the score of one competitor against the highest scorer.

Experiment: 70 people each flip a fair coin 100 times and count the number of heads. [Plot: null distribution of the difference in the number of heads, h1 − h2, between any two competitors; centered on 0.]

Experiment: 70 people each flip a fair coin 100 times and count the number of heads. [Plots: the null distribution of h1 − h2 for any two competitors (centered on 0), next to the null distribution of max(h) − h1, the highest scorer versus a random competitor (shifted toward positive values).]
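The coin-flipping competition is quick to simulate, and it makes the selection-bias shift concrete:

```python
import random

rng = random.Random(5)
n_people, n_flips, reps = 70, 100, 500

pair_diffs, max_diffs = [], []
for _ in range(reps):
    scores = [sum(rng.random() < 0.5 for _ in range(n_flips))
              for _ in range(n_people)]
    pair_diffs.append(scores[0] - scores[1])    # two competitors chosen a priori
    max_diffs.append(max(scores) - scores[0])   # the top scorer vs. one competitor

mean_pair = sum(pair_diffs) / reps   # hovers near zero, as the null expects
mean_max = sum(max_diffs) / reps     # clearly positive: the selection-bias shift
```

Comparing against the maximum guarantees a non-negative difference every time, which is exactly why centering on zero fails once the ML tree is chosen after seeing the data.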

Shimodaira and Hasegawa proposed the SH test, which deals with the selection bias introduced by using the ML tree in your test. Requires a set of candidate trees; these must not depend on the dataset to be analyzed. H0: each tree in the candidate set is as good as the other trees. The test makes worst-case assumptions; it is conservative. The AU test is less conservative (but still needs a candidate set).

[Flowchart: the real data plus test-statistic commands yield the real test statistic; null-fitting commands yield a fitted null model, from which simulated data are generated; the test statistic computed on each simulated dataset forms the null distribution; comparing the real test statistic to that null distribution gives the P-value.]

Parametric bootstrapping to generate the null distribution for the LR statistic 1. Find the best tree and model pair that are consistent with the null. 2. Simulate many datasets under the parameters of that model. 3. Calculate δ(j) = 2 [ln L(T̂(j) | X(j)) − ln L(T̂0(j) | X(j))] for each simulated dataset, where (j) is just an index for the simulated dataset and T̂0(j) is the tree under the null hypothesis for simulation replicate j.
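The same three steps can be shown in miniature with the coin example standing in for trees (for real trees, "fit the null" means constrained tree search and "simulate" means sequence simulation). Here the null is p = 0.5, the alternative is a free p, and the statistic is the likelihood ratio:

```python
import random
from math import log

def lnL(h, n, p):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    if p <= 0.0 or p >= 1.0:
        return 0.0 if h == round(n * p) else float("-inf")
    return h * log(p) + (n - h) * log(1.0 - p)

def lr_stat(h, n, p0=0.5):
    """delta = 2 * [lnL at the MLE (p = h/n) - lnL under the null (p = p0)]."""
    return 2.0 * (lnL(h, n, h / n) - lnL(h, n, p0))

# step 1: fit the model consistent with the null (here trivially p0 = 0.5)
n, h_obs = 100, 60
observed = lr_stat(h_obs, n)

# steps 2-3: simulate datasets under the fitted null, recompute delta on each
rng = random.Random(11)
null_dist = [lr_stat(sum(rng.random() < 0.5 for _ in range(n)), n)
             for _ in range(2000)]
p_value = sum(d >= observed for d in null_dist) / len(null_dist)
```

Nothing here assumes a χ² null; the null distribution of δ is whatever the simulation says it is, which is the whole point of the parametric bootstrap.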

Parametric bootstrapping This procedure is often referred to as the SOWH test (in that form, the null tree is specified a priori). Huelsenbeck et al. (1996) describe how to use the approach as a test for monophyly. Intuitive and powerful, but not robust to model violation (Buckley, 2002). Can be done manually² or via SOWHAT by Church et al. (2015). Susko (2014): optimize the null tree with zero-length constraints for the branch in question, i.e. collapse it (to avoid rejecting too often). ² Instructions in https://molevol.mbl.edu/index.php/parametricbootstrappinglab

[Plot: null distribution of the difference in number of steps under GTR+I+G.]

[Plot: null distribution of the difference in number of steps under JC.]

We often don't want to test tree topologies. If we are conducting a comparative method, we have to consider phylogenetic history; ideally we would integrate out the uncertainty in the phylogeny. The tree is a nuisance parameter.

P-values with nuisance parameters Suppose we are using some positive test statistic, s, and we observe the value s = X on our data. Remember that P = Pr(s ≥ X | H0), which is usually described as a tail probability: P = ∫_X^∞ f(s | H0) ds. But what if the probability density of s depends on a nuisance parameter T, and we don't know the value of T?

P-values with nuisance parameters P = ∫_X^∞ f(s | H0) ds. We could: take the maximum P over all T (this is way too conservative); make a (1 − β)% confidence set of T and take P to be β plus the largest P in that set (the Berger and Boos method); or do a Bayesian analysis and get a posterior predictive P-value: P = ∫_X^∞ (∫ f(s | T, H0) p(T) dT) ds.
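The posterior predictive integral is usually evaluated by Monte Carlo: draw T from its posterior, simulate the statistic under H0 given that T, and count tail hits. A toy sketch where the nuisance parameter is a scale and the statistic is Normal(0, T) under the null (both choices are illustrative, not from the slides):

```python
import random

def posterior_predictive_pvalue(x_obs, t_draws, reps_per_draw, rng):
    """Monte Carlo posterior predictive P-value.

    Approximates the tail probability Pr(s >= x_obs | H0) with the
    nuisance parameter T integrated out by averaging over posterior
    draws. Toy model: under H0 the statistic s is Normal(0, T).
    """
    hits = trials = 0
    for t in t_draws:
        for _ in range(reps_per_draw):
            hits += rng.gauss(0.0, t) >= x_obs
            trials += 1
    return hits / trials

rng = random.Random(21)
t_draws = [0.8 + 0.4 * rng.random() for _ in range(200)]   # stand-in posterior for T
p = posterior_predictive_pvalue(2.0, t_draws, 100, rng)
```

In the phylogenetic setting the draws of T would be trees (and other parameters) from an MCMC sample, and each inner simulation would generate character data on the drawn tree.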

[Flowchart: a character-evolution rate, a tree topology, and edge lengths feed a simulator that produces simulated data, from which a P-value is computed. Replacing the point estimate of the rate with a confidence interval makes the resulting P-value usually bigger; further adding a confidence set of trees, and then confidence intervals on the edge lengths, makes it usually the biggest.]

Significantly different genealogy ≠ different phylogeny The true gene tree can differ from the true species tree for several biological reasons: deep coalescence, gene duplication/loss (you may be comparing paralogs), or lateral gene transfer.

There are lots of simulators out there Seq-Gen, PAUP*, INDELible, ... for substitutions; DAWG, INDELible, ... for alignment; ms and other multi-species coalescent simulators for within-population samples; Ginkgo, DIM SUM, ... ; biogeography simulators; ... Parametric bootstrapping is very flexible, and lots of tools are available.

You have to think about what sources of error are most relevant for your data!

[Timeline diagram, from long ago to now: sources of error at different depths include errors modeling multiple hits, assembly and ascertainment-bias errors, alignment errors, paralogy errors, lack of signal, errors from deep coalescence, and horizontal gene transfer.]

Tree and parameter sensitivity interact. From ?


References
Anisimova, M. and Gascuel, O. (2006). Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Systematic Biology, 55(4):539-552.
Anisimova, M., Gil, M., Dufayard, J. F., Dessimoz, C., and Gascuel, O. (2011). Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes. Systematic Biology.
Buckley, T. R. (2002). Model misspecification and probabilistic tests of topology: evidence from empirical data sets. Systematic Biology, 51(3):509-523.
Church, S. H., Ryan, J. F., and Dunn, C. W. (2015). Automation and evaluation of the SOWH test with SOWHAT. Systematic Biology, 64(6):1048-1058.

Efron, B., Halloran, E., and Holmes, S. (1996). Bootstrap confidence levels for phylogenetic trees. Proceedings of the National Academy of Sciences, U.S.A., 93:13429-13434.
Goldman, N., Anderson, J. P., and Rodrigo, A. G. (2000). Likelihood-based tests of topologies in phylogenetics. Systematic Biology, 49:652-670.
Huelsenbeck, J., Hillis, D., and Nielsen, R. (1996). A likelihood-ratio test of monophyly. Systematic Biology, 45(4):546.
Lemoine, F., Domelevo Entfellner, J.-B., Wilkinson, E., De Oliveira, T., and Gascuel, O. (2017). Boosting Felsenstein phylogenetic bootstrap. bioRxiv.
Ota, R., Waddell, P. J., Hasegawa, M., Shimodaira, H., and Kishino, H. (2000). Appropriate likelihood ratio tests and marginal distributions for evolutionary tree models with constraints on parameters. Molecular Biology and Evolution, 17(5):798-803.
Shimodaira, H. (2002). An approximately unbiased test of phylogenetic tree selection. Systematic Biology, 51(3):492-508.

Strimmer, K. and von Haeseler, A. (1997). Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proceedings of the National Academy of Sciences, U.S.A., 94(13):6815-6819.
Susko, E. (2010). First-order correct bootstrap support adjustments for splits that allow hypothesis testing when using maximum likelihood estimation. Molecular Biology and Evolution, 27(7):1621-1629.
Susko, E. (2014). Tests for two trees using likelihood methods. Molecular Biology and Evolution.