Bayesian Phylogenetics: an introduction
Marc A. Suchard, msuchard@ucla.edu, UCLA

Who is this man? How sure are you?

The one true tree?
Methods we've learned so far try to find a single tree that best describes the data. However, they do not search everywhere, and it is difficult to find the best tree. Many (gazillions of) trees may be almost as good.

Bayesian phylogenetics: general principle
Using Bayesian principles, we will search for and average over sets of plausible trees (weighted by their probability) instead of a single best tree. In this method, the space that you search is limited by prior information and the data. The posterior distribution of trees naturally translates into probability statements (and uncertainty) on aspects of direct scientific interest: When did an evolutionary event happen? Are a subset of sequences more closely related? The cost: we must formalize our prior beliefs.

Conditional probability: intuition
Philippe is a hipster. Philippe is a hipster and rides a single-speed bike. Which is more probable?

Conditional probability: intuition
Arbitrary events A (hipster) and B (bike) from sample space U.

Bayes' theorem
Definition of conditional probability in words:
probability(A and B) = probability(A given B) × probability(B)
In usual mathematical symbols:
p(A|B) p(B) = p(A, B) = p(B|A) p(A)
With a slight re-arrangement:
p(A|B) = p(B|A) p(A) / p(B)
"Just" a restatement of conditional probability.

Bayes' theorem
Integration (averaging) over all possible values of B yields a marginal probability:
p(A) = ∫ p(A, B) dB = ∫ p(A|B) p(B) dB
probability(hipster) = probability(hipster and has bike) + probability(hipster and has no bike)

Conditional probability: pop quiz
What do you know about Thomas Bayes? Bayes' theorem? Some discussion points: Favorite game? Best buddies?

Bayes' theorem for statistical inference
Unknown quantity θ (model parameters, scientific hypotheses)
Prior p(θ): beliefs before observed data Y become available
Conditional probability p(Y|θ) of the data given fixed θ, also called the likelihood of Y
Posterior beliefs p(θ|Y):
p(θ|Y) = p(Y|θ) p(θ) / p(Y)
p(θ) and p(Y|θ) are easy; p(Y) = ∫ p(Y|θ) p(θ) dθ is hard.
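
To make the easy/hard split concrete, here is a minimal Python sketch using a hypothetical coin-flip model (not from the slides): on a discrete grid, the intractable integral p(Y) collapses to a sum, so the posterior can be computed directly.

```python
# Minimal sketch of Bayes' theorem for inference, assuming a hypothetical
# coin-flip model: theta is the heads probability, Y is 7 heads in 10 tosses.
import numpy as np

theta = np.linspace(0.001, 0.999, 999)    # grid over the unknown parameter
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)               # flat prior p(theta)
heads, tosses = 7, 10                     # observed data Y
likelihood = theta**heads * (1 - theta)**(tosses - heads)   # p(Y | theta)

evidence = np.sum(likelihood * prior) * dtheta   # p(Y): the "hard" integral
posterior = likelihood * prior / evidence        # p(theta | Y)

print("posterior mean:", np.sum(theta * posterior) * dtheta)   # about 0.67
```

In phylogenetics no such grid exists: θ includes a tree, and the sum over all possible trees below is what forces Monte Carlo methods.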

Bayesian phylogenetic inference
θ = (tree, substitution process)
p(Y|θ): a continuous-time Markov chain process that gives rise to the sequences at the tips of the tree
Posterior: p(θ|Y) = p(Y|θ) p(θ) / p(Y)
Trouble: p(Y) is not computable; it sums over all possible trees. For N taxa there are G(N) = (2N−3)(2N−5)···3·1 trees; e.g., G(21) > 3 × 10^23.
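
The tree count above is easy to verify numerically; a small sketch of the double-factorial formula:

```python
# G(N) = (2N-3)(2N-5)...3*1, the number of tree topologies on N taxa,
# grows explosively: exhaustive summation over trees is hopeless.

def num_tree_topologies(n_taxa: int) -> int:
    count = 1
    for k in range(3, 2 * n_taxa - 2, 2):   # odd factors 3, 5, ..., 2N-3
        count *= k
    return count

print(f"{num_tree_topologies(21):.2e}")   # > 3e23, matching the slide
```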

Priors
Strongest assumption: most parameters are separable, e.g. the tree is independent of the substitution process.
Weaker assumption: tree ~ Coalescent process.
Weaker assumption: functional form on substitution parameters.
Specialized priors exist as well. If worried: check sensitivity.

Posterior inference
Numerical (Monte Carlo) integration as a solution:

Markov chain Monte Carlo
Metropolis et al. (1953) and Hastings (1970) proposed a stochastic integration algorithm that can explore vast parameter spaces. The algorithm generates a Markov chain that visits parameter values (e.g., a specific tree) with frequency equal to their posterior density/probability. Markov chain: a random walk where the next step depends only on the current parameter state.

Metropolis-Hastings Algorithm
Each step in the Markov chain starts at its current state θ and proposes a new state θ* from an arbitrary proposal distribution q(·|θ) (transition kernel). θ* becomes the new state of the chain with probability:
R = min{1, [p(θ*|Y) q(θ|θ*)] / [p(θ|Y) q(θ*|θ)]}
  = min{1, [p(Y|θ*) p(θ*) / p(Y)] q(θ|θ*) / ([p(Y|θ) p(θ) / p(Y)] q(θ*|θ))}
  = min{1, [p(Y|θ*) p(θ*) q(θ|θ*)] / [p(Y|θ) p(θ) q(θ*|θ)]}
Otherwise, θ remains the state of the chain. The intractable p(Y) cancels out of the ratio.
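
A minimal Python sketch of the algorithm, assuming a hypothetical one-dimensional stand-in target (not a phylogenetic likelihood); with a symmetric proposal the q terms cancel, and p(Y) never has to be computed:

```python
# Metropolis-Hastings sketch for a single continuous parameter.
import numpy as np

rng = np.random.default_rng(0)

def log_unnorm_posterior(theta):
    return -0.5 * (theta - 2.0) ** 2        # stand-in for log p(Y|theta)p(theta)

theta = 0.0                                 # arbitrary starting state
samples = []
for _ in range(50_000):
    proposal = theta + rng.normal(scale=1.0)   # theta* from symmetric q(.|theta)
    log_R = log_unnorm_posterior(proposal) - log_unnorm_posterior(theta)
    if np.log(rng.uniform()) < log_R:       # accept with probability min(1, R)
        theta = proposal
    samples.append(theta)                   # a rejection repeats the old state

samples = np.asarray(samples[5_000:])       # discard burn-in (see next slide)
print(samples.mean(), samples.std())        # should approach 2.0 and 1.0
```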

Posterior sampling
We repeat the process of proposing a new state, calculating the acceptance probability and either accepting or rejecting the proposed move millions of times. Although correlated, the Markov chain samples are valid draws from the posterior; however, the initial samples (burn-in) are often discarded due to correlation with the chain's starting point (≠ posterior).

Transition Kernels
Often we propose changes to only a small number of dimensions in θ at a time (Metropolis-within-Gibbs); a sketch follows below. In phylogenetics, mixing (correlation) in continuous dimensions is much better (smaller) than for the tree. So, the dominant approach has been keep-it-simple-stupid; alternatives exist and may become necessary: Gibbs sampler; slice sampler; Hamiltonian MC.
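
A Metropolis-within-Gibbs sketch under the same stand-in assumptions as before (a hypothetical correlated two-parameter target, not a tree model): each update perturbs one dimension of θ while holding the others fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_post(theta):
    x, y = theta
    return -0.5 * (x**2 - 1.6 * x * y + y**2)   # stand-in log-posterior

theta = np.zeros(2)
samples = []
for _ in range(20_000):
    for i in range(2):                        # one coordinate at a time
        prop = theta.copy()
        prop[i] += rng.normal(scale=0.7)      # proposal touches dimension i only
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta = prop                      # accept; otherwise keep current
    samples.append(theta.copy())

print(np.corrcoef(np.array(samples)[5_000:].T))  # recovers correlation ~0.8
```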

Tree Transition Kernels

Posterior Summaries
For continuous θ, consider: the posterior mean or median (MCMC sample average or median) and credible regions, quantitative measures of uncertainty. The Bayesian equivalent of a confidence interval is called the highest posterior density (HPD) credible region. This is the smallest region that contains 95% of the posterior probability.
For trees, consider: a scientifically interesting posterior probability statement, e.g. the probability of monophyly (the MCMC sample proportion under which the hypothesis is true).
[Figure: a posterior distribution over parameter x with the lower and upper 95% HPD limits marked]
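
A sketch of computing a 95% HPD interval from MCMC draws (assuming a unimodal posterior, the HPD is the shortest window covering 95% of the sorted samples):

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    s = np.sort(samples)
    n_in = int(np.ceil(mass * len(s)))        # number of draws inside
    widths = s[n_in - 1:] - s[: len(s) - n_in + 1]
    start = int(np.argmin(widths))            # shortest window = HPD
    return s[start], s[start + n_in - 1]

# on a skewed posterior the HPD is shorter than the equal-tailed interval
draws = np.random.default_rng(2).gamma(shape=3.0, scale=1.0, size=20_000)
print("95% HPD:     ", hpd_interval(draws))
print("equal-tailed:", tuple(np.quantile(draws, [0.025, 0.975])))
```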

Posterior Probabilities

Summarizing Trees

MCMC Diagnostics: within a single chain
Visually inspect MCMC output traces. Measure autocorrelation within a chain: the effective sample size (ESS).
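
A rough sketch of an ESS estimate from within-chain autocorrelation (a simplified truncation rule; dedicated tools such as Tracer or coda use more careful estimators):

```python
import numpy as np

def effective_sample_size(chain):
    x = chain - chain.mean()
    n = len(x)
    # autocorrelation at every lag, normalized so that lag 0 equals 1
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    tau = 1.0
    for rho in acf[1:]:
        if rho < 0:                    # truncate at first negative estimate
            break
        tau += 2.0 * rho
    return n / tau                     # ESS = n / (1 + 2 * sum of autocorrelations)

# a strongly autocorrelated AR(1) chain has far fewer effective samples
rng = np.random.default_rng(4)
chain = np.zeros(10_000)
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()
print(effective_sample_size(chain))    # roughly 10000 * (1-0.9)/(1+0.9) = 526
```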

MCMC Diagnostics: across multiple chains
Visually inspect MCMC output traces. Compare different chains: variance within and between chains.
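
A sketch of the within/between-chain variance comparison, in the style of the Gelman-Rubin diagnostic (values near 1 suggest the chains sample the same distribution):

```python
import numpy as np

def gelman_rubin(chains):
    """chains: array of shape (m, n) holding m independent chains of length n."""
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)               # potential scale reduction factor

# four well-mixed chains targeting the same distribution give a value near 1
rng = np.random.default_rng(3)
print(gelman_rubin(rng.normal(size=(4, 5_000))))
```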

Improving Mixing
(Only if convergence diagnostics suggest a problem.) Run the chain longer. Use a more parsimonious model (uninformative data). Change the tuning parameters of the transition kernels to bring acceptance rates into the 10% to 70% range. Use different transition kernels (consult an expert).

Why Bother being Bayesian?
In practice, we have almost no prior knowledge for the model parameters. So, why bother with Bayesian inference? The analysis provides directly interpretable probability statements given the observed data. MCMC is a stochastic algorithm that (in theory) avoids getting trapped in local sub-optimal solutions. The search space under a Coalescent prior is astronomically smaller. By numerically integrating over all possible trees, we obtain marginal probability statements on hypotheses of scientific interest, e.g. specific branching events or population dynamics, avoiding bias.