A Bayesian Approach to Phylogenetics

A Bayesian Approach to Phylogenetics
Niklas Wahlberg. Based largely on slides by Paul Lewis (www.eeb.uconn.edu).

An Introduction to Bayesian Phylogenetics
Outline: Bayesian inference in general; Markov chain Monte Carlo; Bayesian phylogenetics; prior distributions; 10 important considerations.

Bayesian inference in general
D will stand for the data. H will mean any one of a number of things: a discrete hypothesis, a distinct model (e.g. JC, HKY, GTR), a tree topology, or one of an infinite number of continuous model parameter values (e.g. the ts:tv rate ratio).

A Bayesian approach compared to ML
In ML, we choose the hypothesis that gives the highest (maximized) likelihood to the data. The likelihood is the probability of the data given the hypothesis, L = P(D | H). A Bayesian analysis instead expresses its results as the probability of the hypothesis given the data, which may be a more desirable way to express the result.

The posterior probability of a hypothesis
The posterior probability, P(H | D), is the probability of the hypothesis given the observations, or data (D). The main feature of Bayesian statistics is that it takes into account prior knowledge of the hypothesis:

P(H | D) = P(D | H) × P(H) / P(D)

where P(D | H) is the likelihood of the hypothesis, P(H) is the prior probability of the hypothesis, and P(D) is the probability of the data (a normalizing constant).

Likelihood function is common
Both ML and Bayesian methods use the likelihood function. In ML, free parameters are optimized, maximizing the likelihood. In a Bayesian approach, free parameters are probability distributions, which are sampled.

Coin-flipping example
Data D: 6 heads (out of 10 flips). H = the true underlying proportion of heads (the probability of coming up heads on any single flip). If H = 0.5, the coin is perfectly fair; if H = 1.0, the coin always comes up heads (i.e. it is a trick coin).
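As a concrete illustration of these two slides (my own sketch, not part of the original presentation), the posterior for H can be computed on a grid, assuming a flat Beta(1,1) prior:

```python
import numpy as np
from scipy.stats import binom, beta

# Coin-flip data: 6 heads out of 10 flips
heads, flips = 6, 10

# Grid of candidate values for H, the true probability of heads
H = np.linspace(0.001, 0.999, 999)

# Flat Beta(1,1) prior and binomial likelihood
prior = beta.pdf(H, 1, 1)
likelihood = binom.pmf(heads, flips, H)

# Posterior is proportional to likelihood * prior; normalize over the grid
posterior = likelihood * prior
posterior /= posterior.sum()

# With a flat prior the posterior peaks at the MLE, H = 0.6
print(H[np.argmax(posterior)])
```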

The Frequentist and the Bayesian
Frequentist: there exists a true probability H of getting heads; the null hypothesis is H0: H = 0.5. Does the data reject the null hypothesis?
Bayesian: what is the range around 0.5 that we are willing to accept as being in the "fair coin" range? What is the probability that H is in this range?

How the MCMC works
Markov chain Monte Carlo: start somewhere. That somewhere will have a likelihood (posterior density) associated with it, not the optimized, maximum likelihood. Randomly propose a new state. If the new state has a better likelihood, the chain goes there; if it is worse, the chain moves there with probability equal to the ratio of the new density to the current one, and otherwise stays where it is.
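A minimal Metropolis sketch of this walk (my illustration, using the coin-flip example rather than trees, with an arbitrary proposal width of 0.1):

```python
import numpy as np
from scipy.stats import binom

heads, flips = 6, 10
rng = np.random.default_rng(1)

def posterior(H):
    # Unnormalized posterior: binomial likelihood times a flat prior on (0, 1)
    return binom.pmf(heads, flips, H) if 0.0 < H < 1.0 else 0.0

H_current = 0.2                      # start somewhere
p_current = posterior(H_current)
samples = []

for step in range(20000):
    # Randomly propose a new state near the current one (symmetric proposal)
    H_new = H_current + rng.normal(0.0, 0.1)
    p_new = posterior(H_new)
    # Accept if better; if worse, accept with probability p_new / p_current
    if rng.random() < p_new / p_current:
        H_current, p_current = H_new, p_new
    samples.append(H_current)

print(np.mean(samples))              # close to the posterior mean, about 0.58
```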

Target vs. proposal distributions
The target distribution is the posterior distribution of interest. The proposal distribution is used to decide where to go next; you have much flexibility here, and the choice affects the efficiency of the MCMC algorithm. Symmetric proposal distributions have been assumed thus far, but the Hastings ratio can be used for asymmetric ones.
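For reference, the standard Metropolis-Hastings acceptance probability with target density π and proposal density q (general MCMC background, not specific to these slides) is:

```latex
\alpha(H \to H') = \min\left(1,\ \frac{\pi(H')\, q(H \mid H')}{\pi(H)\, q(H' \mid H)}\right)
```

With a symmetric proposal, q(H | H') = q(H' | H), the Hastings ratio cancels and this reduces to the simple Metropolis rule described above.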

The tradeoff
Pro: taking big steps helps in jumping from one "island" in the posterior density to another. Con: taking big steps often results in poor mixing (big proposals are frequently rejected, so the chain rarely moves). Solution: MCMCMC!

Metropolis-coupled Markov chain Monte Carlo (MCMCMC, or MC³)
MC³ involves running several chains simultaneously (one cold and several heated). The cold chain is the one that counts; the heated chains are scouts. A chain is heated by raising its density to a power less than 1.0 (values closer to 0.0 are warmer).
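A compact sketch of the idea (my illustration on the coin-flip target, not how MrBayes or any real phylogenetics package is implemented): each chain runs Metropolis on the posterior raised to its own power beta, and chains occasionally propose to swap states.

```python
import numpy as np
from scipy.stats import binom

heads, flips = 6, 10
rng = np.random.default_rng(2)

def log_post(H):
    # Unnormalized log posterior for the coin-flip example
    return binom.logpmf(heads, flips, H) if 0.0 < H < 1.0 else -np.inf

# One cold chain (beta = 1) and two heated chains (beta < 1)
betas = [1.0, 0.7, 0.5]
states = [0.2, 0.2, 0.2]
cold_samples = []

for step in range(20000):
    # Metropolis update within each chain on the heated density post**beta
    for i, b in enumerate(betas):
        prop = states[i] + rng.normal(0.0, 0.1)
        if np.log(rng.random()) < b * (log_post(prop) - log_post(states[i])):
            states[i] = prop
    # Propose swapping the states of two randomly chosen chains
    i, j = rng.choice(len(betas), size=2, replace=False)
    log_ratio = (betas[i] - betas[j]) * (log_post(states[j]) - log_post(states[i]))
    if np.log(rng.random()) < log_ratio:
        states[i], states[j] = states[j], states[i]
    cold_samples.append(states[0])   # only the cold chain counts

print(np.mean(cold_samples))         # cold-chain estimate of the posterior mean of H
```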

Bayesian phylogenetics

Sampling the chain
Marginal = taking into account all possible values. Record the position of the chain (the "robot" exploring the posterior landscape) every 100 or 1000 steps (sampling every 1000 represents more thinning than every 100). This sample will be autocorrelated, but not much so if it is thinned appropriately (autocorrelation can be measured to assess this). If using heated chains, only the cold chain is sampled. The marginal distribution of any parameter can be obtained from this sample.
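To make the autocorrelation point concrete (a toy illustration, not from the slides), one can compare the lag-1 autocorrelation of a strongly correlated chain before and after thinning:

```python
import numpy as np

rng = np.random.default_rng(3)

def autocorr(x, lag):
    # Sample autocorrelation of a 1-D chain at the given lag
    x = np.asarray(x) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Toy AR(1) series standing in for raw, highly autocorrelated MCMC output
chain = np.zeros(100_000)
for t in range(1, chain.size):
    chain[t] = 0.95 * chain[t - 1] + rng.normal()

print(autocorr(chain, 1))          # high for the raw chain (about 0.95)
print(autocorr(chain[::100], 1))   # near zero after keeping every 100th sample
```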

Putting it all together
Start with a random tree and arbitrary initial values for branch lengths and model parameters. Each generation consists of one of these moves (chosen at random): propose a new tree (e.g. a Larget-Simon move) and either accept or reject the move, or propose (and either accept or reject) a new model parameter value. Every k generations, save the tree topology, branch lengths and all model parameters (i.e. sample the chain). After n generations, summarize the sample using histograms, means, credible intervals, etc.

Prior distributions
For topologies: a discrete uniform distribution.
For proportions: a Beta(a,b) distribution; flat when a = b = 1, peaked at 0.5 when a = b and both are greater than 1.
For base frequencies: a Dirichlet(a,b,c,d) distribution; flat when a = b = c = d = 1, all base frequencies close to 0.25 when a = b = c = d = v and v is large (e.g. 300) (see the sketch below).
For GTR model relative rates: a Dirichlet(a,b,c,d,e,f) distribution.
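To see what these Dirichlet settings look like in practice (an illustration using numpy, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)

# Flat prior over the four base frequencies: Dirichlet(1, 1, 1, 1)
print(rng.dirichlet([1, 1, 1, 1], size=3))

# Informative prior: with Dirichlet(300, 300, 300, 300) every draw is
# close to (0.25, 0.25, 0.25, 0.25)
print(rng.dirichlet([300, 300, 300, 300], size=3))
```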

Prior distributions (continued)
For other model parameters and branch lengths: a Gamma(a,b) distribution (shape a, scale b). An Exponential(λ) distribution equals Gamma(1, 1/λ). The mean of Gamma(a,b) is ab (so the mean of an Exponential(10) distribution is 0.1); the variance of Gamma(a,b) is ab² (so the variance of an Exponential(10) distribution is 0.01).

The effect of priors
Flat (uninformative) priors mean that the posterior probability is directly proportional to the likelihood: the value of H at the peak of the posterior distribution equals the MLE of H. Informative priors can have a strong effect on posterior probabilities.
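To illustrate the effect of the prior on the coin-flip example (my own sketch using Beta-binomial conjugacy, not part of the slides):

```python
import numpy as np
from scipy.stats import beta

# Coin-flip data: 6 heads, 4 tails
heads, tails = 6, 4

# With a Beta(a, b) prior, the posterior is Beta(a + heads, b + tails)
flat = beta(1 + heads, 1 + tails)        # flat Beta(1, 1) prior
strong = beta(50 + heads, 50 + tails)    # informative prior centred on 0.5

H = np.linspace(0, 1, 1001)
print(H[np.argmax(flat.pdf(H))])     # 0.6: posterior peak equals the MLE
print(H[np.argmax(strong.pdf(H))])   # about 0.51: pulled strongly toward 0.5
```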

10 important considerations
Top 10 list (of important considerations):
1. Beware arbitrarily truncated priors
2. Branch length priors are particularly important
3. Beware high posteriors for very short branch lengths
4. Partition with care (prefer fewer subsets)
5. MCMC run length should depend on the number of parameters
6. Calculate how many times parameters were updated
7. Pay attention to parameter estimates
8. Run without data to explore the prior
9. Run long and run often!
10. Future: model selection should include effects of priors

To conclude
Bayesian methods have great potential: they are able to take into account uncertainty in parameter estimates. They still assume a homogeneous Markov model for rates of change along a tree, and there are still problems that need to be fixed.