Bayesian phylogenetics: the one true tree?


Bayesian phylogenetics: the one true tree?

The methods we've learned so far try to find a single tree that best describes the data. However, they admit that they don't search everywhere, and that it is difficult to find the best tree. Are we doing a good job by reporting a single tree?

Bayesian phylogenetics

Using Bayesian principles, we will search for a set of plausible trees (weighted by their probability) instead of a single best tree. In this method, the space you search in is limited by prior information and by the data. The posterior distribution of trees can be translated into a probability for any branching event:
- this allows an estimate of uncertainty!
- BUT it incorporates prior beliefs

[Slides: "World championship medalists" (example figures, not transcribed)]

Bayes' rule

H represents a specific hypothesis:

Pr(H|D) = Pr(D|H) Pr(H) / Pr(D)

- Pr(H) is called the prior probability of H: the probability assumed before the new data, D, became available.
- Pr(D|H) is called the conditional probability of seeing D if H is true. It is also called the likelihood function when it is considered as a function of H for fixed D.
- Pr(D) is called the marginal probability of D: the a priori probability of witnessing D under all possible hypotheses. For any complete set of mutually exclusive hypotheses H_i, it can be calculated as the sum of the products of their prior probabilities and the corresponding conditional probabilities: Pr(D) = Σ_i Pr(D|H_i) Pr(H_i)
- Pr(H|D) is called the posterior probability of H given D.

Bayes' rule: an example

Jar #1 contains 30 sugar cookies and 10 of another kind; Jar #2 contains 20 of each. I choose a jar at random, pick a cookie from it, and it turns out to be a sugar cookie (sc). How probable is it that I picked from Jar #1?

Pr(Jar1|sc) = Pr(sc|Jar1) Pr(Jar1) / ( Pr(sc|Jar1) Pr(Jar1) + Pr(sc|Jar2) Pr(Jar2) )
            = (0.75 × 0.5) / ( (0.75 × 0.5) + (0.5 × 0.5) )
            = 0.6
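To make the arithmetic concrete, here is a minimal Python sketch of the cookie-jar calculation (the variable names are ours, not from the slides):

```python
# Posterior probability of Jar #1 given a sugar cookie (sc), via Bayes' rule.
p_jar1, p_jar2 = 0.5, 0.5   # prior: each jar is equally likely to be chosen
p_sc_jar1 = 30 / 40         # Jar #1: 30 of its 40 cookies are sugar cookies
p_sc_jar2 = 20 / 40         # Jar #2: 20 of its 40 cookies are sugar cookies

p_sc = p_sc_jar1 * p_jar1 + p_sc_jar2 * p_jar2   # marginal Pr(sc)
print(p_sc_jar1 * p_jar1 / p_sc)                 # posterior Pr(Jar1|sc) = 0.6
```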

Bayesian phylogenetic inference

[Figure: a prior distribution over the three possible topologies for taxa A, B, C is combined with the data (an alignment of observations: ACGTAATCGT, ACTTGATAGT, AGTTAAAAGT, ACTTGAATGT) to give a posterior distribution over topologies]

Bayesian inference of phylogeny

The posterior probability of a phylogenetic tree τ_i:

f(τ_i|X) = f(X|τ_i) f(τ_i) / f(X),   where   f(X) = Σ_{j=1}^{B(s)} f(X|τ_j) f(τ_j)

Likelihood, with topology τ, branch lengths υ, and substitution-model parameters θ, as a product over the n alignment sites:

L(τ, υ, θ | y_1, ..., y_n) = Π_{i=1}^{n} Pr[y_i | τ, υ, θ]

[Figure: an example tree with observed bases at the tips and branch lengths υ_1, ..., υ_6]

Prior, uniform over topologies:

f(τ_i) = 1/B(s),   where B(s) = (2s − 3)!! is the number of topologies for s taxa

Bayesian inference: numerical integration via sampling as a solution?

[Figure: the posterior surface approximated with 2000 vs. 10000 random samples]
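To see why the sum over topologies in f(X) is intractable, here is a small Python sketch (ours, not from the slides) that evaluates B(s) = (2s − 3)!!:

```python
def num_topologies(s):
    """B(s) = (2s - 3)!! = 1 * 3 * 5 * ... * (2s - 3),
    the number of tree topologies for s taxa."""
    result = 1
    for k in range(3, 2 * s - 2, 2):   # odd factors 3, 5, ..., 2s - 3
        result *= k
    return result

for s in (4, 10, 20, 50):
    print(s, num_topologies(s))
# B(50) is roughly 2.8e76: summing f(X|tau_j) f(tau_j) over every
# topology is hopeless, which is why we sample instead.
```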

Markov chain Monte Carlo (MCMC) integration

Metropolis et al. (1953) and Hastings (1970) proposed a stochastic algorithm (a variant of MCMC) to explore vast parameter spaces. The frequency at which a tree is visited will be proportional to the posterior probability of that tree:
- trees with higher posterior probability will be visited more often
- this is guaranteed by the Ergodic Theorem

[Slide "MCMC": figure not transcribed]

Metropolis-Hastings algorithm

The algorithm starts from a random state Ψ = {τ, υ, θ} and proposes a new state Ψ*. The new state is accepted with probability:

R = min( 1, [f(Ψ*|X) / f(Ψ|X)] × [f(Ψ|Ψ*) / f(Ψ*|Ψ)] )
  = min( 1, ( [f(X|Ψ*) f(Ψ*) / f(X)] / [f(X|Ψ) f(Ψ) / f(X)] ) × [f(Ψ|Ψ*) / f(Ψ*|Ψ)] )
  = min( 1, [f(X|Ψ*) / f(X|Ψ)] × [f(Ψ*) / f(Ψ)] × [f(Ψ|Ψ*) / f(Ψ*|Ψ)] )
              likelihood ratio        prior ratio       proposal ratio

Note that the intractable f(X) cancels out of the ratio. Draw a random variable q ~ Uniform(0, 1):
- if q < R: Ψ = Ψ* (the chain moves to the new state)
- otherwise the chain remains in the original state
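A minimal sketch of the Metropolis-Hastings loop on a toy one-dimensional target (not a tree; the function and variable names are ours) may help fix the ideas. With a symmetric proposal the proposal ratio is 1, so only the posterior ratio matters:

```python
import math
import random

def log_posterior(x):
    # Unnormalized log posterior of a toy 1-D target (a standard normal);
    # for trees this would be log f(X|psi) + log f(psi).
    return -0.5 * x * x

def metropolis_hastings(n_steps, step_size=1.0):
    psi = 0.0                                        # arbitrary starting state
    samples = []
    for _ in range(n_steps):
        psi_new = psi + random.gauss(0.0, step_size)  # symmetric proposal
        log_r = log_posterior(psi_new) - log_posterior(psi)
        if log_r >= 0 or random.random() < math.exp(log_r):
            psi = psi_new                            # accept the proposed state
        samples.append(psi)                          # otherwise stay put
    return samples

chain = metropolis_hastings(10_000)
```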

Metropolis-Hastings / MCMC

This process of proposing a new state, calculating the acceptance probability, and either accepting or rejecting the proposed move is repeated many millions of times. The samples from the algorithm form a Markov chain of valid, albeit correlated, samples from the posterior probability distribution. The initial part of the chain (the burn-in) is required to reach stationarity and should be discarded when summarizing the results.

Metropolis-coupled MCMC

How do we cross deep valleys in the posterior surface? By running multiple chains, some of which are heated:
- heating lowers the peaks and fills in the valleys
- after a step in the chain, a swap between two chains is attempted
- if the swap is accepted, the two chains exchange states (a sketch of the swap rule follows below)

Bayesian inference in phylogeny

We have almost no prior knowledge for the parameters of interest... so why bother doing Bayesian inference?
- A Bayesian analysis expresses its results as the probability of the hypothesis given the data.
- MCMC is a stochastic algorithm, and is thus able to avoid getting stuck in a locally suboptimal solution.
- By sampling a set of plausible trees, MCMC allows the uncertainty of any branching event to be estimated.
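Here is a hedged sketch of the Metropolis-coupled swap rule mentioned above, in our notation: chain i runs at heat β_i and targets f(Ψ|X)^β_i, so a heated chain (β < 1) sees a flattened surface.

```python
import math
import random

def accept_swap(log_post_i, log_post_j, beta_i, beta_j):
    """Decide whether two coupled chains swap their current states.

    log_post_i and log_post_j are the (cold) log posteriors of the two
    current states. The swap is accepted with probability min(1, r), where
      r = [f(psi_j|X)^beta_i * f(psi_i|X)^beta_j] /
          [f(psi_i|X)^beta_i * f(psi_j|X)^beta_j],
    which simplifies in log space to the expression below.
    """
    log_r = (beta_i - beta_j) * (log_post_j - log_post_i)
    return log_r >= 0 or random.random() < math.exp(log_r)
```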

summarizing MCMC results

Logging trees and continuous parameters for each sample. Continuous parameters [ID: 9409050143]:

Gen     LnL        TL     r(A<->C)  ...  pi(G)     pi(T)     alpha     pinvar
1       -5723.498  3.357  0.067486  ...  0.098794  0.247609  0.580820  0.124136
10      -5727.478  3.110  0.030604  ...  0.072965  0.263017  0.385311  0.045880
...
9990    -5727.775  2.687  0.052292  ...  0.086991  0.224332  0.951843  0.228343
10000   -5720.465  3.290  0.038259  ...  0.076770  0.240826  0.444826  0.087738

summarizing MCMC results

- The proportion of a given tree topology (after burn-in) in these logs is an approximation of the posterior probability of that tree topology.
- The trees in the file can also be analyzed for tree partitions, from which a consensus tree can be made. The proportion of trees containing a given partition is the posterior probability of that partition.
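Following the slide's recipe, here is a minimal sketch (the helper name is ours; it assumes each sampled tree has already been reduced to a canonical topology string with branch lengths stripped) of turning a trace of sampled trees into topology posterior probabilities:

```python
from collections import Counter

def topology_posteriors(sampled_topologies, burnin_fraction=0.25):
    """Approximate posterior probabilities of tree topologies from MCMC output.

    `sampled_topologies` holds one canonical topology string per sampled
    generation; identical topologies must compare equal as strings.
    """
    burnin = int(len(sampled_topologies) * burnin_fraction)
    kept = sampled_topologies[burnin:]          # discard the burn-in samples
    counts = Counter(kept)
    return {topo: n / len(kept) for topo, n in counts.items()}

# Toy three-taxon example:
trees = ["((A,B),C)"] * 70 + ["((A,C),B)"] * 20 + ["((B,C),A)"] * 10
print(topology_posteriors(trees, burnin_fraction=0.0))
# {'((A,B),C)': 0.7, '((A,C),B)': 0.2, '((B,C),A)': 0.1}
```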

summarizing trees

[Figure: three sampled five-taxon trees (Tree 1, Tree 2, Tree 3 on taxa A-E), summarized as a strict consensus tree and as a majority-rule consensus tree; in the majority-rule tree, two partitions have support 0.67 and one has support 1]

MCMC diagnostics

Measure the behavior of a single chain:
- visually inspect the output traces (e.g. the LnL trace)
- measure the autocorrelation within a chain: the effective sample size (ESS)

[Figure: two parameter traces against the number of states; a poorly mixing trace with ESS = 75 and a well-mixing equilibrium-frequency trace with ESS = 900]

MCMC diagnostics

Compare samples from different runs:
- visually inspect the output traces (e.g. equilibrium frequencies)
- comparing different runs = comparing the variance among and within runs
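Here is a crude sketch (ours) of the ESS idea from the slide: the number of samples, discounted by the autocorrelation within the chain, truncating the autocorrelation sum at the first non-positive lag:

```python
def effective_sample_size(trace):
    """ESS ~= N / (1 + 2 * sum of positive-lag autocorrelations).

    Uses a simple truncation: stop summing at the first non-positive
    sample autocorrelation. Real tools (e.g. Tracer) are more careful.
    """
    n = len(trace)
    mean = sum(trace) / n
    var = sum((v - mean) ** 2 for v in trace) / n
    if var == 0.0:
        return float(n)
    tau = 1.0                        # integrated autocorrelation time
    for lag in range(1, n // 2):
        acf = sum((trace[i] - mean) * (trace[i + lag] - mean)
                  for i in range(n - lag)) / ((n - lag) * var)
        if acf <= 0.0:
            break
        tau += 2.0 * acf
    return n / tau
```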

improving convergence

(Only if the convergence diagnostics indicate a problem!)
- run the chain longer
- increase the number of heated chains
- change the heating temperature, to bring the acceptance rate of swaps between adjacent chains into the range 10% to 70%
- change the tuning parameters of the proposals, to bring their acceptance rates into the range 10% to 70% (a sketch of such tuning follows below)
- propose changes to difficult parameters more often
- use different proposal mechanisms
- make the model more realistic

improving MCMC mixing: 1. chain length

"Bayesian analysis gives you a lot of rope... to hang yourself with" (J.P. Huelsenbeck)

[Figure: required chain length (up to ~10,000,000 generations) grows with the number of taxa (0-400)]

improving MCMC mixing: 2. proposals

[Figure: three traces of a sampled value against its target distribution over 500 generations:
A. too modest proposals: acceptance rate too high, poor mixing
B. too bold proposals: acceptance rate too low, poor mixing
C. moderately bold proposals: intermediate acceptance rate, good mixing]
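The proposal-tuning advice in the list above can be automated during burn-in. Here is a hedged sketch (the function name and the adjustment factor are ours) of adapting a proposal's step size between batches of iterations:

```python
def tune_step_size(step_size, n_accepted, n_proposed,
                   low=0.10, high=0.70, factor=1.2):
    """Adjust a proposal tuning parameter between batches of iterations.

    An acceptance rate above `high` means the proposals are too modest,
    so take bolder steps; below `low` means they are too bold, so take
    more modest steps. Such adaptation is normally restricted to the
    burn-in phase so that the final chain still targets the posterior.
    """
    rate = n_accepted / n_proposed
    if rate > high:
        return step_size * factor    # widen the proposal
    if rate < low:
        return step_size / factor    # narrow the proposal
    return step_size
```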

a brief recap... to be followed up

1. Frequentist "probability" = the long-run fraction of outcomes having this characteristic.
2. Bayesian "probability" = degree of believability.
3. A frequentist is a person whose long-run ambition is to be wrong 5% of the time.
4. A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.
(Charles Annis)