A Bayesian Approach to Phylogenetics
Niklas Wahlberg
Based largely on slides by Paul Lewis (www.eeb.uconn.edu)

An Introduction to Bayesian Phylogenetics
Bayesian inference in general
Markov chain Monte Carlo
Bayesian phylogenetics
Prior distributions
10 important considerations

Bayesian inference in general
D will stand for Data
H will mean any one of a number of things:
a discrete hypothesis
a distinct model (e.g. JC, HKY, GTR, etc.)
a tree topology
one of an infinite number of continuous model parameter values (e.g. the ts:tv rate ratio)

A Bayesian approach compared to ML
In ML, we choose the hypothesis that gives the highest (maximized) likelihood to the data
The likelihood is the probability of the data given the hypothesis: L = P(D | H)
A Bayesian analysis expresses its results as the probability of the hypothesis given the data; this may be a more desirable way to express the result
The posterior probability of a hypothesis
The posterior probability, P(H | D), is the probability of the hypothesis given the observations, or data (D)
The main feature of Bayesian statistics is that it takes prior knowledge of the hypothesis into account

P(H | D) = [P(D | H) * P(H)] / P(D)

where P(H | D) is the posterior probability of hypothesis H, P(D | H) is the likelihood of the hypothesis, P(H) is the prior probability of the hypothesis, and P(D) is the probability of the data (a normalizing constant)

The likelihood function is common
Both ML and Bayesian methods use the likelihood function
In ML, free parameters are optimized, maximizing the likelihood
In a Bayesian approach, free parameters are probability distributions, which are sampled

Coin-flipping example
Data D: 6 heads (out of 10 flips)
H = true underlying proportion of heads (the probability of coming up heads on any single flip)
if H = 0.5, the coin is perfectly fair
if H = 1.0, the coin always comes up heads (i.e. it is a trick coin)
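The coin-flipping example can be worked through numerically. A minimal sketch (grid approximation with a flat prior; the grid size and variable names are illustrative choices, not from the original slides):

```python
from math import comb

# H = probability of heads; data D = 6 heads in 10 flips; flat prior P(H).
n, k = 10, 6
grid = [i / 100 for i in range(101)]          # candidate values of H
prior = [1 / len(grid)] * len(grid)           # flat (uniform) prior
like = [comb(n, k) * h**k * (1 - h)**(n - k) for h in grid]  # P(D | H)
unnorm = [l * p for l, p in zip(like, prior)] # P(D | H) * P(H)
evidence = sum(unnorm)                        # P(D), the normalizing constant
post = [u / evidence for u in unnorm]         # P(H | D)

# With a flat prior, the posterior peaks at the MLE, H = 0.6
print(grid[post.index(max(post))])            # -> 0.6
```

With a flat prior the posterior is proportional to the likelihood, so its peak coincides with the maximum likelihood estimate k/n = 0.6.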
The Frequentist and the Bayesian
Frequentist: there exists a true probability H of getting heads; the null hypothesis is H0: H = 0.5. Does the data reject the null hypothesis?
Bayesian: what is the range around 0.5 that we are willing to accept as being in the "fair coin" range? What is the probability that H is in this range?
How the MCMC works
Markov chain Monte Carlo
Start somewhere; that somewhere will have a likelihood associated with it (not the optimized, maximum likelihood)
Randomly propose a new state
If the new state has a better posterior density, the chain always moves there; if it is worse, the chain moves there with probability equal to the ratio of the new density to the current one (the Metropolis rule)
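The propose/accept cycle above can be sketched for the coin-flipping posterior. This is an illustrative toy (a symmetric uniform proposal and a flat prior are assumed; step size and iteration count are arbitrary choices), not a phylogenetic implementation:

```python
import random, math

# Target: P(H | D) proportional to H^6 (1 - H)^4 (6 heads, 4 tails, flat prior)
def log_post(h):
    if not 0 < h < 1:
        return float("-inf")   # zero density outside (0, 1)
    return 6 * math.log(h) + 4 * math.log(1 - h)

random.seed(1)
h, chain = 0.5, []                           # start somewhere
for _ in range(20000):
    prop = h + random.uniform(-0.1, 0.1)     # symmetric proposal
    # better states are always accepted; worse states with
    # probability equal to the posterior ratio (Metropolis rule)
    if math.log(random.random()) < log_post(prop) - log_post(h):
        h = prop
    chain.append(h)

# the sample mean should be near the true posterior mean, 7/12 = 0.583
print(sum(chain) / len(chain))
```

Working on the log scale avoids numerical underflow, which matters for real phylogenetic likelihoods.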
Target vs. proposal distributions
The target distribution is the posterior distribution of interest
The proposal distribution is used to decide where to go next; you have much flexibility here, and the choice affects the efficiency of the MCMC algorithm
Symmetric proposal distributions have been assumed thus far, but the Hastings ratio can be used to correct for asymmetric ones
The Tradeoff
Pro: taking big steps helps in jumping from one island in the posterior density to another
Con: taking big steps often results in poor mixing (most proposals land in low-density regions and are rejected)
Solution: MCMCMC!
Metropolis-coupled Markov chain Monte Carlo (MCMCMC, or MC3)
MC3 involves running several chains simultaneously (one "cold" and several "heated")
The cold chain is the one that counts; the heated chains are scouts
A chain is heated by raising its densities to a power less than 1.0 (values closer to 0.0 are warmer)

Bayesian phylogenetics
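The effect of heating can be seen on a toy density. This sketch (the five-point bimodal density is an arbitrary illustration) raises an unnormalized density to powers below 1.0 and shows how the peak-to-valley contrast shrinks, which is what lets heated chains cross between islands:

```python
# Toy unnormalized bimodal "posterior": two peaks separated by a valley
dens = [0.01, 0.5, 0.01, 0.4, 0.08]

ratios = []
for beta in (1.0, 0.5, 0.2):                   # the cold chain has beta = 1.0
    heated = [d ** beta for d in dens]         # heat: raise density to power beta
    ratios.append(max(heated) / min(heated))   # peak-to-valley contrast
    print(beta, round(ratios[-1], 1))          # contrast shrinks as beta drops
```

The cold chain sees a 50:1 contrast between peak and valley; at beta = 0.2 the same landscape is only about 2:1, so proposals into the valley are accepted far more often.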
Sampling the chain
Record the position of the robot (the current state of the chain) every 100 or 1000 steps (1000 represents more thinning than 100)
This sample will be autocorrelated, but not much so if it is thinned appropriately (autocorrelation can be measured to assess this)
If heated chains are used, only the cold chain is sampled
The marginal distribution of any parameter can be obtained from this sample (marginal = taking into account all possible values of the other parameters)
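Measuring autocorrelation and the effect of thinning can be sketched with a stand-in for raw MCMC output (an AR(1)-style correlated sequence; the 0.95 coefficient and thinning interval are illustrative assumptions):

```python
import random

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

random.seed(0)
x, chain = 0.0, []
for _ in range(100000):
    x = 0.95 * x + random.gauss(0, 1)   # strongly autocorrelated "chain"
    chain.append(x)

print(round(lag1_autocorr(chain), 2))        # near 0.95: heavily correlated
print(round(lag1_autocorr(chain[::100]), 2)) # thinned every 100: near 0.0
```

Sampling every 100th state leaves draws that are nearly independent, which is exactly why the chain is thinned before summarizing.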
Putting it all together
Start with a random tree and arbitrary initial values for branch lengths and model parameters
Each generation consists of one of these (chosen at random):
Propose a new tree (e.g. Larget-Simon move) and either accept or reject the move
Propose (and either accept or reject) a new model parameter value
Every k generations, save the tree topology, branch lengths and all model parameters (i.e. sample the chain)
After n generations, summarize the sample using histograms, means, credible intervals, etc.

Prior Distributions
For topologies: discrete Uniform distribution
For proportions: Beta(a,b) distribution
flat when a = b = 1
peaked at 0.5 when a = b and both are greater than 1
For base frequencies: Dirichlet(a,b,c,d) distribution
flat when a = b = c = d = 1
all base frequencies close to 0.25 when v = a = b = c = d and v is large (e.g. 300)
For GTR model relative rates: Dirichlet(a,b,c,d,e,f) distribution
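The Beta prior shapes can be checked by simulation. A small sketch using the standard library (the sample size and the 0.4-0.6 window are illustrative choices):

```python
import random

# Beta(a, a) priors: a = 1 is flat; larger a concentrates mass around 0.5
random.seed(0)
fractions = []
for a in (1, 2, 50):
    draws = [random.betavariate(a, a) for _ in range(100000)]
    fractions.append(sum(0.4 < x < 0.6 for x in draws) / len(draws))
    print(a, round(fractions[-1], 2))   # prior mass near 0.5 grows with a
```

Beta(1,1) puts 20% of its mass in the 0.4-0.6 window (flat), Beta(2,2) about 30%, and Beta(50,50) around 95%, mirroring the flat-versus-peaked behavior described above.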
Prior Distributions
For other model parameters and branch lengths: Gamma(a,b) distribution
Exponential(λ) equals Gamma(1, 1/λ)
The mean of a Gamma(a,b) distribution is ab (so the mean of an Exponential(10) distribution is 0.1)
The variance of a Gamma(a,b) distribution is ab^2 (so the variance of an Exponential(10) distribution is 0.01)

The effect of priors
Flat (uninformative) priors mean that the posterior probability is directly proportional to the likelihood
The value of H at the peak of the posterior distribution is then equal to the MLE of H
Informative priors can have a strong effect on posterior probabilities
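The stated mean and variance identities can be verified by simulation (the sample size is an arbitrary choice; `random.expovariate(10)` draws from Exponential(λ = 10), i.e. Gamma(1, 1/10)):

```python
import random

# Exponential(10) should have mean a*b = 1 * 0.1 = 0.1
# and variance a*b^2 = 1 * 0.1^2 = 0.01
random.seed(0)
draws = [random.expovariate(10) for _ in range(200000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
print(round(mean, 3), round(var, 4))   # close to 0.1 and 0.01
```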
10 important considerations
Top 10 List (of important considerations)
1. Beware arbitrarily truncated priors
2. Branch length priors are particularly important
3. Beware high posteriors for very short branch lengths
4. Partition with care (prefer fewer subsets)
5. MCMC run length should depend on the number of parameters
6. Calculate how many times parameters were updated
7. Pay attention to parameter estimates
8. Run without data to explore the prior
9. Run long and run often!
10. Future: model selection should include effects of priors
To conclude
Bayesian methods have great potential
They are able to take into account uncertainty in parameter estimates
They still assume a homogeneous Markov model for rates of change along a tree
There are still problems that need to be fixed