Approximate Bayesian Computation: a simulation-based approach to inference

Richard Wilkinson¹ and Simon Tavaré²
¹ Department of Probability and Statistics, University of Sheffield
² Department of Applied Mathematics and Theoretical Physics, University of Cambridge

Workshop on Approximate Inference in Stochastic Processes and Dynamical Systems

R.D. Wilkinson (University of Sheffield), Approximate Bayesian Computation, PASCAL 2008
Implicit Statistical Models

Two types of statistical model:
- Prescribed models - the likelihood function is specified.
- Implicit models - a mechanism to simulate observations is specified.

Implicit models give scientists more freedom to accurately model the phenomenon under consideration. The increase in computer power has made their use more practicable, and they are popular in many disciplines.

[Figure: example simulation, with time t on the horizontal axis.]
Fitting to Data

Most models are forward models: specify parameters θ and initial conditions, and the model generates output D. Usually we are interested in the inverse problem: observe data, and estimate parameter values. Different communities use different terminology:
- Calibration
- Data assimilation
- Parameter estimation
- Inverse problem
- Bayesian inference
Monte Carlo Inference

Aim to sample from the posterior distribution:

    π(θ | D) ∝ prior × likelihood = π(θ) P(D | θ).

Monte Carlo methods enable Bayesian inference in more complex models, but MCMC can be difficult or impossible in many stochastic models, e.g.,
- if P(D | θ) is unknown - true for many stochastic models, or
- where there are convergence or mixing problems, often caused by highly dependent data arising from an underlying tree or graphical structure.

Examples occur in population genetics, epidemiology, and evolutionary biology.
Likelihood-Free Inference

Rejection Algorithm
- Draw θ from the prior π(·)
- Accept θ with probability P(D | θ)

Accepted θ are independent draws from the posterior distribution, π(θ | D).

If the likelihood P(D | θ) is unknown:

Mechanical Rejection Algorithm
- Draw θ from π(·)
- Simulate D' ~ P(· | θ)
- Accept θ if D' = D

The acceptance rate is P(D): the number of runs needed to obtain n accepted observations is negative binomial, with mean n / P(D).
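The mechanical rejection algorithm can be sketched in a few lines. This is a minimal illustration with a hypothetical Binomial simulator and a U(0,1) prior (not a model from the talk), chosen because the exact posterior is then Beta(D+1, n−D+1) and the output can be checked:

```python
import numpy as np

rng = np.random.default_rng(0)

def mechanical_rejection(data, n_trials, n_accept):
    """Likelihood-free rejection: keep theta whenever a fresh
    simulation reproduces the observed data exactly (D' = D)."""
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.uniform()                  # draw theta from the prior
        sim = rng.binomial(n_trials, theta)    # simulate D' ~ P(. | theta)
        if sim == data:                        # accept iff D' = D
            accepted.append(theta)
    return np.array(accepted)

post = mechanical_rejection(data=3, n_trials=10, n_accept=2000)
# With a U(0,1) prior the exact posterior is Beta(4, 8), mean 1/3
print(post.mean())
```

Here P(D) = 1/11 under the prior, so roughly 11 simulations are needed per accepted draw, matching the negative binomial count above.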
Approximate Bayesian Computation I

If P(D) is small, we will rarely accept any θ. Instead, there is an approximate version:

Approximate Rejection Algorithm
- Draw θ from π(θ)
- Simulate D' ~ P(· | θ)
- Accept θ if ρ(D, D') ≤ ε

This generates observations from π(θ | ρ(D, D') ≤ ε):
- As ε → ∞, we get observations from the prior, π(θ).
- If ε = 0, we generate observations from π(θ | D).

ε reflects the tension between computability and accuracy.
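The effect of ε can be seen in a small sketch. The model below (a Gaussian mean with an N(0, 5²) prior and ρ taken as the absolute difference of sample means) is an assumed toy example, not one from the talk; a tight tolerance gives a concentrated posterior, a loose one reverts towards the prior:

```python
import numpy as np

rng = np.random.default_rng(1)

def abc_rejection(d_obs, eps, n_samples, sigma=1.0, n=20):
    """Approximate rejection: accept theta when the simulated sample
    mean lands within eps of the observed one, rho(D, D') = |D - D'|."""
    accepted = []
    while len(accepted) < n_samples:
        theta = rng.normal(0.0, 5.0)                 # theta ~ prior N(0, 5^2)
        d_sim = rng.normal(theta, sigma, n).mean()   # simulate D' ~ P(. | theta)
        if abs(d_sim - d_obs) <= eps:                # rho(D, D') <= eps
            accepted.append(theta)
    return np.array(accepted)

tight = abc_rejection(d_obs=2.0, eps=0.05, n_samples=500)
loose = abc_rejection(d_obs=2.0, eps=5.0, n_samples=500)
print(tight.std(), loose.std())  # looser tolerance -> wider, more prior-like
```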
Approximate Bayesian Computation II

If the data are too high-dimensional, we never observe simulations that are close to the field data. Reduce the dimension using summary statistics, S(D).

Approximate Rejection Algorithm with Summaries
- Draw θ from π(θ)
- Simulate D' ~ P(· | θ)
- Accept θ if ρ(S(D), S(D')) < ε

If S is sufficient, this is equivalent to the previous algorithm.
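A sketch of the summarised version, under an assumed setup: 50-dimensional Gaussian data reduced to the two summaries (mean, s.d.), with a Euclidean distance between summary vectors. The summaries and prior here are illustrative choices, not the ones used in the talk:

```python
import numpy as np

rng = np.random.default_rng(2)

def summaries(x):
    # reduce a raw data vector to two low-dimensional summaries
    return np.array([x.mean(), x.std()])

def abc_with_summaries(x_obs, eps, n_samples):
    """Accept theta when the simulated summaries S(D') are within
    eps of the observed summaries S(D) in Euclidean distance."""
    s_obs = summaries(x_obs)
    accepted = []
    while len(accepted) < n_samples:
        mu = rng.uniform(-5, 5)                     # prior on the mean
        x_sim = rng.normal(mu, 1.0, x_obs.size)     # simulate a full data set
        if np.linalg.norm(summaries(x_sim) - s_obs) < eps:
            accepted.append(mu)
    return np.array(accepted)

x_obs = rng.normal(1.5, 1.0, 50)
post = abc_with_summaries(x_obs, eps=0.2, n_samples=300)
print(post.mean())
```

Comparing raw 50-dimensional data directly would make acceptances vanishingly rare; matching two summaries keeps the acceptance rate workable.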
Error Structure

Example (Gaussian Distribution)
Suppose Xᵢ ~ N(µ, σ²), with σ² known, and give µ an improper flat prior distribution, π(µ) ∝ 1 for µ ∈ ℝ. Suppose we observe data with sample mean x̄ = 0.
- Pick µ uniformly from a wide interval
- Simulate Xᵢ ~ N(µ, σ²)
- Accept µ if |x̄'| ≤ ε, where x̄' is the simulated sample mean.

Then

    π(µ | |x̄| ≤ ε) = [ Φ((ε − µ)/√(σ²/n)) − Φ((−ε − µ)/√(σ²/n)) ] / (2ε)

and

    Var(µ | |x̄| ≤ ε) = Var(µ | x̄ = 0) + ε²/3.

[Figure: accepted-µ densities for four decreasing values of ε, narrowing towards the exact posterior.]
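The variance identity above can be checked by simulation. The sketch below approximates the flat prior by a wide uniform (the interval here is an arbitrary choice) and verifies that the accepted-µ variance is Var(µ | x̄ = 0) + ε²/3 = σ²/n + ε²/3:

```python
import numpy as np

rng = np.random.default_rng(3)

def gaussian_abc(eps, sigma=1.0, n=10, n_draws=1_000_000, half_width=20.0):
    """ABC for a Gaussian mean: flat prior approximated by a wide
    uniform; accept mu when the simulated sample mean is within eps
    of the observed value xbar = 0."""
    mu = rng.uniform(-half_width, half_width, n_draws)   # mu ~ (approx) flat prior
    xbar_sim = rng.normal(mu, sigma / np.sqrt(n))        # simulated sample mean
    return mu[np.abs(xbar_sim) <= eps]                   # keep mu with |xbar'| <= eps

eps = 0.5
post = gaussian_abc(eps)
exact = 1.0 / 10 + eps**2 / 3        # sigma^2/n + eps^2/3
print(post.var(), exact)
```

The identity holds because, under the flat prior, the accepted x̄' is uniform on [−ε, ε] (variance ε²/3), and µ given x̄' adds an independent N(0, σ²/n) term.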
Approximate MCMC

Rejection sampling is inefficient, as θ is repeatedly sampled from its prior distribution. The idea behind MCMC is that by correlating observations, more time is spent in regions of high likelihood.

Approximate Metropolis-Hastings Algorithm
Suppose we are currently at θ.
- Propose θ' from density q(θ, θ').
- Simulate D' from P(· | θ').
- If ρ(D, D') ≤ ε, calculate

      h(θ, θ') = min(1, [π(θ') q(θ', θ)] / [π(θ) q(θ, θ')]).

- Accept the move to θ' with probability h(θ, θ'), else stay at θ.

Adaptive tolerance choices: Sisson et al. and Robert et al. proposed approximate sequential importance sampling algorithms.
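A minimal sketch of the ABC Metropolis-Hastings step, reusing the toy Gaussian-mean model (an assumption, not the talk's example). With a symmetric random-walk proposal, q(θ', θ)/q(θ, θ') = 1, so h reduces to the prior ratio; the chain is initialised near the data, a pragmatic choice to shorten burn-in:

```python
import numpy as np

rng = np.random.default_rng(4)

def abc_mcmc(d_obs, eps, n_iter, sigma=1.0, n=20):
    """ABC-MCMC: simulate at the proposal, and apply the M-H test
    only when the simulation lands within eps of the data."""
    def log_prior(t):
        return -0.5 * (t / 5.0) ** 2          # N(0, 5^2) prior, up to a constant
    theta = d_obs                             # initialise near the data
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + rng.normal(0.0, 0.5)           # theta' ~ q(theta, .)
        d_sim = rng.normal(prop, sigma, n).mean()     # D' ~ P(. | theta')
        if abs(d_sim - d_obs) <= eps:                 # rho(D, D') <= eps
            if np.log(rng.uniform()) < log_prior(prop) - log_prior(theta):
                theta = prop                          # accept with probability h
        chain[i] = theta
    return chain

chain = abc_mcmc(d_obs=2.0, eps=0.3, n_iter=20000)
print(chain[5000:].mean())
```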
ABC-within-MCMC

Problem: a low acceptance rate leads to slow convergence. Suppose θ = (θ1, θ2), with π(θ1 | D, θ2) known but π(θ2 | D, θ1) unknown - often the case for models with a hidden tree structure generating highly dependent data. We can combine Gibbs update steps (or any M-H update) with ABC.

ABC-within-Gibbs Algorithm
Suppose we are at θ^t = (θ1^t, θ2^t).
1. Draw θ1^{t+1} ~ π(θ1 | D, θ2^t).
2. Draw θ2' ~ π(θ2), simulate D' ~ P(· | θ1^{t+1}, θ2'). If ρ(D, D') < ε, set θ2^{t+1} = θ2'; else return to step 2.
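The two-step scheme can be sketched on an assumed toy model (Gaussian data with unknown mean θ1 and s.d. θ2): the mean has a tractable Gaussian conditional, while the s.d. is updated by ABC rejection against the observed sample spread. The priors and summaries here are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)

def abc_within_gibbs(x_obs, eps, n_iter):
    """Gibbs sampler: exact conditional update for the mean theta1,
    ABC rejection step for the s.d. theta2 (theta2 ~ U(0.1, 5) prior)."""
    n = x_obs.size
    s_obs = x_obs.std()
    theta1, theta2 = x_obs.mean(), 1.0
    trace = []
    for _ in range(n_iter):
        # step 1: exact draw, theta1 | D, theta2 ~ N(xbar, theta2^2 / n)
        theta1 = rng.normal(x_obs.mean(), theta2 / np.sqrt(n))
        # step 2: ABC - draw theta2' from its prior, simulate a data set,
        # keep theta2' only when the simulated spread matches the data
        while True:
            prop = rng.uniform(0.1, 5.0)
            x_sim = rng.normal(theta1, prop, n)
            if abs(x_sim.std() - s_obs) < eps:
                theta2 = prop
                break
        trace.append((theta1, theta2))
    return np.array(trace)

x_obs = rng.normal(0.0, 2.0, 40)
trace = abc_within_gibbs(x_obs, eps=0.1, n_iter=500)
print(trace[:, 1].mean())
```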
Example from Population Biology

Inferring ancestral divergence times.

[Figure: simulated tree of species divergences, with time t on the horizontal axis.]
Choosing Summary Statistics and Metrics

We need:
- summaries S(D) which are sensitive to changes in θ, but robust to random variations in D;
- a definition of approximate sufficiency (LeCam 1963): e.g., the distance between π(θ | D) and π(θ | S(D));
- a systematic, implementable approach for finding good summary statistics.

Complex dependence structures can be accounted for.

[Figure: scatter plot of two candidate summaries, D1 against D2.]
ABC Approach

The data can be thought of in two parts:
- the observed number of fossils Dᵢ found in the ith interval,
- the total number of fossils found, D₊.

D' denotes simulated data. A suitable metric might be

    ρ(D, D') = Σᵢ₌₁ᵏ |Dᵢ/D₊ − D'ᵢ/D'₊| + |D₊ − D'₊| / D₊.

Note: no data summaries here.
Not Going So Well

[Figure: trace of simulated extant population size against iteration number.]
Tweak the Metric

The simulated N values are too small (there are 376 modern species). It is easy to combine different types of information with ABC - change the metric:

    ρ(D, D') = Σᵢ₌₁ᵏ |Dᵢ/D₊ − D'ᵢ/D'₊| + |D₊ − D'₊| / D₊ + |N − N'| / N.

This gives approximate samples from

    π(θ | D, N = 376) ∝ P(D, N = 376 | θ) π(θ).
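A small sketch of this kind of combined metric: per-interval fossil proportions, the total count, and the extant species count N each contribute one term. The interval counts below are made-up inputs for illustration:

```python
import numpy as np

def rho(D, D_sim, N, N_sim):
    """Distance combining per-interval fossil proportions, the total
    fossil count D+, and the number of extant species N."""
    D, D_sim = np.asarray(D, float), np.asarray(D_sim, float)
    Dp, Dp_sim = D.sum(), D_sim.sum()
    return (np.abs(D / Dp - D_sim / Dp_sim).sum()   # proportion mismatch
            + abs(Dp - Dp_sim) / Dp                 # total-count mismatch
            + abs(N - N_sim) / N)                   # extant-species mismatch

print(rho([5, 3, 2], [5, 3, 2], 376, 376))   # identical data -> 0.0
```

Because each term is a relative error, the three sources of information enter on comparable scales without further tuning.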
Results

[Figure: posterior density of the divergence time (My).]
Extensions

Model selection: the ratio of acceptance rates under two models,

    π_{M1}(ρ(S, S') ≤ ε) / π_{M2}(ρ(S, S') ≤ ε) ≈ Bayes factor,

so relative acceptance rates give posterior model probabilities. This is hopeless in practice, as it is too sensitive to the tolerance ε. Raftery and Lewis (1992) and Chib (1995) give computational schemes to calculate Bayes factors; neither works here.

Expensive simulators: emulate the stochastic model with a Gaussian process emulator (Richard Boys, Darren Wilkinson et al.).
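The acceptance-rate idea can be illustrated on an assumed discrete toy problem, where ε = 0 makes the ratio an unbiased Bayes-factor estimate (with continuous summaries the estimate would depend on ε, which is the sensitivity complained about above). The two Binomial models are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)

def acceptance_rate(simulate, d_obs, eps, n_sims=200_000):
    """Fraction of prior simulations from one model landing within
    eps of the observed summary."""
    sims = simulate(n_sims)
    return np.mean(np.abs(sims - d_obs) <= eps)

d_obs = 7  # observed successes out of 10 trials

# M1: 10 trials, theta ~ U(0, 1);  M2: 10 trials, theta fixed at 0.5
r1 = acceptance_rate(lambda m: rng.binomial(10, rng.uniform(size=m)), d_obs, 0)
r2 = acceptance_rate(lambda m: rng.binomial(10, 0.5, size=m), d_obs, 0)
print(r1 / r2)   # estimated Bayes factor of M1 over M2
```

Here the true marginal likelihoods are 1/11 under M1 and C(10,7)/2¹⁰ under M2, so the exact Bayes factor is about 0.78.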
Pros and Cons of ABC

Pros:
- Likelihood is not needed
- Easy to code
- Easy to adapt
- Generates independent observations (parallel computation)

Cons:
- Hard to anticipate the effect of summary statistics (needs intuition)
- Over-dispersion of the posterior due to ρ(D, D') < ε
- For complex problems, sampling from the prior does not make good use of the observations

Issues:
- One run or many?
- How to choose good summary statistics?
- How good an approximation do we get?
References

- M. A. Beaumont, W. Zhang and D. J. Balding, Approximate Bayesian Computation in Population Genetics, Genetics, 2002.
- P. Marjoram, J. Molitor, V. Plagnol and S. Tavaré, Markov chain Monte Carlo without likelihoods, PNAS, 2003.
- S. A. Sisson, Y. Fan and M. M. Tanaka, Sequential Monte Carlo without likelihoods, PNAS, 2007.
- C. P. Robert, M. A. Beaumont, J. Marin and J. Cornuet, Adaptivity for ABC algorithms: the ABC-PMC scheme, arXiv, 2008.