Approximate Bayesian Computation: a simulation-based approach to inference

Richard Wilkinson¹ and Simon Tavaré²

¹ Department of Probability and Statistics, University of Sheffield
² Department of Applied Mathematics and Theoretical Physics, University of Cambridge

Workshop on Approximate Inference in Stochastic Processes and Dynamical Systems (PASCAL, 2008)

Stochastic Computation: Implicit Statistical Models

There are two types of statistical model:
- Prescribed models: the likelihood function is specified.
- Implicit models: a mechanism to simulate observations is specified.

Implicit models give scientists more freedom to accurately model the phenomenon under consideration, and the increase in computer power has made their use more practicable. They are popular in many disciplines.

Fitting to data

Most models are forwards models: specify the parameters θ and initial conditions, and the model generates output D. Usually we are interested in the inverse problem: observe data, then estimate the parameter values. The same task goes by different names in different fields:
- Calibration
- Data assimilation
- Parameter estimation
- Inverse problem
- Bayesian inference

Monte Carlo Inference

We aim to sample from the posterior distribution

    π(θ | D) ∝ π(θ) P(D | θ)    (posterior ∝ prior × likelihood).

Monte Carlo methods enable Bayesian inference to be done in more complex models. However, MCMC can be difficult or impossible in many stochastic models, e.g.,
- if P(D | θ) is unknown, which is true for many stochastic models, or
- where there are convergence or mixing problems, often caused by highly dependent data arising from an underlying tree or graphical structure.

Such models arise in population genetics, epidemiology, and evolutionary biology.

Likelihood-Free Inference

Rejection Algorithm
- Draw θ from the prior π(·).
- Accept θ with probability P(D | θ).

Accepted θ are independent draws from the posterior distribution π(θ | D). If the likelihood P(D | θ) is unknown, we can instead use:

Mechanical Rejection Algorithm
- Draw θ from π(·).
- Simulate D′ ∼ P(· | θ).
- Accept θ if D′ = D.

The acceptance rate is P(D): the number of runs needed to obtain n acceptances is negative binomial, with mean n / P(D).
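
To make the mechanical rejection algorithm concrete, here is a minimal Python sketch (not from the slides) for a toy model where exact matching D′ = D is feasible: a binomial observation with a uniform prior on the success probability. The data values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, D = 10, 7        # toy data: 7 successes observed in n = 10 trials

accepted = []
while len(accepted) < 1000:
    theta = rng.uniform()              # draw theta from the prior U(0, 1)
    D_prime = rng.binomial(n, theta)   # simulate D' ~ P(. | theta)
    if D_prime == D:                   # accept only on an exact match
        accepted.append(theta)

# Accepted thetas are exact posterior draws; here the posterior is Beta(8, 4),
# so the sample mean should be close to 8/12 = 0.667.
print(np.mean(accepted))
```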

Approximate Bayesian Computation I

If P(D) is small, we will rarely accept any θ. Instead, there is an approximate version:

Approximate Rejection Algorithm
- Draw θ from π(θ).
- Simulate D′ ∼ P(· | θ).
- Accept θ if ρ(D, D′) ≤ ε.

This generates observations from π(θ | ρ(D, D′) ≤ ε):
- As ε → ∞, we get observations from the prior, π(θ).
- If ε = 0, we generate observations from π(θ | D).

The tolerance ε reflects the tension between computability and accuracy.
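
The approximate rejection algorithm translates directly into a short reusable function. A sketch, with prior_draw, simulate, and rho as user-supplied callables (hypothetical names, not from the slides):

```python
import numpy as np

def abc_reject(D, prior_draw, simulate, rho, eps, n_accept, rng):
    """Approximate rejection ABC: return n_accept draws from
    pi(theta | rho(D, D') <= eps)."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_draw(rng)           # theta ~ pi(.)
        D_prime = simulate(theta, rng)    # D' ~ P(. | theta)
        if rho(D, D_prime) <= eps:        # keep theta if D' is close to D
            accepted.append(theta)
    return np.asarray(accepted)
```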

Approximate Bayesian Computation II

If the data are too high-dimensional, we never observe simulations that are close to the field data. We can reduce the dimension using summary statistics S(D).

Approximate Rejection Algorithm with Summaries
- Draw θ from π(θ).
- Simulate D′ ∼ P(· | θ).
- Accept θ if ρ(S(D), S(D′)) ≤ ε.

If S is sufficient, this is equivalent to the previous algorithm.
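
Only the distance changes: compute ρ on summaries rather than on the raw data. A sketch using the sample mean and standard deviation as illustrative (assumed) summaries of a long i.i.d. sample; rho_summaries can be passed to abc_reject above in place of a raw-data distance:

```python
import numpy as np

def S(data):
    """Illustrative low-dimensional summaries of a high-dimensional sample."""
    return np.array([np.mean(data), np.std(data)])

def rho_summaries(D, D_prime):
    """Distance on summary vectors: plays the role of rho(S(D), S(D'))."""
    return np.max(np.abs(S(D) - S(D_prime)))
```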

Error Structure Example (Gaussian Distribution)

Suppose X_i ∼ N(µ, σ²), with σ² known, and give µ an improper flat prior distribution, π(µ) = 1 for µ ∈ ℝ. Suppose we observe data with x̄ = 0.

- Pick µ from a wide uniform distribution.
- Simulate X′_i ∼ N(µ, σ²).
- Accept µ if |x̄′| ≤ ε.

Then

    π(µ | |x̄| ≤ ε) = [Φ((ε − µ)/√(σ²/n)) − Φ((−ε − µ)/√(σ²/n))] / (2ε)

and

    Var(µ | |x̄| ≤ ε) = Var(µ | x̄ = 0) + ε²/3.

[Figure: four density plots of the ABC posterior for µ at increasing tolerances (including ε = 0.5 and ε = 5), widening as ε grows.]
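
We can check the variance formula numerically. A sketch, with n, σ, the prior range, and the tolerance grid all chosen for illustration (the slide's exact values are not fully legible in the transcript); it simulates only the sample mean, which is sufficient here, and compares the empirical ABC posterior variance with σ²/n + ε²/3:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 10, 1.0                # assumed sample size and known sigma
n_draws = 200_000

for eps in [0.1, 0.5, 1.0, 5.0]:                 # illustrative tolerances
    mu = rng.uniform(-10, 10, n_draws)           # wide, effectively flat prior
    xbar = rng.normal(mu, sigma / np.sqrt(n))    # simulate the sample mean directly
    post = mu[np.abs(xbar) <= eps]               # accept if |xbar'| <= eps (observed xbar = 0)
    print(f"eps = {eps}: empirical var = {post.var():.3f}, "
          f"predicted = {sigma**2 / n + eps**2 / 3:.3f}")
```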

Approximate MCMC

Rejection sampling is inefficient, as θ is repeatedly sampled from its prior distribution. The idea behind MCMC is that, by correlating observations, more time is spent in regions of high likelihood.

Approximate Metropolis-Hastings Algorithm
- Suppose we are currently at θ. Propose θ′ from a density q(θ, θ′).
- Simulate D′ from P(· | θ′).
- If ρ(D, D′) ≤ ε, calculate

      h(θ, θ′) = min(1, [π(θ′) q(θ′, θ)] / [π(θ) q(θ, θ′)]).

- Accept the move to θ′ with probability h(θ, θ′); otherwise stay at θ.

Adaptive tolerance choices are also possible: Sisson et al. (2007) and Robert et al. (2008) proposed approximate sequential importance sampling algorithms.
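
A sketch of this algorithm in Python (in the spirit of Marjoram et al., 2003), for a scalar parameter with a symmetric Gaussian random-walk proposal so that q cancels from the acceptance ratio; simulate, rho, and log_prior are assumed user-supplied callables:

```python
import numpy as np

def abc_mcmc(D, simulate, rho, log_prior, eps, theta0, prop_sd, n_iter, rng):
    """Likelihood-free MCMC: a hit within the tolerance ball replaces
    the likelihood ratio in the Metropolis-Hastings acceptance step."""
    theta = theta0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        theta_prop = theta + prop_sd * rng.standard_normal()  # symmetric proposal
        D_prime = simulate(theta_prop, rng)
        # Move only if the simulation lands within eps of the data AND
        # the prior ratio passes the usual MH test.
        if rho(D, D_prime) <= eps:
            if np.log(rng.uniform()) < log_prior(theta_prop) - log_prior(theta):
                theta = theta_prop
        chain[t] = theta
    return chain
```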

ABC-within-MCMC

Problem: a low acceptance rate leads to slow convergence. Suppose θ = (θ1, θ2) with π(θ1 | D, θ2) known but π(θ2 | D, θ1) unknown; this is often the case for models with a hidden tree structure generating highly dependent data. We can combine Gibbs update steps (or any Metropolis-Hastings update) with ABC.

ABC-within-Gibbs Algorithm
Suppose we are at θ^t = (θ1^t, θ2^t).
1. Draw θ1^{t+1} ∼ π(θ1 | D, θ2^t).
2. Draw θ2′ ∼ π_{θ2}(·). Simulate D′ ∼ P(· | θ1^{t+1}, θ2′). If ρ(D, D′) ≤ ε, set θ2^{t+1} = θ2′; otherwise return to step 2.
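
A sketch of the ABC-within-Gibbs scheme, assuming an exact sampler draw_theta1 for the known full conditional π(θ1 | D, θ2) and a prior sampler for θ2 (all names are illustrative); a cap on retries guards against ε being set too tight:

```python
def abc_within_gibbs(D, draw_theta1, draw_theta2_prior, simulate, rho,
                     eps, theta1, theta2, n_iter, rng, max_tries=100_000):
    chain = []
    for _ in range(n_iter):
        # Step 1: exact Gibbs update from the known full conditional.
        theta1 = draw_theta1(D, theta2, rng)
        # Step 2: ABC update of theta2 -- redraw from its prior until the
        # simulated data land within the tolerance.
        for _ in range(max_tries):
            theta2_prop = draw_theta2_prior(rng)
            if rho(D, simulate(theta1, theta2_prop, rng)) <= eps:
                theta2 = theta2_prop
                break
        chain.append((theta1, theta2))
    return chain
```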

Example From Population Biology

Inferring ancestral divergence times. [Figure: tree structure through time; axis labeled "Time t".]

Choosing summary statistics and metrics

We need:
- summaries S(D) that are sensitive to changes in θ but robust to random variations in D;
- a definition of approximate sufficiency (Le Cam, 1963): perhaps the distance between π(θ | D) and π(θ | S(D))?
- a systematic, implementable approach for finding good summary statistics.

Complex dependence structures can be accounted for. [Figure: joint density of two candidate summaries.]

ABC Approach

The data can be thought of in two parts:
- the observed number of fossils D_i found in the ith interval, and
- the total number of fossils found, D_+.

Let D′ denote simulated data. A suitable metric might be

    ρ(D, D′) = Σ_{i=1}^{k} |D_i/D_+ − D′_i/D′_+| + |D_+ − D′_+| / D_+.

Note that no data summaries are used here.

Not going so well

[Figure: trace plot of extant population size against iteration number.]

Tweak the metric

The simulated N values are too small (there are 376 modern species). It is easy to combine different types of information with ABC: change the metric to

    ρ(D, D′) = Σ_{i=1}^{k} |D_i/D_+ − D′_i/D′_+| + |D_+ − D′_+| / D_+ + |N − N′| / N.

This gives approximate samples from π(θ | D, N = 376) ∝ P(D, N = 376 | θ) π(θ).
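
The tweaked metric translates directly into code. A sketch, assuming the fossil counts D and D′ are numpy arrays of per-interval counts and N, N′ are the observed and simulated numbers of modern species:

```python
import numpy as np

def rho(D, D_prime, N, N_prime):
    """Distance combining interval proportions, the total fossil count,
    and the number of modern species, as in the tweaked metric above."""
    D_plus, Dp_plus = D.sum(), D_prime.sum()
    proportions = np.abs(D / D_plus - D_prime / Dp_plus).sum()
    total = abs(D_plus - Dp_plus) / D_plus
    modern = abs(N - N_prime) / N
    return proportions + total + modern
```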

Results

[Figure: posterior density of the divergence time (My).]

Extensions

Model selection: the ratio of ABC acceptance rates under two models, M1 and M2, estimates the Bayes factor, and relative acceptance rates give posterior model probabilities. This is hopeless in practice, as it is too sensitive to the tolerance ε. Raftery and Lewis (1992) and Chib (1995) give computational schemes to calculate Bayes factors; neither works here.

Expensive simulators: emulate the stochastic model with a Gaussian process emulator (Richard Boys, Darren Wilkinson et al.).
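
For what it is worth given the sensitivity to ε noted above, model choice by relative acceptance rates is easy to sketch; simulators and priors are assumed lists of callables, one per model, with a uniform prior over models:

```python
import numpy as np

def abc_model_choice(D, simulators, priors, rho, eps, n_sims, rng):
    """Estimate posterior model probabilities by relative ABC acceptance rates."""
    counts = np.zeros(len(simulators))
    for _ in range(n_sims):
        m = rng.integers(len(simulators))    # model index from a uniform prior
        theta = priors[m](rng)               # parameters from model m's prior
        if rho(D, simulators[m](theta, rng)) <= eps:
            counts[m] += 1
    return counts / counts.sum()             # assumes at least one acceptance
```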

Pros and cons of ABC

Pros:
- The likelihood is not needed.
- Easy to code; easy to adapt.
- Generates independent observations (allowing parallel computation).

Cons:
- Hard to anticipate the effect of the summary statistics (needs intuition).
- Over-dispersion of the posterior due to the ρ(D, D′) ≤ ε acceptance.
- For complex problems, sampling from the prior does not make good use of the observations.

Issues:
- One run or many?
- How to choose good summary statistics?
- How good an approximation do we get?

References

M. A. Beaumont, W. Zhang and D. J. Balding, Approximate Bayesian Computation in Population Genetics, Genetics, 2002.
P. Marjoram, J. Molitor, V. Plagnol and S. Tavaré, Markov chain Monte Carlo without likelihoods, PNAS, 2003.
S. A. Sisson, Y. Fan and M. M. Tanaka, Sequential Monte Carlo without likelihoods, PNAS, 2007.
C. P. Robert, M. A. Beaumont, J.-M. Marin and J.-M. Cornuet, Adaptivity for ABC algorithms: the ABC-PMC scheme, arXiv, 2008.