Approximate Bayesian Computation
1 Approximate Bayesian Computation Michael Gutmann University of Helsinki and Aalto University 1st December 2015
2 Content Two parts: 1. The basics of approximate Bayesian computation (ABC) 2. ABC methods used in practice What is ABC? A set of methods for approximate Bayesian inference which can be used whenever sampling from the model is possible. Michael Gutmann ABC 2 / 47
3 Part I Basic ABC
4 Recap of Bayesian inference
The ingredients for Bayesian parameter inference:
Observed data y^o ∈ Y ⊆ ℝ^n.
A statistical model for the data generating process, p(y|θ), parametrized by θ ∈ Θ ⊆ ℝ^d.
A prior probability density function (pdf) for the parameters, p_θ(θ).
The mechanics of Bayesian inference:
p_{θ|y}(θ | y^o) ∝ p_{y|θ}(y^o | θ) p_θ(θ), i.e. posterior ∝ likelihood function × prior.
Often written without subscripts ("function overloading"): p(θ | y^o) ∝ p(y^o | θ) p(θ).
5 Likelihood function
Likelihood function: L(θ) = p(y^o | θ).
For discrete random variables: L(θ) = p(y^o | θ) = Pr(y = y^o | θ), the probability that data generated from the model, when using parameter value θ, are equal to y^o.
For continuous random variables: L(θ) = p(y^o | θ) = lim_{ε→0} Pr(y ∈ B_ε(y^o) | θ) / Vol(B_ε(y^o)), proportional to the probability that the generated data fall in a small ball B_ε(y^o) around y^o.
L(θ) indicates to which extent different values of the model parameters are consistent with the observed data.
6 Example
Prior: p(θ) ∝ exp(−θ²/4) (a zero-mean Gaussian). Data model: p(y | θ) = (1/√(2π)) exp(−(y − θ)²/2). Observed data: y^o = 2.
[Figure: prior pdf, likelihood function, and posterior pdf with their means; the posterior mean lies between the prior mean and y^o.]
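The posterior in this conjugate Gaussian example can be computed in closed form. Below is a minimal sketch, assuming (as the exp(−θ²/4) prior suggests) a zero-mean prior with variance 2; the variable names are illustrative, not from the slides.

```python
# Conjugate Gaussian example from the slide. Assumption: the prior is
# N(0, 2), i.e. p(theta) proportional to exp(-theta^2 / 4); the data
# model is N(theta, 1) and a single observation y_o = 2 is made.
prior_var = 2.0   # assumed prior variance
lik_var = 1.0
y_o = 2.0

# For a Gaussian observation with a Gaussian prior, posterior precision
# is the sum of the precisions, and the posterior mean is the
# precision-weighted combination of prior mean (here 0) and data.
post_var = 1.0 / (1.0 / prior_var + 1.0 / lik_var)
post_mean = post_var * (0.0 / prior_var + y_o / lik_var)

# The posterior mean lies between the prior mean (0) and y_o = 2.
print(post_mean, post_var)
```

This reproduces the qualitative picture on the slide: the posterior is pulled from the prior mean towards the observation.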
7 Different kinds of statistical models
The statistical model was defined via the family of pdfs p(y|θ). Statistical models can be specified in other ways as well. In this lecture: models which are specified via a mechanism (rule) for generating data.
Example: Instead of p(y | θ) = (1/√(2π)) exp(−(y − θ)²/2), we could have specified the model via
y = z + θ, z = √(−2 log(ω)) cos(2πν),
where ω and ν are independent random variables uniformly distributed on (0, 1). Advantage?
8 Simulator-based models
Sampling from the model is straightforward. For example:
1. Sample ω_i and ν_i from the uniform random variables ω and ν;
2. compute the nonlinear transformation y_i = f(ω_i, ν_i, θ) = θ + √(−2 log(ω_i)) cos(2πν_i).
This produces samples y_i ~ p(y|θ). It enables direct modeling of how the data are generated.
Names for models specified via a data generating mechanism: generative models, implicit models, stochastic simulation models, simulator-based models.
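The two-step sampling recipe above can be sketched directly in code. This is a stand-alone illustration of the Box-Muller-based simulator (names and sample sizes are my own, not from the lecture).

```python
import math
import random

def simulate(theta, rng):
    # Box-Muller transform: omega, nu ~ U(0, 1) yield z ~ N(0, 1)
    omega, nu = rng.random(), rng.random()
    z = math.sqrt(-2.0 * math.log(omega)) * math.cos(2.0 * math.pi * nu)
    return theta + z   # y = z + theta, so y ~ N(theta, 1)

rng = random.Random(0)
ys = [simulate(3.0, rng) for _ in range(200_000)]
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
# Sample mean should be close to theta = 3 and sample variance close to 1.
```

Note that the code only ever draws uniform random numbers and transforms them, which is exactly what makes sampling from simulator-based models easy even when p(y|θ) is not written down.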
9 Examples
Simulator-based models are used in:
- Astrophysics: simulating the formation of galaxies, stars, or planets
- Evolutionary biology: simulating the evolution of life
- Health science: simulating the spread of an infectious disease
- ...
[Figure: dark matter density simulated by the Illustris collaboration.]
10 Examples (evolutionary biology)
Simulation of different hypothesized evolutionary scenarios: the interaction between early modern humans (Homo sapiens) and their Neanderthal contemporaries in Europe, and the immigration of modern humans into Europe from the Near East.
[Figure: light gray: Neanderthal population; dark: Homo sapiens; the numbers in the figures indicate generations. From Currat and Excoffier, PLoS Biology, 2004.]
See also Pinhasi et al, The genetic history of Europeans, Trends in Genetics, 2012.
11 Examples (health science)
Simulation of bacterial transmission dynamics in child day care centers (Numminen et al, Biometrics, 2013).
[Figure: strain-by-individual colonization patterns over time for several day care centers.]
12 Formal definition of a simulator-based model
Let (Ω, F, P) be a probability space. A simulator-based model is a collection of (measurable) functions g(·, θ) parametrized by θ:
ω ∈ Ω ↦ y = g(ω, θ) ∈ Y.
For any fixed θ, y_θ = g(·, θ) is a random variable; running the simulator corresponds to sampling from it.
13 Advantages of simulator-based models
Direct implementation of hypotheses of how the observed data were generated. Neat interface with physical or biological models of data. Modeling by replicating the mechanisms of nature which produced the observed/measured data ("analysis by synthesis"). Possibility to perform experiments in silico.
14 Disadvantages of simulator-based models
Generally elude analytical treatment. Can be easily made more complicated than necessary. Statistical inference is difficult... but possible! This lecture is about inference for simulator-based models.
15 Family of pdfs induced by the simulator
For any fixed θ, the output of the simulator y_θ = g(·, θ) is a random variable. Generally, it is impossible to write down the pdf of y_θ analytically in closed form: no closed-form formulae are available for p(y|θ). The simulator defines the model pdfs p(y|θ) implicitly.
16 Implicit definition of the model pdfs
[Figure: illustration of the event that the simulator output falls in a set A; the probability of such events defines p(y|θ) implicitly.]
17 Implicit definition of the likelihood function
The implicit definition of the model pdfs implies an implicit definition of the likelihood function. For discrete random variables: L(θ) = Pr(y = y^o | θ), the probability that the simulator output equals the observed data.
18 Implicit definition of the likelihood function
For continuous random variables: L(θ) = lim_{ε→0} L_ε(θ), where L_ε(θ) = Pr(y ∈ B_ε(y^o) | θ) / Vol(B_ε(y^o)).
19 Implicit definition of the likelihood function
To compute the likelihood function, we need to compute the probability that the simulator generates data close to y^o: Pr(y = y^o | θ) or Pr(y ∈ B_ε(y^o) | θ). No analytical expression is available. But we can empirically test whether simulated data equal y^o or lie in B_ε(y^o). This property will be exploited to perform inference for simulator-based models.
20 Exact inference for discrete random variables
For discrete random variables, we can perform exact Bayesian inference without knowing the likelihood function. Idea: the posterior is obtained by conditioning p(θ, y) on the event y = y^o:
p(θ | y^o) = p(θ, y^o) / p(y^o).
Given tuples (θ_i, y_i), where θ_i ~ p_θ (iid from the prior) and y_i = g(ω_i, θ_i) (obtained by running the simulator), retain only those where y_i = y^o. The θ_i from the retained tuples are samples from the posterior p(θ | y^o).
21 Example
Posterior inference of the success probability θ in a Bernoulli trial.
Data: y^o = 1. Prior: p_θ = 1 on (0, 1). Data generating process: given θ_i ~ p_θ, draw ω_i ~ U(0, 1) and set y_i = 1 if ω_i < θ_i, and y_i = 0 otherwise.
Retain those θ_i for which y_i = y^o.
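This Bernoulli example can be run directly. The sketch below (illustrative names and sample size) retains exactly those θ_i whose simulated outcome matches y^o = 1; with a uniform prior and y^o = 1, the true posterior is Beta(2, 1), whose mean is 2/3.

```python
import random

rng = random.Random(1)
y_o = 1
N = 200_000

accepted = []
for _ in range(N):
    theta = rng.random()                   # theta_i ~ U(0, 1), the prior
    y = 1 if rng.random() < theta else 0   # run the Bernoulli simulator
    if y == y_o:                           # condition on y = y_o
        accepted.append(theta)

post_mean = sum(accepted) / len(accepted)
# True posterior p(theta | y_o = 1) is proportional to theta on (0, 1),
# i.e. Beta(2, 1) with mean 2/3; post_mean should be close to that.
```

About half of the N tuples are accepted here (Pr(y = 1) = E[θ] = 1/2), which already hints at the efficiency problem discussed on the next slides.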
22 Example
The method produces samples from the posterior. There is a Monte Carlo error when summarizing the samples as an empirical distribution or computing expectations via sample averages.
[Figure: histograms of the retained θ_i for increasing numbers N of simulated tuples (θ_i, y_i), up to N = 100,000, together with the estimated posterior pdf, the true posterior pdf, and the prior pdf; the estimate improves as N grows.]
23 The good and the bad
The method produces samples from p(θ | y^o). This is good. But it is only applicable to discrete random variables, and even then it is computationally not feasible in higher dimensions. Reason: the probability of the event y_θ = y^o becomes smaller and smaller as the dimension of the data increases. Out of N simulated tuples, only a small fraction will be accepted, and the small number of accepted samples do not represent the posterior well: large Monte Carlo errors. This is bad.
24 Approximations to make inference feasible
Settle for approximate yet computationally feasible inference. Introduce two types of approximations:
1. Instead of working with the whole data, work with lower-dimensional summary statistics t_θ and t^o: t_θ = T(y_θ), t^o = T(y^o).
2. Instead of checking t_θ = t^o, check whether Δ_θ = d(t^o, t_θ) is less than ε (d may or may not be a metric).
In other words:
1. Replace Pr(y ∈ B_ε(y^o) | θ) with Pr(Δ_θ ≤ ε | θ).
2. Do not take the limit ε → 0.
This defines an approximate likelihood function L_ε(θ): L_ε(θ) ∝ Pr(Δ_θ ≤ ε | θ).
25 Example
Inference of the mean θ of a Gaussian of variance one. Here Pr(y = y^o | θ) = 0, so we use a discrepancy Δ_θ:
Δ_θ = (μ̂^o − μ̂_θ)², μ̂^o = (1/n) Σ_{i=1}^n y_i^o, μ̂_θ = (1/n) Σ_{i=1}^n y_i, y_i ~ N(θ, 1).
[Figure: realizations of the discrepancy as a function of θ, with its mean and 0.1/0.9 quantiles.] The discrepancy Δ_θ is a random variable.
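That Δ_θ is a random variable can be seen by simply re-running the simulator at a fixed θ. The sketch below (hypothetical sample sizes, seed, and observed summary) simulates the discrepancy of the Gaussian-mean example at two parameter values.

```python
import random
import statistics

def discrepancy(theta, mu_o, n, rng):
    # Simulate n observations from N(theta, 1) and compare sample means
    mu_hat = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
    return (mu_o - mu_hat) ** 2

rng = random.Random(2)
n, mu_o = 50, 0.0   # pretend the observed sample mean is 0

# Re-running the simulator at a fixed theta gives a different discrepancy
# each time: Delta_theta is a random variable. Its whole distribution
# shifts upwards as theta moves away from the region favored by the data.
d_near = [discrepancy(0.0, mu_o, n, rng) for _ in range(2_000)]
d_far = [discrepancy(2.0, mu_o, n, rng) for _ in range(2_000)]
```

Plotting histograms of `d_near` and `d_far` would reproduce the qualitative content of the figure: small, concentrated discrepancies near the data-compatible θ and large ones away from it.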
26 Example
The probability that Δ_θ is below some threshold ε approximates the likelihood function.
[Figure: mean and 0.1/0.9 quantiles of the discrepancy, the threshold ε = 0.1, and the resulting approximate likelihood (rescaled) compared with the true likelihood (rescaled).]
27 Example
Here, T(y) = (1/n) Σ_{i=1}^n y_i is a sufficient statistic for inference of the mean θ, so the only approximation is ε > 0. In general, the summary statistics will not be sufficient.
28 Rejection ABC algorithm
The two approximations made yield the rejection algorithm for approximate Bayesian computation (ABC):
1. Sample θ_i ~ p_θ.
2. Simulate a data set y_i by running the simulator with θ_i: y_i = g(ω_i, θ_i).
3. Compute the discrepancy Δ_i = d(T(y^o), T(y_i)).
4. Retain θ_i if Δ_i ≤ ε.
This is the basic ABC algorithm. It produces samples θ ~ p_ε(θ | y^o), where
p_ε(θ | y^o) ∝ p_θ(θ) L_ε(θ), L_ε(θ) ∝ Pr(d(T(y^o), T(y)) ≤ ε | θ).
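The four steps above translate almost line by line into code. This is a minimal sketch for the Gaussian-mean example, with an assumed uniform prior on (−5, 5), the sample mean as summary statistic, and the squared difference as discrepancy; all concrete settings are illustrative.

```python
import random

def rejection_abc(y_o, n_sims, eps, rng):
    t_o = sum(y_o) / len(y_o)                      # summary statistic T(y_o)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)             # 1. sample from the prior
        y = [rng.gauss(theta, 1.0) for _ in y_o]   # 2. run the simulator
        delta = (t_o - sum(y) / len(y)) ** 2       # 3. discrepancy d(T(y_o), T(y_i))
        if delta <= eps:                           # 4. retain if discrepancy small
            accepted.append(theta)
    return accepted

rng = random.Random(3)
y_o = [rng.gauss(1.5, 1.0) for _ in range(20)]     # "observed" data, true mean 1.5
samples = rejection_abc(y_o, 100_000, 0.01, rng)
abc_mean = sum(samples) / len(samples)
# abc_mean should be close to the observed sample mean.
```

Note how few of the 100,000 simulations survive the threshold; this inefficiency is precisely the critique raised in Part II.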
29 Part II ABC methods used in practice
30 Brief recap
Simulator-based models: models which are specified by a data generating mechanism. By construction, we can sample from simulator-based models, but the likelihood function can generally not be written down.
Rejection ABC: a trial-and-error scheme to find parameter values which produce simulated data resembling the observed data. Simulated data resemble the observed data if some discrepancy measure is small.
31 Critique of the rejection ABC algorithm
The rejection ABC algorithm works, but it is computationally not efficient: the probability of the event Δ_θ ≤ ε is usually small when θ ~ p_θ, in particular for small ε.
32 Critique of the rejection ABC algorithm
In the Gaussian example, the probability of Δ_θ ≤ ε can be computed in closed form. With Δ_θ = (μ̂^o − μ̂_θ)² and Φ(x) = ∫_{−∞}^x (1/√(2π)) exp(−u²/2) du,
Pr(Δ_θ ≤ ε) = Φ(√n (μ̂^o − θ) + √(nε)) − Φ(√n (μ̂^o − θ) − √(nε)).
For √(nε) small: L_ε(θ) ∝ Pr(Δ_θ ≤ ε) ≈ 2√(nε) φ(√n (μ̂^o − θ)) ∝ √ε L(θ), where φ is the standard normal pdf.
So a small ε gives a good approximation of the likelihood function. But for small ε, Pr(Δ_θ ≤ ε) ≈ 0: very few samples will be accepted.
[Figure: mean and 0.1/0.9 quantiles of the discrepancy, the threshold ε = 0.1, and the approximate and true likelihoods (rescaled).]
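The closed-form acceptance probability can be checked against a plain Monte Carlo estimate. The sketch below implements the standard normal cdf via `math.erf`; the specific values of μ̂^o, n, and ε are illustrative assumptions.

```python
import math
import random

def Phi(x):
    # Standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def accept_prob_exact(mu_o, theta, n, eps):
    # Pr(Delta_theta <= eps) for Delta_theta = (mu_o - mu_hat)^2,
    # where mu_hat ~ N(theta, 1/n)
    a = math.sqrt(n) * (mu_o - theta)
    b = math.sqrt(n * eps)
    return Phi(a + b) - Phi(a - b)

def accept_prob_mc(mu_o, theta, n, eps, n_sims, rng):
    hits = 0
    for _ in range(n_sims):
        mu_hat = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        hits += (mu_o - mu_hat) ** 2 <= eps
    return hits / n_sims

rng = random.Random(4)
p_exact = accept_prob_exact(0.3, 0.0, 25, 0.05)
p_mc = accept_prob_mc(0.3, 0.0, 25, 0.05, 20_000, rng)
# The two estimates should agree up to Monte Carlo error.
```

Shrinking `eps` in this snippet drives both probabilities towards zero, which is the slide's point: accuracy and acceptance rate pull in opposite directions.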
33 Two widely used algorithms
Two widely used algorithms which improve upon rejection ABC:
1. Regression ABC (Beaumont et al, Genetics, 2002)
2. Sequential Monte Carlo ABC (Sisson et al, PNAS, 2007)
Both use rejection ABC as a building block. Sequential Monte Carlo (SMC) ABC is also known as Population Monte Carlo (PMC) ABC.
34 Two widely used algorithms
Regression ABC consists in running rejection ABC with a relatively large ε and then adjusting the obtained samples so that they are closer to samples from the true posterior.
Sequential Monte Carlo ABC consists in sampling θ from an adaptively constructed proposal distribution φ(θ) rather than from the prior, in order to avoid simulating many data sets which are not accepted.
35 Basic idea of regression ABC
The summary statistics t_θ = T(y_θ) and θ have a joint distribution. Let t_i be the summary statistics for simulated data y_i = g(ω_i, θ_i). We can learn a regression model between the summary statistics (covariates) and the parameters (response variables):
θ_i = f(t_i) + ξ_i,
where ξ_i is the error term (a zero-mean random variable). The training data for the regression are typically tuples (θ_i, t_i) produced by rejection ABC with some sufficiently large ε.
36 Basic idea of regression ABC
Fitting the regression model to the training data (θ_i, t_i) yields an estimated regression function f̂ and the residuals ξ̂_i = θ_i − f̂(t_i). Regression ABC consists in replacing θ_i with the adjusted value θ̃_i:
θ̃_i = f̂(t^o) + ξ̂_i = f̂(t^o) + θ_i − f̂(t_i).
This corresponds to an adjustment of θ_i. If the relation between t and θ is learned correctly, the θ̃_i correspond to samples from an approximation with ε = 0.
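A linear version of this adjustment (in the spirit of Beaumont et al, 2002, but simplified here to a univariate least-squares fit with hypothetical settings) can be sketched as follows: fit θ_i ≈ a + b·t_i on the rejection ABC output, then move each sample by f̂(t^o) − f̂(t_i).

```python
import random

rng = random.Random(5)
n, t_o = 20, 1.0    # sample size and observed summary statistic (assumed)

# Rejection ABC with a deliberately large threshold eps = 1
thetas, ts = [], []
for _ in range(50_000):
    theta = rng.uniform(-5.0, 5.0)
    t = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
    if (t - t_o) ** 2 <= 1.0:
        thetas.append(theta)
        ts.append(t)

# Least-squares fit of the linear regression theta_i = a + b * t_i + xi_i
N = len(thetas)
t_bar, th_bar = sum(ts) / N, sum(thetas) / N
b = (sum((t - t_bar) * (th - th_bar) for t, th in zip(ts, thetas))
     / sum((t - t_bar) ** 2 for t in ts))
a = th_bar - b * t_bar

# Adjustment: theta_i~ = f_hat(t_o) + (theta_i - f_hat(t_i))
adjusted = [(a + b * t_o) + th - (a + b * t) for th, t in zip(thetas, ts)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)
# The adjusted samples concentrate much more tightly around t_o than
# the raw, coarsely thresholded rejection ABC samples.
```

The shrink in variance after adjustment is the whole point: a coarse (cheap) threshold plus a regression correction can mimic the result of a much smaller ε.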
37 Basic idea of sequential Monte Carlo ABC
We may modify the rejection ABC algorithm and use a proposal distribution φ(θ) instead of the prior p_θ:
1. Sample θ_i ~ φ(θ).
2. Simulate a data set y_i by running the simulator with θ_i: y_i = g(ω_i, θ_i).
3. Compute the discrepancy Δ_i = d(T(y^o), T(y_i)).
4. Retain θ_i if Δ_i ≤ ε.
The retained samples follow a distribution proportional to φ(θ) L_ε(θ).
38 Basic idea of sequential Monte Carlo ABC
Parameters θ_i weighted with w_i = p_θ(θ_i) / φ(θ_i) follow a distribution proportional to p_θ(θ) L_ε(θ). This can be used to iteratively morph the prior into a posterior:
- Use a sequence of shrinking thresholds ε_t.
- Run rejection ABC with ε_0.
- Define φ_t at iteration t based on the weighted samples from the previous iteration (e.g. a Gaussian mixture with means equal to the θ_i from the previous iteration).
More efficient than rejection ABC: φ_t(θ) is close to the approximate posterior in the final iterations.
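The morphing scheme above can be sketched as a small population Monte Carlo loop. Everything concrete here (the uniform prior, the Gaussian perturbation kernel with width set from the previous population, the threshold schedule) is an illustrative assumption, not the lecture's specification.

```python
import math
import random

rng = random.Random(6)
n, t_o = 20, 1.0      # Gaussian-mean toy model, assumed observed summary
LO, HI = -5.0, 5.0    # support of the assumed uniform prior

def summary(theta):
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n

def weighted_var(xs, ws):
    m = sum(w * x for x, w in zip(xs, ws))
    return sum(w * (x - m) ** 2 for x, w in zip(xs, ws))

def pmc_abc(eps_schedule, n_particles=500):
    particles, weights = [], []
    for t, eps in enumerate(eps_schedule):
        if t > 0:
            # Kernel variance: twice the weighted variance of the previous
            # population (a common heuristic)
            tau = math.sqrt(2.0 * weighted_var(particles, weights))
        new_particles = []
        while len(new_particles) < n_particles:
            if t == 0:
                theta = rng.uniform(LO, HI)            # round 0: plain rejection ABC
            else:
                j = rng.choices(range(n_particles), weights=weights)[0]
                theta = rng.gauss(particles[j], tau)   # perturb a previous particle
                if not (LO <= theta <= HI):
                    continue                           # zero prior density
            if (summary(theta) - t_o) ** 2 <= eps:
                new_particles.append(theta)
        if t == 0:
            new_weights = [1.0] * n_particles
        else:
            # w_i = p_theta(theta_i) / phi_t(theta_i); the uniform prior and
            # all normalizing constants cancel after renormalization
            new_weights = []
            for theta in new_particles:
                phi = sum(w * math.exp(-0.5 * ((theta - p) / tau) ** 2)
                          for p, w in zip(particles, weights))
                new_weights.append(1.0 / phi)
        s = sum(new_weights)
        particles, weights = new_particles, [w / s for w in new_weights]
    return particles, weights

particles, weights = pmc_abc([1.0, 0.25, 0.05])
pmc_mean = sum(w * p for p, w in zip(particles, weights))
```

Because the proposal concentrates where earlier rounds found small discrepancies, the final, tight threshold needs far fewer simulator runs than rejection ABC from the prior would.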
39 Another approach
Evaluating Δ_θ is computationally costly, and we are only interested in small Δ_θ (thresholding!). We could increase the computational efficiency by evaluating Δ_θ predominantly where it tends to be small. Use a combination of probabilistic modeling of Δ_θ and optimization to figure out where to evaluate Δ_θ.
[Figure: realizations, mean, and 0.1/0.9 quantiles of the discrepancy as a function of θ, together with the optimization objective.]
40 Learning a model of the discrepancy
The approximate likelihood function L_ε(θ) is determined by the distribution of the discrepancy Δ_θ: L_ε(θ) ∝ Pr(Δ_θ ≤ ε | θ). If we knew the distribution of Δ_θ, we could compute L_ε(θ). In recent work, we proposed to learn a model of Δ_θ and to approximate L_ε(θ) by L̂_ε(θ):
L̂_ε(θ) ∝ P̂r(Δ_θ ≤ ε | θ),
where P̂r is the probability under the learned model of Δ_θ (Gutmann and Corander, Journal of Machine Learning Research, in press, 2015). The model is learned more accurately in regions where Δ_θ tends to be small, using techniques from Bayesian optimization.
41 Bayesian optimization
[Figure: illustration of Bayesian optimization of the discrepancy. Starting from a few data points, a probabilistic model of the discrepancy (the current belief) is updated via Bayes' theorem as new data are acquired; an acquisition function, trading off exploration against exploitation, determines where to evaluate the discrepancy next.]
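A toy version of this loop can be written with a one-dimensional Gaussian process surrogate and a lower-confidence-bound acquisition function. This is a from-scratch sketch with made-up settings (RBF kernel with fixed hyperparameters, a noisy quadratic standing in for Δ_θ), not the procedure of Gutmann and Corander.

```python
import math
import random

rng = random.Random(7)

def delta(theta):
    # Noisy stand-in for the ABC discrepancy: minimum near theta = 1
    return (theta - 1.0) ** 2 + 0.1 * rng.gauss(0.0, 1.0)

def rbf(a, b, ell=1.0):
    return math.exp(-0.5 * ((a - b) / ell) ** 2)

def solve(A, rhs):
    # Gaussian elimination with partial pivoting (tiny systems only)
    m = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x

def gp_posterior(X, y, grid, noise=0.01):
    # GP regression with an RBF kernel; returns (mean, variance) on the grid
    K = [[rbf(xi, xj) + (noise if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    alpha = solve(K, y)
    out = []
    for g in grid:
        k = [rbf(g, xi) for xi in X]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        var = rbf(g, g) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
        out.append((mean, max(var, 1e-12)))
    return out

grid = [-2.0 + 0.1 * i for i in range(61)]   # candidate theta values
X = [-2.0, 0.5, 3.5]                         # initial evaluations
y = [delta(x) for x in X]
for _ in range(10):
    post = gp_posterior(X, y, grid)
    # Acquisition: lower confidence bound, trading exploitation (low mean)
    # against exploration (high variance)
    lcb = [m - 2.0 * math.sqrt(v) for m, v in post]
    nxt = grid[lcb.index(min(lcb))]
    X.append(nxt)
    y.append(delta(nxt))

post = gp_posterior(X, y, grid)
best_theta = grid[min(range(len(grid)), key=lambda i: post[i][0])]
```

With just thirteen discrepancy evaluations the surrogate's minimum settles near the true minimizer, which is the source of the large efficiency gains reported on the application slides.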
42 Application to epidemiology of infectious diseases
Inference about bacterial transmission dynamics in child day care centers (Numminen et al, Biometrics, 2013).
[Figure: strain-by-individual colonization patterns over time for several day care centers.]
43 Application to epidemiology of infectious diseases
Data: colonization states of sampled attendees of 29 child day care centers (DCCs) in the greater Oslo area.
[Figure: example data from a DCC. Each square indicates an attendee colonized with a strain of the bacterium Streptococcus pneumoniae.]
44 Application to epidemiology of infectious diseases
Simulator-based model: a latent continuous-time Markov chain for the transmission dynamics in a DCC and an observation model (Numminen et al, Biometrics, 2013). The model has three parameters:
β: rate of infections within a DCC
Λ: rate of infections outside a DCC
θ: possibility to be infected with multiple strains
The likelihood is intractable (data at a single time point only are available).
45 Application to epidemiology of infectious diseases
Comparison of the model-based approach with a sequential/population Monte Carlo ABC approach: roughly equal results using 1000 times fewer simulations. The minimizer of the regression function under the model does not involve choosing a threshold ε.
[Figure: posterior for β as a function of the log10 number of simulated data sets, comparing the minimizer of the regression function, the model-based posterior, the PMC-ABC posterior, and the final result of Numminen et al. Posterior means: solid lines with markers; credibility intervals: shaded areas or dashed lines.]
46 Application to epidemiology of infectious diseases
Comparison of the model-based approach with a sequential/population Monte Carlo ABC approach (continued).
[Figure: posteriors for Λ and θ as a function of the log10 number of simulated data sets, comparing the minimizer of the regression function, the model-based posterior, the PMC-ABC posterior, and the final result of Numminen et al. Posterior means are shown as solid lines with markers, credibility intervals as shaded areas or dashed lines.]
47 Summary
The topic was Bayesian inference for models specified via a simulator (implicit / generative models). Introduced approximate Bayesian computation (ABC). Principle of ABC: find parameter values which yield simulated data resembling the observed data. Covered three classical algorithms:
1. Rejection ABC
2. Regression ABC
3. Sequential Monte Carlo ABC
Introduced recent work which uses Bayesian optimization to increase the efficiency of the inference. Not covered: how to choose the summary statistics / the discrepancy measure between simulated and observed data.
Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters
More informationTutorial on Gaussian Processes and the Gaussian Process Latent Variable Model
Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,
More informationSTAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01
STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist
More informationABC methods for phase-type distributions with applications in insurance risk problems
ABC methods for phase-type with applications problems Concepcion Ausin, Department of Statistics, Universidad Carlos III de Madrid Joint work with: Pedro Galeano, Universidad Carlos III de Madrid Simon
More informationNew Insights into History Matching via Sequential Monte Carlo
New Insights into History Matching via Sequential Monte Carlo Associate Professor Chris Drovandi School of Mathematical Sciences ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS)
More informationIntroduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak
Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,
More information1 Bayesian Linear Regression (BLR)
Statistical Techniques in Robotics (STR, S15) Lecture#10 (Wednesday, February 11) Lecturer: Byron Boots Gaussian Properties, Bayesian Linear Regression 1 Bayesian Linear Regression (BLR) In linear regression,
More informationBAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA
BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationMonte Carlo in Bayesian Statistics
Monte Carlo in Bayesian Statistics Matthew Thomas SAMBa - University of Bath m.l.thomas@bath.ac.uk December 4, 2014 Matthew Thomas (SAMBa) Monte Carlo in Bayesian Statistics December 4, 2014 1 / 16 Overview
More informationStein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu and Dilin Wang NIPS 2016 Discussion by Yunchen Pu March 17, 2017 March 17, 2017 1 / 8 Introduction Let x R d
More informationPractical Bayesian Optimization of Machine Learning. Learning Algorithms
Practical Bayesian Optimization of Machine Learning Algorithms CS 294 University of California, Berkeley Tuesday, April 20, 2016 Motivation Machine Learning Algorithms (MLA s) have hyperparameters that
More informationan introduction to bayesian inference
with an application to network analysis http://jakehofman.com january 13, 2010 motivation would like models that: provide predictive and explanatory power are complex enough to describe observed phenomena
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More informationOptimization Tools in an Uncertain Environment
Optimization Tools in an Uncertain Environment Michael C. Ferris University of Wisconsin, Madison Uncertainty Workshop, Chicago: July 21, 2008 Michael Ferris (University of Wisconsin) Stochastic optimization
More informationParticle Filtering a brief introductory tutorial. Frank Wood Gatsby, August 2007
Particle Filtering a brief introductory tutorial Frank Wood Gatsby, August 2007 Problem: Target Tracking A ballistic projectile has been launched in our direction and may or may not land near enough to
More informationSequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007
Sequential Monte Carlo and Particle Filtering Frank Wood Gatsby, November 2007 Importance Sampling Recall: Let s say that we want to compute some expectation (integral) E p [f] = p(x)f(x)dx and we remember
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationOverview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation
Overview Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation Probabilistic Interpretation: Linear Regression Assume output y is generated
More informationBayesian Methods. David S. Rosenberg. New York University. March 20, 2018
Bayesian Methods David S. Rosenberg New York University March 20, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 March 20, 2018 1 / 38 Contents 1 Classical Statistics 2 Bayesian
More informationInference for partially observed stochastic dynamic systems: A new algorithm, its theory and applications
Inference for partially observed stochastic dynamic systems: A new algorithm, its theory and applications Edward Ionides Department of Statistics, University of Michigan ionides@umich.edu Statistics Department
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationL09. PARTICLE FILTERING. NA568 Mobile Robotics: Methods & Algorithms
L09. PARTICLE FILTERING NA568 Mobile Robotics: Methods & Algorithms Particle Filters Different approach to state estimation Instead of parametric description of state (and uncertainty), use a set of state
More informationApproximate Bayesian computation: an application to weak-lensing peak counts
STATISTICAL CHALLENGES IN MODERN ASTRONOMY VI Approximate Bayesian computation: an application to weak-lensing peak counts Chieh-An Lin & Martin Kilbinger SAp, CEA Saclay Carnegie Mellon University, Pittsburgh
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationLecture 1 Bayesian inference
Lecture 1 Bayesian inference olivier.francois@imag.fr April 2011 Outline of Lecture 1 Principles of Bayesian inference Classical inference problems (frequency, mean, variance) Basic simulation algorithms
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationVariational Inference via Stochastic Backpropagation
Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationDS-GA 1002 Lecture notes 11 Fall Bayesian statistics
DS-GA 100 Lecture notes 11 Fall 016 Bayesian statistics In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian
More informationNotes on Noise Contrastive Estimation (NCE)
Notes on Noise Contrastive Estimation NCE) David Meyer dmm@{-4-5.net,uoregon.edu,...} March 0, 207 Introduction In this note we follow the notation used in [2]. Suppose X x, x 2,, x Td ) is a sample of
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationBayesian Inference: Concept and Practice
Inference: Concept and Practice fundamentals Johan A. Elkink School of Politics & International Relations University College Dublin 5 June 2017 1 2 3 Bayes theorem In order to estimate the parameters of
More informationBasic Sampling Methods
Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution
More informationProbabilistic Graphical Models
2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector
More informationPerformance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project
Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore
More informationCS-E3210 Machine Learning: Basic Principles
CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationNotes on pseudo-marginal methods, variational Bayes and ABC
Notes on pseudo-marginal methods, variational Bayes and ABC Christian Andersson Naesseth October 3, 2016 The Pseudo-Marginal Framework Assume we are interested in sampling from the posterior distribution
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING. Non-linear regression techniques Part - II
1 Non-linear regression techniques Part - II Regression Algorithms in this Course Support Vector Machine Relevance Vector Machine Support vector regression Boosting random projections Relevance vector
More informationVariational Autoencoder
Variational Autoencoder Göker Erdo gan August 8, 2017 The variational autoencoder (VA) [1] is a nonlinear latent variable model with an efficient gradient-based training procedure based on variational
More informationarxiv: v3 [stat.ml] 8 Aug 2018
Efficient acquisition rules for model-based approximate Bayesian computation Marko Järvenpää, Michael U. Gutmann, Arijus Pleska, Aki Vehtari, and Pekka Marttinen arxiv:74.5v3 [stat.ml] 8 Aug 8 Helsinki
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of
More informationBayes via forward simulation. Approximate Bayesian Computation
: Approximate Bayesian Computation Jessi Cisewski Carnegie Mellon University June 2014 What is Approximate Bayesian Computing? Likelihood-free approach Works by simulating from the forward process What
More informationAfternoon Meeting on Bayesian Computation 2018 University of Reading
Gabriele Abbati 1, Alessra Tosi 2, Seth Flaxman 3, Michael A Osborne 1 1 University of Oxford, 2 Mind Foundry Ltd, 3 Imperial College London Afternoon Meeting on Bayesian Computation 2018 University of
More informationBayesian Regression (1/31/13)
STA613/CBB540: Statistical methods in computational biology Bayesian Regression (1/31/13) Lecturer: Barbara Engelhardt Scribe: Amanda Lea 1 Bayesian Paradigm Bayesian methods ask: given that I have observed
More informationRelated Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM
Lecture 9 SEM, Statistical Modeling, AI, and Data Mining I. Terminology of SEM Related Concepts: Causal Modeling Path Analysis Structural Equation Modeling Latent variables (Factors measurable, but thru
More informationDevelopment of Stochastic Artificial Neural Networks for Hydrological Prediction
Development of Stochastic Artificial Neural Networks for Hydrological Prediction G. B. Kingston, M. F. Lambert and H. R. Maier Centre for Applied Modelling in Water Engineering, School of Civil and Environmental
More informationThe Poisson transform for unnormalised statistical models. Nicolas Chopin (ENSAE) joint work with Simon Barthelmé (CNRS, Gipsa-LAB)
The Poisson transform for unnormalised statistical models Nicolas Chopin (ENSAE) joint work with Simon Barthelmé (CNRS, Gipsa-LAB) Part I Unnormalised statistical models Unnormalised statistical models
More informationAddressing the Challenges of Luminosity Function Estimation via Likelihood-Free Inference
Addressing the Challenges of Luminosity Function Estimation via Likelihood-Free Inference Chad M. Schafer www.stat.cmu.edu/ cschafer Department of Statistics Carnegie Mellon University June 2011 1 Acknowledgements
More information