Fast Likelihood-Free Inference via Bayesian Optimization
Michael Gutmann
https://sites.google.com/site/michaelgutmann
University of Helsinki, Aalto University, Helsinki Institute for Information Technology
Joint work with Jukka Corander
17 May 2016
References
For all the details:
M.U. Gutmann and J. Corander, Bayesian optimization for likelihood-free inference of simulator-based statistical models, Journal of Machine Learning Research, in press. http://arxiv.org/abs/1501.03291
Early results:
Bayesian Optimization for Likelihood-Free Estimation, poster at ABC in Rome, 2013.
Michael Gutmann Fast Likelihood-Free Inference 2 / 23
Likelihood-free inference
Statistical inference for models where
1. the likelihood function is too costly to compute
2. sampling (simulating) data from the model is possible
Why does it matter?
Such models occur widely:
- Astrophysics: simulating the formation of galaxies, stars, or planets
- Evolutionary biology: simulating the evolution of life
- Health science: simulating the spread of an infectious disease
Enables inference for models with complex data generating mechanisms (e.g. scientific models).
[Figure: dark matter density simulated by the Illustris collaboration (from http://www.illustris-project.org)]
Likelihood-free inference is an umbrella term
There are several flavors of likelihood-free inference. In the Bayesian setting, e.g.
- Approximate Bayesian computation (ABC) (for a review, see e.g. Marin et al, Statistics and Computing, 2012)
- Synthetic likelihood (Wood, Nature, 2010)
General idea: identify the values of the parameters of interest θ for which simulated data resemble the observed data.
Simulated data resemble the observed data if some discrepancy measure Δ ≥ 0 is small.
Here: focus on ABC; see the reference paper for more.
Meta ABC algorithm
Let y_o be the observed data. Iterate many, many times:
1. Sample θ from a proposal distribution q(θ)
2. Sample y given θ according to the model
3. Compute the discrepancy Δ between y_o and y
4. Retain θ if Δ ≤ ε
Different choices for q(θ) give different algorithms:
- rejection ABC (Tavaré et al, 1997; Pritchard et al, 1999)
- MCMC ABC (Marjoram et al, 2003)
- Population Monte Carlo ABC (Sisson et al, 2007)
ε: trade-off between statistical and computational performance.
Produces samples from an approximate posterior.
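The meta algorithm above can be sketched in a few lines of Python. The simulator, summary statistic, and prior below are toy stand-ins (a Gaussian mean model) chosen for illustration only; they are not part of the talk:

```python
import random
import statistics

def simulate(theta, rng, n=50):
    """Toy stand-in for the simulator: n draws from N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def discrepancy(y_obs, y_sim):
    """Discrepancy Delta: here, the distance between the sample means."""
    return abs(statistics.mean(y_obs) - statistics.mean(y_sim))

def rejection_abc(y_obs, n_iter=5000, eps=0.1, seed=0):
    """Rejection ABC: proposal q(theta) equals the prior, here U(-5, 5)."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_iter):
        theta = rng.uniform(-5.0, 5.0)    # 1. sample theta from q(theta)
        y = simulate(theta, rng)          # 2. simulate y given theta
        if discrepancy(y_obs, y) <= eps:  # 3.-4. retain theta if Delta <= eps
            accepted.append(theta)
    return accepted

y_obs = simulate(2.0, random.Random(1))   # "observed" data, true theta = 2
posterior_sample = rejection_abc(y_obs)
```

The accepted values are samples from an approximate posterior; shrinking eps sharpens the approximation but lowers the acceptance rate, which is exactly the cost issue discussed below.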
Two major difficulties
1. How to measure the discrepancy
2. How to handle the computational cost
Two major difficulties
1. How to measure the discrepancy: use classification
M.U. Gutmann, R. Dutta, S. Kaski, and J. Corander, Statistical Inference of Intractable Generative Models via Classification. http://arxiv.org/abs/1407.4981
2. How to handle the computational cost: use Bayesian optimization
M.U. Gutmann and J. Corander, Bayesian optimization for likelihood-free inference of simulator-based statistical models, Journal of Machine Learning Research, in press. http://arxiv.org/abs/1501.03291
Example: Bacterial infections in child care centers (Numminen et al, 2013)
- Likelihood intractable for cross-sectional data
- But generating data from the model is possible
Parameters of interest:
- β: rate of infections within a DCC
- Λ: rate of infections from outside
- θ: competition between the strains
[Figure: colonization patterns (strain vs individual) in several day care centers over time]
Example: Bacterial infections in child care centers (Numminen et al, 2013)
- Data: Streptococcus pneumoniae colonization for 29 centers
- Inference with Population Monte Carlo ABC
- Reveals strong competition between different bacterial strains
- Expensive: 4.5 days on a cluster with 200 cores, more than one million simulated data sets
[Figure: prior (flat) and posterior probability densities of the competition parameter on [0, 1], with the posterior concentrated on strong competition]
Why is ABC so expensive?
Let y_o be the observed data. Building block of several ABC algorithms:
1. Sample θ from a proposal distribution q(θ)
2. Sample y given θ according to the model
3. Compute the discrepancy Δ between y_o and y
4. Retain θ if Δ ≤ ε
Previous work: focus on the choice of the proposal distribution.
Key bottleneck: the presence of the rejection step. A small ε means a small acceptance probability Pr(Δ ≤ ε | θ).
How to make the rejection step disappear?
The conditional acceptance probability corresponds to a likelihood approximation: L(θ) ∝ Pr(Δ ≤ ε | θ).
The conditional distribution of Δ determines L(θ): if we knew the distribution of Δ, we could compute L(θ).
This suggests an approach based on statistical modeling of Δ.
Proposed approach
1. Model and estimate the distribution of Δ.
- The estimated model yields a computable approximation L̂(θ) ∝ P̂r(Δ ≤ ε | θ), where P̂r is the probability under the estimated model.
- Data for the estimation are obtained by sampling θ from the prior or from some other proposal distribution.
2. Give priority to regions in the parameter space where the discrepancy Δ tends to be small.
- Prioritize modal regions of the likelihood/posterior.
- Use Bayesian optimization to find the regions where Δ tends to be small.
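Step 1 can be sketched as follows. For illustration only, assume log Δ at a fixed θ is modeled as a Gaussian (one simple choice of model, not necessarily the one used in the paper); P̂r(Δ ≤ ε | θ) is then the Gaussian cdf evaluated at log ε. The discrepancy samples below are synthetic:

```python
import math
import random
import statistics

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def likelihood_approx(deltas, eps):
    """Model-based Lhat(theta): fit a Gaussian to log Delta samples
    obtained at theta and read off Pr(Delta <= eps) under that model."""
    logs = [math.log(d) for d in deltas]
    return normal_cdf(math.log(eps), statistics.mean(logs), statistics.stdev(logs))

# Synthetic discrepancy samples at two (hypothetical) parameter values:
rng = random.Random(0)
deltas_good = [math.exp(rng.gauss(-1.0, 0.5)) for _ in range(30)]  # Delta tends to be small
deltas_bad = [math.exp(rng.gauss(1.0, 0.5)) for _ in range(30)]    # Delta tends to be large
L_good = likelihood_approx(deltas_good, eps=0.5)
L_bad = likelihood_approx(deltas_bad, eps=0.5)
```

The parameter value whose discrepancies tend to be small receives a much larger approximate likelihood, which is what makes the explicit rejection step unnecessary.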
Bayesian optimization
- A set of methods to minimize black-box functions.
- Basic idea: a probabilistic model of Δ guides the selection of the points θ where Δ is evaluated next. Observed values of Δ are used to update the model by Bayes' theorem.
- When deciding where to evaluate, balance
  - points where Δ is believed to be small (exploitation)
  - points where we are uncertain about Δ (exploration)
Bayesian optimization
[Figure: illustration on the competition parameter. Panels show the model of the distance based on 2, 3, and 4 data points (posterior mean with 5%-95% credible bands), the acquisition function, and the next parameter value to try. The model is updated with each new data point via Bayes' theorem, trading off exploration against exploitation.]
Vanilla implementation
- Assume the (log) discrepancy Δ follows a Gaussian process model.
- Assume a squared exponential covariance function

cov(Δ_θ, Δ_θ′) = k(θ, θ′),  k(θ, θ′) = σ_f² exp( −Σ_j (θ_j − θ′_j)² / (2 λ_j²) )   (1)

- Use a lower confidence bound acquisition function (e.g. Cox and John, 1992; Srinivas et al, 2012)

A_t(θ) = μ_t(θ) − sqrt( η_t² v_t(θ) )   (2)

where μ_t(θ) is the posterior mean, v_t(θ) the posterior variance, and η_t² a weight.
- Possibly use a stochastic acquisition rule: sample from a Gaussian centered at argmin_θ A_t(θ), while respecting the boundaries.
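The vanilla implementation can be sketched as follows (an illustrative one-dimensional version with made-up hyperparameter values, not the paper's implementation): a GP with a squared exponential kernel is fit to the observed discrepancies, and the next evaluation point minimizes the lower confidence bound.

```python
import numpy as np

def se_kernel(a, b, sigma_f=1.0, lam=0.05):
    """Squared exponential covariance k(theta, theta'), eq. (1), 1-d case."""
    d = a[:, None] - b[None, :]
    return sigma_f**2 * np.exp(-0.5 * d**2 / lam**2)

def gp_posterior(theta_obs, delta_obs, theta_grid, noise=1e-3):
    """GP posterior mean and variance of the discrepancy on a grid."""
    K = se_kernel(theta_obs, theta_obs) + noise * np.eye(len(theta_obs))
    Ks = se_kernel(theta_grid, theta_obs)
    mean = Ks @ np.linalg.solve(K, delta_obs)
    var = se_kernel(theta_grid, theta_grid).diagonal() - np.einsum(
        "ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.maximum(var, 0.0)

def lcb(mean, var, eta=2.0):
    """Lower confidence bound A_t(theta) = mu_t - sqrt(eta^2 * v_t), eq. (2)."""
    return mean - np.sqrt(eta**2 * var)

# Three evaluated parameter values and their (made-up) discrepancies:
theta_obs = np.array([0.02, 0.10, 0.18])
delta_obs = np.array([3.0, 1.0, 2.5])
grid = np.linspace(0.0, 0.2, 201)
mean, var = gp_posterior(theta_obs, delta_obs, grid)
theta_next = grid[np.argmin(lcb(mean, var))]  # next point at which to simulate
```

The acquisition rule pulls the next evaluation both toward low predicted discrepancy (exploitation) and toward regions of high posterior variance (exploration), with η_t controlling the balance.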
Recipe for fast likelihood-free inference
1. Estimate a model of the discrepancy Δ using Bayesian optimization.
2. Choose a threshold ε to obtain the likelihood approximation L̂(θ) ∝ P̂r(Δ ≤ ε | θ).
3. Perform MLE or posterior inference with any standard method, using L̂ in place of the true likelihood function.
Example (recap): Bacterial infections in child care centers (Numminen et al, 2013)
- Likelihood intractable for cross-sectional data
- But generating data from the model is possible
Parameters of interest:
- β: rate of infections within a DCC
- Λ: rate of infections from outside
- θ: competition between the strains
[Figure: colonization patterns (strain vs individual) in several day care centers over time]
Inference results
- Comparison of the proposed approach with a population Monte Carlo (PMC) ABC approach.
- Roughly equal results using 1000 times fewer simulations.
- The minimizer of the regression function under the model does not involve choosing a threshold ε.
[Figure: posterior means (solid lines with markers) and credibility intervals (shaded areas or dashed lines) for β as a function of the log10 number of simulated data sets: minimizer of the regression function, model-based posterior, PMC-ABC posterior, and the final result of Numminen et al]
Inference results
Comparison of the model-based approach with a population Monte Carlo (PMC) ABC approach.
[Figure: posterior means (solid lines with markers) and credibility intervals (shaded areas or dashed lines) for Λ and θ as a function of the log10 number of simulated data sets: minimizer of the regression function, model-based posterior, PMC-ABC posterior, and the final result of Numminen et al]
Further benefits
- Enables inference for models which were out of reach until now: a model of evolution where simulating a single data set took us 12-24 hours (Marttinen et al, 2015)
- Allowed us to perform a far more comprehensive data analysis than with the standard approach (Numminen et al, 2016)
- The estimated L̂(θ) can be used to assess parameter identifiability for complex models: a model of the transmission dynamics of tuberculosis (Lintusaari et al, 2016)
- For point estimation, minimize Ê(Δ | θ): no thresholds required
Some open questions
- Modeling of the discrepancy: the vanilla GP model worked surprisingly well, but there are likely more suitable models.
- Exploration/exploitation trade-off: can we find strategies which are optimal for parameter inference?
Summary
- Problem considered: the computational cost of likelihood-free inference
- Proposed approach: combine optimization with modeling of the discrepancy between simulated and observed data
- Outcome: the approach increases the efficiency of the inference by several orders of magnitude
This talk was on approximate Bayesian computation with uniform kernels. For other kernels and the synthetic likelihood, see: M.U. Gutmann and J. Corander, Bayesian Optimization for Likelihood-Free Inference of Simulator-Based Statistical Models, Journal of Machine Learning Research, in press. http://arxiv.org/abs/1501.03291
Appendix
- Ricker model
- Details of the bacterial transmission model
Application to parameter inference in chaotic systems
- Data: time series with counts y_t (animal population size)
- Simulator-based model: a stochastic version of the Ricker map followed by an observation model

log N_t = log r + log N_{t−1} − N_{t−1} + σ e_t,  e_t ~ N(0, 1)
y_t | N_t, φ ~ Poisson(φ N_t)

- Parameters θ: log r (growth rate), σ (noise level), φ (scale parameter)
[Figure: example data, θ_o = (3.8, 0.3, 10)]
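The two equations above can be turned into a small simulator. This is a plain-Python sketch; the Poisson sampler with a normal approximation for large means is an implementation convenience, not part of the model:

```python
import math
import random

def poisson(lam, rng):
    """Poisson sample by inversion; normal approximation for large means."""
    if lam > 30.0:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    k, p = 0, math.exp(-lam)
    cum, target = p, rng.random()
    while cum < target and p > 0.0:
        k += 1
        p *= lam / k
        cum += p
    return k

def simulate_ricker(log_r, sigma, phi, T=50, seed=0):
    """Stochastic Ricker map with Poisson observations:
       log N_t = log r + log N_{t-1} - N_{t-1} + sigma * e_t,  e_t ~ N(0, 1)
       y_t | N_t, phi ~ Poisson(phi * N_t)
    """
    rng = random.Random(seed)
    n, ys = 1.0, []
    for _ in range(T):
        n = math.exp(log_r + math.log(n) - n + sigma * rng.gauss(0.0, 1.0))
        ys.append(poisson(phi * n, rng))
    return ys

y = simulate_ricker(log_r=3.8, sigma=0.3, phi=10.0)
```

At log r = 3.8 the deterministic map is chaotic, so the simulated counts swing between near-extinction and large outbreaks, which is what makes the likelihood of this model intractable in practice.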
Application to parameter inference in chaotic systems
- Speed-up: 600 times fewer evaluations of the distance function.
- Slight shift in the posterior mean towards the data generating parameter θ_o.
[Figure: estimated posterior pdfs of log r and φ after 150, 200, 400, and 500 evaluations (speed-up ratios 667, 500, 250, 200), compared with the prior and with MCMC using 100000 evaluations (Wood, Nature, 2010)]
Application to parameter inference in chaotic systems
[Figure: estimated posterior pdf of σ, same comparison: 150, 200, 400, and 500 evaluations (speed-up ratios 667, 500, 250, 200) vs the prior and MCMC with 100000 evaluations (Wood, Nature, 2010)]
Bacterial transmission model (Numminen et al, 2013)
Latent continuous-time Markov chain for the transmissions inside a center, with colonization indicators I_is(t) (individual i, strain s):

Pr( I_is(t+h) = 0 | I_is(t) = 1 ) = h + o(h)   (3)
Pr( I_is(t+h) = 1 | I_is(t) = 0, I_is′(t) = 0 for all s′ ) = R_s(t) h + o(h)   (4)
Pr( I_is(t+h) = 1 | I_is(t) = 0, I_is′(t) = 1 for some s′ ) = θ R_s(t) h + o(h)   (5)
R_s(t) = β E_s(t) + Λ P_s   (6)

- P_s: infections from outside the group (static)
- E_s(t) = 1/(N−1) Σ_{j ≠ i} I_js(t) / n_j(t): infections from within the group
- n_j(t) = Σ_s I_js(t): number of strains that individual j carries
Observation model: cross-sectional sampling at a random time.
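A minimal sketch of the acquisition rate implied by these rates, reconstructed from the slide (the helper name, state encoding, and example values are made up for illustration):

```python
def acquisition_rate(I, i, s, beta, lam_out, P, theta):
    """Rate at which individual i acquires strain s:
    R_s(t) = beta * E_s(t) + Lambda * P_s, scaled by the competition
    parameter theta when i already carries some other strain.
    I[j][s] in {0, 1} is the current colonization state."""
    N = len(I)
    # E_s(t): other colonized individuals, each weighted down by the
    # number of strains n_j they currently carry
    E = sum(1.0 / sum(I[j]) for j in range(N) if j != i and I[j][s]) / (N - 1)
    R = beta * E + lam_out * P[s]
    if I[i][s]:
        return 0.0            # already carries strain s
    if any(I[i]):
        return theta * R      # colonized by another strain: competition
    return R

# Made-up state: 3 individuals, 2 strains
I = [[1, 0], [0, 1], [0, 0]]
r_free = acquisition_rate(I, i=2, s=0, beta=3.0, lam_out=1.0, P=[0.2, 0.3], theta=0.1)
r_colonized = acquisition_rate(I, i=1, s=0, beta=3.0, lam_out=1.0, P=[0.2, 0.3], theta=0.1)
```

With θ < 1 an already-colonized individual acquires a further strain at a reduced rate, which is how the model encodes competition between strains.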
Distance measure used
Summary statistics for each center:
- the diversity of the strains present
- the number of different strains present
- the proportion of infected individuals
- the proportion of individuals with more than one strain
Distance: distance between the empirical cumulative distribution functions (cdfs) of the four summary statistics.
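One way to realize such a distance is sketched below: an L1-type distance between the empirical cdfs of each summary statistic across centers, summed over the statistics. This is an illustrative construction; the exact combination used by Numminen et al may differ:

```python
def ecdf(values, x):
    """Empirical CDF of `values` evaluated at x."""
    return sum(1 for v in values if v <= x) / len(values)

def ecdf_distance(summ_obs, summ_sim):
    """Average absolute gap between the two empirical cdfs of one
    summary statistic, evaluated on the pooled values."""
    grid = sorted(set(summ_obs) | set(summ_sim))
    return sum(abs(ecdf(summ_obs, x) - ecdf(summ_sim, x)) for x in grid) / len(grid)

def total_distance(obs_summaries, sim_summaries):
    """Combine the per-summary ecdf distances (here: a simple sum)."""
    return sum(ecdf_distance(o, s) for o, s in zip(obs_summaries, sim_summaries))

# Made-up summaries (one list per statistic, one value per center):
obs = [[0.2, 0.4, 0.6], [1, 2, 3]]
sim = [[0.8, 0.9, 1.0], [5, 6, 7]]
d = total_distance(obs, sim)
```

The distance is zero when simulated and observed summary distributions coincide and grows as they separate, which is what the ABC rejection step needs from Δ.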