Fast Likelihood-Free Inference via Bayesian Optimization


Fast Likelihood-Free Inference via Bayesian Optimization
Michael Gutmann, https://sites.google.com/site/michaelgutmann
University of Helsinki, Aalto University, Helsinki Institute for Information Technology
Joint work with Jukka Corander
17 May 2016

References

For all the details:
M.U. Gutmann and J. Corander. Bayesian optimization for likelihood-free inference of simulator-based statistical models. Journal of Machine Learning Research, in press. http://arxiv.org/abs/1501.03291

Early results:
Bayesian Optimization for Likelihood-Free Estimation. Poster at ABC in Rome, 2013.

Likelihood-free inference

Statistical inference for models where
1. the likelihood function is too costly to compute,
2. sampling (simulating) data from the model is possible.

Why does it matter?

Such models occur widely:
- Astrophysics: simulating the formation of galaxies, stars, or planets
- Evolutionary biology: simulating the evolution of life
- Health science: simulating the spread of an infectious disease
- ...

Enables inference for models with complex data generating mechanisms (e.g. scientific models).

[Figure: dark matter density simulated by the Illustris collaboration; figure from http://www.illustris-project.org]

Likelihood-free inference is an umbrella term

There are several flavors of likelihood-free inference. In the Bayesian setting, e.g.
- Approximate Bayesian computation (ABC) (for a review, see e.g. Marin et al, Statistics and Computing, 2012)
- Synthetic likelihood (Wood, Nature, 2010)

General idea: identify the values of the parameters of interest θ for which simulated data resemble the observed data.
Simulated data resemble the observed data if some discrepancy measure Δ ≥ 0 is small.

Here: focus on ABC; see the reference paper for more.

Meta ABC algorithm

Let y_o be the observed data. Iterate many, many times:
1. Sample θ from a proposal distribution q(θ)
2. Sample y | θ according to the model
3. Compute the discrepancy Δ between y_o and y
4. Retain θ if Δ ≤ ε

Different choices for q(θ) give different algorithms:
- rejection ABC (Tavaré et al, 1997; Pritchard et al, 1999)
- MCMC ABC (Marjoram et al, 2003)
- Population Monte Carlo ABC (Sisson et al, 2007)

ε: trade-off between statistical and computational performance.
Produces samples from an approximate posterior.
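To make the meta algorithm concrete, here is a minimal sketch of rejection ABC (with q(θ) equal to the prior) in Python. The simulator, prior sampler, and discrepancy function are illustrative placeholders, not the model from the talk.

```python
import numpy as np

def rejection_abc(observed, simulate, sample_prior, discrepancy, eps,
                  n_iter=100_000, rng=None):
    """Minimal rejection-ABC sketch: propose from the prior, simulate,
    and keep parameters whose simulated data lie within eps of the data."""
    rng = np.random.default_rng(rng)
    accepted = []
    for _ in range(n_iter):
        theta = sample_prior(rng)               # 1. sample theta from the proposal q = prior
        y = simulate(theta, rng)                # 2. simulate data y | theta from the model
        if discrepancy(observed, y) <= eps:     # 3.-4. retain theta if the discrepancy is small
            accepted.append(theta)
    return np.array(accepted)                   # samples from an approximate posterior

# Hypothetical toy example: infer the mean of a Gaussian with known variance.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_obs = rng.normal(loc=2.0, scale=1.0, size=50)
    samples = rejection_abc(
        observed=y_obs,
        simulate=lambda th, r: r.normal(loc=th, scale=1.0, size=50),
        sample_prior=lambda r: r.uniform(-5.0, 5.0),
        discrepancy=lambda yo, y: abs(yo.mean() - y.mean()),
        eps=0.1,
    )
    print(samples.mean(), samples.std())
```

The rejection step in the loop is exactly the bottleneck discussed later: as ε shrinks, most proposals are thrown away.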

Two major difficulties

1. How to measure the discrepancy
   Use classification:
   M.U. Gutmann, R. Dutta, S. Kaski, and J. Corander. Statistical Inference of Intractable Generative Models via Classification. http://arxiv.org/abs/1407.4981

2. How to handle the computational cost
   Use Bayesian optimization:
   M.U. Gutmann and J. Corander. Bayesian optimization for likelihood-free inference of simulator-based statistical models. Journal of Machine Learning Research, in press. http://arxiv.org/abs/1501.03291

Example: Bacterial infections in child care centers (Numminen et al, 2013)

- Likelihood intractable for cross-sectional data
- But generating data from the model is possible

Parameters of interest:
- β: rate of infections within a day care center (DCC)
- Λ: rate of infections from outside
- θ: competition between the strains

[Figure: simulated colonization states (strain vs. individual) in several day care centers at different times]

Example: Bacterial infections in child care centers (Numminen et al, 2013)

- Data: Streptococcus pneumoniae colonization for 29 centers
- Inference with Population Monte Carlo ABC
- Reveals strong competition between different bacterial strains
- Expensive: 4.5 days on a cluster with 200 cores, more than one million simulated data sets

[Figure: prior and posterior probability density functions of the competition parameter, with the posterior concentrated towards strong competition]

Why is ABC so expensive?

Let y_o be the observed data. Building block of several ABC algorithms:
1. Sample θ from a proposal distribution q(θ)
2. Sample y | θ according to the model
3. Compute the discrepancy Δ between y_o and y
4. Retain θ if Δ ≤ ε

Previous work: focus on the choice of the proposal distribution.
Key bottleneck: presence of the rejection step; a small ε means a small acceptance probability Pr(Δ ≤ ε | θ).

How to make the rejection step disappear?

The conditional acceptance probability corresponds to a likelihood approximation,
  L(θ) ∝ Pr(Δ ≤ ε | θ).

The conditional distribution of Δ determines L(θ). If we knew the distribution of Δ, we could compute L(θ).
This suggests an approach based on statistical modeling of Δ.

Proposed approach

1. Model and estimate the distribution of Δ.
   - The estimated model yields a computable approximation L̂(θ),
       L̂(θ) ∝ P̂r(Δ ≤ ε | θ),
     where P̂r is the probability under the estimated model.
   - Data for the estimation are obtained by sampling θ from the prior or from some other proposal distribution.

2. Give priority to regions in the parameter space where the discrepancy Δ tends to be small.
   - Prioritize modal regions of the likelihood/posterior.
   - Use Bayesian optimization to find the regions where Δ tends to be small.
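As an illustration of step 1, here is a minimal sketch of how a fitted model of the discrepancy can be turned into a computable likelihood approximation, assuming the discrepancy at θ is approximately Gaussian with posterior mean gp_mean(θ), posterior variance gp_var(θ), and noise variance noise_var. The function names and the toy stand-in for a fitted GP are assumptions for illustration, not the paper's notation.

```python
import numpy as np
from scipy.stats import norm

def approx_likelihood(theta, eps, gp_mean, gp_var, noise_var):
    """Likelihood approximation L_hat(theta) ∝ Pr(Delta <= eps | theta)
    under a Gaussian model of the discrepancy at theta."""
    mu = gp_mean(theta)                      # posterior mean of the discrepancy
    s = np.sqrt(gp_var(theta) + noise_var)   # predictive standard deviation (model + noise)
    return norm.cdf((eps - mu) / s)          # probability that the discrepancy falls below eps

# Hypothetical stand-ins for a fitted model: the discrepancy is smallest near theta = 2.
gp_mean = lambda th: 0.5 * (np.asarray(th, dtype=float) - 2.0) ** 2 + 0.1
gp_var = lambda th: 0.05 * np.ones_like(np.asarray(th, dtype=float))
thetas = np.linspace(0.0, 4.0, 9)
L_hat = approx_likelihood(thetas, eps=0.5, gp_mean=gp_mean, gp_var=gp_var, noise_var=0.01)
print(np.round(L_hat, 3))  # peaks near theta = 2
```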

Bayesian optimization

A set of methods to minimize black-box functions.

Basic idea:
- A probabilistic model of Δ guides the selection of points θ where Δ is next evaluated.
- Observed values of Δ are used to update the model by Bayes' theorem.
- When deciding where to evaluate, balance
  - points where Δ is believed to be small ("exploitation"),
  - points where we are uncertain about Δ ("exploration").

Bayesian optimization

[Figure: Gaussian process model of the distance as a function of the competition parameter, based on 2, 3, and 4 data points, with posterior mean and 5-95% credibility bands; the acquisition function indicates the next parameter to try, trading off exploration vs. exploitation; the model is updated with the new data via Bayes' theorem.]

Vanilla implementation

Assume the (log) discrepancy Δ follows a Gaussian process model, with a squared exponential covariance function
  cov(Δ_θ, Δ_θ′) = k(θ, θ′),   k(θ, θ′) = σ_f² exp( − Σ_j (θ_j − θ′_j)² / λ_j² ).   (1)

Use the lower confidence bound acquisition function (e.g. Cox and John, 1992; Srinivas et al, 2012)
  A_t(θ) = μ_t(θ) − √( η_t² v_t(θ) ),   (2)
where μ_t(θ) is the posterior mean, v_t(θ) the posterior variance, and η_t² a weight.

Possibly use a stochastic acquisition rule: sample from a Gaussian centered at argmin_θ A_t(θ), while respecting the boundaries.
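A compact sketch of this vanilla scheme under stated assumptions: scikit-learn's Gaussian process regressor stands in for the GP model, a grid search replaces a proper optimizer of the acquisition function, and the constant eta2 plays the role of η_t² in (2). This is an illustrative one-dimensional sketch, not the implementation from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

def bo_minimize_discrepancy(run_simulation, bounds, n_init=10, n_iter=40, eta2=4.0, seed=0):
    """Bayesian optimization sketch: model the discrepancy with a GP and
    evaluate next where the lower confidence bound A_t(theta) is smallest."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    thetas = rng.uniform(lo, hi, size=n_init)            # initial design drawn uniformly
    deltas = np.array([run_simulation(t) for t in thetas])
    kernel = ConstantKernel(1.0) * RBF(length_scale=0.1 * (hi - lo)) + WhiteKernel(1e-2)
    grid = np.linspace(lo, hi, 500)
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(
            thetas.reshape(-1, 1), deltas)
        mu, sd = gp.predict(grid.reshape(-1, 1), return_std=True)
        acq = mu - np.sqrt(eta2) * sd                     # lower confidence bound A_t(theta)
        theta_next = grid[np.argmin(acq)]                 # exploitation/exploration trade-off
        thetas = np.append(thetas, theta_next)            # run the simulator at the new point
        deltas = np.append(deltas, run_simulation(theta_next))
    return thetas, deltas, gp
```

Here run_simulation(θ) would simulate data at θ and return the discrepancy to the observed data; the returned GP is the estimated model of the discrepancy used in the recipe below.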

Recipe for fast likelihood-free inference

1. Estimate a model of the discrepancy Δ using Bayesian optimization.
2. Choose a threshold ε to obtain the likelihood approximation L̂(θ) ∝ P̂r(Δ ≤ ε | θ).
3. Perform MLE or posterior inference with any standard method, using L̂ in place of the true likelihood function.
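For step 3, any standard method can be used. As one possibility, a minimal random-walk Metropolis sketch targeting prior(θ) · L̂(θ); the log-prior and log-L̂ are passed in as callables (for instance built from the approx_likelihood and GP sketches above), and all names here are illustrative.

```python
import numpy as np

def metropolis_approx_posterior(log_prior, log_L_hat, theta0, n_samples=5000, step=0.1, seed=0):
    """Random-walk Metropolis targeting the approximate posterior
    proportional to prior(theta) * L_hat(theta)."""
    rng = np.random.default_rng(seed)
    theta = theta0
    logp = log_prior(theta) + log_L_hat(theta)
    samples = []
    for _ in range(n_samples):
        prop = theta + step * rng.standard_normal()       # Gaussian random-walk proposal
        logp_prop = log_prior(prop) + log_L_hat(prop)
        if np.log(rng.uniform()) < logp_prop - logp:       # Metropolis accept/reject
            theta, logp = prop, logp_prop
        samples.append(theta)
    return np.array(samples)
```

Because L̂ is cheap to evaluate once the GP is fitted, this sampling step requires no further runs of the simulator.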

Example: Bacterial infections in child care centers (Numminen et al, 2013), revisited

As before: the likelihood is intractable for cross-sectional data, but generating data from the model is possible. Parameters of interest: β (rate of infections within a DCC), Λ (rate of infections from outside), θ (competition between the strains).

Inference results

Comparison of the proposed approach with a population Monte Carlo (PMC) ABC approach:
- Roughly equal results using 1000 times fewer simulations.
- The minimizer of the regression function under the model does not involve choosing a threshold ε.

[Figure: estimates of β versus log10 number of simulated data sets, for the minimizer of the regression function, the model-based posterior, the PMC-ABC posterior, and the final result of Numminen et al. Posterior means are shown as solid lines with markers, credibility intervals as shaded areas or dashed lines.]

Inference results

Comparison of the model-based approach with a population Monte Carlo (PMC) ABC approach.

[Figure: estimates of Λ and θ versus log10 number of simulated data sets, for the minimizer of the regression function, the model-based posterior, the PMC-ABC posterior, and the final result of Numminen et al. Posterior means are shown as solid lines with markers, credibility intervals as shaded areas or dashed lines.]

Further benefits

- Enables inference for models which were out of reach until now, e.g. a model of evolution where simulating a single data set took us 12-24 hours (Marttinen et al, 2015).
- Allowed us to perform a far more comprehensive data analysis than with the standard approach (Numminen et al, 2016).
- The estimated L̂(θ) can be used to assess parameter identifiability for complex models, e.g. a model of the transmission dynamics of tuberculosis (Lintusaari et al, 2016).
- For point estimation, minimize Ê(Δ | θ); no thresholds required.

Some open questions

- Modeling of the discrepancy: the vanilla GP model worked surprisingly well, but there are likely more suitable models.
- Exploration/exploitation trade-off: can we find strategies which are optimal for parameter inference?

Summary

- Problem considered: the computational cost of likelihood-free inference.
- Proposed approach: combine optimization with modeling of the discrepancy between simulated and observed data.
- Outcome: the approach increases the efficiency of the inference by several orders of magnitude.

The talk was on approximate Bayesian computation with uniform kernels. For other kernels and the synthetic likelihood, see:
M.U. Gutmann and J. Corander. Bayesian Optimization for Likelihood-Free Inference of Simulator-Based Statistical Models. Journal of Machine Learning Research, in press. http://arxiv.org/abs/1501.03291

Appendix

- Ricker model
- Details of the bacterial transmission model

Application to parameter inference in chaotic systems

- Data: time series with counts y_t (animal population size)
- Simulator-based model: a stochastic version of the Ricker map followed by an observation model,
    log N_t = log r + log N_{t−1} − N_{t−1} + σ e_t,   e_t ~ N(0, 1),
    y_t | N_t, φ ~ Poisson(φ N_t).
- Parameters θ: log r (growth rate), σ (standard deviation of the process noise), φ (scale parameter)

[Figure: example time series y_t for t = 1, ..., 50, generated with θ_o = (3.8, 0.3, 10)]
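For concreteness, a small sketch of the stochastic Ricker simulator above; the function name, argument names, and the initial population n0 are illustrative choices, not taken from the slides.

```python
import numpy as np

def simulate_ricker(log_r, sigma, phi, n_steps=50, n0=1.0, rng=None):
    """Simulate the stochastic Ricker map with Poisson observations:
    log N_t = log r + log N_{t-1} - N_{t-1} + sigma * e_t,  y_t ~ Poisson(phi * N_t)."""
    rng = np.random.default_rng(rng)
    log_n = np.log(n0)
    y = np.empty(n_steps, dtype=int)
    for t in range(n_steps):
        log_n = log_r + log_n - np.exp(log_n) + sigma * rng.standard_normal()
        y[t] = rng.poisson(phi * np.exp(log_n))   # observed count at time t
    return y

# Example data with theta_o = (log r, sigma, phi) = (3.8, 0.3, 10), as on the slide.
y_obs = simulate_ricker(3.8, 0.3, 10, rng=1)
print(y_obs[:10])
```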

Application to parameter inference in chaotic systems

- Speed-up: 600 times fewer evaluations of the distance function.
- Slight shift in the posterior mean towards the data generating parameter θ_o.

[Figure: estimated posterior pdfs of log r and φ for the model-based approach after 150, 200, 400, and 500 simulator evaluations, compared with the prior and with MCMC results using 100000 evaluations (Wood, Nature, 2010)]

Application to parameter inference in chaotic systems (continued)

[Figure: estimated posterior pdf of σ for the model-based approach after 150, 200, 400, and 500 simulator evaluations, compared with the prior and with MCMC results using 100000 evaluations (Wood, Nature, 2010)]

Bacterial transmission model (Numminen et al, 2013)

Latent continuous-time Markov chain for the transmissions inside a center:
  Pr(I_is^{t+h} = 0 | I_is^t = 1) = h + o(h)                                   (3)
  Pr(I_is^{t+h} = 1 | I_is'^t = 0 ∀ s') = R_s(t) h + o(h)                      (4)
  Pr(I_is^{t+h} = 1 | I_is^t = 0, ∃ s': I_is'^t = 1) = θ R_s(t) h + o(h)       (5)
  R_s(t) = β E_s(t) + Λ P_s                                                    (6)

- P_s: infections from outside the group (static)
- E_s(t) = 1/(N−1) Σ_i I_is^t / n_i(t): infections from within the group
- n_i(t) = Σ_s I_is^t: number of strains that individual i carries

Observation model: cross-sectional sampling at a random time.

Distance measure used

Summary statistics for each center:
- the diversity of the strains present
- the number of different strains present
- the proportion of infected individuals
- the proportion of individuals with more than one strain

Distance: distance between the empirical cumulative distribution functions (cdfs) of the four summary statistics.
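A sketch of how such summaries and a cdf-based distance might be computed from a binary colonization matrix (rows = individuals, columns = strains) per center. The Simpson-type diversity index and the L1 norm between cdfs are assumptions for illustration; the slides do not specify these details.

```python
import numpy as np

def center_summaries(colonization):
    """Four summary statistics for one center from a binary colonization matrix."""
    counts = colonization.sum(axis=0)                    # carriers per strain
    total = counts.sum()
    freqs = counts / total if total > 0 else counts
    diversity = 1.0 - np.sum(freqs ** 2)                 # Simpson-type diversity (illustrative choice)
    n_strains = np.count_nonzero(counts)                 # number of different strains present
    per_individual = colonization.sum(axis=1)
    prop_infected = np.mean(per_individual > 0)          # proportion of infected individuals
    prop_multi = np.mean(per_individual > 1)             # proportion with more than one strain
    return np.array([diversity, n_strains, prop_infected, prop_multi])

def cdf_distance(stats_obs, stats_sim):
    """L1 distance between the empirical cdfs (across centers) of each summary
    statistic, summed over the four statistics (one possible choice of norm)."""
    total = 0.0
    for j in range(stats_obs.shape[1]):
        a, b = np.sort(stats_obs[:, j]), np.sort(stats_sim[:, j])
        grid = np.union1d(a, b)
        cdf_a = np.searchsorted(a, grid, side="right") / len(a)
        cdf_b = np.searchsorted(b, grid, side="right") / len(b)
        total += np.mean(np.abs(cdf_a - cdf_b))
    return total
```

Here stats_obs and stats_sim would be arrays of shape (number of centers, 4), one row of summaries per observed or simulated center.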