Bayesian Inference by Density Ratio Estimation

Michael Gutmann
https://sites.google.com/site/michaelgutmann
Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh
20th June 2017

Task
Perform Bayesian inference for models where
1. the likelihood function is too costly to compute, but
2. sampling (simulating data) from the model is possible.

Program
- Background
- Previous work
- Proposed approach

Simulator-based models
Goal: inference for models that are specified by a mechanism for generating data, e.g. stochastic dynamical systems, or computer models/simulators of some complex physical or biological process. Such models occur in multiple and diverse scientific fields, and different communities use different names:
- Simulator-based models
- Stochastic simulation models
- Implicit models
- Generative (latent-variable) models
- Probabilistic programs

Examples
Simulator-based models are widely used:
- Evolutionary biology: simulating evolution
- Neuroscience: simulating neural circuits
- Ecology: simulating species migration
- Health science: simulating the spread of an infectious disease
[Figure: strain-by-individual matrices over time for the sampled individuals]

Definition of simulator-based models
Let (Ω, F, P) be a probability space. A simulator-based model is a collection of (measurable) functions g(·, θ) parametrised by θ:

\omega \in \Omega \;\mapsto\; x_\theta = g(\omega, \theta) \in X.   (1)

For any fixed θ, x_θ = g(·, θ) is a random variable; simulation/sampling amounts to evaluating g at a randomly drawn ω.
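As a concrete illustration, a minimal sketch of such a g in Python. The Gaussian location-scale model and all names here are invented for illustration; the talk does not specify this example.

```python
# A toy simulator-based model: g(omega, theta) maps randomness omega and
# parameters theta to data x_theta. (Illustrative assumption: a Gaussian
# location-scale model.)
import numpy as np

def simulator(theta, n=100, rng=None):
    """For fixed theta, x_theta = g(., theta) is a random variable."""
    rng = np.random.default_rng() if rng is None else rng
    omega = rng.standard_normal(n)          # omega ~ P, the underlying randomness
    return theta[0] + theta[1] * omega      # x_theta = g(omega, theta)

x = simulator(theta=(1.0, 2.0))             # sampling = running the simulator
```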

Implicit definition of the model pdfs
[Figure: a region A of the data space] The pdfs are only defined implicitly: for a region A of the data space, the model assigns the probability P(x_θ ∈ A) = P({ω : g(ω, θ) ∈ A}), but p(x | θ) itself is not available in closed form.

Advantages of simulator-based models
- Direct implementation of hypotheses of how the observed data were generated.
- Neat interface with scientific models (e.g. from physics or biology).
- Modelling by replicating the mechanisms of nature that produced the observed/measured data ("analysis by synthesis").
- Possibility to perform experiments in silico.

Disadvantages of simulator-based models
- Generally elude analytical treatment.
- Can easily be made more complicated than necessary.
- Statistical inference is difficult. Main reason: the likelihood function is intractable.

The likelihood function L(θ)
The probability that the model generates data like x_o when using parameter value θ. Generally well defined, but intractable for simulator-based/implicit models.
[Figure: a data source with unknown properties produces the observation; the model M(θ) mimics the data generation in the same data space]

Program
- Background
- Previous work
- Proposed approach

Three foundational issues
1. How should we assess whether x_θ ≈ x_o?
2. How should we compute the probability of the event x_θ ≈ x_o?
3. For which values of θ should we compute it?
[Figure: the data-generation schematic from above] Likelihood: the probability that the model generates data like x_o for parameter value θ.

Approximate Bayesian computation
1. How should we assess whether x_θ ≈ x_o? Check whether ||T(x_θ) − T(x_o)|| ≤ ε.
2. How should we compute the probability of the event x_θ ≈ x_o? By counting.
3. For which values of θ should we compute it? Sample from the prior (or other proposal distributions).
Difficulties: the choice of T(·) and ε; typically high computational cost.
For a recent review, see Lintusaari et al. (2017), "Fundamentals and recent developments in approximate Bayesian computation", Systematic Biology.
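The three answers above amount to rejection ABC. A minimal sketch, assuming a `simulator` with the signature used earlier and placeholder choices of summary statistic `T`, tolerance `eps`, and prior sampler; none of these specifics are from the talk.

```python
# Rejection ABC: accept theta whenever the simulated summaries land within
# eps of the observed ones; the acceptance frequency "counts" the probability.
import numpy as np

def abc_rejection(x_obs, simulator, T, sample_prior, eps, n_sims=10_000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    t_obs, accepted = T(x_obs), []
    for _ in range(n_sims):
        theta = sample_prior(rng)                        # 3. propose from the prior
        x_theta = simulator(theta, rng=rng)              # simulate x_theta
        if np.linalg.norm(T(x_theta) - t_obs) <= eps:    # 1. ||T(x_theta) - T(x_o)|| <= eps
            accepted.append(theta)                       # 2. probability by counting
    return np.asarray(accepted)                          # draws from the ABC posterior
```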

Synthetic likelihood (Simon Wood, Nature, 2010)
1.-2. How should we assess whether x_θ ≈ x_o, and how should we compute the probability of that event? Compute summary statistics t_θ = T(x_θ), model their distribution as a Gaussian, and compute the likelihood function with T(x_o) as the observed data.
3. For which values of θ should we compute it? Use the obtained synthetic likelihood function as part of a Monte Carlo method.
Difficulties: the choice of T(·); the Gaussianity assumption may not hold; typically high computational cost.
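A minimal sketch of the synthetic log-likelihood at a single θ, again with `simulator` and a vector-valued `T` as placeholders; covariance regularisation and other practical details are omitted.

```python
# Synthetic likelihood (Wood, 2010): fit a Gaussian to simulated summary
# statistics at theta and evaluate the observed summaries under it.
import numpy as np
from scipy.stats import multivariate_normal

def synthetic_loglik(theta, x_obs, simulator, T, n_sims=200, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    t_sims = np.array([T(simulator(theta, rng=rng)) for _ in range(n_sims)])
    mu = t_sims.mean(axis=0)                   # Gaussian model of t_theta = T(x_theta)
    cov = np.cov(t_sims, rowvar=False)
    return multivariate_normal.logpdf(T(x_obs), mean=mu, cov=cov)
```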

Program
- Background
- Previous work
- Proposed approach

Overview of some of my work
1. How should we assess whether x_θ ≈ x_o? Use classification (Gutmann et al., 2014, 2017).
2. How should we compute the probability of the event x_θ ≈ x_o? Use density ratio estimation (Dutta et al., 2016, arXiv:1611.10242).
3. For which values of θ should we compute it? Use Bayesian optimisation (Gutmann and Corander, 2013-2016); compared to standard approaches, a speed-up by a factor of 1000 or more.

Basic idea (Dutta et al., 2016, arXiv:1611.10242)
Frame posterior estimation as a ratio estimation problem:

p(\theta \mid x) = \frac{p(\theta)\, p(x \mid \theta)}{p(x)} = p(\theta)\, r(x, \theta),   (2)
r(x, \theta) = \frac{p(x \mid \theta)}{p(x)}.   (3)

Estimating r(x, θ) is the difficult part, since p(x | θ) is unknown. An estimate r̂(x, θ) yields estimates of the likelihood function and the posterior:

\hat{L}(\theta) \propto \hat{r}(x_o, \theta), \qquad \hat{p}(\theta \mid x_o) = p(\theta)\, \hat{r}(x_o, \theta).   (4)

Estimating density ratios in general
- A relatively well-studied problem (textbook by Sugiyama et al., 2012).
- The Bregman divergence provides a general framework (Gutmann and Hirayama, 2011; Sugiyama et al., 2011).
- Here: density ratio estimation by logistic regression.

Density ratio estimation by logistic regression
Samples from two data sets:

x_i^{(1)} \sim p^{(1)}, \quad i = 1, \ldots, n^{(1)},   (5)
x_i^{(2)} \sim p^{(2)}, \quad i = 1, \ldots, n^{(2)}.   (6)

The probability that a test data point x was sampled from p^{(1)}:

P(x \sim p^{(1)} \mid x, h) = \frac{1}{1 + \nu \exp(-h(x))}, \qquad \nu = \frac{n^{(2)}}{n^{(1)}}.   (7)

Density ratio estimation by logistic regression
Estimate h by minimising

J(h) = \frac{1}{n} \left\{ \sum_{i=1}^{n^{(1)}} \log\left[1 + \nu \exp\left(-h_i^{(1)}\right)\right] + \sum_{i=1}^{n^{(2)}} \log\left[1 + \frac{1}{\nu} \exp\left(h_i^{(2)}\right)\right] \right\},

where h_i^{(1)} = h(x_i^{(1)}), h_i^{(2)} = h(x_i^{(2)}), and n = n^{(1)} + n^{(2)}. The objective is the rescaled, negated log-likelihood. For large n^{(1)} and n^{(2)},

\hat{h} = \arg\min_h J(h) = \log \frac{p^{(1)}}{p^{(2)}}

without any constraints on h.
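A minimal numerical check of this property, assuming two 1-d Gaussians chosen for illustration and equal sample sizes, so that ν = 1 and the classifier's log-odds equal h(x) directly:

```python
# Logistic regression between samples from p1 = N(0,1) and p2 = N(1,1):
# the fitted log-odds h(x) should approach log p1(x)/p2(x) = 0.5 - x.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000                                     # n1 = n2, hence nu = 1
x1 = rng.normal(0.0, 1.0, n)                   # draws from p1
x2 = rng.normal(1.0, 1.0, n)                   # draws from p2

X = np.concatenate([x1, x2]).reshape(-1, 1)    # log p1/p2 is linear in x here;
y = np.concatenate([np.ones(n), np.zeros(n)])  # richer ratios need richer features
clf = LogisticRegression().fit(X, y)           # label 1 = "drawn from p1"

x_test = np.array([[0.5]])
h_hat = clf.decision_function(x_test)[0]       # estimated log-ratio at x = 0.5
h_true = 0.5 - x_test[0, 0]                    # exact log-ratio, equals 0 here
print(h_hat, h_true)
```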

Estimating the posterior
- This property was used to estimate unnormalised models (Gutmann and Hyvärinen, 2010, 2012).
- It was used to estimate likelihood ratios (Pham et al., 2014; Cranmer et al., 2015).
- For posterior estimation, we use the data-generating pdf p(x | θ) for p^{(1)} and the marginal p(x) for p^{(2)} (other choices for p(x) are possible too); the sample sizes are entirely under our control.

Estimating the posterior
Logistic regression gives (point-wise in θ)

\hat{h}(x, \theta) \approx \log \frac{p(x \mid \theta)}{p(x)} = \log r(x, \theta).   (8)

Estimated posterior and likelihood function:

\hat{p}(\theta \mid x_o) = p(\theta) \exp(\hat{h}(x_o, \theta)), \qquad \hat{L}(\theta) \propto \exp(\hat{h}(x_o, \theta)).   (9)

Estimating the posterior
[Diagram: parameters θ_1, θ_2, …, θ_{n_m} drawn from the prior p(θ) are pushed through the model p(x | θ) to give marginal samples x_1^m, x_2^m, …, x_{n_m}^m (the set X^m); simulating at the θ of interest gives x_1^θ, x_2^θ, …, x_{n_θ}^θ (the set X^θ). Logistic regression between X^θ and X^m, ĥ = argmin_h J(h, θ), yields the log-ratio ĥ(x, θ); evaluating it at the observed data x_0 gives p̂(θ | x_0) = p(θ) exp(ĥ(x_0, θ)).] (Dutta et al., 2016, arXiv:1611.10242)
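Putting the diagram into code, point-wise in θ: a sketch assuming the `simulator` signature from earlier, a feature map `psi`, pre-simulated marginal data `marginal_xs`, and a `prior_pdf`; all of these names are illustrative. The L1 penalty anticipates the summary-statistic selection discussed later.

```python
# One evaluation of the estimated posterior p_hat(theta | x_obs):
# classify simulated data from p(x | theta) against data from the marginal p(x).
import numpy as np
from sklearn.linear_model import LogisticRegression

def posterior_at(theta, x_obs, simulator, psi, marginal_xs, prior_pdf,
                 n_sims=500, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # class 1: X^theta from p(x | theta); class 0: X^m from the marginal p(x),
    # i.e. data simulated at parameters drawn from the prior.
    X_theta = np.array([psi(simulator(theta, rng=rng)) for _ in range(n_sims)])
    X_marg = np.array([psi(x) for x in marginal_xs[:n_sims]])  # equal sizes: nu = 1
    X = np.vstack([X_theta, X_marg])
    y = np.r_[np.ones(len(X_theta)), np.zeros(len(X_marg))]
    clf = LogisticRegression(penalty='l1', solver='liblinear').fit(X, y)
    h_obs = clf.decision_function(psi(x_obs).reshape(1, -1))[0]  # h(x_0, theta)
    return prior_pdf(theta) * np.exp(h_obs)     # unnormalised p_hat(theta | x_0)
```

Sweeping `posterior_at` over a grid of θ values and normalising then traces out the estimated posterior.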

Auxiliary model
We need to specify a model for h. For simplicity, a linear model:

h(x) = \sum_{i=1}^{b} \beta_i \psi_i(x) = \beta^\top \psi(x),   (10)

where the ψ_i(x) are summary statistics. More complex models are possible.

Exponential family approximation
Logistic regression yields

\hat{h}(x; \theta) = \hat{\beta}(\theta)^\top \psi(x), \qquad \hat{r}(x, \theta) = \exp(\hat{\beta}(\theta)^\top \psi(x)).   (11)

Resulting posterior:

\hat{p}(\theta \mid x_o) = p(\theta) \exp(\hat{\beta}(\theta)^\top \psi(x_o)).   (12)

This is an implicit exponential family approximation of p(x | θ):

\hat{r}(x, \theta) = \frac{\hat{p}(x \mid \theta)}{\hat{p}(x)},   (13)
\hat{p}(x \mid \theta) = \hat{p}(x) \exp(\hat{\beta}(\theta)^\top \psi(x)),   (14)

implicit because p̂(x) is never explicitly constructed.

Remarks
- The vector of summary statistics ψ(x) should include a constant for the normalisation of the pdf (the log partition function); the normalising constant is then estimated via the logistic regression.
- The simple linear model leads to a generalisation of synthetic likelihood.
- An L1 penalty on β allows weighing and selecting summary statistics.

Application to the ARCH model
Model:

x^{(t)} = \theta_1 x^{(t-1)} + e^{(t)},   (15)
e^{(t)} = \xi^{(t)} \sqrt{0.2 + \theta_2 \left(e^{(t-1)}\right)^2},   (16)

with ξ^{(t)} and e^{(0)} independent standard normal random variables, x^{(0)} = 0, and 100 time points. Parameters: θ_1 ∈ (−1, 1), θ_2 ∈ (0, 1), with a uniform prior on θ_1, θ_2.
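A direct simulation of equations (15)-(16); reading the garbled variance recursion as the standard ARCH(1) form √(0.2 + θ_2 e²) is an assumption of this sketch.

```python
# Simulate the ARCH(1)-type model of eqs. (15)-(16).
import numpy as np

def simulate_arch(theta1, theta2, T=100, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(T + 1)                        # x(0) = 0
    e = rng.standard_normal()                  # e(0) ~ N(0, 1)
    for t in range(1, T + 1):
        xi = rng.standard_normal()             # xi(t) ~ N(0, 1)
        e = xi * np.sqrt(0.2 + theta2 * e**2)  # eq. (16)
        x[t] = theta1 * x[t - 1] + e           # eq. (15)
    return x[1:]                               # the 100 observed time points
```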

Application to the ARCH model
Summary statistics:
- the autocorrelations at lags one to five,
- all (unique) pairwise combinations of them,
- a constant.
To check robustness: 50% irrelevant summary statistics (drawn from a standard normal). Comparison with synthetic likelihood using an equivalent set of summary statistics (relevant summary statistics only).
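A sketch of this feature map ψ(x); interpreting "pairwise combinations" as products of distinct pairs is an assumption, and the irrelevant noise statistics would simply be appended to the returned vector.

```python
# psi(x) for the ARCH example: a constant (absorbing the log partition
# function), autocorrelations at lags 1-5, and their pairwise products.
import numpy as np
from itertools import combinations

def psi_arch(x):
    xc = x - x.mean()
    acf = np.array([np.dot(xc[k:], xc[:-k]) / np.dot(xc, xc) for k in range(1, 6)])
    pairs = np.array([acf[i] * acf[j] for i, j in combinations(range(5), 2)])
    return np.concatenate([[1.0], acf, pairs])   # 1 + 5 + 10 = 16 statistics
```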

Example posterior
[Figure: true and estimated posteriors over θ_1 ∈ (−1, 1), θ_2 ∈ (0, 1); panel (a): synthetic likelihood; panel (b): proposed method]

Example posterior, subject to noise
[Figure: as above but with the irrelevant (noise) summary statistics included; panel (c): synthetic likelihood; panel (d): proposed method]

Systematic analysis
Jensen-Shannon divergence (JSD) between the estimated and the true posterior; point-wise comparison with synthetic likelihood over 100 data sets:

ΔJSD = JSD for proposed method − JSD for synthetic likelihood.

[Figure: histograms of ΔJSD with and without the noise statistics; the mass splits 82%/18% without noise and 83%/17% with noise, with negative values favouring the proposed method]

Application to the Ricker model
Model:

x^{(t)} \mid N^{(t)}, \phi \sim \mathrm{Poisson}(\phi N^{(t)}),   (17)
\log N^{(t)} = \log r + \log N^{(t-1)} - N^{(t-1)} + \sigma e^{(t)},   (18)
t = 1, \ldots, 50, \quad \log N^{(0)} = 0.   (19)

Parameters and priors:
- log growth rate: log r ∼ U(3, 5)
- scaling parameter: φ ∼ U(5, 15)
- standard deviation: σ ∼ U(0, 0.6)
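A direct simulation of equations (17)-(19); reading the initial condition as log N^{(0)} = 0 is an assumption of this sketch.

```python
# Simulate the Ricker model of eqs. (17)-(19).
import numpy as np

def simulate_ricker(log_r, phi, sigma, T=50, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    log_N = 0.0                                            # log N(0) = 0 (assumed)
    x = np.empty(T, dtype=np.int64)
    for t in range(T):
        e = rng.standard_normal()                          # e(t) ~ N(0, 1)
        log_N = log_r + log_N - np.exp(log_N) + sigma * e  # eq. (18)
        x[t] = rng.poisson(phi * np.exp(log_N))            # eq. (17)
    return x                                               # observed counts
```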

Application to the Ricker model
- Summary statistics: the same as in Simon Wood (Nature, 2010).
- 100 inference problems; for each problem, the relative errors in the posterior means were computed.
- Point-wise comparison with synthetic likelihood.

Results for log r

Δrel-error = relative error of proposed method − relative error of synthetic likelihood.

[Figure: histogram of Δrel-error across the 100 problems; the mass splits 67%/33%, with negative values favouring the proposed method]

Results for σ
[Figure: histogram of Δrel-error (as defined above); 59%/41% split]

Results for φ
[Figure: histogram of Δrel-error (as defined above); 55%/45% split]

Observations
- Compared two auxiliary models: the exponential family vs the Gaussian family.
- For the same summary statistics, the richer exponential family model typically gave more accurate inferences.
- Robustness to irrelevant summary statistics, thanks to the L1 regularisation.

Conclusions
Background and previous work on inference with simulator-based/implicit statistical models.
Our work:
- framing the posterior estimation problem as a density ratio estimation problem,
- estimating the ratio with logistic regression,
- using regularisation to automatically select summary statistics.
A multitude of research possibilities:
- the choice of the auxiliary model,
- the choice of the loss function used to estimate the density ratio,
- combining with the Bayesian optimisation framework to reduce computational cost.
More results and details in arXiv:1611.10242v1.