Bayesian Inference by Density Ratio Estimation

Michael Gutmann
https://sites.google.com/site/michaelgutmann
Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh
20th June 2017

Task
Perform Bayesian inference for models where
1. the likelihood function is too costly to compute, but
2. sampling (simulating data) from the model is possible.

Program
- Background
- Previous work
- Proposed approach

Simulator-based models
Goal: inference for models that are specified by a mechanism for generating data, e.g. stochastic dynamical systems, or computer models/simulators of some complex physical or biological process. Such models occur in multiple and diverse scientific fields, and different communities use different names:
- Simulator-based models
- Stochastic simulation models
- Implicit models
- Generative (latent-variable) models
- Probabilistic programs

Examples
Simulator-based models are widely used:
- Evolutionary biology: simulating evolution
- Neuroscience: simulating neural circuits
- Ecology: simulating species migration
- Health science: simulating the spread of an infectious disease
[Figure: strain-by-individual matrices over time for the sampled individuals]

Definition of simulator-based models
Let (Ω, F, P) be a probability space. A simulator-based model is a collection of (measurable) functions g(·, θ) parametrised by θ:

\omega \in \Omega \;\mapsto\; x_\theta = g(\omega, \theta) \in X.   (1)

For any fixed θ, x_θ = g(·, θ) is a random variable; simulation/sampling amounts to evaluating g at a randomly drawn ω.
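As a concrete illustration, a minimal sketch of such a g in Python. The Gaussian location-scale model and all names here are invented for illustration; the talk does not specify this example.

```python
# A toy simulator-based model: g(omega, theta) maps randomness omega and
# parameters theta to data x_theta. (Illustrative assumption: a Gaussian
# location-scale model.)
import numpy as np

def simulator(theta, n=100, rng=None):
    """For fixed theta, x_theta = g(., theta) is a random variable."""
    rng = np.random.default_rng() if rng is None else rng
    omega = rng.standard_normal(n)          # omega ~ P, the underlying randomness
    return theta[0] + theta[1] * omega      # x_theta = g(omega, theta)

x = simulator(theta=(1.0, 2.0))             # sampling = running the simulator
```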

Implicit definition of the model pdfs
[Figure: a region A of the data space] The pdfs are only defined implicitly: for a region A of the data space, the model assigns the probability P(x_θ ∈ A) = P({ω : g(ω, θ) ∈ A}), but p(x | θ) itself is not available in closed form.

Advantages of simulator-based models
- Direct implementation of hypotheses of how the observed data were generated.
- Neat interface with scientific models (e.g. from physics or biology).
- Modelling by replicating the mechanisms of nature that produced the observed/measured data ("analysis by synthesis").
- Possibility to perform experiments in silico.

Disadvantages of simulator-based models
- Generally elude analytical treatment.
- Can easily be made more complicated than necessary.
- Statistical inference is difficult. Main reason: the likelihood function is intractable.

The likelihood function L(θ)
The probability that the model generates data like x_o when using parameter value θ. Generally well defined, but intractable for simulator-based/implicit models.
[Figure: a data source with unknown properties produces the observation; the model M(θ) mimics the data generation in the same data space]

Program
- Background
- Previous work
- Proposed approach

Three foundational issues
1. How should we assess whether x_θ ≈ x_o?
2. How should we compute the probability of the event x_θ ≈ x_o?
3. For which values of θ should we compute it?
[Figure: the data-generation schematic from above] Likelihood: the probability that the model generates data like x_o for parameter value θ.

Approximate Bayesian computation
1. How should we assess whether x_θ ≈ x_o? Check whether ||T(x_θ) − T(x_o)|| ≤ ε.
2. How should we compute the probability of the event x_θ ≈ x_o? By counting.
3. For which values of θ should we compute it? Sample from the prior (or other proposal distributions).
Difficulties: the choice of T(·) and ε; typically high computational cost.
For a recent review, see Lintusaari et al. (2017), "Fundamentals and recent developments in approximate Bayesian computation", Systematic Biology.
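The three answers above amount to rejection ABC. A minimal sketch, assuming a `simulator` with the signature used earlier and placeholder choices of summary statistic `T`, tolerance `eps`, and prior sampler; none of these specifics are from the talk.

```python
# Rejection ABC: accept theta whenever the simulated summaries land within
# eps of the observed ones; the acceptance frequency "counts" the probability.
import numpy as np

def abc_rejection(x_obs, simulator, T, sample_prior, eps, n_sims=10_000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    t_obs, accepted = T(x_obs), []
    for _ in range(n_sims):
        theta = sample_prior(rng)                        # 3. propose from the prior
        x_theta = simulator(theta, rng=rng)              # simulate x_theta
        if np.linalg.norm(T(x_theta) - t_obs) <= eps:    # 1. ||T(x_theta) - T(x_o)|| <= eps
            accepted.append(theta)                       # 2. probability by counting
    return np.asarray(accepted)                          # draws from the ABC posterior
```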

Synthetic likelihood (Simon Wood, Nature, 2010)
1.-2. How should we assess whether x_θ ≈ x_o, and how should we compute the probability of that event? Compute summary statistics t_θ = T(x_θ), model their distribution as a Gaussian, and compute the likelihood function with T(x_o) as the observed data.
3. For which values of θ should we compute it? Use the obtained synthetic likelihood function as part of a Monte Carlo method.
Difficulties: the choice of T(·); the Gaussianity assumption may not hold; typically high computational cost.
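A minimal sketch of the synthetic log-likelihood at a single θ, again with `simulator` and a vector-valued `T` as placeholders; covariance regularisation and other practical details are omitted.

```python
# Synthetic likelihood (Wood, 2010): fit a Gaussian to simulated summary
# statistics at theta and evaluate the observed summaries under it.
import numpy as np
from scipy.stats import multivariate_normal

def synthetic_loglik(theta, x_obs, simulator, T, n_sims=200, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    t_sims = np.array([T(simulator(theta, rng=rng)) for _ in range(n_sims)])
    mu = t_sims.mean(axis=0)                   # Gaussian model of t_theta = T(x_theta)
    cov = np.cov(t_sims, rowvar=False)
    return multivariate_normal.logpdf(T(x_obs), mean=mu, cov=cov)
```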

Program
- Background
- Previous work
- Proposed approach

Overview of some of my work
1. How should we assess whether x_θ ≈ x_o? Use classification (Gutmann et al., 2014, 2017).
2. How should we compute the probability of the event x_θ ≈ x_o? Use density ratio estimation (Dutta et al., 2016, arXiv:1611.10242).
3. For which values of θ should we compute it? Use Bayesian optimisation (Gutmann and Corander, 2013-2016); compared to standard approaches, a speed-up by a factor of 1000 or more.

Basic idea (Dutta et al., 2016, arXiv:1611.10242)
Frame posterior estimation as a ratio estimation problem:

p(\theta \mid x) = \frac{p(\theta)\, p(x \mid \theta)}{p(x)} = p(\theta)\, r(x, \theta),   (2)
r(x, \theta) = \frac{p(x \mid \theta)}{p(x)}.   (3)

Estimating r(x, θ) is the difficult part, since p(x | θ) is unknown. An estimate r̂(x, θ) yields estimates of the likelihood function and the posterior:

\hat{L}(\theta) \propto \hat{r}(x_o, \theta), \qquad \hat{p}(\theta \mid x_o) = p(\theta)\, \hat{r}(x_o, \theta).   (4)

Estimating density ratios in general
- A relatively well-studied problem (textbook by Sugiyama et al., 2012).
- The Bregman divergence provides a general framework (Gutmann and Hirayama, 2011; Sugiyama et al., 2011).
- Here: density ratio estimation by logistic regression.

Density ratio estimation by logistic regression
Samples from two data sets:

x_i^{(1)} \sim p^{(1)}, \quad i = 1, \ldots, n^{(1)},   (5)
x_i^{(2)} \sim p^{(2)}, \quad i = 1, \ldots, n^{(2)}.   (6)

The probability that a test data point x was sampled from p^{(1)}:

P(x \sim p^{(1)} \mid x, h) = \frac{1}{1 + \nu \exp(-h(x))}, \qquad \nu = \frac{n^{(2)}}{n^{(1)}}.   (7)

Density ratio estimation by logistic regression
Estimate h by minimising

J(h) = \frac{1}{n} \left\{ \sum_{i=1}^{n^{(1)}} \log\left[1 + \nu \exp\left(-h_i^{(1)}\right)\right] + \sum_{i=1}^{n^{(2)}} \log\left[1 + \frac{1}{\nu} \exp\left(h_i^{(2)}\right)\right] \right\},

where h_i^{(1)} = h(x_i^{(1)}), h_i^{(2)} = h(x_i^{(2)}), and n = n^{(1)} + n^{(2)}. The objective is the rescaled, negated log-likelihood. For large n^{(1)} and n^{(2)},

\hat{h} = \arg\min_h J(h) = \log \frac{p^{(1)}}{p^{(2)}}

without any constraints on h.
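A minimal numerical check of this property, assuming two 1-d Gaussians chosen for illustration and equal sample sizes, so that ν = 1 and the classifier's log-odds equal h(x) directly:

```python
# Logistic regression between samples from p1 = N(0,1) and p2 = N(1,1):
# the fitted log-odds h(x) should approach log p1(x)/p2(x) = 0.5 - x.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000                                     # n1 = n2, hence nu = 1
x1 = rng.normal(0.0, 1.0, n)                   # draws from p1
x2 = rng.normal(1.0, 1.0, n)                   # draws from p2

X = np.concatenate([x1, x2]).reshape(-1, 1)    # log p1/p2 is linear in x here;
y = np.concatenate([np.ones(n), np.zeros(n)])  # richer ratios need richer features
clf = LogisticRegression().fit(X, y)           # label 1 = "drawn from p1"

x_test = np.array([[0.5]])
h_hat = clf.decision_function(x_test)[0]       # estimated log-ratio at x = 0.5
h_true = 0.5 - x_test[0, 0]                    # exact log-ratio, equals 0 here
print(h_hat, h_true)
```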

Estimating the posterior
- This property was used to estimate unnormalised models (Gutmann and Hyvärinen, 2010, 2012).
- It was used to estimate likelihood ratios (Pham et al., 2014; Cranmer et al., 2015).
- For posterior estimation, we use the data-generating pdf p(x | θ) for p^{(1)} and the marginal p(x) for p^{(2)} (other choices for p(x) are possible too); the sample sizes are entirely under our control.

Estimating the posterior
Logistic regression gives (point-wise in θ)

\hat{h}(x, \theta) \approx \log \frac{p(x \mid \theta)}{p(x)} = \log r(x, \theta).   (8)

Estimated posterior and likelihood function:

\hat{p}(\theta \mid x_o) = p(\theta) \exp(\hat{h}(x_o, \theta)), \qquad \hat{L}(\theta) \propto \exp(\hat{h}(x_o, \theta)).   (9)

Estimating the posterior
[Diagram: parameters θ_1, θ_2, …, θ_{n_m} drawn from the prior p(θ) are pushed through the model p(x | θ) to give marginal samples x_1^m, x_2^m, …, x_{n_m}^m (the set X^m); simulating at the θ of interest gives x_1^θ, x_2^θ, …, x_{n_θ}^θ (the set X^θ). Logistic regression between X^θ and X^m, ĥ = argmin_h J(h, θ), yields the log-ratio ĥ(x, θ); evaluating it at the observed data x_0 gives p̂(θ | x_0) = p(θ) exp(ĥ(x_0, θ)).] (Dutta et al., 2016, arXiv:1611.10242)
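Putting the diagram into code, point-wise in θ: a sketch assuming the `simulator` signature from earlier, a feature map `psi`, pre-simulated marginal data `marginal_xs`, and a `prior_pdf`; all of these names are illustrative. The L1 penalty anticipates the summary-statistic selection discussed later.

```python
# One evaluation of the estimated posterior p_hat(theta | x_obs):
# classify simulated data from p(x | theta) against data from the marginal p(x).
import numpy as np
from sklearn.linear_model import LogisticRegression

def posterior_at(theta, x_obs, simulator, psi, marginal_xs, prior_pdf,
                 n_sims=500, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # class 1: X^theta from p(x | theta); class 0: X^m from the marginal p(x),
    # i.e. data simulated at parameters drawn from the prior.
    X_theta = np.array([psi(simulator(theta, rng=rng)) for _ in range(n_sims)])
    X_marg = np.array([psi(x) for x in marginal_xs[:n_sims]])  # equal sizes: nu = 1
    X = np.vstack([X_theta, X_marg])
    y = np.r_[np.ones(len(X_theta)), np.zeros(len(X_marg))]
    clf = LogisticRegression(penalty='l1', solver='liblinear').fit(X, y)
    h_obs = clf.decision_function(psi(x_obs).reshape(1, -1))[0]  # h(x_0, theta)
    return prior_pdf(theta) * np.exp(h_obs)     # unnormalised p_hat(theta | x_0)
```

Sweeping `posterior_at` over a grid of θ values and normalising then traces out the estimated posterior.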

Auxiliary model
We need to specify a model for h. For simplicity, a linear model:

h(x) = \sum_{i=1}^{b} \beta_i \psi_i(x) = \beta^\top \psi(x),   (10)

where the ψ_i(x) are summary statistics. More complex models are possible.

Exponential family approximation
Logistic regression yields

\hat{h}(x; \theta) = \hat{\beta}(\theta)^\top \psi(x), \qquad \hat{r}(x, \theta) = \exp(\hat{\beta}(\theta)^\top \psi(x)).   (11)

Resulting posterior:

\hat{p}(\theta \mid x_o) = p(\theta) \exp(\hat{\beta}(\theta)^\top \psi(x_o)).   (12)

This is an implicit exponential family approximation of p(x | θ):

\hat{r}(x, \theta) = \frac{\hat{p}(x \mid \theta)}{\hat{p}(x)},   (13)
\hat{p}(x \mid \theta) = \hat{p}(x) \exp(\hat{\beta}(\theta)^\top \psi(x)),   (14)

implicit because p̂(x) is never explicitly constructed.

Remarks
- The vector of summary statistics ψ(x) should include a constant for the normalisation of the pdf (the log partition function); the normalising constant is then estimated via the logistic regression.
- The simple linear model leads to a generalisation of synthetic likelihood.
- An L1 penalty on β allows weighing and selecting summary statistics.

Application to the ARCH model
Model:

x^{(t)} = \theta_1 x^{(t-1)} + e^{(t)},   (15)
e^{(t)} = \xi^{(t)} \sqrt{0.2 + \theta_2 \left(e^{(t-1)}\right)^2},   (16)

with ξ^{(t)} and e^{(0)} independent standard normal random variables, x^{(0)} = 0, and 100 time points. Parameters: θ_1 ∈ (−1, 1), θ_2 ∈ (0, 1), with a uniform prior on θ_1, θ_2.
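A direct simulation of equations (15)-(16); reading the garbled variance recursion as the standard ARCH(1) form √(0.2 + θ_2 e²) is an assumption of this sketch.

```python
# Simulate the ARCH(1)-type model of eqs. (15)-(16).
import numpy as np

def simulate_arch(theta1, theta2, T=100, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(T + 1)                        # x(0) = 0
    e = rng.standard_normal()                  # e(0) ~ N(0, 1)
    for t in range(1, T + 1):
        xi = rng.standard_normal()             # xi(t) ~ N(0, 1)
        e = xi * np.sqrt(0.2 + theta2 * e**2)  # eq. (16)
        x[t] = theta1 * x[t - 1] + e           # eq. (15)
    return x[1:]                               # the 100 observed time points
```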

Application to the ARCH model
Summary statistics:
- the autocorrelations at lags one to five,
- all (unique) pairwise combinations of them,
- a constant.
To check robustness: 50% irrelevant summary statistics (drawn from a standard normal). Comparison with synthetic likelihood using an equivalent set of summary statistics (relevant summary statistics only).
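A sketch of this feature map ψ(x); interpreting "pairwise combinations" as products of distinct pairs is an assumption, and the irrelevant noise statistics would simply be appended to the returned vector.

```python
# psi(x) for the ARCH example: a constant (absorbing the log partition
# function), autocorrelations at lags 1-5, and their pairwise products.
import numpy as np
from itertools import combinations

def psi_arch(x):
    xc = x - x.mean()
    acf = np.array([np.dot(xc[k:], xc[:-k]) / np.dot(xc, xc) for k in range(1, 6)])
    pairs = np.array([acf[i] * acf[j] for i, j in combinations(range(5), 2)])
    return np.concatenate([[1.0], acf, pairs])   # 1 + 5 + 10 = 16 statistics
```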

Example posterior
[Figure: true and estimated posteriors over θ_1 ∈ (−1, 1), θ_2 ∈ (0, 1); panel (a): synthetic likelihood; panel (b): proposed method]

Example posterior, subject to noise
[Figure: as above but with the irrelevant (noise) summary statistics included; panel (c): synthetic likelihood; panel (d): proposed method]

Systematic analysis
Jensen-Shannon divergence (JSD) between the estimated and the true posterior; point-wise comparison with synthetic likelihood over 100 data sets:

ΔJSD = JSD for proposed method − JSD for synthetic likelihood.

[Figure: histograms of ΔJSD with and without the noise statistics; the mass splits 82%/18% without noise and 83%/17% with noise, with negative values favouring the proposed method]

Application to the Ricker model
Model:

x^{(t)} \mid N^{(t)}, \phi \sim \mathrm{Poisson}(\phi N^{(t)}),   (17)
\log N^{(t)} = \log r + \log N^{(t-1)} - N^{(t-1)} + \sigma e^{(t)},   (18)
t = 1, \ldots, 50, \quad \log N^{(0)} = 0.   (19)

Parameters and priors:
- log growth rate: log r ∼ U(3, 5)
- scaling parameter: φ ∼ U(5, 15)
- standard deviation: σ ∼ U(0, 0.6)
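A direct simulation of equations (17)-(19); reading the initial condition as log N^{(0)} = 0 is an assumption of this sketch.

```python
# Simulate the Ricker model of eqs. (17)-(19).
import numpy as np

def simulate_ricker(log_r, phi, sigma, T=50, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    log_N = 0.0                                            # log N(0) = 0 (assumed)
    x = np.empty(T, dtype=np.int64)
    for t in range(T):
        e = rng.standard_normal()                          # e(t) ~ N(0, 1)
        log_N = log_r + log_N - np.exp(log_N) + sigma * e  # eq. (18)
        x[t] = rng.poisson(phi * np.exp(log_N))            # eq. (17)
    return x                                               # observed counts
```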

Application to the Ricker model
- Summary statistics: the same as in Simon Wood (Nature, 2010).
- 100 inference problems; for each problem, the relative errors in the posterior means were computed.
- Point-wise comparison with synthetic likelihood.

Results for log r

Δrel-error = relative error of proposed method − relative error of synthetic likelihood.

[Figure: histogram of Δrel-error across the 100 problems; the mass splits 67%/33%, with negative values favouring the proposed method]

Results for σ
[Figure: histogram of Δrel-error (as defined above); 59%/41% split]

Results for φ
[Figure: histogram of Δrel-error (as defined above); 55%/45% split]

Observations
- Compared two auxiliary models: the exponential family vs the Gaussian family.
- For the same summary statistics, the richer exponential family model typically gave more accurate inferences.
- Robustness to irrelevant summary statistics, thanks to the L1 regularisation.

Conclusions
Background and previous work on inference with simulator-based/implicit statistical models.
Our work:
- framing the posterior estimation problem as a density ratio estimation problem,
- estimating the ratio with logistic regression,
- using regularisation to automatically select summary statistics.
A multitude of research possibilities:
- the choice of the auxiliary model,
- the choice of the loss function used to estimate the density ratio,
- combining with the Bayesian optimisation framework to reduce computational cost.
More results and details in arXiv:1611.10242v1.