A Review of Pseudo-Marginal Markov Chain Monte Carlo


Discussed by: Yizhe Zhang. October 21, 2016.

Outline: 1 Overview. 2 Paper review. 3 Experiments. 4 Conclusion.

Motivation & overview

Notation: θ denotes the parameters of interest; y denotes the observations.

Standard task of Markov chain Monte Carlo (MCMC): draw samples from the posterior

π(θ) ∝ p(y | θ) p(θ),  (1)

where the likelihood p(y | θ) can be evaluated pointwise.

Pseudo-marginal MCMC: p(y | θ) cannot easily be evaluated pointwise, but an unbiased estimator f̂(θ), with E[f̂(θ)] = p(y | θ), is available for any θ.
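To make the mechanics concrete, here is a minimal sketch of pseudo-marginal Metropolis-Hastings in Python (our own illustration, not code from the paper). The functions log_f_hat, log_prior, and propose are hypothetical placeholders; the essential point is that the log-estimate for the current state is stored and reused rather than recomputed.

```python
import numpy as np

def pm_mh(theta0, log_f_hat, log_prior, propose, n_iters, rng=None):
    """Pseudo-marginal Metropolis-Hastings (symmetric proposal assumed).

    log_f_hat(theta) returns the log of a non-negative unbiased estimate
    of the likelihood p(y | theta); its randomness is refreshed per call.
    """
    rng = rng or np.random.default_rng()
    theta = theta0
    log_lhat = log_f_hat(theta)                  # estimate for current state
    samples = []
    for _ in range(n_iters):
        theta_prop = propose(theta, rng)
        log_lhat_prop = log_f_hat(theta_prop)    # fresh estimate at proposal
        log_alpha = (log_lhat_prop + log_prior(theta_prop)
                     - log_lhat - log_prior(theta))
        if np.log(rng.uniform()) < log_alpha:
            theta, log_lhat = theta_prop, log_lhat_prop  # recycle estimate
        samples.append(theta)
    return np.asarray(samples)
```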

Examples

Latent variable models: let {X_n} denote latent variables indexed by n:

X_n ~ g_1(· | θ),  Y_n ~ g_2(· | X_n, θ),  (2)

p(Y_n | θ) = ∫_X g_1(X_n | θ) g_2(Y_n | X_n, θ) dX_n.  (3)

The integral above can be intractable.

Approximate Bayesian computation (ABC): V denotes an auxiliary variable, K(·, ·) : Y × Y → R_+ denotes a kernel, and Y' denotes simulated pseudo-observations:

V ~ g_1(·),  Y' = g_2(V, θ),  (4)

p(Y | θ) = ∫_Y K(Y, Y') p(Y' | θ) dY' = ∫ K(Y, g_2(V, θ)) p(V) dV.  (5)

Again, this likelihood can be intractable.

Examples

Doubly-intractable distributions: we know p(y | θ) up to a normalizing constant Z(θ), i.e., p(y | θ) = g(y; θ) / Z(θ).

Define f(θ) with π(θ) ∝ f(θ):

f(θ) = g(y; θ) p(θ) Z(θ̂) / Z(θ),  (6)

where θ̂ is a fixed parameter, so Z(θ̂) is a constant.

An unbiased estimator f̂(θ) can be obtained by importance sampling: since

Z(θ̂) / Z(θ) = ∫ [g(y'; θ̂) / g(y'; θ)] p(y' | θ) dy',

a single draw gives

f̂(θ) = g(y; θ) p(θ) g(y'; θ̂) / g(y'; θ),  (7)

where y' is a sample from p(y | θ) ∝ g(y; θ):

y' ~ p(· | θ).  (8)
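A hedged sketch of the single-sample estimator in (7)-(8). It assumes an exact sampler for p(· | θ) is available (e.g., by perfect simulation, as in Møller et al. (2006)); sample_exact, log_g, and log_prior are placeholder names.

```python
import numpy as np

def log_f_hat_doubly_intractable(y, theta, theta_hat, log_g,
                                 log_prior, sample_exact, rng):
    """Log of the single-sample unbiased estimate (7) of
    f(theta) = g(y; theta) p(theta) Z(theta_hat) / Z(theta).

    exp() of the return value is the unbiased quantity; the log is
    only for numerical stability inside an MH acceptance ratio.
    """
    y_prime = sample_exact(theta, rng)   # y' ~ p(. | theta), eq. (8)
    # Single-sample importance estimate of Z(theta_hat) / Z(theta):
    log_z_ratio = log_g(y_prime, theta_hat) - log_g(y_prime, theta)
    return log_g(y, theta) + log_prior(theta) + log_z_ratio
```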

Literature Review

Pseudo-marginal methods are "exact-approximate": the likelihood is approximated, yet the chain converges to the exact target posterior. Representative papers on pseudo-marginal MCMC:

Jesper Møller et al. (2006). "An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants." Biometrika 93.2, pp. 451-458.

Christophe Andrieu and Gareth O. Roberts (2009). "The pseudo-marginal approach for efficient Monte Carlo computations." The Annals of Statistics, pp. 697-725.

Iain Murray and Matthew M. Graham (2015). "Pseudo-marginal slice sampling." Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS 2016).

Fredrik Lindsten and Arnaud Doucet (2016). "Pseudo-Marginal Hamiltonian Monte Carlo." arXiv preprint arXiv:1607.02516.

Outline: 1 Overview. 2 Paper review. 3 Experiments. 4 Conclusion.

Pseudo-marginal with Metropolis-Hastings updates

Builds on Møller et al. (2006).

Pseudo-marginal with Metropolis-Hastings updates (cont'd)

Note that f̂ is recycled at each iteration: the estimate f̂, treated as an auxiliary variable, becomes part of the Markov chain state. When f̂ = f, the algorithm recovers standard MH. Andrieu and Roberts (2009) showed that retaining f̂ in this way is what guarantees convergence to the correct target distribution.

Problem: the acceptance ratio

α = f̂(θ') q(θ; θ') / [f̂(θ) q(θ'; θ)],  (9)

can be low when f̂ has large variance: an occasional large overestimate f̂(θ) makes the chain reject for many iterations, the "sticking" behavior of the sampler. How to overcome this? Use conditional updating.

Auxiliary pseudo-marginal (APM)

Alternate between updating θ and the randomness u.

Figure: Framework for Auxiliary Pseudo-Marginal (APM) methods.

Thoughts on the acceptance ratio

Denote all the randomness used by the estimator as u (i.e., the results of all calls to the random number generator), so the estimate is f̂(θ, u) with u ~ q(·), and the joint target is proportional to f̂(θ, u) q(u). Assuming a symmetric proposal for θ:

Joint update (propose θ' and fresh u' ~ q(·)): α = f̂(θ', u') / f̂(θ, u)

Conditional update for θ (u held fixed): α = f̂(θ', u) / f̂(θ, u)

Conditional update for u (θ held fixed, u' ~ q(·)): α = [f̂(θ, u') q(u') q(u)] / [f̂(θ, u) q(u) q(u')] = f̂(θ, u') / f̂(θ, u)
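A minimal sketch of one APM sweep, under the assumption that u is a vector of standard normals (so q(u) is N(0, I) and an independence proposal u' ~ q makes the q-terms cancel, as above). log_f_hat and propose_theta are placeholder names.

```python
import numpy as np

def apm_step(theta, u, log_f_hat, propose_theta, rng):
    """One APM sweep: MH update of theta with u fixed, then of u with theta fixed.

    log_f_hat(theta, u) evaluates the log likelihood-estimate using the
    fixed auxiliary randomness u; propose_theta is assumed symmetric.
    """
    # Update theta | u: the estimates at theta and theta' share the same
    # u (common random numbers), so estimator noise largely cancels.
    theta_prop = propose_theta(theta, rng)
    if np.log(rng.uniform()) < log_f_hat(theta_prop, u) - log_f_hat(theta, u):
        theta = theta_prop
    # Update u | theta with an independence proposal u' ~ N(0, I).
    u_prop = rng.standard_normal(u.shape)
    if np.log(rng.uniform()) < log_f_hat(theta, u_prop) - log_f_hat(theta, u):
        u = u_prop
    return theta, u
```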

Thoughts on the acceptance ratio (cont'd)

With conditional updates, θ will almost always move, but u may still fail to move under MH. Try other MCMC update rules.

A quick review of slice sampling (SS)
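For reference, a minimal univariate slice sampler with the standard stepping-out and shrinkage procedure (Neal, 2003); this is our own sketch, not code from the talk.

```python
import numpy as np

def slice_sample_1d(x, log_f, w=1.0, rng=None):
    """One slice-sampling update of x under a density proportional to exp(log_f).

    Draws a height under the curve, steps out to bracket the slice,
    then shrinks the bracket until a point inside the slice is found.
    """
    rng = rng or np.random.default_rng()
    log_y = log_f(x) + np.log(rng.uniform())   # slice height
    # Step out: place an interval of width w around x and expand it.
    left = x - w * rng.uniform()
    right = left + w
    while log_f(left) > log_y:
        left -= w
    while log_f(right) > log_y:
        right += w
    # Shrink: sample until a proposal lands inside the slice.
    while True:
        x_new = rng.uniform(left, right)
        if log_f(x_new) > log_y:
            return x_new
        if x_new < x:
            left = x_new
        else:
            right = x_new
```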

Pseudo-marginal slice sampling (PM-SS)

Noisy slice sampler (Murray, 2007): replacing f(θ) with f̂(θ) still yields a valid slice sampler. Elliptical SS (Murray, Adams, and MacKay, 2010): efficient when dealing with a Gaussian prior. Pseudo-marginal slice sampling: use the noisy SS or noisy elliptical SS to sample u.
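Since u is typically standard Gaussian under q, elliptical slice sampling applies directly to the u-update. A sketch following Murray, Adams, and MacKay (2010), with f̂(θ, u) in the role of the likelihood; names are ours.

```python
import numpy as np

def ess_update_u(u, theta, log_f_hat, rng=None):
    """Elliptical slice sampling update of u ~ N(0, I) given theta.

    The Gaussian q(u) is handled by the elliptical proposal, and
    log_f_hat(theta, u) plays the role of the log-likelihood.
    """
    rng = rng or np.random.default_rng()
    nu = rng.standard_normal(u.shape)              # auxiliary draw on the ellipse
    log_y = log_f_hat(theta, u) + np.log(rng.uniform())
    phi = rng.uniform(0.0, 2.0 * np.pi)            # initial angle
    phi_min, phi_max = phi - 2.0 * np.pi, phi
    while True:
        u_prop = u * np.cos(phi) + nu * np.sin(phi)
        if log_f_hat(theta, u_prop) > log_y:
            return u_prop
        # Shrink the angle bracket toward the current state and retry.
        if phi < 0.0:
            phi_min = phi
        else:
            phi_max = phi
        phi = rng.uniform(phi_min, phi_max)
```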

Outline: 1 Overview. 2 Paper review. 3 Experiments. 4 Conclusion.

Demonstration

5-dimensional Gaussian:

p(θ) = 1,  g(x | θ) = exp[-(1/2)(x - θ)^T (x - θ)],  (10)

where we pretend the normalizer Z(θ) is unknown. Given the single observation x = 0, the posterior of θ is N(0, I). The unbiased estimator is obtained by importance sampling as in (8).

Five schemes are compared: PM-MH, APM (MI+MH), APM (MI+SS), APM (SS+MH), APM (SS+SS).
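An end-to-end sketch of this toy problem (our own, with an arbitrary random-walk step size): the estimator draws a single exact sample y' ~ N(θ, I) and plugs the importance ratio from (7)-(8) into a PM-MH chain; the prior is flat, so it drops out of the acceptance ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
d, x, theta_hat = 5, np.zeros(5), np.zeros(5)   # observation x = 0, fixed theta_hat

def log_g(y, theta):
    """Unnormalized Gaussian log density; Z(theta) is pretended unknown."""
    return -0.5 * np.sum((y - theta) ** 2)

def log_f_hat(theta):
    """Single-sample importance estimate of f(theta), as in (7)-(8)."""
    y_prime = theta + rng.standard_normal(d)    # exact draw y' ~ N(theta, I)
    return log_g(x, theta) + log_g(y_prime, theta_hat) - log_g(y_prime, theta)

# PM-MH with a spherical random-walk proposal (step size is our choice):
theta, log_lhat, samples = np.zeros(d), log_f_hat(np.zeros(d)), []
for _ in range(5000):
    theta_prop = theta + 0.5 * rng.standard_normal(d)
    log_lhat_prop = log_f_hat(theta_prop)
    if np.log(rng.uniform()) < log_lhat_prop - log_lhat:
        theta, log_lhat = theta_prop, log_lhat_prop
    samples.append(theta)
print(np.mean(samples, axis=0))   # should be near 0 if mixing is adequate
```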

Demonstration (cont'd)

Experiments

Ising model:

p(y | θ) = (1/Z(θ)) exp( θ_J Σ_{(i,j) ∈ E} y_i y_j + θ_h Σ_i y_i ),  (11)
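A small sketch of the unnormalized part of (11); edges is a hypothetical list of index pairs. Evaluating Z(θ) requires summing over all 2^N spin configurations, which is why this likelihood is doubly intractable.

```python
import numpy as np

def ising_log_g(y, theta_J, theta_h, edges):
    """Unnormalized Ising log density log g(y; theta) from (11).

    y is a vector of +/-1 spins; edges is a list of (i, j) index pairs.
    The normalizer Z(theta) sums over all 2^len(y) spin configurations.
    """
    pair_term = sum(y[i] * y[j] for i, j in edges)
    return theta_J * pair_term + theta_h * np.sum(y)
```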

Experiments

Experiments

Hierarchical Gaussian process classification: for y ∈ {+1, -1} and X ∈ R^d, the marginal likelihood p(y | θ, X) cannot be evaluated exactly, as the marginalization over the latent function has no closed form. Sample θ = (σ, τ), the hyper-parameters of the exponential covariance function (variance σ, length-scale τ). Use an importance sampling estimate.

Figure: Convergence and efficiency results.
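A hedged sketch of the simplest such estimator: plain Monte Carlo with the GP prior as the importance proposal, and a logistic likelihood assumed for illustration (the talk does not specify the exact estimator). All names are ours.

```python
import numpy as np

def log_marglik_hat(y, X, sigma, tau, n_samples, rng):
    """Log of an unbiased Monte Carlo estimate of p(y | theta, X) for GP
    classification, using the prior N(0, K) as the importance proposal.

    Exponential kernel assumed: k(x, x') = sigma * exp(-||x - x'|| / tau).
    Logistic likelihood assumed: p(y_i | f_i) = 1 / (1 + exp(-y_i f_i)).
    """
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    K = sigma * np.exp(-dists / tau) + 1e-8 * np.eye(len(X))  # jitter
    L = np.linalg.cholesky(K)
    F = L @ rng.standard_normal((len(X), n_samples))   # draws f ~ N(0, K)
    log_liks = -np.logaddexp(0.0, -(y[:, None] * F)).sum(axis=0)
    # Average on the natural scale (log-sum-exp for stability); the
    # average itself, not its log, is the unbiased quantity PM needs.
    m = log_liks.max()
    return m + np.log(np.mean(np.exp(log_liks - m)))
```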

Experiments

Figure: Cost-scaled autocorrelations.

Outline: 1 Overview. 2 Paper review. 3 Experiments. 4 Conclusion.

Wrap-ups

Do more conventional MCMC on the auxiliary distributions. Easy to tune (no tuning needed for the MI/MH updates), and often better. Easy to implement.

Additional slides on ABC (from Wikipedia).

References

Andrieu, Christophe and Gareth O. Roberts (2009). "The pseudo-marginal approach for efficient Monte Carlo computations." The Annals of Statistics, pp. 697-725.

Lindsten, Fredrik and Arnaud Doucet (2016). "Pseudo-Marginal Hamiltonian Monte Carlo." arXiv preprint arXiv:1607.02516.

Møller, Jesper et al. (2006). "An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants." Biometrika 93.2, pp. 451-458.

Murray, Iain, Ryan Prescott Adams, and David J. C. MacKay (2010). "Elliptical slice sampling." AISTATS, vol. 13, pp. 541-548.

Murray, Iain and Matthew M. Graham (2015). "Pseudo-marginal slice sampling." Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS 2016).

Murray, Iain Andrew (2007). "Advances in Markov chain Monte Carlo methods." PhD thesis, University of London.