An ABC interpretation of the multiple auxiliary variable method

School of Mathematical and Physical Sciences, Department of Mathematics and Statistics
Preprint MPS-2016-07, 27 April 2016

An ABC interpretation of the multiple auxiliary variable method

Dennis Prangle and Richard G. Everitt

arXiv:1604.08102v1 [stat.CO] 27 Apr 2016

Dennis Prangle (Newcastle University, dennis.prangle@newcastle.ac.uk) and Richard G. Everitt (University of Reading)

Abstract

We show that the auxiliary variable method (Møller et al., 2006; Murray et al., 2006) for inference of Markov random fields can be viewed as an approximate Bayesian computation method for likelihood estimation.

Keywords: ABC, Markov random field, annealed importance sampling, multiple auxiliary variable method

1 Introduction

Markov random fields (MRFs) have densities of the form

    f(y | θ) = γ(y | θ) / Z(θ),    (1)

where γ(y | θ) can be evaluated numerically but Z(θ) cannot in a reasonable time. This makes it challenging to perform inference. This note considers two approaches which both use simulation from f(y | θ). The single auxiliary variable (SAV) method (Møller et al., 2006) and the multiple auxiliary variable (MAV) method (Murray et al., 2006) provide unbiased likelihood estimates. Approximate Bayesian computation (ABC; Marin et al., 2012) finds parameters which produce simulations similar to the observed data. We will demonstrate that these two methods are in fact closely linked.

An additional challenge for inference of MRFs is that exact sampling from f(y | θ) is difficult. It is possible to implement Markov chain Monte Carlo (MCMC) algorithms which sample from a close approximation to this distribution. These MCMC algorithms have been used for inference as a replacement for an exact sampler in SAV and MAV (Caimo and Friel, 2011; Everitt, 2012) as well as in ABC (Grelaud et al., 2009). We will use this approach and discuss it further below.

The remainder of the paper is as follows. Section 2 reviews ABC and MAV methods, and Section 3 derives our result. Throughout the paper y refers to an observed dataset, and x variables refer to simulated datasets used in inference.
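To make the intractability of Z(θ) in Equation (1) concrete, the following sketch (ours, not from the paper; the 1D Ising ring is a standard illustrative MRF, and all function names are assumptions) evaluates γ(y | θ) cheaply while computing Z(θ) only by brute-force enumeration of all 2^d spin configurations, which is feasible solely for tiny d:

```python
import itertools
import math

def gamma_ising(y, theta):
    """Unnormalised density gamma(y | theta) of a 1D Ising ring:
    y is a tuple of +/-1 spins, theta the interaction strength."""
    d = len(y)
    return math.exp(theta * sum(y[i] * y[(i + 1) % d] for i in range(d)))

def Z_brute_force(d, theta):
    """Z(theta) by enumerating all 2^d spin configurations; each gamma
    evaluation is cheap, but the number of terms grows as 2^d."""
    return sum(gamma_ising(y, theta)
               for y in itertools.product((-1, 1), repeat=d))

# gamma is easy to evaluate pointwise; Z already needs 2^16 terms here:
z16 = Z_brute_force(16, 0.5)
```

For realistic lattice sizes the sum is unavailable, which is exactly why the simulation-based estimates reviewed below are needed.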

2 Background

2.1 Auxiliary variable methods

The SAV method makes use of an unbiased estimate of f(y | θ), based on the following importance sampling (IS) estimate of 1/Z(θ):

    1/Ẑ = q_x(x | y, θ) / γ(x | θ),

where q_x is some arbitrary (normalised) density and x ~ f(· | θ). MAV extends this idea by instead using annealed IS (AIS) (Neal, 2001) for this estimate:

    1/Ẑ = ∏_{i=1}^{a−1} γ_i(x_{i+1} | θ, y) / γ_{i+1}(x_{i+1} | θ, y),    (2)

where f_i(· | θ, y) ∝ γ_i(· | θ, y) are bridging densities between f_a(· | θ, y) = f(· | θ) and f_1(· | θ, y) = γ_1(· | θ, y) = q_x(· | θ, y); here x_a ~ f(· | θ) and, for 2 ≤ i < a, x_i ~ K_i(· | x_{i+1}) where K_i is a reversible Markov kernel with invariant distribution f_i. In this description we have imposed that γ_1 is normalised in order to obtain an estimate of 1/Z(θ). However we note that a common choice for q_x(· | θ, y) is f(· | θ̂) for some estimate θ̂, in which case the normalising constant Z(θ̂) is not available. In this case we obtain an estimate of Z(θ̂)/Z(θ) from Equation (2).

The SAV and MAV estimates are usually used as constituent parts of other Monte Carlo algorithms for parameter inference: in MCMC (Møller et al., 2006) or IS (Everitt et al., 2016). The estimates of f(y | θ) just described may be used here since only an unbiased estimate of the posterior up to proportionality is required (Andrieu and Roberts, 2009).

As noted in the introduction, the requirement of being able to draw x_a exactly from f(· | θ) is potentially problematic. Caimo and Friel (2011) and Everitt (2012) explore the possibility of replacing this exact sampler with a long run (of b iterations) of an MCMC sampler targeting f(· | θ), and taking the final point. Such an approach results in biased estimates of 1/Z(θ), although as b → ∞ this bias goes to zero. Everitt et al. (2016) observes empirically that a similar argument appears to hold when b = 0 but a is large.
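The AIS estimate (2) can be sketched on a toy model where Z(θ) is known in closed form, so the estimate can be checked. Everything below is an illustrative assumption of ours, not code from the paper: we take γ(x | θ) = exp(−x²/(2σ²)), so Z(θ) = √(2π)·σ, use a geometric bridge between a normalised N(0, 1) density γ_1 and γ, and a random-walk Metropolis-Hastings kernel as the reversible K_i:

```python
import math
import random

def log_gamma_bridge(x, beta, sigma):
    """log gamma_i(x): geometric bridge gamma_1(x)^(1-beta) * gamma(x|theta)^beta,
    where gamma_1 is the normalised N(0, 1) density (assumed toy choice)."""
    log_g1 = -0.5 * x * x - 0.5 * math.log(2 * math.pi)  # normalised endpoint
    log_ga = -0.5 * x * x / sigma ** 2                   # unnormalised target
    return (1.0 - beta) * log_g1 + beta * log_ga

def mh_step(x, beta, sigma, step=1.0):
    """One random-walk Metropolis-Hastings step: a reversible kernel K_i
    with invariant distribution f_i proportional to gamma_i."""
    prop = x + random.gauss(0.0, step)
    if math.log(random.random() + 1e-300) < (log_gamma_bridge(prop, beta, sigma)
                                             - log_gamma_bridge(x, beta, sigma)):
        return prop
    return x

def mav_estimate_inv_Z(sigma, a=20):
    """One AIS estimate of 1/Z(theta) as in Equation (2): draw x_a ~ f(.|theta)
    exactly, then x_i ~ K_i(.|x_{i+1}) for i = a-1, ..., 2, accumulating
    prod_{i=1}^{a-1} gamma_i(x_{i+1}) / gamma_{i+1}(x_{i+1})."""
    betas = [i / (a - 1.0) for i in range(a)]  # beta for gamma_i is betas[i-1]
    x = random.gauss(0.0, sigma)               # exact draw from f(.|theta)
    log_v = 0.0
    for i in range(a - 1, 0, -1):
        # weight factor gamma_i(x_{i+1}) / gamma_{i+1}(x_{i+1})
        log_v += (log_gamma_bridge(x, betas[i - 1], sigma)
                  - log_gamma_bridge(x, betas[i], sigma))
        if i > 1:
            x = mh_step(x, betas[i - 1], sigma)  # x_i ~ K_i(.|x_{i+1})
    return math.exp(log_v)

random.seed(1)
sigma = 2.0
est = sum(mav_estimate_inv_Z(sigma) for _ in range(5000)) / 5000
# est approximates 1/Z(theta) = 1/(sqrt(2*pi)*sigma)
```

Averaging many such draws recovers 1/Z(θ), reflecting the unbiasedness of the AIS estimate when x_a is an exact draw; with an MCMC approximation to that draw (as discussed above) a bias of order depending on b would appear.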
2.2 Approximate Bayesian computation

ABC refers to a family of inference algorithms (described in Marin et al., 2012) which perform an approximation to Bayesian inference when numerical evaluation of the likelihood function is intractable. They instead use simulation from the model of interest. The core of these algorithms is producing estimates of the likelihood f(y | θ) using some version of the following method. Simulate a dataset x from f(· | θ) and return the ABC likelihood estimate:

    L_ABC = 1(||y − x|| ≤ ε).

Here 1(·) represents an indicator function, ||·|| is some distance norm, and the acceptance threshold ε is a tuning parameter. The expectation of the random variable L_ABC is

    ∫ f(x | θ) 1(||y − x|| ≤ ε) dx.
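A minimal sketch of the ABC likelihood estimate above, under the assumed toy model f(· | θ) = N(θ, 1) (our choice, made so that the ABC likelihood has a closed form against which the Monte Carlo average can be checked):

```python
import math
import random

def abc_likelihood_estimate(y, theta, epsilon, simulate):
    """One ABC likelihood estimate: simulate x ~ f(.|theta) and return
    the indicator 1(|y - x| <= epsilon)."""
    x = simulate(theta)
    return 1.0 if abs(y - x) <= epsilon else 0.0

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Assumed toy model: f(.|theta) = N(theta, 1). Then the ABC likelihood
# P(|y - x| <= eps) equals Phi(y+eps-theta) - Phi(y-eps-theta).
simulate = lambda theta: random.gauss(theta, 1.0)

random.seed(0)
y, theta, eps = 0.5, 0.0, 0.3
n = 20000
mc = sum(abc_likelihood_estimate(y, theta, eps, simulate) for _ in range(n)) / n
exact_abc_likelihood = normal_cdf(y + eps - theta) - normal_cdf(y - eps - theta)
```

Each estimate is a single 0/1 draw, so many simulations per θ are typically needed, consistent with the observation below about the simulation cost of ABC for MRFs.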

This is often referred to as the ABC likelihood. It is proportional to a convolution of the likelihood with a uniform density, evaluated at y. For ε > 0 this is generally an inexact approximation to the likelihood. For discrete data it is possible to use ε = 0, in which case the ABC likelihood equals the exact likelihood, and so L_ABC is unbiased. For MRFs it is empirically observed that, compared with competitors such as the exchange algorithm (Murray et al., 2006), ABC requires a relatively large number of simulations to yield an efficient algorithm (Friel, 2013).

3 Derivation

3.1 ABC for MRF models

Suppose that the model f(y | θ) has an intractable likelihood but can be targeted by an MCMC chain x = (x_1, x_2, ..., x_n). Let π represent densities relating to this chain. Then π_n(y | θ) := π(x_n = y | θ) is an approximation of f(y | θ) which can be estimated by ABC. For now suppose that y is discrete and consider the ABC likelihood estimate requiring an exact match: simulate from π(x | θ) and return 1(x_n = y). We will consider an IS variation on this: simulate from g(x | θ) and return 1(x_n = y) π(x | θ)/g(x | θ). Under the mild assumption that g(x | θ) has the same support as π(x | θ) (typically true unless n is small), both estimates have the expectation Pr(x_n = y | θ).

This can be generalised to cover continuous data using the identity

    π_n(y | θ) = ∫_{x_n = y} π(x | θ) dx_{1:n−1},

where x_{i:j} represents (x_i, x_{i+1}, ..., x_j). An importance sampling estimate of this integral is

    w = π(x | θ) / g(x_{1:n−1} | θ),    (3)

where x is sampled from g(x_{1:n−1} | θ) δ(x_n = y), with δ representing a Dirac delta measure. Then, under mild conditions on the support of g, w is an unbiased estimate of π_n(y | θ). The ideal choice of g(x_{1:n−1} | θ) is π(x_{1:n−1} | x_n, θ), as then w = π(x_n = y | θ) exactly. This represents sampling from the Markov chain conditional on its final state being y.

3.2 Equivalence to MAV

We now show that natural choices of π(x | θ) and g(x_{1:n−1} | θ) in the ABC method just outlined result in the MAV estimator (2).
Our choices are

    g(x_{1:n−1} | θ) = ∏_{i=1}^{n−1} K_i(x_i | x_{i+1}),

    π(x | θ) = f_1(x_1 | θ, y) ∏_{i=1}^{n−1} K_i(x_{i+1} | x_i).

Here π(x | θ) defines an MCMC chain with transitions K_i(x_{i+1} | x_i). Suppose K_i is as in Section 2.1 for i ≤ a, and for i > a it is a reversible Markov kernel with invariant distribution f(· | θ). Also define b := n − a. Then the MCMC chain ends in a long sequence of steps targeting f(· | θ), so that lim_{n→∞} π_n(· | θ) = f(· | θ). Thus the likelihood being estimated converges on the true likelihood for large n. Note this is the case even for fixed a.

The importance density g(x_{1:n−1} | θ) specifies a reverse time MCMC chain starting from x_n = y with transitions K_i(x_i | x_{i+1}). Simulating x is straightforward by sampling x_{n−1}, then x_{n−2} and so on. This importance density is an approximation to the ideal choice stated at the end of Section 3.1. The resulting likelihood estimator is

    w = f_1(x_1 | θ, y) ∏_{i=1}^{n−1} K_i(x_{i+1} | x_i) / K_i(x_i | x_{i+1}).

Using detailed balance gives

    K_i(x_{i+1} | x_i) / K_i(x_i | x_{i+1}) = f_i(x_{i+1} | θ, y) / f_i(x_i | θ, y) = γ_i(x_{i+1} | θ, y) / γ_i(x_i | θ, y),

so that

    w = f_1(x_1 | θ, y) ∏_{i=1}^{n−1} γ_i(x_{i+1} | θ, y) / γ_i(x_i | θ, y) = γ(y | θ) ∏_{i=1}^{a−1} γ_i(x_{i+1} | θ, y) / γ_{i+1}(x_{i+1} | θ, y),

where the second equality uses f_1 = γ_1, telescopes the product, and drops the factors with i ≥ a, which equal one since there γ_i = γ_{i+1} = γ and x_n = y. This is an unbiased estimator of π_n(y | θ). Hence

    v = ∏_{i=1}^{a−1} γ_i(x_{i+1} | θ, y) / γ_{i+1}(x_{i+1} | θ, y)

is an unbiased estimator of π_n(y | θ)/γ(y | θ) ≈ 1/Z(θ).

In the above we have assumed, as in Section 2.1, that γ_1 is normalised. When this is not the case then we instead get an estimator of Z(θ̂)/Z(θ), as for MAV methods. Also note that in either case a valid estimator is produced for any choice of y.

The ABC estimate can be viewed as a two-stage procedure. First run an MCMC chain of length b with any starting value, targeting f(· | θ), and let its final value be x_a. Secondly run an MCMC chain x_a, x_{a−1}, ... using kernels K_{a−1}, K_{a−2}, ... and evaluate the estimator v. This is unbiased in the limit b → ∞, so the first stage could be replaced by perfect sampling methods where these exist. The resulting procedure is thus equivalent to that for MAV.

4 Conclusion

We have demonstrated that the MAV method can be interpreted as an ABC algorithm.
We hope this insight will be useful for the development of novel methods for MRFs.

References

Andrieu, C. and Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, pages 697–725.

Caimo, A. and Friel, N. (2011). Bayesian inference for exponential random graph models. Social Networks, 33:41–55.

Everitt, R. G. (2012). Bayesian parameter estimation for latent Markov random fields and social networks. Journal of Computational and Graphical Statistics, 21:940–960.

Everitt, R. G., Johansen, A. M., Rowing, E., and Evdemon-Hogan, M. (2016). Bayesian model comparison with un-normalised likelihoods. Statistics and Computing: early online version.

Friel, N. (2013). Evidence and Bayes factor estimation for Gibbs random fields. Journal of Computational and Graphical Statistics, 22:518–532.

Grelaud, A., Robert, C. P., Marin, J.-M., Rodolphe, F., and Taly, J. F. (2009). ABC likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis, 4(2):317–336.

Marin, J.-M., Pudlo, P., Robert, C. P., and Ryder, R. J. (2012). Approximate Bayesian computational methods. Statistics and Computing, 22(6):1167–1180.

Møller, J., Pettitt, A. N., Reeves, R. W., and Berthelsen, K. K. (2006). An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93:451–458.

Murray, I., Ghahramani, Z., and MacKay, D. J. C. (2006). MCMC for doubly-intractable distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 359–366.

Neal, R. M. (2001). Annealed importance sampling. Statistics and Computing, 11:125–139.