Markov Chain Monte Carlo Lecture 5

Slice sampler: Suppose that one is interested in sampling from a density $f(x)$, $x \in \mathcal{X}$. Recall that sampling $x \sim f(x)$ is equivalent to sampling uniformly from the area under the graph of $f(x)$,
$$A = \{(x, u) : 0 \le u \le f(x)\},$$
which is the basis of the acceptance-rejection algorithm described previously. To achieve this goal, one can augment the target distribution with an auxiliary variable $U$ which, conditional on $x$, is uniformly distributed on the interval $[0, f(x)]$. The joint density of $(X, U)$ is therefore
$$f(x, u) = f(x) f(u \mid x) \propto 1_{\{(x, u) \in A\}},$$
which can be sampled using the Gibbs sampler as follows:

Slice Sampler
1. Draw $u_{t+1} \sim \mathrm{Uniform}[0, f(x_t)]$.
2. Draw $x_{t+1}$ uniformly from the slice $\{x : f(x) \ge u_{t+1}\}$.

This sampler was termed the slice sampler by Higdon (1998). It can potentially be more efficient than the simple MH algorithm for multimodal distributions, owing to the free between-mode transitions within a slice (as illustrated by Figure 1).
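
As a concrete illustration (my own sketch, not part of the lecture), here is a minimal slice sampler in Python. It assumes the support of $f$ lies inside a known bracket $[\texttt{lo}, \texttt{hi}]$, so step 2 can be done by rejection from that bracket; the bimodal target shows the free between-mode transitions noted above.

```python
import numpy as np

def slice_sampler(logf, x0, n, lo, hi, rng):
    """Simple slice sampler for a 1-D density f known up to a constant.
    Assumes the support of f lies inside [lo, hi], so the slice
    {x : f(x) >= u} can be sampled by rejection from Uniform[lo, hi]."""
    xs = np.empty(n)
    x = x0
    for t in range(n):
        # Step 1: u ~ Uniform[0, f(x_t)], done on the log scale.
        log_u = logf(x) + np.log(rng.random())
        # Step 2: draw x_{t+1} uniformly from the slice {x : f(x) >= u}.
        while True:
            x_new = rng.uniform(lo, hi)
            if logf(x_new) >= log_u:
                x = x_new
                break
        xs[t] = x
    return xs

# Bimodal target: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2), unnormalized.
logf = lambda x: np.logaddexp(-0.5 * ((x + 2) / 0.5) ** 2,
                              -0.5 * ((x - 2) / 0.5) ** 2)
rng = np.random.default_rng(1)
draws = slice_sampler(logf, x0=0.0, n=5000, lo=-6.0, hi=6.0, rng=rng)
print("mean:", draws.mean())   # close to 0: both modes are visited
```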

Edwards and Sokal (1988) noted that when $f(x)$ can be decomposed into a product of $k$ distribution functions, i.e., $f(x) \propto f_1(x) f_2(x) \cdots f_k(x)$, the slice sampler can be easily implemented. To sample from such a distribution, Edwards and Sokal introduced $k$ auxiliary variables $U_1, \ldots, U_k$ and proposed the following algorithm:
1. Draw $u^{(i)}_{t+1} \sim \mathrm{Uniform}[0, f_i(x_t)]$, $i = 1, \ldots, k$.
2. Draw $x_{t+1}$ uniformly from the region $S = \bigcap_{i=1}^{k} \{x : f_i(x) \ge u^{(i)}_{t+1}\}$.
This algorithm is a generalization of the Swendsen-Wang algorithm (Swendsen and Wang, 1987).
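
To illustrate (an invented example, not from the lecture), take $k = 2$ with $f_1(x) = e^{-x^2/2}$ and $f_2(x) = 1/(1 + x^2)$. Each slice is then a symmetric interval with closed-form endpoints, so $S$ is simply the narrower of the two intervals:

```python
import numpy as np

def es_slice_step(x, rng):
    """One Edwards-Sokal update for f(x) ∝ f1(x) f2(x), with
    f1(x) = exp(-x^2/2) and f2(x) = 1/(1 + x^2)."""
    u1 = rng.uniform(0.0, np.exp(-0.5 * x * x))    # u1 ~ Uniform[0, f1(x)]
    u2 = rng.uniform(0.0, 1.0 / (1.0 + x * x))     # u2 ~ Uniform[0, f2(x)]
    m1 = np.sqrt(-2.0 * np.log(u1))                # {x : f1(x) >= u1} = [-m1, m1]
    m2 = np.sqrt(1.0 / u2 - 1.0)                   # {x : f2(x) >= u2} = [-m2, m2]
    m = min(m1, m2)                                # S = intersection of the slices
    return rng.uniform(-m, m)

rng = np.random.default_rng(2)
x, draws = 0.0, []
for _ in range(5000):
    x = es_slice_step(x, rng)
    draws.append(x)
```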

Consider a 2-D Ising model with the Boltzmann density
$$f(x) \propto \exp\Big\{ K \sum_{i \sim j} x_i x_j \Big\}, \qquad (1)$$
where the spins $x_i = \pm 1$, $K$ is the inverse temperature, and $i \sim j$ denotes a pair of nearest neighbors on the lattice. When the temperature is high, this model can be easily simulated using the Gibbs sampler (Geman and Geman, 1984): iteratively reset the value of each spin according to the conditional distribution
$$P(x_i = 1 \mid x_j, j \in n(i)) = \frac{1}{1 + \exp\{-2K \sum_{j \in n(i)} x_j\}}, \qquad P(x_i = -1 \mid x_j, j \in n(i)) = 1 - P(x_i = 1 \mid x_j, j \in n(i)), \qquad (2)$$
where $n(i)$ denotes the set of neighbors of spin $i$. However, the Gibbs sampler slows down rapidly as the temperature approaches or falls below the critical temperature; this is the so-called critical slowing down.
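
A minimal sketch of this single-site Gibbs sampler, assuming free boundary conditions (the lattice size and the value of $K$ are illustrative choices):

```python
import numpy as np

def gibbs_sweep(x, K, rng):
    """One systematic-scan Gibbs sweep for the 2-D Ising model (1),
    with free boundary conditions; each spin is reset from (2)."""
    L = x.shape[0]
    for i in range(L):
        for j in range(L):
            s = 0                      # sum of x_j over neighbors j in n(i)
            if i > 0:     s += x[i - 1, j]
            if i < L - 1: s += x[i + 1, j]
            if j > 0:     s += x[i, j - 1]
            if j < L - 1: s += x[i, j + 1]
            p_up = 1.0 / (1.0 + np.exp(-2.0 * K * s))   # P(x_i = +1 | rest)
            x[i, j] = 1 if rng.random() < p_up else -1
    return x

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=(16, 16))
for _ in range(200):
    x = gibbs_sweep(x, K=0.3, rng=rng)   # well below the critical coupling: mixes well
```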

Swendsen-Wang algorithm: As in slice sampling, the density (1) is rewritten in the form
$$f(x) \propto \prod_{i \sim j} \exp\{K(1 + x_i x_j)\} = \prod_{i \sim j} \exp\{\beta I(x_i = x_j)\}, \qquad (3)$$
where $\beta = 2K$ and $I(\cdot)$ is the indicator function; the identity holds because $1 + x_i x_j$ is either 0 or 2. If we introduce auxiliary variables $u = (u_{ij})$, where each component $u_{ij}$, conditional on $x_i$ and $x_j$, is uniformly distributed on $[0, \exp\{\beta I(x_i = x_j)\}]$, then
$$f(x, u) \propto \prod_{i \sim j} I\big(0 \le u_{ij} \le \exp\{\beta I(x_i = x_j)\}\big).$$
In Swendsen and Wang (1987), $u_{ij}$ is called a bond variable, which can be viewed as a variable physically sitting on the edge between spin $i$ and spin $j$.

If $u_{ij} > 1$, then $\exp\{\beta I(x_i = x_j)\} > 1$, which forces $x_i = x_j$. Otherwise, there is no constraint on $x_i$ and $x_j$. Let $b_{ij}$ be an indicator variable for the constraint: set $b_{ij} = 1$ if $x_i$ and $x_j$ are constrained to be equal, and $b_{ij} = 0$ otherwise. Note that any two like-spin neighbors (i.e., two neighboring spins with the same value) are bonded with probability $1 - e^{-\beta}$, since $u_{ij} \sim \mathrm{Uniform}[0, e^{\beta}]$ in that case and $P(u_{ij} > 1) = 1 - e^{-\beta}$. Based on the configuration of $u$, we can cluster the spins according to whether they are connected via a mutual bond (i.e., $b_{ij} = 1$). All spins within the same cluster then have identical values, and flipping all the spins in a cluster simultaneously does not change the equilibrium distribution $f(x, u)$.

The Swendsen-Wang Algorithm
1. Update bond values: check all like-spin neighbor pairs, and set $b_{ij} = 1$ with probability $1 - e^{-\beta}$.
2. Update spin values: cluster the spins by connecting neighboring sites that share a mutual bond, and then flip each cluster with probability 0.5.

For the Ising model, the introduction of the auxiliary variable $u$ partially decouples the dependence between neighboring spins, and the resulting sampler can thus converge substantially faster than single-site updating algorithms. As demonstrated by Swendsen and Wang (1987), this algorithm can eliminate much of the critical slowing down.
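
A compact sketch of the two-step algorithm, again assuming free boundary conditions; union-find is one convenient way to implement the clustering step (the lattice size and the near-critical $K$ are illustrative choices):

```python
import numpy as np

def swendsen_wang_step(x, K, rng):
    """One Swendsen-Wang update for the 2-D Ising model (1) on an
    L x L lattice with free boundaries; beta = 2K."""
    L = x.shape[0]
    p_bond = 1.0 - np.exp(-2.0 * K)      # bond prob. 1 - exp(-beta)
    parent = np.arange(L * L)            # union-find over lattice sites

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    # Step 1: bond like-spin neighbor pairs with probability 1 - exp(-beta).
    for a in range(L):
        for b in range(L):
            s = a * L + b
            if b + 1 < L and x[a, b] == x[a, b + 1] and rng.random() < p_bond:
                union(s, s + 1)
            if a + 1 < L and x[a, b] == x[a + 1, b] and rng.random() < p_bond:
                union(s, s + L)

    # Step 2: flip each cluster with probability 0.5.
    flip = {}
    x_new = x.copy().reshape(-1)
    for s in range(L * L):
        r = find(s)
        if r not in flip:
            flip[r] = rng.random() < 0.5
        if flip[r]:
            x_new[s] = -x_new[s]
    return x_new.reshape(L, L)

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=(16, 16))
for _ in range(100):
    x = swendsen_wang_step(x, K=0.44, rng=rng)   # near-critical coupling
print("magnetization:", x.mean())
```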

Spatial autologistic model: Spatial models, e.g., the autologistic model, the Potts model, and the autonormal model (Besag, 1974), have been used in the modeling of many scientific problems. A major problem with these models is that the normalizing constant is intractable. Suppose we have a dataset $x$ generated from a statistical model with the likelihood function
$$f(x \mid \theta) = \frac{1}{Z(\theta)} \exp\{-U(x, \theta)\}, \quad x \in \mathcal{X}, \; \theta \in \Theta, \qquad (4)$$
where $\theta$ is the parameter and $Z(\theta)$ is the normalizing constant, which depends on $\theta$ and is not available in closed form. Let $f(\theta)$ denote the prior density of $\theta$. The posterior distribution of $\theta$ given $x$ is then
$$f(\theta \mid x) \propto \frac{1}{Z(\theta)} \exp\{-U(x, \theta)\} f(\theta). \qquad (5)$$

Let $y$ denote the auxiliary variable, which shares the same state space as $x$. Let
$$f(\theta, y \mid x) \propto f(x \mid \theta) f(\theta) f(y \mid \theta, x), \qquad (6)$$
denote the joint distribution of $\theta$ and $y$ conditional on $x$, where $f(y \mid \theta, x)$ is the distribution of the auxiliary variable $y$. To simulate from (6) using the MH algorithm, one can use the proposal distribution
$$q(\theta', y' \mid \theta, y) = q(\theta' \mid \theta, y)\, q(y' \mid \theta'), \qquad (7)$$
which corresponds to the usual change of the parameter vector, $\theta \to \theta'$, followed by an exact sampling (Propp and Wilson, 1996) step that draws $y'$ from $q(\cdot \mid \theta')$. If $q(y' \mid \theta')$ is set to $f(y' \mid \theta')$, the MH ratio can be written as
$$r(\theta, y, \theta', y' \mid x) = \frac{f(x \mid \theta') f(\theta') f(y' \mid \theta', x)\, q(\theta \mid \theta', y')\, f(y \mid \theta)}{f(x \mid \theta) f(\theta) f(y \mid \theta, x)\, q(\theta' \mid \theta, y)\, f(y' \mid \theta')}, \qquad (8)$$
in which the unknown normalizing constant $Z(\theta)$ cancels. To ease computation, Møller et al. further suggested setting the proposal distributions to $q(\theta' \mid \theta, y) = q(\theta' \mid \theta)$ and $q(\theta \mid \theta', y') = q(\theta \mid \theta')$, and setting the auxiliary distributions to
$$f(y \mid \theta, x) = f(y \mid \hat{\theta}), \qquad f(y' \mid \theta', x) = f(y' \mid \hat{\theta}), \qquad (9)$$
where $\hat{\theta}$ denotes an estimate of $\theta$, obtainable, for example, by maximizing a pseudo-likelihood function.

The Møller Algorithm
1. Generate $\theta'$ from the proposal distribution $q(\theta' \mid \theta_t)$.
2. Generate an exact sample $y'$ from the distribution $f(y \mid \theta')$.
3. Accept $(\theta', y')$ with probability $\min(1, r)$, where
$$r = \frac{f(x \mid \theta') f(\theta') f(y' \mid \hat{\theta})\, q(\theta_t \mid \theta')\, f(y_t \mid \theta_t)}{f(x \mid \theta_t) f(\theta_t) f(y_t \mid \hat{\theta})\, q(\theta' \mid \theta_t)\, f(y' \mid \theta')}.$$
If accepted, set $(\theta_{t+1}, y_{t+1}) = (\theta', y')$; otherwise, set $(\theta_{t+1}, y_{t+1}) = (\theta_t, y_t)$.
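
A sketch of one Møller update for the Ising coupling $K$ (playing the role of $\theta$), under assumptions of my own: a flat prior and a symmetric random-walk proposal, so those ratios drop out of $r$. The exact-sampling step is abstracted behind a hypothetical perfect_sample; here it is approximated by a long Swendsen-Wang run (reusing swendsen_wang_step from the sketch above) rather than a true Propp-Wilson coupler, and $\hat{K}$ would in practice come from a pseudo-likelihood fit.

```python
import numpy as np

def neg_energy(x, K):
    """Exponent K * sum_{i~j} x_i x_j in density (1), free boundaries."""
    return K * (np.sum(x[:-1, :] * x[1:, :]) + np.sum(x[:, :-1] * x[:, 1:]))

def perfect_sample(K, L, rng, n_sw=500):
    """Stand-in for the exact sampler f(y | K).  A real implementation
    would use coupling from the past (Propp and Wilson, 1996); here a
    long Swendsen-Wang run is used as an approximation."""
    y = rng.choice([-1, 1], size=(L, L))
    for _ in range(n_sw):
        y = swendsen_wang_step(y, K, rng)
    return y

def moller_step(K_t, y_t, x, K_hat, rng, step=0.05):
    """One Møller update (flat prior, symmetric proposal assumed)."""
    K_prop = K_t + step * rng.standard_normal()
    y_prop = perfect_sample(K_prop, x.shape[0], rng)
    # log r: every Z(K) term cancels, leaving only the exponents.
    log_r = (neg_energy(x, K_prop) - neg_energy(x, K_t)             # f(x|K') / f(x|K_t)
             + neg_energy(y_prop, K_hat) - neg_energy(y_t, K_hat)   # f(y'|K_hat) / f(y_t|K_hat)
             + neg_energy(y_t, K_t) - neg_energy(y_prop, K_prop))   # f(y_t|K_t) / f(y'|K')
    if np.log(rng.random()) < log_r:
        return K_prop, y_prop
    return K_t, y_t
```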

The exchange algorithm
1. Propose $\theta' \sim q(\theta' \mid \theta, x)$.
2. Generate an auxiliary variable $y \sim f(y \mid \theta')$ using an exact sampler.
3. Accept $\theta'$ with probability $\min\{1, r(\theta, \theta', y \mid x)\}$, where
$$r(\theta, \theta', y \mid x) = \frac{\pi(\theta') f(x \mid \theta') f(y \mid \theta)\, q(\theta \mid \theta')}{\pi(\theta) f(x \mid \theta) f(y \mid \theta')\, q(\theta' \mid \theta)}. \qquad (10)$$

The exchange algorithm can be viewed as an auxiliary variable MCMC algorithm with an augmented proposal distribution, which can be written as
$$T(\theta \to (\theta', y)) = q(\theta' \mid \theta) f(y \mid \theta'), \qquad T(\theta' \to (\theta, y)) = q(\theta \mid \theta') f(y \mid \theta).$$
This representation simply validates the algorithm, following the arguments for auxiliary variable Markov chains made in previous lectures.
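
The exchange algorithm needs neither $\hat{\theta}$ nor a retained auxiliary state, so its sketch is shorter (same assumptions as above: flat prior, symmetric proposal, and the approximate perfect_sample and neg_energy from the previous block standing in for an exact sampler):

```python
import numpy as np

def exchange_step(K_t, x, rng, step=0.05):
    """One exchange update for the Ising coupling K, reusing neg_energy
    and perfect_sample above (flat prior, symmetric proposal assumed)."""
    K_prop = K_t + step * rng.standard_normal()
    y = perfect_sample(K_prop, x.shape[0], rng)      # exact sample at K'
    # In (10) the prior and proposal ratios drop out, and Z(K_t), Z(K')
    # cancel between the x-terms and the y-terms.
    log_r = (neg_energy(x, K_prop) - neg_energy(x, K_t)     # f(x|K') / f(x|K_t)
             + neg_energy(y, K_t) - neg_energy(y, K_prop))  # f(y|K_t) / f(y|K')
    return K_prop if np.log(rng.random()) < log_r else K_t
```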

Figure 1: The slice sampler for a multimodal distribution: given $u$, $X$ is drawn uniformly from the sets labeled A, B, and C.