Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation

Luke Tierney, Department of Statistics & Actuarial Science, University of Iowa


Basic Ratio of Uniforms Method

Introduced by Kinderman and Monahan (1977). Suppose f is a (possibly unnormalized) density, and suppose (V, U) is uniform on

    A = {(v, u) : 0 < u < f(v/u)^(1/2)}

Then X = V/U has density proportional to f(x).

Example: Standard Normal Distribution

Suppose f(x) = (1/√(2π)) e^(−x²/2). Then A is bounded, so we can use rejection sampling from an enclosing rectangle.

[Figure: the region A for the standard normal density inside an enclosing rectangle.]
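
As a concrete illustration, here is a minimal numpy sketch of this rejection sampler. It uses the unnormalized kernel f(x) = exp(−x²/2), for which the enclosing rectangle has u in (0, 1] and |v| ≤ √(2/e); the vectorized batching is just an implementation convenience.

    import numpy as np

    rng = np.random.default_rng(0)

    def ru_sample_normal(n):
        """Rejection sampling of N(0, 1) via the ratio-of-uniforms method,
        using the unnormalized kernel f(x) = exp(-x^2 / 2)."""
        vmax = np.sqrt(2.0 / np.e)            # sup |x| sqrt(f(x))
        out = []
        while len(out) < n:
            u = rng.uniform(0.0, 1.0, 2 * n)  # sup sqrt(f) = 1
            v = rng.uniform(-vmax, vmax, 2 * n)
            x = v / u
            keep = u < np.exp(-x * x / 4.0)   # accept iff u < sqrt(f(v/u))
            out.extend(x[keep])
        return np.asarray(out[:n])

    samples = ru_sample_normal(10_000)
    print(samples.mean(), samples.std())      # ~ 0 and ~ 1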

Some Generalizations

If f(x) = f_0(x) f_1(x) and (V, U) has density proportional to f_0(v/u) on A = {(v, u) : 0 < u < f_1(v/u)^(1/2)}, then X = V/U has density proportional to f.

If (V, U) is uniform on A = {(v, u) : 0 < u < f(v/u + μ)^(1/2)}, then X = V/U + μ has density proportional to f.

Some Generalizations (cont.)

Choosing μ as the mode of f often works well. For a Gamma(α = 50) density f(x) ∝ x^49 e^(−x):

[Figure: the regions A for μ = 0 and for μ = α − 1 = 49; v ranges over roughly (0, 12) for μ = 0 but only about (−1, 1.5) after centering at the mode.]
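
A hedged sketch of the shifted sampler for this Gamma(50) case, assuming numpy: the density is rescaled so its mode value is 1 (avoiding overflow), and the rectangle bounds for v come from a crude grid search rather than exact optimization.

    import numpy as np

    rng = np.random.default_rng(1)
    alpha, mu = 50.0, 49.0                         # Gamma(alpha); mode at alpha - 1

    def log_f(x):
        """Unnormalized Gamma(alpha) log-density, rescaled so f(mode) = 1."""
        out = np.full(np.shape(x), -np.inf)
        pos = x > 0
        xp = np.asarray(x)[pos]
        out[pos] = (alpha - 1) * (np.log(xp) - np.log(mu)) - xp + mu
        return out

    # enclosing rectangle: u in (0, 1]; v bounds from a grid search
    # over (x - mu) * sqrt(f(x))
    grid = np.linspace(1e-6, 200.0, 200_001)
    w = (grid - mu) * np.exp(0.5 * log_f(grid))
    v_lo, v_hi = w.min(), w.max()

    def ru_gamma(n):
        out = []
        while len(out) < n:
            u = rng.uniform(0.0, 1.0, 2 * n)
            v = rng.uniform(v_lo, v_hi, 2 * n)
            x = v / u + mu                         # shift by mu = mode
            keep = np.log(u) < 0.5 * log_f(x)      # u < sqrt(f(v/u + mu))
            out.extend(x[keep])
        return np.asarray(out[:n])

    print(ru_gamma(10_000).mean())                 # ~ alpha = 50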

Higher Dimensions

Stefănescu and Văduva (1987); Wakefield, Gelfand and Smith (1991): if (V, U) is uniform on

    A = {(v, u) : v ∈ R^d, 0 < u < f(v/u + μ)^(1/(d+1))}

then X = V/U + μ has density proportional to f.

One can rejection sample from an enclosing hyper-rectangle, but this is usually not practical in higher dimensions. Alternative: sample A by MCMC.
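
The practicality problem is easy to see numerically. Here is a sketch (assuming numpy) that estimates the acceptance rate of hyper-rectangle rejection for a standard d-dimensional normal kernel f(x) = exp(−||x||²/2) with μ = 0; the box bounds follow from maximizing f^(1/(d+1)) and |x_i| f(x)^(1/(d+1)).

    import numpy as np

    rng = np.random.default_rng(2)

    def ru_accept_rate(d, n=200_000):
        """Monte Carlo estimate of the acceptance rate for RU rejection
        sampling of a standard d-dim normal from its enclosing box:
        u <= 1 and |v_i| <= sqrt(d + 1) * exp(-1/2)."""
        vmax = np.sqrt(d + 1.0) * np.exp(-0.5)
        u = rng.uniform(0.0, 1.0, n)
        v = rng.uniform(-vmax, vmax, (n, d))
        x = v / u[:, None]
        keep = u ** (d + 1) < np.exp(-0.5 * np.sum(x * x, axis=1))
        return keep.mean()

    for d in (1, 2, 5, 10):
        print(d, ru_accept_rate(d))    # the rate collapses as d grows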

Higher Dimensions (cont.)

Some regions for d = 2:

[Figure: the RU regions for a bivariate normal with ρ = 0.73 and for a variance components model.]

Some Properties of the Region A

A is bounded if f(x) and ||x||^(d+1) f(x) are bounded.

A is convex if f(x) is log concave.

If f(x) = f_0(x) f_1(x) and A = {(v, u) : 0 < u < f_1(v/u + μ)^(1/(d+1))} and f_1 is log concave, then A is convex; A is bounded if and only if ∫ f_1(x) dx < ∞.

Leydold (2001); Hörmann, Leydold, Derflinger (2004)

Auxiliary Variable Approach

Suppose X has density proportional to f(x) on R^d. Let U | X = x have density ((rd+1)/f(x)) u^(rd) for 0 < u < f(x)^(1/(rd+1)), where r > −1/d. Then (X, U) have joint density

    f_{X,U}(x, u) ∝ f(x) u^(rd) / f(x) = u^(rd)   for 0 < u < f(x)^(1/(rd+1))

Let V = X U^r. Then (V, U) have joint density

    f_{V,U}(v, u) ∝ u^(rd) |J(v, u)| = u^(rd) u^(−rd) = 1   for 0 < u < f(v/u^r)^(1/(rd+1))
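
A small numerical check of this correspondence (a sketch assuming numpy, with d = 1 and r = 1): draw X from f, draw U | X by inverse CDF, map to (V, U), and verify every point lands in A_r. (Uniformity on A_r can then be checked with a two-dimensional histogram.)

    import numpy as np

    rng = np.random.default_rng(3)
    d, r = 1, 1.0
    f = lambda x: np.exp(-x * x / 2.0)             # unnormalized N(0, 1)

    # X ~ f; U | X = x prop. to u^(rd) on (0, f(x)^(1/(rd+1))), inverse CDF
    x = rng.standard_normal(100_000)
    umax = f(x) ** (1.0 / (r * d + 1.0))
    u = umax * rng.uniform(size=x.size) ** (1.0 / (r * d + 1.0))
    v = x * u ** r                                 # V = X U^r

    # every (v, u) lies in A_r = {0 < u < f(v/u^r)^(1/(rd+1))}
    print(np.all(u < f(v / u ** r) ** (1.0 / (r * d + 1.0))))   # True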

Auxiliary Variable Approach (cont.)

Generalized RU sampler: X = V/U^r with (V, U) uniform on

    A_r = {(v, u) : 0 < u < f(v/u^r)^(1/(rd+1))}

r = 1: standard RU sampler. r = 0: slice sampler.

A Simple Gibbs Sampler Approach

Alternate sampling V | U = u and U | V = v. Sampling V | U = u is equivalent to sampling X | U = u and setting V = X u^r.

Sampling V | U = u or X | U = u may not be practically feasible for large d, but may be theoretically useful to consider.

Call this the simple RU sampler.
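
A minimal one-dimensional sketch of the simple RU sampler for the unnormalized normal kernel f(x) = exp(−x²/2), assuming numpy. Both conditionals are available in closed form here: U | X by inverse CDF, and X | U uniform on the slice {x : f(x) > u^(r+1)}. Setting r = 0 recovers the simple slice sampler.

    import numpy as np

    rng = np.random.default_rng(4)

    def simple_ru_gibbs_normal(n, r=1.0):
        """Simple RU (Gibbs) sampler for f(x) = exp(-x^2/2), d = 1.
        Alternates U | X = x, with density prop. to u^r on
        (0, f(x)^(1/(r+1))), and X | U = u, uniform on the slice
        {x : f(x) > u^(r+1)}."""
        x = 0.0
        out = np.empty(n)
        for i in range(n):
            umax = np.exp(-x * x / 2.0) ** (1.0 / (r + 1.0))
            u = umax * rng.uniform() ** (1.0 / (r + 1.0))   # inverse CDF
            half = np.sqrt(-2.0 * (r + 1.0) * np.log(u))    # slice endpoint
            x = rng.uniform(-half, half)
            out[i] = x
        return out

    print(simple_ru_gibbs_normal(50_000).std())             # ~ 1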

Slice and RU Sampler Comparison

[Figure: for the standard normal and the standard Cauchy, the density f(x) with its slice region and the corresponding RU region.]

Other MCMC Approaches

Updating V and U separately:

Updating V: by any method that leaves V | U = u or X | U = u invariant; one or more coordinates at a time, or all at once. Neal (2003) describes many approaches.

Updating U: for log convex f, adaptive rejection can be used; again, methods from Neal (2003) can be used.

Other MCMC Approaches (cont.)

Marginal updating of X: update X_n by any method that is invariant for f. For example:

- generate U_0 | X = x_n as in the auxiliary variable approach
- compute V = X_n U_0
- choose U_1 by a method that leaves U | V = v invariant
- compute X_{n+1} = V/U_1 = X_n U_0/U_1

This is a random rescaling of X_n that leaves f invariant.
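
The invariance claim can be checked empirically. A sketch (assuming numpy, with d = 1, r = 1, and the unnormalized normal kernel): start a large batch of chains exactly at f, apply several rescaling updates (with a single random-walk MH move for U | V = v, accepted exactly when the proposal stays in A), and confirm the marginal is unchanged.

    import numpy as np

    rng = np.random.default_rng(5)
    f = lambda x: np.exp(-x * x / 2.0)              # unnormalized N(0, 1)

    def rescale_step(x, step=0.2):
        """One random-rescaling update (d = 1, r = 1) applied to an array
        of states: exact draw of U0 | X, then one MH move for U | V = v."""
        u0 = np.sqrt(f(x) * rng.uniform(size=x.size))   # U0 | X, inverse CDF
        v = x * u0                                      # V = X * U0
        up = u0 + step * rng.standard_normal(x.size)
        u1 = u0.copy()
        ok = up > 0
        ok[ok] = up[ok] < np.sqrt(f(v[ok] / up[ok]))    # proposal stays in A?
        u1[ok] = up[ok]
        return v / u1                                   # X' = X * U0 / U1

    x = rng.standard_normal(100_000)   # start exactly at the target
    for _ in range(10):
        x = rescale_step(x)
    print(x.mean(), x.std())           # still ~ 0 and ~ 1: f is invariant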

Other MCMC Approaches (cont.)

Joint sampling of (V, U):

Hit and run (random direction) methods: uniform conditionals for log concave f; other univariate uniform samplers when A is not convex.

Random walk Metropolis-Hastings methods: for a symmetric (e.g. Gaussian) proposal the acceptance probability is zero or one, so the shape/scaling of A is important.
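
A sketch of the random-walk option for the standard normal kernel, assuming numpy; the target is uniform on A, so the only accept/reject decision is whether the Gaussian proposal lands inside A.

    import numpy as np

    rng = np.random.default_rng(6)
    in_A = lambda v, u: 0.0 < u < np.exp(-(v / u) ** 2 / 4.0)  # u < sqrt(f(v/u))

    def rw_mh_on_A(n, step=0.25):
        """Random walk MH for the uniform distribution on the RU region A
        of f(x) = exp(-x^2/2): accept iff the proposal stays in A."""
        v, u = 0.0, 0.5
        xs = np.empty(n)
        for i in range(n):
            vp = v + step * rng.standard_normal()
            up = u + step * rng.standard_normal()
            if in_A(vp, up):
                v, u = vp, up
            xs[i] = v / u              # X = V/U has density prop. to f
        return xs

    print(rw_mh_on_A(50_000).std())    # ~ 1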

Other MCMC Approaches (cont.)

Suppose g is an approximation to f with RU region B, and q(y|x) is a symmetric proposal kernel. Let

    q~(y|x) = λ |B|^(−1) 1_B(y) + (1 − λ) q(y|x)

Then the Metropolis-Hastings acceptance probabilities are, for y ∈ A,

    α(x, y) = 1                                           if x ∈ A ∩ B or y ∈ A ∩ B^c
    α(x, y) = (1 + (λ/(1 − λ)) (|B| q(y|x))^(−1))^(−1)    if x ∈ A ∩ B^c and y ∈ A ∩ B

and α(x, y) = 0 if y ∉ A.
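
An illustrative sketch of this mixture proposal in the d = 1 normal setting, assuming numpy: f(x) = exp(−x²/2) with target uniform on its region A, and g a normal kernel with σ = 1.2 (an assumed choice) whose region B can be sampled exactly via the RU correspondence. Rather than coding the three cases, the sketch computes min(1, q~(x|y)/q~(y|x)) directly; it reduces to the displayed formula.

    import numpy as np

    rng = np.random.default_rng(7)
    sigma, lam, step = 1.2, 0.3, 0.25
    in_A = lambda p: 0.0 < p[1] < np.exp(-(p[0] / p[1]) ** 2 / 4.0)
    in_B = lambda p: 0.0 < p[1] < np.exp(-(p[0] / p[1]) ** 2 / (4.0 * sigma ** 2))
    area_B = sigma * np.sqrt(2.0 * np.pi) / 2.0   # |B| = (1/2) integral of g

    def draw_uniform_B():
        """Exact uniform draw on B: X ~ g, then U | X prop. to u, V = X U."""
        x = sigma * rng.standard_normal()
        u = np.sqrt(np.exp(-x * x / (2.0 * sigma ** 2)) * rng.uniform())
        return np.array([x * u, u])

    def q_mix(x, y):
        """Mixture proposal density q~(y | x)."""
        gauss = np.exp(-np.sum((y - x) ** 2) / (2 * step ** 2)) / (2 * np.pi * step ** 2)
        return lam * in_B(y) / area_B + (1 - lam) * gauss

    p = np.array([0.0, 0.5])
    xs = np.empty(20_000)
    for i in range(xs.size):
        y = draw_uniform_B() if rng.uniform() < lam else p + step * rng.standard_normal(2)
        if in_A(y) and rng.uniform() < q_mix(y, p) / q_mix(p, y):
            p = y                                  # accept the move
        xs[i] = p[0] / p[1]
    print(xs.std())                                # ~ 1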

Theoretical Properties

Uniform ergodicity: π(x) ∝ f(x), πP = π, and for some M < ∞ and ρ < 1,

    ||P^n(x, ·) − π||_TV ≤ M ρ^n

If f is log concave with mode at the origin, then the simple RU Gibbs sampler is uniformly ergodic.

If f is log concave, then the hit-and-run sampler is uniformly ergodic.

If f and ||x||^(rd+1) f(x) are bounded, then Gaussian MH samplers are uniformly ergodic.

A Simple Example

Suppose f(x) is a density on R^d with

    f(x) ∝ (2/(1 + ||x||²))^(d+1)

The RU region A is the sphere

    A = {(v, u) : ||v||² + (u − 1)² < 1}

Simple RU sampler, run of 10,000, d = 10, 20, 50, 100, 200: absolute lag-one autocorrelations are around 0.03.
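
Since A is a unit ball here, both Gibbs conditionals are available in closed form, so the simple RU sampler is easy to run. A sketch assuming numpy:

    import numpy as np

    rng = np.random.default_rng(8)

    def simple_ru_sphere(n, d):
        """Simple RU sampler when A is the unit ball centered at
        (v, u) = (0, 1): V | U is uniform in a d-ball and U | V is
        uniform on an interval."""
        v, u = np.zeros(d), 1.0
        xs = np.empty((n, d))
        for i in range(n):
            # V | U = u: uniform in the d-ball of radius sqrt(1 - (u-1)^2)
            rad = np.sqrt(1.0 - (u - 1.0) ** 2)
            z = rng.standard_normal(d)
            v = rad * rng.uniform() ** (1.0 / d) * z / np.linalg.norm(z)
            # U | V = v: uniform on (1 - s, 1 + s) with s = sqrt(1 - |v|^2)
            s = np.sqrt(max(0.0, 1.0 - float(v @ v)))
            u = rng.uniform(1.0 - s, 1.0 + s)
            xs[i] = v / u
        return xs

    x = simple_ru_sphere(10_000, d=50)
    print(np.corrcoef(x[:-1, 0], x[1:, 0])[0, 1])   # lag-one ACF ~ 0.03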

A Simple Example (cont.)

Gaussian random walk Metropolis-Hastings: optimal proposal SD seems to be around 1.5/d^0.9; absolute lag-d autocorrelations are around 0.5.

Hit and run: absolute lag-d autocorrelations are around 0.35.

Coordinate-wise Gaussian random walk M-H: optimal proposal SD seems to be around 2.4/√d; absolute lag-one autocorrelations are around 0.6.

Coordinate-wise Gibbs: absolute lag-one autocorrelations are around 0.35.

Further Work and Open Questions

Computational issues: numerical considerations; computational efficiency; efficient single-coordinate updates.

Transformations and parameterization: location-scale choices for f; scaling of U; nonlinear transformations, e.g. logs; opportunities for adaptive methods.

Further Work and Open Questions (cont.)

Use with marginals and conditionals: sample joint conditionals for blocking; sample marginals if available (variance components).

Multi-modality issues: A is connected at the origin, which allows transitions between modes.

A Multi-Modal Target Density

Two well-separated modes:

[Figure: a mixture-of-normals density and its RU region; the region's two lobes meet at the origin.]
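
A sketch of mode-hopping through the region (assuming numpy, and a hypothetical mixture with modes at 0 and 20, since the slide's exact target is not given): a random walk on A can reach both modes because the region is connected near the origin.

    import numpy as np

    rng = np.random.default_rng(9)

    def f(x):   # hypothetical two-mode target, modes at 0 and 20
        return 0.5 * np.exp(-x ** 2 / 2) + 0.5 * np.exp(-(x - 20.0) ** 2 / 2)

    in_A = lambda v, u: 0.0 < u < np.sqrt(f(v / u))

    def rw_on_A(n, step=0.3):
        """Random-walk MH on the RU region A of the mixture: the two
        lobes of A meet near the origin, so the chain can move between
        modes without crossing the low-density valley in x directly."""
        v, u = 0.0, 0.5
        xs = np.empty(n)
        for i in range(n):
            vp = v + step * rng.standard_normal()
            up = u + step * rng.standard_normal()
            if in_A(vp, up):
                v, u = vp, up
            xs[i] = v / u
        return xs

    xs = rw_on_A(200_000)
    print((xs > 10).mean())   # both modes visited; ~ 0.5 in the long run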

Further Work and Open Questions (cont.)

Diagnostic uses: may be useful for discovering additional modes; unbounded behavior may indicate an improper f.

Theoretical considerations: explicit transition kernels and convergence rates; perfect sampling opportunities; hybrids to make other samplers uniformly ergodic.

Conclusions

The value of the ratio-of-uniforms transformation for random variate generation is well known. It also has interesting uses in MCMC construction. Its simplicity leads to interesting theoretical properties, and with good scaling there is promise of good practical results. More work and experience are needed to fully understand its practical and theoretical value.