Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 18-16th March Arnaud Doucet


Stat 535 C - Statistical Computing & Monte Carlo Methods
Lecture 18 - 16th March 2006
Arnaud Doucet
Email: arnaud@cs.ubc.ca

1 Overview of the Lecture

1.1 Outline

- Trans-dimensional Markov chain Monte Carlo.
- Bayesian model for autoregressions.
- Bayesian analysis of finite mixtures of Gaussians.

2 Trans-dimensional Markov Chain Monte Carlo

2.1 Metropolis-Hastings

The standard MH algorithm, where $\mathcal{X} \subset \mathbb{R}^d$, corresponds to the kernel

\[
K(x, dx') = \alpha(x, x')\, q(x, dx') + \left(1 - \int \alpha(x, z)\, q(x, dz)\right) \delta_x(dx'),
\]

where

\[
\alpha(x, x') = \min\left\{1, \frac{\pi(x')\, q(x', x)}{\pi(x)\, q(x, x')}\right\}.
\]

You should think of the ratio $\frac{\pi(x')\, q(x', x)}{\pi(x)\, q(x, x')}$ not as just a number! The acceptance ratio corresponds to a ratio of probability measures (an importance weight) defined on the same space:

\[
\frac{\pi(dx')\, q(x', dx)}{\pi(dx)\, q(x, dx')}
= \frac{\pi(x')\,dx'\; q(x', x)\,dx}{\pi(x)\,dx\; q(x, x')\,dx'}
= \frac{\pi(x')\, q(x', x)}{\pi(x)\, q(x, x')}.
\]

You can only compare points defined on the same joint space. If you have $x = (x_1, x_2)$ and

\[
\pi_1(dx_1) = \pi_1(x_1)\,dx_1, \qquad \pi_2(dx_1, dx_2) = \pi_2(x_1, x_2)\,dx_1\,dx_2,
\]

you can compute numerically $\pi_2(x_1, x_2) / \pi_1(x_1)$, but it means nothing as the measures $\pi_1$ and $\pi_2$ are not defined on the same space. You CANNOT compare a surface to a volume!
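As a concrete illustration of the fixed-dimension kernel, here is a minimal random-walk MH sketch in Python. The standard normal target, the Gaussian random-walk proposal and the function names (`mh_accept_prob`, `log_target`, `mh_chain`) are assumptions chosen for the example; they are not part of the lecture.

```python
import math
import random


def mh_accept_prob(log_pi, log_q, x, x_new):
    """alpha(x, x') = min{1, pi(x') q(x', x) / (pi(x) q(x, x'))}, evaluated in logs."""
    log_ratio = (log_pi(x_new) + log_q(x_new, x)) - (log_pi(x) + log_q(x, x_new))
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def log_target(x):
    # Assumed toy target: N(0, 1), up to an additive constant.
    return -0.5 * x * x


def mh_chain(n_iter, x0=0.0, step=1.0, seed=0):
    rng = random.Random(seed)

    def log_q(x_from, x_to):
        # Gaussian random-walk proposal density q(x_from, x_to), up to a constant.
        return -0.5 * ((x_to - x_from) / step) ** 2

    x, chain = x0, []
    for _ in range(n_iter):
        x_new = x + step * rng.gauss(0.0, 1.0)
        if rng.random() < mh_accept_prob(log_target, log_q, x, x_new):
            x = x_new      # accepted: move to x'
        chain.append(x)    # rejected: stay at x (the delta_x term of the kernel)
    return chain
```

Both `x` and `x_new` live in the same space here, which is exactly why the density ratio is meaningful; the trans-dimensional case needs the extra construction of the following sections.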

2.2 Designing trans-dimensional moves

In the general case where $\mathcal{X}$ is a union of subspaces of different dimensions, you might want to move from $x \in \mathbb{R}^{d}$ to $x' \in \mathbb{R}^{d'}$. To construct this move, you can use $u \in \mathbb{R}^{r}$ and $u' \in \mathbb{R}^{r'}$ and a one-to-one differentiable mapping $h: \mathbb{R}^{d} \times \mathbb{R}^{r} \to \mathbb{R}^{d'} \times \mathbb{R}^{r'}$ such that

\[
(x', u') = h(x, u) \ \text{where } u \sim g, \qquad (x, u) = h^{-1}(x', u') \ \text{where } u' \sim g'.
\]

We need $d + r = d' + r'$ and typically, if $d < d'$, then $r' = 0$ and $r = d' - d$; that is, in most cases the variable $u'$ is not introduced.

We can rewrite formally

\[
\pi(dx)\, q\big(x, (dx', du')\big) = \pi(x)\, g(u)\, dx\, du
\]

and

\[
\pi(dx')\, q\big(x', (dx, du)\big) = \pi(x')\, g'(u')\, dx'\, du'.
\]

An acceptance ratio ensuring $\pi$-reversibility of this trans-dimensional move is given by

\[
\frac{\pi(dx')\, q\big(x', (dx, du)\big)}{\pi(dx)\, q\big(x, (dx', du')\big)}
= \frac{\pi(x')\, g'(u')}{\pi(x)\, g(u)} \left| \frac{\partial (x', u')}{\partial (x, u)} \right|.
\]

In this respect, RJMCMC is an extension of standard MH, as you need to introduce the auxiliary variables $u$ and $u'$.

2.3 Example: Birth/Death Moves

Assume we have a distribution defined on $\{1\} \times \mathbb{R} \,\cup\, \{2\} \times \mathbb{R} \times \mathbb{R}$. We want to propose some moves to go from $(1, \theta)$ to $(2, \theta_1, \theta_2)$.

We can propose $u \sim g$, $u \in \mathbb{R}$, and set

\[
(\theta_1, \theta_2) = h(\theta, u) = (\theta, u),
\]

i.e. we do not need to introduce a variable $u'$. Its inverse is given by

\[
(\theta, u) = h^{-1}(\theta_1, \theta_2) = (\theta_1, \theta_2).
\]

The acceptance probability for this birth move is given by

\[
\min\left\{1, \frac{\pi(2, \theta_1, \theta_2)}{\pi(1, \theta)} \frac{1}{g(u)} \left| \frac{\partial (\theta_1, \theta_2)}{\partial (\theta, u)} \right| \right\}
= \min\left\{1, \frac{\pi(2, \theta_1, \theta_2)}{\pi(1, \theta_1)\, g(\theta_2)} \right\}.
\]

The acceptance probability for the associated death move is

\[
\min\left\{1, \frac{\pi(1, \theta)}{\pi(2, \theta_1, \theta_2)}\, g(u) \left| \frac{\partial (\theta, u)}{\partial (\theta_1, \theta_2)} \right| \right\}
= \min\left\{1, \frac{\pi(1, \theta)\, g(u)}{\pi(2, \theta, u)} \right\}.
\]

Once the birth move is defined, the death move follows automatically. In the death move, we do not simulate from $g$, but its expression still appears in the acceptance probability.
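The two acceptance probabilities can be sketched in a few lines of Python. The target used below (model probabilities 0.3 and 0.7 with independent standard normal parameters) and the proposal $g = \mathcal{N}(0, 2^2)$ are purely hypothetical choices made so that the numbers can be checked by hand.

```python
import math


def norm_pdf(x, mean=0.0, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical target on {1} x R  U  {2} x R x R.
def pi1(theta):
    return 0.3 * norm_pdf(theta)


def pi2(theta1, theta2):
    return 0.7 * norm_pdf(theta1) * norm_pdf(theta2)


def g(u):
    # Hypothetical proposal density for the auxiliary variable u.
    return norm_pdf(u, sd=2.0)


def birth_accept(theta, u):
    # min{1, pi(2, theta, u) / (pi(1, theta) g(u))}; the Jacobian of h is 1.
    return min(1.0, pi2(theta, u) / (pi1(theta) * g(u)))


def death_accept(theta1, theta2):
    # min{1, pi(1, theta1) g(theta2) / pi(2, theta1, theta2)}; g is evaluated, not sampled.
    return min(1.0, pi1(theta1) * g(theta2) / pi2(theta1, theta2))
```

The ratio inside `death_accept` is the reciprocal of the one inside `birth_accept` at the same pair of points, which is the sense in which the death move follows automatically from the birth move.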

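2.4 Example: Birth/Death Moves

To simplify notation (as in Green 1995 & Robert 2004), we do not emphasize that the proposal $g$ can actually be a function of the current point $\theta$, but it is possible!

We can propose $u \sim g(\cdot \mid \theta)$, $u \in \mathbb{R}$, and set

\[
(\theta_1, \theta_2) = h(\theta, u) = (\theta, u).
\]

Its inverse is given by

\[
(\theta, u) = h^{-1}(\theta_1, \theta_2) = (\theta_1, \theta_2).
\]

The acceptance probability for this birth move is given by

\[
\min\left\{1, \frac{\pi(2, \theta_1, \theta_2)}{\pi(1, \theta)} \frac{1}{g(u \mid \theta)} \left| \frac{\partial (\theta_1, \theta_2)}{\partial (\theta, u)} \right| \right\}
= \min\left\{1, \frac{\pi(2, \theta_1, \theta_2)}{\pi(1, \theta_1)\, g(\theta_2 \mid \theta_1)} \right\}.
\]

The acceptance probability for the associated death move is

\[
\min\left\{1, \frac{\pi(1, \theta)\, g(u \mid \theta)}{\pi(2, \theta_1, \theta_2)} \left| \frac{\partial (\theta, u)}{\partial (\theta_1, \theta_2)} \right| \right\}
= \min\left\{1, \frac{\pi(1, \theta)\, g(u \mid \theta)}{\pi(2, \theta, u)} \right\}.
\]

Once the birth move is defined, the death move follows automatically. In the death move, we do not simulate from $g$, but its expression still appears in the acceptance probability.

Clearly, if we have $g(\theta_2 \mid \theta_1) = \pi(\theta_2 \mid 2, \theta_1)$, then the expressions simplify, since $\pi(2, \theta_1, \theta_2) = \pi(2, \theta_1)\, \pi(\theta_2 \mid 2, \theta_1)$:

\[
\min\left\{1, \frac{\pi(2, \theta_1, \theta_2)}{\pi(1, \theta_1)\, g(\theta_2 \mid \theta_1)} \right\} = \min\left\{1, \frac{\pi(2, \theta_1)}{\pi(1, \theta_1)} \right\},
\qquad
\min\left\{1, \frac{\pi(1, \theta)\, g(u \mid \theta)}{\pi(2, \theta, u)} \right\} = \min\left\{1, \frac{\pi(1, \theta)}{\pi(2, \theta)} \right\}.
\]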

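2.5 Example: Split/Merge Moves

Assume we have a distribution defined on $\{1\} \times \mathbb{R} \,\cup\, \{2\} \times \mathbb{R} \times \mathbb{R}$. We want to propose some moves to go from $(1, \theta)$ to $(2, \theta_1, \theta_2)$.

We can propose $u \sim g$, $u \in \mathbb{R}$, and set

\[
(\theta_1, \theta_2) = h(\theta, u) = (\theta - u, \theta + u).
\]

Its inverse is given by

\[
(\theta, u) = h^{-1}(\theta_1, \theta_2) = \left( \frac{\theta_1 + \theta_2}{2}, \frac{\theta_2 - \theta_1}{2} \right).
\]

The Jacobian of $h$ is $\left| \partial (\theta_1, \theta_2) / \partial (\theta, u) \right| = 2$, so the acceptance probability for this split move is given by

\[
\min\left\{1, \frac{\pi(2, \theta_1, \theta_2)}{\pi(1, \theta)} \frac{1}{g(u)} \left| \frac{\partial (\theta_1, \theta_2)}{\partial (\theta, u)} \right| \right\}
= \min\left\{1, \frac{2\, \pi(2, \theta_1, \theta_2)}{\pi\!\left(1, \frac{\theta_1 + \theta_2}{2}\right) g\!\left(\frac{\theta_2 - \theta_1}{2}\right)} \right\}.
\]

The acceptance probability for the associated merge move is

\[
\min\left\{1, \frac{\pi(1, \theta)}{\pi(2, \theta_1, \theta_2)}\, g(u) \left| \frac{\partial (\theta, u)}{\partial (\theta_1, \theta_2)} \right| \right\}
= \min\left\{1, \frac{\pi(1, \theta)}{\pi(2, \theta - u, \theta + u)} \frac{g(u)}{2} \right\}.
\]

Once the split move is defined, the merge move follows automatically. In the merge move, we do not simulate from $g$, but its expression still appears in the acceptance probability.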
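A short Python sketch of this split/merge pair follows, with a finite-difference check that the Jacobian of the split map is 2. The function names and the way the densities `pi1`, `pi2` and `g` are passed in are assumptions made for illustration.

```python
def split(theta, u):
    # h(theta, u) = (theta - u, theta + u)
    return theta - u, theta + u


def merge(theta1, theta2):
    # h^{-1}(theta1, theta2) = ((theta1 + theta2) / 2, (theta2 - theta1) / 2)
    return 0.5 * (theta1 + theta2), 0.5 * (theta2 - theta1)


def split_jacobian(theta, u, eps=1e-6):
    """|det d(theta1, theta2) / d(theta, u)| by central finite differences."""
    a_hi, b_hi = split(theta + eps, u)
    a_lo, b_lo = split(theta - eps, u)
    c_hi, d_hi = split(theta, u + eps)
    c_lo, d_lo = split(theta, u - eps)
    d11 = (a_hi - a_lo) / (2 * eps)  # d theta1 / d theta
    d21 = (b_hi - b_lo) / (2 * eps)  # d theta2 / d theta
    d12 = (c_hi - c_lo) / (2 * eps)  # d theta1 / d u
    d22 = (d_hi - d_lo) / (2 * eps)  # d theta2 / d u
    return abs(d11 * d22 - d12 * d21)


def split_accept(theta, u, pi1, pi2, g):
    theta1, theta2 = split(theta, u)
    return min(1.0, 2.0 * pi2(theta1, theta2) / (pi1(theta) * g(u)))


def merge_accept(theta1, theta2, pi1, pi2, g):
    theta, u = merge(theta1, theta2)
    return min(1.0, pi1(theta) * g(u) / (2.0 * pi2(theta1, theta2)))
```

Here `pi1`, `pi2` and `g` are any callables returning (possibly unnormalised) densities, as long as `pi1` and `pi2` share the same normalising constant.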

2.6 Mixture of Moves

In practice, the algorithm is based on a combination of moves to go from $x = (k, \theta_k)$ to $x' = (k', \theta_{k'})$, indexed by $i \in \mathcal{M}$. In this case we just need to have

\[
\int_{(x, x') \in A \times B} \pi(dx)\, \alpha_i(x, x')\, q_i(x, dx')
= \int_{(x, x') \in A \times B} \pi(dx')\, \alpha_i(x', x)\, q_i(x', dx)
\]

to ensure that the kernel $P(x, B)$, defined for $x \notin B$ by

\[
P(x, B) = \frac{1}{|\mathcal{M}|} \sum_{i \in \mathcal{M}} \int_B \alpha_i(x, x')\, q_i(x, dx'),
\]

is $\pi$-reversible.

In practice, we would like to have

\[
P(x, B) = \sum_{i \in \mathcal{M}} j_i(x) \int_B \alpha_i(x, x')\, q_i(x, dx'),
\]

where $j_i(x)$ is the probability of selecting the move $i$ once we are in $x$, and $\sum_{i \in \mathcal{M}} j_i(x) = 1$.

In this case reversibility is ensured if

\[
\int_{(x, x') \in A \times B} \pi(dx)\, j_i(x)\, \alpha_i(x, x')\, q_i(x, dx')
= \int_{(x, x') \in A \times B} \pi(dx')\, j_i(x')\, \alpha_i(x', x)\, q_i(x', dx),
\]

which is satisfied if

\[
\alpha_i(x, x') = \min\left\{1, \frac{\pi(x')\, j_i(x')\, g'_i(u')}{\pi(x)\, j_i(x)\, g_i(u)} \left| \frac{\partial (x', u')}{\partial (x, u)} \right| \right\}.
\]

In practice, we will only have a limited number of moves possible from each point $x$.

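2.7 Summary

For each point $x = (k, \theta_k)$, we define a collection of potential moves selected randomly with probability $j_i(x)$, where $i \in \mathcal{M}$.

To move from $x = (k, \theta_k)$ to $x' = (k', \theta_{k'})$, we build one or several deterministic, differentiable and invertible mappings

\[
(\theta_{k'}, u_{k', k}) = T_{k, k'}(\theta_k, u_{k, k'}), \qquad u_{k, k'} \sim g_{k, k'}, \quad u_{k', k} \sim g_{k', k},
\]

and we accept the move with probability

\[
\min\left\{1, \frac{\pi(k', \theta_{k'})\, j_i(k', \theta_{k'})\, g_{k', k}(u_{k', k})}{\pi(k, \theta_k)\, j_i(k, \theta_k)\, g_{k, k'}(u_{k, k'})} \left| \frac{\partial T_{k, k'}(\theta_k, u_{k, k'})}{\partial (\theta_k, u_{k, k'})} \right| \right\}.
\]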
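To see the whole recipe run end to end, the following is a minimal reversible jump sketch on a two-model toy space. Everything specific in it is an assumption made for illustration: the target has $\pi(k=1) = 0.3$, $\pi(k=2) = 0.7$ with independent $\mathcal{N}(0,1)$ parameters, the auxiliary variable is drawn from $g = \mathcal{N}(0, 2^2)$, and the jump move is selected with probability 0.5 in both models so that the $j_i$ terms cancel.

```python
import math
import random

MODEL_PROB = {1: 0.3, 2: 0.7}  # assumed pi(k); each theta_i | k is N(0, 1)
G_SD = 2.0                     # assumed std. dev. of the proposal g for u


def log_normal_pdf(x, sd=1.0):
    return -0.5 * (x / sd) ** 2 - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_target(k, theta):
    # log pi(k, theta_k) for the toy target
    return math.log(MODEL_PROB[k]) + sum(log_normal_pdf(t) for t in theta)


def accept(rng, log_ratio):
    return rng.random() < math.exp(min(0.0, log_ratio))


def rjmcmc(n_iter, seed=0):
    rng = random.Random(seed)
    k, theta = 1, [0.0]
    visited = []
    for _ in range(n_iter):
        if rng.random() < 0.5:
            # Fixed-dimension update: exact draw from pi(theta_k | k).
            theta = [rng.gauss(0.0, 1.0) for _ in range(k)]
        elif k == 1:
            # Birth: (theta_1, theta_2) = (theta, u) with u ~ g, Jacobian 1.
            u = rng.gauss(0.0, G_SD)
            proposal = [theta[0], u]
            log_ratio = (log_target(2, proposal) - log_target(1, theta)
                         - log_normal_pdf(u, G_SD))
            if accept(rng, log_ratio):
                k, theta = 2, proposal
        else:
            # Death: drop theta_2; g is evaluated at the removed value.
            proposal = [theta[0]]
            log_ratio = (log_target(1, proposal) + log_normal_pdf(theta[1], G_SD)
                         - log_target(2, theta))
            if accept(rng, log_ratio):
                k, theta = 1, proposal
        visited.append(k)
    return visited
```

With a long enough run, the fraction of iterations spent in model 1 should approach the assumed value 0.3, which gives a quick sanity check on the acceptance ratios.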

2.8 One minute break

This brilliant idea is due to P.J. Green, Reversible Jump MCMC and Bayesian Model Determination, Biometrika, 1995, although special cases had appeared earlier in physics.

This is one of the top ten most cited papers in maths and is used nowadays in numerous applications including genetics, econometrics, computer graphics, ecology, etc.

2.9 Example: Bayesian Model for Autoregressions

The model $k \in \mathcal{K} = \{1, \ldots, k_{\max}\}$ is given by an AR of order $k$:

\[
Y_n = \sum_{i=1}^{k} a_i Y_{n-i} + \sigma V_n, \qquad V_n \sim \mathcal{N}(0, 1),
\]

and we have $\theta_k = (a_{k,1:k}, \sigma_k^2) \in \mathbb{R}^k \times \mathbb{R}^+$, where

\[
p(k) = k_{\max}^{-1} \ \text{for } k \in \mathcal{K}, \qquad
p(\theta_k \mid k) = \mathcal{N}\!\left(a_{k,1:k}; 0, \sigma_k^2 \delta^2 I_k\right) \mathcal{IG}\!\left(\sigma_k^2; \frac{\nu_0}{2}, \frac{\gamma_0}{2}\right).
\]

For the sake of simplicity, we assume here that the initial conditions $y_{1-k_{\max}:0} = (0, \ldots, 0)$ are known, and we want to sample from $p(\theta_k, k \mid y_{1:T})$.

Note that this is not very clever, as $p(k \mid y_{1:T})$ is known up to a normalizing constant!
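A small Python sketch of this AR(k) model may help fix the indexing conventions. It simulates data with zero initial conditions and evaluates the Gaussian log-likelihood $\log p(y_{1:T} \mid a_{1:k}, \sigma^2, k)$; the function names are assumptions, and `a[0]` plays the role of $a_1$.

```python
import math
import random


def simulate_ar(a, sigma, n, seed=0):
    """Simulate n observations from Y_n = sum_i a_i Y_{n-i} + sigma V_n."""
    rng = random.Random(seed)
    k = len(a)
    y = [0.0] * k  # zero initial conditions
    for _ in range(n):
        mean = sum(a[i] * y[-1 - i] for i in range(k))
        y.append(mean + sigma * rng.gauss(0.0, 1.0))
    return y[k:]


def ar_loglik(y, a, sigma2):
    """log p(y_{1:T} | a_{1:k}, sigma^2, k) with zero initial conditions."""
    k = len(a)
    padded = [0.0] * k + list(y)
    loglik = 0.0
    for n in range(k, len(padded)):
        mean = sum(a[i] * padded[n - 1 - i] for i in range(k))
        resid = padded[n] - mean
        loglik += -0.5 * math.log(2.0 * math.pi * sigma2) - 0.5 * resid * resid / sigma2
    return loglik
```

For example, `simulate_ar([0.5, -0.2], 1.0, 200)` draws 200 observations from an AR(2) model, and `ar_loglik` can then be evaluated at any candidate order and coefficient vector.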

We propose the following moves. If we have $(k, a_{1:k}, \sigma_k^2)$, then with probability $b_k$ we propose a birth move if $k < k_{\max}$, with probability $u_k$ we propose an update move, and with probability $d_k = 1 - b_k - u_k$ we propose a death move. We have $d_1 = 0$ and $b_{k_{\max}} = 0$.

The update move can simply be done in a Gibbs step, as

\[
p(\theta_k \mid y_{1:T}, k) = \mathcal{N}\!\left(a_{k,1:k}; m_k, \sigma^2 \Sigma_k\right) \mathcal{IG}\!\left(\sigma^2; \frac{\nu_k}{2}, \frac{\gamma_k}{2}\right).
\]
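The move-selection step can be sketched as follows. The lecture does not give numerical values for $b_k$, $u_k$ and $d_k$; the choice of 1/3 for birth and death away from the boundaries is a hypothetical one, kept only so that the constraints $d_1 = 0$ and $b_{k_{\max}} = 0$ are visible in code.

```python
import random


def move_probabilities(k, k_max):
    """Return (b_k, u_k, d_k) with d_1 = 0 and b_{k_max} = 0 (values assumed)."""
    b = 0.0 if k == k_max else 1.0 / 3.0
    d = 0.0 if k == 1 else 1.0 / 3.0
    u = 1.0 - b - d
    return b, u, d


def choose_move(k, k_max, rng):
    b, u, d = move_probabilities(k, k_max)
    r = rng.random()
    if r < b:
        return "birth"
    if r < b + d:
        return "death"
    return "update"  # remaining probability mass u_k
```

These probabilities are the $j_i(x)$ of the general acceptance ratio: a birth from $k$ to $k+1$ brings in the factor $d_{k+1} / b_k$, and a death from $k$ to $k-1$ the factor $b_{k-1} / d_k$.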

Birth move: we propose to move from $k$ to $k + 1$ with

\[
\left(a_{k+1,1:k}, a_{k+1,k+1}, \sigma_{k+1}^2\right) = \left(a_{k,1:k}, u, \sigma_k^2\right), \qquad u \sim g_{k,k+1},
\]

and the acceptance probability is

\[
\min\left\{1, \frac{p\!\left(a_{k,1:k}, u, \sigma_k^2, k+1 \mid y_{1:T}\right) d_{k+1}}{p\!\left(a_{k,1:k}, \sigma_k^2, k \mid y_{1:T}\right) b_k\, g_{k,k+1}(u)} \right\}.
\]

Death move: we propose to move from $k$ to $k - 1$ with

\[
\left(a_{k-1,1:k-1}, u, \sigma_{k-1}^2\right) = \left(a_{k,1:k-1}, a_{k,k}, \sigma_k^2\right),
\]

and the acceptance probability is

\[
\min\left\{1, \frac{p\!\left(a_{k,1:k-1}, \sigma_k^2, k-1 \mid y_{1:T}\right) b_{k-1}\, g_{k-1,k}(a_{k,k})}{p\!\left(a_{k,1:k}, \sigma_k^2, k \mid y_{1:T}\right) d_k} \right\}.
\]

The performance is obviously very dependent on the selection of the proposal distribution. We select, whenever possible, the full conditional distribution, i.e. we have

\[
u = a_{k+1,k+1} \sim p\!\left(a_{k+1,k+1} \mid y_{1:T}, a_{k,1:k}, \sigma_k^2, k+1\right)
\]

and

\[
\min\left\{1, \frac{p\!\left(a_{k,1:k}, u, \sigma_k^2, k+1 \mid y_{1:T}\right) d_{k+1}}{p\!\left(a_{k,1:k}, \sigma_k^2, k \mid y_{1:T}\right) b_k\, p\!\left(u \mid y_{1:T}, a_{k,1:k}, \sigma_k^2, k+1\right)} \right\}
= \min\left\{1, \frac{p\!\left(a_{k,1:k}, \sigma_k^2, k+1 \mid y_{1:T}\right) d_{k+1}}{p\!\left(a_{k,1:k}, \sigma_k^2, k \mid y_{1:T}\right) b_k} \right\}.
\]

In such cases, the acceptance probability no longer depends on $u$, so it is actually possible to reject a candidate before sampling it!
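The birth and death ratios for the AR model can be sketched in log form as below. This is only an illustration under stated assumptions: the hyperparameter defaults (`delta2`, `nu0`, `gamma0`), the proposal $g_{k,k+1} = \mathcal{N}(0, \tau^2)$ with `tau = 1` (not the full conditional discussed above), and all function names are hypothetical. The posterior is evaluated up to the common normalising constant $p(y_{1:T})$, which cancels in the ratios.

```python
import math


def ar_loglik(y, a, sigma2):
    # Gaussian AR(k) log-likelihood with zero initial conditions.
    k = len(a)
    padded = [0.0] * k + list(y)
    total = 0.0
    for n in range(k, len(padded)):
        resid = padded[n] - sum(a[i] * padded[n - 1 - i] for i in range(k))
        total += -0.5 * math.log(2.0 * math.pi * sigma2) - 0.5 * resid * resid / sigma2
    return total


def log_prior(a, sigma2, k_max, delta2=10.0, nu0=2.0, gamma0=2.0):
    var = sigma2 * delta2
    lp = -math.log(k_max)  # p(k) = 1 / k_max
    lp += sum(-0.5 * math.log(2.0 * math.pi * var) - 0.5 * ai * ai / var for ai in a)
    alpha, beta = nu0 / 2.0, gamma0 / 2.0  # IG(sigma2; nu0 / 2, gamma0 / 2)
    lp += (alpha * math.log(beta) - math.lgamma(alpha)
           - (alpha + 1.0) * math.log(sigma2) - beta / sigma2)
    return lp


def log_post(y, a, sigma2, k_max):
    # log p(a_{1:k}, sigma2, k | y_{1:T}) up to an additive constant
    return ar_loglik(y, a, sigma2) + log_prior(a, sigma2, k_max)


def log_g(u, tau=1.0):
    return -0.5 * (u / tau) ** 2 - math.log(tau) - 0.5 * math.log(2.0 * math.pi)


def log_birth_ratio(y, a, sigma2, u, b_k, d_k_plus_1, k_max):
    return (log_post(y, a + [u], sigma2, k_max) + math.log(d_k_plus_1)
            - log_post(y, a, sigma2, k_max) - math.log(b_k) - log_g(u))


def log_death_ratio(y, a, sigma2, b_k_minus_1, d_k, k_max):
    return (log_post(y, a[:-1], sigma2, k_max) + math.log(b_k_minus_1) + log_g(a[-1])
            - log_post(y, a, sigma2, k_max) - math.log(d_k))
```

The acceptance probability is `min(1, exp(log_ratio))` in both cases, and `a` is expected to be a Python list. A useful check is that the death ratio evaluated at the state produced by a birth is the exact reciprocal of the birth ratio.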

We simulate 200 data with $k = 5$ and use 10,000 iterations of RJMCMC.

The algorithm output is $\left(k^{(i)}, \theta_k^{(i)}\right) \sim p(\theta_k, k \mid y)$ asymptotically.

The histogram of $k^{(i)}$ yields an estimate of $p(k \mid y)$.

Histograms of $\theta_k^{(i)}$ for which $k^{(i)} = k_0$ yield estimates of $p(\theta_{k_0} \mid y, k_0)$.

The algorithm provides us with an estimate of $p(k \mid y)$ which matches analytical expressions.
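Post-processing the sampler output is straightforward. A sketch, assuming the output is stored as a list of `(k, theta)` pairs (the storage format and function names are assumptions):

```python
from collections import Counter


def posterior_model_probs(samples):
    """Estimate p(k | y) by the relative frequency of each k in the output."""
    counts = Counter(k for k, _ in samples)
    n = len(samples)
    return {k: counts[k] / n for k in sorted(counts)}


def params_given_model(samples, k0):
    """Keep the theta draws for which k = k0; they approximate p(theta_k0 | y, k0)."""
    return [theta for k, theta in samples if k == k0]
```

A histogram of any coordinate of the list returned by `params_given_model` then estimates the corresponding marginal within model `k0`.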

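2.10 Finite Mixture of Gaussians

The model $k \in \mathcal{K} = \{1, \ldots, k_{\max}\}$ is given by a mixture of $k$ Gaussians

\[
Y_n \sim \sum_{i=1}^{k} \pi_i\, \mathcal{N}\!\left(\mu_i, \sigma_i^2\right),
\]

and we have $\theta_k = \left(\pi_{1:k}, \mu_{1:k}, \sigma_{1:k}^2\right) \in S_k \times \mathbb{R}^k \times (\mathbb{R}^+)^k$.

We need to define a prior $p(k, \theta_k) = p(k)\, p(\theta_k \mid k)$, say

\[
p(k) = k_{\max}^{-1} \ \text{for } k \in \mathcal{K}, \qquad
p(\theta_k \mid k) = \mathcal{D}\!\left(\pi_{k,1:k}; 1, \ldots, 1\right) \prod_{i=1}^{k} \mathcal{N}\!\left(\mu_{k,i}; \alpha, \beta\right) \mathcal{IG}\!\left(\sigma_{k,i}^2; \frac{\nu_0}{2}, \frac{\gamma_0}{2}\right).
\]

Given $T$ data, we are interested in $\pi(k, \theta_k \mid y_{1:T})$.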

2.11 Trans-dimensional MCMC

When $k$ is fixed, we will use Gibbs steps to sample from $\pi(\theta_k, z_{1:T} \mid y_{1:T}, k)$, where $z_{1:T}$ are the discrete latent variables such that $\Pr(z_n = i \mid k, \theta_k) = \pi_{k,i}$.

To allow moves in the model space, we define a birth move and a death move.

The birth and death moves use as a target $\pi(\theta_k \mid y_{1:T}, k)$ and not $\pi(\theta_k, z_{1:T} \mid y_{1:T}, k)$. The dimensionality is reduced, which makes it easier to design moves.
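The allocation step of the fixed-$k$ Gibbs sampler can be sketched as follows. It uses the standard full conditional $\Pr(z_n = i \mid y_n, \theta_k) \propto \pi_{k,i}\, \mathcal{N}(y_n; \mu_{k,i}, \sigma_{k,i}^2)$, which follows from Bayes' rule; the function names and the 0-based component labels are assumptions.

```python
import math
import random


def allocation_probs(y_n, weights, means, variances):
    """Pr(z_n = i | y_n, theta_k), proportional to pi_i N(y_n; mu_i, sigma_i^2)."""
    logs = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - 0.5 * (y_n - m) ** 2 / v
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(value - top) for value in logs]
    total = sum(unnorm)
    return [p / total for p in unnorm]


def sample_allocations(y, weights, means, variances, rng):
    """Draw z_{1:T}, one label in {0, ..., k-1} per observation."""
    labels = range(len(weights))
    return [
        rng.choices(labels, weights=allocation_probs(y_n, weights, means, variances))[0]
        for y_n in y
    ]
```

Given the allocations, the component parameters can be updated from their conditionals; the birth and death moves, by contrast, work with the allocations integrated out.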

2.12 Birth Move

We propose a naive move to go from $k \to k + 1$, where $j \sim \mathcal{U}\{1, \ldots, k+1\}$ and

\[
\mu_{k+1,-j} = \mu_{k,1:k}, \qquad
\sigma_{k+1,-j}^2 = \sigma_{k,1:k}^2, \qquad
\pi_{k+1,-j} = \left(1 - \pi_{k+1,j}\right) \pi_{k,1:k},
\]

where $\left(\pi_{k+1,j}, \mu_{k+1,j}, \sigma_{k+1,j}^2\right) \sim g_{k,k+1}$ (the prior distribution in practice).

The Jacobian of the transformation is $\left(1 - \pi_{k+1,j}\right)^{k-1}$ (there are only $k - 1$ true variables for $\pi_{k,1:k}$, since the weights sum to one).

Now one has to be careful when considering the reverse death move. Assume the death move goes from $k + 1 \to k$ by removing the component $j$.

The acceptance probability of the birth move is given by $\min\{1, A\}$, where

\[
A = \frac{\pi\!\left(k+1, \pi_{k+1,1:k+1}, \mu_{k+1,1:k+1}, \sigma_{k+1,1:k+1}^2 \mid y_{1:T}\right)}{\pi\!\left(k, \pi_{k,1:k}, \mu_{k,1:k}, \sigma_{k,1:k}^2 \mid y_{1:T}\right)}
\times \frac{d_{k+1,k} / (k+1)}{\left(b_{k,k+1} / (k+1)\right) g_{k,k+1}\!\left(\pi_{k+1,j}, \mu_{k+1,j}, \sigma_{k+1,j}^2\right)}
\times \left(1 - \pi_{k+1,j}\right)^{k-1}.
\]

This move will work properly if the prior is not too diffuse. Otherwise the acceptance probability will be small.

We have $k + 1$ birth moves to move from $k \to k + 1$ and $k + 1$ associated death moves.
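The deterministic part of the birth move, and the death move that undoes it, can be sketched as below. The function names, the use of Python lists and the 0-based insertion index `j` are assumptions; the new component `(w_new, mu_new, var_new)` is taken as given, e.g. drawn from the prior.

```python
import math


def birth(weights, means, variances, j, w_new, mu_new, var_new):
    """Insert a new component at position j and rescale the old weights."""
    k = len(weights)
    new_weights = [(1.0 - w_new) * w for w in weights]
    new_means, new_vars = list(means), list(variances)
    new_weights.insert(j, w_new)
    new_means.insert(j, mu_new)
    new_vars.insert(j, var_new)
    log_jacobian = (k - 1) * math.log(1.0 - w_new)  # log (1 - pi_{k+1,j})^(k-1)
    return new_weights, new_means, new_vars, log_jacobian


def death(weights, means, variances, j):
    """Remove component j and renormalise the remaining weights."""
    w_removed = weights[j]
    keep = [i for i in range(len(weights)) if i != j]
    new_weights = [weights[i] / (1.0 - w_removed) for i in keep]
    removed = (weights[j], means[j], variances[j])
    return new_weights, [means[i] for i in keep], [variances[i] for i in keep], removed
```

The triple returned by `death` is what has to be plugged into $g_{k,k+1}$ when evaluating the death acceptance probability, since nothing is simulated in that direction.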

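2.13 Split and Merge Moves

To move from $k \to k + 1$, one can also select a split move of the component $j \sim \mathcal{U}\{1, \ldots, k\}$ with $u_1, u_2, u_3 \sim \mathcal{U}(0, 1)$:

\[
\begin{aligned}
\pi_{k+1,j} &= u_1\, \pi_{k,j}, & \pi_{k+1,j+1} &= (1 - u_1)\, \pi_{k,j}, \\
\mu_{k+1,j} &= u_2\, \mu_{k,j}, & \mu_{k+1,j+1} &= \frac{\pi_{k,j} - \pi_{k+1,j}\, u_2}{\pi_{k,j} - \pi_{k+1,j}}\, \mu_{k,j}, \\
\sigma_{k+1,j}^2 &= u_3\, \sigma_{k,j}^2, & \sigma_{k+1,j+1}^2 &= \frac{\pi_{k,j} - \pi_{k+1,j}\, u_3}{\pi_{k,j} - \pi_{k+1,j}}\, \sigma_{k,j}^2.
\end{aligned}
\]

The associated merge move is

\[
\begin{aligned}
\pi_{k,j} &= \pi_{k+1,j} + \pi_{k+1,j+1}, \\
\pi_{k,j}\, \mu_{k,j} &= \pi_{k+1,j}\, \mu_{k+1,j} + \pi_{k+1,j+1}\, \mu_{k+1,j+1}, \\
\pi_{k,j}\, \sigma_{k,j}^2 &= \pi_{k+1,j}\, \sigma_{k+1,j}^2 + \pi_{k+1,j+1}\, \sigma_{k+1,j+1}^2.
\end{aligned}
\]

The Jacobian of the split transformation is block lower triangular: the weights depend only on $(\pi_{k,j}, u_1)$ and contribute a factor $\pi_{k,j}$, the means contribute $|\mu_{k,j}| / (1 - u_1)$ and the variances contribute $\sigma_{k,j}^2 / (1 - u_1)$. Hence

\[
\left| \frac{\partial \left(\pi_{k+1,1:k+1}, \mu_{k+1,1:k+1}, \sigma_{k+1,1:k+1}^2\right)}{\partial \left(\pi_{k,1:k}, \mu_{k,1:k}, \sigma_{k,1:k}^2, u_1, u_2, u_3\right)} \right|
= \frac{\pi_{k,j}}{(1 - u_1)^2}\, |\mu_{k,j}|\, \sigma_{k,j}^2.
\]

It follows that the acceptance probability of the split move with $j \sim \mathcal{U}\{1, \ldots, k\}$ is $\min\{1, A\}$, where

\[
A = \frac{\pi\!\left(k+1, \pi_{k+1,1:k+1}, \mu_{k+1,1:k+1}, \sigma_{k+1,1:k+1}^2 \mid y_{1:T}\right)}{\pi\!\left(k, \pi_{k,1:k}, \mu_{k,1:k}, \sigma_{k,1:k}^2 \mid y_{1:T}\right)}
\times \frac{m_{k+1,k} / k}{s_{k,k+1} / k}
\times \frac{\pi_{k,j}}{(1 - u_1)^2}\, |\mu_{k,j}|\, \sigma_{k,j}^2.
\]

Here $s_{k,k+1}$ and $m_{k+1,k}$ denote the probabilities of selecting the split and merge moves, and no proposal density appears because $u_1, u_2, u_3$ are uniform on $(0, 1)$.

You should think of the split move as a mixture of $k$ split moves, and you have $k$ associated merge moves.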
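Jacobians of split maps are easy to get wrong, so a numerical check is worthwhile. The sketch below implements the split of a single component and its merge inverse, and compares a finite-difference determinant with $\pi_{k,j}\, |\mu_{k,j}|\, \sigma_{k,j}^2 / (1 - u_1)^2$, the value obtained by differentiating the map block by block. The function names and the test point are assumptions, and $\mu_{k,j} \neq 0$ is required for the merge to recover $u_2$.

```python
def split_map(x):
    """(pi, mu, s2, u1, u2, u3) -> parameters of the two new components."""
    pi, mu, s2, u1, u2, u3 = x
    pi_a, pi_b = u1 * pi, (1.0 - u1) * pi
    mu_a = u2 * mu
    mu_b = (pi - pi_a * u2) / (pi - pi_a) * mu
    s2_a = u3 * s2
    s2_b = (pi - pi_a * u3) / (pi - pi_a) * s2
    return [pi_a, pi_b, mu_a, mu_b, s2_a, s2_b]


def merge_map(z):
    """Inverse of split_map: recover (pi, mu, s2, u1, u2, u3)."""
    pi_a, pi_b, mu_a, mu_b, s2_a, s2_b = z
    pi = pi_a + pi_b
    mu = (pi_a * mu_a + pi_b * mu_b) / pi
    s2 = (pi_a * s2_a + pi_b * s2_b) / pi
    return [pi, mu, s2, pi_a / pi, mu_a / mu, s2_a / s2]


def determinant(matrix):
    # Gaussian elimination with partial pivoting.
    m = [row[:] for row in matrix]
    n, det = len(m), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0.0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for cc in range(c, n):
                m[r][cc] -= factor * m[c][cc]
    return det


def numerical_jacobian(f, x, eps=1e-6):
    """|det df/dx| by central finite differences."""
    n = len(x)
    columns = []
    for j in range(n):
        hi, lo = list(x), list(x)
        hi[j] += eps
        lo[j] -= eps
        columns.append([(a - b) / (2.0 * eps) for a, b in zip(f(hi), f(lo))])
    jac = [[columns[j][i] for j in range(n)] for i in range(n)]
    return abs(determinant(jac))


def closed_form_jacobian(x):
    pi, mu, s2, u1, _, _ = x
    return pi * abs(mu) * s2 / (1.0 - u1) ** 2
```

For instance, at `x = [0.4, 1.5, 2.0, 0.3, 0.6, 0.7]` the two values can be compared directly, and `merge_map(split_map(x))` should return `x` up to rounding.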

2.14 Application to the Galaxy Dataset

Data: velocity (km/sec) of galaxies in the Corona Borealis region.

We set $k_{\max} = 20$ and we select rather informative priors following Green & Richardson 1999. In practice, it is worth using a hierarchical prior.

We run the algorithm for over 1,000,000 iterations.

We set additional constraints on the means: $\mu_{k,1} < \mu_{k,2} < \ldots < \mu_{k,k}$.

The cumulative averages stabilize very quickly.

[Figure: histogram of $k$.] Estimation of the marginal posterior distribution $p(k \mid y_{1:T})$.

[Figure: galaxy dataset.] Estimation of $E\left[ f(y \mid k, \theta_k) \mid y_{1:T} \right]$.

2.15 Summary

Trans-dimensional MCMC allows us to numerically address problems with Bayesian model uncertainty.

Practical implementation is relatively easy; the theory behind it is not so easy...

Designing efficient trans-dimensional MCMC algorithms is still a research problem.