Parallel Tempering I


Parallel Tempering I

- this is a fancy (M)etropolis-(H)astings algorithm
- it is also called (M)etropolis-(C)oupled MCMC, i.e. MCMCMC! (as the name suggests, it consists of running multiple MH chains in parallel)
- invented by Charles J. Geyer [1]

March 21, 2006, © 2006 Gopi Goswami (goswami@stat.harvard.edu)

Parallel Tempering II

- we want samples from the target density g(z), z ∈ R^d
- let H(z) = -log(g(z)); then we have g(z) = exp{-H(z)/1.0}
- H(·), the negative of the log density, is called the fitness function
- in general, one might be interested in sampling from g(z) ∝ exp{-H(z)/τ_min}, z ∈ R^d, assuming ∫ exp{-H(z)/τ_min} dz < ∞
- note H(ũ) ≤ H(ṽ) ⟺ g(ũ) ≥ g(ṽ); so low fitness values correspond to good, i.e. high-probability, samples
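As a toy illustration of the fitness function (our own example, not from the slides), a short Python sketch with an unnormalized bimodal target:

```python
import numpy as np

# Toy unnormalized bimodal target: an equal mixture of two well-separated
# Gaussian bumps, centered at -4 and +4. (Purely illustrative.)
def g(z):
    return np.exp(-0.5 * (z + 4.0) ** 2) + np.exp(-0.5 * (z - 4.0) ** 2)

def H(z):
    """Fitness function: H(z) = -log g(z), so g(z) = exp{-H(z)/1.0}."""
    return -np.log(g(z))

# Low fitness corresponds to high probability: the mode at z = 4 has a
# lower fitness value than the low-probability valley at z = 0.
assert H(4.0) < H(0.0)
```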

Parallel Tempering III

- consider a temperature ladder (just a decreasing sequence of positive numbers): t_1 > t_2 > ... > t_N > 0.0, where t_N = τ_min
- extend the sample space: x := (x_1; ...; x_i; ...; x_N) ∈ R^{Nd}
- terminology:
  - population, or state of the chain: (x_1, t_1; ...; x_i, t_i; ...; x_N, t_N)
  - i-th chromosome: x_i
- modified target density: f(x) ∝ ∏_{i=1}^N f_i(x_i), where f_i(x_i) ∝ exp{-H(x_i)/t_i}, i = 1, 2, ..., N
- note f_N(·) = g(·) because t_N = τ_min
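A minimal numeric sketch of a ladder and the tempered densities (the geometric ladder below is a common generic choice, not the concrete recipe of [2]):

```python
import numpy as np

# Temperature ladder: a decreasing geometric sequence with t_N = tau_min.
N, tau_min, t_1 = 5, 1.0, 16.0
t = t_1 * (tau_min / t_1) ** (np.arange(N) / (N - 1))  # t[0] = 16.0, ..., t[-1] = 1.0

def f_i(H_val, t_i):
    """Unnormalized tempered density f_i(x_i) ∝ exp{-H(x_i)/t_i}."""
    return np.exp(-H_val / t_i)

# Tempering flattens the target: a fitness gap of 5 is mild at the hottest
# level but heavily penalized at the coldest level (where f_N = g).
hot_ratio = f_i(5.0, t[0]) / f_i(0.0, t[0])     # exp(-5/16), close to 1
cold_ratio = f_i(5.0, t[-1]) / f_i(0.0, t[-1])  # exp(-5), much smaller
```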

Parallel Tempering III

(P)arallel (T)empering consists of two types of moves:

- MH update (local move): apply MH updates to the individual chains at the different temperature levels, i.e. to the chromosomes; also called the Mutation move
- Exchange update (global move): propose to swap the states of the chains at two neighboring temperature levels, i.e. two neighboring chromosomes; also called the Random Exchange move

Mutation I

1. choose i ∈ {1, ..., N} using some distribution p(I = i | x); this choice could be random or deterministic
2. for simple (R)andom (W)alk (M)etropolis, propose ỹ_i = x_i + ε_i, where ε_i is suitably chosen from a symmetric mean-zero proposal distribution T_i(x_i, ·)
   - note we could choose T_i(x_i, ·) = Normal_d(x_i, σ_i² I_d); the σ_i² values may need some tweaking after observing the level-specific acceptance rates of the Mutation move
   - one can also do block-wise or coordinate-wise Gibbs, or use a general MH update on x_i here
3. accept (ỹ, t) = (x_1, t_1; ...; ỹ_i, t_i; ...; x_N, t_N) with probability α_m = min(1, r_m), where
   r_m = [f_i(ỹ_i) / f_i(x_i)] · [p(I = i | ỹ) / p(I = i | x)]
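The three steps above can be sketched as follows (a minimal sketch assuming the uniform choice p(I = i | x) = 1/N, so the p-ratio in r_m is 1; the function names, toy fitness, and step sizes are our own):

```python
import numpy as np

def mutation_move(x, t, H, sigma, rng):
    """One RWM Mutation move on the population x (an (N, d) array).
    t: temperature ladder (N,); H: fitness function on R^d;
    sigma: per-level RWM step sizes (N,)."""
    N, d = x.shape
    i = rng.integers(N)                              # step 1: choose a level uniformly
    y_i = x[i] + sigma[i] * rng.standard_normal(d)   # step 2: symmetric RWM proposal
    log_r_m = (H(x[i]) - H(y_i)) / t[i]              # log f_i(y_i) - log f_i(x_i)
    if np.log(rng.uniform()) < log_r_m:              # step 3: accept w.p. min(1, r_m)
        x[i] = y_i
    return x

# usage on a toy 1-d double-well fitness (hypothetical example)
H = lambda z: (np.sum(z ** 2) - 4.0) ** 2 / 8.0
rng = np.random.default_rng(0)
t = np.array([4.0, 2.0, 1.0])
sigma = np.array([2.0, 1.5, 1.0])
x = np.zeros((3, 1))
for _ in range(100):
    x = mutation_move(x, t, H, sigma, rng)
```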

Mutation II

- computation of r_m: here we are using a mixture of updaters
- note the only change between x and ỹ is in the i-th chromosome: x_i has changed to ỹ_i
- so T(x, ỹ) = p(I = i | x) T_i(x_i, ỹ_i) and T(ỹ, x) = p(I = i | ỹ) T_i(ỹ_i, x_i)
- hence we have:
    r_m = f(ỹ) T(ỹ, x) / [f(x) T(x, ỹ)]
        = f_i(ỹ_i) p(I = i | ỹ) T_i(ỹ_i, x_i) / [f_i(x_i) p(I = i | x) T_i(x_i, ỹ_i)]
        = [f_i(ỹ_i) / f_i(x_i)] · [p(I = i | ỹ) / p(I = i | x)]
- here the T_i(·, ·)'s cancel because the proposal is a symmetric RWM; in general, they may not

Mutation III

- at higher temperature levels, Mutation moves are easily accepted because the distribution is flat, and thus hotter chains travel around the sample space a lot
- at lower temperature levels, Mutation moves are rarely accepted because the distribution is very spiky, and hence colder chains tend to get stuck around a mode
- thus Mutation does local exploration for lower temperatures, and since the lowest temperature is the temperature of interest, doing only Mutation doesn't help; one needs to consider the next move
- but the sticky nature of the Mutation move at lower temperatures is a plus point as well: it tends to foster finer local exploration

Random Exchange I

1. select i ∈ {1, ..., N} with p(I_1 = i | x) = 1/N; also select j ≠ i such that p(I_2 = 2 | x, I_1 = 1) = 1, p(I_2 = N-1 | x, I_1 = N) = 1, and for i ≠ 1, N, p(I_2 = i ± 1 | x, I_1 = i) = 0.5
2. propose to exchange x_i and x_j
3. accept (ỹ, t) = (x_1, t_1; ...; x_j, t_i; ...; x_i, t_j; ...; x_N, t_N) with probability α_re = min(1, r_re), where
   r_re = f_i(x_j) f_j(x_i) / [f_i(x_i) f_j(x_j)] = exp[(H(x_j) - H(x_i)) (1/t_j - 1/t_i)]
- what are T(x, ỹ) and T(ỹ, x) here?
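These steps can be sketched as follows (a minimal sketch; since the index selection depends only on the indices and not on the chromosome values, T(x, ỹ) = T(ỹ, x) and the proposal terms cancel in r_re):

```python
import numpy as np

def random_exchange_move(x, t, H, rng):
    """One Random Exchange move on the population x (an (N, d) array):
    pick level i uniformly, pick a neighboring level j (the end levels
    have a single neighbor), and propose to swap x_i and x_j."""
    N = x.shape[0]
    i = rng.integers(N)                        # p(I_1 = i | x) = 1/N
    if i == 0:
        j = 1                                  # first level: only neighbor is level 2
    elif i == N - 1:
        j = N - 2                              # last level: only neighbor is level N-1
    else:
        j = i + rng.choice([-1, 1])            # each neighbor with probability 0.5
    # r_re = exp[(H(x_j) - H(x_i)) (1/t_j - 1/t_i)]
    log_r_re = (H(x[j]) - H(x[i])) * (1.0 / t[j] - 1.0 / t[i])
    if np.log(rng.uniform()) < log_r_re:
        x[[i, j]] = x[[j, i]]                  # accept: swap the two chromosomes
    return x

# usage: exchange moves only permute the chromosomes, never alter their values
H = lambda z: np.sum(z ** 2) / 2.0
rng = np.random.default_rng(0)
t = np.array([4.0, 2.0, 1.0])
x = np.array([[3.0], [0.5], [2.0]])
for _ in range(50):
    x = random_exchange_move(x, t, H, rng)
```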

Random Exchange II

- for i > j, H(x_j) ≤ H(x_i) implies r_re ≥ 1, because 1/t_j ≤ 1/t_i
- so this move probabilistically transports good samples down the ladder and, in the process, pushes the bad ones up
- this can cause jumps between two widely separated modes; thus this move has a global nature

Parallel Tempering Algorithm I

- we initialize the population to {x_i^(0), i = 1, 2, ..., N}, giving x^(0)
- we take a suitably chosen temperature ladder {t_i, i = 1, 2, ..., N}; for a concrete recipe see [2]
- choose a moves-mixture probability vector (q, 1 - q), q ∈ (0, 1)

Algorithm 0.1 (PT: one iteration).
1. with probability q, apply the Mutation move N times on the population
2. with probability 1 - q, apply the Random Exchange move N times on the resultant population

- thus we get draws: x^(0) → x^(1) → x^(2) → ... → x^(t) → ...
- samples of interest: upon convergence we look at {x_N^(t), t = 1, 2, ..., m} out of {x^(t), t = 1, 2, ..., m}
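Putting the pieces together, a minimal end-to-end sketch of Algorithm 0.1 on a toy 1-d bimodal target (the target, ladder, step sizes, seed, and iteration count are our own illustrative choices):

```python
import numpy as np

# toy bimodal fitness: equal mixture of Gaussian bumps at -4 and +4
def H(z):
    return -np.log(np.exp(-0.5 * (z + 4.0) ** 2) + np.exp(-0.5 * (z - 4.0) ** 2))

rng = np.random.default_rng(1)
N, q, m = 5, 0.5, 4000
t = 16.0 * (1.0 / 16.0) ** (np.arange(N) / (N - 1))  # ladder with t_N = tau_min = 1
sigma = np.sqrt(t)                                   # hotter levels take bigger steps
x = rng.standard_normal(N)                           # the population x^(0), d = 1
cold = []                                            # draws x_N^(t) from the coldest level

for _ in range(m):
    if rng.uniform() < q:                            # Mutation move, N times
        for _ in range(N):
            i = rng.integers(N)
            y = x[i] + sigma[i] * rng.standard_normal()
            if np.log(rng.uniform()) < (H(x[i]) - H(y)) / t[i]:
                x[i] = y
    else:                                            # Random Exchange move, N times
        for _ in range(N):
            i = rng.integers(N)
            j = 1 if i == 0 else (N - 2 if i == N - 1 else i + rng.choice([-1, 1]))
            if np.log(rng.uniform()) < (H(x[j]) - H(x[i])) * (1 / t[j] - 1 / t[i]):
                x[i], x[j] = x[j], x[i]
    cold.append(x[-1])                               # keep the sample of interest

cold = np.array(cold)
# the coldest chain should visit both modes (near -4 and +4)
```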

Parallel Tempering III

- is doing all this extra work worth the effort? PT isn't stuck in one mode like MH!

Figure 1: MH-PT Comparison

Parallel Tempering III

- is doing all this extra work worth the effort? PT yields less auto-correlation than MH (BTW, these plots alone are not enough: one needs to look at the AIAT; judging from plots alone is what others do, and we shouldn't)

Figure 2: MH-PT Comparison

Parallel Tempering III

- PT is a computationally expensive method, so it is generally used for harder problems where simple MH cannot possibly jump between modes in a finite amount of time and the draws produced by MH are very highly correlated
- intuitively, why does PT work?
  - the Mutation move, at higher temperatures, helps to cover the whole space, and at lower temperatures fosters finer local exploration
  - the Exchange move does the transportation job, facilitating global exploration: good ones go down, bad guys go up

References

[1] C. J. Geyer. Markov chain Monte Carlo maximum likelihood. In Computing Science and Statistics: Proc. 23rd Symp. Interface, pages 156-163, 1991.
[2] Gopika R. Goswami and Jun S. Liu. On real-parameter evolutionary Monte Carlo algorithm. Statistics and Computing, 2006 (just accepted).