Computer Practical: Metropolis-Hastings-based MCMC
Andrea Arnold and Franz Hamilton
North Carolina State University
July 30, 2016
Markov Chain Monte Carlo (MCMC)
Non-sequential Bayesian methods for parameter estimation use all available data in one batch. MCMC methods, most often employing a variation of the Metropolis-Hastings (MH) algorithm, are used to explore the posterior density π(θ | D_T), where D_T = {b_1, b_2, ..., b_T}; this requires the forward map θ → {b_1, b_2, ..., b_T}. The success of MCMC depends crucially on how effective the proposal distribution (MH kernel) is at producing good mixing and independent samples.
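For reference, below is a minimal sketch of the random-walk MH iteration that all of the adaptive variants on the following slides build on. The practical itself uses the MATLAB mcmcstat toolbox; this Python version is purely illustrative, and the function and variable names are assumptions.

import numpy as np

def metropolis_hastings(log_post, x0, n_steps, prop_cov):
    """Random-walk MH with a fixed Gaussian proposal (minimal sketch).

    log_post : callable returning the log of the (unnormalized) target density
    x0       : starting point, shape (d,)
    prop_cov : fixed proposal covariance, shape (d, d)
    """
    d = len(x0)
    chain = np.empty((n_steps, d))
    x, logp_x = np.asarray(x0, dtype=float), log_post(x0)
    L = np.linalg.cholesky(prop_cov)        # factor once, reuse for sampling
    n_accept = 0
    for step in range(n_steps):
        y = x + L @ np.random.randn(d)      # propose a random-walk step
        logp_y = log_post(y)
        if np.log(np.random.rand()) < logp_y - logp_x:   # MH acceptance test
            x, logp_x = y, logp_y
            n_accept += 1
        chain[step] = x
    return chain, n_accept / n_steps

The fixed prop_cov is exactly what the adaptive schemes below replace with a covariance learned from the chain.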
Adaptive MH-based MCMC
Adaptive MH algorithms [1]: the MH kernel is adjusted as the algorithm proceeds to better account for the size and shape of the target distribution.
- Adaptive Proposal (AP)
- Adaptive Metropolis (AM)
- Delayed Rejection (DR)
- Delayed Rejection Adaptive Metropolis (DRAM)
[1] C. Andrieu and J. Thoms (2008). A tutorial on adaptive MCMC. Statistics and Computing, 18(4), pp. 343-373.
Adaptive Proposal
Initial tuning of the MH proposal can take a long time! Adaptive Proposal [2] (AP): the MH proposal is updated as the chain progresses, using the sample covariance calculated from a fixed number of previous points, thereby locally adapting the MCMC process to the target distribution. Assume that at time k a sample {x_0, x_1, ..., x_{k-M+1}, ..., x_{k-1}, x_k}, x_j ∈ R^d, j = 0, 1, ..., k, of at least M points has accumulated in the chain.
[2] H. Haario, E. Saksman, and J. Tamminen (1999). Adaptive proposal distribution for random walk Metropolis algorithm. Comput. Statist., 14, pp. 375-395.
Adaptive Proposal
The proposal distribution q_k for drawing the next proposed step y is chosen as
    q_k(y | x_0, x_1, ..., x_k) ~ N(x_k, c_d^2 R_k),
where R_k ∈ R^{d×d} is the sample covariance matrix determined by the M points x_{k-M+1}, ..., x_k, and c_d is a scaling factor [3] depending only on the dimension d. The covariance is updated every η steps (update frequency).
[3] A. Gelman, G. O. Roberts and W. R. Gilks (1996). Efficient Metropolis jumping rules. In Bayesian Statistics 5 (eds J. M. Bernardo et al.), pp. 599-608. Oxford University Press.
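A minimal sketch of how the AP covariance could be recomputed from the last M chain states, continuing the illustrative Python above; the Gelman-Roberts-Gilks scaling c_d = 2.4/sqrt(d) and the small jitter term are common choices, but the function name and defaults are assumptions.

import numpy as np

def ap_proposal_cov(chain, k, M, d, eps=1e-10):
    """Adaptive Proposal covariance from the last M chain states (sketch)."""
    c_d = 2.4 / np.sqrt(d)                            # scaling factor c_d
    history = chain[max(0, k - M + 1): k + 1]         # the M most recent states
    R_k = np.cov(history, rowvar=False) + eps * np.eye(d)   # jitter keeps R_k positive definite
    return c_d**2 * R_k

# Inside the MH loop, every η steps one would swap the fixed proposal
# covariance for ap_proposal_cov(chain, k, M, d).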
Adaptive Metropolis
Adaptive Metropolis [4] (AM): an extension of AP in which the proposal is continuously adapted to the target distribution, using the full information x_0, x_1, ..., x_{k-M+1}, ..., x_{k-1}, x_k accumulated up to that point. While AP updates the covariance of the Gaussian proposal locally using only a fixed number M of previous states, AM does so globally using the entire chain. One advantage: AM starts adapting right away, ensuring that the search is more efficient in the early stages of the simulation.
[4] H. Haario, E. Saksman and J. Tamminen (2001). An adaptive Metropolis algorithm. Bernoulli, 7, pp. 223-242.
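The full-chain covariance can be maintained recursively rather than recomputed from scratch at every step. Below is a sketch of one step of the AM covariance recursion of Haario et al. (2001), again in illustrative Python; s_d = 2.4^2/d is the usual scaling and eps a small regularization constant.

import numpy as np

def am_update(mean, cov, x_new, k, s_d, eps=1e-10):
    """One step of the Adaptive Metropolis covariance recursion (sketch).

    mean, cov : running mean of the first k states and current proposal covariance
    x_new     : the state just appended to the chain
    k         : number of states already included in mean (k >= 1)
    s_d       : dimensional scaling, typically 2.4**2 / d
    """
    d = len(x_new)
    new_mean = mean + (x_new - mean) / (k + 1)        # running mean including x_new
    new_cov = ((k - 1) / k) * cov + (s_d / k) * (
        k * np.outer(mean, mean)
        - (k + 1) * np.outer(new_mean, new_mean)
        + np.outer(x_new, x_new)
        + eps * np.eye(d)
    )
    return new_mean, new_cov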
Delayed Rejection
Delayed Rejection [5] (DR): reduces the number of outright rejected proposals by allowing more than one candidate step per iteration.
- Inserts a delaying process into the MH framework, allowing a successive chain of candidate steps (stages) to be considered within the same iteration.
- When a step is rejected, a new step can be proposed with an adjusted proposal and acceptance probability based on the current step and the first rejected step, as sketched below.
Note: computation time for one iteration of DR may be considerably longer than for one iteration of standard MH due to the multiple candidates at each iteration!
[5] A. Mira (2001). On Metropolis-Hastings algorithm with delayed rejection. Metron, LIX(3-4), pp. 231-241.
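A sketch of a single two-stage DR iteration with Gaussian proposals centered at the current point, using the standard second-stage acceptance probability (as in Haario et al., 2006). The function name, the scaling gamma, and the use of scipy for the proposal density are illustrative assumptions, not the toolbox's implementation; a production version would work in log space throughout.

import numpy as np
from scipy.stats import multivariate_normal

def dr_step(log_post, x, cov1, gamma=0.5):
    """One two-stage delayed-rejection iteration (sketch)."""
    logp_x = log_post(x)

    # Stage 1: ordinary MH step.
    y1 = np.random.multivariate_normal(x, cov1)
    logp_y1 = log_post(y1)
    alpha1 = min(1.0, np.exp(logp_y1 - logp_x))
    if np.random.rand() < alpha1:
        return y1

    # Stage 2: narrower proposal, adjusted acceptance probability.
    y2 = np.random.multivariate_normal(x, gamma * cov1)
    logp_y2 = log_post(y2)
    alpha1_rev = min(1.0, np.exp(logp_y1 - logp_y2))          # alpha1 "seen from" y2
    q1_fwd = multivariate_normal.pdf(y1, mean=y2, cov=cov1)   # q1(y2 -> y1)
    q1_cur = multivariate_normal.pdf(y1, mean=x, cov=cov1)    # q1(x  -> y1)
    num = np.exp(logp_y2) * q1_fwd * (1.0 - alpha1_rev)
    den = np.exp(logp_x) * q1_cur * (1.0 - alpha1)
    alpha2 = min(1.0, num / den) if den > 0 else 1.0
    return y2 if np.random.rand() < alpha2 else x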
DRAM
DRAM [6]: combines DR and AM, to the benefit of both.
- AM enhances the success of DR in situations when good proposals are not available.
- DR speeds up the exploration of the target density when AM has a slow start.
One combination strategy: nest AM within an r-stage DR (see the sketch below).
1. Adapt the first-stage proposal of DR as in AM, i.e., compute the first-stage covariance R_1 using all previous sample points in the chain via the AM recursion formula.
2. Compute the covariance R_i of the i-th stage, i = 2, ..., r, as a scaled version of the first-stage covariance, i.e., R_i = γ_i R_1 for some scaling factor γ_i.
[6] H. Haario, M. Laine, A. Mira and E. Saksman (2006). DRAM: Efficient adaptive MCMC. Statistics and Computing, 16, pp. 339-354.
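Combining the two previous sketches, the per-stage proposal covariances for one DRAM iteration could be formed as follows; the particular γ_i values are illustrative only (the mcmcstat toolbox has its own defaults).

import numpy as np

def dram_stage_covariances(R1, gammas=(1.0, 0.25)):
    """Stage covariances R_i = gamma_i * R_1 for an r-stage DRAM step (sketch).

    R1     : first-stage covariance produced by the AM recursion
    gammas : per-stage scaling factors; gamma_1 = 1, and gamma_i < 1 shrinks
             the later-stage proposals toward the current point
    """
    R1 = np.asarray(R1, dtype=float)
    return [g * R1 for g in gammas]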
Codes to Download!
1. Download (and unzip) the .zip file mcmcstat.zip from http://helios.fmi.fi/~lainema/mcmc/
2. Download our practical-specific files from http://rtg.math.ncsu.edu/workshop/workshop-program/
3. Place these files in the same folder on your computer!
Example: Banana-shaped Distribution
Banana distribution [7]: a 2D Gaussian distribution with unit variances and covariance ρ = 0.9, twisted so that the Gaussian coordinates x_1 and x_2 become
    x̂_1 = a x_1,
    x̂_2 = x_2 / a - b (x̂_1^2 + a^2)
under the transformation, where the parameters a and b define the "bananity" of the target distribution.
[7] Example 1 at http://helios.fmi.fi/~lainema/dram/
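A sketch of the corresponding log-density that an MH-based sampler can be pointed at: undo the twist to recover the Gaussian coordinates, then evaluate the underlying 2D Gaussian. The inversion follows the transformation above; the function name and defaults are illustrative.

import numpy as np

def banana_logpost(xhat, a=1.0, b=1.0, rho=0.9):
    """Log-density (up to a constant) of the twisted Gaussian target (sketch)."""
    x1 = xhat[0] / a                                  # invert x̂_1 = a x_1
    x2 = (xhat[1] + b * (xhat[0]**2 + a**2)) * a      # invert x̂_2 = x_2/a - b(x̂_1^2 + a^2)
    cov = np.array([[1.0, rho], [rho, 1.0]])          # unit variances, covariance rho
    x = np.array([x1, x2])
    return -0.5 * x @ np.linalg.solve(cov, x)

This banana_logpost could be passed directly as log_post to the metropolis_hastings sketch above.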
Example: Banana-shaped Distribution
[Figure: the banana-shaped target distribution with a = 1, b = 1.]
Goal: Use MH-based MCMC algorithms to explore the distribution!
Example: Banana-shaped Distribution
[Figure: samples in the (x_1, x_2) plane for each algorithm, with the percentage of chain points below the c50 and c95 contour levels of the target:]
- MH: 62.1% < c50, 98.6% < c95
- AM: 42.1% < c50, 97.9% < c95
- DR: 53.6% < c50, 97.6% < c95
- DRAM: 55.7% < c50, 98.5% < c95
Example: SIR Model
Recall the SIR model (as in Ralph Smith's UQ tutorial) given by
    dS/dt = δN - δS - γkIS,
    dI/dt = γkIS - (r + δ)I,
    dR/dt = rI - δR,
with initial conditions S(0) = S_0, I(0) = I_0, R(0) = R_0 and parameters θ = {γ, k, r, δ}.
Note that this parameter set is unidentifiable, since γ and k enter the model only through the product γk!
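Evaluating the forward map θ → model output inside an MCMC run means solving the ODE system numerically at each proposed θ. Below is a minimal scipy-based sketch; the parameter values, time span, and initial conditions are illustrative placeholders, not the data used in the practical.

from scipy.integrate import solve_ivp

def sir_rhs(t, y, gamma, k, r, delta, N):
    """Right-hand side of the SIR model above (sketch)."""
    S, I, R = y
    dS = delta * N - delta * S - gamma * k * I * S
    dI = gamma * k * I * S - (r + delta) * I
    dR = r * I - delta * R
    return [dS, dI, dR]

# Illustrative values only.
y0 = [900.0, 100.0, 0.0]            # S(0), I(0), R(0)
N = sum(y0)                         # total population
theta = (0.2, 0.1, 0.6, 0.15)       # gamma, k, r, delta
sol = solve_ivp(sir_rhs, (0.0, 50.0), y0, args=(*theta, N), dense_output=True)

Only the product gamma * k appears in sir_rhs, which is exactly why γ and k cannot be estimated separately from data on S, I, and R.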
Example: SIR Model
[Figure: MCMC chain output for the parameters gamma, k, r, and delta.]
Example: SIR Model
[Figure: pairwise plots of the posterior samples for gamma, k, r, and delta.]
Example: SIR Model
Now modify the model so that
    dS/dt = δN - δS - βIS,
    dI/dt = βIS - (r + δ)I,
    dR/dt = rI - δR,
with parameters θ = {β, r, δ}.
This parameter set is identifiable!
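The only change to the forward model is in the right-hand side, with β taking the place of the product γk; the corresponding change to the earlier sketch:

def sir_rhs_beta(t, y, beta, r, delta, N):
    """SIR right-hand side reparameterized with beta in place of gamma*k (sketch)."""
    S, I, R = y
    dS = delta * N - delta * S - beta * I * S
    dI = beta * I * S - (r + delta) * I
    dR = r * I - delta * R
    return [dS, dI, dR]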
Example: SIR Model
[Figure: MCMC chain output for the parameters beta, r, and delta.]
Example: SIR Model
[Figure: pairwise plots of the posterior samples for beta, r, and delta.]