Computer Practical: Metropolis-Hastings-based MCMC

Andrea Arnold and Franz Hamilton
North Carolina State University
July 30, 2016

Markov Chain Monte Carlo (MCMC)

Non-sequential Bayesian methods for parameter estimation use all available data in one batch. MCMC methods, most often employing a variation of the Metropolis-Hastings (MH) algorithm, are used to explore the posterior density π(θ | D_T), where D_T = {b_1, b_2, ..., b_T}; this requires evaluating the forward map θ ↦ {b_1, b_2, ..., b_T}. The success of MCMC depends crucially on how effective the proposal distribution (the MH kernel) is at producing good mixing and approximately independent samples.
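
To make the MH mechanics concrete, here is a minimal random-walk Metropolis-Hastings sketch in Python (not the practical's MATLAB toolbox; the function and variable names are ours, and log_post stands in for the log of the target density π(θ | D_T)):

```python
import numpy as np

def rw_metropolis(log_post, x0, n_steps, step_cov, seed=0):
    """Random-walk Metropolis-Hastings targeting exp(log_post)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    chain = np.empty((n_steps, x.size))
    for k in range(n_steps):
        y = rng.multivariate_normal(x, step_cov)   # symmetric proposal
        lp_y = log_post(y)
        if np.log(rng.uniform()) < lp_y - lp:      # MH accept/reject step
            x, lp = y, lp_y
        chain[k] = x                               # repeat state on rejection
    return chain
```

The proposal covariance step_cov plays the role of the MH kernel discussed above; the adaptive methods that follow are different strategies for choosing it automatically.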

Adaptive MH-based MCMC

Adaptive MH algorithms [1]: the MH kernel is adjusted as the algorithm proceeds to better account for the size and shape of the target distribution.

- Adaptive Proposal (AP)
- Adaptive Metropolis (AM)
- Delayed Rejection (DR)
- Delayed Rejection Adaptive Metropolis (DRAM)

[1] C. Andrieu and J. Thoms (2008). A tutorial on adaptive MCMC. Statistics and Computing, 18(4), pp. 343-373.

Adaptive Proposal

Initial tuning of the MH proposal can take a long time!

Adaptive Proposal [2] (AP): the MH proposal is updated as the chain progresses, using the sample covariance calculated from a fixed number of previous points, thereby locally adapting the MCMC process to the target distribution.

Assume that at time k a sample {x_0, x_1, ..., x_{k-M+1}, ..., x_{k-1}, x_k}, with x_j ∈ R^d for j = 0, 1, ..., k, of at least M points has accumulated in the chain.

[2] H. Haario, E. Saksman, and J. Tamminen (1999). Adaptive proposal distribution for random walk Metropolis algorithm. Comput. Statist., 14, pp. 375-395.

Adaptive Proposal

The proposal distribution q_k for drawing the next proposed step y is chosen as

q_k(y | x_0, x_1, ..., x_k) ~ N(x_k, c_d^2 R_k),

where R_k ∈ R^{d×d} is the sample covariance matrix determined by the M points x_{k-M+1}, ..., x_k, and c_d is a scaling factor [3] depending only on the dimension d. The covariance is updated every η steps (the update frequency).

[3] A. Gelman, G. O. Roberts and W. R. Gilks (1996). Efficient Metropolis jumping rules. In Bayesian Statistics 5 (eds J. M. Bernardo et al.), pp. 599-608. Oxford University Press.
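
One way to realize the AP update is sketched below in Python; chain is assumed to hold the states as rows, and c_d = 2.4/√d is the scaling commonly attributed to Gelman et al. (1996), assumed here rather than stated on the slide:

```python
import numpy as np

def ap_proposal_cov(chain, M, d):
    """Proposal covariance c_d^2 * R_k from the last M chain states."""
    window = np.asarray(chain)[-M:]          # x_{k-M+1}, ..., x_k
    R_k = np.cov(window, rowvar=False)       # d x d sample covariance
    c_d = 2.4 / np.sqrt(d)                   # assumed choice for c_d
    return c_d**2 * R_k
```

In AP this quantity would be recomputed every η steps and used as the covariance of the Gaussian proposal centered at the current state x_k.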

Adaptive Metropolis

Adaptive Metropolis [4] (AM): an extension of AP in which the proposal is continuously adapted to the target distribution, using the full information x_0, x_1, ..., x_{k-1}, x_k accumulated up to that point.

While AP updates the covariance of the Gaussian proposal locally, using only a fixed number M of previous states, AM does so globally using the entire chain.

One advantage: AM starts adapting right away, making the search more efficient in the early stages of the simulation.

[4] H. Haario, E. Saksman and J. Tamminen (2001). An adaptive Metropolis algorithm. Bernoulli, 7, pp. 223-242.
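
A sketch of the AM covariance under the same assumptions as the AP sketch above. For clarity it recomputes the covariance from the whole history; Haario et al. (2001) maintain it with a cheap recursive update instead, and add a small ε I_d term to keep the covariance positive definite:

```python
import numpy as np

def am_proposal_cov(chain, d, eps=1e-8):
    """AM proposal covariance from the *entire* chain history."""
    s_d = 2.4**2 / d                              # scaling s_d = (2.4)^2 / d
    C = np.cov(np.asarray(chain), rowvar=False)   # global sample covariance
    return s_d * (C + eps * np.eye(d))            # eps*I keeps it nonsingular
```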

Delayed Rejection

Delayed Rejection [5] (DR): reduces the number of outright rejected proposals by allowing more than one candidate step per iteration.

- Inserts a delaying process into the MH framework, allowing a successive chain of candidate steps (stages) to be considered within the same iteration.
- When a step is rejected, a new step can be proposed with an adjusted proposal and acceptance probability based on the current step and the first rejected step, as in the two-stage sketch below.

Note: computation time for one iteration of DR may be considerably longer than for one iteration of standard MH, due to the multiple candidates at each iteration!

[5] A. Mira (2001). On Metropolis-Hastings algorithms with delayed rejection. Metron, Vol. LIX, 3-4, pp. 231-241.
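
The two-stage sketch referenced above, for Gaussian proposals centered at the current state (so the proposal densities are symmetric and several terms in the second-stage acceptance ratio cancel); this is a simplified reading of Mira (2001), with our own names:

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Gaussian log-density up to an additive constant (cancels below)."""
    diff = x - mean
    return -0.5 * diff @ np.linalg.solve(cov, diff)

def dr_step(log_post, x, C1, C2, rng):
    """One DR iteration with stage covariances C1 (first) and C2 (second)."""
    lp_x = log_post(x)
    y1 = rng.multivariate_normal(x, C1)       # first-stage candidate
    lp_y1 = log_post(y1)
    a1 = min(1.0, np.exp(lp_y1 - lp_x))
    if rng.uniform() < a1:
        return y1                             # accepted at stage one
    y2 = rng.multivariate_normal(x, C2)       # second-stage candidate
    lp_y2 = log_post(y2)
    a1_rev = min(1.0, np.exp(lp_y1 - lp_y2))  # alpha_1(y2, y1)
    num = lp_y2 + log_gauss(y1, y2, C1) + np.log(max(1.0 - a1_rev, 1e-300))
    den = lp_x + log_gauss(y1, x, C1) + np.log(max(1.0 - a1, 1e-300))
    if np.log(rng.uniform()) < num - den:     # adjusted acceptance probability
        return y2                             # accepted at stage two
    return x                                  # both candidates rejected
```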


DRAM

DRAM [6]: combines DR and AM to the benefit of both.

- AM enhances the success of DR in situations where good proposals are not available.
- DR speeds up the exploration of the target density when AM has a slow start.

One combination strategy: nest AM within an r-stage DR (see the sketch after this list).

1. Adapt the first-stage proposal of DR as in AM, i.e., compute the first-stage covariance R_1 using all previous sample points in the chain via the AM recursion formula.
2. Compute the covariance R_i of the i-th stage, i = 2, ..., r, as a scaled version of the first-stage covariance, i.e., R_i = γ_i R_1 for some scaling factor γ_i.

[6] H. Haario, M. Laine, A. Mira and E. Saksman (2006). DRAM: Efficient adaptive MCMC. Statistics and Computing, 16, pp. 339-354.
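
The sketch referenced in the list above: computing the stage covariances R_i = γ_i R_1, with R_1 from the AM scaling used earlier. The shrink factors in gammas are illustrative choices of γ_i, not values given in the practical:

```python
import numpy as np

def dram_stage_covs(chain, d, gammas=(1.0, 0.25, 0.0625)):
    """Return [R_1, R_2, ..., R_r] with R_i = gamma_i * R_1."""
    R1 = (2.4**2 / d) * np.cov(np.asarray(chain), rowvar=False)
    return [g * R1 for g in gammas]           # successively smaller proposals
```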

Codes to Download!

1. Download (and unzip) the .zip file mcmcstat.zip from http://helios.fmi.fi/~lainema/mcmc/
2. Download our practical-specific files from http://rtg.math.ncsu.edu/workshop/workshop-program/
3. Place these files in the same folder on your computer!

Example: Banana-shaped Distribution

Banana distribution [7]: a 2D Gaussian distribution with unit variances and covariance ρ = 0.9, twisted so that the Gaussian coordinates x_1 and x_2 become

x̂_1 = a x_1,
x̂_2 = x_2 / a − b (x̂_1^2 + a^2)

under the transformation, where the parameters a and b define the "bananity" of the target distribution.

[7] Example 1 on http://helios.fmi.fi/~lainema/dram/
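
Since the twist has unit Jacobian, the banana log-density can be evaluated by inverting the transformation and evaluating the underlying Gaussian. A Python sketch (the names are ours, not from the toolbox):

```python
import numpy as np

SIGMA = np.array([[1.0, 0.9], [0.9, 1.0]])   # unit variances, covariance 0.9
SIGMA_INV = np.linalg.inv(SIGMA)

def banana_logpost(xhat, a=1.0, b=1.0):
    """Log-density of the twisted Gaussian, up to an additive constant."""
    x1 = xhat[0] / a                               # undo xhat_1 = a * x_1
    x2 = a * (xhat[1] + b * (xhat[0]**2 + a**2))   # undo the twist in xhat_2
    x = np.array([x1, x2])
    return -0.5 * x @ SIGMA_INV @ x                # Jacobian of the twist is 1
```

This can be passed directly as log_post to the rw_metropolis sketch earlier in the document.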

Example: Banana-shaped Distribution

[Figure: the banana-shaped target with a = 1, b = 1.]

Goal: Use MH-based MCMC algorithms to explore this distribution!

Example: Banana-shaped Distribution

[Figure: samples in the (x_1, x_2) plane from each sampler, with the fraction of chain points falling inside the 50% and 95% probability contours (c50, c95):
MH: 62.1% < c50, 98.6% < c95
AM: 42.1% < c50, 97.9% < c95
DR: 53.6% < c50, 97.6% < c95
DRAM: 55.7% < c50, 98.5% < c95]

Example: SIR Model

Recall the SIR model given by

dS/dt = δN − δS − γkIS
dI/dt = γkIS − (r + δ)I
dR/dt = rI − δR

with initial conditions S(0) = S_0, I(0) = I_0, R(0) = R_0 and parameters θ = {γ, k, r, δ} (as in Ralph Smith's UQ tutorial).

Note that this parameter set is unidentifiable: γ and k enter the model only through the product γk, so the data cannot determine them separately!
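
A sketch of the forward map using scipy's ODE solver; the time span, initial state, and parameter values below are illustrative placeholders, not the practical's settings:

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_rhs(t, y, gamma, k, r, delta, N):
    """Right-hand side of the SIR system above."""
    S, I, R = y
    dS = delta * N - delta * S - gamma * k * I * S
    dI = gamma * k * I * S - (r + delta) * I
    dR = r * I - delta * R
    return [dS, dI, dR]

# gamma and k appear only through gamma*k, which is why the posterior
# cannot separate them (a ridge along gamma*k = const).
sol = solve_ivp(sir_rhs, (0.0, 50.0), [900.0, 100.0, 0.0],
                t_eval=np.linspace(0.0, 50.0, 101),
                args=(0.2, 0.001, 0.6, 0.15, 1000.0))
```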

Example: SIR Model

[Figure: MCMC chain traces and marginal posterior densities for gamma, k, r, and delta.]

Example: SIR Model

[Figure: pairwise scatter plots of the posterior samples for gamma, k, r, and delta.]

Example: SIR Model

Now modify the model, replacing the product γk by a single parameter β, so that

dS/dt = δN − δS − βIS
dI/dt = βIS − (r + δ)I
dR/dt = rI − δR

with parameters θ = {β, r, δ}.

This parameter set is identifiable!
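
The corresponding change to the right-hand side is a one-line reparameterization of the earlier scipy sketch (sir_rhs_beta is our name):

```python
def sir_rhs_beta(t, y, beta, r, delta, N):
    """SIR right-hand side with beta replacing the product gamma*k."""
    S, I, R = y
    return [delta * N - delta * S - beta * I * S,
            beta * I * S - (r + delta) * I,
            r * I - delta * R]
```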

Example: SIR Model

[Figure: MCMC chain traces and marginal posterior densities for beta, r, and delta.]

Example: SIR Model

[Figure: pairwise scatter plots of the posterior samples for beta, r, and delta.]