Monte Carlo in Bayesian Statistics

Matthew Thomas
SAMBa - University of Bath
m.l.thomas@bath.ac.uk
December 4, 2014
Overview

- Motivation
- Metropolis-Hastings
- Hamiltonian Monte Carlo
- Riemann Manifold Hamiltonian Monte Carlo
- Example - Heart Dataset
Motivation

Bayes' Theorem:

\[ f(\theta \mid x) = \frac{f(x \mid \theta)\, f(\theta)}{f(x)} \]

i.e. Posterior \(\propto\) Likelihood \(\times\) Prior:

\[ f(\theta \mid x) \propto f(x \mid \theta)\, f(\theta) \]

The marginal likelihood

\[ c^{-1} = f(x) = \int f(x \mid \theta)\, f(\theta)\, d\theta \]

is often analytically intractable. How do we find the posterior?
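To make the normalisation step above concrete, here is a small numerical sketch (not from the slides) using a model where the marginal likelihood *is* tractable, so the grid approximation can be checked: a Binomial likelihood with a Uniform(0, 1) prior, for which the exact marginal is 1/(n + 1). The data values n and x are arbitrary illustrations.

```python
import numpy as np
from math import comb

# Assumed toy model: x ~ Binomial(n, theta), theta ~ Uniform(0, 1).
n, x = 10, 7
theta = np.linspace(0.0, 1.0, 10001)
h = theta[1] - theta[0]

likelihood = comb(n, x) * theta**x * (1 - theta)**(n - x)
prior = np.ones_like(theta)                # Uniform(0, 1) density

# Riemann-sum approximation of f(x) = integral of f(x|theta) f(theta) dtheta
marginal = np.sum(likelihood * prior) * h
posterior = likelihood * prior / marginal  # normalised posterior density
```

For this conjugate model the grid answer agrees with the exact marginal 1/(n + 1); in realistic models no such closed form exists and grids become infeasible in high dimensions, which is what motivates MCMC.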
Markov Chain Monte Carlo (MCMC)

MCMC is a general technique widely adopted in Bayesian statistics.

Idea:
- Design a Markov chain whose stationary distribution is the posterior
- Simulate values from the chain
- Perform inferences using the simulated values

The most commonly used method is Metropolis-Hastings.
Metropolis-Hastings Algorithm

- Choose a suitable proposal distribution q(θ* | θ^(t))
- Choose an arbitrary starting point θ^(0) such that f(θ^(0) | x) > 0
- At time t:
  - Sample a point θ* ~ q(θ* | θ^(t-1))
  - Calculate the acceptance probability
    \[ \alpha(\theta^{(t-1)}, \theta^*) = \min\left\{1,\ \frac{q(\theta^{(t-1)} \mid \theta^*)\, f(\theta^* \mid x)}{q(\theta^* \mid \theta^{(t-1)})\, f(\theta^{(t-1)} \mid x)}\right\} \]
  - Generate U ~ Uniform(0, 1)
  - If U ≤ α(θ^(t-1), θ*), accept the proposed point: θ^(t) = θ*
  - Otherwise reject it: θ^(t) = θ^(t-1)
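The steps above can be sketched in a few lines. This is a minimal illustration, not from the slides: it uses a symmetric random-walk proposal (so the q-ratio in the acceptance probability cancels) and an assumed standard-normal target in place of a real log-posterior.

```python
import numpy as np

# Assumed toy target: log f(theta | x) for a standard normal, up to a constant.
def log_post(theta):
    return -0.5 * theta**2

def metropolis(log_post, theta0, n_iter, step=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    chain = np.empty(n_iter)
    theta = theta0
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal()   # symmetric proposal q
        log_alpha = log_post(prop) - log_post(theta)  # q-ratio cancels
        if np.log(rng.uniform()) < log_alpha:         # accept with prob alpha
            theta = prop
        chain[t] = theta                              # rejection keeps old theta
    return chain

chain = metropolis(log_post, theta0=0.0, n_iter=20000)
```

Working on the log scale avoids numerical underflow of f(θ | x), and only an unnormalised posterior is needed since the intractable f(x) cancels in the ratio.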
Question

Random-walk proposals produce highly correlated samples. Is there a method that reduces this correlation?
Hamiltonian Monte Carlo Method

- Let f(θ | x) ∝ f(x | θ) f(θ) be the posterior distribution (θ ∈ R^N)
- Let f(p) = N(0, M) be an independent auxiliary variable, interpreted as a momentum variable; the covariance matrix M is a mass matrix (p ∈ R^N)
- The joint density follows in factorised form as f(θ, p) = f(θ | x) f(p)
- Define the Hamiltonian, writing L(θ) = -log f(θ | x):
  \[ H(\theta, p) = -\log f(\theta, p) = -\log f(\theta \mid x) - \log f(p) = L(\theta) + \frac{1}{2} \log\left((2\pi)^N |M|\right) + \frac{1}{2} p^T M^{-1} p \]
- The Hamiltonian dynamics:
  \[ \frac{d\theta}{dt} = \frac{\partial H}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial \theta} \]
Hamiltonian Monte Carlo Method - Algorithm

- Set up the system as defined on the previous slide
- Choose an arbitrary starting point θ^(0) such that f(θ^(0) | x) > 0
- At time t:
  - Generate p ~ N(0, M) and set (θ(0), p(0)) = (θ^(t-1), p)
  - Run the leapfrog algorithm to numerically solve the dynamics up to time T, yielding a proposal (θ*, p*)
  - Calculate the acceptance probability
    \[ \alpha(\theta^{(t-1)}, \theta^*) = \min\left\{1,\ \frac{\exp[H(\theta^{(t-1)}, p)]}{\exp[H(\theta^*, p^*)]}\right\} \]
  - Generate U ~ Uniform(0, 1)
  - If U ≤ α(θ^(t-1), θ*), accept the proposed point: θ^(t) = θ*
  - Otherwise reject it: θ^(t) = θ^(t-1)
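A minimal sketch of the algorithm above, again with an assumed standard-normal target and M = I (both illustrations, not from the slides). Here U(θ) = -log f(θ | x) plays the role of L(θ), and the constant term in H cancels in the acceptance ratio so it is omitted.

```python
import numpy as np

# Assumed toy target: U(theta) = -log f(theta | x) for a standard normal.
def U(theta):      return 0.5 * np.sum(theta**2)
def grad_U(theta): return theta

def leapfrog(theta, p, eps, L):
    # Half step in momentum, L - 1 full steps, final half step in momentum.
    p = p - 0.5 * eps * grad_U(theta)
    for _ in range(L - 1):
        theta = theta + eps * p
        p = p - eps * grad_U(theta)
    theta = theta + eps * p
    p = p - 0.5 * eps * grad_U(theta)
    return theta, p

def hmc(theta0, n_iter, eps=0.2, L=10, rng=None):
    rng = rng if rng is not None else np.random.default_rng(1)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    chain = np.empty((n_iter, theta.size))
    for t in range(n_iter):
        p = rng.standard_normal(theta.size)       # p ~ N(0, M) with M = I
        H0 = U(theta) + 0.5 * np.sum(p**2)        # Hamiltonian at current state
        theta_new, p_new = leapfrog(theta, p, eps, L)
        H1 = U(theta_new) + 0.5 * np.sum(p_new**2)
        if np.log(rng.uniform()) < H0 - H1:       # accept with min(1, e^{H0-H1})
            theta = theta_new
        chain[t] = theta
    return chain

chain = hmc(theta0=[0.0], n_iter=5000)
```

The leapfrog integrator is used because it is volume-preserving and reversible, which keeps the Metropolis correction valid despite discretisation error.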
Question

Why do shortest routes drawn on a 2D map appear curved, when the shortest distance between two points is a straight line?
Riemann Manifold Hamiltonian Monte Carlo Method

- Let f(θ | x) ∝ f(x | θ) f(θ) be the posterior distribution (θ ∈ R^N)
- Let f(p | θ) = N(0, G(θ)) be an auxiliary variable on a Riemann manifold with metric tensor G(θ) (often the Fisher information matrix)
- The joint density follows in factorised form as f(θ, p) = f(θ | x) f(p | θ)
- Define the Hamiltonian, writing L(θ) = -log f(θ | x):
  \[ H(\theta, p) = -\log f(\theta, p) = -\log f(\theta \mid x) - \log f(p \mid \theta) = L(\theta) + \frac{1}{2} \log\left((2\pi)^N |G(\theta)|\right) + \frac{1}{2} p^T G(\theta)^{-1} p \]
- The Hamiltonian dynamics:
  \[ \frac{d\theta}{dt} = \frac{\partial H}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial \theta} \]
Riemann Manifold Hamiltonian Monte Carlo Method - Algorithm

- Set up the system as defined on the previous slide
- Choose an arbitrary starting point θ^(0) such that f(θ^(0) | x) > 0
- At time t:
  - Generate p ~ N(0, G(θ^(t-1))) and set (θ(0), p(0)) = (θ^(t-1), p)
  - Run the leapfrog algorithm to numerically solve the dynamics up to time T, yielding a proposal (θ*, p*)
  - Calculate the acceptance probability
    \[ \alpha(\theta^{(t-1)}, \theta^*) = \min\left\{1,\ \frac{\exp[H(\theta^{(t-1)}, p)]}{\exp[H(\theta^*, p^*)]}\right\} \]
  - Generate U ~ Uniform(0, 1)
  - If U ≤ α(θ^(t-1), θ*), accept the proposed point: θ^(t) = θ*
  - Otherwise reject it: θ^(t) = θ^(t-1)
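One piece of the algorithm above can be illustrated in isolation: drawing the position-dependent momentum p ~ N(0, G(θ)) via a Cholesky factor of the metric. The 2×2 metric below is an arbitrary assumption for illustration, treated as constant; a genuinely θ-dependent G(θ) requires the generalized (implicit) leapfrog of Girolami and Calderhead rather than the plain leapfrog.

```python
import numpy as np

# Illustration: sampling p ~ N(0, G) for an assumed fixed metric G via
# G = L L^T, and checking E[(1/2) p^T G^{-1} p] = N/2 (here N = 2).
rng = np.random.default_rng(2)
G = np.array([[2.0, 0.5],
              [0.5, 1.0]])                 # assumed 2x2 metric tensor
L = np.linalg.cholesky(G)                  # lower-triangular factor, G = L L^T

z = rng.standard_normal((100000, 2))       # rows are draws z ~ N(0, I)
p = z @ L.T                                # rows are draws p ~ N(0, G)

cov = np.cov(p, rowvar=False)              # empirical covariance, approx G
kinetic = 0.5 * np.einsum('ni,ij,nj->n', p, np.linalg.inv(G), p)
```

Matching the momentum distribution to the local geometry of the posterior is what lets RMHMC take well-scaled steps even when parameters are strongly correlated.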
Effective Sample Size

The effective sample size (ESS) gives an estimate of the equivalent number of independent observations that the chain represents:

\[ \text{ESS} = \frac{n}{1 + 2 \sum_{k=1}^{\infty} \rho_k} \]

where n is the chain length and ρ_k is the lag-k autocorrelation.
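The formula above can be estimated directly from a chain. This sketch (not from the slides) truncates the autocorrelation sum at the first negative estimate, a simple common rule; the AR(1) chain used to exercise it is an arbitrary illustration.

```python
import numpy as np

def ess(chain):
    # ESS = n / (1 + 2 * sum of positive-lag autocorrelations),
    # truncating the sum at the first negative autocorrelation estimate.
    chain = np.asarray(chain, dtype=float)
    n = chain.size
    x = chain - chain.mean()
    acf = np.correlate(x, x, mode="full")[n - 1:] / (x @ x)  # acf[0] == 1
    s = 0.0
    for k in range(1, n):
        if acf[k] < 0:
            break
        s += acf[k]
    return n / (1 + 2 * s)

rng = np.random.default_rng(3)
iid = rng.standard_normal(5000)            # independent draws: ESS near n

ar = np.empty(5000)                        # AR(1) chain with correlation 0.9,
ar[0] = 0.0                                # so the true ESS is far below n
for t in range(1, 5000):
    ar[t] = 0.9 * ar[t - 1] + rng.standard_normal()
```

For the AR(1) chain the theoretical factor is (1 - 0.9)/(1 + 0.9) ≈ 0.05, so only about 5% of the draws count as independent, which is the kind of gap the table on the next slides quantifies.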
Example - Heart Dataset

- Bayesian logistic regression model; we wish to sample from the posterior of the covariate coefficients
- The dataset contains 13 covariates for 270 data points
- Each chain was run 10 times for 10,000 iterations, with the first 5,000 discarded as burn-in
- The ESS was calculated for each method
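The quantities the samplers need for such a model are the log-posterior and its gradient. This sketch assumes a N(0, σ²I) prior on the coefficients (an assumption for illustration; the slides do not state the prior) and uses simulated data rather than the heart dataset itself.

```python
import numpy as np

def log_posterior(beta, X, y, sigma2=100.0):
    # Bernoulli log-likelihood: sum_i [ y_i * eta_i - log(1 + e^{eta_i}) ]
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    logprior = -0.5 * beta @ beta / sigma2     # N(0, sigma2 * I) prior
    return loglik + logprior

def grad_log_posterior(beta, X, y, sigma2=100.0):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))      # fitted probabilities
    return X.T @ (y - p) - beta / sigma2

# Simulated data standing in for the heart dataset (illustration only).
rng = np.random.default_rng(4)
X = np.column_stack([np.ones(200), rng.standard_normal((200, 2))])
beta_true = np.array([0.5, -1.0, 2.0])
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-(X @ beta_true)))).astype(float)
```

Metropolis-Hastings needs only `log_posterior`; HMC additionally needs `grad_log_posterior`; RMHMC would further use the Fisher information of this model as its metric tensor G(θ).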
Example - Heart Dataset

[Figure omitted]
Example - Heart Dataset

Method                          Min ESS   Median ESS   Max ESS
Metropolis-Hastings                 418          637       905
Hamiltonian                        3246         3647      4003
Riemann Manifold Hamiltonian       4862         5000      5000
References

Girolami, M. and Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2), 123-214.
Any Questions?