Introduction to Hamiltonian Monte Carlo Method


Mingwei Tang, Department of Statistics, University of Washington
mingwt@uw.edu
November 14, 2017

Hamiltonian System

Notation: $q \in \mathbb{R}^d$ is the position vector, $p \in \mathbb{R}^d$ is the momentum vector.
Hamiltonian: $H(p, q): \mathbb{R}^{2d} \to \mathbb{R}$.
Evolution equations of the Hamiltonian system:
$$\frac{dq}{dt} = \frac{\partial H}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q} \tag{1}$$

Potential and Kinetic Energy

Decompose the Hamiltonian as $H(p, q) = U(q) + K(p)$:
- $U(q)$: potential energy, depends on the position
- $K(p)$: kinetic energy, depends on the momentum

Motivating example: free fall
- $U(q) = mgq$
- $K(p) = \frac{1}{2}mv^2 = \frac{p^2}{2m}$
- $H(p, q) = mgq + \frac{p^2}{2m}$ is the total energy
- Velocity: $v = \frac{dq}{dt} = \frac{\partial H}{\partial p} = p/m$
- Force: $F = \frac{dp}{dt} = -\frac{\partial H}{\partial q} = -mg$
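As a quick sanity check of the free-fall example, here is a minimal Python sketch (the function names `free_fall_hamiltonian` and `hamilton_rhs`, and the mass and gravity values, are illustrative assumptions, not part of the slides):

```python
import numpy as np

m, g = 1.0, 9.8  # illustrative mass and gravitational constant

def free_fall_hamiltonian(q, p):
    """Total energy H(p, q) = U(q) + K(p) = m*g*q + p^2/(2m) for free fall."""
    return m * g * q + p**2 / (2 * m)

def hamilton_rhs(q, p):
    """Right-hand side of Hamilton's equations (1):
    dq/dt = dH/dp = p/m,  dp/dt = -dH/dq = -m*g."""
    dq_dt = p / m      # velocity
    dp_dt = -m * g     # force
    return dq_dt, dp_dt

# H is conserved along the exact trajectory q(t) = q0 + v0*t - g*t^2/2, p(t) = m*(v0 - g*t)
q0, v0 = 10.0, 0.0
for t in [0.0, 0.5, 1.0]:
    q = q0 + v0 * t - 0.5 * g * t**2
    p = m * (v0 - g * t)
    print(t, free_fall_hamiltonian(q, p))  # prints the same total energy at every t
```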

Properties of the Hamiltonian System

1. Reversibility: the mapping $T_s: (q(t), p(t)) \mapsto (q(t+s), p(t+s))$ is one-to-one. Its inverse $T_{-s}$ is obtained by negating $p$, applying $T_s$, then negating $p$ again.
2. Conservation (the Hamiltonian is invariant):
$$\frac{dH}{dt} = \frac{dq}{dt}\frac{\partial H}{\partial q} + \frac{dp}{dt}\frac{\partial H}{\partial p} = \frac{\partial H}{\partial p}\frac{\partial H}{\partial q} - \frac{\partial H}{\partial q}\frac{\partial H}{\partial p} = 0,$$
so $H(p, q)$ is constant over time $t$.
3. Volume preservation: the map $T_s$ preserves volume. For small $\delta$, the Jacobian satisfies $\det\left(\frac{\partial T_\delta}{\partial (p, q)}\right) \approx 1$.

Idea of HMC

- $D$: observed data; $q$: parameters (latent variables); $\pi(q)$: prior distribution
- Likelihood function $L(D \mid q)$
- Posterior distribution $\Pr(q \mid D) \propto L(D \mid q)\,\pi(q)$
- Position = parameters; potential = negative log-posterior: $U(q) = -\log\left[L(D \mid q)\,\pi(q)\right]$
- Introduce an auxiliary momentum variable $p$ for the kinetic energy:
$$K(p) = \sum_{i=1}^d \frac{p_i^2}{2m_i} = -\log \mathcal{N}(p \mid 0, M) + \text{const}$$
- $p$ and $q$ are independent
- Hamiltonian: $H(p, q) = U(q) + K(p)$

Idea of HMC (cont.)

Now that $U(q)$ and $K(p)$ are defined, relate them to a distribution: the canonical distribution
$$\Pr(p, q) = \frac{1}{Z}\exp\left(-H(p, q)/T\right) = \frac{1}{Z}\exp\left(-U(q)/T\right)\exp\left(-K(p)/T\right) \tag{2}$$
where $T$ is the temperature and $Z$ the normalizing constant.
Usually $T = 1$ is used, so that $\Pr(q, p) \propto$ posterior distribution $\times$ multivariate Gaussian.
Goal: sample $(p, q)$ jointly from the canonical distribution.

Ideal HMC

Specify the variance (mass) matrix $M$ and an integration time $s > 0$.
For $i = 1, \ldots, N$:
1. Sample $p^{(i)}$ from $\mathcal{N}(0, M)$
2. Starting from the current $(p^{(i)}, q^{(i-1)})$, integrate the Hamiltonian system for time $s$: $(p^*, q^*) \leftarrow T_s(p^{(i)}, q^{(i-1)})$ (this leaves $H(\cdot, \cdot)$ invariant)
3. Set $q^{(i)} \leftarrow q^*$, $p^{(i)} \leftarrow p^*$

Output $q^{(1)}, \ldots, q^{(N)}$ as posterior samples.

Problem: the Hamiltonian system may not have a closed-form solution, so a numerical method for the ODE system is needed.

Numerical ODE Integrator

Target problem:
$$\frac{dq}{dt} = \frac{\partial H}{\partial p} = M^{-1}p, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q} = \nabla_q \log\left(L(D \mid q)\,\pi(q)\right)$$

Leap-frog method, for a small step size $\epsilon > 0$:
$$p(t + \epsilon/2) = p(t) - \frac{\epsilon}{2}\,\frac{\partial U}{\partial q}\big(q(t)\big)$$
$$q(t + \epsilon) = q(t) + \epsilon\, M^{-1} p(t + \epsilon/2)$$
$$p(t + \epsilon) = p(t + \epsilon/2) - \frac{\epsilon}{2}\,\frac{\partial U}{\partial q}\big(q(t + \epsilon)\big)$$
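A minimal leap-frog integrator along these lines, as a sketch: it assumes a user-supplied gradient function `grad_U(q)` returning $\partial U / \partial q$ and a diagonal mass matrix given by the vector `m_diag` (both hypothetical names, not from the slides):

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, L, m_diag):
    """Run L leap-frog steps of size eps starting from (q, p).

    grad_U(q): gradient of the potential U(q) = -log posterior.
    m_diag: diagonal of the mass matrix M, so M^{-1} p = p / m_diag.
    """
    q, p = q.copy(), p.copy()
    p -= 0.5 * eps * grad_U(q)          # initial half step for momentum
    for step in range(L):
        q += eps * p / m_diag           # full step for position
        if step < L - 1:
            p -= eps * grad_U(q)        # full momentum step, except after the last position step
    p -= 0.5 * eps * grad_U(q)          # final half step for momentum
    return q, p
```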

Numerical Stability for the Hamiltonian System

(Figure-only slide; plots not transcribed.)

Properties of Leap-Frog

- Time reversibility: integrating $n$ steps forward and then $n$ steps backward returns to the same starting position.
- Symplectic property: conserves a (slightly modified) energy.
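Time reversibility can be checked numerically with the `leapfrog` sketch above; the quadratic potential (a standard normal target, $U(q) = q^2/2$) and all parameter values here are purely illustrative:

```python
import numpy as np

grad_U = lambda q: q          # gradient of U(q) = q^2/2 (standard normal target)
m_diag = np.ones(1)

q0, p0 = np.array([1.0]), np.array([0.5])
q1, p1 = leapfrog(q0, p0, grad_U, eps=0.1, L=20, m_diag=m_diag)
# Negate the momentum, integrate the same number of steps, negate again:
q2, p2 = leapfrog(q1, -p1, grad_U, eps=0.1, L=20, m_diag=m_diag)
print(q2, -p2)                # recovers (q0, p0) up to floating-point error
```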

Ideal HMC: Review

Specify the variance matrix $M$ and time $s > 0$.
For $i = 1, \ldots, N$:
1. Sample $p^{(i)}$ from $\mathcal{N}(0, M)$
2. Starting from the current $(p^{(i)}, q^{(i-1)})$, integrate the Hamiltonian system for time $s$: $(p^*, q^*) \leftarrow T_s(p^{(i)}, q^{(i-1)})$
3. Set $q^{(i)} \leftarrow q^*$, $p^{(i)} \leftarrow p^*$

Output $q^{(1)}, \ldots, q^{(N)}$ as posterior samples.

A numerical integrator does not leave $H(p, q)$ unchanged during integration: $H(p^*, q^*) \neq H(p^{(i)}, q^{(i-1)})$. This needs to be corrected.

HMC in Practice

Specify the variance matrix $M$, step size $\epsilon > 0$, and $L$, the number of leap-frog steps.
For $i = 1, \ldots, N$:
1. Sample $p^{(i)}$ from $\mathcal{N}(0, M)$
2. Starting from the current $(p^{(i)}, q^{(i-1)})$, run $(p^*, q^*) \leftarrow \mathrm{Leapfrog}(p^{(i)}, q^{(i-1)}, \epsilon, L)$
3. Metropolis-Hastings step: with probability
$$\alpha = \min\left\{1, \frac{\Pr(p^*, q^*)}{\Pr(p^{(i)}, q^{(i-1)})}\right\}$$
set $q^{(i)} \leftarrow q^*$, $p^{(i)} \leftarrow p^*$ (this leaves the canonical distribution invariant)

Output $q^{(1)}, \ldots, q^{(N)}$ as posterior samples.
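Putting the pieces together, here is a minimal sketch of one HMC transition with the Metropolis-Hastings correction, reusing the `leapfrog` sketch above. The names `hmc_step`, `U`, `grad_U`, and `m_diag` are illustrative assumptions (negative log-posterior, its gradient, and a diagonal mass matrix), not part of the slides:

```python
import numpy as np

def hmc_step(q_cur, U, grad_U, eps, L, m_diag, rng):
    """One HMC transition: sample momentum, run leap-frog, accept or reject."""
    p_cur = rng.normal(0.0, np.sqrt(m_diag), size=q_cur.shape)   # p ~ N(0, M), M diagonal
    q_prop, p_prop = leapfrog(q_cur, p_cur, grad_U, eps, L, m_diag)

    # Hamiltonians H = U(q) + K(p), with K(p) = sum_i p_i^2 / (2 m_i)
    H_cur = U(q_cur) + np.sum(p_cur**2 / (2 * m_diag))
    H_prop = U(q_prop) + np.sum(p_prop**2 / (2 * m_diag))

    # Accept with probability min(1, exp(H_cur - H_prop)) = min(1, Pr(p*, q*) / Pr(p, q))
    if rng.uniform() < np.exp(H_cur - H_prop):
        return q_prop, True
    return q_cur, False
```

For instance, with a standard normal target one could take `U = lambda q: 0.5 * np.sum(q**2)` and `grad_U = lambda q: q` and call `hmc_step` repeatedly to produce the chain $q^{(1)}, \ldots, q^{(N)}$.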

Comparison with Random-Walk Metropolis-Hastings

- HMC: proposals based on Hamiltonian dynamics rather than a random walk
- Random-walk Metropolis-Hastings (RWMH) needs more steps to obtain an independent sample
- Optimal acceptance rate: HMC (65%), RWMH (23%)
- Computational cost in dimension $d$:
  - Number of iterations to obtain an independent sample: HMC $O(d^{1/4})$ vs. RWMH $O(d)$
  - Total number of computations: HMC $O(d^{5/4})$ vs. RWMH $O(d^2)$
- See Roberts et al. (2001) and Neal (2011) for more details

Tuning Parameters

Step size $\epsilon$:
- Large $\epsilon$: low acceptance rate
- Small $\epsilon$: wasted computation, and random-walk behavior if the trajectory length $\epsilon L$ is too small
- Different regions may need different $\epsilon$, e.g., choose $\epsilon$ at random

Number of leap-frog steps $L$:
- The trajectory length is crucial for exploring the state space systematically
- The target may be more constrained in some directions and much less constrained in others
- Long trajectories can make U-turns

NUTS

Solution: the No-U-Turn Sampler (NUTS) (Hoffman et al. 2014)
- An adaptive way to select the number of leap-frog steps $L$
- An adaptive way to select the step size $\epsilon$
- The exact algorithm behind Stan!

NUTS: Selecting L

Criterion for a U-turn:
$$\frac{d}{dt}\,\frac{\|q_t - q_0\|^2}{2} = (q_t - q_0)^T p_t < 0 \tag{3}$$

Start from $(p^{(i)}, q^{(i-1)})$:
1. Run leap-frog steps until (3) occurs; this gives a candidate set $\mathcal{B}$ of $(p, q)$ pairs
2. Select a subset $\mathcal{C} \subseteq \mathcal{B}$ that satisfies the detailed-balance condition
3. Randomly select $q^{(i)}$ from $\mathcal{C}$
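A minimal check of criterion (3) as code; `q0`, `q_t`, `p_t` are assumed NumPy arrays (names chosen for illustration), and the full NUTS algorithm applies this kind of check to both ends of its doubling trajectory rather than only once:

```python
import numpy as np

def u_turn(q0, q_t, p_t):
    """True when the trajectory starts moving back toward its starting point,
    i.e. when (q_t - q0)^T p_t < 0 as in criterion (3)."""
    return np.dot(q_t - q0, p_t) < 0.0
```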

Selecting the Step Size ε

- Warm-up phase of $M_{\text{adapt}}$ iterations
- Let $H_t$ be the acceptance probability at the $t$-th iteration, e.g.
$$H_t = \min\left\{1, \frac{\Pr(p^*, q^*)}{\Pr(p^{(t)}, q^{(t-1)})}\right\}$$
- Define $h_t(\epsilon) = E_t[H_t \mid \epsilon]$
- Run one dual-averaging step in each iteration to solve $h_t(\epsilon) = \delta$, where $\delta$ is the optimal acceptance rate ($\delta = 0.65$ for HMC)
- Fix $\epsilon$ after the $M_{\text{adapt}}$ warm-up iterations
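A simplified, offline sketch of the dual-averaging update in the style of Hoffman & Gelman (2014): it assumes the acceptance probabilities $H_t$ have already been recorded (in the real warm-up, each iteration's $H_t$ is produced with the current $\epsilon$ and the update is applied online), and the constants $(\gamma, t_0, \kappa)$ below follow commonly cited defaults rather than anything stated in the slides:

```python
import numpy as np

def dual_averaging_eps(alpha_history, eps0, delta=0.65,
                       gamma=0.05, t0=10, kappa=0.75):
    """Adapt the step size eps so the average acceptance probability approaches delta.

    alpha_history: sequence of acceptance probabilities H_t from the warm-up iterations.
    Returns the averaged step size after the warm-up phase.
    """
    mu = np.log(10 * eps0)            # shrinkage point for log(eps)
    h_bar, log_eps_bar = 0.0, 0.0
    for t, alpha in enumerate(alpha_history, start=1):
        # Running estimate of (delta - H_t), i.e. how far we are from the target rate
        h_bar = (1 - 1 / (t + t0)) * h_bar + (delta - alpha) / (t + t0)
        log_eps = mu - np.sqrt(t) / gamma * h_bar
        eta = t ** (-kappa)
        log_eps_bar = eta * log_eps + (1 - eta) * log_eps_bar   # averaged iterate
    return np.exp(log_eps_bar)
```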

Summary

HMC: an MCMC algorithm that makes use of Hamiltonian dynamics
- Parameters act as the position; the negative log-posterior acts as the potential energy
- Proposes new states based on Hamiltonian dynamics
- Uses leap-frog for numerical simulation; sensitive to tuning

NUTS: an HMC variant with adaptive tuning of $(L, \epsilon)$ for more efficient proposals
- $L$: avoid U-turns
- $\epsilon$: dual-averaging optimization to bring the acceptance rate close to its optimum

Implement Your Own HMC

Review the Hamiltonian dynamics:
$$\frac{dq}{dt} = \frac{\partial H}{\partial p} = M^{-1}p, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q} = \nabla_q \log\left(L(D \mid q)\,\pi(q)\right)$$

Need the gradient $\nabla \log\left(L(D \mid q)\,\pi(q)\right) = \nabla \log L(D \mid q) + \nabla \log \pi(q)$.
- Stan: automatic gradient calculation
- Gradient: Stan can also do gradient-based optimization (the quasi-Newton method L-BFGS)
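As a concrete (purely illustrative) instance of this gradient decomposition, here is a sketch for a hypothetical model with a unit-variance Gaussian likelihood and a standard-normal prior on a scalar mean, reusing the `leapfrog` and `hmc_step` sketches from above; the data, tuning values, and names are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=50)          # illustrative data D

# U(q) = -log L(D | q) - log pi(q), with N(q, 1) likelihood and N(0, 1) prior
def U(q):
    return 0.5 * np.sum((data - q) ** 2) + 0.5 * np.sum(q ** 2)

def grad_U(q):
    # grad(-log L(D | q)) + grad(-log pi(q))
    return -np.sum(data - q) + q

q = np.zeros(1)
samples = []
for _ in range(2000):
    q, _ = hmc_step(q, U, grad_U, eps=0.05, L=20, m_diag=np.ones(1), rng=rng)
    samples.append(q.copy())
print(np.mean(samples[500:]))   # close to the posterior mean sum(data) / (n + 1)
```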