Hamiltonian Monte Carlo with Fewer Momentum Reversals


Jascha Sohl-Dickstein

December 6, 2

Hamiltonian dynamics with partial momentum refreshment, in the style of Horowitz, Phys. Lett. B, 1991, explore the state space more slowly than they otherwise would due to the momentum reversals which occur on proposal rejection. These cause trajectories to double back on themselves, leading to random walk behavior on timescales longer than the typical rejection time, and to slower mixing. Here, I present a technique by which the number of momentum reversals can be reduced. This is accomplished by maintaining the net exchange of probability between states with opposite momenta, but reducing the rate of exchange in both directions such that it is 0 in one direction. An experiment illustrates these reduced momentum flips accelerating mixing for a particular distribution.

1 Formalism

A state $\zeta \in \mathbb{R}^{2N}$ consists of a position $x \in \mathbb{R}^N$ and an auxiliary momentum $v \in \mathbb{R}^N$, $\zeta = \{x, v\}$. The state space has an associated Hamiltonian and a joint probability distribution

$$H(\zeta) = E(x) + \frac{1}{2} v^T v, \qquad (1)$$

$$p(x, v) = p(\zeta) = \frac{1}{Z} \exp\left(-H(\zeta)\right), \qquad (2)$$

where the normalization constant $Z$ is the partition function.

The momentum flip operator $F : \mathbb{R}^{2N} \to \mathbb{R}^{2N}$ negates the momentum. It has the following properties:

- $F$ negates the momentum: $F\zeta = F\{x, v\} = \{x, -v\}$
- $F$ is its own inverse: $F^{-1} = F$, $FF\zeta = \zeta$
- $F$ is volume preserving: $\left|\det\left(\partial F\zeta / \partial \zeta^T\right)\right| = 1$
- $F$ does not change the probability of a state: $p(F\zeta) = p(\zeta)$
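As a concrete illustration of these definitions, a minimal Python/NumPy sketch follows. Representing a state $\zeta = \{x, v\}$ as a pair of arrays, and the function names used here, are assumptions of the sketch rather than anything specified in this note.

```python
import numpy as np

def hamiltonian(x, v, energy):
    """H(zeta) = E(x) + (1/2) v^T v, as in Equation 1."""
    return energy(x) + 0.5 * v @ v

def log_p(x, v, energy):
    """log p(zeta) = -H(zeta) - log Z; log Z is omitted since only
    probability ratios are needed later."""
    return -hamiltonian(x, v, energy)

def flip(x, v):
    """Momentum flip operator F: F{x, v} = {x, -v}. It is its own inverse,
    volume preserving, and leaves p(zeta) unchanged."""
    return x, -v
```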

Figure 1: This diagram illustrates the possible transitions between states using the Markov transition operator from Equation 3. In (a) the relevant states, represented by the nodes, are labeled. In (b) the possible transitions, represented by the arrows, are labeled. In Section 2, the net probability flow into and out of the state $\zeta$ is set to 0.

The leapfrog integrator $L(n, \epsilon) : \mathbb{R}^{2N} \to \mathbb{R}^{2N}$ integrates Hamiltonian dynamics for the Hamiltonian $H(\zeta)$, using leapfrog integration, for $n \in \mathbb{Z}^+$ integration steps with stepsize $\epsilon \in \mathbb{R}^+$. We assume that $n$ and $\epsilon$ are constants, and write this operator simply as $L$. The leapfrog integrator has the following relevant properties:

- $L$ is volume preserving: $\left|\det\left(\partial L\zeta / \partial \zeta^T\right)\right| = 1$
- $L$ is exactly reversible using momentum flips: $L^{-1} = FLF$, $FLFL\zeta = \zeta$

During sampling, state updates are performed using a Markov transition operator $T(r) : \mathbb{R}^{2N} \to \mathbb{R}^{2N}$, where $r \sim U[0, 1)$ is drawn from the uniform distribution between 0 and 1,

$$T(r)\,\zeta = \begin{cases} L\zeta & r < P_{\text{leap}}(\zeta) \\ F\zeta & P_{\text{leap}}(\zeta) \le r < P_{\text{leap}}(\zeta) + P_{\text{flip}}(\zeta) \\ \zeta & P_{\text{leap}}(\zeta) + P_{\text{flip}}(\zeta) \le r. \end{cases} \qquad (3)$$

$T(r)$ additionally depends on an acceptance probability for the leapfrog dynamics, $P_{\text{leap}}(\zeta) \in [0, 1]$, and a probability of negating the momentum, $P_{\text{flip}}(\zeta) \in [0, 1 - P_{\text{leap}}(\zeta)]$. These must be chosen to guarantee that $p(\zeta)$ is a fixed point of $T$. This fixed point requirement can be written as

$$p(\zeta') = \int d\zeta\, p(\zeta) \int_0^1 dr\, \delta\left(\zeta' - T(r)\,\zeta\right). \qquad (4)$$
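Continuing the sketch, the leapfrog integrator $L$ and a single draw from the transition operator $T(r)$ of Equation 3 might be implemented as below. The leapfrog scheme shown is the standard position-momentum splitting, and `transition` assumes that $P_{\text{leap}}$ and $P_{\text{flip}}$ have already been evaluated at the current state, as prescribed in Section 2.

```python
def leapfrog(x, v, grad_energy, n, eps):
    """Leapfrog integrator L(n, eps): n steps of size eps. Volume preserving
    and exactly reversible via momentum flips (F L F L zeta = zeta)."""
    v = v - 0.5 * eps * grad_energy(x)        # initial half step in momentum
    for _ in range(n - 1):
        x = x + eps * v                       # full step in position
        v = v - eps * grad_energy(x)          # full step in momentum
    x = x + eps * v                           # last full step in position
    v = v - 0.5 * eps * grad_energy(x)        # final half step in momentum
    return x, v

def transition(x, v, p_leap, p_flip, leapfrog_op, rng):
    """One draw from T(r) in Equation 3: leapfrog with probability P_leap,
    momentum flip with probability P_flip, otherwise leave zeta unchanged."""
    r = rng.uniform()                         # r ~ U[0, 1)
    if r < p_leap:
        return leapfrog_op(x, v)              # zeta -> L zeta
    if r < p_leap + p_flip:
        return x, -v                          # zeta -> F zeta
    return x, v                               # zeta -> zeta
```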

2 Making the distribution of interest a fixed point

In order to make $p(\zeta)$ a fixed point, we will choose the Markov dynamics $T$ so that on average as many transitions enter as leave state $\zeta$ at equilibrium. This is not pairwise detailed balance; instead we are directly enforcing zero net change in the probability of each state by summing over all allowed transitions into or out of the state. This constraint is analogous to Kirchhoff's current law, where the total current entering a node is set to 0.

As can be seen from Equation 3 and the definitions in Section 1, and as is illustrated in Figure 1, a state $\zeta$ can only lose probability to the two states $L\zeta$ and $F\zeta$, and gain probability from the two states $FLF\zeta$ and $F\zeta$. Equating the rates of probability inflow and outflow, we find

$$p(\zeta)\left[P_{\text{leap}}(\zeta) + P_{\text{flip}}(\zeta)\right] = p(FLF\zeta)\,P_{\text{leap}}(FLF\zeta) + p(F\zeta)\,P_{\text{flip}}(F\zeta) = p(FLF\zeta)\,P_{\text{leap}}(FLF\zeta) + p(\zeta)\,P_{\text{flip}}(F\zeta), \qquad (5)$$

$$p(\zeta)\left[P_{\text{flip}}(\zeta) - P_{\text{flip}}(F\zeta)\right] = p(FLF\zeta)\,P_{\text{leap}}(FLF\zeta) - p(\zeta)\,P_{\text{leap}}(\zeta). \qquad (6)$$

We choose the standard Metropolis-Hastings acceptance rule for $P_{\text{leap}}$,

$$P_{\text{leap}}(\zeta) = \min\left(1, \frac{p(L\zeta)}{p(\zeta)}\right). \qquad (7)$$

Substituting this into Equation 6, and using the fact that $LFLF\zeta = \zeta$, we find

$$p(\zeta)\left[P_{\text{flip}}(\zeta) - P_{\text{flip}}(F\zeta)\right] = p(FLF\zeta)\,\min\left(1, \frac{p(\zeta)}{p(FLF\zeta)}\right) - p(\zeta)\,\min\left(1, \frac{p(L\zeta)}{p(\zeta)}\right) \qquad (8)$$

$$= \min\left(p(FLF\zeta), p(\zeta)\right) - \min\left(p(\zeta), p(L\zeta)\right) \qquad (9)$$

$$= p(\zeta)\left[\min\left(1, \frac{p(FLF\zeta)}{p(\zeta)}\right) - \min\left(1, \frac{p(L\zeta)}{p(\zeta)}\right)\right]. \qquad (10)$$

Satisfying Equation 10, we choose the following form for $P_{\text{flip}}$,

$$P_{\text{flip}}(\zeta) = \max\left(0, \min\left(1, \frac{p(FLF\zeta)}{p(\zeta)}\right) - \min\left(1, \frac{p(L\zeta)}{p(\zeta)}\right)\right). \qquad (11)$$

Note that $P_{\text{flip}}(\zeta) \le 1 - P_{\text{leap}}(\zeta)$, where $1 - P_{\text{leap}}(\zeta)$ is the rejection rate, and thus the momentum flip rate, in standard HMC. Using this form for $P_{\text{flip}}$ will therefore generally reduce the number of momentum flips required. To recover standard HMC, instead set $P_{\text{flip}}(\zeta) = 1 - P_{\text{leap}}(\zeta)$; one can verify by substitution that this also satisfies Equation 10.
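In code, the acceptance rate of Equation 7 and the reduced flip rate of Equation 11 can be evaluated by integrating forward from $\zeta$ (giving $L\zeta$) and from $F\zeta$ (giving, after a final momentum negation, $FLF\zeta$). The sketch below reuses the helpers above; comparing log probabilities rather than probabilities is an implementation convenience, not part of the derivation.

```python
import numpy as np

def leap_and_flip_probs(x, v, energy, grad_energy, n, eps):
    """P_leap(zeta) from Equation 7 and the reduced P_flip(zeta) from
    Equation 11, computed through log-probability differences."""
    def logp(x_, v_):
        return -(energy(x_) + 0.5 * v_ @ v_)

    x_L, v_L = leapfrog(x, v, grad_energy, n, eps)          # L zeta
    x_b, v_b = leapfrog(x, -v, grad_energy, n, eps)         # L F zeta
    x_FLF, v_FLF = x_b, -v_b                                # F L F zeta

    lp = logp(x, v)
    # min(1, p(.)/p(zeta)) computed as exp(min(0, log-ratio)) for stability
    ratio_L = np.exp(min(0.0, logp(x_L, v_L) - lp))
    ratio_FLF = np.exp(min(0.0, logp(x_FLF, v_FLF) - lp))

    p_leap = ratio_L                                        # Equation 7
    p_flip = max(0.0, ratio_FLF - ratio_L)                  # Equation 11
    return p_leap, p_flip
```

Setting `p_flip = 1.0 - p_leap` instead recovers the standard HMC momentum flip rule mentioned above.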

Figure 2: A two dimensional image of the distribution used in Section 3. Pixel intensity corresponds to the probability density function at that location.

Figure 3: The covariance between samples as a function of the number of intervening sampling steps, for HMC with standard rejection and for rejection with fewer momentum reversals. Reducing the number of momentum reversals causes faster mixing, as evidenced by the faster falloff of the autocovariance.

3 Example

In order to demonstrate the accelerated mixing provided by this technique, samples were drawn from a simple distribution with standard rejection, and with separate rejection and momentum flipping rates as described above. Both samplers used the same leapfrog step length $\epsilon$ and the same number of integration steps $n$, the momentum corruption rate $\beta$ was set so as to corrupt half the momentum per unit simulation time, and both samplers were run for the same number of sampling steps.

The distribution used was described by a two dimensional energy function $E(x)$ combining a logarithmic term in $x_1$ with a quadratic term in $x_2$. A two dimensional image of this distribution can be seen in Figure 2. The autocovariance of the returned samples can be seen, as a function of the number of intervening sampling steps, in Figure 3. Sampling using the technique presented here led to more rapid decay of the autocovariance, consistent with faster mixing.
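A rough sketch of the structure of this experiment, reusing the functions above, is given below. The target energy and all numerical settings (step size, number of leapfrog steps, corruption rate $\beta$, chain length, autocovariance lag) are placeholder values chosen only so the sketch runs end to end; they are not the settings or the distribution used in this section.

```python
import numpy as np

def energy(x):
    """Placeholder 2-D energy (an anisotropic Gaussian), standing in for the
    test distribution of this section."""
    return 0.5 * (x[0] ** 2 + 25.0 * x[1] ** 2)

def grad_energy(x):
    return np.array([x[0], 25.0 * x[1]])

def run_chain(reduced_flips, n_steps=20000, n_leap=10, eps=0.1, beta=0.1, seed=0):
    """Partial momentum refreshment HMC; reduced_flips selects between the
    standard rule P_flip = 1 - P_leap and the reduced rule of Equation 11."""
    rng = np.random.default_rng(seed)
    x, v = np.zeros(2), rng.standard_normal(2)
    samples = np.empty((n_steps, 2))
    step = lambda a, b: leapfrog(a, b, grad_energy, n_leap, eps)
    for t in range(n_steps):
        # Partially corrupt the momentum with rate beta.
        v = np.sqrt(1.0 - beta) * v + np.sqrt(beta) * rng.standard_normal(2)
        p_leap, p_flip = leap_and_flip_probs(x, v, energy, grad_energy, n_leap, eps)
        if not reduced_flips:
            p_flip = 1.0 - p_leap            # standard HMC rejection
        x, v = transition(x, v, p_leap, p_flip, step, rng)
        samples[t] = x
    return samples

def autocovariance(chain, max_lag=250):
    """Normalized autocovariance of a scalar chain, as plotted in Figure 3."""
    z = chain - chain.mean()
    acov = np.array([np.mean(z[: len(z) - k] * z[k:]) for k in range(max_lag)])
    return acov / acov[0]

# The reduced-flip sampler should show a faster falloff of the autocovariance.
acov_standard = autocovariance(run_chain(reduced_flips=False)[:, 1])
acov_reduced = autocovariance(run_chain(reduced_flips=True)[:, 1])
```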