Hamiltonian Monte Carlo

October 20, 2018

Debdeep Pati

References: Neal, Radford M. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo 2.11 (2011): 2. Betancourt, Michael. A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv: (2017).

1 Geometry of high dimensional probability distributions

The neighborhood immediately around the mode features large densities, but in more than a few dimensions the small volume of that neighborhood prevents it from having much contribution to any expectation. On the other hand, the complementary neighborhood far away from the mode features a much larger volume, but the vanishing densities lead to similarly negligible contributions to expectations. The only significant contributions come from the neighborhood between these two extremes, known as the typical set (Figure 1). Importantly, because probability densities and volumes transform oppositely under any reparameterization, the typical set is an invariant object that does not depend on the irrelevant details of any particular choice of parameters.

Figure 1: A typical set

As the dimension of parameter space increases, the tension between the density and the volume grows, and the regions where the density and volume are both large enough to yield a significant contribution become more and more narrow. Consequently, the typical set becomes more singular with increasing dimension, a manifestation of concentration of measure. The immediate consequence of concentration of measure is that the only significant contributions to any expectation come from the typical set; evaluating the integrand outside of the typical set has negligible effect on expectations and hence is a waste of precious computational resources. In other words, we can accurately estimate expectations by averaging over the typical set instead of the entirety of parameter space. Consequently, in order to compute expectations efficiently, we have to be able to identify, and then focus our computational resources into, the typical set.

2 Returning to MCMC again

Given a Markov transition that targets the desired distribution, Markov chain Monte Carlo defines a generic strategy for quantifying the typical set. Constructing such a transition, however, is itself a nontrivial problem. Fortunately there are various procedures for automatically constructing appropriate transitions for any given target distribution, the foremost amongst these being the Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970).

The Metropolis-Hastings algorithm comprises two steps: a proposal and a correction. The proposal is any stochastic perturbation of the initial state, while the correction rejects any proposals that stray too far away from the typical set of the target distribution. More formally, let $q(x'; x)$ be the probability density of proposing $x'$ from the current state $x$. The probability of accepting a given proposal is then

$$\alpha(x'; x) = \min\left\{1, \frac{q(x; x')\,\pi(x')}{q(x'; x)\,\pi(x)}\right\}.$$

The original Markov chain Monte Carlo algorithm, and one still commonly in use today, uses a Gaussian distribution as its proposal mechanism, $q(x'; x) = \mathcal{N}(x'; x, \Sigma)$, an algorithm we will refer to as Random Walk Metropolis. Because the proposal mechanism is symmetric under the exchange of the initial and proposed points, the proposal density cancels and

$$\alpha(x'; x) = \min\left\{1, \frac{\pi(x')}{\pi(x)}\right\}.$$

Random Walk Metropolis is not only simple to implement, it also has a particularly nice intuition.
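As a concrete illustration of the proposal and correction steps, here is a minimal Random Walk Metropolis sketch in Python with NumPy. It assumes a proposal covariance $\Sigma = \text{step}^2 I$ and a standard Gaussian target known only up to a constant; the names `log_target` and `random_walk_metropolis` are hypothetical and not part of the original notes. The acceptance test is carried out on the log scale to avoid underflow.

```python
import numpy as np

def log_target(x):
    # Hypothetical target for illustration: standard Gaussian, known up to a constant.
    return -0.5 * np.dot(x, x)

def random_walk_metropolis(log_pi, x0, step, n_iter, rng):
    """Random Walk Metropolis with proposal q(x'; x) = N(x'; x, step^2 I).

    The Gaussian proposal is symmetric, so the acceptance probability
    reduces to min{1, pi(x') / pi(x)}, evaluated here on the log scale.
    """
    x = np.asarray(x0, dtype=float)
    log_p = log_pi(x)
    samples = np.empty((n_iter, x.size))
    n_accept = 0
    for i in range(n_iter):
        # Proposal: a stochastic perturbation of the current state.
        proposal = x + step * rng.standard_normal(x.size)
        log_p_prop = log_pi(proposal)
        # Correction: accept with probability min{1, pi(x') / pi(x)}.
        if np.log(rng.uniform()) < log_p_prop - log_p:
            x, log_p = proposal, log_p_prop
            n_accept += 1
        samples[i] = x  # a rejected proposal repeats the current state
    return samples, n_accept / n_iter

rng = np.random.default_rng(0)
samples, acc_rate = random_walk_metropolis(log_target, np.zeros(2), 1.0, 5000, rng)
```

Rerunning the same function from `np.zeros(100)` with the same step size gives an acceptance rate close to zero, which is the high-dimensional behaviour discussed in the following paragraphs.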
The proposal distribution is biased towards large volumes, and hence the tails of the target distribution, while the Metropolis correction rejects those proposals that jump into neighborhoods where the density is too small. The combined procedure then preferentially selects those proposals that fall into neighborhoods of high probability mass, concentrating towards the typical set as desired.

Because of its conceptual simplicity and the ease with which it can be implemented by practitioners, Random Walk Metropolis is still popular in many applications. Unfortunately, that seductive simplicity hides a performance that scales poorly with increasing dimension and complexity of the target distribution. As the dimension of the target distribution increases, the volume exterior to the typical set overwhelms the volume interior to the typical set, and almost every Random Walk Metropolis proposal will produce a point outside of the typical set, towards the tails. The density of these points, however, is so small that the acceptance probability becomes negligible. In this case almost all of the proposals will be rejected and the resulting Markov chain will only rarely move. We can induce a larger acceptance probability by shrinking the size of the proposal to stay within the typical set, but those small jumps will move the Markov chain extremely slowly.

It thus makes sense to seek ways of accelerating (a) the convergence of a given MCMC algorithm to its stationary distribution, (b) the convergence of a given MCMC estimate to its expectation, and/or (c) the exploration by a given MCMC algorithm of the support of the target distribution. These goals are related but still distinct. For instance, a chain initialised by simulating from the target distribution may still fail to explore the whole support in an acceptable number of iterations. While there is no optimal and universal solution to this issue, we will discuss below approaches that are as generic as possible, as opposed to artificial ones taking advantage of the mathematical structure of a specific target distribution. Ideally, we aim at covering realistic situations where the target density is only known (up to a constant or an additional completion step) as the output of an existing computer code. Pragmatically, we also cover here solutions that require more effort and calibration steps, provided they apply to a wide enough class of problems.

3 Hamiltonian Monte Carlo

The guess-and-check strategy of Random Walk Metropolis is doomed to fail in high-dimensional spaces, where there are an exponential number of directions in which to guess but only a singular number of directions that stay within the typical set and pass the check. In order to make large jumps away from the initial point, and into new, unexplored regions of the typical set, we need to exploit information about the geometry of the typical set itself. Specifically, we need transitions that can follow those contours of high probability mass, coherently gliding through the typical set. How can we distill the geometry of the typical set into information about how to move through it?
When the sample space is continuous, a natural way of encoding this direction information is with a vector field aligned with the typical set. A vector field is the assignment of a direction at every point in parameter space, and if those directions are aligned with the typical set then they act as a guide through this neighborhood of largest target probability. In other words, instead of fumbling around parameter space with random, uninformed jumps, we can follow the direction assigned to each point for a small distance. By construction this will move us to a new point in the typical set, where we will find a new direction to follow. Continuing this process traces out a coherent trajectory through the typical set that efficiently moves us far away from the initial point to new, unexplored regions of the typical set as quickly as possible.

From the point of view of this review, Hamiltonian (or hybrid) Monte Carlo (HMC) is an auxiliary variable technique that takes advantage of a continuous time Markov process to sample from the target π. This approach comes from physics (Duane et al., 1987) [Simon Duane, Imperial College London; Physics Letters B] and was popularized in statistics by Neal (1999, 2011) and MacKay (2002). Given a target $\pi(\theta)$, where $\theta \in \mathbb{R}^d$, an artificial auxiliary variable $\nu \in \mathbb{R}^d$ is introduced along with a density $\omega(\nu \mid \theta)$ so that the joint distribution of $(\theta, \nu)$ enjoys $\pi(\theta)$ as its marginal. While there is complete freedom in this representation, the HMC literature often calls $\nu$ the momentum of a particle located at $\theta$, by analogy with physics. The joint distribution is represented as

$$p(\theta, \nu) = \pi(\theta)\,\omega(\nu \mid \theta) \propto \exp\{-H(\theta, \nu)\},$$

where $H(\cdot)$ is called the Hamiltonian. Hamiltonian Monte Carlo is associated with the continuous time process $(\theta_t, \nu_t)$ generated by the so-called Hamiltonian equations

$$\frac{d\theta_t}{dt} = \frac{\partial H}{\partial \nu}(\theta_t, \nu_t), \qquad \frac{d\nu_t}{dt} = -\frac{\partial H}{\partial \theta}(\theta_t, \nu_t),$$

which keep the Hamiltonian stable over time, since

$$\frac{dH(\theta_t, \nu_t)}{dt} = \frac{\partial H}{\partial \nu}(\theta_t, \nu_t)\,\frac{d\nu_t}{dt} + \frac{\partial H}{\partial \theta}(\theta_t, \nu_t)\,\frac{d\theta_t}{dt} = 0.$$

Obviously, the above continuous time Markov process is deterministic and only explores a given level set, $\{(\theta, \nu) : H(\theta, \nu) = H(\theta_0, \nu_0)\}$, instead of the whole augmented state space $\mathbb{R}^{2d}$, which induces an issue with irreducibility. An acceptable solution to this problem is to refresh the momentum, $\nu_t \sim \omega(\nu \mid \theta_{t^-})$, at random times $\{\tau_n\}$, where $\theta_{t^-}$ denotes the location of θ immediately prior to time t, and the random durations $\{\tau_n - \tau_{n-1}\}$ follow an exponential distribution. By construction, the continuous-time Hamiltonian Markov chain can be regarded as a specific piecewise deterministic Markov process using Hamiltonian dynamics (Davis, 1984, 1993; Bou-Rabee et al., 2017), and our target π is the marginal of its associated invariant distribution.

Before moving to the practical implementation of the concept, let us point out that the free cog in the machinery is the conditional density $\omega(\nu \mid \theta)$, which is usually chosen as a Gaussian density with either a constant covariance matrix M corresponding to the target covariance, or a local curvature depending on θ as in Riemannian Hamiltonian Monte Carlo (Girolami and Calderhead, 2011). Betancourt (2017) argues in favour of these two cases against non-Gaussian alternatives, and Livingstone et al. (2017) analyse how different choices of kinetic energy in Hamiltonian Monte Carlo affect algorithm performance. For a fixed covariance matrix, the Hamiltonian equations become

$$\frac{d\theta_t}{dt} = M^{-1}\nu_t, \qquad \frac{d\nu_t}{dt} = -\nabla U(\theta_t),$$

where $U(\theta) = -\log \pi(\theta)$ is the potential energy, so that $-\nabla U(\theta_t) = \nabla \log \pi(\theta_t)$ is the score function. Henceforth, for ease of notation, we shall denote $\nu_t$ by $\nu(t)$ and $\theta_t$ by $\theta(t)$.
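To make the fixed-covariance equations concrete, the sketch below evaluates Hamilton's vector field for a diagonal M and checks the conservation identity $dH/dt = 0$ at a random point, both through the chain-rule expression and through a small finite-difference step along the flow. The quartic potential $U(\theta) = \sum_j \theta_j^4/4$, the diagonal of M, and all function names are hypothetical choices made for illustration only.

```python
import numpy as np

# Hypothetical example target: pi(theta) ∝ exp{-U(theta)} with U(theta) = sum(theta**4) / 4.
def potential(theta):
    return 0.25 * np.sum(theta ** 4)

def grad_potential(theta):
    # grad U = -grad log pi, i.e. minus the score function.
    return theta ** 3

def hamiltonian(theta, nu, m_inv):
    # H(theta, nu) = U(theta) + K(nu), with K(nu) = nu^T M^{-1} nu / 2 and M diagonal.
    return potential(theta) + 0.5 * np.sum(m_inv * nu ** 2)

def hamilton_vector_field(theta, nu, m_inv):
    """Right-hand side of Hamilton's equations for a fixed diagonal M."""
    dtheta_dt = m_inv * nu            # dH/dnu = M^{-1} nu
    dnu_dt = -grad_potential(theta)   # -dH/dtheta = grad log pi(theta)
    return dtheta_dt, dnu_dt

def dH_dt(theta, nu, m_inv):
    # Chain rule: dH/dt = (dH/dtheta) . dtheta/dt + (dH/dnu) . dnu/dt
    dtheta_dt, dnu_dt = hamilton_vector_field(theta, nu, m_inv)
    return grad_potential(theta) @ dtheta_dt + (m_inv * nu) @ dnu_dt

rng = np.random.default_rng(1)
m_inv = 1.0 / np.array([1.0, 2.0, 0.5])  # M = diag(1, 2, 0.5), chosen arbitrarily
theta, nu = rng.standard_normal(3), rng.standard_normal(3)
rate = dH_dt(theta, nu, m_inv)

# Finite-difference check: a tiny step along the vector field changes H only at second order.
h = 1e-6
dtheta, dnu = hamilton_vector_field(theta, nu, m_inv)
fd_rate = (hamiltonian(theta + h * dtheta, nu + h * dnu, m_inv)
           - hamiltonian(theta, nu, m_inv)) / h
```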
In the special case when $\pi(\theta) \propto \exp\{-\|\theta\|^2/2\}$ and $\omega(\nu \mid \theta) \propto \exp\{-\nu^\top M^{-1}\nu/2\}$, where $M = \mathrm{diag}(m_1, \dots, m_d)$, each coordinate is a harmonic oscillator with angular frequency $1/\sqrt{m_j}$, and it is possible to solve the equations as

$$\theta_j(t) = r_j \cos\!\left(a_j + t/\sqrt{m_j}\right), \qquad \nu_j(t) = -r_j\sqrt{m_j}\,\sin\!\left(a_j + t/\sqrt{m_j}\right),$$

with amplitudes $r_j$ and phases $a_j$ determined by the initial state.

4 Properties of Hamiltonian dynamics

First, Hamiltonian dynamics is reversible: the mapping $T_s$ from the state at time t, $(\theta(t), \nu(t))$, to the state at time t + s, $(\theta(t+s), \nu(t+s))$, is one-to-one and hence has an inverse $T_{-s}$. The inverse mapping is obtained by simply negating the time derivatives in the Hamiltonian equations.

Second, the dynamics conserves the Hamiltonian. For Metropolis updates using a proposal found by Hamiltonian dynamics, which form part of the HMC method, the acceptance probability is one if H is kept invariant. We will see later, however, that in practice we can only make H approximately invariant, and hence we won't have an acceptance probability of one.
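The closed-form solution and the first two properties can be checked numerically. The sketch below, with hypothetical function names and an arbitrary diagonal M, writes the exact flow $T_t$ for this Gaussian case in terms of the initial state, then confirms that H is conserved, that running the flow for time $-s$ inverts $T_s$, and that flowing forward, negating the momentum, flowing forward again and negating once more returns the starting point.

```python
import numpy as np

def hamiltonian(theta, nu, m):
    # U(theta) = |theta|^2 / 2 and K(nu) = sum_j nu_j^2 / (2 m_j)
    return 0.5 * np.sum(theta ** 2) + 0.5 * np.sum(nu ** 2 / m)

def exact_flow(theta0, nu0, m, t):
    """Exact solution of dtheta/dt = M^{-1} nu, dnu/dt = -theta with M = diag(m).

    Equivalent to theta_j(t) = r_j cos(a_j + t / sqrt(m_j)) and
    nu_j(t) = -r_j sqrt(m_j) sin(a_j + t / sqrt(m_j)), where
    r_j cos(a_j) = theta0_j and -r_j sqrt(m_j) sin(a_j) = nu0_j.
    """
    w = 1.0 / np.sqrt(m)  # angular frequency of each coordinate
    c, s = np.cos(w * t), np.sin(w * t)
    theta_t = theta0 * c + nu0 * s / np.sqrt(m)
    nu_t = -theta0 * np.sqrt(m) * s + nu0 * c
    return theta_t, nu_t

rng = np.random.default_rng(2)
m = np.array([0.5, 1.0, 4.0])  # arbitrary diagonal of M
theta0, nu0 = rng.standard_normal(3), rng.standard_normal(3) * np.sqrt(m)
h0 = hamiltonian(theta0, nu0, m)

theta1, nu1 = exact_flow(theta0, nu0, m, 2.7)
# Reversibility: flow forward, negate momentum, flow forward again, negate back.
theta_back, nu_back = exact_flow(theta1, -nu1, m, 2.7)
nu_back = -nu_back
```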
Third, Hamiltonian dynamics preserves volume in (θ, ν) space, a result known as Liouville's theorem. If we apply the mapping $T_s$ to the points in some region R of (θ, ν) space, with volume V, the image of R under $T_s$ will also have volume V. The significance of volume preservation for MCMC is that we needn't account for a Jacobian in the acceptance probability for Metropolis updates.

The preservation of volume can be proved in several ways. One is to note that the divergence of the vector field defined by the Hamiltonian equations is zero, which can be readily seen as

$$\sum_{j=1}^d \left[\frac{\partial}{\partial \theta_j}\frac{d\theta_j}{dt} + \frac{\partial}{\partial \nu_j}\frac{d\nu_j}{dt}\right] = \sum_{j=1}^d \left[\frac{\partial^2 H}{\partial \theta_j\,\partial \nu_j} - \frac{\partial^2 H}{\partial \nu_j\,\partial \theta_j}\right] = 0.$$

Next, we show that Hamiltonian dynamics preserves volume without presuming this property of the divergence. Consider the dimension to be 1. We can approximate $T_\delta$ for δ near 0 as

$$T_\delta(\theta, \nu) = \begin{bmatrix}\theta \\ \nu\end{bmatrix} + \delta\begin{bmatrix} d\theta/dt \\ d\nu/dt\end{bmatrix} + O(\delta^2).$$

Then the Jacobian can be written as

$$B_\delta = \begin{bmatrix} 1 + \delta\,\dfrac{\partial^2 H}{\partial \theta\,\partial \nu} & \delta\,\dfrac{\partial^2 H}{\partial \nu^2} \\[2mm] -\delta\,\dfrac{\partial^2 H}{\partial \theta^2} & 1 - \delta\,\dfrac{\partial^2 H}{\partial \nu\,\partial \theta}\end{bmatrix} + O(\delta^2),$$

so that

$$\det(B_\delta) = 1 + \delta\,\frac{\partial^2 H}{\partial \theta\,\partial \nu} - \delta\,\frac{\partial^2 H}{\partial \nu\,\partial \theta} + O(\delta^2) = 1 + O(\delta^2).$$

Since $\log(1 + x) \approx x$ for x near zero, $\log\det(B_\delta)$ is zero except perhaps for terms of order δ² (though we will see later that it is exactly zero). Now consider $\log\det(B_s)$ for some time interval s that is not close to zero. Setting δ = s/n, for some integer n, we can write $T_s$ as the composition of $T_\delta$ applied n times (from n points along the trajectory), so $\det(B_s)$ is the n-fold product of $\det(B_\delta)$ evaluated at these points. We then find that

$$\log\det(B_s) = \sum_{i=1}^n \log\det\big(B_\delta^{(i)}\big) = \sum_{i=1}^n O(1/n^2) = O(1/n).$$

Taking n to ∞, we get $\log\det(B_s) = 0$, that is, $\det(B_s) = 1$.

4.1 Numerically simulating the Hamiltonian dynamics

In general, it is not possible to analytically solve Hamilton's equations as we did for the simple case above. Instead, it is common to discretize the simulation of the differential equations with some step size ε. We briefly discuss two options here: Euler's method (performs poorly) and the leapfrog method (performs better).
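The volume-preservation argument of Section 4 can be checked on the one-dimensional version of the Gaussian example, $H(\theta, \nu) = \theta^2/2 + \nu^2/(2m)$. In this minimal sketch (function names are hypothetical), the exact flow $T_s$ is linear with Jacobian determinant $\cos^2 + \sin^2 = 1$. The first-order map $T_\delta(\theta, \nu) \approx (\theta, \nu) + \delta\,(d\theta/dt, d\nu/dt)$ used in the proof, which is also a single step of Euler's method, has $\det(B_\delta) = 1 + \delta^2/m$, so composing n of them with δ = s/n gives $\log\det(B_s) = n\log(1 + s^2/(mn^2))$, which shrinks like $O(1/n)$ exactly as the proof predicts.

```python
import numpy as np

def exact_flow_jacobian(s, m):
    """Jacobian of the exact flow T_s for H = theta^2/2 + nu^2/(2m) in one dimension.
    The flow is linear in (theta, nu), so the Jacobian does not depend on the state."""
    w = 1.0 / np.sqrt(m)
    c, sn = np.cos(w * s), np.sin(w * s)
    return np.array([[c, sn / np.sqrt(m)],
                     [-np.sqrt(m) * sn, c]])

def first_order_jacobian(delta, m):
    """B_delta for T_delta(theta, nu) = (theta + delta * nu / m, nu - delta * theta),
    i.e. the matrix in the proof with d2H/dnu2 = 1/m, d2H/dtheta2 = 1, mixed terms 0."""
    return np.array([[1.0, delta / m],
                     [-delta, 1.0]])

def log_det_composed(s, m, n):
    """log det of n first-order steps of size s/n composed along the trajectory."""
    b_s = np.eye(2)
    for _ in range(n):
        b_s = first_order_jacobian(s / n, m) @ b_s
    return np.log(np.linalg.det(b_s))

m, s = 2.0, 1.5
det_exact = np.linalg.det(exact_flow_jacobian(s, m))
log_dets = {n: log_det_composed(s, m, n) for n in (10, 100, 1000)}
```

Multiplying each entry of `log_dets` by its n gives values close to $s^2/m$, which is the $O(1/n)$ rate of the argument made explicit.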
For simplicity, assume that the conditional distribution of ν does not depend on θ, so that $H(\theta, \nu) = U(\theta) + K(\nu)$.
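Euler's method:

$$\nu_j(t + \varepsilon) = \nu_j(t) + \varepsilon\,\frac{d\nu_j}{dt}(t) = \nu_j(t) - \varepsilon\,\frac{\partial U}{\partial \theta_j}(\theta(t)),$$
$$\theta_j(t + \varepsilon) = \theta_j(t) + \varepsilon\,\frac{d\theta_j}{dt}(t) = \theta_j(t) + \varepsilon\,\frac{\partial K}{\partial \nu_j}(\nu(t)).$$

Unfortunately, Euler's method performs poorly. The result often diverges, meaning that the approximation error grows and the Hamiltonian is no longer preserved. Instead, the leapfrog method is used in practice.

Much better results can be obtained by slightly modifying Euler's method, as follows:

$$\nu_{t+\varepsilon} = \nu_t - \varepsilon\,\nabla U(\theta_t), \qquad \theta_{t+\varepsilon} = \theta_t + \varepsilon M^{-1}\nu_{t+\varepsilon}.$$

We simply use the new value for the momentum variables, $\nu_{t+\varepsilon}$, when computing the new value for the position variables, $\theta_{t+\varepsilon}$. This modified scheme still treats the two variables asymmetrically; the leapfrog method deals with this by first making only an ε/2 step in ν, using that to update θ, and then coming back to ν for the remaining half step. The leapfrog integrator preserves the reversibility of the Markov chain and is well suited to the Hamiltonian equations in that it preserves volume, so that, combined with a Metropolis correction, it preserves the stationary distribution (Betancourt, 2017). It is called a symplectic integrator, and one version in the independent case with constant covariance consists of the following (so-called leapfrog) steps:

$$\nu_{t+\varepsilon/2} = \nu_t - \frac{\varepsilon}{2}\nabla U(\theta_t),$$
$$\theta_{t+\varepsilon} = \theta_t + \varepsilon M^{-1}\nu_{t+\varepsilon/2},$$
$$\nu_{t+\varepsilon} = \nu_{t+\varepsilon/2} - \frac{\varepsilon}{2}\nabla U(\theta_{t+\varepsilon}).$$

The leapfrog approach diverges far less quickly than Euler's method. Recall the similarity with approximating

$$y(t + \varepsilon) = y(t) + \int_t^{t+\varepsilon} y'(s)\,ds,$$

where Euler's method uses $\int_t^{t+\varepsilon} y'(s)\,ds \approx \varepsilon\,y'(t)$, while the leapfrog method uses the midpoint rule $\int_t^{t+\varepsilon} y'(s)\,ds \approx \varepsilon\,y'(t + \varepsilon/2)$.

We now have the necessary tools to describe how to formulate an MCMC strategy using Hamiltonian dynamics. The first two leapfrog steps can be combined to get

$$\theta_{t+\varepsilon} = \theta_t - \frac{\varepsilon^2}{2} M^{-1}\nabla U(\theta_t) + \varepsilon M^{-1}\nu_t,$$

which is similar to Langevin Monte Carlo (with M = I and $\nu_t \sim N(0, I)$, it is exactly the step below with $\xi = \varepsilon^2/2$). Suppose we want to sample from $\pi(\theta) \propto e^{-U(\theta)}$. Then

$$X_{t+1} = X_t - \xi\,\nabla U(X_t) + \sqrt{2\xi}\,Z_{t+1},$$

where ξ is the step size and the $Z_t$ are iid N(0, 1) random variables, has, for small ξ, a stationary distribution close to π; the discretization bias vanishes as ξ → 0 and can be removed altogether by a Metropolis correction. If π is log-concave, the chain $X_t$ is well behaved and its stationary distribution approaches the target π as ξ → 0. (Verify this when $\pi \propto e^{-\tau\theta^2/2}$.)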
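Both numerical claims of this subsection can be illustrated on a one-dimensional Gaussian target. In the sketch below (function names are hypothetical; U(θ) = τθ²/2 with m = 1 is an illustrative choice), Euler's method multiplies H by exactly $1 + \varepsilon^2$ per step when τ = 1, so the energy grows geometrically, while the modified Euler and leapfrog schemes keep the energy error bounded, leapfrog's bound being the smaller one. The second part runs the Langevin recursion for $\pi \propto e^{-\tau\theta^2/2}$, as suggested in the exercise: solving $v = (1 - \xi\tau)^2 v + 2\xi$ for the stationary variance gives $v = 1/(\tau(1 - \xi\tau/2))$ when $0 < \xi\tau < 2$, which tends to the target variance 1/τ as ξ → 0 but differs from it for any fixed ξ > 0.

```python
import numpy as np

def grad_U(theta, tau=1.0):
    # Hypothetical test target pi(theta) ∝ exp(-tau * theta^2 / 2), so U'(theta) = tau * theta.
    return tau * theta

def euler(theta, nu, eps, m=1.0):
    # Both updates use the values at time t.
    return theta + eps * nu / m, nu - eps * grad_U(theta)

def modified_euler(theta, nu, eps, m=1.0):
    # Update the momentum first, then use the new momentum for the position.
    nu = nu - eps * grad_U(theta)
    return theta + eps * nu / m, nu

def leapfrog(theta, nu, eps, m=1.0):
    nu = nu - 0.5 * eps * grad_U(theta)  # half step in momentum
    theta = theta + eps * nu / m         # full step in position
    nu = nu - 0.5 * eps * grad_U(theta)  # remaining half step in momentum
    return theta, nu

def energy_ratio(step, eps=0.3, n_steps=100):
    """H after n_steps divided by H at the start, for U = theta^2/2 and K = nu^2/2."""
    theta, nu = 1.0, 0.0
    h0 = 0.5 * theta ** 2 + 0.5 * nu ** 2
    for _ in range(n_steps):
        theta, nu = step(theta, nu, eps)
    return (0.5 * theta ** 2 + 0.5 * nu ** 2) / h0

def ula_stationary_variance(tau, xi, n_chains=5000, n_steps=200, seed=0):
    """Run independent chains of X_{t+1} = X_t - xi * U'(X_t) + sqrt(2 xi) Z_{t+1}
    from X_0 = 0 and return the variance of their final states."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)
    for _ in range(n_steps):
        x = x - xi * grad_U(x, tau) + np.sqrt(2 * xi) * rng.standard_normal(n_chains)
    return x.var()

tau, xi = 2.0, 0.1
# Stationary variance of the Langevin recursion for this Gaussian target (not 1/tau).
analytic_var = 1.0 / (tau * (1.0 - xi * tau / 2.0))
sim_var = ula_stationary_variance(tau, xi)
```

With these settings, `energy_ratio(euler)` equals $1.09^{100}$ up to rounding, while the other two ratios stay close to one.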
5 Hamiltonian Monte Carlo algorithm

Using Hamiltonian dynamics to sample from a distribution requires translating the density function for this distribution to a potential energy function and introducing momentum variables to go with the original variables of interest (now seen as position variables). We can then simulate a Markov chain in which each iteration resamples the momentum and then does a Metropolis update with a proposal found using Hamiltonian dynamics.

We now have the background needed to present the Hamiltonian Monte Carlo (HMC) algorithm. HMC can be used to sample only from continuous distributions on $\mathbb{R}^d$ for which the density function can be evaluated (perhaps up to an unknown normalizing constant). For the moment, we will also assume that the density is non-zero everywhere. We must also be able to compute the partial derivatives of the log of the density function. These derivatives must therefore exist, except perhaps on a set of points with probability zero, for which some arbitrary value could be returned.

HMC samples from the canonical distribution for (θ, ν), in which θ has the distribution of interest π(θ), as specified using the potential energy function U(θ). We can choose the distribution of the momentum variables ν, which are independent of θ, as we wish, specifying the distribution via the kinetic energy function K(ν). Current practice with HMC is to use a quadratic kinetic energy, which leads ν to have a zero-mean multivariate Gaussian distribution. Most often, the components of ν are specified to be independent, with component j having variance $m_j$. The kinetic energy function producing this distribution (setting the temperature T = 1) is

$$K(\nu) = \sum_{j=1}^d \frac{\nu_j^2}{2 m_j},$$

so that the density of ν is proportional to $\exp\{-K(\nu)\}$.

5.1 The two steps of the HMC algorithm

Each iteration of the HMC algorithm has two steps. The first changes only the momentum; the second may change both position and momentum.
Both steps leave the canonical joint distribution of (θ, ν) invariant, and hence their combination also leaves this distribution invariant.

In the first step, new values for the momentum variables are randomly drawn from their Gaussian distribution, independently of the current values of the position variables. For the kinetic energy above, the d momentum variables are independent, with $\nu_j$ having mean zero and variance $m_j$. Since θ isn't changed, and ν is drawn from its correct conditional distribution given θ (the same as its marginal distribution, due to independence), this step obviously leaves the canonical joint distribution invariant.

In the second step, a Metropolis update is performed, using Hamiltonian dynamics to propose a new state. Starting with the current state, (θ, ν), Hamiltonian dynamics is simulated for L steps using the leapfrog method (or some other reversible method that preserves volume), with a step size of ε. Here, L and ε are parameters of the algorithm, which need to be tuned to obtain good performance. The momentum variables at the end of this L-step trajectory are then negated, giving a proposed state $(\theta^*, \nu^*)$. This proposed state is accepted as the next state of the Markov chain with probability

$$\min\left\{1, \exp\{-H(\theta^*, \nu^*) + H(\theta, \nu)\}\right\}.$$

If the proposed state is not accepted (i.e., it is rejected), the next state is the same as the current state (and is counted again when estimating the expectation of some function of state by its average over states of the Markov chain). The negation of the momentum variables at the end of the trajectory makes the Metropolis proposal symmetrical, as needed for the acceptance probability above to be valid. This negation need not be done in practice, since K(ν) = K(−ν), and the momentum will be replaced before it is used again, in the first step of the next iteration. (This assumes that these HMC updates are the only ones performed.)
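Putting the two steps together gives the following minimal HMC sketch in Python with NumPy, using a diagonal mass matrix. It is an illustration under assumptions: the function names, the standard Gaussian test target, and the values chosen for ε and L are hypothetical. The inner loop merges the half momentum steps of consecutive leapfrog steps into full steps, which is equivalent to applying the leapfrog update L times. The final momentum negation is kept to mirror the description above, even though it does not change the acceptance probability.

```python
import numpy as np

def hmc(log_pi, grad_log_pi, theta0, eps, n_leapfrog, n_iter, m_diag, rng):
    """Minimal HMC with diagonal mass matrix M = diag(m_diag).

    Potential U(theta) = -log pi(theta); kinetic K(nu) = sum(nu_j^2 / (2 m_j)).
    """
    theta = np.asarray(theta0, dtype=float)
    d = theta.size
    samples = np.empty((n_iter, d))
    n_accept = 0
    for i in range(n_iter):
        # Step 1: draw fresh momentum nu_j ~ N(0, m_j), independently of theta.
        nu = rng.standard_normal(d) * np.sqrt(m_diag)
        current_H = -log_pi(theta) + 0.5 * np.sum(nu ** 2 / m_diag)

        # Step 2: L leapfrog steps of size eps (grad U = -grad log pi).
        th, v = theta.copy(), nu.copy()
        v = v + 0.5 * eps * grad_log_pi(th)          # initial half step in momentum
        for l in range(n_leapfrog):
            th = th + eps * v / m_diag               # full step in position
            if l < n_leapfrog - 1:
                v = v + eps * grad_log_pi(th)        # two merged half steps
        v = v + 0.5 * eps * grad_log_pi(th)          # final half step in momentum
        v = -v  # negate momentum so the proposal is symmetric

        proposed_H = -log_pi(th) + 0.5 * np.sum(v ** 2 / m_diag)
        # Accept with probability min{1, exp(-H(proposal) + H(current))}.
        if np.log(rng.uniform()) < current_H - proposed_H:
            theta = th
            n_accept += 1
        samples[i] = theta  # a rejection repeats the current state
    return samples, n_accept / n_iter

def log_pi(theta):
    # Hypothetical test target: standard Gaussian, known up to a constant.
    return -0.5 * np.sum(theta ** 2)

def grad_log_pi(theta):
    return -theta

rng = np.random.default_rng(3)
samples, acc_rate = hmc(log_pi, grad_log_pi, np.zeros(2), eps=0.2, n_leapfrog=20,
                        n_iter=3000, m_diag=np.ones(2), rng=rng)
```

Because the leapfrog error in H is small for this step size, almost every proposal is accepted; larger ε lowers the acceptance rate, which is one reason ε and L need tuning.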
More informationNotes on pseudo-marginal methods, variational Bayes and ABC
Notes on pseudo-marginal methods, variational Bayes and ABC Christian Andersson Naesseth October 3, 2016 The Pseudo-Marginal Framework Assume we are interested in sampling from the posterior distribution
More informationMCMC: Markov Chain Monte Carlo
I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov
More informationarxiv: v1 [stat.co] 2 Nov 2017
Binary Bouncy Particle Sampler arxiv:1711.922v1 [stat.co] 2 Nov 217 Ari Pakman Department of Statistics Center for Theoretical Neuroscience Grossman Center for the Statistics of Mind Columbia University
More informationSome Results on the Ergodicity of Adaptive MCMC Algorithms
Some Results on the Ergodicity of Adaptive MCMC Algorithms Omar Khalil Supervisor: Jeffrey Rosenthal September 2, 2011 1 Contents 1 Andrieu-Moulines 4 2 Roberts-Rosenthal 7 3 Atchadé and Fort 8 4 Relationship
More informationStochastic optimization Markov Chain Monte Carlo
Stochastic optimization Markov Chain Monte Carlo Ethan Fetaya Weizmann Institute of Science 1 Motivation Markov chains Stationary distribution Mixing time 2 Algorithms Metropolis-Hastings Simulated Annealing
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As
More informationElliptical slice sampling
Iain Murray Ryan Prescott Adams David J.C. MacKay University of Toronto University of Toronto University of Cambridge Abstract Many probabilistic models introduce strong dependencies between variables
More informationComputer Vision Group Prof. Daniel Cremers. 14. Sampling Methods
Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric
More informationHamiltonian Monte Carlo
Hamiltonian Monte Carlo within Stan Daniel Lee Columbia University, Statistics Department bearlee@alum.mit.edu BayesComp mc-stan.org Why MCMC? Have data. Have a rich statistical model. No analytic solution.
More informationAdaptive HMC via the Infinite Exponential Family
Adaptive HMC via the Infinite Exponential Family Arthur Gretton Gatsby Unit, CSML, University College London RegML, 2017 Arthur Gretton (Gatsby Unit, UCL) Adaptive HMC via the Infinite Exponential Family
More informationSession 3A: Markov chain Monte Carlo (MCMC)
Session 3A: Markov chain Monte Carlo (MCMC) John Geweke Bayesian Econometrics and its Applications August 15, 2012 ohn Geweke Bayesian Econometrics and its Session Applications 3A: Markov () chain Monte
More informationA6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring
A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2015 http://www.astro.cornell.edu/~cordes/a6523 Lecture 23:! Nonlinear least squares!! Notes Modeling2015.pdf on course
More informationComputer intensive statistical methods
Lecture 13 MCMC, Hybrid chains October 13, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university MH algorithm, Chap:6.3 The metropolis hastings requires three objects, the distribution of
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationMonte Carlo methods for sampling-based Stochastic Optimization
Monte Carlo methods for sampling-based Stochastic Optimization Gersende FORT LTCI CNRS & Telecom ParisTech Paris, France Joint works with B. Jourdain, T. Lelièvre, G. Stoltz from ENPC and E. Kuhn from
More informationComputer Vision Group Prof. Daniel Cremers. 11. Sampling Methods
Prof. Daniel Cremers 11. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric
More informationKernel Sequential Monte Carlo
Kernel Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) * equal contribution April 25, 2016 1 / 37 Section
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters
More informationST 740: Markov Chain Monte Carlo
ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically
More information16 : Markov Chain Monte Carlo (MCMC)
10-708: Probabilistic Graphical Models 10-708, Spring 2014 16 : Markov Chain Monte Carlo MCMC Lecturer: Matthew Gormley Scribes: Yining Wang, Renato Negrinho 1 Sampling from low-dimensional distributions
More informationAn introduction to adaptive MCMC
An introduction to adaptive MCMC Gareth Roberts MIRAW Day on Monte Carlo methods March 2011 Mainly joint work with Jeff Rosenthal. http://www2.warwick.ac.uk/fac/sci/statistics/crism/ Conferences and workshops
More informationApproximate Slice Sampling for Bayesian Posterior Inference
Approximate Slice Sampling for Bayesian Posterior Inference Anonymous Author 1 Anonymous Author 2 Anonymous Author 3 Unknown Institution 1 Unknown Institution 2 Unknown Institution 3 Abstract In this paper,
More informationLecture 6: Markov Chain Monte Carlo
Lecture 6: Markov Chain Monte Carlo D. Jason Koskinen koskinen@nbi.ku.dk Photo by Howard Jackman University of Copenhagen Advanced Methods in Applied Statistics Feb - Apr 2016 Niels Bohr Institute 2 Outline
More informationEco517 Fall 2013 C. Sims MCMC. October 8, 2013
Eco517 Fall 2013 C. Sims MCMC October 8, 2013 c 2013 by Christopher A. Sims. This document may be reproduced for educational and research purposes, so long as the copies contain this notice and are retained
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Markov chain Monte Carlo (MCMC) Gibbs and Metropolis Hastings Slice sampling Practical details Iain Murray http://iainmurray.net/ Reminder Need to sample large, non-standard distributions:
More informationComputer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo
Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative
More informationGSHMC: An efficient Markov chain Monte Carlo sampling method. Sebastian Reich in collaboration with Elena Akhmatskaya (Fujitsu Laboratories Europe)
GSHMC: An efficient Markov chain Monte Carlo sampling method Sebastian Reich in collaboration with Elena Akhmatskaya (Fujitsu Laboratories Europe) 1. Motivation In the first lecture, we started from a
More informationInformation-Geometric Markov Chain Monte Carlo Methods Using Diffusions
Entropy 2014, 16, 3074-3102; doi:10.3390/e16063074 OPEN ACCESS entropy ISSN 1099-4300 www.mdpi.com/journal/entropy Article Information-Geometric Markov Chain Monte Carlo Methods Using Diffusions Samuel
More informationApproximate inference in Energy-Based Models
CSC 2535: 2013 Lecture 3b Approximate inference in Energy-Based Models Geoffrey Hinton Two types of density model Stochastic generative model using directed acyclic graph (e.g. Bayes Net) Energy-based
More informationAdaptive Monte Carlo methods
Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert
More informationZig-Zag Monte Carlo. Delft University of Technology. Joris Bierkens February 7, 2017
Zig-Zag Monte Carlo Delft University of Technology Joris Bierkens February 7, 2017 Joris Bierkens (TU Delft) Zig-Zag Monte Carlo February 7, 2017 1 / 33 Acknowledgements Collaborators Andrew Duncan Paul
More informationConvex Optimization CMU-10725
Convex Optimization CMU-10725 Simulated Annealing Barnabás Póczos & Ryan Tibshirani Andrey Markov Markov Chains 2 Markov Chains Markov chain: Homogen Markov chain: 3 Markov Chains Assume that the state
More informationDevelopment of Stochastic Artificial Neural Networks for Hydrological Prediction
Development of Stochastic Artificial Neural Networks for Hydrological Prediction G. B. Kingston, M. F. Lambert and H. R. Maier Centre for Applied Modelling in Water Engineering, School of Civil and Environmental
More informationApril 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning
for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions
More informationHamiltonian Monte Carlo Without Detailed Balance
Jascha Sohl-Dickstein Stanford University, Palo Alto. Khan Academy, Mountain View Mayur Mudigonda Redwood Institute for Theoretical Neuroscience, University of California at Berkeley Michael R. DeWeese
More informationLearning the hyper-parameters. Luca Martino
Learning the hyper-parameters Luca Martino 2017 2017 1 / 28 Parameters and hyper-parameters 1. All the described methods depend on some choice of hyper-parameters... 2. For instance, do you recall λ (bandwidth
More informationMonte Carlo Methods for Inference and Learning
Monte Carlo Methods for Inference and Learning Ryan Adams University of Toronto CIFAR NCAP Summer School 14 August 2010 http://www.cs.toronto.edu/~rpa Thanks to: Iain Murray, Marc Aurelio Ranzato Overview
More informationApproximate Bayesian Computation: a simulation based approach to inference
Approximate Bayesian Computation: a simulation based approach to inference Richard Wilkinson Simon Tavaré 2 Department of Probability and Statistics University of Sheffield 2 Department of Applied Mathematics
More informationWinter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo
Winter 2019 Math 106 Topics in Applied Mathematics Data-driven Uncertainty Quantification Yoonsang Lee (yoonsang.lee@dartmouth.edu) Lecture 9: Markov Chain Monte Carlo 9.1 Markov Chain A Markov Chain Monte
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationLarge Scale Bayesian Inference
Large Scale Bayesian I in Cosmology Jens Jasche Garching, 11 September 2012 Introduction Cosmography 3D density and velocity fields Power-spectra, bi-spectra Dark Energy, Dark Matter, Gravity Cosmological
More informationVariational Principal Components
Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings
More informationA quick introduction to Markov chains and Markov chain Monte Carlo (revised version)
A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More information