Hamiltonian Monte Carlo for Scalable Deep Learning
1 Hamiltonian Monte Carlo for Scalable Deep Learning
Isaac Robson, Department of Statistics and Operations Research, University of North Carolina at Chapel Hill
BIOS 740, May 4, 2018
2 Preface
- Markov chain Monte Carlo (MCMC) techniques are powerful algorithms for fitting probabilistic models
- Variations such as Gibbs samplers work well in some high-dimensional situations, but have issues scaling to today's challenges and model architectures
- Hamiltonian Monte Carlo (HMC) is a more proposal-efficient variant of MCMC that is a promising catalyst for innovation in deep learning and probabilistic graphical models
Isaac Robson (UNC), HMC for Scalable Deep Learning, May 4, 2018
3 Outline
- Review of Metropolis-Hastings
- Introduction to Hamiltonian Monte Carlo (HMC)
- Brief review of neural networks and fitting methods
- Discussion of Stochastic Gradient HMC (T. Chen et al., 2014)
4 Introduction to Hamiltonian Monte Carlo
5 Review: Metropolis-Hastings 1/3
Metropolis et al., 1953, and Hastings, 1970
- The original Metropolis et al. algorithm can be used to compute integrals over a distribution, e.g. the normalization of a Bayesian posterior: J = ∫ f(x) P(x) dx = E_P[f]
- It was developed for statistical mechanics, specifically calculating the potential of N = 224 2D spheres (particles) in a square with "fast electronic computing machines"; one run took about 16 hours on the prevailing machines (Metropolis et al., 1953)
- Its advantage is that it depends only on the ratio P(x')/P(x) of the probability distribution evaluated at two points x and x'
6 Review: Metropolis-Hastings 2/3
- We can use this ratio to accept or reject a move from the current point x to a randomly generated point x', with acceptance ratio P(x')/P(x)
- Under a symmetric proposal scheme, this lets us sample by accumulating a running random-walk (Markov chain) list of correlated samples from the target distribution, from which we can then estimate quantities of interest
- Hastings extended this to permit (but not require) an asymmetric proposal scheme, which speeds the process and improves mixing: A(x' | x) = min[1, (P(x') g(x | x')) / (P(x) g(x' | x))]
- Regardless, we also accumulate a burn-in period of bad initial samples that we have to discard (this slows convergence, as do correlated samples)
7 Review: Metropolis-Hastings 3/3
We have to remember that Metropolis-Hastings has a few restrictions:
- A Markov chain won't converge to a target distribution P(x) unless it converges to a stationary distribution π(x) = P(x)
- If π(x) is not unique, we can also get multiple answers! (This is bad)
- So we require the detailed-balance equality P(x' | x) P(x) = P(x | x') P(x'), i.e. reversibility
- Additionally, the proposal is symmetric when g(x' | x) / g(x | x') = 1, e.g. a Gaussian; these are called random-walk algorithms
- When P(x') ≥ P(x), they move to the high-density region with certainty; otherwise they move with acceptance ratio P(x')/P(x)
- Note that a proposal with higher variance typically yields a lower acceptance ratio
- Finally, remember that Gibbs sampling, useful in certain high-dimensional situations, is a special case of Metropolis-Hastings using proposals conditioned on the values of the other dimensions
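The accept/reject loop above fits in a few lines of code. The following is a minimal illustrative sketch (not from the slides) of random-walk Metropolis on a 1-D standard normal target, where `log_p` is any unnormalized log-density:

```python
import math
import random

def metropolis(log_p, x0, n_samples, step=1.0, burn_in=500, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal.

    Because the proposal is symmetric, the Hastings correction
    g(x|x')/g(x'|x) cancels, and only the ratio P(x')/P(x) matters.
    """
    rng = random.Random(seed)
    x, samples = x0, []
    for t in range(n_samples + burn_in):
        x_prop = x + rng.gauss(0.0, step)          # symmetric proposal
        log_ratio = log_p(x_prop) - log_p(x)       # log P(x')/P(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = x_prop                             # accept the move
        if t >= burn_in:                           # discard burn-in samples
            samples.append(x)
    return samples

# Unnormalized log-density of a standard normal; MH never needs the
# normalizing constant
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
```

The correlated samples can then be used to estimate, e.g., the target's mean and variance.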
8 Hamiltonian Monte Carlo 1/5
Duane et al., 1987, Neal, 2012, and Betancourt, 2017
- Duane et al. proposed "Hybrid Monte Carlo" to more efficiently compute integrals in lattice field theory
- "Hybrid" referred to the fact that it infused Hamiltonian equations of motion, instead of just RNG, to generate a candidate point x'
- As Neal describes, this lets us push candidate points farther out with momentum, because the dynamics:
  - Are reversible (necessary for convergence to a unique target distribution)
  - Preserve the Hamiltonian (so we can still use momentum)
  - Preserve volume (which makes acceptance probabilities solvable)
9 Hamiltonian Monte Carlo 2/5
- A Hamiltonian is an energy function of the form H(q, p) = U(q) + K(p), where q is position (potential energy U) and p is momentum (kinetic energy K)
- Hamilton's equations govern the change of this system over time:
  dq_i/dt = ∂H/∂p_i = [M^{-1} p]_i,  dp_i/dt = -∂H/∂q_i = -∂U/∂q_i
- Equivalently, d(q, p)/dt = J ∇H, with the symplectic matrix
  J = [[0_{d×d}, I_{d×d}], [-I_{d×d}, 0_{d×d}]]
- We can set U(q) = -log π(q) + C and K(p) = p^T M^{-1} p / 2, where C is a constant and M is a PSD mass matrix that determines our momentum (kinetic energy)
10 Hamiltonian Monte Carlo 3/5
- In a Bayesian setting, we build the potential energy from the prior π(q) times the likelihood given data D, L(q | D): U(q) = -log[π(q) L(q | D)]
- If we choose a Gaussian momentum distribution (as in Metropolis), we set the kinetic energy K(p) = Σ_{i=1}^d p_i² / (2 m_i)
- We then generate (q', p') via Hamiltonian dynamics and use the difference in energy levels as our acceptance ratio in the MH algorithm:
  A(q', p' | q, p) = min[1, exp(-H(q', p') + H(q, p))]
11 Hamiltonian Monte Carlo 4/5
- Converting the proposal and acceptance steps to this energy form is convoluted; however, we can now use Hamiltonian dynamics to walk farther without sacrificing acceptance ratio
- The classic method of solving Hamilton's differential equations is Euler's method, which traverses a small distance ε > 0 for L steps:
  p_i(t + ε) = p_i(t) - ε ∂U/∂q_i(q(t)),  q_i(t + ε) = q_i(t) + ε p_i(t) / m_i
- We can also employ the more accurate leapfrog technique to quickly propose a candidate:
  p_i(t + ε/2) = p_i(t) - (ε/2) ∂U/∂q_i(q(t))
  q_i(t + ε) = q_i(t) + ε p_i(t + ε/2) / m_i
  p_i(t + ε) = p_i(t + ε/2) - (ε/2) ∂U/∂q_i(q(t + ε))
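The leapfrog updates above transcribe almost directly into code. The sketch below is an illustration (not from the slides) on a 1-D harmonic oscillator, U(q) = q²/2, chosen because leapfrog's near-conservation of H is easy to check there:

```python
def leapfrog(q, p, grad_U, eps, L, m=1.0):
    """Leapfrog integration of Hamilton's equations (one dimension):
    half momentum step, L alternating full steps, final half momentum step.
    The scheme is reversible and volume-preserving."""
    p = p - 0.5 * eps * grad_U(q)      # p(t + eps/2)
    for i in range(L):
        q = q + eps * p / m            # full step for position
        if i < L - 1:
            p = p - eps * grad_U(q)    # two half momentum steps merged
    p = p - 0.5 * eps * grad_U(q)      # final half step for momentum
    return q, p

# Harmonic oscillator U(q) = q^2/2, so grad_U(q) = q and
# H = q^2/2 + p^2/2 should be nearly conserved along the trajectory
q1, p1 = leapfrog(q=1.0, p=0.0, grad_U=lambda q: q, eps=0.1, L=50)
```

After 50 steps of size 0.1, the state has moved far from its start, yet the total energy H changes only at order ε².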
12 Hamiltonian Monte Carlo 5/5
The HMC algorithm adds two steps to MH:
- Sample the momentum parameter p (typically from a symmetric, Gaussian distribution)
- Compute L leapfrog steps of size ε to find a new (q', p')
- Betancourt explains that we sample a momentum to easily change energy levels, then use Hamiltonian dynamics to traverse our q-space (state space)
- We no longer have to wait for a random walk to slowly explore: we can easily find samples well distributed across our posterior with high acceptance ratios (same energy levels)
(Graphic from Betancourt, 2017)
- However, as T. Chen et al., 2014 describe, we do still have to compute the gradient of our potential at every step, which can be costly, especially in high dimensions
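Putting the two added steps together with the energy-based acceptance gives the full algorithm. This is a toy 1-D sketch with a unit mass matrix (an illustration, not the authors' code), sampling a standard normal target:

```python
import math
import random

def hmc(U, grad_U, q0, n_samples, eps=0.2, L=10, seed=1):
    """HMC with unit mass: resample momentum, run L leapfrog steps,
    then MH-accept with probability min(1, exp(-H(q',p') + H(q,p)))."""
    rng = random.Random(seed)
    q, samples = q0, []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)                 # fresh momentum each iteration
        q_new, p_new = q, p
        p_new -= 0.5 * eps * grad_U(q_new)      # leapfrog trajectory
        for i in range(L):
            q_new += eps * p_new
            if i < L - 1:
                p_new -= eps * grad_U(q_new)
        p_new -= 0.5 * eps * grad_U(q_new)
        # change in total energy H = U(q) + p^2/2
        dH = (U(q_new) + 0.5 * p_new ** 2) - (U(q) + 0.5 * p ** 2)
        if rng.random() < math.exp(min(0.0, -dH)):
            q = q_new                           # accept the proposal
        samples.append(q)
    return samples

# Standard normal target: U(q) = q^2/2, grad_U(q) = q
samples = hmc(U=lambda q: 0.5 * q * q, grad_U=lambda q: q,
              q0=0.0, n_samples=5000)
```

Because leapfrog nearly conserves H, the acceptance rate stays high even though each proposal travels a distance εL from the current point.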
13 Neural Networks
14 Neural Networks 1/4
- [Artificial] neural networks (or nets) are popular connectionist models for learning a function approximation
- A descendant of Hebbian learning, following Hebb's neuropsychology work in the 1940s
- Popularized today thanks to parallelization and convex optimization
- A universal function approximator (in theory)
- They typically use function composition and the chain rule, alongside vectorization, to efficiently optimize a loss function by altering the weights (function) of each node
- This requires immense computational power, especially when the functions being composed are probabilistic (as in a Bayesian neural network (BNN))
(Figure: feedforward neural net, Wikimedia Commons)
- Fitting neural nets is an active area of research, with contributions from the perspectives of both optimization and sampling
15 Neural Networks 2/4
LeCun et al., 1998, and Robbins and Monro, 1951
- As LeCun et al. detail, backward propagation of errors (backprop) is a powerful method for neural net optimization (it does not use sampling)
- For a layer of weights W at time t, given error E: W_{t+1} = W_t - η ∂E/∂W
- Note this is merely gradient descent, which in recent years has been upgraded with many bells and whistles
- One such whistle is stochastic gradient descent (SGD), an algorithm that evolved from the stochastic approximation methods introduced by Robbins and Monro, 1951 (GO TAR HEELS!)
16 Neural Networks 3/4
- Calculating a full gradient is costly, but as LeCun et al. detail, stochastic gradient descent is much faster, and comes in both online and the smoother minibatch variations
- The primary idea is to update using only the error at one point, E*: W_{t+1} = W_t - η ∂E*/∂W
- The gradient at one point is an estimate of the gradient of the full error at the current weights W_t, hence the name stochastic gradient descent
- The speedup is feasible due to shared information across observations, and because by decreasing the learning rate η, SGD still converges; this includes minibatch variants of SGD, which compute gradients on a handful of points instead of a single one
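As a toy illustration (an assumed example, not from the slides): minibatch SGD with a Robbins-Monro style decaying learning rate η_t = η₀/t, minimizing E(w) = mean over the data of (w - x_i)², whose minimizer is just the data mean:

```python
import random

def minibatch_sgd(data, eta0=0.5, batch=2, epochs=200, seed=0):
    """Minibatch SGD for E(w) = mean((w - x_i)^2).

    Each minibatch gradient is an unbiased estimate of the full
    gradient 2*(w - mean(data)); the decaying learning rate eta0/t
    keeps the noisy updates convergent."""
    rng = random.Random(seed)
    w, t = 0.0, 0
    idx = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(idx)                     # fresh random minibatches
        for i in range(0, len(idx), batch):
            xb = [data[j] for j in idx[i:i + batch]]
            grad = sum(2.0 * (w - x) for x in xb) / len(xb)   # dE*/dW
            t += 1
            w -= (eta0 / t) * grad           # W <- W - eta_t * dE*/dW
    return w

data = [0.5, 1.5, 2.0, 1.0, 3.0, 2.5, 0.0, 1.5]
w_hat = minibatch_sgd(data)   # the minimizer is the data mean, 1.5
```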
17 Neural Networks 4/4
Rumelhart et al., 1986
- A popular bell to complement SGD's whistle is the addition of a momentum term to the update step; we more or less smooth our update steps with an exponential decay factor α:
  W_{t+1} = W_t - η ∂E/∂W + α ΔW_{t-1}, where ΔW_t = W_{t+1} - W_t
- This may seem familiar if you recall the momentum term that exists in Hamiltonian Monte Carlo (cue imaginary dramatic sound effect)
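The momentum update above can be sketched as follows. This is a generic illustration on a 1-D quadratic loss, with hypothetical names, not the Rumelhart et al. implementation:

```python
def sgd_momentum(grad, w0, eta=0.1, alpha=0.9, steps=200):
    """Gradient descent with a momentum term:
    dW_t = -eta * dE/dW + alpha * dW_{t-1};  W_{t+1} = W_t + dW_t.
    alpha exponentially decays the contribution of past updates."""
    w, dw = w0, 0.0
    for _ in range(steps):
        dw = -eta * grad(w) + alpha * dw   # smoothed (momentum) step
        w = w + dw
    return w

# Minimize E(w) = (w - 3)^2, whose gradient is 2*(w - 3)
w_star = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

The velocity variable `dw` plays the same role as the momentum p in HMC: it carries the iterate through flat or noisy regions instead of reacting only to the local gradient.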
18 Stochastic Gradient HMC (T. Chen et al., 2014)
19 Stochastic Gradient HMC 1/4
- As mentioned before, backprop uses the powerful stochastic gradient descent method and its extensions to fit gradient-based neural networks
- Unfortunately, many of these neural nets lack inferentiability
- One solution (other than proving P = NP or solving AGI) is to use Bayesian neural networks, which exist as a class of probabilistic graphical models and can be fitted with sampling or similar methods
- BNNs still perform many of the surreal feats that other neural nets accomplish
- However, even with Gibbs samplers and HMC, sampling in high dimensions is quite slow to converge... for now
20 Stochastic Gradient HMC 2/4
Welling et al., 2011, T. Chen et al., 2014, and C. Chen et al., 2016
- In HMC, instead of calculating the gradient of our potential energy, ∇U(q), over the whole dataset D, what if we selected some minibatch D̃ ⊂ D to use for the gradient estimate in the leapfrog method?
  ∇Ũ ≈ ∇U + N(0, Σ) for gradient noise Σ
  p_i(t + ε/2) = p_i(t) - (ε/2) ∂Ũ/∂q_i(q(t))
- Unfortunately, this naïve stochastic gradient HMC (SGHMC) injects noise into the Hamiltonian dynamics, which requires materially decreasing the acceptance ratio in the MH algorithm to inefficient levels
21 Stochastic Gradient HMC 3/4
- T. Chen et al. suggest fixing naïve SGHMC by adding a friction term, borrowing once again from physics in the form of the Langevin dynamics used by Welling et al. (vectorized form, omitting leapfrog notation):
  ∇Ũ = ∇U + N(0, 2B), with B = (ε/2) Σ
  q_{t+1} = q_t + M^{-1} p_t
  p_{t+1} = p_t - ε ∇Ũ(q_{t+1}) - B M^{-1} p_t
- Note that B is in general a PSD function of q_{t+1}, but Chen et al. also show that certain constant choices of B converge (and are far more practical)
- Welling et al. also lament that Bayesian methods have been "left behind" in recent machine learning advances due to [MCMC] requiring "computations over the whole dataset"
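A minimal 1-D sketch of the friction-corrected update (unit mass, constant friction C, and the estimate B̂ set to 0), with a toy noisy gradient standing in for a minibatch estimate on a standard normal target. This illustrates the update rule under those simplifying assumptions; it is not the authors' code:

```python
import math
import random

def sghmc(grad_U_noisy, q0, n_samples, eps=0.1, C=1.0, B_hat=0.0, seed=0):
    """SGHMC with friction (in the style of T. Chen et al., 2014), unit mass:
        q <- q + eps * p
        p <- p - eps * gradU~(q) - eps * C * p + N(0, 2 * (C - B_hat) * eps)
    The friction term counteracts the stochastic-gradient noise, so no
    Metropolis-Hastings correction step is performed."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(2.0 * (C - B_hat) * eps)
    q, p, samples = q0, 0.0, []
    for _ in range(n_samples):
        q = q + eps * p
        p = p - eps * grad_U_noisy(q) - eps * C * p + rng.gauss(0.0, noise_sd)
        samples.append(q)
    return samples

# Target N(0, 1): true gradient is q; add minibatch-style gradient noise
noise_rng = random.Random(42)
samples = sghmc(lambda q: q + noise_rng.gauss(0.0, 0.5),
                q0=0.0, n_samples=50000)
```

Despite every gradient being noisy, the chain's samples still match the target closely, which is the point of the friction term.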
22 Stochastic Gradient HMC 4/4
- The end result of SGHMC is an efficient sampling algorithm that also permits computing gradients on a minibatch in a Bayesian setting
- T. Chen et al. then elaborate to show that, in deterministic settings, SGHMC performs analogously to SGD with momentum, as the momentum components are related
- C. Chen et al. (BEAT DOOK!) further elaborate that many Bayesian MCMC sampling algorithms are analogs of stochastic optimization algorithms, which suggests that symbiotic discoveries and extensions across the two are possible, as presented in their Stochastic AnNealing Thermostats with Adaptive momentum (Santa) algorithm, which incorporates recent advances from both domains
23 Conclusions
- HMC is a promising variant of MCMC sampling algorithms with applications in Bayesian models
- SGHMC offers more scalability in deep learning and several other settings, with the added benefit of inferentiability in Bayesian neural nets
- Future work and collaborations between the sampling and optimization communities are promising
24 Bibliography (by date)
- Robbins, H. and Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics.
- Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics.
- Hastings, W. K. (1970). Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika.
- Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323.
- Duane, S., Kennedy, A. D., Pendleton, B. J., and Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B.
- LeCun, Y. A., Bottou, L., Orr, G. B., and Müller, K.-R. (1998). Efficient backprop. In Neural Networks: Tricks of the Trade. Springer.
- Welling, M. and Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML).
- Neal, R. M. (2012). MCMC using Hamiltonian dynamics. arXiv e-prints.
- Chen, T., Fox, E. B., and Guestrin, C. (2014). Stochastic gradient Hamiltonian Monte Carlo. arXiv preprint.
- Chen, C., Carlson, D., Gan, Z., Li, C., and Carin, L. (2015). Bridging the gap between stochastic gradient MCMC and stochastic optimization. arXiv preprint.
- Betancourt, M. (2017). A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint.
More informationBayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference
Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning
More informationResults: MCMC Dancers, q=10, n=500
Motivation Sampling Methods for Bayesian Inference How to track many INTERACTING targets? A Tutorial Frank Dellaert Results: MCMC Dancers, q=10, n=500 1 Probabilistic Topological Maps Results Real-Time
More informationJ. Sadeghi E. Patelli M. de Angelis
J. Sadeghi E. Patelli Institute for Risk and, Department of Engineering, University of Liverpool, United Kingdom 8th International Workshop on Reliable Computing, Computing with Confidence University of
More informationComputer Practical: Metropolis-Hastings-based MCMC
Computer Practical: Metropolis-Hastings-based MCMC Andrea Arnold and Franz Hamilton North Carolina State University July 30, 2016 A. Arnold / F. Hamilton (NCSU) MH-based MCMC July 30, 2016 1 / 19 Markov
More informationBayesian networks: approximate inference
Bayesian networks: approximate inference Machine Intelligence Thomas D. Nielsen September 2008 Approximative inference September 2008 1 / 25 Motivation Because of the (worst-case) intractability of exact
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationA NEW VIEW OF ICA. G.E. Hinton, M. Welling, Y.W. Teh. S. K. Osindero
( ( A NEW VIEW OF ICA G.E. Hinton, M. Welling, Y.W. Teh Department of Computer Science University of Toronto 0 Kings College Road, Toronto Canada M5S 3G4 S. K. Osindero Gatsby Computational Neuroscience
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationMonte Carlo in Bayesian Statistics
Monte Carlo in Bayesian Statistics Matthew Thomas SAMBa - University of Bath m.l.thomas@bath.ac.uk December 4, 2014 Matthew Thomas (SAMBa) Monte Carlo in Bayesian Statistics December 4, 2014 1 / 16 Overview
More informationApproximate Inference using MCMC
Approximate Inference using MCMC 9.520 Class 22 Ruslan Salakhutdinov BCS and CSAIL, MIT 1 Plan 1. Introduction/Notation. 2. Examples of successful Bayesian models. 3. Basic Sampling Algorithms. 4. Markov
More informationAdvances and Applications in Perfect Sampling
and Applications in Perfect Sampling Ph.D. Dissertation Defense Ulrike Schneider advisor: Jem Corcoran May 8, 2003 Department of Applied Mathematics University of Colorado Outline Introduction (1) MCMC
More informationA quick introduction to Markov chains and Markov chain Monte Carlo (revised version)
A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to
More informationHopfield Networks and Boltzmann Machines. Christian Borgelt Artificial Neural Networks and Deep Learning 296
Hopfield Networks and Boltzmann Machines Christian Borgelt Artificial Neural Networks and Deep Learning 296 Hopfield Networks A Hopfield network is a neural network with a graph G = (U,C) that satisfies
More informationTutorial on Probabilistic Programming with PyMC3
185.A83 Machine Learning for Health Informatics 2017S, VU, 2.0 h, 3.0 ECTS Tutorial 02-04.04.2017 Tutorial on Probabilistic Programming with PyMC3 florian.endel@tuwien.ac.at http://hci-kdd.org/machine-learning-for-health-informatics-course
More informationAdvanced computational methods X Selected Topics: SGD
Advanced computational methods X071521-Selected Topics: SGD. In this lecture, we look at the stochastic gradient descent (SGD) method 1 An illustrating example The MNIST is a simple dataset of variety
More informationDeep unsupervised learning
Deep unsupervised learning Advanced data-mining Yongdai Kim Department of Statistics, Seoul National University, South Korea Unsupervised learning In machine learning, there are 3 kinds of learning paradigm.
More informationLecture 8: Bayesian Estimation of Parameters in State Space Models
in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More informationUnsupervised Discovery of Nonlinear Structure Using Contrastive Backpropagation
Cognitive Science 30 (2006) 725 731 Copyright 2006 Cognitive Science Society, Inc. All rights reserved. Unsupervised Discovery of Nonlinear Structure Using Contrastive Backpropagation Geoffrey Hinton,
More informationarxiv: v1 [stat.ml] 6 Mar 2015
HAMILTONIAN ABC Hamiltonian ABC arxiv:153.1916v1 [stat.ml] 6 Mar 15 Edward Meeds Robert Leenders Max Welling Informatics Institute University of Amsterdam Amsterdam, Netherlands Abstract TMEEDS@GMAIL.COM
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov
More informationSimulated Annealing for Constrained Global Optimization
Monte Carlo Methods for Computation and Optimization Final Presentation Simulated Annealing for Constrained Global Optimization H. Edwin Romeijn & Robert L.Smith (1994) Presented by Ariel Schwartz Objective
More informationThe Ising model and Markov chain Monte Carlo
The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte
More informationAdaptive Rejection Sampling with fixed number of nodes
Adaptive Rejection Sampling with fixed number of nodes L. Martino, F. Louzada Institute of Mathematical Sciences and Computing, Universidade de São Paulo, Brazil. Abstract The adaptive rejection sampling
More informationThe connection of dropout and Bayesian statistics
The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University
More informationMCMC notes by Mark Holder
MCMC notes by Mark Holder Bayesian inference Ultimately, we want to make probability statements about true values of parameters, given our data. For example P(α 0 < α 1 X). According to Bayes theorem:
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As
More informationTheory of Stochastic Processes 8. Markov chain Monte Carlo
Theory of Stochastic Processes 8. Markov chain Monte Carlo Tomonari Sei sei@mist.i.u-tokyo.ac.jp Department of Mathematical Informatics, University of Tokyo June 8, 2017 http://www.stat.t.u-tokyo.ac.jp/~sei/lec.html
More information