Deep Boltzmann Machines


1 Deep Boltzmann Machines

Ruslan Salakhutdinov and Geoffrey E. Hinton

Presented by Amish Goel, University of Illinois Urbana-Champaign

December 2, 2016

2 Overview

1. Introduction
   - Representation of the model
2. Learning in Boltzmann Machines
   - Variational Lower Bound: Mean-Field Approximation
   - Stochastic Approximation Procedure: Persistent Markov Chains
3. Additional Tricks for DBMs
   - Greedy Pretraining of the Model
   - Discriminative Finetuning
4. Simulation Results

3 Introduction

A Boltzmann Machine is a pairwise Markov random field over a set of binary random variables, some of which are latent, i.e. hidden ($h$), and the rest visible ($v$). The probability distribution is given by

$P_\theta(v, h) = \frac{1}{Z_\theta} e^{-E_\theta(v, h)}, \qquad \theta = \{L, J, W\}$

$E_\theta(v, h) = -\frac{1}{2} v^T L v - \frac{1}{2} h^T J h - v^T W h$

Figure: Model for Boltzmann Machines
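To make the energy function concrete, here is a minimal numpy sketch; the sizes, the random initialization, and the function name are illustrative choices, not part of the slides.

```python
import numpy as np

def energy(v, h, L, J, W):
    """E_theta(v, h) = -1/2 v^T L v - 1/2 h^T J h - v^T W h."""
    return -0.5 * v @ L @ v - 0.5 * h @ J @ h - v @ W @ h

# Toy instance: 3 visible and 2 hidden binary units.
rng = np.random.default_rng(0)
v = rng.integers(0, 2, size=3).astype(float)
h = rng.integers(0, 2, size=2).astype(float)
W = rng.normal(scale=0.1, size=(3, 2))
# L and J are symmetric with zero diagonals (no self-connections).
L = rng.normal(scale=0.1, size=(3, 3)); L = (L + L.T) / 2; np.fill_diagonal(L, 0)
J = rng.normal(scale=0.1, size=(2, 2)); J = (J + J.T) / 2; np.fill_diagonal(J, 0)

unnormalized_p = np.exp(-energy(v, h, L, J, W))  # P_theta(v, h) up to 1/Z_theta
```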

4 Representation

While the Boltzmann Machine is a powerful model of the data, it is computationally expensive to learn, so one considers several restricted variants of it.

Figure: Boltzmann Machines vs RBM

A Deep Boltzmann Machine arranges the hidden units in several layers, where a layer is a set of units with no direct connections among them.

5-7 Learning in Boltzmann Machines

The model can be trained by maximum likelihood. The log-likelihood takes the following form:

$\ln L_\theta(v) = \ln p_\theta(v) = \ln \sum_h p_\theta(v, h) = \ln \sum_h \exp(-E_\theta(v, h)) - \ln \sum_{v,h} \exp(-E_\theta(v, h))$   (1)

and its gradient splits into two expectations:

$\frac{\partial \ln L_\theta(v)}{\partial \theta} = \underbrace{-\sum_h p(h \mid v) \frac{\partial E_\theta(v, h)}{\partial \theta}}_{\text{data-dependent expectation}} + \underbrace{\sum_{v,h} p(v, h) \frac{\partial E_\theta(v, h)}{\partial \theta}}_{\text{model-dependent expectation}}$
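The gradient follows by differentiating each log-sum-exp in (1); writing out the first term shows where the posterior comes from:

$\frac{\partial}{\partial \theta} \ln \sum_h e^{-E_\theta(v,h)} = \sum_h \frac{e^{-E_\theta(v,h)}}{\sum_{h'} e^{-E_\theta(v,h')}} \left( -\frac{\partial E_\theta(v,h)}{\partial \theta} \right) = -\sum_h p(h \mid v) \frac{\partial E_\theta(v,h)}{\partial \theta}$

The same calculation applied to the second term of (1) produces the model-dependent expectation with $p(v, h)$.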

8 Learning in Boltzmann Machines

Substituting $E_\theta(v, h)$ into the gradient obtained in the previous equation and performing gradient ascent, one obtains the updates for the respective parameters:

$\Delta W = \alpha \left( E_{P_{data}}[v h^T] - E_{P_{model}}[v h^T] \right)$
$\Delta L = \alpha \left( E_{P_{data}}[v v^T] - E_{P_{model}}[v v^T] \right)$
$\Delta J = \alpha \left( E_{P_{data}}[h h^T] - E_{P_{model}}[h h^T] \right)$
$\Delta b = \alpha \left( E_{P_{data}}[v] - E_{P_{model}}[v] \right)$
$\Delta c = \alpha \left( E_{P_{data}}[h] - E_{P_{model}}[h] \right)$   (2)

These maximum-likelihood updates are very costly: computing either expectation exactly requires summing over an exponential number of configurations. One needs approximations.
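To see why the exact expectations are exponential, here is a sketch (the function name is mine, not from the slides) that computes $E_{P_{model}}[v h^T]$ by brute-force enumeration; the $2^{|v|+|h|}$ double loop is exactly the cost the slide warns about.

```python
import itertools
import numpy as np

def exact_model_expectation_vh(L, J, W):
    """E_{P_model}[v h^T] by brute-force enumeration of all 2^(nv+nh)
    joint configurations; feasible only for toy models."""
    nv, nh = W.shape
    num = np.zeros((nv, nh))
    Z = 0.0
    for v_bits in itertools.product([0.0, 1.0], repeat=nv):
        for h_bits in itertools.product([0.0, 1.0], repeat=nh):
            v, h = np.array(v_bits), np.array(h_bits)
            # exp(-E) with E = -1/2 v^T L v - 1/2 h^T J h - v^T W h
            p = np.exp(0.5 * v @ L @ v + 0.5 * h @ J @ h + v @ W @ h)
            num += p * np.outer(v, h)
            Z += p
    return num / Z
```

One W update from (2) would then be `W += alpha * (data_vh - exact_model_expectation_vh(L, J, W))`, with `data_vh` the empirical average of $vh^T$; this is only feasible for toy sizes, which motivates the approximations below.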

9-11 Approximate Maximum Likelihood Learning in Boltzmann Machines

One approximation is to use a variational lower bound on the log-likelihood:

$\ln p_\theta(v) = \ln \sum_h p_\theta(v, h) = \ln \sum_h q_\mu(h \mid v) \frac{p_\theta(v, h)}{q_\mu(h \mid v)} \geq \sum_h q_\mu(h \mid v) \ln p_\theta(v, h) + H_e(q_\mu) = \mathcal{L}(q_\mu, \theta)$   (3)

where $q_\mu(h \mid v)$ is an approximate (variational) posterior distribution and $H_e(\cdot)$ is the entropy function with natural logarithm.

We then try to find the tightest lower bound on the log-likelihood by optimizing over both the distribution $q_\mu$ and the parameters $\theta$.
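The inequality in (3) is Jensen's inequality applied to the concave logarithm. Spelled out:

$\ln p_\theta(v) = \ln \sum_h q_\mu(h \mid v) \frac{p_\theta(v, h)}{q_\mu(h \mid v)} \geq \sum_h q_\mu(h \mid v) \ln \frac{p_\theta(v, h)}{q_\mu(h \mid v)} = \sum_h q_\mu(h \mid v) \ln p_\theta(v, h) + H_e(q_\mu)$

The gap in the bound is $\mathrm{KL}(q_\mu(h \mid v) \,\|\, p_\theta(h \mid v))$, so the bound is tight exactly when $q_\mu$ equals the true posterior.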

12-14 Variational Learning for Boltzmann Machines

For Boltzmann Machines, the lower bound can be rewritten as (ignoring the bias terms)

$\mathcal{L}(q_\mu, \theta) = \sum_h q_\mu(h \mid v)(-E_\theta(v, h)) - \log Z_\theta + H_e(q_\mu)$   (4)

Using the mean-field approximation $q_\mu(h \mid v) = \prod_{j=1}^M q(h_j \mid v)$ with $q(h_j = 1) = \mu_j$, where $M$ is the number of hidden units, this becomes

$\mathcal{L}(q_\mu, \theta) = \sum_h \prod_{j=1}^M q(h_j \mid v) \left( \frac{1}{2} v^T L v + \frac{1}{2} h^T J h + v^T W h \right) - \log Z_\theta + H_e(q_\mu)$
$\qquad = \frac{1}{2} v^T L v + \frac{1}{2} \mu^T J \mu + v^T W \mu - \log Z_\theta + \sum_{j=1}^M H_e(\mu_j)$   (5)
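The step from (4) to (5) is linearity of expectation under the factorized $q_\mu$; the quadratic hidden term factorizes exactly because $J$ has zero diagonal:

$\mathbb{E}_{q_\mu}[v^T W h] = v^T W \mu, \qquad \mathbb{E}_{q_\mu}\left[ \tfrac{1}{2} h^T J h \right] = \tfrac{1}{2} \sum_{m \neq j} J_{mj} \mu_m \mu_j = \tfrac{1}{2} \mu^T J \mu$

since $\mathbb{E}[h_m h_j] = \mu_m \mu_j$ for $m \neq j$ under the factorized distribution and $J_{jj} = 0$.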

15-16 Variational EM Learning for Boltzmann Machines

Maximize the lower bound by alternating maximization over the variational parameters $\mu$ and the model parameters $\theta$: the typical EM idea.

E-step:

$\sup_\mu \mathcal{L}(q_\mu, \theta) = \sup_\mu \left[ \frac{1}{2} v^T L v + \frac{1}{2} \mu^T J \mu + v^T W \mu - \log Z_\theta + \sum_{j=1}^M H_e(\mu_j) \right]$

Using alternating maximization over each coordinate, one gets the fixed-point update

$\mu_j \leftarrow \sigma\left( \sum_i W_{ij} v_i + \sum_{m \neq j} J_{mj} \mu_m \right)$

where $\sigma(\cdot)$ denotes the sigmoid function. Running these updates to convergence yields $\hat\mu$.
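A minimal numpy sketch of these fixed-point updates (the function name and sweep count are mine; biases are omitted as on the slide):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W, J, num_sweeps=25):
    """Fixed-point updates for mu_j = q(h_j = 1 | v).
    Returns the converged variational parameters mu-hat."""
    mu = np.random.default_rng(0).uniform(size=W.shape[1])  # random init
    for _ in range(num_sweeps):
        for j in range(W.shape[1]):
            # mu_j <- sigmoid(sum_i W_ij v_i + sum_{m != j} J_mj mu_m)
            mu[j] = sigmoid(v @ W[:, j] + J[:, j] @ mu - J[j, j] * mu[j])
    return mu
```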

17-19 Stochastic Approximations or Persistent Markov Chains

M-step:

$\sup_\theta \mathcal{L}(q_{\hat\mu}, \theta) = \sup_\theta \left[ \frac{1}{2} v^T L v + \frac{1}{2} \hat\mu^T J \hat\mu + v^T W \hat\mu - \log Z_\theta + \sum_{j=1}^M H_e(\hat\mu_j) \right]$

MCMC sampling with persistent Markov chains is used to approximate the gradient of the log-partition function $\log Z_\theta$.

The parameter updates for one training example can then be written as

$\Delta W = \alpha_t \left( v \hat\mu^T - \frac{1}{N} \sum_{i=1}^N \tilde{v}_i \tilde{h}_i^T \right)$
$\Delta L = \alpha_t \left( v v^T - \frac{1}{N} \sum_{i=1}^N \tilde{v}_i \tilde{v}_i^T \right)$
$\Delta J = \alpha_t \left( \hat\mu \hat\mu^T - \frac{1}{N} \sum_{i=1}^N \tilde{h}_i \tilde{h}_i^T \right)$   (6)

where $(\tilde{v}_i, \tilde{h}_i)$ are the current states of the persistent Markov chains.
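For concreteness, a sketch of one Gibbs sweep used to advance a persistent chain, reusing the sigmoid helper from the previous sketch (conditionals follow from the energy; biases omitted):

```python
def gibbs_step(v, h, L, J, W, rng):
    """One Gibbs sweep over all hidden then all visible units of a fully
    connected Boltzmann machine; advances one persistent chain in place."""
    for j in range(len(h)):
        # p(h_j = 1 | v, h_{-j}) = sigmoid(sum_i W_ij v_i + sum_{m != j} J_mj h_m)
        h[j] = float(rng.random() < sigmoid(v @ W[:, j] + J[:, j] @ h - J[j, j] * h[j]))
    for i in range(len(v)):
        # p(v_i = 1 | h, v_{-i}) = sigmoid(sum_j W_ij h_j + sum_{k != i} L_ik v_k)
        v[i] = float(rng.random() < sigmoid(W[i, :] @ h + L[i, :] @ v - L[i, i] * v[i]))
    return v, h
```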

20 Overall Algorithm for Training Boltzmann Machines

Data: a training set $S_N$ of $N$ binary data vectors $v$, and $M$, the number of persistent Markov chains.

Initialize the parameter vector $\theta^0$ and $M$ chain states $\{\tilde v^{0,1}, \tilde h^{0,1}\}, \ldots, \{\tilde v^{0,M}, \tilde h^{0,M}\}$.
for t = 0 to T (number of iterations) do
    for each $v^n \in S_N$ do
        Randomly initialize $\mu^n$ and run the updates $\mu_j \leftarrow \sigma(\sum_i W_{ij} v_i^n + \sum_{m \neq j} J_{mj} \mu_m)$ to obtain $\hat\mu^n$.
    end
    for m = 1 to M (number of persistent Markov chains) do
        Sample $(\tilde v^{t+1,m}, \tilde h^{t+1,m})$ given $(\tilde v^{t,m}, \tilde h^{t,m})$ by running the Gibbs sampler.
    end
    Update $\theta$ using equation (6) (adjusting for batch data) and decrease the learning rate $\alpha_t$.
end
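Putting the pieces together, a compact sketch of the full loop, reusing the hypothetical mean_field and gibbs_step helpers above; the batch size of one per step and the learning-rate schedule are arbitrary illustrative choices:

```python
import numpy as np

def train_boltzmann(data, num_hidden, num_chains=100, T=1000, alpha0=0.05):
    """Sketch of the variational / stochastic-approximation training loop.
    `data` is an (N, num_visible) array of binary vectors; biases omitted."""
    rng = np.random.default_rng(0)
    nv = data.shape[1]
    W = 0.01 * rng.normal(size=(nv, num_hidden))
    L = np.zeros((nv, nv))
    J = np.zeros((num_hidden, num_hidden))
    # Persistent chain states, kept across parameter updates.
    vs = rng.integers(0, 2, size=(num_chains, nv)).astype(float)
    hs = rng.integers(0, 2, size=(num_chains, num_hidden)).astype(float)
    for t in range(T):
        alpha = alpha0 / (1.0 + t / 100.0)       # decreasing learning rate
        v = data[rng.integers(len(data))]        # one training example
        mu = mean_field(v, W, J)                 # E-step: variational posterior
        for m in range(num_chains):              # advance persistent chains
            vs[m], hs[m] = gibbs_step(vs[m], hs[m], L, J, W, rng)
        # M-step: stochastic gradient updates, equation (6)
        W += alpha * (np.outer(v, mu) - vs.T @ hs / num_chains)
        L += alpha * (np.outer(v, v) - vs.T @ vs / num_chains)
        J += alpha * (np.outer(mu, mu) - hs.T @ hs / num_chains)
        np.fill_diagonal(L, 0.0); np.fill_diagonal(J, 0.0)  # no self-connections
    return W, L, J
```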

21 Learning for Deep Boltzmann Machines

For Deep Boltzmann Machines, $L = 0$ and $J$ has many zero blocks, since hidden-unit interactions exist only between adjacent layers. This simplifies some of the computations: in particular, the Gibbs sampling procedure is simplified because all units in one layer can be sampled in parallel, as in the sketch below.

However, learning was observed to be slow, and greedy pretraining can result in faster convergence of the parameters.
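A sketch of one layerwise Gibbs sweep for a 2-hidden-layer DBM, reusing the sigmoid helper above; W1 and W2 are hypothetical names for the visible-to-h1 and h1-to-h2 weight matrices, and biases are omitted:

```python
def dbm_gibbs_sweep(v, h1, h2, W1, W2, rng):
    """Units within a layer are conditionally independent given the
    neighboring layers, so each whole layer is sampled in parallel."""
    h1 = (rng.random(h1.shape) < sigmoid(v @ W1 + W2 @ h2)).astype(float)
    h2 = (rng.random(h2.shape) < sigmoid(h1 @ W2)).astype(float)
    v = (rng.random(v.shape) < sigmoid(W1 @ h1)).astype(float)
    return v, h1, h2
```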

22 Pretraining in Deep Boltzmann Machines

Each RBM is trained separately, with some weight scaling so that, when the RBMs are composed into a DBM, each unit receives input of the right magnitude from both directions.

Figure: Greedy Layerwise Pretraining for DBM

23 Discriminative Finetuning in Deep Boltzmann Machines

Additionally, a finetuning step is considered to further improve performance. For example, for a 2-hidden-layer DBM, the approximate posterior is used as an augmented input to a neural network whose weights are initialized from the parameters of the DBM.

Figure: Finetuning the parameters of DBM
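A rough numpy sketch of building that augmented input (reusing the sigmoid helper and numpy import above; W1 and W2 stand for the pretrained DBM weight matrices, and the single upward pass here is a crude stand-in for the full mean-field posterior the slide refers to):

```python
def augmented_input(v, W1, W2):
    """Concatenate the data vector with an approximation to q(h2 | v),
    to be fed to the discriminative network. Sketch only."""
    q1 = sigmoid(v @ W1)   # approximate first-layer posterior
    q2 = sigmoid(q1 @ W2)  # approximate top-layer posterior
    return np.concatenate([v, q2])
```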

24 Some Experimental Results and Observations

A DBM was trained to model the handwritten digits of the MNIST dataset.

Figure: (a) DBM model used for training; (b) examples of handwritten digits.

Some interesting observations:
- Without greedy pretraining, the models did not produce good results.
- With discriminative finetuning, the DBM gave 99.5% accuracy, the best MNIST recognition result at the time.

25 Thank You
