Deep Boltzmann Machines
1 Deep Boltzmann Machines
Ruslan Salakhutdinov and Geoffrey E. Hinton
Presented by Amish Goel, University of Illinois Urbana-Champaign, December 2, 2016
2 Overview
1. Introduction: representation of the model
2. Learning in Boltzmann machines: variational lower bound (mean-field approximation); stochastic approximation procedure (persistent Markov chains)
3. Additional tricks for DBMs: greedy pretraining of the model; discriminative finetuning
4. Simulation results
3 Introduction
A Boltzmann machine is a pairwise Markov random field. Consider some of the random variables as latent, i.e. hidden (h), and the others as visible (v). For binary random variables, the probability distribution is given by
$$P_\theta(v, h) = \frac{1}{Z_\theta} e^{-E_\theta(v, h)}, \qquad \theta = \{L, J, W\},$$
$$E_\theta(v, h) = -\frac{1}{2} v^T L v - \frac{1}{2} h^T J h - v^T W h.$$
Figure: Model for Boltzmann Machines
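To make the notation concrete, here is a minimal numpy sketch of the energy function; the function and variable names are ours, and bias terms are omitted to match the formula above.

```python
import numpy as np

def energy(v, h, L, J, W):
    """Energy E_theta(v, h) of a binary Boltzmann machine.

    v, h are binary vectors; L and J are symmetric within-layer coupling
    matrices with zero diagonals; W couples visible to hidden units.
    """
    return -0.5 * v @ L @ v - 0.5 * h @ J @ h - v @ W @ h

def unnormalized_prob(v, h, L, J, W):
    """exp(-E(v, h)); dividing by the (intractable) Z_theta gives P_theta."""
    return np.exp(-energy(v, h, L, J, W))
```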
4 Representation
While the Boltzmann machine is a powerful model of data, it is computationally expensive to learn, so one considers several restricted variants.
Figure: Boltzmann Machines vs RBM
A deep Boltzmann machine arranges the hidden units in several layers, where a layer is a set of units with no direct connections among them.
5-7 Learning in Boltzmann Machines
The model can be trained by maximum likelihood. The log-likelihood and its gradient take the following form:
$$\ln L_\theta(v) = \ln p_\theta(v) = \ln \sum_h p_\theta(v, h) = \ln \sum_h \exp(-E_\theta(v, h)) - \ln \sum_{v, h} \exp(-E_\theta(v, h)), \tag{1}$$
$$\frac{\partial \ln L_\theta(v)}{\partial \theta} = \underbrace{-\sum_h p(h \mid v) \frac{\partial E_\theta(v, h)}{\partial \theta}}_{\text{data-dependent expectation}} + \underbrace{\sum_{v, h} p(v, h) \frac{\partial E_\theta(v, h)}{\partial \theta}}_{\text{model-dependent expectation}}.$$
8 Learning in Boltzmann Machines
Substituting $E_\theta(v, h)$ into the gradient above and applying gradient ascent yields the parameter updates
$$\Delta W = \alpha\,(E_{P_{\text{data}}}[v h^T] - E_{P_{\text{model}}}[v h^T]),$$
$$\Delta L = \alpha\,(E_{P_{\text{data}}}[v v^T] - E_{P_{\text{model}}}[v v^T]),$$
$$\Delta J = \alpha\,(E_{P_{\text{data}}}[h h^T] - E_{P_{\text{model}}}[h h^T]),$$
$$\Delta b = \alpha\,(E_{P_{\text{data}}}[v] - E_{P_{\text{model}}}[v]),$$
$$\Delta c = \alpha\,(E_{P_{\text{data}}}[h] - E_{P_{\text{model}}}[h]). \tag{2}$$
These exact maximum-likelihood updates are very costly: computing either expectation requires summing over an exponential number of configurations, so approximations are needed.
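Given samples that approximate the two expectations, the updates in equation (2) reduce to a few matrix products. A sketch continuing the numpy example above (names are ours; how the samples are obtained is the subject of the following slides):

```python
def ml_updates(v_data, h_data, v_model, h_model, alpha):
    """Sample-based estimates of the updates in equation (2).

    v_data, h_data: (n, dim) binary states with visibles clamped to data
    (hidden states drawn from p(h | v) or a mean-field approximation);
    v_model, h_model: samples from the model's own distribution.
    """
    n_d, n_m = len(v_data), len(v_model)
    dW = alpha * (v_data.T @ h_data / n_d - v_model.T @ h_model / n_m)
    dL = alpha * (v_data.T @ v_data / n_d - v_model.T @ v_model / n_m)
    dJ = alpha * (h_data.T @ h_data / n_d - h_model.T @ h_model / n_m)
    db = alpha * (v_data.mean(0) - v_model.mean(0))
    dc = alpha * (h_data.mean(0) - h_model.mean(0))
    return dW, dL, dJ, db, dc
```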
9-11 Approximate Maximum Likelihood Learning in Boltzmann Machines
One approximation is to use a variational lower bound on the log-likelihood:
$$\ln p_\theta(v) = \ln \sum_h p_\theta(v, h) = \ln \sum_h q_\mu(h \mid v) \frac{p_\theta(v, h)}{q_\mu(h \mid v)} \ge \sum_h q_\mu(h \mid v) \ln p_\theta(v, h) + H_e(q_\mu) = \mathcal{L}(q_\mu, \theta), \tag{3}$$
where $q_\mu(h \mid v)$ is an approximate posterior (variational) distribution and $H_e(\cdot)$ is the entropy with natural logarithm. The goal is to find the tightest lower bound on the log-likelihood by optimizing over both the distribution $q_\mu$ and the parameters $\theta$.
12-14 Variational Learning for Boltzmann Machines
For Boltzmann machines, the lower bound can be rewritten (ignoring the bias terms) as
$$\mathcal{L}(q_\mu, \theta) = \sum_h q_\mu(h \mid v)\,(-E_\theta(v, h)) - \log Z_\theta + H_e(q_\mu). \tag{4}$$
Using the mean-field approximation $q_\mu(h \mid v) = \prod_{j=1}^M q(h_j \mid v)$ with $q(h_j = 1) = \mu_j$, where M is the number of hidden units, the bound evaluates to
$$\mathcal{L}(q_\mu, \theta) = \frac{1}{2} v^T L v + \frac{1}{2} \mu^T J \mu + v^T W \mu - \log Z_\theta + \sum_{j=1}^M H_e(\mu_j). \tag{5}$$
15-16 Variational EM Learning for Boltzmann Machines
Maximize the lower bound by alternating between the variational parameters $\mu$ and the model parameters $\theta$: the typical EM learning idea.
E-step:
$$\sup_\mu \mathcal{L}(q_\mu, \theta) = \sup_\mu \; \frac{1}{2} v^T L v + \frac{1}{2} \mu^T J \mu + v^T W \mu - \log Z_\theta + \sum_{j=1}^M H_e(\mu_j).$$
Alternating maximization over each coordinate gives the fixed-point update
$$\mu_j \leftarrow \sigma\Big(\sum_i W_{ij} v_i + \sum_{m \neq j} J_{mj} \mu_m\Big),$$
where $\sigma(\cdot)$ denotes the sigmoid function. Running these updates to convergence yields the variational parameters $\hat\mu$.
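A minimal numpy sketch of this mean-field fixed-point iteration, continuing the earlier example and assuming J has a zero diagonal so the $m \neq j$ restriction is automatic:

```python
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W, J, n_iters=30):
    """E-step: iterate mu_j <- sigma(sum_i W_ij v_i + sum_{m != j} J_mj mu_m).

    Updates all mu_j in parallel rather than one coordinate at a time,
    which is a common practical variant of the update on the slide.
    """
    mu = np.random.rand(W.shape[1])     # random initialization, as in the algorithm below
    for _ in range(n_iters):
        mu = sigmoid(v @ W + J @ mu)    # J symmetric with zero diagonal
    return mu
```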
17-19 Stochastic Approximations or Persistent Markov Chains
M-step:
$$\sup_\theta \mathcal{L}(q_\mu, \theta) = \sup_\theta \; \frac{1}{2} v^T L v + \frac{1}{2} \mu^T J \mu + v^T W \mu - \log Z_\theta + \sum_{j=1}^M H_e(\mu_j).$$
MCMC sampling with persistent Markov chains is used to approximate the gradient of the log-partition function $\log Z_\theta$. With samples $(\tilde v_i, \tilde h_i)$ from the M persistent chains, the parameter updates for one training example can be written as
$$\Delta W = \alpha_t \Big( v \hat\mu^T - \frac{1}{M} \sum_{i=1}^M \tilde v_i \tilde h_i^T \Big), \quad \Delta L = \alpha_t \Big( v v^T - \frac{1}{M} \sum_{i=1}^M \tilde v_i \tilde v_i^T \Big), \quad \Delta J = \alpha_t \Big( \hat\mu \hat\mu^T - \frac{1}{M} \sum_{i=1}^M \tilde h_i \tilde h_i^T \Big). \tag{6}$$
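The model-side samples come from Gibbs chains that persist across parameter updates. Here is a sketch of one sweep for a general Boltzmann machine, resampling each unit from its conditional given all the others (the names and the inclusion of biases b, c are our assumptions):

```python
def gibbs_sweep(v, h, L, J, W, b, c, rng):
    """One full Gibbs sweep: p(unit = 1 | rest) = sigma(total input to the unit)."""
    for i in range(len(v)):
        p = sigmoid(L[i] @ v + W[i] @ h + b[i])    # zero diagonal: no self-input
        v[i] = float(rng.random() < p)
    for j in range(len(h)):
        p = sigmoid(J[j] @ h + W[:, j] @ v + c[j])
        h[j] = float(rng.random() < p)
    return v, h
```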
20 Overall Algorithm for Training Boltzmann Machines
Data: a training set S_N of N binary data vectors v, and M, the number of persistent Markov chains.
Initialize the vector θ^0 and M samples {ṽ^{0,1}, h̃^{0,1}}, ..., {ṽ^{0,M}, h̃^{0,M}};
for t = 0 to T (number of iterations) do
  for each v^n in S_N do
    Randomly initialize μ^n and run the updates μ_j ← σ(Σ_i W_ij v_i + Σ_{m≠j} J_mj μ_m) until convergence to obtain μ̂^n;
  end
  for m = 1 to M (number of persistent Markov chains) do
    Sample (ṽ^{t+1,m}, h̃^{t+1,m}) given (ṽ^{t,m}, h̃^{t,m}) by running the Gibbs sampler;
  end
  Update θ using equation (6) (adjusting for batch data) and decrease the learning rate α_t;
end
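Tying the pieces together, a schematic driver loop under the same naming assumptions as the earlier sketches (a sketch, not the authors' implementation):

```python
def train_bm(data, L, J, W, b, c, n_iters, n_chains, alpha0, rng):
    """Variational EM with persistent chains; data is an (N, n_visible) binary array."""
    n_v, n_h = W.shape
    chains_v = rng.integers(0, 2, (n_chains, n_v)).astype(float)
    chains_h = rng.integers(0, 2, (n_chains, n_h)).astype(float)
    for t in range(n_iters):
        alpha = alpha0 / (1 + t)   # decreasing learning rate
        # E-step: mean-field posteriors for every training vector
        mu = np.stack([mean_field(v, W, J) for v in data])
        # Persistent chains: advance each chain by one Gibbs sweep, no restarts
        for m in range(n_chains):
            chains_v[m], chains_h[m] = gibbs_sweep(chains_v[m], chains_h[m], L, J, W, b, c, rng)
        # M-step: stochastic-approximation updates in the spirit of equations (2) and (6)
        dW, dL, dJ, db, dc = ml_updates(data, mu, chains_v, chains_h, alpha)
        W += dW; L += dL; J += dJ; b += db; c += dc
    return L, J, W, b, c
```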
21 Learning for Deep Boltzmann Machines
For deep Boltzmann machines, L = 0 and J has many zero blocks because the hidden-unit interactions are layered, which simplifies some of the computations. The Gibbs sampling procedure also simplifies: all units in one layer can be sampled in parallel given the neighboring layers. However, learning was observed to be slow, and greedy pretraining can result in faster convergence of the parameters.
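For example, in a two-hidden-layer DBM one alternating Gibbs sweep can sample whole layers at once. A sketch under the earlier naming assumptions, with biases omitted:

```python
def dbm_gibbs_sweep(v, h1, h2, W1, W2, rng):
    """Layerwise Gibbs sweep for a 2-hidden-layer DBM.

    There are no within-layer connections, so each layer is conditionally
    independent given its neighbors and can be sampled in one shot.
    """
    p_h1 = sigmoid(v @ W1 + h2 @ W2.T)              # h1 gets input from v and h2
    h1 = (rng.random(p_h1.shape) < p_h1).astype(float)
    p_v = sigmoid(h1 @ W1.T)                        # v depends only on h1
    v = (rng.random(p_v.shape) < p_v).astype(float)
    p_h2 = sigmoid(h1 @ W2)                         # h2 depends only on h1
    h2 = (rng.random(p_h2.shape) < p_h2).astype(float)
    return v, h1, h2
```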
22 Pretraining in Deep Boltzmann Machines
Each RBM in the stack is trained separately, with some rescaling of the weights to account for composing the stack into a single DBM.
Figure: Greedy Layerwise Pretraining for DBM
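A schematic of the greedy layerwise procedure, assuming a hypothetical RBM trainer train_rbm (e.g. contrastive divergence) that returns a weight matrix; the paper's rescaling of the first and last RBMs is omitted here for brevity:

```python
def greedy_pretrain(data, layer_sizes, train_rbm):
    """Train a stack of RBMs bottom-up; each RBM models the previous layer's features."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W = train_rbm(x, n_hidden)     # hypothetical RBM trainer, e.g. CD-1
        weights.append(W)
        x = sigmoid(x @ W)             # deterministic upward pass to form the next layer's input
    return weights                     # used to initialize the DBM before joint training
```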
23 Discriminative Finetuning in Deep Boltzmann Machines
An additional finetuning step is used to further improve performance. For example, for a 2-hidden-layer DBM, the approximate posterior over the top hidden layer is used as an augmented input to a feed-forward neural network whose weights are initialized from the parameters of the DBM.
Figure: Finetuning the parameters of DBM
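A sketch of how such an augmented input could be formed: run mean-field inference to get q(h2 | v) and concatenate it with v before the feed-forward pass (names are ours; the backprop training itself is standard and omitted):

```python
def augmented_input(v, W1, W2, n_mf_iters=10):
    """Concatenate v with the mean-field posterior over the top hidden layer."""
    mu1 = sigmoid(v @ W1)               # initialize with a bottom-up pass
    mu2 = sigmoid(mu1 @ W2)
    for _ in range(n_mf_iters):         # refine both layers jointly
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)
        mu2 = sigmoid(mu1 @ W2)
    return np.concatenate([v, mu2])     # input to the finetuned network
```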
24 Some Experimental Results and Observations
A DBM was trained to model handwritten digits from the MNIST dataset.
Figure: An example of a DBM used for MNIST data generation: (a) DBM model used for training; (b) examples of handwritten digits.
Some interesting observations:
- Without greedy pretraining, the models did not produce good results.
- With discriminative finetuning, the DBM achieved a 0.95% error rate on MNIST, the best recognition result on that dataset at the time.
25 Thank You