Restricted Boltzmann Machines

Restricted Boltzmann Machines (http://deeplearning4.org/rbm-mnist-tutorial.html). Slides from Hugo Larochelle, Geoffrey Hinton, and Yoshua Bengio. CSC321: Intro to Machine Learning and Neural Networks, Winter 2016. Michael Guerzhoy.

Unsupervised Learning. Instead of having inputs and target outputs, we just have the inputs. The goal is to learn something useful about the data, e.g., to discover useful features of the data. We want to obtain the same kind of features we obtained when training e.g. AlexNet, but without having to supply labels for images. This is useful! Labelling images is difficult and expensive, and learned features can be useful for classification when there is not a lot of training data.

Unsupervised Learning. Find weights $W$ such that $P_W(x)$ is high when $x$ looks like the data in the training set, but $P_W(x)$ is low if $x$ looks different from the data in the training set. $P_W(x)$ is the probability of $x$: how likely are we to observe $x$ as a new training sample? In RBMs, we also use hidden variables $h$: we imagine that each sample in the training set consists of a visible input $x$ and some hidden inputs $h$, with $P_W(x) = \sum_h P_W(x, h)$.

Restricted Boltzmann Machine (RBM). $h, x$: binary vectors ($h_j, x_k \in \{0, 1\}$).
$E(x, h) = -h^T W x - c^T x - b^T h = -\sum_{j,k} W_{jk} h_j x_k - \sum_k c_k x_k - \sum_j b_j h_j$
$P(x, h) = \exp(-E(x, h))/Z$, where $Z = \sum_{(x,h)} \exp(-E(x, h))$
$P(x) = \sum_h P(x, h)$
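To make these definitions concrete, here is a minimal sketch (not from the slides) that enumerates every configuration of a tiny RBM to compute $Z$, $P(x, h)$, and $P(x)$ exactly; the sizes and random weights are assumptions chosen purely for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Tiny RBM: 3 visible and 2 hidden units (sizes chosen only so Z is enumerable)
n_vis, n_hid = 3, 2
W = rng.normal(size=(n_hid, n_vis))   # W[j, k] connects h_j and x_k
b = rng.normal(size=n_hid)            # hidden biases
c = rng.normal(size=n_vis)            # visible biases

def energy(x, h):
    # E(x, h) = -h^T W x - c^T x - b^T h
    return -h @ W @ x - c @ x - b @ h

def configs(n):
    # All 2^n binary vectors of length n
    return [np.array(bits) for bits in itertools.product([0, 1], repeat=n)]

# Partition function Z: sum over every (x, h) pair, 2^(n_vis + n_hid) terms
Z = sum(np.exp(-energy(x, h)) for x in configs(n_vis) for h in configs(n_hid))

def joint(x, h):
    return np.exp(-energy(x, h)) / Z                 # P(x, h)

def marginal(x):
    return sum(joint(x, h) for h in configs(n_hid))  # P(x) = sum_h P(x, h)

x = np.array([1, 0, 1])
print(marginal(x))                                   # probability of this visible vector
print(sum(marginal(x) for x in configs(n_vis)))      # sanity check: prints 1.0
```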

RBM weights as features. $E(x, h) = -h^T W x - c^T x - b^T h = -\sum_{j,k} W_{jk} h_j x_k - \sum_k c_k x_k - \sum_j b_j h_j$. High probability => low energy $E$. Consider $W_{j,:}$ (the $j$-th row of $W$): the energy is lower if $W_{jk}$ is large when $x_k$ is large, provided $h_j = 1$. So $W_{j,:}$ could be a template for $x$.

RBM weights as features. $W_{j,:}$ could be a template for $x$, but that's only useful when $h_j$ is on (i.e., $h_j = 1$). So $P(x, h)$ will be high when $h_j = 1$ for a $j$ that's appropriate for the $x$. Think: $x$ is in class $j$, and the weights are such that $W_{j,:}$ is a template for $x$'s of class $j$. If the weights are good, $P(x)$ will be high too, since one of the terms in $P(x) = \sum_h P(x, h)$ will be large.
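A small illustration of the template intuition (my own example, not from the slides): with a made-up weight row that matches a particular $x$, turning the corresponding hidden unit on lowers the energy and hence raises $\exp(-E)$.

```python
import numpy as np

# Hypothetical 4-pixel "image" and a weight row acting as a template for it
x = np.array([1, 0, 1, 1])
W = np.array([[ 2.0, -2.0,  2.0,  2.0],   # row 0: template matching x
              [-2.0,  2.0, -2.0, -2.0]])  # row 1: template for a different pattern
b = np.zeros(2)   # hidden biases (zero for simplicity)
c = np.zeros(4)   # visible biases

def energy(x, h):
    return -h @ W @ x - c @ x - b @ h

# Turning on the matching hidden unit (h_0 = 1) lowers the energy...
print(energy(x, np.array([0, 0])))  #  0.0
print(energy(x, np.array([1, 0])))  # -6.0  (lower energy => higher exp(-E))
# ...while turning on the non-matching unit raises it
print(energy(x, np.array([0, 1])))  # +6.0
```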

A view of the training set. $x^{(i)} = (1, 0, 1, 1, \ldots, 1)$: observed, e.g., a binarized image. $h^{(i)} = ?$: unobserved, also a binary vector; we don't know what it is. For example, a first coordinate equal to 1 might mean the sample represents the digit 0. It would be easy to assign a probability if we knew the state of the hidden units. The hidden layer makes it possible to assign reasonable probabilities using a relatively simple architecture.

Computing $P(x)$ directly is hard. $P(x, h) = \exp(-E(x, h))/Z$, with $Z = \sum_{(x,h)} \exp(-E(x, h))$: the sum has $2^{\dim(x) + \dim(h)}$ terms! Even computing $Z$ directly is hard. (Note: before, we just looked at the energy function and ignored the denominator $Z$, but that was just for intuition.) Even computing $P(x)$ is hard, and maximizing it with respect to $W$ is also hard.
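A quick back-of-the-envelope count shows why brute-force enumeration is hopeless; the layer sizes below (784 visible, 500 hidden) are assumed, roughly MNIST-scale values.

```python
# Hypothetical MNIST-scale RBM: 784 visible units (28x28 binarized pixels)
# and 500 hidden units -- both sizes are assumptions for illustration.
dim_x, dim_h = 784, 500
terms = 2 ** (dim_x + dim_h)                 # number of (x, h) configurations in Z
print(f"Z sums over 2^{dim_x + dim_h} terms, i.e. about 10^{len(str(terms)) - 1}")
# => about 10^386 configurations, far too many to enumerate
```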

Gibbs Sampling. It turns out that it's possible to compute $P(h|x)$ and $P(x|h)$ easily. (I.e., if we know the visible units, it's easy to compute the probability distribution for the hidden units, and vice versa.)
$P(h|x) = \prod_j P(h_j|x)$, with $P(h_j = 1|x) = \sigma(b_j + W_{j,:}\, x)$
$P(x|h) = \prod_k P(x_k|h)$, with $P(x_k = 1|h) = \sigma(c_k + W_{:,k}^T\, h)$
Here $\sigma$ is the logistic sigmoid, $\sigma(t) = 1/(1 + e^{-t})$.
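A minimal NumPy sketch of these two conditionals (my own illustration; the shapes and example parameters are assumptions):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Assumed shapes: W is (n_hid, n_vis), b is (n_hid,), c is (n_vis,)
def p_h_given_x(x, W, b):
    # Vector of P(h_j = 1 | x) = sigmoid(b_j + W[j, :] @ x)
    return sigmoid(b + W @ x)

def p_x_given_h(h, W, c):
    # Vector of P(x_k = 1 | h) = sigmoid(c_k + W[:, k] @ h)
    return sigmoid(c + W.T @ h)

# Example with made-up weights
rng = np.random.default_rng(1)
W = rng.normal(size=(2, 3)); b = np.zeros(2); c = np.zeros(3)
x = np.array([1, 0, 1])
print(p_h_given_x(x, W, b))               # independent Bernoulli probabilities for h
print(p_x_given_h(np.array([1, 0]), W, c))
```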

Proof.
$P(h_j = 1 | x) = \dfrac{P(h_j = 1, x)}{P(x)} = \dfrac{P(h_j = 1, x)}{P(h_j = 0, x) + P(h_j = 1, x)} = \dfrac{1}{1 + P(h_j = 0, x)/P(h_j = 1, x)}$
$\dfrac{P(h_j = 0, x)}{P(h_j = 1, x)} = \dfrac{\sum_{h:\, h_j = 0} \exp(-E(x, h))/Z}{\sum_{h:\, h_j = 1} \exp(-E(x, h))/Z} = \exp(-W_{j,:}\, x - b_j)$
since, by the definition of $E$, setting $h_j = 1$ adds $W_{j,:}\, x + b_j$ to $-E(x, h)$, while the contributions of the other hidden units are identical in the numerator and the denominator.
So $P(h_j = 1 | x) = \dfrac{1}{1 + \exp(-W_{j,:}\, x - b_j)} = \sigma(W_{j,:}\, x + b_j)$.
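As a sanity check (not part of the slides), one can verify this identity numerically on a tiny RBM by computing the conditional by brute force and comparing it to the sigmoid formula; all parameters below are made up.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n_vis, n_hid = 3, 2                      # tiny, assumed sizes
W = rng.normal(size=(n_hid, n_vis))
b = rng.normal(size=n_hid)
c = rng.normal(size=n_vis)

def unnorm(x, h):
    # exp(-E(x, h)); Z cancels in the conditional, so we never need it
    return np.exp(h @ W @ x + c @ x + b @ h)

x = np.array([1, 0, 1])
j = 0
hid_configs = [np.array(t) for t in itertools.product([0, 1], repeat=n_hid)]

num = sum(unnorm(x, h) for h in hid_configs if h[j] == 1)   # proportional to P(h_j=1, x)
den = sum(unnorm(x, h) for h in hid_configs)                # proportional to P(x)
brute = num / den

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
print(brute, sigmoid(W[j] @ x + b[j]))   # the two numbers agree
```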

Proof (cont'd; first two lines important)

Gibbs Sampling for RBM (known weights). Initialize the x. Sample the h given the x (easy! we worked out the distribution). Sample the x given the h (easy!). Repeat. This allows us to see what kind of data the RBM is modelling (i.e., assigning high probability to).
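A minimal sketch of this sampling loop (my own illustration; the parameters are assumed to come from an already-trained RBM):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gibbs_sample(W, b, c, n_steps=1000, rng=None):
    """Block Gibbs sampling for an RBM with weights W (n_hid, n_vis),
    hidden biases b, visible biases c; returns the final visible sample."""
    rng = rng or np.random.default_rng()
    n_hid, n_vis = W.shape
    x = rng.integers(0, 2, size=n_vis)           # 1. initialize x randomly
    for _ in range(n_steps):
        p_h = sigmoid(b + W @ x)                 # 2. P(h_j = 1 | x)
        h = (rng.random(n_hid) < p_h).astype(int)
        p_x = sigmoid(c + W.T @ h)               # 3. P(x_k = 1 | h)
        x = (rng.random(n_vis) < p_x).astype(int)
    return x                                     # approximately a draw from P(x)

# Usage with made-up parameters (a trained RBM would supply real ones)
rng = np.random.default_rng(3)
W = rng.normal(size=(4, 6)); b = np.zeros(4); c = np.zeros(6)
print(gibbs_sample(W, b, c, rng=rng))
```

Each step resamples the whole hidden layer and then the whole visible layer in one block; this is what makes Gibbs sampling cheap for RBMs, since the units within a layer are conditionally independent given the other layer.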

Sampling from an RBM. [Figure: a chain of Gibbs sampling steps ending in a high-probability sample.] If we wait long enough, the visible samples will be sampled according to the probability distribution of the RBM, since we are performing Gibbs sampling.