Deep Belief Nets
Sargur N. Srihari
srihari@cedar.buffalo.edu

Topics
1. Boltzmann machines
2. Restricted Boltzmann machines
3. Deep Belief Networks
4. Deep Boltzmann machines
5. Boltzmann machines for continuous data
6. Convolutional Boltzmann machines
7. Boltzmann machines for structured and sequential outputs
8. Other Boltzmann machines
9. Backpropagation through random operations
10. Directed generative nets
11. Drawing samples from autoencoders
12. Generative stochastic networks
13. Other generative schemes
14. Evaluating generative models
15. Conclusion

Topics
1. History of Deep Belief Networks (DBNs)
2. What are DBNs?
   - Example of a DBN
   - DBNs as hybrid graphical models
3. Probability distribution represented by a DBN
4. Sampling from a DBN
5. Inference with a DBN
6. Training a DBN
7. Using the DBN

Deep Belief Net Architecture
A hybrid graphical model involving both directed and undirected connections. Like an RBM, it has no intra-layer connections. However, a DBN has multiple hidden layers, and thus connections between hidden units that lie in separate layers. All the local CPDs needed by the DBN are copied directly from the local CPDs of its constituent RBMs. Alternatively, we could represent the DBN with a completely undirected graph, but it would need intra-layer connections to capture the dependencies between parents.

History of Deep Belief Networks
- One of the first non-convolutional models to successfully admit training of deep architectures.
- Deep belief networks started the current deep learning renaissance. Prior to this, deep models were considered too difficult to optimize, and kernel machines with convex objective functions dominated the landscape.
- DBNs demonstrated that deep architectures could outperform kernelized SVMs on MNIST.
- Today deep belief networks have fallen out of favor and are rarely used, but they are still studied due to their role in the history of deep learning.

DBNs as generative models
- DBNs are generative models with several layers of latent variables.
- The latent variables are typically binary, while the visible units may be binary or real-valued.
- There are no intra-layer connections.
- Connections between the top two layers are undirected.
- Connections between all other layers are directed, pointing towards the data.

An example of a DBN
- It is a hybrid graphical model involving both directed and undirected connections.
- Like an RBM, it has no intra-layer connections.
- It has multiple hidden layers.
- Connections between the top two layers are undirected.
- Connections between all other layers are directed.

Distribution represented by a DBN
A DBN with l hidden layers has l weight matrices W^(1), ..., W^(l) and l+1 bias vectors b^(0), ..., b^(l), where b^(0) provides the biases for the visible layer. The top two layers form a Markov network; the remaining layers form several layers of a Bayesian network. For binary v_i, h_i, the probability distribution represented is:

- Joint probability of the top pair of hidden layers (the MN, expressed in terms of an energy function):
  P(h^(l), h^(l-1)) ∝ exp( b^(l)T h^(l) + b^(l-1)T h^(l-1) + h^(l-1)T W^(l) h^(l) )
- CPDs of the lower hidden layers, given the layer above:
  P(h_i^(k) = 1 | h^(k+1)) = σ( b_i^(k) + W_{:,i}^{(k+1)T} h^(k+1) ),  for k = 1, ..., l-2
- CPDs of the visible layer, given the first hidden layer:
  P(v_i = 1 | h^(1)) = σ( b_i^(0) + W_{:,i}^{(1)T} h^(1) )

In the case of real-valued visible variables, substitute
  v ~ N( v ; b^(0) + W^(1)T h^(1), β^{-1} )
with β diagonal for tractability.

A DBN with only one hidden layer is an RBM.
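To make these formulas concrete, here is a minimal NumPy sketch of the unnormalized joint over the top two layers and the downward conditionals. The layer sizes, the random untrained parameters, and the function names are assumptions for illustration, not part of the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical layer sizes: visible layer plus l = 3 hidden layers.
rng = np.random.default_rng(0)
sizes = [784, 500, 500, 2000]                                          # [v, h1, h2, h3]
W = [rng.normal(0, 0.01, (sizes[k], sizes[k + 1])) for k in range(3)]  # W^(1..3)
b = [np.zeros(n) for n in sizes]                                       # b^(0..3); b[0] biases the visible layer

def top_pair_unnormalized_logprob(h2, h3):
    """Log of the unnormalized joint P(h^(3), h^(2)) of the top, undirected pair."""
    return b[3] @ h3 + b[2] @ h2 + h2 @ W[2] @ h3

def p_layer_given_above(k, h_above):
    """Element-wise P(h^(k)_i = 1 | h^(k+1)) for k = 1, 2, or P(v_i = 1 | h^(1)) for k = 0."""
    return sigmoid(b[k] + W[k] @ h_above)
```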

Sampling from a DBN
To generate a sample from a DBN (for example, to draw a visible sample v from the model, or to compute expected values of features represented by the top latent layer h^(l)):
1. First run several steps of Gibbs sampling on the top two hidden layers. This stage is essentially drawing a sample from the RBM defined by the top two layers.
2. Then use a single pass of ancestral sampling through the rest of the model to draw a sample from the visible units.
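A minimal sketch of this two-stage procedure, under the same kind of assumptions as the previous sketch (hypothetical layer sizes and random, untrained parameters; with trained weights the Gibbs chain would mix towards the model distribution):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Same hypothetical setup as the earlier sketch.
rng = np.random.default_rng(0)
sizes = [784, 500, 500, 2000]                                          # [v, h1, h2, h3]
W = [rng.normal(0, 0.01, (sizes[k], sizes[k + 1])) for k in range(3)]
b = [np.zeros(n) for n in sizes]

def sample_dbn(n_gibbs=1000):
    # 1) Gibbs sampling in the RBM formed by the top two layers (h2, h3).
    h3 = (rng.random(sizes[3]) < 0.5).astype(float)
    for _ in range(n_gibbs):
        h2 = (rng.random(sizes[2]) < sigmoid(b[2] + W[2] @ h3)).astype(float)
        h3 = (rng.random(sizes[3]) < sigmoid(b[3] + W[2].T @ h2)).astype(float)
    # 2) Single ancestral pass down the directed layers to the visible units.
    h1 = (rng.random(sizes[1]) < sigmoid(b[1] + W[1] @ h2)).astype(float)
    v = (rng.random(sizes[0]) < sigmoid(b[0] + W[0] @ h1)).astype(float)
    return v

v_sample = sample_dbn()
```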

Inference in a DBN
Inference is the task of determining a probability from a model, e.g., P(h | v) given P(h, v). Inference in a DBN is intractable due to:
- the explaining-away effect within each directed layer, and
- the interaction between the two hidden layers that have undirected connections.
Evaluating or maximizing the standard evidence lower bound on the log-likelihood is also intractable, because the evidence bound takes the expectation of cliques whose size is equal to the network width.

Evaluating the log-likelihood
Evaluating or maximizing the log-likelihood for training (parameter estimation) requires solving two intractable problems:
1. Inference: marginalizing out the latent variables (the likelihood is the product of the probabilities of the visible samples).
2. The partition function within the undirected model of the top two layers.

To Train a DBN
- Begin by training an RBM to maximize
    E_{v ~ p_data} [ log p(v) ]
  using contrastive divergence or stochastic maximum likelihood. The RBM's parameters define the parameters of the first layer of the DBN.
- Next, a second RBM is trained to approximately maximize
    E_{v ~ p_data} E_{h^(1) ~ p^(1)(h^(1) | v)} [ log p^(2)(h^(1)) ]
  where p^(1), p^(2) are the distributions represented by the two RBMs. In other words, the second RBM is trained to model the distribution defined by sampling the hidden units of the first RBM, when the first RBM is driven by the data.
- This procedure can be repeated indefinitely to add as many layers to the DBN as desired, with each RBM modeling the samples of the previous one (a procedure justified as increasing a variational lower bound on the log-likelihood of the data).
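A minimal NumPy sketch of this greedy layer-wise procedure using CD-1. The toy data, layer sizes, learning rate, and function names are assumptions for illustration, not the exact recipe from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm_cd1(data, n_hidden, n_epochs=10, lr=0.01, seed=0):
    """Train one RBM with CD-1 on `data` (shape: n_samples x n_visible)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, (n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(n_epochs):
        for v0 in data:
            # Positive phase: hidden probabilities and a hidden sample given the data.
            ph0 = sigmoid(b_h + W.T @ v0)
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            # Negative phase: one step of Gibbs sampling (CD-1).
            pv1 = sigmoid(b_v + W @ h0)
            v1 = (rng.random(n_visible) < pv1).astype(float)
            ph1 = sigmoid(b_h + W.T @ v1)
            # Gradient step on the CD-1 approximation of the log-likelihood gradient.
            W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
            b_v += lr * (v0 - v1)
            b_h += lr * (ph0 - ph1)
    return W, b_v, b_h

def up_pass(data, W, b_h, rng):
    """Drive an RBM with data and sample its hidden units."""
    probs = sigmoid(b_h + data @ W)
    return (rng.random(probs.shape) < probs).astype(float)

# Greedy layer-wise stacking (toy binary data and layer sizes are assumptions).
rng = np.random.default_rng(0)
data = (rng.random((200, 784)) < 0.1).astype(float)
W1, bv1, bh1 = train_rbm_cd1(data, n_hidden=500)
h1_samples = up_pass(data, W1, bh1, rng)           # samples of h^(1) with RBM 1 driven by the data
W2, bv2, bh2 = train_rbm_cd1(h1_samples, n_hidden=500)
```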

Using a DBN
The trained DBN may be used directly as a generative model, but most of the interest in DBNs arose from their use for classification. To do this, take the weights from the DBN to define an MLP. After initializing this MLP with the weights and biases learned via generative training of the DBN, we can train the MLP to perform a classification task. This additional training of the MLP is an example of discriminative fine-tuning. This particular choice of MLP is somewhat arbitrary.
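As a hedged sketch, initializing an MLP from the DBN's generatively trained weights and adding a classifier layer to be fine-tuned discriminatively might look as follows. The stand-in weights, the softmax output layer, and the layer sizes are assumptions; in practice the W and b lists would come from the greedy RBM stacking above, and the whole network would then be trained with backpropagation on labeled data.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dbn_to_mlp_features(v, W, b):
    """Deterministic MLP forward pass using the generatively trained DBN
    weights W = [W^(1), ..., W^(l)] and hidden biases b = [b^(1), ..., b^(l)]."""
    h = v
    for Wk, bk in zip(W, b):
        h = sigmoid(bk + h @ Wk)     # h^(k) = sigma(b^(k) + h^(k-1)T W^(k))
    return h

# Hypothetical setup: weights standing in for those learned generatively,
# followed by a randomly initialized softmax classifier to be fine-tuned.
rng = np.random.default_rng(0)
W = [rng.normal(0, 0.01, (784, 500)), rng.normal(0, 0.01, (500, 500))]
b = [np.zeros(500), np.zeros(500)]
W_out, b_out = rng.normal(0, 0.01, (500, 10)), np.zeros(10)

v = (rng.random((5, 784)) < 0.1).astype(float)     # a small batch of inputs
logits = dbn_to_mlp_features(v, W, b) @ W_out + b_out
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
```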