Deep Belief Networks are Compact Universal Approximators

Franck Olivier Ndjakou Njeunje, Applied Mathematics and Scientific Computation. May 16.

Outline
1 Introduction
2 Preliminaries: universal approximation theorem, definitions, deep belief networks
3 Sutskever and Hinton's method: transfer of probability
4 Le Roux and Bengio's method: Gray codes, simultaneous transfer of probability
5 Conclusion

Introduction
Machine learning (term coined by Arthur Samuel).
Neural networks: a set of transformations built from units with the sigmoid activation
σ(x) = 1 / (1 + exp(−x)).    (1)

Universal approximation theorem [Cybe89]
Let φ(·) be a non-constant, bounded, and monotonically increasing continuous function. Let I_m denote the m-dimensional unit hypercube [0, 1]^m, and let C(I_m) denote the space of continuous functions on I_m. Then, given any function f ∈ C(I_m) and ε > 0, there exist an integer N, real constants v_i, b_i ∈ R, and real vectors w_i ∈ R^m, i = 1, ..., N, such that we may define
F(x) = Σ_{i=1}^{N} v_i φ(w_i^T x + b_i)    (2)
as an approximate realization of the function f; that is,
|F(x) − f(x)| < ε    (3)
for all x ∈ I_m. In other words, functions of the form F(x) are dense in C(I_m).
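
To make the form of F concrete, here is a small numerical sketch (not from the slides): it fits a sum of N sigmoid units to an arbitrary target function on [0, 1] by least squares over the v_i, with randomly drawn w_i and b_i. The target f, the value of N, and the random draws are illustrative choices only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Target function on the unit interval (m = 1); any continuous f works.
f = lambda x: np.sin(2 * np.pi * x)

N = 50                                  # number of hidden units
x = np.linspace(0.0, 1.0, 200)          # grid on I_1 = [0, 1]
w = rng.normal(scale=10.0, size=N)      # random input weights w_i
b = rng.uniform(-10.0, 0.0, size=N)     # random biases b_i

# Design matrix Phi[k, i] = sigmoid(w_i * x_k + b_i); solve for v in least squares.
Phi = sigmoid(np.outer(x, w) + b)
v, *_ = np.linalg.lstsq(Phi, f(x), rcond=None)

F = Phi @ v
print("max |F(x) - f(x)| on the grid:", np.abs(F - f(x)).max())
```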

Definitions
Generative probabilistic model (GPM). Applications: recognition, classification, and generation.
Restricted Boltzmann machine (RBM) [LeBe08]: a two-layer GPM.
Deep belief network (DBN) [HiOT06]: a multilayer GPM whose top two layers form an RBM.

Deep Belief Network
Let h^i denote the vector of hidden variables at layer i. The model is parametrized as follows:
P(h^0, h^1, h^2, ..., h^l) = P(h^0 | h^1) P(h^1 | h^2) ... P(h^{l−2} | h^{l−1}) P(h^{l−1}, h^l).    (4)
Each hidden layer h^i is a binary random vector with elements h^i_j, and
P(h^i | h^{i+1}) = Π_{j=1}^{n_i} P(h^i_j | h^{i+1}).    (5)
The element h^i_j is a stochastic neuron whose probability of binary activation 1 is
P(h^i_j = 1 | h^{i+1}) = σ( b^i_j + Σ_{k=1}^{n_{i+1}} W^i_{jk} h^{i+1}_k ).    (6)
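
The factorization in (4)-(6) suggests ancestral sampling: draw the top layers from the top RBM, then sample each lower layer from the sigmoid conditionals. The sketch below illustrates only the downward pass of equation (6); the top-layer sample, the layer sizes, and the weights are placeholders (a real DBN would draw the top layers from the top RBM, e.g. by Gibbs sampling).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_down(h_top, weights, biases, rng):
    """Sample h^{l-1}, ..., h^0 given a sample of the top layer h^l.

    weights[i], biases[i] parametrize P(h^i | h^{i+1}):
    P(h^i_j = 1 | h^{i+1}) = sigmoid(b^i_j + sum_k W^i_{jk} h^{i+1}_k).
    """
    h = h_top
    for W, b in zip(reversed(weights), reversed(biases)):
        p = sigmoid(b + W @ h)          # element-wise activation probabilities
        h = (rng.random(p.shape) < p).astype(np.int8)
    return h                            # a sample of h^0

rng = np.random.default_rng(0)
sizes = [4, 4, 4]                       # layer sizes n_0, n_1, n_2 (placeholders)
weights = [rng.normal(size=(sizes[i], sizes[i + 1])) for i in range(len(sizes) - 1)]
biases = [rng.normal(size=sizes[i]) for i in range(len(sizes) - 1)]

h_top = rng.integers(0, 2, size=sizes[-1])   # stand-in for a draw from the top RBM
print(sample_down(h_top, weights, biases, rng))
```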

Universal approximator (DBN)
Let p* be an arbitrary distribution over binary vectors of n bits. A deep belief network that has p* as its marginal distribution over h^0 is said to be a universal approximator. This means that for any binary vector x of n bits, there exist weights and biases such that, given ε > 0,
|P(h^0 = x) − p*(x)| < ε.    (7)

Sutskever and Hinton's Method (2008) [SuHi08]
Define an arbitrary sequence (a_i)_{1 ≤ i ≤ 2^n} of binary vectors in {0, 1}^n. The goal is to find weights and biases such that the marginal distribution over the outputs of the DBN equals the target distribution p* over the vectors (a_i)_{1 ≤ i ≤ 2^n}. The following example, for n = 4, is used in the next few slides:
a_1 = 1011, p*(a_1) = 0.10    (8)
a_2 = 1000, p*(a_2) = 0.05    (9)
a_3 = 1001, p*(a_3) = 0.01    (10)
a_4 = 1111, p*(a_4) = 0.02    (11)
...    (12)

Sutskever and Hinton's Method (2008) [SuHi08]
Consider two consecutive layers h and v of size n, with W_ij the weight linking unit v_i to unit h_j, b_i the bias of unit v_i, and w a positive scalar. For every positive scalar ε (0 < ε < 1), there is a weight vector W_{i,:} and a real b_i such that P(v_i = h_i | h) = 1 − ε. Indeed, setting
W_ii = 2w, W_ij = 0 for i ≠ j, b_i = −w
yields a total input to unit v_i of
I(v_i, h) = 2w h_i − w.    (13)
Therefore, if w = σ^{−1}(1 − ε), we have P(v_i = h_i | h) = 1 − ε.

Sutskever and Hinton's Method (2008) [SuHi08]
With W_ii = 2w, W_ij = 0 for i ≠ j, b_i = −w, so that I(v_i, h) = 2w h_i − w, and with w = σ^{−1}(1 − ε):
P(v_i = 1 | h_i = 1) = σ(w) = 1 − ε,    (14)
P(v_i = 0 | h_i = 0) = 1 − P(v_i = 1 | h_i = 0)    (15)
                     = 1 − σ(−w)    (16)
                     = σ(w) = 1 − ε.    (17)
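
The algebra above is easy to check numerically. The following sketch builds the weights W_ii = 2w, W_ij = 0, b_i = −w with w = σ^{−1}(1 − ε) and verifies that each unit copies the layer below with probability 1 − ε; the values of n, ε, and the test vector h are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))        # sigma^{-1}

n, eps = 4, 0.05
w = logit(1.0 - eps)                     # w = sigma^{-1}(1 - eps)

W = 2.0 * w * np.eye(n)                  # W_ii = 2w, W_ij = 0 for i != j
b = -w * np.ones(n)                      # b_i = -w

h = np.array([1, 0, 1, 1])               # any binary vector h
I = b + W @ h                            # total input: 2w*h_i - w
p_copy = np.where(h == 1, sigmoid(I), 1.0 - sigmoid(I))

print(p_copy)                            # each entry should equal 1 - eps = 0.95
```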

Transfer of Probability
This is what Sutskever and Hinton call transfer of probability.

Transfer of Probability
Number of parameters: we need 3(n + 1)^2 · 2^n parameters, or 3 · 2^n layers.

Le Roux and Bengio's Method (2010) [LeBe10]: Gray Codes
Gray codes [Gray53] are sequences (a_k)_{1 ≤ k ≤ 2^n} such that:
∪_k {a_k} = {0, 1}^n, and
for all k with 2 ≤ k ≤ 2^n, ||a_k − a_{k−1}||_H = 1, where ||·||_H is the Hamming distance.
Example for n = 4 (the first entries match the binary reflected Gray code, which is completed here):
a_1 = 0000, a_2 = 0001, a_3 = 0011, a_4 = 0010, a_5 = 0110, a_6 = 0111, a_7 = 0101, a_8 = 0100, ...
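
One standard way to generate such a sequence is the binary reflected Gray code (the slides do not say which Gray code is used, so this is only an illustration). The sketch below builds it for n = 4 and checks both properties.

```python
def gray_code(n):
    """Return the 2**n binary reflected Gray code vectors as bit strings."""
    codes = ["0", "1"]
    for _ in range(n - 1):
        codes = ["0" + c for c in codes] + ["1" + c for c in reversed(codes)]
    return codes

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

seq = gray_code(4)
print(seq[:8])   # ['0000', '0001', '0011', '0010', '0110', '0111', '0101', '0100']
assert len(set(seq)) == 2 ** 4                                  # covers all of {0,1}^4
assert all(hamming(a, b) == 1 for a, b in zip(seq, seq[1:]))    # consecutive distance 1
```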

Theorem 1
Let a_t be an arbitrary binary vector in {0, 1}^n with its last bit equal to 0, and let p be a scalar. For every positive scalar ε (0 < ε < 1), there is a weight vector W_{n,:} and a real b_n such that:
if the binary vector h is not equal to a_t, the last bit remains unchanged with probability at least 1 − ε, that is, P(v_n = h_n | h ≠ a_t) ≥ 1 − ε;
if the binary vector h is equal to a_t, its last bit is switched from 0 to 1 with probability σ(p).

Parameters for Theorem 1
With the following weights and biases, the result in Theorem 1 is achieved (here k denotes the number of bits of a_t equal to 1, taken without loss of generality to be its first k bits):
W_nj = w for 1 ≤ j ≤ k,
W_nj = −w for k + 1 ≤ j ≤ n − 1,
W_nn = n w,
b_n = −k w + p.
Number of parameters: we need n^2 · 2^n parameters, or 2^n layers.
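
These parameters can be sanity-checked by enumerating all h in {0, 1}^n. The sketch below assumes, as above, that a_t has its first k bits equal to 1; the values of w, p, n, and k are arbitrary illustrative choices.

```python
import numpy as np
from itertools import product

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n, k = 4, 2                    # a_t assumed to have its first k bits set to 1, last bit 0
a_t = np.array([1] * k + [0] * (n - k))
w, p = 20.0, 0.3               # large w; p controls the switch probability sigma(p)

W_n = np.concatenate([np.full(k, w), np.full(n - 1 - k, -w), [n * w]])
b_n = -k * w + p

for h in product([0, 1], repeat=n):
    h = np.array(h)
    prob_vn_1 = sigmoid(b_n + W_n @ h)           # P(v_n = 1 | h)
    if np.array_equal(h, a_t):
        print("h = a_t :", prob_vn_1, "~ sigma(p) =", sigmoid(p))
    else:
        # the last bit should be copied: P(v_n = h_n | h) close to 1
        prob_copy = prob_vn_1 if h[-1] == 1 else 1.0 - prob_vn_1
        assert prob_copy > 0.99
```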

Simultaneous Transfer of Probability
Instead of transferring probability to one vector at a time, transfer it to n vectors at a time. We then need n · 2^n parameters, or 2^n / n layers.

Arrangement of Gray Codes
Let n = 2^t. There exist n sequences of vectors of n bits, S_i, 0 ≤ i ≤ n − 1, each composed of vectors S_{i,k}, 1 ≤ k ≤ 2^n / n, satisfying the following conditions:
1 {S_0, ..., S_{n−1}} is a partition of the set of all vectors of n bits.
2 Every sub-sequence S_i satisfies the second property of Gray codes: the Hamming distance between S_{i,k} and S_{i,k+1} is 1.
3 For any two sub-sequences S_i and S_j, the bit switched between consecutive vectors (S_{i,k} and S_{i,k+1}, or S_{j,k} and S_{j,k+1}) is different, unless the Hamming distance between S_{i,k} and S_{j,k} is 1.
Example for n = 4: four sequences S_0, S_1, S_2, S_3 of four vectors each (the table is not reproduced here). A small checker for the three conditions is sketched below.
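
The following sketch only verifies the three conditions above for a candidate arrangement S (a list of n sequences of bit strings); it does not construct the arrangement from the paper, which is not reproduced on the slides.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def switched_bit(a, b):
    """Index of the single bit that differs between consecutive Gray-code vectors."""
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    assert len(diffs) == 1
    return diffs[0]

def check_arrangement(S, n):
    """Check the three conditions on a candidate list of n sequences of bit strings."""
    # 1. The sequences partition {0,1}^n.
    all_vectors = [v for seq in S for v in seq]
    assert len(all_vectors) == 2 ** n and len(set(all_vectors)) == 2 ** n
    # 2. Consecutive vectors within each sequence are at Hamming distance 1.
    for seq in S:
        assert all(hamming(a, b) == 1 for a, b in zip(seq, seq[1:]))
    # 3. For sequences S_i, S_j, the bit switched at step k differs,
    #    unless S_{i,k} and S_{j,k} are themselves at Hamming distance 1.
    for i in range(len(S)):
        for j in range(i + 1, len(S)):
            for k in range(len(S[i]) - 1):
                bi = switched_bit(S[i][k], S[i][k + 1])
                bj = switched_bit(S[j][k], S[j][k + 1])
                assert bi != bj or hamming(S[i][k], S[j][k]) == 1
```

For instance, check_arrangement(S, 4) would be called with the four sequences S_0, ..., S_3 from the slide's example.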

End Game
Can we retain the universal approximation property of the DBN while transferring probability to n vectors at a time? For any binary vector x of length n, can we still find weights and biases such that P(h^0 = x) = p*(x)?

Lemma
Let p* be an arbitrary distribution over vectors of n bits, where n is again a power of two. A DBN with 2^n/n + 1 layers such that:
1 for each i, 0 ≤ i ≤ n − 1, the top RBM between layers h^{2^n/n} and h^{2^n/n − 1} assigns probability Σ_k p*(S_{i,k}) to S_{i,1};
2 for each i, 0 ≤ i ≤ n − 1, and each k, 1 ≤ k ≤ 2^n/n − 1, we have
P(h^{2^n/n − (k+1)} = S_{i,k+1} | h^{2^n/n − k} = S_{i,k}) = ( Σ_{t=k+1}^{2^n/n} p*(S_{i,t}) ) / ( Σ_{t=k}^{2^n/n} p*(S_{i,t}) ),    (18)
P(h^{2^n/n − (k+1)} = S_{i,k} | h^{2^n/n − k} = S_{i,k}) = p*(S_{i,k}) / ( Σ_{t=k}^{2^n/n} p*(S_{i,t}) );    (19)
3 for each k, 1 ≤ k ≤ 2^n/n − 1, we have
P(h^{2^n/n − (k+1)} = a | h^{2^n/n − k} = a) = 1 if a ∉ ∪_i {S_{i,k}},    (20)
has p* as its marginal distribution over h^0.
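
The conditional probabilities (18) and (19) are ratios of tail sums of p* along one sequence. The sketch below computes them for a hypothetical sequence of four target probabilities (the values are illustrative, echoing the earlier example); note that (18) and (19) sum to 1 at every step.

```python
import numpy as np

# Hypothetical target probabilities p*(S_{i,1}), ..., p*(S_{i,K}) along one sequence S_i.
p_star = np.array([0.10, 0.05, 0.01, 0.02])
K = len(p_star)

tail = np.array([p_star[k:].sum() for k in range(K)])   # sum_{t >= k} p*(S_{i,t}), 0-indexed

for k in range(K - 1):                    # k = 1 .. K-1 on the slide (0-indexed here)
    p_advance = tail[k + 1] / tail[k]     # (18): P(move from S_{i,k} to S_{i,k+1})
    p_stay = p_star[k] / tail[k]          # (19): P(stay at S_{i,k})
    assert np.isclose(p_advance + p_stay, 1.0)
    print(k + 1, p_advance, p_stay)

# The top RBM assigns probability sum_k p*(S_{i,k}) = tail[0] to S_{i,1}.
print("mass assigned to S_{i,1}:", tail[0])
```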

Proof of the Lemma
Let x be an arbitrary binary vector of n bits; there is a pair (i, k) such that x = S_{i,k}. We need to show that
P(h^0 = S_{i,k}) = p*(S_{i,k}).    (21)
Example for n = 4: if x = S_{2,2}, then
P(h^0 = S_{2,2}) = p*(S_{2,2}).    (22)

Proof of the Lemma
The marginal probability of h^0 = S_{i,k} is therefore equal to:
P(h^0 = S_{i,k}) = P(h^{2^n/n − 1} = S_{i,1})    (23)
  × Π_{t=1}^{k−1} P(h^{2^n/n − (t+1)} = S_{i,t+1} | h^{2^n/n − t} = S_{i,t})    (24)
  × P(h^{2^n/n − (k+1)} = S_{i,k} | h^{2^n/n − k} = S_{i,k})    (25)
  × Π_{t=k+1}^{2^n/n − 1} P(h^{2^n/n − (t+1)} = S_{i,k} | h^{2^n/n − t} = S_{i,k}).    (26)

Proof of the Lemma
Replacing each of these probabilities by the values given in the Lemma, we get:
P(h^0 = S_{i,k}) = ( Σ_{u=1}^{2^n/n} p*(S_{i,u}) )    (27)
  × Π_{t=1}^{k−1} ( Σ_{u=t+1}^{2^n/n} p*(S_{i,u}) ) / ( Σ_{u=t}^{2^n/n} p*(S_{i,u}) )    (28)
  × p*(S_{i,k}) / ( Σ_{u=k}^{2^n/n} p*(S_{i,u}) )    (29)
  × 1^{2^n/n − 1 − k}    (30)
  = p*(S_{i,k}).    (31)
The last equality follows from the cancellation (telescoping) of consecutive terms in the product. This concludes the proof.
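
The telescoping can also be checked numerically: multiplying the top-RBM mass by the chain of conditionals from the Lemma recovers p*(S_{i,k}) for every k. The target probabilities below are the same hypothetical values as in the previous sketch.

```python
import numpy as np

p_star = np.array([0.10, 0.05, 0.01, 0.02])      # hypothetical p*(S_{i,1..K})
K = len(p_star)
tail = np.array([p_star[t:].sum() for t in range(K)])

for k in range(K):                                # check P(h^0 = S_{i,k}) for every k
    prob = tail[0]                                # top RBM mass on S_{i,1}
    for t in range(k):                            # advance steps S_{i,t} -> S_{i,t+1}
        prob *= tail[t + 1] / tail[t]
    prob *= p_star[k] / tail[k]                   # the "stay" step that fixes S_{i,k}
    # all later layers copy S_{i,k} with probability 1, contributing a factor of 1
    assert np.isclose(prob, p_star[k])

print("telescoping product recovers p*(S_{i,k}) for every k")
```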

Theorem 4
If n = 2^t, a DBN composed of 2^n/n + 1 layers of size n is a universal approximator of distributions over vectors of size n.
Proof of Theorem 4: Using the Lemma, we now show that it is possible to construct such a DBN. First, Le Roux and Bengio (2008) showed that an RBM with n hidden units can model any distribution which assigns non-zero probability to at most n vectors. Property 1 of the Lemma can therefore be achieved.

Proof of Theorem 4
All the subsequent layers are as follows:
At each layer, the first t bits of h^{k+1} are copied to the first t bits of h^k with probability arbitrarily close to 1. This is possible, as proven earlier.
At each layer, n/2 of the remaining n − t bits are potentially changed, to move from one vector in a Gray code sequence to the next with the correct probability (as defined in the Lemma).
The remaining n/2 − t bits are copied from h^{k+1} to h^k with probability arbitrarily close to 1.
Such layers are arbitrarily close to fulfilling the requirements of the second property of the Lemma. This concludes the proof.

Conclusion
Deep belief networks are compact universal approximators.
Sutskever and Hinton's method (2008): transfer of probability; needs 3(n + 1)^2 · 2^n parameters, or 3 · 2^n layers.
Le Roux and Bengio's improvements (2010): Gray codes and simultaneous transfer of probability; needs n · 2^n parameters, or 2^n/n layers (given n is a power of 2).
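
For a sense of scale, the two layer-count formulas from the conclusion can be evaluated directly (this only tabulates the expressions stated above):

```python
# Layer counts for the two constructions summarized above (formulas from the slides).
for n in [4, 8, 16]:                    # n must be a power of two for the second bound
    sutskever_hinton = 3 * 2 ** n       # 3 * 2^n layers
    le_roux_bengio = 2 ** n // n        # 2^n / n layers
    print(f"n={n:2d}  3*2^n={sutskever_hinton:7d}   2^n/n={le_roux_bengio:5d}")
```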

References
[LeBe10] Le Roux, N. and Bengio, Y. (2010). Deep belief networks are compact universal approximators. Neural Computation, 22(8).
[LeBe08] Le Roux, N. and Bengio, Y. (2008). Representational power of restricted Boltzmann machines and deep belief networks. Neural Computation, 20(6).
[SuHi08] Sutskever, I. and Hinton, G. E. (2008). Deep, narrow sigmoid belief networks are universal approximators. Neural Computation, 20(11).
[HiOT06] Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18.
[Gray53] Gray, F. (1953). Pulse code communication. U.S. Patent 2,632,058.
[Cybe89] Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4).
