Au-delà de la Machine de Boltzmann Restreinte. Hugo Larochelle University of Toronto
|
|
- Aubrey Wilcox
- 6 years ago
- Views:
Transcription
1 Au-delà de la Machine de Boltzmann Restreinte Hugo Larochelle University of Toronto
2 Introduction Restricted Boltzmann Machines (RBMs) are useful feature extractors They are mostly used to initialize deep feed-forward neural networks Can the Boltzmann machine modeling framework be useful on its own?
3 Restricted Boltzmann Machine b j h hidden units (binary) bias W connections c k x input units (binary) Energy function : E(x, h) = h Wx c x b h = W jk h j x k c k x k jk k j b j h j Distribution: p(x, h) = exp( E(x, h))/z intractable in general
4 Inference x h p(h x) = j p(h j = 1 x) = p(h j x) exp( (b j + W j x)) = sigm(b j + W j x) h p(x h) = k p(x k h) x p(x k = 1 h) = exp( (c k + h W k )) = sigm(c k + h W k )
5 Inference h p(x) = h {0,1} H p(x, h) = h {0,1} H exp( E(x, h))/z x exp( E(x, h))/z = h {0,1} H h 1 {0,1} free energy = exp c x + = exp( F (x))/z F exp( E(x, h))/z h H {0,1} H j=1 log(1 + exp(b j + W j x)) /Z F
6 Inference: in general h marg { x unk : unknown variables, whose distribution we are interested in h x cond : known variables, on which we condition { x cond { x unk p(x unk x cond ) = x h marg : unknown variables, over which we marginalize exp(e(x h marg unk, x cond, h marg )) x exp(e(x unk h marg unk, x cond, h marg ))
7 Learning To train an RBM, we minimize the negative log-likelihood (NLL) of the data L gen (D train ) = 1 n x t D train log p(x t ) We proceed by stochastic gradient descent log p(x t ) θ = E h [ E(xt, h) θ ] x t [ ] E(x, h) E x,h θ { positive phase { negative phase
8 Contrastive Divergence (CD) (Hinton, Neural Computation, 2002) Idea: 1. replace expectation by an evaluation at a single point x, h 2. obtain x, h by MCMC sampling h t = h start sampling chain at x t h k = h p(h x) p(x h) x t x 1 x k = x
9 Contrastive Divergence (CD) (Hinton, Neural Computation, 2002) E h [ E(xt, h) θ ] x t E(x t, h t ) θ E x,h [ E(x, h) θ ] E( x, h) θ E(x, h) (x t, h t ) ( x, h)
10 Contrastive Divergence (CD) (Hinton, Neural Computation, 2002) CD-k: contrastive divergence with k Gibbs sampling steps In general, the bigger k is, the less biased is the estimate of the gradient In practice, k=1 works well For an RBM, the parameter updates won t actually need samples for h
11 Other learning algorithm: Persistent CD (PCD) (Tieleman, ICML2008) Idea: instead of always reinitializing the Gibbs chain to, use from the previous iteration instead x t x h t = h 0... h k = h p(h x) p(x h) x t x 1 x k = x
12 Other learning algorithm: Persistent CD (PCD) (Tieleman, ICML2008) Idea: instead of always reinitializing the Gibbs chain to, use from the previous iteration instead x t x h t = h 0... h k = h p(h x) p(x h) x 1 x k = x
13 Other learning algorithm: Persistent CD (PCD) (Tieleman, ICML2008) Idea: instead of always reinitializing the Gibbs chain to, use from the previous iteration instead x t x h t = h 0... h k = h p(h x) p(x h) x comes from previous iteration x 1 x k = x
14 Plan RBMs can initialize deep neural networks, but not a Boltzmann machine anymore Three extensions to RBMs: Classification RBM Restricted Boltzmann Forest Deep Boltzmann Machine (DBM)
15 Classification Restricted Boltzmann Machine b j h hidden units (binary) W connections c k x input units (binary) Energy function : E(x, h) = h Wx c x b h = W jk h j x k c k x k jk k j b j h j Distribution: p(x, h) = exp( E(x, h))/z
16 Classification Restricted Boltzmann Machine b j h hidden units (binary) W connections c k x input units (binary) Energy function : = h Wx c x b h = W jk h j x k c k x k jk k j b j h j Distribution:
17 Classification Restricted Boltzmann Machine b j h hidden units (binary) U W connections target units (multinomial) y c k x input units (binary) Energy function : E(x, h, y) = h Wx c x b h h Uy a y = W jk h j x k c k x k b j h j jk k j jl U jl h j y l l a l y l Distribution: p(x, h, y) = exp( E(x, h, y))/z
18 Inference: in general h marg { x unk : unknown variables, whose distribution we are interested in h x cond : known variables, on which we condition { x cond { x unk p(x unk x cond ) = x h marg : unknown variables, over which we marginalize exp(e(x h marg unk, x cond, h marg )) x exp(e(x unk h marg unk, x cond, h marg ))
19 Discriminative training (DRBM) (Larochelle and Bengio, ICML2008) What if we ditional are only likelihood interested of the in classification: targets: L disc (D train ) = D train i=1 log p(y i x i ) Since p(y x) can be computed exactly, we can also compute the discriminative gradient exactly: log p(y i x i ) θ where F (y, x) = d y = [ ] θ F (y i, x i ) E y xi θ F (y, x i) ( ( Hn log 1 + exp c j + U jy + j=1 )) d W ji x i i=1
20 Hybrid generative/discriminative training (HDRBM) (Larochelle and Bengio, ICML2008) We can explore the space between generative and discriminative training with (Bouchard & Triggs, 2004) α L hybrid (D train ) = L disc (D train ) + αl gen (D train ) Perform two updates, one according to each criterion Generative criterion acts as a data-dependent regulariser
21 Experiment: MNIST Model Error RBM (λ = 0.005, n = 6000) 3.39% DRBM (λ = 0.05, n = 500) 1.81% RBM+NNet 1.41% HDRBM (α = 0.01, λ = 0.05, n = 1500 ) 1.28% Sparse HDRBM (idem + n = 3000, δ = 10 4 ) 1.16% SVM 1.40% NNet 1.93% Subset of filters learned by the HDRBM on the MNIST dataset Some filters act as edge detectors (first row), some other filters are more specific to a particular digit shape (second row)
22 Experiment: 20newsgroup Model Error RBM (λ = , n = 1000) 24.9% DRBM (λ = , n = 50) 27.6% RBM+NNet 26.8% HDRBM (α = 0.005, λ = 0.1, n = 1000 ) 23.8% SVM 32.8% NNet 28.2% Newsgroup Most influencial words comp.sys.ibm.pc.hardware dos, ide, adaptec, pc, config, irq, vlb, bios, scsi, esdi, comp.sys.mac.hardware apple, mac, quadra, powerbook, lc, pds, centris alt.atheism bible, atheists, benedikt, atheism, religion talk.politics.guns firearms, handgun, firearm, gun, rkba, concealed rec.sport.hockey playoffs, penguins, didn, playoff, game, out, play
23 Restricted Boltzmann Forest (Larochelle et al., Neural Computation, to appear) How to obtain a tractable density estimator from an RBM? Answer: constrain RBM such that we can compute Z! Z = h {0,1} H x {0,1} d exp( E(x, h)) Z = h {0,1} H exp( F (h))number of hidden units must be small
24 Restricted Boltzmann Forests (Larochelle et al., Neural Computation, to appear) Alternative: constrain activations using a tree h W x
25 Restricted Boltzmann Forests (Larochelle et al., Neural Computation, to appear) Conditioned on x, the distribution of each tree is independant Using belief propagation (sum-product) can: sample hidden units compute unit expectations (probabilities) sum out all hidden units
26 Experiments Dataset Nb. Set sizes Diff. NLL Diff. NLL Diff. NL inputs (Train,Valid,Test) RBM RBM RBForest mult. RBFore adult ,1414, ± ± ± 0. connect ,4000, ± ± ± 0. dna ,600, ± ± ± 0. mushrooms ,500, ± ± ± 0. nips ,100, ± ± ± 1. ocr-letter ,10000, ± ± ± 0. rcv ,10000, ± ± ± 0. web ,3188, ± ± ± 0.
27 Deep Boltzmann Machines (DBM) (Salakhutdinov and Hinton, AISTATS2009) Deep Belief Network Deep Boltzmann Machine h 3 W 3 h 2 W 2 h 1 W 1 v E(v, h; θ) = v W 1 h 1 h 1 W 2 h 2 h 2 W 3 h 3
28 Learning Learning under a DBM requires a similar gradient log P (v t ) θ = E h [ E(vt, h; θ) θ ] v t E v,h [ E(v, h; θ) θ ] { positive phase { negative phase We use Persistent CD for the negative phase The positive phase now must be approximated
29 Learning In positive phase, a variational approach is used Q MF (h v; µ) = F 1 F 2 F 3 j=1 k=1 m=1 Procedure for minimizing { KL(P (h v), Q MF (h v; µ)) q(h 1 j)q(h 2 k)q(h 3 m) where ( D µ 1 j σ i=1 ( F1 µ 2 k σ j=1 ( F2 µ 3 m σ k=1 q(h l i = 1) = µl i for l = 1, 2, 3. nd on the log-probability of the d W 1 ijv i + W 2 jkµ 1 j + W 3 kmµ 2 k F 2 k=1 ). F 3 m=1 W 2 jkµ 2 k ), W 3 kmµ 3 m ),
30 Pretraining h 2 h 2 Pretraining h 3 h 3 W 3 W 3 RBM RBM Deep Boltzmann Machine h 3 W 3 h 2 2W 2 h 1 h 1 W 1 W 1 RBM h 1 W 2 W 1 v v v
31 Efficient Learning of Deep Boltzmann Machines (Salakhutdinov and Larochelle, AISTATS2010) Unfortunately, the learning procedure described so far is quite slow The major bottleneck is variational inference, which requires around 25 passes up and down the network before converging
32 Efficient Learning of Deep Boltzmann Machines (Salakhutdinov and Larochelle, AISTATS2010) Deep Boltzmann Machine Idea: suppose that we have relatively good mean-field recognition net Could initialize meanfield with recognition net output R 3 2R 2 2R 1 W 3 W 2 W 1
33 Efficient Learning of Deep Boltzmann Machines (Salakhutdinov and Larochelle, AISTATS2010) Pseudocode positive phase negative phase { { ν 1. get prediction from recognition net 2. initialize mean-field to µ ν and do only a few steps of mean-field 3. do Persistent CD 4. update DBM parameters 5. update recognition net parameters 6. go back to 1 until converged
34 Experiments Classification Dataset no top-down process DBM inference procedures MF-0 MF-1 MFRec-1 MF-5 MFRec-5 MF-Full DBN SVM K-NN MNIST 1.38% 1.15% 1.00% 1.01% 0.96% 0.95% 1.17% 1.40% 3.09% OCR letters 8.68% 8.44% 8.40% 8.50% 8.48% 8.58% 9.68% 9.70% 18.92% NORB 9.32% 7.96% 7.62% 7.67% 7.46% 7.23% 8.31% 11.60% 18.40% Inference speed on NORB (per 100 examples) MF-Rec1: 0.69 sec MF-5: 3.20 sec MF-full: sec Code is available online!
35 Conclusion Boltzmann machines provide a flexible framework for modeling data with latent variables Using tools from graphical models, it is possible to develop many extensions for the RBM Other variations: Conditional RBM (sequential data) Higher order RBM and more!
36 Merci!
A graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationThe Origin of Deep Learning. Lili Mou Jan, 2015
The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets
More informationGreedy Layer-Wise Training of Deep Networks
Greedy Layer-Wise Training of Deep Networks Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle NIPS 2007 Presented by Ahmed Hefny Story so far Deep neural nets are more expressive: Can learn
More informationUNSUPERVISED LEARNING
UNSUPERVISED LEARNING Topics Layer-wise (unsupervised) pre-training Restricted Boltzmann Machines Auto-encoders LAYER-WISE (UNSUPERVISED) PRE-TRAINING Breakthrough in 2006 Layer-wise (unsupervised) pre-training
More informationModeling Documents with a Deep Boltzmann Machine
Modeling Documents with a Deep Boltzmann Machine Nitish Srivastava, Ruslan Salakhutdinov & Geoffrey Hinton UAI 2013 Presented by Zhe Gan, Duke University November 14, 2014 1 / 15 Outline Replicated Softmax
More informationLearning Deep Architectures for AI. Part II - Vijay Chakilam
Learning Deep Architectures for AI - Yoshua Bengio Part II - Vijay Chakilam Limitations of Perceptron x1 W, b 0,1 1,1 y x2 weight plane output =1 output =0 There is no value for W and b such that the model
More informationDeep Learning Srihari. Deep Belief Nets. Sargur N. Srihari
Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous
More informationDensity estimation. Computing, and avoiding, partition functions. Iain Murray
Density estimation Computing, and avoiding, partition functions Roadmap: Motivation: density estimation Understanding annealing/tempering NADE Iain Murray School of Informatics, University of Edinburgh
More informationDeep unsupervised learning
Deep unsupervised learning Advanced data-mining Yongdai Kim Department of Statistics, Seoul National University, South Korea Unsupervised learning In machine learning, there are 3 kinds of learning paradigm.
More informationDeep Boltzmann Machines
Deep Boltzmann Machines Ruslan Salakutdinov and Geoffrey E. Hinton Amish Goel University of Illinois Urbana Champaign agoel10@illinois.edu December 2, 2016 Ruslan Salakutdinov and Geoffrey E. Hinton Amish
More informationStochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence
ESANN 0 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 7-9 April 0, idoc.com publ., ISBN 97-7707-. Stochastic Gradient
More informationLearning to Disentangle Factors of Variation with Manifold Learning
Learning to Disentangle Factors of Variation with Manifold Learning Scott Reed Kihyuk Sohn Yuting Zhang Honglak Lee University of Michigan, Department of Electrical Engineering and Computer Science 08
More informationLearning Deep Architectures
Learning Deep Architectures Yoshua Bengio, U. Montreal Microsoft Cambridge, U.K. July 7th, 2009, Montreal Thanks to: Aaron Courville, Pascal Vincent, Dumitru Erhan, Olivier Delalleau, Olivier Breuleux,
More informationReading Group on Deep Learning Session 4 Unsupervised Neural Networks
Reading Group on Deep Learning Session 4 Unsupervised Neural Networks Jakob Verbeek & Daan Wynen 206-09-22 Jakob Verbeek & Daan Wynen Unsupervised Neural Networks Outline Autoencoders Restricted) Boltzmann
More informationIntroduction to Restricted Boltzmann Machines
Introduction to Restricted Boltzmann Machines Ilija Bogunovic and Edo Collins EPFL {ilija.bogunovic,edo.collins}@epfl.ch October 13, 2014 Introduction Ingredients: 1. Probabilistic graphical models (undirected,
More informationFast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine
Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine Nitish Srivastava nitish@cs.toronto.edu Ruslan Salahutdinov rsalahu@cs.toronto.edu Geoffrey Hinton hinton@cs.toronto.edu
More informationLarge-Scale Feature Learning with Spike-and-Slab Sparse Coding
Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Ian J. Goodfellow, Aaron Courville, Yoshua Bengio ICML 2012 Presented by Xin Yuan January 17, 2013 1 Outline Contributions Spike-and-Slab
More informationKnowledge Extraction from DBNs for Images
Knowledge Extraction from DBNs for Images Son N. Tran and Artur d Avila Garcez Department of Computer Science City University London Contents 1 Introduction 2 Knowledge Extraction from DBNs 3 Experimental
More informationBias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions
- Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions Simon Luo The University of Sydney Data61, CSIRO simon.luo@data61.csiro.au Mahito Sugiyama National Institute of
More informationEmpirical Analysis of the Divergence of Gibbs Sampling Based Learning Algorithms for Restricted Boltzmann Machines
Empirical Analysis of the Divergence of Gibbs Sampling Based Learning Algorithms for Restricted Boltzmann Machines Asja Fischer and Christian Igel Institut für Neuroinformatik Ruhr-Universität Bochum,
More informationRestricted Boltzmann Machines
Restricted Boltzmann Machines http://deeplearning4.org/rbm-mnist-tutorial.html Slides from Hugo Larochelle, Geoffrey Hinton, and Yoshua Bengio CSC321: Intro to Machine Learning and Neural Networks, Winter
More informationUnsupervised Learning
CS 3750 Advanced Machine Learning hkc6@pitt.edu Unsupervised Learning Data: Just data, no labels Goal: Learn some underlying hidden structure of the data P(, ) P( ) Principle Component Analysis (Dimensionality
More informationAn Efficient Learning Procedure for Deep Boltzmann Machines
ARTICLE Communicated by Yoshua Bengio An Efficient Learning Procedure for Deep Boltzmann Machines Ruslan Salakhutdinov rsalakhu@utstat.toronto.edu Department of Statistics, University of Toronto, Toronto,
More informationarxiv: v2 [cs.ne] 22 Feb 2013
Sparse Penalty in Deep Belief Networks: Using the Mixed Norm Constraint arxiv:1301.3533v2 [cs.ne] 22 Feb 2013 Xanadu C. Halkias DYNI, LSIS, Universitè du Sud, Avenue de l Université - BP20132, 83957 LA
More informationLecture 16 Deep Neural Generative Models
Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed
More informationChapter 20. Deep Generative Models
Peng et al.: Deep Learning and Practice 1 Chapter 20 Deep Generative Models Peng et al.: Deep Learning and Practice 2 Generative Models Models that are able to Provide an estimate of the probability distribution
More informationUnsupervised Learning of Hierarchical Models. in collaboration with Josh Susskind and Vlad Mnih
Unsupervised Learning of Hierarchical Models Marc'Aurelio Ranzato Geoff Hinton in collaboration with Josh Susskind and Vlad Mnih Advanced Machine Learning, 9 March 2011 Example: facial expression recognition
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationRestricted Boltzmann Machines
Restricted Boltzmann Machines Boltzmann Machine(BM) A Boltzmann machine extends a stochastic Hopfield network to include hidden units. It has binary (0 or 1) visible vector unit x and hidden (latent) vector
More informationLearning Deep Boltzmann Machines using Adaptive MCMC
Ruslan Salakhutdinov Brain and Cognitive Sciences and CSAIL, MIT 77 Massachusetts Avenue, Cambridge, MA 02139 rsalakhu@mit.edu Abstract When modeling high-dimensional richly structured data, it is often
More informationJoint Training of Partially-Directed Deep Boltzmann Machines
Joint Training of Partially-Directed Deep Boltzmann Machines Ian J. Goodfellow goodfeli@iro.umontreal.ca Aaron Courville aaron.courville@umontreal.ca Yoshua Bengio Département d Informatique et de Recherche
More informationAn Efficient Learning Procedure for Deep Boltzmann Machines Ruslan Salakhutdinov and Geoffrey Hinton
Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-2010-037 August 4, 2010 An Efficient Learning Procedure for Deep Boltzmann Machines Ruslan Salakhutdinov and Geoffrey
More informationarxiv: v4 [cs.lg] 16 Apr 2015
REWEIGHTED WAKE-SLEEP Jörg Bornschein and Yoshua Bengio Department of Computer Science and Operations Research University of Montreal Montreal, Quebec, Canada ABSTRACT arxiv:1406.2751v4 [cs.lg] 16 Apr
More informationDeep Belief Networks are compact universal approximators
1 Deep Belief Networks are compact universal approximators Nicolas Le Roux 1, Yoshua Bengio 2 1 Microsoft Research Cambridge 2 University of Montreal Keywords: Deep Belief Networks, Universal Approximation
More informationTUTORIAL PART 1 Unsupervised Learning
TUTORIAL PART 1 Unsupervised Learning Marc'Aurelio Ranzato Department of Computer Science Univ. of Toronto ranzato@cs.toronto.edu Co-organizers: Honglak Lee, Yoshua Bengio, Geoff Hinton, Yann LeCun, Andrew
More informationPart 2. Representation Learning Algorithms
53 Part 2 Representation Learning Algorithms 54 A neural network = running several logistic regressions at the same time If we feed a vector of inputs through a bunch of logis;c regression func;ons, then
More informationLearning Deep Architectures
Learning Deep Architectures Yoshua Bengio, U. Montreal CIFAR NCAP Summer School 2009 August 6th, 2009, Montreal Main reference: Learning Deep Architectures for AI, Y. Bengio, to appear in Foundations and
More informationGaussian Cardinality Restricted Boltzmann Machines
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Gaussian Cardinality Restricted Boltzmann Machines Cheng Wan, Xiaoming Jin, Guiguang Ding and Dou Shen School of Software, Tsinghua
More informationRepresentational Power of Restricted Boltzmann Machines and Deep Belief Networks. Nicolas Le Roux and Yoshua Bengio Presented by Colin Graber
Representational Power of Restricted Boltzmann Machines and Deep Belief Networks Nicolas Le Roux and Yoshua Bengio Presented by Colin Graber Introduction Representational abilities of functions with some
More informationRestricted Boltzmann Machines for Collaborative Filtering
Restricted Boltzmann Machines for Collaborative Filtering Authors: Ruslan Salakhutdinov Andriy Mnih Geoffrey Hinton Benjamin Schwehn Presentation by: Ioan Stanculescu 1 Overview The Netflix prize problem
More informationAn Empirical Investigation of Minimum Probability Flow Learning Under Different Connectivity Patterns
An Empirical Investigation of Minimum Probability Flow Learning Under Different Connectivity Patterns Daniel Jiwoong Im, Ethan Buchman, and Graham W. Taylor School of Engineering University of Guelph Guelph,
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationRobust Classification using Boltzmann machines by Vasileios Vasilakakis
Robust Classification using Boltzmann machines by Vasileios Vasilakakis The scope of this report is to propose an architecture of Boltzmann machines that could be used in the context of classification,
More informationContrastive Divergence
Contrastive Divergence Training Products of Experts by Minimizing CD Hinton, 2002 Helmut Puhr Institute for Theoretical Computer Science TU Graz June 9, 2010 Contents 1 Theory 2 Argument 3 Contrastive
More informationDeep Learning & Neural Networks Lecture 2
Deep Learning & Neural Networks Lecture 2 Kevin Duh Graduate School of Information Science Nara Institute of Science and Technology Jan 16, 2014 2/45 Today s Topics 1 General Ideas in Deep Learning Motivation
More informationMeasuring the Usefulness of Hidden Units in Boltzmann Machines with Mutual Information
Measuring the Usefulness of Hidden Units in Boltzmann Machines with Mutual Information Mathias Berglund, Tapani Raiko, and KyungHyun Cho Department of Information and Computer Science Aalto University
More informationAnnealing Between Distributions by Averaging Moments
Annealing Between Distributions by Averaging Moments Chris J. Maddison Dept. of Comp. Sci. University of Toronto Roger Grosse CSAIL MIT Ruslan Salakhutdinov University of Toronto Partition Functions We
More informationHow to do backpropagation in a brain
How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep
More informationSpeaker Representation and Verification Part II. by Vasileios Vasilakakis
Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation
More informationDenoising Autoencoders
Denoising Autoencoders Oliver Worm, Daniel Leinfelder 20.11.2013 Oliver Worm, Daniel Leinfelder Denoising Autoencoders 20.11.2013 1 / 11 Introduction Poor initialisation can lead to local minima 1986 -
More informationDeep Neural Networks
Deep Neural Networks DT2118 Speech and Speaker Recognition Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 45 Outline State-to-Output Probability Model Artificial Neural Networks Perceptron Multi
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationCS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning
CS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning Professor Erik Sudderth Brown University Computer Science October 4, 2016 Some figures and materials courtesy
More informationSum-Product Networks: A New Deep Architecture
Sum-Product Networks: A New Deep Architecture Pedro Domingos Dept. Computer Science & Eng. University of Washington Joint work with Hoifung Poon 1 Graphical Models: Challenges Bayesian Network Markov Network
More informationNeural networks. Chapter 20. Chapter 20 1
Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms
More informationWhat Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1
What Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1 Multi-layer networks Steve Renals Machine Learning Practical MLP Lecture 3 7 October 2015 MLP Lecture 3 Multi-layer networks 2 What Do Single
More informationDiscriminative Learning of Sum-Product Networks. Robert Gens Pedro Domingos
Discriminative Learning of Sum-Product Networks Robert Gens Pedro Domingos X1 X1 X1 X1 X2 X2 X2 X2 X3 X3 X3 X3 X4 X4 X4 X4 X5 X5 X5 X5 X6 X6 X6 X6 Distributions X 1 X 1 X 1 X 1 X 2 X 2 X 2 X 2 X 3 X 3
More informationDeep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści
Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?
More informationThe Recurrent Temporal Restricted Boltzmann Machine
The Recurrent Temporal Restricted Boltzmann Machine Ilya Sutskever, Geoffrey Hinton, and Graham Taylor University of Toronto {ilya, hinton, gwtaylor}@cs.utoronto.ca Abstract The Temporal Restricted Boltzmann
More informationGaussian-Bernoulli Deep Boltzmann Machine
Gaussian-Bernoulli Deep Boltzmann Machine KyungHyun Cho, Tapani Raiko and Alexander Ilin Aalto University School of Science Department of Information and Computer Science Espoo, Finland {kyunghyun.cho,tapani.raiko,alexander.ilin}@aalto.fi
More informationInductive Principles for Restricted Boltzmann Machine Learning
Inductive Principles for Restricted Boltzmann Machine Learning Benjamin Marlin Department of Computer Science University of British Columbia Joint work with Kevin Swersky, Bo Chen and Nando de Freitas
More informationCOMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines. COMP9444 c Alan Blair, 2017
COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines COMP9444 17s2 Boltzmann Machines 1 Outline Content Addressable Memory Hopfield Network Generative Models Boltzmann Machine Restricted Boltzmann
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional
More informationARestricted Boltzmann machine (RBM) [1] is a probabilistic
1 Matrix Product Operator Restricted Boltzmann Machines Cong Chen, Kim Batselier, Ching-Yun Ko, and Ngai Wong chencong@eee.hku.hk, k.batselier@tudelft.nl, cyko@eee.hku.hk, nwong@eee.hku.hk arxiv:1811.04608v1
More informationDeep Learning Autoencoder Models
Deep Learning Autoencoder Models Davide Bacciu Dipartimento di Informatica Università di Pisa Intelligent Systems for Pattern Recognition (ISPR) Generative Models Wrap-up Deep Learning Module Lecture Generative
More informationarxiv: v2 [cs.lg] 16 Mar 2013
Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines arxiv:1301.3545v2 cs.lg 16 Mar 2013 Guillaume Desjardins, Razvan Pascanu, Aaron Courville and Yoshua Bengio Département d informatique
More informationDeep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)
CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information
More informationAlgorithms other than SGD. CS6787 Lecture 10 Fall 2017
Algorithms other than SGD CS6787 Lecture 10 Fall 2017 Machine learning is not just SGD Once a model is trained, we need to use it to classify new examples This inference task is not computed with SGD There
More informationThe connection of dropout and Bayesian statistics
The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University
More informationDeep Generative Models. (Unsupervised Learning)
Deep Generative Models (Unsupervised Learning) CEng 783 Deep Learning Fall 2017 Emre Akbaş Reminders Next week: project progress demos in class Describe your problem/goal What you have done so far What
More informationAuto-Encoding Variational Bayes. Stochastic Backpropagation and Approximate Inference in Deep Generative Models
Auto-Encoding Variational Bayes Diederik Kingma and Max Welling Stochastic Backpropagation and Approximate Inference in Deep Generative Models Danilo J. Rezende, Shakir Mohamed, Daan Wierstra Neural Variational
More informationABC-LogitBoost for Multi-Class Classification
Ping Li, Cornell University ABC-Boost BTRY 6520 Fall 2012 1 ABC-LogitBoost for Multi-Class Classification Ping Li Department of Statistical Science Cornell University 2 4 6 8 10 12 14 16 2 4 6 8 10 12
More informationDeep Belief Networks are Compact Universal Approximators
Deep Belief Networks are Compact Universal Approximators Franck Olivier Ndjakou Njeunje Applied Mathematics and Scientific Computation May 16, 2016 1 / 29 Outline 1 Introduction 2 Preliminaries Universal
More informationRuslan Salakhutdinov Joint work with Geoff Hinton. University of Toronto, Machine Learning Group
NON-LINEAR DIMENSIONALITY REDUCTION USING NEURAL NETORKS Ruslan Salakhutdinov Joint work with Geoff Hinton University of Toronto, Machine Learning Group Overview Document Retrieval Present layer-by-layer
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationAuto-Encoders & Variants
Auto-Encoders & Variants 113 Auto-Encoders MLP whose target output = input Reconstruc7on=decoder(encoder(input)), input e.g. x code= latent features h encoder decoder reconstruc7on r(x) With bo?leneck,
More informationUsually the estimation of the partition function is intractable and it becomes exponentially hard when the complexity of the model increases. However,
Odyssey 2012 The Speaker and Language Recognition Workshop 25-28 June 2012, Singapore First attempt of Boltzmann Machines for Speaker Verification Mohammed Senoussaoui 1,2, Najim Dehak 3, Patrick Kenny
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More informationLearning MN Parameters with Alternative Objective Functions. Sargur Srihari
Learning MN Parameters with Alternative Objective Functions Sargur srihari@cedar.buffalo.edu 1 Topics Max Likelihood & Contrastive Objectives Contrastive Objective Learning Methods Pseudo-likelihood Gradient
More informationDeep Learning Architectures and Algorithms
Deep Learning Architectures and Algorithms In-Jung Kim 2016. 12. 2. Agenda Introduction to Deep Learning RBM and Auto-Encoders Convolutional Neural Networks Recurrent Neural Networks Reinforcement Learning
More informationDeep Learning Basics Lecture 8: Autoencoder & DBM. Princeton University COS 495 Instructor: Yingyu Liang
Deep Learning Basics Lecture 8: Autoencoder & DBM Princeton University COS 495 Instructor: Yingyu Liang Autoencoder Autoencoder Neural networks trained to attempt to copy its input to its output Contain
More informationUsing Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes Ruslan Salakhutdinov and Geoffrey Hinton Department of Computer Science, University of Toronto 6 King s College Rd, M5S 3G4, Canada
More informationarxiv: v1 [cs.ne] 6 May 2014
Training Restricted Boltzmann Machine by Perturbation arxiv:1405.1436v1 [cs.ne] 6 May 2014 Siamak Ravanbakhsh, Russell Greiner Department of Computing Science University of Alberta {mravanba,rgreiner@ualberta.ca}
More informationTempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines
Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines Desjardins, G., Courville, A., Bengio, Y., Vincent, P. and Delalleau, O. Technical Report 1345, Dept. IRO, Université de
More informationIntroduction to Neural Networks
Introduction to Neural Networks Steve Renals Automatic Speech Recognition ASR Lecture 10 24 February 2014 ASR Lecture 10 Introduction to Neural Networks 1 Neural networks for speech recognition Introduction
More informationLearning Energy-Based Models of High-Dimensional Data
Learning Energy-Based Models of High-Dimensional Data Geoffrey Hinton Max Welling Yee-Whye Teh Simon Osindero www.cs.toronto.edu/~hinton/energybasedmodelsweb.htm Discovering causal structure as a goal
More informationEnhanced Gradient and Adaptive Learning Rate for Training Restricted Boltzmann Machines
Enhanced Gradient and Adaptive Learning Rate for Training Restricted Boltzmann Machines KyungHyun Cho KYUNGHYUN.CHO@AALTO.FI Tapani Raiko TAPANI.RAIKO@AALTO.FI Alexander Ilin ALEXANDER.ILIN@AALTO.FI Department
More informationModeling Natural Images with Higher-Order Boltzmann Machines
Modeling Natural Images with Higher-Order Boltzmann Machines Marc'Aurelio Ranzato Department of Computer Science Univ. of Toronto ranzato@cs.toronto.edu joint work with Geoffrey Hinton and Vlad Mnih CIFAR
More informationScaling Neighbourhood Methods
Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)
More informationDeep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)
CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information
More informationVariational Inference. Sargur Srihari
Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of discussion We first describe inference with PGMs and the intractability of exact inference Then give a taxonomy of inference algorithms
More informationLearning Tetris. 1 Tetris. February 3, 2009
Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are
More informationMultilayer Perceptron
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationHow to do backpropagation in a brain. Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto
1 How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto What is wrong with back-propagation? It requires labeled training data. (fixed) Almost
More informationInferring Sparsity: Compressed Sensing Using Generalized Restricted Boltzmann Machines. Eric W. Tramel. itwist 2016 Aalborg, DK 24 August 2016
Inferring Sparsity: Compressed Sensing Using Generalized Restricted Boltzmann Machines Eric W. Tramel itwist 2016 Aalborg, DK 24 August 2016 Andre MANOEL, Francesco CALTAGIRONE, Marylou GABRIE, Florent
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationDeep Belief Network Training Improvement Using Elite Samples Minimizing Free Energy
Deep Belief Network Training Improvement Using Elite Samples Minimizing Free Energy Moammad Ali Keyvanrad a, Moammad Medi Homayounpour a a Laboratory for Intelligent Multimedia Processing (LIMP), Computer
More informationCOMP 551 Applied Machine Learning Lecture 14: Neural Networks
COMP 551 Applied Machine Learning Lecture 14: Neural Networks Instructor: Ryan Lowe (ryan.lowe@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted,
More information