Denoising Autoencoders
1 Denoising Autoencoders
Oliver Worm, Daniel Leinfelder
2-6 Introduction
Poor initialisation can lead to local minima.
Rumelhart, Hinton, Williams [RHW88]: random initialization and gradient descent shows bad performance.
Hinton, Osindero, Teh [HOT06]: stacking Restricted Boltzmann Machines and tuning with the up-down algorithm shows very good performance.
Bengio, Lamblin, Popovici, Larochelle [BLP+07] [PCL06]: stacking autoencoders and tuning with gradient descent shows good performance.
Can we initialize it better?
7-12 Autoencoder
input x → (f_θ) → hidden representation y → (g_θ') → reconstructed input z, scored by the reconstruction error L(x, z)
x ∈ [0, 1]^d, y ∈ [0, 1]^d', z ∈ [0, 1]^d
Encoder: y = f_θ(x) = s(Wx + b), with θ = {W, b}
Decoder: z = g_θ'(y) = s(W'y + b'), with θ' = {W', b'}
Squared error: L(x, z) = ||x − z||²
Training minimizes the average reconstruction error over the n training examples:
θ*, θ'* = arg min_{θ, θ'} (1/n) Σ_{i=1}^{n} L(x^(i), g_θ'(f_θ(x^(i))))
Reconstruction cross-entropy:
L_H(x, z) = −Σ_{k=1}^{d} [x_k log z_k + (1 − x_k) log(1 − z_k)]
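The following is a minimal NumPy sketch of the model above, assuming a sigmoid for the squashing function s; the class and function names are ours, not from the original slides or [VLBM08]:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class Autoencoder:
    """y = f_theta(x) = s(Wx + b); z = g_theta'(y) = s(W'y + b')."""
    def __init__(self, d, d_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(d_hidden, d))        # encoder weights W
        self.b = np.zeros(d_hidden)                               # encoder bias b
        self.W_prime = rng.normal(0.0, 0.01, size=(d, d_hidden))  # decoder weights W'
        self.b_prime = np.zeros(d)                                # decoder bias b'

    def encode(self, x):   # y = f_theta(x)
        return sigmoid(self.W @ x + self.b)

    def decode(self, y):   # z = g_theta'(y)
        return sigmoid(self.W_prime @ y + self.b_prime)

def cross_entropy(x, z, eps=1e-8):
    """Reconstruction cross-entropy L_H(x, z) for x, z in [0, 1]^d."""
    z = np.clip(z, eps, 1.0 - eps)
    return -np.sum(x * np.log(z) + (1.0 - x) * np.log(1.0 - z))
```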
13-16 Denoising Autoencoder
corrupted input x̃ → (f_θ) → hidden representation y → (g_θ') → reconstruction z, scored by the reconstruction error L_H(x, z) against the uncorrupted input x
the input x ∈ [0, 1]^d is partially destroyed, giving the corrupted input x̃ ∼ q_D(x̃ | x)
x̃ is mapped to the hidden representation y = f_θ(x̃)
reconstruction from y gives z = g_θ'(y)
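A sketch of the corruption and of one stochastic gradient step on L_H, reusing the Autoencoder class above. Masking noise (zeroing a random fraction v of the components) is the corruption used in [VLBM08]; the gradient formulas below follow from pairing sigmoid outputs with the cross-entropy loss:

```python
def corrupt(x, v, rng):
    """q_D(x_tilde | x): destroy a fraction v of the components by setting them to 0."""
    return x * (rng.random(x.shape) >= v)

def train_step(ae, x, v, lr, rng):
    """One SGD step: the loss compares z = g(f(x_tilde)) with the *clean* x."""
    x_tilde = corrupt(x, v, rng)
    y = ae.encode(x_tilde)
    z = ae.decode(y)
    delta_out = z - x                                    # dL_H/da' for a sigmoid output
    delta_hid = (ae.W_prime.T @ delta_out) * y * (1 - y) # backprop through the encoder sigmoid
    ae.W_prime -= lr * np.outer(delta_out, y)
    ae.b_prime -= lr * delta_out
    ae.W -= lr * np.outer(delta_hid, x_tilde)
    ae.b -= lr * delta_hid
    return cross_entropy(x, z)
```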
17-19 Learning the layers
1. learn f_θ with a denoising autoencoder on the first layer
2. remove the autoencoder construction and use the learned mapping f_θ directly on the (uncorrupted) input
3. learn the next layer f_θ^(2) (with its own decoder g_θ'^(2) and corruption q_D) by repeating these steps on the previous layer's output, as sketched below
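One way the greedy procedure might look in code, assuming the train_step above; note that each new layer is trained on the previous layer's encoding of the clean input, while corruption is re-applied inside each layer's own training:

```python
def pretrain_stack(X, hidden_sizes, v=0.25, epochs=10, lr=0.1, seed=0):
    """Greedy layer-wise pre-training; X holds one training example per row."""
    rng = np.random.default_rng(seed)
    layers, H = [], X
    for d_hidden in hidden_sizes:
        ae = Autoencoder(H.shape[1], d_hidden, rng)
        for _ in range(epochs):
            for h in H:
                train_step(ae, h, v, lr, rng)     # corruption happens inside
        layers.append(ae)
        H = np.array([ae.encode(h) for h in H])   # keep f_theta only; g is discarded
    return layers
```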
20-22 Supervised fine-tuning
initialize the network f_θ, f_θ^(2), f_θ^(3) with the unsupervised pre-training above
continue with supervised learning of an output mapping f_θ^sup against the target
fine-tune the whole network with the supervised cost criterion
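A sketch of the fine-tuning stage under the same assumptions: stack the pre-trained encoders, put a supervised output layer f^sup on top, and backpropagate the supervised cost through every layer. The softmax head and one-hot targets are our illustration choices, not prescribed by the slides:

```python
def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def forward(layers, W_sup, b_sup, x):
    """Stack the pre-trained encoders, then the supervised output layer."""
    ys = [x]
    for ae in layers:
        ys.append(ae.encode(ys[-1]))
    return ys, softmax(W_sup @ ys[-1] + b_sup)

def finetune_step(layers, W_sup, b_sup, x, target, lr=0.1):
    """One supervised SGD step (target is one-hot); gradients reach all layers."""
    ys, p = forward(layers, W_sup, b_sup, x)
    delta = p - target                                   # softmax + cross-entropy gradient
    delta_top = (W_sup.T @ delta) * ys[-1] * (1 - ys[-1])
    W_sup -= lr * np.outer(delta, ys[-1])
    b_sup -= lr * delta
    delta = delta_top
    for ae, y_in in zip(reversed(layers), reversed(ys[:-1])):
        delta_prev = (ae.W.T @ delta) * y_in * (1 - y_in)  # propagate before updating
        ae.W -= lr * np.outer(delta, y_in)
        ae.b -= lr * delta
        delta = delta_prev
```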
23-26 Perspective view: Manifold
There are several perspectives on denoising autoencoders; here: learning a manifold.
the training data (x) lies near a low-dimensional manifold
a corrupted example (x̃) is obtained by applying q_D(X̃ | X) and generally lies farther from the manifold
learning the model p(X | X̃) projects corrupted examples back onto the manifold
27 Results
Test error rate (%) with a 95% confidence interval on the MNIST variants and shape datasets [VLBM08]; (v%) is the destruction level used for SdA-3:

Dataset      SVM_rbf         SAA-3           DBN-3           SdA-3 (v%)
basic        3.03 ± 0.15     3.46 ± 0.16     3.11 ± 0.15     2.80 ± 0.14 (10)
rot          11.11 ± 0.28    10.30 ± 0.27    10.30 ± 0.27    10.29 ± 0.27 (10)
bg-rand      14.58 ± 0.31    11.28 ± 0.28    6.73 ± 0.22     10.38 ± 0.27 (40)
bg-img       22.61 ± 0.37    23.00 ± 0.37    16.31 ± 0.32    16.68 ± 0.33 (25)
rot-bg-img   55.18 ± 0.44    51.93 ± 0.44    47.39 ± 0.44    44.49 ± 0.44 (25)
rect         2.15 ± 0.13     2.41 ± 0.13     2.60 ± 0.14     1.99 ± 0.12 (10)
rect-img     24.04 ± 0.37    24.05 ± 0.37    22.50 ± 0.37    21.59 ± 0.36 (25)
convex       19.13 ± 0.34    18.41 ± 0.34    18.63 ± 0.34    19.06 ± 0.34 (10)
28-31 Results
Results for destruction levels v = 0%, 10%, 25%, and 50% [figures omitted].
32-35 Summary
extending autoencoders to denoising autoencoders is simple
denoising helps to capture interesting structure in the input distribution
initialization with stacked denoising autoencoders performs better than with stacked basic autoencoders
denoising autoencoders even outperform deep belief networks whose layers are initialized as Restricted Boltzmann Machines [VLBM08]
36 References
[BLP+07] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems (NIPS). MIT Press, 2007.
[HOT06] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527-1554, July 2006.
[PCL06] Christopher Poultney, Sumit Chopra, and Yann LeCun. Efficient learning of sparse representations with an energy-based model. In Advances in Neural Information Processing Systems (NIPS). MIT Press, 2006.
[RHW88] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. In Neurocomputing: Foundations of Research. MIT Press, Cambridge, MA, USA, 1988.
[VLBM08] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (ICML '08), pages 1096-1103. ACM, New York, NY, USA, 2008.