Denoising Autoencoders


Oliver Worm, Daniel Leinfelder

Introduction

Poor initialization can lead to local minima.

- Rumelhart, Hinton, Williams [RHW88]: random initialization followed by gradient descent performs poorly.
- Hinton, Osindero, Teh [HOT06]: stacking Restricted Boltzmann Machines and fine-tuning with the up-down algorithm performs very well.
- Bengio, Lamblin, Popovici, Larochelle [BLP+07] [PCL06]: stacking autoencoders and fine-tuning with gradient descent performs well.

Can we initialize the network better?

Autoencoder

An autoencoder maps the input x to a hidden representation y (encoder $f_\theta$) and decodes y back into a reconstruction z (decoder $g_{\theta'}$), minimizing a reconstruction error $L(x, z)$.

- $x \in [0,1]^d$, $y \in [0,1]^{d'}$, $z \in [0,1]^d$
- Encoder: $y = f_\theta(x) = s(Wx + b)$ with $\theta = \{W, b\}$
- Decoder: $z = g_{\theta'}(y) = s(W'y + b')$ with $\theta' = \{W', b'\}$

Squared error: $L(x, z) = \|x - z\|^2$

The parameters are chosen to minimize the average reconstruction error over the training set:
$$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{n} \sum_{i=1}^{n} L\big(x^{(i)}, g_{\theta'}(f_\theta(x^{(i)}))\big)$$

Reconstruction cross-entropy (for inputs in $[0,1]^d$ interpreted as bit probabilities):
$$L_H(x, z) = -\sum_{k=1}^{d} \big[x_k \log z_k + (1 - x_k) \log(1 - z_k)\big]$$
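
To make the definitions concrete, the following is a minimal NumPy sketch of such an autoencoder (sigmoid activations, untied weights, reconstruction cross-entropy trained by stochastic gradient descent). The class name and default hyper-parameters are illustrative choices, not values from the slides; the step method takes the encoder input and the reconstruction target separately so the same code can be reused for the denoising variant below.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid, the non-linearity s(.) used above."""
    return 1.0 / (1.0 + np.exp(-a))

class Autoencoder:
    """Basic autoencoder with untied weights (illustrative sketch)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # encoder parameters theta = (W, b)
        self.W = rng.normal(0.0, 0.01, size=(n_hidden, n_visible))
        self.b = np.zeros(n_hidden)
        # decoder parameters theta' = (W', b')
        self.W_prime = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_prime = np.zeros(n_visible)

    def encode(self, x):
        # y = f_theta(x) = s(W x + b)
        return sigmoid(self.W @ x + self.b)

    def decode(self, y):
        # z = g_theta'(y) = s(W' y + b')
        return sigmoid(self.W_prime @ y + self.b_prime)

    def step(self, x_in, x_target, lr=0.1):
        """One stochastic gradient step on L_H(x_target, g_theta'(f_theta(x_in))).

        For the basic autoencoder x_in == x_target; keeping them separate
        lets the same code train the denoising variant."""
        y = self.encode(x_in)
        z = self.decode(y)
        # for a sigmoid output with cross-entropy loss, the gradient w.r.t.
        # the decoder pre-activation is simply (z - x_target)
        delta_out = z - x_target
        # back-propagate through the encoder before updating the decoder
        delta_hidden = (self.W_prime.T @ delta_out) * y * (1.0 - y)
        self.W_prime -= lr * np.outer(delta_out, y)
        self.b_prime -= lr * delta_out
        self.W -= lr * np.outer(delta_hidden, x_in)
        self.b -= lr * delta_hidden
        eps = 1e-10  # avoid log(0) when reporting the loss
        return -np.sum(x_target * np.log(z + eps)
                       + (1.0 - x_target) * np.log(1.0 - z + eps))
```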

Denoising Autoencoder

- The input $x \in [0,1]^d$ is partially destroyed, yielding a corrupted input $\tilde{x} \sim q_D(\tilde{x} \mid x)$.
- The corrupted input $\tilde{x}$ is mapped to the hidden representation $y = f_\theta(\tilde{x})$.
- Reconstruction from y gives $z = g_{\theta'}(y)$.
- The reconstruction error $L_H(x, z)$ is measured against the uncorrupted input x.
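
A minimal sketch of this training procedure, continuing the code above and assuming masking noise as the corruption process $q_D$ (a fraction v of the components is forced to zero); the default values of v, the learning rate, and the number of epochs are illustrative placeholders.

```python
def corrupt(x, v, rng):
    """Sample x_tilde ~ q_D(x_tilde | x): each component of x is destroyed
    (set to 0) with probability v, otherwise kept unchanged."""
    return x * (rng.random(x.shape) >= v)

def train_denoising_autoencoder(ae, data, v=0.25, lr=0.1, epochs=10, seed=0):
    """Train `ae` as a denoising autoencoder on `data`, an array of shape
    (n_examples, d) with entries in [0, 1]."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for x in data:
            x_tilde = corrupt(x, v, rng)  # partially destroy the input
            # encode the corrupted input, but compute L_H against the clean x
            ae.step(x_tilde, x, lr)
    return ae
```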

Learning the layers

1. Learn $f_\theta$ with a denoising autoencoder on the first layer.
2. Remove the autoencoder construction and apply the learned mapping $f_\theta$ directly to the (uncorrupted) input.
3. Learn the next layer $f^{(2)}_\theta$ by repeating these steps on the resulting representation, as sketched below.
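
A sketch of this greedy layer-wise procedure, reusing the Autoencoder class and training helper defined above; the layer sizes and hyper-parameters are placeholders, not values from the slides.

```python
def pretrain_stack(data, layer_sizes, v=0.25, lr=0.1, epochs=10):
    """Greedy layer-wise pre-training: train one denoising autoencoder per
    layer, then map the clean data through the learned encoder f_theta and
    use that representation as the input of the next layer."""
    representation = data
    encoders = []
    for n_hidden in layer_sizes:
        ae = Autoencoder(representation.shape[1], n_hidden)
        train_denoising_autoencoder(ae, representation, v=v, lr=lr, epochs=epochs)
        encoders.append(ae)
        # discard the decoder; keep only f_theta applied to the clean input
        representation = np.array([ae.encode(x) for x in representation])
    return encoders
```

For example, `pretrain_stack(X, [500, 500, 500])` would pre-train a stack of three hidden layers (the sizes here are arbitrary).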

Supervised fine-tuning

- Initialize the network with the unsupervised, layer-wise pre-training described above.
- Continue with supervised learning for an output mapping $f^{sup}_\theta$ that predicts the target.
- Fine-tune the whole network (the stack $f_\theta, f^{(2)}_\theta, f^{(3)}_\theta, f^{sup}_\theta$) with the supervised cost.
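
A sketch of this fine-tuning stage, assuming a softmax output layer trained with cross-entropy as the supervised criterion; the function name and hyper-parameters are illustrative, not taken from the slides.

```python
def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def fine_tune(encoders, data, labels, n_classes, lr=0.1, epochs=10, seed=0):
    """Supervised fine-tuning: initialize a feed-forward network from the
    pre-trained encoders, add a softmax output layer f_sup, and adjust all
    layers by back-propagating the supervised cross-entropy cost."""
    rng = np.random.default_rng(seed)
    Ws = [ae.W.copy() for ae in encoders]   # weights initialized by pre-training
    bs = [ae.b.copy() for ae in encoders]
    W_sup = rng.normal(0.0, 0.01, size=(n_classes, Ws[-1].shape[0]))
    b_sup = np.zeros(n_classes)
    for _ in range(epochs):
        for x, t in zip(data, labels):
            # forward pass through the pre-trained stack
            acts = [x]
            for W, b in zip(Ws, bs):
                acts.append(sigmoid(W @ acts[-1] + b))
            p = softmax(W_sup @ acts[-1] + b_sup)
            target = np.zeros(n_classes)
            target[t] = 1.0                 # labels assumed to be class indices
            delta = p - target              # gradient at the softmax pre-activation
            delta_h = (W_sup.T @ delta) * acts[-1] * (1.0 - acts[-1])
            W_sup -= lr * np.outer(delta, acts[-1])
            b_sup -= lr * delta
            # back-propagate through every pre-trained layer (fine-tuning)
            for i in range(len(Ws) - 1, -1, -1):
                grad_W = np.outer(delta_h, acts[i])
                grad_b = delta_h
                if i > 0:
                    delta_h = (Ws[i].T @ delta_h) * acts[i] * (1.0 - acts[i])
                Ws[i] -= lr * grad_W
                bs[i] -= lr * grad_b
    return Ws, bs, W_sup, b_sup
```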

Perspective view: Manifold

There are several ways to interpret denoising autoencoders; one of them is learning a manifold.

- The training data (x) lies near a low-dimensional manifold.
- A corrupted example ($\tilde{x}$) is obtained by applying $q_D(\tilde{X} \mid X)$ and will typically lie farther from the manifold.
- Learning the model $p(X \mid \tilde{X})$ projects corrupted points back onto the manifold.

Results

Test error rates (in %) with 95% confidence intervals on the benchmark datasets of [VLBM08] (the MNIST variants basic, rot, bg-rand, bg-img and rot-bg-img, plus the rect, rect-img and convex shape datasets), comparing SVM_rbf, SAA-3, DBN-3 and SdA-3; the fraction v of corrupted inputs used for SdA-3 is given in parentheses.

Results (figures)

Figures comparing results for corruption levels v = 0%, 10%, 25% and 50%.

Summary

- Extending autoencoders to denoising autoencoders is simple.
- Denoising helps to capture interesting structure in the input distribution.
- Initialization with stacked denoising autoencoders performs better than initialization with stacked basic autoencoders.
- Stacked denoising autoencoders even outperform deep belief networks whose layers are initialized as Restricted Boltzmann Machines [VLBM08].

References

[BLP+07] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems (NIPS). MIT Press, 2007.

[HOT06] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527-1554, July 2006.

[PCL06] Christopher Poultney, Sumit Chopra, and Yann LeCun. Efficient learning of sparse representations with an energy-based model. In Advances in Neural Information Processing Systems (NIPS). MIT Press, 2006.

[RHW88] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. In Neurocomputing: Foundations of Research. MIT Press, Cambridge, MA, USA, 1988.

[VLBM08] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (ICML '08), pages 1096-1103, New York, NY, USA, 2008. ACM.
