TUTORIAL PART 1 Unsupervised Learning

Size: px
Start display at page:

Download "TUTORIAL PART 1 Unsupervised Learning"

Transcription

1 TUTORIAL PART 1 Unsupervised Learning Marc'Aurelio Ranzato Department of Computer Science Univ. of Toronto ranzato@cs.toronto.edu Co-organizers: Honglak Lee, Yoshua Bengio, Geoff Hinton, Yann LeCun, Andrew Ng Deep Learning and Unsupervised Feature Learning Workshop, 10 Dec. 2010

2 Feature Learning Learning algorithm Input Motorbikes Non -Motorbikes color Input space brightness kindly borrowed from Andrew Ng ECCV10

3 Feature Learning Feature Extractor Learning algorithm Input Motorbikes Non -Motorbikes Input space color wheel Input space brightness handle kindly borrowed from Andrew Ng ECCV10

4 How is computer perception done? Object detection Image Low-level vision features Recognition Audio classification Audio Low-level audio features Speaker identification Helicopter control Helicopter Low-level state features Action

5 Computer vision features SIFT Spin image HoG RIFT Textons GLOH kindly borrowed from Andrew Ng ECCV10

6 Audio features MFCC Spectrogram Flux ZCR Rolloff kindly borrowed from Andrew Ng ECCV10

7 Engineering features: Need expert knowledge Sub-optimal Time-consuming and expensive Does not generalize to other domain

8 The goal of Unsupervised Feature Learning Unlabeled images Learning algorithm Feature representation kindly borrowed from Andrew Ng ECCV10

9 Outline What is Unsupervised Learning? Unsupervised Learning Algorithms Comparing Unsupervised Learning Algorithms (Tutorial II) Deep Learning

10 Unsupervised Learning Data points belonging to 3 classes

11 Unsupervised Learning No labels are provided during training

12 Unsupervised Learning Fit mixture of 3 Gaussians: use responsibility to represent a data point (indicative of its class)

13 Unsupervised Learning Unsupervised Learning Density estimation Latent variables ( features, possibly useful for discrimination)

14 Unsupervised Learning Unsupervised Learning Density estimation Latent variables ( features, possibly useful for discrimination) Energy-based interpretation Each data-point x has associated energy E(x) Training has to make E(x) lower for x in training set E(x) x

15 Unsupervised Learning Unsupervised Learning Density estimation Latent variables ( features, possibly useful for discrimination) Energy-based interpretation Each data-point x has associated energy E(x) Training has to make E(x) lower for x in training set E(x) BEFORE TRAINING x

16 Unsupervised Learning Unsupervised Learning Density estimation Latent variables ( features, possibly useful for discrimination) Energy-based interpretation Each data-point x has associated energy E(x) Training has to make E(x) lower for x in training set E(x) AFTER TRAINING x

17 Principal Component Analysis 2 E X, Z ; W = X W Z Feature: Z =W ' X, it must be lower dimensional Training: minimize E s.t. orthogonality constraint Input data points Reconstructions (1D feature space)

18 Principal Component Analysis 2 E X, Z ; W = X W Z Feature: Z =W ' X, it must be lower dimensional Training: minimize E s.t. orthogonality constraint Input data points Reconstructions (1D feature space)

19 Principal Component Analysis 2 E X, Z ; W = X W Z Feature: Z =W ' X PROS Simple training (tuning free) Unique solution Fast CONS Feature must be lower dimensional Features are linear

20 Auto-encoder Neural Network 2 E X, Z ; W = X f W Z Feature: Z =g A X, lower dimensional Training: minimize E Input data points Reconstructions (1D feature space)

21 Auto-encoder Neural Network 2 E X, Z ; W = X f W Z Feature: Z =g A X, lower dimensional PROS Non-linear features Pretty fast training CONS Feature must be lower dimensional A few hyper-parameters Optimization becomes hard if highly non-linear

22 Denoising Auto-encoder 2 E X, Z ; W = X f W Z Feature: Z =g A X n, n is noise Training: minimize E Input data points Reconstructions (1D feature space)

23 Denoising Auto-encoder 2 E X, Z ; W = X f W Z Feature: Z =g A X n PROS Non-linear features Pretty fast training Robustness to noise in the input Possibly higher dimensional features Can check convergence CONS A few hyper-parameters Optimization becomes hard if highly non-linear Choice of noise distribution

24 K-Means 2 E X, Z ; W = X W Z Feature: Z 1-of-N code Training: minimize E Input data points Reconstructions (2 prototypes)

25 K-Means 2 E X, Z ; W = X W Z Feature: Z 1-of-N code PROS Simple training (tuning free) Fast CONS One might need lots of prototypes to cover highdimensional space Representation is too sparse

26 Sparse Coding 2 E X, Z ; W = X W Z Z 1 Feature: Z sparse Training: minimize E (coordinate descent) Input data points Reconstructions (2 components)

27 Sparse Coding 2 E X, Z ; W = X W Z Z 1 Feature: Z sparse PROS Possibly higher-dimensional features It often yields more interpretable features Biologically plausible (?) CONS Expensive training Expensive inference Need to tune

28 Predictive Sparse Coding 2 2 E X, Z ; W = X W Z Z 1 Z g A ' X Feature: Z sparse PROS Possibly higher-dimensional features Fast inference CONS Expensive training Need to tune

29 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X All variables are binary: X i {0,1}, Z j {0,1} E = w 11 X 1 Z 1 w 12 X 1 Z 2 w 21 X 2 Z 1... Z1 Z2 W X1 X2 X3

30 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X exp E X, Z ;W p X, Z ; W = x ' z ' exp E x ', z ' ; W Z1 Z2 W X1 X2 X3

31 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X exp E X, Z ;W p X, Z ; W = x ' z ' exp E x ', z ' ; W Z1 INTRACTABLE Z2 W X1 X2 X3

32 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X 1 u = 1 exp u p X =1 Z ; W = j W j Z, p Z =1 X ; W = i W i ' X Z1 Z2 W X1 X2 X3

33 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X Easy conditionals: p Z X ; W = W ' X efficient Gibbs sampling p X Z ; W = WZ Z1 Z2 W X1 X2 X3

34 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X Easy conditionals: p Z X ; W = W ' X efficient Gibbs sampling p X Z ; W = WZ Z1 p Z X ; W X1 X2 X3 Z2

35 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X Easy conditionals: p Z X ; W = W ' X efficient Gibbs sampling p X Z ; W = WZ Z1 Z2 p X Z ; W X1 X2 X3

36 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X Easy conditionals: p Z X ; W = W ' X efficient Gibbs sampling p X Z ; W = WZ Z1 p Z X ; W X1 X2 X3 Z2...

37 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X p X Z ; W = WZ Subsequently used as features p Z X ; W = W ' X Z1 Z2 W X1 X2 X3

38 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X Loss= log p X ; W W W z t E X, z ;W E x, z ;W t p z X x, z p x, z W W

39 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X Loss= log p X ; W W E(x) W z t E X, z ;W E x, z ;W t p z X x, z p x, z W W BEFORE UPDATE x

40 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X Loss= log p X ; W W E(x) W z t E X, z ;W E x, z ;W t p z X x, z p x, z W W AFTER UPDATE x

41 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X Loss= log p X ; W W E(x) W z t m E X, z ;W E X,z ;W t m p z X z p z X W W USING MCMC x

42 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X Loss= log p X ; W W W z t m E X, z ;W E X,z ;W t m p z X z p z X W W In practice, Gibbs sampler takes too long to converge: Contrastive Divergence Persistent Contrastive Divergence Fast Persistent Contrastive Divergence Score Matching Ratio Matching Margin-Based Losses Variational Methods

43 Restricted Boltzmann Machine E X, Z ; W = Z ' W ' X Feature: Z =g W ' X PROS Possibly higher-dimensional features It can generate data Simple interpretation of learning rule Simple to extend variables to other distributions CONS Exact learning is intractable Approximations do not let easily assess convergence A few hyper-parameters to tune

44 Comparison PCA Auto-encoder K-Means Sparse Coding RBM Denoising Auto-enc. Linear Features yes no no no no no

45 Comparison PCA Auto-encoder K-Means Sparse Coding RBM Denoising Auto-enc. Sparse features no (but it can be added) no (but it can be added) yes yes no (but it can be added) no (but it can be added)

46 Comparison PCA Auto-encoder K-Means Sparse Coding RBM Denoising Auto-enc. Energy pull-up restriction on code restriction on code restriction on code restriction on code partition function noise to input

47 Comparison PCA on 8x8 patches ICA on 8x8 patches

48 Comparing Unsupervised Algorithms Properties of features Reconstruction error Likelihood on test data Discriminative performance of classifier trained on features Statistical dependency of components Denoising performance Other tasks

49 References: RBM Hinton Training Product of Experts by minimizing contrastive divergence, Neural Computation 2001 Welling, Rosen-Zvi, Hinton Exponential Family Harmoniums with an Application to Information Retrieval, NIPS 2005 code (matlab) (python using gnumpy module to run on a GPU)

50 References: sparse RBM Lee, Ekanadham, Ng Sparse deep belief net model for visual area V2 NIPS 2008 Lee, Grosse, Ranganath, Ng Convolutional Deep Belief Networks for scalable unsupervised learning of hierarchical representations, ICML 2009

51 References: S-RBM Osindero, Hinton Modeling image patches with a directed hierarchy of markov random fields NIPS 2008

52 References: mcrbm Ranzato, Hinton Modeling pixel means and covariances using factorized third-order Boltzmann machines CVPR 2010 Ranzato, Mnih, Hinton Generating more realistic images using gated MRF NIPS 2010 code (python code using CUDAMAT to run on a GPU)

53 References: Sparse Coding Olshausen, Field Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research 1997 Lee, Battle, Raina, Ng Efficient sparse coding algorithms NIPS 2007 Lee, Raina, Teichman, Ng Exponential family sparse coding with applications to self-taught learning IJCAI 2009 code (Julien Mairal's work on fast sparse coding methods with several extensions) (see also work by M. Elad et al. on K-SVD, another sparse coding algorithm)

54 References: Predictive Sparse Coding Kavukcuoglu, Ranzato, LeCun, "Fast inference in sparse coding algorithms with applications to object recognition," CBLL Technical Report, December ArXiv Kavukcuoglu, Sermanet, Boureau, Gregor, Mathieu, LeCun, "Learning Convolutional Feature Hierachies for Visual Recognition" NIPS 2010 code (basic algorithm for predictive sparse decomposition)

55 References: Local Coordinate Coding Yu, Zhang, Gong Nonlinear learning using local coordinate coding NIPS 2009 Lin, Zhang, Zhu, Yu Deep coding networks NIPS 2010

56 References: Product of Student's t Osindero, Welling, Hinton "Topographic product models applied to natural scene statistics," Neural Computation 2006

57 References: Denoising Auto-encoders Pascal, Larochelle, Bengio, Manzagol Extracting and composing robust features with denoising autoencoders ICML 2008 code code written in Theano, a python library with interface to GPU, developed in Y. Bengio's lab, see more at:

58 End of Part 1 Any questions?

Unsupervised Learning of Hierarchical Models. in collaboration with Josh Susskind and Vlad Mnih

Unsupervised Learning of Hierarchical Models. in collaboration with Josh Susskind and Vlad Mnih Unsupervised Learning of Hierarchical Models Marc'Aurelio Ranzato Geoff Hinton in collaboration with Josh Susskind and Vlad Mnih Advanced Machine Learning, 9 March 2011 Example: facial expression recognition

More information

Modeling Natural Images with Higher-Order Boltzmann Machines

Modeling Natural Images with Higher-Order Boltzmann Machines Modeling Natural Images with Higher-Order Boltzmann Machines Marc'Aurelio Ranzato Department of Computer Science Univ. of Toronto ranzato@cs.toronto.edu joint work with Geoffrey Hinton and Vlad Mnih CIFAR

More information

Efficient Learning of Sparse, Distributed, Convolutional Feature Representations for Object Recognition

Efficient Learning of Sparse, Distributed, Convolutional Feature Representations for Object Recognition Efficient Learning of Sparse, Distributed, Convolutional Feature Representations for Object Recognition Kihyuk Sohn Dae Yon Jung Honglak Lee Alfred O. Hero III Dept. of Electrical Engineering and Computer

More information

Learning Deep Architectures

Learning Deep Architectures Learning Deep Architectures Yoshua Bengio, U. Montreal Microsoft Cambridge, U.K. July 7th, 2009, Montreal Thanks to: Aaron Courville, Pascal Vincent, Dumitru Erhan, Olivier Delalleau, Olivier Breuleux,

More information

Learning Deep Architectures for AI. Part II - Vijay Chakilam

Learning Deep Architectures for AI. Part II - Vijay Chakilam Learning Deep Architectures for AI - Yoshua Bengio Part II - Vijay Chakilam Limitations of Perceptron x1 W, b 0,1 1,1 y x2 weight plane output =1 output =0 There is no value for W and b such that the model

More information

CONVOLUTIONAL DEEP BELIEF NETWORKS

CONVOLUTIONAL DEEP BELIEF NETWORKS CONVOLUTIONAL DEEP BELIEF NETWORKS Talk by Emanuele Coviello Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations Honglak Lee Roger Grosse Rajesh Ranganath

More information

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang Deep Learning Basics Lecture 7: Factor Analysis Princeton University COS 495 Instructor: Yingyu Liang Supervised v.s. Unsupervised Math formulation for supervised learning Given training data x i, y i

More information

Deep Generative Models. (Unsupervised Learning)

Deep Generative Models. (Unsupervised Learning) Deep Generative Models (Unsupervised Learning) CEng 783 Deep Learning Fall 2017 Emre Akbaş Reminders Next week: project progress demos in class Describe your problem/goal What you have done so far What

More information

Lecture 16 Deep Neural Generative Models

Lecture 16 Deep Neural Generative Models Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed

More information

Reading Group on Deep Learning Session 4 Unsupervised Neural Networks

Reading Group on Deep Learning Session 4 Unsupervised Neural Networks Reading Group on Deep Learning Session 4 Unsupervised Neural Networks Jakob Verbeek & Daan Wynen 206-09-22 Jakob Verbeek & Daan Wynen Unsupervised Neural Networks Outline Autoencoders Restricted) Boltzmann

More information

Unsupervised Learning

Unsupervised Learning CS 3750 Advanced Machine Learning hkc6@pitt.edu Unsupervised Learning Data: Just data, no labels Goal: Learn some underlying hidden structure of the data P(, ) P( ) Principle Component Analysis (Dimensionality

More information

Greedy Layer-Wise Training of Deep Networks

Greedy Layer-Wise Training of Deep Networks Greedy Layer-Wise Training of Deep Networks Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle NIPS 2007 Presented by Ahmed Hefny Story so far Deep neural nets are more expressive: Can learn

More information

UNSUPERVISED LEARNING

UNSUPERVISED LEARNING UNSUPERVISED LEARNING Topics Layer-wise (unsupervised) pre-training Restricted Boltzmann Machines Auto-encoders LAYER-WISE (UNSUPERVISED) PRE-TRAINING Breakthrough in 2006 Layer-wise (unsupervised) pre-training

More information

Deep Learning Autoencoder Models

Deep Learning Autoencoder Models Deep Learning Autoencoder Models Davide Bacciu Dipartimento di Informatica Università di Pisa Intelligent Systems for Pattern Recognition (ISPR) Generative Models Wrap-up Deep Learning Module Lecture Generative

More information

Gaussian Cardinality Restricted Boltzmann Machines

Gaussian Cardinality Restricted Boltzmann Machines Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Gaussian Cardinality Restricted Boltzmann Machines Cheng Wan, Xiaoming Jin, Guiguang Ding and Dou Shen School of Software, Tsinghua

More information

Deep Learning of Invariant Spatiotemporal Features from Video. Bo Chen, Jo-Anne Ting, Ben Marlin, Nando de Freitas University of British Columbia

Deep Learning of Invariant Spatiotemporal Features from Video. Bo Chen, Jo-Anne Ting, Ben Marlin, Nando de Freitas University of British Columbia Deep Learning of Invariant Spatiotemporal Features from Video Bo Chen, Jo-Anne Ting, Ben Marlin, Nando de Freitas University of British Columbia Introduction Focus: Unsupervised feature extraction from

More information

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?

More information

Deep Belief Networks are compact universal approximators

Deep Belief Networks are compact universal approximators 1 Deep Belief Networks are compact universal approximators Nicolas Le Roux 1, Yoshua Bengio 2 1 Microsoft Research Cambridge 2 University of Montreal Keywords: Deep Belief Networks, Universal Approximation

More information

Learning Deep Architectures

Learning Deep Architectures Learning Deep Architectures Yoshua Bengio, U. Montreal CIFAR NCAP Summer School 2009 August 6th, 2009, Montreal Main reference: Learning Deep Architectures for AI, Y. Bengio, to appear in Foundations and

More information

Measuring the Usefulness of Hidden Units in Boltzmann Machines with Mutual Information

Measuring the Usefulness of Hidden Units in Boltzmann Machines with Mutual Information Measuring the Usefulness of Hidden Units in Boltzmann Machines with Mutual Information Mathias Berglund, Tapani Raiko, and KyungHyun Cho Department of Information and Computer Science Aalto University

More information

Neural Networks. William Cohen [pilfered from: Ziv; Geoff Hinton; Yoshua Bengio; Yann LeCun; Hongkak Lee - NIPs 2010 tutorial ]

Neural Networks. William Cohen [pilfered from: Ziv; Geoff Hinton; Yoshua Bengio; Yann LeCun; Hongkak Lee - NIPs 2010 tutorial ] Neural Networks William Cohen 10-601 [pilfered from: Ziv; Geoff Hinton; Yoshua Bengio; Yann LeCun; Hongkak Lee - NIPs 2010 tutorial ] WHAT ARE NEURAL NETWORKS? William s notation Logis;c regression + 1

More information

Denoising Autoencoders

Denoising Autoencoders Denoising Autoencoders Oliver Worm, Daniel Leinfelder 20.11.2013 Oliver Worm, Daniel Leinfelder Denoising Autoencoders 20.11.2013 1 / 11 Introduction Poor initialisation can lead to local minima 1986 -

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Ian J. Goodfellow, Aaron Courville, Yoshua Bengio ICML 2012 Presented by Xin Yuan January 17, 2013 1 Outline Contributions Spike-and-Slab

More information

Deep Learning & Neural Networks Lecture 2

Deep Learning & Neural Networks Lecture 2 Deep Learning & Neural Networks Lecture 2 Kevin Duh Graduate School of Information Science Nara Institute of Science and Technology Jan 16, 2014 2/45 Today s Topics 1 General Ideas in Deep Learning Motivation

More information

Feature Design. Feature Design. Feature Design. & Deep Learning

Feature Design. Feature Design. Feature Design. & Deep Learning Artificial Intelligence and its applications Lecture 9 & Deep Learning Professor Daniel Yeung danyeung@ieee.org Dr. Patrick Chan patrickchan@ieee.org South China University of Technology, China Appropriately

More information

Part 2. Representation Learning Algorithms

Part 2. Representation Learning Algorithms 53 Part 2 Representation Learning Algorithms 54 A neural network = running several logistic regressions at the same time If we feed a vector of inputs through a bunch of logis;c regression func;ons, then

More information

A Spike and Slab Restricted Boltzmann Machine

A Spike and Slab Restricted Boltzmann Machine Aaron Courville James Bergstra Yoshua Bengio DIRO, Université de Montréal, Montréal, Québec, Canada {courvila,bergstrj,bengioy}@iro.umontreal.ca Abstract We introduce the spike and slab Restricted Boltzmann

More information

Knowledge Extraction from DBNs for Images

Knowledge Extraction from DBNs for Images Knowledge Extraction from DBNs for Images Son N. Tran and Artur d Avila Garcez Department of Computer Science City University London Contents 1 Introduction 2 Knowledge Extraction from DBNs 3 Experimental

More information

Usually the estimation of the partition function is intractable and it becomes exponentially hard when the complexity of the model increases. However,

Usually the estimation of the partition function is intractable and it becomes exponentially hard when the complexity of the model increases. However, Odyssey 2012 The Speaker and Language Recognition Workshop 25-28 June 2012, Singapore First attempt of Boltzmann Machines for Speaker Verification Mohammed Senoussaoui 1,2, Najim Dehak 3, Patrick Kenny

More information

Deep Learning Basics Lecture 8: Autoencoder & DBM. Princeton University COS 495 Instructor: Yingyu Liang

Deep Learning Basics Lecture 8: Autoencoder & DBM. Princeton University COS 495 Instructor: Yingyu Liang Deep Learning Basics Lecture 8: Autoencoder & DBM Princeton University COS 495 Instructor: Yingyu Liang Autoencoder Autoencoder Neural networks trained to attempt to copy its input to its output Contain

More information

Improved Local Coordinate Coding using Local Tangents

Improved Local Coordinate Coding using Local Tangents Improved Local Coordinate Coding using Local Tangents Kai Yu NEC Laboratories America, 10081 N. Wolfe Road, Cupertino, CA 95129 Tong Zhang Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854

More information

WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY,

WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WITH IMPLICATIONS FOR TRAINING Sanjeev Arora, Yingyu Liang & Tengyu Ma Department of Computer Science Princeton University Princeton, NJ 08540, USA {arora,yingyul,tengyu}@cs.princeton.edu

More information

A Unified Energy-Based Framework for Unsupervised Learning

A Unified Energy-Based Framework for Unsupervised Learning A Unified Energy-Based Framework for Unsupervised Learning Marc Aurelio Ranzato Y-Lan Boureau Sumit Chopra Yann LeCun Courant Insitute of Mathematical Sciences New York University, New York, NY 10003 Abstract

More information

Tensor Methods for Feature Learning

Tensor Methods for Feature Learning Tensor Methods for Feature Learning Anima Anandkumar U.C. Irvine Feature Learning For Efficient Classification Find good transformations of input for improved classification Figures used attributed to

More information

Deep unsupervised learning

Deep unsupervised learning Deep unsupervised learning Advanced data-mining Yongdai Kim Department of Statistics, Seoul National University, South Korea Unsupervised learning In machine learning, there are 3 kinds of learning paradigm.

More information

Deep Learning of Invariant Spatio-Temporal Features from Video

Deep Learning of Invariant Spatio-Temporal Features from Video Deep Learning of Invariant Spatio-Temporal Features from Video Bo Chen California Institute of Technology Pasadena, CA, USA bchen3@caltech.edu Benjamin M. Marlin University of British Columbia Vancouver,

More information

An Introduction to Deep Learning

An Introduction to Deep Learning An Introduction to Deep Learning Ludovic Arnold 1,2, Sébastien Rebecchi 1, Sylvain Chevallier 1, Hélène Paugam-Moisy 1,3 1- Tao, INRIA-Saclay, LRI, UMR8623, Université Paris-Sud 11 F-91405 Orsay, France

More information

Speaker Representation and Verification Part II. by Vasileios Vasilakakis

Speaker Representation and Verification Part II. by Vasileios Vasilakakis Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation

More information

CSC321 Lecture 20: Autoencoders

CSC321 Lecture 20: Autoencoders CSC321 Lecture 20: Autoencoders Roger Grosse Roger Grosse CSC321 Lecture 20: Autoencoders 1 / 16 Overview Latent variable models so far: mixture models Boltzmann machines Both of these involve discrete

More information

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous

More information

Backpropagation Rules for Sparse Coding (Task-Driven Dictionary Learning)

Backpropagation Rules for Sparse Coding (Task-Driven Dictionary Learning) Backpropagation Rules for Sparse Coding (Task-Driven Dictionary Learning) Julien Mairal UC Berkeley Edinburgh, ICML, June 2012 Julien Mairal, UC Berkeley Backpropagation Rules for Sparse Coding 1/57 Other

More information

The XOR problem. Machine learning for vision. The XOR problem. The XOR problem. x 1 x 2. x 2. x 1. Fall Roland Memisevic

The XOR problem. Machine learning for vision. The XOR problem. The XOR problem. x 1 x 2. x 2. x 1. Fall Roland Memisevic The XOR problem Fall 2013 x 2 Lecture 9, February 25, 2015 x 1 The XOR problem The XOR problem x 1 x 2 x 2 x 1 (picture adapted from Bishop 2006) It s the features, stupid It s the features, stupid The

More information

Deep Learning Architectures and Algorithms

Deep Learning Architectures and Algorithms Deep Learning Architectures and Algorithms In-Jung Kim 2016. 12. 2. Agenda Introduction to Deep Learning RBM and Auto-Encoders Convolutional Neural Networks Recurrent Neural Networks Reinforcement Learning

More information

arxiv: v1 [stat.ml] 30 Aug 2010

arxiv: v1 [stat.ml] 30 Aug 2010 Sparse Group Restricted Boltzmann Machines arxiv:008.4988v [stat.ml] 30 Aug 200 Heng Luo Department of Computer Science Shanghai Jiao Tong University hengluo@sjtu.edu.cn Changyong Niu Department of Computer

More information

The Recurrent Temporal Restricted Boltzmann Machine

The Recurrent Temporal Restricted Boltzmann Machine The Recurrent Temporal Restricted Boltzmann Machine Ilya Sutskever, Geoffrey Hinton, and Graham Taylor University of Toronto {ilya, hinton, gwtaylor}@cs.utoronto.ca Abstract The Temporal Restricted Boltzmann

More information

Energy Based Models. Stefano Ermon, Aditya Grover. Stanford University. Lecture 13

Energy Based Models. Stefano Ermon, Aditya Grover. Stanford University. Lecture 13 Energy Based Models Stefano Ermon, Aditya Grover Stanford University Lecture 13 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 13 1 / 21 Summary Story so far Representation: Latent

More information

Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)

Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight) CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information

More information

Variational Autoencoders. Presented by Alex Beatson Materials from Yann LeCun, Jaan Altosaar, Shakir Mohamed

Variational Autoencoders. Presented by Alex Beatson Materials from Yann LeCun, Jaan Altosaar, Shakir Mohamed Variational Autoencoders Presented by Alex Beatson Materials from Yann LeCun, Jaan Altosaar, Shakir Mohamed Contents 1. Why unsupervised learning, and why generative models? (Selected slides from Yann

More information

Knowledge Extraction from Deep Belief Networks for Images

Knowledge Extraction from Deep Belief Networks for Images Knowledge Extraction from Deep Belief Networks for Images Son N. Tran City University London Northampton Square, ECV 0HB, UK Son.Tran.@city.ac.uk Artur d Avila Garcez City University London Northampton

More information

Bias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions

Bias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions - Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions Simon Luo The University of Sydney Data61, CSIRO simon.luo@data61.csiro.au Mahito Sugiyama National Institute of

More information

Asaf Bar Zvi Adi Hayat. Semantic Segmentation

Asaf Bar Zvi Adi Hayat. Semantic Segmentation Asaf Bar Zvi Adi Hayat Semantic Segmentation Today s Topics Fully Convolutional Networks (FCN) (CVPR 2015) Conditional Random Fields as Recurrent Neural Networks (ICCV 2015) Gaussian Conditional random

More information

A Multi-task Learning Strategy for Unsupervised Clustering via Explicitly Separating the Commonality

A Multi-task Learning Strategy for Unsupervised Clustering via Explicitly Separating the Commonality A Multi-task Learning Strategy for Unsupervised Clustering via Explicitly Separating the Commonality Shu Kong, Donghui Wang Dept. of Computer Science and Technology, Zhejiang University, Hangzhou 317,

More information

Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes

Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes Ruslan Salakhutdinov and Geoffrey Hinton Department of Computer Science, University of Toronto 6 King s College Rd, M5S 3G4, Canada

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

How to do backpropagation in a brain

How to do backpropagation in a brain How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep

More information

Dimensionality Reduction and Principle Components Analysis

Dimensionality Reduction and Principle Components Analysis Dimensionality Reduction and Principle Components Analysis 1 Outline What is dimensionality reduction? Principle Components Analysis (PCA) Example (Bishop, ch 12) PCA vs linear regression PCA as a mixture

More information

Restricted Boltzmann Machines

Restricted Boltzmann Machines Restricted Boltzmann Machines http://deeplearning4.org/rbm-mnist-tutorial.html Slides from Hugo Larochelle, Geoffrey Hinton, and Yoshua Bengio CSC321: Intro to Machine Learning and Neural Networks, Winter

More information

Neural networks and optimization

Neural networks and optimization Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional

More information

Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)

Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight) CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information

More information

On autoencoder scoring

On autoencoder scoring Hanna Kamyshanska kamyshanska@fias.uni-frankfurt.de Goethe Universität Frankfurt, Robert-Mayer-Str. 11-15, 60325 Frankfurt, Germany Roland Memisevic memisevr@iro.umontreal.ca University of Montreal, CP

More information

Deep Learning: a gentle introduction

Deep Learning: a gentle introduction Deep Learning: a gentle introduction Jamal Atif jamal.atif@dauphine.fr PSL, Université Paris-Dauphine, LAMSADE February 8, 206 Jamal Atif (Université Paris-Dauphine) Deep Learning February 8, 206 / Why

More information

Au-delà de la Machine de Boltzmann Restreinte. Hugo Larochelle University of Toronto

Au-delà de la Machine de Boltzmann Restreinte. Hugo Larochelle University of Toronto Au-delà de la Machine de Boltzmann Restreinte Hugo Larochelle University of Toronto Introduction Restricted Boltzmann Machines (RBMs) are useful feature extractors They are mostly used to initialize deep

More information

An efficient way to learn deep generative models

An efficient way to learn deep generative models An efficient way to learn deep generative models Geoffrey Hinton Canadian Institute for Advanced Research & Department of Computer Science University of Toronto Joint work with: Ruslan Salakhutdinov, Yee-Whye

More information

Unsupervised Feature Learning from Temporal Data

Unsupervised Feature Learning from Temporal Data Unsupervised Feature Learning from Temporal Data Rostislav Goroshin 1 goroshin@cims.nyu.edu Joan Bruna 1 bruna@cims.nyu.edu Jonathan Tompson 1 tompson@cims.nyu.edu Arthur Szlam 2 aszlam@ccny.cuny.edu David

More information

Learning features by contrasting natural images with noise

Learning features by contrasting natural images with noise Learning features by contrasting natural images with noise Michael Gutmann 1 and Aapo Hyvärinen 12 1 Dept. of Computer Science and HIIT, University of Helsinki, P.O. Box 68, FIN-00014 University of Helsinki,

More information

Auto-Encoders & Variants

Auto-Encoders & Variants Auto-Encoders & Variants 113 Auto-Encoders MLP whose target output = input Reconstruc7on=decoder(encoder(input)), input e.g. x code= latent features h encoder decoder reconstruc7on r(x) With bo?leneck,

More information

arxiv: v1 [cs.lg] 30 Jun 2012

arxiv: v1 [cs.lg] 30 Jun 2012 Implicit Density Estimation by Local Moment Matching to Sample from Auto-Encoders arxiv:1207.0057v1 [cs.lg] 30 Jun 2012 Yoshua Bengio, Guillaume Alain, and Salah Rifai Department of Computer Science and

More information

Kai Yu NEC Laboratories America, Cupertino, California, USA

Kai Yu NEC Laboratories America, Cupertino, California, USA Kai Yu NEC Laboratories America, Cupertino, California, USA Joint work with Jinjun Wang, Fengjun Lv, Wei Xu, Yihong Gong Xi Zhou, Jianchao Yang, Thomas Huang, Tong Zhang Chen Wu NEC Laboratories America

More information

Introduction to Deep Learning

Introduction to Deep Learning Introduction to Deep Learning A. G. Schwing & S. Fidler University of Toronto, 2015 A. G. Schwing & S. Fidler (UofT) CSC420: Intro to Image Understanding 2015 1 / 39 Outline 1 Universality of Neural Networks

More information

Measuring Invariances in Deep Networks

Measuring Invariances in Deep Networks Measuring Invariances in Deep Networks Ian J. Goodfellow, Quoc V. Le, Andrew M. Saxe, Honglak Lee, Andrew Y. Ng Computer Science Department Stanford University Stanford, CA 9435 {ia3n,quocle,asaxe,hllee,ang}@cs.stanford.edu

More information

Nonparametric Bayesian Dictionary Learning for Machine Listening

Nonparametric Bayesian Dictionary Learning for Machine Listening Nonparametric Bayesian Dictionary Learning for Machine Listening Dawen Liang Electrical Engineering dl2771@columbia.edu 1 Introduction Machine listening, i.e., giving machines the ability to extract useful

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA

More information

Introduction to Deep Learning

Introduction to Deep Learning Introduction to Deep Learning A. G. Schwing & S. Fidler University of Toronto, 2014 A. G. Schwing & S. Fidler (UofT) CSC420: Intro to Image Understanding 2014 1 / 35 Outline 1 Universality of Neural Networks

More information

Introduction to Deep Neural Networks

Introduction to Deep Neural Networks Introduction to Deep Neural Networks Presenter: Chunyuan Li Pattern Classification and Recognition (ECE 681.01) Duke University April, 2016 Outline 1 Background and Preliminaries Why DNNs? Model: Logistic

More information

Generative models for missing value completion

Generative models for missing value completion Generative models for missing value completion Kousuke Ariga Department of Computer Science and Engineering University of Washington Seattle, WA 98105 koar8470@cs.washington.edu Abstract Deep generative

More information

Lecture 14: Deep Generative Learning

Lecture 14: Deep Generative Learning Generative Modeling CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 14: Deep Generative Learning Density estimation Reconstructing probability density function using samples Bohyung Han

More information

A Generative Perspective on MRFs in Low-Level Vision Supplemental Material

A Generative Perspective on MRFs in Low-Level Vision Supplemental Material A Generative Perspective on MRFs in Low-Level Vision Supplemental Material Uwe Schmidt Qi Gao Stefan Roth Department of Computer Science, TU Darmstadt 1. Derivations 1.1. Sampling the Prior We first rewrite

More information

Learning Tetris. 1 Tetris. February 3, 2009

Learning Tetris. 1 Tetris. February 3, 2009 Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are

More information

Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence

Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence ESANN 0 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 7-9 April 0, idoc.com publ., ISBN 97-7707-. Stochastic Gradient

More information

Sum-Product Networks: A New Deep Architecture

Sum-Product Networks: A New Deep Architecture Sum-Product Networks: A New Deep Architecture Pedro Domingos Dept. Computer Science & Eng. University of Washington Joint work with Hoifung Poon 1 Graphical Models: Challenges Bayesian Network Markov Network

More information

arxiv: v1 [stat.ml] 2 Sep 2014

arxiv: v1 [stat.ml] 2 Sep 2014 On the Equivalence Between Deep NADE and Generative Stochastic Networks Li Yao, Sherjil Ozair, Kyunghyun Cho, and Yoshua Bengio Département d Informatique et de Recherche Opérationelle Université de Montréal

More information

Implementation of a Restricted Boltzmann Machine in a Spiking Neural Network

Implementation of a Restricted Boltzmann Machine in a Spiking Neural Network Implementation of a Restricted Boltzmann Machine in a Spiking Neural Network Srinjoy Das Department of Electrical and Computer Engineering University of California, San Diego srinjoyd@gmail.com Bruno Umbria

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

Machine Learning for Signal Processing Sparse and Overcomplete Representations. Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013

Machine Learning for Signal Processing Sparse and Overcomplete Representations. Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013 Machine Learning for Signal Processing Sparse and Overcomplete Representations Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013 1 Key Topics in this Lecture Basics Component-based representations

More information

DepthQualificationExam Presentation

DepthQualificationExam Presentation DepthQualificationExam Presentation Li Wan, Dept. of Computer Science, Courant Institute, New York University Depth Qualification Exam Presentation p. 1/29 OverviewofTalk 1. Literature Survey (a) Approaches

More information

arxiv: v5 [cs.lg] 19 Aug 2014

arxiv: v5 [cs.lg] 19 Aug 2014 What Regularized Auto-Encoders Learn from the Data Generating Distribution Guillaume Alain and Yoshua Bengio guillaume.alain@umontreal.ca, yoshua.bengio@umontreal.ca arxiv:111.446v5 cs.lg] 19 Aug 014 Department

More information

Deep Generative Stochastic Networks Trainable by Backprop

Deep Generative Stochastic Networks Trainable by Backprop Yoshua Bengio FIND.US@ON.THE.WEB Éric Thibodeau-Laufer Guillaume Alain Département d informatique et recherche opérationnelle, Université de Montréal, & Canadian Inst. for Advanced Research Jason Yosinski

More information

Unsupervised Neural Nets

Unsupervised Neural Nets Unsupervised Neural Nets (and ICA) Lyle Ungar (with contributions from Quoc Le, Socher & Manning) Lyle Ungar, University of Pennsylvania Semi-Supervised Learning Hypothesis:%P(c x)%can%be%more%accurately%computed%using%

More information

Convolutional Neural Networks. Srikumar Ramalingam

Convolutional Neural Networks. Srikumar Ramalingam Convolutional Neural Networks Srikumar Ramalingam Reference Many of the slides are prepared using the following resources: neuralnetworksanddeeplearning.com (mainly Chapter 6) http://cs231n.github.io/convolutional-networks/

More information

Autoencoders and Score Matching. Based Models. Kevin Swersky Marc Aurelio Ranzato David Buchman Benjamin M. Marlin Nando de Freitas

Autoencoders and Score Matching. Based Models. Kevin Swersky Marc Aurelio Ranzato David Buchman Benjamin M. Marlin Nando de Freitas On for Energy Based Models Kevin Swersky Marc Aurelio Ranzato David Buchman Benjamin M. Marlin Nando de Freitas Toronto Machine Learning Group Meeting, 2011 Motivation Models Learning Goal: Unsupervised

More information

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement Simon Leglaive 1 Laurent Girin 1,2 Radu Horaud 1 1: Inria Grenoble Rhône-Alpes 2: Univ. Grenoble Alpes, Grenoble INP,

More information

Cardinality Restricted Boltzmann Machines

Cardinality Restricted Boltzmann Machines Cardinality Restricted Boltzmann Machines Kevin Swersky Daniel Tarlow Ilya Sutskever Dept. of Computer Science University of Toronto [kswersky,dtarlow,ilya]@cs.toronto.edu Ruslan Salakhutdinov, Richard

More information

Competitive Learning for Deep Temporal Networks

Competitive Learning for Deep Temporal Networks Competitive Learning for Deep Temporal Networks Robert Gens Computer Science and Engineering University of Washington Seattle, WA 98195 rcg@cs.washington.edu Pedro Domingos Computer Science and Engineering

More information

An Efficient Learning Procedure for Deep Boltzmann Machines

An Efficient Learning Procedure for Deep Boltzmann Machines ARTICLE Communicated by Yoshua Bengio An Efficient Learning Procedure for Deep Boltzmann Machines Ruslan Salakhutdinov rsalakhu@utstat.toronto.edu Department of Statistics, University of Toronto, Toronto,

More information

Introduction to Convolutional Neural Networks (CNNs)

Introduction to Convolutional Neural Networks (CNNs) Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei

More information

Denoising Criterion for Variational Auto-Encoding Framework

Denoising Criterion for Variational Auto-Encoding Framework Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) Denoising Criterion for Variational Auto-Encoding Framework Daniel Jiwoong Im, Sungjin Ahn, Roland Memisevic, Yoshua

More information

Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives

Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives 1 Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives Yoshua Bengio, Aaron Courville, and Pascal Vincent Department of computer science and operations research, U. Montreal arxiv:1206.5538v1

More information

The Origin of Deep Learning. Lili Mou Jan, 2015

The Origin of Deep Learning. Lili Mou Jan, 2015 The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets

More information