Modeling Natural Images with Higher-Order Boltzmann Machines

1 Modeling Natural Images with Higher-Order Boltzmann Machines. Marc'Aurelio Ranzato, Department of Computer Science, Univ. of Toronto. Joint work with Geoffrey Hinton and Vlad Mnih. CIFAR summer school, 12 August 2010.

2 How to model natural images? We want a joint distribution p(v,h) over points in image space, where v are the pixels and h the latent features.

3 A generative perspective: p(v|h). PCA (Pearson 1901).

4 A generative perspective: p(v|h). PCA (Pearson 1901); FA: p(v|h) = N(Wh, D) (Young 1940).

5 A generative perspective: p(v|h). For models like PCA and FA, a sample from p(v|h) scatters around the mean Wh: the structure of the data point is lost!

6 A generative perspective: p(v|h). PPCA: p(v|h) = N(Wh, σ²I) (Tipping & Bishop 1999).

7 A generative perspective: p(v|h). Sparse Coding: p(v|h) = N(Wh, σ²I), with a sparse prior on h (Olshausen & Field 1996; also Welling, Rosen-Zvi & Hinton (GRBM) and ICA).

8 A generative perspective: p(v|h). PoT: p(v|h) = N(0, (C diag(h) C')^-1) (Welling et al. 2002; also Wainwright et al. (GSM) and Karklin & Lewicki 2003).

9 A generative perspective: p(v|h). For PoT, a sample from p(v|h) stays near the data point: structure is preserved!

10 A generative perspective: p(v|h). mcRBM (Ranzato & Hinton 2010).

11 A generative perspective: p(v|h). mcRBM (Ranzato & Hinton 2010): THIS IS NOT A MIXTURE OF GAUSSIANS!!

12 Why model the covariance, p(v|h)? Example: an image with two pixels, v1 and v2; assume we subtract off the mean from the image. Most of the time the two pixels take the same value. [figure: input data and p(v|h) in the (v1, v2) plane]

13 Why model the covariance, p(v|h)? Most of the time the two pixels take the same value, with some exceptions! [figure: input data and p(v|h) in the (v1, v2) plane]

14 Why model the covariance, p(v|h)? With one binary hidden unit we can pick the Gaussian with the right covariance; this is like PoT. A toy sketch follows. [figure: input data and the selected p(v|h)]
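To make the two-pixel picture concrete, here is a minimal numpy sketch (my own illustration with made-up numbers, not code from the talk): one binary hidden unit switches the conditional Gaussian between a tight "pixels nearly equal" covariance and a broad covariance that covers the exceptions.

```python
import numpy as np

# Two candidate covariances for p(v | h) over the pixel pair (v1, v2):
# h = 1: pixels nearly equal (mass concentrated along the diagonal v1 = v2);
# h = 0: broad, isotropic covariance that also covers the exceptions.
SIGMA = {
    1: np.array([[1.00, 0.95],
                 [0.95, 1.00]]),
    0: np.eye(2),
}

def sample_v_given_h(h, rng, n=5):
    """Draw pixel pairs from the Gaussian selected by the binary hidden unit."""
    return rng.multivariate_normal(mean=np.zeros(2), cov=SIGMA[h], size=n)

rng = np.random.default_rng(0)
print(sample_v_given_h(1, rng))  # points hugging the line v1 = v2
print(sample_v_given_h(0, rng))  # scattered points: the exceptions
```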

15 Why model both mean and covariance, p(v|h)? We can produce an even more precise fit by modeling the mean. [figure: fit without the mean]

16 Why model both mean and covariance, p(v|h)? We can produce an even more precise fit by modeling the mean. [figure: fit with the mean]

17 Why model both mean and covariance, p(v|h)? When the data is not centered, the effect is even more dramatic: without the mean, the model fails to capture the edge! [figure: fit without the mean]

18 Why model both mean and covariance, p(v|h)? With the mean, the edge is captured. [figure: fit with the mean]

19 mcRBM (Ranzato & Hinton 2010): the hiddens determine an image-specific mean and an image-specific covariance. How do we modulate the mean and the covariance using hidden units? p(v, h^m, h^c) ∝ exp(-E(v, h^m, h^c)).

20 First option: p(v | h^m, h^c) = N(m(h^m), Σ(h^c)), with energy E = 1/2 (v - m)' Σ^-1 (v - m). Easy generation, but slow inference. [diagram: h^m and h^c both gating v]

21 First option (easy generation, slow inference): E = 1/2 (v - m)' Σ^-1 (v - m), giving p(v | h^m, h^c) = N(m(h^m), Σ(h^c)). Second option (less easy generation, fast inference): E = 1/2 v' Σ^-1 v - m' v, giving p(v | h^m, h^c) = N(Σ(h^c) m(h^m), Σ(h^c)).

22 Geometric interpretation: if we multiply them (the Gaussian carrying the covariance and the Gaussian carrying the mean), we get...

23 Geometric interpretation: if we multiply them, we get... a sharper and shorter Gaussian.
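The fact behind this slide is the standard product-of-Gaussians identity (a general result, not specific to the talk): precisions add, so the product is sharper, and the unnormalized peak shrinks, so it is shorter. A quick numeric check:

```python
import numpy as np

def product_of_gaussians(m1, S1, m2, S2):
    """N(m1, S1) * N(m2, S2) is proportional to N(m, S), where
    S^-1 = S1^-1 + S2^-1 (precisions add) and m is a precision-weighted mean."""
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    S = np.linalg.inv(P1 + P2)
    m = S @ (P1 @ m1 + P2 @ m2)
    return m, S

m, S = product_of_gaussians(np.array([1.0, 0.0]), np.eye(2),
                            np.zeros(2), np.array([[1.0, 0.9],
                                                   [0.9, 1.0]]))
print(m)                 # mean pulled between the two component means
print(np.linalg.det(S))  # smaller than either input's determinant: sharper
```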

24 We start by modeling small image patches. [diagram: joint model p(v,h) with real-valued visibles v (the pixels) and binary hiddens h in {0,1}]

25 Modeling the covariance only (using binary hiddens): cRBM (Ranzato, Krizhevsky & Hinton, AISTATS 2010). E = 1/2 v' Σ^-1 v.

26 Gated MRF: E(v, h^c) = w1 v1 v2 h^c_1 + ..., so the pairwise interactions are determined by the state of the latent variables. In matrix form: E(v, h^c) = 1/2 v' Σ^-1 v, with Σ^-1 = C diag(P h^c) C' and h^c_k ∈ {0,1}. [diagram: hidden unit h^c_1 gating the interaction between v1 and v2]
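A small numpy sketch of this precision construction (toy shapes of my choosing, not the talk's code): given covariance filters C, a non-negative pooling matrix P, and binary covariance hiddens h_c, build Σ^-1 = C diag(P h_c) C'.

```python
import numpy as np

rng = np.random.default_rng(0)
D, F, K = 4, 6, 3                 # toy sizes: pixels, filters, covariance hiddens
C = rng.standard_normal((D, F))   # filter matrix (columns are filters)
P = rng.random((F, K))            # non-negative pooling matrix
h_c = np.array([1, 0, 1])         # one setting of the binary covariance hiddens

# Precision of the zero-mean conditional Gaussian over the pixels:
# turning a hidden ON adds its pooled filters' outer products to Sigma^-1.
precision = C @ np.diag(P @ h_c) @ C.T
print(np.round(precision, 2))     # (D, D), symmetric and PSD since P @ h_c >= 0
```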

27 So far we modeled the covariance only... now we also add the mean: mcRBM (Ranzato & Hinton, CVPR 2010). E = 1/2 v' Σ^-1 v - m' v.

28 E(v, h^m, h^c) = 1/2 v' Σ^-1 v - Σ_ij W_ij v_i h^m_j: the cRBM covariance term plus a mean term. The conditional over the visibles is Gaussian with a non-zero mean that depends on both sets of hiddens: p(v | h^c, h^m) = N(Σ W h^m, Σ). [diagram: h^c (cRBM part) and h^m both connected to v1, v2]
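Continuing in the same toy setting, a self-contained sketch of sampling from this conditional (W, h_m and all shapes are illustrative; a small ridge keeps the toy precision invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
D, F, K, J = 4, 6, 3, 5                  # toy sizes: pixels, filters, h^c, h^m
C = rng.standard_normal((D, F))          # covariance filters
P = rng.random((F, K))                   # non-negative pooling matrix
W = rng.standard_normal((D, J))          # mean filters
h_c = rng.integers(0, 2, size=K)         # binary covariance hiddens
h_m = rng.integers(0, 2, size=J)         # binary mean hiddens

# Sigma^-1 = C diag(P h_c) C' (plus a small ridge so the toy matrix inverts)
Sigma = np.linalg.inv(C @ np.diag(P @ h_c) @ C.T + 1e-3 * np.eye(D))

# p(v | h^c, h^m) = N(Sigma W h_m, Sigma): the mean depends on both hidden sets.
v = rng.multivariate_normal(Sigma @ (W @ h_m), Sigma)
print(np.round(v, 2))
```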

29 Interpreting mcRBM. Looking at p(v|h): relation to PCA, FA, PoT, etc.

30 Interpreting mcRBM. Looking at p(v|h): relation to PCA, FA, PoT, etc. Looking at the hiddens h^c: relation to line processes and PoT (Geman et al. 84, Blake et al. 87, Black et al. 96).

31 Interpreting mcRBM. Looking at p(v|h): relation to PCA, FA, PoT, etc. Looking at the hiddens h^c: relation to line processes and PoT (Geman et al. 84, Blake et al. 87, Black et al. 96). Looking at E(v,h): a 3rd-order Boltzmann machine, with relation to the conditional 3-way RBM (Memisevic et al. 07).

32 Interpreting mcRBM. Looking at p(v|h): relation to PCA, FA, PoT, etc. Looking at the hiddens h^c: relation to line processes and PoT (Geman et al. 84, Blake et al. 87, Black et al. 96). Looking at E(v,h): a 3rd-order Boltzmann machine, with relation to the conditional 3-way RBM (Memisevic et al. 07). Looking at p(h|v): p(h^c_k = 1 | v) = σ(-1/2 Σ_f P_fk (C_f' v)² + b_k), with relation to the simple-complex cell model.
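A sketch of that inference step (toy shapes again; σ is the logistic sigmoid). Note how a unit is ON by default and switches OFF when its pooled squared filter responses are large, which is the line-process behaviour mentioned above:

```python
import numpy as np

def p_hc_given_v(v, C, P, b):
    """p(h^c_k = 1 | v) = sigmoid(-1/2 * sum_f P_fk (C_f' v)^2 + b_k)."""
    filter_sq = (C.T @ v) ** 2                     # squared filter outputs, (F,)
    return 1.0 / (1.0 + np.exp(0.5 * (P.T @ filter_sq) - b))

rng = np.random.default_rng(2)
D, F, K = 4, 6, 3
print(p_hc_given_v(rng.standard_normal(D),         # a random "image"
                   rng.standard_normal((D, F)),    # covariance filters C
                   rng.random((F, K)),             # pooling matrix P
                   np.zeros(K)))                   # biases b
```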

33-35 LEARNING: maximum likelihood, via Contrastive Divergence with Hybrid Monte Carlo to draw samples. Full energy: E(v, h^c, h^m) = 1/2 Σ_k h^c_k Σ_f P_fk (C_f' v)² - Σ_j h^m_j W_j' v (biases omitted). [diagram: h^c and h^m gating v1, v2]

36-42 LEARNING: maximum likelihood, Contrastive Divergence, Hybrid Monte Carlo to draw samples. [animation: starting from a training sample, HMC dynamics roll along the free energy F = -log(p(v)) + K to produce a negative sample]

43 LEARNING: maximum likelihood, Contrastive Divergence, Hybrid Monte Carlo to draw samples. Weight update: w ← w - η (∂F/∂w at the data point - ∂F/∂w at the sample point), with F = -log(p(v)) + K.

44 LEARNING: maximum likelihood, Persistent Contrastive Divergence, Hybrid Monte Carlo to draw samples. Start the dynamics from the previous sample point; update as before: w ← w - η (∂F/∂w at the data point - ∂F/∂w at the sample point).

45 LEARNING: maximum likelihood, Fast Persistent Contrastive Divergence, Hybrid Monte Carlo to draw samples. Start the dynamics from the previous sample point, and temporarily perturb the weights by a little bit (Tieleman and Hinton, ICML 2009).

46 LEARNING: maximum likelihood via Fast Persistent Contrastive Divergence (Tieleman and Hinton, ICML 2009).
Initialize: w, w_f, η, η_f.
For each training data case:
- get training sample: v+
- compute derivatives: g+ = ∂F/∂w at v+
- draw sample: v- ← HMC(v-; w + w_f)
- compute derivatives: g- = ∂F/∂w at v-, using the perturbed weights w + w_f
- update true parameters: w ← w - η (g+ - g-)
- update fast weights: w_f ← 0.95 w_f - η_f (g+ - g-)
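A minimal runnable sketch of this FPCD loop. Everything model-specific is a stand-in: a quadratic toy free energy replaces the mcRBM free energy, and a single Langevin step replaces the talk's Hybrid Monte Carlo sampler; only the update structure follows the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy free energy F(v; w) = w/2 * ||v||^2, a stand-in for the mcRBM free energy.
def grad_w(v, w):
    return 0.5 * float(v @ v)        # dF/dw at v

def langevin_step(v, w, step=0.05):
    """One noisy gradient step on F(v); a crude stand-in for HMC."""
    return v - step * (w * v) + np.sqrt(2 * step) * rng.standard_normal(v.shape)

w, w_f = 2.0, 0.0                    # true and fast parameters
eta, eta_f = 1e-3, 1e-2              # slow and fast learning rates
v_neg = rng.standard_normal(4)       # persistent negative particle

data = 0.5 * rng.standard_normal((500, 4))   # toy training set
for v_pos in data:
    g_pos = grad_w(v_pos, w)                     # positive-phase derivative
    v_neg = langevin_step(v_neg, w + w_f)        # sample with perturbed weights
    g_neg = grad_w(v_neg, w + w_f)               # negative-phase derivative
    w -= eta * (g_pos - g_neg)                   # update true parameters
    w_f = 0.95 * w_f - eta_f * (g_pos - g_neg)   # decay + update fast weights

print(w)  # drifts upward, toward the tighter toy data distribution
```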

47 Experiments

48 Learn from 16x16 natural image patches; pre-processing: PCA whitening. [figure: learned covariance filters]
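A standard PCA-whitening recipe for this preprocessing step (the usual formulation, not necessarily the talk's exact code):

```python
import numpy as np

def pca_whiten(X, eps=1e-5):
    """Whiten rows of X (n_patches x n_pixels): zero mean, ~identity covariance."""
    X = X - X.mean(axis=0)                   # center each pixel
    cov = X.T @ X / len(X)
    eigval, eigvec = np.linalg.eigh(cov)
    # Rotate onto the eigenvectors, then rescale each direction to unit variance.
    return (X @ eigvec) / np.sqrt(eigval + eps)

patches = np.random.default_rng(0).standard_normal((1000, 256))  # fake 16x16 patches
white = pca_whiten(patches)
print(np.round(np.cov(white.T)[:3, :3], 2))  # approximately the identity
```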

49 Learn from 16x16 natural image patches; pre-processing: PCA whitening. [figure: grouping of covariance filters]

50 Learn from 16x16 natural image patches; pre-processing: PCA whitening. [figure: mean intensity filters]

51 1) Given an image, infer the latent variables using p(h|v). 2) Keeping the latent variables fixed, sample from p(v|h): a random walk in input space. The latent configuration induces a whole subspace of images; the latent representation learns to be robust to small distortions. A toy sketch of the two steps follows.
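Composing the earlier sketches gives a hedged outline of this experiment (toy shapes; real mean-hidden inference is omitted and h^m is simply drawn at random here; the ridge and biases are arbitrary choices to keep the toy model well-behaved):

```python
import numpy as np

rng = np.random.default_rng(3)
D, F, K, J = 4, 6, 3, 5
C, P, W = rng.standard_normal((D, F)), rng.random((F, K)), rng.standard_normal((D, J))
b_c = 2.0 * np.ones(K)   # toy biases: units are ON unless filters respond strongly

def infer_h(v):
    # Step 1: p(h^c_k = 1 | v) = sigmoid(-1/2 sum_f P_fk (C_f' v)^2 + b_k);
    # the mean hiddens h^m are just drawn at random in this toy sketch.
    p = 1.0 / (1.0 + np.exp(0.5 * (P.T @ (C.T @ v) ** 2) - b_c))
    return (p > 0.5).astype(float), rng.integers(0, 2, size=J)

def sample_v(h_c, h_m):
    # Step 2: with the hiddens fixed, sample v ~ N(Sigma W h_m, Sigma).
    Sigma = np.linalg.inv(C @ np.diag(P @ h_c) @ C.T + np.eye(D))
    return rng.multivariate_normal(Sigma @ (W @ h_m), Sigma)

h_c, h_m = infer_h(rng.standard_normal(D))                  # infer once...
print(np.round([sample_v(h_c, h_m) for _ in range(3)], 2))  # ...then walk in v-space
```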

52 Example of image patches used during training

53 Samples drawn from the model (using HMC)

54 Example of image patches used during training

55 Samples drawn from the model (using HMC)

56 Comparison: natural images; mcRBM samples; GRBM samples (from Osindero and Hinton, NIPS 2008); S-RBM + DBN samples (from Osindero and Hinton, NIPS 2008).

57 From patches to big images. Training is done by picking patches at random.

58 From patches to big images. We could also take the patches from a grid, but this is not a good way to extend the model to big images: block artifacts.

59 From patches to big images. Instead: a subset of the filters is applied to the grid patches and...

60 From patches to big images. Instead: a subset of the filters is applied to the grid patches and... the other subsets are applied to shifted grids (Gregor & LeCun 2010; our paper in submission).

61 From patches to big images. A subset of the filters applied to one grid, the other subsets to shifted grids: no block artifacts and little redundancy (Gregor & LeCun 2010; our paper in submission). A sketch of the tiling follows.
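A sketch of the tiling idea as I read this slide (illustrative sizes; the exact offsets in the papers may differ): each filter subset gets its own diagonally shifted grid of patch locations, so the image is covered densely without the single-grid block structure.

```python
import numpy as np

patch, n_subsets, img = 16, 4, 64   # illustrative sizes
shift = patch // n_subsets          # offset between consecutive grids

def grid_for_subset(s):
    """Top-left corners of the patches seen by filter subset s."""
    starts = np.arange(s * shift, img - patch + 1, patch)
    return [(y, x) for y in starts for x in starts]

for s in range(n_subsets):
    print(s, grid_for_subset(s)[:4])  # each subset tiles a shifted grid
```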

62-66 Learning on high-resolution images. [figures: learned covariance filters]

67 Learning on high-resolution images. [figure: learned mean filters]

68 Sampling high-resolution images. [figures: samples from a Gaussian model and from a marginal wavelet model, from Simoncelli 2005]

69 Sampling high-resolution images. [figures: samples from a Gaussian model and a marginal wavelet model (from Simoncelli 2005), and from a pairwise MRF and FoE (from Schmidt, Gao, Roth CVPR 2010)]

70-72 Sampling high-resolution images. [figures: samples from a Gaussian model, a marginal wavelet model (from Simoncelli 2005), a pairwise MRF and FoE (from Schmidt, Gao, Roth CVPR 2010), and from the mean-covariance model (mcRBM)]

73-74 Sampling high-resolution images. [figures: samples drawn starting from a natural image]

75 Inpainting: guessing 90% of the pixels. [masked input image]

76 Inpainting: guessing 90% of the pixels. [MAP estimate of the missing pixels]

77 Inpainting: guessing 90% of the pixels. [true image]

78 Inpainting: guessing 90% of the pixels. [masked input image]

79 Inpainting: guessing 90% of the pixels. [MAP estimate of the missing pixels]

80 Inpainting: guessing 90% of the pixels. [true image]

81 Inpainting: guessing 90% of the pixels. [masked input image]

82 Inpainting: guessing 90% of the pixels. [MAP estimate of the missing pixels]

83 Inpainting: guessing 90% of the pixels. [true image]

84 Conclusion. 3-way Boltzmann Machines: a joint model with fast inference and easy-to-interpret conditionals; it generates very realistic samples; training is hard because of the partition function. Future: applications (segmentation, denoising), multi-scale modeling, and we'll make it DEEPER!!

85 THANK YOU
