Modeling Natural Images with Higher-Order Boltzmann Machines
1 Modeling Natural Images with Higher-Order Boltzmann Machines. Marc'Aurelio Ranzato, Department of Computer Science, Univ. of Toronto; joint work with Geoffrey Hinton and Vlad Mnih. CIFAR summer school, 12 August 2010.
2 How to model natural images? Build a joint model p(v,h), where v: pixels, h: latent features. (figure: data points in image space)
3-11 A generative perspective: p(v|h).
- PCA (Pearson 1901)
- FA: p(v|h) = N(Wh, D) (Young 1940). A sample ~ p(v|h) merely scatters around the mean Wh: the structure of the data point is lost!
- PPCA: p(v|h) = N(Wh, I) (Tipping & Bishop 1999)
- Sparse Coding: p(v|h) = N(Wh, I) (Olshausen & Field 1996); also Welling, Rosen-Zvi & Hinton (GRBM) and ICA
- PoT: p(v|h) = N(0, (C H C')^-1) (Welling et al. 2002); also Wainwright et al. (GSM) and Karklin & Lewicki 2003. A sample ~ p(v|h) stays close to the data point: structure is preserved!
- mcRBM (Ranzato & Hinton 2010). THIS IS NOT A MIXTURE OF GAUSSIANS!!
12-14 Why model the covariance, p(v|h)? Example: an image with two pixels, v1 and v2; assume we subtract off the mean from the image. Most of the time the two pixels take the same value, with some exceptions! With one binary hidden unit we can pick the Gaussian with the right covariance, which is like PoT. (figure: input data and p(v|h) in the (v1, v2) plane)
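To make the two-pixel picture concrete, here is a minimal numpy sketch (our own illustration, not code from the talk; the filter and gain values are assumptions): a single binary hidden unit switches the precision matrix between a broad Gaussian and one concentrated along v1 = v2, exactly the PoT-style gating described above.

    import numpy as np

    c = np.array([1.0, -1.0])          # filter penalizing v1 - v2 (assumed)
    I = np.eye(2)

    def precision(h, gain=20.0, base=0.1):
        # Sigma^-1 = base*I + gain*h*cc': h=1 picks the "pixels agree" Gaussian
        return base * I + gain * h * np.outer(c, c)

    rng = np.random.default_rng(0)
    for h in (0, 1):
        cov = np.linalg.inv(precision(h))
        samples = rng.multivariate_normal(np.zeros(2), cov, size=3)
        print(f"h={h}:\n{samples.round(2)}")

With h=1 the samples have nearly equal pixel values; with h=0 they scatter isotropically, covering the exceptions.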
15-18 Why model both the mean and the covariance, p(v|h)? We can produce an even more precise fit by also modeling the mean. When the data is not centered, the difference is even more dramatic: without the mean the model fails to capture the edge!
19 mcRBM (Ranzato & Hinton 2010): the hiddens determine an image-specific mean and an image-specific covariance for p(v|h). How do we modulate the mean and the covariance using hidden units? p(v, h^m, h^c) ∝ exp(-E(v, h^m, h^c))
20-21 Two parameterizations of the energy.
(a) Easy generation, slow inference: E = 1/2 (v - m)' Σ^-1 (v - m), so p(v | h^m, h^c) = N(m(h^m), Σ(h^c)).
(b) Less easy generation, fast inference: E = 1/2 v' Σ^-1 v - m' v, so p(v | h^m, h^c) = N(Σ(h^c) m(h^m), Σ(h^c)).
22-23 Geometric interpretation: if we multiply them, we get... a sharper and shorter Gaussian.
24 We start by modeling small image patches: v are the visibles, h the (binary) hiddens, with joint distribution p(v,h).
25 Modeling the covariance only (using binary hiddens): cRBM (Ranzato, Krizhevsky & Hinton, AISTATS 2010). E = 1/2 v' Σ^-1 v
26 Gated MRF: E(v, h^c) = w_1 v_1 v_2 h_1^c + ... The pairwise interactions between pixels are determined by the state of the latent variables: E(v, h^c) = 1/2 v' Σ^-1 v, with Σ^-1 = C diag(P h^c) C' and h^c_k ∈ {0,1}.
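A minimal sketch of that modulation (toy shapes and random parameters are our assumptions, not the learned model): the binary covariance hiddens h^c select, through the pooling matrix P, which filter constraints enter the precision.

    import numpy as np

    rng = np.random.default_rng(1)
    n_pix, n_filt, n_hid = 4, 6, 3
    C = rng.standard_normal((n_pix, n_filt))  # filter matrix (assumed shapes)
    P = rng.random((n_filt, n_hid))           # non-negative pooling matrix
    hc = np.array([1, 0, 1])                  # one configuration of h^c

    # Sigma^-1 = C diag(P h^c) C'; a small ridge keeps it invertible here
    prec = C @ np.diag(P @ hc) @ C.T + 1e-3 * np.eye(n_pix)
    cov = np.linalg.inv(prec)
    print(cov.round(3))    # the image-specific covariance induced by h^c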
27 So far we have modeled the covariance only... now we also add the mean: mcRBM (Ranzato & Hinton, CVPR 2010). E = 1/2 v' Σ^-1 v - m' v
28 E(v, h^c, h^m) = 1/2 v' Σ^-1 v - Σ_ij W_ij v_i h^m_j: a cRBM plus mean hiddens h^m. The conditional over the visibles has a non-zero mean that depends on both sets of hiddens: p(v | h^c, h^m) = N(Σ W h^m, Σ)
29-32 Interpreting the mcRBM:
- Looking at p(v|h): relation to PCA, FA, PoT, etc.
- Looking at the hiddens h^c: relation to line processes and PoT (Geman et al. 84, Blake et al. 87, Black et al. 96)
- Looking at E(v, h): a 3rd-order Boltzmann machine; relation to the conditional 3-way RBM (Memisevic et al. 07)
- Looking at p(h|v): p(h^c_k = 1 | v) = σ(-1/2 P_k (C' v)^2 + b_k); relation to the simple-complex cell model
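The last conditional is a simple-/complex-cell style computation: filter the image, square, pool with P, and squash. A hedged numpy sketch (shapes are toy values; the sign convention follows the energy as reconstructed above):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(2)
    n_pix, n_filt, n_hid = 16, 32, 8
    C = rng.standard_normal((n_pix, n_filt))   # covariance filters
    P = rng.random((n_filt, n_hid))            # pooling matrix
    b = np.zeros(n_hid)                        # hidden biases
    v = rng.standard_normal(n_pix)             # an input patch

    # p(h^c_k = 1 | v): filter responses, squared, pooled, squashed
    p_hc = sigmoid(-0.5 * ((C.T @ v) ** 2) @ P + b)
    print(p_hc.round(3))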
33 LEARNING - maximum likelihood - Contrastive Divergence - Hybrid Monte Carlo to draw samples. E(v, h^c, h^m) = 1/2 Σ_k h^c_k P_k (C' v)^2 - Σ_j h^m_j W_j' v
36 LEARNING - maximum likelihood - Contrastive Divergence - Hybrid Monte Carlo to draw samples. Free energy: F = -log(p(v)) + K. (figure: the free-energy landscape, with a training sample marked)
37-41 LEARNING - maximum likelihood - Contrastive Divergence - Hybrid Monte Carlo to draw samples. HMC dynamics on F = -log(p(v)) + K, starting at the training sample.
42 LEARNING (continued): the HMC dynamics have carried the sample away from the training point across the free-energy landscape.
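A generic Hybrid Monte Carlo step, sketched in Python (a standard implementation written by us, not the authors' code): simulate leapfrog dynamics on the free energy F(v) plus a kinetic term, then Metropolis-accept. free_energy and grad stand in for the model's F and its gradient.

    import numpy as np

    def hmc_step(v, free_energy, grad, rng, n_leap=20, eps=0.05):
        p = rng.standard_normal(v.shape)          # sample momentum
        v_new = v.copy()
        p_new = p - 0.5 * eps * grad(v)           # initial half momentum step
        for _ in range(n_leap):
            v_new = v_new + eps * p_new           # full position step
            p_new = p_new - eps * grad(v_new)     # full momentum step
        p_new = p_new + 0.5 * eps * grad(v_new)   # undo half of the last one
        # accept/reject on total energy H = F(v) + kinetic term
        h_old = free_energy(v) + 0.5 * p @ p
        h_new = free_energy(v_new) + 0.5 * p_new @ p_new
        return v_new if rng.random() < np.exp(h_old - h_new) else v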
43 LEARNING - maximum likelihood - Contrastive Divergence - Hybrid Monte Carlo to draw samples. Weight update: w ← w - η (∂F/∂w |_data - ∂F/∂w |_sample)
44 LEARNING - maximum likelihood - Persistent Contrastive Divergence - Hybrid Monte Carlo to draw samples. Start the dynamics from the previous point; same update: w ← w - η (∂F/∂w |_data - ∂F/∂w |_sample)
45 LEARNING - maximum likelihood - Fast Persistent Contrastive Divergence - Hybrid Monte Carlo to draw samples. Start the dynamics from the previous point, and temporarily perturb the weights by a little bit (Tieleman and Hinton, ICML 2009). w ← w - η (∂F/∂w |_data - ∂F/∂w |_sample)
46 LEARNING - maximum likelihood - Fast Persistent Contrastive Divergence (Tieleman and Hinton, ICML 2009):
Initialize: w, w_f, η, η_f
for each training data case:
- get training sample: v+
- compute derivatives: g+ = ∂F/∂w at v+
- draw sample: v- ← HMC(v-; w + w_f)
- compute derivatives: g- = ∂F/∂w at v-, using weights w + w_f
- update true parameters: w ← w - η (g+ - g-)
- update fast weights: w_f ← 0.95 w_f - η_f (g+ - g-)
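A hedged Python rendering of the FPCD loop above (Tieleman and Hinton, ICML 2009): grad_F(v, w) stands in for ∂F/∂w evaluated at v, hmc_sample for the sampler, and the learning rates are illustrative.

    import numpy as np

    def fpcd(train_data, grad_F, hmc_sample, w, eta=1e-3, eta_f=1e-2):
        w_f = np.zeros_like(w)           # fast weights, decayed toward zero
        v_neg = train_data[0].copy()     # persistent negative particle
        for v_pos in train_data:
            g_pos = grad_F(v_pos, w)                    # data-side gradient
            v_neg = hmc_sample(v_neg, w + w_f)          # sample with perturbed weights
            g_neg = grad_F(v_neg, w + w_f)              # sample-side gradient
            w = w - eta * (g_pos - g_neg)               # update true parameters
            w_f = 0.95 * w_f - eta_f * (g_pos - g_neg)  # update fast weights
        return w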
47 Experiments
48-50 Learn from 16x16 natural image patches; pre-processing: PCA whitening. (figures: the learned covariance filters, the learned grouping of covariance filters, and the mean intensity filters)
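PCA whitening of the patches takes only a few lines; this is the standard recipe (not necessarily the exact pre-processing used in these experiments):

    import numpy as np

    def pca_whiten(X, eps=1e-5):
        # X: (n_patches, n_pixels); returns decorrelated, unit-variance data
        X = X - X.mean(axis=0)                 # center each pixel
        cov = X.T @ X / len(X)
        eigval, eigvec = np.linalg.eigh(cov)   # eps guards tiny eigenvalues
        return (X @ eigvec) / np.sqrt(eigval + eps)

    patches = np.random.default_rng(3).standard_normal((1000, 256))  # toy 16x16 patches
    white = pca_whiten(patches)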
51 1) Given an image, infer the latent variables using p(h|v). 2) Keeping the latent variables fixed, sample from p(v|h): a random walk in input space. The latent configuration induces a whole subspace of images; the latent representation learns to be robust to small distortions.
52-55 Examples of image patches used during training, and samples drawn from the model (using HMC).
56 Comparison: natural images vs. samples from the mcRBM, a GRBM, and an S-RBM + DBN (the latter two from Osindero and Hinton, NIPS 2008).
57 From patches to big images: training by picking patches at random.
58 But we could also take them from a grid. This is not a good way to extend the model to big images: block artifacts.
59-61 Instead, apply one subset of the filters to the patches of this grid and... other subsets to shifted grids: no block artifacts & little redundancy (Gregor & LeCun 2010; our paper in submission).
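A toy sketch of the shifted-grid idea (our own illustration; the offsets and sizes are assumptions): each filter subset sees a patch grid shifted by a different offset, so the patch borders of different subsets never coincide and block artifacts cancel.

    import numpy as np

    patch, image = 16, 128
    offsets = [0, 4, 8, 12]        # one offset per filter subset (assumed)
    for k, off in enumerate(offsets):
        starts = np.arange(off, image - patch + 1, patch)
        print(f"filter subset {k}: patch columns start at {starts.tolist()}")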
62-67 Learning on high-resolution images. (figures: the learned covariance filters, shown over several slides, followed by the mean filters)
68-72 Sampling high-resolution images: comparison of samples from a Gaussian model and a marginal wavelet model (from Simoncelli 2005), a pairwise MRF and FoE (from Schmidt, Gao, Roth CVPR 2010), and the mean-covariance model.
73-74 Sampling high-resolution images: sampling starting from a natural image.
75-83 Inpainting: guessing 90% of the pixels. For each of three test images: the masked input image, the MAP estimate of the missing pixels, and the true image.
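A sketch of the inpainting procedure as we read it (not the authors' code): MAP-estimate the missing pixels by gradient descent on the free energy F(v), clamping the observed 10% of the pixels after every step. grad_F is a placeholder for the model's gradient.

    import numpy as np

    def inpaint(v_masked, mask, grad_F, n_steps=500, lr=0.01):
        # mask is True where pixels are observed; missing pixels start at 0
        v = np.where(mask, v_masked, 0.0)
        for _ in range(n_steps):
            v = v - lr * grad_F(v)        # descend the free energy
            v[mask] = v_masked[mask]      # re-clamp the observed pixels
        return v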
84 Conclusion: 3-way Boltzmann machines. The joint model has fast inference and easy-to-interpret conditionals, and it generates very realistic samples; training is hard because of the partition function. Future: applications (segmentation, denoising), multi-scale modeling, and we'll make it DEEPER!!
85 THANK YOU