Unsupervised Learning of Hierarchical Models. in collaboration with Josh Susskind and Vlad Mnih


1 Unsupervised Learning of Hierarchical Models. Marc'Aurelio Ranzato and Geoff Hinton, in collaboration with Josh Susskind and Vlad Mnih. Advanced Machine Learning, 9 March 2011

2 Example: facial expression recognition. Unlabeled face images; few labeled face images (happiness, neutral, anger, fear, sadness). What is the expression? (unseen occluded images)

3 Example: facial expression recognition. 1) Training on unlabeled data: maximum likelihood. [Figure: layer 2 on top of layer 1 on top of the input image.] Ranzato, Susskind, Mnih, Hinton, On deep generative models..., CVPR 2011

4 Example: facial expression recognition. 1) Training on unlabeled data: maximum likelihood. 2) Use the generative model to impute missing values. Ranzato, Susskind, Mnih, Hinton, On deep generative models..., CVPR 2011

5 Example: facial expression recognition. 1) Training on unlabeled data: maximum likelihood. 2) Use the generative model to impute missing values. 3) Use the latent representation as features. 4) Train a classifier on the labeled features. Ranzato, Susskind, Mnih, Hinton, On deep generative models..., CVPR 2011

6 The power of hierarchical representations. MNIST digits: 2nd layer sparse code, 1st layer sparse code, input image. Receptive field of 2nd layer units. Ranzato et al., Sparse feature learning for Deep Belief Nets, NIPS 2007

7 The power of hierarchical representations. MNIST digits: 2nd layer sparse code, 1st layer sparse code, input image. Receptive field of 2nd layer units. Ranzato et al., Sparse feature learning for Deep Belief Nets, NIPS 2007

8 The power of hierarchical representations. MNIST digits: 2nd layer sparse code, 1st layer sparse code, input image. Receptive fields of 2nd layer units. Hierarchical representations can discover complex dependencies! Ranzato et al., Sparse feature learning for Deep Belief Nets, NIPS 2007

9 Unsupervised Learning - we will learn hierarchical representations in a greedy fashion using unlabeled data (little if any assumption on the input) - we can focus on single-layer unsupervised algorithms - how can we interpret unsupervised algorithms? - how can they learn representations? - how can we compare them?
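The greedy layer-wise recipe mentioned above can be sketched as follows. This is a schematic, not the talk's actual models: a PCA "layer" stands in for whatever single-layer unsupervised learner is used, and the names `PCALayer` and `train_greedy_stack` are illustrative.

```python
import numpy as np

class PCALayer:
    """Stand-in single-layer unsupervised learner (here: PCA via SVD)."""
    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[:self.n_components]
        return self

    def transform(self, X):
        return (X - self.mean_) @ self.components_.T

def train_greedy_stack(X, layer_sizes):
    """Train each layer on the codes produced by the layers below it."""
    layers, H = [], X
    for size in layer_sizes:
        layer = PCALayer(size).fit(H)
        H = layer.transform(H)   # codes become the 'data' for the next layer
        layers.append(layer)
    return layers, H

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))        # toy unlabeled data
layers, codes = train_greedy_stack(X, [16, 8])
print(codes.shape)  # (100, 8)
```

The point of the loop is that no layer ever sees labels: each layer only models the representation produced by the layer beneath it.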

10 Two Approaches to Unsupervised Learning 1st strategy: constrain latent representation & optimize score only at training samples - K-Means - sparse coding Ranzato et al. NIPS 06, Ranzato et al. CVPR 07, Ranzato et al. NIPS 07, Kavukcuoglu et al. CVPR 09 - use lower dimensional representations Ranzato et al. ICML 08

11 Two Approaches to Unsupervised Learning 1st strategy: constrain latent representation & optimize score only at training samples - K-Means - sparse coding - use lower dimensional representations 2nd strategy: optimize score for training samples while normalizing the score over the whole space (maximum likelihood). Ranzato et al. AISTATS 10, Ranzato et al. CVPR 10, Ranzato et al. NIPS 10, Ranzato et al. CVPR 11. [Figure: p(x) versus x.]

12 Two Approaches to Unsupervised Learning
1st strategy (constrain code): Pros - efficient learning; defined through an objective function (easy to monitor training). Cons - choice of functions; not robust to missing inputs; not good for generation.
2nd strategy (maximum likelihood): Pros - intuitive design; it can generate; compositionality. Cons - hard to train (variational inference, MCMC, etc.)

13 Outline - mathematical formulation of the model - training - learning acoustic features for speech recognition - generation of natural images - recognition of facial expression under occlusion - conclusion

14 Outline - mathematical formulation of the model - training - learning acoustic features for speech recognition - generation of natural images - recognition of facial expression under occlusion - conclusion

15 Means and Covariances: PPCA. The marginal p(x) = Σ_h p(x|h) p(h) specifies one covariance shared across all data points. [Figure: INPUT SPACE and LATENT SPACE; training sample; latent vector; conditional p(x|h).]

16 Means and Covariances: PPCA. Given h_t ~ p(h|x_t), the conditional p(x|h_t) specifies a distribution for each data point. [Figure: INPUT SPACE and LATENT SPACE; training sample; latent vector; conditional p(x|h).]

17 Means and Covariances: PPCA. p(x) = Σ_h p(x|h) p(h); the conditional p(x|h) specifies a distribution for each data point. [Figure: INPUT SPACE and LATENT SPACE; training sample; latent vector; conditional p(x|h).]

18 Conditional Distribution Over Input. p(x|h) = N(mean(h), D). Examples: PPCA, Factor Analysis, ICA, Gaussian RBM. [Figure: INPUT SPACE and LATENT SPACE; training sample; latent vector; conditional p(x|h).]

19 Conditional Distribution Over Input. p(x|h) = N(mean(h), D). Examples: PPCA, Factor Analysis, ICA, Gaussian RBM. Poor model of the dependency between input variables. [Figure: INPUT SPACE and LATENT SPACE; training sample; latent vector; conditional p(x|h).]

20 Conditional Distribution Over Input. p(x|h) = N(0, Covariance(h)). Examples: PoT, cRBM. [Figure: INPUT SPACE and LATENT SPACE; training sample; latent vector; conditional p(x|h).] Welling et al. NIPS 2003, Ranzato et al. AISTATS 10

21 Conditional Distribution Over Input. p(x|h) = N(0, Covariance(h)). Examples: PoT, cRBM. Poor model of the mean intensity. [Figure: INPUT SPACE and LATENT SPACE; training sample; latent vector; conditional p(x|h).] Welling et al. NIPS 2003, Ranzato et al. AISTATS 10

22 input image

23 input image Which image has similar mean (but different covariance)? Which image has same covariance (but different mean)?

24 input image: same covariance but different mean (an equivalent image according to PoT); similar mean but different covariance (an equivalent image according to FA)
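The two failure modes above can be checked numerically: a zero-mean Gaussian (a covariance-only model, like PoT) assigns identical density to an image and its sign-flipped version, while a mean-plus-diagonal model (like factor analysis) cannot distinguish inputs that share per-pixel statistics but differ in spatial arrangement. A small sketch with made-up two-"pixel" vectors:

```python
import numpy as np

def gauss_logpdf(x, mu, Sigma):
    """Log-density of N(mu, Sigma) at x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d @ np.linalg.solve(Sigma, d)
                   + logdet + len(x) * np.log(2 * np.pi))

# Covariance-only model (PoT-like): N(0, Sigma) with correlated 'pixels'
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
x = np.array([1.0, -1.0])
# x and -x share the covariance structure but have opposite means:
same_under_cov = np.isclose(gauss_logpdf(x, np.zeros(2), Sigma),
                            gauss_logpdf(-x, np.zeros(2), Sigma))

# Mean + diagonal model (FA-like): N(mu, D) with D diagonal
mu = np.full(2, 0.5)
a = np.array([1.0, 0.0])   # same distance from the mean ...
b = np.array([0.0, 1.0])   # ... but different spatial arrangement
same_under_mean = np.isclose(gauss_logpdf(a, mu, np.eye(2)),
                             gauss_logpdf(b, mu, np.eye(2)))
print(same_under_cov, same_under_mean)  # True True
```

Each model is blind to exactly the structure the other one captures, which is the motivation for modeling both mean and covariance jointly.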

25 Conditional Distribution Over Input. p(x|h) = N(mean(h), Covariance(h)). This is what we propose: mcRBM, mPoT. [Figure: INPUT SPACE and LATENT SPACE; training sample; latent vector; conditional p(x|h).] Ranzato et al. CVPR 10, Ranzato et al. NIPS 2010, Ranzato et al. CVPR 11

26 Conditional Distribution Over Input. p(x|h) = N(mean(h), Covariance(h)). This is what we propose: mcRBM, mPoT. [Figure: INPUT SPACE and LATENT SPACE; training sample; latent vector; conditional p(x|h).] Ranzato et al. CVPR 10, Ranzato et al. NIPS 2010, Ranzato et al. CVPR 11

27 mcRBM - two sets of latent variables to modulate the mean and covariance of the conditional distribution over the input - energy-based model: p(x, h^m, h^c) ∝ exp(−E(x, h^m, h^c)), with x ∈ ℝ^D, h^c ∈ {0,1}^M, h^m ∈ {0,1}^N. Ranzato Hinton CVPR 10

28 - easy generation - slow inference: p(x|h^m, h^c) = N(m(h^m), Σ(h^c)), with energy E = ½ (x − m)' Σ⁻¹ (x − m). [Figure: h^m and h^c gating x.]

29 - easy generation - slow inference: p(x|h^m, h^c) = N(m(h^m), Σ(h^c)), with E = ½ (x − m)' Σ⁻¹ (x − m). - less easy generation - fast inference: E = ½ x' Σ⁻¹ x − m' x, giving p(x|h^m, h^c) = N(Σ(h^c) m(h^m), Σ(h^c)). [Figure: h^m and h^c gating x.]

30 Geometric interpretation If we multiply them, we get...

31 Geometric interpretation. If we multiply them, we get... a sharper and shorter Gaussian p(x|h^m, h^c).

32 Covariance part of the energy function: E(x, h^m, h^c) = ½ x' Σ⁻¹ x, with x ∈ ℝ^D and Σ⁻¹ ∈ ℝ^{D×D}. [Figure: pair-wise MRF over pixels x_p, x_q.] Ranzato Hinton CVPR 10

33 Covariance part of the energy function: E(x, h^m, h^c) = ½ x' C C' x (factorization), with C ∈ ℝ^{D×F}. [Figure: pair-wise MRF over pixels x_p, x_q.] Ranzato Hinton CVPR 10

34 Covariance part of the energy function: E(x, h^m, h^c) = ½ x' C diag(h^c) C' x (factorization + hidden units), with C ∈ ℝ^{D×F} and h^c ∈ {0,1}^F. [Figure: gated MRF over x_p, x_q through filters C.] Ranzato Hinton CVPR 10

35 Covariance part of the energy function: E(x, h^m, h^c) = ½ x' C diag(P h^c) C' x, with C ∈ ℝ^{D×F}, P ∈ ℝ^{F×M}, h^c ∈ {0,1}^M. [Figure: gated MRF over x_p, x_q through filters C.] Ranzato Hinton CVPR 10

36 Overall energy function: E(x, h^c, h^m) = ½ x' C diag(P h^c) C' x + ½ x' x − x' W h^m (covariance part + mean part), with x ∈ ℝ^D, W ∈ ℝ^{D×N}, h^m ∈ {0,1}^N. [Figure: gated MRF with filters C, P over x_p, x_q, plus mean units h^m_j through W.] Ranzato Hinton CVPR 10

37 Overall energy function: E(x, h^c, h^m) = ½ x' C diag(P h^c) C' x + ½ x' x − x' W h^m (covariance part + mean part). The conditional is p(x|h^m, h^c) = N(Σ W h^m, Σ), with Σ⁻¹ = C diag(P h^c) C' + I. [Figure: gated MRF with filters C, P, W.] Ranzato Hinton CVPR 10

38 Overall energy function: E(x, h^c, h^m) = ½ x' C diag(P h^c) C' x + ½ x' x − x' W h^m (covariance part + mean part). Inference: p(h^c_k = 1 | x) = σ(−½ Σ_f P_fk (C_f' x)² + b_k) and p(h^m_j = 1 | x) = σ(W_j' x + b_j). [Figure: gated MRF with filters C, P, W.] Ranzato Hinton CVPR 10
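The two inference equations on this slide are closed-form and cheap to evaluate; a numpy sketch with random parameters, purely for illustration (function name and shapes are ours, not from the talk):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mcrbm_infer(x, C, P, W, b_c, b_m):
    """Factorial posterior of the gated MRF / mcRBM.

    p(h_c[k] = 1 | x) = sigmoid(-0.5 * sum_f P[f,k] * (C[:,f]' x)^2 + b_c[k])
    p(h_m[j] = 1 | x) = sigmoid(W[:,j]' x + b_m[j])
    """
    filt = C.T @ x                               # F filter responses C_f' x
    p_hc = sigmoid(-0.5 * P.T @ filt**2 + b_c)   # covariance units
    p_hm = sigmoid(W.T @ x + b_m)                # mean units
    return p_hc, p_hm

rng = np.random.default_rng(0)
D, F, M, N = 16, 32, 8, 8
x = rng.normal(size=D)
C = rng.normal(size=(D, F))
P = rng.uniform(size=(F, M))     # non-negative pooling weights
W = rng.normal(size=(D, N))
b_c, b_m = np.zeros(M), np.zeros(N)
p_hc, p_hm = mcrbm_infer(x, C, P, W, b_c, b_m)
print(p_hc.shape, p_hm.shape)  # (8,) (8,)
```

Both posteriors factorize over units given x, which is what makes inference in this model fast compared to the variant with a mean inside the quadratic term.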

39 Outline - mathematical formulation of the model - training - learning acoustic features for speech recognition - generation of natural images - recognition of facial expression under occlusion - conclusion

40 Learning - maximum likelihood: p(x) = Σ_{h^m, h^c} e^{−E(x, h^m, h^c)} / Σ_{x', h^m, h^c} e^{−E(x', h^m, h^c)} - Fast Persistent Contrastive Divergence - Hybrid Monte Carlo to draw samples. [Figure: gated MRF with units h^c_k, matrices P, C, W over inputs x_i, x_j.]

41 Learning. F = −log p(x) + K. [Figure: free energy F versus parameter w; the training sample is marked.]

42 Learning. [Figure: F versus w; HMC dynamics start from the training sample.]

43 Learning. F = −log p(x) + K. [Figure: F versus w; HMC dynamics move away from the training sample.]

44 Learning. F = −log p(x) + K. [Figure: F versus w; HMC dynamics continue.]

45 Learning. F = −log p(x) + K. [Figure: F versus w; HMC dynamics continue.]

46 Learning. F = −log p(x) + K. [Figure: F versus w; HMC dynamics continue.]

47 Learning. F = −log p(x) + K. [Figure: F versus w; HMC dynamics reach a sample point.]

48 Learning. F = −log p(x) + K. Weight update: Δw ∝ −∂F/∂w at the data point + ∂F/∂w at the sample point. [Figure: F versus w with the data point and the sample point marked.]

49 Learning. F = −log p(x) + K. Weight update: Δw ∝ −∂F/∂w at the data point + ∂F/∂w at the sample point.

50 Learning. Start the dynamics from the previous point (persistent chains). F = −log p(x) + K. Weight update: Δw ∝ −∂F/∂w at the data point + ∂F/∂w at the sample point.
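The persistent contrastive divergence update sketched on these slides can be written out for a toy binary RBM. This is a simplification of what the talk uses: FPCD adds a fast-decaying weight overlay, the gated MRF has continuous visibles sampled with Hybrid Monte Carlo rather than Gibbs, and biases are omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

D, H, B = 12, 8, 20            # visible dim, hidden dim, batch size
W = 0.01 * rng.normal(size=(D, H))
data = (rng.uniform(size=(B, D)) < 0.5).astype(float)
chain = data.copy()            # persistent fantasy particles
lr = 0.01

for step in range(50):
    # positive phase: hidden probabilities at the data
    h_data = sigmoid(data @ W)
    # negative phase: one Gibbs step on the persistent chain
    h = (rng.uniform(size=(B, H)) < sigmoid(chain @ W)).astype(float)
    chain = (rng.uniform(size=(B, D)) < sigmoid(h @ W.T)).astype(float)
    h_chain = sigmoid(chain @ W)
    # delta W = -dF/dW at the data + dF/dW at the samples
    W += lr * (data.T @ h_data - chain.T @ h_chain) / B

print(np.isfinite(W).all())  # True
```

The key difference from plain CD is the line `chain = ...`: the negative-phase chain is never reset to the data, so it keeps exploring the model distribution across updates.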

51 Outline - mathematical formulation of the model - training - learning acoustic features for speech recognition - generation of natural images - recognition of facial expression under occlusion - conclusion

52 Speech Recognition on TIMIT. INPUT: - frame: 25 ms Hamming window - 10 ms overlap between frames - 39 log-filterbank outputs + energy per frame - 15 frames (235 ms of context) - PCA whitening down to 384 components - no deltas, no delta-deltas. Dahl, Ranzato, Mohamed, Hinton, NIPS 2010
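The framing and whitening steps of this input pipeline can be sketched in numpy. The random matrix below stands in for real per-frame features (log filterbank extraction from audio is elided), and the function names are ours; only the shapes follow the slide: 40 features per frame, 15-frame context windows, PCA whitening to 384 components.

```python
import numpy as np

def context_windows(frames, width=15):
    """Concatenate `width` consecutive frames into one input vector."""
    T, d = frames.shape
    return np.stack([frames[t:t + width].ravel()
                     for t in range(T - width + 1)])

def pca_whiten(X, k):
    """Project onto the top-k principal components and equalize variances."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scale = s[:k] / np.sqrt(len(X) - 1) + 1e-8   # per-component std dev
    return (Xc @ Vt[:k].T) / scale

rng = np.random.default_rng(0)
fbank = rng.normal(size=(500, 40))      # 39 log-filterbanks + energy per frame
X = context_windows(fbank, width=15)    # 15 frames -> 600-dim inputs
Xw = pca_whiten(X, k=384)
print(X.shape, Xw.shape)  # (486, 600) (486, 384)
```

With 25 ms windows overlapping by 10 ms, each frame advances 15 ms, so 15 frames span 25 + 14 × 15 = 235 ms, matching the slide.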

53 Speech Recognition on TIMIT. Gated MRF with 1024 precision units and 512 mean units, trained on independent windows. [Figure: gMRF over the 235 ms input window.]

54 Speech Recognition on TIMIT. Gated MRF with 1024 precision units and 512 mean units, trained on independent windows. [Figure: gMRF over the 235 ms input window.] Dahl, Ranzato, Mohamed, Hinton, NIPS 2010

55 Speech Recognition on TIMIT. Gated MRF with 1024 precision units and 512 mean units, trained on independent windows. [Figure: gMRF.] Dahl, Ranzato, Mohamed, Hinton, NIPS 2010

56 Speech Recognition on TIMIT - For each training sample, infer the latent variables of the gated MRF - Train a 2nd layer binary-binary RBM (2048 units). [Figure: gMRF over the 235 ms input window.] Dahl, Ranzato, Mohamed, Hinton, NIPS 2010

57 Speech Recognition on TIMIT - Train the 1st layer gated MRF - Train a 2nd layer binary-binary RBM (2048 units) - Train a 3rd layer binary-binary RBM (2048 units). [Figure: gMRF over the 235 ms input window.] Dahl, Ranzato, Mohamed, Hinton, NIPS 2010

58 Speech Recognition on TIMIT. 8 layer (deep) model. [Figure: stack of layers on top of the gMRF over the 235 ms input window.] Dahl, Ranzato, Mohamed, Hinton, NIPS 2010

59 Speech Recognition on TIMIT. 8 layer (deep) model. Supervised training: predict the states of the aligned HMM. [Figure: stack of layers on top of the gMRF over the 235 ms input window.] Dahl, Ranzato, Mohamed, Hinton, NIPS 2010

60 Speech Recognition on TIMIT. 8 layer (deep) model. Test: predict HMM states and decode with a bigram language model. [Figure: stack of layers on top of the gMRF over the 235 ms input window.] Dahl, Ranzato, Mohamed, Hinton, NIPS 2010

61 Speech Recognition on TIMIT

METHOD                          PER
CRF                             34.8%
Large-Margin GMM                33.0%
CD-HMM                          27.3%
Augmented CRF                   26.6%
RNN                             26.1%
Bayesian Triphone HMM           25.6%
Triphone HMM, discrim. trained  22.7%
DBN with gated MRF              20.5%

62 Speech Recognition on TIMIT

METHOD                          PER
CRF                             34.8%
Large-Margin GMM                33.0%
CD-HMM                          27.3%
Augmented CRF                   26.6%
RNN                             26.1%
Bayesian Triphone HMM           25.6%
Triphone HMM, discrim. trained  22.7%
DBN with gated MRF              20.5%

63 Outline - mathematical formulation of the model - training - learning acoustic features for speech recognition - generation of natural images - recognition of facial expression under occlusion - conclusion

64 Generation: natural image patches. [Figure: natural images; samples from the mcRBM (Ranzato and Hinton CVPR 2010); samples from the models of Osindero and Hinton NIPS 2008.]

65 From Patches to High-Resolution Images Training by picking patches at random

66 From Patches to High-Resolution Images. But we could also take them from a grid. This is not a good way to extend the model to big images: block artifacts.

67 From Patches to High-Resolution Images IDEA: have one subset of filters applied to these locations,

68 From Patches to High-Resolution Images IDEA: have one subset of filters applied to these locations, another subset to these locations

69 From Patches to High-Resolution Images. IDEA: have one subset of filters applied to these locations, another subset to these locations, etc. Train all parameters jointly. No block artifacts; reduced redundancy. Gregor, LeCun arXiv 2010; Ranzato, Mnih, Hinton NIPS 2010
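The tiling idea above, where the filter applied at a location depends on that location's offset within a small tile, can be sketched as below. This illustrates the indexing scheme only, not the authors' actual implementation; `tile` and all shapes are made up.

```python
import numpy as np

def tiled_filter_responses(img, filters, tile=2):
    """Apply the filter subset indexed by (i % tile, j % tile) at each position.

    filters: shape (tile, tile, k, k) -- one k x k filter per tile offset,
    so nearby locations use different filters but parameters stay bounded.
    """
    k = filters.shape[-1]
    H, W = img.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            f = filters[i % tile, j % tile]          # offset-dependent filter
            out[i, j] = np.sum(img[i:i + k, j:j + k] * f)
    return out

rng = np.random.default_rng(0)
img = rng.normal(size=(16, 16))
filters = rng.normal(size=(2, 2, 5, 5))              # 4 distinct 5x5 filters
resp = tiled_filter_responses(img, filters)
print(resp.shape)  # (12, 12)
```

With `tile=1` this degenerates to ordinary weight-sharing convolution; larger tiles trade some sharing for the freedom that removes block artifacts.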

70 Sampling High-Resolution Images. [Figure: samples from a Gaussian model and from a marginal wavelet model, from Simoncelli 2005.]

71 Sampling High-Resolution Images. [Figure: Gaussian model and marginal wavelet model (from Simoncelli 2005); pair-wise MRF and FoE (from Schmidt, Gao, Roth CVPR 2010).]

72 Sampling High-Resolution Images. [Figure: Gaussian model and marginal wavelet model (from Simoncelli 2005); pair-wise MRF and FoE (from Schmidt, Gao, Roth CVPR 2010); Mean Covariance Model (Ranzato, Mnih, Hinton NIPS 2010).]

73 Sampling High-Resolution Images. [Figure: Gaussian model and marginal wavelet model (from Simoncelli 2005); pair-wise MRF and FoE (from Schmidt, Gao, Roth CVPR 2010); Mean Covariance Model (Ranzato, Mnih, Hinton NIPS 2010).]

74 Sampling High-Resolution Images. [Figure: Gaussian model and marginal wavelet model (from Simoncelli 2005); pair-wise MRF and FoE (from Schmidt, Gao, Roth CVPR 2010); Mean Covariance Model (Ranzato, Mnih, Hinton NIPS 2010).]

75 Making the model... DEEPER. Treat these units as data to train a similar model on top. SECOND STAGE: a field of binary units; each hidden unit has a receptive field of 30x30 pixels in input space.

76 Sampling from the DEEPER model - Sample from the 2nd layer - project the sample into image space using the 1st layer p(x|h). Between layers: E(h¹, h²) = −h¹' W h², with p(h¹_j = 1 | h²) = σ((W h²)_j + b_j) and p(h²_k = 1 | h¹) = σ((W' h¹)_k + b_k). [Figure: 2nd layer on top of the gMRF 1st layer.]
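The sampling procedure on this slide is top-down: a few Gibbs steps in the top binary-binary model, then a stochastic projection down, then a draw from the first layer's Gaussian conditional p(x|h). The sketch below replaces the gated MRF's structured covariance with an identity covariance and uses random parameters, so it only shows the flow of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

D, N1, N2 = 16, 12, 8
W1 = 0.1 * rng.normal(size=(D, N1))    # 1st-layer mean filters (gated MRF's W)
W2 = 0.1 * rng.normal(size=(N1, N2))   # binary-binary weights between h1, h2

# 1) Gibbs sampling in the top model over (h1, h2)
h1 = (rng.uniform(size=N1) < 0.5).astype(float)
for _ in range(20):
    h2 = (rng.uniform(size=N2) < sigmoid(W2.T @ h1)).astype(float)
    h1 = (rng.uniform(size=N1) < sigmoid(W2 @ h2)).astype(float)

# 2) project down: x ~ p(x | h1) = N(W1 h1, Sigma); Sigma = I here,
#    whereas the real model uses Sigma^{-1} = C diag(P h^c) C' + I
x = W1 @ h1 + rng.normal(size=D)
print(x.shape)  # (16,)
```

Because all the mixing happens in the small binary top layer, the deep model can produce much more varied samples than running a Markov chain directly in pixel space.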

77 Samples from Deep Generative Model 1st stage model 3rd stage model

78 Samples from Deep Generative Model 1st stage model 3rd stage model

79 Samples from Deep Generative Model 1st stage model 3rd stage model

80 Samples from Deep Generative Model 1st stage model 3rd stage model

81 Sampling High-Resolution Images. [Figure: Gaussian model and marginal wavelet model (from Simoncelli 2005); FoE (from Schmidt et al. CVPR 2010); Deep - 1 layer (Ranzato, Mnih, Hinton NIPS 2010); Deep - 3 layers (Ranzato et al. CVPR 2011).]

82 Outline - mathematical formulation of the model - training - learning acoustic features for speech recognition - generation of natural images - recognition of facial expression under occlusion - conclusion

83 Facial Expression Recognition. Toronto Face Dataset (J. Susskind et al. 2010): ~100K unlabeled faces from different sources; ~4K labeled images; resolution 48x48 pixels; 7 facial expressions. [Example: anger.] Ranzato, et al. CVPR 2011

84 Facial Expression Recognition Toronto Face Dataset (J. Susskind et al. 2010) ~ 100K unlabeled faces from different sources ~ 4K labeled images Resolution: 48x48 pixels 7 facial expressions disgust

85 Facial Expression Recognition Toronto Face Dataset (J. Susskind et al. 2010) ~ 100K unlabeled faces from different sources ~ 4K labeled images Resolution: 48x48 pixels 7 facial expressions fear

86 Facial Expression Recognition Toronto Face Dataset (J. Susskind et al. 2010) ~ 100K unlabeled faces from different sources ~ 4K labeled images Resolution: 48x48 pixels 7 facial expressions happiness

87 Facial Expression Recognition Toronto Face Dataset (J. Susskind et al. 2010) ~ 100K unlabeled faces from different sources ~ 4K labeled images Resolution: 48x48 pixels 7 facial expressions neutral

88 Facial Expression Recognition Toronto Face Dataset (J. Susskind et al. 2010) ~ 100K unlabeled faces from different sources ~ 4K labeled images Resolution: 48x48 pixels 7 facial expressions sadness

89 Facial Expression Recognition Toronto Face Dataset (J. Susskind et al. 2010) ~ 100K unlabeled faces from different sources ~ 4K labeled images Resolution: 48x48 pixels 7 facial expressions surprise

90 Facial Expression Recognition - 1st layer uses local (not shared) connectivity - layers above are fully connected - 5 layers in total. Results: - Linear classifier on raw pixels: 71.5% - Gaussian RBF SVM on raw pixels: 76.2% - Gabor + PCA + linear classifier: 80.1% (Dailey et al., J. Cog. Neurosci.) - Sparse coding: 74.6% (Wright et al., PAMI) - DEEP model (3 layers): 82.5%

91 Facial Expression Recognition - We can draw samples from the model (5th layer with 128 hidden units). [Figure: 5th layer stacked on top of the gMRF 1st layer.]

92 Facial Expression Recognition - We can draw samples from the model (5th layer with 128 hiddens)

93 Facial Expression Recognition - We can draw samples from the model (5th layer with 128 hidden units). [Figure: stack of gMRF layers up to the 5th layer.]

94 Facial Expression Recognition - We can draw samples from the model (5th layer with 128 hiddens)

95 Facial Expression Recognition - We can draw samples from the model (5th layer with 128 hiddens)

96 Facial Expression Recognition - 7 types of synthetic occlusion - use the generative model to fill in the missing pixels (conditional on the known pixels). [Figure: stack of gMRF layers.]
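Filling in occluded pixels conditioned on the known ones has a closed form for any Gaussian conditional: x_miss | x_obs ~ N(mu_m + S_mo S_oo^{-1} (x_obs − mu_o), ·). The deep model instead iterates up-down passes through its layers, but the single-Gaussian case below shows the principle; the function name and all parameters are synthetic.

```python
import numpy as np

def gaussian_impute(x, observed, mu, Sigma):
    """Replace unobserved entries with E[x_miss | x_obs] under N(mu, Sigma)."""
    o = observed
    m = ~observed
    # conditional mean: mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o)
    cond_mean = mu[m] + Sigma[np.ix_(m, o)] @ np.linalg.solve(
        Sigma[np.ix_(o, o)], x[o] - mu[o])
    out = x.copy()
    out[m] = cond_mean
    return out

rng = np.random.default_rng(0)
D = 8
A = rng.normal(size=(D, D))
Sigma = A @ A.T + np.eye(D)            # a valid covariance matrix
mu = np.zeros(D)
x = rng.normal(size=D)
observed = np.ones(D, dtype=bool)
observed[2:5] = False                  # 'occlude' three pixels
filled = gaussian_impute(x, observed, mu, Sigma)
print(np.allclose(filled[observed], x[observed]))  # True
```

Observed pixels are passed through untouched; only the occluded ones are replaced by their conditional expectation given the visible evidence.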

97 Facial Expression Recognition originals Type 1 occlusion: eyes Restored images

98 Facial Expression Recognition originals Type 2 occlusion: mouth Restored images

99 Facial Expression Recognition originals Type 3 occlusion: right half Restored images

100 Facial Expression Recognition originals Type 4 occlusion: bottom half Restored images

101 Facial Expression Recognition originals Type 5 occlusion: top half Restored images

102 Facial Expression Recognition originals Type 6 occlusion: nose Restored images

103 Facial Expression Recognition originals Type 7 occlusion: 70% of pixels at random Restored images

104 Facial Expression Recognition. [Figure: Original image; occluded Input.]

105 Facial Expression Recognition. [Figure: Original; Input; 1st layer reconstruction. Gated MRF stack.]

106 Facial Expression Recognition. [Figure: Original; Input; 1st layer; 2nd layer. Gated MRF stack.]

107 Facial Expression Recognition. [Figure: Original; Input; 1st layer; 2nd layer; 3rd layer. Gated MRF stack.]

108 Facial Expression Recognition. [Figure: Original; Input; 1st layer; 2nd layer; 3rd layer; 4th layer. Gated MRF stack.]

109 Facial Expression Recognition. [Figure: Original; Input; 1st layer; 2nd layer; 3rd layer; 4th layer, with the up-down passes repeated several times. Gated MRF stack.]

110 Facial Expression Recognition. CASE 1: original images for training, occluded images for test. [Figure: accuracies of Dailey, et al. (J. Cog. Neurosci.), Wright, et al. (PAMI 2008), and Ranzato, et al. (CVPR 2011).]

111 Facial Expression Recognition. CASE 2: occluded images for both training and test. [Figure: accuracies of Dailey, et al. (J. Cog. Neurosci.), Wright, et al. (PAMI 2008), and Ranzato, et al. (CVPR 2011).]

112 Outline - mathematical formulation of the model - training - learning acoustic features for speech recognition - generation of natural images - recognition of facial expression under occlusion - conclusion

113 Summary - Unsupervised Learning: regularize & learn representations - Deep Generative Model - 1st layer: gated MRF - higher layers: binary RBMs - fast inference - realistic generation: natural images - Applications: - facial expression recognition robust to occlusion - speech recognition

114 THANK YOU

115 Learned Filters. [Figure: model diagram with covariance filters C, matrix P, and mean filters W over inputs v_p, v_q.]

116 Learned Filters. [Figure: learned covariance filters C and mean filters W.]

117 Using −Energy to Score Images. [Figure: test images ordered from more likely to less likely.]

118 Using Energy to Score Images. [Figure: upside-down images ordered by energy.]

119 Using Energy to Score Images. [Figure: average of the images for which the difference of energy is highest.]


More information

Lecture 16 Deep Neural Generative Models

Lecture 16 Deep Neural Generative Models Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed

More information

Knowledge Extraction from DBNs for Images

Knowledge Extraction from DBNs for Images Knowledge Extraction from DBNs for Images Son N. Tran and Artur d Avila Garcez Department of Computer Science City University London Contents 1 Introduction 2 Knowledge Extraction from DBNs 3 Experimental

More information

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014. Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)

More information

Deep Learning Architectures and Algorithms

Deep Learning Architectures and Algorithms Deep Learning Architectures and Algorithms In-Jung Kim 2016. 12. 2. Agenda Introduction to Deep Learning RBM and Auto-Encoders Convolutional Neural Networks Recurrent Neural Networks Reinforcement Learning

More information

Autoencoders and Score Matching. Based Models. Kevin Swersky Marc Aurelio Ranzato David Buchman Benjamin M. Marlin Nando de Freitas

Autoencoders and Score Matching. Based Models. Kevin Swersky Marc Aurelio Ranzato David Buchman Benjamin M. Marlin Nando de Freitas On for Energy Based Models Kevin Swersky Marc Aurelio Ranzato David Buchman Benjamin M. Marlin Nando de Freitas Toronto Machine Learning Group Meeting, 2011 Motivation Models Learning Goal: Unsupervised

More information

Reading Group on Deep Learning Session 4 Unsupervised Neural Networks

Reading Group on Deep Learning Session 4 Unsupervised Neural Networks Reading Group on Deep Learning Session 4 Unsupervised Neural Networks Jakob Verbeek & Daan Wynen 206-09-22 Jakob Verbeek & Daan Wynen Unsupervised Neural Networks Outline Autoencoders Restricted) Boltzmann

More information

Efficient Learning of Sparse, Distributed, Convolutional Feature Representations for Object Recognition

Efficient Learning of Sparse, Distributed, Convolutional Feature Representations for Object Recognition Efficient Learning of Sparse, Distributed, Convolutional Feature Representations for Object Recognition Kihyuk Sohn Dae Yon Jung Honglak Lee Alfred O. Hero III Dept. of Electrical Engineering and Computer

More information

Subspace Analysis for Facial Image Recognition: A Comparative Study. Yongbin Zhang, Lixin Lang and Onur Hamsici

Subspace Analysis for Facial Image Recognition: A Comparative Study. Yongbin Zhang, Lixin Lang and Onur Hamsici Subspace Analysis for Facial Image Recognition: A Comparative Study Yongbin Zhang, Lixin Lang and Onur Hamsici Outline 1. Subspace Analysis: Linear vs Kernel 2. Appearance-based Facial Image Recognition.

More information

Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning

Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning Diederik (Durk) Kingma Danilo J. Rezende (*) Max Welling Shakir Mohamed (**) Stochastic Gradient Variational Inference Bayesian

More information

Deep Belief Networks are compact universal approximators

Deep Belief Networks are compact universal approximators 1 Deep Belief Networks are compact universal approximators Nicolas Le Roux 1, Yoshua Bengio 2 1 Microsoft Research Cambridge 2 University of Montreal Keywords: Deep Belief Networks, Universal Approximation

More information

Latent Variable Models and Expectation Maximization

Latent Variable Models and Expectation Maximization Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15

More information

arxiv: v2 [cs.ne] 22 Feb 2013

arxiv: v2 [cs.ne] 22 Feb 2013 Sparse Penalty in Deep Belief Networks: Using the Mixed Norm Constraint arxiv:1301.3533v2 [cs.ne] 22 Feb 2013 Xanadu C. Halkias DYNI, LSIS, Universitè du Sud, Avenue de l Université - BP20132, 83957 LA

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 13: Learning in Gaussian Graphical Models, Non-Gaussian Inference, Monte Carlo Methods Some figures

More information

Speaker Representation and Verification Part II. by Vasileios Vasilakakis

Speaker Representation and Verification Part II. by Vasileios Vasilakakis Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation

More information

Tensor Methods for Feature Learning

Tensor Methods for Feature Learning Tensor Methods for Feature Learning Anima Anandkumar U.C. Irvine Feature Learning For Efficient Classification Find good transformations of input for improved classification Figures used attributed to

More information

Expectation Maximization (EM)

Expectation Maximization (EM) Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted

More information

An efficient way to learn deep generative models

An efficient way to learn deep generative models An efficient way to learn deep generative models Geoffrey Hinton Canadian Institute for Advanced Research & Department of Computer Science University of Toronto Joint work with: Ruslan Salakhutdinov, Yee-Whye

More information

Hierarchical Sparse Bayesian Learning. Pierre Garrigues UC Berkeley

Hierarchical Sparse Bayesian Learning. Pierre Garrigues UC Berkeley Hierarchical Sparse Bayesian Learning Pierre Garrigues UC Berkeley Outline Motivation Description of the model Inference Learning Results Motivation Assumption: sensory systems are adapted to the statistical

More information

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a

More information

Latent Variable Models

Latent Variable Models Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 5 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 5 1 / 31 Recap of last lecture 1 Autoregressive models:

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 8 Continuous Latent Variable

More information

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous

More information

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Latent Variable Models and Expectation Maximization

Latent Variable Models and Expectation Maximization Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15

More information

Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence

Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence ESANN 0 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 7-9 April 0, idoc.com publ., ISBN 97-7707-. Stochastic Gradient

More information

Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes

Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes Ruslan Salakhutdinov and Geoffrey Hinton Department of Computer Science, University of Toronto 6 King s College Rd, M5S 3G4, Canada

More information

Learning to Disentangle Factors of Variation with Manifold Learning

Learning to Disentangle Factors of Variation with Manifold Learning Learning to Disentangle Factors of Variation with Manifold Learning Scott Reed Kihyuk Sohn Yuting Zhang Honglak Lee University of Michigan, Department of Electrical Engineering and Computer Science 08

More information

Semi-Supervised Learning

Semi-Supervised Learning Semi-Supervised Learning getting more for less in natural language processing and beyond Xiaojin (Jerry) Zhu School of Computer Science Carnegie Mellon University 1 Semi-supervised Learning many human

More information

Approximate inference in Energy-Based Models

Approximate inference in Energy-Based Models CSC 2535: 2013 Lecture 3b Approximate inference in Energy-Based Models Geoffrey Hinton Two types of density model Stochastic generative model using directed acyclic graph (e.g. Bayes Net) Energy-based

More information

Feature Noising. Sida Wang, joint work with Part 1: Stefan Wager, Percy Liang Part 2: Mengqiu Wang, Chris Manning, Percy Liang, Stefan Wager

Feature Noising. Sida Wang, joint work with Part 1: Stefan Wager, Percy Liang Part 2: Mengqiu Wang, Chris Manning, Percy Liang, Stefan Wager Feature Noising Sida Wang, joint work with Part 1: Stefan Wager, Percy Liang Part 2: Mengqiu Wang, Chris Manning, Percy Liang, Stefan Wager Outline Part 0: Some backgrounds Part 1: Dropout as adaptive

More information

Machine Learning, Midterm Exam

Machine Learning, Midterm Exam 10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have

More information

Why DNN Works for Acoustic Modeling in Speech Recognition?

Why DNN Works for Acoustic Modeling in Speech Recognition? Why DNN Works for Acoustic Modeling in Speech Recognition? Prof. Hui Jiang Department of Computer Science and Engineering York University, Toronto, Ont. M3J 1P3, CANADA Joint work with Y. Bao, J. Pan,

More information

Normalization Techniques in Training of Deep Neural Networks

Normalization Techniques in Training of Deep Neural Networks Normalization Techniques in Training of Deep Neural Networks Lei Huang ( 黄雷 ) State Key Laboratory of Software Development Environment, Beihang University Mail:huanglei@nlsde.buaa.edu.cn August 17 th,

More information

Auto-Encoders & Variants

Auto-Encoders & Variants Auto-Encoders & Variants 113 Auto-Encoders MLP whose target output = input Reconstruc7on=decoder(encoder(input)), input e.g. x code= latent features h encoder decoder reconstruc7on r(x) With bo?leneck,

More information

Overview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated

Overview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated Fall 3 Computer Vision Overview of Statistical Tools Statistical Inference Haibin Ling Observation inference Decision Prior knowledge http://www.dabi.temple.edu/~hbling/teaching/3f_5543/index.html Bayesian

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2 Agenda Expectation-maximization

More information

How to do backpropagation in a brain. Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto

How to do backpropagation in a brain. Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto 1 How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto What is wrong with back-propagation? It requires labeled training data. (fixed) Almost

More information

Bayesian Paragraph Vectors

Bayesian Paragraph Vectors Bayesian Paragraph Vectors Geng Ji 1, Robert Bamler 2, Erik B. Sudderth 1, and Stephan Mandt 2 1 Department of Computer Science, UC Irvine, {gji1, sudderth}@uci.edu 2 Disney Research, firstname.lastname@disneyresearch.com

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Clustering and Gaussian Mixtures

Clustering and Gaussian Mixtures Clustering and Gaussian Mixtures Oliver Schulte - CMPT 883 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15 2 25 5 1 15 2 25 detected tures detected

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 10-708 Probabilistic Graphical Models Homework 3 (v1.1.0) Due Apr 14, 7:00 PM Rules: 1. Homework is due on the due date at 7:00 PM. The homework should be submitted via Gradescope. Solution to each problem

More information

arxiv: v1 [stat.ml] 30 Mar 2016

arxiv: v1 [stat.ml] 30 Mar 2016 A latent-observed dissimilarity measure arxiv:1603.09254v1 [stat.ml] 30 Mar 2016 Yasushi Terazono Abstract Quantitatively assessing relationships between latent variables and observed variables is important

More information

Usually the estimation of the partition function is intractable and it becomes exponentially hard when the complexity of the model increases. However,

Usually the estimation of the partition function is intractable and it becomes exponentially hard when the complexity of the model increases. However, Odyssey 2012 The Speaker and Language Recognition Workshop 25-28 June 2012, Singapore First attempt of Boltzmann Machines for Speaker Verification Mohammed Senoussaoui 1,2, Najim Dehak 3, Patrick Kenny

More information

Unsupervised Learning: K- Means & PCA

Unsupervised Learning: K- Means & PCA Unsupervised Learning: K- Means & PCA Unsupervised Learning Supervised learning used labeled data pairs (x, y) to learn a func>on f : X Y But, what if we don t have labels? No labels = unsupervised learning

More information

CS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines

CS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines CS4495/6495 Introduction to Computer Vision 8C-L3 Support Vector Machines Discriminative classifiers Discriminative classifiers find a division (surface) in feature space that separates the classes Several

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks Delivered by Mark Ebden With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

Structured deep models: Deep learning on graphs and beyond

Structured deep models: Deep learning on graphs and beyond Structured deep models: Deep learning on graphs and beyond Hidden layer Hidden layer Input Output ReLU ReLU, 25 May 2018 CompBio Seminar, University of Cambridge In collaboration with Ethan Fetaya, Rianne

More information

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models Statistical NLP Spring 2009 The Noisy Channel Model Lecture 10: Acoustic Models Dan Klein UC Berkeley Search through space of all possible sentences. Pick the one that is most probable given the waveform.

More information

Statistical NLP Spring The Noisy Channel Model

Statistical NLP Spring The Noisy Channel Model Statistical NLP Spring 2009 Lecture 10: Acoustic Models Dan Klein UC Berkeley The Noisy Channel Model Search through space of all possible sentences. Pick the one that is most probable given the waveform.

More information

Bayesian Networks Inference with Probabilistic Graphical Models

Bayesian Networks Inference with Probabilistic Graphical Models 4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)

Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight) CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information