Modeling Natural Images with Higher-Order Boltzmann Machines

1 Modeling Natural Images with Higher-Order Boltzmann Machines. Marc'Aurelio Ranzato, Department of Computer Science, Univ. of Toronto. Joint work with Geoffrey Hinton and Vlad Mnih. CIFAR summer school, 12 August 2010.

2 How to model natural images? We want a joint distribution p(v,h) over points in image space, where v are the pixels and h the latent features.

3 A generative perspective: p(v|h). PCA (Pearson 1901).

4 A generative perspective: p(v|h). PCA (Pearson 1901); FA: p(v|h) = N(Wh, D) (Young 1940).

5 A generative perspective: p(v|h). For models like PCA and FA, a sample from p(v|h) scatters around the mean Wh: the structure of the data point is lost!

6 A generative perspective: p(v|h). PPCA: p(v|h) = N(Wh, σ²I) (Tipping & Bishop 1999).

7 A generative perspective: p(v|h). Sparse Coding: p(v|h) = N(Wh, σ²I), with a sparse prior on h (Olshausen & Field 1996; also Welling, Rosen-Zvi & Hinton (GRBM) and ICA).

8 A generative perspective: p(v|h). PoT: p(v|h) = N(0, (C diag(h) C')^-1) (Welling et al. 2002; also Wainwright et al. (GSM) and Karklin & Lewicki 2003).

9 A generative perspective: p(v|h). For PoT, a sample from p(v|h) stays near the data point: structure is preserved!

10 A generative perspective: p(v|h). mcRBM (Ranzato & Hinton 2010).

11 A generative perspective: p(v|h). mcRBM (Ranzato & Hinton 2010): THIS IS NOT A MIXTURE OF GAUSSIANS!!

12 Why model the covariance, p(v|h)? Example: an image with two pixels, v1 and v2; assume we subtract off the mean from the image. Most of the time the two pixels take the same value. [figure: input data and p(v|h) in the (v1, v2) plane]

13 Why model the covariance, p(v|h)? Most of the time the two pixels take the same value, with some exceptions! [figure: input data and p(v|h) in the (v1, v2) plane]

14 Why model the covariance, p(v|h)? With one binary hidden unit we can pick the Gaussian with the right covariance; this is like PoT. A toy sketch follows. [figure: input data and the selected p(v|h)]
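To make the two-pixel picture concrete, here is a minimal numpy sketch (my own illustration with made-up numbers, not code from the talk): one binary hidden unit switches the conditional Gaussian between a tight "pixels nearly equal" covariance and a broad covariance that covers the exceptions.

```python
import numpy as np

# Two candidate covariances for p(v | h) over the pixel pair (v1, v2):
# h = 1: pixels nearly equal (mass concentrated along the diagonal v1 = v2);
# h = 0: broad, isotropic covariance that also covers the exceptions.
SIGMA = {
    1: np.array([[1.00, 0.95],
                 [0.95, 1.00]]),
    0: np.eye(2),
}

def sample_v_given_h(h, rng, n=5):
    """Draw pixel pairs from the Gaussian selected by the binary hidden unit."""
    return rng.multivariate_normal(mean=np.zeros(2), cov=SIGMA[h], size=n)

rng = np.random.default_rng(0)
print(sample_v_given_h(1, rng))  # points hugging the line v1 = v2
print(sample_v_given_h(0, rng))  # scattered points: the exceptions
```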

15 Why model both mean and covariance, p(v|h)? We can produce an even more precise fit by modeling the mean. [figure: fit without the mean]

16 Why model both mean and covariance, p(v|h)? We can produce an even more precise fit by modeling the mean. [figure: fit with the mean]

17 Why model both mean and covariance, p(v|h)? When the data is not centered, the effect is even more dramatic: without the mean, the model fails to capture the edge! [figure: fit without the mean]

18 Why model both mean and covariance, p(v|h)? With the mean, the edge is captured. [figure: fit with the mean]

19 mcRBM (Ranzato & Hinton 2010): the hiddens determine an image-specific mean and an image-specific covariance. How do we modulate the mean and the covariance using hidden units? p(v, h^m, h^c) ∝ exp(-E(v, h^m, h^c)).

20 First option: p(v | h^m, h^c) = N(m(h^m), Σ(h^c)), with energy E = 1/2 (v - m)' Σ^-1 (v - m). Easy generation, but slow inference. [diagram: h^m and h^c both gating v]

21 First option (easy generation, slow inference): E = 1/2 (v - m)' Σ^-1 (v - m), giving p(v | h^m, h^c) = N(m(h^m), Σ(h^c)). Second option (less easy generation, fast inference): E = 1/2 v' Σ^-1 v - m' v, giving p(v | h^m, h^c) = N(Σ(h^c) m(h^m), Σ(h^c)).

22 Geometric interpretation: if we multiply them (the Gaussian carrying the covariance and the Gaussian carrying the mean), we get...

23 Geometric interpretation: if we multiply them, we get... a sharper and shorter Gaussian.
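The fact behind this slide is the standard product-of-Gaussians identity (a general result, not specific to the talk): precisions add, so the product is sharper, and the unnormalized peak shrinks, so it is shorter. A quick numeric check:

```python
import numpy as np

def product_of_gaussians(m1, S1, m2, S2):
    """N(m1, S1) * N(m2, S2) is proportional to N(m, S), where
    S^-1 = S1^-1 + S2^-1 (precisions add) and m is a precision-weighted mean."""
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    S = np.linalg.inv(P1 + P2)
    m = S @ (P1 @ m1 + P2 @ m2)
    return m, S

m, S = product_of_gaussians(np.array([1.0, 0.0]), np.eye(2),
                            np.zeros(2), np.array([[1.0, 0.9],
                                                   [0.9, 1.0]]))
print(m)                 # mean pulled between the two component means
print(np.linalg.det(S))  # smaller than either input's determinant: sharper
```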

24 We start by modeling small image patches. [diagram: joint model p(v,h) with real-valued visibles v (the pixels) and binary hiddens h in {0,1}]

25 Modeling the covariance only (using binary hiddens): cRBM (Ranzato, Krizhevsky & Hinton, AISTATS 2010). E = 1/2 v' Σ^-1 v.

26 Gated MRF: E(v, h^c) = w1 v1 v2 h^c_1 + ..., so the pairwise interactions are determined by the state of the latent variables. In matrix form: E(v, h^c) = 1/2 v' Σ^-1 v, with Σ^-1 = C diag(P h^c) C' and h^c_k ∈ {0,1}. [diagram: hidden unit h^c_1 gating the interaction between v1 and v2]
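A small numpy sketch of this precision construction (toy shapes of my choosing, not the talk's code): given covariance filters C, a non-negative pooling matrix P, and binary covariance hiddens h_c, build Σ^-1 = C diag(P h_c) C'.

```python
import numpy as np

rng = np.random.default_rng(0)
D, F, K = 4, 6, 3                 # toy sizes: pixels, filters, covariance hiddens
C = rng.standard_normal((D, F))   # filter matrix (columns are filters)
P = rng.random((F, K))            # non-negative pooling matrix
h_c = np.array([1, 0, 1])         # one setting of the binary covariance hiddens

# Precision of the zero-mean conditional Gaussian over the pixels:
# turning a hidden ON adds its pooled filters' outer products to Sigma^-1.
precision = C @ np.diag(P @ h_c) @ C.T
print(np.round(precision, 2))     # (D, D), symmetric and PSD since P @ h_c >= 0
```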

27 So far we modeled the covariance only... now we also add the mean: mcRBM (Ranzato & Hinton, CVPR 2010). E = 1/2 v' Σ^-1 v - m' v.

28 E(v, h^m, h^c) = 1/2 v' Σ^-1 v - Σ_ij W_ij v_i h^m_j: the cRBM covariance term plus a mean term. The conditional over the visibles is Gaussian with a non-zero mean that depends on both sets of hiddens: p(v | h^c, h^m) = N(Σ W h^m, Σ). [diagram: h^c (cRBM part) and h^m both connected to v1, v2]
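Continuing in the same toy setting, a self-contained sketch of sampling from this conditional (W, h_m and all shapes are illustrative; a small ridge keeps the toy precision invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
D, F, K, J = 4, 6, 3, 5                  # toy sizes: pixels, filters, h^c, h^m
C = rng.standard_normal((D, F))          # covariance filters
P = rng.random((F, K))                   # non-negative pooling matrix
W = rng.standard_normal((D, J))          # mean filters
h_c = rng.integers(0, 2, size=K)         # binary covariance hiddens
h_m = rng.integers(0, 2, size=J)         # binary mean hiddens

# Sigma^-1 = C diag(P h_c) C' (plus a small ridge so the toy matrix inverts)
Sigma = np.linalg.inv(C @ np.diag(P @ h_c) @ C.T + 1e-3 * np.eye(D))

# p(v | h^c, h^m) = N(Sigma W h_m, Sigma): the mean depends on both hidden sets.
v = rng.multivariate_normal(Sigma @ (W @ h_m), Sigma)
print(np.round(v, 2))
```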

29 Interpreting mcRBM. Looking at p(v|h): relation to PCA, FA, PoT, etc.

30 Interpreting mcRBM. Looking at p(v|h): relation to PCA, FA, PoT, etc. Looking at the hiddens h^c: relation to line processes and PoT (Geman et al. 84, Blake et al. 87, Black et al. 96).

31 Interpreting mcRBM. Looking at p(v|h): relation to PCA, FA, PoT, etc. Looking at the hiddens h^c: relation to line processes and PoT (Geman et al. 84, Blake et al. 87, Black et al. 96). Looking at E(v,h): a 3rd-order Boltzmann machine, with relation to the conditional 3-way RBM (Memisevic et al. 07).

32 Interpreting mcRBM. Looking at p(v|h): relation to PCA, FA, PoT, etc. Looking at the hiddens h^c: relation to line processes and PoT (Geman et al. 84, Blake et al. 87, Black et al. 96). Looking at E(v,h): a 3rd-order Boltzmann machine, with relation to the conditional 3-way RBM (Memisevic et al. 07). Looking at p(h|v): p(h^c_k = 1 | v) = σ(-1/2 Σ_f P_fk (C_f' v)² + b_k), with relation to the simple-complex cell model.
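A sketch of that inference step (toy shapes again; σ is the logistic sigmoid). Note how a unit is ON by default and switches OFF when its pooled squared filter responses are large, which is the line-process behaviour mentioned above:

```python
import numpy as np

def p_hc_given_v(v, C, P, b):
    """p(h^c_k = 1 | v) = sigmoid(-1/2 * sum_f P_fk (C_f' v)^2 + b_k)."""
    filter_sq = (C.T @ v) ** 2                     # squared filter outputs, (F,)
    return 1.0 / (1.0 + np.exp(0.5 * (P.T @ filter_sq) - b))

rng = np.random.default_rng(2)
D, F, K = 4, 6, 3
print(p_hc_given_v(rng.standard_normal(D),         # a random "image"
                   rng.standard_normal((D, F)),    # covariance filters C
                   rng.random((F, K)),             # pooling matrix P
                   np.zeros(K)))                   # biases b
```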

33-35 LEARNING: maximum likelihood, via Contrastive Divergence with Hybrid Monte Carlo to draw samples. Full energy: E(v, h^c, h^m) = 1/2 Σ_k h^c_k Σ_f P_fk (C_f' v)² - Σ_j h^m_j W_j' v (biases omitted). [diagram: h^c and h^m gating v1, v2]

36-42 LEARNING: maximum likelihood, Contrastive Divergence, Hybrid Monte Carlo to draw samples. [animation: starting from a training sample, HMC dynamics roll along the free energy F = -log(p(v)) + K to produce a negative sample]

43 LEARNING: maximum likelihood, Contrastive Divergence, Hybrid Monte Carlo to draw samples. Weight update: w ← w - η (∂F/∂w at the data point - ∂F/∂w at the sample point), with F = -log(p(v)) + K.

44 LEARNING: maximum likelihood, Persistent Contrastive Divergence, Hybrid Monte Carlo to draw samples. Start the dynamics from the previous sample point; update as before: w ← w - η (∂F/∂w at the data point - ∂F/∂w at the sample point).

45 LEARNING: maximum likelihood, Fast Persistent Contrastive Divergence, Hybrid Monte Carlo to draw samples. Start the dynamics from the previous sample point, and temporarily perturb the weights by a little bit (Tieleman and Hinton, ICML 2009).

46 LEARNING: maximum likelihood via Fast Persistent Contrastive Divergence (Tieleman and Hinton, ICML 2009).
Initialize: w, w_f, η, η_f.
For each training data case:
- get training sample: v+
- compute derivatives: g+ = ∂F/∂w at v+
- draw sample: v- ← HMC(v-; w + w_f)
- compute derivatives: g- = ∂F/∂w at v-, using the perturbed weights w + w_f
- update true parameters: w ← w - η (g+ - g-)
- update fast weights: w_f ← 0.95 w_f - η_f (g+ - g-)
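A minimal runnable sketch of this FPCD loop. Everything model-specific is a stand-in: a quadratic toy free energy replaces the mcRBM free energy, and a single Langevin step replaces the talk's Hybrid Monte Carlo sampler; only the update structure follows the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy free energy F(v; w) = w/2 * ||v||^2, a stand-in for the mcRBM free energy.
def grad_w(v, w):
    return 0.5 * float(v @ v)        # dF/dw at v

def langevin_step(v, w, step=0.05):
    """One noisy gradient step on F(v); a crude stand-in for HMC."""
    return v - step * (w * v) + np.sqrt(2 * step) * rng.standard_normal(v.shape)

w, w_f = 2.0, 0.0                    # true and fast parameters
eta, eta_f = 1e-3, 1e-2              # slow and fast learning rates
v_neg = rng.standard_normal(4)       # persistent negative particle

data = 0.5 * rng.standard_normal((500, 4))   # toy training set
for v_pos in data:
    g_pos = grad_w(v_pos, w)                     # positive-phase derivative
    v_neg = langevin_step(v_neg, w + w_f)        # sample with perturbed weights
    g_neg = grad_w(v_neg, w + w_f)               # negative-phase derivative
    w -= eta * (g_pos - g_neg)                   # update true parameters
    w_f = 0.95 * w_f - eta_f * (g_pos - g_neg)   # decay + update fast weights

print(w)  # drifts upward, toward the tighter toy data distribution
```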

47 Experiments

48 Learn from 16x16 natural image patches; pre-processing: PCA whitening. [figure: learned covariance filters]
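A standard PCA-whitening recipe for this preprocessing step (the usual formulation, not necessarily the talk's exact code):

```python
import numpy as np

def pca_whiten(X, eps=1e-5):
    """Whiten rows of X (n_patches x n_pixels): zero mean, ~identity covariance."""
    X = X - X.mean(axis=0)                   # center each pixel
    cov = X.T @ X / len(X)
    eigval, eigvec = np.linalg.eigh(cov)
    # Rotate onto the eigenvectors, then rescale each direction to unit variance.
    return (X @ eigvec) / np.sqrt(eigval + eps)

patches = np.random.default_rng(0).standard_normal((1000, 256))  # fake 16x16 patches
white = pca_whiten(patches)
print(np.round(np.cov(white.T)[:3, :3], 2))  # approximately the identity
```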

49 Learn from 16x16 natural image patches; pre-processing: PCA whitening. [figure: grouping of covariance filters]

50 Learn from 16x16 natural image patches; pre-processing: PCA whitening. [figure: mean intensity filters]

51 1) Given an image, infer the latent variables using p(h|v). 2) Keeping the latent variables fixed, sample from p(v|h): a random walk in input space. The latent configuration induces a whole subspace of images; the latent representation learns to be robust to small distortions. A toy sketch of the two steps follows.
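Composing the earlier sketches gives a hedged outline of this experiment (toy shapes; real mean-hidden inference is omitted and h^m is simply drawn at random here; the ridge and biases are arbitrary choices to keep the toy model well-behaved):

```python
import numpy as np

rng = np.random.default_rng(3)
D, F, K, J = 4, 6, 3, 5
C, P, W = rng.standard_normal((D, F)), rng.random((F, K)), rng.standard_normal((D, J))
b_c = 2.0 * np.ones(K)   # toy biases: units are ON unless filters respond strongly

def infer_h(v):
    # Step 1: p(h^c_k = 1 | v) = sigmoid(-1/2 sum_f P_fk (C_f' v)^2 + b_k);
    # the mean hiddens h^m are just drawn at random in this toy sketch.
    p = 1.0 / (1.0 + np.exp(0.5 * (P.T @ (C.T @ v) ** 2) - b_c))
    return (p > 0.5).astype(float), rng.integers(0, 2, size=J)

def sample_v(h_c, h_m):
    # Step 2: with the hiddens fixed, sample v ~ N(Sigma W h_m, Sigma).
    Sigma = np.linalg.inv(C @ np.diag(P @ h_c) @ C.T + np.eye(D))
    return rng.multivariate_normal(Sigma @ (W @ h_m), Sigma)

h_c, h_m = infer_h(rng.standard_normal(D))                  # infer once...
print(np.round([sample_v(h_c, h_m) for _ in range(3)], 2))  # ...then walk in v-space
```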

52 Example of image patches used during training

53 Samples drawn from the model (using HMC)

54 Example of image patches used during training

55 Samples drawn from the model (using HMC)

56 Comparison: natural images; mcRBM samples; GRBM samples (from Osindero and Hinton, NIPS 2008); S-RBM + DBN samples (from Osindero and Hinton, NIPS 2008).

57 From patches to big images. Training is done by picking patches at random.

58 From patches to big images. We could also take the patches from a grid, but this is not a good way to extend the model to big images: block artifacts.

59 From patches to big images. Instead: a subset of the filters is applied to the grid patches and...

60 From patches to big images. Instead: a subset of the filters is applied to the grid patches and... the other subsets are applied to shifted grids (Gregor & LeCun 2010; our paper in submission).

61 From patches to big images. A subset of the filters applied to one grid, the other subsets to shifted grids: no block artifacts and little redundancy (Gregor & LeCun 2010; our paper in submission). A sketch of the tiling follows.
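A sketch of the tiling idea as I read this slide (illustrative sizes; the exact offsets in the papers may differ): each filter subset gets its own diagonally shifted grid of patch locations, so the image is covered densely without the single-grid block structure.

```python
import numpy as np

patch, n_subsets, img = 16, 4, 64   # illustrative sizes
shift = patch // n_subsets          # offset between consecutive grids

def grid_for_subset(s):
    """Top-left corners of the patches seen by filter subset s."""
    starts = np.arange(s * shift, img - patch + 1, patch)
    return [(y, x) for y in starts for x in starts]

for s in range(n_subsets):
    print(s, grid_for_subset(s)[:4])  # each subset tiles a shifted grid
```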

62-66 Learning on high-resolution images. [figures: learned covariance filters]

67 Learning on high-resolution images. [figure: learned mean filters]

68 Sampling high-resolution images. [figures: samples from a Gaussian model and from a marginal wavelet model, from Simoncelli 2005]

69 Sampling high-resolution images. [figures: samples from a Gaussian model and a marginal wavelet model (from Simoncelli 2005), and from a pairwise MRF and FoE (from Schmidt, Gao, Roth CVPR 2010)]

70-72 Sampling high-resolution images. [figures: samples from a Gaussian model, a marginal wavelet model (from Simoncelli 2005), a pairwise MRF and FoE (from Schmidt, Gao, Roth CVPR 2010), and from the mean-covariance model (mcRBM)]

73-74 Sampling high-resolution images. [figures: samples drawn starting from a natural image]

75 Inpainting: guessing 90% of the pixels. [masked input image]

76 Inpainting: guessing 90% of the pixels. [MAP estimate of the missing pixels]

77 Inpainting: guessing 90% of the pixels. [true image]

78 Inpainting: guessing 90% of the pixels. [masked input image]

79 Inpainting: guessing 90% of the pixels. [MAP estimate of the missing pixels]

80 Inpainting: guessing 90% of the pixels. [true image]

81 Inpainting: guessing 90% of the pixels. [masked input image]

82 Inpainting: guessing 90% of the pixels. [MAP estimate of the missing pixels]

83 Inpainting: guessing 90% of the pixels. [true image]

84 Conclusion. 3-way Boltzmann Machines: a joint model with fast inference and easy-to-interpret conditionals; it generates very realistic samples; training is hard because of the partition function. Future: applications (segmentation, denoising), multi-scale modeling, and we'll make it DEEPER!!

85 THANK YOU
