Lecture 14: Deep Generative Learning


CSED703R: Deep Learning for Visual Recognition (2017F)
Bohyung Han, Computer Vision Lab.

Generative Modeling
- Density estimation: reconstructing a probability density function from samples.
- Sample generation: producing new samples that resemble the training examples.

Generative Modeling by Neural Networks
- Variational Autoencoder (VAE): a neural network with continuous latent variables; the encoder approximates a posterior distribution, and the decoder stochastically reconstructs the data from the latent variables.
- Generative Adversarial Networks (GANs): the generator produces samples from a simple noise distribution (e.g., uniform), and the discriminator discriminates between real and generated images.
- Deep Boltzmann Machines (DBMs)
- Deep Belief Networks (DBNs)

Why Generative Modeling?
- Limitations of discriminative neural networks: they require large-scale labeled training data, make unreasonable mistakes for trivial data variations, and can get stuck in poor local optima.
- Generative modeling: adjust the model parameters to maximize the probability that the data would have been produced.

Autoencoder
- Traditional autoencoders perform feature learning by minimizing the reconstruction error.
- Basic pipeline: an input image x is mapped by the encoder to a latent variable z, and the decoder maps z back to a reconstructed input x̂; training minimizes the reconstruction loss L = ||x - x̂||².
- Supervised feature learning: reuse the encoder as a feature extractor, attach a classifier that predicts the class label y, and fine-tune the whole network by backpropagation.

Variational Autoencoder
Characteristics
- A Bayesian approach on an autoencoder, used for sample generation.
- Estimates the model parameters θ without access to the latent variable z.

Problem scenario
- A latent value z^(i) (classes, attributes) is generated from a true prior p_θ(z), and an observation x^(i) (image) is generated from a true conditional p_θ(x|z).

Limitation
- It is difficult to estimate p_θ(z) and p_θ(x|z) in practice, because the true parameters θ* and the values of the latent variables z^(i) are hidden; we need to approximate these distributions.
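To make the basic pipeline concrete, the following is a minimal sketch assuming PyTorch as the framework; the layer sizes, the 784-dimensional flattened input, and the dummy minibatch are illustrative assumptions, not part of the lecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Autoencoder(nn.Module):
    """Encoder maps image x to latent z; decoder reconstructs x from z."""
    def __init__(self, x_dim=784, z_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(),
                                     nn.Linear(256, z_dim))
        self.decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, x_dim))

    def forward(self, x):
        z = self.encoder(x)          # latent variable z
        return self.decoder(z)       # reconstructed input x_hat

model = Autoencoder()
x = torch.rand(16, 784)              # a dummy minibatch of flattened images
loss = F.mse_loss(model(x), x)       # reconstruction error L = ||x - x_hat||^2
loss.backward()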

Variational Autoencoder
Main idea
- Decoder: a latent variable z is decoded into an image x through p_θ(x|z); the decoder estimates the mean and variance of p_θ(x|z).
- Encoder: the image x is encoded into z through the approximate posterior q_φ(z|x); the encoder estimates the mean and variance of q_φ(z|x).
- Assume the prior p_θ(z) is a unit Gaussian and the likelihood p_θ(x|z) is a diagonal Gaussian.

Optimization
- Variational lower bound of the marginal likelihood:
  log p_θ(x^(i)) = D_KL(q_φ(z|x^(i)) || p_θ(z|x^(i))) + L(θ, φ; x^(i)),
  where the first term is the KL divergence of the approximate posterior from the true posterior and the second term is the variational lower bound on the marginal likelihood, so log p_θ(x^(i)) ≥ L(θ, φ; x^(i)).
- Maximization of the variational lower bound:
  L(θ, φ; x) = E_{q_φ(z|x)}[-log q_φ(z|x) + log p_θ(x, z)]
             = E_{q_φ(z|x)}[-log q_φ(z|x) + log p_θ(z) + log p_θ(x|z)]
             = -D_KL(q_φ(z|x) || p_θ(z)) + E_{q_φ(z|x)}[log p_θ(x|z)],
  i.e., a regularization term plus a reconstruction term.
- Training: the reconstruction term reduces to the MSE between the input and the generated image (for a Gaussian likelihood); the regularization term is optimized via the reparametrization trick using the predicted mean and standard deviation.
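A minimal sketch of both ELBO terms and the reparametrization trick, assuming PyTorch; the MLP encoder/decoder, the layer sizes, and the summed reduction are illustrative assumptions rather than the lecture's exact setup.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)        # mean of q_phi(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)    # log-variance of q_phi(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparametrization trick: z = mu + sigma * eps with eps ~ N(0, I),
        # so gradients flow through mu and sigma.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def elbo_loss(x_hat, x, mu, logvar):
    # Reconstruction term (Gaussian likelihood -> MSE between input and output) ...
    recon = F.mse_loss(x_hat, x, reduction='sum')
    # ... plus the regularization term KL(q_phi(z|x) || N(0, I)) in closed form.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl    # minimizing this maximizes the variational lower bound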

Variational Autoencoder
Inference
- To generate a sample, draw z from the prior p_θ(z); the decoder outputs the mean and variance of p_θ(x|z), and the image x is sampled from that Gaussian.

Learned Manifold
- Visualizations of the learned manifold for generative models with a 2D latent space.

Generative Adversarial Networks
- Two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G.
- Objective: train G to maximize the probability of D making a mistake; both models are trained with backpropagation.
- Pipeline: the generator maps random noise z to a generated image, the discriminator classifies images as real or fake, and the generator and discriminator are trained jointly.

Objective
- Value function:
  V(D, G) = E_{x~p_data(x)}[log D(x; θ_d)] + E_{z~p_z(z)}[log(1 - D(G(z; θ_g); θ_d))]
- D and G play the two-player minimax game min_G max_D V(D, G): while D is trained to classify correctly, G is simultaneously trained to minimize log(1 - D(G(z; θ_g); θ_d)).
- p_z(z): prior on the input noise variables.
- G(z; θ_g): a mapping from z to data space, a differentiable function represented by a multilayer perceptron with parameters θ_g.
- D(x; θ_d): the probability that x came from the data rather than from the generator distribution p_g.

[Goodfellow14] I. J. Goodfellow, et al.: Generative Adversarial Nets, NIPS 2014
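As a concrete reading of the value function, here is a small sketch assuming PyTorch; D and G are placeholder networks whose outputs are taken to be probabilities in (0, 1) and generated images, respectively.

import torch

def discriminator_loss(D, G, x_real, z):
    # Ascend E[log D(x)] + E[log(1 - D(G(z)))], i.e. minimize its negative.
    d_real = D(x_real)
    d_fake = D(G(z).detach())           # do not backpropagate into G here
    return -(torch.log(d_real) + torch.log(1.0 - d_fake)).mean()

def generator_loss(D, G, z):
    # Minimax objective for G: minimize E[log(1 - D(G(z)))].
    # (In practice the non-saturating form -E[log D(G(z))] is often preferred.)
    return torch.log(1.0 - D(G(z))).mean()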

Optimization
- GAN training is a saddle-point estimation problem: solve min_G max_D V(D, G).
- Algorithm 1: minibatch stochastic gradient descent training of generative adversarial nets. The number of steps to apply to the discriminator, k, is a hyperparameter; the paper used k = 1, the least expensive option.
  for number of training iterations do
    for k steps do
      Sample a minibatch of m noise samples {z^(1), ..., z^(m)} from the noise prior p_g(z).
      Sample a minibatch of m examples {x^(1), ..., x^(m)} from the data-generating distribution p_data(x).
      Update the discriminator by ascending its stochastic gradient:
        ∇_{θ_d} (1/m) Σ_{i=1}^{m} [ log D(x^(i)) + log(1 - D(G(z^(i)))) ]
    end for
    Sample a minibatch of m noise samples {z^(1), ..., z^(m)} from the noise prior p_g(z).
    Update the generator by descending its stochastic gradient:
      ∇_{θ_g} (1/m) Σ_{i=1}^{m} log(1 - D(G(z^(i))))
  end for
  The gradient-based updates can use any standard gradient-based learning rule; the paper used momentum.

Generated Examples
- Generated samples are shown next to their nearest neighbors from the training set. [Goodfellow14]

Laplacian Generative Adversarial Networks (LAPGAN)
- Generates high-resolution images progressively, from coarse to fine, using a Laplacian pyramid of generators G_0, ..., G_K with noise inputs z_0, ..., z_K. At each level,
  Ĩ_k = u(Ĩ_{k+1}) + h̃_k = u(Ĩ_{k+1}) + G_k(z_k, u(Ĩ_{k+1})),
  where u(·) upsamples the coarser image and the generative model G_k produces the residual image h̃_k conditioned on the upsampled image and the noise z_k.

[Goodfellow14] I. J. Goodfellow, et al.: Generative Adversarial Nets, NIPS 2014
[Denton15] E. Denton, S. Chintala, A. Szlam, R. Fergus: Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks, NIPS 2015
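A compact sketch of Algorithm 1, assuming PyTorch; G, D, data_loader, z_dim, and the optimizer settings are placeholders, with the paper's momentum update approximated by SGD with momentum.

import torch

def train_gan(G, D, data_loader, z_dim, epochs=1, k=1, lr=2e-4):
    opt_d = torch.optim.SGD(D.parameters(), lr=lr, momentum=0.9)
    opt_g = torch.optim.SGD(G.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x_real, _ in data_loader:
            m = x_real.size(0)
            for _ in range(k):                      # k discriminator steps
                z = torch.randn(m, z_dim)           # noise minibatch
                d_loss = -(torch.log(D(x_real)) +
                           torch.log(1 - D(G(z).detach()))).mean()
                opt_d.zero_grad(); d_loss.backward(); opt_d.step()
            z = torch.randn(m, z_dim)               # one generator step
            g_loss = torch.log(1 - D(G(z))).mean()
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()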

Laplacian Generative Adversarial Networks
Training
- Training runs in the opposite direction of sampling: the image I = I_0 is repeatedly blurred and downsampled to form the pyramid levels I_1, ..., I_K, and the model at each level is trained independently; the discriminator D_k at each level decides whether a residual image h_k is real or generated, given the upsampled coarser image as conditioning.

Results
- Generated samples (e.g., the class-conditional CC-LAPGAN on the Truck class) compared against LAPGAN, a standard GAN, and nearest neighbors from the training set.

[Denton15] E. Denton, S. Chintala, A. Szlam, R. Fergus: Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks, NIPS 2015

DCGAN
Representation learning by GAN
- Unsupervised representation learning, learned from scratch.
- The first CNN-based GAN model with good performance.
- Application to classification: combine the discriminator features with any classifier for prediction; good performance even without fine-tuning.

Architecture guidelines for stable DCGANs
- Replace pooling layers with strided convolutions (in the discriminator) and fractional-strided convolutions (in the generator).
- Use batch normalization in both the generator and the discriminator.
- Remove fully connected hidden layers for deeper architectures.
- Use ReLU in the generator for all layers except for the output, which uses tanh.
- Use LeakyReLU activation in the discriminator for all layers.

[Radford16] A. Radford, L. Metz, S. Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR 2016
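These guidelines translate into a generator along the lines of the following sketch, assuming PyTorch; the 64x64 output resolution and the channel widths (ngf) are illustrative choices, not prescribed by the lecture.

import torch.nn as nn

def dcgan_generator(z_dim=100, ngf=64, out_channels=3):
    # Input noise is expected with shape (N, z_dim, 1, 1); fractional-strided
    # (transposed) convolutions upsample it, with batch norm and ReLU throughout
    # and a tanh output, and no fully connected hidden layers.
    return nn.Sequential(
        nn.ConvTranspose2d(z_dim, ngf * 8, 4, 1, 0, bias=False),    # 1x1  -> 4x4
        nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),  # 4x4  -> 8x8
        nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),  # 8x8  -> 16x16
        nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),      # 16x16 -> 32x32
        nn.BatchNorm2d(ngf), nn.ReLU(True),
        nn.ConvTranspose2d(ngf, out_channels, 4, 2, 1, bias=False), # 32x32 -> 64x64
        nn.Tanh(),
    )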

Results
Interpolation in Latent Space
- Walking between points in the latent space produces smooth, semantically meaningful transitions in the generated images.

Classification on CIFAR-10 using a model trained on ImageNet-1K:

  Model                       | Accuracy | Accuracy (400 per class)
  1 Layer K-means             | 80.6%    | 63.7% (±0.7%)
  3 Layer K-means Learned RF  | 82.0%    | 70.7% (±0.7%)
  View Invariant K-means      | 81.9%    | 72.6% (±0.7%)
  Exemplar CNN                | 84.3%    | 77.4% (±0.2%)
  DCGAN (ours) + L2-SVM       | 82.8%    | 73.8% (±0.4%)

Classification error on SVHN with 1000 labels:

  Model                                     | Error rate
  KNN                                       | 77.93%
  TSVM                                      | 66.55%
  M1+KNN                                    | 65.63%
  M1+TSVM                                   | 54.33%
  M1+M2                                     | 36.02%
  SWWAE without dropout                     | 27.83%
  SWWAE with dropout                        | 23.56%
  DCGAN (ours) + L2-SVM                     | 22.48%
  Supervised CNN with the same architecture | 28.87% (validation)

Vector Arithmetic for Visual Concepts
- Arithmetic on latent vectors carries over to visual concepts in the generated images (e.g., "man with glasses" - "man without glasses" + "woman without glasses" yields a woman with glasses).

[Radford16] A. Radford, L. Metz, S. Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR 2016

CycleGAN
Image-to-image translation
- Converting an image from one representation to another, e.g., gray to color, image to semantic labels, edge map to photograph.
- Built on a generative adversarial network with cycle consistency: using transitivity as a way to regularize structured data.
- Example translations: Monet ↔ photo, zebra ↔ horse, summer ↔ winter.

[Zhu17] J.-Y. Zhu, T. Park, P. Isola, A. A. Efros: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017

Unpaired Translation
- There is some underlying relationship between the two domains; CycleGAN seeks this relationship without an explicit paired mapping.
- Learning to translate between domains without input-output pairs, using supervision only at the level of sets.
- Paired data: {(x_i, y_i)}; unpaired data: a set of images {x_i} from domain X and a set of images {y_j} from domain Y.

Formulation by GAN
- Use a bidirectional pair of GANs, G: X → Y with discriminator D_Y and F: Y → X with discriminator D_X, and introduce a cycle-consistency loss:
  min_{G,F} max_{D_X,D_Y} L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ L_cyc(G, F)
  L_GAN(G, D_Y, X, Y) = E_{y~p_data(y)}[log D_Y(y)] + E_{x~p_data(x)}[log(1 - D_Y(G(x)))]
  L_GAN(F, D_X, Y, X) = E_{x~p_data(x)}[log D_X(x)] + E_{y~p_data(y)}[log(1 - D_X(F(y)))]
  L_cyc(G, F) = E_{x~p_data(x)}[||F(G(x)) - x||_1] + E_{y~p_data(y)}[||G(F(y)) - y||_1]

Results of CycleGAN
- Ablation comparing Input, Cycle alone, GAN alone, GAN+forward, GAN+backward, CycleGAN (ours), and Ground truth, e.g., on winter Yosemite ↔ summer Yosemite.

[Zhu17] J.-Y. Zhu, T. Park, P. Isola, A. A. Efros: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017

Wasserstein GAN
- Original GAN: with an optimal discriminator, the generator effectively minimizes the Jensen-Shannon (JS) divergence between the real data distribution and the generated distribution,
  JS(P_r, P_g) = (1/2) KL(P_r || (P_r + P_g)/2) + (1/2) KL(P_g || (P_r + P_g)/2),
  where P_r is the real data distribution and P_g is the generated distribution. This is not well suited for optimization: the gradients are not well defined in all situations.
- Wasserstein GAN: replace the discriminator with a critic so that the generator minimizes the Earth Mover's distance (EMD),
  W(P_r, P_g) = inf_{γ ∈ Π(P_r, P_g)} E_{(x,y)~γ}[||x - y||],
  where Π(P_r, P_g) is the set of joint distributions whose marginals are P_r and P_g. The gradients are much better defined.

[Arjovsky17] M. Arjovsky, S. Chintala, L. Bottou: Wasserstein Generative Adversarial Networks. ICML 2017
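A sketch of these losses for a single minibatch, assuming PyTorch; G, F, D_X, D_Y are placeholder networks, lambda_cyc is the cycle-consistency weight, and the adversarial terms are written in the original log form even though the paper uses a least-squares variant in practice.

import torch
import torch.nn.functional as F_nn

def cyclegan_objective(G, F, D_X, D_Y, x, y, lambda_cyc=10.0):
    fake_y, fake_x = G(x), F(y)
    # Adversarial terms: D_Y and D_X try to maximize these,
    # while G and F try to minimize them.
    loss_gan_G = torch.log(D_Y(y)).mean() + torch.log(1 - D_Y(fake_y)).mean()
    loss_gan_F = torch.log(D_X(x)).mean() + torch.log(1 - D_X(fake_x)).mean()
    # Cycle consistency: F(G(x)) should recover x and G(F(y)) should recover y.
    loss_cyc = F_nn.l1_loss(F(fake_y), x) + F_nn.l1_loss(G(fake_x), y)
    # Full objective L(G, F, D_X, D_Y): min over {G, F}, max over {D_X, D_Y}.
    return loss_gan_G + loss_gan_F + lambda_cyc * loss_cyc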

Wasserstein GAN
EMD
- The Earth Mover's distance is the optimal transport cost from one distribution to another:
  W(P_r, P_g) = inf_{γ ∈ Π(P_r, P_g)} E_{(x,y)~γ}[||x - y||]
- Jensen-Shannon divergence vs. Earth Mover's distance: on distributions with little or no overlap, the JS-based discriminator saturates, while the EMD-based critic still provides useful gradients (figure comparing the two).

Kantorovich-Rubinstein Duality
- Under a 1-Lipschitz constraint,
  W(P_r, P_g) = sup_{||f||_L ≤ 1} E_{x~P_r}[f(x)] - E_{x~P_g}[f(x)]
- With a parameterized family of K-Lipschitz critic functions f_w, the WGAN objective becomes
  min_θ max_{w ∈ W} E_{x~P_r}[f_w(x)] - E_{z~p(z)}[f_w(g_θ(z))]
- Pipeline: the generator g_θ maps random noise z to a fake image; the critic network f_w scores real images f_w(x) and fake images f_w(g_θ(z)).

[Arjovsky17] M. Arjovsky, S. Chintala, L. Bottou: Wasserstein Generative Adversarial Networks. ICML 2017

WGAN vs. GAN
- Sample quality comparison when the generator is an MLP with 4 hidden layers and the discriminator/critic is a DCGAN: WGAN still produces reasonable samples where the standard GAN degrades.

VAE vs. GAN
- Side-by-side samples from a Variational Autoencoder (VAE) and a Generative Adversarial Network (GAN).
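A sketch of WGAN training with weight clipping to (roughly) enforce the Lipschitz constraint, assuming PyTorch; G, critic, data_loader, and z_dim are placeholders, while n_critic, the clipping range, and RMSprop with lr = 5e-5 follow the paper's defaults.

import torch

def train_wgan(G, critic, data_loader, z_dim, n_critic=5, clip=0.01, lr=5e-5):
    opt_c = torch.optim.RMSprop(critic.parameters(), lr=lr)
    opt_g = torch.optim.RMSprop(G.parameters(), lr=lr)
    for i, (x_real, _) in enumerate(data_loader):
        m = x_real.size(0)
        z = torch.randn(m, z_dim)
        # Critic step: maximize E[f_w(x)] - E[f_w(g_theta(z))].
        loss_c = -(critic(x_real).mean() - critic(G(z).detach()).mean())
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        for p in critic.parameters():            # weight clipping -> Lipschitz
            p.data.clamp_(-clip, clip)
        if (i + 1) % n_critic == 0:              # generator step every n_critic batches
            z = torch.randn(m, z_dim)
            loss_g = -critic(G(z)).mean()        # maximize E[f_w(g_theta(z))]
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()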

VAE vs. GAN
- VAE pros: Stable training; Clear evaluation metrics
- VAE cons: Crude generation quality; Overfitting; Slow training
- GAN pros: High quality generation; Less overfitting problem
- GAN cons: Mode collapsing; Unstable training; No good evaluation metrics
