Lecture 14: Deep Generative Learning

Generative Modeling
CSED703R: Deep Learning for Visual Recognition (2017F)
Lecture 14: Deep Generative Learning
Bohyung Han, Computer Vision Lab.
bhhan@postech.ac.kr

Generative Modeling
- Density estimation: reconstructing a probability density function from samples
- Sample generation: producing generative samples that resemble the training examples

Generative Modeling by Neural Networks
- Variational Autoencoder (VAE)
  - Neural network with continuous latent variables
  - Encoder: approximates a posterior distribution
  - Decoder: stochastically reconstructs the data from the latent variables
- Generative Adversarial Networks (GANs)
  - Generator: generates samples from random noise drawn from a uniform distribution
  - Discriminator: discriminates between real and generated images
- Deep Boltzmann Machines (DBMs)
- Deep Belief Networks (DBNs)

Why Generative Modeling?
- Limitations of discriminative neural networks
  - Require large-scale labeled training data
  - Make unreasonable mistakes for trivial data variations
  - Can get stuck in poor local optima
- Generative modeling
  - Adjust model parameters to maximize the probability that the data would have been produced

Autoencoder
- Traditional autoencoders: feature learning by minimizing the reconstruction error
- Basic pipeline: input $x$ -> Encoder -> latent variable $z$ -> Decoder -> reconstructed input $\hat{x}$
  - Reconstruction loss: $L = \lVert x - \hat{x} \rVert^2$
- Supervised feature learning: input $x$ -> Encoder -> latent variable $z$ -> Classifier -> class label $y$
  - Fine-tuning by backpropagation

Variational Autoencoder
- Characteristics
  - A Bayesian approach on an autoencoder: sample generation
  - Estimating model parameters $\theta$ without access to the latent variable $z$
- Problem scenario
  - True prior $p_{\theta^*}(z)$ and true conditional $p_{\theta^*}(x \mid z)$ ($z$: classes, attributes; $x$: image)
  - A value $z^{(i)}$ is generated from the prior distribution $p_{\theta^*}(z)$.
  - A value $x^{(i)}$ is generated from the conditional distribution $p_{\theta^*}(x \mid z)$.
  - The true parameters $\theta^*$ and the values of the latent variables $z^{(i)}$ are hidden.
- Limitation
  - It is difficult to estimate $p_\theta(z)$ and $p_\theta(x \mid z)$ in practice; we need to approximate these distributions.
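For concreteness, here is a minimal sketch of the traditional autoencoder described above, trained with the reconstruction error $L = \lVert x - \hat{x} \rVert^2$; the VAE that follows replaces the deterministic code with a distribution. The 784-dimensional input, layer sizes, and Adam optimizer are illustrative assumptions, not taken from the lecture.

```python
import torch
import torch.nn as nn

# Traditional autoencoder: the encoder maps the input x to a latent code z,
# the decoder reconstructs x_hat; training minimizes the reconstruction error.
class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)        # latent variable z
        return self.decoder(z)     # reconstructed input x_hat

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(16, 784)            # dummy minibatch standing in for flattened images
x_hat = model(x)
loss = ((x - x_hat) ** 2).mean()   # reconstruction error L = ||x - x_hat||^2
optimizer.zero_grad()
loss.backward()
optimizer.step()
```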

Variational Autoencoder
- Main idea
  - Decoder: $z \rightarrow x$ through the likelihood $p_\theta(x \mid z)$ ($z$: classes, attributes; $x$: image)
  - Encoder: $x \rightarrow z$ through the approximate posterior $q_\phi(z \mid x)$
  - Assume the prior $p_\theta(z)$ is a unit Gaussian.
  - Assume the likelihood $p_\theta(x \mid z)$ is a diagonal Gaussian.
  - The decoder estimates the mean and variance of $p_\theta(x \mid z)$.
  - The encoder estimates the mean and variance of $q_\phi(z \mid x)$.

Optimization
- Variational lower bound of the marginal likelihood
  $\log p_\theta(x^{(i)}) = D_{KL}\!\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z \mid x^{(i)})\right) + \mathcal{L}(\theta, \phi; x^{(i)})$
  - The first term is the KL divergence of the approximate posterior from the true posterior.
  - The second term is the variational lower bound on the marginal likelihood: $\log p_\theta(x^{(i)}) \geq \mathcal{L}(\theta, \phi; x^{(i)})$.
- Maximization of the variational lower bound
  $\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[-\log q_\phi(z \mid x) + \log p_\theta(x, z)\right]$
  $\quad = \mathbb{E}_{q_\phi(z \mid x)}\!\left[-\log q_\phi(z \mid x) + \log p_\theta(z) + \log p_\theta(x \mid z)\right]$
  $\quad = -D_{KL}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right) + \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]$
  (regularization term + reconstruction term)
- Training
  - Maximize the variational lower bound of the marginal likelihood.
  - Reconstruction term: MSE between the input and generated images.
  - Regularization term: optimized via the reparametrization trick using the mean and standard deviation.
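The lower bound above translates almost directly into a loss. Below is a minimal sketch, assuming a diagonal-Gaussian $q_\phi(z \mid x)$ whose KL divergence from the unit-Gaussian prior has a closed form, and an MSE reconstruction term; the network sizes and the 32-dimensional latent space are illustrative assumptions.

```python
import torch
import torch.nn as nn

# VAE loss sketch: the encoder outputs (mu, log_var) of q_phi(z|x); the
# reparametrization trick z = mu + sigma * eps keeps sampling differentiable.
# KL(q_phi(z|x) || N(0, I)) has a closed form for a diagonal Gaussian posterior.
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 2 * 32))
decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784), nn.Sigmoid())

def vae_loss(x):
    mu, log_var = encoder(x).chunk(2, dim=1)
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * log_var) * eps                        # reparametrization trick
    x_hat = decoder(z)
    recon = ((x - x_hat) ** 2).sum(dim=1)                          # reconstruction term (MSE)
    kl = 0.5 * (mu ** 2 + log_var.exp() - 1 - log_var).sum(dim=1)  # regularization term
    return (recon + kl).mean()                                     # negative variational lower bound

x = torch.rand(16, 784)                                            # dummy minibatch
loss = vae_loss(x)
loss.backward()
```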

Variational Autoencoder
- Inference (sample generation)
  - A latent value is sampled from the prior $p_\theta(z)$; the decoder produces the mean and variance of $p_\theta(x \mid z)$, and $x$ is sampled from them ($z$: classes, attributes; $x$: image).
- Learned manifold
  - Visualizations of the learned manifold for generative models with a 2D latent space.

Generative Adversarial Networks
- Two models
  - A generative model $G$ that captures the data distribution
  - A discriminative model $D$ that estimates the probability that a sample came from the training data rather than $G$
- Objective: maximize the probability of $D$ making a mistake
- Trained with backpropagation: random noise $z$ -> Generator -> generated image; the discriminator sees generated and real images and decides real or fake. Train the generator and the discriminator jointly.

Optimization
- Objective
  $V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\!\left[\log D(x; \theta_d)\right] + \mathbb{E}_{z \sim p_z(z)}\!\left[\log\left(1 - D(G(z; \theta_g); \theta_d)\right)\right]$
  $\min_G \max_D V(D, G)$
  - $D$ and $G$ play a two-player minimax game with value function $V(D, G)$, simultaneously training $G$ to minimize $\log\left(1 - D(G(z; \theta_g); \theta_d)\right)$.
  - $p_z(z)$: prior on the input noise variables
  - $G(z; \theta_g)$: a mapping from $z$ to data space, a differentiable function represented by a multilayer perceptron with parameters $\theta_g$
  - $D(x; \theta_d)$: the probability that $x$ came from the data rather than the generator distribution $p_g$

[Goodfellow14] I. J. Goodfellow, et al.: Generative Adversarial Nets, NIPS 2014
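A sketch of one training step under this minimax objective, with the discriminator ascending $\log D(x) + \log(1 - D(G(z)))$ and the generator descending $\log(1 - D(G(z)))$ as in the original GAN algorithm; the MLP architectures, learning rates, and batch size are illustrative assumptions.

```python
import torch
import torch.nn as nn

# One GAN training step. Minimizing the BCE losses below is equivalent to
# ascending log D(x) + log(1 - D(G(z))) for the discriminator.
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()

x_real = torch.rand(16, 784)                 # dummy minibatch of real images
z = torch.randn(16, 100)                     # noise from the prior p_z(z)

# Discriminator step: ascend log D(x) + log(1 - D(G(z)))
d_loss = bce(D(x_real), torch.ones(16, 1)) + bce(D(G(z).detach()), torch.zeros(16, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: descend log(1 - D(G(z)))
# (in practice the non-saturating loss, maximizing log D(G(z)), is often used instead)
g_loss = torch.log(1.0 - D(G(z)) + 1e-8).mean()
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```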

Optimization
- Saddle point estimation: $\min_G \max_D V(D, G)$ is solved by alternating stochastic gradient updates on the discriminator and the generator.
- Algorithm 1: Minibatch stochastic gradient descent training of generative adversarial nets. The number of steps applied to the discriminator, $k$, is a hyperparameter ($k = 1$, the least expensive option, was used in the experiments).
  for number of training iterations do
    for $k$ steps do
      Sample a minibatch of $m$ noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from the noise prior $p_g(z)$.
      Sample a minibatch of $m$ examples $\{x^{(1)}, \ldots, x^{(m)}\}$ from the data-generating distribution $p_{data}(x)$.
      Update the discriminator by ascending its stochastic gradient:
        $\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D(x^{(i)}) + \log\left(1 - D(G(z^{(i)}))\right) \right]$
    end for
    Sample a minibatch of $m$ noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from the noise prior $p_g(z)$.
    Update the generator by descending its stochastic gradient:
      $\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D(G(z^{(i)}))\right)$
  end for
  The gradient-based updates can use any standard gradient-based learning rule (momentum was used in the experiments).

[Goodfellow14] I. J. Goodfellow, et al.: Generative Adversarial Nets, NIPS 2014

Generated Examples
- Samples from the trained model, shown next to nearest neighbors from the training set.

Laplacian Generative Adversarial Networks (LAPGAN)
- Generating high-resolution images progressively over a Laplacian pyramid:
  $\tilde{I}_k = u(\tilde{I}_{k+1}) + \tilde{h}_k = u(\tilde{I}_{k+1}) + G_k\!\left(z_k, u(\tilde{I}_{k+1})\right)$
  ($u(\cdot)$: upsampling, $\tilde{h}_k$: residual image produced by the generative model $G_k$)
- Sampling starts from the coarsest level, $G_3(z_3) = \tilde{I}_3$, and each finer level adds a generated residual to the upsampled image until the full-resolution image $I_0$ (64x64) is produced, as sketched below.

[Denton15] E. Denton, S. Chintala, A. Szlam, R. Fergus: Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks, NIPS 2015
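A sketch of the coarse-to-fine LAPGAN sampling loop implied by the equation above, assuming trained per-level generators with the signatures shown; the dummy generators, bilinear upsampling, pyramid depth, and noise dimensions are illustrative stand-ins, not the paper's actual models.

```python
import torch
import torch.nn.functional as F

def lapgan_sample(generators, noise_dims):
    """Coarse-to-fine sampling: I_k = u(I_{k+1}) + G_k(z_k, u(I_{k+1})).

    generators[-1] maps noise alone to the coarsest image; every other
    generators[k] maps (noise, upsampled image) to a residual h_k.
    """
    z = torch.randn(1, noise_dims[-1])
    image = generators[-1](z)                               # coarsest image, e.g. 8x8
    for k in range(len(generators) - 2, -1, -1):
        upsampled = F.interpolate(image, scale_factor=2,
                                  mode="bilinear", align_corners=False)  # u(I_{k+1})
        z = torch.randn(1, noise_dims[k])
        residual = generators[k](z, upsampled)              # h_k = G_k(z_k, u(I_{k+1}))
        image = upsampled + residual                        # I_k
    return image

# Dummy stand-ins for trained per-level generators, only to make the sketch runnable:
coarse_g = lambda z: torch.zeros(1, 3, 8, 8)
refine_g = lambda z, img: torch.zeros_like(img)
sample = lapgan_sample([refine_g, refine_g, refine_g, coarse_g],
                       noise_dims=[100, 100, 100, 100])
print(sample.shape)  # torch.Size([1, 3, 64, 64])
```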

Laplacian Generative Adversarial Networks
- Training: the opposite direction of sampling
  - The model at each level is trained independently.
  - Each training image is repeatedly blurred and downsampled to build the pyramid; at every level the upsampled coarser image is combined with the true or generated residual, and a discriminator $D_k$ decides real or generated.
- Results (CC-LAPGAN: Truck, LAPGAN, GAN): generated samples shown with nearest neighbors from the training set.

[Denton15] E. Denton, S. Chintala, A. Szlam, R. Fergus: Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks, NIPS 2015

DCGAN
- Representation learning by GAN
  - Unsupervised representation learning, learning from scratch
  - First CNN-based GAN model with good performance
  - Apply to classification: combine the discriminator with any classifier for prediction; good performance even without fine-tuning
- Architecture guidelines for stable DCGANs
  - Replace pooling layers with strided convolutions (in the discriminator) and fractional-strided convolutions (in the generator).
  - Use batch normalization in both the generator and the discriminator.
  - Remove fully connected hidden layers for deeper architectures.
  - Use ReLU in the generator for all layers except the output, which uses tanh.
  - Use LeakyReLU activation in the discriminator for all layers.

[Radford16] A. Radford, L. Metz, S. Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR 2016
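A minimal generator sketch following these architecture guidelines (fractional-strided convolutions, batch normalization, ReLU with a tanh output, no fully connected hidden layers); the 100-dimensional noise, channel widths, and 64x64 output resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Generator following the DCGAN guidelines: fractional-strided (transposed)
# convolutions for upsampling, batch normalization, ReLU activations with a
# tanh output, and no fully connected hidden layers.
class DCGANGenerator(nn.Module):
    def __init__(self, z_dim=100, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, base * 8, 4, 1, 0, bias=False),    # 1x1 -> 4x4
            nn.BatchNorm2d(base * 8), nn.ReLU(True),
            nn.ConvTranspose2d(base * 8, base * 4, 4, 2, 1, bias=False), # 4x4 -> 8x8
            nn.BatchNorm2d(base * 4), nn.ReLU(True),
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1, bias=False), # 8x8 -> 16x16
            nn.BatchNorm2d(base * 2), nn.ReLU(True),
            nn.ConvTranspose2d(base * 2, base, 4, 2, 1, bias=False),     # 16x16 -> 32x32
            nn.BatchNorm2d(base), nn.ReLU(True),
            nn.ConvTranspose2d(base, 3, 4, 2, 1, bias=False),            # 32x32 -> 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

G = DCGANGenerator()
fake = G(torch.randn(4, 100))   # -> tensor of shape (4, 3, 64, 64)
```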

Results
- Interpolation in latent space
- Classification on CIFAR-10 using a model trained on ImageNet-1K:

  Model                        Accuracy   Accuracy (400 per class)   Max # of feature units
  1 Layer K-means              80.6%      63.7% (±0.7%)              4800
  3 Layer K-means Learned RF   82.0%      70.7% (±0.7%)              3200
  View Invariant K-means       81.9%      72.6% (±0.7%)              6400
  Exemplar CNN                 84.3%      77.4% (±0.2%)              1024
  DCGAN (ours) + L2-SVM        82.8%      73.8% (±0.4%)               512

- Classification on SVHN with 1000 labels:

  Model                                        Error rate
  KNN                                          77.93%
  TSVM                                         66.55%
  M1+KNN                                       65.63%
  M1+TSVM                                      54.33%
  M1+M2                                        36.02%
  SWWAE without dropout                        27.83%
  SWWAE with dropout                           23.56%
  DCGAN (ours) + L2-SVM                        22.48%
  Supervised CNN with the same architecture    28.87% (validation)

- Vector arithmetic for visual concepts

[Radford16] A. Radford, L. Metz, S. Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR 2016

CycleGAN
- Image-to-image translation
  - Converting an image from one representation to another, e.g., gray to color, image to semantic labels, edge map to photograph
  - Based on a generative adversarial network
- Cycle consistency
  - Using transitivity as a way to regularize structured data
- Example translations: Monet <-> photo, zebra <-> horse, summer <-> winter

[Zhu17] J.-Y. Zhu, T. Park, P. Isola, A. A. Efros: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017

Unpaired Translation
- There is some underlying relationship between the domains; we seek that relationship without an explicit mapping.
- Learning to translate between domains without input-output pairs, using supervision at the level of sets:
  paired data $\{x_i, y_i\}$ vs. unpaired data $\{X\}$, $\{Y\}$.

Formulation by GAN
- Use bidirectional mappings $G: X \rightarrow Y$ and $F: Y \rightarrow X$ with discriminators $D_Y$ and $D_X$, and introduce a cycle-consistency loss (a loss-assembly sketch follows below):
  $\min_{G, F} \max_{D_X, D_Y} \; \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F)$
  $\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\!\left[\log D_Y(y)\right] + \mathbb{E}_{x \sim p_{data}(x)}\!\left[\log\left(1 - D_Y(G(x))\right)\right]$
  $\mathcal{L}_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}\!\left[\log D_X(x)\right] + \mathbb{E}_{y \sim p_{data}(y)}\!\left[\log\left(1 - D_X(F(y))\right)\right]$
  $\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\!\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y \sim p_{data}(y)}\!\left[\lVert G(F(y)) - y \rVert_1\right]$

Results of CycleGAN
- Ablation comparison (input, cycle alone, GAN alone, GAN+forward, GAN+backward, CycleGAN, ground truth) and season transfer examples (summer Yosemite <-> winter Yosemite).

[Zhu17] J.-Y. Zhu, T. Park, P. Isola, A. A. Efros: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017
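A sketch of how the CycleGAN objective above can be assembled for one minibatch; the placeholder MLP generators and discriminators, flattened 256-dimensional "images", and $\lambda = 10$ are illustrative assumptions, and only the generator side of the adversarial losses is shown.

```python
import torch
import torch.nn as nn

# Assembling the CycleGAN objective for one minibatch (generator side only;
# the discriminators d_x and d_y would be trained with their own real/fake losses).
G = nn.Sequential(nn.Linear(256, 256), nn.Tanh())     # G: X -> Y
F_ = nn.Sequential(nn.Linear(256, 256), nn.Tanh())    # F: Y -> X
d_x = nn.Sequential(nn.Linear(256, 1), nn.Sigmoid())  # D_X
d_y = nn.Sequential(nn.Linear(256, 1), nn.Sigmoid())  # D_Y
bce, lam = nn.BCELoss(), 10.0

x = torch.rand(8, 256)                 # samples from domain X (flattened images)
y = torch.rand(8, 256)                 # samples from domain Y
ones = torch.ones(8, 1)

# Adversarial terms of L_GAN(G, D_Y, X, Y) and L_GAN(F, D_X, Y, X) seen by the generators
loss_gan_g = bce(d_y(G(x)), ones)
loss_gan_f = bce(d_x(F_(y)), ones)

# Cycle-consistency loss L_cyc(G, F) = E[||F(G(x)) - x||_1] + E[||G(F(y)) - y||_1]
loss_cyc = (F_(G(x)) - x).abs().mean() + (G(F_(y)) - y).abs().mean()

total = loss_gan_g + loss_gan_f + lam * loss_cyc       # generators minimize this
total.backward()
```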

Wasserstein GAN
- Original GAN: the discriminator loss in effect measures the Jensen-Shannon (JS) divergence between the real data distribution and the generated distribution:
  $JS(P_r, P_g) = \frac{1}{2} KL\!\left(P_r \,\Big\|\, \frac{P_r + P_g}{2}\right) + \frac{1}{2} KL\!\left(P_g \,\Big\|\, \frac{P_r + P_g}{2}\right)$
  ($P_r$: real data distribution, $P_g$: generated distribution)
  - Not well suited for optimization: the gradients are not well defined in all situations.
- Wasserstein GAN: train the discriminator (critic) to estimate the Earth Mover's distance (EMD), the optimal transport cost from one distribution to the other, which the generator then minimizes:
  $W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\!\left[\lVert x - y \rVert\right]$
  ($\Pi(P_r, P_g)$: the set of joint distributions with marginals $P_r$ and $P_g$)
  - The gradients are much better defined (illustrated by a figure comparing the JS divergence and the EMD).
- Kantorovich-Rubinstein duality: with a $K$-Lipschitz constraint on $f$,
  $K \cdot W(P_r, P_g) = \sup_{\lVert f \rVert_L \le K} \mathbb{E}_{x \sim P_r}\!\left[f(x)\right] - \mathbb{E}_{x \sim P_g}\!\left[f(x)\right]$
- WGAN objective (a training sketch follows below):
  $\min_\theta \max_{w \in \mathcal{W}} \; \mathbb{E}_{x \sim P_r}\!\left[f_w(x)\right] - \mathbb{E}_{z \sim p(z)}\!\left[f_w(g_\theta(z))\right]$
  - Random noise $z$ is mapped by the generator $g_\theta$ to a fake image; the critic network $f_w$ scores real images, $f_w(x)$, and fake images, $f_w(g_\theta(z))$, instead of outputting a real/fake probability.

[Arjovsky17] M. Arjovsky, S. Chintala, L. Bottou: Wasserstein Generative Adversarial Networks. ICML 2017
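A sketch of WGAN training under the objective above, with the critic $f_w$ producing unbounded scores and the Lipschitz constraint enforced by weight clipping as in the WGAN paper; the placeholder MLPs, RMSprop learning rate, clipping range, and number of critic steps follow commonly cited defaults and are assumptions here.

```python
import torch
import torch.nn as nn

# WGAN updates: the critic f_w outputs an unbounded score (no sigmoid), the
# Lipschitz constraint is enforced by clipping the critic weights, and the
# generator maximizes the critic score of its samples.
g = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
critic = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(g.parameters(), lr=5e-5)
clip, n_critic = 0.01, 5

x_real = torch.rand(16, 784)                       # dummy real minibatch

for _ in range(n_critic):                          # several critic steps per generator step
    z = torch.randn(16, 100)
    # maximize E[f_w(x)] - E[f_w(g(z))]  <=>  minimize the negative
    c_loss = -(critic(x_real).mean() - critic(g(z).detach()).mean())
    opt_c.zero_grad(); c_loss.backward(); opt_c.step()
    for p in critic.parameters():                  # weight clipping (Lipschitz constraint)
        p.data.clamp_(-clip, clip)

z = torch.randn(16, 100)
g_loss = -critic(g(z)).mean()                      # generator: maximize E[f_w(g(z))]
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```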

WGAN vs. GAN
- Sample comparison with the generator an MLP with 4 hidden layers and the discriminator a DCGAN.

VAE vs. GAN
- Visual comparison of samples generated by a Variational Autoencoder (VAE) and a Generative Adversarial Network (GAN).

VAE vs. GAN: summary

        VAE                           GAN
Pros    Stable training               High-quality generation
        Clear evaluation metrics      Less overfitting
Cons    Crude generation quality      Mode collapsing
        Overfitting                   Unstable training
        Slow training                 No good evaluation metrics