A Unified View of Deep Generative Models


1 SAILING LAB: Laboratory for Statistical Artificial InteLligence & INtegrative Genomics. A Unified View of Deep Generative Models. Zhiting Hu and Eric Xing, Petuum Inc. / Carnegie Mellon University

2 Deep generative models 2

3 Deep generative models. Define probabilistic distributions over a set of variables. "Deep" means multiple layers of hidden variables: $z^{(L)}, \dots, z^{(1)}, x$.

4 Early forms of deep generative models. Hierarchical Bayesian models: sigmoid belief nets [Neal 1992]. Binary latent layers $z^{(1)}, z^{(2)}$ and visible units $x$, all in $\{0,1\}$; each unit turns on with a sigmoid probability of its parents, e.g. $p(z_i^{(1)} = 1 \mid \mathbf{z}^{(2)}) = \sigma(\mathbf{w}_i^\top \mathbf{z}^{(2)})$ and $p(x_j = 1 \mid \mathbf{z}^{(1)}) = \sigma(\mathbf{w}_j^\top \mathbf{z}^{(1)})$.

5 Early forms of deep generative models. Hierarchical Bayesian models: sigmoid belief nets [Neal 1992]. Neural network models: Helmholtz machines [Dayan et al., 1995] — a top-down generative network paired with a bottom-up recognition network carrying separate inference weights. [Figure courtesy: Dayan et al. 1995]

6 Early forms of deep generative models. Hierarchical Bayesian models: sigmoid belief nets [Neal 1992]. Neural network models: Helmholtz machines [Dayan et al., 1995]. Predictability minimization [Schmidhuber 1995]. [Figure courtesy: Schmidhuber 1996]

7 Early forms of deep generative models. Training of DGMs via an EM-style framework:
Sampling / data augmentation: impute the latent variables by sampling from the posterior, $z \sim p(z \mid x)$, then update the model parameters with the augmented data.
Variational inference: $\log p(x) \ge \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}(q_\phi(z|x)\,\|\,p(z)) \triangleq \mathcal{L}(\theta, \phi; x)$; $\max_{\theta,\phi} \mathcal{L}(\theta, \phi; x)$.
Wake-sleep: wake: $\max_\theta \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x, z)]$; sleep: $\max_\phi \mathrm{E}_{p_\theta(x, z)}[\log q_\phi(z|x)]$.

8 Resurgence of deep generative models. Variational autoencoders (VAEs) [Kingma & Welling, 2014] / Neural Variational Inference and Learning (NVIL) [Mnih & Gregor, 2014]: $q_\phi(z|x)$ — inference model; $p_\theta(x|z)$ — generative model. [Figure courtesy: Kingma & Welling, 2014]

9 Resurgence of deep generative models. Variational autoencoders (VAEs) [Kingma & Welling, 2014] / Neural Variational Inference and Learning (NVIL) [Mnih & Gregor, 2014]. Generative adversarial networks (GANs) [Goodfellow et al., 2014]: $G_\theta$ — generative model (generator); $D_\phi$ — discriminator.

10 Outline Theoretical Basis of deep generative models Wake sleep algorithm Variational autoencoders Generative adversarial networks A unified view of deep generative models New formulations of deep generative models Symmetric modeling of latent and visible variables

11 Synonyms in the literature. Posterior distribution → inference model / variational approximation / recognition model / inference network (if parameterized as neural networks) / recognition network (if parameterized as neural networks) / (probabilistic) encoder. "The model" (prior + conditional, or joint) → generative model / the (data) likelihood model / generative network (if parameterized as neural networks) / generator / (probabilistic) decoder.

12 Recap: Variational Inference. Consider a generative model $p_\theta(x|z)$ and prior $p(z)$. Joint distribution: $p_\theta(x, z) = p_\theta(x|z)\,p(z)$. Assume a variational distribution $q_\phi(z|x)$. Objective: maximize the lower bound on the log-likelihood
$\log p(x) = \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big) + \mathrm{E}_{q_\phi(z|x)}\!\left[\log \tfrac{p_\theta(x, z)}{q_\phi(z|x)}\right] \ge \mathrm{E}_{q_\phi(z|x)}\!\left[\log \tfrac{p_\theta(x, z)}{q_\phi(z|x)}\right] \triangleq \mathcal{L}(\theta, \phi; x)$.
Equivalently, minimize the free energy $F(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$.

13 Recap: Variational Inference. Maximize the variational lower bound $\mathcal{L}(\theta, \phi; x)$.
E-step: maximize $\mathcal{L}$ wrt. $\phi$ with $\theta$ fixed: $\max_\phi \mathcal{L}(\theta, \phi; x) = \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)$. If available in closed form: $q(z|x) \propto \exp[\log p_\theta(x, z)]$.
M-step: maximize $\mathcal{L}$ wrt. $\theta$ with $\phi$ fixed: $\max_\theta \mathcal{L}(\theta, \phi; x) = \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)$.

14 Wake Sleep Algorithm [Hinton et al., Science 1995]. Train a separate inference model along with the generative model. Generally applicable to a wide range of generative models, e.g., Helmholtz machines. Consider a generative model $p_\theta(x|z)$ and prior $p(z)$; joint distribution $p_\theta(x, z) = p_\theta(x|z)\,p(z)$, e.g., multi-layer belief nets. Inference model $q_\phi(z|x)$. Maximize the data log-likelihood with two steps of loss relaxation:
Maximize the lower bound of the log-likelihood, or equivalently, minimize the free energy $F(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$.
Minimize a different objective (reversed KLD) wrt. $\phi$ to ease the optimization — disconnected from the original variational lower bound loss: $F'(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(p_\theta(z|x)\,\|\,q_\phi(z|x)\big)$.

15 Wake Sleep Algorithm. Free energy: $F(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$. Minimize the free energy wrt. $\theta$ of $p_\theta$ → wake phase: $\max_\theta \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x, z)]$. Get samples from $q_\phi(z|x)$ through inference on the hidden variables; use the samples as targets for updating the generative model $p_\theta(x, z)$. Corresponds to the variational M step. [Figure courtesy: Maei's slides]

16 Wake Sleep Algorithm. Free energy: $F(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$. Minimize the free energy wrt. $\phi$ of $q_\phi(z|x)$ — corresponds to the variational E step. Difficulties: the optimum $q_\phi(z|x) = p_\theta(z|x) = p_\theta(x, z) / \int p_\theta(x, z)\, dz$ is intractable, and direct gradient estimates have high variance:
$\nabla_\phi \mathcal{L}(\theta, \phi; x) = \nabla_\phi \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x, z)] + \dots$
Gradient estimate with the log-derivative trick: $\nabla_\phi \mathrm{E}_{q_\phi}[\log p_\theta] = \int (\nabla_\phi q_\phi)\, \log p_\theta \, dz = \int q_\phi\, (\log p_\theta)\, \nabla_\phi \log q_\phi \, dz = \mathrm{E}_{q_\phi}[\log p_\theta \, \nabla_\phi \log q_\phi]$.
Monte Carlo estimation: $\nabla_\phi \mathrm{E}_{q_\phi}[\log p_\theta] \approx \tfrac{1}{K}\sum_k \log p_\theta(x, z_k)\, \nabla_\phi \log q_\phi(z_k|x)$, $z_k \sim q_\phi(z|x)$.
The scale factor $\log p_\theta$ of the derivative $\nabla_\phi \log q_\phi$ can have arbitrarily large magnitude.

17 Wake Sleep Algorithm. Free energy: $F(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$. WS works around the difficulties with the sleep phase approximation: minimize the reversed-KL objective instead → sleep phase: $F'(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(p_\theta(z|x)\,\|\,q_\phi(z|x)\big)$, i.e., $\max_\phi \mathrm{E}_{p_\theta(z, x)}[\log q_\phi(z|x)]$. "Dream up" samples from $p_\theta(x, z)$ through a top-down pass; use the samples as targets for updating the inference model. (Recent approaches, other than the sleep phase, instead reduce the variance of the gradient estimate: slides later.) [Figure courtesy: Maei's slides]

18 Wake Sleep Algorithm — wake-sleep vs. variational EM.
Wake-sleep: parameterized inference model $q_\phi(z|x)$. Wake phase: minimize $\mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$ wrt. $\theta$: $\mathrm{E}_{q_\phi(z|x)}[\nabla_\theta \log p_\theta(x|z)]$. Sleep phase: minimize $\mathrm{KL}\big(p_\theta(z|x)\,\|\,q_\phi(z|x)\big)$ wrt. $\phi$: $\mathrm{E}_{p_\theta(z, x)}[\nabla_\phi \log q_\phi(z|x)]$ — low variance, but learning with generated samples of $x$. Two objectives, not guaranteed to converge.
Variational EM: variational distribution $q_\phi(z|x)$. Variational M step: minimize $\mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$ wrt. $\theta$: $\mathrm{E}_{q_\phi(z|x)}[\nabla_\theta \log p_\theta(x|z)]$. Variational E step: minimize $\mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$ wrt. $\phi$: $q_\phi \propto \exp[\log p_\theta]$ if available in closed form; otherwise $\nabla_\phi \mathrm{E}_{q_\phi}[\log p_\theta(x, z)]$ needs variance reduction in practice. Learning with real data $x$. Single objective, guaranteed to converge.
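
To make the two phases concrete, the following is a minimal wake-sleep sketch for a one-hidden-layer sigmoid belief net. All sizes, initializations, optimizers, and learning rates are illustrative assumptions for this sketch, not the original Helmholtz-machine implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative wake-sleep sketch: binary visibles x (size D), binary latents z (size H).
D, H = 784, 64
gen_b = torch.zeros(H, requires_grad=True)              # generative prior logits for p(z)
gen_W = (0.01 * torch.randn(D, H)).requires_grad_()     # generative weights for p(x|z)
rec_W = (0.01 * torch.randn(H, D)).requires_grad_()     # recognition weights for q(z|x)
gen_opt = torch.optim.SGD([gen_b, gen_W], lr=1e-2)
rec_opt = torch.optim.SGD([rec_W], lr=1e-2)

def wake_sleep_step(x):
    # Wake phase: infer z ~ q(z|x), then update the generative model on (x, z),
    # i.e. maximize E_q[log p_theta(x, z)] (the variational M-step above).
    with torch.no_grad():
        z = torch.bernoulli(torch.sigmoid(x @ rec_W.t()))
    log_pz = -F.binary_cross_entropy_with_logits(gen_b.expand_as(z), z, reduction='none').sum(1)
    log_px_z = -F.binary_cross_entropy_with_logits(z @ gen_W.t(), x, reduction='none').sum(1)
    gen_loss = -(log_pz + log_px_z).mean()
    gen_opt.zero_grad(); gen_loss.backward(); gen_opt.step()

    # Sleep phase: "dream up" (z, x) from the generative model, then update the
    # recognition model to maximize E_{p_theta(x, z)}[log q_phi(z|x)].
    with torch.no_grad():
        z_dream = torch.bernoulli(torch.sigmoid(gen_b).expand(x.shape[0], H))
        x_dream = torch.bernoulli(torch.sigmoid(z_dream @ gen_W.t()))
    rec_loss = F.binary_cross_entropy_with_logits(x_dream @ rec_W.t(), z_dream)
    rec_opt.zero_grad(); rec_loss.backward(); rec_opt.step()
    return gen_loss.item(), rec_loss.item()

# One step on a random binary batch:
wake_sleep_step(torch.bernoulli(torch.full((32, D), 0.5)))
```

Note that the sleep update fits $q_\phi$ to dreamed samples rather than to real data, which is exactly the low-variance but not-guaranteed-to-converge surrogate contrasted with variational EM above.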

19 Variational Autoencoders (VAEs) [Kingma & Welling, 2014]. Use variational inference with an inference model; enjoys similar applicability to the wake-sleep algorithm. Generative model $p_\theta(x|z)$ and prior $p(z)$; joint distribution $p_\theta(x, z) = p_\theta(x|z)\,p(z)$. Inference model $q_\phi(z|x)$. [$q_\phi(z|x)$: inference model; $p_\theta(x|z)$: generative model. Figure courtesy: Kingma & Welling, 2014]

20 Variational Autoencoders (VAEs). Variational lower bound: $\mathcal{L}(\theta, \phi; x) = \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)$.
Optimize $\mathcal{L}(\theta, \phi; x)$ wrt. $\theta$ of $p_\theta(x|z)$: the same as the wake phase.
Optimize $\mathcal{L}(\theta, \phi; x)$ wrt. $\phi$ of $q_\phi(z|x)$: $\nabla_\phi \mathcal{L}(\theta, \phi; x) = \nabla_\phi \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x, z)] + \dots$; use the reparameterization trick to reduce variance. Alternatives: use control variates as in reinforcement learning [Mnih & Gregor, 2014; Paisley et al., 2012].
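
As a concrete reference for this bound, here is a minimal VAE sketch that computes the lower bound with a Bernoulli decoder, a diagonal-Gaussian encoder, and a standard-normal prior. The architecture and dimensions are assumptions for the example, not those of Kingma & Welling (2014).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20, h_dim=200):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def elbo(self, x):
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterized sample
        logits = self.dec(z)
        # E_q[log p_theta(x|z)]: Bernoulli log-likelihood under the decoder
        rec = -F.binary_cross_entropy_with_logits(logits, x, reduction='none').sum(1)
        # KL(q_phi(z|x) || N(0, I)), closed form for diagonal Gaussians
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(1)
        return (rec - kl).mean()

# Training maximizes the bound, i.e. minimizes its negation:
# loss = -model.elbo(x_batch); loss.backward(); optimizer.step()
```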

21 Reparameterized gradient. Optimize $\mathcal{L}(\theta, \phi; x)$ wrt. $\phi$ of $q_\phi(z|x)$.
Recap — gradient estimate with the log-derivative trick: $\nabla_\phi \mathrm{E}_{q_\phi}[\log p_\theta(x, z)] = \mathrm{E}_{q_\phi}[\log p_\theta(x, z)\, \nabla_\phi \log q_\phi]$. High variance: $\nabla_\phi \mathrm{E}_{q_\phi}[\log p_\theta] \approx \tfrac{1}{K}\sum_k \log p_\theta(x, z_k)\, \nabla_\phi \log q_\phi(z_k|x)$, where the scale factor $\log p_\theta(x, z_k)$ of the derivative $\nabla_\phi \log q_\phi$ can have arbitrarily large magnitude.
Gradient estimate with the reparameterization trick: $z \sim q_\phi(z|x) \iff z = g_\phi(\epsilon, x),\ \epsilon \sim p(\epsilon)$, so $\nabla_\phi \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x, z)] = \mathrm{E}_{\epsilon \sim p(\epsilon)}\big[\nabla_\phi \log p_\theta(x, z)\big|_{z = g_\phi(\epsilon, x)}\big]$ — (empirically) lower variance of the gradient estimate. E.g., $z \sim \mathcal{N}(\mu(x), \sigma(x)^2)$: $\epsilon \sim \mathcal{N}(0, 1)$, $z = \mu(x) + \sigma(x)\,\epsilon$.
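
A toy numerical check of the variance claim; the target quantity $\mathrm{E}_{z \sim \mathcal{N}(\mu, 1)}[z^2]$ and all constants below are assumptions made up for the illustration.

```python
import numpy as np

# Estimate d/dmu E_{z ~ N(mu, 1)}[z^2] = 2*mu with K samples, using
# (a) the log-derivative (score-function) estimator and (b) reparameterization.
rng = np.random.default_rng(0)
mu, K, trials = 3.0, 10, 2000

score_fn, reparam = [], []
for _ in range(trials):
    z = rng.normal(mu, 1.0, size=K)
    # (a) score function: f(z) * d/dmu log N(z; mu, 1) = z^2 * (z - mu)
    score_fn.append(np.mean(z ** 2 * (z - mu)))
    # (b) reparameterization: z = mu + eps, so d/dmu f(z) = 2 * z
    eps = rng.normal(0.0, 1.0, size=K)
    reparam.append(np.mean(2.0 * (mu + eps)))

print("true gradient           :", 2 * mu)
print("score-function  mean/std:", np.mean(score_fn), np.std(score_fn))
print("reparameterized mean/std:", np.mean(reparam), np.std(reparam))
# Both estimators are unbiased; the reparameterized one typically has a much
# smaller standard deviation, which is the point made on this slide.
```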

22 VAEs: example results. VAEs tend to generate blurred images due to the mode-covering behavior (more later). Latent code interpolation and sentence generation from VAEs [Bowman et al., 2015]: "i want to talk to you ." → "i want to be with you ." → "i do n't want to be with you ." → "i do n't want to be with you ." → "she did n't want to be with him ." Celebrity faces [Radford 2015].

23 Generative Adversarial Nets (GANs) [Goodfellow et al., 2014]. Generative model: $x = G_\theta(z)$, $z \sim p(z)$, mapping a noise variable $z$ to data space $x$. This defines an implicit distribution over $x$, $p_{g_\theta}(x)$: a stochastic process to simulate data $x$; intractable to evaluate the likelihood. Discriminator $D_\phi(x)$: outputs the probability that $x$ came from the data rather than the generator. No explicit inference model, and no obvious connection to previous models with inference networks like VAEs — we will build formal connections between GANs and VAEs later.

24 Generative Adversarial Nets (GANs). Learning: a minimax game between the generator and the discriminator. Train $D$ to maximize the probability of assigning the correct label to both training examples and generated samples; train $G$ to fool the discriminator:
$\max_D \mathcal{L}_D = \mathrm{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathrm{E}_{x = G(z),\, z \sim p(z)}[\log(1 - D(x))]$
$\min_G \mathcal{L}_G = \mathrm{E}_{x = G(z),\, z \sim p(z)}[\log(1 - D(x))]$
[Figure courtesy: Kim's slides]
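
A minimal training-step sketch of this minimax game. The tiny MLPs, the synthetic batch of "real" data, and the hyper-parameters are illustrative assumptions; the generator update uses the common non-saturating variant (maximize $\log D(G(z))$) rather than literally minimizing $\log(1 - D(G(z)))$.

```python
import torch
import torch.nn as nn

z_dim, x_dim = 16, 2
G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(x_real):
    b = x_real.shape[0]
    # Discriminator step: maximize log D(x_real) + log(1 - D(G(z)))
    x_fake = G(torch.randn(b, z_dim)).detach()
    d_loss = bce(D(x_real), torch.ones(b, 1)) + bce(D(x_fake), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step (non-saturating): maximize log D(G(z))
    x_fake = G(torch.randn(b, z_dim))
    g_loss = bce(D(x_fake), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# One step on a synthetic batch of 2-D "real" samples centered at (3, 3):
print(gan_step(torch.randn(128, x_dim) + 3.0))
```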

25 GANs: example results Generated bedrooms [Radford et al., 2016] 25

26 The Zoo of DGMs. Variational autoencoders (VAEs) [Kingma & Welling, 2014]: adversarial autoencoder [Makhzani et al., 2015], importance weighted autoencoder [Burda et al., 2015], implicit variational autoencoder [Mescheder et al., 2017]. Generative adversarial networks (GANs) [Goodfellow et al., 2014]: InfoGAN [Chen et al., 2016], CycleGAN [Zhu et al., 2017], Wasserstein GAN [Arjovsky et al., 2017]. Autoregressive neural networks: PixelRNN / PixelCNN [Oord et al., 2016], RNNs (e.g., for language modeling). Generative moment matching networks (GMMNs) [Li et al., 2015; Dziugaite et al., 2015]. Restricted Boltzmann Machines (RBMs) [Smolensky, 1986].

27 Alchemy vs. Chemistry

28 Outline. Theoretical backgrounds of deep generative models: wake-sleep algorithm, variational autoencoders, generative adversarial networks. A unified view of deep generative models: new formulations of deep generative models, symmetric modeling of latent and visible variables. [Z Hu, Z Yang, R Salakhutdinov, E Xing, On Unifying Deep Generative Models, arXiv]

29 Generative Adversarial Nets (GANs): implicit distribution over $x$.
$p_{g_\theta}(x)$ (distribution of generated images): $x = G_\theta(z)$, $z \sim p(z|y=0)$.
$p_{data}(x)$ (distribution of real images): sample directly from the data (the code space of $z$ is degenerated).
$p_\theta(x|y) = \begin{cases} p_{g_\theta}(x) & y = 0 \\ p_{data}(x) & y = 1 \end{cases}$
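
In code, sampling from the implicit joint $p_\theta(x|y)\,p(y)$ simply mixes generator outputs and data samples according to the indicator $y$. A small sketch, where the generator `G`, the tensor `real_data`, and `z_dim` are assumed placeholders:

```python
import torch

def sample_p_theta(G, real_data, n, z_dim):
    y = torch.randint(0, 2, (n,))                        # y ~ uniform over {0, 1}
    z = torch.randn(n, z_dim)
    fake = G(z)                                          # p_g(x): x = G(z), z ~ p(z|y=0)
    idx = torch.randint(0, real_data.shape[0], (n,))
    real = real_data[idx]                                # p_data(x): sample directly from data
    x = torch.where(y.view(-1, 1).bool(), real, fake)    # y = 1 -> real, y = 0 -> generated
    return x, y
```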

30 A new formulation. Rewrite the GAN objectives in the variational-EM format.
Recap — conventional formulation:
$\max_\phi \mathcal{L}_\phi = \mathrm{E}_{x \sim p_{data}(x)}[\log D_\phi(x)] + \mathrm{E}_{x = G_\theta(z),\, z \sim p(z|y=0)}[\log(1 - D_\phi(x))]$
$\max_\theta \mathcal{L}_\theta = \mathrm{E}_{x = G_\theta(z),\, z \sim p(z|y=0)}[\log D_\phi(x)] + \mathrm{E}_{x \sim p_{data}(x)}[\log(1 - D_\phi(x))] = \mathrm{E}_{x = G_\theta(z),\, z \sim p(z|y=0)}[\log D_\phi(x)]$
Rewrite in the new form. Implicit distribution over $x$: $p_\theta(x|y)$, $x = G_\theta(z)$, $z \sim p(z|y)$. Discriminator distribution $q_\phi(y|x)$, with reversed form $q^r_\phi(y|x) = q_\phi(1 - y|x)$:
$\max_\phi \mathcal{L}_\phi = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q_\phi(y|x)]$
$\max_\theta \mathcal{L}_\theta = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q^r_\phi(y|x)]$

31 GANs vs. Variational EM.
Variational EM — objectives:
$\max_\eta \mathcal{L}_{\theta, \eta} = \mathrm{E}_{q_\eta(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}\big(q_\eta(z|x)\,\|\,p(z)\big)$
$\max_\theta \mathcal{L}_{\theta, \eta} = \mathrm{E}_{q_\eta(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}\big(q_\eta(z|x)\,\|\,p(z)\big)$
Single objective for both $\theta$ and $\eta$; extra prior regularization by $p(z)$. The reconstruction term maximizes the conditional log-likelihood of $x$ under the generative distribution $p_\theta(x|z)$, conditioning on the latent code $z$ inferred by $q_\eta(z|x)$. $p_\theta(x|z)$ is the generative model; $q_\eta(z|x)$ is the inference model.
GAN — objectives:
$\max_\phi \mathcal{L}_\phi = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q_\phi(y|x)]$
$\max_\theta \mathcal{L}_\theta = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q^r_\phi(y|x)]$
Two objectives; a global optimal state exists in the game-theoretic view. The objectives maximize the conditional log-likelihood of $y$ (or $1 - y$) under the distribution $q_\phi(y|x)$, conditioning on the data/generation $x$ "inferred" by $p_\theta(x|y)$. Interpret $q_\phi(y|x)$ as the generative model and $p_\theta(x|y)$ as the inference model.

32 GANs vs. Variational EM (cont'd). Same objectives and correspondences as on the previous slide, with an additional reading on the GAN side: interpret $x$ as latent variables, and interpret the generation of $x$ as performing inference over these latent variables.

33 GANs: minimizing KLD. As in variational EM, we can further rewrite the objective in the form of minimizing a KLD, to reveal more insights into the optimization problem. For each optimization step of $p_\theta(x|y)$ at point $(\theta_0, \phi_0)$, let
$p(y)$: the uniform prior distribution;
$p_{\theta_0}(x) = \mathrm{E}_{p(y)}[p_{\theta_0}(x|y)]$;
$q^r(x|y) \propto q^r_{\phi_0}(y|x)\, p_{\theta_0}(x)$.
Lemma 1: the updates of $\theta$ at $\theta_0$ satisfy
$\nabla_\theta\!\left[-\mathrm{E}_{p_\theta(x|y)p(y)}[\log q^r_{\phi_0}(y|x)]\right]\Big|_{\theta=\theta_0} = \nabla_\theta\!\left[\mathrm{E}_{p(y)}\!\big[\mathrm{KL}\big(p_\theta(x|y)\,\|\,q^r(x|y)\big)\big] - \mathrm{JSD}\big(p_\theta(x|y=0)\,\|\,p_\theta(x|y=1)\big)\right]\Big|_{\theta=\theta_0}$
KL: Kullback-Leibler divergence; JSD: Jensen-Shannon divergence.

34 GANs: minimizing KLD. Lemma 1 (as above). Connection to variational inference: see $x$ as latent variables and $y$ as visible; $p_{\theta_0}(x)$ plays the role of the prior distribution, $q^r(x|y) \propto q^r_{\phi_0}(y|x)\, p_{\theta_0}(x)$ the role of the posterior distribution, and $p_\theta(x|y)$ the role of the variational distribution. Amortized inference: the updates adjust the model parameters $\theta$. This suggests relations to VAEs, as we will explore shortly.

35 GANs: minimizing KLD. With $p_{\theta_0}(x|y=1) = p_{data}(x)$ and $p_{\theta_0}(x|y=0) = p_{g_{\theta_0}}(x)$, minimizing the KLD drives $p_{g_\theta}(x)$ to $p_{data}(x)$:
By definition, $p_{\theta_0}(x) = \mathrm{E}_{p(y)}[p_{\theta_0}(x|y)] = \big(p_{g_{\theta_0}}(x) + p_{data}(x)\big)/2$.
$\mathrm{KL}\big(p_\theta(x|y=1)\,\|\,q^r(x|y=1)\big) = \mathrm{KL}\big(p_{data}(x)\,\|\,q^r(x|y=1)\big)$: constant, no free parameters.
$\mathrm{KL}\big(p_\theta(x|y=0)\,\|\,q^r(x|y=0)\big) = \mathrm{KL}\big(p_{g_\theta}(x)\,\|\,q^r(x|y=0)\big)$: parameters $\theta$ to optimize.
$q^r(x|y=0) \propto q^r_{\phi_0}(y=0|x)\, p_{\theta_0}(x)$ can be seen as a mixture of $p_{g_{\theta_0}}(x)$ and $p_{data}(x)$, with mixing weights induced from $q^r_{\phi_0}(y=0|x)$.
Minimizing the KLD drives $p_{g_\theta}(x)$ towards this mixture of $p_{g_{\theta_0}}(x)$ and $p_{data}(x)$, and hence drives $p_{g_\theta}(x)$ to $p_{data}(x)$.

36 GANs: minimizing KLD — missing mode phenomena of GANs (the figure illustrates a missed mode of $p_{data}$):
Asymmetry of the KLD: minimizing $\mathrm{KL}\big(p_\theta(x|y=0)\,\|\,q^r(x|y=0)\big)$ concentrates $p_\theta(x|y=0)$ on large modes of $q^r(x|y=0)$, so $p_{g_\theta}(x)$ misses modes of $p_{data}(x)$.
Symmetry of the JSD: does not affect the mode-missing behavior.
$\mathrm{KL}\big(p_{g_\theta}(x)\,\|\,q^r(x|y=0)\big) = \int p_{g_\theta}(x) \log \frac{p_{g_\theta}(x)}{q^r(x|y=0)}\, dx$: regions of the $x$ space where $q^r(x|y=0)$ is small make a large positive contribution to the KLD unless $p_{g_\theta}(x)$ is also small, so $p_{g_\theta}(x)$ tends to avoid regions where $q^r(x|y=0)$ is small.

37 GANs: minimizing KLD. Lemma 1 makes no assumption of an optimal discriminator. Previous results usually rely on a (near-)optimal discriminator, $q^*(y=1|x) = p_{data}(x) / \big(p_{data}(x) + p_{g_\theta}(x)\big)$; the optimality assumption is impractical given the limited expressiveness of $q_\phi$ [Arora et al. 2017]. Our result is a generalization of the previous theorem [Arjovsky & Bottou 2017]: plugging the optimal discriminator into the above equation recovers the theorem
$\nabla_\theta\!\left[-\mathrm{E}_{p_\theta(x|y)p(y)}[\log q^{r*}(y|x)]\right]\Big|_{\theta=\theta_0} = \nabla_\theta\!\left[\tfrac{1}{2}\,\mathrm{KL}\big(p_{g_\theta}\,\|\,p_{data}\big) - \mathrm{JSD}\big(p_{g_\theta}\,\|\,p_{data}\big)\right]\Big|_{\theta=\theta_0}$,
which gives simplified explanations of the training dynamics and the missing-mode issue, but only gives insights on generator training when the discriminator is optimal.

38 GANs: minimizing KLD — in summary: reveals the connection to variational inference; builds connections to VAEs (slides soon); inspires new model variants based on the connections; offers insights into generator training, including a formal explanation of the missing-mode behavior of GANs that still holds when the discriminator does not achieve its optimum at each iteration.

39 GANs vs. InfoGAN.
GANs: $\max_\phi \mathcal{L}_\phi = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q_\phi(y|x)]$; $\max_\theta \mathcal{L}_\theta = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q^r_\phi(y|x)]$.
InfoGAN (additionally reconstructs the code $z$ with $q_\eta(z|x, y)$): $\max_{\phi, \eta} \mathcal{L}_{\phi, \eta} = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q_\eta(z|x, y)\, q_\phi(y|x)]$; $\max_\theta \mathcal{L}_\theta = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q_\eta(z|x, y)\, q^r_\phi(y|x)]$.

40 Relates VAEs with GANs. The resemblance of GAN generator learning to variational inference suggests strong relations between VAEs and GANs. Indeed, VAEs are basically minimizing the KLD in the opposite direction, with a degenerated adversarial discriminator: swap the generation (solid-line) and inference (dashed-line) processes of InfoGAN and degenerate the discriminator, and we obtain VAEs. [Figure: InfoGAN vs. VAEs]

41 GANs vs. VAEs side by side.
Generative distribution — GANs (InfoGAN): $p_\theta(x|y) = p_{g_\theta}(x)$ if $y = 0$, $p_{data}(x)$ if $y = 1$. VAEs: $p_\theta(x|z, y) = p_\theta(x|z)$ if $y = 0$, $p_{data}(x)$ if $y = 1$.
Discriminator distribution — GANs (InfoGAN): $q_\phi(y|x)$. VAEs: $q_*(y|x)$, perfect, degenerated.
$z$-inference model — GANs: $q_\eta(z|x, y)$ of InfoGAN. VAEs: $q_\eta(z|x, y)$.
KLD to minimize — GANs (InfoGAN): $\min_\theta \mathrm{KL}\big(p_\theta(x|y)\,\|\,q^r(x|y)\big) \sim \min_\theta \mathrm{KL}(P_\theta\,\|\,Q)$. VAEs: $\min_\theta \mathrm{KL}\big(q_\eta(z|x, y)\, q^r_*(y|x)\,\|\,p_\theta(z, y|x)\big) \sim \min_\theta \mathrm{KL}(Q\,\|\,P_\theta)$.

42 GANs vs. VAEs side by side.
KLD to minimize — GANs (InfoGAN): $\min_\theta \mathrm{KL}\big(p_\theta(x|y)\,\|\,q^r(x|y)\big) \sim \min_\theta \mathrm{KL}(P_\theta\,\|\,Q)$. VAEs: $\min_\theta \mathrm{KL}\big(q_\eta(z|x, y)\, q^r_*(y|x)\,\|\,p_\theta(z, y|x)\big) \sim \min_\theta \mathrm{KL}(Q\,\|\,P_\theta)$.
The asymmetry of the KLDs inspires combinations of GANs and VAEs:
GANs: $\min_\theta \mathrm{KL}(P_\theta\,\|\,Q)$ tends to miss modes (mode missing).
VAEs: $\min_\theta \mathrm{KL}(Q\,\|\,P_\theta)$ tends to cover regions with small values of $p_{data}$ (mode covering).
Augment VAEs with the GAN loss [Larsen et al., 2016]: alleviates the mode-covering issue of VAEs and improves the sharpness of VAE-generated images.
Augment GANs with the VAE loss [Che et al., 2017]: alleviates the mode-missing issue.
[Figure courtesy: PRML — mode covering vs. mode missing]

43 Mutual exchanges of ideas: augment the loss. (Same KLD comparison as on the previous slide.) The asymmetry of the KLDs inspires combinations of GANs and VAEs: augment VAEs with the GAN loss [Larsen et al., 2016] to alleviate the mode-covering issue of VAEs and improve the sharpness of VAE-generated images; augment GANs with the VAE loss [Che et al., 2017] to alleviate the mode-missing issue of GANs.
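
A schematic sketch of the "augment the loss" direction for VAEs. This is not the exact VAE-GAN objective of Larsen et al. (2016); `vae_elbo`, `generator`, and `discriminator` are hypothetical helpers assumed to exist, and the weighting is arbitrary.

```python
import torch
import torch.nn.functional as F

# Combine a VAE loss (mode covering) with a GAN-style loss (mode seeking).
def vae_gan_loss(x, vae_elbo, generator, discriminator, lam=1e-2):
    loss_vae = -vae_elbo(x)                  # negative ELBO: reconstruction + KL
    x_fake = generator(x)                    # reconstructions / samples from the decoder
    d_logits = discriminator(x_fake)
    # GAN term: push generated samples towards regions the discriminator labels "real"
    loss_gan = F.binary_cross_entropy_with_logits(d_logits, torch.ones_like(d_logits))
    return loss_vae + lam * loss_gan
```

The reverse direction works analogously: [Che et al., 2017] adds VAE-style reconstruction terms to the GAN objective to discourage dropped modes.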

44 Mutual exchanges of ideas: augment the model.
Discriminator distribution — GANs (InfoGAN): $q_\phi(y|x)$. VAEs: $q_*(y|x)$, perfect, degenerated.
Activate the adversarial mechanism in VAEs: enable adaptive incorporation of fake samples for learning. Straightforward derivation by making a symbolic analog to GANs: vanilla VAEs → adversary-activated VAEs.

45 Adversary Activated VAEs (AAVAE). Vanilla VAEs:
$\max_{\theta, \eta} \mathcal{L}^{vae}_{\theta, \eta} = \mathrm{E}_{p_{data}(x)}\Big[\mathrm{E}_{q_\eta(z|x, y)\, q^r_*(y|x)}[\log p_\theta(x|z, y)] - \mathrm{KL}\big(q_\eta(z|x, y)\, q^r_*(y|x)\,\|\,p(z|y)p(y)\big)\Big]$
Replace $q_*(y|x)$ with a learnable $q_\phi(y|x)$ with parameters $\phi$; as usual, denote the reversed distribution $q^r_\phi(y|x) = q_\phi(1 - y|x)$:
$\max_{\theta, \eta} \mathcal{L}^{aavae}_{\theta, \eta} = \mathrm{E}_{p_{data}(x)}\Big[\mathrm{E}_{q_\eta(z|x, y)\, q^r_\phi(y|x)}[\log p_\theta(x|z, y)] - \mathrm{KL}\big(q_\eta(z|x, y)\, q^r_\phi(y|x)\,\|\,p(z|y)p(y)\big)\Big]$

46 AAVAE: empirical results. Applied the adversary-activating method on vanilla VAEs, class-conditional VAEs (CVAE), and semi-supervised VAEs (SVAE). Evaluated the test-set variational lower bound on MNIST (the higher the better) for each base model and its adversary-activated counterpart (VAE vs. AA-VAE, CVAE vs. AA-CVAE, SVAE vs. AA-SVAE), varying the ratio of training data used for learning (1%, 10%, 100%).

47 AAVAE: empirical results. Evaluated the classification accuracy of semi-supervised VAEs (SVAE) and the adversary-activated variant (AA-SVAE), using 1% and 10% of the data labels in MNIST.

48 Importance weighted GANs (IWGAN). Generator learning in vanilla GANs:
$\max_\theta \mathrm{E}_{x \sim p_\theta(x|y)p(y)}\big[\log q^r_{\phi_0}(y|x)\big]$
Generator learning in IWGAN:
$\max_\theta \mathrm{E}_{x_1, \dots, x_k \sim p_\theta(x|y)p(y)}\Big[\sum_{i=1}^{k} \frac{q^r_{\phi_0}(y|x_i)}{q_{\phi_0}(y|x_i)} \log q^r_{\phi_0}(y|x_i)\Big]$
This assigns higher weights to samples that are more realistic and fool the discriminator better.
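
A sketch of this importance-weighted generator objective for the generated-sample part ($y = 0$), assuming a discriminator `D` that returns $P(y = \text{real} \mid x)$ as a probability in $(0, 1)$. The names and the choice to hold the weights fixed (detached) are assumptions of this sketch, not details taken from the paper.

```python
import torch

def iw_generator_loss(G, D, batch_size, z_dim, k=5, eps=1e-6):
    z = torch.randn(batch_size * k, z_dim)
    d = D(G(z)).view(-1).clamp(eps, 1 - eps)         # D(x_i) = q^r(y|x_i) for generated x_i
    w = (d / (1 - d)).detach().view(batch_size, k)   # importance weight q^r / q, held fixed
    log_qr = torch.log(d).view(batch_size, k)        # log q^r(y|x_i)
    # Maximize sum_i w_i * log q^r(y|x_i)  ->  minimize the negation
    return -(w * log_qr).sum(dim=1).mean()
```

Samples the discriminator currently finds realistic (large $D(x_i)$) receive weights above one and so contribute more to the generator update, as stated above.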

49 IWGAN: empirical results. Evaluated on MNIST and SVHN, using a pretrained NN to compute: (i) Inception scores of samples from GAN and IWGAN (confidence of a pretrained classifier on generated samples + diversity of generated samples), e.g., GAN 8.34 vs. IWGAN 8.45 on MNIST; (ii) classification accuracy of samples from CGAN and IW-CGAN, e.g., CGAN 0.985 vs. IWCGAN 0.987 on MNIST.

50 Symmetric modeling of latent & visible variables.
Empirical distributions over visible variables: impossible to express as explicit distributions — the only information we have is the observed data examples, and we do not know the true parametric form of the data distribution. Naturally an implicit distribution: easy to sample from, hard to evaluate the likelihood.
Prior distributions over latent variables: traditionally defined as explicit distributions, e.g., a Gaussian prior distribution — amenable to likelihood evaluation, and we can assume the parametric form according to our prior knowledge.
New tools allow implicit priors and models: GANs, density-ratio estimation, approximate Bayesian computation. E.g., the adversarial autoencoder [Makhzani et al., 2015] replaces the Gaussian prior of vanilla VAEs with implicit priors.

51 Symmetric modeling of latent & visible variables. No difference in terms of formulation with implicit distributions and black-box NN models — just swap the symbols $x$ and $z$: the generation model transforms samples from the prior distribution over $z$ through a black-box NN into the data space, and the inference model transforms samples from the data distribution over $x$ through a black-box NN into the latent space.

52 Symmetric modeling of latent & visible variables. No difference in terms of formulation with implicit distributions and black-box NN models; there is a difference in terms of space complexity. Depending on the problem at hand, choose the appropriate tools: implicit vs. explicit distributions, adversarial vs. maximum-likelihood optimization, etc. In both the generation and inference models, each matching step (in latent space or data space) can use an adversarial loss, or, when the target distribution is explicit, a maximum-likelihood loss such as $\max \log p_{prior}(z)$ on inferred codes.

53 Conclusions. [Z Hu, Z Yang, R Salakhutdinov, E Xing, On Unifying Deep Generative Models, arXiv] Deep generative models research has a long history: deep belief nets / Helmholtz machines / predictability minimization / ... Unification of deep generative models: GANs and VAEs are essentially minimizing KLD in opposite directions, and extend the two phases of the classic wake-sleep algorithm, respectively. A general formulation framework useful for analyzing a broad class of existing DGMs and variants (ADA / InfoGAN / joint models / ...) and for inspiring new models and algorithms by borrowing ideas across research fields. Symmetric view of latent/visible variables: no difference in formulation with implicit prior distributions and black-box NN transformations; difference in space complexity — choose the appropriate tools.

54 Thank You 54
