A Unified View of Deep Generative Models


1 SAILING LAB: Laboratory for Statistical Artificial InteLligence & INtegrative Genomics. A Unified View of Deep Generative Models. Zhiting Hu and Eric Xing, Petuum Inc. / Carnegie Mellon University

2 Deep generative models 2

3 Deep generative models. Define probabilistic distributions over a set of variables. "Deep" means multiple layers of hidden variables: $z^{(L)}, \dots, z^{(1)}, x$.

4 Early forms of deep generative models. Hierarchical Bayesian models: sigmoid belief nets [Neal 1992]. Binary latent layers $z^{(1)}, z^{(2)}$ and visible units $x$, all in $\{0,1\}$; each unit turns on with a sigmoid probability of its parents, e.g. $p(z_i^{(1)} = 1 \mid \mathbf{z}^{(2)}) = \sigma(\mathbf{w}_i^\top \mathbf{z}^{(2)})$ and $p(x_j = 1 \mid \mathbf{z}^{(1)}) = \sigma(\mathbf{w}_j^\top \mathbf{z}^{(1)})$.

5 Early forms of deep generative models. Hierarchical Bayesian models: sigmoid belief nets [Neal 1992]. Neural network models: Helmholtz machines [Dayan et al., 1995] — a top-down generative network paired with a bottom-up recognition network carrying separate inference weights. [Figure courtesy: Dayan et al. 1995]

6 Early forms of deep generative models. Hierarchical Bayesian models: sigmoid belief nets [Neal 1992]. Neural network models: Helmholtz machines [Dayan et al., 1995]. Predictability minimization [Schmidhuber 1995]. [Figure courtesy: Schmidhuber 1996]

7 Early forms of deep generative models. Training of DGMs via an EM-style framework:
Sampling / data augmentation: impute the latent variables by sampling from the posterior, $z \sim p(z \mid x)$, then update the model parameters with the augmented data.
Variational inference: $\log p(x) \ge \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}(q_\phi(z|x)\,\|\,p(z)) \triangleq \mathcal{L}(\theta, \phi; x)$; $\max_{\theta,\phi} \mathcal{L}(\theta, \phi; x)$.
Wake-sleep: wake: $\max_\theta \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x, z)]$; sleep: $\max_\phi \mathrm{E}_{p_\theta(x, z)}[\log q_\phi(z|x)]$.

8 Resurgence of deep generative models. Variational autoencoders (VAEs) [Kingma & Welling, 2014] / Neural Variational Inference and Learning (NVIL) [Mnih & Gregor, 2014]: $q_\phi(z|x)$ — inference model; $p_\theta(x|z)$ — generative model. [Figure courtesy: Kingma & Welling, 2014]

9 Resurgence of deep generative models. Variational autoencoders (VAEs) [Kingma & Welling, 2014] / Neural Variational Inference and Learning (NVIL) [Mnih & Gregor, 2014]. Generative adversarial networks (GANs) [Goodfellow et al., 2014]: $G_\theta$ — generative model (generator); $D_\phi$ — discriminator.

10 Outline Theoretical Basis of deep generative models Wake sleep algorithm Variational autoencoders Generative adversarial networks A unified view of deep generative models New formulations of deep generative models Symmetric modeling of latent and visible variables

11 Synonyms in the literature. Posterior distribution → inference model / variational approximation / recognition model / inference network (if parameterized as neural networks) / recognition network (if parameterized as neural networks) / (probabilistic) encoder. "The model" (prior + conditional, or joint) → generative model / the (data) likelihood model / generative network (if parameterized as neural networks) / generator / (probabilistic) decoder.

12 Recap: Variational Inference. Consider a generative model $p_\theta(x|z)$ and prior $p(z)$. Joint distribution: $p_\theta(x, z) = p_\theta(x|z)\,p(z)$. Assume a variational distribution $q_\phi(z|x)$. Objective: maximize the lower bound on the log-likelihood
$\log p(x) = \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big) + \mathrm{E}_{q_\phi(z|x)}\!\left[\log \tfrac{p_\theta(x, z)}{q_\phi(z|x)}\right] \ge \mathrm{E}_{q_\phi(z|x)}\!\left[\log \tfrac{p_\theta(x, z)}{q_\phi(z|x)}\right] \triangleq \mathcal{L}(\theta, \phi; x)$.
Equivalently, minimize the free energy $F(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$.

13 Recap: Variational Inference. Maximize the variational lower bound $\mathcal{L}(\theta, \phi; x)$.
E-step: maximize $\mathcal{L}$ wrt. $\phi$ with $\theta$ fixed: $\max_\phi \mathcal{L}(\theta, \phi; x) = \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)$. If available in closed form: $q(z|x) \propto \exp[\log p_\theta(x, z)]$.
M-step: maximize $\mathcal{L}$ wrt. $\theta$ with $\phi$ fixed: $\max_\theta \mathcal{L}(\theta, \phi; x) = \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)$.

14 Wake Sleep Algorithm [Hinton et al., Science 1995]. Train a separate inference model along with the generative model. Generally applicable to a wide range of generative models, e.g., Helmholtz machines. Consider a generative model $p_\theta(x|z)$ and prior $p(z)$; joint distribution $p_\theta(x, z) = p_\theta(x|z)\,p(z)$, e.g., multi-layer belief nets. Inference model $q_\phi(z|x)$. Maximize the data log-likelihood with two steps of loss relaxation:
Maximize the lower bound of the log-likelihood, or equivalently, minimize the free energy $F(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$.
Minimize a different objective (reversed KLD) wrt. $\phi$ to ease the optimization — disconnected from the original variational lower bound loss: $F'(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(p_\theta(z|x)\,\|\,q_\phi(z|x)\big)$.

15 Wake Sleep Algorithm. Free energy: $F(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$. Minimize the free energy wrt. $\theta$ of $p_\theta$ → wake phase: $\max_\theta \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x, z)]$. Get samples from $q_\phi(z|x)$ through inference on the hidden variables; use the samples as targets for updating the generative model $p_\theta(x, z)$. Corresponds to the variational M step. [Figure courtesy: Maei's slides]

16 Wake Sleep Algorithm. Free energy: $F(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$. Minimize the free energy wrt. $\phi$ of $q_\phi(z|x)$ — corresponds to the variational E step. Difficulties: the optimum $q_\phi(z|x) = p_\theta(z|x) = p_\theta(x, z) / \int p_\theta(x, z)\, dz$ is intractable, and direct gradient estimates have high variance:
$\nabla_\phi \mathcal{L}(\theta, \phi; x) = \nabla_\phi \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x, z)] + \dots$
Gradient estimate with the log-derivative trick: $\nabla_\phi \mathrm{E}_{q_\phi}[\log p_\theta] = \int (\nabla_\phi q_\phi)\, \log p_\theta \, dz = \int q_\phi\, (\log p_\theta)\, \nabla_\phi \log q_\phi \, dz = \mathrm{E}_{q_\phi}[\log p_\theta \, \nabla_\phi \log q_\phi]$.
Monte Carlo estimation: $\nabla_\phi \mathrm{E}_{q_\phi}[\log p_\theta] \approx \tfrac{1}{K}\sum_k \log p_\theta(x, z_k)\, \nabla_\phi \log q_\phi(z_k|x)$, $z_k \sim q_\phi(z|x)$.
The scale factor $\log p_\theta$ of the derivative $\nabla_\phi \log q_\phi$ can have arbitrarily large magnitude.

17 Wake Sleep Algorithm. Free energy: $F(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$. WS works around the difficulties with the sleep phase approximation: minimize the reversed-KL objective instead → sleep phase: $F'(\theta, \phi; x) = -\log p(x) + \mathrm{KL}\big(p_\theta(z|x)\,\|\,q_\phi(z|x)\big)$, i.e., $\max_\phi \mathrm{E}_{p_\theta(z, x)}[\log q_\phi(z|x)]$. "Dream up" samples from $p_\theta(x, z)$ through a top-down pass; use the samples as targets for updating the inference model. (Recent approaches, other than the sleep phase, instead reduce the variance of the gradient estimate: slides later.) [Figure courtesy: Maei's slides]

18 Wake Sleep Algorithm — wake-sleep vs. variational EM.
Wake-sleep: parameterized inference model $q_\phi(z|x)$. Wake phase: minimize $\mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$ wrt. $\theta$: $\mathrm{E}_{q_\phi(z|x)}[\nabla_\theta \log p_\theta(x|z)]$. Sleep phase: minimize $\mathrm{KL}\big(p_\theta(z|x)\,\|\,q_\phi(z|x)\big)$ wrt. $\phi$: $\mathrm{E}_{p_\theta(z, x)}[\nabla_\phi \log q_\phi(z|x)]$ — low variance, but learning with generated samples of $x$. Two objectives, not guaranteed to converge.
Variational EM: variational distribution $q_\phi(z|x)$. Variational M step: minimize $\mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$ wrt. $\theta$: $\mathrm{E}_{q_\phi(z|x)}[\nabla_\theta \log p_\theta(x|z)]$. Variational E step: minimize $\mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)$ wrt. $\phi$: $q_\phi \propto \exp[\log p_\theta]$ if available in closed form; otherwise $\nabla_\phi \mathrm{E}_{q_\phi}[\log p_\theta(x, z)]$ needs variance reduction in practice. Learning with real data $x$. Single objective, guaranteed to converge.
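
To make the two phases concrete, the following is a minimal wake-sleep sketch for a one-hidden-layer sigmoid belief net. All sizes, initializations, optimizers, and learning rates are illustrative assumptions for this sketch, not the original Helmholtz-machine implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative wake-sleep sketch: binary visibles x (size D), binary latents z (size H).
D, H = 784, 64
gen_b = torch.zeros(H, requires_grad=True)              # generative prior logits for p(z)
gen_W = (0.01 * torch.randn(D, H)).requires_grad_()     # generative weights for p(x|z)
rec_W = (0.01 * torch.randn(H, D)).requires_grad_()     # recognition weights for q(z|x)
gen_opt = torch.optim.SGD([gen_b, gen_W], lr=1e-2)
rec_opt = torch.optim.SGD([rec_W], lr=1e-2)

def wake_sleep_step(x):
    # Wake phase: infer z ~ q(z|x), then update the generative model on (x, z),
    # i.e. maximize E_q[log p_theta(x, z)] (the variational M-step above).
    with torch.no_grad():
        z = torch.bernoulli(torch.sigmoid(x @ rec_W.t()))
    log_pz = -F.binary_cross_entropy_with_logits(gen_b.expand_as(z), z, reduction='none').sum(1)
    log_px_z = -F.binary_cross_entropy_with_logits(z @ gen_W.t(), x, reduction='none').sum(1)
    gen_loss = -(log_pz + log_px_z).mean()
    gen_opt.zero_grad(); gen_loss.backward(); gen_opt.step()

    # Sleep phase: "dream up" (z, x) from the generative model, then update the
    # recognition model to maximize E_{p_theta(x, z)}[log q_phi(z|x)].
    with torch.no_grad():
        z_dream = torch.bernoulli(torch.sigmoid(gen_b).expand(x.shape[0], H))
        x_dream = torch.bernoulli(torch.sigmoid(z_dream @ gen_W.t()))
    rec_loss = F.binary_cross_entropy_with_logits(x_dream @ rec_W.t(), z_dream)
    rec_opt.zero_grad(); rec_loss.backward(); rec_opt.step()
    return gen_loss.item(), rec_loss.item()

# One step on a random binary batch:
wake_sleep_step(torch.bernoulli(torch.full((32, D), 0.5)))
```

Note that the sleep update fits $q_\phi$ to dreamed samples rather than to real data, which is exactly the low-variance but not-guaranteed-to-converge surrogate contrasted with variational EM above.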

19 Variational Autoencoders (VAEs) [Kingma & Welling, 2014]. Use variational inference with an inference model; enjoys similar applicability to the wake-sleep algorithm. Generative model $p_\theta(x|z)$ and prior $p(z)$; joint distribution $p_\theta(x, z) = p_\theta(x|z)\,p(z)$. Inference model $q_\phi(z|x)$. [$q_\phi(z|x)$: inference model; $p_\theta(x|z)$: generative model. Figure courtesy: Kingma & Welling, 2014]

20 Variational Autoencoders (VAEs). Variational lower bound: $\mathcal{L}(\theta, \phi; x) = \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)$.
Optimize $\mathcal{L}(\theta, \phi; x)$ wrt. $\theta$ of $p_\theta(x|z)$: the same as the wake phase.
Optimize $\mathcal{L}(\theta, \phi; x)$ wrt. $\phi$ of $q_\phi(z|x)$: $\nabla_\phi \mathcal{L}(\theta, \phi; x) = \nabla_\phi \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x, z)] + \dots$; use the reparameterization trick to reduce variance. Alternatives: use control variates as in reinforcement learning [Mnih & Gregor, 2014; Paisley et al., 2012].
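
As a concrete reference for this bound, here is a minimal VAE sketch that computes the lower bound with a Bernoulli decoder, a diagonal-Gaussian encoder, and a standard-normal prior. The architecture and dimensions are assumptions for the example, not those of Kingma & Welling (2014).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20, h_dim=200):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def elbo(self, x):
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterized sample
        logits = self.dec(z)
        # E_q[log p_theta(x|z)]: Bernoulli log-likelihood under the decoder
        rec = -F.binary_cross_entropy_with_logits(logits, x, reduction='none').sum(1)
        # KL(q_phi(z|x) || N(0, I)), closed form for diagonal Gaussians
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(1)
        return (rec - kl).mean()

# Training maximizes the bound, i.e. minimizes its negation:
# loss = -model.elbo(x_batch); loss.backward(); optimizer.step()
```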

21 Reparameterized gradient. Optimize $\mathcal{L}(\theta, \phi; x)$ wrt. $\phi$ of $q_\phi(z|x)$.
Recap — gradient estimate with the log-derivative trick: $\nabla_\phi \mathrm{E}_{q_\phi}[\log p_\theta(x, z)] = \mathrm{E}_{q_\phi}[\log p_\theta(x, z)\, \nabla_\phi \log q_\phi]$. High variance: $\nabla_\phi \mathrm{E}_{q_\phi}[\log p_\theta] \approx \tfrac{1}{K}\sum_k \log p_\theta(x, z_k)\, \nabla_\phi \log q_\phi(z_k|x)$, where the scale factor $\log p_\theta(x, z_k)$ of the derivative $\nabla_\phi \log q_\phi$ can have arbitrarily large magnitude.
Gradient estimate with the reparameterization trick: $z \sim q_\phi(z|x) \iff z = g_\phi(\epsilon, x),\ \epsilon \sim p(\epsilon)$, so $\nabla_\phi \mathrm{E}_{q_\phi(z|x)}[\log p_\theta(x, z)] = \mathrm{E}_{\epsilon \sim p(\epsilon)}\big[\nabla_\phi \log p_\theta(x, z)\big|_{z = g_\phi(\epsilon, x)}\big]$ — (empirically) lower variance of the gradient estimate. E.g., $z \sim \mathcal{N}(\mu(x), \sigma(x)^2)$: $\epsilon \sim \mathcal{N}(0, 1)$, $z = \mu(x) + \sigma(x)\,\epsilon$.
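
A toy numerical check of the variance claim; the target quantity $\mathrm{E}_{z \sim \mathcal{N}(\mu, 1)}[z^2]$ and all constants below are assumptions made up for the illustration.

```python
import numpy as np

# Estimate d/dmu E_{z ~ N(mu, 1)}[z^2] = 2*mu with K samples, using
# (a) the log-derivative (score-function) estimator and (b) reparameterization.
rng = np.random.default_rng(0)
mu, K, trials = 3.0, 10, 2000

score_fn, reparam = [], []
for _ in range(trials):
    z = rng.normal(mu, 1.0, size=K)
    # (a) score function: f(z) * d/dmu log N(z; mu, 1) = z^2 * (z - mu)
    score_fn.append(np.mean(z ** 2 * (z - mu)))
    # (b) reparameterization: z = mu + eps, so d/dmu f(z) = 2 * z
    eps = rng.normal(0.0, 1.0, size=K)
    reparam.append(np.mean(2.0 * (mu + eps)))

print("true gradient           :", 2 * mu)
print("score-function  mean/std:", np.mean(score_fn), np.std(score_fn))
print("reparameterized mean/std:", np.mean(reparam), np.std(reparam))
# Both estimators are unbiased; the reparameterized one typically has a much
# smaller standard deviation, which is the point made on this slide.
```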

22 VAEs: example results. VAEs tend to generate blurred images due to the mode-covering behavior (more later). Latent code interpolation and sentence generation from VAEs [Bowman et al., 2015]: "i want to talk to you ." → "i want to be with you ." → "i do n't want to be with you ." → "i do n't want to be with you ." → "she did n't want to be with him ." Celebrity faces [Radford 2015].

23 Generative Adversarial Nets (GANs) [Goodfellow et al., 2014]. Generative model: $x = G_\theta(z)$, $z \sim p(z)$, mapping a noise variable $z$ to data space $x$. This defines an implicit distribution over $x$, $p_{g_\theta}(x)$: a stochastic process to simulate data $x$; intractable to evaluate the likelihood. Discriminator $D_\phi(x)$: outputs the probability that $x$ came from the data rather than the generator. No explicit inference model, and no obvious connection to previous models with inference networks like VAEs — we will build formal connections between GANs and VAEs later.

24 Generative Adversarial Nets (GANs). Learning: a minimax game between the generator and the discriminator. Train $D$ to maximize the probability of assigning the correct label to both training examples and generated samples; train $G$ to fool the discriminator:
$\max_D \mathcal{L}_D = \mathrm{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathrm{E}_{x = G(z),\, z \sim p(z)}[\log(1 - D(x))]$
$\min_G \mathcal{L}_G = \mathrm{E}_{x = G(z),\, z \sim p(z)}[\log(1 - D(x))]$
[Figure courtesy: Kim's slides]
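
A minimal training-step sketch of this minimax game. The tiny MLPs, the synthetic batch of "real" data, and the hyper-parameters are illustrative assumptions; the generator update uses the common non-saturating variant (maximize $\log D(G(z))$) rather than literally minimizing $\log(1 - D(G(z)))$.

```python
import torch
import torch.nn as nn

z_dim, x_dim = 16, 2
G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(x_real):
    b = x_real.shape[0]
    # Discriminator step: maximize log D(x_real) + log(1 - D(G(z)))
    x_fake = G(torch.randn(b, z_dim)).detach()
    d_loss = bce(D(x_real), torch.ones(b, 1)) + bce(D(x_fake), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step (non-saturating): maximize log D(G(z))
    x_fake = G(torch.randn(b, z_dim))
    g_loss = bce(D(x_fake), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# One step on a synthetic batch of 2-D "real" samples centered at (3, 3):
print(gan_step(torch.randn(128, x_dim) + 3.0))
```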

25 GANs: example results Generated bedrooms [Radford et al., 2016] 25

26 The Zoo of DGMs. Variational autoencoders (VAEs) [Kingma & Welling, 2014]: adversarial autoencoder [Makhzani et al., 2015], importance weighted autoencoder [Burda et al., 2015], implicit variational autoencoder [Mescheder et al., 2017]. Generative adversarial networks (GANs) [Goodfellow et al., 2014]: InfoGAN [Chen et al., 2016], CycleGAN [Zhu et al., 2017], Wasserstein GAN [Arjovsky et al., 2017]. Autoregressive neural networks: PixelRNN / PixelCNN [Oord et al., 2016], RNNs (e.g., for language modeling). Generative moment matching networks (GMMNs) [Li et al., 2015; Dziugaite et al., 2015]. Restricted Boltzmann Machines (RBMs) [Smolensky, 1986].

27 Alchemy vs. Chemistry

28 Outline. Theoretical backgrounds of deep generative models: wake-sleep algorithm, variational autoencoders, generative adversarial networks. A unified view of deep generative models: new formulations of deep generative models, symmetric modeling of latent and visible variables. [Z Hu, Z Yang, R Salakhutdinov, E Xing, On Unifying Deep Generative Models, arXiv]

29 Generative Adversarial Nets (GANs): implicit distribution over $x$.
$p_{g_\theta}(x)$ (distribution of generated images): $x = G_\theta(z)$, $z \sim p(z|y=0)$.
$p_{data}(x)$ (distribution of real images): sample directly from the data (the code space of $z$ is degenerated).
$p_\theta(x|y) = \begin{cases} p_{g_\theta}(x) & y = 0 \\ p_{data}(x) & y = 1 \end{cases}$
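
In code, sampling from the implicit joint $p_\theta(x|y)\,p(y)$ simply mixes generator outputs and data samples according to the indicator $y$. A small sketch, where the generator `G`, the tensor `real_data`, and `z_dim` are assumed placeholders:

```python
import torch

def sample_p_theta(G, real_data, n, z_dim):
    y = torch.randint(0, 2, (n,))                        # y ~ uniform over {0, 1}
    z = torch.randn(n, z_dim)
    fake = G(z)                                          # p_g(x): x = G(z), z ~ p(z|y=0)
    idx = torch.randint(0, real_data.shape[0], (n,))
    real = real_data[idx]                                # p_data(x): sample directly from data
    x = torch.where(y.view(-1, 1).bool(), real, fake)    # y = 1 -> real, y = 0 -> generated
    return x, y
```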

30 A new formulation. Rewrite the GAN objectives in the variational-EM format.
Recap — conventional formulation:
$\max_\phi \mathcal{L}_\phi = \mathrm{E}_{x \sim p_{data}(x)}[\log D_\phi(x)] + \mathrm{E}_{x = G_\theta(z),\, z \sim p(z|y=0)}[\log(1 - D_\phi(x))]$
$\max_\theta \mathcal{L}_\theta = \mathrm{E}_{x = G_\theta(z),\, z \sim p(z|y=0)}[\log D_\phi(x)] + \mathrm{E}_{x \sim p_{data}(x)}[\log(1 - D_\phi(x))] = \mathrm{E}_{x = G_\theta(z),\, z \sim p(z|y=0)}[\log D_\phi(x)]$
Rewrite in the new form. Implicit distribution over $x$: $p_\theta(x|y)$, $x = G_\theta(z)$, $z \sim p(z|y)$. Discriminator distribution $q_\phi(y|x)$, with reversed form $q^r_\phi(y|x) = q_\phi(1 - y|x)$:
$\max_\phi \mathcal{L}_\phi = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q_\phi(y|x)]$
$\max_\theta \mathcal{L}_\theta = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q^r_\phi(y|x)]$

31 GANs vs. Variational EM.
Variational EM — objectives:
$\max_\eta \mathcal{L}_{\theta, \eta} = \mathrm{E}_{q_\eta(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}\big(q_\eta(z|x)\,\|\,p(z)\big)$
$\max_\theta \mathcal{L}_{\theta, \eta} = \mathrm{E}_{q_\eta(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}\big(q_\eta(z|x)\,\|\,p(z)\big)$
Single objective for both $\theta$ and $\eta$; extra prior regularization by $p(z)$. The reconstruction term maximizes the conditional log-likelihood of $x$ under the generative distribution $p_\theta(x|z)$, conditioning on the latent code $z$ inferred by $q_\eta(z|x)$. $p_\theta(x|z)$ is the generative model; $q_\eta(z|x)$ is the inference model.
GAN — objectives:
$\max_\phi \mathcal{L}_\phi = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q_\phi(y|x)]$
$\max_\theta \mathcal{L}_\theta = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q^r_\phi(y|x)]$
Two objectives; a global optimal state exists in the game-theoretic view. The objectives maximize the conditional log-likelihood of $y$ (or $1 - y$) under the distribution $q_\phi(y|x)$, conditioning on the data/generation $x$ "inferred" by $p_\theta(x|y)$. Interpret $q_\phi(y|x)$ as the generative model and $p_\theta(x|y)$ as the inference model.

32 GANs vs. Variational EM (cont'd). Same objectives and correspondences as on the previous slide, with an additional reading on the GAN side: interpret $x$ as latent variables, and interpret the generation of $x$ as performing inference over these latent variables.

33 GANs: minimizing KLD. As in variational EM, we can further rewrite the objective in the form of minimizing a KLD, to reveal more insights into the optimization problem. For each optimization step of $p_\theta(x|y)$ at point $(\theta_0, \phi_0)$, let
$p(y)$: the uniform prior distribution;
$p_{\theta_0}(x) = \mathrm{E}_{p(y)}[p_{\theta_0}(x|y)]$;
$q^r(x|y) \propto q^r_{\phi_0}(y|x)\, p_{\theta_0}(x)$.
Lemma 1: the updates of $\theta$ at $\theta_0$ satisfy
$\nabla_\theta\!\left[-\mathrm{E}_{p_\theta(x|y)p(y)}[\log q^r_{\phi_0}(y|x)]\right]\Big|_{\theta=\theta_0} = \nabla_\theta\!\left[\mathrm{E}_{p(y)}\!\big[\mathrm{KL}\big(p_\theta(x|y)\,\|\,q^r(x|y)\big)\big] - \mathrm{JSD}\big(p_\theta(x|y=0)\,\|\,p_\theta(x|y=1)\big)\right]\Big|_{\theta=\theta_0}$
KL: Kullback-Leibler divergence; JSD: Jensen-Shannon divergence.

34 GANs: minimizing KLD. Lemma 1 (as above). Connection to variational inference: see $x$ as latent variables and $y$ as visible; $p_{\theta_0}(x)$ plays the role of the prior distribution, $q^r(x|y) \propto q^r_{\phi_0}(y|x)\, p_{\theta_0}(x)$ the role of the posterior distribution, and $p_\theta(x|y)$ the role of the variational distribution. Amortized inference: the updates adjust the model parameters $\theta$. This suggests relations to VAEs, as we will explore shortly.

35 GANs: minimizing KLD. With $p_{\theta_0}(x|y=1) = p_{data}(x)$ and $p_{\theta_0}(x|y=0) = p_{g_{\theta_0}}(x)$, minimizing the KLD drives $p_{g_\theta}(x)$ to $p_{data}(x)$:
By definition, $p_{\theta_0}(x) = \mathrm{E}_{p(y)}[p_{\theta_0}(x|y)] = \big(p_{g_{\theta_0}}(x) + p_{data}(x)\big)/2$.
$\mathrm{KL}\big(p_\theta(x|y=1)\,\|\,q^r(x|y=1)\big) = \mathrm{KL}\big(p_{data}(x)\,\|\,q^r(x|y=1)\big)$: constant, no free parameters.
$\mathrm{KL}\big(p_\theta(x|y=0)\,\|\,q^r(x|y=0)\big) = \mathrm{KL}\big(p_{g_\theta}(x)\,\|\,q^r(x|y=0)\big)$: parameters $\theta$ to optimize.
$q^r(x|y=0) \propto q^r_{\phi_0}(y=0|x)\, p_{\theta_0}(x)$ can be seen as a mixture of $p_{g_{\theta_0}}(x)$ and $p_{data}(x)$, with mixing weights induced from $q^r_{\phi_0}(y=0|x)$.
Minimizing the KLD drives $p_{g_\theta}(x)$ towards this mixture of $p_{g_{\theta_0}}(x)$ and $p_{data}(x)$, and hence drives $p_{g_\theta}(x)$ to $p_{data}(x)$.

36 GANs: minimizing KLD — missing mode phenomena of GANs (the figure illustrates a missed mode of $p_{data}$):
Asymmetry of the KLD: minimizing $\mathrm{KL}\big(p_\theta(x|y=0)\,\|\,q^r(x|y=0)\big)$ concentrates $p_\theta(x|y=0)$ on large modes of $q^r(x|y=0)$, so $p_{g_\theta}(x)$ misses modes of $p_{data}(x)$.
Symmetry of the JSD: does not affect the mode-missing behavior.
$\mathrm{KL}\big(p_{g_\theta}(x)\,\|\,q^r(x|y=0)\big) = \int p_{g_\theta}(x) \log \frac{p_{g_\theta}(x)}{q^r(x|y=0)}\, dx$: regions of the $x$ space where $q^r(x|y=0)$ is small make a large positive contribution to the KLD unless $p_{g_\theta}(x)$ is also small, so $p_{g_\theta}(x)$ tends to avoid regions where $q^r(x|y=0)$ is small.

37 GANs: minimizing KLD. Lemma 1 makes no assumption of an optimal discriminator. Previous results usually rely on a (near-)optimal discriminator, $q^*(y=1|x) = p_{data}(x) / \big(p_{data}(x) + p_{g_\theta}(x)\big)$; the optimality assumption is impractical given the limited expressiveness of $q_\phi$ [Arora et al. 2017]. Our result is a generalization of the previous theorem [Arjovsky & Bottou 2017]: plugging the optimal discriminator into the above equation recovers the theorem
$\nabla_\theta\!\left[-\mathrm{E}_{p_\theta(x|y)p(y)}[\log q^{r*}(y|x)]\right]\Big|_{\theta=\theta_0} = \nabla_\theta\!\left[\tfrac{1}{2}\,\mathrm{KL}\big(p_{g_\theta}\,\|\,p_{data}\big) - \mathrm{JSD}\big(p_{g_\theta}\,\|\,p_{data}\big)\right]\Big|_{\theta=\theta_0}$,
which gives simplified explanations of the training dynamics and the missing-mode issue, but only gives insights on generator training when the discriminator is optimal.

38 GANs: minimizing KLD — in summary: reveals the connection to variational inference; builds connections to VAEs (slides soon); inspires new model variants based on the connections; offers insights into generator training, including a formal explanation of the missing-mode behavior of GANs that still holds when the discriminator does not achieve its optimum at each iteration.

39 GANs vs. InfoGAN.
GANs: $\max_\phi \mathcal{L}_\phi = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q_\phi(y|x)]$; $\max_\theta \mathcal{L}_\theta = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q^r_\phi(y|x)]$.
InfoGAN (additionally reconstructs the code $z$ with $q_\eta(z|x, y)$): $\max_{\phi, \eta} \mathcal{L}_{\phi, \eta} = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q_\eta(z|x, y)\, q_\phi(y|x)]$; $\max_\theta \mathcal{L}_\theta = \mathrm{E}_{p_\theta(x|y)p(y)}[\log q_\eta(z|x, y)\, q^r_\phi(y|x)]$.

40 Relates VAEs with GANs. The resemblance of GAN generator learning to variational inference suggests strong relations between VAEs and GANs. Indeed, VAEs are basically minimizing the KLD in the opposite direction, with a degenerated adversarial discriminator: swap the generation (solid-line) and inference (dashed-line) processes of InfoGAN and degenerate the discriminator, and we obtain VAEs. [Figure: InfoGAN vs. VAEs]

41 GANs vs. VAEs side by side.
Generative distribution — GANs (InfoGAN): $p_\theta(x|y) = p_{g_\theta}(x)$ if $y = 0$, $p_{data}(x)$ if $y = 1$. VAEs: $p_\theta(x|z, y) = p_\theta(x|z)$ if $y = 0$, $p_{data}(x)$ if $y = 1$.
Discriminator distribution — GANs (InfoGAN): $q_\phi(y|x)$. VAEs: $q_*(y|x)$, perfect, degenerated.
$z$-inference model — GANs: $q_\eta(z|x, y)$ of InfoGAN. VAEs: $q_\eta(z|x, y)$.
KLD to minimize — GANs (InfoGAN): $\min_\theta \mathrm{KL}\big(p_\theta(x|y)\,\|\,q^r(x|y)\big) \sim \min_\theta \mathrm{KL}(P_\theta\,\|\,Q)$. VAEs: $\min_\theta \mathrm{KL}\big(q_\eta(z|x, y)\, q^r_*(y|x)\,\|\,p_\theta(z, y|x)\big) \sim \min_\theta \mathrm{KL}(Q\,\|\,P_\theta)$.

42 GANs vs. VAEs side by side.
KLD to minimize — GANs (InfoGAN): $\min_\theta \mathrm{KL}\big(p_\theta(x|y)\,\|\,q^r(x|y)\big) \sim \min_\theta \mathrm{KL}(P_\theta\,\|\,Q)$. VAEs: $\min_\theta \mathrm{KL}\big(q_\eta(z|x, y)\, q^r_*(y|x)\,\|\,p_\theta(z, y|x)\big) \sim \min_\theta \mathrm{KL}(Q\,\|\,P_\theta)$.
The asymmetry of the KLDs inspires combinations of GANs and VAEs:
GANs: $\min_\theta \mathrm{KL}(P_\theta\,\|\,Q)$ tends to miss modes (mode missing).
VAEs: $\min_\theta \mathrm{KL}(Q\,\|\,P_\theta)$ tends to cover regions with small values of $p_{data}$ (mode covering).
Augment VAEs with the GAN loss [Larsen et al., 2016]: alleviates the mode-covering issue of VAEs and improves the sharpness of VAE-generated images.
Augment GANs with the VAE loss [Che et al., 2017]: alleviates the mode-missing issue.
[Figure courtesy: PRML — mode covering vs. mode missing]

43 Mutual exchanges of ideas: augment the loss. (Same KLD comparison as on the previous slide.) The asymmetry of the KLDs inspires combinations of GANs and VAEs: augment VAEs with the GAN loss [Larsen et al., 2016] to alleviate the mode-covering issue of VAEs and improve the sharpness of VAE-generated images; augment GANs with the VAE loss [Che et al., 2017] to alleviate the mode-missing issue of GANs.
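
A schematic sketch of the "augment the loss" direction for VAEs. This is not the exact VAE-GAN objective of Larsen et al. (2016); `vae_elbo`, `generator`, and `discriminator` are hypothetical helpers assumed to exist, and the weighting is arbitrary.

```python
import torch
import torch.nn.functional as F

# Combine a VAE loss (mode covering) with a GAN-style loss (mode seeking).
def vae_gan_loss(x, vae_elbo, generator, discriminator, lam=1e-2):
    loss_vae = -vae_elbo(x)                  # negative ELBO: reconstruction + KL
    x_fake = generator(x)                    # reconstructions / samples from the decoder
    d_logits = discriminator(x_fake)
    # GAN term: push generated samples towards regions the discriminator labels "real"
    loss_gan = F.binary_cross_entropy_with_logits(d_logits, torch.ones_like(d_logits))
    return loss_vae + lam * loss_gan
```

The reverse direction works analogously: [Che et al., 2017] adds VAE-style reconstruction terms to the GAN objective to discourage dropped modes.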

44 Mutual exchanges of ideas: augment the model.
Discriminator distribution — GANs (InfoGAN): $q_\phi(y|x)$. VAEs: $q_*(y|x)$, perfect, degenerated.
Activate the adversarial mechanism in VAEs: enable adaptive incorporation of fake samples for learning. Straightforward derivation by making a symbolic analog to GANs: vanilla VAEs → adversary-activated VAEs.

45 Adversary Activated VAEs (AAVAE). Vanilla VAEs:
$\max_{\theta, \eta} \mathcal{L}^{vae}_{\theta, \eta} = \mathrm{E}_{p_{data}(x)}\Big[\mathrm{E}_{q_\eta(z|x, y)\, q^r_*(y|x)}[\log p_\theta(x|z, y)] - \mathrm{KL}\big(q_\eta(z|x, y)\, q^r_*(y|x)\,\|\,p(z|y)p(y)\big)\Big]$
Replace $q_*(y|x)$ with a learnable $q_\phi(y|x)$ with parameters $\phi$; as usual, denote the reversed distribution $q^r_\phi(y|x) = q_\phi(1 - y|x)$:
$\max_{\theta, \eta} \mathcal{L}^{aavae}_{\theta, \eta} = \mathrm{E}_{p_{data}(x)}\Big[\mathrm{E}_{q_\eta(z|x, y)\, q^r_\phi(y|x)}[\log p_\theta(x|z, y)] - \mathrm{KL}\big(q_\eta(z|x, y)\, q^r_\phi(y|x)\,\|\,p(z|y)p(y)\big)\Big]$

46 AAVAE: empirical results. Applied the adversary-activating method on vanilla VAEs, class-conditional VAEs (CVAE), and semi-supervised VAEs (SVAE). Evaluated the test-set variational lower bound on MNIST (the higher the better) for each base model and its adversary-activated counterpart (VAE vs. AA-VAE, CVAE vs. AA-CVAE, SVAE vs. AA-SVAE), varying the ratio of training data used for learning (1%, 10%, 100%).

47 AAVAE: empirical results. Evaluated the classification accuracy of semi-supervised VAEs (SVAE) and the adversary-activated variant (AA-SVAE), using 1% and 10% of the data labels in MNIST.

48 Importance weighted GANs (IWGAN). Generator learning in vanilla GANs:
$\max_\theta \mathrm{E}_{x \sim p_\theta(x|y)p(y)}\big[\log q^r_{\phi_0}(y|x)\big]$
Generator learning in IWGAN:
$\max_\theta \mathrm{E}_{x_1, \dots, x_k \sim p_\theta(x|y)p(y)}\Big[\sum_{i=1}^{k} \frac{q^r_{\phi_0}(y|x_i)}{q_{\phi_0}(y|x_i)} \log q^r_{\phi_0}(y|x_i)\Big]$
This assigns higher weights to samples that are more realistic and fool the discriminator better.
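
A sketch of this importance-weighted generator objective for the generated-sample part ($y = 0$), assuming a discriminator `D` that returns $P(y = \text{real} \mid x)$ as a probability in $(0, 1)$. The names and the choice to hold the weights fixed (detached) are assumptions of this sketch, not details taken from the paper.

```python
import torch

def iw_generator_loss(G, D, batch_size, z_dim, k=5, eps=1e-6):
    z = torch.randn(batch_size * k, z_dim)
    d = D(G(z)).view(-1).clamp(eps, 1 - eps)         # D(x_i) = q^r(y|x_i) for generated x_i
    w = (d / (1 - d)).detach().view(batch_size, k)   # importance weight q^r / q, held fixed
    log_qr = torch.log(d).view(batch_size, k)        # log q^r(y|x_i)
    # Maximize sum_i w_i * log q^r(y|x_i)  ->  minimize the negation
    return -(w * log_qr).sum(dim=1).mean()
```

Samples the discriminator currently finds realistic (large $D(x_i)$) receive weights above one and so contribute more to the generator update, as stated above.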

49 IWGAN: empirical results. Evaluated on MNIST and SVHN, using a pretrained NN to compute: (i) Inception scores of samples from GAN and IWGAN (confidence of a pretrained classifier on generated samples + diversity of generated samples), e.g., GAN 8.34 vs. IWGAN 8.45 on MNIST; (ii) classification accuracy of samples from CGAN and IW-CGAN, e.g., CGAN 0.985 vs. IWCGAN 0.987 on MNIST.

50 Symmetric modeling of latent & visible variables.
Empirical distributions over visible variables: impossible to express as explicit distributions — the only information we have is the observed data examples, and we do not know the true parametric form of the data distribution. Naturally an implicit distribution: easy to sample from, hard to evaluate the likelihood.
Prior distributions over latent variables: traditionally defined as explicit distributions, e.g., a Gaussian prior distribution — amenable to likelihood evaluation, and we can assume the parametric form according to our prior knowledge.
New tools allow implicit priors and models: GANs, density-ratio estimation, approximate Bayesian computation. E.g., the adversarial autoencoder [Makhzani et al., 2015] replaces the Gaussian prior of vanilla VAEs with implicit priors.

51 Symmetric modeling of latent & visible variables. No difference in terms of formulation with implicit distributions and black-box NN models — just swap the symbols $x$ and $z$: the generation model transforms samples from the prior distribution over $z$ through a black-box NN into the data space, and the inference model transforms samples from the data distribution over $x$ through a black-box NN into the latent space.

52 Symmetric modeling of latent & visible variables. No difference in terms of formulation with implicit distributions and black-box NN models; there is a difference in terms of space complexity. Depending on the problem at hand, choose the appropriate tools: implicit vs. explicit distributions, adversarial vs. maximum-likelihood optimization, etc. In both the generation and inference models, each matching step (in latent space or data space) can use an adversarial loss, or, when the target distribution is explicit, a maximum-likelihood loss such as $\max \log p_{prior}(z)$ on inferred codes.

53 Conclusions. [Z Hu, Z Yang, R Salakhutdinov, E Xing, On Unifying Deep Generative Models, arXiv] Deep generative models research has a long history: deep belief nets / Helmholtz machines / predictability minimization / ... Unification of deep generative models: GANs and VAEs are essentially minimizing KLD in opposite directions, and extend the two phases of the classic wake-sleep algorithm, respectively. A general formulation framework useful for analyzing a broad class of existing DGMs and variants (ADA / InfoGAN / joint models / ...) and for inspiring new models and algorithms by borrowing ideas across research fields. Symmetric view of latent/visible variables: no difference in formulation with implicit prior distributions and black-box NN transformations; difference in space complexity — choose the appropriate tools.

54 Thank You 54
