AdaGAN: Boosting Generative Models


1 AdaGAN: Boosting Generative Models. Ilya Tolstikhin, joint work with Gelly², Bousquet², Simon-Gabriel¹, Schölkopf¹. ¹MPI for Intelligent Systems, ²Google Brain

2 [Image slide (Radford et al., 2015)]

3 [Image slide (Radford et al., 2015)]

4 [Image slide (Nguyen et al., 2016)]

5 [Video slide (YouTube: Yota Ishida)]

6 Motivation. Generative Adversarial Networks, Variational Autoencoders, and other unsupervised representation learning / generative modelling techniques ROCK! Dozens of papers are being submitted, discussed, published, ... Many of the papers submitted to ICLR have titles matching "Generative Adversarial*". There was a NIPS 2016 tutorial on GANs, and the GAN workshop at NIPS 2016 was the second largest. That's all very exciting. However: GANs are very hard to train. The field is, apparently, in its trial-and-error phase. People try all kinds of heuristic hacks. Sometimes they work, sometimes not... Very few papers manage to go beyond heuristics and provide a deeper conceptual understanding of what's going on.

7 AdaGAN: adaptive mixtures of GANs

8 AdaGAN: adaptive mixtures of GANs


10 Contents
1. Generative Adversarial Networks and f-divergences. GANs are trying to minimize a specific f-divergence, $\min_{P_{model}\in\mathcal{P}} D_f(P_{data} \| P_{model})$. At least, this is one way to explain what they are doing.
2. Minimizing f-divergences with mixtures of distributions. We study theoretical properties of the problem
\[ \min_{\alpha_t,\, P_t\in\mathcal{P}} D_f\Big(P_{data} \,\Big\|\, \sum_{t=1}^{T}\alpha_t P_t\Big) \]
and show that it can be reduced to $\min_{P_T\in\mathcal{P}} D_f(P_{data}\|P_T)$. (*)
3. Back to GANs. Now we use GANs to train $P_T$ in (*), which results in AdaGAN.

11 f-gan: Adversarial training for f-divergence minimization For any probability distributions P and Q, f-divergence is ) dq D f Q P ) := f dp x) dp x), where f : 0, ) R is convex and satisfies f1) = 0. Properties of f-divergence: 1. D f P Q) 0 and D f P Q) = 0 when P = Q; 2. D f P Q) = D f Q P ) for f x) := xf1/x); 3. D f P Q) is symmetric when fx) = xf1/x) for all x; 4. D f P Q) = D g P Q) for any gx) = fx) + Cx 1); 5. Taking f 0 x) := fx) x 1)f 1) is a good choice. It is nonnegative, nonincreasing on 0, 1], and nondecreasing on [1, ).

12 f-gan: Adversarial training for f-divergence minimization For any probability distributions P and Q, f-divergence is ) dq D f Q P ) := f dp x) dp x), where f : 0, ) R is convex and satisfies f1) = 0. Examples of f-divergences: 1. Kullback-Leibler divergence for fx) = x log x; 2. Reverse Kullback-Leibler divergence for fx) = log x; 3. Total Variation distance for fx) = x 1 ; 4. Jensen-Shannon divergence for fx) = x + 1) log x x log x: JSP Q) = KL P P + Q ) + KL Q P + Q ). 2 2

13 f-gan: Adversarial training for f-divergence minimization Kullback-Leibler minimization: assume X 1,..., X n Q ) dq min KLQ P ) min log P P dp x) dqx) min logdp x))dqx) P max logdp x))dqx) P 1 N max logdp X i )). P N Reverse Kullback-Leibler minimization: ) dp min KLP Q) min log P P dq x) dp x) min HP ) logdqx))dp x) P max HP ) + logdqx))dp x). P i=1

14 f-gan: Adversarial training for f-divergence minimization Now we will follow Nowozin et al., NIPS2016). For any probability distributions P and Q over X, f-divergence is ) dq D f Q P ) := f dp x) dp x). Variational representation: a very neat result, saying that D f Q P ) = [ sup E X Q [gx)] E Y P f gy ) )], g : X domf ) where f x) := sup u x u fu) is a convex conjugate of f. D f can be computed by solving an optimization problem! The f-divergence minimization can be now written as: inf D f Q P ) inf P P sup [ E X Q [gx)] E Y P f gy ) )]. g :...

15 f-gan: Adversarial training for f-divergence minimization The f-divergence minimization can be now written as: inf D f Q P ) inf P P [ sup E X Q [gx)] E Y P f gy ) )]. g : X domf ) 1. Take P to be a distribution of G θ Z), where Z P Z is a latent noise and G θ is a deep net; 2. JS divergence corresponds to fx) = x + 1) log x x log x and f t) = log 2 e t) and thus domain of f is, log 2); 3. Take g = g f D ω, where g f v) = log 2 log1 + e v ) and D ω is a deep net check that gx) < log 2 for all x). Then up to additive 2 log 2 term we get: inf G θ sup D ω E X Q log e + E DωX) Z P Z log 1 ) 1 ) 1 + e Dω G θ Z) Compare to the original GAN objective Goodfellow et al., NIPS2014) inf G θ sup E X Pd [log D ω X)] + E Z PZ [log 1 D ω G θ Z)) ) ]. D ω

16 f-gan: Adversarial training for f-divergence minimization Then we get the original GAN objective Goodfellow et al., NIPS2014) inf G θ sup E X Pd [log D ω X)] + E Z PZ [log 1 D ω G θ Z)) ) ]. *) D ω Let s denote a distribution of G θ Z) when Z P Z with P θ. For any fixed generator G θ, the theoretical-optimal D is: D θx) = If we insert this back to *) we get inf G θ dp d x) dp d x) + dp θ x). 2 JSP d P θ ) log4).

17 f-gan: Adversarial training for f-divergence minimization The f-divergence minimization can be now written as: inf D f Q P ) inf P P This is GREAT, because: [ sup E X Q [gx)] E Y P f gy ) )]. g : X domf ) 1. Estimate E X Q [gx)] using an i.i.d. sample data) X 1,..., X N Q: E X Q [gx)] 1 N N gx i ); i=1 2. Often we don t know the analytical form of P, but can only sample from P. No problems! Sample i.i.d. Y 1,..., Y M P and write: E Y P [ f gy ) )] 1 M M f gy j ) ). j=1 This results in implicit generative modelling: using complex model distributions, which are analytically intractable, but easy to sample.

18 f-gan: Adversarial training for f-divergence minimization Note that there is still a difference between doing GAN: inf G θ sup E X Pd [log D ω X)] + E Z PZ [log 1 D ω G θ Z)) ) ] D ω and minimizing JSP data, P model ): because inf P model [ )] sup E X Pdata [gx)] E Y Pmodel f JS gy ), g : We replaced expectations with sample averages. Uniform lows of large numbers may not apply, as our function classes are huge; 2. Instead of taking supremum over all possible witness functions g we optimize over restricted class of DNNs which is still huge); 3. In practice we never optimize g/d ω to the end. This leads to vanishing gradients and other bad things.

19 Contents
1. Generative Adversarial Networks and f-divergences. GANs are trying to minimize a specific f-divergence, $\min_{P_{model}\in\mathcal{P}} D_f(P_{data} \| P_{model})$. At least, this is one way to explain what they are doing.
2. Minimizing f-divergences with mixtures of distributions. We study theoretical properties of the problem
\[ \min_{\alpha_t,\, P_t\in\mathcal{P}} D_f\Big(P_{data} \,\Big\|\, \sum_{t=1}^{T}\alpha_t P_t\Big) \]
and show that it can be reduced to $\min_{P_T\in\mathcal{P}} D_f(P_{data}\|P_T)$. (*)
3. Back to GANs. Now we use GANs to train $P_T$ in (*), which results in AdaGAN.

20 Minimizing f-divergences with mixtures of distributions
Assumption: let's assume we know, more or less, how to solve $\min_{Q\in\mathcal{G}} D_f(Q\|P)$ for any f and any P, given we have an i.i.d. sample from P.
New problem: now we want to approximate an unknown $P_d$ in $D_f$ using a mixture model of the form
\[ P^T_{model} := \sum_{i=1}^T \alpha_i P_i. \]
Greedy strategy (a minimal code sketch of this loop follows below):
1. Train $P_1$ to minimize $D_f(P_1\|P_d)$, set $\alpha_1 = 1$, and get $P^1_{model} = P_1$;
2. Assume we have trained $P_1,\dots,P_t$ with weights $\alpha_1,\dots,\alpha_t$. We are going to train one more component Q, choose a weight β, and update
\[ P^{t+1}_{model} = \sum_{i=1}^t (1-\beta)\alpha_i P_i + \beta Q. \]
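
A minimal sketch of the greedy loop (my own pseudocode-style Python, not the authors' code). The helpers `train_component`, `compute_weights` and `choose_beta` are placeholders for the per-component GAN training, the data reweighting, and the β heuristics described later.

```python
import numpy as np

def adagan(data, T, train_component, compute_weights, choose_beta):
    components, alphas = [], []
    weights = np.full(len(data), 1.0 / len(data))          # start from uniform data weights
    for t in range(1, T + 1):
        beta = 1.0 if t == 1 else choose_beta(t)            # e.g. choose_beta(t) = 1/t ("Boosted")
        G = train_component(data, weights)                  # fit one more GAN on weighted data
        alphas = [(1.0 - beta) * a for a in alphas] + [beta]   # mixture weight update
        components.append(G)
        weights = compute_weights(data, components, alphas, beta)  # reweight for the next round
    return components, alphas

def sample_mixture(components, alphas, n, latent_sampler):
    """Sampling from the mixture: pick a component by its weight, then sample from it."""
    idx = np.random.choice(len(components), size=n, p=alphas)
    return [components[i](latent_sampler()) for i in idx]
```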

21 Minimizing f-divergences with mixtures of distributions
1. Train $P_1$ to minimize $D_f(P_1\|P_d)$, set $\alpha_1 = 1$, and get $P^1_{model} = P_1$;
2. Assume we have trained $P_1,\dots,P_t$ with weights $\alpha_1,\dots,\alpha_t$. We are going to train one more component Q, choose a weight β, and update
\[ P^{t+1}_{model} = \sum_{i=1}^t (1-\beta)\alpha_i P_i + \beta Q. \]
The goal: we want to choose $\beta\in[0,1]$ and Q greedily, so as to minimize
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big), \]
where we denoted $P_g := P^t_{model}$.
A modest goal: find $\beta\in[0,1]$ and Q such that, for some $c < 1$,
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big) \le c\, D_f(P_g\|P_d). \]
Inefficient in practice: as $t\to\infty$ we expect $\beta\to 0$. Only a β fraction of points gets sampled from Q, so it gets harder and harder to tune Q.
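
Why the modest goal is enough (a one-line observation added for completeness): if every greedy step contracts the divergence by a factor $c < 1$, then after T components
\[ D_f\big(P^T_{model} \,\big\|\, P_d\big) \;\le\; c^{\,T-1}\, D_f\big(P_1 \,\big\|\, P_d\big) \;\longrightarrow\; 0 \quad (T\to\infty), \]
so the greedy mixture converges to the data distribution at a geometric rate, as in boosting.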

22 Minimizing f-divergences with mixtures of distributions
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big) \quad (*) \]
Inefficient in practice: as $t\to\infty$ we expect $\beta\to 0$. Only a β fraction of points gets sampled from Q, so it gets harder and harder to tune Q.
Workaround: instead optimize an upper bound on (*) which contains a term $D_f(Q\|Q_0)$ for some $Q_0$:
Lemma. Given $P_d$, $P_g$, and $\beta\in[0,1]$, for any Q and any R with $\beta\, dR \le dP_d$,
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big) \le \beta\, D_f(Q\|R) + (1-\beta)\, D_f\Big(P_g \,\Big\|\, \frac{P_d - \beta R}{1-\beta}\Big). \]
If furthermore $D_f$ is a Hilbertian metric, then, for any R, we have
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big) \le \Big(\sqrt{\beta\, D_f(Q\|R)} + \sqrt{D_f\big((1-\beta)P_g + \beta R \,\big\|\, P_d\big)}\Big)^2. \]
The JS divergence, total variation, Hellinger distance, and others are Hilbertian metrics.

23 Minimizing f-divergences with mixtures of distributions
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big) \quad (*) \]
Lemma. Given $P_d$, $P_g$, and $\beta\in[0,1]$, for any Q and any R with $\beta\, dR \le dP_d$,
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big) \le \beta\, D_f(Q\|R) + (1-\beta)\, D_f\Big(P_g \,\Big\|\, \frac{P_d - \beta R}{1-\beta}\Big). \]
If furthermore $D_f$ is a Hilbertian metric, then, for any R, we have
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big) \le \Big(\sqrt{\beta\, D_f(Q\|R)} + \sqrt{D_f\big((1-\beta)P_g + \beta R \,\big\|\, P_d\big)}\Big)^2. \]
Recipe: choose R to minimize the right-most terms and then minimize $D_f(Q\|R^*)$, which we can do given we have an i.i.d. sample from $R^*$.

24 Minimizing f-divergences with mixtures of distributions
First surrogate upper bound. Lemma. Given $P_d$, $P_g$, and $\beta\in[0,1]$, for any Q and R, if $D_f$ is Hilbertian,
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big) \le \Big(\sqrt{\beta\, D_f(Q\|R)} + \sqrt{D_f\big((1-\beta)P_g + \beta R \,\big\|\, P_d\big)}\Big)^2. \]
The optimal R is given by the next result.
Theorem. Given $P_d$, $P_g$, and $\beta\in(0,1]$, the solution to
\[ \min_{R\in\mathcal{P}} D_f\big((1-\beta)P_g + \beta R \,\big\|\, P_d\big), \]
where $\mathcal{P}$ is the class of all probability distributions, has the density
\[ dR^*_\beta(x) = \frac{1}{\beta}\big(\lambda^*\, dP_d(x) - (1-\beta)\, dP_g(x)\big)_+ \]
for some unique $\lambda^*$ satisfying $\int dR^*_\beta = 1$. Also, $\lambda^* = 1$ if and only if $P_d\big((1-\beta)\, dP_g > dP_d\big) = 0$, which is equivalent to $\beta\, dR^*_\beta = dP_d - (1-\beta)\, dP_g$.
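
For discrete distributions the normalization constant $\lambda^*$ is easy to find numerically; here is a small self-contained check (my own illustration, with arbitrary $P_d$, $P_g$ and β):

```python
# Find lambda* such that dR*_beta = (1/beta) * (lambda* * p_d - (1-beta) * p_g)_+
# sums to one, by bisection on [0, 1] (the normalizing sum is nondecreasing in
# lambda and already >= beta at lambda = 1).
import numpy as np

p_d = np.array([0.05, 0.05, 0.30, 0.30, 0.30])   # data distribution (arbitrary)
p_g = np.array([0.40, 0.40, 0.10, 0.05, 0.05])   # current model, missing two modes
beta = 0.3

def mass(lam):
    return np.sum(np.clip(lam * p_d - (1 - beta) * p_g, 0.0, None))

lo, hi = 0.0, 1.0
for _ in range(100):                    # bisection: solve mass(lam) == beta
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if mass(mid) < beta else (lo, mid)
lam = 0.5 * (lo + hi)

r_star = np.clip(lam * p_d - (1 - beta) * p_g, 0.0, None) / beta
print(lam, r_star.sum())                # lambda* <= 1, and r_star sums to ~1
print(r_star)                           # puts its mass on the modes P_g is missing
```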

25 Minimizing f-divergences with mixtures of distributions
Second surrogate upper bound. Lemma. Given $P_d$, $P_g$, and $\beta\in[0,1]$, for any Q and any R with $\beta\, dR \le dP_d$,
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big) \le \beta\, D_f(Q\|R) + (1-\beta)\, D_f\Big(P_g \,\Big\|\, \frac{P_d - \beta R}{1-\beta}\Big). \]
The optimal R is given by the next result.
Theorem. Given $P_d$, $P_g$, and $\beta\in(0,1]$, assume $P_d(dP_g = 0) < \beta$. Then the solution to
\[ \min_{R\colon \beta\, dR \le dP_d} D_f\Big(P_g \,\Big\|\, \frac{P_d - \beta R}{1-\beta}\Big) \]
is given by the density
\[ dR^\dagger_\beta(x) = \frac{1}{\beta}\big(dP_d(x) - \lambda^\dagger (1-\beta)\, dP_g(x)\big)_+ \]
for a unique $\lambda^\dagger \ge 1$ satisfying $\int dR^\dagger_\beta = 1$.

26 Minimizing f-divergences with mixtures of distributions
Two different optimal R:
\[ dR^*_\beta(x) = \frac{1}{\beta}\big(\lambda^*\, dP_d(x) - (1-\beta)\, dP_g(x)\big)_+, \qquad dR^\dagger_\beta(x) = \frac{1}{\beta}\big(dP_d(x) - \lambda^\dagger (1-\beta)\, dP_g(x)\big)_+. \]
They look very symmetric, but we found no obvious connections between them...
Both do not depend on f! $\lambda^*$ and $\lambda^\dagger$ are implicitly defined via fixed-point (normalization) equations.
$dR^*_\beta(x)$ is also a solution to the original problem, that is, of $\min_{Q} D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big)$.
We see that setting β = 1 is the best possible choice in theory. So we'll have to use various heuristics for choosing β, including uniform, constant, ...

27 Minimizing f-divergences with mixtures of distributions
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big) \quad (*) \]
Lemma. Given $P_d$, $P_g$, and $\beta\in[0,1]$, for any Q and any R with $\beta\, dR \le dP_d$,
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big) \le \beta\, D_f(Q\|R) + (1-\beta)\, D_f\Big(P_g \,\Big\|\, \frac{P_d - \beta R}{1-\beta}\Big). \]
If furthermore $D_f$ is a Hilbertian metric, then, for any R, we have
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big) \le \Big(\sqrt{\beta\, D_f(Q\|R)} + \sqrt{D_f\big((1-\beta)P_g + \beta R \,\big\|\, P_d\big)}\Big)^2. \]
Recipe: choose R to minimize the right-most terms and then minimize $D_f(Q\|R^*)$, which we can do given we have an i.i.d. sample from $R^*$.

28 Minimizing f-divergences with mixtures of distributions
Recipe: choose R to minimize the right-most terms in the surrogate upper bounds and then minimize the first term $D_f(Q\|R^*)$.
We found the optimal R distributions. Let's now assume we are super powerful and managed to find Q with $D_f(Q\|R^*) = 0$. In other words, we are going to update our model
\[ P_g \;\to\; (1-\beta)P_g + \beta Q \]
with Q either $R^*_\beta$ or $R^\dagger_\beta$.
How well are we doing in terms of the original objective $D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big)$?
Theorem. We have
\[ D_f\big((1-\beta)P_g + \beta R^*_\beta \,\big\|\, P_d\big) \le D_f\big((1-\beta)P_g + \beta P_d \,\big\|\, P_d\big) \le (1-\beta)\, D_f(P_g\|P_d) \]
and
\[ D_f\big((1-\beta)P_g + \beta R^\dagger_\beta \,\big\|\, P_d\big) \le (1-\beta)\, D_f(P_g\|P_d). \]
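
The middle inequality in the theorem is just joint convexity of f-divergences (a one-line reminder added here):
\[ D_f\big((1-\beta)P_g + \beta P_d \,\big\|\, P_d\big) = D_f\big((1-\beta)P_g + \beta P_d \,\big\|\, (1-\beta)P_d + \beta P_d\big) \le (1-\beta)\, D_f(P_g\|P_d) + \beta\, D_f(P_d\|P_d) = (1-\beta)\, D_f(P_g\|P_d), \]
and the first inequality holds because $R^*_\beta$ minimizes $D_f\big((1-\beta)P_g + \beta R \,\big\|\, P_d\big)$ over all R, in particular beating the choice $R = P_d$.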

29 Minimizing f-divergences with mixtures of distributions
Other highlights:
A more careful convergence analysis. In particular, we provide a necessary and sufficient condition for the iterative process to converge in a finite number of steps: there exists $M > 0$ such that $P_d\big(dP_1 > M\, dP_d\big) = 0$; in other words, the density ratio $dP_1/dP_d$ is bounded $P_d$-almost surely.
In practice we do not attain $D_f(Q\|R^*) = 0$. We prove that, given
\[ D_f(Q\|R^*_\beta) \le \gamma\, D_f(P_g\|P_d) \quad\text{or}\quad D_f(Q\|R^\dagger_\beta) \le \gamma\, D_f(P_g\|P_d), \]
we achieve
\[ D_f\big((1-\beta)P_g + \beta Q \,\big\|\, P_d\big) \le c\, D_f(P_g\|P_d) \]
with $c = c(\gamma, \beta) < 1$. This is similar in spirit to weak learnability in AdaBoost.

30 Contents
1. Generative Adversarial Networks and f-divergences. GANs are trying to minimize a specific f-divergence, $\min_{P_{model}\in\mathcal{P}} D_f(P_{data} \| P_{model})$. At least, this is one way to explain what they are doing.
2. Minimizing f-divergences with mixtures of distributions. We study theoretical properties of the problem
\[ \min_{\alpha_t,\, P_t\in\mathcal{P}} D_f\Big(P_{data} \,\Big\|\, \sum_{t=1}^{T}\alpha_t P_t\Big) \]
and show that it can be reduced to $\min_{P_T\in\mathcal{P}} D_f(P_{data}\|P_T)$. (*)
3. Back to GANs. Now we use GANs to train $P_T$ in (*), which results in AdaGAN.

31 AdaGAN: adaptive mixtures of GANs
We want to find Q so as to minimize $D_f(Q\|R^*_\beta)$, where
\[ dR^*_\beta(x) = \frac{1}{\beta}\big(\lambda^*\, dP_d(x) - (1-\beta)\, dP_g(x)\big)_+. \]
We would like to use a GAN for that. How do we sample from $R^*_\beta$? Let's rewrite:
\[ dR^*_\beta(x) = \frac{dP_d(x)}{\beta}\Big(\lambda^* - (1-\beta)\frac{dP_g}{dP_d}(x)\Big)_+. \]
Recall: $\mathbb{E}_{P_d}[\log D(X)] + \mathbb{E}_{P_g}[\log(1 - D(Y))]$ is maximized by
\[ D^*(x) = \frac{dP_d(x)}{dP_d(x) + dP_g(x)} \quad\Longrightarrow\quad \frac{dP_g}{dP_d}(x) = \frac{1 - D^*(x)}{D^*(x)} =: h(x). \]
Each training sample ends up with a weight
\[ w_i = \frac{p_i}{\beta}\big(\lambda^* - (1-\beta)\, h(X_i)\big)_+ = \frac{1}{N\beta}\big(\lambda^* - (1-\beta)\, h(X_i)\big)_+. \quad (**) \]
Finally, we compute $\lambda^*$ using (**).

32 AdaGAN: adaptive mixtures of GANs
Each training sample ends up with a weight
\[ w_i = \frac{p_i}{\beta}\big(\lambda^* - (1-\beta)\, h(X_i)\big)_+ = \frac{1}{N\beta}\Big(\lambda^* - (1-\beta)\frac{1 - D(X_i)}{D(X_i)}\Big)_+. \]
Now we run a usual GAN on an importance-weighted sample.
$D(X_i) \approx 1$: our current model $P_g$ is missing this point. Big $w_i$.
$D(X_i) \approx 0$: $X_i$ is covered by $P_g$ nicely. Zero $w_i$.
The better our model gets, the more data points will be zeroed. This leads to an aggressive, adaptive, iterative procedure.
β can also be chosen so as to force $\#\{i : w_i > 0\} \approx r\cdot N$ (a code sketch of this reweighting step follows below).
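
A small sketch of the reweighting step (my own illustration; the discriminator outputs below are made up): compute $h_i = (1 - D(X_i))/D(X_i)$ and pick λ by bisection so that the weights sum to one.

```python
# AdaGAN-style reweighting sketch (illustration only): turn discriminator
# outputs d_i = D(X_i) into importance weights w_i for the next component.
import numpy as np

def adagan_weights(d, beta, iters=100):
    """w_i = (1/(N*beta)) * (lam - (1-beta) * (1-d_i)/d_i)_+ with sum(w) = 1."""
    N = len(d)
    h = (1.0 - d) / d                              # estimate of dP_g/dP_d at X_i
    def total(lam):
        return np.sum(np.clip(lam - (1 - beta) * h, 0.0, None)) / (N * beta)
    lo, hi = 0.0, (1 - beta) * h.max() + N * beta  # total(0) = 0, total(hi) >= 1
    for _ in range(iters):                         # bisection on the normalization
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total(mid) < 1.0 else (lo, mid)
    lam = 0.5 * (lo + hi)
    return np.clip(lam - (1 - beta) * h, 0.0, None) / (N * beta)

# Hypothetical discriminator outputs: points the model covers well get D ~ 0,
# points it misses get D ~ 1.
d = np.array([0.05, 0.10, 0.50, 0.90, 0.95])
w = adagan_weights(d, beta=0.5)
print(w)            # zero weight on the covered points, large weight on the missed ones
print(w.sum())      # ~1
```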

33 AdaGAN: adaptive mixtures of GANs
Coverage metric C: the mass of the true data covered by the model,
\[ C := P_{data}\big(dP_{model}(X) > t\big), \]
where the threshold t is chosen so that $P_{model}\big(dP_{model} > t\big)$ equals a fixed level.
Experimental setting: mixture of 5 two-dimensional Gaussians.
[Figure: coverage C for the best of T GANs, bagging, and AdaGAN. Blue points: median over 35 random runs; green error bars: 90% intervals, not confidence intervals for the medians!]
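
A sketch of how the coverage metric can be computed in practice (my own illustration; the 95% threshold level, densities and samples are toy stand-ins, not taken from the slides):

```python
# Coverage C = P_data(dP_model(X) > t), with t chosen so that the model keeps
# 95% of its own mass above the threshold (an assumed level): t is the 5th
# percentile of model densities evaluated on model samples.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Toy stand-ins: true data = mixture of 5 Gaussians on a ring, model = one of them.
means = [(np.cos(a), np.sin(a)) for a in np.linspace(0, 2 * np.pi, 5, endpoint=False)]
data = np.vstack([rng.normal(m, 0.05, size=(1000, 2)) for m in means])

model = multivariate_normal(mean=means[0], cov=0.05**2 * np.eye(2))
model_samples = model.rvs(size=5000, random_state=0)

t = np.quantile(model.pdf(model_samples), 0.05)   # P_model(dP_model > t) = 0.95
C = np.mean(model.pdf(data) > t)                  # fraction of real data covered
print(C)                                          # ~0.2: a single mode covers ~1/5 of the data
```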

34 AdaGAN: adaptive mixtures of GANs
Coverage metric C: the mass of the true data covered by the model,
\[ C := P_{data}\big(dP_{model}(X) > t\big), \]
where the threshold t is chosen so that $P_{model}\big(dP_{model} > t\big)$ equals a fixed level.
TopKLast r: $\beta_t$ is chosen to zero out $r\cdot N$ of the training points; Boosted: AdaGAN with $\beta_t = 1/t$.

35 AdaGAN: adaptive mixtures of GANs
Coverage metric C: the mass of the true data covered by the model,
\[ C := P_{data}\big(dP_{model}(X) > t\big), \]
where the threshold t is chosen so that $P_{model}\big(dP_{model} > t\big)$ equals a fixed level.
Experimental setting: mixture of 10 two-dimensional Gaussians.

36 AdaGAN: adaptive mixtures of GANs

37 Conclusions
Pros:
It seems that AdaGAN is solving the missing modes problem;
AdaGAN is compatible with any implementation of GAN, and more generally with any other implicit generative modelling technique;
The mixture is still easy to sample from;
Easy to implement, with great freedom for heuristics;
Theoretically grounded!
Cons:
We lose the nice manifold structure in the latent space;
Increased computational complexity;
Individual GANs are still very unstable.

38 Thank you! AdaGAN: Boosting Generative Models. Tolstikhin, Gelly, Bousquet, Simon-Gabriel, Schölkopf. NIPS 2017, arXiv:1701.02386.

39 Minimizing the optimal transport
The optimal transport for a cost function $c(x, y)\colon \mathcal{X}\times\mathcal{X}\to\mathbb{R}_+$ is
\[ W_c(P_X, P_G) := \inf_{\Gamma\in\mathcal{P}(X\sim P_X,\, Y\sim P_G)} \mathbb{E}_{(X,Y)\sim\Gamma}\big[c(X, Y)\big]. \]
If $P_G(Y \mid Z = z) = \delta_{G(z)}$ for all $z\in\mathcal{Z}$, where $G\colon \mathcal{Z}\to\mathcal{X}$, we have
\[ W_c(P_X, P_G) = \inf_{Q\colon Q_Z = P_Z} \mathbb{E}_{P_X}\, \mathbb{E}_{Q(Z\mid X)}\big[c\big(X, G(Z)\big)\big], \]
where $Q_Z$ is the marginal distribution of Z when $X\sim P_X$ and $Z\sim Q(Z\mid X)$.


41 Relaxing the constraint
\[ W_c(P_X, P_G) = \inf_{Q\colon Q_Z = P_Z} \mathbb{E}_{P_X}\, \mathbb{E}_{Q(Z\mid X)}\big[c\big(X, G(Z)\big)\big]. \]
Penalized Optimal Transport (POT) replaces the constraint with a penalty,
\[ POT(P_X, P_G) := \inf_{Q} \mathbb{E}_{P_X}\, \mathbb{E}_{Q(Z\mid X)}\big[c\big(X, G(Z)\big)\big] + \lambda\, D(Q_Z, P_Z), \]
and uses adversarial training in the $\mathcal{Z}$ space to approximate D.
For the 2-Wasserstein distance ($c(X, Y) = \|X - Y\|_2^2$) POT recovers Adversarial Auto-Encoders (Makhzani et al., 2016).
For the 1-Wasserstein distance ($c(X, Y) = \|X - Y\|_2$) POT and WGAN are solving the same problem from the primal and dual forms, respectively.
Importantly, unlike VAE, POT does not force $Q(Z\mid X = x)$ to intersect for different x, which is known to lead to blurriness. (Bousquet et al., 2017)

42 Further details on GAN: the log trick
\[ \inf_{G_\theta} \sup_{T_\omega} \mathbb{E}_{X\sim P_d}\big[\log T_\omega(X)\big] + \mathbb{E}_{Z\sim P_Z}\big[\log\big(1 - T_\omega(G_\theta(Z))\big)\big] \]
In practice (verify this!) $G_\theta$ is often trained to instead minimize $-\mathbb{E}_{Z\sim P_Z}\big[\log T_\omega\big(G_\theta(Z)\big)\big]$ (the "log trick").
One possible explanation: for any $f_r$ and $P_G$, the optimal T in the variational representation of $D_{f_r}(P_X\|P_G)$ has the form
\[ T^*(x) := f_r'\Big(\frac{p_X(x)}{p_G(x)}\Big) \quad\Longrightarrow\quad \frac{p_X(x)}{p_G(x)} = (f_r')^{-1}\big(T^*(x)\big). \]
1. Approximate $T^* \approx T_\omega$ using adversarial training with $f_r$;
2. Express $\frac{p_X}{p_G}(x) = (f_r')^{-1}\big(T_\omega(x)\big)$;
3. Approximate the desired f-divergence $D_f(P_X\|P_G)$ using
\[ \int_{\mathcal{X}} f\Big(\frac{p_X(x)}{p_G(x)}\Big)\, p_G(x)\, dx \approx \int_{\mathcal{X}} f\Big((f_r')^{-1}\big(T_\omega(x)\big)\Big)\, p_G(x)\, dx. \]
Applying this argument with the JS $f_r$ and $f(x) = \log(1 + x^{-1})$ recovers the log trick. The corresponding f-divergence is much more mode-seeking than the reverse KL divergence.
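
A quick check of the last claim (added here, not on the slide): with the optimal discriminator $D^*(x) = \frac{p_X(x)}{p_X(x) + p_G(x)}$, the log-trick generator loss is
\[ \mathbb{E}_{P_G}\big[-\log D^*(X)\big] = \mathbb{E}_{P_G}\Big[\log\Big(1 + \frac{p_G(X)}{p_X(X)}\Big)\Big] = \int_{\mathcal{X}} f\Big(\frac{p_X(x)}{p_G(x)}\Big)\, p_G(x)\, dx \quad\text{with } f(x) = \log\big(1 + x^{-1}\big), \]
which is (up to the additive constant needed to make $f(1) = 0$) exactly the f-divergence named above.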
