Understanding GANs: Back to the basics
1 Understanding GANs: Back to the Basics. David Tse, Stanford University. Princeton University, May 15, 2018. Joint work with Soheil Feizi, Farzan Farnia, Tony Ginart, Changho Suh and Fei Xia.
2 GANs at NIPS 2017
3 Generative Adversarial Networks (Goodfellow et al. 2014). Data $y_i \in \mathbb{R}^d$; randomness $x_i \sim \mathcal{N}(0, I_r)$, $x_i \in \mathbb{R}^r$; fake data $\hat{y}_i = G(x_i) \in \mathbb{R}^d$. Generator and discriminator play the minimax game
$$\min_{G} \max_{D} \; \mathbb{E}_{y \sim \text{data}}[\log D(y)] + \mathbb{E}_{x \sim \mathcal{N}(0, I_r)}[\log(1 - D(G(x)))],$$
minimizing over the parameters of the generator and maximizing over the parameters of the discriminator.
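For concreteness, a minimal PyTorch sketch of one step of this minimax game on toy data. The non-saturating generator loss, network sizes, and optimizer settings are illustrative assumptions, not settings from the talk.

```python
import torch
import torch.nn as nn

r, d = 4, 16  # latent and data dimensions (illustrative)
G = nn.Sequential(nn.Linear(r, 64), nn.ReLU(), nn.Linear(64, d))
D = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))  # outputs logits
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(y_real):
    n = y_real.size(0)
    x = torch.randn(n, r)      # x_i ~ N(0, I_r)
    y_fake = G(x)              # yhat_i = G(x_i)

    # Discriminator ascent: label data 1, fakes 0.
    d_loss = bce(D(y_real), torch.ones(n, 1)) + bce(D(y_fake.detach()), torch.zeros(n, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator descent: fool the discriminator (non-saturating variant).
    g_loss = bce(D(y_fake), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```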
4 GAN architectures (just some examples):
- MMD-GAN (Dziugaite et al. 2015)
- DC-GAN (Radford et al. 2015)
- Least Squares GAN (Mao et al. 2016)
- f-GAN (Nowozin et al. 2016)
- Wasserstein GAN (Arjovsky et al. 2017)
- Wasserstein GAN with gradient penalty (Gulrajani et al. 2017)
- Relaxed Wasserstein GAN (Guo et al. 2017)
- Introspective GAN (Lazarow et al. 2017)
- Boundary equilibrium GAN (Berthelot et al. 2017)
- Loss-sensitive GAN (Qi 2017)
- Convolutional GAN (Yang et al. 2017)
- Dual GAN (Yi et al. 2017)
- Triangle GAN (Gan et al. 2017)
- Multi-generator GAN (Hoang et al. 2017)
5 How GANs are designed: model-free, with evaluation primarily on real data.
6 Back to the Basics. What is the simplest high-dimensional model to learn? Can state-of-the-art GAN architectures learn Gaussians?
7 GAN architectures (the same list as slide 4, revisited).
8 Learning a 32-dimensional N(0, I). WGAN with gradient penalty: Adam(1e-4, 0.5, 0.9). WGAN with weight clipping: RMSProp(1e-5).
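As a reference point, a hedged sketch of the WGAN-GP critic loss used in experiments like this, following Gulrajani et al. (2017); the penalty weight lam=10 is that paper's default, and the function and argument names are mine.

```python
import torch

def critic_loss_wgan_gp(critic, y_real, y_fake, lam=10.0):
    """WGAN-GP critic loss: E[f(fake)] - E[f(real)] + lam * gradient penalty.

    y_fake should be detached by the caller when updating the critic.
    """
    # Random interpolates between real and fake samples (Gulrajani et al. 2017).
    eps = torch.rand(y_real.size(0), 1)
    y_mix = (eps * y_real + (1 - eps) * y_fake).detach().requires_grad_(True)
    grad = torch.autograd.grad(critic(y_mix).sum(), y_mix, create_graph=True)[0]
    penalty = ((grad.norm(2, dim=1) - 1.0) ** 2).mean()
    return critic(y_fake).mean() - critic(y_real).mean() + lam * penalty
```

Per the slide, the gradient-penalty variant was trained with Adam(1e-4, 0.5, 0.9) and the weight-clipping variant with RMSProp(1e-5).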
9 ReLU → ELU. Clevert et al., Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), ICLR 2016.
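For reference, the ELU activation from that paper, with scale hyperparameter $\alpha$ (typically 1); unlike ReLU it is smooth and saturates for negative inputs:
$$\mathrm{ELU}_\alpha(x) = \begin{cases} x, & x > 0,\\ \alpha\,(e^{x} - 1), & x \le 0. \end{cases}$$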
10 DNN → linear generator.
11 Model-based approach. What is the best GAN architecture for learning Gaussians? Absolute stupidity?
12 Answer: the Quadratic GAN. [Block diagram: randomness → linear generator → fake data; data and fake data → discriminator.]
13 Performance (r = 32). [Plot: performance comparison.]
14 Rest of the talk. How did we design this GAN? What are its theoretical guarantees?
15 Design approach:
- Good population solution → initial formulation.
- Fast generalization → final formulation.
- Global stability under gradient descent → for free.
16 Formulating GANs. Unsupervised learning — given: generator class G; find: loss function ℓ. How to formulate? Supervised learning — given: predictor class G; find: loss function ℓ. How to formulate? G* also solves an unsupervised learning problem.
17 Reduction: unsupervised to supervised. [Diagram: data is coupled with generated randomness; a GAN under loss ℓ corresponds to a supervised learner under loss ℓ.] Solve the easiest supervised problem.
18 Unsupervised and supervised. Supervised learning: feature vector x, label y. Classical connection: unsupervised learning is feature vector x without label y. Our connection: unsupervised learning is label y without feature vector x.
19 GAN formulation: general loss. Given: generator class $\mathcal{G}$, loss function $\ell$, data $Q_Y$, randomness $Q_X$. Solve the optimal transport (Monge–Kantorovich) problem
$$\min_{G \in \mathcal{G}} \; \min_{P_{Y,\hat{Y}}:\; P_Y = Q_Y,\; P_{\hat{Y}} = P_{G(X)}} \; \mathbb{E}\big[\ell(Y, \hat{Y})\big], \qquad \hat{Y} = G(X).$$
20 Dual formulation. Primal: the optimal transport problem above. Kantorovich dual:
$$\min_{P_{Y,\hat{Y}}} \mathbb{E}\big[\ell(Y, \hat{Y})\big] = \max_{\psi} \; \mathbb{E}_{Q_Y}[\psi(Y)] - \mathbb{E}\big[\psi^{\ell}(\hat{Y})\big], \qquad \psi^{\ell}(\hat{y}) = \sup_y \{\psi(y) - \ell(y, \hat{y})\},$$
which yields the usual architecture. [Block diagram: randomness → generator → fake data; data and fake data → discriminator.]
21 Wasserstein GAN (Arjovsky et al. 2017). Solve
$$\min_{G} \max_{D:\, 1\text{-Lipschitz}} \; \mathbb{E}[D(Y)] - \mathbb{E}[D(G(X))].$$
Special case of the general formulation with $\ell(y, \hat{y}) = \|y - \hat{y}\|$: for this loss, the $\ell$-conjugate of a 1-Lipschitz function is itself.
22 What is the right loss for learning Gaussians? [Portraits: Gauss, Legendre, Wiener, Kalman.] The quadratic loss.
23 Quadratic GAN 1.0. Primal:
$$\min_{G} \; W_2^2\big(Q_Y, P_{G(X)}\big) = \min_{G} \min_{P_{Y,\hat{Y}}} \mathbb{E}\big[\|Y - \hat{Y}\|^2\big].$$
Dual: under the quadratic loss, the two discriminator potentials are convex conjugates of each other. [Block diagram: randomness → generator → fake data; data and fake data → discriminator.]
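The primal above is the squared 2-Wasserstein distance, which between zero-mean Gaussians has a well-known closed form (the Bures formula). A small numpy sketch, with function and matrix names mine:

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian(A, B):
    """Squared 2-Wasserstein distance between N(0, A) and N(0, B):
    Tr(A) + Tr(B) - 2 Tr((A^{1/2} B A^{1/2})^{1/2})."""
    rootA = sqrtm(A)
    cross = sqrtm(rootA @ B @ rootA)
    # sqrtm may return tiny imaginary parts; keep the real trace.
    return np.trace(A) + np.trace(B) - 2 * np.real(np.trace(cross))

# Example: identical Gaussians are at distance ~0.
K = np.array([[2.0, 0.5], [0.5, 1.0]])
print(w2_gaussian(K, K))
```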
24 Population solution. Theorem: When r = d, the population solution is the ground truth N(0, K). When r < d, the population solution is the r-PCA of N(0, K).
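Concretely, the r < d solution keeps the top-r principal components of K. A numpy sketch of that construction, assuming the theorem above (the helper name is mine):

```python
import numpy as np

def pca_generator(K, r):
    """Linear generator G whose output covariance G G^T is the
    best rank-r (top-r PCA) approximation of the covariance K."""
    vals, vecs = np.linalg.eigh(K)       # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:r]     # indices of the top-r eigenvalues
    return vecs[:, idx] * np.sqrt(vals[idx])  # columns scaled by sqrt(eigenvalue)

# G(x) = G x with x ~ N(0, I_r) then has covariance G G^T = rank-r PCA of K.
```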
25 Issues with this architecture. [Block diagram as before.] Not computational; poor generalization (exponentially many samples needed).
26 Why poor generalization? Optimal transport (Wasserstein) distances generalize poorly in high dimensions (Arora et al. 2017). Idea: properly constrain the discriminator so that:
- the original optimal population solution remains optimal;
- there are no spurious solutions.
27 Quadratic GAN 2.0. [Block diagram: randomness → linear generator → fake data; data and fake data → discriminator.] Objective:
$$\min_{G} \max_{H} \; \mathrm{Tr}\!\left[(I - HH^T)\,\hat{K}_Y\right] + \mathrm{Tr}\!\left[(HH^T - I)\,GG^T\right].$$
Theorem: for r = d, G* = maximum likelihood; for r < d, G* = empirical PCA. Fast generalization: a linear number of samples suffices.
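A minimal numpy sketch of alternating gradient descent/ascent on this trace objective; the gradients follow directly from it, while the step size, dimensions, and example covariance are illustrative assumptions:

```python
import numpy as np

def quadratic_gan(K_hat, r, lr=0.02, steps=20000, seed=0):
    """Alternating gradient descent/ascent on
    f(G, H) = Tr[(I - H H^T) K_hat] + Tr[(H H^T - I) G G^T]:
    the generator G descends, the discriminator H ascends."""
    rng = np.random.default_rng(seed)
    d = K_hat.shape[0]
    G = rng.standard_normal((d, r))
    H = rng.standard_normal((d, r))
    I = np.eye(d)
    for _ in range(steps):
        G = G - lr * 2.0 * (H @ H.T - I) @ G      # grad_G f = 2 (H H^T - I) G
        H = H + lr * 2.0 * (G @ G.T - K_hat) @ H  # grad_H f = 2 (G G^T - K_hat) H
    return G, H

# With r = d, G G^T should approach the empirical covariance K_hat
# (proved below for K_Y = I; conjectured for general K_Y).
K_hat = np.array([[1.5, 0.3], [0.3, 0.8]])
G, H = quadratic_gan(K_hat, r=2)
print(np.linalg.norm(G @ G.T - K_hat, "fro"))  # Frobenius distance to the ML solution
```

Printing that Frobenius distance along the trajectory gives the kind of convergence plot shown on slide 29.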
28 Stability. Analyzing alternating gradient descent as a nonlinear dynamical system. [Phase portrait around (G*, H*).] Q: Does alternating gradient descent converge to G*?
29 Global stability. [Plots: two sample trajectories; Frobenius distance to G* and a Lyapunov function versus iteration.] Theorem: If K_Y = I and r = d, alternating gradient descent converges to G* from any starting point. Conjecture: the quadratic GAN is globally stable in general.
30 What have we learnt?
- How to formulate a GAN objective to match a model.
- How to constrain the discriminator to enable fast generalization.
- How global stability of gradient descent can follow from a properly designed architecture.
Understanding GANs: the LQG Setting. Soheil Feizi (1), Changho Suh (2), Fei Xia (1) and David Tse (1). (1) Stanford University, (2) Korea Advanced Institute of Science and Technology. arXiv:1710.10793v1 [stat.ML] 30 Oct 2017.