Some theoretical properties of GANs. Gérard Biau Toulouse, September 2018


1 Some theoretical properties of GANs Gérard Biau Toulouse, September 2018

2 Coauthors Benoît Cadre (ENS Rennes) Maxime Sangnier (Sorbonne University) Ugo Tanielian (Sorbonne University & Criteo) 1

3 video Source: Karras, T., Aila, T., Laine, S., and Lehtinen, J. (2018). Progressive growing of GANs for improved quality, stability, and variation, ICLR

4 Source: 3

5 video Source: 4

6 Original paper Generative Adversarial Nets Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio Département d'informatique et de recherche opérationnelle Université de Montréal Montréal, QC H3C 3J7 Abstract We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples. 1 Introduction The promise of deep learning is to discover rich, hierarchical models [2] that represent probability distributions over the kinds of data encountered in artificial intelligence applications, such as natural images, audio waveforms containing speech, and symbols in natural language corpora. So far, the most striking successes in deep learning have involved discriminative models, usually those that map a high-dimensional, rich sensory input to a class label [14, 20]. These striking successes have primarily been based on the backpropagation and dropout algorithms, using piecewise linear units [17, 8, 9] which have a particularly well-behaved gradient. Deep generative models have had less of an impact, due to the difficulty of approximating many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies, and due to difficulty of leveraging the benefits of piecewise linear units in the generative context. We propose a new generative model estimation procedure that sidesteps these difficulties. 1 In the proposed adversarial nets framework, the generative model is pitted against an adversary: a discriminative model that learns to determine whether a sample is from the model distribution or the data distribution. The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency. Competition in this game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles. Ian Goodfellow is now a research scientist at Google, but did this work earlier as a UdeM student. Jean Pouget-Abadie did this work while visiting Université de Montréal from Ecole Polytechnique. Sherjil Ozair is visiting Université de Montréal from Indian Institute of Technology Delhi. Yoshua Bengio is a CIFAR Senior Fellow. 1 All code and hyperparameters available at 1 5

7 Source: 6

8 Outline 1. Mathematical context of GANs 2. Optimality properties 3. Approximation properties 4. Statistical analysis 7

9 Mathematical context of GANs

14 Generators Data: X_1, ..., X_n, i.i.d. according to some unknown density p on E ⊆ R^d. Generators: a parametric family of functions from R^{d'} to E (d' ≤ d). Notation: G = {G_θ}_{θ∈Θ}, Θ ⊆ R^p. Principle: Ω --Z--> R^{d'} --G_θ--> E. By definition, the law of G_θ(Z) is p_θ dµ. Associated family of densities: P = {p_θ}_{θ∈Θ}. Each density p_θ is a potential candidate to represent p. 8
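To make the sampling mechanism concrete, here is a minimal numerical sketch of a generator family: a latent variable Z is pushed through G_θ to produce samples whose density is p_θ. The latent law, the affine form of G_θ, and the parameter values are illustrative assumptions, not the families used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_generator(theta, n, d_latent=1):
    """Draw n samples from p_theta by pushing latent noise Z through G_theta.

    Illustrative choice (not from the slides): Z ~ N(0, I) in R^{d'} and
    G_theta(z) = theta[0] + theta[1] * z, so p_theta is N(theta[0], theta[1]^2).
    """
    z = rng.standard_normal((n, d_latent))   # latent variable Z
    return theta[0] + theta[1] * z           # G_theta(Z), whose law is p_theta dmu

fake = sample_generator(theta=np.array([0.0, 2.0]), n=5)
print(fake.shape)  # (5, 1): candidate samples meant to imitate the data density p
```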

20 A specific framework In GAN algorithms, G = {G_θ}_{θ∈Θ} is a neural network. Bad ideas: an exhaustive description of p by a classical parametric model; estimation by a traditional maximum likelihood approach; any strategy based on nonparametric density estimation techniques. It is not assumed that p belongs to P. Think parametric, but forget the classical rules. 9

25 Discriminators Discriminators: a family of functions from E to [0, 1]. Notation: D. Often (but not always): D = {D_α}_{α∈Λ}, Λ ⊆ R^q. In GAN algorithms, D is a neural network. The higher D(x), the higher the probability that x is drawn from p. Generators and discriminators have opposite objectives. Source: 10
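As a companion to the generator sketch above, the following toy discriminator shows the interface D_α : E → [0, 1]; the logistic form and the parameter values are hypothetical stand-ins for the neural networks used in practice.

```python
import numpy as np

def discriminator(alpha, x):
    """A toy parametric discriminator D_alpha : E -> [0, 1].

    Hypothetical choice: a logistic model D_alpha(x) = sigmoid(alpha_0 + alpha_1 * x);
    GAN implementations use a neural network instead, but the interface is the same.
    """
    return 1.0 / (1.0 + np.exp(-(alpha[0] + alpha[1] * x)))

x = np.array([-1.0, 0.0, 2.5])
print(discriminator(np.array([0.0, 1.0]), x))  # values in (0, 1): "probability that x is real"
```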

26 Source: 11

29 Adversarial principle Auxiliary random variables: Z_1, ..., Z_n, i.i.d. according to Z. Objective: solve in θ
inf_{θ∈Θ} sup_{D∈D} [ ∏_{i=1}^n D(X_i) ∏_{i=1}^n (1 − D∘G_θ(Z_i)) ].
Equivalently: find ˆθ ∈ Θ such that
sup_{D∈D} ˆL(ˆθ, D) ≤ sup_{D∈D} ˆL(θ, D), for all θ ∈ Θ,
where ˆL(θ, D) := ∑_{i=1}^n ln D(X_i) + ∑_{i=1}^n ln(1 − D∘G_θ(Z_i)). 12
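The empirical criterion ˆL(θ, D) can be written directly in code. The sketch below assumes placeholder choices of the data density, of G_θ, and of D just to exercise the formula; only the criterion itself comes from the slides.

```python
import numpy as np

def empirical_criterion(D, G_theta, X, Z, eps=1e-12):
    """Empirical GAN criterion from the slides:
    L_hat(theta, D) = sum_i ln D(X_i) + sum_i ln(1 - D(G_theta(Z_i))).

    D maps samples to (0, 1); G_theta maps latent draws Z_i into the sample space E.
    eps guards the logarithms (a numerical convenience, not part of the definition).
    """
    real_term = np.sum(np.log(D(X) + eps))
    fake_term = np.sum(np.log(1.0 - D(G_theta(Z)) + eps))
    return real_term + fake_term

# Toy usage with hypothetical choices for the data law, G_theta and D.
rng = np.random.default_rng(1)
X = rng.laplace(scale=1.5, size=200)               # data X_1, ..., X_n
Z = rng.standard_normal(200)                       # auxiliary Z_1, ..., Z_n
G = lambda z: 2.0 * z                              # one generator G_theta
D = lambda x: 1.0 / (1.0 + np.exp(-x**2 + 2.0))    # one discriminator
print(empirical_criterion(D, G, X, Z))
```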

36 Some comments The criterion ˆL(θ, D) is the original one. Many variants: MMD-GANs, f-GANs, Wasserstein-GANs, scattering transforms, etc. They are based on different criteria: a galaxy of GANs. Our objective today: the original criterion. Roadmap: basic properties of ˆL(θ, D)? Impact of the discriminators on the quality of the approximation? Statistical consistency? Rates of convergence? Play with simple examples. 13

37 Optimality properties

40 Reminders For P ≪ Q probability measures on E, D_KL(P ‖ Q) = ∫ ln(dP/dQ) dP. Properties: D_KL(P ‖ Q) ≥ 0 and D_KL(P ‖ Q) = 0 ⟺ P = Q. If p = dP/dµ and q = dQ/dµ, then D_KL(P ‖ Q) = ∫ p ln(p/q) dµ. For P and Q probability measures on E, D_JS(P, Q) = (1/2) D_KL(P ‖ (P+Q)/2) + (1/2) D_KL(Q ‖ (P+Q)/2). Properties: 0 ≤ D_JS(P, Q) ≤ ln 2 and √D_JS(P, Q) is a distance. 14
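A short numerical sketch of these definitions, approximating the integrals on a grid; the Gaussian example densities are illustrative.

```python
import numpy as np

def kl(p, q, dx):
    """D_KL(P || Q) = integral of p ln(p/q) dmu, approximated on a grid with spacing dx."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx

def js(p, q, dx):
    """D_JS(P, Q) = 0.5 D_KL(P || M) + 0.5 D_KL(Q || M) with M = (P + Q)/2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m, dx) + 0.5 * kl(q, m, dx)

# Example: JS divergence between two Gaussian densities (values lie in [0, ln 2]).
x = np.linspace(-20, 20, 20001)
dx = x[1] - x[0]
gauss = lambda mu, s: np.exp(-(x - mu) ** 2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
print(js(gauss(0, 1), gauss(3, 1), dx), np.log(2))
```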

47 Role of the Jensen-Shannon divergence ˆL(θ, D) is the empirical version of
L(θ, D) := ∫ ln(D) p dµ + ∫ ln(1 − D) p_θ dµ.
Idealization: D = D_∞, the set of all functions from E to [0, 1]. In this case,
sup_{D∈D_∞} L(θ, D) = sup_{D∈D_∞} ∫ [ln(D) p + ln(1 − D) p_θ] dµ
≤ ∫ sup_{D∈D_∞} [ln(D) p + ln(1 − D) p_θ] dµ = L(θ, D*_θ),
where D*_θ := p/(p + p_θ) is the optimal discriminator.
Conclusion: sup_{D∈D_∞} L(θ, D) = L(θ, D*_θ) = 2 D_JS(p, p_θ) − ln 4. 15
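The identity sup_D L(θ, D) = L(θ, D*_θ) = 2 D_JS(p, p_θ) − ln 4 is easy to check numerically; the sketch below does so on a grid for two illustrative Gaussian densities standing in for p and p_θ.

```python
import numpy as np

x = np.linspace(-25, 25, 40001)
dx = x[1] - x[0]
gauss = lambda mu, s: np.exp(-(x - mu) ** 2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

p = gauss(0.0, 1.0)        # stand-in for the data density p
p_theta = gauss(1.0, 2.0)  # stand-in for a generator density p_theta

# Optimal discriminator D*_theta = p / (p + p_theta) and the value L(theta, D*_theta).
D_star = p / (p + p_theta)
L_star = np.sum(p * np.log(D_star) + p_theta * np.log(1.0 - D_star)) * dx

# Jensen-Shannon divergence computed directly, for comparison.
m = 0.5 * (p + p_theta)
js = 0.5 * np.sum(p * np.log(p / m)) * dx + 0.5 * np.sum(p_theta * np.log(p_theta / m)) * dx

print(L_star, 2 * js - np.log(4))   # the two numbers should agree (up to grid error)
```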

51 Role of the Jensen-Shannon divergence For all θ ∈ Θ, sup_{D∈D_∞} L(θ, D) = 2 D_JS(p, p_θ) − ln 4. A possible interpretation: minimize D_JS(p, p_θ) over Θ. Many GAN algorithms try to learn the optimal discriminator D*_θ. Theorem Let θ ∈ Θ be such that p_θ > 0 µ-almost everywhere. Then the function D*_θ is the unique discriminator that achieves the supremum of the functional D ↦ L(θ, D) over D_∞. 16

56 Oracle parameter By definition, for all θ ∈ Θ, sup_{D∈D_∞} L(θ, D) = 2 D_JS(p, p_θ) − ln 4. We let θ* ∈ Θ be defined by D_JS(p, p_θ*) ≤ D_JS(p, p_θ), for all θ ∈ Θ. θ* is the oracle parameter in terms of Jensen-Shannon divergence. Whenever p ∈ P, then p = p_θ*, D_JS(p, p_θ*) = 0, and D*_θ* = 1/2. This is, however, a very special case, which is of no interest. We need sufficient conditions for the existence and uniqueness of θ*. 17
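In a one-dimensional toy model the oracle parameter can be found by brute force: compute D_JS(p, p_θ) on a grid over Θ and take the minimizer. The Laplace target and Gaussian model below echo the experiments shown later in the talk, but the grid and parameter ranges are illustrative choices.

```python
import numpy as np

x = np.linspace(-30, 30, 60001)
dx = x[1] - x[0]

def js(p, q):
    """Jensen-Shannon divergence on the grid; masks guard against underflowed densities."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask])) * dx
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy setup: Laplace target p, Gaussian model p_theta with variance theta.
# The oracle theta* minimizes theta -> D_JS(p, p_theta) over the grid standing in for Theta.
b = 1.5
p = np.exp(-np.abs(x) / b) / (2 * b)
thetas = np.linspace(0.5, 10.0, 200)
divs = [js(p, np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)) for t in thetas]
theta_star = thetas[int(np.argmin(divs))]
print(theta_star, min(divs))
```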

58 Oracle parameter Theorem Assume that the model {P_θ}_{θ∈Θ} is identifiable, convex, and compact for the metric δ. Assume, in addition, that there exist 0 < m ≤ M such that m ≤ p ≤ M and, for all θ ∈ Θ, p_θ ≤ M. Then there exists a unique θ* ∈ Θ such that {θ*} = argmin_{θ∈Θ} D_JS(p, p_θ). Compactness conditions (i) Θ is compact and {P_θ}_{θ∈Θ} is convex. (ii) For all x ∈ E, the function θ ↦ p_θ(x) is continuous on Θ. (iii) One has sup_{(θ,θ')∈Θ²} p_θ |ln p_θ'| ∈ L¹(µ). 18

59 Approximation properties

64 Parameterized discriminators The assumption D = D_∞ is comfortable but questionable. In practice, one always has D = {D_α}_{α∈Λ}, Λ ⊆ R^q. GANs are a likelihood-type problem involving two parametric families. It is logical to consider the parameter θ̄ ∈ Θ defined by sup_{D∈D} L(θ̄, D) ≤ sup_{D∈D} L(θ, D), for all θ ∈ Θ. The density p_θ̄ is the best candidate to imitate p_θ*. Objective: quantify the proximity between p_θ̄ and p_θ*. 19

67 Approximation theorem Assumption (H_0) There exists a positive constant t ∈ (0, 1/2] such that min(D*_θ, 1 − D*_θ) ≥ t, for all θ ∈ Θ. Assumption (H_ε) There exist ε ∈ (0, t) and D ∈ D such that ‖D − D*_θ̄‖_∞ ≤ ε. Theorem Under Assumptions (H_0) and (H_ε), there exists a positive constant c such that 0 ≤ D_JS(p, p_θ̄) − D_JS(p, p_θ*) ≤ c ε². 20

68 Statistical analysis

73 Context Parametric generators G = {G_θ}_{θ∈Θ} and discriminators D = {D_α}_{α∈Λ}. The data-dependent parameter ˆθ is such that sup_{D∈D} ˆL(ˆθ, D) ≤ sup_{D∈D} ˆL(θ, D), for all θ ∈ Θ. Questions: Can we say something about D_JS(p, pˆθ) − D_JS(p, p_θ*)? Convergence of ˆθ towards θ̄ as n → ∞? Asymptotic distribution of ˆθ − θ̄? 21

75 Jensen-Shannon convergence Assumptions (H_reg) Regularity conditions of order 1 on the models. Assumption (H'_ε) There exists ε ∈ (0, t) such that: for all θ ∈ Θ, there exists D ∈ D such that ‖D − D*_θ‖_∞ ≤ ε. Theorem Under Assumptions (H_0), (H_reg), and (H'_ε), one has E D_JS(p, pˆθ) − D_JS(p, p_θ*) = O(ε² + 1/√n). 22

76 Illustration Target: the logistic density p(x) = e^(−x/s) / (s (1 + e^(−x/s))²), with scale parameter s. G and D: fully connected neural networks. The sample size n is fixed and the number of layers varies. 23
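The experiment can be reproduced in spirit with a few lines of PyTorch: samples from a logistic target, a fully connected generator and discriminator, and the usual alternating gradient updates on the criterion. All concrete values (scale s, sample size, widths, depths, learning rates, number of steps) are assumptions for illustration, not the talk's exact settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def mlp(in_dim, out_dim, depth, width=20, out_act=None):
    # Fully connected network; depth counts hidden layers (widths are illustrative).
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, out_dim))
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

s, n = 1.0, 2000                              # scale and sample size: assumed values
U = torch.rand(n, 1).clamp(1e-6, 1 - 1e-6)
X = s * torch.log(U / (1 - U))                # inverse-CDF sample from the logistic density

G = mlp(1, 1, depth=3)                        # generator (depth 3, as in one of the figures)
D = mlp(1, 1, depth=2, out_act=nn.Sigmoid())  # discriminator, output in [0, 1]
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(500):                       # alternating gradient steps (standard GAN recipe)
    Z = torch.randn(n, 1)
    opt_D.zero_grad()                         # discriminator step: D(X) toward 1, D(G(Z)) toward 0
    loss_D = bce(D(X), torch.ones(n, 1)) + bce(D(G(Z).detach()), torch.zeros(n, 1))
    loss_D.backward()
    opt_D.step()
    opt_G.zero_grad()                         # generator step: push D(G(Z)) toward 1
    loss_G = bce(D(G(Z)), torch.ones(n, 1))
    loss_G.backward()
    opt_G.step()

print(float(loss_D), float(loss_G))
```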

78 Discriminator depth = 2, generator depth = 3 24

79 Discriminator depth = 5, generator depth = 3 25

80 video 26

82 Convergence of ˆθ Assumptions (H'_reg) Regularity conditions of order 2 on the models. Assumption (H_1) The pair (θ̄, ᾱ) is unique and belongs to the interior of Θ × Λ. Theorem Under Assumptions (H'_reg) and (H_1), one has ˆθ → θ̄ almost surely and ˆα → ᾱ almost surely. 27

83 Targets p 28

84 Generators and discriminators In each case Λ = Θ × Θ and the discriminators are D_α(x) = 1 / (1 + (α_1/α_0) e^((x²/2)(α_1⁻² − α_0⁻²))), with α = (α_0, α_1).
Laplace-Gaussian: target p(x) = (1/(2b)) e^(−|x|/b) with b = 1.5; model p_θ(x) = (1/√(2πθ)) e^(−x²/(2θ)), Θ = [10⁻¹, 10³].
Claw-Gaussian: target p_claw(x); model p_θ(x) = (1/√(2πθ)) e^(−x²/(2θ)), Θ = [10⁻¹, 10³].
Exponential-Uniform: target λ e^(−λx) with λ = 1; model p_θ(x) = (1/θ) 1_[0,θ](x), Θ = [10⁻³, 10³]. 29
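The target densities and model families in this table are simple enough to write down directly. In the sketch below, the claw density is assumed to be the standard Marron-Wand claw mixture, and the discriminator formula follows the reconstruction of the table above; both are assumptions rather than verbatim definitions from the talk.

```python
import numpy as np

# Target densities (b = 1.5 and lambda = 1 as in the table; the claw density is assumed
# to be the Marron-Wand claw: 0.5 N(0,1) + sum over l of 0.1 N(l/2 - 1, 0.1^2)).
def laplace(x, b=1.5):
    return np.exp(-np.abs(x) / b) / (2 * b)

def claw(x):
    norm = lambda m, s: np.exp(-(x - m) ** 2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
    return 0.5 * norm(0.0, 1.0) + sum(0.1 * norm(l / 2.0 - 1.0, 0.1) for l in range(5))

def exponential(x, lam=1.0):
    return np.where(x >= 0, lam * np.exp(-lam * x), 0.0)

# Model families: Gaussian with variance theta (first two rows) and Uniform[0, theta] (last row).
def gaussian_model(x, theta):
    return np.exp(-x**2 / (2 * theta)) / np.sqrt(2 * np.pi * theta)

def uniform_model(x, theta):
    return np.where((x >= 0) & (x <= theta), 1.0 / theta, 0.0)

# Discriminator family D_alpha (reconstruction of the table's formula; alpha = (alpha_0, alpha_1)).
def D_alpha(x, a0, a1):
    return 1.0 / (1.0 + (a1 / a0) * np.exp(0.5 * x**2 * (a1**-2 - a0**-2)))

x = np.linspace(-5, 5, 11)
print(laplace(x)[:3], gaussian_model(x, 2.0)[:3], D_alpha(x, 1.0, 2.0)[:3])
```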

85 Laplace-Gaussian 30

86 Laplace-Gaussian 31

87 Claw-Gaussian 32

88 Claw-Gaussian 33

89 Exponential-Uniform 34

90 Exponential-Uniform 35

92 Central limit theorem Assumptions (H_loc) Local smoothness conditions around (θ̄, ᾱ). Theorem Under Assumptions (H'_reg), (H_1), and (H_loc), one has √n (ˆθ − θ̄) → Z in distribution, where Z is a Gaussian random variable with mean 0 and variance V. 36
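One way to check such a statement empirically is to recompute ˆθ on many independent samples of size n and inspect the standardized values √n(ˆθ − θ̄). The helper below only performs that inspection; the replicates fed into it here are placeholder Gaussian draws, since actually recomputing the GAN estimator many times is beyond a short sketch.

```python
import numpy as np

def clt_diagnostics(theta_hats, theta_bar, n):
    """Standardize sqrt(n) * (theta_hat - theta_bar) over Monte Carlo replicates and report
    moments; under the central limit theorem the skewness should be near 0 and the
    kurtosis near 3, with variance estimating V."""
    z = np.sqrt(n) * (np.asarray(theta_hats, dtype=float) - theta_bar)
    V = z.var()
    s = z / np.sqrt(V)
    return {"V_hat": V, "skewness": (s**3).mean(), "kurtosis": (s**4).mean()}

# Placeholder replicates (in practice: re-run the GAN estimation on fresh samples of size n).
rng = np.random.default_rng(0)
fake_theta_hats = 2.0 + rng.normal(scale=0.3, size=500)   # stand-in values, for illustration only
print(clt_diagnostics(fake_theta_hats, theta_bar=2.0, n=100))
```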

93 Laplace-Gaussian 37

94 Claw-Gaussian 38

95 Exponential-Uniform 39
