Some theoretical properties of GANs. Gérard Biau, Toulouse, September 2018.
Coauthors: Benoît Cadre (ENS Rennes), Maxime Sangnier (Sorbonne University), Ugo Tanielian (Sorbonne University & Criteo).
Video. Source: Karras, T., Aila, T., Laine, S., and Lehtinen, J. (2018). Progressive growing of GANs for improved quality, stability, and variation. ICLR.
Original paper: Generative Adversarial Nets. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. Département d'informatique et de recherche opérationnelle, Université de Montréal, Montréal, QC H3C 3J7.

Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

Introduction (excerpt): The promise of deep learning is to discover rich, hierarchical models [2] that represent probability distributions over the kinds of data encountered in artificial intelligence applications, such as natural images, audio waveforms containing speech, and symbols in natural language corpora. So far, the most striking successes in deep learning have involved discriminative models, usually those that map a high-dimensional, rich sensory input to a class label [14, 20]. These striking successes have primarily been based on the backpropagation and dropout algorithms, using piecewise linear units [17, 8, 9] which have a particularly well-behaved gradient. Deep generative models have had less of an impact, due to the difficulty of approximating many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies, and due to the difficulty of leveraging the benefits of piecewise linear units in the generative context. We propose a new generative model estimation procedure that sidesteps these difficulties. In the proposed adversarial nets framework, the generative model is pitted against an adversary: a discriminative model that learns to determine whether a sample is from the model distribution or the data distribution. The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency. Competition in this game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles.

Footnotes: Ian Goodfellow is now a research scientist at Google, but did this work earlier as a UdeM student. Jean Pouget-Abadie did this work while visiting Université de Montréal from École Polytechnique. Sherjil Ozair is visiting Université de Montréal from Indian Institute of Technology Delhi. Yoshua Bengio is a CIFAR Senior Fellow. All code and hyperparameters available at
Outline
1. Mathematical context of GANs
2. Optimality properties
3. Approximation properties
4. Statistical analysis
Mathematical context of GANs
Generators

Data: $X_1, \dots, X_n$, i.i.d. according to some unknown density $p$ on $E \subseteq \mathbb{R}^d$.
Generators: a parametric family of functions from $\mathbb{R}^{d'}$ to $E$ ($d' \le d$). Notation: $\mathcal{G} = \{G_\theta\}_{\theta \in \Theta}$, $\Theta \subseteq \mathbb{R}^p$.
Principle: $\Omega \to Z \in \mathbb{R}^{d'} \xrightarrow{G_\theta} E$. By definition, the law of $G_\theta(Z)$ is $p_\theta \, d\mu$.
Associated family of densities: $\mathcal{P} = \{p_\theta\}_{\theta \in \Theta}$.
Each density $p_\theta$ is a potential candidate to represent $p$.
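As a toy illustration of this pushforward principle (a sketch, not from the talk; the location-scale family below is a hypothetical stand-in for a neural network generator):

```python
import numpy as np

# Toy generator family: G_theta(z) = mu + sigma * z, theta = (mu, sigma).
# Pushing Z ~ N(0, 1) through G_theta yields samples whose density is
# p_theta = N(mu, sigma^2), a candidate to imitate the unknown density p.
def generator(theta, z):
    mu, sigma = theta
    return mu + sigma * z

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)      # latent draws Z_1, ..., Z_n
x_fake = generator((2.0, 0.5), z)     # samples distributed as p_theta

print(round(x_fake.mean(), 1), round(x_fake.std(), 1))
```

The same picture holds when `generator` is a deep network: the family of pushforward densities $\{p_\theta\}$ is indexed by the network weights.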
A specific framework

In GANs algorithms, $\mathcal{G} = \{G_\theta\}_{\theta \in \Theta}$ is a neural network.
Bad ideas:
- Exhaustive description of $p$ by a classical parametric model.
- Estimation by a traditional maximum likelihood approach.
- Any strategy based on nonparametric density estimation techniques.
It is not assumed that $p$ belongs to $\mathcal{P}$.
Think parametric, but forget classical rules.
Discriminators

Discriminators: a family of functions from $E$ to $[0, 1]$. Notation: $\mathcal{D}$.
Often (but not always): $\mathcal{D} = \{D_\alpha\}_{\alpha \in \Lambda}$, $\Lambda \subseteq \mathbb{R}^q$.
In GANs algorithms, $\mathcal{D}$ is a neural network.
The higher $D(x)$, the higher the probability that $x$ is drawn from $p$.
Generators and discriminators have opposite objectives.
Adversarial principle

Auxiliary random variables: $Z_1, \dots, Z_n$, i.i.d. according to $Z$.
Objective: solve in $\theta$
$$\inf_{\theta \in \Theta} \sup_{D \in \mathcal{D}} \Big[ \prod_{i=1}^n D(X_i) \prod_{i=1}^n \big(1 - D \circ G_\theta(Z_i)\big) \Big].$$
Equivalently: find $\hat\theta \in \Theta$ such that
$$\sup_{D \in \mathcal{D}} \hat L(\hat\theta, D) \le \sup_{D \in \mathcal{D}} \hat L(\theta, D), \quad \forall\, \theta \in \Theta,$$
where
$$\hat L(\theta, D) \stackrel{\mathrm{def}}{=} \sum_{i=1}^n \ln D(X_i) + \sum_{i=1}^n \ln\big(1 - D \circ G_\theta(Z_i)\big).$$
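The empirical criterion $\hat L(\theta, D)$ can be sketched in a few lines (toy families, not from the talk; the logistic discriminator $D_\alpha$ is a hypothetical choice):

```python
import numpy as np

# Empirical GAN criterion from the slide:
#   L_hat(theta, D) = sum_i ln D(X_i) + sum_i ln(1 - D(G_theta(Z_i)))
def generator(theta, z):
    mu, sigma = theta
    return mu + sigma * z

def discriminator(alpha, x):
    a0, a1 = alpha
    return 1.0 / (1.0 + np.exp(-(a0 + a1 * x)))  # values in (0, 1)

def empirical_criterion(theta, alpha, x_real, z):
    d_real = discriminator(alpha, x_real)
    d_fake = discriminator(alpha, generator(theta, z))
    return np.log(d_real).sum() + np.log(1.0 - d_fake).sum()

rng = np.random.default_rng(1)
x_real = rng.normal(0.0, 1.0, size=500)  # data X_1, ..., X_n
z = rng.standard_normal(500)             # auxiliary Z_1, ..., Z_n

# Sanity check: with an uninformative discriminator (D = 1/2 everywhere),
# the criterion equals 2n * ln(1/2) = -1000 * ln 2 for n = 500.
val = empirical_criterion((0.0, 1.0), (0.0, 0.0), x_real, z)
print(np.isclose(val, -1000 * np.log(2.0)))
```

In a real GAN, the generator minimizes and the discriminator maximizes this same quantity by stochastic gradient steps.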
Some comments

The criterion $\hat L(\theta, D)$ is the original one.
Many variants: MMD-GANs, f-GANs, Wasserstein-GANs, scattering transforms, etc. They are based on different criteria: a galaxy of GANs.
Our objective today, a roadmap:
- Basic properties of $\hat L(\theta, D)$?
- Impact of the discriminators on the quality of the approximation?
- Statistical consistency? Rates of convergence?
- Play with simple examples.
Optimality properties
Reminders

For $P \ll Q$ probability measures on $E$,
$$D_{KL}(P \,\|\, Q) = \int \ln\Big(\frac{dP}{dQ}\Big)\, dP.$$
Properties: $D_{KL}(P \,\|\, Q) \ge 0$ and $D_{KL}(P \,\|\, Q) = 0 \iff P = Q$.
If $p = \frac{dP}{d\mu}$ and $q = \frac{dQ}{d\mu}$, then $D_{KL}(P \,\|\, Q) = \int p \ln\frac{p}{q}\, d\mu$.
For $P$ and $Q$ probability measures on $E$,
$$D_{JS}(P, Q) = \frac{1}{2}\, D_{KL}\Big(P \,\Big\|\, \frac{P + Q}{2}\Big) + \frac{1}{2}\, D_{KL}\Big(Q \,\Big\|\, \frac{P + Q}{2}\Big).$$
Properties: $0 \le D_{JS}(P, Q) \le \ln 2$, and $D_{JS}^{1/2}$ is a distance.
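These definitions translate directly into code; a minimal sketch for discrete distributions (illustrative only, not part of the talk):

```python
import numpy as np

# Jensen-Shannon divergence between two discrete distributions,
# D_JS(P, Q) = (1/2) D_KL(P || M) + (1/2) D_KL(Q || M) with M = (P + Q)/2.
def kl(p, q):
    mask = p > 0                      # 0 * log(0/q) = 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
print(js(p, p) == 0.0)                 # D_JS(P, P) = 0
print(0.0 <= js(p, q) <= np.log(2.0))  # always within [0, ln 2]
```

Note that $D_{JS}$ stays finite even when $P$ and $Q$ have disjoint supports, unlike $D_{KL}$.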
Role of the Jensen-Shannon divergence

$\hat L(\theta, D)$ is the empirical version of
$$L(\theta, D) \stackrel{\mathrm{def}}{=} \int \ln(D)\, p \, d\mu + \int \ln(1 - D)\, p_\theta \, d\mu.$$
Idealization: $\mathcal{D} = D_\infty$, the set of all functions from $E$ to $[0, 1]$.
In this case, pointwise maximization of the integrand gives
$$\sup_{D \in D_\infty} L(\theta, D) = \sup_{D \in D_\infty} \int \big[ \ln(D)\, p + \ln(1 - D)\, p_\theta \big]\, d\mu = L(\theta, D_\theta^*),$$
where
$$D_\theta^* \stackrel{\mathrm{def}}{=} \frac{p}{p + p_\theta}$$
is the optimal discriminator.
Conclusion: $\sup_{D \in D_\infty} L(\theta, D) = L(\theta, D_\theta^*) = 2\, D_{JS}(p, p_\theta) - \ln 4$.
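The identity $L(\theta, D_\theta^*) = 2\, D_{JS}(p, p_\theta) - \ln 4$ can be checked numerically on a toy discrete base measure (a sketch, not from the talk):

```python
import numpy as np

# Check: L(theta, D*_theta) = 2 * D_JS(p, p_theta) - ln 4,
# with the optimal discriminator D*_theta = p / (p + p_theta).
p = np.array([0.6, 0.3, 0.1])        # stand-in for the unknown density p
p_theta = np.array([0.2, 0.5, 0.3])  # stand-in for a candidate p_theta

d_star = p / (p + p_theta)
lhs = np.sum(np.log(d_star) * p) + np.sum(np.log(1.0 - d_star) * p_theta)

m = 0.5 * (p + p_theta)
d_js = 0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(p_theta * np.log(p_theta / m))
rhs = 2.0 * d_js - np.log(4.0)

print(np.isclose(lhs, rhs))
```

The identity is exact: expanding $\ln\frac{p}{m} = \ln 2 + \ln\frac{p}{p + p_\theta}$ in the right-hand side recovers the left-hand side term by term.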
Role of the Jensen-Shannon divergence

For all $\theta \in \Theta$, $\sup_{D \in D_\infty} L(\theta, D) = 2\, D_{JS}(p, p_\theta) - \ln 4$.
A possible interpretation: minimize $D_{JS}(p, p_\theta)$ over $\Theta$.
Many GANs algorithms try to learn the optimal discriminator $D_\theta^*$.

Theorem. Let $\theta \in \Theta$ be such that $p_\theta > 0$ $\mu$-almost everywhere. Then $D_\theta^*$ is the unique discriminator that achieves the supremum of the functional $D \mapsto L(\theta, D)$ over $D_\infty$.
Oracle parameter

By definition, for all $\theta \in \Theta$, $\sup_{D \in D_\infty} L(\theta, D) = 2\, D_{JS}(p, p_\theta) - \ln 4$.
We let $\theta^* \in \Theta$ be defined by
$$D_{JS}(p, p_{\theta^*}) \le D_{JS}(p, p_\theta), \quad \forall\, \theta \in \Theta.$$
$\theta^*$ is the oracle parameter in terms of Jensen-Shannon divergence.
Whenever $p \in \mathcal{P}$, then $p = p_{\theta^*}$, $D_{JS}(p, p_{\theta^*}) = 0$, and $D_{\theta^*}^* = 1/2$. This is, however, a very special case, which is of no interest.
We need sufficient conditions for the existence and uniqueness of $\theta^*$.
Oracle parameter

Theorem. Assume that the model $\{P_\theta\}_{\theta \in \Theta}$ is identifiable, convex, and compact for the metric $\delta$. Assume, in addition, that there exist $0 < m \le M$ such that $m \le p \le M$ and, for all $\theta \in \Theta$, $p_\theta \le M$. Then there exists a unique $\theta^* \in \Theta$ such that $\{\theta^*\} = \arg\min_{\theta \in \Theta} D_{JS}(p, p_\theta)$.

Compactness conditions:
(i) $\Theta$ is compact and $\{P_\theta\}_{\theta \in \Theta}$ is convex.
(ii) For all $x \in E$, the function $\theta \mapsto p_\theta(x)$ is continuous on $\Theta$.
(iii) One has $\sup_{(\theta, \theta') \in \Theta^2} p_\theta\, |\ln p_{\theta'}| \in L^1(\mu)$.
Approximation properties
Parameterized discriminators

The assumption $\mathcal{D} = D_\infty$ is comfortable but questionable. In practice, one always has $\mathcal{D} = \{D_\alpha\}_{\alpha \in \Lambda}$, $\Lambda \subseteq \mathbb{R}^q$.
GANs are a likelihood-type problem involving two parametric families.
It is logical to consider the parameter $\bar\theta \in \Theta$ defined by
$$\sup_{D \in \mathcal{D}} L(\bar\theta, D) \le \sup_{D \in \mathcal{D}} L(\theta, D), \quad \forall\, \theta \in \Theta.$$
The density $p_{\bar\theta}$ is the best candidate to imitate $p_{\theta^*}$.
Objective: quantify the proximity between $p_{\bar\theta}$ and $p_{\theta^*}$.
Approximation theorem

Assumption $(H_0)$: There exists a constant $t \in (0, 1/2]$ such that $\min(D_\theta^*, 1 - D_\theta^*) \ge t$ for all $\theta \in \Theta$.
Assumption $(H_\varepsilon)$: There exist $\varepsilon \in (0, t)$ and $D \in \mathcal{D}$ such that $\|D - D_{\bar\theta}^*\|_\infty \le \varepsilon$.

Theorem. Under Assumptions $(H_0)$ and $(H_\varepsilon)$, there exists a positive constant $c$ such that
$$0 \le D_{JS}(p, p_{\bar\theta}) - D_{JS}(p, p_{\theta^*}) \le c\, \varepsilon^2.$$
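The $\varepsilon^2$ rate has a simple numerical interpretation (toy sketch, not from the talk): since $D_\theta^*$ maximizes $D \mapsto L(\theta, D)$, a sup-norm perturbation of size $\varepsilon$ moves the criterion only at second order.

```python
import numpy as np

# D*_theta is a maximizer of D -> L(theta, D), so the loss of an
# eps-accurate discriminator is O(eps^2): halving eps should divide
# the gap by roughly 4. Toy check on a discrete base measure.
p = np.array([0.6, 0.3, 0.1])
p_theta = np.array([0.2, 0.5, 0.3])
d_star = p / (p + p_theta)

def L(d):
    return np.sum(p * np.log(d)) + np.sum(p_theta * np.log(1.0 - d))

gaps = []
for eps in (0.1, 0.05, 0.025):
    d = np.clip(d_star + eps, 1e-6, 1 - 1e-6)  # ||d - d_star||_inf = eps
    gaps.append(L(d_star) - L(d))

print(gaps[0] / gaps[1], gaps[1] / gaps[2])  # both ratios near 4
```

This second-order behavior is exactly what makes Assumption $(H_\varepsilon)$ pay off as $c\,\varepsilon^2$ rather than $c\,\varepsilon$.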
Statistical analysis
Context

Parametric generators $\mathcal{G} = \{G_\theta\}_{\theta \in \Theta}$ and discriminators $\mathcal{D} = \{D_\alpha\}_{\alpha \in \Lambda}$.
The data-dependent parameter $\hat\theta$ is such that
$$\sup_{D \in \mathcal{D}} \hat L(\hat\theta, D) \le \sup_{D \in \mathcal{D}} \hat L(\theta, D), \quad \forall\, \theta \in \Theta.$$
Questions:
- Can we say something about $D_{JS}(p, p_{\hat\theta}) - D_{JS}(p, p_{\theta^*})$?
- Convergence of $\hat\theta$ towards $\bar\theta$ as $n \to \infty$?
- Asymptotic distribution of $\hat\theta - \bar\theta$?
Jensen-Shannon convergence

Assumptions $(H_{\mathrm{reg}})$: Regularity conditions of order 1 on the models.
Assumption $(H'_\varepsilon)$: There exists $\varepsilon \in (0, t)$ such that, for all $\theta \in \Theta$, there exists $D \in \mathcal{D}$ with $\|D - D_\theta^*\|_\infty \le \varepsilon$.

Theorem. Under Assumptions $(H_0)$, $(H_{\mathrm{reg}})$, and $(H'_\varepsilon)$, one has
$$\mathbb{E}\, D_{JS}(p, p_{\hat\theta}) - D_{JS}(p, p_{\theta^*}) = O\Big(\varepsilon^2 + \frac{1}{\sqrt{n}}\Big).$$
Illustration

$p(x) = \dfrac{e^{-x/s}}{s\,(1 + e^{-x/s})^2}$ (logistic density with scale $s$).
$\mathcal{G}$ and $\mathcal{D}$: fully connected neural networks.
$n$ is fixed and the number of layers varies.
Figure: discriminator depth = 2, generator depth = 3.
Figure: discriminator depth = 5, generator depth = 3.
Video.
Convergence of $\hat\theta$

Assumptions $(H'_{\mathrm{reg}})$: Regularity conditions of order 2 on the models.
Assumption $(H_1)$: The pair $(\bar\theta, \bar\alpha)$ is unique and belongs to $\Theta \times \Lambda$.

Theorem. Under Assumptions $(H'_{\mathrm{reg}})$ and $(H_1)$, one has $\hat\theta \to \bar\theta$ almost surely and $\hat\alpha \to \bar\alpha$ almost surely.
Targets: $p$.
Generators and discriminators

- Laplace-Gaussian: $p(x) = \frac{1}{2b}\, e^{-|x|/b}$ with $b = 1.5$; $p_\theta(x) = \frac{1}{\sqrt{2\pi\theta}}\, e^{-x^2/(2\theta)}$, $\Theta = [10^{-1}, 10^3]$; $D_\alpha(x) = \Big(1 + \frac{\alpha_1}{\alpha_0}\, e^{\frac{x^2}{2}(\alpha_1^{-2} - \alpha_0^{-2})}\Big)^{-1}$, $\Lambda = \Theta \times \Theta$.
- Claw-Gaussian: $p = p_{\mathrm{claw}}$; same Gaussian family $p_\theta(x) = \frac{1}{\sqrt{2\pi\theta}}\, e^{-x^2/(2\theta)}$, $\Theta = [10^{-1}, 10^3]$; same discriminators, $\Lambda = \Theta \times \Theta$.
- Exponential-Uniform: $p(x) = \lambda e^{-\lambda x}$ with $\lambda = 1$; $p_\theta(x) = \frac{1}{\theta}\, \mathbf{1}_{[0,\theta]}(x)$, $\Theta = [10^{-3}, 10^3]$; same discriminators, $\Lambda = \Theta \times \Theta$.
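For the Laplace-Gaussian pair, the oracle parameter $\theta^*$ can be approximated by a brute-force grid search over the Jensen-Shannon divergence (a numerical sketch under assumed grid and discretization choices, not the experiment from the talk):

```python
import numpy as np

# Toy search for the oracle theta* in the Laplace-Gaussian model:
# minimize D_JS(p, p_theta) over a grid, with p = Laplace(b = 1.5)
# and p_theta = N(0, theta) (theta is the variance).
x = np.linspace(-20, 20, 4001)
dx = x[1] - x[0]
b = 1.5
p = np.exp(-np.abs(x) / b) / (2 * b)

def js(p, q, dx):
    m = 0.5 * (p + q)
    eps = 1e-300  # guard against log(0) from float underflow
    kl_pm = np.sum(p * np.log((p + eps) / (m + eps))) * dx
    kl_qm = np.sum(q * np.log((q + eps) / (m + eps))) * dx
    return 0.5 * kl_pm + 0.5 * kl_qm

thetas = np.linspace(0.5, 20.0, 200)
divs = [js(p, np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t), dx)
        for t in thetas]
theta_star = thetas[int(np.argmin(divs))]
print(theta_star)
```

Since the Laplace(1.5) density has variance $2b^2 = 4.5$, the JS-optimal Gaussian variance lands in the same neighborhood, though not exactly at 4.5.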
Figures: Laplace-Gaussian, Claw-Gaussian, Exponential-Uniform.
Central limit theorem

Assumptions $(H_{\mathrm{loc}})$: Local smoothness conditions around $(\bar\theta, \bar\alpha)$.

Theorem. Under Assumptions $(H'_{\mathrm{reg}})$, $(H_1)$, and $(H_{\mathrm{loc}})$, one has
$$\sqrt{n}\, (\hat\theta - \bar\theta) \xrightarrow{\mathcal{L}} Z,$$
where $Z$ is a Gaussian random variable with mean 0 and variance $V$.
Figures: Laplace-Gaussian, Claw-Gaussian, Exponential-Uniform.
Paper: Some Theoretical Properties of GANs. arXiv:1803.07819v1 [stat.ml], 21 Mar 2018. G. Biau (Sorbonne Université, CNRS, LPSM, Paris, France, gerard.biau@upmc.fr), M. Sangnier (Sorbonne Université, CNRS, LPSM, Paris,
More informationOPTIMIZATION METHODS IN DEEP LEARNING
Tutorial outline OPTIMIZATION METHODS IN DEEP LEARNING Based on Deep Learning, chapter 8 by Ian Goodfellow, Yoshua Bengio and Aaron Courville Presented By Nadav Bhonker Optimization vs Learning Surrogate
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationarxiv: v1 [cs.lg] 10 Jun 2016
Deep Directed Generative Models with Energy-Based Probability Estimation arxiv:1606.03439v1 [cs.lg] 10 Jun 2016 Taesup Kim, Yoshua Bengio Department of Computer Science and Operations Research Université
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationEconometrics I, Estimation
Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the
More informationGradient descent GAN optimization is locally stable
Gradient descent GAN optimization is locally stable Advances in Neural Information Processing Systems, 2017 Vaishnavh Nagarajan J. Zico Kolter Carnegie Mellon University 05 January 2018 Presented by: Kevin
More informationEnergy Based Models. Stefano Ermon, Aditya Grover. Stanford University. Lecture 13
Energy Based Models Stefano Ermon, Aditya Grover Stanford University Lecture 13 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 13 1 / 21 Summary Story so far Representation: Latent
More informationLatent Variable Models
Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 5 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 5 1 / 31 Recap of last lecture 1 Autoregressive models:
More informationBased on the original slides of Hung-yi Lee
Based on the original slides of Hung-yi Lee Google Trends Deep learning obtains many exciting results. Can contribute to new Smart Services in the Context of the Internet of Things (IoT). IoT Services
More informationLecture 3 Feedforward Networks and Backpropagation
Lecture 3 Feedforward Networks and Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 3, 2017 Things we will look at today Recap of Logistic Regression
More informationDeep Learning book, by Ian Goodfellow, Yoshua Bengio and Aaron Courville
Deep Learning book, by Ian Goodfellow, Yoshua Bengio and Aaron Courville Chapter 6 :Deep Feedforward Networks Benoit Massé Dionyssos Kounades-Bastian Benoit Massé, Dionyssos Kounades-Bastian Deep Feedforward
More informationA Unified View of Deep Generative Models
SAILING LAB Laboratory for Statistical Artificial InteLigence & INtegreative Genomics A Unified View of Deep Generative Models Zhiting Hu and Eric Xing Petuum Inc. Carnegie Mellon University 1 Deep generative
More informationEmpirical Risk Minimization
Empirical Risk Minimization Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Introduction PAC learning ERM in practice 2 General setting Data X the input space and Y the output space
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationNotes on policy gradients and the log derivative trick for reinforcement learning
Notes on policy gradients and the log derivative trick for reinforcement learning David Meyer dmm@{1-4-5.net,uoregon.edu,brocade.com,...} June 3, 2016 1 Introduction The log derivative trick 1 is a widely
More informationMachine Learning Basics: Maximum Likelihood Estimation
Machine Learning Basics: Maximum Likelihood Estimation Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics 1. Learning
More informationarxiv: v1 [cs.sd] 30 Oct 2017
GENERATIVE ADVERSARIAL SOURCE SEPARATION Y.Cem Subaan, Paris Smaragdis, UIUC, Adobe Systems {subaan, paris}@illinois.edu arxiv:17.779v1 [cs.sd] 30 Oct 017 ABSTRACT Generative source separation methods
More informationarxiv: v1 [cs.lg] 28 Dec 2017
PixelSNAIL: An Improved Autoregressive Generative Model arxiv:1712.09763v1 [cs.lg] 28 Dec 2017 Xi Chen, Nikhil Mishra, Mostafa Rohaninejad, Pieter Abbeel Embodied Intelligence UC Berkeley, Department of
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationTraining generative neural networks via Maximum Mean Discrepancy optimization
Training generative neural networks via Maximum Mean Discrepancy optimization Gintare Karolina Dziugaite University of Cambridge Daniel M. Roy University of Toronto Zoubin Ghahramani University of Cambridge
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationIterative Markov Chain Monte Carlo Computation of Reference Priors and Minimax Risk
Iterative Markov Chain Monte Carlo Computation of Reference Priors and Minimax Risk John Lafferty School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 lafferty@cs.cmu.edu Abstract
More informationGradient-Based Learning. Sargur N. Srihari
Gradient-Based Learning Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation
More informationLearning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013
Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description
More informationExpectation Maximization
Expectation Maximization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /
More informationarxiv: v2 [cs.lg] 21 Aug 2018
CoT: Cooperative Training for Generative Modeling of Discrete Data arxiv:1804.03782v2 [cs.lg] 21 Aug 2018 Sidi Lu Shanghai Jiao Tong University steve_lu@apex.sjtu.edu.cn Weinan Zhang Shanghai Jiao Tong
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Neural networks Daniel Hennes 21.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Logistic regression Neural networks Perceptron
More informationLearning Deep Architectures for AI. Part II - Vijay Chakilam
Learning Deep Architectures for AI - Yoshua Bengio Part II - Vijay Chakilam Limitations of Perceptron x1 W, b 0,1 1,1 y x2 weight plane output =1 output =0 There is no value for W and b such that the model
More informationVariational Autoencoder
Variational Autoencoder Göker Erdo gan August 8, 2017 The variational autoencoder (VA) [1] is a nonlinear latent variable model with an efficient gradient-based training procedure based on variational
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationEnergy-Based Generative Adversarial Network
Energy-Based Generative Adversarial Network Energy-Based Generative Adversarial Network J. Zhao, M. Mathieu and Y. LeCun Learning to Draw Samples: With Application to Amoritized MLE for Generalized Adversarial
More informationVariational Principal Components
Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings
More informationParametric Inverse Simulation
Parametric Inverse Simulation Zenna Tavares MIT zenna@mit.edu Armando Solar Lezama MIT asolar@csail.mit.edu Abstract We introduce the theory of parametric inversion, which generalizes function inversion
More informationp(d θ ) l(θ ) 1.2 x x x
p(d θ ).2 x 0-7 0.8 x 0-7 0.4 x 0-7 l(θ ) -20-40 -60-80 -00 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ θ x FIGURE 3.. The top graph shows several training points in one dimension, known or assumed to
More informationVariational Autoencoders (VAEs)
September 26 & October 3, 2017 Section 1 Preliminaries Kullback-Leibler divergence KL divergence (continuous case) p(x) andq(x) are two density distributions. Then the KL-divergence is defined as Z KL(p
More informationProvable Non-Convex Min-Max Optimization
Provable Non-Convex Min-Max Optimization Mingrui Liu, Hassan Rafique, Qihang Lin, Tianbao Yang Department of Computer Science, The University of Iowa, Iowa City, IA, 52242 Department of Mathematics, The
More informationVariational Autoencoders for Classical Spin Models
Variational Autoencoders for Classical Spin Models Benjamin Nosarzewski Stanford University bln@stanford.edu Abstract Lattice spin models are used to study the magnetic behavior of interacting electrons
More informationUnsupervised Learning
CS 3750 Advanced Machine Learning hkc6@pitt.edu Unsupervised Learning Data: Just data, no labels Goal: Learn some underlying hidden structure of the data P(, ) P( ) Principle Component Analysis (Dimensionality
More informationarxiv: v1 [stat.ml] 2 Sep 2014
On the Equivalence Between Deep NADE and Generative Stochastic Networks Li Yao, Sherjil Ozair, Kyunghyun Cho, and Yoshua Bengio Département d Informatique et de Recherche Opérationelle Université de Montréal
More informationQ-Learning in Continuous State Action Spaces
Q-Learning in Continuous State Action Spaces Alex Irpan alexirpan@berkeley.edu December 5, 2015 Contents 1 Introduction 1 2 Background 1 3 Q-Learning 2 4 Q-Learning In Continuous Spaces 4 5 Experimental
More informationGenerative Kernel PCA
ESA 8 proceedings, European Symposium on Artificial eural etworks, Computational Intelligence and Machine Learning. Bruges (Belgium), 5-7 April 8, i6doc.com publ., ISB 978-8758747-6. Generative Kernel
More informationarxiv: v1 [cs.lg] 22 Mar 2016
Information Theoretic-Learning Auto-Encoder arxiv:1603.06653v1 [cs.lg] 22 Mar 2016 Eder Santana University of Florida Jose C. Principe University of Florida Abstract Matthew Emigh University of Florida
More informationf-gan: Training Generative Neural Samplers using Variational Divergence Minimization
f-gan: Training Generative Neural Samplers using Variational Divergence Minimization Sebastian Nowozin, Botond Cseke, Ryota Tomioka Machine Intelligence and Perception Group Microsoft Research {Sebastian.Nowozin,
More informationA General Overview of Parametric Estimation and Inference Techniques.
A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data Statistical Machine Learning from Data Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne (EPFL),
More informationECE598: Information-theoretic methods in high-dimensional statistics Spring 2016
ECE598: Information-theoretic methods in high-dimensional statistics Spring 06 Lecture : Mutual Information Method Lecturer: Yihong Wu Scribe: Jaeho Lee, Mar, 06 Ed. Mar 9 Quick review: Assouad s lemma
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationarxiv: v3 [cs.lg] 2 Nov 2018
PacGAN: The power of two samples in generative adversarial networks Zinan Lin, Ashish Khetan, Giulia Fanti, Sewoong Oh Carnegie Mellon University, University of Illinois at Urbana-Champaign arxiv:72.486v3
More informationDynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji
Dynamic Data Modeling, Recognition, and Synthesis Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Contents Introduction Related Work Dynamic Data Modeling & Analysis Temporal localization Insufficient
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More information