Variational Autoencoders (VAEs)
- Blaze Ferguson
Transcription
1 September 26 & October 3, 2017
2 Section 1 Preliminaries
3 Kullback-Leibler divergence. KL divergence (continuous case): let $p(x)$ and $q(x)$ be two probability density functions. The KL divergence is defined as $\mathrm{KL}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$. (1.1) By Jensen's inequality, $\mathrm{KL}(p \,\|\, q) \ge 0$, and equality holds if and only if $p = q$ almost everywhere.
4 Kullback-Leibler divergence. Special case: multivariate Gaussian distributions. Suppose $p_1 = \mathcal{N}(\mu_1, \Sigma_1)$ and $p_2 = \mathcal{N}(\mu_2, \Sigma_2)$ are distributions of a $k$-dimensional variable. Then $\mathrm{KL}(p_1 \,\|\, p_2) = \frac{1}{2}\left[\log\frac{\det \Sigma_2}{\det \Sigma_1} - k + \mathrm{tr}\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2 - \mu_1)^\top \Sigma_2^{-1} (\mu_2 - \mu_1)\right]$.
5 Variational Inference. Suppose we want to use $Q(z)$ to approximate $P(z \mid X)$, where $P(z \mid X)$ does not have an explicit representation. A good approximation would try to minimize $\mathrm{KL}(Q(z) \,\|\, P(z \mid X)) = \int Q(z) \log \frac{Q(z)}{P(z \mid X)} \, dz$. By Bayes' formula, this equation can be rewritten as $\log P(X) - \mathrm{KL}(Q(z) \,\|\, P(z \mid X)) = \int Q(z) \log P(X \mid z) \, dz - \mathrm{KL}(Q(z) \,\|\, P(z))$. (1.2)
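As a quick sanity check of definition (1.1), the sketch below approximates the integral with a midpoint rule for two univariate Gaussians and compares it with the well-known univariate closed form $\log\frac{\sigma_q}{\sigma_p} + \frac{\sigma_p^2 + (\mu_p - \mu_q)^2}{2\sigma_q^2} - \frac{1}{2}$. The function names, the integration range, and the step count are illustrative assumptions, not part of the slides.

```python
import math

def normal_logpdf(x, mean, std):
    """Log-density of a univariate Gaussian N(mean, std**2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))

def kl_numeric(mean_p, std_p, mean_q, std_q, lo=-20.0, hi=20.0, steps=20000):
    """Midpoint-rule approximation of the integral of p(x) * log(p(x) / q(x))."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        log_p = normal_logpdf(x, mean_p, std_p)
        log_q = normal_logpdf(x, mean_q, std_q)
        total += math.exp(log_p) * (log_p - log_q) * dx
    return total

def kl_closed_form(mean_p, std_p, mean_q, std_q):
    """Closed-form KL divergence between two univariate Gaussians."""
    return (math.log(std_q / std_p)
            + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * std_q ** 2)
            - 0.5)

if __name__ == "__main__":
    print(kl_numeric(0.0, 1.0, 1.0, 2.0), kl_closed_form(0.0, 1.0, 1.0, 2.0))
```

Swapping the two arguments generally gives a different value, which is a reminder that the KL divergence is not symmetric.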
6 Section 2 Variational Autoencoders [1]. [1] A. B. L. Larsen et al. (2015). Autoencoding beyond pixels using a learned similarity metric. In: arXiv preprint arXiv:
7 Original problem. Given a dataset $X$ drawn from an unknown distribution $P(X)$, we want to generate new data that follow the same distribution. We construct a model $f(z; \theta): \mathcal{Z} \to \mathcal{X}$, where $\mathcal{X}$ is the space of observed variables (data), $\mathcal{Z}$ the space of latent variables, $\Theta$ the parameter space containing $\theta$, and $f$ a complex but deterministic mapping. Latent variables: variables that are not directly observed but are inferred from other, directly observed variables. Given $z$, we can generate a sample $X$ by $f(z; \theta)$. We wish to optimize $\theta$ such that we can sample $z$ from $P(z)$ and, with high probability, $f(z; \theta)$ will be like the $X$'s in our dataset.
10 Likelihood. $P(X; \theta) = \int P(X \mid z; \theta) \, P(z) \, dz$. (2.1) Choose $\theta$ to maximize the above integral. In VAEs, $P(X \mid z; \theta) = \mathcal{N}(X \mid f(z; \theta), \sigma^2 I)$ in the continuous case, and $P(X \mid z; \theta)$ is Bernoulli with parameter $f(z; \theta)$ in the discrete case. In both cases, $P(X \mid z; \theta)$ is continuous with respect to $\theta$, so we can use gradient ascent to maximize it. Questions: How do we define the latent variable $z$ so that it captures latent information? How do we deal with the integral over $z$, and its gradient with respect to $\theta$?
12 Define latent variable. We want the latent variable to satisfy two properties. First, the latent variables are chosen automatically, because we do not know much about the intrinsic properties of $X$. Second, different components of $z$ are mutually independent, to avoid overlap in latent information. VAEs assert that the latent variable can be drawn from a standard Gaussian distribution, $\mathcal{N}(0, I)$. Assertion: any distribution in $d$ dimensions can be generated by taking a set of $d$ normally distributed variables and mapping them through a sufficiently complicated function. Since $f(z; \theta)$ is complicated enough (it is a trained neural network), this choice of latent variable will not matter too much.
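To make the assertion concrete, here is a small sketch that pushes 2-D standard Gaussian samples through a fixed hand-written function so that they land on a ring. The map $g(z) = z/10 + z/\|z\|$ and the helper names are illustrative assumptions; in a VAE this role is played by the learned $f(z; \theta)$.

```python
import math
import random

def ring_map(z1, z2):
    """Map a 2-D point to g(z) = z/10 + z/||z||, which has radius ||z||/10 + 1."""
    norm = math.hypot(z1, z2)
    return (z1 / 10.0 + z1 / norm, z2 / 10.0 + z2 / norm)

def sample_ring(n, seed=0):
    """Draw n standard Gaussian points in 2-D and map each one onto the ring."""
    rng = random.Random(seed)
    return [ring_map(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

if __name__ == "__main__":
    points = sample_ring(5)
    for x, y in points:
        print(round(math.hypot(x, y), 3))  # every radius is slightly above 1
```

The input distribution is a plain Gaussian blob, yet the output distribution has a hole in the middle; all of the structure comes from the mapping.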
13 Deal with the integral. $P(X; \theta) \approx \frac{1}{n} \sum_{i} P(X \mid z^{(i)}; \theta)$, with $z^{(i)} \sim \mathcal{N}(0, I)$. Figure: Counterexample. We need to set $\sigma$ very small, which in turn requires a very large number of samples $n$. In this case, we need a faster sampling procedure for $z$.
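The naive estimator can be tried on a toy model where the answer is known. The sketch below assumes a one-dimensional latent variable and the hypothetical decoder $f(z) = z$, so that $X \mid z \sim \mathcal{N}(z, \sigma^2)$ and the exact marginal is $\mathcal{N}(0, 1 + \sigma^2)$. None of these choices come from the slides.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std**2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def naive_marginal(x, sigma, n, seed=0):
    """Estimate P(x) as the average of P(x | z_i) over z_i drawn from the prior N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        total += normal_pdf(x, z, sigma)  # toy decoder f(z) = z (an assumption)
    return total / n

def exact_marginal(x, sigma):
    """Exact marginal for the toy model: X ~ N(0, 1 + sigma**2)."""
    return normal_pdf(x, 0.0, math.sqrt(1.0 + sigma ** 2))

if __name__ == "__main__":
    print(naive_marginal(0.5, 1.0, 20000), exact_marginal(0.5, 1.0))
```

With a small `sigma`, most prior samples contribute almost nothing to the average, so the estimate is dominated by the few samples that happen to fall near `x`; this is the inefficiency the slide points at.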
14 Deal with the integral. Sampling in VAEs: the key idea behind the variational autoencoder is to sample values of $z$ that are likely to have produced $X$, and to compute $P(X)$ just from those. New function $Q(z)$: gives us a distribution over $z$ values that are likely to produce $X$. Then $E_{z \sim P(z)}[P(X \mid z)] \to E_{z \sim Q(z)}[P(X \mid z)]$. We can see that $P(z \mid X)$ is the optimal choice of $Q(z)$, but it is intractable. Aim: find a $Q(z)$ that approximates $P(z \mid X)$ while being simple enough.
15 Recall: Variational Inference. For any $Q(z)$, use $Q(z)$ to approximate $P(z \mid X)$. According to Equation (1.2), $\log P(X) - \mathrm{KL}(Q(z) \,\|\, P(z \mid X)) = E_{Q(z)}[\log P(X \mid z)] - \mathrm{KL}(Q(z) \,\|\, P(z))$. Since we're interested in inferring $P(X)$, it makes sense to construct a $Q$ that depends on $X$: $\log P(X) - \mathrm{KL}(Q(z \mid X) \,\|\, P(z \mid X)) = E_{Q(z \mid X)}[\log P(X \mid z)] - \mathrm{KL}(Q(z \mid X) \,\|\, P(z))$. (2.2) Aim: maximize $\log P(X)$ (w.r.t. $\theta$) and minimize $\mathrm{KL}(Q(z \mid X) \,\|\, P(z \mid X))$ $\Leftrightarrow$ maximize the LHS $\Leftrightarrow$ maximize the RHS.
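The identity relating $\log P(X)$ to the two KL terms can be checked numerically. The sketch below swaps in a discrete latent variable with three states, so that every integral becomes a finite sum; the probabilities are made-up numbers and the function names are assumptions.

```python
import math

def kl(q, p):
    """KL divergence between two discrete distributions given as lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)

def both_sides(prior, likelihood, q):
    """Return (LHS, RHS, log P(X)) of the identity for a discrete latent z.

    prior[i]      = P(z = i)
    likelihood[i] = P(X | z = i) for one fixed observation X
    q[i]          = Q(z = i), an arbitrary approximate posterior
    """
    evidence = sum(pz * lx for pz, lx in zip(prior, likelihood))
    posterior = [pz * lx / evidence for pz, lx in zip(prior, likelihood)]
    lhs = math.log(evidence) - kl(q, posterior)
    rhs = sum(qi * math.log(lx) for qi, lx in zip(q, likelihood)) - kl(q, prior)
    return lhs, rhs, math.log(evidence)

if __name__ == "__main__":
    lhs, rhs, log_px = both_sides([0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.2, 0.5, 0.3])
    print(lhs, rhs, log_px)
```

Both sides agree for any choice of `q`, and both stay below `log_px` because the KL term on the left is non-negative.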
16 Second term of RHS. Aim: minimize $\mathrm{KL}(Q(z \mid X) \,\|\, P(z))$. We already have $P(z) = \mathcal{N}(0, I)$. The usual choice is to define $Q(z \mid X) = \mathcal{N}(z \mid \mu(X), \Sigma(X))$, where $\mu$ and $\Sigma$ are deterministic functions of $X$ with their own learnable parameters, which we omit from the notation. In addition, we constrain $\Sigma$ to be a diagonal matrix. Minimization: according to the earlier formula for the KL divergence between multivariate Gaussian distributions, $\mathrm{KL}(Q(z \mid X) \,\|\, P(z)) = \frac{1}{2}\left(\mathrm{tr}\,\Sigma(X) + \mu(X)^\top \mu(X) - k - \log\det\Sigma(X)\right)$.
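Because $\Sigma(X)$ is diagonal, the trace and the log-determinant reduce to sums over the diagonal entries. A minimal sketch of the resulting formula follows; the function name and the list-based interface are assumptions for illustration.

```python
import math

def kl_diag_gaussian_to_standard_normal(mu, var):
    """KL( N(mu, diag(var)) || N(0, I) ) for mean list mu and variance list var."""
    k = len(mu)
    trace = sum(var)                          # tr(Sigma) for a diagonal Sigma
    mu_sq = sum(m * m for m in mu)            # mu^T mu
    log_det = sum(math.log(v) for v in var)   # log det(Sigma) for a diagonal Sigma
    return 0.5 * (trace + mu_sq - k - log_det)

if __name__ == "__main__":
    print(kl_diag_gaussian_to_standard_normal([0.0, 0.0], [1.0, 1.0]))
    print(kl_diag_gaussian_to_standard_normal([1.0, -1.0], [0.5, 2.0]))
```

The divergence vanishes exactly when the mean is zero and every variance is one, which is why this term pulls the encoder output toward the prior.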
17 First term of RHS. The first term is maximized with SGD. To approximate the expectation, take a sample $\hat{z}$ from $Q(z \mid X)$ and use $E_{Q(z \mid X)}[\log P(X \mid z)] \approx \log P(X \mid \hat{z})$. General maximization objective: $E_{X \sim D}[\log P(X) - \mathrm{KL}(Q(z \mid X) \,\|\, P(z \mid X))] = E_{X \sim D}\left[E_{z \sim Q(z \mid X)}[\log P(X \mid z)] - \mathrm{KL}(Q(z \mid X) \,\|\, P(z))\right]$. (2.3) To use SGD, sample a value of $X$ and a value of $z$, then compute the gradient of the RHS by backpropagation. Do this $m$ times and take the average; the result converges to the gradient of the RHS.
19 Figure: Flow chart for the VAE algorithm.
20 Significant Problems. The algorithm seems perfect, but there are two significant problems in the calculation. First, the gradient of the first term of the RHS in Equation (2.3) should involve the parameters of both $P$ and $Q$, but our sampling method drops the dependence on the parameters of $Q$, so we cannot obtain the true gradient with respect to them. Second, the algorithm is split into two parts: the first half trains the model $Q(z \mid X)$ on the given data $X$, and the second half trains the model $f$ on the newly sampled $z$. Backpropagation cannot pass through this discontinuous sampling step, which makes the algorithm fail.
22 Modification by Reparameterization Trick. To solve the first problem, we change the way we sample. We first sample $\epsilon \sim \mathcal{N}(0, I)$, then define $z = \mu(X) + \Sigma(X)^{1/2} \epsilon$. This is an equivalent representation of the sample $z$ in the previous algorithm, but the objective now becomes $E_{X \sim D}\left[E_{\epsilon \sim \mathcal{N}(0, I)}\left[\log P(X \mid z = \mu(X) + \Sigma(X)^{1/2} \epsilon)\right] - \mathrm{KL}(Q(z \mid X) \,\|\, P(z))\right]$. This time the sampling distribution no longer depends on the parameters being optimized. In general, we sample from $Q(z \mid X)$ by evaluating a function $h(\epsilon, X)$, where $\epsilon$ is unobserved noise and $h$ is continuous in $X$. (A discrete $Q(z \mid X)$ fails in this case.) Backpropagation can then be carried through the whole model.
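The effect of the trick can be seen on a one-dimensional toy problem. The sketch below assumes a scalar $z = \mu + \sigma\epsilon$ and replaces $\log P(X \mid z)$ with the stand-in function $g(z) = z^2$, for which $E[g(z)] = \mu^2 + \sigma^2$ and the exact gradients are $2\mu$ and $2\sigma$. The function name and the choice of $g$ are illustrative assumptions.

```python
import random

def reparam_gradients(mu, sigma, n, seed=0):
    """Estimate d/dmu and d/dsigma of E[g(z)] with z = mu + sigma * eps, g(z) = z**2.

    The noise eps ~ N(0, 1) is sampled independently of mu and sigma, so the
    derivative can be taken inside the expectation via the chain rule.
    """
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps
        dg_dz = 2.0 * z              # derivative of the stand-in g(z) = z**2
        grad_mu += dg_dz * 1.0       # dz/dmu = 1
        grad_sigma += dg_dz * eps    # dz/dsigma = eps
    return grad_mu / n, grad_sigma / n

if __name__ == "__main__":
    print(reparam_gradients(1.5, 0.5, 50000))  # exact values are 3.0 and 1.0
```

Sampling $z$ directly from $\mathcal{N}(\mu, \sigma^2)$ would leave no path from the sample back to $\mu$ and $\sigma$; writing $z$ as a deterministic function of the parameters and external noise is what makes the chain rule applicable.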
24 Figure: Flow chart for the corrected VAE algorithm.
25 Verification. For the decoder: we can simply sample a random variable $z \sim \mathcal{N}(0, I)$ and feed it into the decoder to obtain $f(z; \theta)$. The probability $P(X)$ of a test example $X$: this is not tractable, because $P$ is implicit. However, according to Equation (2.2), since the KL divergence is non-negative, we can find a lower bound on $\log P(X)$, called the evidence lower bound (ELBO). This lower bound is a useful tool for getting a rough idea of how well our model captures a particular data point $X$, because it converges quickly.
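The lower-bound property can be checked in closed form on a toy model. The sketch below assumes a scalar latent $z \sim \mathcal{N}(0, 1)$, a hypothetical decoder $f(z) = z$ with $X \mid z \sim \mathcal{N}(z, s^2)$, and a Gaussian $Q = \mathcal{N}(m, v)$; for this model the marginal is $\mathcal{N}(0, 1 + s^2)$ and the true posterior is $\mathcal{N}\left(\frac{X}{1 + s^2}, \frac{s^2}{1 + s^2}\right)$. All names are assumptions.

```python
import math

def log_marginal(x, s2):
    """log P(x) for the toy model, where X ~ N(0, 1 + s2)."""
    return -0.5 * math.log(2.0 * math.pi * (1.0 + s2)) - x * x / (2.0 * (1.0 + s2))

def elbo(x, s2, m, v):
    """E_Q[log P(x | z)] - KL(Q || P(z)) with Q = N(m, v) and prior N(0, 1)."""
    expected_log_lik = (-0.5 * math.log(2.0 * math.pi * s2)
                        - ((x - m) ** 2 + v) / (2.0 * s2))
    kl_to_prior = 0.5 * (v + m * m - 1.0 - math.log(v))
    return expected_log_lik - kl_to_prior

def true_posterior(x, s2):
    """Mean and variance of P(z | x) for the toy model."""
    return x / (1.0 + s2), s2 / (1.0 + s2)

if __name__ == "__main__":
    x, s2 = 0.8, 0.25
    m, v = true_posterior(x, s2)
    print(log_marginal(x, s2), elbo(x, s2, m, v), elbo(x, s2, 0.0, 1.0))
```

The bound is tight only when $Q$ equals the true posterior; any other $Q$ gives a strictly smaller value, and the gap is exactly $\mathrm{KL}(Q(z \mid X) \,\|\, P(z \mid X))$.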
27 Remarks. Detailed remarks are not presented here. Interpretation of the RHS: the two terms have meanings in information theory. Separating the RHS by sample. Regularization term: it can be found by some transformation of the RHS. Sampling for $Q(z \mid X)$: the original paper [2] expresses this distribution with $g(x, \epsilon)$, where $\epsilon \sim p$ independently; a restriction on $p$ is needed. [2] D. P. Kingma and M. Welling (2013). Auto-encoding variational Bayes. In: arXiv preprint arXiv:
28 Section 3 Extensions of VAEs
29 Comparison Versus GAN. Both are recent deep generative models. The biggest advantage of VAEs is the clean probabilistic formulation that comes from maximizing a lower bound on the log-likelihood. VAEs are also usually easier to train and get working: they are relatively easy to implement and robust to hyperparameter choices. GANs are better at generating visual features; the output of VAEs is sometimes blurry. More detailed discussions can be found on Reddit.
30 Conditional Variational Autoencoders. Original problem: given an input dataset $X$ and outputs $Y$, we want to create a model $P(Y \mid X)$ that maximizes the probability of the ground-truth distribution. Example: generating handwritten digits. We want to add digits to an existing string of digits written by a single person. A standard regression model fails in this situation, because it ends up generating an average image that minimizes the distance to all plausible outputs, which may look like a meaningless blur. CVAEs, however, allow us to tackle problems where the input-to-output mapping is one-to-many, without requiring us to explicitly specify the structure of the output distribution.
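The averaging failure of plain regression is easy to reproduce with a scalar stand-in for images. The sketch below assumes that, for one fixed input, the correct outputs are $-1$ and $+1$ with equal frequency, and shows that the single prediction minimizing the mean squared error is their mean, which matches neither valid output. The data and function names are made up for illustration.

```python
def mse(prediction, targets):
    """Mean squared error of one constant prediction against all targets."""
    return sum((prediction - y) ** 2 for y in targets) / len(targets)

def best_constant_prediction(targets):
    """The constant that minimizes the mean squared error is the sample mean."""
    return sum(targets) / len(targets)

if __name__ == "__main__":
    targets = [-1.0, 1.0] * 50   # two equally likely correct outputs
    best = best_constant_prediction(targets)
    print(best, mse(best, targets), mse(1.0, targets))
```

A model with a latent variable can instead commit to one mode per sample of $z$, which is the one-to-many behaviour the slide refers to.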
32 Conditional Variational Autoencoders Figure: Flow chart for the CVAE algorithm.
33 Conditional Variational Autoencoders. $P(Y \mid z, X) = \mathcal{N}(Y \mid f(z, X), \sigma^2 I)$; $\log P(Y \mid X) - \mathrm{KL}(Q(z \mid Y, X) \,\|\, P(z \mid Y, X)) = E_{Q(z \mid Y, X)}[\log P(Y \mid z, X)] - \mathrm{KL}(Q(z \mid Y, X) \,\|\, P(z \mid X))$.
34 VAE-GAN [3]. Combine a VAE with a GAN by collapsing the decoder and the generator into one network, since both map a standard Gaussian latent variable to $X$. Figure: Overview of the VAE-GAN algorithm. [3] A. B. L. Larsen et al. (2015). Autoencoding beyond pixels using a learned similarity metric. In: arXiv preprint arXiv:
35 Instead of analyzing the error element-wise, VAE-GAN analyzes the error feature-wise, where the features are produced by the discriminator. The generator and the decoder share their parameters. Three kinds of errors are optimized simultaneously. Figure: Flow of the VAE-GAN algorithm. Grey arrows represent the terms in the training objective.
36 That's all. Thanks!
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 305 Part VII
More informationEnergy-Based Generative Adversarial Network
Energy-Based Generative Adversarial Network Energy-Based Generative Adversarial Network J. Zhao, M. Mathieu and Y. LeCun Learning to Draw Samples: With Application to Amoritized MLE for Generalized Adversarial
More informationAn Information Theoretic Interpretation of Variational Inference based on the MDL Principle and the Bits-Back Coding Scheme
An Information Theoretic Interpretation of Variational Inference based on the MDL Principle and the Bits-Back Coding Scheme Ghassen Jerfel April 2017 As we will see during this talk, the Bayesian and information-theoretic
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationUNSUPERVISED LEARNING
UNSUPERVISED LEARNING Topics Layer-wise (unsupervised) pre-training Restricted Boltzmann Machines Auto-encoders LAYER-WISE (UNSUPERVISED) PRE-TRAINING Breakthrough in 2006 Layer-wise (unsupervised) pre-training
More informationVariational Autoencoders. Presented by Alex Beatson Materials from Yann LeCun, Jaan Altosaar, Shakir Mohamed
Variational Autoencoders Presented by Alex Beatson Materials from Yann LeCun, Jaan Altosaar, Shakir Mohamed Contents 1. Why unsupervised learning, and why generative models? (Selected slides from Yann
More informationGENERATIVE ADVERSARIAL LEARNING
GENERATIVE ADVERSARIAL LEARNING OF MARKOV CHAINS Jiaming Song, Shengjia Zhao & Stefano Ermon Computer Science Department Stanford University {tsong,zhaosj12,ermon}@cs.stanford.edu ABSTRACT We investigate
More informationDeep Generative Models
Deep Generative Models Durk Kingma Max Welling Deep Probabilistic Models Worksop Wednesday, 1st of Oct, 2014 D.P. Kingma Deep generative models Transformations between Bayes nets and Neural nets Transformation
More informationCitation for published version (APA): Kingma, D. P. (2017). Variational inference & deep learning: A new synthesis
UvA-DARE (Digital Academic Repository) Variational inference & deep learning Kingma, D.P. Link to publication Citation for published version (APA): Kingma, D. P. (2017). Variational inference & deep learning:
More informationDenoising Criterion for Variational Auto-Encoding Framework
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) Denoising Criterion for Variational Auto-Encoding Framework Daniel Jiwoong Im, Sungjin Ahn, Roland Memisevic, Yoshua
More informationStochastic Variational Inference for Gaussian Process Latent Variable Models using Back Constraints
Stochastic Variational Inference for Gaussian Process Latent Variable Models using Back Constraints Thang D. Bui Richard E. Turner tdb40@cam.ac.uk ret26@cam.ac.uk Computational and Biological Learning
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More informationExpectation Propagation in Dynamical Systems
Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationPROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY. Arto Klami
PROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY Arto Klami 1 PROBABILISTIC PROGRAMMING Probabilistic programming is to probabilistic modelling as deep learning is to neural networks (Antti Honkela,
More informationCheng Soon Ong & Christian Walder. Canberra February June 2017
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2017 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 679 Part XIX
More informationBlack-box α-divergence Minimization
Black-box α-divergence Minimization José Miguel Hernández-Lobato, Yingzhen Li, Daniel Hernández-Lobato, Thang Bui, Richard Turner, Harvard University, University of Cambridge, Universidad Autónoma de Madrid.
More informationLDA with Amortized Inference
LDA with Amortied Inference Nanbo Sun Abstract This report describes how to frame Latent Dirichlet Allocation LDA as a Variational Auto- Encoder VAE and use the Amortied Variational Inference AVI to optimie
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationCAUSAL GAN: LEARNING CAUSAL IMPLICIT GENERATIVE MODELS WITH ADVERSARIAL TRAINING
CAUSAL GAN: LEARNING CAUSAL IMPLICIT GENERATIVE MODELS WITH ADVERSARIAL TRAINING (Murat Kocaoglu, Christopher Snyder, Alexandros G. Dimakis & Sriram Vishwanath, 2017) Summer Term 2018 Created for the Seminar
More informationStochastic Variational Inference
Stochastic Variational Inference David M. Blei Princeton University (DRAFT: DO NOT CITE) December 8, 2011 We derive a stochastic optimization algorithm for mean field variational inference, which we call
More informationVariational Inference. Sargur Srihari
Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of Discussion Functionals Calculus of Variations Maximizing a Functional Finding Approximation to a Posterior Minimizing K-L divergence Factorized
More informationMachine Learning Techniques for Computer Vision
Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM
More informationAdaGAN: Boosting Generative Models
AdaGAN: Boosting Generative Models Ilya Tolstikhin ilya@tuebingen.mpg.de joint work with Gelly 2, Bousquet 2, Simon-Gabriel 1, Schölkopf 1 1 MPI for Intelligent Systems 2 Google Brain Radford et al., 2015)
More informationPROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY
PROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY Arto Klami Adapted from my talk in AIHelsinki seminar Dec 15, 2016 1 MOTIVATING INTRODUCTION Most of the artificial intelligence success stories
More informationMachine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?
Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone extension 6372 Email: sbh11@cl.cam.ac.uk www.cl.cam.ac.uk/ sbh11/ Unsupervised learning Can we find regularity
More informationSupplementary Materials for: f-gan: Training Generative Neural Samplers using Variational Divergence Minimization
Supplementary Materials for: f-gan: Training Generative Neural Samplers using Variational Divergence Minimization Sebastian Nowozin, Botond Cseke, Ryota Tomioka Machine Intelligence and Perception Group
More information