Variational AutoEncoder: An Introduction and Recent Perspectives

1 Riemannian optimization methods for deep learning (project co-financed by the European Regional Development Fund through the Competitiveness Operational Programme)
Variational AutoEncoder: An Introduction and Recent Perspectives
Luigi Malagò, Alexandra Peste, and Septimia Sarbu
Romanian Institute of Science and Technology
Politecnico di Milano, 23 February 2018

2 Outline
1. Plan of the presentation
2. General View of Variational Autoencoders: Introduction; Research Directions
3. Work-in-progress: Using Gaussian Graphical Models; Geometry of the Latent Space
4. Future Work

3 Plan of the presentation
Overview of variational inference in deep learning: the general algorithm, research directions, and a quick review of the existing literature
Present some of the work we have done so far: the use of graphical models for introducing correlations between the latent variables; analysis of the geometry of the latent space
Present some ideas and questions that we have been thinking about, along with possible research directions

4 Outline
1. Plan of the presentation
2. General View of Variational Autoencoders: Introduction; Research Directions
3. Work-in-progress: Using Gaussian Graphical Models; Geometry of the Latent Space
4. Future Work

5 Generative Models in Deep Learning
[Figure: real images vs. images generated by a model, from [12].]

6 Notations for Bayesian Inference
X, Z: multivariate random variables, Z continuous, with probability density functions (pdfs) p_θ(x) and p_θ(z) respectively, with parameters θ; p_θ(z) is the prior and p_θ(x) the marginal;
p_θ(x, z): pdf of the joint random variable (X, Z), with parameters θ;
p_θ(x|z), p_θ(z|x): pdfs of the random variables X|Z = z and Z|X = x; p_θ(z|x) is the posterior.

7 General Setting
Formulation of the problem: the continuous latent r.v. Z generates X through a differentiable function f_θ(·), such that the marginal ∫ p_θ(x|z) p_θ(z) dz is intractable. The goal is inference, i.e., finding p_θ(z|x).
Variational inference [1] approximates the true posterior p_θ(z|x) with q_φ(z|x), by minimizing the Kullback-Leibler divergence KL(q_φ(z|x) || p_θ(z|x)).
[Diagram: z ~ q_φ(z|x) is mapped to x through f_θ(·).]
Approach to the solution: maximizing a lower bound of the log-likelihood.

10 Variational Inference I
Deriving the lower bound:
ln p_θ(x) = ln ∫ q_φ(z|x) [p_θ(x, z) / q_φ(z|x)] dz ≥ ∫ q_φ(z|x) ln [p_θ(x, z) / q_φ(z|x)] dz   (Jensen's inequality)
Evidence lower bound:
L(θ, φ; x) := E_{q_φ(z|x)}[ln p_θ(x, z) - ln q_φ(z|x)] ≤ ln p_θ(x)
Minimizing the KL ⟺ maximizing the lower bound:
E_{q_φ(z|x)}[ln p_θ(x, z) - ln q_φ(z|x)] = ln p_θ(x) - KL(q_φ(z|x) || p_θ(z|x))
The maximum of the lower bound is the log-likelihood, obtained when KL(q_φ(z|x) || p_θ(z|x)) = 0. Thus, the two problems are equivalent.

12 Variational Inference II
Optimizing the lower bound maximizes the log-likelihood.
The distribution of X can be approximated with importance sampling:
ln p_θ(x) ≈ ln (1/S) Σ_{i=1}^{S} p_θ(x|z^(i)) p_θ(z^(i)) / q_φ(z^(i)|x),   where z^(i) ~ q_φ(·|x).
Fixing the family of distributions for the random variables, e.g., assuming they are Gaussians, we move from variational calculus to ordinary optimization of the parameters. The problem becomes:
max_{θ,φ} E_{q_φ(z|x)}[ln p_θ(x, z) - ln q_φ(z|x)]
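As a concrete (purely illustrative) reading of this estimator, the NumPy sketch below assumes a standard Gaussian prior, a diagonal-Gaussian encoder with given mean and variance, and a hypothetical decode(z) helper returning the parameters of a diagonal-Gaussian p_θ(x|z); none of these choices come from the slides themselves.

```python
import numpy as np

def log_diag_gaussian(x, mu, var):
    """Log-density of a diagonal Gaussian, summed over the last axis."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=-1)

def importance_sampled_log_px(x, enc_mu, enc_var, decode, S=100):
    """ln p(x) ~= ln (1/S) sum_i p(x|z_i) p(z_i) / q(z_i|x), with z_i ~ q(z|x)."""
    k = enc_mu.shape[0]
    z = enc_mu + np.sqrt(enc_var) * np.random.randn(S, k)   # z_i ~ q(z|x)
    dec_mu, dec_var = decode(z)                              # hypothetical decoder: parameters of p(x|z_i)
    log_w = (log_diag_gaussian(x, dec_mu, dec_var)           # ln p(x|z_i)
             + log_diag_gaussian(z, 0.0, 1.0)                # ln p(z_i), standard Gaussian prior
             - log_diag_gaussian(z, enc_mu, enc_var))        # ln q(z_i|x)
    m = log_w.max()                                          # stabilized log-mean-exp
    return m + np.log(np.mean(np.exp(log_w - m)))
```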

14 Variational Autoencoders
Variational Autoencoders ([6], [11]) tackle the problem of variational inference in the context of neural networks. The parameters φ and θ of q_φ(z|x) and p_θ(x|z) are learned through two different neural networks, the encoder and the decoder, connected by a sampling layer.
[Diagram: Encoder Network → sampling layer → Decoder Network.]
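To fix ideas, a minimal PyTorch sketch of such an encoder-decoder pair follows; the layer sizes, the ReLU activations, and the Bernoulli decoder are illustrative assumptions, not the architecture used in the talk.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal VAE: Gaussian encoder q_phi(z|x), Bernoulli decoder p_theta(x|z)."""
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.enc_mu = nn.Linear(h_dim, z_dim)        # mean of q_phi(z|x)
        self.enc_logvar = nn.Linear(h_dim, z_dim)    # log-variance of q_phi(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        eps = torch.randn_like(mu)                   # sampling layer (reparameterization)
        z = mu + torch.exp(0.5 * logvar) * eps
        return self.dec(z), mu, logvar
```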

15 Applications
Encode: learn a lower-dimensional representation of the dataset, by sampling from q_φ(·|x). The dimension of the latent variable Z is assumed to be much smaller than the dimension of the data.
Generate from noise examples that resemble the ones seen during training. The prior p_θ(z) on the latent variable is assumed Gaussian N(0, I) and samples are fed through the network to output the conditional probabilities p_θ(x|z).

17 Details of the Algorithm
Encoder: q_φ(z|x), Gaussian N(µ, D) with diagonal covariance; φ is the set of parameters of the encoder.
Decoder: p_θ(x|z), Gaussian with diagonal covariance (continuous data) or a Bernoulli vector (discrete data); θ is the set of parameters of the decoder.
For a data point x, rewrite the lower bound L(θ, φ; x):
L(θ, φ; x) = E_{q_φ(z|x)}[ln p_θ(x|z)] - KL(q_φ(z|x) || p_θ(z))
where the first term is the reconstruction error and the second the regularization.
Cost function to be optimized: (1/N) Σ_{n=1}^{N} L(θ, φ; x_n), over the dataset X = {x_n}_{n=1,...,N}.
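Continuing the illustrative sketch above, for a Bernoulli decoder the two terms of -L(θ, φ; x) can be computed as below, with one Monte Carlo sample for the reconstruction term and the closed-form KL between N(µ, diag(σ²)) and N(0, I); this is an assumption-laden sketch, not the exact training code behind the experiments.

```python
import torch
import torch.nn.functional as F

def negative_elbo(x, x_recon, mu, logvar):
    """-L(theta, phi; x), summed over a batch: reconstruction error plus KL regularizer."""
    # E_q[ln p(x|z)] estimated with a single sample z -> Bernoulli log-likelihood of x
    recon = F.binary_cross_entropy(x_recon, x, reduction='sum')
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ) in closed form
    kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```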

19 Backpropagating through Stochastic Layers
Training neural networks requires computing the gradient of the cost function, using backpropagation.
Difficulty when computing ∇_φ E_{q_φ(z|x)}[ln p_θ(x|z)]: naive Monte Carlo estimation of the gradient has high variance.
The reparameterization trick: find a differentiable transformation g_φ(·) and a random variable ε with pdf p(ε), such that Z = g_φ(ε). Then
E_{q_φ(z|x)}[ln p_θ(x|z)] = E_{p(ε)}[ln p_θ(x|g_φ(ε))]
∇_φ E_{q_φ(z|x)}[ln p_θ(x|z)] = E_{p(ε)}[∇_φ ln p_θ(x|g_φ(ε))]
Example for X ~ N(µ, Σ), with Σ = LL^T the Cholesky decomposition: X = µ + Lε, with ε ~ N(0, I).
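A toy PyTorch sketch of why the trick makes backpropagation possible; the diagonal parameterization and the quadratic loss (a stand-in for the reconstruction term) are assumptions made only for illustration.

```python
import torch

mu = torch.zeros(3, requires_grad=True)
log_sigma = torch.zeros(3, requires_grad=True)

eps = torch.randn(1000, 3)                # noise drawn independently of the parameters
z = mu + torch.exp(log_sigma) * eps       # z = g_phi(eps): deterministic, differentiable in (mu, log_sigma)
loss = (z ** 2).mean()                    # stand-in for a reconstruction-style objective
loss.backward()                           # gradients reach mu and log_sigma through the samples
print(mu.grad, log_sigma.grad)
```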

21 Limitations and Challenges
Limitations:
- The conditional independence assumption on the latent variables given the observations limits the expressive power of the approximate posterior
- Limitation on the number of active latent variables when using a hierarchy of stochastic layers [13]
Challenges:
- Difficulty when training on text data: empirical observation that the learned latent representation is not meaningful [2]
- How to improve the quality of the generated samples, in the case of a dataset of images? How can we find a better correlation between the images generated and the maximization of the lower bound?
- How to estimate the tightness of the bound?

23 Research Directions
- More complex representations for q_φ(z|x), by transforming a simple distribution through invertible differentiable functions, as in [10] and [5]
- Increased complexity of the graphical models, e.g., a hierarchy of latent variables or auxiliary variables, as in [13] and [9]
- Designing tighter bounds:
  importance-weighted estimates of the log-likelihood [3] (a code sketch follows below):
  L_K(θ, φ; x) = E_{z_1,...,z_K ~ q_φ(z|x)}[ ln (1/K) Σ_{k=1}^{K} p_θ(x, z_k) / q_φ(z_k|x) ]
  minimizing different divergences (Rényi [8], α-divergence [4])
- Overcoming the challenge of training VAEs on text data [2]
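A minimal sketch of how L_K can be estimated from K importance samples; log_w is assumed to already hold ln p_θ(x, z_k) - ln q_φ(z_k|x) for samples z_k ~ q_φ(z|x).

```python
import math
import torch

def iwae_bound(log_w):
    """Estimate L_K(theta, phi; x) = ln (1/K) sum_k p(x, z_k) / q(z_k|x)
    from the vector of log importance weights log_w, shape (K,)."""
    K = log_w.shape[0]
    return torch.logsumexp(log_w, dim=0) - math.log(K)
```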

27 Outline
1. Plan of the presentation
2. General View of Variational Autoencoders: Introduction; Research Directions
3. Work-in-progress: Using Gaussian Graphical Models; Geometry of the Latent Space
4. Future Work

28 Gaussian Graphical Models for VAE
Gaussian Graphical Models [7] introduce correlations in the latent variables.
Chain and 2D-grid topologies ⟹ sparse precision matrix P = Σ^{-1}, with the number of non-zero components linear in the dimension of the latent variable.
The encoder network outputs the mean µ and the Cholesky factor L of the precision matrix. L will have a special sparse structure and will ensure the positive definiteness of Σ.
To sample from N(µ, Σ): solve the linear system L^T u = ε, where ε ~ N(0, I), and output z = µ + u.
Sampling from N(µ, Σ) and computing KL(N(µ, Σ) || N(0, I)) can be done in linear time ⟹ introduce expressiveness without extra computational complexity.

30 Chain Topology
Chain model: z_1 - z_2 - ... - z_k
[The slide shows the corresponding k × k precision matrix P, which is tridiagonal.]
The precision matrix P is tridiagonal; the Cholesky factor of such a matrix is lower-bidiagonal.
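For the chain topology, the solve L^T u = ε from the previous slide reduces to a back-substitution with a bidiagonal matrix, which is linear in k. A NumPy sketch under the assumption that L is parameterized by its diagonal d and sub-diagonal e (an illustrative parameterization, not necessarily the one used in the experiments):

```python
import numpy as np

def sample_chain_posterior(mu, d, e):
    """Sample z ~ N(mu, P^{-1}), where the precision matrix P = L L^T has a
    lower-bidiagonal Cholesky factor L with diagonal d (length k) and
    sub-diagonal e (length k - 1)."""
    k = mu.shape[0]
    eps = np.random.randn(k)
    u = np.empty(k)
    # Back-substitution for L^T u = eps: L^T is upper-bidiagonal, so row i reads
    # d[i] * u[i] + e[i] * u[i + 1] = eps[i], and the last row is d[k-1] * u[k-1] = eps[k-1].
    u[k - 1] = eps[k - 1] / d[k - 1]
    for i in range(k - 2, -1, -1):
        u[i] = (eps[i] - e[i] * u[i + 1]) / d[i]
    return mu + u
```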

31 Grid Topology
[Diagrams: a 3 × 3 regular grid over z_1, ..., z_9, and an extended grid with additional connections.]
We use in our experiments the extended grid, which corresponds to a block-tridiagonal precision matrix P; we assume the Cholesky factor has a lower-block-bidiagonal structure.

32 Motivation for Next Research Direction
The purpose was to approximate the posterior with more complex distributions. Although the results show a slight improvement, they do not motivate the future use of these models.
A more comprehensive analysis should be made to understand the geometry of the latent space.

33 Analysis of the Representations in the Latent Space
Experiments on the MNIST dataset to understand the representation of the images in the learned latent space.
Principal Component Analysis of the latent means gives us insight into which components are relevant for the representation. Claim: components with low variation along the dataset are the ones that are not meaningful.
PCA eigenvalues of the posterior samples are very close to 1 ⟹ the KL minimization forces some components to be N(0, I).
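A sketch of how such an analysis could be run (scikit-learn assumed; latent_means would be the N × k array of posterior means over the dataset, and the 0.01 threshold is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.decomposition import PCA

def count_relevant_components(latent_means, tol=0.01):
    """PCA of the posterior means over the dataset; principal directions whose
    variance falls below tol are treated as inactive (pushed to the prior by the KL term)."""
    pca = PCA().fit(latent_means)          # latent_means: array of shape (N, k)
    variances = pca.explained_variance_    # eigenvalues of the empirical covariance
    return int(np.sum(variances > tol)), variances
```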

34 PCA Analysis 1/2
[PCA plots for latent size k = 20 with ReLU and k = 30 with ReLU activations.]

35 PCA Analysis 2/2
[PCA plots for latent size k = 30 with ReLU and k = 30 with ELU activations.]

36 PCA Plots
[Figure: PCA eigenvalue plots.]

37 Interpretation of the Plot
When training a VAE with latent size 20 on MNIST, only around 15 of the latent variables are relevant for the representation. This number remains constant when training with a larger latent size. This is a consequence of the KL regularization term in the cost function, which forces some components to be Gaussian noise.
Is this number a particularity of the dataset? What is the impact on this number when using more complicated network architectures? Would we observe the same behavior with other bounds, derived from different divergences (e.g., Rényi)?

39 Correlations Plot
[Figure: scatter plots of pairs of latent mean components across the dataset.]

40 Interpretation of the Plot
With the previous plot we want to better understand the distribution of the vector of latent means across the dataset, in order to identify the inactive components.
Distribution of pairs (µ_i, µ_j), with samples corresponding to the points in the dataset.
Inactive components are close to 0 and remain constant along the dataset.

41 Generated images
Images generated by training a VAE on MNIST, with the encoder and decoder feed-forward neural networks with two hidden layers:
[Figures: samples from the MNIST dataset; images generated after 100 epochs; images generated after 1000 epochs.]

42 Linear Separability in the Latent Space
A VAE is trained on MNIST with 2 latent variables. The plot represents the means of the posterior for each point in the dataset, colored by the corresponding class.
Linear separability of the classes in the space of latent representations.
Sampling in the latent space from the empty regions ⟹ images that are not digits.
Linear interpolation property ⟹ continuous deformation in the latent space between two different images.
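One way to quantify the separability claim is a linear probe on the latent means; the sketch below (scikit-learn assumed, with hypothetical latent_means and labels arrays) is illustrative and not the evaluation reported on the next slide.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def linear_probe_accuracy(latent_means, labels):
    """Accuracy of a linear classifier on the posterior means: a simple proxy for
    how linearly separable the classes are in the latent space."""
    tr_x, te_x, tr_y, te_y = train_test_split(latent_means, labels,
                                              test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(tr_x, tr_y)
    return clf.score(te_x, te_y)
```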

43 Classification Performance
[Figure/table of classification performance in the latent space.]

44 Outline
1. Plan of the presentation
2. General View of Variational Autoencoders: Introduction; Research Directions
3. Work-in-progress: Using Gaussian Graphical Models; Geometry of the Latent Space
4. Future Work

45 Short-Term
Linear separability of the dataset in the space of multi-dimensional latent representations
Use skew distributions to model the posterior
Study the behavior of the relevant latent components in the case of more complex posteriors, like the ones presented in [10] and [5]

46 Medium-Term
Bounds derived from different divergences (e.g., Rényi, α-divergence):
- impact of the divergence parameter on the tightness of the bounds
- relevant components in the latent space, and how their number changes
Geometric methods for training VAEs:
- the use of the natural gradient
- study the geometry of the latent space
- use Riemannian optimization methods that exploit some properties of the space of the latent variables
Extend the study to different types of generative models, e.g., Generative Adversarial Networks (GANs), Restricted Boltzmann Machines (RBMs).

47 References I
[1] David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variational inference: A review for statisticians. Journal of the American Statistical Association.
[2] Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. arXiv preprint.
[3] Yuri Burda, Roger Grosse, and Ruslan Salakhutdinov. Importance weighted autoencoders. arXiv preprint.
[4] José Miguel Hernández-Lobato, Yingzhen Li, Mark Rowland, Daniel Hernández-Lobato, Thang D. Bui, and Richard E. Turner. Black-box α-divergence minimization.

48 References II
[5] Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Improving variational autoencoders with inverse autoregressive flow. In Advances in Neural Information Processing Systems.
[6] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes.
[7] Steffen L. Lauritzen. Graphical Models, volume 17. Clarendon Press.
[8] Yingzhen Li and Richard E. Turner. Rényi divergence variational inference. In Advances in Neural Information Processing Systems.

49 References III
[9] Lars Maaløe, Casper Kaae Sønderby, Søren Kaae Sønderby, and Ole Winther. Auxiliary deep generative models. arXiv preprint.
[10] Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. arXiv preprint.
[11] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models.
[12] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems.

50 References IV
[13] Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. Ladder variational autoencoders. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29. Curran Associates, Inc.

51 Questions?

53 Transylvanian Machine Learning Summer School
July 2018, Cluj-Napoca, Romania
Registration Deadline: 30 March 2018
More details at

54 Thank You!
