arxiv: v2 [stat.ml] 15 Aug 2017


Workshop track - ICLR 2017

REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS

Chris Cremer, Quaid Morris & David Duvenaud
Department of Computer Science, University of Toronto
{ccremer,duvenaud}@cs.toronto.edu, {quaid.morris}@utoronto.ca

ABSTRACT

The standard interpretation of importance-weighted autoencoders is that they maximize a tighter lower bound on the marginal likelihood than the standard evidence lower bound. We give an alternate interpretation of this procedure: that it optimizes the standard variational lower bound, but using a more complex distribution. We formally derive this result, present a tighter lower bound, and visualize the implicit importance-weighted distribution.

1 BACKGROUND

The importance-weighted autoencoder (IWAE; Burda et al. (2016)) is a variational inference strategy capable of producing arbitrarily tight evidence lower bounds. IWAE maximizes the following multi-sample evidence lower bound (ELBO):

$$\log p(x) \ge \mathbb{E}_{z_1 \ldots z_k \sim q(z|x)}\left[\log\left(\frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)}\right)\right] = L_{IWAE}[q] \tag{IWAE ELBO}$$

which is a tighter lower bound than the ELBO maximized by the variational autoencoder (VAE; Kingma & Welling (2014)):

$$\log p(x) \ge \mathbb{E}_{z \sim q(z|x)}\left[\log\left(\frac{p(x,z)}{q(z|x)}\right)\right] = L_{VAE}[q]. \tag{VAE ELBO}$$

2 DEFINING THE IMPLICIT DISTRIBUTION $\tilde{q}_{IW}$

In this section, we derive the implicit distribution that arises from importance sampling from a distribution $p$ using $q$ as a proposal distribution. Given a batch of samples $z_2 \ldots z_k$ from $q(z|x)$, the following is the unnormalized importance-weighted distribution:

$$\tilde{q}_{IW}(z|x,z_{2:k}) = \frac{\frac{p(x,z)}{q(z|x)}}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}}\, q(z|x) = \frac{p(x,z)}{\frac{1}{k}\left(\frac{p(x,z)}{q(z|x)} + \sum_{j=2}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right)} \tag{1}$$

where $z_1 \equiv z$ in the first expression.

Figure 1: Approximations to a complex true distribution, defined via $q_{EW}$. As $k$ grows, this approximation approaches the true distribution. (Panels: true posterior, $k = 1$, $k = 10$, $k = 100$.)
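As a rough numerical illustration of equation (1), the sketch below evaluates $\tilde{q}_{IW}$ on a grid. The toy model ($z \sim N(0,1)$, $x|z \sim N(z,1)$), the proposal $q(z|x) = N(0.3, 1.2^2)$, the observation `x_obs`, and all function names are assumptions made for this example; none of them come from the paper.

```python
import numpy as np

# Assumed toy setup (not from the paper): z ~ N(0, 1), x | z ~ N(z, 1),
# with a deliberately mismatched proposal q(z|x) = N(0.3, 1.2^2).
Q_MEAN, Q_STD = 0.3, 1.2

def log_normal(v, mean, std):
    """Log density of N(mean, std^2) at v."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((v - mean) / std) ** 2

def log_joint(x, z):
    """log p(x, z) = log p(z) + log p(x | z) for the toy model."""
    return log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)

def log_q(z):
    """log q(z | x) for the assumed proposal."""
    return log_normal(z, Q_MEAN, Q_STD)

def q_iw(z, x, z_rest):
    """Unnormalized q_IW(z | x, z_{2:k}) of equation (1); z_rest holds z_2..z_k."""
    k = len(z_rest) + 1
    w_z = np.exp(log_joint(x, z) - log_q(z))                     # p(x,z) / q(z|x)
    w_rest = np.exp(log_joint(x, z_rest) - log_q(z_rest)).sum()  # weights of z_2..z_k
    return np.exp(log_joint(x, z)) / ((w_z + w_rest) / k)

rng = np.random.default_rng(0)
x_obs = 1.0
grid = np.linspace(-8.0, 8.0, 4001)
dz = grid[1] - grid[0]

# k = 1: there are no extra samples, so q_IW reduces to q(z|x).
q_iw_k1 = q_iw(grid, x_obs, np.array([]))

# k = 5: each batch z_2..z_5 gives a different curve whose area need not be 1.
masses = [(q_iw(grid, x_obs, rng.normal(Q_MEAN, Q_STD, size=4)) * dz).sum()
          for _ in range(3)]
print("areas under q_IW for three batches:", masses)
```

The areas are computed with a plain Riemann sum on the grid, which is adequate here because the integrand is smooth and decays well inside the grid limits.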

Here are some properties of the approximate IWAE posterior:

• When $k = 1$, $\tilde{q}_{IW}(z|x,z_{2:k})$ equals $q(z|x)$.
• When $k > 1$, the form of $\tilde{q}_{IW}(z|x,z_{2:k})$ depends on the true posterior $p(z|x)$.
• As $k \to \infty$, $\mathbb{E}_{z_2 \ldots z_k}[\tilde{q}_{IW}(z|x,z_{2:k})]$ approaches the true posterior $p(z|x)$ pointwise.

See the appendix for details. Importantly, $\tilde{q}_{IW}(z|x,z_{2:k})$ is dependent on the batch of samples $z_2 \ldots z_k$. See Fig. 3 in the appendix for a visualization of $\tilde{q}_{IW}$ with different batches of $z_2 \ldots z_k$.

2.1 RECOVERING THE IWAE BOUND FROM THE VAE BOUND

Here we show that the IWAE ELBO is equivalent to the VAE ELBO in expectation, but with a more flexible, unnormalized $\tilde{q}_{IW}$ distribution, implicitly defined by importance reweighting. If we replace $q(z|x)$ with $\tilde{q}_{IW}(z|x,z_{2:k})$ and take an expectation over $z_2 \ldots z_k$, then we recover the IWAE ELBO:

\begin{align}
\mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[L_{VAE}[\tilde{q}_{IW}(z|z_{2:k})]\right]
&= \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\int_z \tilde{q}_{IW}(z|z_{2:k}) \log\left(\frac{p(x,z)}{\tilde{q}_{IW}(z|x,z_{2:k})}\right) dz\right] \\
&= \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\int_z \tilde{q}_{IW}(z|z_{2:k}) \log\left(\frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)}\right) dz\right] \\
&= \mathbb{E}_{z_1 \ldots z_k \sim q(z|x)}\left[\log\left(\frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)}\right)\right] = L_{IWAE}[q]
\end{align}

Here $z_1 \equiv z$ inside the sums. For a more detailed derivation, see the appendix. Note that we are abusing the VAE lower bound notation because this implies an expectation over an unnormalized distribution. Consequently, we replace the expectation with an equivalent integral.

2.2 EXPECTED IMPORTANCE WEIGHTED DISTRIBUTION $q_{EW}$

We can achieve a tighter lower bound than $L_{IWAE}[q]$ by taking the expectation over $z_2 \ldots z_k$ of $\tilde{q}_{IW}$. The expected importance-weighted distribution $q_{EW}(z|x)$ is a distribution given by:

$$q_{EW}(z|x) = \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\tilde{q}_{IW}(z|x,z_{2:k})\right] = \mathbb{E}_{z_2 \ldots z_k}\left[\frac{p(x,z)}{\frac{1}{k}\left(\frac{p(x,z)}{q(z|x)} + \sum_{j=2}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right)}\right] \tag{2}$$

See section 5.2 for a proof that $q_{EW}$ is a normalized distribution. Using $q_{EW}$ in the VAE ELBO, $L_{VAE}[q_{EW}]$, results in an upper bound of $L_{IWAE}[q]$. See section 5.3 for the proof, which is a special case of the proof in Naesseth et al. (2017). The procedure to sample from $q_{EW}(z|x)$ is shown in Algorithm 1. It is equivalent to sampling-importance-resampling (SIR).
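Equation (2) can be checked numerically on a small example. The sketch below averages $\tilde{q}_{IW}$ over random batches, in the spirit of Algorithm 2, then checks that the result integrates to roughly 1 and lies closer to the exact posterior than $q$ does. The toy model ($z \sim N(0,1)$, $x|z \sim N(z,1)$, hence posterior $N(x/2, 1/2)$), the proposal $N(0.3, 1.2^2)$, the L1 distance used for comparison, and all names are assumptions of this example rather than anything specified in the paper.

```python
import numpy as np

# Assumed toy setup (not from the paper): z ~ N(0, 1), x | z ~ N(z, 1), so the
# exact posterior is N(x/2, 1/2); the proposal is q(z|x) = N(0.3, 1.2^2).
Q_MEAN, Q_STD = 0.3, 1.2

def normal_pdf(v, mean, std):
    """Density of N(mean, std^2) at v."""
    return np.exp(-0.5 * ((v - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def joint(x, z):
    """p(x, z) = p(z) p(x | z) for the toy model."""
    return normal_pdf(z, 0.0, 1.0) * normal_pdf(x, z, 1.0)

def estimate_q_ew(grid, x, k, num_batches, rng):
    """Monte Carlo estimate of q_EW(z|x) on `grid`: the average of q_IW over batches."""
    p_grid = joint(x, grid)
    w_grid = p_grid / normal_pdf(grid, Q_MEAN, Q_STD)        # p(x,z)/q(z|x) on the grid
    total = np.zeros_like(grid)
    for _ in range(num_batches):
        z_rest = rng.normal(Q_MEAN, Q_STD, size=k - 1)       # z_2..z_k ~ q(z|x)
        w_rest = (joint(x, z_rest) / normal_pdf(z_rest, Q_MEAN, Q_STD)).sum()
        total += p_grid / ((w_grid + w_rest) / k)            # one q_IW curve
    return total / num_batches

rng = np.random.default_rng(0)
x_obs = 1.0
grid = np.linspace(-6.0, 6.0, 1201)
dz = grid[1] - grid[0]

q_ew = estimate_q_ew(grid, x_obs, k=5, num_batches=2000, rng=rng)
posterior = normal_pdf(grid, x_obs / 2, np.sqrt(0.5))
q_base = normal_pdf(grid, Q_MEAN, Q_STD)

mass = (q_ew * dz).sum()                                     # should be close to 1
l1_q_ew = (np.abs(q_ew - posterior) * dz).sum()
l1_q = (np.abs(q_base - posterior) * dz).sum()
print(mass, l1_q_ew, l1_q)
```

The L1 comparison is only a convenient visual proxy; the paper's own closeness statement (section 5.4) is in terms of KL divergence.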
2.3 VISUALIZING THE NONPARAMETRIC APPROXIMATE POSTERIOR

The IWAE approximating distribution is nonparametric in the sense that, as the true posterior grows more complex, so does the shape of $\tilde{q}_{IW}$ and $q_{EW}$. This makes plotting these distributions challenging. A kernel-density-estimation approach could be used, but requires many samples. Thankfully, equations (1) and (2) give us a simple and fast way to approximately plot $\tilde{q}_{IW}$ and $q_{EW}$ without introducing artifacts due to kernel density smoothing.

Figure 1 visualizes $q_{EW}$ on a 2D distribution approximation problem using Algorithm 2. The base distribution $q$ is a Gaussian. As we increase the number of samples $k$ and keep the base distribution fixed, we see that the approximation approaches the true distribution. See section 5.6 for 1D visualizations of $\tilde{q}_{IW}$ and $q_{EW}$ with $k = 2$.

3 RESAMPLING FOR PREDICTION

During training, we draw samples from $q$ and implicitly weight them through the IWAE ELBO. After training, we need to explicitly reweight samples from $q$.
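The explicit reweighting step is the sampling-importance-resampling procedure of Algorithm 1. A minimal sketch follows, assuming a toy model ($z \sim N(0,1)$, $x|z \sim N(z,1)$, exact posterior $N(x/2, 1/2)$) and a proposal $q(z|x) = N(0.3, 1.2^2)$; the model, the function name `sample_q_ew`, and the log-space stabilisation of the weights are choices made for this example, not details given in the paper.

```python
import numpy as np

# Assumed toy setup (not from the paper): z ~ N(0, 1), x | z ~ N(z, 1),
# proposal q(z|x) = N(0.3, 1.2^2).
Q_MEAN, Q_STD = 0.3, 1.2

def log_normal(v, mean, std):
    """Log density of N(mean, std^2) at v."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((v - mean) / std) ** 2

def sample_q_ew(x, k, rng):
    """One draw from q_EW(z|x) by sampling-importance-resampling (Algorithm 1)."""
    z = rng.normal(Q_MEAN, Q_STD, size=k)                      # z_i ~ q(z|x)
    log_w = (log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)   # log p(x, z_i)
             - log_normal(z, Q_MEAN, Q_STD))                   # minus log q(z_i|x)
    w = np.exp(log_w - log_w.max())                            # rescaled for stability
    w_tilde = w / w.sum()                                      # normalized weights
    j = rng.choice(k, p=w_tilde)                               # j ~ Categorical(w_tilde)
    return z[j]

rng = np.random.default_rng(0)
x_obs = 1.0
samples = np.array([sample_q_ew(x_obs, k=50, rng=rng) for _ in range(5000)])

# For x = 1 the exact posterior has mean 0.5 and variance 0.5, whereas the
# proposal has mean 0.3 and variance 1.44.
print(samples.mean(), samples.var())
```

Subtracting the maximum log-weight before exponentiating leaves the normalized weights unchanged and only guards against underflow.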

Algorithm 1 Sampling $q_{EW}(z|x)$
  $k \leftarrow$ number of importance samples
  for $i$ in $1 \ldots k$ do
    $z_i \sim q(z|x)$
    $w_i = p(x,z_i) / q(z_i|x)$
  Each $\tilde{w}_i = w_i / \sum_{i=1}^{k} w_i$
  $j \sim \mathrm{Categorical}(\tilde{w})$
  Return $z_j$

Algorithm 2 Plotting $q_{EW}(z|x)$
  $k \leftarrow$ number of importance samples
  $S \leftarrow$ number of function samples
  $L \leftarrow$ locations to plot
  $\hat{f} \leftarrow \mathrm{zeros}(|L|)$
  for $s$ in $1 \ldots S$ do
    $z_2 \ldots z_k \sim q(z|x)$
    $\hat{p}(x) = \sum_{i=2}^{k} p(x,z_i) / q(z_i|x)$
    for $z$ in $L$ do
      $\hat{f}[z] \mathrel{+}= p(x,z) \,/\, \left(\frac{1}{k}\left(\frac{p(x,z)}{q(z|x)} + \hat{p}(x)\right)\right)$
  Return $\hat{f}/S$

Figure 2: Reconstructions of MNIST samples from $q(z|x)$ and $q_{EW}$. The model was trained by maximizing the IWAE ELBO with $k = 50$ and 2 latent dimensions. The reconstructions from $q(z|x)$ are greatly improved with the sampling-resampling step of $q_{EW}$. (Rows: real, sample from $q(z|x)$, sample from $q_{EW}(z|x)$.)

In Figure 2, we demonstrate the need to sample from $q_{EW}$ rather than $q(z|x)$ for reconstructing MNIST digits. We trained the model to maximize the IWAE ELBO with $k = 50$ and 2 latent dimensions, similar to Appendix C in Burda et al. (2016). When we sample from $q(z|x)$ and reconstruct the samples, we see a number of anomalies. However, if we perform the sampling-resampling step (Alg. 1), then the reconstructions are much more accurate. The intuition here is that we trained the model with $q_{EW}$ with $k = 50$ then sampled from $q(z|x)$ ($q_{EW}$ with $k = 1$), which are very different distributions, as seen in Fig. 1.

4 DISCUSSION

Bachman & Precup (2015) also showed that the IWAE objective is equivalent to stochastic variational inference with a proposal distribution corrected towards the true posterior via normalized importance sampling. We build on this idea by further examining $\tilde{q}_{IW}$ and by providing visualizations to help better grasp the interpretation. To summarize our observations, the following is the ordering of lower bounds given specific proposal distributions,

$$\log p(x) \ge L_{VAE}[q_{EW}] \ge \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[L_{VAE}[\tilde{q}_{IW}(z|z_{2:k})]\right] = L_{IWAE}[q] \ge L_{VAE}[q]$$

In light of this, IWAE can be seen as increasing the complexity of the approximate distribution $q$, similar to other methods that increase the complexity of $q$, such as Normalizing Flows (Jimenez Rezende & Mohamed, 2015), Variational Boosting (Miller et al., 2016) or Hamiltonian variational inference (Salimans et al., 2015). With this interpretation in mind, we can possibly generalize $\tilde{q}_{IW}$ to be applicable to other divergence measures. An interesting avenue of future work is the comparison of IW-based variational families with alpha-divergences or operator variational objectives.
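Part of this ordering can be checked by simulation. The sketch below estimates $L_{IWAE}[q]$ for several $k$ (with $k = 1$ giving $L_{VAE}[q]$) and compares the estimates with the exact $\log p(x)$. It assumes a toy model ($z \sim N(0,1)$, $x|z \sim N(z,1)$, so $x \sim N(0,2)$) and a proposal $q(z|x) = N(0.3, 1.2^2)$; these and the function name `iwae_bound` are illustrative assumptions. $L_{VAE}[q_{EW}]$ is left out because it requires the density of $q_{EW}$, which has no closed form here.

```python
import numpy as np

# Assumed toy setup (not from the paper): z ~ N(0, 1), x | z ~ N(z, 1),
# so the marginal is x ~ N(0, 2); proposal q(z|x) = N(0.3, 1.2^2).
Q_MEAN, Q_STD = 0.3, 1.2

def log_normal(v, mean, std):
    """Log density of N(mean, std^2) at v."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((v - mean) / std) ** 2

def iwae_bound(x, k, num_batches, rng):
    """Monte Carlo estimate of L_IWAE[q] with k samples; k = 1 gives L_VAE[q]."""
    z = rng.normal(Q_MEAN, Q_STD, size=(num_batches, k))
    log_w = (log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
             - log_normal(z, Q_MEAN, Q_STD))                  # log p(x,z_i)/q(z_i|x)
    # Log of the average weight in each batch, computed stably (log-mean-exp).
    m = log_w.max(axis=1, keepdims=True)
    log_mean_w = m[:, 0] + np.log(np.exp(log_w - m).mean(axis=1))
    return log_mean_w.mean()

rng = np.random.default_rng(0)
x_obs = 1.0
log_px = log_normal(x_obs, 0.0, np.sqrt(2.0))                 # exact log p(x)
bounds = {k: iwae_bound(x_obs, k, 20000, rng) for k in (1, 5, 20)}
print(bounds, log_px)
```

With enough batches the estimates should increase with $k$ while staying below $\log p(x)$, which is the $L_{VAE}[q] \le L_{IWAE}[q] \le \log p(x)$ part of the chain.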

ACKNOWLEDGMENTS

We'd like to thank an anonymous ICLR reviewer for providing insightful future directions for this work. We'd like to thank Yuri Burda, Christian Naesseth, and Scott Linderman for bringing our attention to oversights in the paper. We'd also like to thank Christian Naesseth for the derivation in section 5.3 and for providing many helpful comments.

REFERENCES

Philip Bachman and Doina Precup. Training Deep Generative Models: Variations on a Theme. NIPS Approximate Inference Workshop, 2015.
Yuri Burda, Roger Grosse, and Ruslan Salakhutdinov. Importance weighted autoencoders. In ICLR, 2016.
Danilo Jimenez Rezende and Shakir Mohamed. Variational Inference with Normalizing Flows. In ICML, 2015.
Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. In ICLR, 2014.
Andrew C. Miller, Nicholas Foti, and Ryan P. Adams. Variational Boosting: Iteratively Refining Posterior Approximations. Advances in Approximate Bayesian Inference, NIPS Workshop, 2016.
C. A. Naesseth, S. W. Linderman, R. Ranganath, and D. M. Blei. Variational Sequential Monte Carlo. ArXiv preprint, 2017.
Tim Salimans, Diederik P. Kingma, and Max Welling. Markov chain Monte Carlo and variational inference: Bridging the gap. In ICML, 2015.

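5 APPENDIX

5.1 DETAILED DERIVATION OF THE EQUIVALENCE OF VAE AND IWAE BOUND

Here we show that the expectation over $z_2 \ldots z_k$ of the VAE lower bound with the unnormalized importance-weighted distribution $\tilde{q}_{IW}$, $L_{VAE}[\tilde{q}_{IW}(z|z_{2:k})]$, is equivalent to the IWAE bound with the original $q$ distribution, $L_{IWAE}[q]$. Inside the sums, $z_1 \equiv z$ until the change of notation in (8).

\begin{align}
\mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[L_{VAE}[\tilde{q}_{IW}(z|z_{2:k})]\right]
&= \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\int_z \tilde{q}_{IW}(z|x,z_{2:k}) \log\left(\frac{p(x,z)}{\tilde{q}_{IW}(z|x,z_{2:k})}\right) dz\right] \tag{3} \\
&= \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\int_z \tilde{q}_{IW}(z|x,z_{2:k}) \log\left(\frac{p(x,z)}{\frac{p(x,z)/q(z|x)}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}}\, q(z|x)}\right) dz\right] \tag{4} \\
&= \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\int_z \tilde{q}_{IW}(z|x,z_{2:k}) \log\left(\frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)}\right) dz\right] \tag{5} \\
&= \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\int_z \frac{\frac{p(x,z)}{q(z|x)}}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}}\, q(z|x) \log\left(\frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)}\right) dz\right] \tag{6} \\
&= \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\int_z q(z|x)\, \frac{\frac{p(x,z)}{q(z|x)}}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}} \log\left(\frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)}\right) dz\right] \tag{7} \\
&= \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\int_{z_1} q(z_1|x)\, \frac{\frac{p(x,z_1)}{q(z_1|x)}}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}} \log\left(\frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)}\right) dz_1\right] \tag{8} \\
&= \mathbb{E}_{z_1 \ldots z_k \sim q(z|x)}\left[\frac{\frac{p(x,z_1)}{q(z_1|x)}}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}} \log\left(\frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)}\right)\right] \tag{9} \\
&= \mathbb{E}_{z_1 \ldots z_k \sim q(z|x)}\left[\frac{\frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)}}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}} \log\left(\frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)}\right)\right] \tag{10} \\
&= \mathbb{E}_{z_1 \ldots z_k \sim q(z|x)}\left[\log\left(\frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)}\right)\right] \tag{11} \\
&= L_{IWAE}[q] \tag{12}
\end{align}

(8): Change of notation.
(10): $z_i$ has the same expectation as $z_1$, so we can replace $z_1$ with the sum of terms.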

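5.2 PROOF THAT $q_{EW}$ IS A NORMALIZED DISTRIBUTION

\begin{align}
\int_z q_{EW}(z|x)\, dz
&= \int_z \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\tilde{q}_{IW}(z|x,z_{2:k})\right] dz \tag{13} \\
&= \int_z \mathbb{E}_{z_2 \ldots z_k}\left[\frac{p(x,z)}{\frac{1}{k}\left(\frac{p(x,z)}{q(z|x)} + \sum_{j=2}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right)}\right] dz \tag{14} \\
&= \int_z \frac{q(z|x)}{q(z|x)}\, \mathbb{E}_{z_2 \ldots z_k}\left[\frac{p(x,z)}{\frac{1}{k}\left(\frac{p(x,z)}{q(z|x)} + \sum_{j=2}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right)}\right] dz \tag{15} \\
&= \mathbb{E}_{z \sim q(z|x)}\, \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\frac{\frac{p(x,z)}{q(z|x)}}{\frac{1}{k}\left(\frac{p(x,z)}{q(z|x)} + \sum_{j=2}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right)}\right] \tag{16} \\
&= \mathbb{E}_{z_1 \ldots z_k \sim q(z|x)}\left[\frac{\frac{p(x,z_1)}{q(z_1|x)}}{\frac{1}{k}\left(\frac{p(x,z_1)}{q(z_1|x)} + \sum_{j=2}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right)}\right] \tag{17} \\
&= \mathbb{E}_{z_1 \ldots z_k \sim q(z|x)}\left[\frac{\frac{p(x,z_1)}{q(z_1|x)}}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}}\right] \tag{18} \\
&= \frac{1}{k}\sum_{i=1}^{k} \mathbb{E}_{z_1 \ldots z_k \sim q(z|x)}\left[\frac{\frac{p(x,z_i)}{q(z_i|x)}}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}}\right] \tag{19} \\
&= \mathbb{E}_{z_1 \ldots z_k \sim q(z|x)}\left[\frac{\frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)}}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}}\right] \tag{20} \\
&= \mathbb{E}_{z_1 \ldots z_k \sim q(z|x)}\left[1\right] \tag{21} \\
&= 1 \tag{22}
\end{align}

(17): Change of notation.
(19): $z_i$ has the same expectation as $z_1$, so we can replace $z_1$ with the sum of terms.
(20): Linearity of expectation.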

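5.3 PROOF THAT $L_{VAE}[q_{EW}]$ IS AN UPPER BOUND OF $L_{IWAE}[q]$

Proof provided by Christian Naesseth.

Let $\hat{p}(x|z_{1:k}) = \frac{1}{k}\left(\frac{p(x,z_1)}{q(z_1|x)} + \sum_{j=2}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right)$, so that $q_{EW}(z_1|x) = p(x,z_1)\, \mathbb{E}_{q(z_{2:k}|x)}\left[\hat{p}(x|z_{1:k})^{-1}\right]$.

\begin{align}
L_{VAE}[q_{EW}]
&= \mathbb{E}_{q_{EW}}\left[\log\left(\frac{p(x,z_1)}{q_{EW}(z_1|x)}\right)\right] \tag{23} \\
&= \mathbb{E}_{q_{EW}}\left[\log\left(\frac{p(x,z_1)}{\mathbb{E}_{q(z_{2:k}|x)}\left[\frac{p(x,z_1)}{\hat{p}(x|z_{1:k})}\right]}\right)\right] \tag{24} \\
&= \mathbb{E}_{q_{EW}}\left[\log\left(\frac{1}{\mathbb{E}_{q(z_{2:k}|x)}\left[\hat{p}(x|z_{1:k})^{-1}\right]}\right)\right] \tag{25} \\
&= -\mathbb{E}_{q_{EW}}\left[\log\left(\mathbb{E}_{q(z_{2:k}|x)}\left[\hat{p}(x|z_{1:k})^{-1}\right]\right)\right] \tag{26} \\
&= -\int_{z_1} p(x,z_1)\, \mathbb{E}_{q(z_{2:k}|x)}\left[\hat{p}(x|z_{1:k})^{-1}\right] \log\left(\mathbb{E}_{q(z_{2:k}|x)}\left[\hat{p}(x|z_{1:k})^{-1}\right]\right) dz_1 \tag{27} \\
&\ge -\int_{z_1} p(x,z_1)\, \mathbb{E}_{q(z_{2:k}|x)}\left[\hat{p}(x|z_{1:k})^{-1} \log\left(\hat{p}(x|z_{1:k})^{-1}\right)\right] dz_1 \tag{28} \\
&= -\int_{z_1} p(x,z_1) \int_{z_{2:k}} q(z_{2:k}|x)\, \hat{p}(x|z_{1:k})^{-1} \log\left(\hat{p}(x|z_{1:k})^{-1}\right) dz_{2:k}\, dz_1 \tag{29} \\
&= -\int_{z_{1:k}} \frac{q(z_1|x)}{q(z_1|x)}\, p(x,z_1)\, q(z_{2:k}|x)\, \hat{p}(x|z_{1:k})^{-1} \log\left(\hat{p}(x|z_{1:k})^{-1}\right) dz_{1:k} \tag{30} \\
&= -\int_{z_{1:k}} \frac{p(x,z_1)}{q(z_1|x)}\, q(z_{1:k}|x)\, \hat{p}(x|z_{1:k})^{-1} \log\left(\hat{p}(x|z_{1:k})^{-1}\right) dz_{1:k} \tag{31} \\
&= \int_{z_{1:k}} \frac{\frac{p(x,z_1)}{q(z_1|x)}}{\hat{p}(x|z_{1:k})}\, q(z_{1:k}|x) \log\left(\hat{p}(x|z_{1:k})\right) dz_{1:k} \tag{32} \\
&= \int_{z_{1:k}} \frac{\frac{p(x,z_1)}{q(z_1|x)}}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}}\, q(z_{1:k}|x) \log\left(\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right) dz_{1:k} \tag{33} \\
&= \frac{1}{k}\sum_{i=1}^{k} \int_{z_{1:k}} \frac{\frac{p(x,z_i)}{q(z_i|x)}}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}}\, q(z_{1:k}|x) \log\left(\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right) dz_{1:k} \tag{34} \\
&= \int_{z_{1:k}} \frac{\frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)}}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}}\, q(z_{1:k}|x) \log\left(\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right) dz_{1:k} \tag{35} \\
&= \int_{z_{1:k}} q(z_{1:k}|x) \log\left(\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right) dz_{1:k} \tag{36} \\
&= \mathbb{E}_{q(z_{1:k}|x)}\left[\log\left(\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right)\right] \tag{37} \\
&= L_{IWAE}[q] \tag{38}
\end{align}

(28): Given that $f(A) = -A \log A$ is concave for $A > 0$, and $f(\mathbb{E}[x]) \ge \mathbb{E}[f(x)]$, then $f(\mathbb{E}[x]) = -\mathbb{E}[x] \log \mathbb{E}[x] \ge \mathbb{E}[-x \log x]$.
(30): Change of notation.
(34): $z_i$ has the same expectation as $z_1$, so we can replace $z_1$ with the sum of terms.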

5.4 PROOF THAT $q_{EW}$ IS CLOSER TO THE TRUE POSTERIOR THAN $q$

The previous section showed that $L_{IWAE}[q] \le L_{VAE}[q_{EW}]$. That is, the IWAE ELBO with the base $q$ is a lower bound to the VAE ELBO with the importance-weighted $q_{EW}$. Due to Jensen's inequality and as shown in Burda et al. (2016), we know that the IWAE ELBO is an upper bound of the VAE ELBO: $L_{IWAE}[q] \ge L_{VAE}[q]$. Furthermore, the log marginal likelihood can be factorized into $\log p(x) = L_{VAE}[q] + KL(q \| p)$, and rearranged to $KL(q \| p) = \log p(x) - L_{VAE}[q]$. Following the observations above and substituting $q_{EW}$ for $q$:

\begin{align}
KL(q_{EW} \| p) &= \log p(x) - L_{VAE}[q_{EW}] \tag{39} \\
&\le \log p(x) - L_{IWAE}[q] \tag{40} \\
&\le \log p(x) - L_{VAE}[q] = KL(q \| p) \tag{41}
\end{align}

Thus, $KL(q_{EW} \| p) \le KL(q \| p)$, meaning $q_{EW}$ is closer to the true posterior than $q$ in terms of KL divergence.

5.5 IN THE LIMIT OF THE NUMBER OF SAMPLES

Another perspective is in the limit of $k = \infty$. Recall that the marginal likelihood can be approximated by importance sampling:

$$p(x) = \mathbb{E}_{q(z|x)}\left[\frac{p(x,z)}{q(z|x)}\right] \approx \frac{1}{k}\sum_{i=1}^{k}\frac{p(x,z_i)}{q(z_i|x)} \tag{42}$$

where $z_i$ is sampled from $q(z|x)$. We see that the denominator of $\tilde{q}_{IW}$ is approximating $p(x)$. If $p(x,z)/q(z|x)$ is bounded, then it follows from the strong law of large numbers that, as $k$ approaches infinity, $\tilde{q}_{IW}$ converges to the true posterior $p(z|x)$ almost surely. This interpretation becomes clearer when we factor out the true posterior from $\tilde{q}_{IW}$:

$$\tilde{q}_{IW}(z|x,z_{2:k}) = \frac{p(x)}{\frac{1}{k}\left(\frac{p(x,z)}{q(z|x)} + \sum_{j=2}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right)}\, p(z|x) \tag{43}$$

We see that the closer the denominator becomes to $p(x)$, the closer $\tilde{q}_{IW}$ is to the true posterior.

5.6 VISUALIZING $\tilde{q}_{IW}$ AND $q_{EW}$ IN 1D

Figure 3: Visualization of 1D $\tilde{q}_{IW}$ and $q_{EW}$ distributions. The blue $p(z)$ distribution and the green $q(z)$ distribution are both normalized. The three instances of $\tilde{q}_{IW}(z|z_2)$ ($k = 2$) have different $z_2$ samples from $q(z)$ and we can see that they are unnormalized. $q_{EW}(z)$ is normalized and is the expectation over 30 $\tilde{q}_{IW}(z|z_2)$ distributions. The $\tilde{q}_{IW}$ distributions were plotted using Algorithm 2 with $S = 1$.
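The limit argument of section 5.5 can be illustrated numerically through equation (43): as $k$ grows, the denominator approaches $p(x)$ and a single $\tilde{q}_{IW}$ curve approaches the posterior. The sketch assumes a toy model ($z \sim N(0,1)$, $x|z \sim N(z,1)$, so $p(x) = N(x; 0, 2)$ and $p(z|x) = N(x/2, 1/2)$) with proposal $q(z|x) = N(0.3, 1.2^2)$, for which the weights are bounded; the model and all names are assumptions of this example.

```python
import numpy as np

# Assumed toy setup (not from the paper): z ~ N(0, 1), x | z ~ N(z, 1),
# proposal q(z|x) = N(0.3, 1.2^2), which is wider than the posterior.
Q_MEAN, Q_STD = 0.3, 1.2

def normal_pdf(v, mean, std):
    """Density of N(mean, std^2) at v."""
    return np.exp(-0.5 * ((v - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def weights(x, z):
    """Importance weights p(x, z) / q(z | x)."""
    return normal_pdf(z, 0.0, 1.0) * normal_pdf(x, z, 1.0) / normal_pdf(z, Q_MEAN, Q_STD)

rng = np.random.default_rng(0)
x_obs = 1.0
p_x = normal_pdf(x_obs, 0.0, np.sqrt(2.0))             # exact marginal likelihood
grid = np.linspace(-6.0, 6.0, 1201)
posterior = normal_pdf(grid, x_obs / 2, np.sqrt(0.5))  # exact p(z | x)

errors = {}
for k in (10, 1000, 100000):
    z_rest = rng.normal(Q_MEAN, Q_STD, size=k - 1)     # z_2..z_k ~ q(z|x)
    # Denominator of equation (43): an importance-sampling estimate of p(x).
    denom = (weights(x_obs, grid) + weights(x_obs, z_rest).sum()) / k
    q_iw = (p_x / denom) * posterior                   # equation (43)
    errors[k] = np.abs(q_iw - posterior).max()         # worst pointwise gap
print(errors)
```

Each `errors[k]` comes from a single random batch, so the values fluctuate from seed to seed; only the overall decrease with $k$ is meaningful.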

REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS

REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS Worshop trac - ICLR 207 REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS Chris Cremer, Quaid Morris & David Duvenaud Department of Computer Science University of Toronto {ccremer,duvenaud}@cs.toronto.edu

More information

Inference Suboptimality in Variational Autoencoders

Inference Suboptimality in Variational Autoencoders Inference Suboptimality in Variational Autoencoders Chris Cremer Department of Computer Science University of Toronto ccremer@cs.toronto.edu Xuechen Li Department of Computer Science University of Toronto

More information

INFERENCE SUBOPTIMALITY

INFERENCE SUBOPTIMALITY INFERENCE SUBOPTIMALITY IN VARIATIONAL AUTOENCODERS Anonymous authors Paper under double-blind review ABSTRACT Amortized inference has led to efficient approximate inference for large datasets. The quality

More information

Sandwiching the marginal likelihood using bidirectional Monte Carlo. Roger Grosse

Sandwiching the marginal likelihood using bidirectional Monte Carlo. Roger Grosse Sandwiching the marginal likelihood using bidirectional Monte Carlo Roger Grosse Ryan Adams Zoubin Ghahramani Introduction When comparing different statistical models, we d like a quantitative criterion

More information

Nonparametric Inference for Auto-Encoding Variational Bayes

Nonparametric Inference for Auto-Encoding Variational Bayes Nonparametric Inference for Auto-Encoding Variational Bayes Erik Bodin * Iman Malik * Carl Henrik Ek * Neill D. F. Campbell * University of Bristol University of Bath Variational approximations are an

More information

Variational Autoencoder

Variational Autoencoder Variational Autoencoder Göker Erdo gan August 8, 2017 The variational autoencoder (VA) [1] is a nonlinear latent variable model with an efficient gradient-based training procedure based on variational

More information

Training Deep Generative Models: Variations on a Theme

Training Deep Generative Models: Variations on a Theme Training Deep Generative Models: Variations on a Theme Philip Bachman McGill University, School of Computer Science phil.bachman@gmail.com Doina Precup McGill University, School of Computer Science dprecup@cs.mcgill.ca

More information

Natural Gradients via the Variational Predictive Distribution

Natural Gradients via the Variational Predictive Distribution Natural Gradients via the Variational Predictive Distribution Da Tang Columbia University datang@cs.columbia.edu Rajesh Ranganath New York University rajeshr@cims.nyu.edu Abstract Variational inference

More information

Deep Variational Inference. FLARE Reading Group Presentation Wesley Tansey 9/28/2016

Deep Variational Inference. FLARE Reading Group Presentation Wesley Tansey 9/28/2016 Deep Variational Inference FLARE Reading Group Presentation Wesley Tansey 9/28/2016 What is Variational Inference? What is Variational Inference? Want to estimate some distribution, p*(x) p*(x) What is

More information

DEBIASING EVIDENCE APPROXIMATIONS: ON IMPORTANCE-WEIGHTED AUTOENCODERS AND JACKKNIFE VARIATIONAL INFERENCE

DEBIASING EVIDENCE APPROXIMATIONS: ON IMPORTANCE-WEIGHTED AUTOENCODERS AND JACKKNIFE VARIATIONAL INFERENCE DEBIASING EVIDENCE APPROXIMATIONS: ON IMPORTANCE-WEIGHTED AUTOENCODERS AND JACNIFE VARIATIONAL INFERENCE Sebastian Nowozin Machine Intelligence and Perception Microsoft Research, Cambridge, U Sebastian.Nowozin@microsoft.com

More information

Generative models for missing value completion

Generative models for missing value completion Generative models for missing value completion Kousuke Ariga Department of Computer Science and Engineering University of Washington Seattle, WA 98105 koar8470@cs.washington.edu Abstract Deep generative

More information

arxiv: v1 [stat.ml] 6 Dec 2018

arxiv: v1 [stat.ml] 6 Dec 2018 missiwae: Deep Generative Modelling and Imputation of Incomplete Data arxiv:1812.02633v1 [stat.ml] 6 Dec 2018 Pierre-Alexandre Mattei Department of Computer Science IT University of Copenhagen pima@itu.dk

More information

The Success of Deep Generative Models

The Success of Deep Generative Models The Success of Deep Generative Models Jakub Tomczak AMLAB, University of Amsterdam CERN, 2018 What is AI about? What is AI about? Decision making: What is AI about? Decision making: new data High probability

More information

Auto-Encoding Variational Bayes

Auto-Encoding Variational Bayes Auto-Encoding Variational Bayes Diederik P Kingma, Max Welling June 18, 2018 Diederik P Kingma, Max Welling Auto-Encoding Variational Bayes June 18, 2018 1 / 39 Outline 1 Introduction 2 Variational Lower

More information

Approximate Bayesian inference

Approximate Bayesian inference Approximate Bayesian inference Variational and Monte Carlo methods Christian A. Naesseth 1 Exchange rate data 0 20 40 60 80 100 120 Month Image data 2 1 Bayesian inference 2 Variational inference 3 Stochastic

More information

Denoising Criterion for Variational Auto-Encoding Framework

Denoising Criterion for Variational Auto-Encoding Framework Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) Denoising Criterion for Variational Auto-Encoding Framework Daniel Jiwoong Im, Sungjin Ahn, Roland Memisevic, Yoshua

More information

Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning

Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning Michalis K. Titsias Department of Informatics Athens University of Economics and Business

More information

Tighter Variational Bounds are Not Necessarily Better

Tighter Variational Bounds are Not Necessarily Better Tom Rainforth Adam R. osiorek 2 Tuan Anh Le 2 Chris J. Maddison Maximilian Igl 2 Frank Wood 3 Yee Whye Teh & amp, 988; Hinton & emel, 994; Gregor et al., 26; Chen et al., 27, for which the inference network

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

17 : Optimization and Monte Carlo Methods

17 : Optimization and Monte Carlo Methods 10-708: Probabilistic Graphical Models Spring 2017 17 : Optimization and Monte Carlo Methods Lecturer: Avinava Dubey Scribes: Neil Spencer, YJ Choe 1 Recap 1.1 Monte Carlo Monte Carlo methods such as rejection

More information

LDA with Amortized Inference

LDA with Amortized Inference LDA with Amortied Inference Nanbo Sun Abstract This report describes how to frame Latent Dirichlet Allocation LDA as a Variational Auto- Encoder VAE and use the Amortied Variational Inference AVI to optimie

More information

Bayesian Semi-supervised Learning with Deep Generative Models

Bayesian Semi-supervised Learning with Deep Generative Models Bayesian Semi-supervised Learning with Deep Generative Models Jonathan Gordon Department of Engineering Cambridge University jg801@cam.ac.uk José Miguel Hernández-Lobato Department of Engineering Cambridge

More information

Lecture 8: Bayesian Estimation of Parameters in State Space Models

Lecture 8: Bayesian Estimation of Parameters in State Space Models in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space

More information

Resampled Proposal Distributions for Variational Inference and Learning

Resampled Proposal Distributions for Variational Inference and Learning Resampled Proposal Distributions for Variational Inference and Learning Aditya Grover * 1 Ramki Gummadi * 2 Miguel Lazaro-Gredilla 2 Dale Schuurmans 3 Stefano Ermon 1 Abstract Learning generative models

More information

Resampled Proposal Distributions for Variational Inference and Learning

Resampled Proposal Distributions for Variational Inference and Learning Resampled Proposal Distributions for Variational Inference and Learning Aditya Grover * 1 Ramki Gummadi * 2 Miguel Lazaro-Gredilla 2 Dale Schuurmans 3 Stefano Ermon 1 Abstract Learning generative models

More information

Unsupervised Learning

Unsupervised Learning CS 3750 Advanced Machine Learning hkc6@pitt.edu Unsupervised Learning Data: Just data, no labels Goal: Learn some underlying hidden structure of the data P(, ) P( ) Principle Component Analysis (Dimensionality

More information

Variational Autoencoders

Variational Autoencoders Variational Autoencoders Recap: Story so far A classification MLP actually comprises two components A feature extraction network that converts the inputs into linearly separable features Or nearly linearly

More information

Note 1: Varitional Methods for Latent Dirichlet Allocation

Note 1: Varitional Methods for Latent Dirichlet Allocation Technical Note Series Spring 2013 Note 1: Varitional Methods for Latent Dirichlet Allocation Version 1.0 Wayne Xin Zhao batmanfly@gmail.com Disclaimer: The focus of this note was to reorganie the content

More information

arxiv: v1 [cs.lg] 6 Dec 2018

arxiv: v1 [cs.lg] 6 Dec 2018 Embedding-reparameterization procedure for manifold-valued latent variables in generative models arxiv:1812.02769v1 [cs.lg] 6 Dec 2018 Eugene Golikov Neural Networks and Deep Learning Lab Moscow Institute

More information

Variational AutoEncoder: An Introduction and Recent Perspectives

Variational AutoEncoder: An Introduction and Recent Perspectives Metode de optimizare Riemanniene pentru învățare profundă Proiect cofinanțat din Fondul European de Dezvoltare Regională prin Programul Operațional Competitivitate 2014-2020 Variational AutoEncoder: An

More information

GENERATIVE ADVERSARIAL LEARNING

GENERATIVE ADVERSARIAL LEARNING GENERATIVE ADVERSARIAL LEARNING OF MARKOV CHAINS Jiaming Song, Shengjia Zhao & Stefano Ermon Computer Science Department Stanford University {tsong,zhaosj12,ermon}@cs.stanford.edu ABSTRACT We investigate

More information

Deep Generative Models

Deep Generative Models Deep Generative Models Durk Kingma Max Welling Deep Probabilistic Models Worksop Wednesday, 1st of Oct, 2014 D.P. Kingma Deep generative models Transformations between Bayes nets and Neural nets Transformation

More information

An Information Theoretic Interpretation of Variational Inference based on the MDL Principle and the Bits-Back Coding Scheme

An Information Theoretic Interpretation of Variational Inference based on the MDL Principle and the Bits-Back Coding Scheme An Information Theoretic Interpretation of Variational Inference based on the MDL Principle and the Bits-Back Coding Scheme Ghassen Jerfel April 2017 As we will see during this talk, the Bayesian and information-theoretic

More information

Variational Inference for Monte Carlo Objectives

Variational Inference for Monte Carlo Objectives Variational Inference for Monte Carlo Objectives Andriy Mnih, Danilo Rezende June 21, 2016 1 / 18 Introduction Variational training of directed generative models has been widely adopted. Results depend

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 10-708 Probabilistic Graphical Models Homework 3 (v1.1.0) Due Apr 14, 7:00 PM Rules: 1. Homework is due on the due date at 7:00 PM. The homework should be submitted via Gradescope. Solution to each problem

More information

Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine

Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine Nitish Srivastava nitish@cs.toronto.edu Ruslan Salahutdinov rsalahu@cs.toronto.edu Geoffrey Hinton hinton@cs.toronto.edu

More information

Variational Sequential Monte Carlo

Variational Sequential Monte Carlo Variational Sequential Monte Carlo Christian A. Naesseth Scott W. Linderman Rajesh Ranganath David M. Blei Linköping University Columbia University New York University Columbia University Abstract Many

More information

Local Expectation Gradients for Doubly Stochastic. Variational Inference

Local Expectation Gradients for Doubly Stochastic. Variational Inference Local Expectation Gradients for Doubly Stochastic Variational Inference arxiv:1503.01494v1 [stat.ml] 4 Mar 2015 Michalis K. Titsias Athens University of Economics and Business, 76, Patission Str. GR10434,

More information

Variational Autoencoders (VAEs)

Variational Autoencoders (VAEs) September 26 & October 3, 2017 Section 1 Preliminaries Kullback-Leibler divergence KL divergence (continuous case) p(x) andq(x) are two density distributions. Then the KL-divergence is defined as Z KL(p

More information

But if z is conditioned on, we need to model it:

But if z is conditioned on, we need to model it: Partially Unobserved Variables Lecture 8: Unsupervised Learning & EM Algorithm Sam Roweis October 28, 2003 Certain variables q in our models may be unobserved, either at training time or at test time or

More information

CLOSE-TO-CLEAN REGULARIZATION RELATES

CLOSE-TO-CLEAN REGULARIZATION RELATES Worshop trac - ICLR 016 CLOSE-TO-CLEAN REGULARIZATION RELATES VIRTUAL ADVERSARIAL TRAINING, LADDER NETWORKS AND OTHERS Mudassar Abbas, Jyri Kivinen, Tapani Raio Department of Computer Science, School of

More information

B PROOF OF LEMMA 1. Published as a conference paper at ICLR 2018

B PROOF OF LEMMA 1. Published as a conference paper at ICLR 2018 A ADVERSARIAL DOMAIN ADAPTATION (ADA ADA aims to transfer prediction nowledge learned from a source domain with labeled data to a target domain without labels, by learning domain-invariant features. Let

More information

Variational Bayes on Monte Carlo Steroids

Variational Bayes on Monte Carlo Steroids Variational Bayes on Monte Carlo Steroids Aditya Grover, Stefano Ermon Department of Computer Science Stanford University {adityag,ermon}@cs.stanford.edu Abstract Variational approaches are often used

More information

Deep latent variable models

Deep latent variable models Deep latent variable models Pierre-Alexandre Mattei IT University of Copenhagen http://pamattei.github.io @pamattei 19 avril 2018 Séminaire de statistique du CNAM 1 Overview of talk A short introduction

More information

An Introduction to Expectation-Maximization

An Introduction to Expectation-Maximization An Introduction to Expectation-Maximization Dahua Lin Abstract This notes reviews the basics about the Expectation-Maximization EM) algorithm, a popular approach to perform model estimation of the generative

More information

Searching for the Principles of Reasoning and Intelligence

Searching for the Principles of Reasoning and Intelligence Searching for the Principles of Reasoning and Intelligence Shakir Mohamed shakir@deepmind.com DALI 2018 @shakir_a Statistical Operations Estimation and Learning Inference Hypothesis Testing Summarisation

More information

Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning

Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning Diederik (Durk) Kingma Danilo J. Rezende (*) Max Welling Shakir Mohamed (**) Stochastic Gradient Variational Inference Bayesian

More information

Fault Detection and Diagnosis Using Information Measures

Fault Detection and Diagnosis Using Information Measures Fault Detection and Diagnosis Using Information Measures Rudolf Kulhavý Honewell Technolog Center & Institute of Information Theor and Automation Prague, Cech Republic Outline Probabilit-based inference

More information

Variational Inference (11/04/13)

Variational Inference (11/04/13) STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further

More information

UvA-DARE (Digital Academic Repository) Improving Variational Auto-Encoders using Householder Flow Tomczak, J.M.; Welling, M. Link to publication

UvA-DARE (Digital Academic Repository) Improving Variational Auto-Encoders using Householder Flow Tomczak, J.M.; Welling, M. Link to publication UvA-DARE (Digital Academic Repository) Improving Variational Auto-Encoders using Householder Flow Tomczak, J.M.; Welling, M. Link to publication Citation for published version (APA): Tomczak, J. M., &

More information

Machine Learning Lecture Notes

Machine Learning Lecture Notes Machine Learning Lecture Notes Predrag Radivojac January 25, 205 Basic Principles of Parameter Estimation In probabilistic modeling, we are typically presented with a set of observations and the objective

More information

arxiv: v4 [stat.co] 19 May 2015

arxiv: v4 [stat.co] 19 May 2015 Markov Chain Monte Carlo and Variational Inference: Bridging the Gap arxiv:14.6460v4 [stat.co] 19 May 201 Tim Salimans Algoritmica Diederik P. Kingma and Max Welling University of Amsterdam Abstract Recent

More information
