arxiv: v2 [stat.ml] 15 Aug 2017
|
|
- Marvin Merritt
- 5 years ago
- Views:
Transcription
1 Worshop trac - ICLR 207 REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS Chris Cremer, Quaid Morris & David Duvenaud Department of Computer Science University of Toronto {ccremer,duvenaud}@cs.toronto.edu {quaid.morris}@utoronto.ca arxiv: v2 [stat.ml] 5 Aug 207 ABSTRACT The standard interpretation of importance-weighted autoencoders is that they maximie a tighter lower bound on the marginal lielihood than the standard evidence lower bound. We give an alternate interpretation of this procedure: that it optimies the standard variational lower bound, but using a more complex distribution. We formally derive this result, present a tighter lower bound, and visualie the implicit importance-weighted distribution. BACKGROUND The importance-weighted autoencoder IWAE; Burda et al. 206)) is a variational inference strategy capable of producing arbitrarily tight evidence lower bounds. IWAE maximies the following multisample evidence lower bound ELBO): log px) E... q x) [ log i )] px, i ) L IW AE [q] IWAE ELBO) q i x) which is a tighter lower bound than the ELBO maximied by the variational autoencoder VAE; Kingma & Welling 204)): )] px, ) log px) E q x) [log L V AE [q]. VAE ELBO) q x) 2 DEFINING THE IMPLICIT DISTRIBUTION q IW In this section, we derive the implicit distribution that arises from importance sampling from a distribution p using q as a proposal distribution. Given a batch of samples 2... from q x), the following is the unnormalied importance-weighted distribution: q IW x, 2: ) px,) q x) px, j) j q j x) q x) px, ) px,) q x) + ) ) px, j) j2 q j x) True posterior 0 00 Figure : Approximations to a complex true distribution, defined via q EW. As grows, this approximation approaches the true distribution.
2 Worshop trac - ICLR 207 Here are some properties of the approximate IWAE posterior: When, q IW x, 2: ) equals q x). When >, the form of q IW x, 2: ) depends on the true posterior p x). As, E 2... [ q IW x, 2: )] approaches the true posterior p x) pointwise. See the appendix for details. Importantly, q IW x, 2: ) is dependent on the batch of samples See Fig. 3 in the appendix for a visualiation of q IW with different batches of RECOVERING THE IWAE BOUND FROM THE VAE BOUND Here we show that the IWAE ELBO is equivalent to the VAE ELBO in expectation, but with a more flexible, unnormalied q IW distribution, implicitly defined by importance reweighting. If we replace q x) with q IW x, 2: ) and tae an expectation over 2..., then we recover the IWAE ELBO: [ ) ] px, ) E 2... q x) [L V AE [ q IW 2: )]] E 2... q x) q IW 2: ) log d q IW x, 2: ) [ ) ] px, i ) E 2... q x) q IW 2: ) log d q i i x) [ )] px, i ) E... q x) log L IW AE [q] q i x) For a more detailed derivation, see the appendix. Note that we are abusing the VAE lower bound notation because this implies an expectation over an unnormalied distribution. Consequently, we replace the expectation with an equivalent integral. i 2.2 EXPECTED IMPORTANCE WEIGHTED DISTRIBUTION q EW We can achieve a tighter lower bound than L IW AE [q] by taing the expectation over 2... of q IW. The expected importance-weighted distribution q EW x) is a distribution given by: q EW x) E 2... q x) [ q IW x, 2: )] E 2... px, ) q x) px,) q x) + ) px, j) j2 q j x) 2) See section 5.2 for a proof that q EW is a normalied distribution. Using q EW in the VAE ELBO, L V AE [q EW ], results in an upper bound of L IW AE [q]. See section 5.3 for the proof, which is a special case of the proof in Naesseth et al. 207). The procedure to sample from q EW x) is shown in Algorithm. It is equivalent to sampling-importance-resampling SIR). 2.3 VISUALIZING THE NONPARAMETERIC APPROXIMATE POSTERIOR The IWAE approximating distribution is nonparametric in the sense that, as the true posterior grows more complex, so does the shape of q IW and q EW. This maes plotting these distributions challenging. A ernel-density-estimation approach could be used, but requires many samples. Thanfully, equations ) and 2) give us a simple and fast way to approximately plot q IW and q EW without introducing artifacts due to ernel density smoothing. Figure visualies q EW on a 2D distribution approximation problem using Algorithm 2. The base distribution q is a Gaussian. As we increase the number of samples and eep the base distribution fixed, we see that the approximation approaches the true distribution. See section 5.6 for D visualiations of q IW and q EW with 2. 3 RESAMPLING FOR PREDICTION During training, we sample the q distribution and implicitly weight them with the IWAE ELBO. After training, we need to explicitly reweight samples from q. 2
3 Worshop trac - ICLR 207 Algorithm Sampling q EW x) : number of importance samples 2: for i in... do 3: i q x) 4: w i px,i) q i x) 5: Each w i w i / i w i 6: j Categorical w) 7: Return j Algorithm 2 Plotting q EW x) : number of importance samples 2: S number of function samples 3: L locations to plot 4: ˆf eros L ) 5: for s in...s do 6: 2... q x) 7: ˆpx) i2 8: for in L do px, i) q i x) 9: ˆf[] + px,) 0: Return ˆf/S px,) +ˆpx)) q x) Real Sample q x) Sample q EW x) Figure 2: Reconstructions of MNIST samples from q x) and q EW. The model was trained by maximiing the IWAE ELBO with K50 and 2 latent dimensions. The reconstructions from q x) are greatly improved with the sampling-resampling step of q EW. In figure 2, we demonstrate the need to sample from q EW rather than q x) for reconstructing MNIST digits. We trained the model to maximie the IWAE ELBO with K50 and 2 latent dimensions, similar to Appendix C in Burda et al. 206). When we sample from q x) and reconstruct the samples, we see a number of anomalies. However, if we perform the sampling-resampling step Alg. ), then the reconstructions are much more accurate. The intuition here is that we trained the model with q EW with K 50 then sampled from q x) q EW with K ), which are very different distributions, as seen in Fig.. 4 DISCUSSION Bachman & Precup 205) also showed that the IWAE objective is equivalent to stochastic variational inference with a proposal distribution corrected towards the true posterior via normalied importance sampling. We build on this idea by further examining q IW and by providing visualiations to help better grasp the interpretation. To summarie our observations, the following is the ordering of lower bounds given specific proposal distributions, log px) L V AE [q EW ] E 2... q x) [L V AE [ q IW 2: )]] L IW AE [q] L V AE [q] In light of this, IWAE can be seen as increasing the complexity of the approximate distribution q, similar to other methods that increase the complexity of q, such as Normaliing Flows Jimene Reende & Mohamed, 205), Variational Boosting Miller et al., 206) or Hamiltonian variational inference Salimans et al., 205). With this interpretation in mind, we can possibly generalie q IW to be applicable to other divergence measures. An interesting avenue of future wor is the comparison of IW-based variational families with alpha-divergences or operator variational objectives. 3
4 Worshop trac - ICLR 207 ACKNOWLEDGMENTS We d lie to than an anonymous ICLR reviewer for providing insightful future directions for this wor. We d lie to than Yuri Burda, Christian Naesseth, and Scott Linderman for bringing our attention to oversights in the paper. We d also lie to than Christian Naesseth for the derivation in section 5.3 and for providing many helpful comments. REFERENCES Philip Bachman and Doina Precup. Training Deep Generative Models: Variations on a Theme. NIPS Approximate Inference Worshop, 205. Yuri Burda, Roger Grosse, and Ruslan Salahutdinov. Importance weighted autoencoders. In ICLR, 206. Danilo Jimene Reende and Shair Mohamed. Variational Inference with Normaliing Flows. In ICML, 205. Diederi P. Kingma and Max Welling. Auto-Encoding Variational Bayes. In ICLR, 204. Andrew C. Miller, Nicholas Foti, and Ryan P. Adams. Variational Boosting: Iteratively Refining Posterior Approximations. Advances in Approximate Bayesian Inference, NIPS Worshop, 206. C. A. Naesseth, S. W. Linderman, R. Ranganath, and D. M. Blei. Variational Sequential Monte Carlo. ArXiv preprint, 207. Tim Salimans, Diederi P. Kingma, and Max Welling. Marov chain monte carlo and variational inference: Bridging the gap. In ICML,
5 Worshop trac - ICLR APPENDIX 5. DETAILED DERIVATION OF THE EQUIVALENCE OF VAE AND IWAE BOUND Here we show that the expectation over 2... of the VAE lower bound with the unnomalied importance-weighted distribution q IW, L VAE [ q IW 2: )], is equivalent to the IWAE bound with the original q distribution, L IWAE [q]. E 2... q x)[l VAE [ q IW 2: )]] E 2... q x) [ E 2... q x) E 2... q x) E 2... q x) [ [ E 2... q x) E 2... q x) E... q x) E... q x) E... q x) q IW x, 2: ) log q IW x, 2: ) log q IW x, 2: ) log q IW x, 2: ) log px,) q x) px, j) j q j x) j px, ) q x) px, j) q j x) px, ) q x) px, j) j q j x) px, j) j q j x) px, j) j q j x) [ log i log log px, i ) q i x) px, ) q IW x, 2: ) px, ) px,) px, i ) i q i x) i ) ] d 3) d ) px, i ) d q i x) 4) ] 5) ) ] px, i ) d q i x) i q x) log q x) log )] i i i i 6) px, i ) q i x) 7) px, i ) q i x) 8) ) px, i ) 9) q i x) px, i ) q i x) ) 0) ) L IWAE [q] 2) 8): Change of notation. 0): i has the same expectation as so we can replace with the sum of terms. ) d ) d 5
6 Worshop trac - ICLR PROOF THAT q EW IS A NORMALIZED DISTRIBUTION q EW x)d E 2... q x) [ q IW x, 2: )] d 3) E 2... px, ) q x) px,) q x) + ) d 4) px, j) j2 q j x) q x) q x) E 2... px, ) q x) px,) q x) + ) d 5) px, j) j2 q j x) E q x) E 2... q x) E... q x) E... q x) i E... q x) px, ) q x) j i E... q x) j px,) px,) q x) q x) + px, j) q j x) px, ) q x) px, j) j q j x) px, i) q i x) px, j) j q j x) px, i) q i x) px, j) q j x) j2 px, j) q j x) ) 6) ) 7) 8) 9) 20) E... q x) [] 2) 22) 7): Change of notation. 9): i has the same expectation as so we can replace with the sum of terms. 20): Linearity of expectation. 6
7 Worshop trac - ICLR PROOF THAT L V AE [q EW ] IS AN UPPER BOUND OF L IW AE [q] Proof provided by Christian Naesseth. Let ˆpx : ) px,) q x) + j2 px, j) q j x) )] px, ) L V AE [q EW ] E qew [log q EW x) E qew log px, ) E qew log ) E q2: x) E q2: x) [ px,) ˆpx : ) [ ˆpx : ) 23) ] 24) ] 25) [ E qew log Eq2: x) [ˆpx : ) ])] 26) px, )E q2: x) [ˆpx : ) ] log E q2: x) [ˆpx : ) ]) d 27) px, )E q2: x) [ˆpx : ) log ˆpx : ) )] d 28) px, ) q 2: x)ˆpx : ) log ˆpx : ) ) d 29) 2: q x) : q x) px, )q 2: x)ˆpx : ) log ˆpx : ) ) d 30) px, ) : q x) q : x)ˆpx : ) log ˆpx : ) ) d 3) px, ) q x) : ˆpx : ) q : x)log ˆpx : )) d 32) px, ) q x) q px, : x)log px, j ) d 33) j) : q j q j x) j j x) px, i) q i x) q px, : x)log px, j ) d 34) j) i : q j q j x) j j x) px, i) i q i x) q px, : x)log px, j ) d 35) j) : q j q j x) j j x) q : x)log px, j ) d 36) : q j j x) E q: ) log px, j ) 37) q j x) j L IW AE [q] 38) 28): Given that fa) AlogA is concave for A > 0, and fe[x]) E[fx)], then fe[x]) E[x]logE[x] E[ xlogx]. 30): Change of notation. 34): i has the same expectation as so we can replace with the sum of terms. 7
8 Worshop trac - ICLR PROOF THAT q EW IS CLOSER TO THE TRUE POSTERIOR THAN q The previous section showed that L IW AE q) L V AE q EW ). That is, the IWAE ELBO with the base q is a lower bound to the VAE ELBO with the importance weighted q EW. Due to Jensen's inequality and as shown in Burda et al. 206), we now that the IWAE ELBO is an upper bound of the VAE ELBO: L IW AE q) L V AE q). Furthermore, the log marginal lielihood can be factoried into: logpx)) L V AE q) + KLq p), and rearranged to: KLq p) logpx)) L V AE q). Following the observations above and substituting q EW for q: KLq EW p) logpx)) L V AE [q EW ] 39) logpx)) L IW AE [q] 40) logpx)) L V AE [q] KLq p) 4) Thus, KLq EW p) KLq p), meaning q EW is closer to the true posterior than q in terms of KL divergence. 5.5 IN THE LIMIT OF THE NUMBER OF SAMPLES Another perspective is in the limit of. Recall that the marginal lielihood can be approximated by importance sampling: [ ] px, ) px) E q x) px, i ) 42) q x) q i x) where i is sampled from q x). We see that the denominator of q IW is approximating px). If px,) q x) is bounded, then it follows from the strong law of large numbers that, as approaches infinity, q IW converges to the true posterior p x) almost surely. This interpretation becomes clearer when we factor out the true posterior from q IW : q IW x, 2: ) i px) px,) q x) + )p x) 43) px, j) j2 q j x) We see that the closer the denominator becomes to px), the closer q IW is to the true posterior. 5.6 VISUALIZING q IW AND q EW IN D Figure 3: Visualiation of D q IW and q EW distributions. The blue p) distribution and the green q) distribution are both normalied. The three instances of q IW 2 ) 2) have different 2 samples from q) and we can see that they are unnormalied. q EW ) is normalied and is the expectation over 30 q IW 2 ) distributions. The q IW distributions were plotted using Algorithm 2 with S. 8
REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS
Worshop trac - ICLR 207 REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS Chris Cremer, Quaid Morris & David Duvenaud Department of Computer Science University of Toronto {ccremer,duvenaud}@cs.toronto.edu
More informationInference Suboptimality in Variational Autoencoders
Inference Suboptimality in Variational Autoencoders Chris Cremer Department of Computer Science University of Toronto ccremer@cs.toronto.edu Xuechen Li Department of Computer Science University of Toronto
More informationINFERENCE SUBOPTIMALITY
INFERENCE SUBOPTIMALITY IN VARIATIONAL AUTOENCODERS Anonymous authors Paper under double-blind review ABSTRACT Amortized inference has led to efficient approximate inference for large datasets. The quality
More informationSandwiching the marginal likelihood using bidirectional Monte Carlo. Roger Grosse
Sandwiching the marginal likelihood using bidirectional Monte Carlo Roger Grosse Ryan Adams Zoubin Ghahramani Introduction When comparing different statistical models, we d like a quantitative criterion
More informationNonparametric Inference for Auto-Encoding Variational Bayes
Nonparametric Inference for Auto-Encoding Variational Bayes Erik Bodin * Iman Malik * Carl Henrik Ek * Neill D. F. Campbell * University of Bristol University of Bath Variational approximations are an
More informationVariational Autoencoder
Variational Autoencoder Göker Erdo gan August 8, 2017 The variational autoencoder (VA) [1] is a nonlinear latent variable model with an efficient gradient-based training procedure based on variational
More informationTraining Deep Generative Models: Variations on a Theme
Training Deep Generative Models: Variations on a Theme Philip Bachman McGill University, School of Computer Science phil.bachman@gmail.com Doina Precup McGill University, School of Computer Science dprecup@cs.mcgill.ca
More informationNatural Gradients via the Variational Predictive Distribution
Natural Gradients via the Variational Predictive Distribution Da Tang Columbia University datang@cs.columbia.edu Rajesh Ranganath New York University rajeshr@cims.nyu.edu Abstract Variational inference
More informationDeep Variational Inference. FLARE Reading Group Presentation Wesley Tansey 9/28/2016
Deep Variational Inference FLARE Reading Group Presentation Wesley Tansey 9/28/2016 What is Variational Inference? What is Variational Inference? Want to estimate some distribution, p*(x) p*(x) What is
More informationDEBIASING EVIDENCE APPROXIMATIONS: ON IMPORTANCE-WEIGHTED AUTOENCODERS AND JACKKNIFE VARIATIONAL INFERENCE
DEBIASING EVIDENCE APPROXIMATIONS: ON IMPORTANCE-WEIGHTED AUTOENCODERS AND JACNIFE VARIATIONAL INFERENCE Sebastian Nowozin Machine Intelligence and Perception Microsoft Research, Cambridge, U Sebastian.Nowozin@microsoft.com
More informationGenerative models for missing value completion
Generative models for missing value completion Kousuke Ariga Department of Computer Science and Engineering University of Washington Seattle, WA 98105 koar8470@cs.washington.edu Abstract Deep generative
More informationarxiv: v1 [stat.ml] 6 Dec 2018
missiwae: Deep Generative Modelling and Imputation of Incomplete Data arxiv:1812.02633v1 [stat.ml] 6 Dec 2018 Pierre-Alexandre Mattei Department of Computer Science IT University of Copenhagen pima@itu.dk
More informationThe Success of Deep Generative Models
The Success of Deep Generative Models Jakub Tomczak AMLAB, University of Amsterdam CERN, 2018 What is AI about? What is AI about? Decision making: What is AI about? Decision making: new data High probability
More informationAuto-Encoding Variational Bayes
Auto-Encoding Variational Bayes Diederik P Kingma, Max Welling June 18, 2018 Diederik P Kingma, Max Welling Auto-Encoding Variational Bayes June 18, 2018 1 / 39 Outline 1 Introduction 2 Variational Lower
More informationApproximate Bayesian inference
Approximate Bayesian inference Variational and Monte Carlo methods Christian A. Naesseth 1 Exchange rate data 0 20 40 60 80 100 120 Month Image data 2 1 Bayesian inference 2 Variational inference 3 Stochastic
More informationDenoising Criterion for Variational Auto-Encoding Framework
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) Denoising Criterion for Variational Auto-Encoding Framework Daniel Jiwoong Im, Sungjin Ahn, Roland Memisevic, Yoshua
More informationCombine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning
Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning Michalis K. Titsias Department of Informatics Athens University of Economics and Business
More informationTighter Variational Bounds are Not Necessarily Better
Tom Rainforth Adam R. osiorek 2 Tuan Anh Le 2 Chris J. Maddison Maximilian Igl 2 Frank Wood 3 Yee Whye Teh & amp, 988; Hinton & emel, 994; Gregor et al., 26; Chen et al., 27, for which the inference network
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More information17 : Optimization and Monte Carlo Methods
10-708: Probabilistic Graphical Models Spring 2017 17 : Optimization and Monte Carlo Methods Lecturer: Avinava Dubey Scribes: Neil Spencer, YJ Choe 1 Recap 1.1 Monte Carlo Monte Carlo methods such as rejection
More informationLDA with Amortized Inference
LDA with Amortied Inference Nanbo Sun Abstract This report describes how to frame Latent Dirichlet Allocation LDA as a Variational Auto- Encoder VAE and use the Amortied Variational Inference AVI to optimie
More informationBayesian Semi-supervised Learning with Deep Generative Models
Bayesian Semi-supervised Learning with Deep Generative Models Jonathan Gordon Department of Engineering Cambridge University jg801@cam.ac.uk José Miguel Hernández-Lobato Department of Engineering Cambridge
More informationLecture 8: Bayesian Estimation of Parameters in State Space Models
in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space
More informationResampled Proposal Distributions for Variational Inference and Learning
Resampled Proposal Distributions for Variational Inference and Learning Aditya Grover * 1 Ramki Gummadi * 2 Miguel Lazaro-Gredilla 2 Dale Schuurmans 3 Stefano Ermon 1 Abstract Learning generative models
More informationResampled Proposal Distributions for Variational Inference and Learning
Resampled Proposal Distributions for Variational Inference and Learning Aditya Grover * 1 Ramki Gummadi * 2 Miguel Lazaro-Gredilla 2 Dale Schuurmans 3 Stefano Ermon 1 Abstract Learning generative models
More informationUnsupervised Learning
CS 3750 Advanced Machine Learning hkc6@pitt.edu Unsupervised Learning Data: Just data, no labels Goal: Learn some underlying hidden structure of the data P(, ) P( ) Principle Component Analysis (Dimensionality
More informationVariational Autoencoders
Variational Autoencoders Recap: Story so far A classification MLP actually comprises two components A feature extraction network that converts the inputs into linearly separable features Or nearly linearly
More informationNote 1: Varitional Methods for Latent Dirichlet Allocation
Technical Note Series Spring 2013 Note 1: Varitional Methods for Latent Dirichlet Allocation Version 1.0 Wayne Xin Zhao batmanfly@gmail.com Disclaimer: The focus of this note was to reorganie the content
More informationarxiv: v1 [cs.lg] 6 Dec 2018
Embedding-reparameterization procedure for manifold-valued latent variables in generative models arxiv:1812.02769v1 [cs.lg] 6 Dec 2018 Eugene Golikov Neural Networks and Deep Learning Lab Moscow Institute
More informationVariational AutoEncoder: An Introduction and Recent Perspectives
Metode de optimizare Riemanniene pentru învățare profundă Proiect cofinanțat din Fondul European de Dezvoltare Regională prin Programul Operațional Competitivitate 2014-2020 Variational AutoEncoder: An
More informationGENERATIVE ADVERSARIAL LEARNING
GENERATIVE ADVERSARIAL LEARNING OF MARKOV CHAINS Jiaming Song, Shengjia Zhao & Stefano Ermon Computer Science Department Stanford University {tsong,zhaosj12,ermon}@cs.stanford.edu ABSTRACT We investigate
More informationDeep Generative Models
Deep Generative Models Durk Kingma Max Welling Deep Probabilistic Models Worksop Wednesday, 1st of Oct, 2014 D.P. Kingma Deep generative models Transformations between Bayes nets and Neural nets Transformation
More informationAn Information Theoretic Interpretation of Variational Inference based on the MDL Principle and the Bits-Back Coding Scheme
An Information Theoretic Interpretation of Variational Inference based on the MDL Principle and the Bits-Back Coding Scheme Ghassen Jerfel April 2017 As we will see during this talk, the Bayesian and information-theoretic
More informationVariational Inference for Monte Carlo Objectives
Variational Inference for Monte Carlo Objectives Andriy Mnih, Danilo Rezende June 21, 2016 1 / 18 Introduction Variational training of directed generative models has been widely adopted. Results depend
More informationProbabilistic Graphical Models
10-708 Probabilistic Graphical Models Homework 3 (v1.1.0) Due Apr 14, 7:00 PM Rules: 1. Homework is due on the due date at 7:00 PM. The homework should be submitted via Gradescope. Solution to each problem
More informationFast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine
Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine Nitish Srivastava nitish@cs.toronto.edu Ruslan Salahutdinov rsalahu@cs.toronto.edu Geoffrey Hinton hinton@cs.toronto.edu
More informationVariational Sequential Monte Carlo
Variational Sequential Monte Carlo Christian A. Naesseth Scott W. Linderman Rajesh Ranganath David M. Blei Linköping University Columbia University New York University Columbia University Abstract Many
More informationLocal Expectation Gradients for Doubly Stochastic. Variational Inference
Local Expectation Gradients for Doubly Stochastic Variational Inference arxiv:1503.01494v1 [stat.ml] 4 Mar 2015 Michalis K. Titsias Athens University of Economics and Business, 76, Patission Str. GR10434,
More informationVariational Autoencoders (VAEs)
September 26 & October 3, 2017 Section 1 Preliminaries Kullback-Leibler divergence KL divergence (continuous case) p(x) andq(x) are two density distributions. Then the KL-divergence is defined as Z KL(p
More informationBut if z is conditioned on, we need to model it:
Partially Unobserved Variables Lecture 8: Unsupervised Learning & EM Algorithm Sam Roweis October 28, 2003 Certain variables q in our models may be unobserved, either at training time or at test time or
More informationCLOSE-TO-CLEAN REGULARIZATION RELATES
Worshop trac - ICLR 016 CLOSE-TO-CLEAN REGULARIZATION RELATES VIRTUAL ADVERSARIAL TRAINING, LADDER NETWORKS AND OTHERS Mudassar Abbas, Jyri Kivinen, Tapani Raio Department of Computer Science, School of
More informationB PROOF OF LEMMA 1. Published as a conference paper at ICLR 2018
A ADVERSARIAL DOMAIN ADAPTATION (ADA ADA aims to transfer prediction nowledge learned from a source domain with labeled data to a target domain without labels, by learning domain-invariant features. Let
More informationVariational Bayes on Monte Carlo Steroids
Variational Bayes on Monte Carlo Steroids Aditya Grover, Stefano Ermon Department of Computer Science Stanford University {adityag,ermon}@cs.stanford.edu Abstract Variational approaches are often used
More informationDeep latent variable models
Deep latent variable models Pierre-Alexandre Mattei IT University of Copenhagen http://pamattei.github.io @pamattei 19 avril 2018 Séminaire de statistique du CNAM 1 Overview of talk A short introduction
More informationAn Introduction to Expectation-Maximization
An Introduction to Expectation-Maximization Dahua Lin Abstract This notes reviews the basics about the Expectation-Maximization EM) algorithm, a popular approach to perform model estimation of the generative
More informationSearching for the Principles of Reasoning and Intelligence
Searching for the Principles of Reasoning and Intelligence Shakir Mohamed shakir@deepmind.com DALI 2018 @shakir_a Statistical Operations Estimation and Learning Inference Hypothesis Testing Summarisation
More informationStochastic Backpropagation, Variational Inference, and Semi-Supervised Learning
Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning Diederik (Durk) Kingma Danilo J. Rezende (*) Max Welling Shakir Mohamed (**) Stochastic Gradient Variational Inference Bayesian
More informationFault Detection and Diagnosis Using Information Measures
Fault Detection and Diagnosis Using Information Measures Rudolf Kulhavý Honewell Technolog Center & Institute of Information Theor and Automation Prague, Cech Republic Outline Probabilit-based inference
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More informationUvA-DARE (Digital Academic Repository) Improving Variational Auto-Encoders using Householder Flow Tomczak, J.M.; Welling, M. Link to publication
UvA-DARE (Digital Academic Repository) Improving Variational Auto-Encoders using Householder Flow Tomczak, J.M.; Welling, M. Link to publication Citation for published version (APA): Tomczak, J. M., &
More informationMachine Learning Lecture Notes
Machine Learning Lecture Notes Predrag Radivojac January 25, 205 Basic Principles of Parameter Estimation In probabilistic modeling, we are typically presented with a set of observations and the objective
More informationarxiv: v4 [stat.co] 19 May 2015
Markov Chain Monte Carlo and Variational Inference: Bridging the Gap arxiv:14.6460v4 [stat.co] 19 May 201 Tim Salimans Algoritmica Diederik P. Kingma and Max Welling University of Amsterdam Abstract Recent
More informationVariables which are always unobserved are called latent variables or sometimes hidden variables. e.g. given y,x fit the model p(y x) = z p(y x,z)p(z)
CSC2515 Machine Learning Sam Roweis Lecture 8: Unsupervised Learning & EM Algorithm October 31, 2006 Partially Unobserved Variables 2 Certain variables q in our models may be unobserved, either at training
More informationVariational Inference via χ Upper Bound Minimization
Variational Inference via χ Upper Bound Minimization Adji B. Dieng Columbia University Dustin Tran Columbia University Rajesh Ranganath Princeton University John Paisley Columbia University David M. Blei
More informationMarkov Chain Monte Carlo and Variational Inference: Bridging the Gap
Markov Chain Monte Carlo and Variational Inference: Bridging the Gap Tim Salimans Algoritmica Diederik P. Kingma and Max Welling University of Amsterdam TIM@ALGORITMICA.NL [D.P.KINGMA,M.WELLING]@UVA.NL
More informationarxiv: v3 [stat.ml] 30 Jun 2017
Rui Shu 1 Hung H. Bui 2 Mohammad Ghavamadeh 3 arxiv:1611.08568v3 stat.ml] 30 Jun 2017 Abstract We introduce a new framework for training deep generative models for high-dimensional conditional density
More informationAn Overview of Edward: A Probabilistic Programming System. Dustin Tran Columbia University
An Overview of Edward: A Probabilistic Programming System Dustin Tran Columbia University Alp Kucukelbir Eugene Brevdo Andrew Gelman Adji Dieng Maja Rudolph David Blei Dawen Liang Matt Hoffman Kevin Murphy
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationQuasi-Monte Carlo Flows
Quasi-Monte Carlo Flows Florian Wenzel TU Kaiserslautern Germany wenzelfl@hu-berlin.de Alexander Buchholz ENSAE-CREST, Paris France alexander.buchholz@ensae.fr Stephan Mandt Univ. of California, Irvine
More informationdoubly semi-implicit variational inference
Dmitry Molchanov 1,, Valery Kharitonov, Artem Sobolev 1 Dmitry Vetrov 1, 1 Samsung AI Center in Moscow National Research University Higher School of Economics, Joint Samsung-HSE lab Another way to overcome
More informationVariance Reduction in Black-box Variational Inference by Adaptive Importance Sampling
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence IJCAI-18 Variance Reduction in Black-box Variational Inference by Adaptive Importance Sampling Ximing Li, Changchun
More informationAdversarial Sequential Monte Carlo
Adversarial Sequential Monte Carlo Kira Kempinska Department of Security and Crime Science University College London London, WC1E 6BT kira.kowalska.13@ucl.ac.uk John Shawe-Taylor Department of Computer
More informationMixture Models and Expectation-Maximization
Mixture Models and Expectation-Maximiation David M. Blei March 9, 2012 EM for mixtures of multinomials The graphical model for a mixture of multinomials π d x dn N D θ k K How should we fit the parameters?
More informationVariational Inference with Stein Mixtures
Variational Inference with Stein Mixtures Eric Nalisnick Department of Computer Science University of California, Irvine enalisni@uci.edu Padhraic Smyth Department of Computer Science University of California,
More informationGray-box Inference for Structured Gaussian Process Models: Supplementary Material
Gray-box Inference for Structured Gaussian Process Models: Supplementary Material Pietro Galliani, Amir Dezfouli, Edwin V. Bonilla and Novi Quadrianto February 8, 017 1 Proof of Theorem Here we proof the
More informationGaussian Mixture Model
Case Study : Document Retrieval MAP EM, Latent Dirichlet Allocation, Gibbs Sampling Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 5 th,
More informationarxiv: v3 [stat.ml] 3 Apr 2017
STICK-BREAKING VARIATIONAL AUTOENCODERS Eric Nalisnick Department of Computer Science University of California, Irvine enalisni@uci.edu Padhraic Smyth Department of Computer Science University of California,
More informationGradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation Graphs Yoonho Lee Department of Computer Science and Engineering Pohang University of Science and Technology May 9, 2017 Outline Stochastic Gradient Estimation
More informationCSC321 Lecture 20: Autoencoders
CSC321 Lecture 20: Autoencoders Roger Grosse Roger Grosse CSC321 Lecture 20: Autoencoders 1 / 16 Overview Latent variable models so far: mixture models Boltzmann machines Both of these involve discrete
More informationLatent Variable Models
Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 5 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 5 1 / 31 Recap of last lecture 1 Autoregressive models:
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 20: Expectation Maximization Algorithm EM for Mixture Models Many figures courtesy Kevin Murphy s
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationTechnical Details about the Expectation Maximization (EM) Algorithm
Technical Details about the Expectation Maximization (EM Algorithm Dawen Liang Columbia University dliang@ee.columbia.edu February 25, 2015 1 Introduction Maximum Lielihood Estimation (MLE is widely used
More informationExpectation Propagation in Dynamical Systems
Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex
More informationImproving Variational Inference with Inverse Autoregressive Flow
Improving Variational Inference with Inverse Autoregressive Flow Diederik P. Kingma dpkingma@openai.com Tim Salimans tim@openai.com Rafal Jozefowicz rafal@openai.com Xi Chen peter@openai.com Ilya Sutskever
More informationBayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems
Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems Scott W. Linderman Matthew J. Johnson Andrew C. Miller Columbia University Harvard and Google Brain Harvard University Ryan
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationarxiv: v5 [cs.lg] 11 Jul 2018
On Unifying Deep Generative Models Zhiting Hu 1,2, Zichao Yang 1, Ruslan Salakhutdinov 1, Eric Xing 2 {zhitingh,zichaoy,rsalakhu}@cs.cmu.edu, eric.xing@petuum.com Carnegie Mellon University 1, Petuum Inc.
More informationBlack-box α-divergence Minimization
Black-box α-divergence Minimization José Miguel Hernández-Lobato, Yingzhen Li, Daniel Hernández-Lobato, Thang Bui, Richard Turner, Harvard University, University of Cambridge, Universidad Autónoma de Madrid.
More informationarxiv: v2 [cs.ne] 22 Feb 2013
Sparse Penalty in Deep Belief Networks: Using the Mixed Norm Constraint arxiv:1301.3533v2 [cs.ne] 22 Feb 2013 Xanadu C. Halkias DYNI, LSIS, Universitè du Sud, Avenue de l Université - BP20132, 83957 LA
More informationRAO-BLACKWELLIZED PARTICLE FILTER FOR MARKOV MODULATED NONLINEARDYNAMIC SYSTEMS
RAO-BLACKWELLIZED PARTICLE FILTER FOR MARKOV MODULATED NONLINEARDYNAMIC SYSTEMS Saiat Saha and Gustaf Hendeby Linöping University Post Print N.B.: When citing this wor, cite the original article. 2014
More informationImage recognition based on hidden Markov eigen-image models using variational Bayesian method
Image recognition based on hidden Marov eigen-image models using variational Bayesian method Kei Sawada, Kei Hashimoto, Yoshihio Nanau and Keiichi Touda Department of Scientific and Engineering Simulation,
More informationOperator Variational Inference
Operator Variational Inference Rajesh Ranganath Princeton University Jaan Altosaar Princeton University Dustin Tran Columbia University David M. Blei Columbia University Abstract Variational inference
More informationStein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu and Dilin Wang NIPS 2016 Discussion by Yunchen Pu March 17, 2017 March 17, 2017 1 / 8 Introduction Let x R d
More informationWHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY,
WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WITH IMPLICATIONS FOR TRAINING Sanjeev Arora, Yingyu Liang & Tengyu Ma Department of Computer Science Princeton University Princeton, NJ 08540, USA {arora,yingyul,tengyu}@cs.princeton.edu
More informationVariational Inference via Stochastic Backpropagation
Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation
More informationOn the Quantitative Analysis of Decoder-Based Generative Models
On the Quantitative Analysis of Decoder-Based Generative Models Yuhuai Wu Department of Computer Science University of Toronto ywu@cs.toronto.edu Yuri Burda OpenAI yburda@openai.com Ruslan Salakhutdinov
More informationarxiv:submit/ [stat.ml] 12 Nov 2017
Variational Inference via χ Upper Bound Minimization arxiv:submit/26872 [stat.ml] 12 Nov 217 Adji B. Dieng Columbia University John Paisley Columbia University Dustin Tran Columbia University Abstract
More informationLecture 2: Correlated Topic Model
Probabilistic Moels for Unsupervise Learning Spring 203 Lecture 2: Correlate Topic Moel Inference for Correlate Topic Moel Yuan Yuan First of all, let us make some claims about the parameters an variables
More informationREBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
RBAR: Low-variance, unbiased gradient estimates for discrete latent variable models George Tucker 1,, Andriy Mnih 2, Chris J. Maddison 2,3, Dieterich Lawson 1,*, Jascha Sohl-Dickstein 1 1 Google Brain,
More informationSequential Monte Carlo in the machine learning toolbox
Sequential Monte Carlo in the machine learning toolbox Working with the trend of blending Thomas Schön Uppsala University Sweden. Symposium on Advances in Approximate Bayesian Inference (AABI) Montréal,
More informationNotes on pseudo-marginal methods, variational Bayes and ABC
Notes on pseudo-marginal methods, variational Bayes and ABC Christian Andersson Naesseth October 3, 2016 The Pseudo-Marginal Framework Assume we are interested in sampling from the posterior distribution
More informationBayesian Deep Learning
Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference
More informationarxiv: v1 [cs.lg] 15 Jun 2016
Improving Variational Inference with Inverse Autoregressive Flow arxiv:1606.04934v1 [cs.lg] 15 Jun 2016 Diederik P. Kingma, Tim Salimans and Max Welling OpenAI, San Francisco University of Amsterdam, University
More informationVariational Auto Encoders
Variational Auto Encoders 1 Recap Deep Neural Models consist of 2 primary components A feature extraction network Where important aspects of the data are amplified and projected into a linearly separable
More informationVariational Inference with Copula Augmentation
Variational Inference with Copula Augmentation Dustin Tran 1 David M. Blei 2 Edoardo M. Airoldi 1 1 Department of Statistics, Harvard University 2 Department of Statistics & Computer Science, Columbia
More informationarxiv: v3 [stat.ml] 15 Mar 2018
Operator Variational Inference Rajesh Ranganath Princeton University Jaan ltosaar Princeton University Dustin Tran Columbia University David M. Blei Columbia University arxiv:1610.09033v3 [stat.ml] 15
More informationFacilitating Bayesian Continual Learning by Natural Gradients and Stein Gradients
Facilitating Bayesian Continual Learning by Natural Gradients and Stein Gradients Yu Chen University of Bristol yc14600@bristol.ac.u Tom Diethe Amazon tdiethe@amazon.com Neil Lawrence Amazon lawrennd@amazon.com
More informationLog Gaussian Cox Processes. Chi Group Meeting February 23, 2016
Log Gaussian Cox Processes Chi Group Meeting February 23, 2016 Outline Typical motivating application Introduction to LGCP model Brief overview of inference Applications in my work just getting started
More informationWild Variational Approximations
Wild Variational Approximations Yingzhen Li University of Cambridge Cambridge, CB2 1PZ, UK yl494@cam.ac.uk Qiang Liu Dartmouth College Hanover, NH 03755, USA qiang.liu@dartmouth.edu Abstract We formalise
More information