Note 1: Variational Methods for Latent Dirichlet Allocation


Technical Note Series, Spring 2013
Note 1: Variational Methods for Latent Dirichlet Allocation
Version 1.0
Wayne Xin Zhao

Disclaimer: The focus of this note is to reorganize the content of Blei's original paper and add more detailed derivations. For convenience, in some parts I have copied Blei's content directly. I hope it can help beginners to learn the variational EM of LDA.

First of all, let us make some claims about the parameters and variables in the model. Let K be the number of topics, D be the number of documents and V be the number of terms in the vocabulary. We use i to index a topic [1], d to index a document [2], n to index a word [3], and w or v to denote a word. In LDA, α (K × 1) and β (K × V) are model parameters, while θ (D × K) and z [4] are hidden variables. As the variational distribution q, we use a fully factorized model, in which all the variables are independently governed by different distributions,

    q(θ, z | γ, φ) = q(θ | γ) q(z | φ),    (1.1)

where the Dirichlet parameters γ (D × K) and the multinomial parameters φ [5] are variational parameters. The topic assignments of words and the documents are exchangeable, i.e., conditionally independent given the parameters (either model parameters or variational parameters). Note that all the variational distributions q here are conditional distributions and should be written as q(· | w); for simplicity we write them as q(·).

The main idea is to use variational expectation-maximization (EM). In the E-step of variational EM, we use the variational approximation to the posterior and find the optimal values of the variational parameters. In the M-step, we maximize the bound with respect to the model parameters. Put more compactly, we perform variational inference to learn the variational parameters in the E-step, and perform parameter estimation in the M-step. These two steps alternate in each iteration. We will optimize the lower bound w.r.t. the variational parameters and the model parameters one by one, i.e., we perform the optimization with a coordinate ascent algorithm.

1.1 Variational objective function

Finding a lower bound for log p(w | α, β) relies on Jensen's inequality. Let X be a random variable and f a convex function; then f(E[X]) ≤ E[f(X)]. If f is a concave function, then f(E[X]) ≥ E[f(X)].

[1] Σ_i means Σ_{i=1}^K.
[2] Σ_d means Σ_{d=1}^D.
[3] Σ_n means Σ_{n=1}^{N_d}, where N_d is the length of the current document.
[4] z is represented as a three-dimensional matrix; each entry is indexed by a triplet <d, n, i>, indicating whether the topic assignment of the nth word in the dth document is the ith topic.
[5] Corresponding to z, φ is represented as a three-dimensional matrix; each entry is indexed by a triplet <d, n, i>, giving the probability that the nth word in the dth document is assigned to the ith topic.
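To make the dimensions above concrete, the following numpy sketch of my own (the variable names and the tiny corpus are illustrative assumptions, not the note's notation) lays out the model and variational parameters as arrays:

```python
# Illustrative layout of the LDA quantities defined above (a sketch, assuming numpy).
import numpy as np

K, D, V = 3, 2, 10                      # topics, documents, vocabulary size
N_d = [4, 6]                            # length of each document

alpha = np.full(K, 0.1)                                 # model parameter, shape (K,)
beta = np.random.dirichlet(np.ones(V), size=K)          # model parameter, shape (K, V), rows sum to 1

gamma = np.tile(alpha + np.mean(N_d) / K, (D, 1))       # variational Dirichlet parameters, shape (D, K)
# phi[d] has shape (N_d[d], K): phi[d][n, i] = q(z_{d,n} = i), each row sums to 1
phi = [np.full((N, K), 1.0 / K) for N in N_d]
```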

We use Jensen's inequality to bound the log probability of a document [6]:

    log p(w | α, β) = log ∫ Σ_z p(θ, z, w | α, β) dθ
                    = log ∫ Σ_z p(θ, z, w | α, β) q(θ, z) / q(θ, z) dθ
                    ≥ ∫ Σ_z q(θ, z) log p(θ, z, w | α, β) dθ − ∫ Σ_z q(θ, z) log q(θ, z) dθ
                    = ∫ Σ_z q(θ, z) log p(θ | α) dθ + ∫ Σ_z q(θ, z) log p(z | θ) dθ + ∫ Σ_z q(θ, z) log p(w | z, β) dθ − ∫ Σ_z q(θ, z) log q(θ, z) dθ
                    = E_q[log p(θ | α)] + E_q[log p(z | θ)] + E_q[log p(w | z, β)] + H(q).

We introduce a function to denote the right-hand side of the last line,

    L(γ, φ; α, β) = E_q[log p(θ | α)] + E_q[log p(z | θ)] + E_q[log p(w | z, β)] + H(q).    (1.2)

We can easily verify that

    log p(w | α, β) = L(γ, φ; α, β) + D(q(θ, z | γ, φ) ‖ p(θ, z | α, β, w)).    (1.3)

We have indeed found a lower bound for log p(w | α, β), namely L(γ, φ; α, β). Equation 1.3 also shows that maximizing the lower bound L(γ, φ; α, β) with respect to γ and φ is equivalent to minimizing the KL divergence between the variational posterior and the true posterior, i.e., the optimization problem given below in Eq. 1.5 [7].

1.1.1 Expanding the lower bound

The lower bound defined above has four items to specify. We present detailed derivations for them next. For the first item, we have

    E_q[log p(θ | α)] = E_q[ log exp{ Σ_{i=1}^K (α_i − 1) log θ_i + log Γ(Σ_i α_i) − Σ_i log Γ(α_i) } ]
                      = Σ_{i=1}^K (α_i − 1) E_q[log θ_i] + log Γ(Σ_i α_i) − Σ_i log Γ(α_i)
                      = Σ_{i=1}^K (α_i − 1) (Ψ(γ_i) − Ψ(Σ_j γ_j)) + log Γ(Σ_i α_i) − Σ_i log Γ(α_i).

[6] For now we ignore the document index d, since all the hidden variables and variational parameters are document-specific. We will use the document index d explicitly in the parameter estimation part, since the model parameters are related to all the documents.
[7] When we learn the variational parameters, we fix all the model parameters, so log p(w | α, β) can be considered a fixed value; it is the sum of the lower bound and the KL divergence.
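The last step of the first-item derivation above uses the identity E_q[log θ_i] = Ψ(γ_i) − Ψ(Σ_j γ_j), derived in the Appendix. As a quick numerical sanity check of this identity, here is a small sketch of my own (assuming numpy and scipy; the values of γ are made up):

```python
# Monte Carlo check of E_q[log theta_i] = Psi(gamma_i) - Psi(sum_j gamma_j)
# under theta ~ Dirichlet(gamma). Illustrative sketch, not part of the note.
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
gamma = np.array([2.0, 1.5, 3.0])

analytic = digamma(gamma) - digamma(gamma.sum())
samples = rng.dirichlet(gamma, size=200_000)
monte_carlo = np.log(samples).mean(axis=0)

# The two vectors should agree to roughly two or three decimal places.
print(analytic)
print(monte_carlo)
```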

Let z_n denote the topic assignment of the nth word in the current document; it is a K-dimensional indicator vector, with z_{n,i} = 1 when the topic assignment is the ith topic and z_{n,i} = 0 otherwise. For the second item, we have

    E_q[log p(z | θ)] = Σ_n E_q[log p(z_n | θ)]
                      = Σ_n E_q[ log Π_i θ_i^{z_{n,i}} ]
                      = Σ_n Σ_i E_q[z_{n,i} log θ_i]
                      = Σ_n Σ_i E_q[z_{n,i}] E_q[log θ_i]
                      = Σ_n Σ_i φ_{n,i} (Ψ(γ_i) − Ψ(Σ_j γ_j)).

For the third item, we have

    E_q[log p(w | z, β)] = Σ_n E_q[log p(w_n | z_n, β)]
                         = Σ_n Σ_i E_q[z_{n,i} log β_{i,w_n}]
                         = Σ_n Σ_i φ_{n,i} log β_{i,w_n}.

For the entropy term, we have

    H(q) = −∫ Σ_z q(θ, z) log q(θ, z) dθ
         = −∫ q(θ) log q(θ) dθ − Σ_n Σ_i q(z_{n,i}) log q(z_{n,i})
         = −Σ_{i=1}^K (γ_i − 1) E_q[log θ_i] − log Γ(Σ_i γ_i) + Σ_i log Γ(γ_i) − Σ_n Σ_i φ_{n,i} log φ_{n,i}
         = −Σ_{i=1}^K (γ_i − 1) (Ψ(γ_i) − Ψ(Σ_j γ_j)) − log Γ(Σ_i γ_i) + Σ_i log Γ(γ_i) − Σ_n Σ_i φ_{n,i} log φ_{n,i}.
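The H(q) expression above is the differential entropy of the Dirichlet q(θ | γ) plus the entropies of the word-level multinomials q(z_n | φ_n). The following sketch of mine (assuming scipy; the toy values of γ and φ are made up) cross-checks the formula against scipy's Dirichlet entropy:

```python
# Numerical cross-check of the H(q) formula derived above. Illustrative sketch only.
import numpy as np
from scipy.special import digamma, gammaln
from scipy.stats import dirichlet

gamma = np.array([1.2, 3.4, 0.9])
phi = np.array([[0.2, 0.5, 0.3],
                [0.7, 0.1, 0.2]])    # two words, K = 3 topics, rows sum to 1

h_formula = (-np.sum((gamma - 1.0) * (digamma(gamma) - digamma(gamma.sum())))
             - gammaln(gamma.sum()) + np.sum(gammaln(gamma))      # Dirichlet entropy
             - np.sum(phi * np.log(phi)))                         # multinomial entropies

h_reference = dirichlet.entropy(gamma) - np.sum(phi * np.log(phi))
print(np.isclose(h_formula, h_reference))   # expected: True
```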

Having the detailed derivations of these four items, we obtain an expanded formulation of the lower bound:

    L(γ, φ; α, β) = Σ_{i=1}^K (α_i − 1)(Ψ(γ_i) − Ψ(Σ_j γ_j)) + log Γ(Σ_i α_i) − Σ_i log Γ(α_i)
                  + Σ_n Σ_i φ_{n,i} (Ψ(γ_i) − Ψ(Σ_j γ_j))
                  + Σ_n Σ_i φ_{n,i} log β_{i,w_n}
                  − Σ_{i=1}^K (γ_i − 1)(Ψ(γ_i) − Ψ(Σ_j γ_j)) − log Γ(Σ_i γ_i) + Σ_i log Γ(γ_i)
                  − Σ_n Σ_i φ_{n,i} log φ_{n,i}.    (1.4)

1.2 Variational inference

The aim of variational inference is to learn the values of the variational parameters γ and φ. With the learnt variational parameters, we can evaluate the posterior probabilities of the hidden variables. Having specified a simplified family of probability distributions, the next step is to set up an optimization problem that determines the values of the variational parameters γ and φ. We can obtain a solution for these variational parameters by solving the following optimization problem:

    (γ*, φ*) = arg min_{γ, φ} D(q(θ, z | γ, φ) ‖ p(θ, z | α, β, w)).    (1.5)

By Eq. 1.3, we can minimize D(q(θ, z | γ, φ) ‖ p(θ, z | α, β, w)) by maximizing the lower bound L(γ, φ; α, β). Note that in this section the variational parameters are learnt for a single document, so we ignore the document index d here.
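Before deriving the updates, it may help to see Eq. 1.4 written out directly in code. The following is a minimal per-document sketch of my own (numpy/scipy assumed; the function and variable names are mine, not the note's, and φ is assumed strictly positive):

```python
# Per-document lower bound L(gamma, phi; alpha, beta) of Eq. 1.4. Illustrative sketch.
import numpy as np
from scipy.special import digamma, gammaln

def lower_bound(gamma, phi, alpha, beta, words):
    """gamma: (K,), phi: (N, K) with rows summing to 1, alpha: (K,),
    beta: (K, V) with rows summing to 1, words: (N,) integer word ids."""
    dig = digamma(gamma) - digamma(gamma.sum())          # E_q[log theta_i]
    term1 = np.sum((alpha - 1.0) * dig) + gammaln(alpha.sum()) - np.sum(gammaln(alpha))
    term2 = np.sum(phi * dig)                            # E_q[log p(z | theta)]
    term3 = np.sum(phi * np.log(beta[:, words].T))       # E_q[log p(w | z, beta)]
    entropy = (-np.sum((gamma - 1.0) * dig) - gammaln(gamma.sum())
               + np.sum(gammaln(gamma)) - np.sum(phi * np.log(phi)))
    return term1 + term2 + term3 + entropy
```

A useful property of this function is that it must increase monotonically under the coordinate updates derived next, which makes it a handy debugging check.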

1.2.1 Learning the variational parameters

We have expanded each item of the lower bound L(γ, φ; α, β) in Eq. 1.4. Now we maximize the bound with respect to the variational parameters γ and φ. We first maximize Eq. 1.4 with respect to φ_{n,i}, the probability that the nth word is generated by latent topic i. Observe that this is a constrained maximization since Σ_i φ_{n,i} = 1, so we keep only the terms containing φ and incorporate Lagrange multipliers λ_n:

    L_{[φ]} = Σ_n Σ_i φ_{n,i} (Ψ(γ_i) − Ψ(Σ_j γ_j)) + Σ_n Σ_i φ_{n,i} log β_{i,w_n} − Σ_n Σ_i φ_{n,i} log φ_{n,i} + Σ_n λ_n (Σ_i φ_{n,i} − 1).    (1.6)

Taking the derivative of L_{[φ]} with respect to φ_{n,i},

    dL_{[φ]}/dφ_{n,i} = Ψ(γ_i) − Ψ(Σ_j γ_j) + log β_{i,w_n} − log φ_{n,i} − 1 + λ_n.    (1.7)

By setting dL_{[φ]}/dφ_{n,i} to zero, we get

    φ_{n,i} = β_{i,w_n} exp(Ψ(γ_i) − Ψ(Σ_j γ_j) + λ_n − 1),

where we do not need to compute λ_n explicitly, since λ_n is the same for all i and only enforces the normalization of φ_n; the term Ψ(Σ_j γ_j) is likewise constant in i. Thus we have

    φ_{n,i} ∝ β_{i,w_n} exp(Ψ(γ_i)).
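The φ update above, with row-wise normalization playing the role of λ_n, can be sketched as follows (my own code, not the note's; computed in log space for numerical stability):

```python
# phi[n, i] proportional to beta[i, w_n] * exp(Psi(gamma_i)), rows normalized to sum to 1.
# Illustrative sketch, assuming numpy/scipy.
import numpy as np
from scipy.special import digamma

def update_phi(gamma, beta, words):
    """gamma: (K,), beta: (K, V), words: (N,) integer word ids. Returns phi of shape (N, K)."""
    log_phi = np.log(beta[:, words].T) + digamma(gamma)   # (N, K), defined up to a per-row constant
    log_phi -= log_phi.max(axis=1, keepdims=True)         # subtract the row max before exponentiating
    phi = np.exp(log_phi)
    return phi / phi.sum(axis=1, keepdims=True)
```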

Next, we maximize Eq. 1.4 with respect to γ_i, the ith component of the posterior Dirichlet parameter. The terms containing the γ_i's are:

    L_{[γ]} = Σ_{i=1}^K (α_i − 1)(Ψ(γ_i) − Ψ(Σ_j γ_j)) + Σ_n Σ_i φ_{n,i} (Ψ(γ_i) − Ψ(Σ_j γ_j)) − Σ_{i=1}^K (γ_i − 1)(Ψ(γ_i) − Ψ(Σ_j γ_j)) − log Γ(Σ_i γ_i) + Σ_i log Γ(γ_i).

Taking the derivative of L_{[γ]} with respect to γ_i (keeping in mind that Ψ(Σ_j γ_j) depends on every component of γ), the Ψ(γ_i) and Ψ(Σ_j γ_j) terms arising from the last three items cancel, and we are left with

    dL_{[γ]}/dγ_i = Ψ'(γ_i) (α_i + Σ_n φ_{n,i} − γ_i) − Ψ'(Σ_j γ_j) Σ_{j=1}^K (α_j + Σ_n φ_{n,j} − γ_j).

By setting dL_{[γ]}/dγ_i to zero for every i, we get

    γ_i = α_i + Σ_n φ_{n,i}.
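The closed-form γ update is a one-liner. The sketch below (mine, not the note's code) also checks a simple consequence of the update: since each row of φ sums to 1, the updated γ sums to Σ_i α_i + N_d.

```python
# Closed-form gamma update derived above. Illustrative sketch, assuming numpy.
import numpy as np

def update_gamma(alpha, phi):
    """gamma_i = alpha_i + sum_n phi[n, i], with phi of shape (N_d, K)."""
    return alpha + phi.sum(axis=0)

alpha = np.array([0.1, 0.1, 0.1])
phi = np.array([[0.2, 0.5, 0.3],
                [0.6, 0.2, 0.2],
                [0.1, 0.8, 0.1]])
gamma = update_gamma(alpha, phi)
assert np.isclose(gamma.sum(), alpha.sum() + phi.shape[0])   # sum_i gamma_i = sum_i alpha_i + N_d
```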

1.3 Parameter estimation

In the previous section, we discussed how to estimate the variational parameters. Next, we continue with the estimation of the model parameters, i.e., β and α. We solve this problem by using the variational lower bound as a surrogate for the intractable marginal log likelihood, with the variational parameters held fixed. Note that we first need to aggregate the document-specific lower bounds defined in Eq. 1.2, and in this part we will use the document index d explicitly. We first rewrite the lower bound by keeping only the items which contain β, together with Lagrange multipliers ρ_i:

    L_{[β]} = Σ_d Σ_n Σ_i φ_{d,n,i} log β_{i,w_{d,n}} + Σ_i ρ_i (Σ_{v=1}^V β_{i,v} − 1).

Taking the derivative of L_{[β]}, we have

    dL_{[β]}/dβ_{i,v} = Σ_{d,n} φ_{d,n,i} 1[v = w_{d,n}] / β_{i,v} + ρ_i,    (1.10)

where 1[v = w_{d,n}] is an indicator function which returns 1 when the condition is true and 0 otherwise. We can set dL_{[β]}/dβ_{i,v} to zero and solve for ρ_i: ρ_i = −Σ_{d,n,v} φ_{d,n,i} 1[v = w_{d,n}]. Since we have the constraint Σ_v β_{i,v} = 1, we can ignore ρ_i and estimate an unnormalized value of β_{i,v}:

    β̂_{i,v} ∝ Σ_{d,n} φ_{d,n,i} 1[v = w_{d,n}].

Similarly, we can rewrite the lower bound by keeping only the items which contain α:

    L_{[α]} = Σ_{d=1}^D [ Σ_{i=1}^K (α_i − 1)(Ψ(γ_{d,i}) − Ψ(Σ_j γ_{d,j})) + log Γ(Σ_i α_i) − Σ_i log Γ(α_i) ].

Taking the derivative of L_{[α]}, we have

    dL_{[α]}/dα_i = Σ_d (Ψ(γ_{d,i}) − Ψ(Σ_j γ_{d,j})) + D (Ψ(Σ_i α_i) − Ψ(α_i)).

This derivative with respect to α_i depends on the α_j with j ≠ i, so we must use an iterative method to find the maximizing α. In particular, the Hessian has the form

    d²L_{[α]}/dα_i dα_j = D Ψ'(Σ_i α_i) − D Ψ'(α_i) 1[i = j].

Having these derivatives, we can use the Newton-Raphson optimization algorithm in Blei's paper; I will not discuss it in the current version.
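The β̂ estimate above amounts to accumulating expected word-topic counts over the corpus and normalizing each topic over the vocabulary. Here is a sketch of my own (numpy assumed; in practice a small constant is often added before normalizing so that no β_{i,v} is exactly zero):

```python
# M-step for beta: beta_hat[i, v] proportional to sum_{d,n} phi[d, n, i] * 1[v == w_{d,n}].
# Illustrative sketch, not the note's code.
import numpy as np

def update_beta(phis, docs, K, V):
    """phis: list of (N_d, K) variational multinomials; docs: list of (N_d,) word-id arrays."""
    counts = np.zeros((K, V))
    for phi_d, words_d in zip(phis, docs):
        for n, v in enumerate(words_d):
            counts[:, v] += phi_d[n]          # expected count of word v under each topic i
    return counts / counts.sum(axis=1, keepdims=True)
```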

1.4 Summary of the variational EM algorithm for LDA

Having the variational inference part and the parameter estimation part, we now present a summary of the variational EM algorithm for LDA. The derivation yields the following iterative algorithm.

E-step: For each document, run the following iterative algorithm to find the optimizing values of the variational parameters:

    initialize φ^0_{n,i} := 1/K for all i and n;
    initialize γ^0_i := α_i + N_d/K for all i;
    while not converged do
        for n = 1 to N_d do
            for i = 1 to K do
                φ^{t+1}_{n,i} := β_{i,w_n} exp(Ψ(γ^t_i));
            end
            normalize φ^{t+1}_n to sum to 1;
        end
        γ^{t+1}_i := α_i + Σ_n φ^{t+1}_{n,i};
    end

    Algorithm 1: Iterative variational inference algorithm for a document.

M-step: Maximize the resulting lower bound on the log likelihood with respect to the model parameters α and β. This corresponds to finding maximum likelihood estimates with expected sufficient statistics for each document under the approximate posterior computed in the E-step.

Appendix

In this part, we go over some preliminary points for the previous derivations. This part is copied from Blei's paper. The need to compute the expected value of the log of a single probability component under the Dirichlet arises repeatedly in deriving the inference and parameter estimation procedures for LDA. This value can be easily computed from the natural parameterization of the exponential family representation of the Dirichlet distribution. Recall that a distribution is in the exponential family if it can be written in the form

    p(x | η) = h(x) exp{ η^T T(x) − A(η) },

where η is the natural parameter, T(x) is the sufficient statistic, and A(η) is the log of the normalization factor. We can write the Dirichlet in this form by exponentiating the log:

    p(θ | α) = exp{ Σ_{i=1}^K (α_i − 1) log θ_i + log Γ(Σ_i α_i) − Σ_i log Γ(α_i) }.

From this form, we immediately see that the natural parameter of the Dirichlet is η_i = α_i − 1 and the sufficient statistic is T(θ_i) = log θ_i. Furthermore, using the general fact that the derivative of the log normalization factor with respect to the natural parameter is equal to the expectation of the sufficient statistic, we obtain

    E[log θ_i | α] = Ψ(α_i) − Ψ(Σ_j α_j),

where Ψ is the digamma function, the first derivative of the log Gamma function. This is a very important fact for the derivations in this note.
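To close, here is a compact numpy translation of Algorithm 1, written as an illustration rather than taken from the note; the convergence test on γ and the tolerance are my own choices, since the note leaves the stopping criterion unspecified.

```python
# Per-document E-step of variational inference for LDA (Algorithm 1). Illustrative sketch.
import numpy as np
from scipy.special import digamma

def e_step_document(words, alpha, beta, max_iter=100, tol=1e-4):
    """words: (N_d,) integer word ids; alpha: (K,); beta: (K, V). Returns (gamma, phi)."""
    words = np.asarray(words)
    N, K = len(words), len(alpha)
    phi = np.full((N, K), 1.0 / K)                 # phi^0_{n,i} = 1/K
    gamma = alpha + float(N) / K                   # gamma^0_i = alpha_i + N_d/K
    for _ in range(max_iter):
        phi = beta[:, words].T * np.exp(digamma(gamma))   # phi_{n,i} proportional to beta_{i,w_n} exp(Psi(gamma_i))
        phi /= phi.sum(axis=1, keepdims=True)             # normalize each phi_n to sum to 1
        gamma_new = alpha + phi.sum(axis=0)               # gamma_i = alpha_i + sum_n phi_{n,i}
        converged = np.max(np.abs(gamma_new - gamma)) < tol
        gamma = gamma_new
        if converged:
            break
    return gamma, phi
```

Running this E-step over all documents and then applying the β (and α) updates of Section 1.3 gives one full iteration of the variational EM algorithm.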
