Gibbs Sampling. Héctor Corrada Bravo. University of Maryland, College Park, USA CMSC 644:


1 Gibbs Sampling Héctor Corrada Bravo University of Maryland, College Park, USA CMSC 644:

2 Latent semantic analysis Documents as mixtures of topics (Hofmann 1999)

3 Latent semantic analysis We have a set of documents $D$. Each document is modeled as a bag of words (bow) over a dictionary $W$. $x_{w,d}$: the number of times word $w \in W$ appears in document $d \in D$.

4 Latent semantic analysis Let's start with a simple model based on the frequency of word occurrences. Each document is modeled as $n_d$ draws from a Multinomial distribution with parameters $\theta_d = \{\theta_{1,d}, \ldots, \theta_{W,d}\}$. Note $\theta_{w,d} \geq 0$ and $\sum_w \theta_{w,d} = 1$.

5 Latent semantic analysis Probability of the observed corpus $D$: $Pr(D \mid \{\theta_d\}) \propto \prod_{d=1}^{D} \prod_{w=1}^{W} \theta_{w,d}^{x_{w,d}}$
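
A minimal numpy sketch of this corpus log likelihood (a sketch, not the slides' code): it assumes a document-by-word count matrix x of shape (num_docs, num_words) and a parameter matrix theta of the same shape whose rows sum to 1; names and shapes are illustrative assumptions.

    import numpy as np

    def corpus_log_likelihood(x, theta, eps=1e-12):
        # x[d, w]: count of word w in document d
        # theta[d, w]: per-document multinomial parameter, rows sum to 1
        # log Pr(D | {theta_d}) = sum_d sum_w x[d, w] * log theta[d, w]  (up to the multinomial coefficient)
        return np.sum(x * np.log(theta + eps))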

6 Latent semantic analysis Problem 1: Prove the MLE is $\hat{\theta}_{w,d} = \frac{x_{w,d}}{n_d}$

7 Constrained optimization We have a problem of the type $\min_x f_0(x)$ s.t. $f_i(x) \leq 0$, $i = 1, \ldots, m$ and $h_i(x) = 0$, $i = 1, \ldots, p$. Note: This discussion follows Boyd and Vandenberghe, Convex Optimization.

8 Constrained optimization To solve this type of problem we will look at the Lagrangian function: $L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x)$

9 Constrained optimization We'll see these in more detail later, but there is a beautiful result giving optimality conditions based on the Lagrangian. Suppose $\tilde{x}$, $\tilde{\lambda}$ and $\tilde{\nu}$ are optimal; then $f_i(\tilde{x}) \leq 0$, $h_i(\tilde{x}) = 0$, $\tilde{\lambda}_i \geq 0$, $\tilde{\lambda}_i f_i(\tilde{x}) = 0$, and $\nabla L(\tilde{x}, \tilde{\lambda}, \tilde{\nu}) = 0$.

10 Constrained optimization We can use the gradient and feasibility conditions to prove the MLE result.
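
As a sketch of how these conditions give the Problem 1 result (this derivation is not spelled out on the slides): for a single document $d$, maximize the log likelihood subject to the simplex constraint,
$\max_{\theta_d} \sum_w x_{w,d} \log \theta_{w,d}$ s.t. $\sum_w \theta_{w,d} = 1$,
with Lagrangian $L(\theta_d, \nu) = \sum_w x_{w,d} \log \theta_{w,d} + \nu \left(1 - \sum_w \theta_{w,d}\right)$.
Setting $\partial L / \partial \theta_{w,d} = x_{w,d}/\theta_{w,d} - \nu = 0$ gives $\theta_{w,d} = x_{w,d}/\nu$, and feasibility $\sum_w \theta_{w,d} = 1$ gives $\nu = \sum_w x_{w,d} = n_d$, so $\hat{\theta}_{w,d} = x_{w,d}/n_d$.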

11 Probabilistic Latent Semantic Analysis Let's change our document model to introduce topics. The key idea is that the probability of observing a word in a document is given by two pieces: the probability of observing a topic in a document, and the probability of observing a word given a topic: $Pr(w, d) = \sum_{t=1}^{T} Pr(w \mid t) Pr(t \mid d)$

12 Probabilistic Latent Semantic Analysis So, we rewrite the corpus probability as $Pr(D \mid \{p_d\}, \{\theta_t\}) \propto \prod_{d=1}^{D} \prod_{w=1}^{W} \left( \sum_{t=1}^{T} p_{t,d} \theta_{w,t} \right)^{x_{w,d}}$

13 Probabilistic Latent Semantic Analysis So, we rewrite the corpus probability as $Pr(D \mid \{p_d\}, \{\theta_t\}) \propto \prod_{d=1}^{D} \prod_{w=1}^{W} \left( \sum_{t=1}^{T} p_{t,d} \theta_{w,t} \right)^{x_{w,d}}$ Mixture of topics!!

14 Probabilistic Latent Semantic Analysis A fully observed model: assume you know the latent number of occurrences of word $w$ in document $d$ generated from topic $t$, $\Delta_{w,d,t}$, such that $\sum_t \Delta_{w,d,t} = x_{w,d}$. In that case we can rewrite the corpus probability: $Pr(D \mid \{p_d\}, \{\theta_t\}) \propto \prod_{d=1}^{D} \prod_{w=1}^{W} \prod_{t=1}^{T} (p_{t,d} \theta_{w,t})^{\Delta_{w,d,t}}$

15 Probabilistic Latent Semantic Analysis Problem 2: Show the MLEs are given by $\hat{p}_{t,d} = \frac{\sum_{w=1}^{W} \Delta_{w,d,t}}{\sum_{t=1}^{T} \sum_{w=1}^{W} \Delta_{w,d,t}}$ and $\hat{\theta}_{w,t} = \frac{\sum_{d=1}^{D} \Delta_{w,d,t}}{\sum_{w=1}^{W} \sum_{d=1}^{D} \Delta_{w,d,t}}$

16 Probabilistic Latent Semantic Analysis Since we don't observe $\Delta_{w,d,t}$ we use the EM algorithm. At each iteration (given current parameters $\{p_d\}$ and $\{\theta_t\}$) find the responsibility $\gamma_{w,d,t} = E[\Delta_{w,d,t} \mid \{p_d\}, \{\theta_t\}]$ and maximize the fully observed likelihood, plugging in $\gamma_{w,d,t}$ for $\Delta_{w,d,t}$.

17 Probabilistic Latent Semantic Analysis Problem 4: Show $\gamma_{w,d,t} = x_{w,d} \frac{p_{t,d} \theta_{w,t}}{\sum_{t'=1}^{T} p_{t',d} \theta_{w,t'}}$
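
A minimal numpy sketch of one pLSA EM iteration built from these formulas (names, shapes, and the vectorization are illustrative assumptions, not the slides' implementation): x has shape (num_docs, num_words), p has shape (num_topics, num_docs), and theta has shape (num_words, num_topics).

    import numpy as np

    def plsa_em_step(x, p, theta, eps=1e-12):
        # E step: gamma[d, w, t] = x[d, w] * p[t, d] * theta[w, t] / sum_t' p[t', d] * theta[w, t']
        mix = (theta @ p).T                                   # (D, W): mix[d, w] = sum_t p[t, d] * theta[w, t]
        gamma = (x / (mix + eps))[:, :, None] * p.T[:, None, :] * theta[None, :, :]  # (D, W, T)

        # M step: plug gamma in for Delta in the fully observed MLEs
        p_new = gamma.sum(axis=1).T                           # (T, D): sum over words
        p_new /= p_new.sum(axis=0, keepdims=True)             # normalize over topics
        theta_new = gamma.sum(axis=0)                         # (W, T): sum over documents
        theta_new /= theta_new.sum(axis=0, keepdims=True)     # normalize over words
        return gamma, p_new, theta_new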

18 The EM Algorithm in General So, why does that work? Why does plugging in $\gamma_{w,d,t}$ for the latent variables $\Delta_{w,d,t}$ work? Why does that maximize the log likelihood $\ell(\{p_d\}, \{\theta_t\}; D)$?

19 The EM Algorithm in General Think of it as follows: $Z$: observed data; $Z^m$: missing latent data; $T = (Z, Z^m)$: complete data (observed and missing)

20 The EM Algorithm in General Think of it as follows: $Z$: observed data; $Z^m$: missing latent data; $T = (Z, Z^m)$: complete data (observed and missing). $\ell(\theta'; Z)$: log likelihood w.r.t. observed data; $\ell_0(\theta'; T)$: log likelihood w.r.t. complete data

21 The EM Algorithm in General Next, notice that $Pr(Z \mid \theta') = \frac{Pr(T \mid \theta')}{Pr(Z^m \mid Z, \theta')}$

22 The EM Algorithm in General Next, notice that $Pr(Z \mid \theta') = \frac{Pr(T \mid \theta')}{Pr(Z^m \mid Z, \theta')}$ As log likelihoods: $\ell(\theta'; Z) = \ell_0(\theta'; T) - \ell_1(\theta'; Z^m \mid Z)$

23 The EM Algorithm in General Iterative approach: given current parameters $\theta$, take expectations of the log likelihoods: $\ell(\theta'; Z) = E[\ell_0(\theta'; T) \mid Z, \theta] - E[\ell_1(\theta'; Z^m \mid Z) \mid Z, \theta] \equiv Q(\theta', \theta) - R(\theta', \theta)$

24 The EM Algorithm in General Iterative approach: given current parameters $\theta$, take expectations of the log likelihoods: $\ell(\theta'; Z) = E[\ell_0(\theta'; T) \mid Z, \theta] - E[\ell_1(\theta'; Z^m \mid Z) \mid Z, \theta] \equiv Q(\theta', \theta) - R(\theta', \theta)$ In pLSA, $Q(\theta', \theta)$ is the log likelihood of the complete data with $\Delta_{w,d,t}$ replaced by $\gamma_{w,d,t}$.

25 The EM Algorithm in General The general EM algorithm: 1. Initialize parameters $\theta^{(0)}$ 2. Construct the function $Q(\theta', \theta^{(j)})$ 3. Find the next set of parameters $\theta^{(j+1)} = \arg\max_{\theta'} Q(\theta', \theta^{(j)})$ 4. Iterate steps 2 and 3 until convergence

26 The EM Algorithm in General So, why does that work? $\ell(\theta^{(j+1)}; Z) - \ell(\theta^{(j)}; Z) = [Q(\theta^{(j+1)}, \theta^{(j)}) - Q(\theta^{(j)}, \theta^{(j)})] - [R(\theta^{(j+1)}, \theta^{(j)}) - R(\theta^{(j)}, \theta^{(j)})] \geq 0$

27 The EM Algorithm in General So, why does that work? $\ell(\theta^{(j+1)}; Z) - \ell(\theta^{(j)}; Z) = [Q(\theta^{(j+1)}, \theta^{(j)}) - Q(\theta^{(j)}, \theta^{(j)})] - [R(\theta^{(j+1)}, \theta^{(j)}) - R(\theta^{(j)}, \theta^{(j)})] \geq 0$ The first bracket is $\geq 0$ because $\theta^{(j+1)}$ maximizes $Q(\cdot, \theta^{(j)})$; the second is $\leq 0$ since $R(\theta', \theta)$, an expected log likelihood under $Pr(Z^m \mid Z, \theta)$, is maximized at $\theta' = \theta$ (Jensen's inequality). I.e., every step makes the log likelihood larger.

28 The EM Algorithm in General Why else does it work? $Q(\theta', \theta)$ minorizes $\ell(\theta'; Z)$

29 The EM Algorithm in General General algorithmic concept, an iterative approach: initialize parameters; construct a bound based on current parameters; optimize the bound.

30 The EM Algorithm in General General algorithmic concept, an iterative approach: initialize parameters; construct a bound based on current parameters; optimize the bound. We will see this again when we look at variational methods.

31 Approximate Inference by Sampling Ultimately, what we are interested in is learning topics. Perhaps instead of finding parameters $\theta$ that maximize the likelihood, sample from a distribution $Pr(\theta \mid D)$ that gives us topic estimates.

32 Approximate Inference by Sampling Ultimately, what we are interested in is learning topics. Perhaps instead of finding parameters $\theta$ that maximize the likelihood, sample from a distribution $Pr(\theta \mid D)$ that gives us topic estimates. But we have only talked about $Pr(D \mid \theta)$; how can we sample parameters?

33 Approximate Inference by Sampling Like EM, the trick here is to expand the model with latent data $Z^m$ and sample from the distribution $Pr(\theta, Z^m \mid Z)$

34 Approximate Inference by Sampling Like EM, the trick here is to expand the model with latent data $Z^m$ and sample from the distribution $Pr(\theta, Z^m \mid Z)$ This is challenging, but sampling from $Pr(\theta \mid Z^m, Z)$ and $Pr(Z^m \mid \theta, Z)$ is easier.

35 Approximate Inference by Sampling The Gibbs Sampler does exactly that. Property: after some rounds, samples from the conditional distributions $Pr(\theta \mid Z^m, Z)$ correspond to samples from the marginal $Pr(\theta \mid Z) = \sum_{Z^m} Pr(\theta, Z^m \mid Z)$
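
A toy illustration of that property (not from the slides): for a standard bivariate normal with correlation rho, both conditionals are normal, and alternating draws from them yields, after a burn-in, samples whose first coordinate follows the N(0, 1) marginal.

    import numpy as np

    def gibbs_bivariate_normal(rho, n_iter=5000, burn_in=500, seed=0):
        # Conditionals: x | y ~ N(rho * y, 1 - rho^2), y | x ~ N(rho * x, 1 - rho^2)
        rng = np.random.default_rng(seed)
        x, y = 0.0, 0.0
        samples = []
        for i in range(n_iter):
            x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))
            y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))
            if i >= burn_in:
                samples.append((x, y))
        return np.array(samples)

    draws = gibbs_bivariate_normal(rho=0.8)
    print(draws[:, 0].mean(), draws[:, 0].std())   # approximately 0 and 1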

36 Approximate Inference by Sampling Quick aside: how do we simulate data for pLSA? Generate parameters $\{p_d\}$ and $\{\theta_t\}$, and generate $\Delta_{w,d,t}$.

37 Approximate Inference by Sampling Let's go backwards, let's deal with $\Delta_{w,d,t}$

38 Approximate Inference by Sampling Let's go backwards, let's deal with $\Delta_{w,d,t}$: $\Delta_{w,d,\cdot} \sim \text{Mult}_{x_{w,d}}(\gamma_{w,d,1}, \ldots, \gamma_{w,d,T})$ where $\gamma_{w,d,t}$ is as given by the E step.

39 Approximate Inference by Sampling Let's go backwards, let's deal with $\Delta_{w,d,t}$: $\Delta_{w,d,\cdot} \sim \text{Mult}_{x_{w,d}}(\gamma_{w,d,1}, \ldots, \gamma_{w,d,T})$ where $\gamma_{w,d,t}$ is as given by the E step.

    import numpy as np

    for d in range(num_docs):
        for w in range(num_words):
            delta[d, w, :] = np.random.multinomial(doc_mat[d, w], gamma[d, w, :])

40 Approximate Inference by Sampling Hmm, that's a problem since we need $x_{w,d}$... But we know $Pr(w, d) = \sum_t p_{t,d} \theta_{w,t}$, so let's use that to generate the counts for each document as $x_{\cdot,d} \sim \text{Mult}_{n_d}(Pr(1, d), \ldots, Pr(W, d))$

41 Approximate Inference by Sampling Hmm, that's a problem since we need $x_{w,d}$... But we know $Pr(w, d) = \sum_t p_{t,d} \theta_{w,t}$, so let's use that to generate the counts for each document as $x_{\cdot,d} \sim \text{Mult}_{n_d}(Pr(1, d), \ldots, Pr(W, d))$

    for d in range(num_docs):
        pr_w = np.sum(p[:, d] * theta, axis=1)   # Pr(w, d) = sum_t p[t, d] * theta[w, t]
        doc_mat[d, :] = np.random.multinomial(nw[d], pr_w)

42 Approximate Inference by Sampling Now, how about $p_d$? How do we generate the parameters of a Multinomial distribution?

43 Approximate Inference by Sampling Now, how about $p_d$? How do we generate the parameters of a Multinomial distribution? This is where the Dirichlet distribution comes in... If $p_d \sim \text{Dir}(\alpha)$, then $Pr(p_d) \propto \prod_{t=1}^{T} p_{t,d}^{\alpha_t - 1}$

44 Approximate Inference by Sampling Some interesting properties: $E[p_{t,d}] = \frac{\alpha_t}{\sum_{t'} \alpha_{t'}}$ So, if we set all $\alpha_t = 1$ we will tend to have uniform probability over topics ($1/T$ each on average). If we increase to $\alpha_t = 100$ it will also have uniform probability on average, but with very little variance (it will almost always be close to $1/T$).
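
A quick numpy check of these properties (illustrative, not from the slides): with all alpha_t = 1 the draws average 1/T but vary a lot; with alpha_t = 100 each draw is close to 1/T.

    import numpy as np

    rng = np.random.default_rng(0)
    T = 5
    flat = rng.dirichlet(np.ones(T), size=10000)              # alpha_t = 1
    peaked = rng.dirichlet(100.0 * np.ones(T), size=10000)    # alpha_t = 100
    print(flat.mean(axis=0), flat.std(axis=0))      # means near 1/T, large spread
    print(peaked.mean(axis=0), peaked.std(axis=0))  # means near 1/T, small spread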

45 Approximate Inference by Sampling So, we can say $p_d \sim \text{Dir}(\alpha)$ and $\theta_t \sim \text{Dir}(\beta)$

46 Approximate Inference by Sampling So, we can say $p_d \sim \text{Dir}(\alpha)$ and $\theta_t \sim \text{Dir}(\beta)$ And generate data as (with $\alpha_t = 1$):

    for d in range(num_docs):
        p[:, d] = np.random.dirichlet(1. * np.ones(num_topics))
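
Putting the pieces above together, a sketch of a full pLSA data simulator (shapes, names, and the symmetric priors are assumptions for illustration): draw the parameters from Dirichlet priors, then draw word counts and per-word topic counts for each document.

    import numpy as np

    def simulate_plsa(num_docs, num_words, num_topics, n_d, alpha=1.0, beta=1.0, seed=0):
        rng = np.random.default_rng(seed)
        p = rng.dirichlet(alpha * np.ones(num_topics), size=num_docs).T      # (T, D): topic-in-document
        theta = rng.dirichlet(beta * np.ones(num_words), size=num_topics).T  # (W, T): word-in-topic

        doc_mat = np.zeros((num_docs, num_words), dtype=int)
        delta = np.zeros((num_docs, num_words, num_topics), dtype=int)
        for d in range(num_docs):
            pr_w = theta @ p[:, d]                           # Pr(w, d) = sum_t p[t, d] * theta[w, t]
            doc_mat[d, :] = rng.multinomial(n_d, pr_w)
            for w in range(num_words):
                if doc_mat[d, w] == 0:
                    continue
                probs = p[:, d] * theta[w, :] / pr_w[w]      # normalized topic responsibilities
                delta[d, w, :] = rng.multinomial(doc_mat[d, w], probs)
        return p, theta, doc_mat, delta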

47 Approximate Inference by Sampling So what we have is a prior over the parameters $\{p_d\}$ and $\{\theta_t\}$: $Pr(p_d \mid \alpha)$ and $Pr(\theta_t \mid \beta)$. And we can formulate a joint distribution with the missing data $\Delta_{w,d,t}$: $Pr(\Delta_{w,d,t}, p_d, \theta_t \mid \alpha, \beta) = Pr(\Delta_{w,d,t} \mid p_d, \theta_t) \, Pr(p_d \mid \alpha) \, Pr(\theta_t \mid \beta)$

48 Approximate Inference by Sampling However, what we care about is the posterior distribution $Pr(p_d \mid \Delta_{w,d,t}, \theta_t, \alpha, \beta)$ What do we do???

49 Approximate Inference by Sampling Another neat property of the Dirichlet distribution is that it is conjugate to the Multinomial: if $\theta \mid \alpha \sim \text{Dir}(\alpha)$ and $X \mid \theta \sim \text{Multinomial}(\theta)$, then $\theta \mid X, \alpha \sim \text{Dir}(X + \alpha)$
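
A small numeric sketch of the conjugacy update (names are illustrative): starting from a Dir(alpha) prior and multinomial counts X, the posterior is Dir(X + alpha), whose mean moves toward the empirical proportions as counts grow.

    import numpy as np

    alpha = np.array([1.0, 1.0, 1.0])       # Dirichlet prior parameters
    X = np.array([10, 3, 7])                # observed multinomial counts
    posterior_alpha = X + alpha             # theta | X, alpha ~ Dir(X + alpha)
    print(posterior_alpha / posterior_alpha.sum())   # posterior mean, close to X / X.sum()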

50 Approximate Inference by Sampling That means we can sample $p_d$ from $p_{t,d} \sim \text{Dir}\left(\sum_w \Delta_{w,d,t} + \alpha\right)$ and $\theta_{w,t} \sim \text{Dir}\left(\sum_d \Delta_{w,d,t} + \beta\right)$

51 Approximate Inference by Sampling Coincidentally, we have just specified the Latent Dirichlet Allocation method for topic modeling. This is the most commonly used method for topic modeling. Blei, Ng, Jordan (2003), JMLR.

52 Approximate Inference by Sampling We can now specify a full Gibbs Sampler for an LDA mixture model. Given: word-document counts $x_{w,d}$, the number of topics $K$, and prior parameters $\alpha$ and $\beta$. Do: learn parameters $\{p_d\}$ and $\{\theta_t\}$ for the $K$ topics.

53 Approximate Inference by Sampling Step 0: Initialize parameters $\{p_d\}$ and $\{\theta_t\}$: $p_d \sim \text{Dir}(\alpha)$ and $\theta_t \sim \text{Dir}(\beta)$

54 Approximate Inference by Sampling Step 1: Sample $\Delta_{w,d,t}$ based on current parameters $\{p_d\}$ and $\{\theta_t\}$: $\Delta_{w,d,\cdot} \sim \text{Mult}_{x_{w,d}}(\gamma_{w,d,1}, \ldots, \gamma_{w,d,T})$

55 Approximate Inference by Sampling Step 2: Sample parameters from $p_{t,d} \sim \text{Dir}\left(\sum_w \Delta_{w,d,t} + \alpha\right)$ and $\theta_{w,t} \sim \text{Dir}\left(\sum_d \Delta_{w,d,t} + \beta\right)$

56 Approximate Inference by Sampling Step 3: Run the sampler (Steps 1 and 2) for a few iterations (e.g., 200); we want to reach a stationary distribution.

57 Approximate Inference by Sampling Step 4: Estimate $\hat{\Delta}_{w,d,t}$ as the average of the estimates from the last $m$ iterations (e.g., $m = 500$).

58 Approximate Inference by Sampling Step 5: Estimate parameters $p_d$ and $\theta_t$ based on the estimated $\hat{\Delta}_{w,d,t}$: $\hat{p}_{t,d} = \frac{\sum_w \hat{\Delta}_{w,d,t} + \alpha}{\sum_t \sum_w \hat{\Delta}_{w,d,t} + \alpha}$ and $\hat{\theta}_{w,t} = \frac{\sum_d \hat{\Delta}_{w,d,t} + \beta}{\sum_w \sum_d \hat{\Delta}_{w,d,t} + \beta}$
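
Putting Steps 0 through 5 together, a minimal numpy sketch of this Gibbs sampler (a sketch under assumed shapes, symmetric scalar priors, and iteration counts; not the slides' reference implementation):

    import numpy as np

    def lda_gibbs(doc_mat, K, alpha=1.0, beta=1.0, n_iter=700, burn_in=200, seed=0):
        # doc_mat: (num_docs, num_words) word counts; K: number of topics
        # alpha, beta: symmetric Dirichlet prior parameters
        rng = np.random.default_rng(seed)
        D, W = doc_mat.shape

        # Step 0: initialize parameters from the priors
        p = rng.dirichlet(alpha * np.ones(K), size=D).T           # (K, D): topic-in-document
        theta = rng.dirichlet(beta * np.ones(W), size=K).T        # (W, K): word-in-topic

        delta_sum = np.zeros((D, W, K))
        n_kept = 0
        for it in range(n_iter):
            # Step 1: sample Delta given current parameters
            mix = (theta @ p).T                                   # (D, W): sum_t p[t, d] * theta[w, t]
            delta = np.zeros((D, W, K))
            for d in range(D):
                for w in range(W):
                    if doc_mat[d, w] == 0:
                        continue
                    probs = p[:, d] * theta[w, :] / mix[d, w]     # normalized topic probabilities
                    delta[d, w, :] = rng.multinomial(doc_mat[d, w], probs)

            # Step 2: sample parameters from their Dirichlet conditionals
            for d in range(D):
                p[:, d] = rng.dirichlet(delta[d].sum(axis=0) + alpha)           # sum over words
            for k in range(K):
                theta[:, k] = rng.dirichlet(delta[:, :, k].sum(axis=0) + beta)  # sum over documents

            # Steps 3-4: discard burn-in iterations, then average Delta over the rest
            if it >= burn_in:
                delta_sum += delta
                n_kept += 1

        delta_hat = delta_sum / n_kept

        # Step 5: estimate parameters from the averaged Delta_hat
        p_hat = delta_hat.sum(axis=1).T + alpha                   # (K, D): sum over words
        p_hat /= p_hat.sum(axis=0, keepdims=True)
        theta_hat = delta_hat.sum(axis=0) + beta                  # (W, K): sum over documents
        theta_hat /= theta_hat.sum(axis=0, keepdims=True)
        return p_hat, theta_hat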

59 Mixture models We have now seen two different mixture models: soft k-means and topic models.

60 Mixture models We have now seen two different mixture models: soft k-means and topic models. Two inference procedures: Exact Inference with Maximum Likelihood using the EM algorithm, and Approximate Inference using Gibbs Sampling.

61 Mixture models We have now seen two different mixture models: soft k-means and topic models. Two inference procedures: Exact Inference with Maximum Likelihood using the EM algorithm, and Approximate Inference using Gibbs Sampling. Next, we will go back to Maximum Likelihood but learn about Approximate Inference using Variational Methods.
