Topic Models: INTRODUCTION. Material adapted from David Mimno, University of Maryland. 1 / 51


Why topic models? Suppose you have a huge number of documents. You want to know what's going on, but you can't read them all (e.g., every New York Times article from the 90s). Topic models offer a way to get a corpus-level view of major themes, and they are unsupervised.

Roadmap: What are topic models? How do you know if you have a good topic model? How do you go from raw data to topics?

What are Topic Models? Conceptual Approach. From an input corpus and a number of topics K, learn which words belong to which topics. Example corpus (headlines): "Forget the Bootleg, Just Download the Movie Legally"; "Multiplex Heralded As Linchpin To Growth"; "The Shape of Cinema, Transformed At the Click of a Mouse"; "A Peaceful Crew Puts Muppets Where Its Mouth Is"; "Stock Trades: A Better Deal For Investors Isn't Simple"; "The three big Internet portals begin to distinguish among themselves as shopping malls"; "Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens".

What are Topic Models? Conceptual Approach. From an input corpus and a number of topics K: words to topics. TOPIC 1: computer, technology, system, service, site, phone, internet, machine. TOPIC 2: sell, sale, store, product, business, advertising, market, consumer. TOPIC 3: play, film, movie, theater, production, star, director, stage.

What are Topic Models? Conceptual Approach. For each document, what topics are expressed by that document? [Figure: each headline from the example corpus is connected to the topics (TOPIC 1, TOPIC 2, TOPIC 3) it expresses.]

What are Topic Models? Topics from Science. [Figure: example topics.]

What are Topic Models? Why should you care? Topic models are a neat way to explore and understand corpus collections: e-discovery, social media, scientific data. NLP applications: word sense disambiguation, discourse segmentation, machine translation. Psychology: word meaning, polysemy. And inference is (relatively) simple.

What are Topic Models? Matrix Factorization Approach. [Figure: an M x V dataset matrix factored into an M x K topic-assignment matrix times a K x V topics matrix.] K: number of topics; M: number of documents; V: size of vocabulary. If you use singular value decomposition (SVD), this technique is called latent semantic analysis, which is popular in information retrieval.
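The factorization view above can be sketched in code. A minimal illustration, using hypothetical toy documents: build the M x V count matrix and recover its first latent semantic direction by power iteration on $X^{\top}X$, which is the leading step of an SVD-based LSA (a full SVD would give all K dimensions).

```python
import random, math

# Toy corpus (hypothetical): 4 documents, small vocabulary.
docs = [
    "computer internet phone system",
    "movie film theater director",
    "internet phone computer service",
    "film star movie stage",
]
vocab = sorted({w for d in docs for w in d.split()})
X = [[d.split().count(w) for w in vocab] for d in docs]  # M x V count matrix

def matvec(A, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def top_singular_direction(X, iters=100):
    # Power iteration on X^T X: converges to the top right singular
    # vector, i.e. the first "latent semantic" direction over the vocabulary.
    V = len(X[0])
    v = [random.random() for _ in range(V)]
    Xt = list(map(list, zip(*X)))  # V x M transpose
    for _ in range(iters):
        u = matvec(X, v)           # X v
        v = matvec(Xt, u)          # X^T X v
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

random.seed(0)
v1 = top_singular_direction(X)
print([round(x, 2) for x in v1])
```

Words that co-occur in the same documents get large weight in the same direction, which is exactly the "similar words end up in similar topics" intuition the matrix picture conveys.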

What are Topic Models? Alternative: Generative Model. How your data came to be: a sequence of probabilistic steps, followed by posterior inference. Blei, Ng, and Jordan. "Latent Dirichlet Allocation." JMLR, 2003.

What are Topic Models? Multinomial Distribution. A distribution over discrete outcomes, represented by a non-negative vector that sums to one. Picture representation (points on the simplex): (1,0,0), (0,0,1), (0,1,0), (1/3,1/3,1/3), (1/4,1/4,1/2), (1/2,1/2,0). These parameter vectors come from a Dirichlet distribution.

What are Topic Models? Dirichlet Distribution. [Figures: Dirichlet densities over the simplex for several parameter settings.]

What are Topic Models? Dirichlet Distribution. If $\phi \sim \text{Dir}(\alpha)$, $w_i \sim \text{Mult}(\phi)$, and $n_k = |\{w_i : w_i = k\}|$, then
$p(\phi \mid \alpha, \vec{w}) \propto p(\vec{w} \mid \phi)\, p(\phi \mid \alpha)$ (1)
$\propto \prod_k \phi_k^{n_k} \prod_k \phi_k^{\alpha_k - 1}$ (2)
$= \prod_k \phi_k^{\alpha_k + n_k - 1}$ (3)
Conjugacy: this posterior has the same form as the prior (a Dirichlet with parameters $\alpha_k + n_k$).
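The conjugacy result above makes posterior computation a matter of adding counts to the prior parameters. A small worked sketch (toy vocabulary, not from the slides):

```python
from collections import Counter

def posterior_mean(alpha, words, vocab):
    # Dirichlet-multinomial conjugacy: prior Dir(alpha) plus observed
    # counts n_k gives posterior Dir(alpha_k + n_k), whose mean is
    # (alpha_k + n_k) / sum_i (alpha_i + n_i).
    n = Counter(words)
    post = [alpha[k] + n[w] for k, w in enumerate(vocab)]
    total = sum(post)
    return [p / total for p in post]

vocab = ["a", "b", "c"]
# Prior Dir(1,1,1); observe counts a:2, b:1, c:0 -> posterior Dir(3,2,1).
mean = posterior_mean([1.0, 1.0, 1.0], ["a", "a", "b"], vocab)
print(mean)  # [0.5, 0.333..., 0.166...]
```

Note how the unseen word "c" still gets nonzero probability from the prior: this is the smoothing role the Dirichlet hyperparameters play in LDA.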

What are Topic Models? Generative Model. TOPIC 1: computer, technology, system, service, site, phone, internet, machine. TOPIC 2: sell, sale, store, product, business, advertising, market, consumer. TOPIC 3: play, film, movie, theater, production, star, director, stage.

What are Topic Models? Generative Model. [Figure: the example headlines, each connected to the topics (TOPIC 1, TOPIC 2, TOPIC 3) it expresses.]

What are Topic Models? Generative Model. TOPIC 1 (computer, technology, system, service, site, phone, internet, machine), TOPIC 2 (sell, sale, store, product, business, advertising, market, consumer), TOPIC 3 (play, film, movie, theater, production, star, director, stage). Example document: "Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's iTunes music store and other online services..." [The slides progressively highlight each word with the topic that generated it.]

What are Topic Models? Generative Model Approach. [Plate diagram: λ generates β_k (plate over K topics); α generates θ_d; θ_d generates z_n, which together with β generates w_n (plate over N words inside plate over M documents).]
For each topic k ∈ {1,...,K}, draw a multinomial distribution β_k from a Dirichlet distribution with parameter λ.
For each document d ∈ {1,...,M}, draw a multinomial distribution θ_d from a Dirichlet distribution with parameter α.
For each word position n ∈ {1,...,N}, select a hidden topic z_n from the multinomial distribution parameterized by θ_d.
Choose the observed word w_n from the distribution β_{z_n}.
We use statistical inference to uncover the most likely unobserved variables.
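The four steps above can be simulated directly. A minimal sketch of the LDA generative story with hypothetical toy sizes (K=3 topics, M=4 documents, N=10 words, V=8 word types):

```python
import random

def dirichlet(alphas):
    # Dirichlet draw via normalized Gamma samples.
    g = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

def categorical(probs):
    # Draw an index from a multinomial parameter vector.
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def generate_corpus(K=3, M=4, N=10, V=8, lam=0.1, alpha=0.5):
    betas = [dirichlet([lam] * V) for _ in range(K)]   # topic k: beta_k ~ Dir(lambda)
    corpus = []
    for _ in range(M):
        theta = dirichlet([alpha] * K)                 # document d: theta_d ~ Dir(alpha)
        doc = []
        for _ in range(N):
            z = categorical(theta)                     # hidden topic z_n ~ Mult(theta_d)
            w = categorical(betas[z])                  # observed word w_n ~ Mult(beta_z)
            doc.append(w)
        corpus.append(doc)
    return corpus

random.seed(0)
corpus = generate_corpus()
print(corpus[0])  # one document as a list of word ids
```

Inference runs this story backwards: given only the word ids, recover the likely betas, thetas, and z assignments.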

What are Topic Models? Topic Models: What's Important. Topic models have two kinds of latent variables: topics to word types (a multinomial distribution) and documents to topics (a multinomial distribution). Focus in this talk: statistical methods. Model: a story of how your data came to be. Latent variables: the missing pieces of your story. Statistical inference: filling in those missing pieces. We use latent Dirichlet allocation (LDA), a fully Bayesian version of pLSI, the probabilistic version of LSA.

Evaluation. Train on a corpus (the example headlines), then compute held-out log likelihood on held-out data ("Sony Ericsson's Infinite Hope for a Turnaround"; "For Search, Murdoch Looks to a Deal With Microsoft"; "Price War Brews Between Amazon and Wal-Mart"). Model A: -4.8; Model B: -15.16; Model C: -23.42. This measures predictive power, not what the topics are. How you compute it is important too (Wallach et al. 2009).

Evaluation: Word Intrusion. TOPIC 1: computer, technology, system, service, site, phone, internet, machine. TOPIC 2: sell, sale, store, product, business, advertising, market, consumer. TOPIC 3: play, film, movie, theater, production, star, director, stage.

Evaluation: Word Intrusion. 1. Take the highest-probability words from a topic. Original Topic: dog, cat, horse, pig, cow. 2. Take a high-probability word from another topic and add it. Topic with Intruder: dog, cat, apple, horse, pig, cow. 3. Ask users to find the word that doesn't belong. Hypothesis: if the topics are interpretable, users will consistently choose the true intruder.

Evaluation: Word Intrusion. [Figure: the intrusion task as shown to users.] The order of words was shuffled, and which intruder was selected varied. Model precision: the percentage of users who clicked on the intruder.

Evaluation: Word Intrusion: Which Topics are Interpretable? New York Times, 50 LDA topics. [Histogram: number of topics (0 to 15) by model precision (0.0 to 1.0), with example topics such as "committee legislation proposal republican taxis", "fireplace garage house kitchen list", "americans japanese jewish states terrorist", and "artist exhibition gallery museum painting".] Model precision: percentage of correct intruders found.

Evaluation: Interpretability and Likelihood. [Figure: model precision on New York Times plotted against held-out (predictive) log likelihood for CTM, LDA, and pLSI with 50, 100, and 150 topics.] Within a model, higher likelihood does not imply higher interpretability.

Evaluation: Interpretability and Likelihood. [Figure: topic log odds on Wikipedia plotted against predictive log likelihood for CTM, LDA, and pLSI with 50, 100, and 150 topics.] Across models, higher likelihood does not imply higher interpretability.

Evaluation Takeaway: measure what you care about. If you care about prediction, likelihood is good. If you care about a particular task, measure that.

Inference. We are interested in the posterior distribution $p(Z \mid X, \Theta)$. (4)
Here, the latent variables are the topic assignments $\vec{z}$, the topics $\beta$, and the per-document topic proportions $\theta$. $X$ is the words (divided into documents), and $\Theta$ are the hyperparameters of the Dirichlet distributions: $\alpha$ for topic proportions, $\lambda$ for topics. So we want
$p(\vec{z}, \beta, \theta \mid \vec{w}, \alpha, \lambda)$. (5)
The joint distribution is
$p(\vec{w}, \vec{z}, \theta, \beta \mid \alpha, \lambda) = \prod_k p(\beta_k \mid \lambda) \prod_d p(\theta_d \mid \alpha) \prod_n p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{z_{d,n}})$

Gibbs Sampling. A form of Markov chain Monte Carlo. The chain is a sequence of random-variable states. Given a state $\{z_1,\ldots,z_N\}$, and under certain technical conditions, repeatedly drawing $z_k \sim p(z_k \mid z_1,\ldots,z_{k-1},z_{k+1},\ldots,z_N, X, \Theta)$ for all $k$ results in a Markov chain whose stationary distribution is the posterior. For notational convenience, call $\vec{z}$ with $z_{d,n}$ removed $\vec{z}_{-d,n}$.
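Before specializing to LDA, the general recipe above can be sketched on a tiny target: alternately resample each coordinate from its conditional given the others, and the empirical frequencies approach the target distribution. A toy bivariate example (the target table is hypothetical):

```python
import random

# Toy target distribution over {0,1} x {0,1}; mass concentrated on the diagonal.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def sample_conditional(fixed_idx, fixed_val, rng):
    # Draw the free coordinate from its conditional given the fixed one.
    if fixed_idx == 0:
        weights = [P[(fixed_val, y)] for y in (0, 1)]  # p(y | x)
    else:
        weights = [P[(x, fixed_val)] for x in (0, 1)]  # p(x | y)
    r = rng.random() * sum(weights)
    return 0 if r < weights[0] else 1

def gibbs(steps=20000, rng=random):
    x, y = 0, 0
    counts = {s: 0 for s in P}
    for _ in range(steps):
        x = sample_conditional(1, y, rng)  # x ~ p(x | y)
        y = sample_conditional(0, x, rng)  # y ~ p(y | x)
        counts[(x, y)] += 1
    return {s: c / steps for s, c in counts.items()}

random.seed(0)
freq = gibbs()
print(freq)  # empirical frequencies approach the target P
```

No draw ever needs the full joint normalizer, only one-dimensional conditionals; that is exactly what makes the LDA sampler on the next slides tractable.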

Inference. [Figures: the example document "Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's iTunes music store and other online services..." with its words progressively assigned to TOPIC 1 (computer, technology, system, service, site, phone, internet, machine), TOPIC 2 (sell, sale, store, product, business, advertising, market, consumer), or TOPIC 3 (play, film, movie, theater, production, star, director, stage).]

Gibbs Sampling. For LDA, we will sample the topic assignments. Thus, we want:
$p(z_{d,n} = k \mid \vec{z}_{-d,n}, \vec{w}, \alpha, \lambda) = \dfrac{p(z_{d,n} = k, \vec{z}_{-d,n} \mid \vec{w}, \alpha, \lambda)}{p(\vec{z}_{-d,n} \mid \vec{w}, \alpha, \lambda)}$
The topics and per-document topic proportions are integrated out / marginalized. Let $n_{d,i}$ be the number of words taking topic $i$ in document $d$, and $v_{k,w}$ the number of times word $w$ is used in topic $k$ (both counts excluding the current position $d,n$). The conditional becomes a ratio of Dirichlet integrals:
$p(z_{d,n} = k \mid \vec{z}_{-d,n}, \vec{w}, \alpha, \lambda) \propto \dfrac{\int \theta_{d,k} \prod_i \theta_{d,i}^{\alpha_i + n_{d,i} - 1}\, d\theta_d}{\int \prod_i \theta_{d,i}^{\alpha_i + n_{d,i} - 1}\, d\theta_d} \cdot \dfrac{\int \beta_{k,w_{d,n}} \prod_i \beta_{k,i}^{\lambda_i + v_{k,i} - 1}\, d\beta_k}{\int \prod_i \beta_{k,i}^{\lambda_i + v_{k,i} - 1}\, d\beta_k}$

Gibbs Sampling. For LDA, we will sample the topic assignments. The topics and per-document topic proportions are integrated out / marginalized / Rao-Blackwellized. Thus, we want:
$p(z_{d,n} = k \mid \vec{z}_{-d,n}, \vec{w}, \alpha, \lambda) \propto \dfrac{n_{d,k} + \alpha_k}{\sum_i^K n_{d,i} + \alpha_i} \cdot \dfrac{v_{k,w_{d,n}} + \lambda_{w_{d,n}}}{\sum_i^V v_{k,i} + \lambda_i}$

Gibbs Sampling. The integral is the normalizer of a Dirichlet distribution:
$\int_{\beta_k} \prod_i \beta_{k,i}^{\lambda_i + v_{k,i} - 1}\, d\beta_k = \dfrac{\prod_i \Gamma(\lambda_i + v_{k,i})}{\Gamma\left(\sum_i^V \lambda_i + v_{k,i}\right)}$
So we can simplify:
$\dfrac{\int \theta_{d,k} \prod_i \theta_{d,i}^{\alpha_i + n_{d,i} - 1}\, d\theta_d}{\int \prod_i \theta_{d,i}^{\alpha_i + n_{d,i} - 1}\, d\theta_d} \cdot \dfrac{\int \beta_{k,w_{d,n}} \prod_i \beta_{k,i}^{\lambda_i + v_{k,i} - 1}\, d\beta_k}{\int \prod_i \beta_{k,i}^{\lambda_i + v_{k,i} - 1}\, d\beta_k} = \dfrac{\Gamma(\alpha_k + n_{d,k} + 1)\, \Gamma\left(\sum_i^K \alpha_i + n_{d,i}\right)}{\Gamma(\alpha_k + n_{d,k})\, \Gamma\left(\sum_i^K \alpha_i + n_{d,i} + 1\right)} \cdot \dfrac{\Gamma(\lambda_{w_{d,n}} + v_{k,w_{d,n}} + 1)\, \Gamma\left(\sum_i^V \lambda_i + v_{k,i}\right)}{\Gamma(\lambda_{w_{d,n}} + v_{k,w_{d,n}})\, \Gamma\left(\sum_i^V \lambda_i + v_{k,i} + 1\right)}$

Gamma Function Identity:
$z = \dfrac{\Gamma(z + 1)}{\Gamma(z)}$ (6)
Applying it to each ratio of Gamma functions:
$\dfrac{\Gamma(\alpha_k + n_{d,k} + 1)\, \Gamma\left(\sum_i^K \alpha_i + n_{d,i}\right)}{\Gamma(\alpha_k + n_{d,k})\, \Gamma\left(\sum_i^K \alpha_i + n_{d,i} + 1\right)} \cdot \dfrac{\Gamma(\lambda_{w_{d,n}} + v_{k,w_{d,n}} + 1)\, \Gamma\left(\sum_i^V \lambda_i + v_{k,i}\right)}{\Gamma(\lambda_{w_{d,n}} + v_{k,w_{d,n}})\, \Gamma\left(\sum_i^V \lambda_i + v_{k,i} + 1\right)} = \dfrac{n_{d,k} + \alpha_k}{\sum_i^K n_{d,i} + \alpha_i} \cdot \dfrac{v_{k,w_{d,n}} + \lambda_{w_{d,n}}}{\sum_i^V v_{k,i} + \lambda_i}$

Gibbs Sampling Equation:
$p(z_{d,n} = k \mid \vec{z}_{-d,n}, \vec{w}, \alpha, \lambda) \propto \dfrac{n_{d,k} + \alpha_k}{\sum_i^K n_{d,i} + \alpha_i} \cdot \dfrac{v_{k,w_{d,n}} + \lambda_{w_{d,n}}}{\sum_i^V v_{k,i} + \lambda_i}$ (7)
$n_{d,k}$: number of times document $d$ uses topic $k$. $v_{k,w_{d,n}}$: number of times topic $k$ uses word type $w_{d,n}$. $\alpha_k$: Dirichlet parameter for the document-to-topic distribution. $\lambda_{w_{d,n}}$: Dirichlet parameter for the topic-to-word distribution. The first factor is how much this document likes topic $k$; the second is how much this topic likes word $w_{d,n}$.
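Equation (7) is all a collapsed Gibbs sampler needs: keep the count tables, and for each word decrement, resample, and increment. A minimal sketch on a toy corpus of word ids (corpus, sizes, and hyperparameter values are hypothetical):

```python
import random

def gibbs_lda(corpus, K, V, alpha=0.5, lam=0.1, iters=50, rng=random):
    # Collapsed Gibbs sampler for LDA topic assignments (equation 7).
    M = len(corpus)
    n_dk = [[0] * K for _ in range(M)]   # n_{d,k}: doc-topic counts
    v_kw = [[0] * V for _ in range(K)]   # v_{k,w}: topic-word counts
    v_k = [0] * K                        # sum_w v_{k,w}: topic totals
    z = []
    for d, doc in enumerate(corpus):     # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1; v_kw[k][w] += 1; v_k[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(corpus):
            for n, w in enumerate(doc):
                k = z[d][n]              # decrement this word's counts
                n_dk[d][k] -= 1; v_kw[k][w] -= 1; v_k[k] -= 1
                # conditional: (n_dk + alpha) * (v_kw + lambda) / (v_k + V*lambda);
                # the document denominator is constant in k, so it is dropped.
                weights = [(n_dk[d][j] + alpha) * (v_kw[j][w] + lam) / (v_k[j] + V * lam)
                           for j in range(K)]
                r = rng.random() * sum(weights)
                for j, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        k = j
                        break
                z[d][n] = k              # increment with the new assignment
                n_dk[d][k] += 1; v_kw[k][w] += 1; v_k[k] += 1
    return z

random.seed(0)
corpus = [[0, 0, 1, 1], [2, 2, 3, 3], [0, 1, 2, 3]]
z = gibbs_lda(corpus, K=2, V=4)
print(z)
```

The walkthrough on the following slides (decrement, compute the conditional, sample, increment) is exactly the inner loop of this function.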

Sample Document. [Figure: an example document before topic assignment.]

Randomly Assign Topics Material adapted from David Mimno UMD Topic Models 34 / 51

Total Topic Counts
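The total count tables are built by tallying the random assignments. A sketch, assuming the corpus is a list of lists of word-type ids; the function and variable names are illustrative:

```python
import numpy as np

def initialize(docs, K, V, rng):
    """Randomly assign a topic to every token, then tally the total counts
    (the slides' n_{d,k} and v_{k,w} tables)."""
    D = len(docs)
    # one random topic per token
    z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
    n_dk = np.zeros((D, K), dtype=int)   # document-topic counts
    v_kw = np.zeros((K, V), dtype=int)   # topic-word counts
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            n_dk[d, z[d][n]] += 1
            v_kw[z[d][n], w] += 1
    return z, n_dk, v_kw
```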

We want to sample this word...

Decrement its count

What is the conditional distribution for this topic?

Part 1: How much does this document like each topic?

This is the first factor of the sampling equation: $\frac{n_{d,k} + \alpha_k}{\sum_i^K \left(n_{d,i} + \alpha_i\right)}$

Part 2: How much does each topic like the word?

This is the second factor of the sampling equation: $\frac{v_{k,w_{d,n}} + \lambda_{w_{d,n}}}{\sum_i \left(v_{k,i} + \lambda_i\right)}$

Geometric interpretation

Update counts

Details: how to sample from a distribution

For the current word, compute the unnormalized weight $\frac{n_{d,k} + \alpha_k}{\sum_i^K \left(n_{d,i} + \alpha_i\right)} \cdot \frac{v_{k,w_{d,n}} + \lambda_{w_{d,n}}}{\sum_i \left(v_{k,i} + \lambda_i\right)}$ for each topic (topics 1 through 5 in the slide's example). Normalize so the weights sum to 1, stack them into a cumulative distribution over [0.0, 1.0], then draw a uniform random number and pick the topic whose interval contains it.
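This normalize-and-draw step is inverse-CDF sampling over a discrete distribution. A sketch (the helper name sample_topic is illustrative); note it folds the normalization into the draw by scaling the uniform variate by the total weight:

```python
import random

def sample_topic(weights, rng=random):
    """Draw an index from unnormalized non-negative weights by walking the
    cumulative distribution until it passes a uniform draw."""
    total = sum(weights)
    u = rng.random() * total      # uniform in [0, total): normalization is implicit
    cumulative = 0.0
    for k, w in enumerate(weights):
        cumulative += w
        if u < cumulative:
            return k
    return len(weights) - 1       # guard against floating-point round-off
```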

Algorithm

1. For each iteration i:
   1.1 For each document d and word n currently assigned to z_old:
       1.1.1 Decrement n_{d,z_old} and v_{z_old,w_{d,n}}
       1.1.2 Sample z_new = k with probability proportional to
             $\frac{n_{d,k} + \alpha_k}{\sum_i^K \left(n_{d,i} + \alpha_i\right)} \cdot \frac{v_{k,w_{d,n}} + \lambda_{w_{d,n}}}{\sum_i \left(v_{k,i} + \lambda_i\right)}$
       1.1.3 Increment n_{d,z_new} and v_{z_new,w_{d,n}}
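One full sweep of this collapsed Gibbs sampler might look like the following NumPy sketch (the names docs, z, n_dk, v_kw, and gibbs_iteration are illustrative; this is not any particular package's implementation):

```python
import numpy as np

def gibbs_iteration(docs, z, n_dk, v_kw, alpha, lam, rng):
    """One sweep of collapsed Gibbs sampling over every token, in place.

    docs : list of lists of word-type ids
    z    : list of lists of current topic assignments (same shape as docs)
    n_dk : (D, K) document-topic counts; v_kw : (K, V) topic-word counts
    """
    K = n_dk.shape[1]
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            z_old = z[d][n]
            # 1.1.1 Decrement the counts for the current assignment
            n_dk[d, z_old] -= 1
            v_kw[z_old, w] -= 1
            # 1.1.2 Sample z_new proportional to the product of both factors
            weights = ((n_dk[d] + alpha) / (n_dk[d].sum() + alpha.sum())
                       * (v_kw[:, w] + lam[w]) / (v_kw.sum(axis=1) + lam.sum()))
            z_new = rng.choice(K, p=weights / weights.sum())
            # 1.1.3 Increment the counts for the new assignment
            z[d][n] = z_new
            n_dk[d, z_new] += 1
            v_kw[z_new, w] += 1
```

Decrementing before computing the weights is what makes the sampler "collapsed": each token's conditional excludes its own current assignment.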

Desiderata

Hyperparameters: sample them too (slice sampling)
Initialization: random
Sampling: until the likelihood converges
Lag / burn-in: difference of opinion on this
Number of chains: should run more than one

Available implementations

Mallet (http://mallet.cs.umass.edu)
LDA-C (http://www.cs.princeton.edu/~blei/lda-c)
Topicmod (http://code.google.com/p/topicmod)

Wrapup

Topic models: tools to uncover themes in large document collections
Another example of Gibbs sampling
In class: Gibbs sampling example
