Topic Modelling and Latent Dirichlet Allocation

Topic Modelling and Latent Dirichlet Allocation
Stephen Clark (with thanks to Mark Gales for some of the slides)
Lent 2013
Machine Learning for Language Processing: Lecture 7
MPhil in Advanced Computer Science

Introduction to Probabilistic Topic Models
We want to find themes (or topics) in documents, useful for e.g. search or browsing.
We don't want to do supervised topic classification: rather than fix topics in advance or do manual annotation, we need an approach which automatically teases out the topics.
This is essentially a clustering problem: we can think of both words and documents as being clustered.

Key Assumptions behind the LDA Topic Model
Documents exhibit multiple topics (but typically not many).
LDA is a probabilistic model with a corresponding generative process: each document is assumed to be generated by this (simple) process.
A topic is a distribution over a fixed vocabulary; these topics are assumed to be generated first, before the documents.
Only the number of topics is specified in advance.

The Generative Process
To generate a document:
1. Randomly choose a distribution over topics.
2. For each word in the document:
   a. randomly choose a topic from the distribution over topics;
   b. randomly choose a word from the corresponding topic (a distribution over the vocabulary).
Note that we need a distribution over distributions (for step 1).
Note that words are generated independently of other words (a unigram bag-of-words model). A code sketch of this process follows below.
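As a concrete illustration (not part of the original slides), here is a minimal sketch of the generative process in Python. The number of topics, the toy vocabulary, and the Dirichlet hyperparameters alpha and eta are invented for the example; the Dirichlet draws anticipate the distributions introduced on the following slides.

```python
import numpy as np

# Minimal sketch of the LDA generative process (illustrative only).
# The number of topics, the toy vocabulary, and the hyperparameters
# alpha and eta below are all made up for the example.
rng = np.random.default_rng(0)

K = 3                                  # number of topics (fixed in advance)
vocab = ["cat", "dog", "run", "ball", "sleep"]
V = len(vocab)
alpha, eta = 0.5, 0.1                  # Dirichlet hyperparameters

# The topics are generated first: each beta_k is a distribution over the vocabulary
beta = rng.dirichlet(eta * np.ones(V), size=K)

def generate_document(n_words):
    # 1. Randomly choose the document's distribution over topics
    theta = rng.dirichlet(alpha * np.ones(K))
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)     # 2a. choose a topic from theta
        w = rng.choice(V, p=beta[z])   # 2b. choose a word from that topic
        words.append(vocab[w])
    return words

print(generate_document(10))
```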

The Generative Process more Formally
Some notation:
β_{1:K} are the topics, where each β_k is a distribution over the vocabulary;
θ_d are the topic proportions for document d; θ_{d,k} is the topic proportion for topic k in document d;
z_d are the topic assignments for document d; z_{d,n} is the topic assignment for word n in document d;
w_d are the observed words for document d.
The joint distribution (of the hidden and observed variables):

p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D}) = \prod_{i=1}^{K} p(\beta_i) \prod_{d=1}^{D} \left( p(\theta_d) \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{1:K}, z_{d,n}) \right)
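To make the structure of this joint distribution concrete, the sketch below (not from the slides) evaluates its logarithm for given values of the variables, assuming symmetric Dirichlet priors with hyperparameters α and η as in the plate diagram on the next slide; the function name and the tiny example data are invented.

```python
import numpy as np
from scipy.stats import dirichlet

# Illustrative sketch: log of the joint density above for concrete values
# of the variables, assuming symmetric Dirichlet priors with hyperparameters
# alpha and eta.
def log_joint(beta, theta, z, w, alpha, eta):
    K, V = beta.shape
    lp = sum(dirichlet.logpdf(beta[k], eta * np.ones(V)) for k in range(K))   # sum_i log p(beta_i)
    for d in range(len(w)):
        lp += dirichlet.logpdf(theta[d], alpha * np.ones(K))                  # log p(theta_d)
        for n in range(len(w[d])):
            lp += np.log(theta[d][z[d][n]])        # log p(z_{d,n} | theta_d)
            lp += np.log(beta[z[d][n], w[d][n]])   # log p(w_{d,n} | beta_{1:K}, z_{d,n})
    return lp

# Tiny made-up example: 2 topics, 3-word vocabulary, 1 document of 2 words
beta = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
theta = np.array([[0.6, 0.4]])
z, w = [[0, 1]], [[0, 2]]
print(log_joint(beta, theta, z, w, alpha=0.5, eta=0.1))
```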

Plate Diagram of the Graphical Model
Note that only the words are observed (shaded).
α and η are the parameters of the respective Dirichlet distributions (more later).
Note that the topics are generated (not shown in the earlier pseudo-code).
Plates indicate repetition.
[Figure: plate diagram of the LDA graphical model; picture from Blei 2012]

Multinomial Distribution
Multinomial distribution: x_i ∈ {0, ..., n}

P(x \mid \theta) = \frac{n!}{\prod_{i=1}^{d} x_i!} \prod_{i=1}^{d} \theta_i^{x_i}, \qquad n = \sum_{i=1}^{d} x_i, \quad \sum_{i=1}^{d} \theta_i = 1, \quad \theta_i \geq 0

When n = 1 the multinomial distribution simplifies to

P(x \mid \theta) = \prod_{i=1}^{d} \theta_i^{x_i}, \qquad \sum_{i=1}^{d} \theta_i = 1, \quad \theta_i \geq 0

i.e. a unigram language model with 1-of-V coding (d = V, the vocabulary size):
x_i indicates whether word i of the vocabulary is observed: x_i = 1 if word i is observed, x_i = 0 otherwise;
θ_i = P(w_i), the probability that word i is seen.
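As a small illustration (not in the original slides), the snippet below evaluates the multinomial likelihood for a toy unigram model; the vocabulary size, probabilities, and counts are made up, and scipy.stats.multinomial is used purely for convenience.

```python
import numpy as np
from scipy.stats import multinomial

# Toy unigram language model as a multinomial (illustrative only;
# the vocabulary size, probabilities, and counts are made up).
theta = np.array([0.5, 0.3, 0.2])    # theta_i = P(w_i), sums to 1
counts = np.array([4, 1, 1])         # observed word counts x_i, so n = 6

# Multinomial likelihood of the count vector
print(multinomial.pmf(counts, n=counts.sum(), p=theta))

# n = 1 case: a single draw, i.e. a 1-of-V indicator vector
x = np.array([0, 1, 0])              # word 2 observed
print(multinomial.pmf(x, n=1, p=theta))   # equals theta[1] = 0.3
```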

The Dirichlet Distribution
Dirichlet (continuous) distribution with parameters α:

p(x \mid \alpha) = \frac{\Gamma(\sum_{i=1}^{d} \alpha_i)}{\prod_{i=1}^{d} \Gamma(\alpha_i)} \prod_{i=1}^{d} x_i^{\alpha_i - 1}, \qquad \text{for observations } \sum_{i=1}^{d} x_i = 1, \quad x_i \geq 0

Γ() is the Gamma function.
Conjugate prior to the multinomial distribution (the form of the posterior p(θ | D, M) is the same as the prior p(θ | M)).
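A brief sketch of what conjugacy buys us in practice (not from the slides): with a Dirichlet prior over θ and multinomially distributed counts, the posterior is again a Dirichlet whose parameters are the prior parameters plus the observed counts. The prior parameters and counts below are invented for the example.

```python
import numpy as np

# Sketch of Dirichlet-multinomial conjugacy (illustrative only;
# the prior parameters and observed counts are made up).
rng = np.random.default_rng(0)

alpha = np.array([1.0, 1.0, 1.0])    # Dirichlet prior over theta for a 3-word vocabulary
counts = np.array([10, 3, 1])        # multinomially distributed word counts (the data D)

# Conjugacy: the posterior p(theta | D) is again a Dirichlet,
# with parameters alpha + counts
posterior_alpha = alpha + counts

print(posterior_alpha / posterior_alpha.sum())   # posterior mean of theta
print(rng.dirichlet(posterior_alpha, size=3))    # a few posterior samples
```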

Dirichlet Distribution Example
[Figure: example Dirichlet densities over the 3-simplex for parameters (α_1, α_2, α_3) = (6,2,2), (3,7,5), (2,3,4), (6,2,6)]

Parameter Estimation
Main variables of interest:
β_k: the distribution over the vocabulary for topic k;
θ_{d,k}: the topic proportion for topic k in document d.
We could try to estimate these directly, e.g. using EM (Hofmann, 1999), but this approach is not very successful.
One common technique is to estimate the posterior over the word-topic assignments, given the observed words, directly (whilst marginalizing out β and θ).

Gibbs Sampling
Gibbs sampling is an example of a Markov Chain Monte Carlo (MCMC) technique.
"Markov chain" in this instance means that we sample from each variable one at a time, keeping the current values of the other variables fixed.

Posterior Estimate
The Gibbs sampler produces the following estimate, where, following Steyvers and Griffiths:
z_i is the topic assigned to the ith token in the whole collection;
d_i is the document containing the ith token;
w_i is the word type of the ith token;
z_{-i} is the set of topic assignments of all other tokens;
· is any remaining information, such as the α and η hyperparameters:

P(z_i = j \mid z_{-i}, w_i, d_i, \cdot) \propto \frac{C^{WT}_{w_i j} + \eta}{\sum_{w=1}^{W} C^{WT}_{wj} + W\eta} \cdot \frac{C^{DT}_{d_i j} + \alpha}{\sum_{t=1}^{T} C^{DT}_{d_i t} + T\alpha}

where C^{WT} and C^{DT} are matrices of counts (word-topic and document-topic), excluding the current assignment of token i.
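The update above translates almost line-for-line into a collapsed Gibbs sampler. The sketch below (not part of the original slides) is a deliberately naive illustration on a made-up toy corpus, with invented hyperparameters and a fixed number of sweeps; it is not an efficient implementation.

```python
import numpy as np

# Minimal, deliberately naive sketch of collapsed Gibbs sampling for LDA,
# following the update above. The toy corpus, hyperparameters, and number
# of sweeps are made up for the example.
rng = np.random.default_rng(0)

docs = [[0, 1, 1, 2], [2, 3, 3, 4], [0, 2, 4, 4]]   # documents as lists of word ids
T, W = 2, 5                                          # number of topics, vocabulary size
alpha, eta = 0.5, 0.1

C_WT = np.zeros((W, T))                              # word-topic counts
C_DT = np.zeros((len(docs), T))                      # document-topic counts

# Random initial topic assignment for every token
z = [[int(rng.integers(T)) for _ in doc] for doc in docs]
for d, doc in enumerate(docs):
    for n, w in enumerate(doc):
        C_WT[w, z[d][n]] += 1
        C_DT[d, z[d][n]] += 1

for sweep in range(200):
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            j = z[d][n]
            # Remove token i's current assignment from the counts (the "-i" part)
            C_WT[w, j] -= 1
            C_DT[d, j] -= 1
            # Full conditional P(z_i = j | z_-i, w_i, d_i, .) up to a constant
            p = ((C_WT[w, :] + eta) / (C_WT.sum(axis=0) + W * eta)
                 * (C_DT[d, :] + alpha) / (C_DT[d, :].sum() + T * alpha))
            j = int(rng.choice(T, p=p / p.sum()))
            # Add the token back with its newly sampled topic
            z[d][n] = j
            C_WT[w, j] += 1
            C_DT[d, j] += 1
```

After enough sweeps, the accumulated count matrices can be used to form the estimates of β and θ given on the next slide.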

Posterior Estimates of β and θ

\beta_{ij} = \frac{C^{WT}_{ij} + \eta}{\sum_{k=1}^{W} C^{WT}_{kj} + W\eta} \qquad \theta_{dj} = \frac{C^{DT}_{dj} + \alpha}{\sum_{k=1}^{T} C^{DT}_{dk} + T\alpha}

Using the count matrices as before, β_{ij} is the probability of word type i for topic j, and θ_{dj} is the proportion of topic j in document d.
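For illustration (not from the slides), these estimates can be read straight off the count matrices; the tiny counts below are made up, standing in for those accumulated by a Gibbs sampler such as the sketch on the previous slide.

```python
import numpy as np

# Illustrative only: reading off the estimates above from word-topic and
# document-topic count matrices (tiny made-up counts).
alpha, eta = 0.5, 0.1
C_WT = np.array([[3, 0], [1, 2], [0, 4]])   # shape (W, T): counts of word i assigned to topic j
C_DT = np.array([[4, 1], [0, 5]])           # shape (D, T): counts of topic j in document d
W, T = C_WT.shape

beta = (C_WT + eta) / (C_WT.sum(axis=0, keepdims=True) + W * eta)      # beta[i, j] = P(word i | topic j)
theta = (C_DT + alpha) / (C_DT.sum(axis=1, keepdims=True) + T * alpha) # theta[d, j] = P(topic j | document d)
print(beta.sum(axis=0), theta.sum(axis=1))   # each column of beta and each row of theta sums to 1
```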

References
David Blei's webpage is a good place to start.
A good introductory paper: D. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77-84, 2012.
Introduction to Gibbs sampling for LDA: Steyvers, M. and Griffiths, T. Probabilistic topic models. In T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, eds., Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2006.