Latent Dirichlet Allocation

Latent Dirichlet Allocation. Blei, Ng and Jordan (2002). Presented by Deepak Santhanam

What is Latent Dirichlet Allocation? A generative model for collections of discrete data. Data are generated by parameters which can be learned and then used for inference. LDA is a hierarchical Bayesian model.

LDA and Document Modeling. Each document in a collection is modeled as a finite mixture over underlying topics. Topics in turn are modeled as an infinite mixture over an underlying set of topic probabilities. The topic probabilities give an explicit representation of a document. The goal is to find short descriptions of the members of a collection while preserving the essential statistical relationships; document classification becomes easier with LDA.

Previous Schemes for Document Modeling. tf-idf scheme: counts are taken for each word and each document is modeled as a vector of counts. Latent Semantic Indexing (LSI): uses SVD to capture the tf-idf features that explain most of the variance. pLSI: each word in a document is a sample from a mixture model and is generated by a single topic. (Each document is represented as mixing proportions over topics, but there is no probabilistic model for these proportions.)

An Early Example.. A graphical model built up step by step: nodes α, θ, β, Z and w, with a plate of size N over the words of a document. θ is the per-document mixture of topics, Z is the topic assignment for each word, and w are the observed words.

Exchangeability and Bag of Words. Assumption: the order of words in a document can be neglected. A finite set of random variables {x_1, ..., x_N} is exchangeable if the joint distribution is invariant to any permutation of these RVs, i.e. if σ is a permutation of 1 to N:

P(x_1, \ldots, x_N) = P(x_{\sigma(1)}, \ldots, x_{\sigma(N)})

e.g. any mixture (weighted average) of i.i.d. sequences of random variables is exchangeable.

De Finetti's Theorem. The joint distribution of an infinitely exchangeable sequence of RVs can be rewritten by drawing a random parameter from some distribution and treating the RVs as i.i.d. conditioned on that random parameter. In LDA, θ is the random parameter of a multinomial over topics, and the topics are i.i.d. conditioned on θ.
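Written out, de Finetti's representation takes the form

p(x_1, \ldots, x_N) = \int p(\theta) \left( \prod_{n=1}^{N} p(x_n \mid \theta) \right) d\theta

which is exactly the shape of the per-document joint used in LDA below, with θ the topic mixture.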

LDA and Exchangeability. Words are generated by topics with a fixed conditional distribution, and topics are infinitely exchangeable within a document. For a document W = (w_1, w_2, ..., w_N) of N words, a corpus of M documents C = {W_1, W_2, ..., W_M}, and k topics denoted by z:

p(\mathbf{w}, \mathbf{z}) = \int p(\theta) \left( \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n) \right) d\theta

What type of distribution over θ makes inference easy?

The Dirichlet Distribution. A k-dimensional Dirichlet RV θ takes values in the (k-1)-simplex and has the following density on that simplex:

p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)}\, \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1}

where α is a k-vector with components α_i > 0. The Dirichlet makes inference easy: it has finite-dimensional sufficient statistics and is conjugate to the multinomial distribution.
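A quick numerical illustration of that conjugacy, as a minimal NumPy sketch (the hyperparameter values and count total are made up for the example): after observing multinomial counts, the posterior of θ is again a Dirichlet, with the counts simply added to α.

import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 2.0, 3.0])        # illustrative Dirichlet hyperparameter, k = 3
theta = rng.dirichlet(alpha)             # a point on the 2-simplex
counts = rng.multinomial(100, theta)     # word counts drawn given theta
alpha_posterior = alpha + counts         # conjugacy: posterior is Dirichlet(alpha + counts)
print(theta, counts, alpha_posterior)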

Generative Process of LDA. Choose N ~ Poisson(ξ). Choose θ ~ Dir(α). For each of the N words w_n: choose a topic z_n ~ Multinomial(θ); choose a word w_n from the multinomial p(w_n | z_n, β), a probability conditioned on the topic z_n. β is a k × V matrix with β_{ij} = p(w^j = 1 | z^i = 1).
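A minimal sketch of this generative process in Python/NumPy; the function name, the toy vocabulary size and the way β is initialized are illustrative assumptions, not part of the slides:

import numpy as np

def generate_document(xi, alpha, beta, rng):
    # xi    : Poisson rate for the document length N
    # alpha : length-k Dirichlet hyperparameter
    # beta  : k x V topic-word probability matrix (rows sum to 1)
    k, V = beta.shape
    N = max(1, rng.poisson(xi))           # choose N ~ Poisson(xi)
    theta = rng.dirichlet(alpha)          # choose theta ~ Dir(alpha)
    words, topics = [], []
    for _ in range(N):
        z = rng.choice(k, p=theta)        # choose topic z_n ~ Multinomial(theta)
        w = rng.choice(V, p=beta[z])      # choose word w_n ~ p(w_n | z_n, beta)
        topics.append(z)
        words.append(w)
    return words, topics

rng = np.random.default_rng(0)
beta = rng.dirichlet(np.ones(10), size=3)     # toy 3 x 10 topic-word matrix
words, topics = generate_document(xi=20, alpha=np.ones(3), beta=beta, rng=rng)
print(words)
print(topics)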

Graphical Model of LDA. The joint over the topic mixture, topics and words is given by

p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)

α and β are sampled once per corpus, θ is sampled once per document, and z_n, w_n are sampled once per word.

The Marginal of a Document and the Probability of the Corpus. Integrating over the topic mixture θ and summing over the topic assignments z gives the marginal distribution of a document; the product of the marginals of all documents gives the probability of the corpus. The parameters live at three levels: corpus level (α, β), document level (θ), word level (z_n, w_n).
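Written out as in the Blei, Ng and Jordan paper, the document marginal and the corpus probability are:

p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta

p(C \mid \alpha, \beta) = \prod_{d=1}^{M} p(\mathbf{w}_d \mid \alpha, \beta)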

Geometric Representation

Inference Problem. We have to find the posterior of the latent variables of a document. This is intractable because we need to marginalize over the hidden variables, and there is a tight coupling between θ and β in that marginal. Use approximate inference such as MCMC or variational methods.
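Concretely, the posterior we need is

p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta) = \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{p(\mathbf{w} \mid \alpha, \beta)}

and the normalizer in the denominator is the document marginal above, in which θ and β are coupled inside the integral over the simplex.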

Variational Inference. Drop the edges that cause the coupling in the graphical model (the problematic edge). The simplified graphical model has free variational parameters, and the problematic coupling is not present in the simpler model.

Variational Inference. This results in the following variational distribution:

q(\theta, \mathbf{z} \mid \gamma, \phi) = q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n)

where γ is a Dirichlet parameter and the φ_n are multinomial parameters. Minimize the Kullback-Leibler divergence between q and the true posterior; equating the derivatives of the KL divergence to zero gives the update equations.
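For reference, the resulting coordinate-ascent updates from the Blei, Ng and Jordan paper are (Ψ is the digamma function):

\phi_{ni} \propto \beta_{i w_n} \exp\left( \Psi(\gamma_i) - \Psi\left( \sum_{j=1}^{k} \gamma_j \right) \right)

\gamma_i = \alpha_i + \sum_{n=1}^{N} \phi_{ni}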

Parameter Estimation. Using empirical Bayes: find the parameters that maximize the (marginal) log likelihood of the data. This is intractable for the same reasons, but variational inference provides a tight lower bound. Alternating variational Expectation-Maximization:
E-step: for each document, find the optimizing values of the variational parameters (γ, φ).
M-step: maximize the resulting lower bound on the likelihood with respect to the model parameters (α, β).
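A minimal sketch of the per-document E-step in Python/NumPy, implementing the γ and φ updates above; the function name, initialization, iteration count and tolerance are illustrative assumptions:

import numpy as np
from scipy.special import digamma

def e_step(doc_words, alpha, beta, n_iter=100, tol=1e-6):
    # doc_words : list of word indices w_n for one document
    # alpha     : length-k Dirichlet hyperparameter (NumPy array)
    # beta      : k x V topic-word probability matrix
    k = beta.shape[0]
    N = len(doc_words)
    phi = np.full((N, k), 1.0 / k)       # multinomial parameters, one row per word
    gamma = alpha + N / k                # Dirichlet parameter initialization
    for _ in range(n_iter):
        old_gamma = gamma.copy()
        # phi_{ni} proportional to beta_{i,w_n} * exp(digamma(gamma_i) - digamma(sum_j gamma_j))
        exp_E_log_theta = np.exp(digamma(gamma) - digamma(gamma.sum()))
        phi = beta[:, doc_words].T * exp_E_log_theta
        phi /= phi.sum(axis=1, keepdims=True)
        # gamma_i = alpha_i + sum_n phi_{ni}
        gamma = alpha + phi.sum(axis=0)
        if np.abs(gamma - old_gamma).max() < tol:
            break
    return gamma, phi

In the outer variational EM loop, the M-step would then re-estimate each row of β from the φ statistics accumulated over all documents and update α by maximizing the same lower bound.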

Smoothing. Without smoothing, the likelihood of a previously unseen document is always zero. Smooth the β matrix by treating its elements as RVs with a posterior conditioned on the data: place another Dirichlet prior on the rows of β and endow β with a variational posterior. The whole inference procedure is then redone for the new model to get new update equations and the final hyperparameters.

Extending LDA. Make a continuous variant by using Gaussians instead of multinomials. Obtain a particular form of clustering by using a mixture of Dirichlet distributions instead of a single one. What must be done to extend LDA to a more useful model? Can we use this LDA model in computer vision?

Application in Computer Vision. One of the methods extended in Describing Visual Scenes (Sudderth et al., 2005): the topics in a scene are its constituent objects and regions (llama, sky, tree, grass). Open directions from there: modeling spatial relationships, more hierarchies, cooler models.

Even cooler models..!

Take-home message. LDA illustrates how probabilistic models can be scaled up. With good inference techniques, we can solve hard problems in multiple domains that have multiple hierarchies. Generative models are modular and easily extensible.

Thank You!