LDA Collapsed Gibbs Sampler, Variational Inference. Task 3: Mixed Membership Models. Case Study 5: Mixed Membership Modeling

Transcription:

Case Study 5: Mixed Membership Modeling
LDA Collapsed Gibbs Sampler, Variational Inference
Machine Learning for Big Data CSE547/STAT548, University of Washington
Emily Fox, May 8th, 2015

Task 3: Mixed Membership Models
Now: a document may belong to multiple clusters. (Figure: an example article overlapping the EDUCATION, FINANCE, and TECHNOLOGY clusters.)

Latent Dirichlet Allocation (LDA)
(Figure: topics, documents, and topic proportions and assignments.) But we only observe the documents; the other structure is hidden. We compute the posterior p(topics, proportions, assignments | documents).

LDA Generative Model
Observations: w_{d1}, ..., w_{dN_d}
Associated topics: z_{d1}, ..., z_{dN_d}
Parameters: \theta = \{\{\pi_d\}, \{\phi_k\}\}
Generative model: (a sketch of the generative process is given after the sampling equation below)

Collapsed LDA Sampling
Sample topic indicators for each word. Algorithm:

p(z_{di} = k \mid z_{\setminus di}, \{w\}, \alpha, \gamma) \;\propto\; p(z_{di} = k \mid \{z_{dj} : j \ne i\}, \alpha)\; p(w_{di} \mid \{w_{jc} : z_{jc} = k,\, (j,c) \ne (d,i)\}, \gamma)
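For concreteness, here is a minimal NumPy sketch of the standard LDA generative process referenced above. The function name, the symmetric Dirichlet hyperparameters `alpha` and `gamma`, and the toy sizes are illustrative assumptions, not material from the slides.

```python
import numpy as np

def generate_lda_corpus(D, K, V, N_d, alpha=0.1, gamma=0.01, seed=0):
    """Sample a toy corpus from the standard LDA generative process."""
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(gamma * np.ones(V), size=K)      # topic-word distributions phi_k ~ Dir(gamma)
    docs, topics = [], []
    for _ in range(D):
        pi_d = rng.dirichlet(alpha * np.ones(K))         # document proportions pi_d ~ Dir(alpha)
        z_d = rng.choice(K, size=N_d, p=pi_d)            # topic indicators z_di ~ Cat(pi_d)
        w_d = np.array([rng.choice(V, p=phi[k]) for k in z_d])  # words w_di ~ Cat(phi_{z_di})
        docs.append(w_d)
        topics.append(z_d)
    return docs, topics, phi

# Tiny synthetic example: 5 documents, 3 topics, a 20-word vocabulary, 50 words per document.
docs, z, phi = generate_lda_corpus(D=5, K=3, V=20, N_d=50)
```

Running it yields integer word-id documents plus the latent assignments and topics, which is exactly the hidden structure the sampler below tries to recover.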

Select a Document
(Figure: an example document containing words such as "Etruscan" and "trade".)

Randomly Assign Topics
(Figure: each word in the document is given an initial random topic assignment z_di.)

Randomly Assign Topics (continued)
(Figure: every word in the document, e.g., "Etruscan", "trade", "ship", "Italy", now carries a random topic assignment.)

Maintain Local Statistics
(Figure: for document d, a table counting how many of its words are currently assigned to each of Topic 1, Topic 2, Topic 3.)

Maintain Global Statistics
(Figure: a word-by-topic count table, with rows for words such as "Etruscan" and "trade", holding total counts from all documents, shown alongside the per-document table.)

Resample Assignments
(Figure: a single word is selected and its topic assignment z_di is to be resampled, given the current local and global count tables.)

What is the conditional distribution for this topic?
Part I: How much does this document like each topic?
(Figure: the per-document topic counts for document d.)

Part II: How much does each topic like this word?
(Figure: the global counts of the word "trade" under each topic.)

Combining Part I and Part II, the conditional distribution for the assignment of w_{di} = "trade" is

p(z_{di} = k \mid \text{rest}) \;\propto\; \frac{n^{\setminus i}_{dk} + \alpha_k}{N_d - 1 + \sum_{k'} \alpha_{k'}} \cdot \frac{m^{\setminus i}_{\text{trade},k} + \gamma_{\text{trade}}}{\sum_{v=1}^{V} \big(m^{\setminus i}_{v,k} + \gamma_v\big)}

where n^{\setminus i}_{dk} are the local (document-topic) counts and m^{\setminus i}_{v,k} the global (word-topic) counts, both excluding the current assignment of w_{di}.

Sample a New Topic Indicator
(Figure: draw z_di from this conditional distribution over the topics.)

Update Counts
(Figure: the newly sampled assignment is added back into the local document-topic table and the global word-topic table.)

Geometrically
(Figure: a geometric illustration of the resampled topic assignment.)
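Putting the walkthrough together, a rough sketch of one full collapsed Gibbs sweep is shown below. The count arrays correspond to the local and global statistics of the slides; the names `n_dk`, `m_vk`, `m_k` and the symmetric scalar hyperparameters `alpha`, `gamma` are assumptions made for illustration, not code from the lecture.

```python
import numpy as np

def gibbs_sweep(docs, z, n_dk, m_vk, m_k, alpha, gamma, rng):
    """One collapsed Gibbs pass over every word in every document.

    n_dk[d, k] : # words in document d currently assigned to topic k  (local statistics)
    m_vk[v, k] : # times word v is assigned to topic k over all docs  (global statistics)
    m_k[k]     : column sums of m_vk
    """
    V, K = m_vk.shape
    for d, w_d in enumerate(docs):
        for i, v in enumerate(w_d):
            k_old = z[d][i]
            # Remove the current assignment from the counts (the "\ i" statistics).
            n_dk[d, k_old] -= 1; m_vk[v, k_old] -= 1; m_k[k_old] -= 1
            # Part I: how much does this document like each topic?
            # (The denominator N_d - 1 + K*alpha is constant in k, so it is dropped.)
            doc_part = n_dk[d] + alpha
            # Part II: how much does each topic like this word?
            word_part = (m_vk[v] + gamma) / (m_k + V * gamma)
            p = doc_part * word_part
            k_new = rng.choice(K, p=p / p.sum())   # sample a new topic indicator
            # Add the new assignment back into the counts.
            z[d][i] = k_new
            n_dk[d, k_new] += 1; m_vk[v, k_new] += 1; m_k[k_new] += 1
    return z, n_dk, m_vk, m_k
```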

Issues with Generic LDA Sampling
Slow mixing rates → need many iterations. Each iteration cycles through sampling topic assignments for all words in all documents.
Modern approaches include:
Large-scale LDA. For example, Mimno, David, Matthew D. Hoffman, and David M. Blei. "Sparse stochastic inference for latent Dirichlet allocation." International Conference on Machine Learning, 2012.
Distributed LDA. For example, Ahmed, Amr, et al. "Scalable inference in latent variable models." Proceedings of the fifth ACM international conference on Web search and data mining (2012).
And many, many more!
Alternative: variational methods instead of sampling. Approximate the posterior with an optimized variational distribution.

Case Study 5: Mixed Membership Modeling
Variational Methods
Machine Learning for Big Data CSE547/STAT548, University of Washington
Emily Fox, May 8th, 2015

Variational Methods Goal
Recall task: characterize the posterior.
Turn posterior inference into an optimization task: introduce a tractable family of distributions over parameters and latent variables, indexed by a set of free parameters, and find the member of the family closest to the true posterior.

Variational Methods Cartoon
(Figure: cartoon of the goal, showing the variational family and the true posterior.)
Questions: How do we measure closeness? If the posterior is intractable, how can we approximate something we do not have to begin with?

A Measure of Closeness
Kullback-Leibler (KL) divergence measures the distance between two distributions p and q: it is zero if p = q for all \theta, and positive otherwise. It is not symmetric.

\mathrm{KL}(p \,\|\, q) \triangleq D(p \,\|\, q) = \int p(\theta) \log \frac{p(\theta)}{q(\theta)} \, d\theta

p determines where the difference is important: where p(\theta) = 0 and q(\theta) \ne 0 nothing is contributed, while p(\theta) \ne 0 and q(\theta) = 0 is penalized heavily.
We would like to minimize D(p \,\|\, q) with p the true posterior, but that is just as hard as the original problem!
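A quick numeric sanity check of the definition and of its asymmetry, on two arbitrary three-point distributions (the numbers are made up purely for illustration):

```python
import numpy as np

def kl(p, q):
    """D(p || q) = sum_theta p(theta) * log(p(theta) / q(theta)) for discrete p, q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                      # terms with p(theta) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]
print(kl(p, q))   # ~0.184
print(kl(q, p))   # ~0.192  -- different: KL is not symmetric
```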

Reverse Divergence
Divergence D(p \,\|\, q): the true distribution p defines the support of the difference; this is the "correct" direction, but it will typically be intractable to compute.
Reverse divergence D(q \,\|\, p): the approximate distribution q defines the support; it tends to give overconfident results, but it will often be tractable.

Interpretations of Minimizing Reverse KL

D(q \,\|\, p) = \mathbb{E}_q\!\left[ \log \frac{q}{p} \right]

Two interpretations: as a similarity measure, and via the evidence lower bound (ELBO), developed next.

Interpretations of Minimizing Reverse KL
Evidence lower bound (ELBO):

\log p(x) = D\big(q(z, \theta) \,\|\, p(z, \theta \mid x)\big) + \mathcal{L}(q) \;\ge\; \mathcal{L}(q)

Therefore, the ELBO provides a lower bound on the marginal likelihood, and maximizing the ELBO is equivalent to minimizing the KL. (A one-line derivation is given after the mean field setup below.)

Mean Field

\mathcal{L}(q) = \mathbb{E}_q[\log p(z, \theta, x)] - \mathbb{E}_q[\log q(z, \theta)]

How do we choose a Q such that this is tractable? Simplest case = the mean field approximation: assume each parameter and latent variable is conditionally independent given the set of free parameters.
(Figure: the original graph vs. the naïve mean field factorization.)
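For completeness, the ELBO identity stated before the mean field slide follows in one line from the definition of the reverse KL, writing z and \theta jointly as the hidden quantities:

D\big(q(z,\theta) \,\|\, p(z,\theta \mid x)\big)
  = \mathbb{E}_q\!\left[\log \frac{q(z,\theta)}{p(z,\theta \mid x)}\right]
  = \mathbb{E}_q\!\left[\log \frac{q(z,\theta)\, p(x)}{p(z,\theta,x)}\right]
  = \log p(x) - \big(\mathbb{E}_q[\log p(z,\theta,x)] - \mathbb{E}_q[\log q(z,\theta)]\big)
  = \log p(x) - \mathcal{L}(q)

Since the KL term is nonnegative, \log p(x) \ge \mathcal{L}(q); and since \log p(x) does not depend on q, maximizing \mathcal{L}(q) is the same as minimizing the reverse KL.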

Naïve mean field decomposition:

q(z, \theta) = q(\theta \mid \lambda) \prod_{i=1}^{N} q(z_i \mid \nu_i)

Under this approximation, the entropy term decomposes as

\mathbb{E}_q[\log q(z, \theta)] = \mathbb{E}_q[\log q(\theta \mid \lambda)] + \sum_{i} \mathbb{E}_q[\log q(z_i \mid \nu_i)]

and we can (always) rewrite the joint term as

\mathbb{E}_q[\log p(\theta, z, x)] = \mathbb{E}_q[\log p(\theta \mid z, x)] + \mathbb{E}_q[\log p(z, x)]
\mathbb{E}_q[\log p(\theta, z, x)] = \mathbb{E}_q[\log p(z_i \mid z_{\setminus i}, \theta, x)] + \mathbb{E}_q[\log p(z_{\setminus i}, \theta, x)]

Mean Field Optimization
Examine one free parameter, e.g., \lambda:

\mathcal{L}(q) = \mathbb{E}_q[\log p(\theta \mid z, x)] + \mathbb{E}_q[\log p(z, x)] - \mathbb{E}_q[\log q(\theta \mid \lambda)] - \sum_{i} \mathbb{E}_q[\log q(z_i \mid \nu_i)]

Look at the terms of the ELBO that depend only on \lambda: \mathcal{L}_\lambda = ... (the standard closed-form update is sketched below).
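The slide leaves \mathcal{L}_\lambda to be worked out; for reference, a sketch of the standard mean-field coordinate-update result, in the generic notation above rather than anything specific to this lecture, is that each factor has a closed-form optimum:

q^*(\theta \mid \lambda) \;\propto\; \exp\!\big( \mathbb{E}_{q(z)}[\log p(\theta \mid z, x)] \big), \qquad
q^*(z_i \mid \nu_i) \;\propto\; \exp\!\big( \mathbb{E}_{q(z_{\setminus i},\, \theta)}[\log p(z_i \mid z_{\setminus i}, \theta, x)] \big)

i.e., each factor is set to the exponentiated expected log of the corresponding conditional, with the expectation taken under the remaining factors.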

Mean Field Optimization (continued)
Examine another free parameter, e.g., \nu_i:

\mathcal{L}(q) = \mathbb{E}_q[\log p(z_i \mid z_{\setminus i}, \theta, x)] + \mathbb{E}_q[\log p(z_{\setminus i}, \theta, x)] - \mathbb{E}_q[\log q(\theta \mid \lambda)] - \sum_{i} \mathbb{E}_q[\log q(z_i \mid \nu_i)]

Look at the terms of the ELBO that depend only on \nu_i: \mathcal{L}_{\nu_i} = ...
This motivates using a coordinate ascent algorithm for optimization: iteratively optimize each free parameter holding all others fixed.

Algorithm Outline
Initialization: randomly select a starting distribution q^{(0)}.
E-step: given parameters, find the posterior of the hidden data: q_z^{(t)} = \arg\max_{q_z} \mathcal{L}(q_z, q_\theta^{(t-1)}).
M-step: given posterior distributions, find likely parameters: q_\theta^{(t)} = \arg\max_{q_\theta} \mathcal{L}(q_z^{(t)}, q_\theta).
Iteration: alternate E-step and M-step until convergence. (A short code skeleton of this loop follows.)
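As a schematic only, the alternation can be written as a short loop. The helper names `update_qz`, `update_qtheta`, and `elbo` are placeholders standing for the coordinate updates and the ELBO of the preceding slides, not functions from any particular library:

```python
def variational_em(x, q_theta, update_qz, update_qtheta, elbo, tol=1e-6, max_iter=1000):
    """Alternate E-step and M-step until the ELBO stops improving."""
    prev = -float("inf")
    q_z = None
    for _ in range(max_iter):
        q_z = update_qz(x, q_theta)          # E-step: q_z^(t) = argmax_{q_z} L(q_z, q_theta^(t-1))
        q_theta = update_qtheta(x, q_z)      # M-step: q_theta^(t) = argmax_{q_theta} L(q_z^(t), q_theta)
        curr = elbo(x, q_z, q_theta)         # L(q) is non-decreasing under exact coordinate updates
        if curr - prev < tol:
            break
        prev = curr
    return q_z, q_theta
```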

Case Study 5: Mixed Membership Modeling
Variational Inference for LDA
Machine Learning for Big Data CSE547/STAT548, University of Washington
Emily Fox, May 8th, 2015

Mean Field for LDA
In LDA, our parameters are \theta = \{\pi_d\}, \{\phi_k\} and the latent variables are z = \{z_{di}\}.
(Figure: the LDA graphical model, with plates over the N_d words in each of the D documents and over the K topics.)
The variational distribution factorizes as given on the next slide; the joint distribution factorizes as

p(\pi, \phi, z, w) = \prod_{k=1}^{K} p(\phi_k) \prod_{d=1}^{D} p(\pi_d) \prod_{i=1}^{N_d} p(z_{di} \mid \pi_d)\, p(w_{di} \mid z_{di}, \phi)

Mean Field for LDA
The variational distribution factorizes as

q(\pi, \phi, z) = \prod_{k=1}^{K} q(\phi_k \mid \lambda_k) \prod_{d=1}^{D} q(\pi_d \mid \eta_d) \prod_{i=1}^{N_d} q(z_{di} \mid \nu_{di})

while the joint distribution factorizes as

p(\pi, \phi, z, w) = \prod_{k=1}^{K} p(\phi_k) \prod_{d=1}^{D} p(\pi_d) \prod_{i=1}^{N_d} p(z_{di} \mid \pi_d)\, p(w_{di} \mid z_{di}, \phi)

Examine the ELBO:

\mathcal{L}(q) = \sum_{k=1}^{K} \mathbb{E}_q[\log p(\phi_k)] + \sum_{d=1}^{D} \mathbb{E}_q[\log p(\pi_d)] + \sum_{d=1}^{D} \sum_{i=1}^{N_d} \Big( \mathbb{E}_q[\log p(z_{di} \mid \pi_d)] + \mathbb{E}_q[\log p(w_{di} \mid z_{di}, \phi)] \Big)
\qquad - \sum_{k=1}^{K} \mathbb{E}_q[\log q(\phi_k \mid \lambda_k)] - \sum_{d=1}^{D} \mathbb{E}_q[\log q(\pi_d \mid \eta_d)] - \sum_{d=1}^{D} \sum_{i=1}^{N_d} \mathbb{E}_q[\log q(z_{di} \mid \nu_{di})]

Let's look at some of these terms, e.g., \sum_{d,i} \mathbb{E}_q[\log p(z_{di} \mid \pi_d)] and \mathbb{E}_q[\log q(z_{di} \mid \nu_{di})]; the other terms follow similarly.

Optimize via Coordinate Ascent
Algorithm: iteratively update each variational factor q(z_{di} \mid \nu_{di}), q(\pi_d \mid \eta_d), and q(\phi_k \mid \lambda_k), holding all of the others fixed.
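The update equations themselves are left for lecture; as a reference point, below is a minimal sketch of the standard coordinate-ascent updates for this mean field family, in the spirit of Blei, Ng, and Jordan (2003). The parameter names (`nu` for q(z_di), `eta` for q(pi_d), `lam` for q(phi_k)) and the symmetric priors `alpha`, `gamma_` are assumptions made for the sketch.

```python
import numpy as np
from scipy.special import digamma

def cavi_sweep(docs, nu, eta, lam, alpha, gamma_):
    """One coordinate-ascent pass over the mean field parameters for LDA.

    docs : list of word-id arrays, one per document
    nu   : list of (N_d, K) arrays, nu[d][i, k] = q(z_di = k)
    eta  : (D, K) Dirichlet parameters of q(pi_d)
    lam  : (K, V) Dirichlet parameters of q(phi_k)
    """
    # Update each q(z_di) and then q(pi_d).
    for d, w_d in enumerate(docs):
        Elog_pi = digamma(eta[d]) - digamma(eta[d].sum())                     # E_q[log pi_dk]
        Elog_phi = digamma(lam[:, w_d]) - digamma(lam.sum(axis=1))[:, None]   # E_q[log phi_{k, w_di}]
        log_nu = Elog_pi[None, :] + Elog_phi.T                                # (N_d, K), unnormalized
        nu[d] = np.exp(log_nu - log_nu.max(axis=1, keepdims=True))
        nu[d] /= nu[d].sum(axis=1, keepdims=True)
        eta[d] = alpha + nu[d].sum(axis=0)
    # Update each q(phi_k) from the expected word-topic counts.
    lam[:] = gamma_
    for d, w_d in enumerate(docs):
        np.add.at(lam.T, w_d, nu[d])    # lam[k, v] += sum_i nu[d][i, k] * 1[w_di = v]
    return nu, eta, lam
```

In practice the per-document nu/eta updates are iterated to convergence before the topic update, and the whole sweep is repeated while monitoring the ELBO.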

What you need to know
Latent Dirichlet allocation (LDA): motivation and generative model specification; collapsed Gibbs sampler.
Variational methods: overall goal; interpretation in terms of minimizing (reverse) KL; mean field approximation.

Acknowledgements
Thanks to Dave Blei, David Mimno, and Jordan Boyd-Graber for some material in this lecture relating to LDA.