Case Study 5: Mixed Membership Modeling
LDA Collapsed Gibbs Sampler, Variational Inference
Machine Learning for Big Data, CSE547/STAT548, University of Washington
Emily Fox, May 8th, 2015

Task: Mixed Membership Models
Now: a document may belong to multiple clusters (e.g., EDUCATION, FINANCE, TECHNOLOGY).
Latent Dirichlet Allocation (LDA)
Latent Dirichlet allocation (LDA) relates topics, documents, and per-word topic proportions and assignments. But we only observe the documents; the other structure is hidden. We compute the posterior p(topics, proportions, assignments | documents).
LDA Generative Model
- Observations: w_1, ..., w_N
- Associated topics: z_1, ..., z_N
- Parameters: the per-document topic proportions {θ_d} and the topics {β_k}

Collapsed LDA Sampling
- Sample a topic indicator z_di for each word w_di
- Algorithm: cycle through the words, resampling each indicator from

p(z_di = k | z_\di, w, α, λ) ∝ p(z_di = k | {z_dj : j ≠ i}, α) · p(w_di | {w_cj : z_cj = k, (c, j) ≠ (d, i)}, λ)
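The generative model above can be sketched in a few lines. This is an illustrative sketch, not course code: the function names (`sample_dirichlet`, `generate_document`) and the use of Gamma draws to sample from a Dirichlet are my own choices.

```python
import random

def sample_dirichlet(alpha):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def sample_categorical(probs):
    """Draw an index with probability proportional to probs (assumed normalized)."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def generate_document(n_words, alpha, beta):
    """One document from the LDA generative model:
    theta_d ~ Dir(alpha); for each word i: z_i ~ Cat(theta_d), w_i ~ Cat(beta[z_i])."""
    theta = sample_dirichlet(alpha)
    z = [sample_categorical(theta) for _ in range(n_words)]
    w = [sample_categorical(beta[zi]) for zi in z]
    return theta, z, w
```

Here `beta` is a K x V matrix whose rows are the topics; the collapsed sampler below integrates `theta` and `beta` out rather than sampling them.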
Select a Document
[Slide figure: an example document containing words such as "Etruscan" and "trade"]

Randomly Assign Topics z_i
[Slide figure: each word in the document is given a random initial topic assignment]
Randomly Assign Topics z_i
[Slide figure: the whole corpus, with words such as "Etruscan", "trade", "ship", "Italy" each randomly assigned a topic]

Maintain Local Statistics z_i
[Slide figure: a per-document table of topic counts (Doc d: counts under Topic 1, Topic 2, Topic 3)]
Maintain Global Statistics z_i
[Slide figure: a word-by-topic count table aggregated over all documents, e.g., rows for "Etruscan" and "trade" with counts under Topic 1, Topic 2, Topic 3]

Resample Assignments z_i
[Slide figure: the same local and global count tables, now used to resample the topic of a single word]
What is the conditional distribution for this topic?
- Part I: How much does this document like each topic?
  [Slide figure: the document's topic-count row (Doc d: Topic 1, Topic 2, Topic 3)]
- Part II: How much does each topic like this word?
  [Slide figure: the word "trade"'s counts under Topic 1, Topic 2, Topic 3]
What is the conditional distribution for this topic?
Part I (how much this document likes each topic) times Part II (how much each topic likes this word):

p(z_i = k | rest) ∝ [ (n_{d,k}^{\i} + α_k) / Σ_{k'} (n_{d,k'}^{\i} + α_{k'}) ] × [ (m_{trade,k}^{\i} + λ_trade) / Σ_{v=1}^{V} (m_{v,k}^{\i} + λ_v) ]

where n^{\i} and m^{\i} are the local (document-topic) and global (word-topic) counts with word i's current assignment removed.

Sample a New Topic Indicator z_i
[Slide figure: drawing the new topic for "trade" from the conditional above]
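The two-part conditional can be computed directly from the count tables. A minimal sketch (the function name `topic_conditional` and argument layout are mine); note the document-side denominator is constant in k, so it cancels when we normalize:

```python
def topic_conditional(n_dk, m_wk, m_k, alpha, lam_w, lam_sum):
    """Collapsed-Gibbs conditional p(z_i = k | rest) over topics for one word,
    assuming the word's current assignment has already been removed from all counts.
      n_dk[k]:  topic counts in this document        (Part I: doc likes topic)
      m_wk[k]:  counts of this word under topic k    (Part II: topic likes word)
      m_k[k]:   total word count under topic k
      lam_w:    smoothing for this word; lam_sum = sum of lambda over the vocabulary."""
    K = len(n_dk)
    p = [(n_dk[k] + alpha[k]) * (m_wk[k] + lam_w) / (m_k[k] + lam_sum)
         for k in range(K)]
    s = sum(p)
    return [x / s for x in p]
```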
Update Counts
[Slide figure: the local and global count tables after incrementing the counts for the newly sampled topic of "trade"]

Geometrically z_i
[Slide figure: a geometric view of the resampling step for the word "trade" across Topic 1, Topic 2, Topic 3]
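Putting the pieces together, one full Gibbs sweep decrements the counts for a word, samples its new topic from the conditional, and increments the counts again. A self-contained sketch (the data layout and the name `gibbs_sweep` are my own; this is not the course's reference implementation):

```python
import random

def gibbs_sweep(docs, z, n, m, m_sum, alpha, lam):
    """One pass of collapsed Gibbs sampling over all words in all documents.
      docs[d][i]: word id of word i in document d
      z[d][i]:    its current topic
      n[d][k]:    doc-topic counts; m[v][k]: word-topic counts; m_sum[k] = sum_v m[v][k]."""
    K = len(alpha)
    lam_sum = sum(lam)
    for d, doc in enumerate(docs):
        for i, v in enumerate(doc):
            k_old = z[d][i]
            # remove the current assignment from the statistics
            n[d][k_old] -= 1; m[v][k_old] -= 1; m_sum[k_old] -= 1
            # conditional: (doc likes topic) x (topic likes word)
            p = [(n[d][k] + alpha[k]) * (m[v][k] + lam[v]) / (m_sum[k] + lam_sum)
                 for k in range(K)]
            # sample a new topic proportional to p
            r, acc, k_new = random.random() * sum(p), 0.0, K - 1
            for k in range(K):
                acc += p[k]
                if r < acc:
                    k_new = k
                    break
            # record the new assignment and restore the counts
            z[d][i] = k_new
            n[d][k_new] += 1; m[v][k_new] += 1; m_sum[k_new] += 1
```

Each sweep leaves the count tables consistent with the assignments, which is what the "maintain local/global statistics" slides are tracking.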
Issues with Generic LDA Sampling
- Slow mixing rates → need many iterations
- Each iteration cycles through sampling topic assignments for all words in all documents
- Modern approaches include:
  - Large-scale LDA. For example, Mimno, David, Matthew D. Hoffman, and David M. Blei. "Sparse stochastic inference for latent Dirichlet allocation." International Conference on Machine Learning, 2012.
  - Distributed LDA. For example, Ahmed, Amr, et al. "Scalable inference in latent variable models." Proceedings of the fifth ACM international conference on Web search and data mining, 2012.
  - And many, many more!
- Alternative: variational methods instead of sampling
  - Approximate the posterior with an optimized variational distribution

Case Study 5: Mixed Membership Modeling — Variational Methods
Variational Methods: Goal
- Recall the task: characterize the posterior
- Turn posterior inference into an optimization task:
  - Introduce a tractable family of distributions over parameters and latent variables
  - The family is indexed by a set of free parameters
  - Find the member of the family closest to the posterior

Variational Methods: Cartoon
[Slide figure: cartoon of the goal — the variational family and its member closest to the posterior]
- Questions:
  - How do we measure closeness?
  - If the posterior is intractable, how can we approximate something we do not have to begin with?
A Measure of Closeness
- Kullback-Leibler (KL) divergence measures a distance between two distributions p and q:

KL(p ‖ q) ≜ D(p ‖ q) = ∫ p(θ) log [ p(θ) / q(θ) ] dθ

- If p(θ) = q(θ) for all θ, the divergence is 0; otherwise, it is strictly positive
- Not symmetric
- p determines where the difference is important:
  - p(θ) = 0 and q(θ) ≠ 0: the point contributes nothing
  - p(θ) ≠ 0 and q(θ) = 0: the divergence blows up
- Want: min over q of D(p ‖ q). Just as hard as the original problem!
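For discrete distributions the integral becomes a sum, and both properties (zero iff p = q, and asymmetry) are easy to check numerically. A small sketch (the function name `kl` is mine):

```python
import math

def kl(p, q):
    """KL(p || q) = sum_theta p(theta) log(p(theta)/q(theta)) for discrete
    distributions; terms with p(theta) = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
kl(p, p)  # 0: the distributions are identical
kl(p, q)  # > 0, and differs from kl(q, p): KL is not symmetric
```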
Forward vs. Reverse Divergence
- Divergence D(p ‖ q): the true distribution p defines the support of the difference; the "correct" direction, but will typically be intractable to compute
- Reverse divergence D(q ‖ p): the approximate distribution q defines the support; tends to give overconfident results, but will often be tractable

Interpretations of Minimizing Reverse KL

D(q ‖ p) = E_q[ log (q / p) ]

- Similarity measure
- Evidence lower bound (ELBO)
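The connection between the reverse KL and the evidence lower bound comes from a short manipulation of the marginal likelihood (all expectations are under q(z, θ)):

```latex
\begin{align*}
\log p(x)
  &= \mathbb{E}_q\!\left[\log p(x)\right]
   = \mathbb{E}_q\!\left[\log \frac{p(z,\theta,x)}{p(z,\theta \mid x)}\right] \\
  &= \mathbb{E}_q\!\left[\log \frac{q(z,\theta)}{p(z,\theta \mid x)}\right]
   + \mathbb{E}_q\!\left[\log \frac{p(z,\theta,x)}{q(z,\theta)}\right] \\
  &= D\big(q(z,\theta)\,\big\|\,p(z,\theta \mid x)\big) + \mathcal{L}(q).
\end{align*}
```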
Interpretations of Minimizing Reverse KL
- Evidence lower bound (ELBO) L(q):

log p(x) = D(q(z, θ) ‖ p(z, θ | x)) + L(q)

- Since the KL term is nonnegative, the ELBO provides a lower bound on the marginal likelihood
- Since log p(x) is fixed, maximizing the ELBO is equivalent to minimizing the KL

Mean Field

L(q) = E_q[log p(z, θ, x)] − E_q[log q(z, θ)]

- How do we choose a family Q such that optimizing L(q) is tractable?
- Simplest case: the mean field approximation
  - Assume each parameter and latent variable is conditionally independent given the set of free parameters
[Slide figure: the original graphical model vs. the fully factorized "naïve mean field" graph]
Mean Field
- Naïve mean field decomposition:

q(z, θ) = q(θ | λ) ∏_{i=1}^{N} q(z_i | φ_i)

- Under this approximation, the entropy term decomposes
- Can (always) rewrite the joint term as

E_q[log p(θ, z, x)] = E_q[log p(θ | z, x)] + E_q[log p(z, x)]
E_q[log p(θ, z, x)] = E_q[log p(z_i | z_\i, θ, x)] + E_q[log p(z_\i, θ, x)]

Mean Field: Optimize λ
- Examine one free parameter, e.g., λ:

L(q) = E_q[log p(θ | z, x)] + E_q[log p(z, x)] − E_q[log q(θ | λ)] − Σ_i E_q[log q(z_i | φ_i)]

- Look at the terms of the ELBO depending only on λ:

L_λ = E_q[log p(θ | z, x)] − E_q[log q(θ | λ)]
Mean Field: Optimize φ_i
- Examine another free parameter, e.g., φ_i:

L(q) = E_q[log p(z_i | z_\i, θ, x)] + E_q[log p(z_\i, θ, x)] − E_q[log q(θ | λ)] − Σ_i E_q[log q(z_i | φ_i)]

- Look at the terms of the ELBO depending only on φ_i:

L_{φ_i} = E_q[log p(z_i | z_\i, θ, x)] − E_q[log q(z_i | φ_i)]

- This motivates a coordinate ascent algorithm for optimization
  - Iteratively optimize each free parameter holding all others fixed

Algorithm Outline
- Initialization: randomly select a starting distribution q^(0)
- E-step: given the parameter distribution, find the posterior over hidden data:
  q_z^(t) = argmax_{q_z} L(q_z, q_θ^(t−1))
- M-step: given the posterior distributions, find likely parameters:
  q_θ^(t) = argmax_{q_θ} L(q_z^(t), q_θ)
- Iteration: alternate E-step and M-step until convergence
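The coordinate ascent idea can be seen on a toy problem small enough to write out exactly: approximate a two-variable discrete joint p(z1, z2) by a factorized q1(z1) q2(z2), alternating the standard mean-field updates q1(a) ∝ exp(E_{q2}[log p(a, z2)]) and q2(b) ∝ exp(E_{q1}[log p(z1, b)]). This toy example is my own illustration, not from the lecture:

```python
import math

def cavi_two_vars(p, iters=50):
    """Mean-field coordinate ascent for a 2-variable discrete joint p[a][b]
    (entries positive, summing to 1): fit q(z1, z2) ~ q1(z1) q2(z2)."""
    A, B = len(p), len(p[0])
    q1 = [1.0 / A] * A
    q2 = [1.0 / B] * B
    for _ in range(iters):
        # update q1 holding q2 fixed: q1(a) proportional to exp(E_{q2}[log p(a, .)])
        log_q1 = [sum(q2[b] * math.log(p[a][b]) for b in range(B)) for a in range(A)]
        mx = max(log_q1)
        q1 = [math.exp(l - mx) for l in log_q1]
        s = sum(q1); q1 = [x / s for x in q1]
        # update q2 holding q1 fixed: q2(b) proportional to exp(E_{q1}[log p(., b)])
        log_q2 = [sum(q1[a] * math.log(p[a][b]) for a in range(A)) for b in range(B)]
        mx = max(log_q2)
        q2 = [math.exp(l - mx) for l in log_q2]
        s = sum(q2); q2 = [x / s for x in q2]
    return q1, q2
```

Each update is the exact maximizer of the ELBO in that coordinate, so the ELBO never decreases across iterations; that is the same structure the E-step/M-step alternation above exploits.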
Case Study 5: Mixed Membership Modeling — Variational Inference for LDA

Mean Field for LDA
- In LDA, the parameters are the topics β = {β_k} and the document proportions θ = {θ_d}; the latent variables are the topic assignments z = {z_di}
[Slide figure: LDA graphical model with z_di and w_di (i = 1..N, d = 1..D) and β_k (k = 1..K)]
Mean Field for LDA
- The variational distribution factorizes as

q(β, θ, z) = ∏_{k=1}^{K} q(β_k | λ_k) ∏_{d=1}^{D} q(θ_d | γ_d) ∏_{i=1}^{N} q(z_di | φ_di)

- The joint distribution factorizes as

p(β, θ, z, w) = ∏_{k=1}^{K} p(β_k) ∏_{d=1}^{D} p(θ_d) ∏_{i=1}^{N} p(z_di | θ_d) p(w_di | z_di, β)

- Examine the ELBO:

L(q) = Σ_k E_q[log p(β_k)] + Σ_d E_q[log p(θ_d)] + Σ_d Σ_i ( E_q[log p(z_di | θ_d)] + E_q[log p(w_di | z_di, β)] )
       − Σ_k E_q[log q(β_k | λ_k)] − Σ_d E_q[log q(θ_d | γ_d)] − Σ_d Σ_i E_q[log q(z_di | φ_di)]

- Let's look at some of these terms: E_q[log p(z_di | θ_d)] and E_q[log q(z_di | φ_di)]; the other terms follow similarly
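For a single document with the topic-side expectations E[log β] held fixed, the resulting coordinate updates take a simple form: φ_di,k ∝ exp(Ψ(γ_dk) + E[log β_k,w_di]) and γ_dk = α_k + Σ_i φ_di,k. A sketch of that per-document loop (the function names, the digamma approximation, and the initialization γ_dk = α_k + N/K are my own choices, not the lecture's reference code):

```python
import math

def digamma(x):
    """Psi(x) via the recurrence psi(x) = psi(x+1) - 1/x plus an asymptotic
    series; adequate accuracy for x > 0."""
    r = 0.0
    while x < 6:
        r -= 1.0 / x
        x += 1
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1 / 12. - f * (1 / 120. - f / 252.))

def lda_e_step(doc, elog_beta, alpha, iters=20):
    """Variational updates for one document, with elog_beta[k][v] = E[log beta_kv]
    treated as fixed:
      phi[i][k] proportional to exp( Psi(gamma[k]) + elog_beta[k][w_i] )
      gamma[k]  = alpha[k] + sum_i phi[i][k]
    (the Psi(sum gamma) term is constant in k, so it cancels when normalizing)."""
    K = len(alpha)
    gamma = [alpha[k] + len(doc) / K for k in range(K)]
    phi = [[1.0 / K] * K for _ in doc]
    for _ in range(iters):
        dg = [digamma(g) for g in gamma]
        for i, w in enumerate(doc):
            logp = [dg[k] + elog_beta[k][w] for k in range(K)]
            mx = max(logp)
            p = [math.exp(l - mx) for l in logp]
            s = sum(p)
            phi[i] = [x / s for x in p]
        gamma = [alpha[k] + sum(phi[i][k] for i in range(len(doc)))
                 for k in range(K)]
    return phi, gamma
```

Alternating this per-document step with an update of the topic parameters λ_k gives the full coordinate ascent procedure.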
Optimize via Coordinate Ascent
- Algorithm: iteratively update each variational parameter (each φ_di, each γ_d, and each λ_k), holding all others fixed, until the ELBO converges
[Slide figures: the mean-field graphical model for LDA, annotated with the update for each free parameter]
What you need to know
- Latent Dirichlet allocation (LDA)
  - Motivation and generative model specification
  - Collapsed Gibbs sampler
- Variational methods
  - Overall goal
  - Interpretation in terms of minimizing (reverse) KL
  - Mean field approximation

Acknowledgements
Thanks to Dave Blei, David Mimno, and Jordan Boyd-Graber for some material in this lecture relating to LDA.