Collapsed Gibbs and Variational Methods for LDA. Example Collapsed MoG Sampling



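1 Case Study: Document Retrieval
Collapsed Gibbs and Variational Methods for LDA
Machine Learning/Statistics for Big Data, CSE599C/STAT59, University of Washington
Emily Fox, February 7th

Example: Collapsed MoG Sampling
Model: $\pi \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K)$, $z_i \sim \pi$, $\{\mu_k, \Sigma_k\} \sim F(\cdot)$, $x_i \mid z_i \sim N(x_i; \mu_{z_i}, \Sigma_{z_i})$ for $i = 1, \dots, N$.
Collapsed sampler: marginalize the parameters and sample only the indicators $z_i$.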

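2 Example: Collapsed MoG Sampling
Derivation. Important facts:
$$p(z_{1:N} \mid \alpha) = \frac{\Gamma\left(\sum_k \alpha_k\right)}{\prod_k \Gamma(\alpha_k)} \cdot \frac{\prod_k \Gamma(n_k + \alpha_k)}{\Gamma\left(\sum_k (n_k + \alpha_k)\right)}$$
$$\Gamma(m + 1) = m\,\Gamma(m)$$
where $n_k$ is the number of observations assigned to component $k$.

Latent Dirichlet Allocation (LDA)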

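3 LDA Generative Model
Observations: $w_1^d, \dots, w_{N_d}^d$
Associated topics: $z_1^d, \dots, z_{N_d}^d$
Parameters: $\theta = \{\{\pi^d\}, \{\beta_k\}\}$
Generative model:
$$p(\theta, z, w \mid \alpha, \lambda) = \prod_{k=1}^K p(\beta_k \mid \lambda) \prod_{d=1}^D \left( p(\pi^d \mid \alpha) \prod_{i=1}^{N_d} p(z_i^d \mid \pi^d)\, p(w_i^d \mid z_i^d, \beta) \right)$$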
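The generative model above can be simulated directly. The following is a minimal sketch, assuming symmetric Dirichlet priors (scalar `alpha` and `lam`), a fixed document length, and integer word ids; the function names and toy sizes are hypothetical and not taken from the lecture.

```python
import random

def sample_dirichlet(conc, rng):
    """Draw from Dir(conc) by normalizing independent Gamma(conc_j, 1) draws."""
    draws = [rng.gammavariate(c, 1.0) for c in conc]
    total = sum(draws)
    return [g / total for g in draws]

def sample_categorical(probs, rng):
    """Return an index drawn with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(num_docs, doc_len, K, V, alpha, lam, seed=0):
    """Sample topics beta_k, then for each document its weights pi^d, indicators z, words w."""
    rng = random.Random(seed)
    beta = [sample_dirichlet([lam] * V, rng) for _ in range(K)]   # beta_k ~ Dir(lambda)
    docs, topics = [], []
    for _ in range(num_docs):
        pi = sample_dirichlet([alpha] * K, rng)                   # pi^d ~ Dir(alpha)
        z = [sample_categorical(pi, rng) for _ in range(doc_len)] # z_i^d ~ pi^d
        w = [sample_categorical(beta[k], rng) for k in z]         # w_i^d ~ beta_{z_i^d}
        docs.append(w)
        topics.append(z)
    return docs, topics, beta

docs, topics, beta = generate_corpus(num_docs=3, doc_len=8, K=2, V=5, alpha=0.5, lam=0.5)
```

Each entry of `docs` is a list of word ids and the matching entry of `topics` holds the topic indicator that generated each word.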

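4 Collapsed LDA Sampling
Marginalize parameters:
  Document-specific topic weights
  Corpus-wide topic-specific word distributions
Sample topic indicators for each word.
Derivation:
$$p(z_{1:N_d}^d \mid \alpha) = \frac{\Gamma\left(\sum_k \alpha_k\right)}{\prod_k \Gamma(\alpha_k)} \cdot \frac{\prod_k \Gamma(n_k^d + \alpha_k)}{\Gamma\left(\sum_k (n_k^d + \alpha_k)\right)}, \qquad p(z \mid \alpha) = \prod_{d=1}^D p(z_{1:N_d}^d \mid \alpha)$$
$$p(\{w_i^d \mid z_i^d = k\} \mid \lambda) = \frac{\Gamma\left(\sum_v \lambda_v\right)}{\prod_v \Gamma(\lambda_v)} \cdot \frac{\prod_v \Gamma(v_v^k + \lambda_v)}{\Gamma\left(\sum_v (v_v^k + \lambda_v)\right)}, \qquad p(w \mid z, \lambda) = \prod_{k=1}^K p(\{w_i^d \mid z_i^d = k\} \mid \lambda)$$
where $n_k^d$ counts the words in document $d$ assigned to topic $k$ and $v_v^k$ counts the assignments of vocabulary word $v$ to topic $k$ over all documents.
Algorithm: cycle through every word in every document, resampling its topic indicator given all the others.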
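The two marginals in the derivation share one functional form, so they can be evaluated with a single helper. This is a sketch under the assumption of symmetric priors (scalar `alpha` and `lam`); the function names are hypothetical. Working with `lgamma` keeps the ratios of Gamma functions numerically stable.

```python
from math import lgamma, log

def log_dirichlet_multinomial(counts, prior):
    """log[ Gamma(sum prior)/prod Gamma(prior) * prod Gamma(count+prior)/Gamma(sum(count+prior)) ]."""
    out = lgamma(sum(prior)) - sum(lgamma(a) for a in prior)
    out += sum(lgamma(c + a) for c, a in zip(counts, prior))
    out -= lgamma(sum(counts) + sum(prior))
    return out

def log_collapsed_joint(docs, topics, K, V, alpha, lam):
    """log p(z | alpha) + log p(w | z, lambda) with the parameters integrated out."""
    total = 0.0
    v = [[0] * V for _ in range(K)]            # v[k][w]: corpus-wide topic-word counts
    for words, z_d in zip(docs, topics):
        n_d = [0] * K                          # n_d[k]: topic counts within this document
        for w, k in zip(words, z_d):
            n_d[k] += 1
            v[k][w] += 1
        total += log_dirichlet_multinomial(n_d, [alpha] * K)
    for k in range(K):
        total += log_dirichlet_multinomial(v[k], [lam] * V)
    return total

# One document holding one word: p(z) = 1/2 and p(w | z) = 1/2 under uniform priors.
example = log_collapsed_joint([[0]], [[0]], K=2, V=2, alpha=1.0, lam=1.0)
```

For this one-word corpus the value equals `log(0.25)`, which is a quick sanity check on the formula.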

5 Sample Document
[Figure: a sample document containing words such as "Etruscan" and "trade"]

Randomly Assign Topics
[Figure: each word of the document is given a random topic indicator $z_i^d$]

6 Randomly Assign Topics
[Figure: every word in every document of the corpus (e.g. "Etruscan", "trade", "ship", "Italy") is given a random topic indicator $z_i^d$]

Maintain Global Statistics
[Figure: per-topic word counts, with total counts from all docs]

7 Resample Assignments
[Figure: the sample document with its current topic indicators $z_i^d$]

What is the conditional distribution for this topic?
[Figure: one word of the document with its topic indicator $z_i^d$ marked as unknown]

8 What is the conditional distribution for this topic?
Part I: How much does this document like each topic?
Part II: How much does each topic like this word?
[Figure: the document's counts for each of the three topics, and each topic's count for the word "trade"]

9 What is the conditional distribution for this topic?
Combining Part I and Part II for the word "trade":
$$p(z_i^d = k \mid \text{all other assignments}, w) \propto \frac{n_k^d + \alpha_k}{\sum_{j=1}^K (n_j^d + \alpha_j)} \cdot \frac{v_{\text{trade}}^k + \lambda_{\text{trade}}}{\sum_{j=1}^V (v_j^k + \lambda_j)}$$
where $n_k^d$ counts the words in document $d$ currently assigned to topic $k$ and $v_j^k$ counts the assignments of vocabulary word $j$ to topic $k$ over all docs, both excluding the word being resampled.

10 Sample a New Topic Indicator
[Figure: a new topic for the word is drawn from the conditional distribution over the three topics]

Update Counts
[Figure: the document and global counts are updated for the newly sampled topic]
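The walkthrough above (random initialization, global counts, resampling one indicator, updating counts) can be put together as one sweep. This is a minimal sketch, assuming symmetric scalar priors `alpha` and `lam` and integer word ids; the names `init_state` and `gibbs_sweep` and the toy corpus are hypothetical. The document-side denominator is the same for every topic, so it is dropped from the unnormalized weights.

```python
import random

def init_state(docs, K, V, rng):
    """Randomly assign a topic to every word and build the count tables."""
    z = [[rng.randrange(K) for _ in words] for words in docs]
    n = [[0] * K for _ in docs]        # n[d][k]: words in doc d assigned to topic k
    v = [[0] * V for _ in range(K)]    # v[k][w]: corpus-wide count of word w in topic k
    v_tot = [0] * K                    # v_tot[k]: total words assigned to topic k
    for d, words in enumerate(docs):
        for w, k in zip(words, z[d]):
            n[d][k] += 1
            v[k][w] += 1
            v_tot[k] += 1
    return z, n, v, v_tot

def gibbs_sweep(docs, z, n, v, v_tot, alpha, lam, rng):
    """One pass over all words: remove the word's counts, resample its topic, add counts back."""
    K, V = len(v), len(v[0])
    for d, words in enumerate(docs):
        for i, w in enumerate(words):
            k_old = z[d][i]
            n[d][k_old] -= 1
            v[k_old][w] -= 1
            v_tot[k_old] -= 1
            # Part I (document likes topic) times Part II (topic likes word), unnormalized.
            weights = [(n[d][k] + alpha) * (v[k][w] + lam) / (v_tot[k] + V * lam)
                       for k in range(K)]
            k_new = rng.choices(range(K), weights=weights, k=1)[0]
            z[d][i] = k_new
            n[d][k_new] += 1
            v[k_new][w] += 1
            v_tot[k_new] += 1

rng = random.Random(0)
docs = [[0, 1, 0, 2], [3, 4, 3, 3], [0, 1, 4, 3]]   # hypothetical word ids, V = 5
z, n, v, v_tot = init_state(docs, K=2, V=5, rng=rng)
for _ in range(50):
    gibbs_sweep(docs, z, n, v, v_tot, alpha=0.5, lam=0.1, rng=rng)
```

Keeping `v_tot` alongside `v` avoids re-summing a topic's word counts for every word that is resampled.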

11 Geometrically
[Figure: geometric illustration of resampling the topic indicator $z_i^d$ among the three topics]

Issues with Generic LDA Sampling
Slow mixing rates → need many iterations.
Each iteration cycles through sampling topic assignments for all words in all documents.
Modern approaches:
  Large-scale LDA. For example, Mimno, David, Matthew D. Hoffman and David M. Blei. "Sparse stochastic inference for latent Dirichlet allocation." International Conference on Machine Learning.
  Distributed LDA. For example, Ahmed, Amr, et al. "Scalable inference in latent variable models." Proceedings of the fifth ACM international conference on Web search and data mining.
Alternative: variational methods instead of sampling.
  Approximate the posterior with an optimized variational distribution.

12 Variational Methods
Recall task: characterize the posterior $p(\theta, z \mid x)$.
Turn posterior inference into an optimization task:
  Introduce a tractable family of distributions over parameters and latent variables.
  The family is indexed by a set of free parameters.
  Find the member of the family closest to the posterior $p(\theta, z \mid x)$.
Questions:
  How do we measure closeness?
  If the posterior is intractable, how can we approximate something we do not have to begin with?

A Measure of Closeness
Kullback-Leibler (KL) divergence: $D(p \,\|\, q) = E_p\left[\log \frac{p(x)}{q(x)}\right]$
  Measures the distance between two distributions $p$ and $q$.
  Not symmetric: $p$ determines where the difference is important.
    $p(x) = 0$ and $q(x) \neq 0$: the term contributes nothing.
    $p(x) \neq 0$ and $q(x) = 0$: the divergence is infinite.
  Want $q^* = \arg\min_q D(p \,\|\, q)$ with $p$ the posterior. Just as hard as the original problem!
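The asymmetry of the KL divergence is easy to see on small discrete distributions. The sketch below uses two hypothetical three-outcome distributions; the helper `kl` is an illustration, not code from the lecture.

```python
from math import log, inf

def kl(p, q):
    """D(p || q) for discrete distributions given as lists of probabilities."""
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0.0:
            continue          # p(x) = 0: the term contributes nothing, whatever q(x) is
        if q_x == 0.0:
            return inf        # p(x) != 0 but q(x) = 0: the divergence is infinite
        total += p_x * log(p_x / q_x)
    return total

p = [0.5, 0.5, 0.0]
q = [1.0, 0.0, 0.0]

forward = kl(p, q)   # q misses an outcome that p supports
reverse = kl(q, p)   # q sits inside the support of p
```

Here `forward` is infinite while `reverse` is finite (it equals `log(2)`), so the first argument decides which mismatches are penalized.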

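13 Reverse Divergence
Divergence $D(p \,\|\, q)$:
  The true distribution $p$ defines the support of the difference.
  The "correct" direction.
  Will be intractable to compute.
Reverse divergence $D(q \,\|\, p)$:
  The approximate distribution $q$ defines the support.
  Tends to give overconfident results.
  Will be tractable.

Interpretations of Minimizing Reverse KL
Similarity measure: $D(q \,\|\, p) = E_q\left[\log \frac{q(\theta, z)}{p(\theta, z \mid x)}\right]$
Evidence lower bound (ELBO): $\log p(x) = D(q \,\|\, p) + \mathcal{L}(q) \geq \mathcal{L}(q)$, where $\mathcal{L}(q) = E_q[\log p(\theta, z, x)] - E_q[\log q(\theta, z)]$.
Therefore, minimizing KL is equivalent to maximizing a lower bound on the marginal likelihood:
  max $\mathcal{L}(q)$ = min $D(q \,\|\, p)$ = max lower bound of $\log p(x)$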
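The relation between the ELBO, the reverse KL divergence, and the marginal likelihood can be checked numerically on a toy model. This sketch assumes a single binary latent variable and a made-up joint table for one fixed observation; all values and names are hypothetical.

```python
from math import log

# Hypothetical joint p(z, x) for one fixed observation x and a binary latent z.
joint = [0.3, 0.1]                       # p(z=0, x), p(z=1, x)
evidence = sum(joint)                    # p(x)
posterior = [j / evidence for j in joint]

def elbo(q):
    """L(q) = E_q[log p(z, x)] - E_q[log q(z)]."""
    return sum(q_z * (log(p_zx) - log(q_z)) for q_z, p_zx in zip(q, joint) if q_z > 0)

def reverse_kl(q):
    """D(q || p(z | x))."""
    return sum(q_z * log(q_z / p_z) for q_z, p_z in zip(q, posterior) if q_z > 0)

q = [0.5, 0.5]
gap = log(evidence) - elbo(q)            # equals reverse_kl(q)
```

For any `q`, `elbo(q) + reverse_kl(q)` equals `log(evidence)`, and the bound is tight when `q` is the exact posterior.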

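14 Mean Field
How do we choose a family $Q$ such that the ELBO $\mathcal{L}(q) = E_q[\log p(\theta, z, x)] - E_q[\log q(\theta, z)]$ is tractable?
Simplest case = mean field approximation:
  Assume each parameter and latent variable is conditionally independent given the set of free parameters, so $q(\theta, z) = q(\theta) \prod_n q(z_n)$.
  Then the entropy term decomposes as $-E_q[\log q(\theta, z)] = -E_q[\log q(\theta)] - \sum_n E_q[\log q(z_n)]$.

Mean Field
Examine one free parameter, e.g., the one indexing $q(\theta)$.
Can rewrite the joint as $E_q[\log p(\theta, z, x)] = E_q[\log p(\theta \mid z, x)] + E_q[\log p(z, x)]$.
Look at the terms of the ELBO depending just on that parameter: $E_q[\log p(\theta \mid z, x)] - E_q[\log q(\theta)]$.
Likewise, for the free parameter of $q(z_n)$: $E_q[\log p(z_n \mid z_{\setminus n}, \theta, x)] - E_q[\log q(z_n)]$.
This motivates using a coordinate ascent algorithm for optimization:
  Iteratively optimize each free parameter holding all others fixed.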

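15 Mean Field for LDA
In LDA, our parameters are $\theta = \{\pi^d\}, \{\beta_k\}$ and our latent variables are $z = \{z_i^d\}$.
The variational distribution factorizes as
$$q(\pi, \beta, z) = \prod_{k=1}^K q(\beta_k \mid \eta_k) \prod_{d=1}^D q(\pi^d \mid \gamma^d) \prod_{d=1}^D \prod_{i=1}^{N_d} q(z_i^d \mid \phi_i^d)$$
with free parameters $\eta_k$, $\gamma^d$ and $\phi_i^d$.
The joint distribution factorizes as
$$p(\pi, \beta, z, w) = \prod_{k=1}^K p(\beta_k \mid \lambda) \prod_{d=1}^D p(\pi^d \mid \alpha) \prod_{i=1}^{N_d} p(z_i^d \mid \pi^d)\, p(w_i^d \mid z_i^d, \beta)$$
Examine the ELBO:
$$\mathcal{L}(q) = \sum_{k=1}^K E_q[\log p(\beta_k \mid \lambda)] + \sum_{d=1}^D E_q[\log p(\pi^d \mid \alpha)] + \sum_{d=1}^D \sum_{i=1}^{N_d} \left( E_q[\log p(z_i^d \mid \pi^d)] + E_q[\log p(w_i^d \mid z_i^d, \beta)] \right)$$
$$\qquad - \sum_{k=1}^K E_q[\log q(\beta_k \mid \eta_k)] - \sum_{d=1}^D E_q[\log q(\pi^d \mid \gamma^d)] - \sum_{d=1}^D \sum_{i=1}^{N_d} E_q[\log q(z_i^d \mid \phi_i^d)]$$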

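16 Mean Field for LDA
Let's look at some of these terms. Taking $q(\pi^d \mid \gamma^d)$ to be Dirichlet and $q(z_i^d \mid \phi_i^d)$ to be categorical with probabilities $\phi_{ik}^d$:
$$E_q[\log p(z_i^d \mid \pi^d)] = \sum_{k=1}^K \phi_{ik}^d\, E_q[\log \pi_k^d] = \sum_{k=1}^K \phi_{ik}^d \left( \psi(\gamma_k^d) - \psi\left(\sum_{j=1}^K \gamma_j^d\right) \right)$$
$$E_q[\log q(z_i^d \mid \phi_i^d)] = \sum_{k=1}^K \phi_{ik}^d \log \phi_{ik}^d$$
where $\psi$ is the digamma function. Other terms follow similarly.

Optimize via Coordinate Ascent
Algorithm: iteratively update each free parameter with all others held fixed.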
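A batch coordinate ascent loop for LDA can be sketched as follows. The slide's own algorithm is not recoverable, so this is the standard mean-field scheme under stated assumptions: Dirichlet factors for topics and document weights, categorical factors for indicators, symmetric scalar priors `alpha` and `lam`, a fixed number of inner iterations instead of a convergence test, and a small pure-Python `digamma`. The names `eta`, `gamma`, `phi` for the free parameters and all function names are hypothetical.

```python
import random
from math import log, exp

def digamma(x):
    """psi(x) for x > 0: shift up with psi(x) = psi(x+1) - 1/x, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def local_step(words, eta, alpha, n_iter=50):
    """Alternate the phi and gamma updates for one document with the topics held fixed."""
    K = len(eta)
    psi_eta_sum = [digamma(sum(eta[k])) for k in range(K)]
    e_log_beta = [[digamma(eta[k][w]) - psi_eta_sum[k] for w in words] for k in range(K)]
    gamma = [1.0] * K
    phi = []
    for _ in range(n_iter):
        psi_gamma_sum = digamma(sum(gamma))
        e_log_pi = [digamma(g) - psi_gamma_sum for g in gamma]
        phi = []
        for i in range(len(words)):
            logits = [e_log_pi[k] + e_log_beta[k][i] for k in range(K)]
            top = max(logits)
            unnorm = [exp(l - top) for l in logits]   # subtract max for stability
            norm = sum(unnorm)
            phi.append([u / norm for u in unnorm])
        gamma = [alpha + sum(p[k] for p in phi) for k in range(K)]
    return gamma, phi

def coordinate_ascent(docs, K, V, alpha, lam, n_outer=20, seed=0):
    """Sweep the whole corpus, then update the topic parameters eta; repeat."""
    rng = random.Random(seed)
    eta = [[lam + rng.random() for _ in range(V)] for _ in range(K)]
    for _ in range(n_outer):
        new_eta = [[lam] * V for _ in range(K)]
        for words in docs:
            _, phi = local_step(words, eta, alpha)
            for w, p in zip(words, phi):
                for k in range(K):
                    new_eta[k][w] += p[k]             # expected count of word w in topic k
        eta = new_eta
    return eta

docs = [[0, 1, 0, 2], [3, 4, 3, 3], [0, 1, 4, 3]]     # hypothetical word ids, V = 5
eta = coordinate_ascent(docs, K=2, V=5, alpha=0.5, lam=0.1)
```

Note that every outer iteration touches every document before the topics change once, which is the cost that motivates the stochastic alternative.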

17 Alternative Optimization Schemes
Inefficient:
  Start from randomly initialized topics.
  Analyze the whole corpus before updating the topics again.
If streaming data scenario, can't compute even one iteration!
Didn't have to do coordinate ascent. Could have used gradient ascent.

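18 Alternative Optimization Schemes
Recall stochastic gradient ascent: replace the full gradient with an estimate computed from a sampled subset of the data. Assume $M = 1$, i.e. one sampled document $d_t$ per step. The estimate is unbiased, but noisy.
Here, with $\eta$, $\gamma^d$, $\phi^d$ the free parameters of $q(\beta)$, $q(\pi^d)$, $q(z^d)$:
$$\mathcal{L} = E_q[\log p(\beta \mid \lambda)] - E_q[\log q(\beta \mid \eta)] + \sum_{d=1}^D \left( E_q[\log p(\pi^d \mid \alpha)] - E_q[\log q(\pi^d \mid \gamma^d)] \right) + \sum_{d=1}^D \left( E_q[\log p(z^d, x^d \mid \pi^d, \beta)] - E_q[\log q(z^d \mid \phi^d)] \right)$$
$$\mathcal{L}_t = E_q[\log p(\beta \mid \lambda)] - E_q[\log q(\beta \mid \eta)] + D \left( E_q[\log p(\pi^{d_t} \mid \alpha)] - E_q[\log q(\pi^{d_t} \mid \gamma^{d_t})] \right) + D \left( E_q[\log p(z^{d_t}, x^{d_t} \mid \pi^{d_t}, \beta)] - E_q[\log q(z^{d_t} \mid \phi^{d_t})] \right)$$

Stochastic Variational Inference for LDA
Initialize $\eta^{(0)}$ randomly.
Repeat (indefinitely):
  Sample a document $d$ uniformly from the data set.
  For all $k$, initialize $\gamma_k^d$.
  Repeat until converged:
    For $i = 1, \dots, N_d$: $\phi_{ik}^d \propto \exp\{E[\log \pi_k^d] + E[\log \beta_{k, w_i^d}]\}$
    Set $\gamma^d = \alpha + \sum_{i=1}^{N_d} \phi_i^d$
  Take a stochastic gradient step: $\eta^{(t)} = \eta^{(t-1)} + \rho_t \nabla_\eta \mathcal{L}_d$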
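The stochastic variational inference loop above can be sketched in a few dozen lines. Several choices here are assumptions rather than details recoverable from the slide: the gradient step is written in its natural-gradient form, a convex combination of the old `eta` and the estimate built from one document replicated `D` times; the step size follows `rho_t = (t + tau) ** (-kappa)`; `gamma` is initialized to 1; the inner loop runs a fixed number of iterations; priors are symmetric scalars; and `digamma` is a small pure-Python approximation. All function names are hypothetical.

```python
import random
from math import log, exp

def digamma(x):
    """psi(x) for x > 0: shift up with psi(x) = psi(x+1) - 1/x, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def local_step(words, eta, alpha, n_iter=50):
    """phi_ik ∝ exp{E[log pi_k] + E[log beta_{k,w_i}]}, then gamma = alpha + sum_i phi_i."""
    K = len(eta)
    psi_eta_sum = [digamma(sum(eta[k])) for k in range(K)]
    e_log_beta = [[digamma(eta[k][w]) - psi_eta_sum[k] for w in words] for k in range(K)]
    gamma = [1.0] * K                                  # assumed initial value
    phi = []
    for _ in range(n_iter):
        psi_gamma_sum = digamma(sum(gamma))
        e_log_pi = [digamma(g) - psi_gamma_sum for g in gamma]
        phi = []
        for i in range(len(words)):
            logits = [e_log_pi[k] + e_log_beta[k][i] for k in range(K)]
            top = max(logits)
            unnorm = [exp(l - top) for l in logits]
            norm = sum(unnorm)
            phi.append([u / norm for u in unnorm])
        gamma = [alpha + sum(p[k] for p in phi) for k in range(K)]
    return gamma, phi

def svi(docs, K, V, alpha, lam, n_steps=200, tau=1.0, kappa=0.7, seed=0):
    """Update the topics after every sampled document instead of after a full pass."""
    rng = random.Random(seed)
    D = len(docs)
    eta = [[lam + rng.random() for _ in range(V)] for _ in range(K)]
    for t in range(1, n_steps + 1):
        words = docs[rng.randrange(D)]                 # sample a document uniformly
        _, phi = local_step(words, eta, alpha)
        eta_hat = [[lam] * V for _ in range(K)]        # estimate as if the corpus were D copies
        for w, p in zip(words, phi):
            for k in range(K):
                eta_hat[k][w] += D * p[k]
        rho = (t + tau) ** (-kappa)                    # decaying step size
        eta = [[(1 - rho) * eta[k][v] + rho * eta_hat[k][v] for v in range(V)]
               for k in range(K)]
    return eta

docs = [[0, 1, 0, 2], [3, 4, 3, 3], [0, 1, 4, 3]]      # hypothetical word ids, V = 5
eta = svi(docs, K=2, V=5, alpha=0.5, lam=0.1)
```

Because each step needs only one document, the same loop applies when documents arrive as a stream.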

19 Acknowledgements
Thanks to Dave Blei, David Mimno, and Jordan Boyd-Graber for some material in this lecture relating to LDA.
