A Practical Algorithm for Topic Modeling with Provable Guarantees


A Practical Algorithm for Topic Modeling with Provable Guarantees
Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu
Reviewed by Zhao Song, December 13, 2013

Outline
1. Background
2. The Proposed Algorithm
   - Word-Word Co-occurrence Matrix Construction
   - Topic Recovery via Bayes' Rule
   - Anchor Words Search
3. Experimental Results

Outline (section: Background)

Topic Modeling
Nonnegative matrix factorization formulation [Arora, Ge, and Moitra 2012]:
- $A$: unknown word-topic matrix of dimension $V \times K$, with $A_{i,k} = p(w_1 = i \mid z_1 = k) = p(w_2 = i \mid z_2 = k)$
- $W$: unknown topic-document matrix of dimension $K \times M$, with $W_{i,k} = p(z_1 = i \mid d_1 = k) = p(z_2 = i \mid d_2 = k)$
- $R$: unknown topic-topic covariance matrix of dimension $K \times K$, $R = E[W W^T]$
- $Q$: word-word co-occurrence matrix of dimension $V \times V$, $Q = E[A W W^T A^T] = A R A^T$
Task: given $Q$, recover $A$ and $R$.
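
For concreteness, a minimal numeric sketch of this generative identity, with made-up dimensions and an empirical average standing in for the expectation (all names and sizes here are my choices, not the paper's):

```python
import numpy as np

# Toy check of Q = A R A^T: V = 4 words, K = 2 topics, M = 3 documents.
rng = np.random.default_rng(0)
V, K, M = 4, 2, 3
A = rng.dirichlet(np.ones(V), size=K).T   # V x K; column k is p(w | z = k)
W = rng.dirichlet(np.ones(K), size=M).T   # K x M; column d is p(z | d)
R = (W @ W.T) / M                         # empirical stand-in for R = E[W W^T]
Q = A @ R @ A.T                           # V x V word-word co-occurrence
assert np.isclose(Q.sum(), 1.0)           # Q is a joint distribution over word pairs
```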

Previous Work
[Arora, Ge, and Moitra 2012] gave a provably polynomial-time algorithm for learning the parameters of a topic model, provided that every topic contains at least one anchor word. The word-topic matrix $A$ is $p$-separable: for $p > 0$ and for each topic $k$, there is some word $i$ such that $A_{i,k} \geq p$ and $A_{i,k'} = 0$ for all $k' \neq k$. Such a word $i$ is called an anchor word.
Two steps:
- Selection step: find the anchor words.
- Recovery step: reconstruct the topic distributions.
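
As a quick illustration of what the separability condition asks of $A$ (a sketch; the function name and dictionary output are my assumptions):

```python
import numpy as np

def anchor_candidates(A, p):
    """Find words meeting the p-separability condition above: A[i, k] >= p
    for exactly one topic k, and A[i, k'] == 0 for every other topic k'.
    A is the V x K word-topic matrix; returns {topic: [anchor words]}."""
    candidates = {}
    for i, row in enumerate(A):
        support = np.flatnonzero(row)          # topics where word i has mass
        if len(support) == 1 and row[support[0]] >= p:
            candidates.setdefault(int(support[0]), []).append(i)
    return candidates
```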

Previous Work (Cont.) [figure slide]

Previous Work (Cont.)
Limitations of [Arora, Ge, and Moitra 2012]:
- Unstable and sensitive to noise: matrix inversion is used to recover $A$.
- Not sufficient for real data: only $K$ rows of the word-word co-occurrence matrix $Q$ are used, namely those corresponding to the anchor words.

Outline (section: The Proposed Algorithm)

Framework [figure slide]

Outline (section: Word-Word Co-occurrence Matrix Construction)

Construct the Q Matrix
Let $H_d \in \mathbb{R}^V$, where the $i$-th entry is the number of times word $i$ appears in document $d$, and let $n_d$ be the length of document $d$. Define
$$\tilde{H}_d = \frac{H_d}{\sqrt{n_d(n_d - 1)}}, \qquad \hat{H}_d = \frac{\mathrm{Diag}(H_d)}{n_d(n_d - 1)}.$$
Collect all the $\tilde{H}_d$ as the columns of $\tilde{H}$, and sum all the $\hat{H}_d$ to get $\hat{H}$. We can prove $E[\tilde{H}\tilde{H}^T - \hat{H}] = Q$, so we use $\tilde{H}\tilde{H}^T - \hat{H}$ to approximate $Q$, and normalize its rows to get $\bar{Q}$.
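
A minimal sketch of this construction (assuming every document has length $n_d \geq 2$; the averaging over the $M$ documents and all names are my choices):

```python
import numpy as np

def construct_Q(H):
    """H is the V x M matrix of word counts, one column per document.
    Returns the estimate of Q and its row-normalized version Q_bar."""
    V, M = H.shape
    n = H.sum(axis=0).astype(float)              # document lengths n_d (assume >= 2)
    H_tilde = H / np.sqrt(n * (n - 1))           # scaled counts, column per document
    H_hat = np.diag((H / (n * (n - 1))).sum(axis=1))
    Q = (H_tilde @ H_tilde.T - H_hat) / M        # average so the estimate matches Q
    Q_bar = Q / Q.sum(axis=1, keepdims=True)     # row-normalize
    return Q, Q_bar
```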

Outline (section: Topic Recovery via Bayes' Rule)

Interpretation of the Matrix $\bar{Q}$
For an anchor word $s_k$:
$$\bar{Q}_{s_k,j} = \sum_{k'} p(z_1 = k' \mid w_1 = s_k)\, p(w_2 = j \mid z_1 = k') \quad (1)$$
$$= p(w_2 = j \mid z_1 = k) \quad (2)$$
For any other word $i$:
$$\bar{Q}_{i,j} = \sum_k p(z_1 = k \mid w_1 = i)\, p(w_2 = j \mid z_1 = k) \quad (3)$$
$$= \sum_k C_{i,k}\, \bar{Q}_{s_k,j} \quad (4)$$
Every other row of $\bar{Q}$ lies in the convex hull of the rows corresponding to the anchor words.

Recovering the Matrices A and R
Bayes' rule:
$$p(w_1 = i \mid z_1 = k) = \frac{p(z_1 = k \mid w_1 = i)\, p(w_1 = i)}{\sum_{i'} p(z_1 = k \mid w_1 = i')\, p(w_1 = i')} \quad (5)$$
where
$$p(w_1 = i) = \sum_j p(w_1 = i, w_2 = j) = \sum_j Q_{i,j}. \quad (6)$$
To compute $C_{i,k} = p(z_1 = k \mid w_1 = i)$, minimize a loss over convex combinations of the anchor rows, using either:
1. KL divergence: $D_{KL}\big(\bar{Q}_i \,\big\|\, \sum_{k \in S} C_{i,k} \bar{Q}_{s_k}\big)$
2. Quadratic loss: $\big\| \bar{Q}_i - \sum_{k \in S} C_{i,k} \bar{Q}_{s_k} \big\|^2$
Recovery of the matrix $R$, where $A^{\dagger}$ denotes the pseudo-inverse of $A$:
$$R = A^{\dagger} Q (A^{\dagger})^T \quad (7)$$
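
A sketch of the quadratic-loss variant of this recovery, with exponentiated-gradient updates keeping each row of $C$ on the simplex (the step size, iteration count, and names are illustrative choices, not the paper's):

```python
import numpy as np

def recover_A(Q, Q_bar, anchors, iters=500, eta=50.0):
    """For every word i, fit convex weights C[i] = p(z | w1 = i) over the
    anchor rows of Q_bar by minimizing ||Q_bar[i] - C[i] @ Q_S||^2, then
    apply Bayes' rule (5)-(6) to obtain A[i, k] = p(w1 = i | z1 = k)."""
    V = Q_bar.shape[0]
    K = len(anchors)
    Q_S = Q_bar[anchors]                         # K x V rows for the anchor words
    C = np.full((V, K), 1.0 / K)
    for _ in range(iters):
        grad = 2.0 * (C @ Q_S - Q_bar) @ Q_S.T   # gradient of the quadratic loss
        C *= np.exp(-eta * grad)                 # multiplicative update, stays >= 0
        C /= C.sum(axis=1, keepdims=True)        # re-project rows onto the simplex
    p_w = Q.sum(axis=1)                          # p(w1 = i), equation (6)
    A = C * p_w[:, None]                         # numerator of equation (5)
    A /= A.sum(axis=0, keepdims=True)            # normalize each topic column
    return A, C
```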

RecoverKL Algorithm [algorithm figure]

Outline (section: Anchor Words Search)

A Geometric Interpretation
With infinitely many documents, the convex hull $P$ of the rows of $\bar{Q}$ is a simplex whose vertices correspond to the anchor words. With finitely many documents, the rows of $\bar{Q}$ are only approximations to their expectations, so we instead find an approximation to the vertices of $P$: iteratively pick the point farthest from the subspace spanned by the anchor words found so far.

Anchor Words Search Algorithm [algorithm figure]
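
A compact sketch of the farthest-point search described on the previous slide. This simplified version deflates against the linear span of the anchors found so far, whereas the paper's FastAnchorWords works with the affine hull and adds a projection step to handle noise:

```python
import numpy as np

def find_anchors(Q_bar, K):
    """Greedily pick K rows of Q_bar as anchors: start from the row with
    the largest norm, then repeatedly take the row farthest from the span
    of the rows already chosen (Gram-Schmidt-style deflation)."""
    rows = Q_bar.astype(float).copy()
    anchors = [int(np.argmax(np.linalg.norm(rows, axis=1)))]
    for _ in range(K - 1):
        v = rows[anchors[-1]].copy()
        b = v / np.linalg.norm(v)                # new orthonormal direction
        rows -= np.outer(rows @ b, b)            # remove it from every row
        anchors.append(int(np.argmax(np.linalg.norm(rows, axis=1))))
    return anchors
```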

Outline (section: Experimental Results)

Experimental Setup
Compare the following four methods:
- Gibbs sampling [McCallum 2002]
- Recover [Arora, Ge, and Moitra 2012]
- RecoverKL
- RecoverL2
Two real-world data sets:
- New York Times articles: 295k documents, vocabulary size 15k, mean document length 298.
- NIPS abstracts: 1,100 documents, vocabulary size 2,500, mean document length 68.

Experimental Setup (Cont.)
Semi-synthetic corpora:
1. Train a model with MCMC on each real corpus.
2. Generate new documents from the trained parameters.
Performance metrics:
- Reconstruction error: $\ell_1$ distance between a learned matrix $\hat{A}$ and the true matrix $A$.
- Held-out probability: probability of previously unseen documents under the learned model.
- Coherence:
$$\mathrm{Coherence}(W) = \sum_{w_1, w_2 \in W} \log \frac{D(w_1, w_2) + \epsilon}{D(w_2)}$$
where $D(w)$ and $D(w_1, w_2)$ are the number of documents with at least one instance of $w$, and of both $w_1$ and $w_2$, respectively.
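
A small sketch of the coherence metric (here summed over all ordered pairs of a topic's top words; doc_word, the epsilon default, and the names are my assumptions):

```python
import numpy as np
from itertools import permutations

def coherence(top_words, doc_word, eps=0.01):
    """top_words: indices of a topic's most probable words W.
    doc_word: boolean M x V matrix, True if the document contains the word.
    Computes Coherence(W) = sum over pairs of log((D(w1, w2) + eps) / D(w2))."""
    score = 0.0
    for w1, w2 in permutations(top_words, 2):
        d_w2 = doc_word[:, w2].sum()                         # D(w2)
        d_both = (doc_word[:, w1] & doc_word[:, w2]).sum()   # D(w1, w2)
        score += np.log((d_both + eps) / d_w2)
    return float(score)
```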

Semi-synthetic Dataset [results figures]

Semi-synthetic + Anchors Dataset [results figure]

Correlation: topics are correlated. [figure slide]

Real Documents [results figure]

References
Sanjeev Arora, Rong Ge, Yoni Halpern, David M. Mimno, Ankur Moitra, David Sontag, Yichen Wu, and Michael Zhu. "A Practical Algorithm for Topic Modeling with Provable Guarantees". In: Proceedings of the International Conference on Machine Learning (ICML). Vol. 28(2). JMLR: W&CP, 2013.
Sanjeev Arora, Rong Ge, and Ankur Moitra. "Learning Topic Models - Going beyond SVD". In: Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on. IEEE, 2012.
Andrew Kachites McCallum. "MALLET: A Machine Learning for Language Toolkit". 2002.
