Probabilistic Latent Semantic Analysis

Size: px

Start display at page:

Download "Probabilistic Latent Semantic Analysis"

Imogen Casey
5 years ago
Views:

1 Probabilistic Latent Semantic Analysis Yuriy Sverchkov Intelligent Systems Program University of Pittsburgh October 6, 2011

2 Outline Latent Semantic Analysis (LSA) A quick review Probabilistic LSA (plsa) The plsa model Learning EM and tempered EM Applications plsi and phits Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

3 Outline Latent Semantic Analysis (LSA) A quick review Probabilistic LSA (plsa) The plsa model Learning EM and tempered EM Applications plsi and phits Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

4 LSA: A quick review LSA uses PCA to find a lower-dimensional topic space. terms documents x 1,1 x 1,n..... x m,1 x m,n = terms topics w 1,1 w 1,r..... w m,1 w m,r topic weights topics documents v 1,1 v 1,n..... v r,1 v r,n TERMS TOPICS DOCUMENTS Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

5 PCA as reconstruction error minimization For each data vector x n = (x n1,..., x nd ), and for M < d, find U = (u 1,..., u M ) that minimizes E M N x n ˆx n 2 n=1 where ˆx n = x + M y ni u i and x 1 N i=1 N x n giving: n=1 d N d d E M = [u T i (x n ˆx)] 2 = u T i Σu i = i=m+1 n=1 i=m+1 i=m+1 λ i Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

6 Outline Latent Semantic Analysis (LSA) A quick review Probabilistic LSA (plsa) The plsa model Learning EM and tempered EM Applications plsi and phits Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

7 Probabilistic LSA The same document topic word idea in a probabilistic framework. Asymmetric generative aspect model: 1. Select a document d with probability P(d). 2. Select a latent class z with probability P(z d). 3. Generate a word w with probability P(w z). A mixture model Each document corresponds to a mixture of topics. Each topic corresponds to a mixture of words. d z w Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

8 Parametrization d The index of a document in the dataset. P(d) The frequency of the document in the corpus (uniform in practice). z The index of a topic. P(z d) Latent parameters that define the distribution of topics for a particular document. w The index of a word in the dictionary. P(w z) Latent parameters that define the distribution of words for a particular topics. d z w Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

9 Independence Remember independence equivalence classes in Bayesian networks? A C B A B C A C A B C A B C A B C Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

10 Symmetric aspect model Parametrization P(d, w) = z Z P(z)P(d z)p(w z) z Inference is BN inference. Learning is the same as for any BN with latent variables: EM. d w Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

11 plsa vs LSA plsa Assumes conditional independence given a lower-dimensional variable. Maximizes likelihood function. Parameters are multinomial distributions. EM is slow. EM converges to a local optimum. LSA Assumes linear transformation to a low-dimensional space. Minimizes Gaussian error. Parameters have no obvious interpretation. Linear operations are fast. SVD is exact (up to numerical precision). Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

12 Outline Latent Semantic Analysis (LSA) A quick review Probabilistic LSA (plsa) The plsa model Learning EM and tempered EM Applications plsi and phits Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

13 Learning: standard EM E-step: M-step: P(z d, w) = P(z)P(d z)p(w z) z Z P(z )P(d z )P(w z ) P(w z) d D n(d, w)p(z d, w) P(d z) n(d, w)p(z d, w) w W P(z) n(d, w)p(z d, w) d D w W Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

14 Learning: tempered EM (TEM) New E-Step: P(z d, w) = P(z) [P(d z)p(w z)] β z Z P(z ) [P(d z )P(w z )] β Same as the standard E-Step when β = 1. Same as a posterior given uniform data when β = 0. Algorithm: 1. Hold out some data. 2. Set β Perform EM and decrease β at some rate (β ηβ with η < 1). 4. Stop if performance on held-out data doesn t increase, otherwise repeat previous step. 5. Perform some final iterations on full data. Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

15 Outline Latent Semantic Analysis (LSA) A quick review Probabilistic LSA (plsa) The plsa model Learning EM and tempered EM Applications plsi and phits Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

16 Applications to information retrieval and link analysis Information Retrieval: plsi Index documents by their topic (z) distributions. Queries are computed by scoring each document with P(w d) (for words in the query). Can fold-in a new query as a hypothetical document P(z q) by updating that probability with EM. Link analysis: phits d are documents, c are citations (correspond to w in plsa). Want to group these into communities (z). Authoritativeness measures: P(c z) authority of c within the community z. P(z c) topic-specific authority. P(z c)p(c z) topic characteristic for community. Yuriy Sverchkov (ISP) Probabilistic Latent Semantic Analysis October 6, / 16

PROBABILISTIC LATENT SEMANTIC ANALYSIS

PROBABILISTIC LATENT SEMANTIC ANALYSIS Lingjia Deng Revised from slides of Shuguang Wang Outline Review of previous notes PCA/SVD HITS Latent Semantic Analysis Probabilistic Latent Semantic Analysis Applications