Probabilistic Latent Semantic Analysis


Probabilistic Latent Semantic Analysis
Yuriy Sverchkov
Intelligent Systems Program, University of Pittsburgh
October 6, 2011

Outline

- Latent Semantic Analysis (LSA): a quick review
- Probabilistic LSA (pLSA): the pLSA model
- Learning: EM and tempered EM
- Applications: pLSI and pHITS


LSA: A quick review

LSA uses PCA to find a lower-dimensional topic space, factoring the term-document matrix as

$$
\underbrace{\begin{pmatrix} x_{1,1} & \cdots & x_{1,n} \\ \vdots & & \vdots \\ x_{m,1} & \cdots & x_{m,n} \end{pmatrix}}_{\text{terms} \times \text{documents}}
=
\underbrace{\begin{pmatrix} w_{1,1} & \cdots & w_{1,r} \\ \vdots & & \vdots \\ w_{m,1} & \cdots & w_{m,r} \end{pmatrix}}_{\text{terms} \times \text{topics}}
\underbrace{\begin{pmatrix} \sigma_{1} & & 0 \\ & \ddots & \\ 0 & & \sigma_{r} \end{pmatrix}}_{\text{topic weights}}
\underbrace{\begin{pmatrix} v_{1,1} & \cdots & v_{1,n} \\ \vdots & & \vdots \\ v_{r,1} & \cdots & v_{r,n} \end{pmatrix}}_{\text{topics} \times \text{documents}}
$$
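As an illustrative sketch of this factorization (not part of the original slides), here is a truncated SVD of a small, made-up term-document count matrix in numpy, keeping r latent topics:

```python
import numpy as np

# Hypothetical term-document count matrix: rows are terms, columns are documents.
X = np.array([[2., 0., 1., 0.],
              [1., 1., 0., 0.],
              [0., 3., 0., 1.],
              [0., 0., 2., 2.]])

r = 2  # number of latent topics to keep

# Thin SVD, then keep only the r largest singular values.
W, s, Vt = np.linalg.svd(X, full_matrices=False)
W_r, S_r, Vt_r = W[:, :r], np.diag(s[:r]), Vt[:r, :]

# Rank-r reconstruction: (terms x topics) (topic weights) (topics x documents).
X_hat = W_r @ S_r @ Vt_r
print("rank-%d reconstruction error: %.3f" % (r, np.linalg.norm(X - X_hat)))
```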

PCA as reconstruction error minimization

For each data vector $x_n = (x_{n1}, \ldots, x_{nd})$, and for $M < d$, find $U = (u_1, \ldots, u_M)$ that minimizes

$$E_M = \frac{1}{N} \sum_{n=1}^{N} \lVert x_n - \hat{x}_n \rVert^2$$

where $\hat{x}_n = \bar{x} + \sum_{i=1}^{M} y_{ni} u_i$ and $\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n$, giving

$$E_M = \frac{1}{N} \sum_{i=M+1}^{d} \sum_{n=1}^{N} \left[ u_i^{\top} (x_n - \bar{x}) \right]^2 = \sum_{i=M+1}^{d} u_i^{\top} \Sigma u_i = \sum_{i=M+1}^{d} \lambda_i$$

with $\Sigma$ the sample covariance matrix and $\lambda_i$ its eigenvalues.
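A minimal numerical check of this identity (my own sketch, on randomly generated data): project onto the top M eigenvectors of the sample covariance and compare the mean reconstruction error with the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))           # N = 200 data vectors of dimension d = 5
x_bar = X.mean(axis=0)
Sigma = np.cov(X.T, bias=True)          # sample covariance (1/N convention)

# Eigenvectors sorted by decreasing eigenvalue; the top M span the principal subspace.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

M = 2
U = eigvecs[:, :M]

# Project onto the principal subspace and reconstruct each data vector.
Y = (X - x_bar) @ U                     # coefficients y_ni
X_hat = x_bar + Y @ U.T

E_M = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(E_M, eigvals[M:].sum())           # the two values agree up to rounding
```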


Probabilistic LSA

The same document → topic → word idea, in a probabilistic framework. Asymmetric generative aspect model:
1. Select a document d with probability P(d).
2. Select a latent class z with probability P(z | d).
3. Generate a word w with probability P(w | z).

This is a mixture model: each document corresponds to a mixture of topics, and each topic corresponds to a mixture of words. Graphical model: d → z → w.
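To make the generative story concrete, here is a small sampling sketch (not from the slides; the probability tables P_d, P_z_given_d, and P_w_given_z are made-up toy values):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy parameters (made up): 2 documents, 2 topics, 3 words.
P_d = np.array([0.5, 0.5])                    # P(d)
P_z_given_d = np.array([[0.9, 0.1],           # P(z | d): one row per document
                        [0.2, 0.8]])
P_w_given_z = np.array([[0.7, 0.2, 0.1],      # P(w | z): one row per topic
                        [0.1, 0.3, 0.6]])

def sample_pair():
    """Generate one (document, word) pair following the asymmetric aspect model."""
    d = rng.choice(2, p=P_d)                  # 1. select a document
    z = rng.choice(2, p=P_z_given_d[d])       # 2. select a latent topic
    w = rng.choice(3, p=P_w_given_z[z])       # 3. generate a word
    return d, w

print([sample_pair() for _ in range(5)])
```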

Parametrization

- d: the index of a document in the dataset. P(d): the frequency of the document in the corpus (uniform in practice).
- z: the index of a topic. P(z | d): latent parameters that define the distribution of topics for a particular document.
- w: the index of a word in the dictionary. P(w | z): latent parameters that define the distribution of words for a particular topic.

Graphical model: d → z → w.

Independence

Remember independence equivalence classes in Bayesian networks? [Figure: several three-node Bayesian network structures over A, B, and C, illustrating which structures encode the same conditional independencies.]

Symmetric aspect model

Parametrization:

$$P(d, w) = \sum_{z \in Z} P(z)\, P(d \mid z)\, P(w \mid z)$$

Graphical model: d ← z → w. Inference is BN inference. Learning is the same as for any BN with latent variables: EM.

pLSA vs LSA

pLSA:
- Assumes conditional independence given a lower-dimensional variable.
- Maximizes a likelihood function.
- Parameters are multinomial distributions.
- EM is slow.
- EM converges to a local optimum.

LSA:
- Assumes a linear transformation to a low-dimensional space.
- Minimizes Gaussian error.
- Parameters have no obvious interpretation.
- Linear operations are fast.
- SVD is exact (up to numerical precision).


Learning: standard EM

E-step:

$$P(z \mid d, w) = \frac{P(z)\, P(d \mid z)\, P(w \mid z)}{\sum_{z' \in Z} P(z')\, P(d \mid z')\, P(w \mid z')}$$

M-step:

$$P(w \mid z) \propto \sum_{d \in D} n(d, w)\, P(z \mid d, w)$$

$$P(d \mid z) \propto \sum_{w \in W} n(d, w)\, P(z \mid d, w)$$

$$P(z) \propto \sum_{d \in D} \sum_{w \in W} n(d, w)\, P(z \mid d, w)$$
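A compact numpy sketch of these updates for the symmetric aspect model (an illustration with a made-up count matrix, not code from the slides); n_dw is the D x W document-word count matrix n(d, w):

```python
import numpy as np

def plsa_em(n_dw, n_topics, n_iters=50, seed=0):
    """Standard EM for the symmetric pLSA aspect model on a D x W count matrix."""
    rng = np.random.default_rng(seed)
    D, W = n_dw.shape
    # Random multinomial initializations.
    P_z = rng.random(n_topics);          P_z /= P_z.sum()
    P_d_z = rng.random((n_topics, D));   P_d_z /= P_d_z.sum(axis=1, keepdims=True)
    P_w_z = rng.random((n_topics, W));   P_w_z /= P_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # E-step: responsibilities P(z | d, w), shape (n_topics, D, W).
        joint = P_z[:, None, None] * P_d_z[:, :, None] * P_w_z[:, None, :]
        P_z_dw = joint / joint.sum(axis=0, keepdims=True)

        # M-step: reweight by the counts n(d, w) and renormalize.
        weighted = n_dw[None, :, :] * P_z_dw
        P_w_z = weighted.sum(axis=1)
        P_w_z /= P_w_z.sum(axis=1, keepdims=True)
        P_d_z = weighted.sum(axis=2)
        P_d_z /= P_d_z.sum(axis=1, keepdims=True)
        P_z = weighted.sum(axis=(1, 2))
        P_z /= P_z.sum()
    return P_z, P_d_z, P_w_z

# Toy usage with a made-up 4-document, 6-word count matrix.
counts = np.array([[3, 2, 0, 0, 1, 0],
                   [2, 3, 1, 0, 0, 0],
                   [0, 0, 2, 3, 0, 1],
                   [0, 1, 0, 2, 3, 2]], dtype=float)
P_z, P_d_z, P_w_z = plsa_em(counts, n_topics=2)
print(P_z)
```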

Learning: tempered EM (TEM)

New E-step:

$$P(z \mid d, w) = \frac{P(z)\, \left[ P(d \mid z)\, P(w \mid z) \right]^{\beta}}{\sum_{z' \in Z} P(z')\, \left[ P(d \mid z')\, P(w \mid z') \right]^{\beta}}$$

This is the same as the standard E-step when β = 1, and the same as a posterior given uniform data when β = 0.

Algorithm:
1. Hold out some data.
2. Set β ← 1.
3. Perform EM and decrease β at some rate (β ← ηβ with η < 1).
4. Stop if performance on the held-out data doesn't increase; otherwise repeat the previous step.
5. Perform some final iterations on the full data.
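In code, the only change to the EM sketch above is the tempered E-step (again an illustrative assumption, with beta as the temperature parameter):

```python
import numpy as np

def tempered_e_step(P_z, P_d_z, P_w_z, beta):
    """Tempered responsibilities P(z | d, w): the likelihood term is raised to beta."""
    joint = P_z[:, None, None] * (P_d_z[:, :, None] * P_w_z[:, None, :]) ** beta
    return joint / joint.sum(axis=0, keepdims=True)
```

An outer loop would start from beta = 1, multiply beta by η < 1 after each EM run, and stop once the held-out performance no longer improves.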


Applications to information retrieval and link analysis

Information retrieval: pLSI
- Index documents by their topic (z) distributions.
- Queries are scored by computing P(w | d) for each document (for the words in the query).
- A new query q can be folded in as a hypothetical document by estimating P(z | q), updating only that probability with EM (a sketch follows below).

Link analysis: pHITS
- d are documents, c are citations (they correspond to w in pLSA).
- We want to group these into communities (z).
- Authoritativeness measures:
  - P(c | z): authority of c within the community z.
  - P(z | c): topic-specific authority.
  - P(z | c) P(c | z): topic characteristic for the community.
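As a final illustration (my sketch, not from the slides), folding in a new query runs EM only over P(z | q) while the learned word-topic parameters P(w | z) stay fixed; query_counts is a made-up word-count vector for the query:

```python
import numpy as np

def fold_in_query(query_counts, P_w_z, n_iters=30, seed=0):
    """Estimate P(z | q) for an unseen query, holding the word-topic parameters fixed."""
    rng = np.random.default_rng(seed)
    n_topics = P_w_z.shape[0]
    words = np.flatnonzero(query_counts)          # only words occurring in the query matter
    P_z_q = rng.random(n_topics)
    P_z_q /= P_z_q.sum()

    for _ in range(n_iters):
        # E-step: responsibilities P(z | q, w) for the query's words.
        joint = P_z_q[:, None] * P_w_z[:, words]
        P_z_qw = joint / joint.sum(axis=0, keepdims=True)
        # M-step: update only the query's topic mixture; P(w | z) is not touched.
        P_z_q = (query_counts[words][None, :] * P_z_qw).sum(axis=1)
        P_z_q /= P_z_q.sum()
    return P_z_q

# Usage with the P_w_z learned by the plsa_em sketch above and a toy 6-word query:
# query = np.array([1, 0, 0, 2, 0, 1], dtype=float)
# print(fold_in_query(query, P_w_z))
```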