Matrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang

Overview: SVD in Latent Semantic Indexing; Non-negative Matrix Factorization; Probabilistic Latent Semantic Indexing.

Vector Space Model. A document is a vector in term space. Vector computation: TF / TF-IDF. Similarity measure: the cosine of the angle between query and document vectors, $\cos\theta = \frac{\sum_i q_i d_i}{\lVert q\rVert\,\lVert d\rVert}$. Document vectors make up a term-document matrix.
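
As a minimal sketch of this scoring (the term-document counts and the query below are invented purely for illustration):

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents (raw TF counts).
A = np.array([
    [2, 0, 1],   # "car"
    [0, 3, 0],   # "automobile"
    [1, 1, 2],   # "engine"
], dtype=float)

# A query in the same term space (it mentions "car" and "engine").
q = np.array([1, 0, 1], dtype=float)

def cosine(q, d):
    # cos(theta) = (sum_i q_i d_i) / (||q|| * ||d||)
    return q @ d / (np.linalg.norm(q) * np.linalg.norm(d))

# Rank documents by cosine similarity to the query.
scores = [cosine(q, A[:, j]) for j in range(A.shape[1])]
print(scores)
```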

Example: 9 documents. Terms in bold are in the dictionary.

Term-Document Matrix (TF)

Weaknesses of VSM: noise in the term-document matrix. Synonyms (e.g., car & automobile) decrease recall. Polysemes (e.g., Saturn) decrease precision.

Latent Semantic Indexing (LSI). $A_{m \times n}$: term-document matrix. Singular Value Decomposition (SVD): $A = U W V^T$. Latent Semantic Indexing (LSI): $A_k = U_k W_k V_k^T$.
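
A short NumPy sketch of the rank-$k$ truncation (the toy matrix and the choice $k = 2$ are placeholders):

```python
import numpy as np

# Toy m x n term-document matrix and truncation rank k (both illustrative).
A = np.array([[2., 0., 1.],
              [0., 3., 0.],
              [1., 1., 2.]])
k = 2

# Full SVD: A = U W V^T (w is returned as a vector of singular values).
U, w, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the k largest singular values/vectors: A_k = U_k W_k V_k^T.
U_k, W_k, Vt_k = U[:, :k], np.diag(w[:k]), Vt[:k, :]
A_k = U_k @ W_k @ Vt_k            # best rank-k approximation in Frobenius norm

# Map documents (columns of A) into the k-dimensional latent space.
docs_latent = np.linalg.inv(W_k) @ U_k.T @ A
print(A_k.round(2))
print(docs_latent.round(2))
```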

What's really happening? A transformation of the space. Original: term space with basis $B_1 = \{e_1, e_2, \ldots, e_m\}$, where $m$ is the number of terms in the dictionary. New: latent semantic space with basis $B_2 = \{u_1, u_2, \ldots, u_k\}$, where $k$ is the truncated dimension of the document vectors.

Thinking with LSI. LSI aims to find the meaning behind words, i.e., the topics in documents. Difference between topics and words: words are observable, topics are latent. Topic space = latent semantic space; each basis vector $u_i$ represents a topic.

Evaluation of LSI. Strengths: filters out noise (synonyms, polysemes), since dimension reduction keeps only the essential components of the term-document matrix; reduces storage. Weaknesses: interpretation is impossible because of mixed signs; orthogonality restriction on the basis vectors; a good truncation point $k$ is hard to determine.

Non-negative Matrix Factorization. Unlike SVD, we factorize as $A_k = W_k H_k$ with $W, H \geq 0$. Topic space: dimension $k$, basis $B_3 = \{w_1, w_2, \ldots, w_k\}$.

Properties of NMF. No orthogonality restriction on the basis vectors; easy interpretation. Elements of $W$ and $H$ are all non-negative. $W_{ij}$ reflects how strongly basis vector $w_j$ is related to term $t_i$; $H_{ij}$ reflects how strongly document $d_j$ points in the direction of basis vector $w_i$.

Computation of NMF: $[W, H] = \arg\min_{W, H} \lVert A - WH \rVert_F^2$, s.t. $W, H \geq 0$. Algorithms: Lee and Seung 2000; Berry et al. 2004.
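
A minimal NumPy sketch of the Lee-Seung multiplicative updates for this objective (the random data, iteration count, and epsilon smoothing are illustrative choices):

```python
import numpy as np

def nmf(A, k, n_iter=200, eps=1e-9, seed=0):
    # Minimize ||A - W H||_F^2 s.t. W, H >= 0 with Lee-Seung multiplicative updates.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # update H, then W
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Any non-negative matrix works; this one is random for illustration.
A = np.random.default_rng(1).random((6, 4))
W, H = nmf(A, k=2)
print(np.linalg.norm(A - W @ H))               # reconstruction error
```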

Evaluation of NMF. Strengths: great interpretability; improved performance for document clustering compared to LSI. Weaknesses: the factorization is not unique; local-minimum problem.

pLSI: a probabilistic view of LSI. Why latent concepts? Sparseness problem: terms not occurring in a document get zero probability. Concept expression probabilities are estimated from all documents that deal with a concept. No prior knowledge about concepts required. Dimension reduction.

PLSA: graphical model representation. (a) Asymmetric decomposition; (b) symmetric decomposition.

PLSA via Likelihood Maximization. Log-likelihood: $L(D, W) = \sum_{d,w} n(d,w)\,\log P(d,w)$, with $P(w \mid d) = \sum_z P(w \mid z)\,P(z \mid d)$. Goal: maximize the log-likelihood subject to the constraints $\sum_w p(w \mid z) = 1$ and $\sum_z p(z \mid d_j) = 1$.

KL Projection. KL divergence is a measure of the difference between the empirical data distribution and the model: $L = \sum_{d,w} n(d,w)\log P(d,w) = \sum_d n(d)\Big[\sum_w \frac{n(d,w)}{n(d)}\log\sum_z P(w \mid z)P(z \mid d) + \log P(d)\Big]$. Maximizing $L$ with respect to $P(w \mid z)$ and $P(z \mid d)$ therefore amounts to minimizing the KL divergence between the empirical distribution $n(d,w)/n(d)$ and the model $P(w \mid d)$.

PLSA via EM
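
The standard EM updates from Hofmann's formulation are: E-step $P(z \mid d, w) \propto P(w \mid z) P(z \mid d)$; M-step $P(w \mid z) \propto \sum_d n(d,w) P(z \mid d, w)$ and $P(z \mid d) \propto \sum_w n(d,w) P(z \mid d, w)$. A compact NumPy sketch of these updates (dense arrays, fixed iteration count, no log-likelihood tracking) might look like:

```python
import numpy as np

def plsa_em(N, K, n_iter=100, seed=0):
    # N: document-word count matrix n(d, w), shape (D, W); K: number of topics z.
    rng = np.random.default_rng(seed)
    D, W = N.shape
    p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)   # P(z|d)
    p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)   # P(w|z)
    for _ in range(n_iter):
        # E-step: P(z|d,w) proportional to P(z|d) P(w|z), shape (D, W, K).
        post = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        # M-step: expected counts n(d,w) P(z|d,w) re-estimate both distributions.
        weighted = N[:, :, None] * post
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Tiny synthetic counts, just to exercise the updates.
N = np.array([[4., 1., 0.], [0., 2., 5.]])
p_z_d, p_w_z = plsa_em(N, K=2, n_iter=50)
print(p_z_d.round(2)); print(p_w_z.round(2))
```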

Example: model based on 1568 documents on clustering, Z = 128.

Performance comparison of a retrieval system: three models, four document collections.

PLSA Mixture Decomposition vs. LSA/SVD

PLSA vs. LSA. Objective function: Frobenius norm vs. likelihood. PLSA factors are non-negative and normalized: there is no obvious interpretation of the directions in the LSA latent space, whereas PLSA gives multinomial word distributions. PLSA uses statistical theory to determine the number of latent dimensions; LSA relies on ad hoc heuristics.

Relation between PLSA and NMF. Any (local) maximum-likelihood solution of PLSA is a solution of NMF with the KL-divergence objective. KL divergence is a measure of the difference between the empirical distribution and the model. Implication: any problem that can be formulated with NMF may be efficiently solved by PLSA.

SVD in Collaborative Filtering. Filling in missing values: fill the matrix with average values, or use EM algorithms. The $N \times M$ rating matrix is factored as a rank-$k$ product:
$$\begin{pmatrix} r_{11} & \cdots & r_{1M} \\ \vdots & & \vdots \\ r_{N1} & \cdots & r_{NM} \end{pmatrix} = \begin{pmatrix} w_{11} & \cdots & w_{1k} \\ \vdots & & \vdots \\ w_{N1} & \cdots & w_{Nk} \end{pmatrix} \begin{pmatrix} v_{11} & \cdots & v_{1M} \\ \vdots & & \vdots \\ v_{k1} & \cdots & v_{kM} \end{pmatrix}$$
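
One common way to realize the mean-filling / EM idea is to alternate between a rank-$k$ SVD reconstruction and re-inserting the observed ratings; this particular scheme is an assumption for illustration, not something stated on the slide:

```python
import numpy as np

def svd_impute(R, mask, k, n_iter=20):
    # mask[u, i] is True where rating r_ui is observed.
    R_hat = R.copy()
    col_mean = np.nanmean(np.where(mask, R, np.nan), axis=0)   # per-item average
    R_hat[~mask] = np.take(col_mean, np.nonzero(~mask)[1])     # initial fill
    for _ in range(n_iter):
        U, w, Vt = np.linalg.svd(R_hat, full_matrices=False)
        low_rank = U[:, :k] @ np.diag(w[:k]) @ Vt[:k, :]       # rank-k reconstruction
        R_hat = np.where(mask, R, low_rank)                    # keep observed ratings
    return low_rank

R = np.array([[5., 3., 0.], [4., 0., 1.], [0., 2., 5.]])
mask = R > 0                                                   # 0 marks a missing rating
print(svd_impute(R, mask, k=2).round(2))
```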

Weighted SVD.

NMF in Collaborative Filtering. Objective: $\mathrm{Err}(P, Q) = \sum_{(u,i)\in\kappa} \left(r_{ui} - p_u^T q_i\right)^2$. Only deals with the known values in $R$; can handle large datasets.
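
A sketch of minimizing this objective by stochastic gradient descent over the observed entries; the L2 penalty (`reg`) and all numbers below are illustrative additions, not taken from the slide:

```python
import numpy as np

def mf_sgd(ratings, n_users, n_items, k=2, lr=0.01, reg=0.05, n_epochs=200, seed=0):
    # ratings: list of (u, i, r_ui) triples for the known entries kappa.
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))   # user factors p_u
    Q = 0.1 * rng.standard_normal((n_items, k))   # item factors q_i
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                 # r_ui - p_u^T q_i
            p_u = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q

# Hypothetical toy ratings (user, item, value); unknown entries are simply absent.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0)]
P, Q = mf_sgd(ratings, n_users=3, n_items=2)
print((P @ Q.T).round(2))                         # predicted rating matrix
```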

References. PLSA: T. Hofmann, Probabilistic Latent Semantic Analysis, Uncertainty in AI, 1999. T. Hofmann, Unsupervised Learning by Probabilistic Latent Semantic Analysis, Machine Learning Journal, 2000. EM for PCA/SVD: S. Roweis, EM Algorithms for PCA and SPCA, 1997. S. Zhang, Using Singular Value Decomposition Approximation for Collaborative Filtering, CEC '05, 2005.

References (2). NMF: I. Dhillon, Generalized Nonnegative Matrix Approximations with Bregman Divergences, NIPS 2005. D. Lee, Algorithms for Non-negative Matrix Factorization. PLSA vs. NMF: E. Gaussier, Relation between PLSA and NMF and Implications, SIGIR '05. C. Ding, On the Equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing, 2008.

References (3). Advanced topics: A. Singh, Relational Learning via Collective Matrix Factorization, KDD '08. R. Bell, Modeling Relationships at Multiple Scales to Improve Accuracy of Large Recommender Systems, KDD '07. Y. Koren, Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model, KDD '08. C. Lippert, Relation Prediction in Multi-Relational Domains using Matrix Factorization, NIPS '08.