Information Retrieval: LSI, pLSI and LDA. Jian-Yun Nie


Basics: Eigenvector, Eigenvalue. Ref: http://en.wikipedia.org/wiki/eigenvector For a square matrix A: Ax = λx, where x is a vector (the eigenvector) and λ a scalar (the eigenvalue). E.g. with A = [[3, 1], [1, 3]]: A [1, 1]^T = 4 [1, 1]^T (eigenvalue 4), and A [1, -1]^T = 2 [1, -1]^T (eigenvalue 2).

Why use eigenvectors? Ordinary linear algebra solves A x = b; the eigenvector equation is A x = λ x.

Why use eigenvectors? The eigenvectors of a symmetric matrix are orthogonal (hence linearly independent), and they form a basis for the space on which the matrix A acts. Useful for solving linear equations, determining the natural frequencies of a bridge, etc.
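As a quick numerical check of the 2x2 example above, here is a minimal numpy sketch (numpy and the variable names are my own, not part of the slides):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# Eigen-decomposition: columns of vecs are eigenvectors, vals the eigenvalues.
vals, vecs = np.linalg.eig(A)
print(vals)   # eigenvalues 4 and 2 (order may vary)
print(vecs)   # columns proportional to [1, 1] and [1, -1]

# Verify A x = lambda x for each eigenpair.
for lam, x in zip(vals, vecs.T):
    assert np.allclose(A @ x, lam * x)
```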

Latent Semantic Indexing (LSI)

Latent Semantic Analysis

LSI

Classic LSI Example (Deerwester)

LSI, SVD, & Eigenvectors. SVD decomposes the term x document matrix X as X = U Σ V^T, where U and V hold the left and right singular vectors and Σ is a diagonal matrix of singular values. This corresponds to the eigenvector-eigenvalue decomposition Y = V L V^T, where V is orthonormal and L is diagonal: U is the matrix of eigenvectors of Y = X X^T, V is the matrix of eigenvectors of Y = X^T X, and Σ² equals the diagonal matrix L of eigenvalues (the singular values are the square roots of those eigenvalues).

SVD: Dimensionality Reduction

Cut (discard) the dimensions with the smallest singular values, keeping only the k largest.
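A minimal numpy sketch of the decomposition and the rank-k truncation described above (the toy term-document matrix and the choice k = 2 are made up for illustration):

```python
import numpy as np

# Toy term-document count matrix X: rows = terms, columns = documents.
X = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)

# SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The squared singular values are the eigenvalues of X X^T (and of X^T X).
eigvals = np.linalg.eigvalsh(X @ X.T)
assert np.allclose(sorted(s ** 2), sorted(eigvals))

# Truncate to the k largest singular values (LSI's low-rank approximation).
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.norm(X - X_k))   # Frobenius norm of the discarded part
```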

Computing Similarity in LSI
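The body of this slide is not in the transcription. As an illustration only, here is a sketch of one common way to compute document-query similarity in the reduced LSI space (fold the query with U_k, then compare with the scaled document coordinates by cosine similarity); the names and the folding convention are assumptions, not taken from the slides:

```python
import numpy as np

def lsi_query_similarity(X, q, k=2):
    """Project documents and a query into a k-dim LSI space; return cosine similarities.
    X: term-document matrix (terms x docs); q: query term-count vector (terms,)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    docs_k = (np.diag(sk) @ Vtk).T          # each row: one document in the k-dim space
    q_k = q @ Uk                            # fold the query into the same space
    sims = docs_k @ q_k / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k) + 1e-12)
    return sims
```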

LSI and pLSI. LSI: find the k dimensions that minimize the Frobenius norm of A - A_k, where A_k is the rank-k approximation. Frobenius norm of A: ||A||_F = sqrt(Σ_i Σ_j a_ij²). pLSI: defines its own objective function to optimize; it maximizes the log-likelihood of the training data.

pLSI: a generative model

pLSI: a probabilistic approach

pLSI: Assume a multinomial distribution over words for each topic, and a distribution over latent topics (z) for each document. Question: how to determine z?

Using EM. Likelihood: L = Σ_d Σ_w n(d, w) log P(d, w), with P(d, w) = Σ_z P(z) P(d|z) P(w|z). E-step: P(z|d, w) = P(z) P(d|z) P(w|z) / Σ_z' P(z') P(d|z') P(w|z'). M-step: P(w|z) ∝ Σ_d n(d, w) P(z|d, w), P(d|z) ∝ Σ_w n(d, w) P(z|d, w), P(z) ∝ Σ_d Σ_w n(d, w) P(z|d, w).
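A compact numpy sketch of these EM updates on a toy count matrix (the matrix values, topic number, and function name are made up; everything is stored densely, so this is toy scale only):

```python
import numpy as np

def plsi_em(n_dw, K, n_iter=100, seed=0):
    """EM for pLSI. n_dw: (D, W) word-count matrix; K: number of topics.
    Returns P(z), P(d|z), P(w|z)."""
    rng = np.random.default_rng(seed)
    D, W = n_dw.shape
    p_z = np.full(K, 1.0 / K)
    p_d_z = rng.random((K, D)); p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z | d, w) proportional to P(z) P(d|z) P(w|z), shape (K, D, W)
        joint = p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]
        p_z_dw = joint / (joint.sum(axis=0, keepdims=True) + 1e-12)

        # M-step: reweight posteriors by the observed counts n(d, w)
        weighted = n_dw[None, :, :] * p_z_dw              # (K, D, W)
        p_w_z = weighted.sum(axis=1)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_d_z = weighted.sum(axis=2)
        p_d_z /= p_d_z.sum(axis=1, keepdims=True) + 1e-12
        p_z = weighted.sum(axis=(1, 2))
        p_z /= p_z.sum()
    return p_z, p_d_z, p_w_z

# Toy usage: 4 documents, 4 words, 2 topics.
n_dw = np.array([[4, 2, 0, 0], [3, 1, 0, 1], [0, 0, 5, 2], [0, 1, 4, 3]])
p_z, p_d_z, p_w_z = plsi_em(n_dw, K=2)
```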

Relation with LSI. Relation: P(d, w) = Σ_{z∈Z} P(z) P(d|z) P(w|z). Difference: LSI minimizes the Frobenius (L2) norm, which corresponds to an additive Gaussian noise assumption on the counts; pLSI maximizes the log-likelihood of the training data, which corresponds to minimizing cross-entropy / KL divergence.

Mixture of Unigrams (traditional). [Graphical model: a single topic node z_i generates the words w_i1, w_i2, w_i3, w_i4 of document i.] Mixture of Unigrams Model (this is just Naïve Bayes): For each of M documents, choose a topic z, then choose N words by drawing each one independently from a multinomial conditioned on z. In the Mixture of Unigrams model, we can only have one topic per document!

The pLSI Model. [Graphical model: the document index d generates topics z_d1 ... z_d4, each of which generates a word w_d1 ... w_d4.] Probabilistic Latent Semantic Indexing (pLSI) Model: For each word of document d in the training set, choose a topic z according to a multinomial conditioned on the index d, then generate the word by drawing from a multinomial conditioned on z. In pLSI, documents can have multiple topics.

Problem of pLSI: It is not a proper generative model for documents. Each document is generated from its own mixture of topics, so the number of parameters grows linearly with the size of the corpus, and it is difficult to assign a probability to (generate) a new, unseen document.

Dirichlet Distributions. In the LDA model, we would like to say that the topic mixture proportions for each document are drawn from some distribution. So, we want to put a distribution on multinomials, that is, on k-tuples of non-negative numbers that sum to one. The space of all of these multinomials has a nice geometric interpretation as a (k-1)-simplex, which is just a generalization of a triangle to (k-1) dimensions. Criteria for selecting our prior: it needs to be defined on the (k-1)-simplex, and, algebraically speaking, we would like it to play nice with the multinomial distribution.

Dirichlet Distributions. Useful facts: This distribution is defined over a (k-1)-simplex; that is, it takes k non-negative arguments which sum to one. Consequently it is a natural distribution to use over multinomial distributions. In fact, the Dirichlet distribution is the conjugate prior to the multinomial distribution. (This means that if our likelihood is multinomial with a Dirichlet prior, then the posterior is also Dirichlet!) The Dirichlet parameter α_i can be thought of as a prior count of the i-th class.
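A small numpy illustration of these facts, showing how α controls the spread of draws on the simplex and how the "prior count" interpretation follows from conjugacy (the specific α values and counts are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric Dirichlet over k = 3 topics.
# Small alpha -> draws concentrate near the corners of the simplex (sparse mixtures);
# large alpha -> draws concentrate near the uniform distribution.
for alpha in (0.1, 1.0, 10.0):
    theta = rng.dirichlet([alpha] * 3, size=5)
    print(alpha, theta.round(2))

# Conjugacy: with observed topic counts n = [5, 0, 2], the posterior over the
# multinomial parameters is again Dirichlet, with parameters alpha_i + n_i.
alpha = np.array([1.0, 1.0, 1.0])
n = np.array([5, 0, 2])
posterior_mean = (alpha + n) / (alpha + n).sum()
print(posterior_mean)   # the pseudo-counts alpha_i act like prior observations
```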

The LDA Model. [Plate diagram: for each document, α generates θ; θ generates the topics z_1 ... z_4; each z_n, together with β, generates the word w_n.]

The LDA Model. For each document: choose θ ~ Dirichlet(α); then for each of the N words w_n: choose a topic z_n ~ Multinomial(θ), and choose a word w_n from p(w_n | z_n, β), a multinomial probability conditioned on the topic z_n.
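A minimal numpy sketch of this generative process (the number of topics, vocabulary size, document length, and parameter values are toy settings, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
K, V, N = 3, 8, 10                               # topics, vocabulary size, words per document
alpha = np.full(K, 0.5)                          # Dirichlet prior over topic mixtures
beta = rng.dirichlet(np.full(V, 0.5), size=K)    # K x V topic-word multinomials

def generate_document():
    theta = rng.dirichlet(alpha)                 # 1. theta ~ Dirichlet(alpha)
    words = []
    for _ in range(N):
        z = rng.choice(K, p=theta)               # 2. z_n ~ Multinomial(theta)
        w = rng.choice(V, p=beta[z])             # 3. w_n ~ p(w | z_n, beta)
        words.append(w)
    return words

print(generate_document())
```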

LDA (Latent Dirichlet Allocation). Document = mixture of topics (as in pLSI), but the mixture is drawn according to a Dirichlet prior; when we use a uniform Dirichlet prior, pLSI = LDA. A word is also generated according to another parameter β: p(w_n | z_n, β), a multinomial conditioned on the topic z_n.

Variational Inference. In variational inference, we consider a simplified graphical model with variational parameters γ, φ, i.e. q(θ, z | γ, φ) = q(θ | γ) Π_n q(z_n | φ_n), and minimize the KL divergence between the variational and posterior distributions.
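For concreteness, a sketch of the standard per-document coordinate-ascent updates this scheme leads to (φ_ni ∝ β_{i,w_n} exp(Ψ(γ_i)) and γ_i = α_i + Σ_n φ_ni, following Blei et al. 2003); the function and variable names are illustrative:

```python
import numpy as np
from scipy.special import digamma

def lda_variational_e_step(word_ids, alpha, beta, n_iter=50):
    """Variational inference for one document.
    word_ids: indices of the document's words; alpha: (K,) prior; beta: (K, V) topic-word probs."""
    K = beta.shape[0]
    N = len(word_ids)
    phi = np.full((N, K), 1.0 / K)        # q(z_n): variational multinomials
    gamma = alpha + N / K                 # q(theta): variational Dirichlet parameters
    for _ in range(n_iter):
        # phi_{n,i} proportional to beta_{i, w_n} * exp(digamma(gamma_i))
        log_phi = np.log(beta[:, word_ids].T + 1e-100) + digamma(gamma)
        phi = np.exp(log_phi - log_phi.max(axis=1, keepdims=True))
        phi /= phi.sum(axis=1, keepdims=True)
        gamma = alpha + phi.sum(axis=0)   # gamma_i = alpha_i + sum_n phi_{n,i}
    return gamma, phi
```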

Use of LDA. A widely used topic model; computational complexity is an issue. Use in IR: interpolate a topic model with a traditional language model (LM). This gives improvements over the traditional LM, but no improvement over the relevance model (Wei and Croft, SIGIR 2006).
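A sketch of the interpolation idea mentioned above (a smoothed document language model mixed with an LDA-based term probability); the mixing weight, smoothing parameter, and names are illustrative assumptions, not values from Wei and Croft:

```python
import numpy as np

def interpolated_prob(w, doc_counts, collection_probs, theta_d, beta, lam=0.7, mu=1000):
    """P(w | d) = lam * P_LM(w | d) + (1 - lam) * P_LDA(w | d).
    doc_counts: dict word_id -> count; collection_probs: (V,) background model;
    theta_d: (K,) document-topic proportions; beta: (K, V) topic-word probs."""
    doc_len = sum(doc_counts.values())
    # Dirichlet-smoothed document language model
    p_lm = (doc_counts.get(w, 0) + mu * collection_probs[w]) / (doc_len + mu)
    # LDA-based probability of word w in document d
    p_lda = float(theta_d @ beta[:, w])
    return lam * p_lm + (1 - lam) * p_lda
```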

References

LSI:
Deerwester, S., et al., "Improving Information Retrieval with Latent Semantic Indexing", Proceedings of the 51st Annual Meeting of the American Society for Information Science 25, 1988, pp. 36-40.
Michael W. Berry, Susan T. Dumais and Gavin W. O'Brien, "Using Linear Algebra for Intelligent Information Retrieval", UT-CS-94-270, 1994.

pLSI:
Thomas Hofmann, "Probabilistic Latent Semantic Indexing", Proceedings of the Twenty-Second Annual International SIGIR Conference on Research and Development in Information Retrieval (SIGIR-99), 1999.

LDA:
D. Blei, A. Ng, and M. Jordan, "Latent Dirichlet Allocation", Journal of Machine Learning Research, 3:993-1022, January 2003.
T. Griffiths and M. Steyvers, "Finding Scientific Topics", Proceedings of the National Academy of Sciences, 101 (suppl. 1), pp. 5228-5235, 2004.
D. Blei, T. Griffiths, M. Jordan, and J. Tenenbaum, "Hierarchical Topic Models and the Nested Chinese Restaurant Process", in S. Thrun, L. Saul, and B. Scholkopf, editors, Advances in Neural Information Processing Systems (NIPS) 16, MIT Press, Cambridge, MA, 2004.

Also see the Wikipedia articles on LSI, pLSI and LDA.