Probabilistic Latent Semantic Analysis

Size: px
Start display at page:

Download "Probabilistic Latent Semantic Analysis"

Transcription

1 Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea 1 / 27

2 Outline Singular value decomposition Latent semantic analysis (a.k.a. latent semantic indexing) Probabilistic latent semantic analysis (a.k.a. probabilistic latent semantic indexing) 2 / 27

3 Range Space and Null Space Definition For any A R m n, its range space R(A) and null space N (A) are defined as follows: R(A) = {y R m y = Ax for some x R n } N (A) = {x R n Ax = 0}. Examples: What are range space and null space of A and B? [ ] [ ] A = B = / 27

4 Linear Algebraic Equations A linear algebraic equation has the form where A R m n, x R n, and b R m. Ax = b, (1) Theorem A solution of Ax = b exists if and only if b R(A). Theorem Let x be a particular solution to (1), then x = x + N (A) = {x + w w N (A)} is the general solution. Remark: Note that N (A) is the set of homogeneous solutions to (1). Therefore the set of all solutions is the sum of a particular solution and the set of homogeneous solutions. 4 / 27

5 Singular Value Decomposition Theorem (SVD) Let A R m n. Then there exist orthogonal matrices such that U = [u 1, u 2,..., u m ] R m m V = [v 1, v 2,..., v n ] R n n U T AV = diag (σ 1, σ 2,..., σ p ) R m n (2) where p = min{m, n} and σ 1 σ 2 σ p 0. The scalars {σ i R} are called singular values. SVD A = UΣV 5 / 27

6 Low-Rank Approximation Theorem (Eckart and Young, 1936) Suppose that A is an m n matrix of rank r, with singular values σ 1 σ 2 σ r. and SVD A = UΣV. Then the best rank-k approximation of A is A k = U 1 Σ 1 V 1, where That is, A A k F = min U = [U 1, U 2 ], V = [V 1, V 2 ], Σ 1 = diag(σ 1,..., σ k ). { } A Â F Â Rm n, rank(â) = k. 6 / 27

7 Proof of Eckart-Young Theorem Note that Direct calculations give A Â F = UΣV Â F = Σ U } {{ ÂV } F. N Σ N 2 F = i,j Σ i,j N i,j 2 = r σ i N i,i 2 + N i,i 2 + i>r i j i=1 N i,j 2, which is minimal when all the non-diagonal entries of N are equal to zero and so are all N i,i for i > r. The minimum of r i=1 σ i N i,i 2 is attained when σ i = N i,i for i = 1,..., k and all other N i,i are zero. 7 / 27

8 Least Squares Problem Consider a matrix A R m n with Rank(A) = r. Then R(A) = R(U 1 ) = sp [u 1, u 2,..., u r ] N (A) = sp [v r+1, v r+2,..., v n ] = R(V 2 ). The least squares (LS) problem is as follows: Setting x b Ax 2 = 0 leads to min x R n b Ax 2. (3) A T Ax = A T b. (4) 8 / 27

9 Remarks From SVD, one can see that the particular solution to (4) is x = V 1 Σ 1 U T 1 b. The complete solution is x = x + N (A T A). Note that N (A T A) = N (A) = sp[v 2 ]. Hence x = V 1 Σ 1 U T 1 b + V 2 w for some w R r. Since V 1 Σ 1 U T 1 b V 2 w for all w, x = V 1 Σ 1 U T 1 b is the MMSE solution for which x is the smallest among all solutions. The pseudo-inverse of A is The condition number of A is given by σ1 σ r. A = V 1 Σ 1 U T 1. (5) 9 / 27

10 PCA Principal component analysis (PCA) is a well-established technique for dimension reduction. Its applications include data compression, image processing, data visualization, exploratory data analysis, pattern recognition, and time series prediction. The most common derivation of PCA is in terms of a orthogonal projection which maximizes the variance in the projected space. Given a set of m-dimensional observation vector, {x t }, the PCA aims at finding a orthogonal linear projection y = W x such that the variance of y R q (q < d) is maximized. The ith element of y is called the ith principal component. Alternatively, the PCA provides an orthogonal linear projection which minimize the squared reconstruction error t x t ˆx t 2. Thus the PCA is an optimal linear encoding in MS sense. 10 / 27

11 PCA and SVD It was shown that the ith row vector of W denoted by w T i corresponds to the normalized eigenvector associated with the ith largest eigenvalue of the covariance matrix R x = E{xx T } Principal components can be found by SVD, linear neural networks, or probabilistic methods. The SVD of R x has the form R x = UΣV T. Then we select n column vectors to contruct U 1 = [u 1,..., u n ]. The the PCA transform leads to y = U T 1 x. 11 / 27

12 PCA: An Example / 27

13 Spectral Decomposition (Eigen-Decomposition) Given a symmetric matrix C R m m, its spectral decomposition is given by C = λ 1 u 1 u T 1 + λ 2 u 2 u T λ m u m u T m, where λ 1 λ 2 λ m are eigenvalues of C and u i are associated eigenvectors. 13 / 27

14 Power Iteration The power iteration is a classical method which finds the largest eigenvector (associated with the largest eigenvalue) of a matrix C R m m. Given a symmetric matrix C R m m (hence its eigenvalues are real), the power iteration starts from a nonzero vector w(0) and iteratively updates w(t) by where 2 represents Euclidean norm. w(t + 1) = Cw(t), (6) w(t + 1) = w(t + 1), w(t + 1) 2 (7) Combining (6) and (7) leads to the updating rule which has the form [ w(t + 1) = Cw(t) w T (t)c w(t)] (8) Assume that C has an unique eigenvalue of maximum modulus λ 1 associated with the leading eigenvector u 1. Then the power iteration (8) leads w(t) to converge to u / 27

15 Deflation Suppose that we are interested in computing eigenvectors of the data covariance matrix C = xx T. The power iteration is applied to C for extracting its first eigenvector. Our question arises, How can we compute the second eigenvector of C using the power iteration?. The deflation method is a common numerical technique for computing several eigenvalues and eigenvectors of C. Assume that the first eigenvector is already computed. Then the output value can be deflated by the following transformation: x = ( I u 1 u T ) 1 x. (9) One can easily see that x x T = m i=2 λ iu i u T i. The power iteration is applied to the deflated data, in order to extract the second eigenvector of C. 15 / 27

16 Term-Document Matrix A term-document matrix X R D N is a collection of vector space representations of documents, where rows are terms (words) and columns are documents ( ) N X ij = t ij log, idf i where t ij is the term frequency of word i in document j and idf i is the number of documents containing word i. We write X = [d 1, d 2,..., d N ] (document vectors) = t 1 t 2. t D (term vectors). 16 / 27

17 Latent Semantic Analysis A method for automatic indexing and retrieval, uncovering latent semantic structure of a term-document matrix. SVD of X R D N is given by X UΣV where U R D K, Σ R K K, and V R N K. D N K K X U Σ V K K D N Comparing documents: Σ 1 U X = Σ 1 U UΣV = V = [ d 1,..., d N ]. Comparing terms: X V Σ 1 = UΣV V Σ 1 = U = [ t 1,..., t D ]. 17 / 27

18 Probabilistic Latent Semantic Analysis The key idea in latent semantic analysis is to map high-dimensional count vectors (co-occurrence data, dyadic data) to a lower-dimensional representation in a so-called latent semantic space. The goal of LSA is to find a data mapping which reveals semantical relations between the entries of interest. Probabilistic latent semantic analysis (PLSA) is a probabilistic variant of LSA: Has a sound statistical foundation; Defines a proper generative model of the data. Aspect model: A latent variable model for co-occurrence data which associates an unobserved class z {z 1,..., z K } with each occurrence of a word w {w 1,..., w D } in a document d {d 1,..., d N }. 18 / 27

19 PLSA: Graphical Representation d j A document d j and a term (word) w i are conditionally independent given an unobserved topic z: [ ] z p(w i, d j ) = p(d j ) Generation process: z p(w i z)p(z d j ). w i N D Select a document dj with probability p(d j ). Pick a latent class (topic) with probability p(z d j ). Generate a word wi with probability p(w i z). 19 / 27

20 PLSA: Symmetric Parameterization d j z p(w i, d j ) = z = z p(w i, d j, z) p(w i, d j z)p(z) N = z p(w i z)p(d j z)p(z). w i D 20 / 27

21 Model Fitting: EM Algorithm Dyadic data X : Entries X ij are made for dyads (w i, d j ) which refer to a domain with two sets of objects, W = {w 1,..., w D } and D = {d 1,..., d N }. Complete-data likelihood: p(x, z) = p(w i, d j, z) C ij = [p(w i z)p(d j z)p(z)] C ij, i j where C ij are the empirical counts for dyads (w i, d j ) and X ij = C ij / i j C ij. EM optimization E-step: Compute the expected complete-data log-likelihood: L c = p(z k w i, d j )C ij log [p(w i z k )p(d j z k )p(z k )]. i j k M-step: Re-estimate parameters p(wi z k ), p(d j z k ), p(z k ) which maximizes L c. 21 / 27

22 E-Step: Compute p(z w i, d j ) Compute the posterior distribution over latent variables: p(z k w i, d j ) = p(w i, d j z k )p(z k ) p(w i, d j ) p(w i z k )p(d j z k )p(z k ) = l p(w i z l )p(d j z l )p(z l ), where p(w i z k ), p(d j z k ), p(z k ) are estimated in the M-step. 22 / 27

23 M-Step: Re-estimate Parameters Re-estimate parameters: p(w i z k ) = p(d j z k ) = p(z k ) = j C ijp(z k w i, d j ) j C ijp(z k w i, d j ), i i C ijp(z k w i, d j ) j C ijp(z k w i, d j ), i i j C ijp(z k w i, d j ) i j C. ij 23 / 27

24 Updating equations in the M-step are determined by solving [ L c + λ(1 ] p(w i z k )) = 0, p(w i z k ) i L c + λ(1 p(d j z k )) = 0, p(d j z k ) j [ L c + λ(1 ] p(z l )) = 0, p(z k ) l for p(w i z k ), p(d j z k ), p(z k ), respectively. 24 / 27

25 Document Clustering by PLSA You are given parameters p(w i z k ), p(d j z k ), p(z k ) estimated by EM optimization. Compute p(z k d j ) p(d j z k )p(z k ). Assign document d j to cluster k if k = arg max p(z k d j ). k 25 / 27

26 PLSA: Revisited The PLSA models each word in a document as a sample from a mixture model where the mixture components are multinomial random variables that can be viewed as representations of topics, p(w, d) = z p(z)p(w z)p(d z) = p(d) z p(w z)p(z d). Each document is represented as a list of mixing proportions for mixture components and reduced to a probability distribution on a fixed set of topics. This distribution is the reduced description associated with the document. In PLSA, each document is represented as a list of numbers (the mixing proportions for topics) and there is no generative probabilistic model for these numbers. This leads to the following problems: The number of parameters in the model grows linearly with the size of corpus, which leads to problems with overfitting. It is not clear how to assign probability to a document outside of the training set. 26 / 27

27 References S. Deerwester, S. T. Dumais, and R. Harshman, Indexing by latent semantic analysis, Journal of the Americal Society of Information Science, vol. 41, no. 6, pp , T. Hofmann, Probabilistic latent semantic indexing, in Proc. SIGIR T. Hofmann, Probabilistic latent semantic analysis, in Proc. UAI / 27

Notes on Latent Semantic Analysis

Notes on Latent Semantic Analysis Notes on Latent Semantic Analysis Costas Boulis 1 Introduction One of the most fundamental problems of information retrieval (IR) is to find all documents (and nothing but those) that are semantically

More information

PROBABILISTIC LATENT SEMANTIC ANALYSIS

PROBABILISTIC LATENT SEMANTIC ANALYSIS PROBABILISTIC LATENT SEMANTIC ANALYSIS Lingjia Deng Revised from slides of Shuguang Wang Outline Review of previous notes PCA/SVD HITS Latent Semantic Analysis Probabilistic Latent Semantic Analysis Applications

More information

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa VS model in practice Document and query are represented by term vectors Terms are not necessarily orthogonal to each other Synonymy: car v.s. automobile Polysemy:

More information

Nonnegative Matrix Factorization

Nonnegative Matrix Factorization Nonnegative Matrix Factorization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Machine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling

Machine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University

More information

Information retrieval LSI, plsi and LDA. Jian-Yun Nie

Information retrieval LSI, plsi and LDA. Jian-Yun Nie Information retrieval LSI, plsi and LDA Jian-Yun Nie Basics: Eigenvector, Eigenvalue Ref: http://en.wikipedia.org/wiki/eigenvector For a square matrix A: Ax = λx where x is a vector (eigenvector), and

More information

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa Recap: vector space model Represent both doc and query by concept vectors Each concept defines one dimension K concepts define a high-dimensional space Element

More information

Language Information Processing, Advanced. Topic Models

Language Information Processing, Advanced. Topic Models Language Information Processing, Advanced Topic Models mcuturi@i.kyoto-u.ac.jp Kyoto University - LIP, Adv. - 2011 1 Today s talk Continue exploring the representation of text as histogram of words. Objective:

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Yuriy Sverchkov Intelligent Systems Program University of Pittsburgh October 6, 2011 Outline Latent Semantic Analysis (LSA) A quick review Probabilistic LSA (plsa)

More information

CS 572: Information Retrieval

CS 572: Information Retrieval CS 572: Information Retrieval Lecture 11: Topic Models Acknowledgments: Some slides were adapted from Chris Manning, and from Thomas Hoffman 1 Plan for next few weeks Project 1: done (submit by Friday).

More information

Singular Value Decomposition

Singular Value Decomposition Singular Value Decomposition Motivatation The diagonalization theorem play a part in many interesting applications. Unfortunately not all matrices can be factored as A = PDP However a factorization A =

More information

A Coupled Helmholtz Machine for PCA

A Coupled Helmholtz Machine for PCA A Coupled Helmholtz Machine for PCA Seungjin Choi Department of Computer Science Pohang University of Science and Technology San 3 Hyoja-dong, Nam-gu Pohang 79-784, Korea seungjin@postech.ac.kr August

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx

More information

Numerical Methods I Singular Value Decomposition

Numerical Methods I Singular Value Decomposition Numerical Methods I Singular Value Decomposition Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 9th, 2014 A. Donev (Courant Institute)

More information

Fisher s Linear Discriminant Analysis

Fisher s Linear Discriminant Analysis Fisher s Linear Discriminant Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Knowledge Discovery and Data Mining 1 (VO) ( )

Knowledge Discovery and Data Mining 1 (VO) ( ) Knowledge Discovery and Data Mining 1 (VO) (707.003) Probabilistic Latent Semantic Analysis Denis Helic KTI, TU Graz Jan 16, 2014 Denis Helic (KTI, TU Graz) KDDM1 Jan 16, 2014 1 / 47 Big picture: KDDM

More information

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) Principal Component Analysis (PCA) Additional reading can be found from non-assessed exercises (week 8) in this course unit teaching page. Textbooks: Sect. 6.3 in [1] and Ch. 12 in [2] Outline Introduction

More information

Information Retrieval

Information Retrieval Introduction to Information CS276: Information and Web Search Christopher Manning and Pandu Nayak Lecture 13: Latent Semantic Indexing Ch. 18 Today s topic Latent Semantic Indexing Term-document matrices

More information

Data Mining and Matrices

Data Mining and Matrices Data Mining and Matrices 6 Non-Negative Matrix Factorization Rainer Gemulla, Pauli Miettinen May 23, 23 Non-Negative Datasets Some datasets are intrinsically non-negative: Counters (e.g., no. occurrences

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Document and Topic Models: plsa and LDA

Document and Topic Models: plsa and LDA Document and Topic Models: plsa and LDA Andrew Levandoski and Jonathan Lobo CS 3750 Advanced Topics in Machine Learning 2 October 2018 Outline Topic Models plsa LSA Model Fitting via EM phits: link analysis

More information

Data Mining Lecture 4: Covariance, EVD, PCA & SVD

Data Mining Lecture 4: Covariance, EVD, PCA & SVD Data Mining Lecture 4: Covariance, EVD, PCA & SVD Jo Houghton ECS Southampton February 25, 2019 1 / 28 Variance and Covariance - Expectation A random variable takes on different values due to chance The

More information

An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition

An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition Yu-Seop Kim 1, Jeong-Ho Chang 2, and Byoung-Tak Zhang 2 1 Division of Information and Telecommunication

More information

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012 Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component

More information

Principal components analysis COMS 4771

Principal components analysis COMS 4771 Principal components analysis COMS 4771 1. Representation learning Useful representations of data Representation learning: Given: raw feature vectors x 1, x 2,..., x n R d. Goal: learn a useful feature

More information

14 Singular Value Decomposition

14 Singular Value Decomposition 14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Matrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang

Matrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang Matrix Factorization & Latent Semantic Analysis Review Yize Li, Lanbo Zhang Overview SVD in Latent Semantic Indexing Non-negative Matrix Factorization Probabilistic Latent Semantic Indexing Vector Space

More information

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,

More information

https://goo.gl/kfxweg KYOTO UNIVERSITY Statistical Machine Learning Theory Sparsity Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT OF INTELLIGENCE SCIENCE AND TECHNOLOGY 1 KYOTO UNIVERSITY Topics:

More information

Constrained Projection Approximation Algorithms for Principal Component Analysis

Constrained Projection Approximation Algorithms for Principal Component Analysis Constrained Projection Approximation Algorithms for Principal Component Analysis Seungjin Choi, Jong-Hoon Ahn, Andrzej Cichocki Department of Computer Science, Pohang University of Science and Technology,

More information

Lecture 8. Principal Component Analysis. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 13, 2016

Lecture 8. Principal Component Analysis. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 13, 2016 Lecture 8 Principal Component Analysis Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 13, 2016 Luigi Freda ( La Sapienza University) Lecture 8 December 13, 2016 1 / 31 Outline 1 Eigen

More information

Singular Value Decompsition

Singular Value Decompsition Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost

More information

Kernel Principal Component Analysis

Kernel Principal Component Analysis Kernel Principal Component Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary

More information

Expectation Maximization

Expectation Maximization Expectation Maximization Machine Learning CSE546 Carlos Guestrin University of Washington November 13, 2014 1 E.M.: The General Case E.M. widely used beyond mixtures of Gaussians The recipe is the same

More information

Linear Least Squares. Using SVD Decomposition.

Linear Least Squares. Using SVD Decomposition. Linear Least Squares. Using SVD Decomposition. Dmitriy Leykekhman Spring 2011 Goals SVD-decomposition. Solving LLS with SVD-decomposition. D. Leykekhman Linear Least Squares 1 SVD Decomposition. For any

More information

IV. Matrix Approximation using Least-Squares

IV. Matrix Approximation using Least-Squares IV. Matrix Approximation using Least-Squares The SVD and Matrix Approximation We begin with the following fundamental question. Let A be an M N matrix with rank R. What is the closest matrix to A that

More information

15 Singular Value Decomposition

15 Singular Value Decomposition 15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Mathematical foundations - linear algebra

Mathematical foundations - linear algebra Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar

More information

Information Retrieval and Topic Models. Mausam (Based on slides of W. Arms, Dan Jurafsky, Thomas Hofmann, Ata Kaban, Chris Manning, Melanie Martin)

Information Retrieval and Topic Models. Mausam (Based on slides of W. Arms, Dan Jurafsky, Thomas Hofmann, Ata Kaban, Chris Manning, Melanie Martin) Information Retrieval and Topic Models Mausam (Based on slides of W. Arms, Dan Jurafsky, Thomas Hofmann, Ata Kaban, Chris Manning, Melanie Martin) Sec. 1.1 Unstructured data in 1620 Which plays of Shakespeare

More information

Singular Value Decomposition and its. SVD and its Applications in Computer Vision

Singular Value Decomposition and its. SVD and its Applications in Computer Vision Singular Value Decomposition and its Applications in Computer Vision Subhashis Banerjee Department of Computer Science and Engineering IIT Delhi October 24, 2013 Overview Linear algebra basics Singular

More information

Machine Learning - MT & 14. PCA and MDS

Machine Learning - MT & 14. PCA and MDS Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Chapter 7: Symmetric Matrices and Quadratic Forms

Chapter 7: Symmetric Matrices and Quadratic Forms Chapter 7: Symmetric Matrices and Quadratic Forms (Last Updated: December, 06) These notes are derived primarily from Linear Algebra and its applications by David Lay (4ed). A few theorems have been moved

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Numerical Linear Algebra Background Cho-Jui Hsieh UC Davis May 15, 2018 Linear Algebra Background Vectors A vector has a direction and a magnitude

More information

PCA and admixture models

PCA and admixture models PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1

More information

Notes on singular value decomposition for Math 54. Recall that if A is a symmetric n n matrix, then A has real eigenvalues A = P DP 1 A = P DP T.

Notes on singular value decomposition for Math 54. Recall that if A is a symmetric n n matrix, then A has real eigenvalues A = P DP 1 A = P DP T. Notes on singular value decomposition for Math 54 Recall that if A is a symmetric n n matrix, then A has real eigenvalues λ 1,, λ n (possibly repeated), and R n has an orthonormal basis v 1,, v n, where

More information

Collaborative Filtering: A Machine Learning Perspective

Collaborative Filtering: A Machine Learning Perspective Collaborative Filtering: A Machine Learning Perspective Chapter 6: Dimensionality Reduction Benjamin Marlin Presenter: Chaitanya Desai Collaborative Filtering: A Machine Learning Perspective p.1/18 Topics

More information

Clustering VS Classification

Clustering VS Classification MCQ Clustering VS Classification 1. What is the relation between the distance between clusters and the corresponding class discriminability? a. proportional b. inversely-proportional c. no-relation Ans:

More information

.. CSC 566 Advanced Data Mining Alexander Dekhtyar..

.. CSC 566 Advanced Data Mining Alexander Dekhtyar.. .. CSC 566 Advanced Data Mining Alexander Dekhtyar.. Information Retrieval Latent Semantic Indexing Preliminaries Vector Space Representation of Documents: TF-IDF Documents. A single text document is a

More information

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection

More information

MATH36001 Generalized Inverses and the SVD 2015

MATH36001 Generalized Inverses and the SVD 2015 MATH36001 Generalized Inverses and the SVD 201 1 Generalized Inverses of Matrices A matrix has an inverse only if it is square and nonsingular. However there are theoretical and practical applications

More information

Singular Value Decomposition

Singular Value Decomposition Singular Value Decomposition CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Doug James (and Justin Solomon) CS 205A: Mathematical Methods Singular Value Decomposition 1 / 35 Understanding

More information

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2016 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,

More information

Linear Algebra Review. Fei-Fei Li

Linear Algebra Review. Fei-Fei Li Linear Algebra Review Fei-Fei Li 1 / 37 Vectors Vectors and matrices are just collections of ordered numbers that represent something: movements in space, scaling factors, pixel brightnesses, etc. A vector

More information

Let A an n n real nonsymmetric matrix. The eigenvalue problem: λ 1 = 1 with eigenvector u 1 = ( ) λ 2 = 2 with eigenvector u 2 = ( 1

Let A an n n real nonsymmetric matrix. The eigenvalue problem: λ 1 = 1 with eigenvector u 1 = ( ) λ 2 = 2 with eigenvector u 2 = ( 1 Eigenvalue Problems. Introduction Let A an n n real nonsymmetric matrix. The eigenvalue problem: EIGENVALE PROBLEMS AND THE SVD. [5.1 TO 5.3 & 7.4] Au = λu Example: ( ) 2 0 A = 2 1 λ 1 = 1 with eigenvector

More information

Review of Some Concepts from Linear Algebra: Part 2

Review of Some Concepts from Linear Algebra: Part 2 Review of Some Concepts from Linear Algebra: Part 2 Department of Mathematics Boise State University January 16, 2019 Math 566 Linear Algebra Review: Part 2 January 16, 2019 1 / 22 Vector spaces A set

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING Text Data: Topic Model Instructor: Yizhou Sun yzsun@cs.ucla.edu December 4, 2017 Methods to be Learnt Vector Data Set Data Sequence Data Text Data Classification Clustering

More information

(a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? Solution: dim N(A) 1, since rank(a) 3. Ax =

(a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? Solution: dim N(A) 1, since rank(a) 3. Ax = . (5 points) (a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? dim N(A), since rank(a) 3. (b) If we also know that Ax = has no solution, what do we know about the rank of A? C(A)

More information

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang Deep Learning Basics Lecture 7: Factor Analysis Princeton University COS 495 Instructor: Yingyu Liang Supervised v.s. Unsupervised Math formulation for supervised learning Given training data x i, y i

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Dan Oneaţă 1 Introduction Probabilistic Latent Semantic Analysis (plsa) is a technique from the category of topic models. Its main goal is to model cooccurrence information

More information

Linear Methods in Data Mining

Linear Methods in Data Mining Why Methods? linear methods are well understood, simple and elegant; algorithms based on linear methods are widespread: data mining, computer vision, graphics, pattern recognition; excellent general software

More information

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data. Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two

More information

Singular Value Decomposition

Singular Value Decomposition Chapter 6 Singular Value Decomposition In Chapter 5, we derived a number of algorithms for computing the eigenvalues and eigenvectors of matrices A R n n. Having developed this machinery, we complete our

More information

Non-negative Matrix Factorization: Algorithms, Extensions and Applications

Non-negative Matrix Factorization: Algorithms, Extensions and Applications Non-negative Matrix Factorization: Algorithms, Extensions and Applications Emmanouil Benetos www.soi.city.ac.uk/ sbbj660/ March 2013 Emmanouil Benetos Non-negative Matrix Factorization March 2013 1 / 25

More information

EIGENVALE PROBLEMS AND THE SVD. [5.1 TO 5.3 & 7.4]

EIGENVALE PROBLEMS AND THE SVD. [5.1 TO 5.3 & 7.4] EIGENVALE PROBLEMS AND THE SVD. [5.1 TO 5.3 & 7.4] Eigenvalue Problems. Introduction Let A an n n real nonsymmetric matrix. The eigenvalue problem: Au = λu λ C : eigenvalue u C n : eigenvector Example:

More information

Manning & Schuetze, FSNLP, (c)

Manning & Schuetze, FSNLP, (c) page 554 554 15 Topics in Information Retrieval co-occurrence Latent Semantic Indexing Term 1 Term 2 Term 3 Term 4 Query user interface Document 1 user interface HCI interaction Document 2 HCI interaction

More information

Machine learning for pervasive systems Classification in high-dimensional spaces

Machine learning for pervasive systems Classification in high-dimensional spaces Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Lecture 5: Numerical Linear Algebra Cho-Jui Hsieh UC Davis April 20, 2017 Linear Algebra Background Vectors A vector has a direction and a magnitude

More information

Eigenvalue Problems Computation and Applications

Eigenvalue Problems Computation and Applications Eigenvalue ProblemsComputation and Applications p. 1/36 Eigenvalue Problems Computation and Applications Che-Rung Lee cherung@gmail.com National Tsing Hua University Eigenvalue ProblemsComputation and

More information

Unsupervised Learning

Unsupervised Learning 2018 EE448, Big Data Mining, Lecture 7 Unsupervised Learning Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html ML Problem Setting First build and

More information

Lecture 6 Sept Data Visualization STAT 442 / 890, CM 462

Lecture 6 Sept Data Visualization STAT 442 / 890, CM 462 Lecture 6 Sept. 25-2006 Data Visualization STAT 442 / 890, CM 462 Lecture: Ali Ghodsi 1 Dual PCA It turns out that the singular value decomposition also allows us to formulate the principle components

More information

December 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis

December 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis .. December 20, 2013 Todays lecture. (PCA) (PLS-R) (LDA) . (PCA) is a method often used to reduce the dimension of a large dataset to one of a more manageble size. The new dataset can then be used to make

More information

The Singular Value Decomposition

The Singular Value Decomposition The Singular Value Decomposition An Important topic in NLA Radu Tiberiu Trîmbiţaş Babeş-Bolyai University February 23, 2009 Radu Tiberiu Trîmbiţaş ( Babeş-Bolyai University)The Singular Value Decomposition

More information

The Singular Value Decomposition

The Singular Value Decomposition The Singular Value Decomposition Philippe B. Laval KSU Fall 2015 Philippe B. Laval (KSU) SVD Fall 2015 1 / 13 Review of Key Concepts We review some key definitions and results about matrices that will

More information

be a Householder matrix. Then prove the followings H = I 2 uut Hu = (I 2 uu u T u )u = u 2 uut u

be a Householder matrix. Then prove the followings H = I 2 uut Hu = (I 2 uu u T u )u = u 2 uut u MATH 434/534 Theoretical Assignment 7 Solution Chapter 7 (71) Let H = I 2uuT Hu = u (ii) Hv = v if = 0 be a Householder matrix Then prove the followings H = I 2 uut Hu = (I 2 uu )u = u 2 uut u = u 2u =

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions

More information

Orthogonal Nonnegative Matrix Factorization: Multiplicative Updates on Stiefel Manifolds

Orthogonal Nonnegative Matrix Factorization: Multiplicative Updates on Stiefel Manifolds Orthogonal Nonnegative Matrix Factorization: Multiplicative Updates on Stiefel Manifolds Jiho Yoo and Seungjin Choi Department of Computer Science Pohang University of Science and Technology San 31 Hyoja-dong,

More information

topic modeling hanna m. wallach

topic modeling hanna m. wallach university of massachusetts amherst wallach@cs.umass.edu Ramona Blei-Gantz Helen Moss (Dave's Grandma) The Next 30 Minutes Motivations and a brief history: Latent semantic analysis Probabilistic latent

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

Modeling Environment

Modeling Environment Topic Model Modeling Environment What does it mean to understand/ your environment? Ability to predict Two approaches to ing environment of words and text Latent Semantic Analysis (LSA) Topic Model LSA

More information

Lecture 7: Con3nuous Latent Variable Models

Lecture 7: Con3nuous Latent Variable Models CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 7: Con3nuous Latent Variable Models All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/

More information

Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

More information

Introduction PCA classic Generative models Beyond and summary. PCA, ICA and beyond

Introduction PCA classic Generative models Beyond and summary. PCA, ICA and beyond PCA, ICA and beyond Summer School on Manifold Learning in Image and Signal Analysis, August 17-21, 2009, Hven Technical University of Denmark (DTU) & University of Copenhagen (KU) August 18, 2009 Motivation

More information

Review of Linear Algebra

Review of Linear Algebra Review of Linear Algebra Definitions An m n (read "m by n") matrix, is a rectangular array of entries, where m is the number of rows and n the number of columns. 2 Definitions (Con t) A is square if m=

More information

Linear Models for Regression

Linear Models for Regression Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

A few applications of the SVD

A few applications of the SVD A few applications of the SVD Many methods require to approximate the original data (matrix) by a low rank matrix before attempting to solve the original problem Regularization methods require the solution

More information

The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)

The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) Chapter 5 The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) 5.1 Basics of SVD 5.1.1 Review of Key Concepts We review some key definitions and results about matrices that will

More information

Probabilistic & Unsupervised Learning

Probabilistic & Unsupervised Learning Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College

More information

Lecture: Face Recognition and Feature Reduction

Lecture: Face Recognition and Feature Reduction Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed

More information

Linear Algebra Methods for Data Mining

Linear Algebra Methods for Data Mining Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 The Singular Value Decomposition (SVD) continued Linear Algebra Methods for Data Mining, Spring 2007, University

More information

Linear Algebra Background

Linear Algebra Background CS76A Text Retrieval and Mining Lecture 5 Recap: Clustering Hierarchical clustering Agglomerative clustering techniques Evaluation Term vs. document space clustering Multi-lingual docs Feature selection

More information

7 Principal Component Analysis

7 Principal Component Analysis 7 Principal Component Analysis This topic will build a series of techniques to deal with high-dimensional data. Unlike regression problems, our goal is not to predict a value (the y-coordinate), it is

More information

Linear Algebra (Review) Volker Tresp 2018

Linear Algebra (Review) Volker Tresp 2018 Linear Algebra (Review) Volker Tresp 2018 1 Vectors k, M, N are scalars A one-dimensional array c is a column vector. Thus in two dimensions, ( ) c1 c = c 2 c i is the i-th component of c c T = (c 1, c

More information

Latent Dirichlet Allocation

Latent Dirichlet Allocation Outlines Advanced Artificial Intelligence October 1, 2009 Outlines Part I: Theoretical Background Part II: Application and Results 1 Motive Previous Research Exchangeability 2 Notation and Terminology

More information

COMP6237 Data Mining Covariance, EVD, PCA & SVD. Jonathon Hare

COMP6237 Data Mining Covariance, EVD, PCA & SVD. Jonathon Hare COMP6237 Data Mining Covariance, EVD, PCA & SVD Jonathon Hare jsh2@ecs.soton.ac.uk Variance and Covariance Random Variables and Expected Values Mathematicians talk variance (and covariance) in terms of

More information

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY

More information

Nonparameteric Regression:

Nonparameteric Regression: Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,

More information

Exercise Sheet 1. 1 Probability revision 1: Student-t as an infinite mixture of Gaussians

Exercise Sheet 1. 1 Probability revision 1: Student-t as an infinite mixture of Gaussians Exercise Sheet 1 1 Probability revision 1: Student-t as an infinite mixture of Gaussians Show that an infinite mixture of Gaussian distributions, with Gamma distributions as mixing weights in the following

More information