Latent Semantic Analysis (Tutorial)


Alex Thomo

1 Eigenvalues and Eigenvectors

Let A be an n × n matrix with real entries. If x is an n-dimensional vector, then the matrix-vector product Ax is well defined, and the result is again an n-dimensional vector. In general, multiplication by a matrix changes the direction of a nonzero vector x, unless the vector is special and we have Ax = λx for some scalar λ. In such a case, multiplication by matrix A only stretches, contracts, or reverses vector x, but it does not change its direction. These special vectors and their corresponding λ's are called eigenvectors and eigenvalues of A.

For diagonal matrices it is easy to spot the eigenvalues and eigenvectors. For example, the matrix

    A = [ 4 0 0 ]
        [ 0 3 0 ]
        [ 0 0 2 ]

has eigenvalues and eigenvectors λ1 = 4 with x1 = (1, 0, 0)^T, λ2 = 3 with x2 = (0, 1, 0)^T, and λ3 = 2 with x3 = (0, 0, 1)^T.

We will see that the number of eigenvalues is n for an n × n matrix. Regarding eigenvectors, if x is an eigenvector then so is ax for any scalar a. However, if we consider only one eigenvector for each such family {ax}, then there is a one-to-one correspondence of such eigenvectors to eigenvalues. Typically, we consider eigenvectors of unit length.

Diagonal matrices are simple: the eigenvalues are the entries on the diagonal, and the eigenvectors are the corresponding columns of the identity matrix. For other matrices we find the eigenvalues first, by reasoning as follows. If Ax = λx then (A - λI)x = 0, where I is the identity matrix. Since x is nonzero, matrix A - λI has dependent columns and thus its determinant |A - λI| must be zero. This gives us the equation |A - λI| = 0, whose solutions are the eigenvalues of A.

As an example let

    A = [ 3 2 ]    and    A - λI = [ 3-λ   2  ]
        [ 2 3 ]                    [  2   3-λ ]

Then the equation |A - λI| = 0 becomes (3 - λ)^2 - 4 = 0, which has λ1 = 1 and λ2 = 5 as solutions. For each of these eigenvalues the equation (A - λI)x = 0 can be used to find the corresponding eigenvectors, e.g.

    A - λ1 I = [ 2 2 ]   yields   x1 = (1/√2, -1/√2)^T,
               [ 2 2 ]

and

    A - λ2 I = [ -2  2 ]   yields   x2 = (1/√2, 1/√2)^T.
               [  2 -2 ]
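To see this numerically, here is a minimal sketch (assuming NumPy; not part of the original tutorial) that computes the eigenvalues and eigenvectors of the 2 × 2 example and checks Ax = λx:

    import numpy as np

    A = np.array([[3.0, 2.0],
                  [2.0, 3.0]])

    # np.linalg.eig returns the eigenvalues and, as columns, unit-length eigenvectors.
    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)    # e.g. [5. 1.] -- the order is not guaranteed
    print(eigenvectors)   # columns proportional to (1, 1) and (1, -1), scaled to length 1

    # Verify A x = lambda x for every (eigenvalue, eigenvector) pair.
    for lam, x in zip(eigenvalues, eigenvectors.T):
        assert np.allclose(A @ x, lam * x)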

In general, for an n × n matrix A, the determinant |A - λI| gives a polynomial of degree n, which has n roots. In other words, the equation |A - λI| = 0 gives n eigenvalues.

Let us create a matrix S whose columns are the n eigenvectors of A. We have that

    AS = A[x1, ..., xn] = [Ax1, ..., Axn] = [λ1 x1, ..., λn xn] = [x1, ..., xn] diag(λ1, ..., λn) = SΛ,

where Λ is the diagonal matrix with the eigenvalues of A along its diagonal.

Now suppose that the above n eigenvectors are linearly independent. This is true when the matrix has n distinct eigenvalues. Then matrix S is invertible and, by multiplying both sides of AS = SΛ by S^-1 on the right, we have

    A = SΛS^-1.

So, we were able to diagonalize matrix A in terms of the diagonal matrix Λ, spelling the eigenvalues of A along its diagonal. This was possible because matrix S was invertible. When there are fewer than n distinct eigenvalues it might happen that the diagonalization is not possible. In such a case the matrix is defective, having too few linearly independent eigenvectors.

In this tutorial, for reasons that will become clear soon, we will be interested in symmetric matrices (A = A^T). For n × n symmetric matrices it has been shown that they always have real eigenvalues and that their eigenvectors are perpendicular. As such we have that

    S^T S = [x1, ..., xn]^T [x1, ..., xn] = I.

In other words, for symmetric matrices, S^-1 is S^T (which can be easily obtained) and we have

    A = SΛS^T.
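A short sketch of this diagonalization, again assuming NumPy: np.linalg.eigh is tailored to symmetric matrices and returns orthonormal eigenvectors, so A = SΛS^T can be checked directly.

    import numpy as np

    A = np.array([[3.0, 2.0],
                  [2.0, 3.0]])

    # Eigenvalues (ascending order) and orthonormal eigenvectors of the symmetric matrix A.
    lam, S = np.linalg.eigh(A)
    Lambda = np.diag(lam)

    assert np.allclose(S.T @ S, np.eye(2))    # S^T S = I, i.e. S^-1 = S^T
    assert np.allclose(S @ Lambda @ S.T, A)   # A = S Lambda S^T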

2 Singular Value Decomposition

Now let A be an m × n matrix with real entries and m > n. Let us consider the n × n square matrix B = A^T A. It is easy to verify that B is symmetric; namely B^T = (A^T A)^T = A^T (A^T)^T = A^T A = B. It has been shown that the eigenvalues of such matrices (A^T A) are real non-negative numbers. Since they are non-negative, we can write them in decreasing order as squares of non-negative real numbers: σ1^2 ≥ ... ≥ σn^2. For some index r (possibly n) the first r numbers σ1, ..., σr are positive whereas the rest are zero.

For the above eigenvalues, we know that the corresponding eigenvectors x1, ..., xr are perpendicular. Furthermore, we normalize them to have length 1. Let

    S1 = [x1, ..., xr].

We now create the vectors

    y1 = (1/σ1) A x1, ..., yr = (1/σr) A xr.

These are perpendicular m-dimensional vectors of length 1 (orthonormal vectors), because

    yi^T yj = ((1/σi) A xi)^T ((1/σj) A xj) = (1/(σi σj)) xi^T A^T A xj = (1/(σi σj)) xi^T B xj = (1/(σi σj)) xi^T σj^2 xj = (σj/σi) xi^T xj,

which is 0 for i ≠ j and 1 for i = j (since xi^T xj = 0 for i ≠ j and xi^T xi = 1). Let

    S2 = [y1, ..., yr].

We have

    yj^T A xi = yj^T (σi yi) = σi yj^T yi,

which is 0 if i ≠ j, and σi if i = j. From this we have S2^T A S1 = Σ, where Σ is the diagonal r × r matrix with σ1, ..., σr along the diagonal. Observe that S2^T is r × m, A is m × n, and S1 is n × r, and thus the above matrix multiplication is well defined.

Since S2 and S1 have orthonormal columns spanning the column space and the row space of A, respectively, we have S2 S2^T A = A and A S1 S1^T = A. Thus, by multiplying the above equality by S2 on the left and S1^T on the right, we have

    A = S2 Σ S1^T.

Reiterating, matrix Σ is diagonal and the values along its diagonal are σ1, ..., σr, which are called singular values. They are the square roots of the eigenvalues of A^T A and thus completely determined by A. The above decomposition A = S2 Σ S1^T is called the singular value decomposition. For ease of notation, let us denote S2 by S and S1 by U (getting thus rid of the subscripts). Then

    A = SΣU^T.
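The construction above can be retraced in code. The following sketch uses a small, hypothetical 3 × 2 matrix chosen only for illustration (assuming NumPy): it forms B = A^T A, extracts the eigenvectors x_i and eigenvalues σ_i^2, builds y_i = (1/σ_i) A x_i, and verifies A = S_2 Σ S_1^T.

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 1.0]])            # m = 3, n = 2 (m > n), illustrative only

    B = A.T @ A                           # n x n, symmetric
    eigvals, X = np.linalg.eigh(B)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]     # re-sort in decreasing order
    eigvals, X = eigvals[order], X[:, order]

    sigma = np.sqrt(np.clip(eigvals, 0.0, None))   # singular values sigma_1 >= ... >= sigma_n
    r = int(np.sum(sigma > 1e-12))                 # number of positive singular values
    S1 = X[:, :r]                                  # the x_i as columns
    S2 = A @ S1 / sigma[:r]                        # y_i = (1/sigma_i) A x_i as columns
    Sigma = np.diag(sigma[:r])

    assert np.allclose(S2.T @ S2, np.eye(r))       # the y_i are orthonormal
    assert np.allclose(S2 @ Sigma @ S1.T, A)       # A = S_2 Sigma S_1^T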

3 Latent Semantic Indexing

Latent Semantic Indexing (LSI) is a method for discovering hidden concepts in document data. Each document and term (word) is then expressed as a vector with elements corresponding to these concepts. Each element in a vector gives the degree of participation of the document or term in the corresponding concept. The goal is not to describe the concepts verbally, but to be able to represent the documents and terms in a unified way for exposing document-document, document-term, and term-term similarities or semantic relationships, which are otherwise hidden.

3.1 An Example

Suppose we have the following set of five documents

    d1: Romeo and Juliet.
    d2: Juliet: O happy dagger!
    d3: Romeo died by dagger.
    d4: Live free or die, that's the New-Hampshire's motto.
    d5: Did you know, New-Hampshire is in New-England.

and a search query: dies, dagger.

Clearly, d3 should be ranked at the top of the list since it contains both dies and dagger. Then, d2 and d4 should follow, each containing one word of the query. However, what about d1 and d5? Should they be returned as possibly interesting results to this query? As humans we know that d1 is quite related to the query. On the other hand, d5 is not so much related to the query. Thus, we would like d1 to be returned but not d5, or differently said, we want d1 to be ranked higher than d5.

The question is: Can the machine deduce this? The answer is yes; LSI does exactly that. In this example, LSI will be able to see that the term dagger is related to d1 because it occurs together with d1's terms Romeo and Juliet, in d3 and d2, respectively. Also, the term dies is related to d1 and d5 because it occurs together with d1's term Romeo and d5's term New-Hampshire, in d3 and d4, respectively. LSI will also weigh the discovered connections properly; d1 is more related to the query than d5, since d1 is doubly connected to dagger through Romeo and Juliet, and also connected to die through Romeo, whereas d5 has only a single connection to the query, through New-Hampshire.

3.2 SVD for LSI

Formally, let A be the m × n term-document matrix of a collection of documents. Each column of A corresponds to a document. If term i occurs a times in document j, then A[i, j] = a. The dimensions of A, m and n, correspond to the number of words and documents, respectively, in the collection. For our example, matrix A is:

                       d1   d2   d3   d4   d5
    romeo               1    0    1    0    0
    juliet              1    1    0    0    0
    happy               0    1    0    0    0
    dagger              0    1    1    0    0
    live                0    0    0    1    0
    die                 0    0    1    1    0
    free                0    0    0    1    0
    new-hampshire       0    0    0    1    1
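As an aside, this matrix can be assembled mechanically from the documents. The sketch below (assuming NumPy, and treating "died" as an occurrence of the term die) builds the same 8 × 5 matrix.

    import numpy as np

    terms = ["romeo", "juliet", "happy", "dagger", "live", "die", "free", "new-hampshire"]
    docs = {
        "d1": ["romeo", "juliet"],
        "d2": ["juliet", "happy", "dagger"],
        "d3": ["romeo", "die", "dagger"],               # "died" stemmed to "die"
        "d4": ["live", "free", "die", "new-hampshire"],
        "d5": ["new-hampshire"],
    }

    A = np.zeros((len(terms), len(docs)))
    for j, words in enumerate(docs.values()):
        for w in words:
            A[terms.index(w), j] += 1                   # A[i, j] = occurrences of term i in document j

    print(A)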

Observe that B = A^T A is the document-document matrix. If documents i and j have b words in common, then B[i, j] = b. On the other hand, C = A A^T is the term-term matrix. If terms i and j occur together in c documents, then C[i, j] = c. Clearly, both B and C are square and symmetric; B is an n × n matrix, whereas C is an m × m matrix.

Now, we perform an SVD on A using matrices B and C as described in the previous section:

    A = SΣU^T,

where S is the matrix of the eigenvectors of C, U is the matrix of the eigenvectors of B, and Σ is the diagonal matrix of the singular values obtained as square roots of the eigenvalues of B. Matrix Σ for our example is calculated to be

    Σ = diag(2.285, 2.010, 1.361, 1.118, 0.797).

(For small SVD calculations, you can use the BlueBit calculator at http://www.bluebit.gr/matrixcalculator).

As you can see, the singular values along the diagonal of Σ are listed in descending order of their magnitude. Some of the singular values are too small and thus negligible. What really constitutes "too small" is usually determined empirically. In LSI we ignore these small singular values and replace them by 0. Let us say that we keep only k singular values in Σ. Then Σ will be all zeros except for the first k entries along its diagonal. As such, we can reduce matrix Σ into Σ_k, which is a k × k matrix containing only the k singular values that we keep, and also reduce S and U^T into S_k and U_k^T, so that they have k columns and k rows, respectively. Of course, all these matrix parts that we throw out would have been zeroed anyway by the zeros in Σ. Matrix A is now approximated by

    A_k = S_k Σ_k U_k^T.

Observe that since S_k, Σ_k, and U_k^T are m × k, k × k, and k × n matrices, their product, A_k, is again an m × n matrix. Intuitively, the k remaining ingredients of the eigenvectors in S and U correspond to k hidden concepts in which the terms and documents participate. The terms and documents now have a new representation in terms of these hidden concepts. Namely, the terms are represented by the row vectors of the m × k matrix S_k Σ_k, whereas the documents are represented by the column vectors of the k × n matrix Σ_k U_k^T. The query is then represented by the centroid of the vectors of its terms.

3.3 Completing the Example

Let us now fully develop the example we started with in this section. Suppose we set k = 2, i.e., we will consider only the first two singular values. Then we have

    Σ_2 = [ 2.285    0   ]
          [   0    2.010 ]
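In code, this truncation can be obtained from any SVD routine. The sketch below (assuming NumPy, whose np.linalg.svd already returns the singular values in descending order) keeps k = 2 and forms the term and document vectors S_k Σ_k and Σ_k U_k^T; individual coordinates may differ in sign from the tables that follow, since the sign of each singular-vector pair is arbitrary.

    import numpy as np

    # The 8 x 5 term-document matrix from the example (rows: romeo, juliet, happy,
    # dagger, live, die, free, new-hampshire; columns: d1, ..., d5).
    A = np.array([
        [1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 1, 1],
    ], dtype=float)

    S, sigma, Ut = np.linalg.svd(A, full_matrices=False)

    k = 2
    S_k = S[:, :k]                    # m x k
    Sigma_k = np.diag(sigma[:k])      # k x k
    Ut_k = Ut[:k, :]                  # k x n

    A_k = S_k @ Sigma_k @ Ut_k        # rank-k approximation of A (still m x n)

    term_vectors = S_k @ Sigma_k      # one row per term in the k-dimensional concept space
    doc_vectors = Sigma_k @ Ut_k      # one column per document in the concept space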

The truncated matrices of singular vectors for our example are

    S_2:
        romeo            0.396   -0.280
        juliet           0.314   -0.450
        happy            0.178   -0.269
        dagger           0.438   -0.369
        live             0.264    0.346
        die              0.524    0.246
        free             0.264    0.346
        new-hampshire    0.326    0.460

    U_2^T:
                          d1       d2       d3       d4       d5
                         0.311    0.407    0.594    0.603    0.143
                        -0.363   -0.541   -0.200    0.695    0.229

The terms in the concept space are represented by the row vectors of S_2, whereas the documents are represented by the column vectors of U_2^T. In fact, we scale the (two) coordinates of these vectors by multiplying with the corresponding singular values of Σ_2, and thus represent the terms by the row vectors of S_2 Σ_2 and the documents by the column vectors of Σ_2 U_2^T. Finally we have

    romeo         = ( 0.905, -0.563)        d1 = ( 0.711, -0.730)
    juliet        = ( 0.717, -0.905)        d2 = ( 0.930, -1.087)
    happy         = ( 0.407, -0.541)        d3 = ( 1.357, -0.402)
    dagger        = ( 1.001, -0.742)        d4 = ( 1.378,  1.397)
    live          = ( 0.603,  0.695)        d5 = ( 0.327,  0.460)
    die           = ( 1.197,  0.494)
    free          = ( 0.603,  0.695)
    new-hampshire = ( 0.745,  0.925)

Now the query is represented by a vector computed as the centroid of the vectors for its terms. In our example, the query is die, dagger, and so the query vector is

    q = (1/2) [ (1.197, 0.494) + (1.001, -0.742) ] = (1.099, -0.124).

In order to rank the documents in relation to query q we will use the cosine similarity. In other words, we will compute

    (d_i · q) / (|d_i| |q|)

for i = 1, ..., n, and then sort the results in descending order. For our example, let us do this geometrically. Based on Figure 1, the following interesting observations can be made.

1. Document d1 is closer to query q than d5. As a result, d1 is ranked higher than d5. This conforms to our human preference: both Romeo and Juliet died by a dagger.

2. Document d1 is slightly closer to q than d2. Thus the system is intelligent enough to find out that d1, containing both Romeo and Juliet, is more relevant to the query than d2, even though d2 explicitly contains one of the words in the query while d1 does not. A human user would probably do the same.
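Putting the pieces together, the following sketch (assuming NumPy) forms the query vector as the centroid of the die and dagger term vectors and ranks the documents by cosine similarity. The similarities are unaffected by the arbitrary signs of the singular vectors, so the resulting order matches the discussion above: d3 first, with d1 ranked above d2 and well above d5.

    import numpy as np

    terms = ["romeo", "juliet", "happy", "dagger", "live", "die", "free", "new-hampshire"]
    A = np.array([
        [1, 0, 1, 0, 0],   # romeo
        [1, 1, 0, 0, 0],   # juliet
        [0, 1, 0, 0, 0],   # happy
        [0, 1, 1, 0, 0],   # dagger
        [0, 0, 0, 1, 0],   # live
        [0, 0, 1, 1, 0],   # die
        [0, 0, 0, 1, 0],   # free
        [0, 0, 0, 1, 1],   # new-hampshire
    ], dtype=float)

    S, sigma, Ut = np.linalg.svd(A, full_matrices=False)
    k = 2
    term_vecs = S[:, :k] @ np.diag(sigma[:k])        # one row per term (rows of S_k Sigma_k)
    doc_vecs = (np.diag(sigma[:k]) @ Ut[:k, :]).T    # one row per document (columns of Sigma_k U_k^T)

    # Query "die, dagger": centroid of the two term vectors.
    q = (term_vecs[terms.index("die")] + term_vecs[terms.index("dagger")]) / 2

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    ranking = sorted(((cosine(doc_vecs[j], q), f"d{j + 1}") for j in range(doc_vecs.shape[0])),
                     reverse=True)
    print(ranking)    # expected order: d3, d1, d2, d4, d5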

Figure 1: Geometric representation of documents d1, ..., d5, their terms, and query q.