Linear Algebra Methods for Data Mining
Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi
Spring 2007

1. Basic Linear Algebra
Example 1: Term-document matrices

        Doc1  Doc2  Doc3  Doc4  Query
Term1     1     0     1     0     1
Term2     0     0     1     1     1
Term3     0     1     1     0     0

The documents and the query are represented by vectors in $\mathbb{R}^n$ (here $n = 3$). In applications the matrices may be large! Number of terms: $10^4$, number of documents: $10^6$.
Example 1 continued: Tasks

- Find document vectors close to the query, using some distance measure in $\mathbb{R}^n$ (see the sketch below).
- Use linear algebra methods for data compression and retrieval enhancement.
- Find topics or concepts from the term-document matrix.
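As a small illustration of the first task, here is a minimal Matlab sketch (not part of the original slides), using the matrix and query from the table above and assuming the Euclidean distance as the measure:

A = [1 0 1 0;
     0 0 1 1;
     0 1 1 0];               % rows = terms, columns = documents
q = [1; 1; 0];               % the query vector
d = zeros(1, size(A,2));
for j = 1:size(A,2)
    d(j) = norm(A(:,j) - q); % Euclidean distance from the query to Doc j
end
d                            % the smaller the distance, the closer the match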
Example 2: measurement data

[Figure: time series plots of measured variables, with panels labeled T672, NOx672, and csink.]
Example 2 continued

At the Hyytiälä Forest Field Station, 30-minute averages of some 100+ variables have been measured for 10+ years: some 175 000 time points and 100 variables make a lot of data!

Possible questions:
- How do days vary?
- How do the measured variables depend on each other?
- What separates days when phenomenon X occurs from those when it doesn't?
- Are there (independent) (pollution) sources present?
Matrices

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \in \mathbb{R}^{m \times n}$$

A rectangular array of data: the elements are real numbers.
Basic concepts

- vectors
- norms and distances
- eigenvalues, eigenvectors
- linearly independent vectors, basis
- orthogonal bases
- matrices, orthogonal matrices
- orthogonal matrix decompositions: SVD
Next: quick review of the following concepts:

- matrix-vector multiplication, matrix-matrix multiplication
- vector norms, matrix norms
- distances between vectors
- eigenvalues, eigenvectors
- linear independence
- basis
- orthogonality
Matrix-vector multiplication

Symbolically,

$$y = Ax = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^n a_{1j} x_j \\ \sum_{j=1}^n a_{2j} x_j \\ \vdots \\ \sum_{j=1}^n a_{mj} x_j \end{pmatrix}$$
In practice

$$\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 \\ -2 \end{pmatrix} = \; ?$$
$$\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 \\ -2 \end{pmatrix} = \begin{pmatrix} 2 \cdot 5 + 3 \cdot (-2) \\ 6 \cdot 5 + 4 \cdot (-2) \\ 1 \cdot 5 + 0 \cdot (-2) \end{pmatrix} = \begin{pmatrix} 4 \\ 22 \\ 5 \end{pmatrix}$$
Or

$$\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 \\ -2 \end{pmatrix} = 5 \begin{pmatrix} 2 \\ 6 \\ 1 \end{pmatrix} - 2 \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix} = \begin{pmatrix} 4 \\ 22 \\ 5 \end{pmatrix}$$
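The same computation in Matlab, checking that the row-oriented and column-oriented views agree (a minimal sketch, not from the original slides):

A = [2 3; 6 4; 1 0];
x = [5; -2];
y1 = A*x;                         % built-in matrix-vector product
y2 = x(1)*A(:,1) + x(2)*A(:,2);   % linear combination of the columns
% both give [4; 22; 5]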
Alternative presentation of matrix-vector multiplication: denote the column vectors of the matrix $A$ by $a_j$. Then

$$y = Ax = (a_1 \; a_2 \; \cdots \; a_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \sum_{j=1}^n x_j a_j.$$

So the vector $y$ is a linear combination of the columns of $A$. Often this is a useful way to consider matrix-vector multiplication:
Example

Let the columns of $A$ be different "topics":

        Topic1  Topic2  Topic3
Term1     1       0       0
Term2     1       0       0
Term3     0       1       0
Term4     0       0       1

Then if we multiply $A$ by the vector $w = \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}$, we get...
$$Aw = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + 0 \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} + 1 \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 0 \\ 1 \end{pmatrix} = y,$$

which represents a document dealing primarily with topic 1 and secondarily with topic 3.
Note on computational aspects

The column-oriented approach is also good when considering computational efficiency. Modern computing devices are able to exploit the fact that a vector operation is a very regular sequence of scalar operations. This approach is embedded in packages like Matlab and LAPACK (and others), for example in the SAXPY and GAXPY operations.
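For concreteness, here is a column-oriented (GAXPY-style) matrix-vector product written as a loop of SAXPY operations; a minimal sketch reusing the example matrix from the previous slides:

A = [2 3; 6 4; 1 0];
x = [5; -2];
y = zeros(size(A,1), 1);
for j = 1:length(x)
    y = y + x(j)*A(:,j);    % one SAXPY: scalar times vector, plus vector
end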
Matrix-matrix multiplication

Let $A \in \mathbb{R}^{m \times s}$ and $B \in \mathbb{R}^{s \times n}$. Then, by definition, $C = AB \in \mathbb{R}^{m \times n}$, where

$$c_{ij} = \sum_{k=1}^{s} a_{ik} b_{kj}, \quad i = 1, \ldots, m, \; j = 1, \ldots, n.$$

Note: each column vector in $B$ is multiplied by $A$.
In practice

$$\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix} = \; ?$$
$$\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix} = \begin{pmatrix} 2 \cdot 5 + 3 \cdot (-2) & 2 \cdot 1 + 3 \cdot 1 \\ 6 \cdot 5 + 4 \cdot (-2) & 6 \cdot 1 + 4 \cdot 1 \\ 1 \cdot 5 + 0 \cdot (-2) & 1 \cdot 1 + 0 \cdot 1 \end{pmatrix} = \begin{pmatrix} 4 & 5 \\ 22 & 10 \\ 5 & 1 \end{pmatrix}$$

Column by column:

$$\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix} = \begin{pmatrix} 5 \begin{pmatrix} 2 \\ 6 \\ 1 \end{pmatrix} - 2 \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix} & 1 \begin{pmatrix} 2 \\ 6 \\ 1 \end{pmatrix} + 1 \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix} \end{pmatrix}$$
Matrix multiplication code

c = zeros(m, n);                       % initialize the result to zero
for i = 1:m
  for j = 1:n
    for k = 1:s
      c(i,j) = c(i,j) + a(i,k)*b(k,j); % accumulate the inner product
    end
  end
end

Note: the loops may be permuted in 6 different ways!
How to measure the size of a vector?
Vector norms

The most common vector norms are:

- 1-norm: $\|x\|_1 = \sum_{i=1}^n |x_i|$
- Euclidean norm: $\|x\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$
- max-norm: $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$

All of the above are special cases of the $L_p$-norm (or $p$-norm):

$$\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$$
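In Matlab all three are available through the built-in norm function (a small sketch with a made-up vector):

x = [3; -4];
norm(x, 1)     % 1-norm: |3| + |-4| = 7
norm(x, 2)     % Euclidean norm: sqrt(9 + 16) = 5
norm(x, inf)   % max-norm: max(|3|, |-4|) = 4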
General definition of a vector norm

Generally, a vector norm is a mapping $\mathbb{R}^n \to \mathbb{R}$ with the properties:

- $\|x\| \ge 0$ for all $x$, and $\|x\| = 0$ if and only if $x = 0$,
- $\|\alpha x\| = |\alpha| \, \|x\|$ for all $\alpha \in \mathbb{R}$,
- $\|x + y\| \le \|x\| + \|y\|$, the triangle inequality.
How to measure distance between vectors?

Obvious answer: the distance between two vectors $x$ and $y$ is $\|x - y\|$, where $\|\cdot\|$ is some vector norm. Frequently one measures the distance by the Euclidean norm $\|x - y\|_2$. So usually, if the index is dropped, this is what is meant.

Alternative: use the angle between two vectors $x$ and $y$ to measure the distance between them. How to calculate the angle between two vectors?
Angle between vectors

The inner product of two vectors is defined by $(x, y) = x^T y$. It is associated with the Euclidean norm: $\|x\|_2 = (x^T x)^{1/2}$.

The angle $\theta$ between two vectors $x$ and $y$ satisfies

$$\cos \theta = \frac{x^T y}{\|x\|_2 \|y\|_2}.$$

The cosine of the angle between two vectors $x$ and $y$ can be used to measure the similarity between the two vectors: if $x$ and $y$ are close, the angle between them is small, and $\cos \theta \approx 1$. The vectors $x$ and $y$ are orthogonal if $\theta = \pi/2$, i.e. $x^T y = 0$.
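A minimal Matlab sketch of the cosine formula (the two vectors are made up for illustration):

x = [1; 1; 0];
y = [1; 0; 0];
costheta = (x'*y) / (norm(x)*norm(y));   % = 1/sqrt(2)
theta = acos(costheta)                   % = pi/4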
Why not just use the Euclidean distance?

[Figure: three vectors labeled a, b, and c.]
Example: term-document matrix

Each entry tells how many times a term appears in the document:

        Doc1  Doc2  Doc3
Term1    10     1     0
Term2    10     1     0
Term3     0     0     1

Using the Euclidean distance, Documents 1 and 2 look dissimilar, and Documents 2 and 3 look similar. This is just due to the length of the documents! Using the cosine of the angle between document vectors, Documents 1 and 2 are similar to each other and dissimilar to Document 3.
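The contrast can be checked directly in Matlab (a sketch using the matrix above):

A = [10 1 0;
     10 1 0;
      0 0 1];                   % columns = Doc1, Doc2, Doc3
norm(A(:,1) - A(:,2))           % approx. 12.7: Doc1 and Doc2 look far apart
norm(A(:,2) - A(:,3))           % approx. 1.7: Doc2 and Doc3 look close
A(:,1)'*A(:,2) / (norm(A(:,1))*norm(A(:,2)))   % cos = 1: same direction
A(:,2)'*A(:,3) / (norm(A(:,2))*norm(A(:,3)))   % cos = 0: orthogonal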
Eigenvalues and eigenvectors

Let $A$ be an $n \times n$ matrix. A nonzero vector $v$ that satisfies

$$Av = \lambda v$$

for some scalar $\lambda$ is called an eigenvector of $A$, and $\lambda$ is the eigenvalue corresponding to the eigenvector $v$.
In practice

$$Av = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix} v = \lambda v.$$

$$\det(A - \lambda I) = \begin{vmatrix} 2-\lambda & 1 \\ 1 & 3-\lambda \end{vmatrix} = (2-\lambda)(3-\lambda) - 1 = 0$$

$$\lambda_1 = 3.62, \quad \lambda_2 = 1.38$$

$$v_1 = \begin{pmatrix} 0.52 \\ 0.85 \end{pmatrix}, \quad v_2 = \begin{pmatrix} -0.85 \\ 0.52 \end{pmatrix}$$
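The same computation with Matlab's eig (a sketch; the order and signs of the computed eigenvectors may differ, since eigenvectors are only determined up to a scalar):

A = [2 1; 1 3];
[V, D] = eig(A)   % columns of V: eigenvectors; diag(D): eigenvalues
% diag(D) contains approx. 1.38 and 3.62, matching the hand computation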
Matrix norms

Let $\|\cdot\|$ be a vector norm and $A \in \mathbb{R}^{m \times n}$. The corresponding matrix norm is

$$\|A\| = \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|}.$$

- $\|A\|_2 = (\max_{1 \le i \le n} \lambda_i(A^T A))^{1/2}$ = square root of the largest eigenvalue of $A^T A$. Heavy to compute!
- $\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^n |a_{ij}|$ (maximum over rows)
- $\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^m |a_{ij}|$ (maximum over columns)
- $\|A\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2}$, the Frobenius norm: it does not correspond to any vector norm. Still, it is related to the Euclidean vector norm.
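All four norms are available through Matlab's norm function (a sketch with the example matrix from earlier slides):

A = [2 3; 6 4; 1 0];
norm(A, 2)       % spectral norm, via the eigenvalues of A'*A
norm(A, inf)     % maximum absolute row sum
norm(A, 1)       % maximum absolute column sum
norm(A, 'fro')   % Frobenius norm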
Linear Independence

Given a set of vectors $(v_j)_{j=1}^n$ in $\mathbb{R}^m$, $m \ge n$, consider the set of linear combinations $y = \sum_{j=1}^n \alpha_j v_j$ for arbitrary coefficients $\alpha_j$. The vectors $(v_j)_{j=1}^n$ are linearly independent if $\sum_{j=1}^n \alpha_j v_j = 0$ if and only if $\alpha_j = 0$ for all $j = 1, \ldots, n$.

A set of $m$ linearly independent vectors of $\mathbb{R}^m$ is called a basis in $\mathbb{R}^m$: any vector in $\mathbb{R}^m$ can be expressed as a linear combination of the basis vectors.
Example

The column vectors of the matrix

$$[v_1 \; v_2 \; v_3 \; v_4] = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$

are not linearly independent, as $\alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 + \alpha_4 v_4 = 0$ holds for $\alpha_1 = \alpha_3 = 1$, $\alpha_2 = \alpha_4 = -1$.
Rank of a matrix

The rank of a matrix is the maximum number of linearly independent column vectors.

A square matrix $A \in \mathbb{R}^{n \times n}$ with rank $n$ is called nonsingular, and it has an inverse $A^{-1}$ satisfying $AA^{-1} = A^{-1}A = I$.

The (outer product) matrix $xy^T$ has rank 1: all columns of $xy^T = (y_1 x \; y_2 x \; \cdots \; y_n x)$ are linearly dependent (and so are all the rows).
Example

The $4 \times 4$ matrix

$$\begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$

has rank 3.
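This is easy to check numerically (a sketch; the null vector comes from the dependency found in the previous example):

A = [1 0 0 1;
     1 1 0 0;
     0 1 1 0;
     0 0 1 1];
rank(A)              % = 3
A * [1; -1; 1; -1]   % = the zero vector: the columns are dependent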
Example

Consider an $m \times n$ term-document matrix $A = [a_1 \; a_2 \; \cdots \; a_n]$, where $a_j \in \mathbb{R}^m$ are the documents.

If $A$ has rank 3, then all the documents can be expressed as linear combinations of only three vectors $v_1$, $v_2$ and $v_3 \in \mathbb{R}^m$:

$$a_j = w_{1j} v_1 + w_{2j} v_2 + w_{3j} v_3, \quad j = 1, \ldots, n.$$

The term-document matrix can be written as $A = VW$, where $V = (v_1 \; v_2 \; v_3) \in \mathbb{R}^{m \times 3}$ and $W = (w_{ij}) \in \mathbb{R}^{3 \times n}$.
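A sketch of such a factorization with made-up random factors, only to illustrate the shapes involved (V and W here are hypothetical, not computed from any real data):

m = 5; n = 8;
V = rand(m, 3);    % three "concept" vectors
W = rand(3, n);    % weights of each document in the three concepts
A = V * W;
rank(A)            % = 3: every column of A is a combination of V's columns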
Condition number

For any $a \neq 1$ the matrix

$$A = \begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}$$

is nonsingular and has the inverse

$$A^{-1} = \frac{1}{a-1} \begin{pmatrix} a & -1 \\ -1 & 1 \end{pmatrix}.$$

As $a \to 1$, the norm of $A^{-1}$ tends to infinity. Nonsingularity is not always enough!

Define the condition number of a matrix to be $\kappa(A) = \|A\| \, \|A^{-1}\|$. A large condition number means trouble!
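A quick numerical illustration with Matlab's cond (a sketch; cond uses the 2-norm by default):

for a = [2, 1.1, 1.01, 1.001]
    A = [1 1; 1 a];
    cond(A)    % grows without bound as a approaches 1
end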
Orthogonality

Two vectors $x$ and $y$ are orthogonal if $x^T y = 0$.

Let $q_j$, $j = 1, \ldots, n$, be orthogonal, i.e. $q_i^T q_j = 0$ for $i \neq j$. Then they are linearly independent. (Proof?)

Let the set of orthogonal vectors $q_j$, $j = 1, \ldots, m$, in $\mathbb{R}^m$ be normalized, $\|q_j\|_2 = 1$. Then they are orthonormal, and constitute an orthonormal basis in $\mathbb{R}^m$.

A matrix $Q = [q_1 \; q_2 \; \cdots \; q_m] \in \mathbb{R}^{m \times m}$ with orthonormal columns is called an orthogonal matrix.
Why we like orthogonal matrices

- An orthogonal matrix $Q \in \mathbb{R}^{m \times m}$ has rank $m$ (since its columns are linearly independent).
- $Q^T Q = I$ and $QQ^T = I$. (Proofs?) Hence the inverse of an orthogonal matrix $Q$ is $Q^{-1} = Q^T$.
- The Euclidean length of a vector is invariant under an orthogonal transformation $Q$: $\|Qx\|_2^2 = (Qx)^T Qx = x^T x = \|x\|_2^2$.
- The product of two orthogonal matrices $P$ and $Q$ is orthogonal: for $X = PQ$, $X^T X = (PQ)^T PQ = Q^T P^T P Q = Q^T Q = I$.
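These properties are easy to verify numerically, e.g. on a plane rotation (a minimal sketch; the rotation matrix is a standard example, not taken from the slides):

theta = pi/6;
Q = [cos(theta) -sin(theta); sin(theta) cos(theta)];
Q'*Q                   % = identity (up to rounding), so inv(Q) = Q'
x = [3; 4];
norm(Q*x) - norm(x)    % = 0 (up to rounding): length is preserved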