Linear Algebra Methods for Data Mining
Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi
Spring 2007

1. Basic Linear Algebra
Example 1: Term-document matrices

        Doc1  Doc2  Doc3  Doc4  Query
Term1     1     0     1     0     1
Term2     0     0     1     1     1
Term3     0     1     1     0     0

The documents and the query are represented by vectors in $\mathbb{R}^n$ (here $n = 3$). In applications the matrices may be large! Number of terms: $10^4$, number of documents: $10^6$.
Example 1 continued: Tasks

- Find document vectors close to the query, using some distance measure in $\mathbb{R}^n$ (see the sketch below).
- Use linear algebra methods for data compression and retrieval enhancement.
- Find topics or concepts from the term-document matrix.
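As a small illustration of the first task, here is a minimal Matlab sketch (not part of the original slides), using the matrix and query from the table above and assuming the Euclidean distance as the measure:

A = [1 0 1 0;
     0 0 1 1;
     0 1 1 0];               % rows = terms, columns = documents
q = [1; 1; 0];               % the query vector
d = zeros(1, size(A,2));
for j = 1:size(A,2)
    d(j) = norm(A(:,j) - q); % Euclidean distance from the query to Doc j
end
d                            % the smaller the distance, the closer the match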
Example 2: measurement data

[Figure: time series plots of measured variables, with panels labeled T672, NOx672, and csink.]
Example 2 continued

At the Hyytiälä Forest Field Station, 30-minute averages of some 100+ variables have been measured for 10+ years: some 175 000 time points and 100 variables make a lot of data!

Possible questions:
- How do days vary?
- How do the measured variables depend on each other?
- What separates days when phenomenon X occurs from those when it doesn't?
- Are there (independent) (pollution) sources present?
Matrices

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \in \mathbb{R}^{m \times n}$$

A rectangular array of data: the elements are real numbers.
Basic concepts

- vectors
- norms and distances
- eigenvalues, eigenvectors
- linearly independent vectors, basis
- orthogonal bases
- matrices, orthogonal matrices
- orthogonal matrix decompositions: SVD
Next: quick review of the following concepts:

- matrix-vector multiplication, matrix-matrix multiplication
- vector norms, matrix norms
- distances between vectors
- eigenvalues, eigenvectors
- linear independence
- basis
- orthogonality
Matrix-vector multiplication

Symbolically,

$$y = Ax = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^n a_{1j} x_j \\ \sum_{j=1}^n a_{2j} x_j \\ \vdots \\ \sum_{j=1}^n a_{mj} x_j \end{pmatrix}$$
In practice

$$\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 \\ -2 \end{pmatrix} = \; ?$$
$$\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 \\ -2 \end{pmatrix} = \begin{pmatrix} 2 \cdot 5 + 3 \cdot (-2) \\ 6 \cdot 5 + 4 \cdot (-2) \\ 1 \cdot 5 + 0 \cdot (-2) \end{pmatrix} = \begin{pmatrix} 4 \\ 22 \\ 5 \end{pmatrix}$$
Or

$$\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 \\ -2 \end{pmatrix} = 5 \begin{pmatrix} 2 \\ 6 \\ 1 \end{pmatrix} - 2 \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix} = \begin{pmatrix} 4 \\ 22 \\ 5 \end{pmatrix}$$
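The same computation in Matlab, checking that the row-oriented and column-oriented views agree (a minimal sketch, not from the original slides):

A = [2 3; 6 4; 1 0];
x = [5; -2];
y1 = A*x;                         % built-in matrix-vector product
y2 = x(1)*A(:,1) + x(2)*A(:,2);   % linear combination of the columns
% both give [4; 22; 5]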
Alternative presentation of matrix-vector multiplication: denote the column vectors of the matrix $A$ by $a_j$. Then

$$y = Ax = (a_1 \; a_2 \; \cdots \; a_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \sum_{j=1}^n x_j a_j.$$

So the vector $y$ is a linear combination of the columns of $A$. Often this is a useful way to consider matrix-vector multiplication:
Example

Let the columns of $A$ be different "topics":

        Topic1  Topic2  Topic3
Term1     1       0       0
Term2     1       0       0
Term3     0       1       0
Term4     0       0       1

Then if we multiply $A$ by the vector $w = \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}$, we get...
$$Aw = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + 0 \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} + 1 \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 0 \\ 1 \end{pmatrix} = y,$$

which represents a document dealing primarily with topic 1 and secondarily with topic 3.
Note on computational aspects

The column-oriented approach is also good when considering computational efficiency. Modern computing devices are able to exploit the fact that a vector operation is a very regular sequence of scalar operations. This approach is embedded in packages like Matlab and LAPACK (and others), for example in the SAXPY and GAXPY operations.
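For concreteness, here is a column-oriented (GAXPY-style) matrix-vector product written as a loop of SAXPY operations; a minimal sketch reusing the example matrix from the previous slides:

A = [2 3; 6 4; 1 0];
x = [5; -2];
y = zeros(size(A,1), 1);
for j = 1:length(x)
    y = y + x(j)*A(:,j);    % one SAXPY: scalar times vector, plus vector
end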
Matrix-matrix multiplication

Let $A \in \mathbb{R}^{m \times s}$ and $B \in \mathbb{R}^{s \times n}$. Then, by definition, $C = AB \in \mathbb{R}^{m \times n}$, where

$$c_{ij} = \sum_{k=1}^{s} a_{ik} b_{kj}, \quad i = 1, \ldots, m, \; j = 1, \ldots, n.$$

Note: each column vector in $B$ is multiplied by $A$.
In practice

$$\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix} = \; ?$$
$$\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix} = \begin{pmatrix} 2 \cdot 5 + 3 \cdot (-2) & 2 \cdot 1 + 3 \cdot 1 \\ 6 \cdot 5 + 4 \cdot (-2) & 6 \cdot 1 + 4 \cdot 1 \\ 1 \cdot 5 + 0 \cdot (-2) & 1 \cdot 1 + 0 \cdot 1 \end{pmatrix} = \begin{pmatrix} 4 & 5 \\ 22 & 10 \\ 5 & 1 \end{pmatrix}$$

Column by column:

$$\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix} = \begin{pmatrix} 5 \begin{pmatrix} 2 \\ 6 \\ 1 \end{pmatrix} - 2 \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix} & 1 \begin{pmatrix} 2 \\ 6 \\ 1 \end{pmatrix} + 1 \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix} \end{pmatrix}$$
Matrix multiplication code

c = zeros(m, n);                       % initialize the result to zero
for i = 1:m
  for j = 1:n
    for k = 1:s
      c(i,j) = c(i,j) + a(i,k)*b(k,j); % accumulate the inner product
    end
  end
end

Note: the loops may be permuted in 6 different ways!
How to measure the size of a vector?
Vector norms

The most common vector norms are:

- 1-norm: $\|x\|_1 = \sum_{i=1}^n |x_i|$
- Euclidean norm: $\|x\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$
- max-norm: $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$

All of the above are special cases of the $L_p$-norm (or $p$-norm):

$$\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$$
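In Matlab all three are available through the built-in norm function (a small sketch with a made-up vector):

x = [3; -4];
norm(x, 1)     % 1-norm: |3| + |-4| = 7
norm(x, 2)     % Euclidean norm: sqrt(9 + 16) = 5
norm(x, inf)   % max-norm: max(|3|, |-4|) = 4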
General definition of a vector norm

Generally, a vector norm is a mapping $\mathbb{R}^n \to \mathbb{R}$ with the properties:

- $\|x\| \ge 0$ for all $x$, and $\|x\| = 0$ if and only if $x = 0$,
- $\|\alpha x\| = |\alpha| \, \|x\|$ for all $\alpha \in \mathbb{R}$,
- $\|x + y\| \le \|x\| + \|y\|$, the triangle inequality.
How to measure distance between vectors?

Obvious answer: the distance between two vectors $x$ and $y$ is $\|x - y\|$, where $\|\cdot\|$ is some vector norm. Frequently one measures the distance by the Euclidean norm $\|x - y\|_2$. So usually, if the index is dropped, this is what is meant.

Alternative: use the angle between two vectors $x$ and $y$ to measure the distance between them. How to calculate the angle between two vectors?
Angle between vectors

The inner product of two vectors is defined by $(x, y) = x^T y$. It is associated with the Euclidean norm: $\|x\|_2 = (x^T x)^{1/2}$.

The angle $\theta$ between two vectors $x$ and $y$ satisfies

$$\cos \theta = \frac{x^T y}{\|x\|_2 \|y\|_2}.$$

The cosine of the angle between two vectors $x$ and $y$ can be used to measure the similarity between the two vectors: if $x$ and $y$ are close, the angle between them is small, and $\cos \theta \approx 1$. The vectors $x$ and $y$ are orthogonal if $\theta = \pi/2$, i.e. $x^T y = 0$.
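A minimal Matlab sketch of the cosine formula (the two vectors are made up for illustration):

x = [1; 1; 0];
y = [1; 0; 0];
costheta = (x'*y) / (norm(x)*norm(y));   % = 1/sqrt(2)
theta = acos(costheta)                   % = pi/4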
Why not just use the Euclidean distance?

[Figure: three vectors labeled a, b, and c.]
Example: term-document matrix

Each entry tells how many times a term appears in the document:

        Doc1  Doc2  Doc3
Term1    10     1     0
Term2    10     1     0
Term3     0     0     1

Using the Euclidean distance, Documents 1 and 2 look dissimilar, and Documents 2 and 3 look similar. This is just due to the length of the documents! Using the cosine of the angle between document vectors, Documents 1 and 2 are similar to each other and dissimilar to Document 3.
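The contrast can be checked directly in Matlab (a sketch using the matrix above):

A = [10 1 0;
     10 1 0;
      0 0 1];                   % columns = Doc1, Doc2, Doc3
norm(A(:,1) - A(:,2))           % approx. 12.7: Doc1 and Doc2 look far apart
norm(A(:,2) - A(:,3))           % approx. 1.7: Doc2 and Doc3 look close
A(:,1)'*A(:,2) / (norm(A(:,1))*norm(A(:,2)))   % cos = 1: same direction
A(:,2)'*A(:,3) / (norm(A(:,2))*norm(A(:,3)))   % cos = 0: orthogonal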
Eigenvalues and eigenvectors

Let $A$ be an $n \times n$ matrix. A nonzero vector $v$ that satisfies

$$Av = \lambda v$$

for some scalar $\lambda$ is called an eigenvector of $A$, and $\lambda$ is the eigenvalue corresponding to the eigenvector $v$.
In practice

$$Av = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix} v = \lambda v.$$

$$\det(A - \lambda I) = \begin{vmatrix} 2-\lambda & 1 \\ 1 & 3-\lambda \end{vmatrix} = (2-\lambda)(3-\lambda) - 1 = 0$$

$$\lambda_1 = 3.62, \quad \lambda_2 = 1.38$$

$$v_1 = \begin{pmatrix} 0.52 \\ 0.85 \end{pmatrix}, \quad v_2 = \begin{pmatrix} -0.85 \\ 0.52 \end{pmatrix}$$
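The same computation with Matlab's eig (a sketch; the order and signs of the computed eigenvectors may differ, since eigenvectors are only determined up to a scalar):

A = [2 1; 1 3];
[V, D] = eig(A)   % columns of V: eigenvectors; diag(D): eigenvalues
% diag(D) contains approx. 1.38 and 3.62, matching the hand computation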
Matrix norms

Let $\|\cdot\|$ be a vector norm and $A \in \mathbb{R}^{m \times n}$. The corresponding matrix norm is

$$\|A\| = \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|}.$$

- $\|A\|_2 = (\max_{1 \le i \le n} \lambda_i(A^T A))^{1/2}$ = square root of the largest eigenvalue of $A^T A$. Heavy to compute!
- $\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^n |a_{ij}|$ (maximum over rows)
- $\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^m |a_{ij}|$ (maximum over columns)
- $\|A\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2}$, the Frobenius norm: it does not correspond to any vector norm. Still, it is related to the Euclidean vector norm.
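All four norms are available through Matlab's norm function (a sketch with the example matrix from earlier slides):

A = [2 3; 6 4; 1 0];
norm(A, 2)       % spectral norm, via the eigenvalues of A'*A
norm(A, inf)     % maximum absolute row sum
norm(A, 1)       % maximum absolute column sum
norm(A, 'fro')   % Frobenius norm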
Linear Independence

Given a set of vectors $(v_j)_{j=1}^n$ in $\mathbb{R}^m$, $m \ge n$, consider the set of linear combinations $y = \sum_{j=1}^n \alpha_j v_j$ for arbitrary coefficients $\alpha_j$. The vectors $(v_j)_{j=1}^n$ are linearly independent if $\sum_{j=1}^n \alpha_j v_j = 0$ if and only if $\alpha_j = 0$ for all $j = 1, \ldots, n$.

A set of $m$ linearly independent vectors of $\mathbb{R}^m$ is called a basis in $\mathbb{R}^m$: any vector in $\mathbb{R}^m$ can be expressed as a linear combination of the basis vectors.
Example

The column vectors of the matrix

$$[v_1 \; v_2 \; v_3 \; v_4] = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$

are not linearly independent, as $\alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 + \alpha_4 v_4 = 0$ holds for $\alpha_1 = \alpha_3 = 1$, $\alpha_2 = \alpha_4 = -1$.
Rank of a matrix

The rank of a matrix is the maximum number of linearly independent column vectors.

A square matrix $A \in \mathbb{R}^{n \times n}$ with rank $n$ is called nonsingular, and it has an inverse $A^{-1}$ satisfying $AA^{-1} = A^{-1}A = I$.

The (outer product) matrix $xy^T$ has rank 1: all columns of $xy^T = (y_1 x \; y_2 x \; \cdots \; y_n x)$ are linearly dependent (and so are all the rows).
Example

The $4 \times 4$ matrix

$$\begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$

has rank 3.
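This is easy to check numerically (a sketch; the null vector comes from the dependency found in the previous example):

A = [1 0 0 1;
     1 1 0 0;
     0 1 1 0;
     0 0 1 1];
rank(A)              % = 3
A * [1; -1; 1; -1]   % = the zero vector: the columns are dependent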
Example

Consider an $m \times n$ term-document matrix $A = [a_1 \; a_2 \; \cdots \; a_n]$, where $a_j \in \mathbb{R}^m$ are the documents.

If $A$ has rank 3, then all the documents can be expressed as linear combinations of only three vectors $v_1$, $v_2$ and $v_3 \in \mathbb{R}^m$:

$$a_j = w_{1j} v_1 + w_{2j} v_2 + w_{3j} v_3, \quad j = 1, \ldots, n.$$

The term-document matrix can be written as $A = VW$, where $V = (v_1 \; v_2 \; v_3) \in \mathbb{R}^{m \times 3}$ and $W = (w_{ij}) \in \mathbb{R}^{3 \times n}$.
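A sketch of such a factorization with made-up random factors, only to illustrate the shapes involved (V and W here are hypothetical, not computed from any real data):

m = 5; n = 8;
V = rand(m, 3);    % three "concept" vectors
W = rand(3, n);    % weights of each document in the three concepts
A = V * W;
rank(A)            % = 3: every column of A is a combination of V's columns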
Condition number

For any $a \neq 1$ the matrix

$$A = \begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}$$

is nonsingular and has the inverse

$$A^{-1} = \frac{1}{a-1} \begin{pmatrix} a & -1 \\ -1 & 1 \end{pmatrix}.$$

As $a \to 1$, the norm of $A^{-1}$ tends to infinity. Nonsingularity is not always enough!

Define the condition number of a matrix to be $\kappa(A) = \|A\| \, \|A^{-1}\|$. A large condition number means trouble!
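A quick numerical illustration with Matlab's cond (a sketch; cond uses the 2-norm by default):

for a = [2, 1.1, 1.01, 1.001]
    A = [1 1; 1 a];
    cond(A)    % grows without bound as a approaches 1
end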
Orthogonality

Two vectors $x$ and $y$ are orthogonal if $x^T y = 0$.

Let $q_j$, $j = 1, \ldots, n$, be orthogonal, i.e. $q_i^T q_j = 0$ for $i \neq j$. Then they are linearly independent. (Proof?)

Let the set of orthogonal vectors $q_j$, $j = 1, \ldots, m$, in $\mathbb{R}^m$ be normalized, $\|q_j\|_2 = 1$. Then they are orthonormal, and constitute an orthonormal basis in $\mathbb{R}^m$.

A matrix $Q = [q_1 \; q_2 \; \cdots \; q_m] \in \mathbb{R}^{m \times m}$ with orthonormal columns is called an orthogonal matrix.
Why we like orthogonal matrices

- An orthogonal matrix $Q \in \mathbb{R}^{m \times m}$ has rank $m$ (since its columns are linearly independent).
- $Q^T Q = I$ and $QQ^T = I$. (Proofs?) Hence the inverse of an orthogonal matrix $Q$ is $Q^{-1} = Q^T$.
- The Euclidean length of a vector is invariant under an orthogonal transformation $Q$: $\|Qx\|_2^2 = (Qx)^T Qx = x^T x = \|x\|_2^2$.
- The product of two orthogonal matrices $P$ and $Q$ is orthogonal: for $X = PQ$, $X^T X = (PQ)^T PQ = Q^T P^T P Q = Q^T Q = I$.
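These properties are easy to verify numerically, e.g. on a plane rotation (a minimal sketch; the rotation matrix is a standard example, not taken from the slides):

theta = pi/6;
Q = [cos(theta) -sin(theta); sin(theta) cos(theta)];
Q'*Q                   % = identity (up to rounding), so inv(Q) = Q'
x = [3; 4];
norm(Q*x) - norm(x)    % = 0 (up to rounding): length is preserved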