Linear Algebra Methods for Data Mining
Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi
Spring 2007

2. Basic Linear Algebra continued

Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki
Happened so far:

- matrix-vector multiplication, matrix-matrix multiplication
- vector norms, matrix norms
- distances between vectors
- eigenvalues, eigenvectors
- linear independence
- basis
- orthogonality
Eigenvalues and eigenvectors

Let A be an n × n matrix. A vector v ≠ 0 that satisfies

Av = λv

for some scalar λ is called an eigenvector of A, and λ is the eigenvalue corresponding to the eigenvector v.
Example

Let A = [ 2 1 ; 1 3 ] and solve Av = λv. The eigenvalues satisfy

det(A − λI) = det [ 2−λ 1 ; 1 3−λ ] = (2 − λ)(3 − λ) − 1 = 0,

giving

λ1 = 3.62,  λ2 = 1.38,

with eigenvectors

v1 = [ 0.52 ; 0.85 ],  v2 = [ 0.85 ; −0.52 ].
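As a numerical check of the example above, here is a minimal sketch in Python/NumPy (the slides themselves use Matlab; either tool gives the same values):

```python
import numpy as np

# The symmetric 2x2 matrix from the example above.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh is intended for symmetric matrices; eigenvalues are returned in
# ascending order, eigenvectors as unit-norm columns (defined up to sign).
eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)   # approx [1.38, 3.62]
print(eigvecs)   # columns approx +/-[0.85, -0.52] and +/-[0.52, 0.85]
```

The exact eigenvalues are (5 ± √5)/2, which round to the 3.62 and 1.38 shown on the slide.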
Orthogonality

Two vectors x and y are orthogonal if x^T y = 0.

Let q_j, j = 1, ..., n, be orthogonal, i.e. q_i^T q_j = 0 for i ≠ j. Then they are linearly independent.

Let the set of orthogonal vectors q_j, j = 1, ..., m, in R^m be normalized, ‖q_j‖ = 1. Then they are orthonormal and constitute an orthonormal basis of R^m.

A matrix Q = [q1 q2 ... qm] in R^(m×m) with orthonormal columns is called an orthogonal matrix.
Example

In the previous example we determined the eigenvectors of the matrix [ 2 1 ; 1 3 ] to be v1 = [ 0.52 ; 0.85 ] and v2 = [ 0.85 ; −0.52 ]. The vectors v1 and v2 are orthogonal:

v1^T v2 = 0.52 · 0.85 + 0.85 · (−0.52) = 0.

This is no coincidence!
Eigenvalues and eigenvectors of a symmetric matrix

The eigenvectors of a symmetric matrix are mutually orthogonal, and its eigenvalues are real. (A in R^(n×n) is a symmetric matrix if A = A^T.)
Eigendecomposition

A symmetric matrix A in R^(n×n) can be written in the form

A = UΛU^T,

where the columns of U are the eigenvectors of A and Λ is a diagonal matrix whose diagonal elements are the corresponding eigenvalues of A. Note that U is orthogonal. This is called the eigendecomposition or symmetric Schur decomposition of A.
Check:

U = [.52 0.85; 0.85 -0.52]
U =
    0.5200    0.8500
    0.8500   -0.5200
Lambda = [3.62 0; 0 1.38]
Lambda =
    3.6200         0
         0    1.3800
U*Lambda*U'
ans =
    1.9759    0.9901
    0.9901    2.9886
Example of symmetric matrices: graphs

The adjacency matrix of an undirected graph is a symmetric matrix. For the graph with nodes 1, 2, 3, 4 and edges {1,2}, {1,3}, {1,4}, {2,4}, {3,4}:

A = [ 0 1 1 1
      1 0 0 1
      1 0 0 1
      1 1 1 0 ]
Example: virus propagation

Question 1: How does a virus spread across an arbitrary network?
Question 2: Will it create an epidemic?
Question 3: What can we do about it?
Model: the Susceptible-Infected-Susceptible (SIS) model

- cured nodes immediately become susceptible
- virus birth rate β: probability that an infected neighbor attacks
- virus death rate δ: probability that an infected node heals
- virus strength s = β/δ
(Diagram: in the SIS model a node is either Healthy or Infected, and transitions back and forth between the two states.)
Epidemic threshold τ

The epidemic threshold of a graph is the largest value of τ for which it holds that if the virus strength s = β/δ < τ, an epidemic cannot happen.

Problem: given a graph G, compute the epidemic threshold τ.

The epidemic threshold depends only on the graph! But on which properties of the graph?
Answer (to question 2)

Theorem (Wang et al., 2003). Let us have a graph with adjacency matrix A. We have no epidemic if

β/δ < τ = 1/λ_A,

where λ_A is the largest eigenvalue of A, β is the probability that an infected neighbor attacks, and δ is the probability that an infected node heals.
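The theorem can be tried out on the small graph from the adjacency-matrix example earlier; this NumPy sketch computes τ = 1/λ_A. The rates β and δ below are made-up illustrative values, not part of the theorem:

```python
import numpy as np

# Adjacency matrix of the 4-node example graph.
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 0]], dtype=float)

lam_max = np.linalg.eigvalsh(A)[-1]   # largest eigenvalue of the symmetric A
tau = 1.0 / lam_max                   # epidemic threshold of this graph

beta, delta = 0.1, 0.4                # hypothetical virus birth/death rates
print("no epidemic" if beta / delta < tau else "epidemic possible")
```

For this graph λ_A = (1 + √17)/2 ≈ 2.56, so τ ≈ 0.39, and a virus with strength s = 0.25 dies out.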
What can we do with this information?

Q: Who is the best person/computer to immunize against the virus?
A: The one whose removal makes the largest difference in λ_A.

Note: eigenvalues are strongly related to graph topology!
Other questions that can be answered in a similar way:

- who is the best customer to advertise to?
- who originated a raging rumor?
- in general: how important is a node in a network?
What if the matrix is not symmetric?

If A is symmetric, we can write A = UΛU^T, where U and Λ contain the eigenvectors and corresponding eigenvalues of A.

What if A is in R^(m×n)? Definitely not symmetric! Then we can use the singular value decomposition (SVD):

A = UΣV^T,

where U and V are orthogonal and Σ is diagonal.
Example:

customer \ day   Wed  Thu  Fri  Sat  Sun
ABC Inc.           1    1    1    0    0
CDE Co.            2    2    2    0    0
FGH Ltd.           1    1    1    0    0
NOP Inc.           5    5    5    0    0
Smith              0    0    0    2    2
Brown              0    0    0    3    3
Johnson            0    0    0    1    1
The SVD of this matrix is

[ 1 1 1 0 0 ]   [ 0.18  0    ]
[ 2 2 2 0 0 ]   [ 0.36  0    ]
[ 1 1 1 0 0 ]   [ 0.18  0    ]   [ 9.64  0    ]   [ 0.58 0.58 0.58 0    0    ]
[ 5 5 5 0 0 ] = [ 0.90  0    ] . [ 0     5.29 ] . [ 0    0    0    0.71 0.71 ]
[ 0 0 0 2 2 ]   [ 0     0.53 ]
[ 0 0 0 3 3 ]   [ 0     0.80 ]
[ 0 0 0 1 1 ]   [ 0     0.27 ]
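The decomposition above can be reproduced numerically; this NumPy sketch recovers the two nonzero singular values (the weekday and weekend "patterns"):

```python
import numpy as np

# The customer-by-day matrix from the example above.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

# Economy-size SVD; singular values come back in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s[:2])   # approx [9.64, 5.29]; the remaining singular values are zero
```

The exact values are √93 ≈ 9.64 and √28 ≈ 5.29; the rank of the matrix is 2.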
How to compute matrix decompositions?

We have had a brief glimpse at the eigenvalue decomposition and the singular value decomposition. Before taking a closer look: how can we compute these?

Answer 1: use LAPACK, Matlab, Mathematica, ...: they are implemented everywhere!

Answer 2: we need more linear algebra tools! Back to basics...
Givens (plane) rotations

G = (  c  s
      -s  c ),   c^2 + s^2 = 1.

theta=pi/8; c=cos(theta); s=sin(theta);
G = [c s; -s c]
G =
    0.9239    0.3827
   -0.3827    0.9239
G'*G
ans =
    1.0000    0.0000
    0.0000    1.0000
Givens (plane) rotations

G = (  c  s
      -s  c ),   c^2 + s^2 = 1.

We can choose c and s so that c = cos(θ), s = sin(θ) for some θ. Then multiplying a vector x by G rotates the vector in R^2 by the angle θ:
(Figure: the vector x and the rotated vector Gx, separated by the angle θ.)
Embed a 2-D rotation in a larger unit matrix:

G = [ 1  0  0  0
      0  c  0  s
      0  0  1  0
      0 -s  0  c ]
G =
    1.0000         0         0         0
         0    0.9239         0    0.3827
         0         0    1.0000         0
         0   -0.3827         0    0.9239
G'*G
ans =
    1.0000         0         0         0
         0    1.0000         0    0.0000
         0         0    1.0000         0
         0    0.0000         0    1.0000
Givens rotations can be used to zero elements in vectors and matrices.

Rotation matrix G = (  c  s
                      -s  c ),   c^2 + s^2 = 1.

Choose c (and s) so that Gv = ( α
                                0 ):
Choose c (and s) so that Gv = ( α
                                0 ):

c = v1 / sqrt(v1^2 + v2^2),   s = v2 / sqrt(v1^2 + v2^2).

(Check that this does what it is supposed to do!)
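The check suggested above is easy to carry out in code; this NumPy sketch applies the formulas for c and s to an example vector (the helper name `givens` is our own):

```python
import numpy as np

def givens(v1, v2):
    """Return (c, s) such that [[c, s], [-s, c]] @ [v1, v2] = [r, 0]."""
    r = np.hypot(v1, v2)        # sqrt(v1^2 + v2^2), computed robustly
    return v1 / r, v2 / r

v = np.array([3.0, 4.0])
c, s = givens(v[0], v[1])
G = np.array([[c, s], [-s, c]])
print(G @ v)   # -> [5., 0.]
```

The first component becomes α = ‖v‖ = 5 and the second is exactly zeroed, as claimed.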
We can choose c and s in

G = [ 1  0  0  0
      0  c  0  s
      0  0  1  0
      0 -s  0  c ]

so that we can zero element 4 in a vector by a rotation in the plane (2,4).
x=[1;2;3;4];
sq=sqrt(x(2)^2+x(4)^2);
c=x(2)/sq; s=x(4)/sq;
G=[1 0 0 0; 0 c 0 s; 0 0 1 0; 0 -s 0 c];
y=G*x
y =
    1.0000
    4.4721
    3.0000
         0
Reminder

The inverse of an orthogonal matrix Q is Q^(-1) = Q^T.

The Euclidean length of a vector is invariant under an orthogonal transformation Q:

‖Qx‖^2 = (Qx)^T Qx = x^T x = ‖x‖^2.

The product of two orthogonal matrices P and Q is orthogonal:

(PQ)^T (PQ) = Q^T P^T P Q = Q^T Q = I.
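These properties can be verified numerically for any concrete orthogonal matrix; here is a sketch using a Givens rotation as the example:

```python
import numpy as np

# A Givens rotation is orthogonal; check the properties listed above.
theta = np.pi / 8
c, s = np.cos(theta), np.sin(theta)
Q = np.array([[c, s], [-s, c]])

x = np.array([1.0, 2.0])
print(np.allclose(np.linalg.inv(Q), Q.T))                      # inverse = transpose
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))    # length preserved
```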
Transforming a vector v to αe1
Summarize:

αe1 = G3(G2(G1 v)) = (G3 G2 G1) v.

Denote P = G3 G2 G1. P is orthogonal, and Pv = αe1. Since P is orthogonal, Euclidean length is preserved, and

α = ‖v‖ = sqrt( v_1^2 + ... + v_n^2 ).
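The chaining of rotations above can be sketched in NumPy. The choice of rotation planes (1,k), k = n, ..., 2, is one convenient order, and the helper name `zero_with_givens` is our own:

```python
import numpy as np

def zero_with_givens(v):
    """Build P as a product of Givens rotations so that P @ v = ||v|| * e1."""
    v = np.array(v, dtype=float)
    n = len(v)
    P = np.eye(n)
    for k in range(n - 1, 0, -1):       # rotate in planes (1, k+1), 1-indexed
        r = np.hypot(v[0], v[k])
        if r == 0:
            continue
        c, s = v[0] / r, v[k] / r
        G = np.eye(n)
        G[0, 0], G[0, k], G[k, 0], G[k, k] = c, s, -s, c
        v = G @ v                       # component k is now zero
        P = G @ P
    return P, v

P, y = zero_with_givens([1.0, 2.0, 3.0, 4.0])
print(y)   # approx [5.477, 0, 0, 0], since ||v|| = sqrt(30)
```

P is orthogonal (a product of orthogonal matrices), so the first component of y is α = ‖v‖ = √30.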
Number of floating point operations

What is the number of flops needed to transform the first column of an m × n matrix to αe1, a multiple of the first unit vector?

The computation of

(  c  s   ( x     (  cx + sy
  -s  c )   y ) =   -sx + cy )

requires 4 multiplications and 2 additions, i.e. 6 flops. Applying such a transformation to an m × n matrix requires 6n flops. In order to zero all elements but one in the first column of the matrix, we must apply (m − 1) rotations.
Overall flop count: 6(m − 1)n ≈ 6mn. The so-called Householder transformation, with which one can do the same thing, is slightly cheaper: 4mn.
What was that all about?

We can use very simple orthogonal transformations (e.g. Givens rotations) to zero elements in a matrix. In principle, this is what is done when matrix decompositions are calculated.
Matrix decompositions

We wish to decompose the matrix A by writing it as a product of two or more matrices:

A_(m×n) = B_(m×k) C_(k×n),   A_(m×n) = B_(m×k) C_(k×r) D_(r×n).

This is done in such a way that the right-hand side of the equation yields some useful information or insight into the nature of the data matrix A, or is in other ways useful for solving the problem at hand.
Matrix decompositions

There are numerous examples of useful matrix decompositions:

- Eigendecomposition: A = UΛU^T, A symmetric, U and Λ eigenvectors and eigenvalues.
- Singular value decomposition: A = UΣV^T, U, V orthogonal, Σ diagonal.

Matrix factorization is the same thing as matrix decomposition (e.g. NMF = nonnegative matrix factorization, V = WH, all elements nonnegative).
How to compute these?

Roughly (attn: in reality there is more to it than this!):

(1) Manipulate A by multiplying it by intelligently chosen, fairly simple (orthogonal) matrices from both sides,

V_k ... V_2 V_1 A W_1 ... W_s = B,

until B is nice.

(2) Denote V = (V_k ... V_2 V_1)^T, W = W_1 ... W_s. Now A = VBW^T.

But how to choose V_1, ..., V_k and W_1, ..., W_s? Needed: linear algebra tools for transforming matrices in an orderly fashion: Givens rotations!
By applying several Givens rotations in succession, we can transform

x  →  ( α
        0
        ...
        0 ).
QR transformation

We transform A → Q^T A = R, where Q is orthogonal and R is upper triangular.

Example:
Note! Any matrix A in R^(m×n), m ≥ n, can be transformed to upper triangular form by an orthogonal matrix. If the columns of A are linearly independent, then R is nonsingular.
QR decomposition

A = Q ( R
        0 )
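As a concrete illustration, the QR decomposition of a small rectangular matrix can be computed directly; this NumPy sketch uses a made-up 3 × 2 matrix with linearly independent columns:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Economy-size QR: Q is 3x2 with orthonormal columns, R is 2x2 upper triangular.
Q, R = np.linalg.qr(A)
print(np.allclose(Q @ R, A))            # A is recovered exactly
print(np.allclose(Q.T @ Q, np.eye(2)))  # columns of Q are orthonormal
```

Internally such factorizations are computed with exactly the kind of orthogonal transformations (Givens or Householder) discussed above.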
References

[1] L. Eldén. Matrix Methods in Data Mining and Pattern Recognition. SIAM, 2007.
[2] G. H. Golub and C. F. Van Loan. Matrix Computations. 3rd ed. Johns Hopkins University Press, Baltimore, MD, 1996.
[3] Y. Wang, D. Chakrabarti, C. Wang and C. Faloutsos. Epidemic Spreading in Real Networks: An Eigenvalue Viewpoint. SRDS 2003, Florence, Italy.