Linear Algebra Methods for Data Mining
Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi
Spring 2007

2. Basic Linear Algebra continued

Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki
Happened so far:

- matrix-vector multiplication, matrix-matrix multiplication
- vector norms, matrix norms
- distances between vectors
- eigenvalues, eigenvectors
- linear independence
- basis
- orthogonality
Eigenvalues and eigenvectors

Let A be an n × n matrix. A vector v ≠ 0 that satisfies

Av = λv

for some scalar λ is called an eigenvector of A, and λ is the eigenvalue corresponding to the eigenvector v.
Example

Let A = [ 2 1 ; 1 3 ] and solve Av = λv. The eigenvalues satisfy

det(A − λI) = det [ 2−λ 1 ; 1 3−λ ] = (2 − λ)(3 − λ) − 1 = 0,

giving

λ1 = 3.62,  λ2 = 1.38,

with eigenvectors

v1 = [ 0.52 ; 0.85 ],  v2 = [ 0.85 ; −0.52 ].
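As a numerical check of the example above, here is a minimal sketch in Python/NumPy (the slides themselves use Matlab; either tool gives the same values):

```python
import numpy as np

# The symmetric 2x2 matrix from the example above.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh is intended for symmetric matrices; eigenvalues are returned in
# ascending order, eigenvectors as unit-norm columns (defined up to sign).
eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)   # approx [1.38, 3.62]
print(eigvecs)   # columns approx +/-[0.85, -0.52] and +/-[0.52, 0.85]
```

The exact eigenvalues are (5 ± √5)/2, which round to the 3.62 and 1.38 shown on the slide.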
Orthogonality

Two vectors x and y are orthogonal if x^T y = 0.

Let q_j, j = 1, ..., n, be orthogonal, i.e. q_i^T q_j = 0 for i ≠ j. Then they are linearly independent.

Let the set of orthogonal vectors q_j, j = 1, ..., m, in R^m be normalized, ‖q_j‖ = 1. Then they are orthonormal and constitute an orthonormal basis of R^m.

A matrix Q = [q1 q2 ... qm] in R^(m×m) with orthonormal columns is called an orthogonal matrix.
Example

In the previous example we determined the eigenvectors of the matrix [ 2 1 ; 1 3 ] to be v1 = [ 0.52 ; 0.85 ] and v2 = [ 0.85 ; −0.52 ]. The vectors v1 and v2 are orthogonal:

v1^T v2 = 0.52 · 0.85 + 0.85 · (−0.52) = 0.

This is no coincidence!
Eigenvalues and eigenvectors of a symmetric matrix

The eigenvectors of a symmetric matrix are mutually orthogonal, and its eigenvalues are real. (A in R^(n×n) is a symmetric matrix if A = A^T.)
Eigendecomposition

A symmetric matrix A in R^(n×n) can be written in the form

A = UΛU^T,

where the columns of U are the eigenvectors of A and Λ is a diagonal matrix whose diagonal elements are the corresponding eigenvalues of A. Note that U is orthogonal. This is called the eigendecomposition or symmetric Schur decomposition of A.
Check:

U = [.52 0.85; 0.85 -0.52]
U =
    0.5200    0.8500
    0.8500   -0.5200
Lambda = [3.62 0; 0 1.38]
Lambda =
    3.6200         0
         0    1.3800
U*Lambda*U'
ans =
    1.9759    0.9901
    0.9901    2.9886
Example of symmetric matrices: graphs

The adjacency matrix of an undirected graph is a symmetric matrix. For the graph with nodes 1, 2, 3, 4 and edges {1,2}, {1,3}, {1,4}, {2,4}, {3,4}:

A = [ 0 1 1 1
      1 0 0 1
      1 0 0 1
      1 1 1 0 ]
Example: virus propagation

Question 1: How does a virus spread across an arbitrary network?
Question 2: Will it create an epidemic?
Question 3: What can we do about it?
Model: the Susceptible-Infected-Susceptible (SIS) model

- cured nodes immediately become susceptible
- virus birth rate β: probability that an infected neighbor attacks
- virus death rate δ: probability that an infected node heals
- virus strength s = β/δ
(Diagram: in the SIS model a node is either Healthy or Infected, and transitions back and forth between the two states.)
Epidemic threshold τ

The epidemic threshold of a graph is the largest value of τ for which it holds that if the virus strength s = β/δ < τ, an epidemic cannot happen.

Problem: given a graph G, compute the epidemic threshold τ.

The epidemic threshold depends only on the graph! But on which properties of the graph?
Answer (to question 2)

Theorem (Wang et al., 2003). Let us have a graph with adjacency matrix A. We have no epidemic if

β/δ < τ = 1/λ_A,

where λ_A is the largest eigenvalue of A, β is the probability that an infected neighbor attacks, and δ is the probability that an infected node heals.
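The theorem can be tried out on the small graph from the adjacency-matrix example earlier; this NumPy sketch computes τ = 1/λ_A. The rates β and δ below are made-up illustrative values, not part of the theorem:

```python
import numpy as np

# Adjacency matrix of the 4-node example graph.
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 0]], dtype=float)

lam_max = np.linalg.eigvalsh(A)[-1]   # largest eigenvalue of the symmetric A
tau = 1.0 / lam_max                   # epidemic threshold of this graph

beta, delta = 0.1, 0.4                # hypothetical virus birth/death rates
print("no epidemic" if beta / delta < tau else "epidemic possible")
```

For this graph λ_A = (1 + √17)/2 ≈ 2.56, so τ ≈ 0.39, and a virus with strength s = 0.25 dies out.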
What can we do with this information?

Q: Who is the best person/computer to immunize against the virus?
A: The one whose removal makes the largest difference in λ_A.

Note: eigenvalues are strongly related to graph topology!
Other questions that can be answered in a similar way:

- who is the best customer to advertise to?
- who originated a raging rumor?
- in general: how important is a node in a network?
What if the matrix is not symmetric?

If A is symmetric, we can write A = UΛU^T, where U and Λ contain the eigenvectors and corresponding eigenvalues of A.

What if A is in R^(m×n)? Definitely not symmetric! Then we can use the singular value decomposition (SVD):

A = UΣV^T,

where U and V are orthogonal and Σ is diagonal.
Example:

customer \ day   Wed  Thu  Fri  Sat  Sun
ABC Inc.           1    1    1    0    0
CDE Co.            2    2    2    0    0
FGH Ltd.           1    1    1    0    0
NOP Inc.           5    5    5    0    0
Smith              0    0    0    2    2
Brown              0    0    0    3    3
Johnson            0    0    0    1    1
The SVD of this matrix is

[ 1 1 1 0 0 ]   [ 0.18  0    ]
[ 2 2 2 0 0 ]   [ 0.36  0    ]
[ 1 1 1 0 0 ]   [ 0.18  0    ]   [ 9.64  0    ]   [ 0.58 0.58 0.58 0    0    ]
[ 5 5 5 0 0 ] = [ 0.90  0    ] . [ 0     5.29 ] . [ 0    0    0    0.71 0.71 ]
[ 0 0 0 2 2 ]   [ 0     0.53 ]
[ 0 0 0 3 3 ]   [ 0     0.80 ]
[ 0 0 0 1 1 ]   [ 0     0.27 ]
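The decomposition above can be reproduced numerically; this NumPy sketch recovers the two nonzero singular values (the weekday and weekend "patterns"):

```python
import numpy as np

# The customer-by-day matrix from the example above.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

# Economy-size SVD; singular values come back in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s[:2])   # approx [9.64, 5.29]; the remaining singular values are zero
```

The exact values are √93 ≈ 9.64 and √28 ≈ 5.29; the rank of the matrix is 2.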
How to compute matrix decompositions?

We have had a brief glimpse at the eigenvalue decomposition and the singular value decomposition. Before taking a closer look: how can we compute these?

Answer 1: use LAPACK, Matlab, Mathematica, ...: they are implemented everywhere!

Answer 2: we need more linear algebra tools! Back to basics...
Givens (plane) rotations

G = (  c  s
      -s  c ),   c^2 + s^2 = 1.

theta=pi/8; c=cos(theta); s=sin(theta);
G = [c s; -s c]
G =
    0.9239    0.3827
   -0.3827    0.9239
G'*G
ans =
    1.0000    0.0000
    0.0000    1.0000
Givens (plane) rotations

G = (  c  s
      -s  c ),   c^2 + s^2 = 1.

We can choose c and s so that c = cos(θ), s = sin(θ) for some θ. Then multiplying a vector x by G rotates the vector in R^2 by the angle θ:
(Figure: the vector x and the rotated vector Gx, separated by the angle θ.)
Embed a 2-D rotation in a larger unit matrix:

G = [ 1  0  0  0
      0  c  0  s
      0  0  1  0
      0 -s  0  c ]
G =
    1.0000         0         0         0
         0    0.9239         0    0.3827
         0         0    1.0000         0
         0   -0.3827         0    0.9239
G'*G
ans =
    1.0000         0         0         0
         0    1.0000         0    0.0000
         0         0    1.0000         0
         0    0.0000         0    1.0000
Givens rotations can be used to zero elements in vectors and matrices.

Rotation matrix G = (  c  s
                      -s  c ),   c^2 + s^2 = 1.

Choose c (and s) so that Gv = ( α
                                0 ):
Choose c (and s) so that Gv = ( α
                                0 ):

c = v1 / sqrt(v1^2 + v2^2),   s = v2 / sqrt(v1^2 + v2^2).

(Check that this does what it is supposed to do!)
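The check suggested above is easy to carry out in code; this NumPy sketch applies the formulas for c and s to an example vector (the helper name `givens` is our own):

```python
import numpy as np

def givens(v1, v2):
    """Return (c, s) such that [[c, s], [-s, c]] @ [v1, v2] = [r, 0]."""
    r = np.hypot(v1, v2)        # sqrt(v1^2 + v2^2), computed robustly
    return v1 / r, v2 / r

v = np.array([3.0, 4.0])
c, s = givens(v[0], v[1])
G = np.array([[c, s], [-s, c]])
print(G @ v)   # -> [5., 0.]
```

The first component becomes α = ‖v‖ = 5 and the second is exactly zeroed, as claimed.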
We can choose c and s in

G = [ 1  0  0  0
      0  c  0  s
      0  0  1  0
      0 -s  0  c ]

so that we can zero element 4 in a vector by a rotation in the plane (2,4).
x=[1;2;3;4];
sq=sqrt(x(2)^2+x(4)^2);
c=x(2)/sq; s=x(4)/sq;
G=[1 0 0 0; 0 c 0 s; 0 0 1 0; 0 -s 0 c];
y=G*x
y =
    1.0000
    4.4721
    3.0000
         0
Reminder

The inverse of an orthogonal matrix Q is Q^(-1) = Q^T.

The Euclidean length of a vector is invariant under an orthogonal transformation Q:

‖Qx‖^2 = (Qx)^T Qx = x^T x = ‖x‖^2.

The product of two orthogonal matrices P and Q is orthogonal:

(PQ)^T (PQ) = Q^T P^T P Q = Q^T Q = I.
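These properties can be verified numerically for any concrete orthogonal matrix; here is a sketch using a Givens rotation as the example:

```python
import numpy as np

# A Givens rotation is orthogonal; check the properties listed above.
theta = np.pi / 8
c, s = np.cos(theta), np.sin(theta)
Q = np.array([[c, s], [-s, c]])

x = np.array([1.0, 2.0])
print(np.allclose(np.linalg.inv(Q), Q.T))                      # inverse = transpose
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))    # length preserved
```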
Transforming a vector v to αe1
Summarize:

αe1 = G3(G2(G1 v)) = (G3 G2 G1) v.

Denote P = G3 G2 G1. P is orthogonal, and Pv = αe1. Since P is orthogonal, Euclidean length is preserved, and

α = ‖v‖ = sqrt( v_1^2 + ... + v_n^2 ).
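The chaining of rotations above can be sketched in NumPy. The choice of rotation planes (1,k), k = n, ..., 2, is one convenient order, and the helper name `zero_with_givens` is our own:

```python
import numpy as np

def zero_with_givens(v):
    """Build P as a product of Givens rotations so that P @ v = ||v|| * e1."""
    v = np.array(v, dtype=float)
    n = len(v)
    P = np.eye(n)
    for k in range(n - 1, 0, -1):       # rotate in planes (1, k+1), 1-indexed
        r = np.hypot(v[0], v[k])
        if r == 0:
            continue
        c, s = v[0] / r, v[k] / r
        G = np.eye(n)
        G[0, 0], G[0, k], G[k, 0], G[k, k] = c, s, -s, c
        v = G @ v                       # component k is now zero
        P = G @ P
    return P, v

P, y = zero_with_givens([1.0, 2.0, 3.0, 4.0])
print(y)   # approx [5.477, 0, 0, 0], since ||v|| = sqrt(30)
```

P is orthogonal (a product of orthogonal matrices), so the first component of y is α = ‖v‖ = √30.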
Number of floating point operations

What is the number of flops needed to transform the first column of an m × n matrix to αe1, a multiple of the first unit vector?

The computation of

(  c  s   ( x     (  cx + sy
  -s  c )   y ) =   -sx + cy )

requires 4 multiplications and 2 additions, i.e. 6 flops. Applying such a transformation to an m × n matrix requires 6n flops. In order to zero all elements but one in the first column of the matrix, we must apply (m − 1) rotations.
Overall flop count: 6(m − 1)n ≈ 6mn. The so-called Householder transformation, with which one can do the same thing, is slightly cheaper: 4mn.
What was that all about?

We can use very simple orthogonal transformations (e.g. Givens rotations) to zero elements in a matrix. In principle, this is what is done when matrix decompositions are calculated.
Matrix decompositions

We wish to decompose the matrix A by writing it as a product of two or more matrices:

A_(m×n) = B_(m×k) C_(k×n),   A_(m×n) = B_(m×k) C_(k×r) D_(r×n).

This is done in such a way that the right-hand side of the equation yields some useful information or insight into the nature of the data matrix A, or is in other ways useful for solving the problem at hand.
Matrix decompositions

There are numerous examples of useful matrix decompositions:

- Eigendecomposition: A = UΛU^T, A symmetric, U and Λ eigenvectors and eigenvalues.
- Singular value decomposition: A = UΣV^T, U, V orthogonal, Σ diagonal.

Matrix factorization is the same thing as matrix decomposition (e.g. NMF = nonnegative matrix factorization, V = WH, all elements nonnegative).
How to compute these?

Roughly (attn: in reality there is more to it than this!):

(1) Manipulate A by multiplying it by intelligently chosen, fairly simple (orthogonal) matrices from both sides,

V_k ... V_2 V_1 A W_1 ... W_s = B,

until B is nice.

(2) Denote V = (V_k ... V_2 V_1)^T, W = W_1 ... W_s. Now A = VBW^T.

But how to choose V_1, ..., V_k and W_1, ..., W_s? Needed: linear algebra tools for transforming matrices in an orderly fashion: Givens rotations!
By applying several Givens rotations in succession, we can transform

x  →  ( α
        0
        ...
        0 ).
QR transformation

We transform A → Q^T A = R, where Q is orthogonal and R is upper triangular.

Example:
Note! Any matrix A in R^(m×n), m ≥ n, can be transformed to upper triangular form by an orthogonal matrix. If the columns of A are linearly independent, then R is nonsingular.
QR decomposition

A = Q ( R
        0 )
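As a concrete illustration, the QR decomposition of a small rectangular matrix can be computed directly; this NumPy sketch uses a made-up 3 × 2 matrix with linearly independent columns:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Economy-size QR: Q is 3x2 with orthonormal columns, R is 2x2 upper triangular.
Q, R = np.linalg.qr(A)
print(np.allclose(Q @ R, A))            # A is recovered exactly
print(np.allclose(Q.T @ Q, np.eye(2)))  # columns of Q are orthonormal
```

Internally such factorizations are computed with exactly the kind of orthogonal transformations (Givens or Householder) discussed above.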
References

[1] L. Eldén. Matrix Methods in Data Mining and Pattern Recognition. SIAM, 2007.
[2] G. H. Golub and C. F. Van Loan. Matrix Computations. 3rd ed. Johns Hopkins University Press, Baltimore, MD, 1996.
[3] Y. Wang, D. Chakrabarti, C. Wang and C. Faloutsos. Epidemic Spreading in Real Networks: An Eigenvalue Viewpoint. SRDS 2003, Florence, Italy.