Linear Algebra Methods for Data Mining



Saara Hyvönen, Spring 2007
PCA, NMF
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki

Summary: PCA
PCA is the SVD applied to centered data. PCA looks for the direction along which the projected data has maximal variance. It then seeks the next direction, orthogonal to all previously found directions, that explains as much of the remaining variance in the data as possible. The principal components are uncorrelated.

How to compute the PCA
Data matrix $A$: rows = data points, columns = variables (attributes, parameters).
1. Center the data by subtracting the mean of each column.
2. Compute the SVD of the centered matrix $\hat{A}$ (or the $k$ first singular values and vectors): $\hat{A} = U\Sigma V^T$.
3. The principal components are the columns of $V$; the coordinates of the data in the basis defined by the principal components are given by $U\Sigma$ (one row per data point).
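The three steps can be sketched in NumPy as follows. This is an illustration, not the course's own code: the function name `pca`, its return values and the random data are assumptions made for the example.

```python
import numpy as np

def pca(A, k=None):
    """PCA via the SVD of the column-centered data matrix (hypothetical helper).

    A: rows = data points, columns = variables.
    Returns V (principal components as columns), the coordinates U*Sigma of
    the data in the PC basis, and the singular values.
    """
    A = np.asarray(A, dtype=float)
    A_hat = A - A.mean(axis=0)                            # step 1: center each column
    U, s, Vt = np.linalg.svd(A_hat, full_matrices=False)  # step 2: A_hat = U S V^T
    if k is not None:                                     # optionally keep the k first
        U, s, Vt = U[:, :k], s[:k], Vt[:k, :]
    V = Vt.T                                              # step 3: PCs = columns of V
    coords = U * s                                        # U @ diag(s), one row per data point
    return V, coords, s

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))          # hypothetical data: 6 points, 3 variables
V, coords, s = pca(A)
print(np.allclose(coords, (A - A.mean(axis=0)) @ V))   # True: projections onto the PCs
print(np.allclose(coords.T @ coords, np.diag(s**2)))   # True: coordinates are uncorrelated
```

The coordinates $U\Sigma$ equal $\hat{A}V$, i.e. the centered data projected onto the principal components, which is why the two are used interchangeably.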

Singular values tell about variance
The variance in the direction of the $k$th principal component is proportional to the square of the corresponding singular value, $\sigma_k^2$; for $m$ data points the sample variance is $\sigma_k^2/(m-1)$. Singular values can therefore be used to estimate how many principal components to keep. Rule of thumb: keep enough components to explain 85% of the variation, i.e. choose the smallest $k$ with
$$\frac{\sum_{j=1}^{k} \sigma_j^2}{\sum_{j=1}^{n} \sigma_j^2} \geq 0.85.$$
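A minimal NumPy sketch of the 85% rule of thumb, plus a numerical check of the $\sigma_k^2/(m-1)$ statement. The helper name `components_to_keep` and the example singular values are hypothetical.

```python
import numpy as np

def components_to_keep(s, threshold=0.85):
    """Smallest k with sum(s[:k]**2) / sum(s**2) >= threshold (hypothetical helper)."""
    energy = np.asarray(s, dtype=float) ** 2
    ratio = np.cumsum(energy) / energy.sum()
    # First position where the cumulative ratio reaches the threshold (+1 = count).
    k = int(np.searchsorted(ratio, threshold)) + 1
    return min(k, len(energy)), ratio

k, ratio = components_to_keep([3.0, 2.0, 1.0, 0.5])
print(k)  # 2  (9/14.25 is about 0.63 < 0.85, 13/14.25 is about 0.91 >= 0.85)

# Check on hypothetical random data: the sample variance of the data projected
# onto the k-th principal component equals sigma_k^2 / (m - 1).
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 4))
A_hat = A - A.mean(axis=0)
U, s, Vt = np.linalg.svd(A_hat, full_matrices=False)
proj = A_hat @ Vt.T
print(np.allclose(proj.var(axis=0, ddof=1), s**2 / (A.shape[0] - 1)))  # True
```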

PCA is useful for data exploration, visualizing data, compressing data and outlier detection.

Example: customer data
Data matrix: rows = customers, columns = days.

Same data, rows and columns permuted.

Define the data matrix on the previous slide to be $A$. The rows of $A$ correspond to customers and the columns to days. Let us compute the principal components of $A$.

pcc: columns = principal components, rows = days.
scc: columns = coordinates of the customers in the PC basis, rows = customers.

[Figure: pcc1, the 1st principal component plotted over the days (mon to sun, twice), and scc1, the coordinates along it of the customers ABC Ltd, BCD Inc, CDECorp, DEF Ltd, EFG Inc, FGHCorp, GHI Ltd, HIJ Inc, Smith, Jones, Brown, Black, Blake, Lake and Mr. X.]
1st PC: weekdays vs. weekends. Result: weekday customers (companies) are separated from weekend customers (private citizens). Big customers end up at the extreme ends.

[Figure: pcc2, the 2nd principal component plotted over the days (mon to sun, twice), and scc2, the coordinates of the same customers along it.]
1st PC: weekdays vs. weekends. 2nd PC: weekends and weekdays have about equal total weight; most of the weight is on the exceptional Friday. Result: the 2nd PC separates big customers from small ones, and Mr. X gets separated from the other customers.

[Figure: customers plotted against the 1st and 2nd principal components: companies (blue), private (yellow), Mr. X (red).]

[Figure: customers plotted against the 1st and 3rd principal components: companies (blue), private (yellow), Mr. X (red).]

What if we transpose our problem?
Instead of thinking of customers as our data points, why not think of days as our data points and customers as the attributes (variables)? Now the rows of $A$ correspond to days and the columns to customers. Let us compute the principal components of $A$.
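Transposing is not merely a relabeling, because centering now subtracts each customer's mean instead of each day's mean. The NumPy sketch below illustrates this on a hypothetical random customers-by-days matrix; the helper `centered_singular_values` is an assumption for illustration.

```python
import numpy as np

def centered_singular_values(X):
    """Singular values of X after subtracting each column's mean (rows = data points)."""
    Xc = X - X.mean(axis=0)
    return np.linalg.svd(Xc, compute_uv=False)

rng = np.random.default_rng(2)
A = rng.poisson(5.0, size=(4, 7)).astype(float)   # hypothetical: 4 customers x 7 days

s_cust = centered_singular_values(A)     # customers as data points: center each day
s_days = centered_singular_values(A.T)   # days as data points: center each customer

# Without centering, A and A^T have the same singular values (U and V swap roles)...
print(np.allclose(np.linalg.svd(A, compute_uv=False),
                  np.linalg.svd(A.T, compute_uv=False)))   # True
# ...but centering along different axes gives two different problems in general.
print(s_cust)
print(s_days)
```

With 4 customers as data points, the centered rows sum to zero, so at most 3 singular values are nonzero; the transposed problem has no such restriction here.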

pcd: columns = principal components, rows = customers.
scd: columns = coordinates of the days in the PC basis, rows = days.

Let's just look at the coordinates of the new data, scd (rows = days):
1st column = projection along the 1st PC: weekdays vs. weekends.
2nd column = projection along the 2nd PC: Mr. X.
3rd column = projection along the 3rd PC: the exceptional Friday.
4th column: nothing left to explain, except differences between Monday and Tuesday...

Look at the singular values of the centered data matrix. By looking at these, you might already conclude that the first three principal components are enough to capture most of the variation in the data.

[Figure: days plotted against the 1st and 2nd principal components: weekdays (blue), weekends (green), exceptional Friday (red).]

SVD vs. PCA: centering is central
The SVD gives directions (lines) through the origin. Centering makes sure that the origin is in the middle of the data set.
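A small NumPy illustration of why centering matters, using a hypothetical 2-D point cloud that is spread along the x-axis but shifted far from the origin along the y-axis. The helper `first_direction` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([rng.normal(0.0, 3.0, 200),    # large spread in x
                     rng.normal(0.0, 0.5, 200)])   # small spread in y
X += np.array([0.0, 20.0])                          # move the cloud to around (0, 20)

def first_direction(M):
    """First right singular vector of M: a unit direction through the origin."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[0]

v_raw = first_direction(X)                    # SVD without centering
v_pca = first_direction(X - X.mean(axis=0))   # PCA = SVD of the centered data

print(np.abs(v_raw))  # close to [0, 1]: points from the origin towards the data
print(np.abs(v_pca))  # close to [1, 0]: direction of maximal spread
```

Without centering, the first singular vector mostly describes where the cloud sits relative to the origin, not how it varies.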


Matrix decompositions revisited
We wish to decompose the matrix $A$ by writing it as a product of two or more matrices:
$A_{m \times n} = B_{m \times k} C_{k \times n}$, $A_{m \times n} = B_{m \times k} C_{k \times r} D_{r \times n}$.
This is done in such a way that the right-hand side yields some useful information or insight into the nature of the data matrix $A$, or is in other ways useful for solving the problem at hand.
Examples: QR, SVD, PCA, NMF, factor analysis, ICA, CUR, MPCA, AB, ...
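As one concrete instance of $A_{m \times n} = B_{m \times k} C_{k \times n}$, the truncated SVD gives $B = U_k \Sigma_k$ and $C = V_k^T$. The NumPy sketch below uses hypothetical random data of exact rank $k$, so the product reproduces $A$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k = 8, 5, 2
A = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))   # hypothetical, exactly rank k

U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = U[:, :k] * s[:k]      # m x k: U_k Sigma_k
C = Vt[:k, :]             # k x n: V_k^T

print(B.shape, C.shape)        # (8, 2) (2, 5)
print(np.allclose(A, B @ C))   # True, because A has rank k
```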

NMF = Nonnegative Matrix Factorization
Given a nonnegative matrix $A \in \mathbb{R}^{m \times n}$, we wish to express the matrix as a product of two nonnegative matrices $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$: $A \approx WH$.

Why require nonnegativity?
Nonnegativity is natural in many applications: term-document data, market basket data, etc.

Example
Rows: customers, columns: products they buy.

Example continued
PC: the principal components. Each column corresponds to a product.

NMF factors $W$ and $H$: rows of $W$ correspond to customers, rows of $H^T$ (i.e. columns of $H$) correspond to products.

Example: Finnish dialects revisited
Data: dialect words, 500 counties. Word-county matrix $A$:
$$A(i,j) = \begin{cases} 1 & \text{if word } i \text{ appears in county } j, \\ 0 & \text{otherwise.} \end{cases}$$
Apply PCA to this: data points = words, variables = counties. Each principal component tells which counties explain the most significant part of the variation left in the data.
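A NumPy sketch of how such a 0/1 word-county matrix could be built from occurrence records and fed to PCA. The record format and all word and county names are hypothetical placeholders, not the real dialect data.

```python
import numpy as np

# Hypothetical (word, county) occurrence records.
occurrences = [("word_a", "county_1"), ("word_a", "county_3"),
               ("word_b", "county_2"),
               ("word_c", "county_1"), ("word_c", "county_2"), ("word_c", "county_3")]

words = sorted({w for w, _ in occurrences})
counties = sorted({c for _, c in occurrences})
row = {w: i for i, w in enumerate(words)}
col = {c: j for j, c in enumerate(counties)}

A = np.zeros((len(words), len(counties)))
for w, c in occurrences:
    A[row[w], col[c]] = 1.0              # A(i, j) = 1 if word i appears in county j

# PCA with words as data points and counties as variables: center each column;
# each row of Vt is then a principal component, i.e. one weight per county.
A_hat = A - A.mean(axis=0)
U, s, Vt = np.linalg.svd(A_hat, full_matrices=False)
print(A)
print(Vt[0])   # county weights of the first principal component
```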

[Figure: pca 1, nocomp 6; label: number of words]

[Figures: pca 2, nocomp 6; pca 3]

PCA gives a general idea of how dialects vary, but (in general) it does not capture local structure very well. What we would like is a decomposition where the components represent contributions of single dialects. NMF?

[Figures: nmf 1, nocomp 6; nmf 2]

[Figures: nmf 3, nocomp 6; nmf 4]

[Figures: nmf 5, nocomp 6; nmf 6]

Results
More local structure: the components correspond to dialect regions.
Interpretation: $A \approx WH$, where $W \in \mathbb{R}^{m \times k}$ is the word-per-dialect-region matrix and $H \in \mathbb{R}^{k \times n}$ is the dialect-region-per-county matrix.

How to compute NMF: multiplicative algorithm
    % epsilon: small positive constant that avoids division by zero
    W = rand(m,k); H = rand(k,n);
    for i = 1:maxiter
        H = H.*(W'*A)./(W'*W*H + epsilon);
        W = W.*(A*H')./(W*H*H' + epsilon);
    end
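For readers without MATLAB, here is a NumPy transcription of the multiplicative loop with the transposes written out explicitly. The function name, default `maxiter`, `epsilon` and seed, and the test matrix are assumptions chosen for illustration.

```python
import numpy as np

def nmf_multiplicative(A, k, maxiter=500, epsilon=1e-9, seed=0):
    """Multiplicative-update NMF (hypothetical helper); A must be nonnegative."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, k))                               # W = rand(m,k)
    H = rng.random((k, n))                               # H = rand(k,n)
    for _ in range(maxiter):
        H *= (W.T @ A) / (W.T @ W @ H + epsilon)         # H = H.*(W'*A)./(W'*W*H+epsilon)
        W *= (A @ H.T) / (W @ H @ H.T + epsilon)         # W = W.*(A*H')./(W*H*H'+epsilon)
    return W, H

rng = np.random.default_rng(5)
A = rng.random((20, 3)) @ rng.random((3, 15))   # hypothetical nonnegative data
W, H = nmf_multiplicative(A, k=3)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(W.min() >= 0, H.min() >= 0, rel_err)
```

Each update multiplies by a ratio of nonnegative quantities, so nonnegativity is preserved without any explicit projection.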

Comments on the multiplicative algorithm
Easy to implement. Convergence? It can be slow, and convergence to a (local) minimum is not guaranteed. Once an element of $W$ or $H$ is zero, it stays zero.
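The "once zero, always zero" property is easy to check numerically. In this sketch (NumPy, hypothetical random data, hypothetical helper `multiplicative_step`), one element of $W$ and one of $H$ are forced to zero before running the updates.

```python
import numpy as np

def multiplicative_step(A, W, H, epsilon=1e-9):
    """One sweep of the multiplicative updates (same formulas as on the slides)."""
    H = H * (W.T @ A) / (W.T @ W @ H + epsilon)
    W = W * (A @ H.T) / (W @ H @ H.T + epsilon)
    return W, H

rng = np.random.default_rng(6)
A = rng.random((6, 5))                      # hypothetical nonnegative data
W, H = rng.random((6, 2)), rng.random((2, 5))
W[0, 1] = 0.0                               # force one element of W to zero
H[1, 3] = 0.0                               # ... and one element of H

for _ in range(100):
    W, H = multiplicative_step(A, W, H)

# Each update multiplies the old value by a finite nonnegative factor,
# so a zero entry can never become positive again.
print(W[0, 1], H[1, 3])   # 0.0 0.0
```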

How to compute NMF: ALS (alternating least squares)
    W = rand(m,k);
    for i = 1:maxiter
        H = (W'*W) \ (W'*A);      % solve W'W H = W'A for H
        H(H < 0) = 0;             % set all negative elements of H to 0
        W = ((H*H') \ (H*A'))';   % solve H H' W' = H A' for W
        W(W < 0) = 0;             % set all negative elements of W to 0
    end
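A NumPy sketch of the same ALS loop. Instead of forming the normal equations $W^TWH = W^TA$ and $HH^TW^T = HA^T$, it calls `np.linalg.lstsq`, which minimizes the same least-squares objective but avoids squaring the condition number and tolerates rank-deficient factors; this substitution, the function name and the defaults are assumptions.

```python
import numpy as np

def nmf_als(A, k, maxiter=100, seed=0):
    """Basic ALS for NMF (hypothetical helper): least-squares solve, then zero negatives."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, k))                                # W = rand(m,k)
    for _ in range(maxiter):
        H = np.linalg.lstsq(W, A, rcond=None)[0]          # min ||A - W H|| over H
        H[H < 0] = 0.0                                    # set negative elements of H to 0
        W = np.linalg.lstsq(H.T, A.T, rcond=None)[0].T    # min ||A^T - H^T W^T|| over W
        W[W < 0] = 0.0                                    # set negative elements of W to 0
    return W, H

rng = np.random.default_rng(5)
A = rng.random((20, 3)) @ rng.random((3, 15))   # hypothetical nonnegative data
W, H = nmf_als(A, k=3)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(W.min() >= 0, H.min() >= 0, rel_err)
```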

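Comments on the ALS algorithm
Can be very fast (depending on the implementation). Convergence? Sparsity: the projection step (setting negative elements to zero) produces exact zeros, but unlike in the multiplicative algorithm, a zeroed element can become positive again in the next least-squares solve. Improved versions exist.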

Uniqueness of NMF
NMF is not unique: let $D$ be a diagonal matrix with positive diagonal entries. Then $WH = (WD)(D^{-1}H)$, and both $WD$ and $D^{-1}H$ are still nonnegative.
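A quick numerical check of the scaling ambiguity, together with the analogous ambiguity under a permutation matrix $P$ (which just reorders the components, since $P^{-1} = P^T$). NumPy, with hypothetical random factors.

```python
import numpy as np

rng = np.random.default_rng(7)
W = rng.random((5, 3))                 # hypothetical nonnegative factors
H = rng.random((3, 4))

D = np.diag([2.0, 0.5, 10.0])          # positive diagonal scaling
W2, H2 = W @ D, np.linalg.inv(D) @ H   # D^{-1} = diag(1/2, 2, 1/10) is also positive

P = np.eye(3)[[2, 0, 1]]               # a permutation matrix: P^{-1} = P^T
W3, H3 = W @ P, P.T @ H

print(np.allclose(W @ H, W2 @ H2), np.allclose(W @ H, W3 @ H3))  # True True
print(min(W2.min(), H2.min(), W3.min(), H3.min()) >= 0)           # True
```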

Initialization
Convergence can be slow. It can be sped up by using a good initial guess. A good initialization can be found using the SVD (see p. 106).
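The slide does not spell out the SVD-based initialization. As one well-known scheme (not necessarily the one on the referenced page), here is a NumPy sketch of NNDSVD (nonnegative double SVD, due to Boutsidis and Gallopoulos). The function name `nndsvd_init` is hypothetical, and `k` is assumed not to exceed `min(m, n)`.

```python
import numpy as np

def nndsvd_init(A, k):
    """NNDSVD-style initial W (m x k) and H (k x n) for a nonnegative A (sketch)."""
    A = np.asarray(A, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    m, n = A.shape
    W, H = np.zeros((m, k)), np.zeros((k, n))
    # Leading singular vectors of a nonnegative matrix can be taken nonnegative.
    W[:, 0] = np.sqrt(s[0]) * np.abs(U[:, 0])
    H[0, :] = np.sqrt(s[0]) * np.abs(Vt[0, :])
    for j in range(1, k):
        x, y = U[:, j], Vt[j, :]
        xp, xn = np.maximum(x, 0), np.maximum(-x, 0)      # positive / negative parts
        yp, yn = np.maximum(y, 0), np.maximum(-y, 0)
        mp = np.linalg.norm(xp) * np.linalg.norm(yp)
        mn = np.linalg.norm(xn) * np.linalg.norm(yn)
        if max(mp, mn) == 0:
            continue                                      # leave this component zero
        if mp >= mn:                                      # keep the dominant sign pair
            u, v, sigma = xp / np.linalg.norm(xp), yp / np.linalg.norm(yp), mp
        else:
            u, v, sigma = xn / np.linalg.norm(xn), yn / np.linalg.norm(yn), mn
        W[:, j] = np.sqrt(s[j] * sigma) * u
        H[j, :] = np.sqrt(s[j] * sigma) * v
    return W, H

rng = np.random.default_rng(8)
A = rng.random((10, 8))                 # hypothetical nonnegative data
W0, H0 = nndsvd_init(A, 3)
print(W0.shape, H0.shape)               # (10, 3) (3, 8)
```

NNDSVD can produce exact zeros, and zeros never change under the multiplicative updates, so when the two are combined those entries stay frozen; variants of NNDSVD replace the zeros with small positive values for this reason.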

Summary
Given a nonnegative matrix $A \in \mathbb{R}^{m \times n}$, find nonnegative matrices $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$ so that $\|A - WH\|$ is minimized.
Algorithms exist, both basic ones (easy to implement) and more advanced ones (implementing e.g. sparsity constraints).
Interpretability.

Applications
Text mining, surveillance, music transcription, bioinformatics, source separation, spatial data analysis, etc.

References
[1] Lars Eldén: Matrix Methods in Data Mining and Pattern Recognition, SIAM.
[2] Berry, Browne, Langville, Pauca, Plemmons: Algorithms and Applications for Approximate Nonnegative Matrix Factorization.
