Introduction to Data Mining


1 Introduction to Data Mining Lecture #21: Dimensionality Reduction Seoul National University 1

2 In This Lecture Understand the motivation and applications of dimensionality reduction Learn the definition and properties of SVD, one of the most important tools in data mining Learn how to interpret the results of SVD, and how to use it for dimensionality reduction 2

3 Outline Overview Dim. Reduction with SVD 3

4 Dimensionality Reduction Assumption: Data lies on or near a low d-dimensional subspace Axes of this subspace are an effective representation of the data 4

5 Dimensionality Reduction Compress / reduce dimensionality: 10^6 rows; 10^3 columns; no updates Random access to any cell(s); small error: OK The above matrix is really 2-dimensional: all rows can be reconstructed by scaling one of two basis vectors 5

6 Rank of a Matrix Q: What is the rank of a matrix A? A: The number of linearly independent columns of A For example: the matrix A = [[1 2 1], [-2 -3 1], [3 5 0]] has rank r=2 Why? Why do we care about low rank? We can write the rows of A in terms of two basis vectors: [1 2 1], [-2 -3 1] And the new coordinates of the rows are: [1 0], [0 1], [1 -1] 6
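The rank argument is easy to check numerically. Below is a minimal NumPy sketch (not part of the slides; the third row [3 5 0] follows from the coordinates [1 -1] given above):

    import numpy as np

    # The rank-2 matrix from the slide: every row is a combination of two basis vectors.
    A = np.array([[ 1,  2, 1],
                  [-2, -3, 1],
                  [ 3,  5, 0]])
    print(np.linalg.matrix_rank(A))              # 2

    B = np.array([[1, 2, 1], [-2, -3, 1]])       # new basis vectors, as rows
    coords = np.array([[1, 0], [0, 1], [1, -1]]) # new coordinates of each row of A
    print(np.allclose(coords @ B, A))            # True: coordinates x basis rebuilds A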

7 Rank is Dimensionality Cloud of points in 3D space: Think of point positions as a matrix: 1 row per point (rows A, B, C) We can rewrite the coordinates more efficiently! Old basis vectors: [1 0 0], [0 1 0], [0 0 1] New basis vectors: [1 2 1], [-2 -3 1] Then A has new coordinates [1 0], B: [0 1], C: [1 -1] Notice: We reduced the number of coordinates! 7

8 Dimensionality Reduction The goal of dimensionality reduction is to discover the axis of the data! Rather than representing every point with 2 coordinates we represent each point with 1 coordinate (corresponding to the position of the point on the red line). By doing this we incur a bit of error, as the points do not exactly lie on the line 8

9 Why Reduce Dimensions? Why reduce dimensions? Discover hidden correlations/topics Words that occur commonly together Remove redundant and noisy features Not all words are useful Interpretation and visualization Easier storage and processing of the data 9

10 SVD (Singular Value Decomposition) A [m×n] = U [m×r] Σ [r×r] (V [n×r])^T A: Input data matrix m×n matrix (e.g., m documents, n terms) U: Left singular vectors m×r matrix (m documents, r concepts) Σ: Singular values r×r diagonal matrix (strength of each concept ) (r: rank of the matrix A) V: Right singular vectors n×r matrix (n terms, r concepts) 10
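The shapes above map directly onto NumPy's SVD routine. A minimal sketch (not from the slides; note that NumPy returns V^T rather than V, and uses r = min(m, n) rather than the rank):

    import numpy as np

    m, n = 6, 4
    A = np.random.rand(m, n)                    # stand-in for a document-term matrix

    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # "economy" SVD
    print(U.shape, s.shape, Vt.shape)           # (6, 4) (4,) (4, 4)
    print(np.allclose(U @ np.diag(s) @ Vt, A))  # True: A = U Sigma V^T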

11 SVD (Singular Value Decomposition) [Diagram: A (m×n) decomposed as U (m×r) times Σ (r×r) times V^T (r×n)] 11

12 SVD (Singular Value Decomposition) A = σ1 u1 v1^T + σ2 u2 v2^T +... Also called spectral decomposition σ_i... scalar u_i... vector v_i... vector 12
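The spectral form is just a sum of rank-1 matrices, which is easy to verify. A small sketch under the same assumptions as the previous one:

    import numpy as np

    A = np.random.rand(5, 3)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Sum the rank-1 terms sigma_i * u_i * v_i^T; this rebuilds A exactly.
    A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
    print(np.allclose(A_rebuilt, A))            # True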

13 SVD - Properties It is always possible to decompose a real matrix A into A = U Σ V^T, where U, Σ, V: unique U, V: column orthonormal U^T U = I; V^T V = I (I: identity matrix) (Columns are orthogonal unit vectors) Σ: diagonal Entries (singular values) are positive, and sorted in decreasing order (σ1 ≥ σ2 ≥... ≥ 0) 13

14 SVD - Example A = U Λ V^T - example: A is a document-to-term matrix (rows: CS and MD documents; columns: the terms data, inf. retrieval, brain, lung) [numeric matrices shown on the slide] 14

15 SVD - Example A = U Λ V^T - example: the decomposition reveals a CS-concept and an MD-concept [numeric matrices shown on the slide] 15

16 SVD - Example A = U Λ V^T - example: U is the doc-to-concept similarity matrix [numeric matrices shown on the slide] 16

17 SVD - Example A = U Λ V^T - example: the diagonal entries of Λ give the strength of each concept (e.g., the strength of the CS-concept) [numeric matrices shown on the slide] 17

18 SVD - Example A = U Λ V^T - example: V^T is the term-to-concept similarity matrix (first row: CS-concept) [numeric matrices shown on the slide] 18

19 SVD - Example A = U Λ V^T - example: V^T is the term-to-concept similarity matrix (continued) [numeric matrices shown on the slide] 19

20 Outline Overview Dim. Reduction with SVD 20

21 SVD - Interpretation #1 documents, terms and concepts : U: document-to-concept similarity matrix V: term-to-concept similarity matrix Λ: its diagonal elements give the strength of each concept 21

22 SVD Interpretation #1 documents, terms and concepts : Q: if A is the document-to-term matrix, what is A^T A? A: Q: A A^T? A: 22

23 SVD Interpretation #1 documents, terms and concepts : Q: if A is the document-to-term matrix, what is A^T A? A: the term-to-term ([n×n]) similarity matrix Q: A A^T? A: the document-to-document ([m×m]) similarity matrix 23
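These identities follow from A = U Σ V^T: A^T A = V Σ^2 V^T and A A^T = U Σ^2 U^T. A quick numerical check (a sketch, not from the slides):

    import numpy as np

    m, n = 6, 4                                 # m documents, n terms
    A = np.random.rand(m, n)
    print((A.T @ A).shape)                      # (4, 4): term-to-term similarities
    print((A @ A.T).shape)                      # (6, 6): document-to-document similarities

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(np.allclose(A.T @ A, Vt.T @ np.diag(s**2) @ Vt))  # True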

24 SVD - Interpretation #2 best axis to project on: ( best = min sum of squares of projection errors) 24

25 SVD - Motivation 25

26 SVD - Interpretation #2 SVD gives the best axis to project on: the first singular vector v1 (minimum RMS error) 26

27 SVD - Interpretation #2 27

28 SVD - Interpretation #2 A = U Λ V^T - example: [scatter plot of the data points with the first singular vector v1 drawn through them] 28

29 SVD - Interpretation #2 A = U Λ V^T - example: variance ( spread ) on the v1 axis [annotated on the figure] 29

30 SVD - Interpretation #2 A = U Λ V^T - example: U Λ gives the coordinates of the points in the projection axis 30

31 SVD - Interpretation #2 More details Q: how exactly is dim. reduction done? 31

32 SVD - Interpretation #2 More details Q: how exactly is dim. reduction done? A: set the smallest singular values to zero: 32

33-36 SVD - Interpretation #2 [slides 33 through 36 step through the numeric example: setting the smallest singular values to zero yields a lower-rank product U Λ V^T that closely approximates the original matrix A] 36

37 SVD Best Low Rank Approx. A = U Σ V^T B = U S V^T is the best approximation of A, where S keeps only the largest singular values of Σ 37

38 SVD Best Low Rank Approx. Theorem: Let A = U Σ V^T and B = U S V^T where S = diagonal r×r matrix with s_i = σ_i (i = 1...k) else s_i = 0 then B is a best rank(B)=k approximation to A What do we mean by best : B is a solution to min_B ǁA-Bǁ_F where rank(B)=k, with ǁA-Bǁ_F = sqrt( Σ_ij (A_ij - B_ij)^2 ) 38
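The theorem can be sanity-checked numerically: zero out all but the top k singular values and the Frobenius error comes out to sqrt(σ_{k+1}^2 +... + σ_r^2), exactly the dropped energy. A minimal sketch (not part of the slides):

    import numpy as np

    A = np.random.rand(8, 5)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 2
    s_trunc = s.copy()
    s_trunc[k:] = 0.0                           # set the smallest singular values to zero
    B = U @ np.diag(s_trunc) @ Vt               # best rank-k approximation of A

    err = np.linalg.norm(A - B, 'fro')
    print(np.isclose(err, np.sqrt(np.sum(s[k:]**2))))  # True: error = dropped energy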

39 SVD - Interpretation #2 Equivalent: spectral decomposition of the matrix: A = [u1 u2] diag(σ1, σ2) [v1 v2]^T 39

40 SVD - Interpretation #2 Equivalent: spectral decomposition of the matrix A = σ1 u1 v1^T + σ2 u2 v2^T +... (k terms) Assume: σ1 ≥ σ2 ≥... Why is setting small σ_i to 0 the right thing to do? Vectors u_i and v_i are unit length, so σ_i scales them. So, zeroing small σ_i introduces less error. 40

41 SVD - Interpretation #2 Q: How many σs to keep? A: Rule of thumb: keep 80-90% of the energy = Σ_i σ_i^2 (computed over A = σ1 u1 v1^T + σ2 u2 v2^T +..., with σ1 ≥ σ2 ≥...) 41
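The 80-90% rule is a one-liner over the squared singular values. A hedged sketch; choose_k is a hypothetical helper name, not from the slides:

    import numpy as np

    def choose_k(s, energy=0.9):
        # Smallest k such that the top k singular values retain the given
        # fraction of the total energy, where energy = sum_i sigma_i^2.
        cum = np.cumsum(s**2) / np.sum(s**2)
        return int(np.searchsorted(cum, energy)) + 1

    s = np.array([10.0, 5.0, 1.0, 0.5])
    print(choose_k(s))                          # 2: the top two values carry ~99% of the energy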

42 SVD - Complexity To compute the SVD of an n×m matrix: O(nm^2) or O(n^2 m) (whichever is less) But: less work if we just want the singular values, or if we want the first k singular vectors, or if the matrix is sparse Implemented in linear algebra packages like LINPACK, Matlab, SPlus, Mathematica... 42
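When only the top k singular triplets are needed (the common case after truncation), sparse iterative solvers avoid the full O(nm^2) cost. A sketch assuming SciPy is available:

    import numpy as np
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import svds

    A = sparse_random(1000, 500, density=0.01, format='csr', random_state=0)

    # Computes only the k largest singular triplets; A is never densified.
    U, s, Vt = svds(A, k=10)
    print(s[::-1])                              # svds returns sigma in ascending order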

43 What You Need to Know SVD: A = U Σ V^T : unique U: document-to-concept similarities V: term-to-concept similarities Σ: strength of each concept Dimensionality reduction by SVD: keep the few largest singular values (80-90% of the energy ) 43

44 Questions? 44
