Matrix Decomposition in Privacy-Preserving Data Mining JUN ZHANG DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF KENTUCKY

1 Matrix Decomposition in Privacy-Preserving Data Mining JUN ZHANG DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF KENTUCKY

2 OUTLINE Why We Need Matrix Decomposition SVD (Singular Value Decomposition) NMF (Nonnegative Matrix Factorization) Applications in Privacy-Preserving Data Mining

3 A TYPICAL TERM-BY-DOCUMENT MATRIX 1. All entries are nonnegative 2. Most entries are zeros 3. Large dimensions 4. Disorganized 5. Lots of noise

4 A SUPERMARKET TRANSACTION MATRIX 1. All entries are nonnegative 2. Most entries are zeros 3. Large dimensions 4. Disorganized 5. Lots of noise

5 WHY DO WE NEED MATRIX DECOMPOSITION? Compact representation of data in matrix form: original matrix = product of factor matrices. Original matrix: sparse, unordered. Factor matrices: compact, ordered. Hidden relationships in the data (e.g., orthogonality, correlation) become easy to find.

6 COMPACT REPRESENTATION OF ORIGINAL DATA Column clustering = x x Row clustering

7 REDUCE 2-D DATA TO 1-D DATA (figure: 2-D data projected to 1-D data). Reference: Faloutsos et al., Large Graph Mining, KDD'09

8 OUTLINE Why We Need Matrix Decomposition? SVD (Singular Value Decomposition) NMF (Nonnegative Matrix Factorization) Applications in Privacy-Preserving Data Mining

9 SINGULAR VALUE DECOMPOSITION (SVD) $A_{[n \times m]} = U_{[n \times r]} \, \Sigma_{[r \times r]} \, (V_{[m \times r]})^T$. A: n x m matrix (e.g., n documents x m words, or n pages x m links). U: n x r matrix (e.g., n documents, r topics). $\Sigma$: r x r diagonal matrix (strength of each topic), where r is the rank of A; the diagonal matrix is sometimes denoted by a different symbol. V: m x r matrix (e.g., m words, r topics).

10 SVD $A = U \Sigma V^T$: example

11 Gene H. Golub (February 29, 1932 - November 16, 2007), American mathematician and computer scientist

12 SVD - PROPERTIES Theorem [Press, 92]: Any numerical matrix A can be decomposed in the form $A = U \Sigma V^T$, where: U, $\Sigma$, V are unique (*); U and V are column-orthonormal (i.e., the columns of U and of V have unit norm and are mutually orthogonal), so $U^T U = I$ and $V^T V = I$ (I: identity matrix); $\Sigma$ is a diagonal matrix whose diagonal entries are nonnegative and sorted in descending order. (*) Uniqueness holds up to sign flips of paired singular vectors when the singular values are distinct.
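
The properties listed on this slide can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix `A` is a made-up example and is not taken from the slides.

```python
import numpy as np

# Hypothetical 4 x 3 data matrix (values chosen only for illustration).
A = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [0.0, 1.0, 2.0]])

# Thin SVD: U is 4 x 3, s holds the singular values, Vt is V^T (3 x 3).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Column orthonormality: U^T U = I and V^T V = I.
orthonormal_U = np.allclose(U.T @ U, np.eye(U.shape[1]))
orthonormal_V = np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))

# Singular values are nonnegative and sorted in descending order.
nonnegative = bool(np.all(s >= 0))
descending = bool(np.all(s[:-1] >= s[1:]))

# The three factors multiply back to A.
reconstructs = np.allclose(U @ np.diag(s) @ Vt, A)

print(orthonormal_U, orthonormal_V, nonnegative, descending, reconstructs)
```

Note that `np.linalg.svd` returns $V^T$ rather than V, and returns the singular values as a vector rather than as the diagonal matrix $\Sigma$.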

13 SVD EXAMPLE $A = U \Sigma V^T$. Example: documents from two groups, Eng and Med (rows), by the terms data, inf., retrieval, brain, lung (columns), factored as $U \times \Sigma \times V^T$.

14 SVD EXAMPLE $A = U \Sigma V^T$. Example (same matrix): the factors separate an Eng topic and a Med topic.

15 SVD EXAMPLE $A = U \Sigma V^T$. Example (same matrix): U is the document-to-topic similarity matrix. (Source: Faloutsos, Miller, Tsourakakis, KDD'09)

16 SVD EXAMPLE $A = U \Sigma V^T$. Example (same matrix): the first diagonal entry of $\Sigma$ is the strength of the Eng topic.

17 SVD EXAMPLE $A = U \Sigma V^T$. Example (same matrix): V is the word-to-topic similarity matrix.

18 SVD PROPERTIES Documents, words, and concepts/topics: U: document-to-topic similarity matrix. V: word-to-topic similarity matrix. $\Sigma$: strength of each topic.

19 SVD PROPERTIES Documents, words, and topics. Q: If A is a document-to-word similarity matrix, what can be said about $A^T A$? Q: How about $A A^T$?

20 SVD PROPERTIES Documents, words, and topics. Q: If A is a document-to-word similarity matrix, what can be said about $A^T A$? A: It is a word-to-word similarity matrix. Q: How about $A A^T$? A: It is a document-to-document similarity matrix.
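
A small numeric illustration of the two answers, as a sketch only: the document-by-word counts below are invented for the example and do not come from the slides.

```python
import numpy as np

# Hypothetical document-by-word count matrix: 3 documents (rows), 4 words (columns).
A = np.array([[2, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 3, 1]])

# Entry (i, j) is the inner product of word columns i and j.
word_sim = A.T @ A   # 4 x 4, word-to-word

# Entry (i, j) is the inner product of document rows i and j.
doc_sim = A @ A.T    # 3 x 3, document-to-document

print(word_sim)
print(doc_sim)
```

Documents 0 and 1 share words, so their entry in `doc_sim` is positive, while documents 0 and 2 share no words, so their entry is zero.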

21 PROPERTIES OF SVD The columns of V are the eigenvectors of $A^T A$, which is (up to scaling) the covariance matrix of the columns of A when the columns are mean-centered.

23 PROPERTIES OF SVD The columns of U are the eigenvectors of the inner-product matrix $A A^T$.
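
Both eigenvector properties can be verified directly, since $A^T A \, v_i = \sigma_i^2 v_i$ and $A A^T u_i = \sigma_i^2 u_i$. A minimal sketch, assuming NumPy and a randomly generated matrix (the data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))            # hypothetical 6 x 4 data matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# (A^T A) v_i = sigma_i^2 v_i for every column v_i of V.
# V * s**2 scales column i of V by sigma_i^2.
right_ok = np.allclose(A.T @ A @ V, V * s**2)

# (A A^T) u_i = sigma_i^2 u_i for every column u_i of U.
left_ok = np.allclose(A @ A.T @ U, U * s**2)

print(right_ok, left_ok)
```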

25 PROPERTIES OF SVD SVD gives the best projection axes. The first eigenvector $v_1$ (the first column of V) is the best axis to project onto, where "best" means the minimum sum of squared projection errors.

26 SVD DIMENSION REDUCTION Original matrix

27 SVD DIMENSION REDUCTION Decomposition: $A = U \Sigma V^T$, where $v_1$ is the first column of V.

28 SVD REDUCTION $A = U \Sigma V^T$. $\Sigma$: the spread (variance) of the data along each projection axis, such as $v_1$.

29 SVD DIMENSION REDUCTION $A = U \Sigma V^T$. $U \Sigma$: the coordinates of the data points projected onto the projection axes.

30 SVD DIMENSION REDUCTION Remove the small singular values and the corresponding singular vectors (by setting them to zero).

31 SVD DIMENSION REDUCTION Why is it called dimension reduction? Original matrix: full rank of the factorization.

32 SVD DIMENSION REDUCTION Why is it called dimension reduction? Modified data: lower rank; the factorization keeps only the largest singular value, 9.64.

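33 SVD $A = U \Sigma V^T$, with $U = [u_1, u_2, \ldots]$ and $V = [v_1, v_2, \ldots]$

34 SVD $A = U \Sigma V^T = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \cdots$

35 SVD For an n x m matrix A with r topics: $A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \cdots$, where each $u_i$ is n x 1 and each $v_i^T$ is 1 x m

36 SVD Data approximation/dimension reduction: $A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \cdots$, with $\sigma_1 \ge \sigma_2 \ge \cdots$

37 SVD $A_k = U_k \Sigma_k V_k^T$, or equivalently $A_k = \sigma_1 u_1 v_1^T + \cdots + \sigma_k u_k v_k^T$, with $\sigma_1 \ge \sigma_2 \ge \cdots$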

38 SVD $A_k = U_k \Sigma_k V_k^T = \sigma_1 u_1 v_1^T + \cdots + \sigma_k u_k v_k^T$, with $\sigma_1 \ge \sigma_2 \ge \cdots$. Eckart-Young-Mirsky theorem: among all matrices of rank at most k, $A_k$ is the one that minimizes $\|A_k - A\|_F$.
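
The rank-k approximation can be sketched in a few lines. This is an illustration under assumptions, not the author's code: the helper name `truncated_svd` and the random test matrix are hypothetical. It also checks a known consequence of the theorem, namely that the Frobenius error equals the square root of the sum of the discarded squared singular values.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k approximation A_k and all singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return A_k, s

rng = np.random.default_rng(0)
A = rng.random((8, 5))            # hypothetical data matrix
k = 2

A_k, s = truncated_svd(A, k)

# Frobenius error of the truncation.
err = np.linalg.norm(A - A_k, "fro")
# Error predicted from the discarded singular values.
expected = np.sqrt(np.sum(s[k:] ** 2))

# Any other rank-k matrix, here a random one, cannot do better.
B = rng.random((8, k)) @ rng.random((k, 5))
err_other = np.linalg.norm(A - B, "fro")

print(err, expected, err_other)
```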

39 TRUNCATED SVD

40 OUTLINE Why Do We Need Matrix Decomposition? SVD (Singular Value Decomposition) NMF (Nonnegative Matrix Factorization) Applications in Privacy-Preserving Data Mining

41 NONNEGATIVE MATRIX FACTORIZATION (NMF) Given a nonnegative matrix V, decompose it into the product of two (or more) nonnegative matrices W and H: $V \approx WH$, where V is n x m, W is n x r, and H is r x m. If (n+m)r < nm, the original matrix is compressed (rank-reduced).

42 DIFFERENCE BETWEEN NMF AND SVD NMF has no negative values. NMF is an additive combination, so it is easy to interpret and to link to physical meaning. SVD is unique; NMF is not. The nonuniqueness of NMF is both an advantage and a disadvantage. Advantage: better for privacy protection. Disadvantage: how do we find the optimal solution?

43 OBJECTIVE FUNCTIONS Quality of NMF:

44 FACTORIZATION: ITERATIVE UPDATES (OBJECTIVE FUNCTION 1) The following iterative updates guarantee that 1) W and H stay nonnegative; 2) the objective function does not increase.

45 FACTORIZATION: ITERATIVE UPDATES (OBJECTIVE FUNCTION 2) The following iterative updates guarantee that 1) W and H stay nonnegative; 2) the objective function does not increase.
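
The update formulas themselves did not survive in this transcript. As an illustration only, the sketch below assumes that one of the objectives is the squared Frobenius error $\|V - WH\|_F^2$ and uses the well-known multiplicative update rules of Lee and Seung for it. The function name `nmf_multiplicative`, the small constant `eps`, and the test matrix are assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative updates for min ||V - WH||_F^2 with W, H >= 0 (assumed objective)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps   # random nonnegative start
    H = rng.random((r, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Each factor is multiplied by a nonnegative ratio, so it stays nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H, "fro"))
    return W, H, errors

V = np.random.default_rng(1).random((6, 5))   # hypothetical nonnegative data
W, H, errors = nmf_multiplicative(V, r=2)

print(errors[0], errors[-1])
```

Because every update multiplies the current entries by a nonnegative factor, nonnegativity is preserved automatically, and the recorded errors form a non-increasing sequence.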

46 INITIALIZATION OF NMF The final nonnegative matrices W and H depend on the initial choices of W and H. Different initial values result in different factorizations, even when the iterative update rules are the same. (To choose better initial matrices, SVD approximations can be used.)

47 PROPERTIES OF NMF The final nonnegative matrices W and H depend on the initial choices of W and H. Different initial values result in different factorizations, even when the iterative update rules are the same. The update rules of NMF are only guaranteed to converge to a local optimum. Why?
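
The dependence on initialization is easy to observe. The sketch below is an illustration under assumptions: it uses Lee-Seung multiplicative updates for the squared Frobenius error (the slides' own update rules are not preserved here), and the function name `nmf`, the seeds, and the data are hypothetical.

```python
import numpy as np

def nmf(V, r, seed, n_iter=300, eps=1e-9):
    """Multiplicative-update NMF started from a seed-dependent random point."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.random.default_rng(42).random((6, 5))   # hypothetical nonnegative data

# Same data, same update rules, two different starting points.
W1, H1 = nmf(V, r=2, seed=1)
W2, H2 = nmf(V, r=2, seed=2)

err1 = np.linalg.norm(V - W1 @ H1, "fro")
err2 = np.linalg.norm(V - W2 @ H2, "fro")
factors_differ = not np.allclose(W1, W2)

print(err1, err2, factors_differ)
```

Even when the two reconstruction errors are close, the factors themselves generally differ, for example by a rescaling or a reordering of the r components.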

48 WHY ONLY LOCAL OPTIMUM The solution space of W is a convex set, and so is that of H. But the solution space of the product WH may not be a convex set, so the problem is not convex in W and H jointly. For an optimization problem on a non-convex set, iterative updates are not guaranteed to reach a global optimum.

49 NMF EXAMPLE

50 OUTLINE Why Do We Need Matrix Decomposition? SVD (Singular Value Decomposition) NMF (Nonnegative Matrix Factorization) Applications in Privacy-Preserving Data Mining

51 DATA VALUE PERTURBATION SVD or NMF Perturbation

52 Objective: Balance privacy preservation and data utility

53 NMF DATA PERTURBATION

54 EXPERIMENTAL RESULTS OF NMF DATA PERTURBATION Upper left: Original data (3 clusters). Upper right: NMF perturbed data (large perturbation, good clusters). Lower left: Additive noise with Gauss distribution. Lower right: Additive noise with normal distribution (small perturbation, bad clusters)

55 SUPPORT VECTOR MACHINE CLASSIFICATION Top: SVM with original data (98% correct rate) Middle: SVM with NMF perturbed data (98% correct rate) Bottom: SVM with normal distribution noise added data (54% correct rate)

56 SVD DATA PERTURBATION

57 EXPERIMENTAL RESULTS (COMPLEXITY)

58 DATA PATTERN HIDING Data pattern: records A and B are in the same cluster. In the original data, if A and B are in the same cluster, we write $A \sim B$; otherwise $A \not\sim B$. In privacy-preserving data mining, the data owner sometimes does not want to disclose the same-cluster relationship (or the not-same-cluster relationship).

59 EXAMPLE

60 METHOD Perform NMF on A (n x m): $A \approx WH$. W (n x r): cluster basis, assuming there are r clusters. H (r x m): coefficients for the clusters. Record $A_i$ is in cluster j if $j = \arg\max_t H_{it}$, $t = 1, \ldots, m$.

61 METHOD Perform NMF on A (n x m): $A \approx WH$. W (n x r): cluster basis, assuming r clusters. H (r x m): cluster coefficients. Record $A_i$ is in cluster j if $j = \arg\max_t H_{it}$, $t = 1, \ldots, m$. Assume that $A_i$ and $A_j$ are in different clusters in the original data, but $A_i$ and $A_t$ are in the same cluster, i.e., $A_i \not\sim A_j$ and $A_i \sim A_t$.

62 CHANGE CLUSTER MEMBERSHIP Assume that $A_i$ and $A_j$ are in different clusters in the original data, but $A_i$ and $A_t$ are in the same cluster, i.e., $A_i \not\sim A_j$ and $A_i \sim A_t$. If the data owner wants to hide these data patterns, what can we do?

63 CHANGE CLUSTER MEMBERSHIP Remember: record $A_i$ is in cluster j if $j = \arg\max_t H_{it}$, $t = 1, \ldots, m$. Method: to hide $A_i \not\sim A_j$ (different clusters), adjust the positions of the maximum values of $H_i$ and $H_j$ so that they are in the same column. Method: to hide $A_i \sim A_t$ (same cluster), adjust the positions of the maximum values of $H_i$ and $H_t$ so that they are in different columns.
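
The two hiding methods above can be sketched as follows. This is a hedged illustration, not the author's implementation: it assumes each row of the coefficient matrix holds one record's cluster coefficients, and the values in `H` and the helper names `cluster_of`, `same_cluster`, and `move_max_to` are hypothetical.

```python
import numpy as np

# Hypothetical coefficients: one row per record, one column per cluster.
H = np.array([[0.9, 0.1, 0.2],   # record 0 -> cluster 0
              [0.2, 0.7, 0.1],   # record 1 -> cluster 1
              [0.8, 0.3, 0.1]])  # record 2 -> cluster 0

def cluster_of(H, i):
    """Cluster of record i: the column holding its largest coefficient."""
    return int(np.argmax(H[i]))

def same_cluster(H, i, j):
    return cluster_of(H, i) == cluster_of(H, j)

def move_max_to(h, col):
    """Return a copy of h whose largest entry is swapped into position col."""
    h = np.array(h, dtype=float)
    top = int(np.argmax(h))
    h[top], h[col] = h[col], h[top]
    return h

# Hide "records 0 and 1 are in different clusters": put both maxima in one column.
H_a = H.copy()
H_a[1] = move_max_to(H[1], cluster_of(H, 0))

# Hide "records 0 and 2 are in the same cluster": move record 2's maximum elsewhere.
H_b = H.copy()
other_col = (cluster_of(H, 0) + 1) % H.shape[1]
H_b[2] = move_max_to(H[2], other_col)

print(same_cluster(H, 0, 1), same_cluster(H_a, 0, 1))
print(same_cluster(H, 0, 2), same_cluster(H_b, 0, 2))
```

Swapping entries, rather than overwriting them, keeps the set of coefficient values of each record unchanged, which limits the size of the perturbation.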

64 MAXIMUM AND MINIMUM EXCHANGE In the original data, record x is in cluster j, and we want to hide this information. $H_x = (H_{x1}, \ldots, H_{xi}, \ldots, H_{xj}, \ldots, H_{xm})$. Obviously, $H_{xj} \ge H_{xt}$ for $t \ne j$. We assume that $H_{xi} \le H_{xt}$ for $t \ne i$.

65 MAXIMUM AND MINIMUM EXCHANGE In the original data, record x is in cluster j, and we want to hide this information. $H_x = (H_{x1}, \ldots, H_{xi}, \ldots, H_{xj}, \ldots, H_{xm})$. Obviously, $H_{xj} \ge H_{xt}$ for $t \ne j$. We assume that $H_{xi} \le H_{xt}$ for $t \ne i$. The modified data is $H_x^* = (H_{x1}, \ldots, H_{xj}, \ldots, H_{xi}, \ldots, H_{xm})$.
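
A minimal sketch of the exchange, assuming the coefficient vector of record x is stored as a 1-D NumPy array; the function name `max_min_exchange` and the sample values are hypothetical.

```python
import numpy as np

def max_min_exchange(h_x):
    """Swap the largest and smallest coefficients of a record's coefficient vector."""
    h_star = np.array(h_x, dtype=float)   # work on a copy
    j = int(np.argmax(h_star))            # position of the maximum (true cluster)
    i = int(np.argmin(h_star))            # position of the minimum
    h_star[i], h_star[j] = h_star[j], h_star[i]
    return h_star

# Hypothetical coefficients: the maximum is at index 2, the minimum at index 1.
H_x = np.array([0.2, 0.05, 0.6, 0.15])
H_x_star = max_min_exchange(H_x)

print(H_x_star)                                   # [0.2  0.6  0.05 0.15]
print(int(np.argmax(H_x)), int(np.argmax(H_x_star)))
```

After the exchange, the arg max rule assigns record x to the cluster that previously had its smallest coefficient, so the original membership is hidden.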

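66 INDEX EXCHANGE METHOD Given records x and y after NMF, assume $IdX_{max}$ and $IdY_{max}$ are the positions of the largest coefficients of $H_x$ and $H_y$.

67 INDEX EXCHANGE METHOD Given records x and y after NMF, assume $IdX_{max}$ and $IdY_{max}$ are the positions of the largest coefficients of $H_x$ and $H_y$. If $x \not\sim y$, i.e., x and y are not in the same cluster ($IdX_{max} \ne IdY_{max}$), and this information should be hidden

68 INDEX EXCHANGE METHOD Given records x and y after NMF, assume $IdX_{max}$ and $IdY_{max}$ are the positions of the largest coefficients of $H_x$ and $H_y$. If $x \sim y$, i.e., x and y are in the same cluster ($IdX_{max} = IdY_{max}$), and this information should be hidden: $1 \le t \le k$, $t \ne IdX_{max}$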

69 ALL EXCHANGE METHOD For records x and y Assume Modify H x and H y to be

70 EXAMPLE After NMF, we have $H_{50} \sim H_{80}$ (the largest coefficients of both are in the 2nd row). To hide $H_{50} \sim H_{80}$, modify $H_{80}$.

71 PRACTICAL PROBLEMS The clustering obtained from NMF is not accurate, so membership exchange based on NMF may not be accurate either. However, since we know the correct clustering results, we can modify the data until the desired membership changes are achieved. We may also incorporate the clustering information into the NMF process.

72 ANY QUESTIONS?
