Matrix Decomposition in Privacy-Preserving Data Mining JUN ZHANG DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF KENTUCKY
OUTLINE Why We Need Matrix Decomposition SVD (Singular Value Decomposition) NMF (Nonnegative Matrix Factorization) Applications in Privacy-Preserving Data Mining
A TYPICAL TERM-BY-DOCUMENT MATRIX 1. All entries are nonnegative 2. Most entries are zeros 3. Large dimensions 4. Disorganized 5. Lots of noise
A SUPERMARKET TRANSACTION MATRIX 1. All entries are nonnegative 2. Most entries are zeros 3. Large dimensions 4. Disorganized 5. Lots of noise
WHY WE NEED MATRIX DECOMPOSITION? Compact representation of data in matrix form: Original matrix ≈ Factor matrix x Factor matrix. Original matrix: sparse, not ordered. Factor matrices: compact, ordered. Easy to find hidden relationships in data, e.g., orthogonality, correlation, etc.
COMPACT REPRESENTATION OF ORIGINAL DATA
Column clustering and row clustering via factorization:

[1 1 1 0 0]   [0.18 0   ]
[2 2 2 0 0]   [0.36 0   ]
[1 1 1 0 0]   [0.18 0   ]   [9.64 0   ]   [0.58 0.58 0.58 0    0   ]
[5 5 5 0 0] = [0.90 0   ] x [0    5.29] x [0    0    0    0.71 0.71]
[0 0 0 2 2]   [0    0.53]
[0 0 0 3 3]   [0    0.80]
[0 0 0 1 1]   [0    0.27]

The columns of the left factor cluster the rows; the rows of the right factor cluster the columns.
REDUCE 2-D DATA TO 1-D DATA 1-D data 2-D data Reference: Faloutsos et al., Large Graph Mining, KDD'09
OUTLINE Why We Need Matrix Decomposition? SVD (Singular Value Decomposition) NMF (Nonnegative Matrix Factorization) Applications in Privacy-Preserving Data Mining
SINGULAR VALUE DECOMPOSITION (SVD)
A [n x m] = U [n x r] Σ [r x r] (V [m x r])^T
A: n x m matrix (e.g., n documents x m words, or n pages x m links)
U: n x r matrix (e.g., n documents, r topics)
Σ: r x r diagonal matrix (strength of each topic; r is the rank of matrix A). Sometimes the diagonal matrix is denoted by another symbol.
V: m x r matrix (e.g., m words, r topics)
SVD A = U Σ V^T - example:
Gene H. Golub (February 29, 1932 - November 16, 2007), American mathematician and computer scientist
SVD - PROPERTIES
Theorem [Press,92]: Any real matrix A can be decomposed in the form A = U Σ V^T
U, Σ, V: unique (*)
U, V: column orthonormal (i.e., the column vectors of U and V have unit norm and are mutually orthogonal): U^T U = I; V^T V = I (I: identity matrix)
Σ: diagonal matrix whose diagonal entries are nonnegative and in descending order
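These properties can be checked numerically; a minimal NumPy sketch (the random matrix here is illustrative, not from the slides):

```python
import numpy as np

# A minimal numerical check of the SVD properties above
# (the matrix is an arbitrary example, not the slides' data).
rng = np.random.default_rng(42)
A = rng.random((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U and V are orthonormal: U^T U = I and V^T V = I.
assert np.allclose(U.T @ U, np.eye(4), atol=1e-10)
assert np.allclose(Vt @ Vt.T, np.eye(4), atol=1e-10)

# Diagonal entries of Sigma are nonnegative and in descending order.
assert np.all(s >= 0) and np.all(np.diff(s) <= 0)

# The factorization reconstructs A exactly.
assert np.allclose(U @ np.diag(s) @ Vt, A)
```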
SVD EXAMPLE A = U Σ V^T - example:
(columns: data, inf. retrieval, brain, lung; rows: documents in the Eng group, then the Med group)

[1 1 1 0 0]   [0.18 0   ]
[2 2 2 0 0]   [0.36 0   ]
[1 1 1 0 0]   [0.18 0   ]   [9.64 0   ]   [0.58 0.58 0.58 0    0   ]
[5 5 5 0 0] = [0.90 0   ] x [0    5.29] x [0    0    0    0.71 0.71]
[0 0 0 2 2]   [0    0.53]
[0 0 0 3 3]   [0    0.80]
[0 0 0 1 1]   [0    0.27]
SVD EXAMPLE A = U Σ V^T - example (same matrices as above): the two columns of U correspond to the Eng topic and the Med topic.
Faloutsos, Miller, Tsourakakis KDD'09. SVD EXAMPLE A = U Σ V^T - example (same matrices as above): U is the document-to-topic similarity matrix.
SVD EXAMPLE A = U Σ V^T - example (same matrices as above): Σ gives the strength of each topic (9.64 for the Eng topic, 5.29 for the Med topic).
SVD EXAMPLE A = U Σ V^T - example (same matrices as above): V^T is the word-to-topic similarity matrix.
SVD PROPERTIES Documents, Words and Concepts/Topics: U: Document-to-Topic Similarity Matrix; V: Word-to-Topic Similarity Matrix; Σ: Strength of Each Topic
SVD PROPERTIES Documents, Words and Topics: Q: If A is the document-to-word matrix, what can be said about A^T A? A: Word-to-word similarity matrix. Q: How about AA^T? A: Document-to-document similarity matrix
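A quick numerical check of these answers on the slides' 7 x 5 term-by-document example (rows = documents, columns = words):

```python
import numpy as np

# The slides' example: 7 documents x 5 words.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

WtW = A.T @ A   # word-to-word similarity: co-occurrence weight of word pairs
DtD = A @ A.T   # document-to-document similarity

assert WtW.shape == (5, 5) and DtD.shape == (7, 7)
# Words 0-2 (CS terms) co-occur; they never co-occur with words 3-4 (medical terms).
assert WtW[0, 1] > 0 and WtW[0, 3] == 0
# Eng documents are similar to each other but orthogonal to Med documents.
assert DtD[0, 1] > 0 and DtD[0, 4] == 0
```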
PROPERTIES OF SVD The columns of V are the eigenvectors of the covariance matrix A^T A
PROPERTIES OF SVD The columns of U are the eigenvectors of the inner-product matrix AA^T
PROPERTIES OF SVD SVD gives the best projection coordinates: the first eigenvector v1 is the best projection axis, where "best" means minimizing the sum of squares of the projection errors
SVD DIMENSION REDUCTION Original matrix
SVD DIMENSION REDUCTION A = U Σ V^T decomposition (same example matrices as above): v1, the first row of V^T, is the first projection axis
SVD REDUCTION A = U Σ V^T (same example as above): v1 is the projection axis, and Σ gives the spread (covariance) of the data along each projection axis
SVD DIMENSION REDUCTION A = U Σ V^T (same example as above): U holds the values of the data projected onto the projection axes
SVD DIMENSION REDUCTION Remove small singular values and the corresponding singular vectors (setting them to zero); in the example above, drop σ2 = 5.29 together with u2 and v2
SVD DIMENSION REDUCTION Why is it called dimension reduction? Original matrix (same example as above): rank 2
SVD DIMENSION REDUCTION Why is it called dimension reduction? Modified data: rank 1

[1 1 1 0 0]   [0.18]
[2 2 2 0 0]   [0.36]
[1 1 1 0 0]   [0.18]
[5 5 5 0 0] = [0.90] x [9.64] x [0.58 0.58 0.58 0 0]
[0 0 0 0 0]   [0   ]
[0 0 0 0 0]   [0   ]
[0 0 0 0 0]   [0   ]
SVD The example matrix factors as A = [u1 u2] x diag(σ1, σ2) x [v1 v2]^T
SVD A = [u1 u2] x diag(σ1, σ2) x [v1 v2]^T = σ1 u1 v1^T + σ2 u2 v2^T + ...
SVD For an n x m matrix with r topics: A = σ1 u1 v1^T + σ2 u2 v2^T + ..., where each term is the product of an n x 1 column and a 1 x m row
SVD Data approximation/dimension reduction (same example as above): A = σ1 u1 v1^T + σ2 u2 v2^T + ..., with σ1 >= σ2 >= ...
SVD A_k = U_k Σ_k V_k^T, or A_k = σ1 u1 v1^T + ... + σk uk vk^T, with σ1 >= σ2 >= ...
SVD A_k = U_k Σ_k V_k^T, or A_k = σ1 u1 v1^T + ... + σk uk vk^T, with σ1 >= σ2 >= ... Eckart-Young-Mirsky theorem: A_k is the best rank-k matrix, i.e., it minimizes ||A - A_k||_F
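A short sketch of rank-k truncation and the Eckart-Young-Mirsky bound on the slides' example matrix:

```python
import numpy as np

# The slides' 7x5 term-by-document example matrix (rank 2).
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def truncate(k):
    """Best rank-k approximation A_k = U_k * diag(s_k) * V_k^T."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A1 = truncate(1)   # rank 1: only the stronger (Eng) topic survives
A2 = truncate(2)   # rank 2: reconstructs A exactly, since rank(A) = 2

assert np.allclose(A2, A)
# Eckart-Young-Mirsky: the rank-1 Frobenius error equals the dropped
# singular value (all remaining singular values of A are zero here).
assert np.isclose(np.linalg.norm(A - A1, 'fro'), s[1])
```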
TRUNCATED SVD
OUTLINE Why Do We Need Matrix Decomposition? SVD (Singular Value Decomposition) NMF (Nonnegative Matrix Factorization) Applications in Privacy-Preserving Data Mining
NONNEGATIVE MATRIX FACTORIZATION (NMF) Given a nonnegative matrix V, decompose it into the product of two (or more) nonnegative matrices W and H: V ≈ WH, where V is n x m, W is n x r, and H is r x m. Since (n+m)r < nm, the original matrix is compressed/rank reduced.
DIFFERENCE BETWEEN NMF AND SVD There are no negative values in NMF. NMF builds additive combinations, which can be easily understood and linked to physical meanings. SVD is unique; NMF is not unique. The nonuniqueness of NMF is both advantageous and disadvantageous. Advantage: better for privacy protection. Disadvantage: how to find the optimal solution?
OBJECTIVE FUNCTIONS Quality of NMF: Objective function 1 (Frobenius norm): ||V - WH||_F^2 = Σ_iμ (V_iμ - (WH)_iμ)^2. Objective function 2 (divergence): D(V || WH) = Σ_iμ (V_iμ log(V_iμ/(WH)_iμ) - V_iμ + (WH)_iμ)
FACTORIZATION: ITERATIVE UPDATES (OBJECTIVE FUNCTION 1) The following iterative updates guarantee 1) nonnegativity; 2) the objective function does not increase:
H_aμ <- H_aμ (W^T V)_aμ / (W^T W H)_aμ
W_ia <- W_ia (V H^T)_ia / (W H H^T)_ia
FACTORIZATION: ITERATIVE UPDATES (OBJECTIVE FUNCTION 2) The following iterative updates guarantee 1) nonnegativity; 2) the objective function does not increase:
H_aμ <- H_aμ (Σ_i W_ia V_iμ/(WH)_iμ) / Σ_k W_ka
W_ia <- W_ia (Σ_μ H_aμ V_iμ/(WH)_iμ) / Σ_ν H_aν
INITIALIZATION OF NMF The final nonnegative matrices W and H depend on the initial choices of W and H. Different initial values will result in different NMFs, even if the iterative update rules are the same. (To optimize the initial matrices, one can use SVD approximations.)
PROPERTIES OF NMF The final nonnegative matrices W and H depend on the initial choices of W and H. Different initial values will result in different NMFs, even if the iterative update rules are the same. The update rules of NMF can only guarantee convergence to a local optimum. Why?
WHY ONLY LOCAL OPTIMUM The solution space of W is a convex set, and so is that of H. But the solution space of the product WH may not be a convex set, and there is no guarantee of reaching a global optimum for an optimization problem on a non-convex set.
NMF EXAMPLE
OUTLINE Why Do We Need Matrix Decomposition? SVD (Singular Value Decomposition) NMF (Nonnegative Matrix Factorization) Applications in Privacy-Preserving Data Mining
DATA VALUE PERTURBATION SVD or NMF Perturbation
Objective: Balance privacy preservation and data utility
NMF DATA PERTURBATION
EXPERIMENTAL RESULTS OF NMF DATA PERTURBATION Upper left: original data (3 clusters). Upper right: NMF-perturbed data (large perturbation, clusters well preserved). Lower left: additive noise with Gaussian distribution. Lower right: additive noise with normal distribution (small perturbation, clusters poorly preserved)
SUPPORT VECTOR MACHINE CLASSIFICATION Top: SVM on original data (98% correct rate). Middle: SVM on NMF-perturbed data (98% correct rate). Bottom: SVM on data with normally distributed noise added (54% correct rate)
SVD DATA PERTURBATION
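The details of the SVD perturbation scheme are in the figures; as a sketch of the general idea only, one can release the rank-k approximation A_k in place of A, distorting individual values while keeping the dominant cluster/topic structure (the data matrix here is random and hypothetical, not the slides' dataset):

```python
import numpy as np

# Hypothetical data: 100 records x 10 attributes, values in [0, 1).
rng = np.random.default_rng(1)
A = rng.random((100, 10))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 3
# Released, perturbed data: best rank-k approximation of A.
A_pert = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Individual entries differ from the originals (value perturbation) ...
assert not np.allclose(A_pert, A)
# ... but the total distortion is exactly the dropped part of the spectrum,
# so utility loss is controlled by the choice of k.
assert np.isclose(np.linalg.norm(A - A_pert),
                  np.sqrt(np.sum(s[k:] ** 2)))
```

Choosing k trades off the two goals on the previous slide: a smaller k means more perturbation (more privacy) but less retained structure (less utility).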
EXPERIMENTAL RESULTS (COMPLEXITY)
DATA PATTERN HIDING Data pattern: records A and B are in the same cluster. In the original data, if A and B are in the same cluster, then A ~ B; otherwise A ≁ B. In privacy-preserving data mining, the data owner sometimes does not want to disclose the same-cluster relationship (or the not-in-the-same-cluster relationship)
EXAMPLE
METHOD Perform NMF on A (n x m): A ≈ WH
W (n x r): cluster basis (assume there are r clusters)
H (r x m): cluster coefficients
Record A_i is in cluster j if j = arg max_t H_it, t = 1, …, m
Assume that A_i and A_j are in different clusters in the original data, but A_i and A_t are in the same cluster, i.e., A_i ≁ A_j, A_i ~ A_t.
CHANGE CLUSTER MEMBERSHIP Assume that A_i and A_j are in different clusters in the original data, but A_i and A_t are in the same cluster, i.e., A_i ≁ A_j, A_i ~ A_t. If the data owner wants to hide these data patterns, what can we do?
CHANGE CLUSTER MEMBERSHIP Remember: record A_i is in cluster j if j = arg max_t H_it. Method: to hide A_i ≁ A_j, adjust the locations of the maximum values of H_i and H_j so that they fall in the same column. Method: to hide A_i ~ A_t, adjust the positions of the maximum values of H_i and H_t so that they fall in different columns
MAXIMUM AND MINIMUM EXCHANGE In the original data, record x is in cluster j; we want to hide this information. H_x = (H_x1, …, H_xi, …, H_xj, …, H_xm). Obviously, H_xj >= H_xt for t ≠ j; we assume that H_xi <= H_xt for t ≠ i. The modified data is H*_x = (H_x1, …, H_xj, …, H_xi, …, H_xm), i.e., the maximum and minimum entries exchange positions.
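The maximum-minimum exchange can be sketched as follows (the coefficient values are hypothetical, chosen only to illustrate the swap):

```python
import numpy as np

# Hypothetical coefficient row of H for record x: its cluster is the
# argmax column (here column 2, value 2.8).
H_x = np.array([0.9, 0.1, 2.8, 0.4])

j = int(np.argmax(H_x))   # column holding the maximum (true cluster)
i = int(np.argmin(H_x))   # column holding the minimum

# Exchange the maximum and minimum coefficients.
H_mod = H_x.copy()
H_mod[i], H_mod[j] = H_mod[j], H_mod[i]

# The apparent cluster of x has changed from j to i ...
assert int(np.argmax(H_mod)) == i
# ... while the set of coefficient values is unchanged, only repositioned.
assert np.isclose(H_mod.sum(), H_x.sum())
```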
INDEX EXCHANGE METHOD Suppose we have records x and y; after NMF, let IdX_max and IdY_max denote the columns holding the maximum coefficients of H_x and H_y.
INDEX EXCHANGE METHOD If x ≁ y, i.e., x and y are not in the same cluster (IdX_max ≠ IdY_max), and this information should be hidden: exchange coefficients so that the maxima of H_x and H_y fall in the same column
INDEX EXCHANGE METHOD If x ~ y, i.e., x and y are in the same cluster (IdX_max = IdY_max), and this information should be hidden: exchange the maximum with the coefficient in some column t, 1 ≤ t ≤ k, t ≠ IdX_max
ALL EXCHANGE METHOD For records x and y after NMF, modify H_x and H_y so that all of their coefficients are exchanged
EXAMPLE After NMF, we have H_50 ~ H_80 (the largest coefficients, 2.8354 and 2.6134, are both in the 2nd row). To hide H_50 ~ H_80, modify H_80
PRACTICAL PROBLEMS The clustering from NMF is not accurate, so membership exchange based on NMF may not be accurate. However, since we know the correct clustering results, we can modify the data until the desired membership changes are achieved. We may also incorporate the clustering information into the NMF process
ANY QUESTIONS?