Matrix Decomposition in Privacy-Preserving Data Mining JUN ZHANG DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF KENTUCKY

OUTLINE Why We Need Matrix Decomposition SVD (Singular Value Decomposition) NMF (Nonnegative Matrix Factorization) Applications in Privacy-Preserving Data Mining

A TYPICAL TERM-BY-DOCUMENT MATRIX 1. All entries are nonnegative 2. Most entries are zeros 3. Large dimensions 4. Disorganized 5. Lots of noise

A SUPERMARKET TRANSACTION MATRIX 1. All entries are nonnegative 2. Most entries are zeros 3. Large dimensions 4. Disorganized 5. Lots of noise

WHY DO WE NEED MATRIX DECOMPOSITION? Compact representation of the data in matrix form: Original matrix ≈ Factor matrix x ... x Factor matrix. Original matrix: sparse, not ordered. Factor matrices: compact, ordered. It becomes easy to find hidden relationships in the data, e.g., orthogonality, correlation, etc.

COMPACT REPRESENTATION OF ORIGINAL DATA The factorization below clusters the data in both directions (column clustering / row clustering): the columns of the first factor group the rows of the original matrix, and the rows of the last factor group its columns.

A (7 x 5):
  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0
  5 5 5 0 0
  0 0 0 2 2
  0 0 0 3 3
  0 0 0 1 1
=
  0.18 0
  0.36 0
  0.18 0
  0.90 0
  0    0.53
  0    0.80
  0    0.27
x
  9.64 0
  0    5.29
x
  0.58 0.58 0.58 0    0
  0    0    0    0.71 0.71

REDUCE 2-D DATA TO 1-D DATA [figure: 2-D data points projected onto a 1-D axis] Reference: Faloutsos et al., Large Graph Mining, KDD'09

OUTLINE Why We Need Matrix Decomposition? SVD (Singular Value Decomposition) NMF (Nonnegative Matrix Factorization) Applications in Privacy-Preserving Data Mining

SINGULAR VALUE DECOMPOSITION (SVD) A [n x m] = U [n x r] Σ [r x r] (V [m x r])^T. A: n x m matrix (e.g., n documents x m words, or n pages x m links). U: n x r matrix (e.g., n documents, r topics). Σ: r x r diagonal matrix (strength of each topic; r is the rank of A); the diagonal matrix is sometimes denoted by a different symbol. V: m x r matrix (e.g., m words, r topics).
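A minimal NumPy sketch (not from the talk) illustrating these shapes and the reconstruction A = U Σ V^T; the matrix size and variable names are illustrative assumptions.

    import numpy as np

    A = np.random.rand(6, 4)                 # n = 6 "documents", m = 4 "words"
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    print(U.shape)                           # (6, 4): n x r, document-to-topic
    print(s)                                 # singular values, in descending order
    print(Vt.T.shape)                        # (4, 4): m x r, word-to-topic

    # Reassemble A from the factors: A = U * Sigma * V^T
    Sigma = np.diag(s)
    print(np.allclose(A, U @ Sigma @ Vt))    # True (up to rounding)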

SVD A = U Σ V^T, example:

Gene H. Golub (February 29, 1932 - November 16, 2007), American mathematician and computer scientist

SVD - PROPERTIES Theorem [Press, 92]: any matrix A can be decomposed in the form A = U Σ V^T. U, Σ, V: unique (*). U, V: column-orthogonal (i.e., the column vectors of U and V have unit norm and are mutually orthogonal): U^T U = I; V^T V = I (I: identity matrix). Σ: diagonal matrix whose diagonal entries are nonnegative and sorted in descending order.

SVD EXAMPLE A = U Σ V^T: the rows of A are documents (the first four Eng, the last three Med); the columns of A are the words data, inf., retrieval, brain, lung.

A (7 x 5):
  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0
  5 5 5 0 0
  0 0 0 2 2
  0 0 0 3 3
  0 0 0 1 1
U (7 x 2):
  0.18 0
  0.36 0
  0.18 0
  0.90 0
  0    0.53
  0    0.80
  0    0.27
Σ (2 x 2):
  9.64 0
  0    5.29
V^T (2 x 5):
  0.58 0.58 0.58 0    0
  0    0    0    0.71 0.71

SVD EXAMPLE A = U Σ V^T (same factorization as above): the two columns of U correspond to the Eng topic and the Med topic.

SVD EXAMPLE A = U Σ V^T (same factorization as above): U is the document-to-topic similarity matrix. [Faloutsos, Miller, Tsourakakis, KDD'09]

SVD EXAMPLE A = U Σ V^T (same factorization as above): the diagonal entries of Σ give the strength of each topic (9.64 for the Eng topic, 5.29 for the Med topic).

SVD EXAMPLE A = U Σ V^T (same factorization as above): V is the word-to-topic similarity matrix.
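A short NumPy check (not part of the talk) that reproduces the numbers in this example; note that the SVD factors are determined only up to sign, so printed signs may differ from the slides.

    import numpy as np

    A = np.array([[1, 1, 1, 0, 0],
                  [2, 2, 2, 0, 0],
                  [1, 1, 1, 0, 0],
                  [5, 5, 5, 0, 0],
                  [0, 0, 0, 2, 2],
                  [0, 0, 0, 3, 3],
                  [0, 0, 0, 1, 1]], dtype=float)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    print(np.round(s[:2], 2))               # [9.64 5.29] -- topic strengths
    print(np.round(np.abs(U[:, :2]), 2))    # document-to-topic columns (0.18, 0.36, ...)
    print(np.round(np.abs(Vt[:2, :]), 2))   # word-to-topic rows (0.58 ... 0.71)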

SVD PROPERTIES Documents, words and concepts/topics: U: document-to-topic similarity matrix. V: word-to-topic similarity matrix. Σ: strength of each topic.

SVD PROPERTIES Documents, words and topics: Q: If A is the document-to-word matrix, what can be said about A^T A? Q: How about AA^T? (Answers on the next slide.)

SVD PROPERTIES Documents, words and topics: Q: If A is the document-to-word matrix, what can be said about A^T A? A: It is a word-to-word similarity matrix. Q: How about AA^T? A: It is a document-to-document similarity matrix.

PROPERTIES OF SVD The columns of V are the eigenvectors of A^T A (the covariance-type matrix of the data).

PROPERTIES OF SVD The columns of U are the eigenvectors of AA^T (the inner-product matrix of the data).

PROPERTIES OF SVD SVD gives the best projection coordinates: the first singular vector v1 defines the projection axis that minimizes the sum of squared projection errors.

SVD DIMENSION REDUCTION Original matrix

SVD DIMENSION REDUCTION Decompose A = U Σ V^T (same factorization as above); the first row of V^T is the first projection axis v1.

SVD DIMENSION REDUCTION A = U Σ V^T (same factorization as above): the diagonal entries of Σ measure the spread (variance) of the data along each projection axis such as v1.

SVD DIMENSION REDUCTION A = U Σ V^T (same factorization as above): U Σ gives the coordinates of the data projected onto the projection axes.
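A quick NumPy check of this claim (random data, not the slide's example): since V^T V = I, projecting A onto the axes in V gives A V = U Σ.

    import numpy as np

    A = np.random.rand(7, 5)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # A V equals U * Sigma: the projected coordinates of the data points
    print(np.allclose(A @ Vt.T, U * s))    # True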

SVD DIMENSION REDUCTION Remove the small singular values and the corresponding singular vectors (set them to zero) in the factorization above.

SVD DIMENSION REDUCTION Why is it called dimension reduction? The original matrix (factorized above) has rank 2.

SVD DIMENSION REDUCTION Why is it called dimension reduction? Modified data: rank 1.

A_1 (7 x 5):
  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0
  5 5 5 0 0
  0 0 0 0 0
  0 0 0 0 0
  0 0 0 0 0
= u1 x σ1 x v1^T, with u1 = (0.18, 0.36, 0.18, 0.90, 0, 0, 0)^T, σ1 = 9.64, v1^T = (0.58, 0.58, 0.58, 0, 0)

SVD A = [u1 u2] x diag(σ1, σ2) x [v1 v2]^T (same example matrix as above).

SVD A = [u1 u2] x diag(σ1, σ2) x [v1 v2]^T = σ1 u1 v1^T + σ2 u2 v2^T + ... (same example matrix as above).

SVD An n x m matrix A with r topics can be written as A = σ1 u1 v1^T + σ2 u2 v2^T + ..., where each u_i is n x 1 and each v_i^T is 1 x m.

SVD Data approximation / dimension reduction: A = σ1 u1 v1^T + σ2 u2 v2^T + ..., with σ1 >= σ2 >= ...

SVD A_k = U_k Σ_k V_k^T, or A_k = σ1 u1 v1^T + ... + σk uk vk^T, with σ1 >= σ2 >= ...

SVD A_k = U_k Σ_k V_k^T = σ1 u1 v1^T + ... + σk uk vk^T (σ1 >= σ2 >= ...). Eckart-Young-Mirsky Theorem: A_k is the best rank-k approximation of A, i.e., it minimizes ||A - A_k||_F over all rank-k matrices.
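A sketch (reusing the example matrix; variable names are illustrative) of rank-k truncation and the Eckart-Young-Mirsky error: for this rank-2 matrix the best rank-1 approximation error equals the discarded singular value σ2 = 5.29.

    import numpy as np

    A = np.array([[1, 1, 1, 0, 0],
                  [2, 2, 2, 0, 0],
                  [1, 1, 1, 0, 0],
                  [5, 5, 5, 0, 0],
                  [0, 0, 0, 2, 2],
                  [0, 0, 0, 3, 3],
                  [0, 0, 0, 1, 1]], dtype=float)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 1
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # keep only sigma_1 u_1 v_1^T

    err = np.linalg.norm(A - A_k, 'fro')
    print(round(err, 2), round(s[1], 2))          # 5.29 5.29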

TRUNCATED SVD

OUTLINE Why Do We Need Matrix Decomposition? SVD (Singular Value Decomposition) NMF (Nonnegative Matrix Factorization) Applications in Privacy-Preserving Data Mining

NONNEGATIVE MATRIX FACTORIZATION (NMF) Given a nonnegative matrix V, decompose it into the product of two (or more) nonnegative matrices W and H: V (n x m) ≈ W (n x r) H (r x m). When (n+m)r < nm, the original matrix is compressed / rank-reduced.
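A hedged sketch using scikit-learn's NMF implementation (the talk does not prescribe a particular library; sizes and parameters here are illustrative).

    import numpy as np
    from sklearn.decomposition import NMF

    V = np.random.rand(20, 10)               # n = 20, m = 10, nonnegative data
    r = 3                                    # number of components

    model = NMF(n_components=r, init='random', random_state=0, max_iter=500)
    W = model.fit_transform(V)               # n x r, nonnegative
    H = model.components_                    # r x m, nonnegative

    print(np.linalg.norm(V - W @ H, 'fro'))  # approximation error
    print((20 + 10) * r < 20 * 10)           # True: compressed representation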

DIFFERENCE BETWEEN NMF AND SVD There are no negative values in NMF. NMF gives additive combinations, which are easy to understand and to link to physical meaning. SVD is unique; NMF is not unique. The non-uniqueness of NMF is both an advantage and a disadvantage. Advantage: better for privacy protection. Disadvantage: how to find the optimal solution?

OBJECTIVE FUNCTIONS Quality of NMF: measured by an objective function, typically the Frobenius error ||V - WH||_F^2 or a divergence such as the generalized Kullback-Leibler divergence D(V || WH).

FACTORIZATION: ITERATIVE UPDATES (OBJECTIVE FUNCTION 1) The following iterative updates guarantee 1) nonnegativity of W and H; 2) the objective function does not increase.
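The update formulas themselves did not survive in this transcript; as a hedged sketch, these are the standard multiplicative updates for the Frobenius objective ||V - WH||_F^2 (Lee-Seung style), which have the two properties listed above. Names and sizes are illustrative.

    import numpy as np

    def nmf_multiplicative(V, r, n_iter=200, eps=1e-9, seed=0):
        """Multiplicative updates for min ||V - WH||_F^2 with W, H >= 0."""
        rng = np.random.default_rng(seed)
        n, m = V.shape
        W = rng.random((n, r))
        H = rng.random((r, m))
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + eps)   # eps avoids division by zero
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        return W, H

    V = np.random.rand(30, 8)
    W, H = nmf_multiplicative(V, r=2)
    print(np.linalg.norm(V - W @ H, 'fro'))        # error after the updates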

FACTORIZATION: ITERATIVE UPDATES (OBJECTIVE FUNCTION 2) The following iterative updates guarantee 1) nonnegativity of W and H; 2) the objective function does not increase.

INITIALIZATION OF NMF The final nonnegative matrices W and H depend on the initial choices of W and H. Different initial values will result in different NMFs, even if the iterative update rules are the same. (To choose good initial matrices, SVD approximations can be used.)
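The talk only notes that SVD approximations can be used for initialization; one simple possibility (an assumption for illustration, not necessarily the scheme used in the talk) is to take absolute values of a rank-r truncated SVD as the starting W and H.

    import numpy as np

    def svd_init(V, r):
        """Nonnegative starting factors derived from a truncated SVD of V."""
        U, s, Vt = np.linalg.svd(V, full_matrices=False)
        W0 = np.abs(U[:, :r] * s[:r])   # n x r, scaled by the singular values
        H0 = np.abs(Vt[:r, :])          # r x m
        return W0, H0

    V = np.random.rand(30, 8)
    W0, H0 = svd_init(V, r=2)
    print(W0.shape, H0.shape)           # (30, 2) (2, 8)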

PROPERTIES OF NMF The final nonnegative matrices W and H depend on the initial choices of W and H. Different initial values will result in different NMFs, even if the iterative update rules are the same. The update rules of NMF can only guarantee convergence to a local optimum. Why?

WHY ONLY A LOCAL OPTIMUM The solution space of W alone is a convex set, and so is that of H, but the solution space of the product WH may not be a convex set. For an optimization problem over a non-convex set there is no guarantee of finding a global optimum.

NMF EXAMPLE

OUTLINE Why Do We Need Matrix Decomposition? SVD (Singular Value Decomposition) NMF (Nonnegative Matrix Factorization) Applications in Privacy-Preserving Data Mining

DATA VALUE PERTURBATION SVD or NMF Perturbation

Objective: Balance privacy preservation and data utility

NMF DATA PERTURBATION
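A sketch of the data-value perturbation idea (the slide's figures are not reproduced here): instead of releasing the original data V, release the low-rank NMF reconstruction W @ H, which distorts individual values while tending to preserve the dominant structure. Names, sizes, and the choice of rank are illustrative assumptions.

    import numpy as np
    from sklearn.decomposition import NMF

    V = np.random.rand(100, 6)                   # original private data
    model = NMF(n_components=2, init='random', random_state=0, max_iter=500)
    W = model.fit_transform(V)
    H = model.components_

    V_released = W @ H                           # perturbed data given to the miner
    print(np.abs(V - V_released).mean())         # average per-entry distortion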

EXPERIMENTAL RESULTS OF NMF DATA PERTURBATION Upper left: original data (3 clusters). Upper right: NMF-perturbed data (large perturbation, good clusters). Lower left: additive noise with Gaussian distribution. Lower right: additive noise with normal distribution (small perturbation, bad clusters).

SUPPORT VECTOR MACHINE CLASSIFICATION Top: SVM with the original data (98% accuracy). Middle: SVM with the NMF-perturbed data (98% accuracy). Bottom: SVM with normal-distribution noise added to the data (54% accuracy).

SVD DATA PERTURBATION

EXPERIMENTAL RESULTS (COMPLEXITY)

DATA PATTERN HIDING Data pattern: records A and B are in the same cluster. In the original data, if A and B are in the same cluster we write A ~ B, otherwise A ≁ B. In privacy-preserving data mining, the data owner sometimes does not want to disclose the same-cluster (or different-cluster) relationship.

EXAMPLE

METHOD Perform NMF on A (n x m): A ≈ WH. W (n x r): cluster basis; assume there are r clusters. H (r x m): coefficients for the clusters. Record A_i is in cluster j if j = arg max_k H_ik, k = 1, ..., r, where H_i denotes the cluster-coefficient vector of record i.

METHOD Perform NMF on A (n x m): A ≈ WH. W (n x r): cluster basis; assume there are r clusters. H (r x m): cluster coefficients. Record A_i is in cluster j if j = arg max_k H_ik, k = 1, ..., r. Assume that A_i and A_j are in different clusters in the original data, but A_i and A_t are in the same cluster, i.e., A_i ≁ A_j and A_i ~ A_t.
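To make the assignment rule concrete, a small sketch (illustrative numbers; here each record's coefficients form one column of H, consistent with the H_50 / H_80 example near the end of the talk).

    import numpy as np

    H = np.array([[0.1, 2.0, 0.3],
                  [1.5, 0.2, 0.1],
                  [0.2, 0.4, 1.9]])   # r = 3 clusters, 3 records (columns)

    labels = np.argmax(H, axis=0)     # cluster index = position of the largest coefficient
    print(labels)                     # [1 0 2]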

CHANGE CLUSTER MEMBERSHIP Assume that A_i and A_j are in different clusters in the original data, but A_i and A_t are in the same cluster, i.e., A_i ≁ A_j and A_i ~ A_t. If the data owner wants to hide these data patterns, what can we do?

CHANGE CLUSTER MEMBERSHIP Remember: record A_i is in cluster j if j = arg max_k H_ik, k = 1, ..., r. Method: to hide A_i ≁ A_j, adjust the locations of the maximum values of H_i and H_j so that they fall at the same cluster index. Method: to hide A_i ~ A_t, adjust the positions of the maximum values of H_i and H_t so that they fall at different cluster indices.

MAXIMUM AND MINIMUM EXCHANGE In the original data, record x is in cluster j, and we want to hide this information. Let H_x = (H_x1, ..., H_xi, ..., H_xj, ..., H_xr) be the cluster-coefficient vector of record x. Obviously H_xj >= H_xt for all t ≠ j. We assume that H_xi <= H_xt for all t ≠ i.

MAXIMUM AND MINIMUM EXCHANGE In the original data, record x is in cluster j, and we want to hide this information. Let H_x = (H_x1, ..., H_xi, ..., H_xj, ..., H_xr). Obviously H_xj >= H_xt for all t ≠ j, and we assume H_xi <= H_xt for all t ≠ i. The modified vector is H*_x = (H_x1, ..., H_xj, ..., H_xi, ..., H_xr), i.e., the maximum and minimum entries are exchanged.
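A sketch of the maximum/minimum exchange (illustrative data and function name): swapping the largest and smallest coefficients of record x moves its arg max to a different cluster, hiding the original membership.

    import numpy as np

    def hide_membership(H, x):
        """Swap the max and min coefficients of record x (column x of H)."""
        Hx = H[:, x].copy()
        j, i = np.argmax(Hx), np.argmin(Hx)      # j: true cluster, i: weakest cluster
        Hx[i], Hx[j] = Hx[j], Hx[i]
        H_mod = H.copy()
        H_mod[:, x] = Hx
        return H_mod

    H = np.array([[0.1, 2.0],
                  [1.5, 0.2],
                  [0.2, 0.4]])
    print(np.argmax(H[:, 0]))                        # 1: original cluster of record 0
    print(np.argmax(hide_membership(H, 0)[:, 0]))    # 0: membership now hidden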

INDEX EXCHANGE METHOD Suppose we have records x and y; after NMF, let IdX_max and IdY_max be the indices of the largest coefficients of H_x and H_y, respectively.

INDEX EXCHANGE METHOD If x ≁ y, i.e., x and y are not in the same cluster (IdX_max ≠ IdY_max), and this information should be hidden.

INDEX EXCHANGE METHOD If x ~ y, i.e., x and y are in the same cluster (IdX_max = IdY_max), and this information should be hidden: exchange the maximum coefficient with another entry t, 1 <= t <= k, t ≠ IdX_max.

ALL EXCHANGE METHOD For records x and y: assume ...; modify H_x and H_y to be ...

EXAMPLE After NMF, we have H_50 ~ H_80 (the largest coefficients, 2.8354 and 2.6134, are both in the 2nd row). To hide H_50 ~ H_80, modify H_80.

PRACTICAL PROBLEMS The clustering obtained from NMF is not always accurate, so membership exchange based on NMF may not be accurate either. However, since we know the correct clustering results, we can keep modifying the data until the desired membership changes are achieved. We may also incorporate the clustering information into the NMF process.

ANY QUESTIONS?