Large Graph Mining: Patterns, Tools and Case Studies
Christos Faloutsos, Hanghang Tong
CMU
Copyright: Faloutsos, Tong (2009)

Outline
- Part 1: Patterns
- Part 2: Matrix and Tensor Tools
- Part 3: Proximity
- Part 4: Case Studies

Outline: Part 2
- Matrix Tools
  - SVD, PCA
  - HITS, PageRank
  - Co-clustering
- Tensor Tools
Examples of Matrices
- Example/intuition: documents and terms
- Find patterns, groups, concepts
- (Example table: rows = papers Paper#1 .. Paper#4, columns = terms "data", "mining", "classif.", "tree", ...; entries are counts, e.g. 13 11 22 55 ... and 5 4 6 7 ...)

Singular Value Decomposition (SVD)
- X = U Σ V^T
- U = [u_1 u_2 ... u_k]: left singular vectors
- Σ = diag(σ_1, σ_2, ..., σ_k): singular values
- V = [v_1 v_2 ... v_k]: right singular vectors
- X: input data

SVD as spectral decomposition
- A_{m×n} = U Σ V^T = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ...
- Best rank-k approximation in L2 and Frobenius norms
- SVD only works for static matrices (a single 2nd-order tensor)
- See also PARAFAC
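The rank-k truncation claim above can be checked numerically. A minimal sketch with numpy, using a small made-up document-term matrix (the values are illustrative, loosely based on the table above, not taken from the slides):

```python
import numpy as np

# Toy document-term matrix (rows/columns and counts are illustrative).
X = np.array([[13., 11., 22., 55.],
              [ 5.,  4.,  6.,  7.],
              [ 9.,  2., 11., 31.],
              [ 1.,  0.,  3.,  8.]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# X is exactly recovered from U * diag(s) * V^T.
assert np.allclose(U @ np.diag(s) @ Vt, X)

# Best rank-k approximation: keep the top-k singular triplets.
k = 2
Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
err = np.linalg.norm(X - Xk, 'fro')

# The Frobenius error of the rank-k truncation equals the square root
# of the sum of the discarded squared singular values (Eckart-Young).
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```

No rank-k matrix comes closer to X in Frobenius (or spectral) norm, which is the "best rank-k approximation" property the slide states.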
Vector outer product — intuition:
- owner age (20; 30; 40) × car type (VW; Volvo; BMW)
- a 2-d histogram ≈ the outer product of two 1-d histograms, under an independence assumption

SVD — Example
- A = U Σ V^T — example (rows = documents, columns = terms: data, inf., retrieval, brain, lung):

  A =
  [1 1 1 0 0
   2 2 2 0 0
   1 1 1 0 0
   5 5 5 0 0
   0 0 0 2 2
   0 0 0 3 3
   0 0 0 1 1]

  U =
  [.18  0
   .36  0
   .18  0
   .90  0
    0  .53
    0  .80
    0  .27]

  Σ =
  [9.64  0
    0   5.29]

  V^T =
  [.58 .58 .58  0   0
    0   0   0  .71 .71]

SVD — Example
- The first column of U is the "CS concept", the second the "MD concept": documents 1-4 (CS papers) load only on the first, documents 5-7 (medical papers) only on the second.
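The example above can be reproduced directly. A small sketch that runs numpy's SVD on the 7x5 document-term matrix from the slides and recovers the two "concepts" (signs of the singular vectors may be flipped, so absolute values are shown):

```python
import numpy as np

# The 7x5 document-term matrix from the slides:
# rows = documents, columns = terms (data, inf., retrieval, brain, lung).
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Only two nonzero singular values -> two concepts (CS and MD).
print(np.round(s[:2], 2))              # ~ [9.64 5.29]
print(np.round(np.abs(U[:, :2]), 2))   # doc-to-concept similarities
print(np.round(np.abs(Vt[:2]), 2))     # term-to-concept similarities
```

The strengths check out analytically too: the CS block contributes σ_1 = sqrt(93) ≈ 9.64 and the MD block σ_2 = sqrt(28) ≈ 5.29.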
SVD — Example
- U: the doc-to-concept similarity matrix (e.g., document 1 has similarity .18 to the CS concept and 0 to the MD concept)

SVD — Example
- Σ: the singular values give the strength of each concept (9.64 for the CS concept, 5.29 for the MD concept)

SVD — Example
- V^T: the term-to-concept similarity matrix (e.g., "data" has similarity .58 to the CS concept and 0 to the MD concept)
SVD — Interpretation
- documents, terms and concepts:
- Q: if A is the document-to-term matrix, what is A^T A?
- A: the term-to-term ([m x m]) similarity matrix
- Q: and A A^T?
- A: the document-to-document ([n x n]) similarity matrix

SVD properties
- V: the eigenvectors of the covariance matrix A^T A
- U: the eigenvectors of the Gram (inner-product) matrix A A^T
- Further reading:
  1. Ian T. Jolliffe, Principal Component Analysis (2nd ed.), Springer, 2002.
  2. Gilbert Strang, Linear Algebra and Its Applications (4th ed.), Brooks Cole, 2005.
Principal Component Analysis (PCA)
- SVD: A_{n×m} ≈ U_{n×k} Σ_{k×k} V^T_{k×m} (k = R principal components)
- U Σ: the PC scores; V: the loadings
- PCA is an important application of SVD
- Note that U and V are dense and may have negative entries

PCA — interpretation
- best axis to project on ("best" = min sum of squares of projection errors)
- (figure: 2-d point cloud over Term1 ("data") and Term2 ("lung"))

PCA — interpretation
- A = U Σ V^T
- PCA projects points onto the "best" axis: the first singular vector v_1
- minimum RMS error
- (figure: points over Term1 ("data") and Term2 ("retrieval"), with v_1 drawn through the cloud)
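The "best axis" claim can be made concrete: PCA is the SVD of the mean-centered data matrix, and projecting onto the first right singular vector leaves exactly the discarded singular values' energy as residual. A minimal sketch on synthetic 2-d data (the data is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-d points stretched along one direction (illustrative data).
pts = rng.normal(size=(200, 2)) @ np.array([[3., 1.], [0., 0.5]])

# PCA = SVD of the mean-centered data matrix.
centered = pts - pts.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

v1 = Vt[0]              # first principal axis (first right singular vector)
proj = centered @ v1    # scores along the best axis

# Projecting on v1 minimizes the sum of squared projection errors;
# for 2-d data the residual energy is exactly s[1]**2.
residual = np.linalg.norm(centered - np.outer(proj, v1), 'fro') ** 2
assert np.isclose(residual, s[1] ** 2)
```

The same identity generalizes: keeping the top k axes leaves sum(s[k:]**2) as the squared projection error, which is why v_1 is the minimum-RMS-error axis.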
Outline: Part 2
- Matrix Tools
  - SVD, PCA
  - HITS, PageRank
  - Co-clustering
- Tensor Tools

Kleinberg's algorithm: HITS
- Problem defn: given the web and a query, find the most "authoritative" web pages for this query
- Step 0: find all pages containing the query terms
- Step 1: expand by one move forward and backward
- Further reading:
  1. J. Kleinberg. Authoritative sources in a hyperlinked environment. SODA 1998.

Kleinberg's algorithm: HITS
- Step 1: expand by one move forward and backward
Kleinberg's algorithm: HITS
- on the resulting graph, give a high score ("authorities") to nodes that many important nodes point to
- give a high importance score ("hubs") to nodes that point to good "authorities"

Kleinberg's algorithm: HITS
- observations:
  - recursive definition!
  - each node (say, the i-th node) has both an authoritativeness score a_i and a hubness score h_i

Kleinberg's algorithm: HITS
- Let A be the adjacency matrix: the (i,j) entry is 1 if the edge from i to j exists
- Let h and a be [n x 1] vectors with the "hubness" and "authoritativeness" scores
- Then:
Kleinberg's algorithm: HITS
- Then: a_i = h_k + h_l + h_m
- that is, a_i = Sum(h_j) over all j such that the (j,i) edge exists
- or: a = A^T h

Kleinberg's algorithm: HITS
- symmetrically, for the "hubness": h_i = a_n + a_p + a_q
- that is, h_i = Sum(a_j) over all j such that the (i,j) edge exists
- or: h = A a

Kleinberg's algorithm: HITS
- In conclusion, we want vectors h and a such that:
  h = A a
  a = A^T h
- That is: a = A^T A a
Kleinberg's algorithm: HITS
- a is a right singular vector of the adjacency matrix A (by defn!), a.k.a. the eigenvector of A^T A
- Starting from a random a and iterating, we'll eventually converge
- Q: to which of all the eigenvectors? why?
- A: to the one of the strongest eigenvalue, since (A^T A)^k a ≈ λ_1^k a_1

Kleinberg's algorithm — discussion
- the "authority" score can be used to find "similar" pages (how?)
- closely related to citation analysis, social networks / "small world" phenomena
- See also TOPHITS

Motivating problem: PageRank
- Given a directed graph, find its most interesting/central node
- A node is important, if it is connected with important nodes (recursive, but OK!)
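The HITS iteration above fits in a few lines. A sketch on a small made-up 4-node graph (the graph is illustrative, not from the slides), verifying that the authority vector converges to the dominant eigenvector of A^T A:

```python
import numpy as np

# Small directed graph; A[i, j] = 1 iff edge i -> j (illustrative).
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0]], dtype=float)

# HITS iteration: a <- A^T h, h <- A a, normalizing each time.
a = np.ones(4)
h = np.ones(4)
for _ in range(100):
    a = A.T @ h
    a /= np.linalg.norm(a)
    h = A @ a
    h /= np.linalg.norm(h)

# a converges to the dominant eigenvector of A^T A
# (equivalently, the top right singular vector of A).
w, V = np.linalg.eigh(A.T @ A)
top = np.abs(V[:, np.argmax(w)])
assert np.allclose(np.abs(a), top, atol=1e-6)
print(np.round(a, 3))   # node 2, pointed to by all the others, wins
```

This is exactly the power method on A^T A, which is why the iterate converges to the eigenvector of the strongest eigenvalue, as the slide argues.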
Motivating problem — PageRank solution
- Given a directed graph, find its most interesting/central node
- Proposed solution: random walk; spot the most "popular" node (-> steady-state probability (ssp))
- A node has high ssp, if it is connected with high-ssp nodes (recursive, but OK!)

(Simplified) PageRank algorithm
- Let A be the transition matrix (= adjacency matrix); let B be the transpose, column-normalized ("From"/"To"), so each column of B sums to 1
- (5-node example graph p1 .. p5; the nonzero entries of B are 1, 1, 1, 1/2, 1/2, 1/2, 1/2)
- then: B p = p

(Simplified) PageRank algorithm
- B p = p: the steady-state probability vector p is a fixed point of B
(Simplified) PageRank algorithm
- B p = 1 · p
- thus, p is the eigenvector that corresponds to the highest eigenvalue (= 1, since the matrix is column-normalized)
- Why does such a p exist?
  - p exists if B is nxn, nonnegative, irreducible [Perron-Frobenius theorem]

(Simplified) PageRank algorithm
- In short: imagine a particle randomly moving along the edges; compute its steady-state probabilities (ssp)
- Full version of algo: with occasional random jumps
- Why? To make the matrix irreducible

Full Algorithm
- With probability 1-c, fly out to a random node
- Then, we have: p = c B p + (1-c)/n · 1  =>  p = (1-c)/n · [I - c B]^{-1} · 1
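The full algorithm above has both an iterative and a closed form, and they should agree. A sketch on a small made-up 5-node graph (illustrative; the slide's example graph is not reproduced here), with c = 0.85 as an assumed damping value:

```python
import numpy as np

# Adjacency of a small directed graph; every node has at least one out-link.
A = np.array([[0, 1, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0]], dtype=float)
n = A.shape[0]
B = A.T / A.sum(axis=1)          # transpose, column-normalized

c = 0.85                         # probability of following a link (assumed)
# Power iteration on p = c*B*p + (1-c)/n * 1
p = np.ones(n) / n
for _ in range(200):
    p = c * (B @ p) + (1 - c) / n

# Closed form from the slide: p = (1-c)/n * [I - c*B]^{-1} * 1
p_exact = (1 - c) / n * np.linalg.solve(np.eye(n) - c * B, np.ones(n))
assert np.allclose(p, p_exact, atol=1e-10)
```

Each sweep contracts the error by a factor c, so a couple hundred iterations match the linear-system solution to machine precision; in practice the iteration is preferred because B is huge and sparse.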
Outline: Part 2
- Matrix Tools
  - SVD, PCA
  - HITS, PageRank
  - Co-clustering
- Tensor Tools

Co-clustering
- Given the data matrix and the number of row and column groups k and l
- Simultaneously:
  - cluster the rows of p(X, Y) into k disjoint groups
  - cluster the columns of p(X, Y) into l disjoint groups

Co-clustering
- Let X and Y be discrete random variables
  - X and Y take values in {1, 2, ..., m} and {1, 2, ..., n}
  - p(X, Y) denotes the joint probability distribution; if not known, it is often estimated from co-occurrence data
- Application areas: text mining, market-basket analysis, analysis of browsing behavior, etc.
- Key obstacles in clustering contingency tables:
  - high dimensionality, sparsity, noise
  - need for robust and scalable algorithms
- Reference:
  1. Dhillon et al. Information-Theoretic Co-clustering, KDD'03
Co-clustering — example (e.g., terms x documents; leading zeros restored):

  p(X, Y) =                        approximation q(X, Y) =
  [.05 .05 .05  0   0   0          [.054 .054 .042  0    0    0
   .05 .05 .05  0   0   0           .054 .054 .042  0    0    0
    0   0   0  .05 .05 .05           0    0    0   .042 .054 .054
    0   0   0  .05 .05 .05           0    0    0   .042 .054 .054
   .04 .04  0  .04 .04 .04          .036 .036 .028 .028 .036 .036
   .04 .04 .04  0  .04 .04]         .036 .036 .028 .028 .036 .036]

  The factors: q = (term x term-group) · (term-group x doc-group) · (doc-group x doc):

  [.5  0   0
   .5  0   0        [.3  0        [.36 .36 .28  0   0   0
    0  .5  0    ·    0  .3    ·    0   0   0  .28 .36 .36]
    0  .5  0        .2  .2]
    0   0  .5
    0   0  .5]

  with term groups {med. terms, cs terms, common terms} and doc groups {med. docs, cs docs}

Co-clustering — observations
- uses KL divergence, instead of L2
- the middle matrix is not diagonal
- we'll see that again in the Tucker tensor decomposition
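The factor form of the example can be checked by multiplying the three matrices back together. A sketch, using the factor values from the example above (with the leading zeros restored):

```python
import numpy as np

# Row (term) cluster membership p(x | x_hat): 6 terms in 3 groups of 2.
Rm = np.array([[.5, 0, 0], [.5, 0, 0],
               [0, .5, 0], [0, .5, 0],
               [0, 0, .5], [0, 0, .5]])

# Cluster joint p(x_hat, y_hat): the middle matrix -- note it is NOT diagonal.
core = np.array([[.3, 0], [0, .3], [.2, .2]])

# Column (doc) cluster membership p(y | y_hat): 6 docs in 2 groups of 3.
Cm = np.array([[.36, 0], [.36, 0], [.28, 0],
               [0, .28], [0, .36], [0, .36]])

q = Rm @ core @ Cm.T      # co-cluster approximation of p(X, Y)
print(np.round(q, 3))

# e.g. q[0,0] = .5 * .3 * .36 = .054, matching the approximation above.
assert np.isclose(q[0, 0], 0.054)
assert np.isclose(q[4, 2], 0.028)
```

Because each membership column sums to 1 and the core is itself a joint distribution, q remains a valid joint distribution (it sums to 1), which is what lets the method measure quality by KL divergence rather than L2.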
Outline: Part 2
- Matrix Tools
- Tensor Tools
  - Tensor Basics
  - Tucker
    - Tucker 1
    - Tucker 2
    - Tucker 3
  - PARAFAC
  - Incrementalization

Tensor Basics

Reminder: SVD
- A_{m×n} = U Σ V^T
- Best rank-k approximation in L2
- See also PARAFAC
Reminder: SVD
- A_{m×n} ≈ σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ...
- Best rank-k approximation in L2
- See also PARAFAC

Goal: extension to ≥ 3 modes
- X_{I×J×K} ≈ Σ_{r=1}^{R} λ_r a_r ∘ b_r ∘ c_r, with A: I x R, B: J x R, C: K x R

Main points:
- 2 major types of tensor decompositions: PARAFAC and Tucker
- both can be solved with "alternating least squares" (ALS)
- Details follow; we start with terminology:
A tensor is a multidimensional array  [Faloutsos, Tong, ICDE 2009] [T. Kolda, '07]
- X ∈ R^{I×J×K}, with elements x_{ijk} (e.g., x_{1,1,1})
- 3rd-order tensor: mode 1 has dimension I, mode 2 has dimension J, mode 3 has dimension K
- Fibers: column (mode-1), row (mode-2), tube (mode-3) fibers
- Slices: horizontal, lateral, frontal slices
- Note: tutorial focus is on 3rd order, but everything can be extended to higher orders.

Matricization: converting a tensor to a matrix  [T. Kolda, '07]
- Matricize (unfolding): (i,j,k) -> (i',j')
- X_(n): the mode-n fibers are rearranged to be the columns of a matrix
- Reverse matricize: (i',j') -> (i,j,k)
- (example figure: the frontal slices [1 3; 2 4] and [5 7; 6 8] of a 2x2x2 tensor unfold to the 2x4 matrix [1 3 5 7; 2 4 6 8])

Tensor mode-n multiplication  [T. Kolda, '07]
- Tensor times matrix: multiply each row (mode-2) fiber by B
- Tensor times vector: compute the dot product of a and each column (mode-1) fiber
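Unfolding and the mode-n product are short numpy one-liners. A sketch (note one assumption: numpy's row-major flattening orders the unfolded columns differently from Kolda's column-major definition, but any consistent convention works):

```python
import numpy as np

X = np.arange(24, dtype=float).reshape(2, 3, 4)   # a 2 x 3 x 4 tensor

def unfold(T, mode):
    """Mode-n matricization: the mode-n fibers become the columns of X_(n).
    (Row-major column ordering; Kolda's definition orders columns differently.)"""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_mult(T, M, mode):
    """Mode-n product T x_n M: multiply every mode-n fiber by M."""
    rest = [s for i, s in enumerate(T.shape) if i != mode]
    Y = (M @ unfold(T, mode)).reshape([M.shape[0]] + rest)
    return np.moveaxis(Y, 0, mode)

M = np.ones((5, 3))
Y = mode_mult(X, M, 1)       # contract mode 2 (size J=3) against M
assert Y.shape == (2, 5, 4)

# Cross-check against a direct tensordot contraction.
Z = np.moveaxis(np.tensordot(M, X, axes=([1], [1])), 0, 1)
assert np.allclose(Y, Z)
```

The matrix-level identity behind `mode_mult` is (T x_n M)_(n) = M T_(n), which is exactly how the slides reduce tensor operations to matrix algebra.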
Pictorial view of mode-n matrix multiplication  [T. Kolda, '07]
- Mode-1 multiplication acts on the frontal slices, mode-2 on the lateral slices, mode-3 on the horizontal slices

Mode-n product — example  [T. Kolda, '07]
- Tensor times a matrix: multiplying a (location x time x ...) tensor by a (time-clusters x time) matrix compresses the time mode into time clusters

Mode-n product — example  [T. Kolda, '07]
- Tensor times a vector: contracting the time mode of a (location x time x ...) tensor with a time-long vector drops that mode entirely
Outer, Kronecker, & Khatri-Rao products  [T. Kolda, '07]
- 3-way outer product: a ∘ b ∘ c is a rank-1 tensor with entries a_i b_j c_k
- Review: matrix Kronecker product: (M x N) ⊗ (P x Q) -> (MP x NQ)
- Matrix Khatri-Rao product: (M x R) ⊙ (N x R) -> (MN x R) (column-wise Kronecker product)
- Observe: for two vectors a and b, a ∘ b and a ⊗ b have the same elements, but one is shaped into a matrix and the other into a vector.

Specially Structured Tensors

Specially structured tensors  [T. Kolda, '07]
- Tucker tensor: X = G ×1 U ×2 V ×3 W, with core G (R x S x T) and factors U (I x R), V (J x S), W (K x T)
- Kruskal tensor: X = Σ_{r=1}^{R} w_r u_r ∘ v_r ∘ ..., i.e. a weighted sum of R rank-1 tensors, with U (I x R), V (J x R), ...
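The "same elements, different shape" observation, and the Khatri-Rao product, are easy to verify. A sketch with small made-up vectors and matrices:

```python
import numpy as np

a = np.array([1., 2., 3.])   # length M = 3
b = np.array([4., 5.])       # length N = 2

# Outer product (M x N matrix) vs Kronecker product (MN vector):
# same elements, different shape.
assert np.allclose(np.outer(a, b).ravel(), np.kron(a, b))

# A 3-way outer product a o b o c is a rank-1 third-order tensor.
c = np.array([6., 7.])
T = np.einsum('i,j,k->ijk', a, b, c)
assert T[1, 0, 1] == a[1] * b[0] * c[1]

def khatri_rao(A, B):
    """Column-wise Kronecker product: (M x R), (N x R) -> (MN x R)."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

A = np.arange(6.).reshape(3, 2)
B = np.arange(8.).reshape(4, 2)
KR = khatri_rao(A, B)
assert KR.shape == (12, 2)
assert np.allclose(KR[:, 0], np.kron(A[:, 0], B[:, 0]))
```

The Khatri-Rao product is the workhorse of the PARAFAC-ALS update shown later, which is why it appears here alongside the Kronecker product.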
Specially structured tensors  [T. Kolda, '07]
- Tucker tensor in matrix form: X_(1) = U G_(1) (W ⊗ V)^T
- Kruskal tensor in matrix form: X_(1) = U diag(w) (W ⊙ V)^T

Outline: Part 2
- Matrix Tools
- Tensor Tools
  - Tensor Basics
  - Tucker
    - Tucker 1
    - Tucker 2
    - Tucker 3
  - PARAFAC
  - Incrementalization

Tensor Decompositions
Tucker decomposition — intuition
- X_{I×J×K} ≈ G ×1 A ×2 B ×3 C, with A: I x R, B: J x S, C: K x T, core G: R x S x T
- e.g., author x keyword x conference
  - A: author x author-group
  - B: keyword x keyword-group
  - C: conf. x conf-group
  - G: how the groups relate to each other
- (recall the co-clustering example: (term x term-group) · (term-group x doc-group) · (doc-group x doc))

Tucker decomposition
- Given A, B, C, the optimal core is: G = X ×1 A^T ×2 B^T ×3 C^T
- Proposed by Tucker (1966)
- AKA: three-mode factor analysis, three-mode PCA, orthogonal array decomposition
- A, B, and C generally assumed to be orthonormal (generally assumed to have full column rank)
- G is not diagonal
- Not unique
- Recall the equations for converting a tensor to a matrix
Tucker variations
- See Kroonenberg & De Leeuw, Psychometrika, 1980 for discussion.
- Tucker2: identity matrix in one mode: X ≈ G ×1 A ×2 B, with core G: R x S x K
- Tucker1: X ≈ G ×1 A, with core G: R x J x K
  - finding principal components in only mode 1 can be solved via rank-R matrix SVD

Solving for Tucker
- Given A, B, C orthonormal, the optimal core is: G = X ×1 A^T ×2 B^T ×3 C^T
- (the tensor norm is the square root of the sum of all the elements squared)
- Eliminate the core to get: minimizing ||X - G ×1 A ×2 B ×3 C|| s.t. A, B, C orthonormal, with G fixed to its optimum, is equivalent to maximizing ||X ×1 A^T ×2 B^T ×3 C^T||
- If B & C are fixed, then we can solve for A as follows: the optimal A is the R leading left singular vectors of X_(1) (C ⊗ B)

Higher-Order SVD (HO-SVD)
- Set each factor to the leading left singular vectors of the corresponding unfolding X_(n)
- Not optimal, but often used to initialize the Tucker-ALS algorithm. (Observe the connection to Tucker1.)
- De Lathauwer, De Moor, & Vandewalle, SIMAX, 2000
Tucker-Alternating Least Squares (ALS)
- Successively solve for each component (A, B, C):
  - Initialize: choose R, S, T; calculate A, B, C via HO-SVD
  - Until converged, do:
    - A = R leading left singular vectors of X_(1) (C ⊗ B)
    - B = S leading left singular vectors of X_(2) (C ⊗ A)
    - C = T leading left singular vectors of X_(3) (B ⊗ A)
  - Solve for the core: G = X ×1 A^T ×2 B^T ×3 C^T
- Kroonenberg & De Leeuw, Psychometrika, 1980

Tucker is not unique  [T. Kolda, '07]
- Let Y be an R x R orthogonal matrix. Then G ×1 A = (G ×1 Y^T) ×1 (A Y): the core and factors can be rotated against each other without changing the fit.

Outline: Part 2
- Matrix Tools
- Tensor Tools
  - Tensor Basics
  - Tucker
    - Tucker 1
    - Tucker 2
    - Tucker 3
  - PARAFAC
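The HO-SVD initializer above is only a few lines of numpy. A sketch on a random tensor (random data for illustration), kept at full ranks so the reconstruction is exact; truncating each factor's columns gives the usual non-optimal starting point for Tucker-ALS:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 5, 6))   # a random 3rd-order tensor

def unfold(T, mode):
    # mode-n matricization: the mode-n fibers become columns
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# HO-SVD: factor n = leading left singular vectors of X_(n).
U = [np.linalg.svd(unfold(X, n), full_matrices=False)[0] for n in range(3)]

# Core G = X x_1 U0^T x_2 U1^T x_3 U2^T, computed with einsum.
G = np.einsum('ijk,ia,jb,kc->abc', X, U[0], U[1], U[2])

# With orthonormal factors at full rank, X is reconstructed exactly.
# Note the core G is dense, not diagonal -- unlike the SVD's Sigma.
X_hat = np.einsum('abc,ia,jb,kc->ijk', G, U[0], U[1], U[2])
assert np.allclose(X, X_hat)
```

Tucker-ALS then alternates: recompute each factor from the SVD of the unfolded tensor with the other two factors applied, until the fit stops improving.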
CANDECOMP/PARAFAC decomposition
- X_{I×J×K} ≈ Σ_{r=1}^{R} λ_r a_r ∘ b_r ∘ c_r, with A: I x R, B: J x R, C: K x R
- CANDECOMP = Canonical Decomposition (Carroll & Chang, 1970)
- PARAFAC = Parallel Factors (Harshman, 1970)
- Core is diagonal (specified by the vector λ)
- Columns of A, B, and C are not orthonormal
- If R is minimal, then R is called the rank of the tensor (Kruskal 1977)
- Can have rank(X) > min{I, J, K}

PARAFAC-Alternating Least Squares (ALS)  [T. Kolda, '07]
- Successively solve for each component (A, B, C); find all the vectors in one mode at a time
- If C, B, and Λ are fixed, the optimal A is given by:
  A = X_(1) (C ⊙ B) [(C^T C) * (B^T B)]^†
  (⊙: Khatri-Rao product, i.e. column-wise Kronecker product; *: Hadamard (elementwise) product; †: pseudoinverse)
- Repeat for B, C, etc.

PARAFAC is often unique
- X = λ_1 a_1 ∘ b_1 ∘ c_1 + ... + λ_R a_R ∘ b_R ∘ c_R
- Assume the PARAFAC decomposition is exact.
- Sufficient condition for uniqueness (Kruskal, 1977): k_A + k_B + k_C ≥ 2R + 2,
  where k_A = k-rank of A = the max number k such that every set of k columns of A is linearly independent
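The ALS update above can be sketched end to end on a small tensor of known rank (the weights λ are folded into the factors here, and because numpy unfolds in row-major order the Khatri-Rao factor order flips relative to the slide's column-major formula; everything else is the update as stated):

```python
import numpy as np

rng = np.random.default_rng(2)
R = 2
# Build an exactly rank-2 tensor as a sum of R outer products.
A0 = rng.normal(size=(4, R))
B0 = rng.normal(size=(5, R))
C0 = rng.normal(size=(6, R))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)

def unfold(T, mode):
    # numpy (row-major) mode-n unfolding
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    # column-wise Kronecker product: (M x R), (N x R) -> (MN x R)
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

# ALS sweeps: update one factor at a time with the other two fixed,
# using factor = X_(n) (KR of the others) [Hadamard of Gram matrices]^+.
A = rng.normal(size=(4, R))
B = rng.normal(size=(5, R))
C = rng.normal(size=(6, R))
for _ in range(500):
    A = unfold(X, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
    B = unfold(X, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
    C = unfold(X, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))

X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
assert np.linalg.norm(X - X_hat) < 1e-3 * np.linalg.norm(X)
```

Each update is an exact linear least-squares solve, so the fit is monotonically non-increasing; on an exactly low-rank tensor the sweeps typically drive the residual to (near) zero, and Kruskal's condition explains why the recovered factors generally match the generating ones up to scaling and permutation.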
Tucker vs. PARAFAC decompositions
- Tucker: X ≈ G ×1 A ×2 B ×3 C (A: I x R, B: J x S, C: K x T)
  - variable transformation in each mode
  - core G may be dense; A, B, C generally orthonormal
  - not unique
- PARAFAC: X ≈ Σ_r λ_r a_r ∘ b_r ∘ c_r
  - sum of rank-1 components
  - no core, i.e., superdiagonal core; A, B, C may have linearly dependent columns
  - generally unique

Tensor tools — summary
- Two main tools: PARAFAC and Tucker
- Both find row-, column-, and tube-groups, but in PARAFAC the three groups are identical
- To solve: alternating least squares (ALS)

Tensor tools — resources
- Toolbox from Tamara Kolda: csmr.ca.sandia.gov/~tgkolda/tensortoolbox/
- T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications. SIAM Review, to appear (accepted June 2008). csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/tensorreview-preprint.pdf
- T. Kolda and J. Sun: Scalable Tensor Decomposition for Multi-Aspect Data Mining (ICDM 2008)