Graph Metrics and Dimension Reduction
Minh Tang¹ and Michael Trosset²
¹Applied Mathematics and Statistics, The Johns Hopkins University
²Department of Statistics, Indiana University, Bloomington
November 11, 2010
Outline
1. Problem Description
2. Preliminaries
3. Distances on Undirected Graphs
4. From Distances to Embeddings
5. Distances on Directed Graphs
Benefits of Dimension Reduction
High-dimensional data, i.e., a great many measurements taken on each member of a set of objects, are now ubiquitous. Bellman's curse of dimensionality [Bellman(1957)] refers to the problems caused by the exponential increase in the volume of a mathematical space as additional dimensions are added. Such problems include:
- Slow convergence of statistical estimators, e.g., density estimators.
- Overfitting of models to noise.
- Difficulties in performing exploratory data analysis.
- Inefficient nearest-neighbour searches.
Dimension reduction is the process of replacing a multivariate data set with a data set of lower dimension. Classical approaches to dimension reduction include principal component analysis (PCA) [Pearson(1901), Hotelling(1933)] and classical multidimensional scaling [Torgersen(1952), Gower(1966)].
Problem Description
We consider the problem of constructing a low-dimensional Euclidean representation of data described by pairwise similarities. The low-dimensional representation can serve as the basis for other exploitation tasks, e.g., visualization, clustering, or classification. Our basic strategy is:
1. Transform the similarities into some notion of dissimilarities;
2. Embed the derived dissimilarities.
Our concerns are closely related to those of manifold learning: various manifold learning techniques can be interpreted as transformations from similarities to dissimilarities.
A Motivating Example: MNIST Dataset
1200 images of digits (4 or 5) from MNIST [LeCun et al.(1998)]. Each image is viewed as a point in $\mathbb{R}^{28 \times 28}$. How should we measure the proximity of a pair of images?
$$\delta_{ij} = \|x_i - x_j\|, \qquad \gamma_{ij} = \exp(-\|x_i - x_j\|^2/\sigma^2)$$
[Figure: two scatter plots of the digits 4 and 5; the left panel plots the first two principal components (PC1 vs. PC2) obtained from the dissimilarities $\delta_{ij}$, the right panel an embedding obtained from the similarities $\gamma_{ij}$.]
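The two proximity measures above can be sketched in a few lines of numpy. This is a minimal illustration on toy points, not the slides' MNIST pipeline; the function name `pairwise_measures` is ours.

```python
import numpy as np

def pairwise_measures(X, sigma=1.0):
    """Dissimilarities delta_ij = ||x_i - x_j|| and Gaussian
    similarities gamma_ij = exp(-||x_i - x_j||^2 / sigma^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared distances
    d2 = np.maximum(d2, 0.0)                        # guard against round-off
    return np.sqrt(d2), np.exp(-d2 / sigma ** 2)

# three toy feature vectors standing in for flattened images
X = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
delta, gamma = pairwise_measures(X, sigma=2.0)
```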
From Similarities to Graphs
Isomap [Tenenbaum et al.(2000)] employed two commonly used approaches to graph construction. In both approaches, vertices correspond to feature vectors.
1. $\epsilon$-neighborhood approach: $v_i \sim v_j$ iff $\|x_i - x_j\| < \epsilon$.
2. $K$-NN approach: connect $v_i \to v_j$ iff $x_j$ is a $K$ nearest neighbor of $x_i$. This graph is directed, but in practice it is often symmetrized.
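Both constructions can be sketched directly from a dissimilarity matrix; the names `epsilon_graph` and `knn_graph` are ours, and boolean adjacency matrices stand in for the graph.

```python
import numpy as np

def epsilon_graph(delta, eps):
    """v_i ~ v_j iff ||x_i - x_j|| < eps (no self-loops)."""
    return (delta < eps) & ~np.eye(len(delta), dtype=bool)

def knn_graph(delta, k, symmetrize=True):
    """Directed edge i -> j iff x_j is among the k nearest neighbors of x_i;
    symmetrized by default, as is common in practice."""
    n = len(delta)
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        nbrs = [j for j in np.argsort(delta[i]) if j != i][:k]
        A[i, nbrs] = True
    return (A | A.T) if symmetrize else A

# distances between three collinear points at 0, 1 and 3
delta = np.array([[0., 1., 3.],
                  [1., 0., 2.],
                  [3., 2., 0.]])
```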
Distances on Graphs
Given an $n \times n$ similarity matrix $\Gamma = (\gamma_{ij})$:
1. Transform the similarities to dissimilarities. (Isomap starts with distances in an ambient input space.)
   (a) Construct a weighted graph $G = (V, E, \omega)$ with $n$ vertices and edge weights $\omega_{ij} = \gamma_{ij}$.
   (b) Choose a suitable measure of dissimilarity (typically a distance) on $G$. Let $\Delta$ denote the matrix of pairwise dissimilarities.
2. Embed $\Delta$.
Several popular approaches that transform similarity to distance rely on the concept of a random walk.
Random Walks on Graphs
Let $G = (V, E, \omega)$ be an undirected graph. We define the transition matrix $P = (p_{uv})$ of a Markov chain with state space $V$ as
$$p_{uv} = \begin{cases} \omega(\{u,v\})/\deg(u) & \text{if } u \sim v \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
Suppose that $G$ is connected. Then the stationary distribution $\pi$ of $P$ exists and is unique. Furthermore, if $G$ is connected and not bipartite, then
$$\lim_{k \to \infty} P^k = \mathbf{1}\pi^T =: Q \qquad (2)$$
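Eq. (1) and the limit (2) can be checked concretely on a small weighted triangle (the example graph is ours; a triangle is connected and non-bipartite, so the limit applies):

```python
import numpy as np

# weighted adjacency of a connected, non-bipartite undirected graph
W = np.array([[0., 1., 1.],
              [1., 0., 2.],
              [1., 2., 0.]])
deg = W.sum(axis=1)
P = W / deg[:, None]            # p_uv = omega({u,v}) / deg(u), Eq. (1)

# for an undirected graph the stationary distribution is degree-proportional
pi = deg / deg.sum()

# P^k -> Q = 1 pi^T as k grows, Eq. (2)
Q = np.outer(np.ones(len(W)), pi)
Pk = np.linalg.matrix_power(P, 50)
```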
Distance Geometry
Definition (Euclidean Distance Matrix). Let $\Delta = (\delta_{ij})$ be an $n \times n$ dissimilarity matrix. $\Delta$ is a Type-2 Euclidean distance matrix (EDM-2) if there exist $n$ points $x_1, x_2, \ldots, x_n \in \mathbb{R}^p$ for some $p$ such that $\delta_{ij} = \|x_i - x_j\|^2$.
Let $A$ and $B$ be $n \times n$ matrices. Define two linear transforms $\tau$ and $\kappa$ by
$$\tau(A) = -\tfrac{1}{2}(I - \mathbf{1}\mathbf{1}^T/n)\,A\,(I - \mathbf{1}\mathbf{1}^T/n)$$
$$\kappa(B) = B_{\mathrm{dg}}\mathbf{1}\mathbf{1}^T - B - B^T + \mathbf{1}\mathbf{1}^T B_{\mathrm{dg}}$$
where $B_{\mathrm{dg}}$ denotes the diagonal matrix whose diagonal agrees with that of $B$. There is an equivalence between EDM-2 and p.s.d. matrices.
Theorem ([Schoenberg(1935), Young and Householder(1938)]). $\Delta$ is EDM-2 iff $\tau(\Delta)$ is p.s.d. As a corollary, if $B$ is p.s.d., then $\Delta = \kappa(B)$ is EDM-2.
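The transforms $\tau$ and $\kappa$ and the Schoenberg criterion translate directly into code; this is a sketch, and the helper name `is_edm2` is ours.

```python
import numpy as np

def tau(A):
    """tau(A) = -(1/2) J A J with J = I - 11^T/n (double centering)."""
    n = len(A)
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ A @ J

def kappa(B):
    """kappa(B)_ij = b_ii + b_jj - b_ij - b_ji."""
    d = np.diag(B)
    return d[:, None] + d[None, :] - B - B.T

def is_edm2(Delta, tol=1e-10):
    """Schoenberg: Delta is EDM-2 iff tau(Delta) is p.s.d."""
    return np.min(np.linalg.eigvalsh(tau(Delta))) >= -tol

# squared distances of three collinear points 0, 1, 3: a valid EDM-2
Delta = np.array([[0., 1., 9.],
                  [1., 0., 4.],
                  [9., 4., 0.]])
```

On an EDM-2, $\kappa$ inverts $\tau$: the double-centered matrix is a Gram matrix of centered points, and $\kappa$ recovers their squared distances.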
Expected Commute Time
Following [Kemeny and Snell(1960)], let $\Pi = \mathrm{diag}(\pi)$ and $Z = (I - P + Q)^{-1}$. The expected first passage times are given by
$$M = (\mathbf{1}\mathbf{1}^T Z_{\mathrm{dg}} - Z)\Pi^{-1}$$
and the expected commute times are
$$\Delta_{\mathrm{ect}} = M + M^T = \kappa(Z\Pi^{-1})$$
It turns out that $Z\Pi^{-1} \succeq 0$; hence $\Delta_{\mathrm{ect}}$ is EDM-2. This result is well known; see [Trosset and Tang(2010)] for an elementary proof.
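A numerical sketch of these formulas on a weighted triangle (the graph is ours; `kappa` is the transform defined in the Distance Geometry slide). As a cross-check we use the classical identity that the commute time between two vertices equals the volume of the graph times the effective resistance between them.

```python
import numpy as np

W = np.array([[0., 1., 1.],
              [1., 0., 2.],
              [1., 2., 0.]])
deg = W.sum(axis=1)
n = len(W)
P = W / deg[:, None]
pi = deg / deg.sum()
Q = np.outer(np.ones(n), pi)
Pi_inv = np.diag(1.0 / pi)

Z = np.linalg.inv(np.eye(n) - P + Q)          # fundamental matrix
# M = (11^T Z_dg - Z) Pi^{-1}: expected first passage times m_ij
M = (np.ones((n, n)) * np.diag(Z)[None, :] - Z) @ Pi_inv
ect = M + M.T                                 # expected commute times

def kappa(B):
    d = np.diag(B)
    return d[:, None] + d[None, :] - B - B.T
```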
Diffusion Distances
Let $e_i$ and $e_j$ denote point masses at vertices $v_i$ and $v_j$. After $r$ time steps, under the random walk model with transition matrix $P$, these distributions have diffused to $e_i^T P^r$ and $e_j^T P^r$. The diffusion distance [Coifman and Lafon(2006)] at time $r$ between $v_i$ and $v_j$ is
$$\rho_r(v_i, v_j) = \|e_i^T P^r - e_j^T P^r\|_{1/\pi}$$
where the norm is induced by the inner product
$$\langle u, v \rangle_{1/\pi} = \sum_k u(k)v(k)/\pi(k)$$
It turns out that $\Delta_{\rho_r^2} = \kappa(P^{2r}\Pi^{-1})$ and $P^{2r}\Pi^{-1} \succeq 0$; hence, $\Delta_{\rho_r^2}$ is EDM-2.
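The identity $\Delta_{\rho_r^2} = \kappa(P^{2r}\Pi^{-1})$ can be verified against the definition directly; reversibility of the walk on an undirected graph is what makes the two expressions agree. A sketch on the same toy triangle graph:

```python
import numpy as np

W = np.array([[0., 1., 1.],
              [1., 0., 2.],
              [1., 2., 0.]])
deg = W.sum(axis=1)
n, r = len(W), 2
P = W / deg[:, None]
pi = deg / deg.sum()
Pi_inv = np.diag(1.0 / pi)

def kappa(B):
    d = np.diag(B)
    return d[:, None] + d[None, :] - B - B.T

# squared diffusion distances via the kappa identity
D2 = kappa(np.linalg.matrix_power(P, 2 * r) @ Pi_inv)

# ... and via the definition ||e_i^T P^r - e_j^T P^r||_{1/pi}^2
Pr = np.linalg.matrix_power(P, r)
direct = np.array([[np.sum((Pr[i] - Pr[j]) ** 2 / pi) for j in range(n)]
                   for i in range(n)])
```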
Some Remarks on ECT and Diffusion Distances
1. $\Delta_{\mathrm{ect}}$ can be written as
$$\Delta_{\mathrm{ect}} = \kappa(Z\Pi^{-1}) = \kappa\Big(\textstyle\sum_{t=0}^{\infty} (P - Q)^t \Pi^{-1}\Big).$$
Even though $\kappa((P-Q)^t \Pi^{-1}) = \kappa(P^t \Pi^{-1})$ for any $t$,
$$\Delta_{\mathrm{ect}} \neq \kappa\Big(\textstyle\sum_{t=0}^{\infty} P^t \Pi^{-1}\Big)$$
because $\sum_{t=0}^{\infty} P^t \Pi^{-1}$ does not necessarily converge.
2. $\Delta_{\rho_t^2}$ can be written as
$$\Delta_{\rho_t^2} = \kappa(P^{2t}\Pi^{-1}) = \kappa\big((P - Q)^{2t}\Pi^{-1}\big)$$
3. The diffusion distance between $v_i$ and $v_j$ at time $t$ takes into account only paths of length $2t$, while expected commute time takes into account paths of all lengths. In fact, expected commute time with respect to $P^2$ is the sum of the squared diffusion distances for $t = 0, 1, \ldots$ with respect to $P$.
General Framework for Euclidean Distances on Graphs
We introduce a general family of Euclidean distances constructed from random walks on graphs. Let $f$ be a real-valued function with series expansion
$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots$$
and radius of convergence $R \geq 1$. For a square matrix $X$, define $f(X)$ by
$$f(X) = a_0 I + a_1 X + a_2 X^2 + \cdots$$
Theorem ([Tang(2010)]). Assume that $P$ is irreducible and aperiodic. If $f(x) \geq 0$ for $x \in (-1, 1)$, then
$$\Delta = \kappa(f(P - Q)\Pi^{-1}) = \kappa\big((a_0 I + a_1(P-Q) + a_2(P-Q)^2 + \cdots)\Pi^{-1}\big) \qquad (3)$$
is well-defined and EDM-2.
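A sketch instantiating the theorem with $f(x) = e^x$, which is nonnegative on $(-1, 1)$. We approximate $f(P - Q)$ by a truncated power series (the spectral radius of $P - Q$ is below 1, so the tail is negligible) and check EDM-2 via Schoenberg's criterion; the toy graph is ours.

```python
import math
import numpy as np

W = np.array([[0., 1., 1.],
              [1., 0., 2.],
              [1., 2., 0.]])
deg = W.sum(axis=1)
n = len(W)
P = W / deg[:, None]
pi = deg / deg.sum()
Q = np.outer(np.ones(n), pi)
Pi_inv = np.diag(1.0 / pi)

def kappa(B):
    d = np.diag(B)
    return d[:, None] + d[None, :] - B - B.T

def tau(A):
    J = np.eye(len(A)) - np.ones_like(A) / len(A)
    return -0.5 * J @ A @ J

# f(P - Q) = sum_i a_i (P - Q)^i with f = exp, truncated at 20 terms
A = P - Q
F = sum(np.linalg.matrix_power(A, i) / math.factorial(i) for i in range(20))
Delta = kappa(F @ Pi_inv)
```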
Euclidean Distances on Graphs: Some Examples
$$\Delta = \kappa(f(P - Q)\Pi^{-1}) = \kappa\big((a_0 I + a_1(P - Q) + a_2(P - Q)^2 + \cdots)\Pi^{-1}\big)$$
f(x)                      | Comments
1                         | trivial notion of distance
x^{2r}                    | diffusion distance at time r
1/(1 - x)                 | expected commute time
1/(1 - x)^k for k >= 2    | longer paths have higher weights
-log(1 - x^2)             | longer paths have lower weights
exp(x)                    | heavily weights paths of short lengths
Embedding $\Delta = \kappa(f(P - Q)\Pi^{-1})$ in $\mathbb{R}^d$: Method 1
Embed by CMDS.
1. Compute
$$B = \tau(\Delta) = (I - \mathbf{1}\mathbf{1}^T/n)\,f(P - Q)\Pi^{-1}\,(I - \mathbf{1}\mathbf{1}^T/n)$$
2. Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ denote the eigenvalues of $B$ and let $v_1, v_2, \ldots, v_n$ denote the corresponding orthonormal eigenvectors. Then
$$X_d = \big[\sqrt{\lambda_1}\,v_1 \;\; \sqrt{\lambda_2}\,v_2 \;\; \cdots \;\; \sqrt{\lambda_d}\,v_d\big]$$
produces a configuration of points in $\mathbb{R}^d$ whose interpoint distances approximate $\Delta$.
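Method 1 is simply classical MDS applied to $\Delta$. A compact sketch (the unit-square example is ours):

```python
import numpy as np

def cmds(Delta, d):
    """Classical MDS: eigendecompose B = tau(Delta), keep the top-d components."""
    n = len(Delta)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ Delta @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d]            # largest eigenvalues first
    lam = np.clip(vals[idx], 0.0, None)         # clip tiny negative round-off
    return vecs[:, idx] * np.sqrt(lam)

# squared interpoint distances of the four corners of a unit square
X = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
Delta = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
Y = cmds(Delta, 2)
D2Y = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
```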
Example: Embedding $\Delta_{\mathrm{ect}}$ by CMDS
Let $G$ be an undirected graph and $L$ the combinatorial Laplacian of $G$. It turns out that $L^{\dagger}$, the Moore-Penrose pseudoinverse of $L$, is related to $Z\Pi^{-1}$ by
$$L^{\dagger} = c\,(I - \mathbf{1}\mathbf{1}^T/n)\,Z\Pi^{-1}\,(I - \mathbf{1}\mathbf{1}^T/n) \qquad (4)$$
where $c$ is a constant. Therefore, $\Delta_{\mathrm{ect}} = c\,\kappa(L^{\dagger})$. Furthermore, $\tau(\Delta_{\mathrm{ect}}) = c\,L^{\dagger}$. The $d$-dimensional embedding of $\Delta_{\mathrm{ect}}$ is thus given by
$$X_d = \sqrt{c}\,\big[\sqrt{\lambda_1}\,\nu_1 \;\; \sqrt{\lambda_2}\,\nu_2 \;\; \cdots \;\; \sqrt{\lambda_d}\,\nu_d\big] \qquad (5)$$
where $\lambda_1 \geq \lambda_2 \geq \cdots$ and $\nu_1, \nu_2, \ldots$ are the eigenvalues and corresponding eigenvectors of $L^{\dagger}$. The embedding in Eq. (5) is called a combinatorial Laplacian eigenmap.
Embedding $\Delta = \kappa(f(P - Q)\Pi^{-1})$ in $\mathbb{R}^d$: Method 2
Embed by the eigenvalues and eigenvectors of $P$.
1. Let $\mu_1, \mu_2, \ldots, \mu_{n-1} < 1 = \mu_n$ denote the eigenvalues of $P$, sorted so that $f(\mu_i) \geq f(\mu_{i+1})$. Let $u_1, u_2, \ldots, u_n$ denote the corresponding eigenvectors, orthonormal with respect to the inner product
$$\langle u, v \rangle_{\pi} = \sum_k u(k)v(k)\pi(k)$$
2. Then
$$X_d = \big[\sqrt{f(\mu_1)}\,u_1 \;\; \sqrt{f(\mu_2)}\,u_2 \;\; \cdots \;\; \sqrt{f(\mu_d)}\,u_d\big]$$
produces a configuration of points in $\mathbb{R}^d$ whose interpoint distances approximate $\Delta$.
By rescaling individual coordinates, the embedding for any one $f$ can be transformed into the embedding for any other $f$.
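Method 2 can be sketched for $f(x) = 1/(1-x)$, i.e., expected commute time. The $\pi$-orthonormal eigenvectors of $P$ come from the symmetric matrix $\Pi^{1/2} P \Pi^{-1/2}$, a standard trick for reversible chains; with all $n - 1$ non-unit eigenpairs retained, the squared interpoint distances should reproduce $\Delta_{\mathrm{ect}}$ exactly. The toy graph is ours.

```python
import numpy as np

W = np.array([[0., 1., 1.],
              [1., 0., 2.],
              [1., 2., 0.]])
deg = W.sum(axis=1)
n = len(W)
P = W / deg[:, None]
pi = deg / deg.sum()
Q = np.outer(np.ones(n), pi)
Pi_inv = np.diag(1.0 / pi)

# pi-orthonormal eigensystem of P via the symmetric Pi^{1/2} P Pi^{-1/2}
S = np.diag(np.sqrt(pi)) @ P @ np.diag(1.0 / np.sqrt(pi))
mu, V = np.linalg.eigh(S)
U = np.diag(1.0 / np.sqrt(pi)) @ V       # columns: eigenvectors of P, <u_i,u_i>_pi = 1

f = lambda x: 1.0 / (1.0 - x)            # f for expected commute time
keep = mu < 1.0 - 1e-9                   # drop the eigenvalue mu_n = 1
X = U[:, keep] * np.sqrt(f(mu[keep]))

D2X = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
Z = np.linalg.inv(np.eye(n) - P + Q)

def kappa(B):
    d = np.diag(B)
    return d[:, None] + d[None, :] - B - B.T
```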
Comparing the Embeddings
Method 1: Classical MDS
1. The embedding $X = [\sqrt{\lambda_1}\,v_1 \;\cdots\; \sqrt{\lambda_{n-1}}\,v_{n-1}]$ recovers $\Delta$ completely.
2. The embedding dimension of $\Delta$ is almost surely $n - 1$.
3. The best (least squares) $d$-dimensional representation of $X$ is $X_d = [\sqrt{\lambda_1}\,v_1 \;\cdots\; \sqrt{\lambda_d}\,v_d]$.
Method 2: Eigensystem of $P$
1. The embedding $X = [\sqrt{f(\mu_1)}\,u_1 \;\cdots\; \sqrt{f(\mu_{n-1})}\,u_{n-1}]$ recovers $\Delta$ completely.
2. The embedding dimension of $\Delta$ is almost surely $n - 1$.
3. The best $d$-dimensional representation of $X$ is (usually) not $X_d = [\sqrt{f(\mu_1)}\,u_1 \;\cdots\; \sqrt{f(\mu_d)}\,u_d]$.
Normalized Laplacian Eigenmaps
1. Construct a graph $G = (V, E, \omega)$ with $V = X$.
2. Compute the eigenvalues $\lambda$ and eigenvectors $f$ of the generalized eigenvalue problem
$$L f = \lambda D f \qquad (6)$$
3. Let $\lambda_0 \leq \lambda_1 \leq \cdots \leq \lambda_{n-1}$ be the eigenvalues of Eq. (6) and $f_0, f_1, \ldots, f_{n-1}$ the corresponding eigenvectors.
4. Embed into $\mathbb{R}^d$ by $x_i \mapsto \big(\tfrac{1}{\sqrt{\lambda_1}} f_1(i), \tfrac{1}{\sqrt{\lambda_2}} f_2(i), \ldots, \tfrac{1}{\sqrt{\lambda_d}} f_d(i)\big)$.
Under our framework, steps 2 to 4 are equivalent to embedding $\Delta_{\mathrm{ect}}$ using the eigenvalues and eigenvectors of $P$ (Method 2). This is not equivalent to embedding $\Delta_{\mathrm{ect}}$ using the eigensystem of $L$ (Method 1).
Diffusion Maps [Coifman and Lafon(2006)]
1. Construct a graph $G = (V, E, \omega)$ with $V = X$.
2. Generate the transition matrix $P$ of $G$.
3. Let $\lambda_0 \geq \lambda_1 \geq \cdots \geq \lambda_{n-1}$ be the eigenvalues of $P$ and $f_0, f_1, \ldots, f_{n-1}$ the corresponding eigenvectors.
4. Embed into $\mathbb{R}^d$ by $x_i \mapsto (\lambda_1^t f_1(i), \lambda_2^t f_2(i), \ldots, \lambda_d^t f_d(i))$.
Recall that $\Delta_{\rho_t^2} = \kappa((P - Q)^{2t}\Pi^{-1})$ is the matrix of diffusion distances. Under our framework, steps 2 to 4 are equivalent to embedding $\Delta_{\rho_t^2}$ using the eigenvalues and eigenvectors of $P$ (Method 2). Normalized Laplacian eigenmaps and diffusion maps are thus coordinate rescalings of one another.
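The rescaling claim can be checked numerically: with a common $\pi$-orthonormal eigensystem of $P$, the diffusion-map coordinate $\mu_j^t u_j$ equals the eigenmap coordinate $u_j/\sqrt{\lambda_j}$ (with $\lambda_j = 1 - \mu_j$) scaled by the constant $\mu_j^t \sqrt{\lambda_j}$. A sketch on our toy triangle graph:

```python
import numpy as np

W = np.array([[0., 1., 1.],
              [1., 0., 2.],
              [1., 2., 0.]])
deg = W.sum(axis=1)
n, t = len(W), 3
P = W / deg[:, None]
pi = deg / deg.sum()
Pi_inv = np.diag(1.0 / pi)

S = np.diag(np.sqrt(pi)) @ P @ np.diag(1.0 / np.sqrt(pi))
mu, V = np.linalg.eigh(S)
U = np.diag(1.0 / np.sqrt(pi)) @ V
keep = mu < 1.0 - 1e-9                  # drop the stationary eigenpair

dm = U[:, keep] * (mu[keep] ** t)       # diffusion-map coordinates
lam = 1.0 - mu[keep]                    # normalized-Laplacian eigenvalues
le = U[:, keep] / np.sqrt(lam)          # scaled Laplacian-eigenmap coordinates

def kappa(B):
    d = np.diag(B)
    return d[:, None] + d[None, :] - B - B.T

# squared interpoint distances of the diffusion map equal kappa(P^{2t} Pi^{-1})
D2 = np.sum((dm[:, None, :] - dm[None, :, :]) ** 2, axis=-1)
```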
Coordinate Rescaling of Embeddings
[Figure: two scatter plots of the digits 4 and 5. (a) Normalized Laplacian eigenmap. (b) Diffusion map at time t = 10.]
Paths of Even Length & Diffusion Distances
[Figure: (a) original data; (b) diffusion map at time t = 5.]
Expected Commute Time for Directed Graphs
Analogous to the case of expected commute time on undirected graphs, let $\Pi = \mathrm{diag}(\pi)$ and $Z = (I - P + Q)^{-1}$. The expected first passage times for directed graphs are also given by
$$M = (\mathbf{1}\mathbf{1}^T Z_{\mathrm{dg}} - Z)\Pi^{-1}$$
and the expected commute times are
$$\Delta_{\mathrm{ect}} = M + M^T = \kappa(Z\Pi^{-1}) = \kappa(H(Z\Pi^{-1}))$$
where $H(A) = \tfrac{1}{2}(A + A^T)$ is the Hermitian part of $A$. It turns out that $H(Z\Pi^{-1}) \succeq 0$; hence, $\Delta_{\mathrm{ect}}$ for directed graphs is EDM-2.
Diffusion Distances for Directed Graphs
Let $e_i$ and $e_j$ denote point masses at vertices $v_i$ and $v_j$. Analogous to the case of diffusion distance on undirected graphs, after $r$ time steps under the random walk model with transition matrix $P$, these distributions have diffused to $e_i^T P^r$ and $e_j^T P^r$. The diffusion distance on a directed graph at time $r$ between $v_i$ and $v_j$ is
$$\rho_r(v_i, v_j) = \|e_i^T P^r - e_j^T P^r\|_{1/\pi}$$
where the norm is induced by the inner product
$$\langle u, v \rangle_{1/\pi} = \sum_k u(k)v(k)/\pi(k)$$
Now $\Delta_{\rho_r^2} = \kappa(P^r \Pi^{-1} (P^r)^T)$; hence $\Delta_{\rho_r^2}$ for directed graphs is EDM-2.
Distances on Directed Graphs: Some Comments
1. It is harder to derive a framework for Euclidean distances on directed graphs. If $G$ is a directed graph and $P$ is the transition matrix on $G$, then there exists a $k \geq 2$ such that $\Delta = \kappa((I - P + Q)^{-k}\Pi^{-1})$ is not EDM-2.
2. Expected commute time under the random walk model with transition matrix $P^2$ is no longer the sum of the squared diffusion distances through all time scales. We can interpret this as saying that the symmetrization performed in constructing expected commute time is incompatible with the symmetrization performed in constructing diffusion distances.
From Distances to Embeddings
Let $G$ be a directed graph with transition matrix $P$. Suppose that $\Delta$ is a dissimilarity/distance matrix constructed by considering random walks on $G$, and consider the problem of embedding $\Delta$ into Euclidean space.
- Embedding $\Delta$ using CMDS is straightforward.
- Because the eigenvalues and eigenvectors of $P$ are possibly complex-valued, embedding using the eigenvalues and eigenvectors of $P$ might not be possible.
- The notions of the combinatorial Laplacian and normalized Laplacian need to be extended to directed graphs. This is usually accomplished by symmetrization. However, the decision of how to symmetrize is not obvious. For example, the following definition of the combinatorial Laplacian for directed graphs ([Chung(2005)])
$$L = \Pi - \frac{\Pi P + P^T \Pi}{2} \qquad (7)$$
does not generate expected commute time for directed graphs.
Embedding Distances for Directed Graphs
We construct the directed $K$-NN graph with $K = 100$ for our motivating example. We compute the matrix of expected commute times and the matrix of diffusion distances at time $t = 30$. We then embed the resulting matrices using CMDS.
[Figure: CMDS embeddings of the digits 4 and 5. (a) $\Delta_{\mathrm{ect}}$. (b) $\Delta_{\rho_t^2}$.]
An Auxiliary Result
A small class of Euclidean distances on directed graphs can be established by considering relaxed random walks. Let $G$ be a directed graph with transition matrix $P$. A relaxed random walk on $G$ is a random walk with transition matrix
$$P_\alpha = \alpha I + (1 - \alpha)P$$
for some $\alpha \in (0, 1)$. Let $f$ be a real-valued function with series expansion
$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots$$
and radius of convergence $R \geq 1$. If $a_0 > 0$ and $a_i \geq 0$ for all $i \geq 1$, then for any irreducible and aperiodic $P$, there exists an $\alpha \in [0, 1)$ such that
$$\Delta = \kappa(f(P_\alpha - Q)\Pi^{-1})$$
is well-defined and EDM-2.
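A sketch instantiating the result with $f(x) = e^x$ (so $a_0 = 1 > 0$ and $a_i = 1/i! \geq 0$): we search a grid of relaxation parameters for one that makes $\Delta$ an EDM-2. The small directed chain is ours, not from the slides.

```python
import numpy as np

# a small irreducible, aperiodic directed chain (self-loops give aperiodicity)
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
n = len(P)

# stationary distribution: left eigenvector of P for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi = pi / pi.sum()
Q = np.outer(np.ones(n), pi)
Pi_inv = np.diag(1.0 / pi)

def kappa(B):
    d = np.diag(B)
    return d[:, None] + d[None, :] - B - B.T

def is_edm2(Delta, tol=1e-9):
    J = np.eye(n) - np.ones((n, n)) / n
    return np.min(np.linalg.eigvalsh(-0.5 * J @ Delta @ J)) >= -tol

def expm_series(A, terms=30):
    """exp(A) via truncated power series."""
    out, term = np.eye(len(A)), np.eye(len(A))
    for i in range(1, terms):
        term = term @ A / i
        out = out + term
    return out

# search for a relaxation alpha yielding an EDM-2 (P_alpha shares pi with P)
found = None
for alpha in np.linspace(0.0, 0.99, 100):
    P_a = alpha * np.eye(n) + (1 - alpha) * P
    Delta = kappa(expm_series(P_a - Q) @ Pi_inv)
    if is_edm2(Delta):
        found = alpha
        break
```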
References
M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15:1373–1396, 2003.
R. E. Bellman. Dynamic Programming. Princeton University Press, 1957.
F. Chung. Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics, 9:1–19, 2005.
R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21:5–30, 2006.
J. C. Gower. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53:325–338, 1966.
H. Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24:417–441, 1933.
J. G. Kemeny and J. L. Snell. Finite Markov Chains. Springer, 1960.
Y. LeCun et al. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, volume 86, pages 2278–2324, 1998.
K. Pearson. On lines and planes of closest fit to a system of points in space. Philosophical Magazine, 2:557–572, 1901.
M. Saerens et al. The principal components analysis of a graph and its relationships to spectral clustering.
In Proceedings of the Fifteenth European Conference on Machine Learning, 2004.
I. J. Schoenberg. Remarks to Maurice Fréchet's article "Sur la définition axiomatique d'une classe d'espace distanciés vectoriellement applicable sur l'espace de Hilbert". The Annals of Mathematics, 36(3):724–732, 1935.
J. Shi and J. Malik. Normalized cuts and image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 731–737, 1997.
A. Smola and R. Kondor. Kernels and regularization on graphs. In Conference on Learning Theory, 2003.
M. Tang. Graph metrics and dimensionality reduction. PhD thesis, Indiana University, Bloomington, 2010.
J. B. Tenenbaum et al. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, December 2000.
W. S. Torgersen. Multidimensional scaling: I. Theory and method. Psychometrika, 17:401–419, 1952.
M. W. Trosset and C. E. Priebe. The out-of-sample problem for classical multidimensional scaling. Computational Statistics and Data Analysis, 52:4635–4642, 2008.
M. W. Trosset and M. Tang. On combinatorial Laplacian eigenmaps. Technical report, Indiana University, Bloomington, 2010.
G. Young and A. S. Householder. Discussion of a set of points in terms of their mutual distances. Psychometrika, 3:19–22, 1938.