
Graph Metrics and Dimension Reduction

Minh Tang (Applied Mathematics and Statistics, The Johns Hopkins University)
Michael Trosset (Department of Statistics, Indiana University, Bloomington)

November 11, 2010

Outline

1. Problem Description
2. Preliminaries
3. Distances on Undirected Graphs
4. From Distances to Embeddings
5. Distances on Directed Graphs

Benefits of Dimension Reduction

High-dimensional data, i.e., a great many measurements taken on each member of a set of objects, are now ubiquitous. Bellman's curse of dimensionality [Bellman(1957)] refers to the problems caused by the exponential increase in the volume of a mathematical space as additional dimensions are added. A list of such problems might include:

- Slow convergence of statistical estimators, e.g., density estimators.
- Overfitting of models to noise.
- Difficulties in performing exploratory data analysis.
- Inefficient nearest-neighbour searches.

Dimension reduction is the process of replacing a multivariate data set with a data set of lower dimension. Classical approaches to dimension reduction include principal component analysis (PCA) [Pearson(1901), Hotelling(1933)] and classical multidimensional scaling [Torgersen(1952), Gower(1966)].

Problem Description

We consider the problem of constructing a low-dimensional Euclidean representation of data described by pairwise similarities. The low-dimensional representation can serve as the basis for other exploitation tasks, e.g., visualization, clustering, or classification. Our basic strategy is:

1. Transform the similarities into some notion of dissimilarities.
2. Embed the derived dissimilarities.

Our concerns are closely related to those of manifold learning: various manifold learning techniques can be interpreted as transformations from similarities to dissimilarities.

A Motivating Example: MNIST Dataset

1200 images of digits (4 or 5) from MNIST [LeCun et al.(1998)]. Each image is viewed as a point in $\mathbb{R}^{28 \times 28}$. How should we measure the proximity of a pair of images? Two natural choices are

$\delta_{ij} = \|x_i - x_j\|$    and    $\gamma_{ij} = \exp(-\|x_i - x_j\|^2 / \sigma^2)$.

[Figure: two scatter plots of the 1200 images, colored by class (legend: Digit 4, Digit 5); the first panel plots the data in principal-component coordinates (PC1 vs. PC2).]
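As a concrete illustration, here is a minimal Python sketch of the two proximity measures, assuming the images have been flattened into the rows of an array X; the bandwidth sigma is a placeholder parameter, not a value specified in the talk.

```python
import numpy as np

def proximities(X, sigma=10.0):
    """Pairwise distances delta_ij and Gaussian similarities gamma_ij."""
    sq = np.sum(X ** 2, axis=1)
    # squared Euclidean distances ||x_i - x_j||^2, clipped at 0 for safety
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    delta = np.sqrt(d2)               # delta_ij = ||x_i - x_j||
    gamma = np.exp(-d2 / sigma ** 2)  # gamma_ij = exp(-||x_i - x_j||^2 / sigma^2)
    return delta, gamma
```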

From Similarities to Graphs

Isomap [Tenenbaum et al.(2000)] employed two commonly used approaches to graph construction. In both approaches, vertices correspond to feature vectors.

1. $\epsilon$-neighborhood approach: $v_i \sim v_j$ iff $\|x_i - x_j\| < \epsilon$.
2. K-NN approach: connect $v_i \to v_j$ iff $x_j$ is a K nearest neighbor of $x_i$. This graph is directed, but in practice it is often symmetrized. A sketch of both constructions follows.
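Both constructions as a hedged sketch; `delta` is the matrix of pairwise distances from the previous snippet, and the symmetrization in `knn_graph` is one common choice, not the only one.

```python
import numpy as np

def eps_graph(delta, eps):
    """epsilon-neighborhood graph: v_i ~ v_j iff ||x_i - x_j|| < eps."""
    A = (delta < eps) & ~np.eye(len(delta), dtype=bool)
    return A.astype(float)

def knn_graph(delta, K):
    """Directed K-NN graph, then symmetrized by taking the union of edges."""
    n = len(delta)
    A = np.zeros((n, n))
    order = np.argsort(delta, axis=1)
    for i in range(n):
        A[i, order[i, 1:K + 1]] = 1.0  # skip column 0, the point itself
    return np.maximum(A, A.T)
```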

2. Preliminaries

Distances on Graphs

Given an $n \times n$ similarity matrix $\Gamma = (\gamma_{ij})$:

1. Transform the similarities to dissimilarities. (Isomap starts with distances in an ambient input space.)
   (a) Construct a weighted graph G = (V, E, ω) with n vertices and edge weights $\omega_{ij} = \gamma_{ij}$.
   (b) Choose a suitable measure of dissimilarity (typically a distance) on G. Let $\Delta$ denote the matrix of pairwise dissimilarities.
2. Embed $\Delta$.

Several popular approaches that transform similarity to distance rely on the concept of a random walk.

Random Walks on Graphs

Let G = (V, E, ω) be an undirected graph. We define the transition matrix $P = (p_{uv})$ of a Markov chain with state space V as

$p_{uv} = \omega(\{u,v\}) / \deg(u)$ if $u \sim v$, and $p_{uv} = 0$ otherwise.    (1)

Suppose that G is connected. Then the stationary distribution $\pi$ of P exists and is unique. Furthermore, if G is connected and not bipartite, then

$\lim_{k \to \infty} P^k = \mathbf{1}\pi^T =: Q$.    (2)
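A sketch of these quantities for a connected undirected graph with weight matrix W; for such graphs the stationary distribution is proportional to the vertex degrees.

```python
import numpy as np

def random_walk(W):
    """Transition matrix P, stationary distribution pi, and Q = 1 pi^T."""
    deg = W.sum(axis=1)
    P = W / deg[:, None]               # p_uv = omega({u,v}) / deg(u)
    pi = deg / deg.sum()               # stationary distribution of P
    Q = np.outer(np.ones(len(W)), pi)  # limit of P^k for connected, non-bipartite G
    return P, pi, Q
```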

Distance Geometry

Definition (Euclidean Distance Matrix). Let $\Delta = (\delta_{ij})$ be an $n \times n$ dissimilarity matrix. $\Delta$ is a Type-2 Euclidean distance matrix (EDM-2) if there exist n points $x_1, x_2, \ldots, x_n \in \mathbb{R}^p$ for some p such that $\delta_{ij} = \|x_i - x_j\|^2$.

Let A and B be $n \times n$ matrices. Define two linear transforms $\tau$ and $\kappa$ by

$\tau(A) = -\frac{1}{2}(I - \mathbf{1}\mathbf{1}^T/n)\, A\, (I - \mathbf{1}\mathbf{1}^T/n)$,
$\kappa(B) = B_{dg}\mathbf{1}\mathbf{1}^T - B - B^T + \mathbf{1}\mathbf{1}^T B_{dg}$,

where $B_{dg}$ is the diagonal matrix whose diagonal agrees with that of B. There is an equivalence between EDM-2 and p.s.d. matrices.

Theorem ([Schoenberg(1935), Young and Householder(1938)]). $\Delta$ is EDM-2 iff $\tau(\Delta)$ is p.s.d. As a corollary, if B is p.s.d., then $\Delta = \kappa(B)$ is EDM-2.
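The two transforms and the Schoenberg criterion translate directly into code; a sketch:

```python
import numpy as np

def tau(A):
    """tau(A) = -1/2 (I - 11^T/n) A (I - 11^T/n)."""
    n = len(A)
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return -0.5 * J @ A @ J

def kappa(B):
    """kappa(B)_ij = b_ii + b_jj - b_ij - b_ji."""
    d = np.diag(B)
    return d[:, None] + d[None, :] - B - B.T

def is_edm2(Delta, tol=1e-10):
    """Delta is EDM-2 iff tau(Delta) is positive semidefinite."""
    return bool(np.all(np.linalg.eigvalsh(tau(Delta)) >= -tol))
```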

3. Distances on Undirected Graphs

Expected Commute Time

Following [Kemeny and Snell(1960)], let $\Pi = \mathrm{diag}(\pi)$ and $Z = (I - P + Q)^{-1}$. The expected first passage times are given by

$M = (\mathbf{1}\mathbf{1}^T Z_{dg} - Z)\,\Pi^{-1}$

and the expected commute times are

$\Delta_{ect} = M + M^T = \kappa(Z\Pi^{-1})$.

It turns out that $Z\Pi^{-1} \succeq 0$; hence $\Delta_{ect}$ is EDM-2. This result is well known; see [Trosset and Tang(2010)] for an elementary proof.
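A direct transcription of these formulas, assuming P, pi, Q come from the random-walk sketch above:

```python
import numpy as np

def expected_commute_times(P, pi, Q):
    """Delta_ect = M + M^T = kappa(Z Pi^{-1}) for a connected undirected graph."""
    n = len(P)
    Z = np.linalg.inv(np.eye(n) - P + Q)         # fundamental matrix
    M = (np.diag(Z)[None, :] - Z) / pi[None, :]  # M = (11^T Z_dg - Z) Pi^{-1}
    return M + M.T
```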

Diffusion Distances

Let $e_i$ and $e_j$ denote point masses at vertices $v_i$ and $v_j$. After r time steps, under the random walk model with transition matrix P, these distributions have diffused to $e_i^T P^r$ and $e_j^T P^r$. The diffusion distance [Coifman and Lafon(2006)] at time r between $v_i$ and $v_j$ is

$\rho_r(v_i, v_j) = \|e_i^T P^r - e_j^T P^r\|_{1/\pi}$

where the inner product $\langle \cdot, \cdot \rangle_{1/\pi}$ is defined as

$\langle u, v \rangle_{1/\pi} = \sum_k u(k)v(k)/\pi(k)$.

It turns out that $\Delta_{\rho_r^2} = \kappa(P^{2r}\Pi^{-1})$ and $P^{2r}\Pi^{-1} \succeq 0$; hence, $\Delta_{\rho_r^2}$ is EDM-2.
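The matrix of squared diffusion distances, sketched via the identity above (kappa as defined earlier):

```python
import numpy as np

def diffusion_distances_sq(P, pi, r):
    """Delta_{rho_r^2} = kappa(P^{2r} Pi^{-1})."""
    B = np.linalg.matrix_power(P, 2 * r) / pi[None, :]
    d = np.diag(B)
    return d[:, None] + d[None, :] - B - B.T  # kappa(B)
```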

Some Remarks on ECT and Diffusion Distances

1. $\Delta_{ect}$ can be written as

   $\Delta_{ect} = \kappa(Z\Pi^{-1}) = \kappa\big(\big(\textstyle\sum_{t=0}^{\infty} (P - Q)^t\big)\Pi^{-1}\big)$.

   Even though $\kappa((P - Q)^t \Pi^{-1}) = \kappa(P^t \Pi^{-1})$ for any $t \geq 1$, $\Delta_{ect} \neq \kappa\big(\big(\sum_{t=0}^{\infty} P^t\big)\Pi^{-1}\big)$ because $\sum_{t=0}^{\infty} P^t \Pi^{-1}$ does not necessarily converge.

2. $\Delta_{\rho_t^2}$ can be written as

   $\Delta_{\rho_t^2} = \kappa(P^{2t}\Pi^{-1}) = \kappa\big((P - Q)^{2t}\Pi^{-1}\big)$.

3. The diffusion distance between $v_i$ and $v_j$ at time t takes into account only paths of length 2t, while expected commute time takes into account paths of all lengths. In fact, expected commute time with respect to $P^2$ is the sum of the diffusion distances for $t = 0, 1, \ldots$ with respect to P.

General Framework for Euclidean Distances on Graphs

We introduce a general family of Euclidean distances constructed from random walks on graphs. Let f be a real-valued function with a series expansion $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots$ and radius of convergence $R \geq 1$. For a square matrix X, define $f(X)$ by

$f(X) = a_0 I + a_1 X + a_2 X^2 + \cdots$

Theorem ([Tang(2010)]). Assume that P is irreducible and aperiodic. If $f(x) \geq 0$ for $x \in (-1, 1)$, then

$\Delta = \kappa(f(P - Q)\Pi^{-1}) = \kappa\big((a_0 I + a_1(P - Q) + a_2(P - Q)^2 + \cdots)\Pi^{-1}\big)$    (3)

is well-defined and EDM-2.
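One way to instantiate the theorem, sketched with f = exp (one of the examples on the next slide); scipy's matrix exponential stands in for the power series.

```python
import numpy as np
from scipy.linalg import expm

def exp_graph_distance(P, pi, Q):
    """Delta = kappa(f(P - Q) Pi^{-1}) with f(x) = exp(x)."""
    B = expm(P - Q) / pi[None, :]             # f(P - Q) Pi^{-1}
    d = np.diag(B)
    return d[:, None] + d[None, :] - B - B.T  # kappa(B); EDM-2 by the theorem
```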

Euclidean Distances on Graphs: Some Examples

$\Delta = \kappa(f(P - Q)\Pi^{-1}) = \kappa((a_0 I + a_1(P - Q) + a_2(P - Q)^2 + \cdots)\Pi^{-1})$

  f(x)                      Comments
  1                         trivial notion of distance
  x^{2r}                    diffusion distance at time r
  1/(1 - x)                 expected commute time
  1/(1 - x)^k for k >= 2    longer paths have higher weights
  -log(1 - x^2)             longer paths have lower weights
  exp(x)                    heavily weights paths of short lengths

4. From Distances to Embeddings

Embedding $\Delta = \kappa(f(P - Q)\Pi^{-1})$ in $\mathbb{R}^d$: Method 1

Embed by CMDS.

1. Compute $B = \tau(\Delta) = (I - \mathbf{1}\mathbf{1}^T/n)\, f(P - Q)\Pi^{-1}\, (I - \mathbf{1}\mathbf{1}^T/n)$.
2. Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ denote the eigenvalues of B and let $v_1, v_2, \ldots, v_n$ denote the corresponding set of orthonormal eigenvectors. Then

   $X_d = [\sqrt{\lambda_1}\, v_1 \;\; \sqrt{\lambda_2}\, v_2 \;\; \cdots \;\; \sqrt{\lambda_d}\, v_d]$

   produces a configuration of points in $\mathbb{R}^d$ whose interpoint distances approximate $\Delta$. A sketch follows.
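Method 1 as a sketch, computing tau(Delta) directly from a dissimilarity matrix Delta:

```python
import numpy as np

def cmds(Delta, d):
    """Classical MDS: top-d scaled eigenvectors of B = tau(Delta)."""
    n = len(Delta)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ Delta @ J
    lam, V = np.linalg.eigh(B)                 # ascending order
    lam, V = lam[::-1][:d], V[:, ::-1][:, :d]  # largest d eigenpairs
    return V * np.sqrt(np.maximum(lam, 0.0))   # X_d = [sqrt(lam_k) v_k]
```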

Example: Embedding $\Delta_{ect}$ by CMDS

Let G be an undirected graph and L be the combinatorial Laplacian of G. It turns out that $L^{\dagger}$, the Moore-Penrose pseudoinverse of L, is related to $Z\Pi^{-1}$ by

$L^{\dagger} = c\,(I - \mathbf{1}\mathbf{1}^T/n)\, Z\Pi^{-1}\, (I - \mathbf{1}\mathbf{1}^T/n)$    (4)

where c is a constant. Therefore, $\Delta_{ect} = c\,\kappa(L^{\dagger})$. Furthermore, $\tau(\Delta_{ect}) = c\,L^{\dagger}$. The d-dimensional embedding of $\Delta_{ect}$ is thus given by

$X_d = \sqrt{c}\,[\sqrt{\lambda_1}\,\nu_1 \;\; \sqrt{\lambda_2}\,\nu_2 \;\; \cdots \;\; \sqrt{\lambda_d}\,\nu_d]$    (5)

where $\lambda_1 \geq \lambda_2 \geq \cdots$ and $\nu_1, \nu_2, \ldots$ are the eigenvalues and corresponding eigenvectors of $L^{\dagger}$. The embedding in Eq. (5) is called a combinatorial Laplacian eigenmap.

Embedding $\Delta = \kappa(f(P - Q)\Pi^{-1})$ in $\mathbb{R}^d$: Method 2

Embed by the eigenvalues and eigenvectors of P.

1. Let $\mu_1, \mu_2, \ldots, \mu_{n-1} < 1 = \mu_n$ denote the eigenvalues of P, sorted so that $f(\mu_i) \geq f(\mu_{i+1})$. Let $u_1, u_2, \ldots, u_n$ denote the corresponding set of eigenvectors, orthonormal with respect to the inner product

   $\langle u, v \rangle_{\pi} = \sum_k u(k)v(k)\pi(k)$.

2. Then

   $X_d = [\sqrt{f(\mu_1)}\, u_1 \;\; \sqrt{f(\mu_2)}\, u_2 \;\; \cdots \;\; \sqrt{f(\mu_d)}\, u_d]$

   produces a configuration of points in $\mathbb{R}^d$ whose interpoint distances approximate $\Delta$.

By rescaling individual coordinates, the embedding from any one f can be transformed into the embedding for any other f. A sketch follows.
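Method 2 as a sketch for an undirected graph: the chain is reversible, so $\Pi^{1/2} P \Pi^{-1/2}$ is symmetric, and its orthonormal eigenvectors yield pi-orthonormal eigenvectors of P.

```python
import numpy as np

def embed_by_P(P, pi, f, d):
    """Embed with coordinates sqrt(f(mu_k)) u_k, sorted by f(mu) descending."""
    s = np.sqrt(pi)
    S = s[:, None] * P / s[None, :]     # Pi^{1/2} P Pi^{-1/2}, symmetric
    mu, W = np.linalg.eigh(S)           # mu ascending, mu[-1] = 1
    U = W / s[:, None]                  # pi-orthonormal eigenvectors of P
    keep = np.argsort(-f(mu[:-1]))[:d]  # drop mu_n = 1, sort by f(mu)
    return U[:, keep] * np.sqrt(f(mu[keep]))
```

For example, embed_by_P(P, pi, lambda x: 1.0 / (1.0 - x), 2) reproduces (up to rotation and reflection) a two-dimensional commute-time embedding.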

Comparing the Embeddings

Method 1: Classical MDS
1. The embedding $X = [\sqrt{\lambda_1}\, v_1 \;\cdots\; \sqrt{\lambda_{n-1}}\, v_{n-1}]$ recovers $\Delta$ completely.
2. The embedding dimension of $\Delta$ is almost surely n - 1.
3. The best (least squares) d-dimensional representation of X is $X_d = [\sqrt{\lambda_1}\, v_1 \;\cdots\; \sqrt{\lambda_d}\, v_d]$.

Method 2: Eigensystem of P
1. The embedding $X = [\sqrt{f(\mu_1)}\, u_1 \;\cdots\; \sqrt{f(\mu_{n-1})}\, u_{n-1}]$ recovers $\Delta$ completely.
2. The embedding dimension of $\Delta$ is almost surely n - 1.
3. The best d-dimensional representation of X is (usually) not $X_d = [\sqrt{f(\mu_1)}\, u_1 \;\cdots\; \sqrt{f(\mu_d)}\, u_d]$.

Normalized Laplacian Eigenmaps

1. Construct a graph G = (V, E, ω) with V = X.
2. Compute the eigenvalues λ and eigenvectors f of the generalized eigenvalue problem

   $Lf = \lambda Df$.    (6)

3. Let $\lambda_0 \leq \lambda_1 \leq \cdots \leq \lambda_{n-1}$ be the eigenvalues of Eq. (6) and $f_0, f_1, \ldots, f_{n-1}$ be the corresponding eigenvectors.
4. Embed into $\mathbb{R}^d$ by $x_i \mapsto \big(\tfrac{1}{\sqrt{\lambda_1}} f_1(i), \tfrac{1}{\sqrt{\lambda_2}} f_2(i), \ldots, \tfrac{1}{\sqrt{\lambda_d}} f_d(i)\big)$.

Under our framework, steps 2 to 4 are equivalent to embedding $\Delta_{ect}$ using the eigenvalues and eigenvectors of P (Method 2). This is not equivalent to embedding $\Delta_{ect}$ using the eigensystem of $L^{\dagger}$ (Method 1). A sketch of steps 2-4 follows.
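Steps 2-4 as a sketch using scipy's symmetric-definite generalized eigensolver; the $1/\sqrt{\lambda_k}$ scaling is the reconstruction discussed above, the one that matches the commute-time embedding.

```python
import numpy as np
from scipy.linalg import eigh

def normalized_laplacian_eigenmap(W, d):
    """Solve L f = lambda D f; embed with coordinates f_k(i) / sqrt(lambda_k)."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    lam, F = eigh(L, D)                   # ascending; lambda_0 = 0
    lam, F = lam[1:d + 1], F[:, 1:d + 1]  # skip the constant eigenvector f_0
    return F / np.sqrt(lam)[None, :]
```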

Diffusion Maps [Coifman and Lafon(2006)]

1. Construct a graph G = (V, E, ω) with V = X.
2. Generate the transition matrix P of G.
3. Let $\lambda_0 \geq \lambda_1 \geq \cdots \geq \lambda_{n-1}$ be the eigenvalues of P and $f_0, f_1, \ldots, f_{n-1}$ be the corresponding eigenvectors.
4. Embed into $\mathbb{R}^d$ by $x_i \mapsto (\lambda_1^t f_1(i), \lambda_2^t f_2(i), \ldots, \lambda_d^t f_d(i))$.

Recall that $\Delta_{\rho_t^2} = \kappa((P - Q)^{2t}\Pi^{-1})$ is the matrix of squared diffusion distances. Under our framework, steps 2 to 4 are equivalent to embedding $\Delta_{\rho_t^2}$ using the eigenvalues and eigenvectors of P (Method 2). Normalized Laplacian eigenmaps and diffusion maps are thus coordinate rescalings of one another.
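Under the equivalence just stated, the diffusion map is a usage of the Method 2 sketch with $f(x) = x^{2t}$; up to signs this gives the coordinates $\mu_k^t u_k(i)$.

```python
# assumes embed_by_P, P, pi from the Method 2 sketch are in scope
t = 10
diffusion_coords = embed_by_P(P, pi, lambda x: x ** (2 * t), d=2)
```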

Coordinate Rescalings of Embeddings

[Figure: two scatter plots of the digits (legend: Digit 4, Digit 5). Panel (a): normalized Laplacian eigenmap. Panel (b): diffusion map at time t = 10.]

Paths of Even Length & Diffusion Distances

[Figure: Panel (a): the original data, plotted as Y versus X. Panel (b): its diffusion map at time t = 5.]

5. Distances on Directed Graphs

Expected Commute Time for Directed Graphs

Analogous to the case of expected commute time on undirected graphs, let $\Pi = \mathrm{diag}(\pi)$ and $Z = (I - P + Q)^{-1}$. The expected first passage times for directed graphs are also given by

$M = (\mathbf{1}\mathbf{1}^T Z_{dg} - Z)\,\Pi^{-1}$

and the expected commute times are

$\Delta_{ect} = M + M^T = \kappa(Z\Pi^{-1}) = \kappa(H(Z\Pi^{-1}))$

where $H(A) = \frac{1}{2}(A + A^T)$ is the Hermitian part of A. It turns out that $H(Z\Pi^{-1}) \succeq 0$; hence, $\Delta_{ect}$ for directed graphs is EDM-2.
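For a directed graph, pi is no longer proportional to the degrees; a sketch that computes it as the left eigenvector of P for the eigenvalue 1, after which the commute-time formulas above apply unchanged:

```python
import numpy as np

def stationary_distribution(P):
    """pi for an irreducible, aperiodic transition matrix P."""
    w, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])  # eigenvector for eigenvalue 1
    return np.abs(v) / np.abs(v).sum()             # normalize to a distribution
```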

Diffusion Distances for Directed Graphs

Let $e_i$ and $e_j$ denote point masses at vertices $v_i$ and $v_j$. Analogous to the case of diffusion distance on undirected graphs, after r time steps under the random walk model with transition matrix P, these distributions have diffused to $e_i^T P^r$ and $e_j^T P^r$. The diffusion distance on a directed graph at time r between $v_i$ and $v_j$ is

$\rho_r(v_i, v_j) = \|e_i^T P^r - e_j^T P^r\|_{1/\pi}$

where the inner product $\langle \cdot, \cdot \rangle_{1/\pi}$ is defined as

$\langle u, v \rangle_{1/\pi} = \sum_k u(k)v(k)/\pi(k)$.

Here $\Delta_{\rho_r^2} = \kappa(P^r \Pi^{-1} (P^r)^T)$; hence $\Delta_{\rho_r^2}$ for directed graphs is EDM-2.

Distances on Directed Graphs: Some Comments

1. It is harder to derive a framework for Euclidean distances on directed graphs. If G is a directed graph and P is the transition matrix on G, then there exists a $k \geq 2$ such that $\Delta = \kappa((I - P + Q)^{-k}\,\Pi^{-1})$ is not EDM-2.
2. Expected commute time under the random walk model with transition matrix $P^2$ is no longer the sum of the squared diffusion distances through all time scales. We can interpret this as saying that the symmetrization performed in constructing expected commute time is incompatible with the symmetrization performed in constructing diffusion distances.

From Distances to Embeddings

Let G be a directed graph with transition matrix P. Suppose that $\Delta$ is a dissimilarity/distance matrix constructed by considering random walks on G. Consider the problem of embedding $\Delta$ into Euclidean space.

- Embedding using CMDS is straightforward.
- Because the eigenvalues and eigenvectors of P are possibly complex-valued, embedding using the eigenvalues and eigenvectors of P might not be possible.
- The notions of the combinatorial Laplacian and normalized Laplacian need to be extended to directed graphs. This is usually accomplished by symmetrization. However, the decision of how to symmetrize is not obvious. For example, the following definition of the combinatorial Laplacian for directed graphs [Chung(2005)]

  $L = \Pi - \dfrac{\Pi P + P^T \Pi}{2}$    (7)

  does not generate expected commute time for directed graphs.

Embedding Distances for Directed Graphs

We construct the directed K-NN graph with K = 100 for our motivating example. We compute the matrix of expected commute times and the matrix of diffusion distances at time t = 30. We then embed the resulting matrices using CMDS.

[Figure: two scatter plots of the embedded digits (legend: Digit 4, Digit 5). Panel (a): expected commute times $\Delta_{ect}$. Panel (b): squared diffusion distances $\Delta_{\rho_t^2}$.]

An Auxiliary Result

A small class of Euclidean distances on directed graphs can be established by considering relaxed random walks. Let G be a directed graph with transition matrix P. A relaxed random walk on G is a random walk with transition matrix

$P_\alpha = \alpha I + (1 - \alpha) P$ for some $\alpha \in (0, 1)$.

Let f be a real-valued function with series expansion $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots$ and radius of convergence $R \geq 1$. If $a_0 > 0$ and $a_i \geq 0$ for all $i \geq 1$, then for any irreducible and aperiodic P, there exists an $\alpha \in [0, 1)$ such that

$\Delta = \kappa(f(P_\alpha - Q)\,\Pi^{-1})$

is well-defined and EDM-2.
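The relaxed walk itself is a one-line transcription of the definition; alpha is a design parameter whose admissible range depends on f and P.

```python
import numpy as np

def relaxed_transition(P, alpha):
    """P_alpha = alpha I + (1 - alpha) P for alpha in (0, 1)."""
    return alpha * np.eye(len(P)) + (1.0 - alpha) * P
```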


References

M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15:1373-1396, 2003.
R. E. Bellman. Dynamic Programming. Princeton University Press, 1957.
F. Chung. Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics, 9:1-19, 2005.
R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21:5-30, 2006.
J. C. Gower. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53:325-338, 1966.
H. Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24:417-441, 1933.
J. G. Kemeny and J. L. Snell. Finite Markov Chains. Springer, 1960.
Y. LeCun et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86:2278-2324, 1998.
K. Pearson. On lines and planes of closest fit to a system of points in space. Philosophical Magazine, 2:557-572, 1901.
M. Saerens et al. The principal components analysis of a graph and its relationships to spectral clustering. In Proceedings of the Fifteenth European Conference on Machine Learning, 2004.
I. J. Schoenberg. Remarks to Maurice Fréchet's article "Sur la définition axiomatique d'une classe d'espaces distanciés vectoriellement applicables sur l'espace de Hilbert". The Annals of Mathematics, 36(3):724-732, 1935.
J. Shi and J. Malik. Normalized cuts and image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 731-737, 1997.
A. Smola and R. Kondor. Kernels and regularization on graphs. In Conference on Learning Theory, 2003.
M. Tang. Graph metrics and dimensionality reduction. PhD thesis, Indiana University, Bloomington, 2010.
J. B. Tenenbaum et al. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, December 2000.
W. S. Torgersen. Multidimensional scaling: I. Theory and method. Psychometrika, 17:401-419, 1952.
M. W. Trosset and C. E. Priebe. The out-of-sample problem for classical multidimensional scaling. Computational Statistics and Data Analysis, 52:4635-4642, 2008.
M. W. Trosset and M. Tang. On combinatorial Laplacian eigenmaps. Technical report, Indiana University, Bloomington, 2010.
G. Young and A. S. Householder. Discussion of a set of points in terms of their mutual distances. Psychometrika, 3:19-22, 1938.