Introduction and Data Representation
Mikhail Belkin & Partha Niyogi
Department of Electrical Engineering, University of Minnesota
Mar 21, 2017
Outline
- Introduction
- Connections to spectral clustering
- Unified framework
Dimensionality reduction
- Compact encoding of high-dimensional data
- Intrinsic degrees of freedom
- Preprocessing for supervised learning
The dimensionality reduction problem
- Given n data points {x_i}_{i=1}^n, each lying in a high-dimensional space, x_i ∈ R^D
- We want a mapping φ that sends each x_i to a point y_i in a low-dimensional space: φ(x_i) = y_i ∈ R^d, with d ≪ D
- PCA requires φ to be linear and to preserve as much variance as possible
- Linearity prevents faithfully reducing the dimension of many manifolds
- Goal: nonlinear dimensionality reduction preserving local geometry
Step 1: constructing the adjacency graph
- Use the n points {x_i}_{i=1}^n to construct a graph G = (V, E)
- Each point x_i corresponds to one vertex v_i ∈ V, |V| = n
- Put an edge between v_i and v_j if x_i and x_j are close
- Two common ways to measure closeness:
  - ε-neighborhood: ‖x_i − x_j‖² < ε
  - k nearest neighbors: x_j ∈ N_i or x_i ∈ N_j
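Step 1 can be sketched in a few lines of NumPy; the helper name `adjacency` and the toy points are illustrative, not from the paper:

```python
import numpy as np

def adjacency(X, eps=None, k=None):
    """Symmetric 0/1 adjacency matrix over the rows of X.

    Pass exactly one of `eps` (epsilon-neighborhood on squared distance)
    or `k` (k nearest neighbors, symmetrized with OR).
    """
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    if eps is not None:
        A = (sq < eps).astype(float)
    else:
        A = np.zeros((n, n))
        nbrs = np.argsort(sq, axis=1)[:, 1:k + 1]  # skip self at position 0
        for i in range(n):
            A[i, nbrs[i]] = 1.0
        A = np.maximum(A, A.T)  # edge if x_j in N_i OR x_i in N_j
    np.fill_diagonal(A, 0.0)
    return A

# Three nearly collinear points plus an outlier.
X = np.array([[0., 0.], [0., 1.], [0., 3.], [10., 10.]])
A_knn = adjacency(X, k=1)
A_eps = adjacency(X, eps=2.0)
```

Note the OR symmetrization: with k = 1 the outlier still gets connected to its nearest neighbor, whereas the ε-rule may leave it isolated.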
Step 2: choosing edge weights
- Step 1 only determines which edges exist; step 2 assigns a weight to each edge
- Simple-minded: W_ij = 1 if v_i and v_j are connected, 0 otherwise
- Heat kernel (parameter t > 0): W_ij = exp(−‖x_i − x_j‖² / t)
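A minimal sketch of the heat-kernel weighting (the function name and toy inputs are illustrative):

```python
import numpy as np

def heat_kernel_weights(X, A, t=1.0):
    """W_ij = exp(-||x_i - x_j||^2 / t) on existing edges, 0 elsewhere."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return A * np.exp(-sq / t)

X = np.array([[0., 0.], [1., 0.]])
A = np.array([[0., 1.], [1., 0.]])   # a single edge between the two points
W = heat_kernel_weights(X, A, t=2.0)
```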
Step 3: eigenmaps
- Assume the graph G constructed by steps 1 and 2 is connected
- Let W be the weight matrix, D the degree matrix with D_ii = Σ_j W_ij, and L = D − W the graph Laplacian
- Solve the generalized eigenvalue problem Lv = λDv
- Equivalent to finding the eigenvalues of the normalized Laplacian D^{−1/2} L D^{−1/2}
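The equivalence between the generalized problem and the normalized Laplacian can be checked numerically; the tiny weight matrix below is an arbitrary example:

```python
import numpy as np
from scipy.linalg import eigh

# Tiny connected weighted graph (symmetric W, zero diagonal).
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 1.0],
              [0.5, 1.0, 0.0]])
D = np.diag(W.sum(axis=1))   # degree matrix: D_ii = sum_j W_ij
L = D - W                    # unnormalized graph Laplacian

# Generalized problem L v = lambda D v (eigenvalues ascending) ...
gen_vals = eigh(L, D, eigvals_only=True)

# ... has the same spectrum as the normalized Laplacian D^{-1/2} L D^{-1/2}.
Ds = np.diag(1.0 / np.sqrt(np.diag(D)))
norm_vals = np.linalg.eigvalsh(Ds @ L @ Ds)
```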
Optimal embedding
- Let {(λ_i, v_i)}_{i=1}^n be the eigenvalue–eigenvector pairs, sorted so that 0 = λ_1 ≤ λ_2 ≤ λ_3 ≤ … ≤ λ_n
- The embedding defined by the Laplacian eigenmap is φ(x_i) = y_i = (v_2(i), v_3(i), …, v_{d+1}(i))^T ∈ R^d
- Claim: the embedding provided by Laplacian eigenmaps is optimal, in the sense that it is the best locality-preserving mapping
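The three steps combine into a short embedding routine (`eigenmap` is an illustrative name, not from the paper); note that SciPy's generalized solver returns eigenvectors that are D-orthonormal, matching the constraint used later:

```python
import numpy as np
from scipy.linalg import eigh

def eigenmap(W, d):
    """Rows are the embedded points y_i = (v_2(i), ..., v_{d+1}(i))."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    _, vecs = eigh(L, D)     # generalized problem, eigenvalues ascending
    return vecs[:, 1:d + 1]  # drop the trivial constant eigenvector v_1

# Path graph on 4 vertices.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
Y = eigenmap(W, 2)
```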
One-dimensional case
- Consider mapping {x_i ∈ R^D}_{i=1}^n to one-dimensional points {y_i ∈ R}_{i=1}^n
- Close points in R^D should also be close in R
- Criterion: find {y_i}_{i=1}^n minimizing (1/2) Σ_ij (y_i − y_j)² W_ij = y^T L y
- Add the constraint y^T D y = 1 to remove the scale ambiguity
- The solution is given by the generalized eigenvalue problem Ly = λDy
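The identity (1/2) Σ_ij (y_i − y_j)² W_ij = y^T L y can be verified on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
W = rng.random((n, n))
W = (W + W.T) / 2            # symmetric weights
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W
y = rng.standard_normal(n)

lhs = 0.5 * sum(W[i, j] * (y[i] - y[j]) ** 2
                for i in range(n) for j in range(n))
rhs = y @ L @ y
```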
General case
- Map {x_i ∈ R^D}_{i=1}^n to points {y_i ∈ R^d}_{i=1}^n so that closeness is preserved
- Criterion: minimize (1/2) Σ_ij ‖y_i − y_j‖² W_ij = Tr(Y L Y^T), where Y = (y_1, y_2, …, y_n) ∈ R^{d×n}
- Add the constraint Y D Y^T = I
- The solution is given by the generalized eigenvalue problem Lv = λDv
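The matrix form of the criterion checks out numerically (columns of Y are the embedded points):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 2
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W
Y = rng.standard_normal((d, n))   # columns are the embedded points y_i

pairwise = 0.5 * sum(W[i, j] * np.sum((Y[:, i] - Y[:, j]) ** 2)
                     for i in range(n) for j in range(n))
trace_form = np.trace(Y @ L @ Y.T)
```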
Swiss roll [figure]
Swiss roll embedded into R^2 [figure]
Vision example [figure]
Spectral clustering recap
- Given a graph G = (V, E) and two subsets A, B:
  cut(A, B) = Σ_{u∈A, v∈B} W(u, v)
  vol(A) = Σ_{u∈A, v∈V} W(u, v)
- Note that Σ_{v_i∈A} D_ii = vol(A) and Σ_{v_i∈B} D_ii = vol(B)
- Minimize the normalized cut (Shi and Malik (2000)):
  Ncut(A, B) = cut(A, B) (1/vol(A) + 1/vol(B))
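The Ncut objective is a one-liner once cut and volumes are in hand; `ncut` is an illustrative helper on a toy path graph:

```python
import numpy as np

def ncut(W, in_A):
    """Normalized cut of the partition (A, complement); in_A is a boolean mask."""
    cut = W[in_A][:, ~in_A].sum()
    vol_A = W[in_A].sum()      # sum of degrees over A
    vol_B = W[~in_A].sum()
    return cut * (1.0 / vol_A + 1.0 / vol_B)

# Path graph 0-1-2-3 split down the middle: cut = 1, vol(A) = vol(B) = 3.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
value = ncut(W, np.array([True, True, False, False]))
```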
Variable definitions
- Let a = vol(A), b = vol(B), and define
  x_i = 1/a if v_i ∈ A,  x_i = −1/b if v_i ∈ B
- Recognizing that
  x^T L x = Σ_{v_i∈A, v_j∈B} (x_i − x_j)² W_ij = (1/a + 1/b)² cut(A, B)
  x^T D x = Σ_i x_i² D_ii = Σ_{v_i∈A} D_ii / a² + Σ_{v_i∈B} D_ii / b² = 1/a + 1/b
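Both quadratic-form identities can be verified on the same toy path graph:

```python
import numpy as np

# Path graph on 4 vertices, partitioned into A = {0, 1}, B = {2, 3}.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
D = np.diag(W.sum(axis=1))
L = D - W
in_A = np.array([True, True, False, False])

a = np.diag(D)[in_A].sum()       # vol(A)
b = np.diag(D)[~in_A].sum()      # vol(B)
cut = W[in_A][:, ~in_A].sum()    # cut(A, B)

x = np.where(in_A, 1.0 / a, -1.0 / b)
quad_L = x @ L @ x               # should equal (1/a + 1/b)^2 * cut(A, B)
quad_D = x @ D @ x               # should equal 1/a + 1/b
```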
Transform spectral clustering to eigenmaps
- Spectral clustering minimizes
  x^T L x / x^T D x = cut(A, B) (1/a + 1/b) = Ncut(A, B)
- Put y = D^{1/2} x; then
  x^T L x / x^T D x = y^T D^{−1/2} L D^{−1/2} y / y^T y
- L̃ = D^{−1/2} L D^{−1/2} is called the normalized graph Laplacian
- Equivalent to finding the smallest nontrivial eigenvalue of L̃
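The substitution y = D^{1/2} x turns the generalized Rayleigh quotient into an ordinary one; a quick numerical check on a random graph:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W
x = rng.standard_normal(n)

y = np.sqrt(np.diag(D)) * x            # y = D^{1/2} x
Ds = np.diag(np.diag(D) ** -0.5)
Ln = Ds @ L @ Ds                       # normalized graph Laplacian

q1 = (x @ L @ x) / (x @ D @ x)         # generalized Rayleigh quotient
q2 = (y @ Ln @ y) / (y @ y)            # ordinary Rayleigh quotient
```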
Unified framework
- Similar algorithms: kernel PCA, Laplacian eigenmaps, spectral clustering, LLE, etc.
- Bengio et al. (2003) proposed a unified framework:
  1. Given a data set {x_i}_{i=1}^n, construct an n × n similarity matrix M, M_ij = k(x_i, x_j); define D by D_ii = Σ_j M_ij
  2. (Optional) Transform M into a normalized matrix M̃; this corresponds to a transformed kernel, M̃_ij = k̃(x_i, x_j)
  3. Compute the m largest eigenvalue–eigenvector pairs (λ_k, v_k) of M̃
  4. Embed x_i ∈ R^D as y_i ∈ R^d, with y_ik = v_k(i)
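The common skeleton of steps 3–4 is just "top-m eigenpairs of a symmetric matrix"; a minimal sketch (the helper name `top_eigenpairs` is illustrative):

```python
import numpy as np

def top_eigenpairs(M, m):
    """The m largest eigenvalue-eigenvector pairs of a symmetric matrix M."""
    vals, vecs = np.linalg.eigh(M)       # ascending order
    order = np.argsort(vals)[::-1][:m]   # indices of the m largest eigenvalues
    return vals[order], vecs[:, order]   # column k gives embedding coordinate k

top_vals, top_vecs = top_eigenpairs(np.diag([1., 3., 2.]), 2)
```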
Fit algorithms into the framework
- Spectral clustering (Ng et al. (2001)): M = W, M̃ = D^{−1/2} M D^{−1/2}
- Laplacian eigenmaps (Belkin and Niyogi (2003)): M = D^{−1/2} L D^{−1/2}, M̃ = µI − M; the smallest eigenvalues of M become the largest eigenvalues of M̃
- LLE (Roweis and Saul (2000)): M = (I − W)^T (I − W), M̃ = µI − M; the smallest eigenvalues of M become the largest eigenvalues of M̃
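The flip M̃ = µI − M turns a smallest-eigenvalue problem into a largest-eigenvalue one, since eig(µI − M) = µ − eig(M); a quick check on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.random((4, 4))
M = (S + S.T) / 2                 # arbitrary symmetric test matrix
mu = 10.0

vals = np.linalg.eigvalsh(M)                       # ascending
flipped = np.linalg.eigvalsh(mu * np.eye(4) - M)   # spectrum is mu - vals
```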
References
Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396.
Bengio, Y., Paiement, J.-F., Vincent, P., Delalleau, O., Le Roux, N., and Ouimet, M. (2003). Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. In NIPS.
Ng, A. Y., Jordan, M. I., Weiss, Y., et al. (2001). On spectral clustering: Analysis and an algorithm. In NIPS, volume 14, pages 849–856.
Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326.
Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905.