Laplacian Eigenmaps for Dimensionality Reduction and Data Representation


Laplacian Eigenmaps for Dimensionality Reduction and Data Representation
Mikhail Belkin & Partha Niyogi
Department of Electrical Engineering, University of Minnesota
Mar 21, 2017

Outline
1. Introduction
2. The algorithm
3. Examples
4. Connections to spectral clustering; unified framework

Dimensionality reduction
- Compact encoding of high-dimensional data
- Intrinsic degrees of freedom
- Preprocessing for supervised learning


The dimensionality reduction problem
Given $n$ data points $\{x_i\}_{i=1}^n$, each lying in a high-dimensional space, $x_i \in \mathbb{R}^D$, we want a mapping $\varphi$ that sends each $x_i$ to a point $y_i$ in a low-dimensional space: $\varphi(x_i) = y_i \in \mathbb{R}^d$, $d \ll D$.
- PCA requires $\varphi$ to be linear and to preserve most of the variance
- Linearity prevents reducing the dimension of many manifolds
- Nonlinear dimensionality reduction preserves local geometry instead

Step 1: constructing the adjacency graph
- Use the $n$ points $\{x_i\}_{i=1}^n$ to construct a graph $G = (V, E)$
- Each point $x_i$ corresponds to a vertex $v_i \in V$, with $|V| = n$
- Put an edge between $v_i$ and $v_j$ if $x_i$ and $x_j$ are close
- Two common ways to measure closeness:
  - $\varepsilon$-neighborhood: $\|x_i - x_j\|^2 < \varepsilon$
  - $k$ nearest neighbors: $x_j \in N_i$ or $x_i \in N_j$
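To make step 1 concrete, here is a minimal Python sketch (not from the slides; the helper name `adjacency` and its defaults are assumptions for illustration) that builds the graph by either rule:

```python
# Minimal sketch of step 1, assuming the two closeness rules above.
import numpy as np
from scipy.spatial.distance import cdist

def adjacency(X, eps=None, k=None):
    """Boolean n x n matrix A with A[i, j] = True iff edge (v_i, v_j) exists."""
    dist2 = cdist(X, X, metric="sqeuclidean")       # squared Euclidean distances
    if eps is not None:                             # epsilon-neighborhood rule
        A = dist2 < eps
    else:                                           # k-NN rule: x_j in N_i or x_i in N_j
        nn = np.argsort(dist2, axis=1)[:, 1:k + 1]  # k nearest neighbors, self excluded
        A = np.zeros(dist2.shape, dtype=bool)
        np.put_along_axis(A, nn, True, axis=1)
        A = A | A.T                                 # symmetrize: OR of the two directions
    np.fill_diagonal(A, False)                      # no self-loops
    return A
```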

Step 2: choosing edge weights
- Step 1 only determines which edges exist; step 2 assigns a weight to each edge
- Simple-minded: $W_{ij} = 1$ if $v_i$ and $v_j$ are connected, and $W_{ij} = 0$ otherwise
- Heat kernel (parameter $t \in \mathbb{R}$): $W_{ij} = e^{-\|x_i - x_j\|^2 / t}$
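Continuing the sketch, step 2 under the heat-kernel choice is one line on top of the pairwise distances (again an illustrative helper, not the authors' code):

```python
# Sketch of step 2: heat-kernel weights W_ij = exp(-||x_i - x_j||^2 / t)
# on the edges found in step 1, and zero elsewhere.
import numpy as np
from scipy.spatial.distance import cdist

def heat_weights(X, A, t=1.0):
    dist2 = cdist(X, X, metric="sqeuclidean")
    return np.where(A, np.exp(-dist2 / t), 0.0)   # symmetric, since A and dist2 are
```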

Step 3: eigenmaps
- Assume the graph $G$ constructed by steps 1 and 2 is connected
- Let $W$ be the weight (adjacency) matrix, $D$ the degree matrix, and $L$ the Laplacian matrix: $D = \mathrm{diag}(W\mathbf{1})$, $L = D - W$
- Solve the generalized eigenvalue problem $Lv = \lambda Dv$
- Equivalently, find the eigenvalues of the normalized Laplacian: $D^{-1/2} L D^{-1/2} u = \lambda u$, with $u = D^{1/2} v$
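A sketch of step 3: scipy's generalized symmetric eigensolver handles $Lv = \lambda Dv$ directly (this assumes the graph is connected, so $D$ is positive definite):

```python
# Sketch of step 3: form D and L from W, then solve Lv = λDv.
import numpy as np
from scipy.linalg import eigh

def laplacian_eigs(W):
    D = np.diag(W.sum(axis=1))    # degree matrix, D = diag(W 1)
    L = D - W                     # unnormalized graph Laplacian
    lam, V = eigh(L, D)           # generalized problem; eigenvalues ascending
    return lam, V
```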

Optimal embedding
- Let $\{(\lambda_i, v_i)\}_{i=1}^n$ be the eigenvalue-eigenvector pairs, sorted so that $0 = \lambda_1 \le \lambda_2 \le \lambda_3 \le \cdots \le \lambda_n$
- The embedding defined by the Laplacian eigenmap is $\varphi(x_i) = y_i = (v_2(i), v_3(i), \ldots, v_{d+1}(i))^T \in \mathbb{R}^d$
- Claim: the embedding provided by Laplacian eigenmaps is optimal, in the sense that it is the best locality-preserving mapping.
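Composing the three hypothetical helpers sketched above gives the whole embedding in a few lines; the trivial constant eigenvector for $\lambda_1 = 0$ is dropped:

```python
# Sketch of the full pipeline, reusing the helpers from the previous steps.
def laplacian_eigenmap(X, d=2, k=5, t=1.0):
    A = adjacency(X, k=k)           # step 1: k-NN graph
    W = heat_weights(X, A, t=t)     # step 2: heat-kernel weights
    lam, V = laplacian_eigs(W)      # step 3: Lv = λDv, ascending eigenvalues
    return V[:, 1:d + 1]            # rows are y_i = (v_2(i), ..., v_{d+1}(i))
```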

One-dimensional case
- Consider mapping $\{x_i \in \mathbb{R}^D\}_{i=1}^n$ to 1-D points $\{y_i \in \mathbb{R}\}_{i=1}^n$
- Points that are close in $\mathbb{R}^D$ should also be close in $\mathbb{R}$
- Criterion: find $\{y_i\}_{i=1}^n$ minimizing $\frac{1}{2}\sum_{ij}(y_i - y_j)^2 W_{ij} = y^T L y$
- Add the constraint $y^T D y = 1$ to remove the scale ambiguity
- The solution is given by the generalized eigenvalue problem $Ly = \lambda Dy$
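The identity behind the criterion is easy to check numerically on a random symmetric $W$ (a toy verification, not part of the slides):

```python
# Check: (1/2) Σ_ij (y_i - y_j)^2 W_ij == y^T L y for symmetric W, L = D - W.
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
L = np.diag(W.sum(axis=1)) - W
y = rng.standard_normal(6)
lhs = 0.5 * sum((y[i] - y[j]) ** 2 * W[i, j] for i in range(6) for j in range(6))
assert np.isclose(lhs, y @ L @ y)
```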

General case
- Map $\{x_i \in \mathbb{R}^D\}_{i=1}^n$ to points $\{y_i \in \mathbb{R}^d\}_{i=1}^n$ so that closeness is preserved by the mapping
- Criterion: minimize $\frac{1}{2}\sum_{ij}\|y_i - y_j\|^2 W_{ij} = \mathrm{Tr}(Y L Y^T)$, where $Y = (y_1, y_2, \ldots, y_n)$
- Add the constraint $Y D Y^T = I$
- The solution is given by the generalized eigenvalue problem $Lv = \lambda Dv$
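The matrix form of the same identity, with $Y$ holding the $y_i$ as columns, checks out the same way (again a toy verification):

```python
# Check: (1/2) Σ_ij ||y_i - y_j||^2 W_ij == Tr(Y L Y^T) for Y = (y_1, ..., y_n).
import numpy as np

rng = np.random.default_rng(2)
W = rng.random((6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
L = np.diag(W.sum(axis=1)) - W
Y = rng.standard_normal((2, 6))    # d = 2 embedding coordinates as columns
lhs = 0.5 * sum(np.sum((Y[:, i] - Y[:, j]) ** 2) * W[i, j]
                for i in range(6) for j in range(6))
assert np.isclose(lhs, np.trace(Y @ L @ Y.T))
```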


Swiss roll [figure]

Swiss roll embedded into $\mathbb{R}^2$ [figure]

Vision example [figure]


Spectral clustering recap
Given a graph $G = (V, E)$ and two subsets $A, B \subset V$, define
$$\mathrm{cut}(A, B) = \sum_{u \in A,\, v \in B} W(u, v), \qquad \mathrm{vol}(A) = \sum_{u \in A,\, v \in V} W(u, v)$$
Note that $\sum_{v_i \in A} D_{ii} = \mathrm{vol}(A)$ and $\sum_{v_i \in B} D_{ii} = \mathrm{vol}(B)$.
Minimize the normalized cut (Shi and Malik (2000)):
$$\mathrm{Ncut}(A, B) = \mathrm{cut}(A, B)\left(\frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)}\right)$$
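These three quantities are direct sums over the weight matrix; a small sketch (the function name `ncut` is illustrative, not from the slides):

```python
# Sketch: cut, vol, and Ncut computed directly from W for a partition (A, B).
import numpy as np

def ncut(W, A_idx, B_idx):
    cut = W[np.ix_(A_idx, B_idx)].sum()   # Σ_{u in A, v in B} W(u, v)
    vol_A = W[A_idx, :].sum()             # Σ_{u in A, v in V} W(u, v)
    vol_B = W[B_idx, :].sum()
    return cut * (1.0 / vol_A + 1.0 / vol_B)
```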

Variable definitions
Let $a = \mathrm{vol}(A)$, $b = \mathrm{vol}(B)$, and define
$$x_i = \begin{cases} \dfrac{1}{a}, & \text{if } v_i \in A \\[4pt] -\dfrac{1}{b}, & \text{if } v_i \in B \end{cases}$$
Recognizing that
$$x^T L x = \frac{1}{2}\sum_{ij}(x_i - x_j)^2 W_{ij} = \left(\frac{1}{a} + \frac{1}{b}\right)^2 \mathrm{cut}(A, B)$$
$$x^T D x = \sum_i x_i^2 D_{ii} = \sum_{v_i \in A} \frac{D_{ii}}{a^2} + \sum_{v_i \in B} \frac{D_{ii}}{b^2} = \frac{1}{a} + \frac{1}{b}$$
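Both identities can be verified numerically on a toy graph (a sketch under the same sign convention as above):

```python
# Check: with x_i = 1/a on A and x_i = -1/b on B,
#   x^T L x == (1/a + 1/b)^2 cut(A, B)  and  x^T D x == 1/a + 1/b.
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
D = np.diag(W.sum(axis=1)); L = D - W
A_idx, B_idx = np.array([0, 1, 2]), np.array([3, 4, 5])
a, b = W[A_idx, :].sum(), W[B_idx, :].sum()        # vol(A), vol(B)
x = np.where(np.isin(np.arange(6), A_idx), 1 / a, -1 / b)
cut = W[np.ix_(A_idx, B_idx)].sum()
assert np.isclose(x @ L @ x, (1 / a + 1 / b) ** 2 * cut)
assert np.isclose(x @ D @ x, 1 / a + 1 / b)
```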

Transforming spectral clustering into eigenmaps
Spectral clustering minimizes
$$\frac{x^T L x}{x^T D x} = \mathrm{cut}(A, B)\left(\frac{1}{a} + \frac{1}{b}\right) = \mathrm{Ncut}(A, B)$$
Put $y = D^{1/2} x$; then
$$\frac{x^T L x}{x^T D x} = \frac{y^T D^{-1/2} L D^{-1/2} y}{y^T y}$$
$\mathcal{L} = D^{-1/2} L D^{-1/2}$ is called the normalized graph Laplacian. The problem is equivalent to finding the smallest nontrivial eigenvalue of $\mathcal{L}$.
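The substitution is again easy to sanity-check numerically (toy sketch, continuing the conventions above):

```python
# Check: with y = D^{1/2} x, the quotient x^T L x / x^T D x equals
# y^T L_norm y / y^T y for L_norm = D^{-1/2} L D^{-1/2}.
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
d = W.sum(axis=1)                         # degrees
L = np.diag(d) - W
L_norm = L / np.sqrt(np.outer(d, d))      # elementwise D^{-1/2} L D^{-1/2}
x = rng.standard_normal(6)
y = np.sqrt(d) * x                        # y = D^{1/2} x
assert np.isclose((x @ L @ x) / (x @ (d * x)),
                  (y @ L_norm @ y) / (y @ y))
```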

Unified framework
Similar algorithms: kernel PCA, Laplacian eigenmaps, spectral clustering, LLE, etc. Bengio et al. (2003) proposed a unified framework:
1. Given a data set $\{x_i\}_{i=1}^n$, construct an $n \times n$ similarity matrix $M$ with $M_{ij} = k(x_i, x_j)$; define $D$ by $D_{ii} = \sum_j M_{ij}$
2. (Optional) Transform $M$, yielding a normalized matrix $\tilde{M}$; this corresponds to a transformed kernel, $\tilde{M}_{ij} = \tilde{k}(x_i, x_j)$
3. Compute the $m$ largest eigenvalue-eigenvector pairs $(\lambda_k, v_k)$ of $\tilde{M}$
4. Embed $x_i \in \mathbb{R}^D \mapsto y_i \in \mathbb{R}^d$ with $y_{ik} = v_k(i)$
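A generic sketch of the four steps (the function name `spectral_embed` and the plug-in `kernel`/`normalize` arguments are assumptions for illustration):

```python
# Sketch of the unified framework: kernel matrix, optional normalization,
# top eigenvectors as embedding coordinates.
import numpy as np
from scipy.linalg import eigh

def spectral_embed(X, kernel, normalize=None, d=2):
    M = np.array([[kernel(xi, xj) for xj in X] for xi in X])  # step 1
    if normalize is not None:
        M = normalize(M)                                      # step 2 (optional)
    lam, V = eigh(M)                                          # ascending eigenvalues
    return V[:, ::-1][:, :d]                                  # steps 3-4: top d, y_ik = v_k(i)
```

For instance, a Gaussian kernel plus the normalization on the next slide would play the role of spectral clustering in this framework.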

Fitting algorithms into the framework
- Spectral clustering (Ng et al. (2001)): $M = W$, $\tilde{M} = D^{-1/2} M D^{-1/2}$ (note $D^{-1/2} W D^{-1/2} = I - \mathcal{L}$, so its largest eigenvalues correspond to the smallest eigenvalues of $\mathcal{L}$)
- Laplacian eigenmaps (Belkin and Niyogi (2003)): $M = D^{-1/2} L D^{-1/2}$, $\tilde{M} = \mu I - M$; the smallest eigenvalues of $M$ become the largest eigenvalues of $\tilde{M}$
- LLE (Roweis and Saul (2000)): $M = (I - W)^T (I - W)$, $\tilde{M} = \mu I - M$; the smallest eigenvalues of $M$ become the largest eigenvalues of $\tilde{M}$
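The $\mu I - M$ trick in the last two entries is just a spectrum flip, as a quick check shows (toy sketch):

```python
# Check: µI - M shares eigenvectors with M and flips the eigenvalue order,
# so the smallest eigenvalues of M become the largest of µI - M.
import numpy as np

rng = np.random.default_rng(1)
M = rng.random((5, 5)); M = (M + M.T) / 2
mu = np.linalg.eigvalsh(M).max() + 1.0            # any µ beyond the top eigenvalue
lam = np.linalg.eigvalsh(mu * np.eye(5) - M)      # ascending: µ - λ_max, ..., µ - λ_min
assert np.allclose(np.sort(mu - lam), np.linalg.eigvalsh(M))
```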

References

Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396.

Bengio, Y., Paiement, J.-F., Vincent, P., Delalleau, O., Le Roux, N., and Ouimet, M. (2003). Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. In Advances in Neural Information Processing Systems (NIPS), volume 16.

Ng, A. Y., Jordan, M. I., and Weiss, Y. (2001). On spectral clustering: analysis and an algorithm. In NIPS, volume 14, pages 849–856.

Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326.

Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905.