Laplacian Eigenmaps for Dimensionality Reduction and Data Representation

Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, June 2003; 15(6):1373-1396. Presentation for CSE291 sp07 by M. Belkin (University of Chicago, Department of Mathematics) and P. Niyogi (University of Chicago, Departments of Computer Science and Statistics). 5/10/07.

Outline
1. Introduction: Motivation; The Algorithm
2. Explaining the Algorithm: The Optimization; Continuous Manifolds - The Heat Kernel
3. Results: The Swiss Roll; Syntactic Structure of Words

The problem. In AI, information retrieval, and data mining we often get intrinsically low m-dimensional data lying in a high N-dimensional space. Recovering the intrinsic dimensionality yields smaller data sets, faster computations, reduced storage, more meaningful representations, and much more.

Non-linearity. PCA, MDS, and their variations create linear embeddings of the data; for non-linear structures these linear embeddings fail. Isomap, LLE, and Laplacian Eigenmaps partially solve this problem. The heart of these approaches is some kind of non-linear, local construction: Isomap builds an adjacency graph with edges between neighboring nodes; LLE expresses each node through a limited number of neighboring nodes; Laplacian Eigenmaps likewise builds an adjacency graph over neighboring nodes.

Step 1. Construct the graph. We construct the adjacency graph A, putting an edge (i, j) whenever $x_i$ and $x_j$ are "close". Close may mean: $\epsilon$-neighborhoods ($\|x_i - x_j\|^2 < \epsilon$), n nearest neighbors, or a combination of the two (at most n nearest neighbors with $\|x_i - x_j\|^2 < \epsilon$); see the sketch below.
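
A minimal sketch of this step, assuming the data is a NumPy array X of shape (n, N); the function name adjacency_graph and its defaults are illustrative, not from the paper:

```python
import numpy as np
from scipy.spatial import cKDTree

def adjacency_graph(X, n_neighbors=10, eps=None):
    """0/1 adjacency matrix from nearest neighbors, optionally pruned by eps."""
    n = X.shape[0]
    tree = cKDTree(X)
    # each point is returned as its own nearest neighbor, so ask for k + 1
    dists, idx = tree.query(X, k=n_neighbors + 1)
    A = np.zeros((n, n))
    for i in range(n):
        for d, j in zip(dists[i, 1:], idx[i, 1:]):
            if eps is None or d ** 2 < eps:
                A[i, j] = A[j, i] = 1.0   # symmetrize: edges are undirected
    return A
```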

Step 2. Choose the weights. The edge weights can be simple-minded (1 if connected, 0 otherwise) or given by the heat kernel, $W_{ij} = e^{-\|x_i - x_j\|^2 / t}$. Note that with $t = \infty$ we recover the simple-minded approach. The second method includes more information in our model; a sketch follows.
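
A sketch of the heat-kernel weighting under the same assumptions (dense arrays, the adjacency matrix A from the previous sketch); heat_kernel_weights is an illustrative name:

```python
import numpy as np

def heat_kernel_weights(X, A, t=1.0):
    """Weight each edge of A by exp(-||x_i - x_j||^2 / t).
    As t grows the weights approach the simple-minded 0/1 scheme."""
    diff = X[:, None, :] - X[None, :, :]             # pairwise differences
    sq_dists = np.einsum('ijk,ijk->ij', diff, diff)  # squared Euclidean distances
    return A * np.exp(-sq_dists / t)                 # zero where there is no edge
```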

Step 3. The Laplacian eigenvalue problem. Let W be the weight matrix from Step 2 and D the degree matrix of the graph ($D_{ii} = \sum_j W_{ij}$). We introduce the Laplacian matrix $L = D - W$ and solve the generalized eigenvector problem $Lf = \lambda D f$. We sort the eigenvectors according to their eigenvalues, $0 = \lambda_0 \le \lambda_1 \le \dots \le \lambda_{k-1}$. We leave out $f_0$ and use the next m eigenvectors for embedding into m-dimensional Euclidean space: $x_i \mapsto (f_1(i), \dots, f_m(i))$.
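
A sketch of this step with SciPy (a dense solver for readability; the next slide notes that in practice the eigenvalue problem is sparse). laplacian_embedding is an illustrative name:

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_embedding(W, m=2):
    """Embed with the eigenvectors of the m smallest non-zero generalized eigenvalues."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    # eigh solves the symmetric generalized problem L f = lambda D f,
    # returning eigenvalues in ascending order
    eigvals, eigvecs = eigh(L, D)
    return eigvecs[:, 1:m + 1]   # drop f_0 (the constant eigenvector, eigenvalue 0)
```

Chaining the three sketches, `laplacian_embedding(heat_kernel_weights(X, adjacency_graph(X)), m=2)` would return the 2-D coordinates of each data point.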

A quick overview. There are several efficient approximate techniques for finding nearest neighbors. The algorithm consists of a few local computations and one sparse eigenvalue problem. It is a member of the Isomap-LLE family of algorithms, and like LLE and Isomap it does not give good results for non-convex manifolds.

Our minimization problem for 1 dimension. Let $y = (y_1, y_2, \dots, y_n)^T$ be our mapping. We would like to minimize $\sum_{ij} (y_i - y_j)^2 W_{ij}$ under appropriate constraints. Expanding,
$$\sum_{ij} (y_i - y_j)^2 W_{ij} = \sum_{ij} (y_i^2 + y_j^2 - 2 y_i y_j) W_{ij} = \sum_i y_i^2 D_{ii} + \sum_j y_j^2 D_{jj} - 2 \sum_{ij} y_i y_j W_{ij} = 2\, y^T L y.$$
So we need to find $\operatorname{argmin}_y\, y^T L y$.
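
A quick numerical check of the identity $\sum_{ij}(y_i - y_j)^2 W_{ij} = 2\, y^T L y$ on random symmetric weights (a sketch; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2                    # symmetric weights, as in the algorithm
D = np.diag(W.sum(axis=1))
L = D - W
y = rng.standard_normal(n)

lhs = sum((y[i] - y[j]) ** 2 * W[i, j] for i in range(n) for j in range(n))
rhs = 2 * y @ L @ y
print(np.isclose(lhs, rhs))          # True: the identity from the slide
```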

Our constraints. The constraint $y^T D y = 1$ removes an arbitrary scaling factor from the embedding; $\operatorname{argmin}_y\, y^T L y$ subject to $y^T D y = 1$ is exactly the generalized eigenvalue problem $Ly = \lambda D y$. The additional constraint $y^T D \mathbf{1} = 0$ removes the trivial solution $\mathbf{1}$ with $\lambda_0 = 0$. So the final problem is: $\operatorname{argmin}_y\, y^T L y$ subject to $y^T D y = 1$ and $y^T D \mathbf{1} = 0$. The general problem for m dimensions is similar and gives exactly the eigenvectors of the m smallest non-zero eigenvalues.
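
Both constraints are satisfied automatically by the generalized eigenvectors SciPy returns, since it normalizes them to be D-orthonormal; a small check, under the same illustrative random-weights setup as above:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 8
W = rng.random((n, n))
W = (W + W.T) / 2
D = np.diag(W.sum(axis=1))
L = D - W

eigvals, F = eigh(L, D)                 # columns f_0, f_1, ... in ascending order
f1, ones = F[:, 1], np.ones(n)
print(np.isclose(f1 @ D @ f1, 1.0))     # y^T D y = 1
print(np.isclose(f1 @ D @ ones, 0.0))   # y^T D 1 = 0 (D-orthogonal to the constant f_0)
```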

Sum up. So each eigenvector is a function from the graph nodes to $\mathbb{R}$ such that "close by" points are assigned "close by" values. The eigenvalue of each eigenfunction gives a measure of how "close by" the values of close by points are. By using the first m such eigenfunctions to determine our m dimensions, we have our solution.

Continuous manifolds. When we move to continuous spaces the graph Laplacian becomes the Laplace-Beltrami operator (more on this by Nakul). Our optimization problem becomes finding functions f that map the manifold points to $\mathbb{R}$ such that $\int_{\mathcal{M}} \|\nabla f(x)\|^2$ is minimal. Intuitively, minimizing the gradient keeps the values assigned to close by points close.
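
For a compact manifold without boundary this objective can be rewritten in terms of the Laplace-Beltrami operator $\mathcal{L}$, the continuous analogue of the discrete $y^T L y$ (a brief restatement of the argument; $\mu$ denotes the volume measure):
$$\int_{\mathcal{M}} \|\nabla f\|^2 \, d\mu \;=\; \int_{\mathcal{M}} f\, \mathcal{L} f \, d\mu, \qquad \mathcal{L} f = -\operatorname{div}(\nabla f),$$
so minimizing over f of unit $L^2$ norm leads to the eigenfunctions of $\mathcal{L}$, just as minimizing $y^T L y$ subject to $y^T D y = 1$ leads to the generalized eigenvectors in the discrete case.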

Heat kernels. Continuous manifolds give the idea for the heat kernel, that is, assigning weights $W_{ij} = e^{-\|x_i - x_j\|^2 / t}$. The heat kernel is the fundamental solution of the heat equation driven by the Laplace-Beltrami operator, and as a result, assigning the weights according to it gives a better approximation of the "ideal" infinite-points case.
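
For reference (a standard fact, not specific to this talk): for small t the heat kernel of an m-dimensional manifold is approximated in local coordinates by the Euclidean one,
$$K_t(x, y) \;\approx\; (4\pi t)^{-m/2}\, e^{-\frac{\|x - y\|^2}{4t}},$$
which, up to the normalizing factor and a rescaling of t, has the same exponential form as the discrete weights $W_{ij}$ above.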

One more Swiss roll! [Figure: the Swiss roll data set.]

Unfolding the Swiss roll - the parameters t, N. [Figure: 2-D embeddings of the Swiss roll for several values of N (number of nearest neighbors) and t (the heat kernel parameter).]
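
A hedged end-to-end sketch of a comparable experiment using scikit-learn's SpectralEmbedding, which implements Laplacian eigenmaps; the parameter values below are illustrative, not the ones behind the figure:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding

# Sample points from a Swiss roll embedded in 3-D.
X, color = make_swiss_roll(n_samples=2000, random_state=0)

# Laplacian-eigenmaps embedding on a k-nearest-neighbor graph;
# n_neighbors plays the role of N, and the 0/1 affinity corresponds to t = infinity.
embedder = SpectralEmbedding(n_components=2, affinity='nearest_neighbors',
                             n_neighbors=10, random_state=0)
Y = embedder.fit_transform(X)   # shape (2000, 2): the unrolled coordinates
```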

Understanding the syntactic structure of words. Input: the 300 most frequent words of the Brown corpus, each described by the frequencies of its left and right neighbors (a 600-dimensional space). The algorithm was run with N = 14, t = ∞.

Understanding the syntactic structure of words. [Figure: three parts of the output.] We can see that verbs, prepositions, and auxiliary and modal verbs are grouped together.

Discussion. Laplacian Eigenmaps is part of the Isomap-LLE family of algorithms; the authors actually prove that LLE approximately minimizes the same quantity. Many times we need to skip eigenvectors, as eigenvectors with bigger eigenvalues sometimes give better results; variations of the algorithm for doing so exist. Although all these algorithms give good results for convex non-linear structures, they do not give much better results in other cases.