Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis


Alvina Goh, Vision Reading Group, 13 October 2005

Kernel Principal Component Analysis

[Figure 1: Basic idea of kernel PCA.]

Kernel Principal Component Analysis

Positive definite kernel k: given a nonempty set X and a positive definite kernel k, there exists a map Φ : X → H into a dot product space H such that for all x, x′ ∈ X, ⟨Φ(x), Φ(x′)⟩ = k(x, x′). Thus k is a nonlinear similarity measure.

Given centered data x_1, ..., x_m ∈ X, kPCA computes the principal components of the points Φ(x_1), ..., Φ(x_m) ∈ H (i.e., in feature space). Consider the covariance matrix in H,

$$C = \frac{1}{m} \sum_{i=1}^{m} \Phi(x_i)\,\Phi(x_i)^T \qquad (1)$$

Kernel Principal Component Analysis

We now have to find eigenvalues λ ≥ 0 and nonzero eigenvectors v ∈ H \ {0} satisfying

$$\lambda v = C v \qquad (2)$$

By substituting C, we observe that all solutions v with λ ≠ 0 lie in the span of Φ(x_1), ..., Φ(x_m).

First observation: we may instead consider the set of equations

$$\lambda \langle \Phi(x_n), v \rangle = \langle \Phi(x_n), C v \rangle \qquad (3)$$

for all n = 1, ..., m.

Second observation: there exist coefficients α_i, i = 1, ..., m, such that

$$v = \sum_{i=1}^{m} \alpha_i \Phi(x_i) \qquad (4)$$

Kernel Principal Component Analysis

Combining the two observations, we get, for all n = 1, ..., m,

$$\lambda \sum_{i=1}^{m} \alpha_i \langle \Phi(x_n), \Phi(x_i) \rangle = \frac{1}{m} \sum_{i=1}^{m} \alpha_i \Big\langle \Phi(x_n), \sum_{j=1}^{m} \Phi(x_j) \langle \Phi(x_j), \Phi(x_i) \rangle \Big\rangle \qquad (5)$$

With the m × m Gram matrix given by K_ij = ⟨Φ(x_i), Φ(x_j)⟩, we can rewrite the above equation as

$$m \lambda K \alpha = K^2 \alpha \qquad (6)$$

$$m \lambda \alpha = K \alpha \qquad (7)$$

where α is the column vector with entries α_1, ..., α_m.

Kernel Principal Component Analysis

As in PCA, we consider the first p eigenvectors, where λ_1, ..., λ_p are all the non-zero eigenvalues. We normalize α^n by imposing, for all n = 1, ..., p,

$$1 = \langle v^n, v^n \rangle = \sum_{i,j=1}^{m} \alpha_i^n \alpha_j^n \langle \Phi(x_i), \Phi(x_j) \rangle = \sum_{i,j=1}^{m} \alpha_i^n \alpha_j^n K_{ij} = \langle \alpha^n, K\alpha^n \rangle = \lambda_n \langle \alpha^n, \alpha^n \rangle \qquad (8)$$

We want to compute projections onto the eigenvectors v^n in H (n = 1, ..., p). Let x be a test point with image Φ(x) in H. Then

$$\langle v^n, \Phi(x) \rangle = \sum_{i=1}^{m} \alpha_i^n \langle \Phi(x_i), \Phi(x) \rangle \qquad (9)$$

are the nonlinear principal components corresponding to Φ.

Summary of the kPCA algorithm

1. Compute the Gram matrix K_ij = k(x_i, x_j).
2. Diagonalize K, and normalize the eigenvector expansion coefficients α^n by requiring λ_n ⟨α^n, α^n⟩ = 1.
3. Compute projections onto the eigenvectors. Final equation:

$$\langle v^n, \Phi(x) \rangle = \sum_{i=1}^{m} \alpha_i^n \, k(x_i, x)$$
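A minimal NumPy sketch of these three steps; the Gaussian kernel, the number of components, and all variable names are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)); any positive definite kernel works here
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kpca(X, kernel, p=2):
    """Return expansion coefficients alpha (m x p) and the top p eigenvalues of K."""
    K = kernel(X, X)                        # step 1: Gram matrix K_ij = k(x_i, x_j)
    lam, A = np.linalg.eigh(K)              # step 2: diagonalize K (eigenvalues ascending)
    lam, A = lam[::-1][:p], A[:, ::-1][:, :p]
    A = A / np.sqrt(lam)                    # normalize so that lam_n <alpha^n, alpha^n> = 1
    return A, lam

def project(X_train, X_test, kernel, A):
    # step 3: <v^n, Phi(x)> = sum_i alpha_i^n k(x_i, x)
    return kernel(X_test, X_train) @ A

# usage sketch (assumes the data are already centered in feature space; see the next slide)
X = np.random.randn(100, 3)
A, lam = kpca(X, gaussian_kernel, p=2)
Y = project(X, X, gaussian_kernel, A)       # nonlinear principal components of the training set
```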

Modification to account for zero mean

The previous construction of kPCA assumes that the data have zero mean in feature space. This is not true for arbitrary data, so we need to subtract the mean (1/m) Σ_i Φ(x_i) from all the points Φ(x_j). This leads to a different eigenvalue problem with the centered Gram matrix

$$\tilde{K} = (I - ee^T)\, K \,(I - ee^T)$$

where e = (1/√m)(1, 1, ..., 1)^T.
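In code, the centering is a single matrix sandwich; a sketch continuing the NumPy example above:

```python
import numpy as np

def center_gram(K):
    """Centered Gram matrix (I - e e^T) K (I - e e^T), with e = (1/sqrt(m)) (1, ..., 1)^T."""
    m = K.shape[0]
    P = np.eye(m) - np.ones((m, m)) / m     # I - e e^T, since e e^T = (1/m) 1 1^T
    return P @ K @ P
```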

ISOMAP, kPCA, and MDS

ISOMAP

Recall that ISOMAP = MDS on geodesic distances. Thus, one can write

$$K_{\text{ISOMAP}} = -\frac{1}{2} (I - ee^T)\, S \,(I - ee^T) \qquad (10)$$

where S is the matrix of squared geodesic distances. There is no theoretical guarantee that K_ISOMAP is positive definite. There is, however, a theorem by Grimes and Donoho (Hessian LLE), which applies in the continuum limit for a smooth manifold, saying that the geodesic distance between points on the manifold is proportional to the Euclidean distance in the low-dimensional parameter space of the manifold. In the continuum limit, (−S) is conditionally positive definite, and so is K_ISOMAP. Hence, ISOMAP is a form of kPCA.
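A sketch of the kernel construction in Eq. (10), assuming the squared geodesic distances S have already been computed (e.g., from shortest paths on a k-nearest-neighbour graph, which is not shown here); the embedding then comes from the top eigenvectors of K_ISOMAP:

```python
import numpy as np

def isomap_kernel(S):
    """K_ISOMAP = -1/2 (I - e e^T) S (I - e e^T), with S the squared geodesic distances."""
    m = S.shape[0]
    P = np.eye(m) - np.ones((m, m)) / m
    return -0.5 * P @ S @ P

def isomap_embedding(S, d=2):
    K = isomap_kernel(S)
    lam, V = np.linalg.eigh(K)
    lam, V = lam[::-1][:d], V[:, ::-1][:, :d]
    return V * np.sqrt(np.maximum(lam, 0))  # clip negative eigenvalues: K need not be PSD
```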

LLE

Recall that in the last step of LLE, we find the smallest eigenvectors of the matrix M = (I − W)^T (I − W). Denoting the maximum eigenvalue of M by λ_max, we define the matrix

K = λ_max I − M

Note that M = M^T, so M is real symmetric. (Recall: a real symmetric matrix is positive semidefinite iff all its eigenvalues are non-negative.) For an eigenvector v of M with Mv = λv, consider Kv = (λ_max I − M)v = (λ_max − λ)v, and λ_max − λ ≥ 0. By construction, K is therefore positive semidefinite.

LLE

K = λ_max I − M

The leading eigenvector of K is e, and the coordinates of eigenvectors 2, ..., d + 1 provide the LLE embedding. Equivalently, if we project out e and use the eigenvectors 1, ..., d of the resulting matrix

(I − ee^T) K (I − ee^T)

we get the embedding vectors.
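A sketch of this step, assuming the LLE reconstruction weights W (each point reconstructed from its neighbours, rows summing to one) have already been computed; variable names are illustrative:

```python
import numpy as np

def lle_kernel(W):
    """K = lambda_max * I - M, with M = (I - W)^T (I - W)."""
    m = W.shape[0]
    M = (np.eye(m) - W).T @ (np.eye(m) - W)
    lam_max = np.linalg.eigvalsh(M)[-1]
    return lam_max * np.eye(m) - M

def lle_embedding(W, d=2):
    K = lle_kernel(W)
    lam, V = np.linalg.eigh(K)               # eigenvalues ascending
    # the leading eigenvector (last column) is the constant vector e; skip it
    return V[:, -(d + 1):-1][:, ::-1]        # eigenvectors 2, ..., d+1 of K
```

The eigenvectors come out unit-norm here; the whitening/scaling convention discussed on the next slide rescales them.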

LLE

Remember that LLE is invariant under scaling, translation, and rotation. The LLE embedding is equivalent to the kPCA projections up to a multiplication by λ_n. This corresponds to the whitening step performed in LLE in order to fix the scaling, which is not normally done in kPCA, where the scaling is determined by the variance of the data along the respective directions in H.

Finally, there need not be an analytic form of a kernel k that gives rise to the LLE kernel matrix K. Hence, there need not be a feature map on the whole input domain. But one can give a feature map defined on the training points by writing the SVD of K.
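For instance, a feature map defined only on the training points can be read off from the eigendecomposition of a positive semidefinite K; a minimal sketch of that construction:

```python
import numpy as np

def training_feature_map(K):
    """Rows Phi[i] satisfy <Phi[i], Phi[j]> = K[i, j] for a PSD kernel matrix K."""
    lam, V = np.linalg.eigh(K)
    lam = np.maximum(lam, 0)                 # clip tiny negative eigenvalues from round-off
    return V * np.sqrt(lam)                  # Phi = V diag(sqrt(lam)), so Phi @ Phi.T == K
```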

So far

MDS, ISOMAP, LLE, and Laplacian eigenmaps can all be interpreted as kPCA with special kernels. Note that for these methods the kernel matrix is only defined on the training data, unlike standard kPCA, where one starts with the analytic form of the kernel and hence defines it over the entire input space. Recall that MDS, ISOMAP, LLE, and Laplacian eigenmaps are manifold learning algorithms.

Question

In classification algorithms such as Support Vector Machines, there is a set of commonly used kernels. These include polynomial, Gaussian, sigmoid (tanh), and B-spline kernels, and they are also used in "normal" kPCA. In particular, consider the Gaussian kernel

$$k(x, x') = e^{-\frac{\|x - x'\|^2}{2\sigma^2}} = e^{-\beta \|x - x'\|^2}$$

which preserves the property of mapping nearby inputs to nearby features. Can we use the Gaussian kernel to do manifold learning?

Can we use the Gaussian kernel to do manifold learning? NO!

Answer

Consider the Gaussian kernel

$$k(x, x') = e^{-\frac{\|x - x'\|^2}{2\sigma^2}} = e^{-\beta \|x - x'\|^2}$$

Remembering that k(x, x') is the dot product of the input vectors in feature space, we see that if the two inputs are far apart, k(x, x') ≈ 0. Physically, we are mapping distant patches of the manifold into nearly orthogonal directions of feature space. Thus, we can no longer unroll the swiss roll.

Answer

Consider the Gaussian kernel

$$k(x, x') = e^{-\frac{\|x - x'\|^2}{2\sigma^2}} = e^{-\beta \|x - x'\|^2}$$

In addition, for every patch of radius ~β^{-1/2}, the Gaussian kernel creates an additional dimension. Hence, it is impossible to do dimension reduction: physically, the Gaussian kernel is really doing dimension EXPANSION. This follows logically from the SVM school of thought of mapping inputs into higher dimensions in order to find a linear classifier in feature space. By creating orthogonal subspaces for different patches, one can easily find a linear classifier. This is why the Gaussian kernel is commonly used in SVMs, whose purpose is clustering/classification.
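A small numerical illustration of both points, on assumed toy data (a 1-D curve embedded in 3-D, not from the slides): with a bandwidth much smaller than the extent of the data, distant points have near-zero kernel values, and the centered Gram matrix has many significant eigenvalues rather than the single one the underlying 1-D structure would suggest.

```python
import numpy as np

# toy manifold: a 1-D spiral curve sampled in 3-D (illustrative data)
t = np.linspace(0, 4 * np.pi, 200)
X = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])

sigma = 0.2                                  # bandwidth much smaller than the data's extent
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-D2 / (2 * sigma ** 2))

print(K[0, -1])                              # ~0: distant patches are nearly orthogonal in feature space

P = np.eye(len(X)) - np.ones((len(X), len(X))) / len(X)
lam = np.linalg.eigvalsh(P @ K @ P)[::-1]
print((lam > 0.01 * lam[0]).sum())           # many significant eigenvalues: expansion, not reduction
```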

Summary: what we covered so far

1. Principal Components Analysis (PCA)
2. Multi-Dimensional Scaling (MDS)
3. ISOMAP
4. k-means
5. Locally Linear Embedding (LLE)
6. Laplacian LLE
7. Hessian LLE
8. Kernel Principal Components Analysis (kPCA)

Topics not covered

Dimension Reduction: Semidefinite Programming
- Maximum variance unfolding
- Conformal eigenmaps

Dimension Reduction: Spectral Clustering
- The Laplacian matrix used in Laplacian eigenmaps and normalized cuts.
- Manifold learning and clustering are linked because the clusters that spectral clustering captures can be arbitrarily curved manifolds, as long as there is enough data to locally capture the curvature of the manifold.

Feature Extraction: Probabilistic PCA
- Probabilistic PCA
- Oriented PCA

References

Ham, J., D. D. Lee, S. Mika, and B. Schölkopf. A kernel view of the dimensionality reduction of manifolds. Proceedings of the Twenty-First International Conference on Machine Learning, 2004, pp. 369-376.

B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

And many more.