Unsupervised dimensionality reduction


Guillaume Obozinski, Ecole des Ponts - ParisTech. SOCN course 2014.

Outline
1. PCA
2. Kernel PCA
3. Multidimensional scaling
4. Laplacian Eigenmaps
5. Locally Linear Embedding

PCA

A direction that maximizes the variance
Data are points in $\mathbb{R}^d$ (assumed centered). We look for a direction $v \in \mathbb{R}^d$ such that the variance of the signals projected on $v$ is maximized:
$$\mathrm{Var}\big((v^\top x_i)_{i=1,\dots,n}\big) = \frac{1}{n}\sum_{i=1}^n (v^\top x_i)^2 = \frac{1}{n}\sum_{i=1}^n v^\top x_i x_i^\top v = v^\top \Big(\frac{1}{n}\sum_{i=1}^n x_i x_i^\top\Big) v = v^\top \Sigma\, v.$$
We thus need to solve $\max_{\|v\|_2 = 1} v^\top \Sigma\, v$.
Solution: the eigenvector associated with the largest eigenvalue of $\Sigma$.
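To make this concrete, here is a minimal numpy sketch (not part of the original slides) that recovers the variance-maximizing direction as the top eigenvector of the empirical covariance; the function name and the assumption that X is an already-centered (n, d) array are mine.

    import numpy as np

    def first_principal_direction(X):
        # X: (n, d) data matrix, assumed already centered (illustrative sketch)
        n = X.shape[0]
        Sigma = X.T @ X / n                       # empirical covariance (1/n) sum_i x_i x_i^T
        eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigh returns eigenvalues in ascending order
        return eigvecs[:, -1]                     # eigenvector of the largest eigenvalue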

Principal directions
Assume the design matrix is centered, i.e. $X^\top \mathbf{1} = 0$.
Principal directions as eigenvectors of the covariance: consider the eigenvalue decomposition of $X^\top X$ (equal to $\Sigma$ up to the factor $1/n$): $X^\top X = V S^2 V^\top$ with $S = \mathrm{Diag}(s_1, s_2, \dots)$ and $s_1 \geq s_2 \geq \dots \geq 0$. The principal directions are the columns of $V$.
Principal directions as singular vectors of the design matrix: consider the singular value decomposition of $X$: $X = U S V^\top$. The principal directions are the right singular vectors.

Principal components
Obtained by projecting the rows of $X$ on $V$. But $XV = U S V^\top V = U S$. So the principal components are obtained from the left singular vectors and the singular values.
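As a hedged illustration of this SVD view (not from the slides), the sketch below computes the first k principal directions and components from the thin SVD of the centered design matrix; the function name is illustrative.

    import numpy as np

    def pca_svd(X, k):
        # X: (n, d) data matrix; returns the first k principal directions and components
        Xc = X - X.mean(axis=0)                        # center the design matrix (X^T 1 = 0)
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        directions = Vt[:k].T                          # right singular vectors = principal directions
        components = U[:, :k] * s[:k]                  # principal components = U S (= Xc @ directions)
        return directions, components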

Kernel PCA

Centering implicitly in feature space
Assume that we use a mapping $\varphi$ so that the representation of the data is the design matrix
$$\Phi = \begin{bmatrix} \varphi(x_1)^\top \\ \vdots \\ \varphi(x_n)^\top \end{bmatrix}.$$
The mean feature vector is $\bar\varphi = \frac{1}{n}\sum_{i=1}^n \varphi(x_i) = \frac{1}{n}\Phi^\top \mathbf{1}$.
So if $\tilde\varphi(x_i) = \varphi(x_i) - \bar\varphi$, then
$$\tilde\Phi = \Phi - \frac{1}{n}\mathbf{1}\mathbf{1}^\top \Phi = \Big(I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top\Big)\Phi.$$
Finally, the centered kernel matrix is computed as $\tilde K = \tilde\Phi \tilde\Phi^\top = H K H$ with $H = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$.
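A one-liner sketch of this centering formula, assuming a precomputed kernel matrix K (illustrative, not from the slides):

    import numpy as np

    def center_kernel(K):
        # Ktilde = H K H with H = I - (1/n) 1 1^T
        n = K.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        return H @ K @ H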

Principal function in an RKHS (Schölkopf et al., 1998)
Find a function $f$ with $\|f\|_{\mathcal H} = 1$ that maximizes $\sum_{i=1}^n \langle f, K(x_i, \cdot)\rangle_{\mathcal H}^2$. Equivalently,
$$\max_f \ \sum_{i=1}^n f(x_i)^2 \quad \text{s.t.} \quad \|f\|_{\mathcal H}^2 \leq 1.$$
By the representer theorem, $f(x) = \sum_{j=1}^n \alpha_j K(x, x_j)$. So,
$$\sum_{i=1}^n f(x_i)^2 = \sum_{i=1}^n \Big(\sum_{j=1}^n \alpha_j K(x_i, x_j)\Big)^2 = \sum_{j,j'=1}^n \alpha_j \alpha_{j'} \sum_{i=1}^n K(x_i, x_j) K(x_i, x_{j'}) = \alpha^\top K K \alpha.$$
So the problem can be written as
$$\max_\alpha \ \alpha^\top K K \alpha \quad \text{s.t.} \quad \alpha^\top K \alpha \leq 1.$$

Solution of kernel PCA
Write $K = U S^2 U^\top$. If $\beta = U^\top \alpha$, the problem is formulated as
$$\max_\beta \ \sum_i \beta_i^2 s_i^4 \quad \text{s.t.} \quad \sum_i \beta_i^2 s_i^2 \leq 1.$$
This is attained for $\beta = \big(\tfrac{1}{s_1}, 0, \dots, 0\big)$ and thus $\alpha = \tfrac{1}{s_1} u_1$.
So the first principal function is $f(x) = \frac{1}{s_1}\sum_{i=1}^n U_{i1} K(x_i, x)$, and the $k$-th principal function is $f(x) = \frac{1}{s_k}\sum_{i=1}^n U_{ik} K(x_i, x)$.
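The following sketch (mine, not the slides') turns this solution into code: it eigendecomposes a centered kernel matrix, builds the coefficient vectors alpha_k = u_k / s_k of the first k principal functions, and evaluates them on the training points. It assumes the k leading eigenvalues are strictly positive.

    import numpy as np

    def kernel_pca_functions(K_centered, k):
        # K_centered: centered kernel matrix Ktilde = H K H
        eigvals, U = np.linalg.eigh(K_centered)       # ascending eigenvalues s_i^2
        eigvals, U = eigvals[::-1], U[:, ::-1]        # reorder: largest first
        s = np.sqrt(eigvals[:k])                      # singular values s_1 >= ... >= s_k (assumed > 0)
        alphas = U[:, :k] / s                         # alpha_k = u_k / s_k
        embedding = K_centered @ alphas               # f_k(x_i) on the training points (= s_k * u_k)
        return alphas, embedding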

Multidimensional scaling

Multidimensional scaling
Goal: given a collection of (not necessarily Euclidean) distances $\delta_{ij}$ between pairs of points indexed by $\{1, \dots, n\}$, construct a collection of points $y_i$ in a Euclidean space such that $\|y_i - y_j\|_2 \approx \delta_{ij}$.
Original formulation: minimize the so-called stress function
$$\min_Y \ \sum_{ij} \big(\|y_i - y_j\|_2 - \delta_{ij}\big)^2.$$
Classical formulation:
$$\min_Y \ \sum_{ij} \big(\|y_i - y_j\|_2^2 - \delta_{ij}^2\big)^2.$$

Centered kernel matrix from a Euclidean distance matrix
Lemma. If $D_2 = (d_{ij}^2)_{1 \leq i,j \leq n}$ is a matrix of squared Euclidean distances, then $K = -\frac{1}{2} H D_2 H$, with $H = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$, is the corresponding centered kernel matrix.
Proof: $d_{ij}^2 = \|\varphi(x_i) - \varphi(x_j)\|_2^2 = K_{ii} + K_{jj} - 2 K_{ij}$. With $\kappa = (K_{11}, \dots, K_{nn})^\top$, we have $2K = \kappa \mathbf{1}^\top + \mathbf{1}\kappa^\top - D_2$, hence
$$\tilde K = H K H = \frac{1}{2} H\big(\kappa \mathbf{1}^\top + \mathbf{1}\kappa^\top - D_2\big) H = -\frac{1}{2} H D_2 H,$$
since $H\mathbf{1} = 0$ kills the rank-one terms.

Classical MDS algorithm
Algorithm:
1. Compute $K = -\frac{1}{2} H D_2 H$.
2. Remove negative eigenvalues from $K$.
3. Solve kernel PCA on $K$.
If $D_2$ contains squared Euclidean distances, step 2 is unnecessary, and it can be shown that this solves the classical MDS problem.
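A compact sketch of these three steps (illustrative only; the function name and the (n, n) distance-matrix input are my assumptions):

    import numpy as np

    def classical_mds(D, k):
        # D: (n, n) matrix of pairwise distances delta_ij; returns n points in R^k
        n = D.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        K = -0.5 * H @ (D ** 2) @ H                   # step 1: K = -1/2 H D_2 H
        eigvals, U = np.linalg.eigh(K)
        eigvals, U = eigvals[::-1], U[:, ::-1]        # largest eigenvalues first
        eigvals = np.clip(eigvals[:k], 0.0, None)     # step 2: remove negative eigenvalues
        return U[:, :k] * np.sqrt(eigvals)            # step 3: kernel PCA components U S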

Isomap (Tenenbaum et al., 2000)
Algorithm:
1. Compute a k-NN graph on the data.
2. Compute geodesic distances on the k-NN graph, using the $\ell_2$ distance on each edge.
3. Apply classical MDS to the obtained distances.
Remarks: Isomap assumes that the $\ell_2$ distance can be trusted locally, and will fail if there are too many noise dimensions. The geodesic distances can be computed with, e.g., the Floyd-Warshall algorithm.
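Below is a possible end-to-end sketch of these three steps, reusing the classical_mds helper sketched after the previous slide; the neighborhood size, the dense-graph construction, and the use of scipy's Floyd-Warshall routine are my choices, not the slides'. A disconnected k-NN graph would produce infinite geodesic distances and is not handled here.

    import numpy as np
    from scipy.sparse.csgraph import shortest_path

    def isomap(X, n_neighbors=10, k=2):
        # X: (n, d) data matrix
        n = X.shape[0]
        diff = X[:, None, :] - X[None, :, :]
        D = np.sqrt((diff ** 2).sum(-1))                     # pairwise Euclidean distances
        # step 1: k-NN graph (dense representation, 0 = no edge)
        G = np.zeros((n, n))
        idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]    # nearest neighbors, skipping self
        rows = np.repeat(np.arange(n), n_neighbors)
        G[rows, idx.ravel()] = D[rows, idx.ravel()]
        G = np.maximum(G, G.T)                               # symmetrize the graph
        # step 2: geodesic distances via Floyd-Warshall
        geo = shortest_path(G, method="FW", directed=False)
        # step 3: classical MDS on the geodesic distances
        return classical_mds(geo, k)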

Laplacian Eigenmaps

Graph Laplacians
Assume a similarity matrix $W = (w_{ij})_{1 \leq i,j \leq n}$ is available on the data, e.g. a kernel matrix such as
$$w_{ij} = \exp\Big(-\frac{1}{h}\|x_i - x_j\|_2^2\Big).$$
We can think of $W$ as defining a weighted graph on the data. We say that a function $f$ is smooth on the weighted graph if its Laplacian
$$\mathcal{L}(f) := \frac{1}{2}\sum_{ij} w_{ij}\big(f(x_i) - f(x_j)\big)^2$$
is small. Similarly, we say that a vector $\mathbf{f}$ is smooth on the weighted graph if its Laplacian
$$\mathcal{L}(\mathbf{f}) := \frac{1}{2}\sum_{ij} w_{ij}(f_i - f_j)^2$$
is small.

Laplacian and normalized Laplacian matrices
Define $D = \mathrm{Diag}(d)$ with $d_i = \sum_j w_{ij}$. We then have
$$\mathcal{L}(\mathbf{f}) = \frac{1}{2}\sum_{ij} w_{ij}(f_i - f_j)^2 = \frac{1}{2}\sum_{ij} w_{ij} f_i^2 + \frac{1}{2}\sum_{ij} w_{ij} f_j^2 - \sum_{ij} w_{ij} f_i f_j = \frac{1}{2}\sum_i d_i f_i^2 + \frac{1}{2}\sum_j d_j f_j^2 - \sum_{ij} w_{ij} f_i f_j = \mathbf{f}^\top D \mathbf{f} - \mathbf{f}^\top W \mathbf{f} = \mathbf{f}^\top L \mathbf{f}.$$
Laplacian matrix: $L = D - W$.
Normalized Laplacian matrix: $\mathcal{L} := D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$.
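A direct transcription of these two definitions into numpy (a sketch that assumes every node has positive degree d_i > 0):

    import numpy as np

    def laplacians(W):
        # W: (n, n) symmetric similarity matrix
        d = W.sum(axis=1)                                 # degrees d_i = sum_j w_ij
        L = np.diag(d) - W                                # Laplacian L = D - W
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))            # requires d_i > 0
        L_norm = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
        return L, L_norm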

Laplacian embeddings (Belkin and Niyogi, 2001)
Principle: given a weight matrix $W$, find an embedding $y_i \in \mathbb{R}^K$ for point $i$ that, given scaling and centering constraints on the $y_i$, solves
$$\min_Y \ \sum_{ij} w_{ij}\|y_i - y_j\|_2^2 \quad \text{with} \quad Y = [y_1 \dots y_n]^\top.$$
We have
$$\sum_{ij} w_{ij}\|y_i - y_j\|_2^2 = \sum_{ij} w_{ij}\sum_{k=1}^K (Y_{ik} - Y_{jk})^2 = \sum_{k=1}^K \sum_{ij} w_{ij}(Y_{ik} - Y_{jk})^2 = 2\sum_{k=1}^K Y_{\cdot k}^\top L\, Y_{\cdot k} = 2\,\mathrm{tr}(Y^\top L Y)$$
(the constant factor does not change the minimizer).

Laplacian embedding formulation
$$\min_Y \ \mathrm{tr}(Y^\top L Y) \quad \text{s.t.} \quad Y^\top D Y = I, \quad Y^\top D \mathbf{1} = 0.$$
With the change of variable $\tilde Y = D^{1/2} Y$, $\tilde Y$ solves
$$\min_{\tilde Y} \ \mathrm{tr}(\tilde Y^\top \mathcal{L} \tilde Y) \quad \text{s.t.} \quad \tilde Y^\top \tilde Y = I, \quad \tilde Y^\top D^{1/2}\mathbf{1} = 0.$$
But $\mathcal{L} D^{1/2}\mathbf{1} = 0$, so the columns of $\tilde Y$ are the eigenvectors of $\mathcal{L}$ associated with the smallest eigenvalues, except for the eigenvector $D^{1/2}\mathbf{1}$.
Equivalently, the columns of $Y$ are the solutions of the generalized eigenvalue problem $L u = \lambda D u$ for the smallest generalized eigenvalues, except for the eigenvector $\mathbf{1}$.
The rows of the obtained matrix form the embedding.
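A minimal sketch of the resulting procedure (mine, not the slides'): solve the generalized eigenvalue problem L u = lambda D u with scipy, keep the K smallest non-trivial eigenvectors, and read the embedding off the rows. It assumes a connected graph with positive degrees, so that the trivial constant eigenvector is exactly the first one.

    import numpy as np
    from scipy.linalg import eigh

    def laplacian_eigenmaps(W, K):
        # W: (n, n) symmetric weight matrix; returns an (n, K) embedding
        d = W.sum(axis=1)
        D = np.diag(d)
        L = D - W
        eigvals, U = eigh(L, D)          # generalized problem L u = lambda D u, ascending eigenvalues
        Y = U[:, 1:K + 1]                # drop the trivial (constant) eigenvector
        return Y                         # rows of Y are the embedded points y_i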

Locally Linear Embedding

Locally Linear Embedding (Roweis and Saul, 2000)
Let $x_1, \dots, x_n$ be a collection of vectors in $\mathbb{R}^p$.
1. Construct a k-NN graph.
2. Approximate $x_i$ by a linear combination of its neighbors, by finding the vector of weights solving the constrained linear regression
$$\min_{w_i} \ \Big\|x_i - \sum_{j \in N(i)} w_{ij} x_j\Big\|_2^2, \quad \text{with} \quad \sum_{j \in N(i)} w_{ij} = 1.$$
3. Set $w_{ij} = 0$ for $j \notin N(i)$.
4. Find a centered set of points $y_i$ in $\mathbb{R}^d$ with white covariance which minimizes
$$\sum_{i=1}^n \Big\|y_i - \sum_{j=1}^n w_{ij} y_j\Big\|_2^2.$$

LLE step 2: constrained regressions
$$\min_{w_i} \ \Big\|x_i - \sum_{j \in N(i)} w_{ij} x_j\Big\|_2^2, \quad \text{with} \quad \sum_{j \in N(i)} w_{ij} = 1.$$
But if $\sum_{j \in N(i)} w_{ij} = 1$, then
$$\Big\|x_i - \sum_{j \in N(i)} w_{ij} x_j\Big\|_2^2 = \Big\|\sum_{j \in N(i)} w_{ij}(x_i - x_j)\Big\|_2^2 = \sum_{j,k \in N(i)} w_{ij} w_{ik} \underbrace{(x_i - x_j)^\top (x_i - x_k)}_{K^{(i)}_{jk}}.$$
We thus need to solve
$$\min_u \ \frac{1}{2} u^\top K u \quad \text{s.t.} \quad u^\top \mathbf{1} = 1.$$
With $L(u, \lambda) = \frac{1}{2} u^\top K u - \lambda(u^\top \mathbf{1} - 1)$, we get $\nabla_u L = 0 \iff K u = \lambda \mathbf{1}$, which is solved by $u = \dfrac{K^{-1}\mathbf{1}}{\mathbf{1}^\top K^{-1}\mathbf{1}}$.
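The closed form above translates into the sketch below for a single point i; the small ridge term is an addition of mine (not in the slides) to keep K^(i) invertible when the number of neighbors exceeds the ambient dimension.

    import numpy as np

    def lle_weights(X, i, neighbors, reg=1e-3):
        # X: (n, p) data; neighbors: indices of N(i); returns weights summing to 1
        Z = X[neighbors] - X[i]                              # rows are x_j - x_i, j in N(i)
        K = Z @ Z.T                                          # local Gram matrix K^(i)
        K = K + reg * np.trace(K) * np.eye(len(neighbors))   # ridge regularization (not in the slides)
        u = np.linalg.solve(K, np.ones(len(neighbors)))      # K u proportional to 1
        return u / u.sum()                                   # u = K^{-1} 1 / (1^T K^{-1} 1)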

LLE step 4: final optimization problem
$$\min_{y_1, \dots, y_n} \ \sum_{i=1}^n \Big\|y_i - \sum_{j=1}^n w_{ij} y_j\Big\|_2^2 \quad \text{s.t.} \quad \sum_{i=1}^n y_i = 0, \quad \frac{1}{n}\sum_{i=1}^n y_i y_i^\top = I_d.$$
Equivalently, denoting $Y = [y_1 \dots y_n]^\top$, we have to solve
$$\min_Y \ \|Y - WY\|_F^2 \quad \text{s.t.} \quad \mathbf{1}^\top Y = 0, \quad \frac{1}{n} Y^\top Y = I_d,$$
or
$$\min_Y \ \mathrm{tr}\big(Y^\top (I_n - W)^\top (I_n - W) Y\big) \quad \text{s.t.} \quad Y^\top \mathbf{1} = 0, \quad \frac{1}{n} Y^\top Y = I_d.$$
So the columns of $\frac{1}{\sqrt{n}} Y$ are the $d$ eigenvectors of $(I_n - W)^\top (I_n - W)$ associated with its smallest $d$ non-zero eigenvalues.
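Finally, a sketch of this last step (illustrative; the dense eigendecomposition and the assumption that the single smallest eigenvalue corresponds to the constant vector are mine):

    import numpy as np

    def lle_embedding(W, d):
        # W: (n, n) matrix of reconstruction weights (rows sum to 1); returns an (n, d) embedding
        n = W.shape[0]
        M = (np.eye(n) - W).T @ (np.eye(n) - W)
        eigvals, U = np.linalg.eigh(M)                    # ascending eigenvalues; the first is ~0 (vector 1)
        Y = np.sqrt(n) * U[:, 1:d + 1]                    # drop the constant eigenvector; (1/n) Y^T Y = I_d
        return Y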

References
Belkin, M. and Niyogi, P. (2001). Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, volume 14, pages 585-591.
Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323-2326.
Schölkopf, B., Smola, A., and Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299-1319.
Tenenbaum, J. B., De Silva, V., and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319-2323.