Nonlinear Dimensionality Reduction
Jose A. Costa
Mathematics of Information Seminar, Dec.
Motivation
Many useful classes of signals, such as image databases, gene expression microarrays, Internet traffic time series, etc., are very high dimensional in nature. The curse of dimensionality hinders the analysis of such datasets:
1. Poor statistical performance;
2. Unmanageable computational complexity.
Motivation
However, the apparent high complexity of such signals is often just an artifact of the measurement process and its data representation: not all inputs carry independent information!
[Figure: sample images of the handwritten digits 1 and 2; 28x28 pixel images embedded in $\mathbb{R}^{784}$]
Motivation
Approach: design dimensionality reduction algorithms that learn compact representations of high dimensional data. For what purpose?
[Figure: digit images mapped into a low-dimensional space]
- Estimate the parameters that produced the data set;
- Classify data.
Outline
1. Finding Linear Data Representations:
   i. Principal Component Analysis (PCA)
   ii. Multidimensional Scaling (MDS)
2. From Linear to Nonlinear: MDS to ISOMAP
3. Spectral Graph Methods: Laplacian Eigenmaps
4. Adding Additional Constraints: The Supervised and Semi-Supervised Learning Problems
Setup
Assumption: the set of data points $\{x_1, \dots, x_n\} \subset \mathbb{R}^d$ lives on a compact m-dimensional manifold $\mathcal{M}$, with $m \ll d$.
[Figure: Swiss roll, a 2D manifold in $\mathbb{R}^3$]
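To make the setup concrete, here is a minimal NumPy sketch for drawing a sample from a Swiss roll; the parameterization, sample size, and function name `sample_swiss_roll` are illustrative choices, not taken from the slides.

```python
import numpy as np

def sample_swiss_roll(n, seed=None):
    """Sample n points from a Swiss roll: a 2D manifold (t, h) embedded in R^3."""
    rng = np.random.default_rng(seed)
    t = 1.5 * np.pi * (1.0 + 2.0 * rng.random(n))   # "unrolled" angular coordinate
    h = 20.0 * rng.random(n)                        # height along the roll axis
    X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])  # observed points in R^3
    return X, t, h                                  # (t, h) are the intrinsic coordinates

X, t, h = sample_swiss_roll(800, seed=0)
print(X.shape)  # (800, 3): ambient dimension 3, intrinsic dimension 2
```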
Manifold Learning
Manifold learning problem setup:
Input: a finite sampling $\{x_1, \dots, x_n\}$ of an m-dimensional manifold $\mathcal{M} \subset \mathbb{R}^d$.
Output: an embedding of the sample into a subset of a lower-dimensional Euclidean space (usually $\mathbb{R}^m$), obtained without any prior knowledge about $\mathcal{M}$.
Background on Manifold Learning
Reconstructing the mapping and attributes of the manifold from a finite dataset falls under the general manifold learning problem. Manifold reconstruction:
1. ISOMAP, Tenenbaum, de Silva, Langford (2000);
2. Locally Linear Embedding (LLE), Roweis, Saul (2000);
3. Laplacian Eigenmaps, Belkin, Niyogi (2002);
4. Hessian Eigenmaps (HLLE), Grimes, Donoho (2003);
5. Local Tangent Space Alignment (LTSA), Zhang, Zha (2003);
6. Semidefinite Embedding (SDE), Weinberger, Saul (2004), Sun, Boyd, Xiao, Diaconis (2004).
Principal Component Analysis
Given n data points $x_1, \dots, x_n \in \mathbb{R}^d$, what is the best approximating linear subspace of dimension m?
Principal Component Analysis
PCA as an optimization problem:
a. objective function (for centered data): $\min_{U \in \mathbb{R}^{d \times m},\ U^\top U = I_m} \sum_{i=1}^n \| x_i - U U^\top x_i \|^2$;
b. the solution is given by the m eigenvectors of the sample covariance matrix associated with its m largest eigenvalues, and the embedded points are $y_i = U^\top x_i$.
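As a concrete illustration of the formulation above, a short NumPy sketch of PCA via the eigendecomposition of the sample covariance; the function name `pca_embed` and the use of dense linear algebra are assumptions made for clarity.

```python
import numpy as np

def pca_embed(X, m):
    """Project the rows of X onto the m-dimensional linear subspace spanned by the
    eigenvectors of the sample covariance with the m largest eigenvalues."""
    Xc = X - X.mean(axis=0)              # center the data
    cov = (Xc.T @ Xc) / len(Xc)          # d x d sample covariance
    evals, evecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    U = evecs[:, ::-1][:, :m]            # top-m principal directions
    return Xc @ U                        # n x m coordinates y_i = U^T (x_i - mean)

Y_pca = pca_embed(X, 2)  # linear 2D representation of the Swiss roll sample above
```

As expected, a linear projection cannot "unroll" the Swiss roll, which motivates the nonlinear methods that follow.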
Manifold Learning and Classification
[Figure: points sampled uniformly on the Swiss roll, split evenly among the classes]
Manifold Learning and Classification
Popular manifold learning algorithms:
[Figure: 2D embeddings of the class-labeled Swiss roll computed by ISOMAP and by Laplacian Eigenmaps]
Laplacian Eigenmaps
Laplacian Eigenmaps: preserving local information (Belkin & Niyogi 2002)
1. Constructing an adjacency graph:
a. compute a k-NN graph on the dataset;
b. compute a similarity/weight matrix W between data points that encodes neighborhood information, e.g., the heat kernel: $W_{ij} = \exp(-\|x_i - x_j\|^2 / t)$ if $x_i$ and $x_j$ are neighbors, and $W_{ij} = 0$ otherwise.
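A minimal sketch of step 1 under the construction above: a symmetric k-NN graph with heat-kernel weights. The neighborhood size k, the kernel width t, and the helper name `heat_kernel_weights` are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def heat_kernel_weights(X, k=10, t=5.0):
    """Symmetric k-NN adjacency with heat-kernel weights W_ij = exp(-||x_i - x_j||^2 / t)."""
    D2 = cdist(X, X, metric="sqeuclidean")       # pairwise squared distances
    n = len(X)
    W = np.zeros((n, n))
    nbrs = np.argsort(D2, axis=1)[:, 1:k + 1]    # k nearest neighbors, skipping self
    for i in range(n):
        W[i, nbrs[i]] = np.exp(-D2[i, nbrs[i]] / t)
    return np.maximum(W, W.T)                    # symmetrize: keep edge if either point is a k-NN of the other

W = heat_kernel_weights(X, k=10, t=5.0)
```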
Laplacian Eigenmaps
2. Manifold learning as an optimization problem:
a. objective function: $\min_Y \tfrac{1}{2} \sum_{i,j} \|y_i - y_j\|^2 W_{ij} = \min_Y \operatorname{tr}(Y^\top L Y)$, where $L = D - W$ is the Graph Laplacian and $D_{ii} = \sum_j W_{ij}$;
b. the embedding is the solution of $\min_{Y :\, Y^\top D Y = I} \operatorname{tr}(Y^\top L Y)$.  (*)
Laplacian Eigenmaps
3. Eigenmaps:
a. the solution to (*) is given by the m generalized eigenvectors $f_1, \dots, f_m$ associated with the m smallest non-zero generalized eigenvalues of $L f = \lambda D f$, or, equivalently, by eigenvectors of the normalized Graph Laplacian $D^{-1/2} L D^{-1/2}$;
b. if $F = [f_1 \cdots f_m]$ is the collection of such eigenvectors, then the embedded points are given by $y_i = (f_1(i), \dots, f_m(i))$, i.e., the i-th row of F.
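A sketch of steps 2-3: solving the generalized eigenproblem $L f = \lambda D f$ with SciPy and reading the embedding off the eigenvectors with the smallest non-zero eigenvalues. Dense matrices are used for readability; a practical implementation would typically use sparse eigensolvers.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(W, m):
    """Embed the graph with weight matrix W into R^m using the m generalized
    eigenvectors of L f = lambda D f with the smallest non-zero eigenvalues."""
    D = np.diag(W.sum(axis=1))    # degree matrix
    L = D - W                     # graph Laplacian
    evals, evecs = eigh(L, D)     # generalized symmetric eigenproblem, ascending order
    F = evecs[:, 1:m + 1]         # drop the trivial constant eigenvector
    return F                      # row i is the embedded point y_i

Y = laplacian_eigenmaps(W, m=2)   # nonlinear 2D embedding of the Swiss roll sample
```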
Manifold Learning and Classification
Adding class-dependent constraints: virtual class vertices.
Manifold Learning and Classification
1. If C is the class membership matrix (i.e., $c_{ij} = 1$ if point j is from class i), define the objective function $J(Y, Z) = \sum_{i,j} \|y_i - y_j\|^2 W_{ij} + \beta \sum_{k,i} \|z_k - y_i\|^2 c_{ki}$, where $z_1, \dots, z_{n_c}$ are the virtual class centers and $\beta$ is a regularization parameter.
2. The embedding is now the solution of the generalized eigenvalue problem $\tilde{L} f = \lambda \tilde{D} f$, where $\tilde{L}$ is the Laplacian of the augmented weight matrix $\tilde{W} = \begin{pmatrix} 0 & \beta C \\ \beta C^\top & W \end{pmatrix}$.
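The following sketch shows one way the augmented weight matrix could be assembled under the formulation above; the block layout and the helper name `build_augmented_W` are assumptions made for illustration, not a reference implementation.

```python
import numpy as np

def build_augmented_W(W, C, beta):
    """Couple the n data vertices (weights W) to n_c virtual class vertices
    through the membership matrix C, scaled by the regularization parameter beta."""
    n_c, n = C.shape
    top = np.hstack([np.zeros((n_c, n_c)), beta * C])   # class-class and class-point blocks
    bottom = np.hstack([beta * C.T, W])                  # point-class and point-point blocks
    return np.vstack([top, bottom])                      # (n_c + n) x (n_c + n), symmetric

# Toy 2-class membership matrix for the Swiss roll sample above (illustrative only).
C = np.zeros((2, len(W)))
C[0, : len(W) // 2] = 1
C[1, len(W) // 2 :] = 1
W_aug = build_augmented_W(W, C, beta=1.0)
```

Feeding `W_aug` to the Laplacian Eigenmaps solver above then yields coordinates whose first n_c rows play the role of the virtual class centers $z_k$ and whose remaining rows are the embedded data points $y_i$.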
Manifold Learning and Classification
[Figure: 2D embeddings of the class-labeled Swiss roll computed by ISOMAP, Laplacian Eigenmaps, and Classification Constrained Dimensionality Reduction]
Manifold Learning and Classification
Error rates for a k-NN classifier using dimensionality reduction as a pre-processing step versus operating on the full-dimensional data.
Manifold Learning and Classification
Semi-Supervised Learning on Manifolds
[Figure: Swiss roll data with a few labeled samples among many unlabeled samples]
Manifold Learning and Classification
Algorithm:
1. Compute the constrained embedding of the entire data set, inserting a zero column in C for each unlabeled sample.
Manifold Learning and Classification
2. Fit a (e.g., linear) classifier to the labeled embedded points by minimizing a loss function (e.g., quadratic): $\min_{a, b} \sum_{i \in \text{labeled}} (c_i - a^\top y_i - b)^2$.
3. For an unlabeled point with embedding $y_j$, label it using the fitted (linear) classifier: $\hat{c}_j = \operatorname{sign}(a^\top y_j + b)$.
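A sketch of steps 2-3 assuming two classes encoded as ±1 and a quadratic loss; the least-squares fit and the function names are illustrative, not prescribed by the slides.

```python
import numpy as np

def fit_linear_classifier(Y_labeled, c_labeled):
    """Least-squares fit of an affine decision function on the labeled embedded points;
    c_labeled holds class labels in {-1, +1}."""
    A = np.column_stack([Y_labeled, np.ones(len(Y_labeled))])  # affine features [y_i, 1]
    w, *_ = np.linalg.lstsq(A, c_labeled, rcond=None)          # minimize the quadratic loss
    return w

def classify(Y_unlabeled, w):
    """Label unlabeled embedded points by the sign of the fitted linear function."""
    A = np.column_stack([Y_unlabeled, np.ones(len(Y_unlabeled))])
    return np.sign(A @ w)
```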
Manifold Learning and Classification
[Figure: error rate (%) vs. number of labeled points for k-NN, Laplacian Eigenmaps, and CCDR]
Percentage of errors when labeling the unlabeled samples as a function of the number of labeled points, out of the total number of points on the Swiss roll.
Summary and Ongoing Work
Summary:
1. Preservation of local geometric structure and class label information;
2. Optimization problem with a global minimum;
3. Applicable to both supervised and semi-supervised learning paradigms.
Ongoing work:
- Out-of-sample extension;
- Connections between classification, dimensionality reduction, and dimensionality expansion via kernel methods.