Dimension Reduction and Low-dimensional Embedding


Dimension Reduction and Low-dimensional Embedding
Ying Wu
Electrical Engineering and Computer Science
Northwestern University, Evanston, IL 60208
http://www.eecs.northwestern.edu/~yingwu

Dimension Reduction
High-dimensional raw data:
- Difficult to visualize
- Difficult to find useful and meaningful information
- Uncertainties vary for different features
- Features may be correlated
Low-dimensional structures:
- The structures can actually be simple and linear
- They can also be complicated and nonlinear
Can we project the data onto a low-dimensional space?

What to Preserve?
Dimension reduction incurs information loss. So, what do we want to preserve? This is critical.
How do we go from high-dim to low-dim? Linear vs. nonlinear.

Outline
- Principal Component Analysis (PCA)
- Metric Multidimensional Scaling (MDS)
- Isometric Feature Mapping (ISOMAP)
- Locally Linear Embedding (LLE)

PCA Revisit
Learning linear principal components from $\{x_1, \ldots, x_N\}$:
1. calculate the mean $m = \frac{1}{N}\sum_{k=1}^{N} x_k$
2. center the data: $A = [x_1 - m, \ldots, x_N - m]$
3. calculate the scatter matrix $S = \sum_{k=1}^{N}(x_k - m)(x_k - m)^T = AA^T$
4. eigenvalue decomposition: $S = U^T \Sigma U$
5. sort the eigenvalues $\lambda_i$ and eigenvectors $e_i$
6. find the bases $W = [e_1, e_2, \ldots, e_m]$
Note: the principal components of $x$ are $y = W^T(x - m)$, where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$.
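A minimal NumPy sketch of these six steps (the function name and interface are my own, not from the lecture):

```python
import numpy as np

def pca(X, m_dims):
    """PCA via the scatter matrix. X is (n, N): columns are the N samples x_k."""
    mean = X.mean(axis=1, keepdims=True)      # 1. m = (1/N) sum_k x_k
    A = X - mean                              # 2. centering: A = [x_1 - m, ..., x_N - m]
    S = A @ A.T                               # 3. scatter matrix S = A A^T
    eigvals, eigvecs = np.linalg.eigh(S)      # 4. eigenvalue decomposition (S is symmetric)
    order = np.argsort(eigvals)[::-1]         # 5. sort eigenvalues/eigenvectors, largest first
    W = eigvecs[:, order[:m_dims]]            # 6. bases W = [e_1, ..., e_m]
    Y = W.T @ A                               # components y = W^T (x - m)
    return Y, W, mean
```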

PCA: Preserve the Variance
We have a linear projection of $x$ onto a 1-D subspace, $y = w^T x$. The first principal component of $x$ is the direction for which the variance of the projection $y$ is maximized (we need to constrain $w$ to be a unit vector), so we have the following optimization problem:
$$\max_w J(w) = E\{y^2\} = E\{(w^T x)^2\} = w^T S w, \quad \text{s.t. } w^T w = 1.$$
The sorted eigenvalues of $S$ are $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$, with eigenvectors $\{e_1, \ldots, e_n\}$. It is clear that the first PC is $y_1 = e_1^T x$. This generalizes to $m$ PCs (where $m < n$).
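The step from the constrained problem to the eigenvectors is the standard Lagrangian argument (not spelled out on the slide): with $L(w, \lambda) = w^T S w - \lambda(w^T w - 1)$, setting $\partial L / \partial w = 2Sw - 2\lambda w = 0$ gives $Sw = \lambda w$, so every stationary point is an eigenvector of $S$, and the attained objective is $J(w) = w^T S w = \lambda$. It is therefore maximized by the largest eigenvalue, i.e., $w = e_1$.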

Outline
- Principal Component Analysis (PCA)
- Metric Multidimensional Scaling (MDS)
- Isometric Feature Mapping (ISOMAP)
- Locally Linear Embedding (LLE)

Formulation
We have a set of samples $\{x_1, \ldots, x_n\}$ in a high-dim space, and we know their dissimilarity, i.e., the pair-wise distances $d_{ij} = \mathrm{dist}(x_i, x_j)$. We want to find their projections $\{y_1, \ldots, y_n\}$ in a low-dim linear subspace in which the dissimilarity is preserved, $\delta_{ij} = \mathrm{dist}(y_i, y_j)$. In other words, we want to reconstruct the configuration of this set of points in a low-dim space. If we use the Euclidean distance, we have $d_{ij}^2 = (x_i - x_j)^T (x_i - x_j)$.

Prerequisite: Centering Matrix
In $\mathbb{R}^n$, denote by $\mathbf{1} = [1, \ldots, 1]^T \in \mathbb{R}^{n \times 1}$, and let $H = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$, where $I_n$ is the identity matrix of size $n$. Based on the centering matrix $H$, given a vector $x \in \mathbb{R}^n$,
$$Hx = x - \Big(\frac{1}{n}\mathbf{1}^T x\Big)\mathbf{1}.$$
It is easy to see that $\frac{1}{n}\mathbf{1}^T x$ is the mean of the vector. Its use is to make matrix manipulations easier.
Suppose we have $X = [x_1, \ldots, x_n]^T$. Discuss the effects of, and the difference between, $HX$ and $XH$. So, what is $(HX)(HX)^T$?
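A tiny NumPy check of these facts (a toy example of my own, not from the slides):

```python
import numpy as np

n, d = 5, 3
X = np.random.randn(n, d)               # X = [x_1, ..., x_n]^T, rows are samples
H = np.eye(n) - np.ones((n, n)) / n     # H = I_n - (1/n) 1 1^T

HX = H @ X                              # subtracts the mean sample from every row
assert np.allclose(HX, X - X.mean(axis=0))

B = HX @ HX.T                           # (HX)(HX)^T: Gram matrix of the centered samples
assert np.allclose(B, H @ (X @ X.T) @ H)
```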

Classical Scaling Algorithm
Suppose we have $X = [x_1, \ldots, x_n]^T \in \mathbb{R}^{n \times d}$. Given $\delta_{ij}$, construct the dissimilarity matrix $A_{n \times n} = \{-\frac{1}{2}\delta_{ij}^2\}$, and form $B_{n \times n} = HAH$. What does $B$ mean? (Exercise: prove that $b_{ij} = (x_i - \bar{x})^T(x_j - \bar{x})$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, i.e., $B = (HX)(HX)^T$.)
Perform EVD on $B$: $B = U\Sigma U^T$. If $d < n$, $B$ has $n - d$ zero eigenvalues. Ordering the eigenvalues as $\lambda_1 \ge \ldots \ge \lambda_n$, we have $B = U_d \Sigma_d U_d^T$. If we use the $k$ largest eigenvalues, we obtain the $k$-dim reconstruction $Y = U_k \Sigma_k^{1/2}$.
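A minimal NumPy sketch of classical scaling (function and variable names are mine; clipping tiny negative eigenvalues is a practical guard, not part of the slide):

```python
import numpy as np

def classical_mds(D, k):
    """Classical scaling. D: (n, n) pairwise Euclidean distances. Returns (n, k) coordinates."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    A = -0.5 * D**2                       # A = { -1/2 * delta_ij^2 }
    B = H @ A @ H                         # double centering: B = HAH
    eigvals, eigvecs = np.linalg.eigh(B)  # EVD of the symmetric matrix B
    order = np.argsort(eigvals)[::-1]     # largest eigenvalues first
    lam = np.clip(eigvals[order[:k]], 0.0, None)
    U_k = eigvecs[:, order[:k]]
    return U_k * np.sqrt(lam)             # Y = U_k Sigma_k^{1/2}
```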

Relation to PCA
It is clear that $B = (HX)(HX)^T$ is the Gram (scatter) matrix of the centered data. Let's see what PCA does. In PCA, we have the covariance matrix $S = X^T H X$. Actually, $B$ and $S$ are dual: suppose $Sv = \lambda v$; then (taking $X$ to be centered) $BXv = \lambda Xv$, i.e., $u = Xv$ is an eigenvector of $B$.
In PCA, the low-dimensional projection is $Y = XV_k$. This is $U_k$ (before normalization). This is exactly what we had in MDS!
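To verify the duality claim (my own derivation, assuming $X$ has already been centered so that $HX = X$, $S = X^T X$ and $B = XX^T$): if $Sv = \lambda v$, then $B(Xv) = XX^T Xv = X(Sv) = \lambda(Xv)$, so $u = Xv$ is an eigenvector of $B$ with the same eigenvalue; moreover $\|Xv\|^2 = v^T S v = \lambda$, which is exactly the normalization by $\Sigma^{1/2}$ relating $XV_k$ to $U_k$.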

Outline
- Principal Component Analysis (PCA)
- Metric Multidimensional Scaling (MDS)
- Isometric Feature Mapping (ISOMAP)
- Locally Linear Embedding (LLE)

Motivation: Nonlinear Intrinsic Structures
Both PCA and MDS are linear embeddings. What if the intrinsic structure is nonlinear?

From Euclidean Distance to Geodesic Distance
Sometimes Euclidean distance does not make sense: if two points lie on a nonlinear surface, their Euclidean distance can be small even though they are far apart along the surface (e.g., in Figure A). We need to consider the geodesic distance instead (Figure B). We want to unfold the nonlinear surface while preserving the geodesic distances.

Computing Geodesic Distance
This is the most important step in ISOMAP.
- Given $x_i$, find its close neighboring points (based on Euclidean distance); Euclidean distance approximates geodesic distance for neighboring points.
- Construct a weighted graph based on these neighborhood relationships.
- For any two faraway points, find the shortest path connecting them and sum the distances along the path. This can be done efficiently by any shortest-path algorithm.
We end up with a matrix $D_G$, where $D_G(i, j)$ is the geodesic distance between $x_i$ and $x_j$.
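A sketch of this step using off-the-shelf graph routines (the neighborhood size and the helper's name are my choices for illustration):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def geodesic_distances(X, n_neighbors=10):
    """Approximate geodesic distances: k-NN graph + all-pairs shortest paths.
    X: (n_samples, n_features). Entries are inf if the graph is disconnected."""
    G = kneighbors_graph(X, n_neighbors, mode='distance')   # edges weighted by Euclidean distance
    D_G = shortest_path(G, method='D', directed=False)      # Dijkstra over the neighborhood graph
    return D_G
```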

Unfolding the Nonlinear Manifold
Once $D_G$ is obtained, the rest is MDS, i.e., find a low-dim configuration $\{y_1, \ldots, y_n\}$ that preserves these pair-wise geodesic distances. This can be easily done:
- centering: $B = HA_GH$, with $A_G = \{-\frac{1}{2}D_G(i, j)^2\}$ as in classical scaling
- EVD: $B = U_k \Sigma_k U_k^T$
- principal coordinates: $Y = U_k \Sigma_k^{1/2}$

Summary: ISOMAP
S1: construct the neighborhood graph
S2: approximate the geodesic distances to obtain the pair-wise dissimilarity matrix $D_G$
S3: apply MDS to $D_G$
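For reference, the whole ISOMAP pipeline is also available off the shelf; a minimal usage sketch with scikit-learn (the data set and parameter values are just illustrative):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Toy data lying on a nonlinear 2-D manifold embedded in 3-D.
X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# S1-S3 in one call: neighborhood graph, shortest-path geodesics, then MDS.
Y = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(Y.shape)   # (1000, 2)
```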

Example: head pose and lighting

Outline
- Principal Component Analysis (PCA)
- Metric Multidimensional Scaling (MDS)
- Isometric Feature Mapping (ISOMAP)
- Locally Linear Embedding (LLE)

Motivation
Nonlinear low-dimensional intrinsic structure: the structure of a local neighborhood is linear!

What to Preserve
Local linear reconstruction: $\hat{x}_i = \sum_{j \in N(i)} W_{ij} x_j$.
Preserve the local relationship: $y_i = \sum_{j \in N(i)} W_{ij} y_j$.

Computing the Local Combination
Given a set of high-dim vectors $\{x_1, \ldots, x_n\}$, for each $x_i$ find its neighbors $N(i)$. Our goal is
$$W = \arg\min_W \sum_{i=1}^{n} \Big\| x_i - \sum_{j \in N(i)} W_{ij} x_j \Big\|^2, \quad \text{s.t. } W_{ij} = 0 \text{ if } x_j \notin N(i), \;\; \sum_j W_{ij} = 1.$$
Once we find $N(i)$, we can estimate the weights one by one for each $x_i$. This is a constrained least-squares fitting problem.

Weighted Least-squares Fitting
Let's consider $x$ and its $k$-NN, $A = [x_1, \ldots, x_k]$, where all $x_t \in N(x)$. Introduce a local covariance matrix $C$ with $C_{jk} = (x - x_j)^T (x - x_k)$. We can rewrite the reconstruction error for $x$ as
$$e(w) = \|x - Aw\|^2 = \|(x\mathbf{1}^T - A)w\|^2 = w^T C w.$$
Construct the Lagrangian
$$L(w, \lambda) = w^T C w + \lambda(\mathbf{1}^T w - 1).$$
It is easy to see that
$$w = \frac{C^{-1}\mathbf{1}}{\mathbf{1}^T C^{-1}\mathbf{1}},$$
or, to see it clearly, element-wise
$$w_j = \frac{\sum_k C^{-1}_{jk}}{\sum_l \sum_m C^{-1}_{lm}}.$$
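A NumPy sketch of this closed form for a single point (the regularization term is a common practical addition of my own, needed when $C$ is singular, e.g., when $k > d$; it is not on the slide):

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """x: (d,) query point; neighbors: (k, d) its k nearest neighbors.
    Returns the k reconstruction weights, which sum to 1."""
    diffs = x - neighbors                   # rows are x - x_j
    C = diffs @ diffs.T                     # C_jk = (x - x_j)^T (x - x_k)
    C = C + reg * np.trace(C) * np.eye(len(neighbors))   # stabilize if C is singular
    w = np.linalg.solve(C, np.ones(len(neighbors)))      # proportional to C^{-1} 1
    return w / w.sum()                      # enforce 1^T w = 1
```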

Low-dimensional Embedding
Denote the low-dim vectors by $Y = [y_1, \ldots, y_n]$. W.l.o.g., we assume they are centered at 0 and have unit covariance, i.e., $Y\mathbf{1} = 0$, $YY^T = I$. Our reconstruction problem is
$$Y = \arg\min_Y \|Y - YW\|^2 = \arg\min_Y \mathrm{tr}(YMY^T), \quad \text{s.t. } YY^T = I,$$
where $M = (I - W)^T (I - W)$.

Still an EVD Problem!
We have the Lagrangian
$$L(Y, \lambda) = \mathrm{tr}(YMY^T) + \lambda(I - YY^T).$$
The partial derivative w.r.t. $y_i$ is
$$\frac{\partial L(Y, \lambda)}{\partial y_i} = 2(My_i - \lambda y_i),$$
and see what we have here: $My_i = \lambda y_i$. As we are minimizing, we need to use the smallest eigenvalues. Suppose $d$ is the dimension of the low-dim space; we take the $d + 1$ smallest eigenvalues, discard the bottom one (the trivial solution), and keep the remaining $d$ eigenvalues; their corresponding eigenvectors are our low-dim reconstruction!
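A minimal sketch of this last step, assuming the full $n \times n$ weight matrix $W$ has already been assembled from the per-point weights (the function name is mine):

```python
import numpy as np

def lle_embedding(W, d):
    """W: (n, n) LLE weight matrix (rows sum to 1, zero outside each neighborhood).
    Returns a (d, n) embedding from the bottom eigenvectors of M = (I - W)^T (I - W)."""
    n = W.shape[0]
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    eigvals, eigvecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    # Take the d+1 smallest; discard the bottom (trivial, constant) eigenvector.
    return eigvecs[:, 1:d + 1].T
```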

Example (head pose and facial expression)