Nonlinear Methods. Data often lies on or near a nonlinear low-dimensional curve, a.k.a. a manifold.


Laplacian Eigenmaps

- Linear methods: a lower-dimensional linear projection that preserves distances between all points.
- Laplacian Eigenmaps (key idea): preserve local information only.
  - Construct a graph from the data points (captures local information).
  - Project the points into a low-dimensional space using eigenvectors of the graph.

Step 1 - Graph Construction

Similarity graphs model local neighborhood relations between data points: G(V, E).
- V: vertices (the data points).
- E: edges, built in one of two ways:
  (1) edge between x_i and x_j if ||x_i - x_j|| <= ε (ε-neighborhood graph);
  (2) edge if x_j is among the k nearest neighbors of x_i (k-NN), which yields a directed graph; symmetrize it by either
      - connecting A with B if A → B OR B → A (symmetric kNN graph), or
      - connecting A with B if A → B AND B → A (mutual kNN graph).

[Figure: directed nearest-neighbor graph, symmetric kNN graph, mutual kNN graph]
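As a rough sketch (not from the slides), the two symmetrization rules can be written with scikit-learn's kneighbors_graph; the toy data X and the value of k are placeholders.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Placeholder data and neighborhood size (not taken from the lecture).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
k = 10

# Directed kNN graph: A[i, j] = 1 if x_j is among the k nearest neighbors of x_i.
A = kneighbors_graph(X, n_neighbors=k, mode="connectivity").toarray()

# Symmetric kNN graph: keep an edge if i -> j OR j -> i.
A_symmetric = np.maximum(A, A.T)

# Mutual kNN graph: keep an edge only if i -> j AND j -> i.
A_mutual = np.minimum(A, A.T)
```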

Step 1 - Graph Construction (continued)

Choice of ε and k: chosen so that neighborhoods on the graph represent neighborhoods on the manifold (no "shortcut" edges connecting different arms of the swiss roll). In practice the choice is mostly ad hoc.

Step 1 - Graph Construction (continued)

Weighted graph G(V, E, W):
- V: vertices (the data points).
- E: edges (nearest neighbors).
- W: edge weights, e.g.
  - 1 if connected, 0 otherwise (adjacency graph), or
  - Gaussian kernel similarity function (a.k.a. heat kernel): W_ij = exp(-||x_i - x_j||^2 / (2σ^2)); letting σ^2 → ∞ recovers the 0/1 adjacency graph.
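A minimal sketch of the heat-kernel weighting on a symmetric kNN graph (the data X, the neighborhood size k, and the bandwidth sigma2 are all assumed placeholders):

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # placeholder data
k, sigma2 = 10, 1.0                    # assumed neighborhood size and bandwidth sigma^2

# Symmetric kNN connectivity mask.
A = kneighbors_graph(X, n_neighbors=k, mode="connectivity").toarray()
A = np.maximum(A, A.T)

# Heat-kernel similarities, kept only on graph edges.
W = np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * sigma2)) * A

# As sigma2 grows, every retained weight tends to 1 and W approaches the 0/1 adjacency graph.
```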

Step 2 - Embed using Graph Laplacian

Graph Laplacian (unnormalized version): L = D - W.
- W: weight matrix.
- D: degree matrix = diag(d_1, ..., d_n), with d_i = Σ_j W_ij.
- Note: the all-ones vector 1 satisfies L·1 = 0; if the graph is connected, it is the only eigenvector with eigenvalue 0.
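A tiny worked example of the definition, on a 4-node graph with made-up weights (not from the slides), showing that L annihilates the all-ones vector:

```python
import numpy as np

# Illustrative weight matrix for a small connected graph (values invented for the example).
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])

D = np.diag(W.sum(axis=1))   # degree matrix, d_i = sum_j W_ij
L = D - W                    # unnormalized graph Laplacian

# Each row of L sums to zero, so L @ 1 = 0: the constant vector has eigenvalue 0.
print(np.allclose(L @ np.ones(4), 0.0))   # True
```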

Step 2 - Embed using Graph Laplacian (continued)

Solve the generalized eigenvalue problem L f = λ D f and order the eigenvalues 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_n.
To embed the data points in a d-dimensional space, project them onto the eigenvectors associated with λ_2, λ_3, ..., λ_{d+1}; the 1st eigenvector is ignored because it gives the same embedding for all points.

Original representation: data point x_i (D-dimensional vector).
Transformed representation: projection x_i → (f_2(i), ..., f_{d+1}(i)) (d-dimensional vector).
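Putting Step 1 and Step 2 together in one self-contained sketch (all data and parameter values are placeholders, not the lecture's):

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # placeholder data
k, sigma2, d = 10, 1.0, 2              # assumed hyperparameters and target dimension

# Step 1: symmetric kNN graph with heat-kernel weights.
A = kneighbors_graph(X, n_neighbors=k, mode="connectivity").toarray()
A = np.maximum(A, A.T)
W = np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * sigma2)) * A

# Step 2: unnormalized Laplacian and generalized eigenproblem L f = lambda D f.
D = np.diag(W.sum(axis=1))
L = D - W
eigvals, eigvecs = eigh(L, D)          # eigenvalues come back in ascending order

# Skip the constant first eigenvector; the next d columns are the embedding coordinates.
Y = eigvecs[:, 1:d + 1]                # row i is the d-dimensional embedding of x_i
```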

Understanding Laplacian Eigenmaps

- The "best" projection onto a 1-dim space would put all points in one place (the 1st eigenvector, all 1s), which is why it is discarded.
- If two points are close on the graph, their embeddings are close (their eigenvector values are similar, captured by the smoothness of the eigenvectors).

[Figure: Laplacian eigenvectors of the swiss roll example (for a large number of data points)]

Step 2 - Embed using Graph Laplacian

Justification: points connected on the graph should stay as close as possible after embedding, i.e. minimize

  (1/2) Σ_ij W_ij (f(i) - f(j))^2 = f^T D f - f^T W f = f^T (D - W) f = f^T L f.
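For completeness, the expansion behind the identity above (a standard derivation, reconstructed rather than quoted from the slides):

```latex
\begin{aligned}
\frac{1}{2}\sum_{i,j} W_{ij}\,\bigl(f_i - f_j\bigr)^2
  &= \frac{1}{2}\sum_{i,j} W_{ij}\,\bigl(f_i^2 - 2 f_i f_j + f_j^2\bigr) \\
  &= \sum_i \Bigl(\sum_j W_{ij}\Bigr) f_i^2 \;-\; \sum_{i,j} W_{ij}\, f_i f_j
     \qquad (\text{using } W = W^{\top}) \\
  &= f^{\top} D f - f^{\top} W f
   = f^{\top} (D - W)\, f
   = f^{\top} L f .
\end{aligned}
```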

Step 2 - Embed using Graph Laplacian (continued)

Justification (continued): minimize f^T L f subject to a constraint, f^T D f = 1, that removes the arbitrary scaling factor in the embedding. Lagrangian: wrap the constraint into the objective function; setting the gradient to zero gives the generalized eigenvalue problem L f = λ D f.
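Written out (a reconstruction consistent with the generalized eigenproblem on the previous slide; the constraints follow Belkin & Niyogi's formulation):

```latex
\begin{aligned}
&\min_{f}\; f^{\top} L f
  \quad \text{s.t.} \quad f^{\top} D f = 1, \qquad f^{\top} D \mathbf{1} = 0
  \quad (\text{the second constraint rules out the constant solution}),\\[4pt]
&\mathcal{L}(f,\lambda) \;=\; f^{\top} L f \;-\; \lambda\,\bigl(f^{\top} D f - 1\bigr),\\[2pt]
&\nabla_{f}\,\mathcal{L} \;=\; 2 L f - 2\lambda D f \;=\; 0
  \;\;\Longrightarrow\;\; L f = \lambda\, D f .
\end{aligned}
```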

Example - Unrolling the swiss roll

[Figure: swiss roll embeddings; N = number of nearest neighbors, t = the heat kernel parameter (Belkin & Niyogi '03)]
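A quick way to reproduce a similar unrolling with scikit-learn's Laplacian Eigenmaps implementation (SpectralEmbedding); the sample size and parameters below are illustrative, not those behind the slide's figure:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding

# Sample a swiss roll; `color` is the position along the roll, handy for plotting.
X, color = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)

# SpectralEmbedding builds a kNN affinity graph and embeds with the graph Laplacian's eigenvectors.
embedding = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                              n_neighbors=10, random_state=0)
Y = embedding.fit_transform(X)   # Y is the 2-D "unrolled" representation of the roll
```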

Example - Understanding the syntactic structure of words

- 300 most frequent words of the Brown corpus.
- Each word is represented by information about the frequency of its left and right neighbors (a 600-dimensional space).
- The algorithm was run with N = 14, t = 1.

[Figure: 2-D embedding; labeled regions include verbs and prepositions]

PCA vs. Laplacian Eigenmaps

PCA:
- Linear embedding, based on the largest eigenvectors of the D x D correlation matrix S = XX^T between features.
- Eigenvectors give latent features; to get the embedding of a point, project it onto the latent features.
- x_i (D x 1) → [v_1^T x_i, v_2^T x_i, ..., v_d^T x_i]^T (d x 1).

Laplacian Eigenmaps:
- Nonlinear embedding, based on the smallest eigenvectors of the n x n Laplacian matrix L = D - W between data points.
- Eigenvectors directly give the embedding of the data points.
- x_i (D x 1) → [f_2(i), ..., f_{d+1}(i)]^T (d x 1).

Dimensionality Reduction Methods

Feature selection - only a few features are relevant to the learning task:
- Score features (mutual information, prediction accuracy, domain knowledge).
- Regularization.

Latent features - some linear/nonlinear combination of features provides a more efficient representation than the observed features:
- Linear: low-dimensional linear subspace projection - PCA (Principal Component Analysis), MDS (Multidimensional Scaling), Factor Analysis, ICA (Independent Component Analysis).
- Nonlinear: low-dimensional nonlinear projection that preserves local information along the manifold - Laplacian Eigenmaps, ISOMAP, Kernel PCA, LLE (Locally Linear Embedding), and many, many more.