Unsupervised Learning (Apprentissage non supervisé), Lecture 3: Higher Dimensions. Jairo Cugliari, Master ECD, 2015-2016.

From low to high dimension. Density estimation with histograms and KDE: calibration (e.g. of the bandwidth h) can be done automatically. But let's look at the mean squared error E[(f̂_h(x) − f(x))²]:
- Histogram with p = 1 and a well-calibrated h: MISE ≈ C / n^(2/3)
- KDE with p = 1: MISE ≈ C / n^(4/5)
- KDE in dimension p: MSE ≈ C / n^(4/(4+p))
So when p grows, the estimator becomes less attractive. This is a common behaviour in data analysis: we have just met the curse of dimensionality!
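As a back-of-the-envelope illustration of this rate (a small Python sketch of my own, ignoring the constant C), one can ask how many samples are needed in dimension p to match the KDE error reached with 1000 points in dimension 1:

    # Solve n_p^(-4/(4+p)) = n_1^(-4/5), i.e. n_p = n_1^((4+p)/5),
    # to match the 1-D KDE error in dimension p (constants ignored).
    n_1 = 1000
    for p in (1, 2, 5, 10, 20):
        n_p = n_1 ** ((4 + p) / 5)
        print(f"p = {p:2d}: roughly {n_p:.3g} samples needed")

For p = 10 this already gives on the order of 10^8 samples, which is the curse of dimensionality in concrete terms.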

Curse of dimensionality (Bellman, 1961). When p increases, the volume of the space grows so fast that the available data become sparse. The amount of data needed to support a reliable result often grows exponentially with p.

High-dimensional spaces. The curse of dimensionality:
- empty space phenomenon;
- norm concentration phenomenon;
and more funny things:
- a hypercube looks like a sea urchin (many spiky corners!);
- hypercube corners collapse towards the center in any projection;
- the volume of the unit hypersphere tends to zero;
- the volume of a sphere concentrates in a thin shell;
- the tails of a Gaussian get heavier than the central bell.
Hopefully the data convey some information / structure: clusters of data, manifold data. Possible solutions are clustering, dimensionality reduction, ...
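A quick numerical check of some of these statements (a sketch with numpy; the sample size of 10 000 is arbitrary):

    import math
    import numpy as np

    for p in (2, 5, 10, 20, 50):
        vol = math.pi ** (p / 2) / math.gamma(p / 2 + 1)   # volume of the unit p-ball
        shell = 1 - 0.9 ** p                               # share of that volume at radius > 0.9
        # norm concentration: norms of standard Gaussian samples cluster around sqrt(p)
        norms = np.linalg.norm(np.random.randn(10000, p), axis=1)
        print(f"p={p:2d}  ball volume={vol:.3e}  outer-shell share={shell:.2f}  "
              f"norm mean={norms.mean():.2f}  norm std={norms.std():.2f}")

The ball volume collapses towards zero, almost all of it sits in the outer shell, and the norms of Gaussian samples concentrate around sqrt(p).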

Dimensionality reduction. Some notation:
- Input data: x_1, x_2, ..., x_n ∈ R^p
- Output data: y_1, y_2, ..., y_n ∈ R^d, with d ≤ p
We want:
- observations that are close in R^p to be close in R^d;
- observations that are distant in R^p to be distant in R^d.
We'll try linear methods (PCA, MDS) and nonlinear methods (Isomap, LLE, Laplacian Eigenmaps).

PCA (Pearson, 1901; Hotelling, 1933; Karhunen, 1946; Loève, 1948). Idea:
- decorrelate zero-mean data;
- keep the axes of large variance;
- fit a plane through the data cloud and project onto it;
- assess the representation quality.

Assume the inputs are centered (i.e. Σ_i x_i = 0). Given a unit vector u and a point x, the length of the projection of x onto u is x^T u. We want to maximize the projected variance, which is (1/n) Σ_i (u^T x_i)² = u^T G u, where G = (1/n) Σ_i x_i x_i^T is the sample covariance matrix of the centered inputs. Maximizing u^T G u subject to ||u|| = 1 gives the principal eigenvector of G.

To project the data onto a d-dimensional subspace (d ≤ p) we take u_1, ..., u_d, the top d eigenvectors of G (which form an orthonormal basis). The low-dimensional outputs are y_i = (u_1^T x_i, u_2^T x_i, ..., u_d^T x_i)^T. How to interpret PCA:
- Eigenvectors: principal axes of the maximum-variance subspace.
- Eigenvalues: variance of the projected inputs along the principal axes.
- Estimated dimensionality: number of significant (nonnegative) eigenvalues.
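A minimal numpy sketch of this recipe (the function and variable names are mine, not from the course):

    import numpy as np

    def pca(X, d):
        """Project the rows of X (n x p, one observation per row) onto the top-d principal axes."""
        Xc = X - X.mean(axis=0)                  # center the inputs
        G = (Xc.T @ Xc) / Xc.shape[0]            # G = (1/n) sum_i x_i x_i^T
        eigvals, eigvecs = np.linalg.eigh(G)     # eigh: symmetric matrix, ascending eigenvalues
        order = np.argsort(eigvals)[::-1]        # sort axes by decreasing variance
        U = eigvecs[:, order[:d]]                # top-d eigenvectors u_1, ..., u_d
        return Xc @ U, eigvals[order]            # y_i = (u_1^T x_i, ..., u_d^T x_i)

    # Example: 3-D points lying close to a 2-D plane
    X = np.random.randn(200, 2) @ np.random.randn(2, 3) + 0.01 * np.random.randn(200, 3)
    Y, variances = pca(X, d=2)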

Multidimensional Scaling (MDS). Preserve pairwise distances: project n points into a Euclidean space (e.g. R²) using only information about their pairwise distances. Source: http://www.benfrederickson.com

MDS. Input: a distance matrix. Recall: a square matrix D of order n is a distance matrix if it is symmetric, d_ii = 0 and d_ij ≥ 0 for i ≠ j. Aim: find n data points y_1, ..., y_n in d dimensions such that ||y_i − y_j||_2 is close to d_ij. Let d_ij^(X) be the original distances and d_ij^(Y) the new ones; then one wants to solve
min over y_1, ..., y_n of Σ_{i=1}^{n} Σ_{j=1}^{n} (d_ij^(X) − d_ij^(Y))²
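Written out in code, the objective is just the following (a sketch; pdist/squareform are scipy routines that compute the full pairwise-distance matrices, not part of the original slides):

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    def stress(X, Y):
        """Sum of squared differences between the pairwise distances of the
        original points X (n x p) and the embedded points Y (n x d)."""
        d_X = squareform(pdist(X))
        d_Y = squareform(pdist(Y))
        return np.sum((d_X - d_Y) ** 2)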

Metric MDS. Let 1 be the vector of ones and H = I − (1/n) 1 1^T the centering matrix. Let A be the square matrix of order n with a_ij = −d_ij² / 2. Then we define the double-centered matrix B = H A H^T. B is a Gram matrix (symmetric positive semi-definite) if and only if D is a Euclidean distance matrix.

Metric MDS. If B is a Gram matrix we have B = (HX)(HX)^T. Using the SVD of the symmetric matrix B we have B = U Λ U^T, and the columns of Y = U Λ^(1/2) give the coordinates of the Euclidean representation.
Algorithm:
- Construct A
- Compute B = H A H^T
- Decompose B = U Λ U^T
- Obtain Y = U Λ^(1/2)
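A minimal sketch of this algorithm in numpy (since B is symmetric, its SVD coincides with its eigendecomposition, so eigh is used; negative eigenvalues, which appear when D is not exactly Euclidean, are clipped to zero):

    import numpy as np

    def metric_mds(D, d):
        """Embed an n x n distance matrix D into d dimensions."""
        n = D.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n       # centering matrix H = I - (1/n) 1 1^T
        A = -0.5 * D ** 2                         # a_ij = -d_ij^2 / 2
        B = H @ A @ H                             # double-centered matrix (H is symmetric)
        eigvals, U = np.linalg.eigh(B)
        order = np.argsort(eigvals)[::-1][:d]     # keep the d largest eigenvalues
        lam = np.sqrt(np.maximum(eigvals[order], 0))
        return U[:, order] * lam                  # Y = U Lambda^(1/2)

    # Usage: Y = metric_mds(D, d=2)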

Metric MDS: interpreting MDS.
- Eigenvectors: ordered, scaled, and truncated to yield the low-dimensional embedding.
- Eigenvalues: measure how much each dimension contributes to the dot products.
- Estimated dimensionality: number of significant (nonnegative) eigenvalues.

Nonlinear structure

Graph-based methods.
- Tenenbaum et al.'s Isomap algorithm: global approach; preserves global pairwise distances.
- Roweis and Saul's Locally Linear Embedding (LLE) algorithm: local approach; nearby points should map to nearby points.
- Belkin and Niyogi's Laplacian Eigenmaps algorithm: local approach; minimizes approximately the same quantity as LLE.

Isomap algorithm (a sketch follows):
- compute the k-nearest neighbours of each point;
- obtain the shortest paths through the resulting graph (geodesic distances);
- run MDS on the geodesic distances.
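Putting the three steps together (a rough sketch; it reuses the metric_mds function from the MDS sketch above, uses standard scikit-learn/scipy routines for the graph and shortest paths, and assumes the k-nearest-neighbour graph is connected):

    import numpy as np
    from sklearn.neighbors import kneighbors_graph
    from scipy.sparse.csgraph import shortest_path

    def isomap(X, d, k=10):
        # 1. k-nearest-neighbour graph, edges weighted by Euclidean distance
        knn = kneighbors_graph(X, n_neighbors=k, mode='distance')
        # 2. geodesic distances = shortest paths through the graph (Dijkstra)
        geo = shortest_path(knn, method='D', directed=False)
        # 3. metric MDS on the geodesic distance matrix
        return metric_mds(geo, d)

    # Usage: Y = isomap(X, d=2, k=10)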

Nonlinear structure