Unsupervised Learning

1 Unsupervised Learning, Course 3: Higher dimensions. Jairo Cugliari, Master ECD

2 From low to high dimension. Density estimation: histograms and KDE. Calibration can be done automatically. But! Let's look at the mean squared error, $E[(\hat{f}_h(x) - f(x))^2]$. Histogram with $p = 1$ and bin width $h$: MISE $\approx C/n^{2/3}$. KDE with $p = 1$: MISE $\approx C/n^{4/5}$. KDE in dimension $p$: MSE $\approx C/n^{4/(4+p)}$. So as $p$ grows, the convergence rate degrades and the estimator becomes less attractive. This is a common behaviour in data analysis: we have just met the curse of dimensionality!
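To make the rate concrete, the sketch below solves $C/n^{4/(4+p)} = \varepsilon$ for $n$, which gives the sample size the KDE rate requires to reach a fixed error level. The constant $C = 1$, the target error $0.1$, and the function name `samples_needed` are assumptions chosen for illustration, not values from the course.

```python
def samples_needed(p, target_mse=0.1, c=1.0):
    """Sample size n at which c * n**(-4 / (4 + p)) equals target_mse.

    c = 1.0 and target_mse = 0.1 are illustrative assumptions.
    """
    return (c / target_mse) ** ((4 + p) / 4)


for p in (1, 2, 5, 10, 20):
    print(f"p = {p:2d}: n ~ {samples_needed(p):.3g}")
```

Under these assumptions the required sample size grows exponentially with $p$, since the exponent $(4+p)/4$ is linear in $p$.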

3 Curse of dimensionality (Bellman, 1961). When p increases, the volume of the space increases so fast that the available data become sparse. The amount of data needed to support a reliable result often grows exponentially with p.

4 High-dimensional spaces. The curse of dimensionality: empty space phenomenon; norm concentration phenomenon. And more funny things: a hypercube looks like a sea urchin (many spiky corners!); hypercube corners collapse towards the center in any projection; the volume of a unit hypersphere tends to zero; the sphere volume concentrates in a thin shell; the tails of a Gaussian get heavier than the central bell. Hopefully, the data convey some information or structure: clusters of data, manifold data. Possible solutions are clustering, dimensionality reduction, ...
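Two of these claims can be checked numerically with closed-form expressions. The sketch below uses the standard formula $V_p = \pi^{p/2} / \Gamma(p/2 + 1)$ for the volume of the unit ball in $\mathbb{R}^p$, and the fact that the fraction of that volume lying within distance $\varepsilon$ of the surface is $1 - (1 - \varepsilon)^p$. The function names and the shell thickness $\varepsilon = 0.05$ are illustrative choices.

```python
import math


def unit_ball_volume(p):
    """Volume of the unit ball in R^p: pi^(p/2) / Gamma(p/2 + 1)."""
    return math.pi ** (p / 2) / math.gamma(p / 2 + 1)


def shell_fraction(p, eps=0.05):
    """Fraction of the unit ball's volume within distance eps of its surface."""
    return 1.0 - (1.0 - eps) ** p


for p in (1, 2, 5, 10, 50, 100):
    print(f"p = {p:3d}  volume = {unit_ball_volume(p):.3e}  "
          f"shell fraction = {shell_fraction(p):.3f}")
```

The volume first increases with $p$ and then decays rapidly towards zero, while the thin outer shell ends up holding almost all of the volume.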

5 Dimensionality reduction. Some notation. Input data: $x_1, x_2, \dots, x_n \in \mathbb{R}^p$. Output data: $f_1, f_2, \dots, f_n \in \mathbb{R}^d$, with $d \ll p$. We want: observations close in $\mathbb{R}^p$ should be close in $\mathbb{R}^d$; observations distant in $\mathbb{R}^p$ should be distant in $\mathbb{R}^d$. We'll try linear methods (PCA, MDS) and nonlinear methods (Isomap, LLE, Laplacian Eigenmaps).

6 PCA (Pearson, 1901; Hotelling, 1933; Karhunen, 1946; Loève). Idea: decorrelate zero-mean data; keep the large-variance axes; fit a plane through the data cloud and project. Representation quality.

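7 Assume the inputs are centered (i.e. $\sum_i x_i = 0$). Given a unit vector $u$ and a point $x$, the length of the projection of $x$ onto $u$ is $x^T u$. Maximize the projected variance: $\frac{1}{n}\sum_i (x_i^T u)^2 = u^T \left(\frac{1}{n}\sum_i x_i x_i^T\right) u$. The inner matrix $G = \frac{1}{n}\sum_i x_i x_i^T$, called the Gram matrix here, is the empirical covariance matrix of the centered inputs. Maximizing $u^T G u$ subject to $\|u\| = 1$ gives the principal eigenvector of $G$.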
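Why the maximizer is an eigenvector can be seen with a Lagrange multiplier. The short derivation below is a standard argument, with $\lambda$ denoting the multiplier (a symbol that does not appear in the slide).

```latex
\begin{aligned}
\mathcal{L}(u, \lambda) &= u^T G u - \lambda\,(u^T u - 1), \\
\nabla_u \mathcal{L} &= 2 G u - 2 \lambda u = 0 \;\Longrightarrow\; G u = \lambda u, \\
u^T G u &= \lambda\, u^T u = \lambda .
\end{aligned}
```

Every stationary point is therefore an eigenvector of $G$, the objective equals the corresponding eigenvalue, and the maximum is reached at the eigenvector with the largest eigenvalue.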

8 To project the data into a $d$-dimensional subspace ($d < p$), we take $u_1, \dots, u_d$, the top $d$ eigenvectors of $G$ (which form an orthogonal basis). The low-dimensional outputs are $y_i = (u_1^T x_i, u_2^T x_i, \dots, u_d^T x_i)^T$. How to interpret PCA. Eigenvectors: principal axes of the maximum-variance subspace. Eigenvalues: variance of the projected inputs along the principal axes. Estimated dimensionality: number of significant (nonnegative) eigenvalues.
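The projection step can be sketched with NumPy as follows. This is a minimal illustration, not the course's implementation: the function name `pca`, the synthetic data, and the use of `numpy.linalg.eigh` for the eigendecomposition are all assumptions made here.

```python
import numpy as np


def pca(X, d):
    """Project the rows of X (n x p) onto the top-d eigenvectors of G.

    Returns (Y, U, eigvals): Y is n x d, U is p x d, and eigvals holds
    all eigenvalues of G sorted in decreasing order.
    """
    Xc = X - X.mean(axis=0)                # center the inputs
    G = Xc.T @ Xc / Xc.shape[0]            # G = (1/n) sum_i x_i x_i^T
    eigvals, eigvecs = np.linalg.eigh(G)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder to decreasing
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U = eigvecs[:, :d]                     # top-d principal axes
    Y = Xc @ U                             # y_i = (u_1^T x_i, ..., u_d^T x_i)
    return Y, U, eigvals


# Illustrative data: a 2-D signal embedded in R^5 plus small noise.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2)) * np.array([3.0, 1.0])
X = Z @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))

Y, U, eigvals = pca(X, d=2)
print("eigenvalues:", eigvals)
print("variance of projected inputs:", Y.var(axis=0))
```

The printed variances of the projected inputs match the two leading eigenvalues, which is the interpretation of the eigenvalues given above; the remaining eigenvalues are close to zero because the data are nearly two-dimensional.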

11 Multidimensional Scaling (MDS). Preserve pairwise distances: project n points into a Euclidean space (e.g. $\mathbb{R}^2$) using only information about the pairwise distances.

12 MDS. Input: a distance matrix. Recall: a square matrix $D$ of order $n$ is a distance matrix if it is symmetric, $d_{ii} = 0$ and $d_{ij} \ge 0$ for $i \ne j$. Aim: find $n$ data points $y_1, \dots, y_n$ in $d$ dimensions such that $\|y_i - y_j\|_2$ is similar to $d_{ij}$. Let $d^{(X)}_{ij}$ be the original distances and $d^{(Y)}_{ij}$ the new ones; then one wants to solve $\min_{y_1, \dots, y_n} \sum_{i=1}^{n} \sum_{j=1}^{n} \left(d^{(X)}_{ij} - d^{(Y)}_{ij}\right)^2$.

13 Metric MDS. Let $\mathbf{1}$ be a vector of ones. Centering matrix: $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$. Let $A$ be a square matrix of order $n$ with $a_{ij} = -\frac{1}{2} d_{ij}^2$. Then we define the double-centered matrix $B = H A H^T$. $B$ is a Gram matrix (symmetric positive semidefinite) iff $D$ is a Euclidean distance matrix.

14 Metric MDS. If $B$ is a Gram matrix, we have $B = (HX)(HX)^T$. Using the SVD of $B$, we have $B = U \Lambda U^T$. The columns of $Y = U \Lambda^{1/2}$ give the coordinates of the Euclidean representation. Algorithm: (1) construct $A$; (2) compute $B = H A H^T$; (3) take the SVD of $B$ to get $B = U \Lambda U^T$; (4) obtain $Y = U \Lambda^{1/2}$.
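The four steps of the algorithm can be sketched in NumPy. This is an illustrative sketch under stated assumptions: the function name `classical_mds` and the test data are made up here, and `numpy.linalg.eigh` is used in place of an SVD, which gives the same factorization when $B$ is symmetric positive semidefinite.

```python
import numpy as np


def classical_mds(D, d):
    """Embed n points in R^d from an n x n matrix D of pairwise distances."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n         # centering matrix H = I - (1/n) 1 1^T
    A = -0.5 * D ** 2                           # a_ij = -d_ij^2 / 2
    B = H @ A @ H.T                             # double-centered matrix
    eigvals, eigvecs = np.linalg.eigh(B)        # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:d]       # indices of the d largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)    # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)     # Y = U Lambda^(1/2), truncated to d columns


# Illustrative check: distances computed from planar points are recovered exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, d=2)
D_Y = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print("largest distance error:", np.abs(D - D_Y).max())
```

The recovered points `Y` generally differ from `X` by a rotation, reflection, and translation, since only the pairwise distances are used.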

15 Metric MDS. Interpreting MDS. Eigenvectors: ordered, scaled, and truncated to yield the low-dimensional embedding. Eigenvalues: measure how each dimension contributes to the dot products. Estimated dimensionality: number of significant (nonnegative) eigenvalues.

17 Nonlinear structure

18 Graph-based methods. Tenenbaum et al.'s Isomap algorithm: global approach; preserves global pairwise distances. Roweis and Saul's Locally Linear Embedding (LLE) algorithm: local approach; nearby points should map nearby. Belkin and Niyogi's Laplacian Eigenmaps algorithm: local approach; minimizes approximately the same value as LLE.

19 Isomap algorithm: (1) compute the k-nearest-neighbour graph; (2) obtain the shortest paths through the graph, which approximate the geodesic distances; (3) apply MDS to the geodesic distances.
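The three steps can be sketched in NumPy as below. This is a minimal sketch, not a reference implementation: the function name `isomap`, the symmetrised neighbour graph, the Floyd-Warshall shortest-path step (cubic in $n$, so only suitable for small examples), and the arc-shaped test data are all assumptions made for illustration.

```python
import numpy as np


def isomap(X, k, d):
    """Embed the rows of X in R^d using graph geodesic distances."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Step 1: k-nearest-neighbour graph, symmetrised; inf means "no edge".
    W = np.full((n, n), np.inf)
    np.fill_diagonal(W, 0.0)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    W[rows, cols] = D[rows, cols]
    W[cols, rows] = D[rows, cols]

    # Step 2: all-pairs shortest paths (Floyd-Warshall).
    for m in range(n):
        W = np.minimum(W, W[:, [m]] + W[[m], :])
    if np.isinf(W).any():
        raise ValueError("neighbourhood graph is disconnected; increase k")

    # Step 3: classical MDS on the geodesic distances.
    H = np.eye(n) - np.ones((n, n)) / n
    B = H @ (-0.5 * W ** 2) @ H
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:d]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


# Illustrative data: points along three quarters of a circle in R^2.
t = np.linspace(0.0, 1.5 * np.pi, 60)
X = np.column_stack([np.cos(t), np.sin(t)])
Y = isomap(X, k=4, d=1)
print("|corr(embedding, arc parameter)| =", abs(np.corrcoef(Y[:, 0], t)[0, 1]))
```

On this example the one-dimensional embedding is almost perfectly correlated with the position along the arc, which a linear projection cannot achieve because the two ends of the arc are close in Euclidean distance.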

20 Nonlinear structure
