Learning sets and subspaces: a spectral approach

1 Learning sets and subspaces: a spectral approach
Alessandro Rudi, DIBRIS, Università di Genova
Optimization and dynamical processes in Statistical learning and inverse problems, Sept 8-12, 2014

2 A world of data
Data production has grown exponentially in almost every field: science, engineering, economics, social media.
How can we interpret the data automatically?

3 Machine Learning
[Diagram: observation of the population produces the data; a learning system maps the data to an estimated property, and learning theory relates the estimated property to the population property.]

4 Summary
Intro, Set Learning, Subspace Learning, Set Learning again, Conclusions

5 Set Learning
Data are represented as points in a suitable space.
Goal: estimate the region of the space where the population lives.
[Diagram: a set learning algorithm maps the data to an estimated region; set learning theory relates the estimated region to the region of the population.]

6 Applications
Anomaly, novelty and fault detection; one-class learning; shape reconstruction; dimensionality reduction and representation learning.

7 Some algorithms
Principal Component Analysis (1901) (1)
Convex Hull Estimator (1979) (2)
Devroye-Wise Estimator (1980) (2)
1. Pearson, On lines and planes of closest fit to systems of points in space, 1901. Hotelling, Analysis of a complex of statistical variables into principal components, 1933. Jolliffe, Principal component analysis, 2005 (for reference).
2. Sager, An iterative method for estimating a multivariate mode and isopleth, 1979. Devroye, Wise, Detection of abnormal behavior via nonparametric estimation of the support, 1980.
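The Devroye-Wise estimator listed above has a particularly simple form: the estimated support is the union of closed balls of radius $\varepsilon$ centred at the sample points. A minimal Python sketch follows; the function name `devroye_wise_contains`, the toy sample and the radius value are illustrative assumptions, not taken from the slides.

```python
import math

def devroye_wise_contains(sample, z, eps):
    """Devroye-Wise support estimate: the union of closed balls of radius eps
    centred at the sample points. Returns True if z lies in that union."""
    return any(math.dist(x, z) <= eps for x in sample)

# Illustrative sample: 11 equispaced points on the segment [0, 1] x {0}.
sample = [(i / 10, 0.0) for i in range(11)]
eps = 0.1  # hypothetical radius; in theory eps shrinks as the sample size grows

print(devroye_wise_contains(sample, (0.55, 0.0), eps))  # True
print(devroye_wise_contains(sample, (0.5, 0.5), eps))   # False
```

The radius plays the role of a regularization parameter: too small and the estimate fragments into isolated balls, too large and it overflows the true support.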

8 Contribution: Unified Approach
Data are points in a space $Z$.
The population is represented by a probability distribution $\rho$ on $Z$.
$\mathcal{C}$ is a collection of subsets of $Z$.
$C_\rho$ is the smallest set in $\mathcal{C}$ where the population lives.

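9 Contribution: Unified Approach
$\mathbf{z}_n = (z_1, \dots, z_n)$ is the dataset (with $z_i$ drawn i.i.d. from $\rho$).
$d$ is a distance on sets.
$\hat{C}_n$ is the region estimated from the dataset.
The algorithm is able to learn the sets when $d(\hat{C}_n, C_\rho) \to 0$ as $n \to \infty$.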

10 Unified approach: target class
$C_\rho$ is well defined for any distribution $\rho$ when:
- $Z$ is in $\mathcal{C}$;
- all the sets of $\mathcal{C}$ are closed;
- $\mathcal{C}$ is closed under intersection.
In that case $\mathcal{C}$ is called a target class.

11 Unified approach: target class (examples)
Collection of linear subspaces of $\mathbb{R}^d$
Collection of closed convex subsets of $\mathbb{R}^d$
Collection of closed subsets of a metric space

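12 Unified approach: target classes induced by functions
Let $F$ be a set of continuous functions on $Z$. The collection $\mathcal{C}_F$ constituted by sets of the form $\{z \in Z : f(z) = 0 \text{ for all } f \in G\}$, with $G \subseteq F$, is a target class.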

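13 Unified approach: kernel target class
Let $\Phi : Z \to \mathcal{H}$ be an injective feature map.
Let $K(z, z') = \langle \Phi(z), \Phi(z') \rangle_{\mathcal{H}}$ be the kernel function.
Let $F$ be the set of all the linear combinations of vectors of the form $K(x, \cdot)$, with $x \in Z$.
The collection $\mathcal{C}_F$ induced by $F$ is a target class.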

14 Unified approach: kernel target class (examples)
Linear kernel: class of linear subspaces
Quadratic kernel: quadrics and their intersections
Polynomial kernel: algebraic varieties (order p)
Gaussian kernel: algebraic varieties (any order)
Abel kernel: closed sets
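To see why a quadratic kernel yields quadrics, assume the quadratic kernel is $K(x, y) = (1 + \langle x, y \rangle)^2$ on $\mathbb{R}^2$ (the slide does not give its exact form). Its explicit feature map has six coordinates, and a quadric such as the unit circle is exactly the set of points whose features are orthogonal to a fixed vector $w$, i.e. lie in a linear subspace of the feature space. The sketch below, with hypothetical helper names, checks both facts numerically.

```python
import math

def quad_features(x):
    """Explicit feature map Phi of K(x, y) = (1 + <x, y>)^2 on R^2 (6 features)."""
    x1, x2 = x
    s = math.sqrt(2.0)
    return (1.0, s * x1, s * x2, x1 * x1, x2 * x2, s * x1 * x2)

def quad_kernel(x, y):
    """Assumed quadratic kernel K(x, y) = (1 + <x, y>)^2."""
    return (1.0 + x[0] * y[0] + x[1] * y[1]) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# With this w, <w, Phi(z)> = 1 - z1^2 - z2^2, so the unit circle is the set
# of points whose features lie in the hyperplane orthogonal to w.
w = (1.0, 0.0, 0.0, -1.0, -1.0, 0.0)

def on_circle(z, tol=1e-9):
    """True if Phi(z) is (numerically) orthogonal to w."""
    return abs(dot(w, quad_features(z))) <= tol

x, y = (0.3, -1.2), (2.0, 0.5)
print(abs(quad_kernel(x, y) - dot(quad_features(x), quad_features(y))) < 1e-12)  # True
print(on_circle((0.6, 0.8)), on_circle((0.0, 0.0)))  # True False
```

Any other conic is obtained by changing $w$, and intersections of quadrics correspond to requiring orthogonality to several vectors at once.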

15 Separating property
Each set in $\mathcal{C}$ is represented by a unique linear subspace of $\mathcal{H}$ (separating property).
De Vito, Rosasco, Toigo, Spectral Regularization for Support Estimation

16 Separating Property
Thus (take …). If …, then …, but … because …

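17 Set Learning to Subspace Learning (and back)
[Diagram: set learning is turned into subspace learning in the feature space, solved by PCA, and the estimated subspace is mapped back to an estimated set.]
De Vito, Rosasco, Toigo, Spectral Regularization for Support Estimation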
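The round trip from sets to subspaces and back can be sketched with kernel PCA, in the spirit of the spectral support estimation of De Vito, Rosasco and Toigo: estimate the span of the top $k$ eigenvectors of the uncentred empirical covariance in feature space, then accept a point $z$ when the squared distance of $\Phi(z)$ from that span is at most a threshold $\tau$. This is a minimal sketch under stated assumptions (quadratic kernel $(1 + \langle a, b \rangle)^2$, uncentred covariance, hard truncation at $k$, hypothetical threshold $\tau$ and helper names), not the authors' implementation. Everything is computed from kernel evaluations only.

```python
import numpy as np

def quad_kernel(A, B):
    """Quadratic kernel (1 + <a, b>)^2 between the rows of A and the rows of B."""
    return (1.0 + A @ B.T) ** 2

def fit_truncated_kpca(X, k, kernel):
    """Top-k eigenpairs (s_l, u_l) of the uncentred kernel matrix K_ij = K(x_i, x_j)."""
    s, U = np.linalg.eigh(kernel(X, X))   # eigenvalues in ascending order
    top = np.argsort(s)[::-1][:k]         # indices of the k largest
    return X, s[top], U[:, top]

def residual(model, Z, kernel):
    """||Phi(z) - P_k Phi(z)||^2 for each row z of Z, where P_k projects onto the
    span of the unit vectors v_l = sum_i U[i, l] Phi(x_i) / sqrt(s_l)."""
    X, s, U = model
    coeffs = (U.T @ kernel(X, Z)) / np.sqrt(s)[:, None]   # <Phi(z_j), v_l>
    self_k = np.array([kernel(z[None, :], z[None, :])[0, 0] for z in Z])
    return self_k - np.sum(coeffs ** 2, axis=0)

# Illustrative data: 50 points on the unit circle, a quadric in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=50)
X = np.column_stack([np.cos(theta), np.sin(theta)])

model = fit_truncated_kpca(X, k=5, kernel=quad_kernel)
Z = np.array([[0.6, 0.8], [0.0, 0.0], [2.0, 0.0]])
r = residual(model, Z, quad_kernel)   # close to 0, 1/3 and 3
tau = 1e-6                            # hypothetical threshold
print(r <= tau)                       # [ True False False]
```

Here $k = 5$ is not arbitrary: the quadratic features of points on the circle span exactly five dimensions, so the truncated estimate coincides with the subspace that represents the circle, and only the point on the circle is accepted.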

18 Summary
Intro, Set Learning, Subspace Learning, Set Learning again, Conclusions

19 Subspace Learning
Goal: estimate the smallest linear subspace where the population lives.
$\mathcal{H}$: a Euclidean or generic Hilbert space; $\rho$: a distribution on $\mathcal{H}$; $S_\rho$: the smallest linear subspace containing the support of $\rho$.
How can we learn $S_\rho$ by using the data?

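20 Spectral Estimator
$C = \mathbb{E}_{x \sim \rho}[x \otimes x]$: covariance of $\rho$.
Empirical estimator: $C_n = \frac{1}{n} \sum_{i=1}^{n} x_i \otimes x_i$.
Truncated estimator: $\hat{S}_k$, the span of the top $k$ eigenvectors of $C_n$.
How fast does the empirical estimator converge? How many eigenvectors $k$ should we select?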
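For data in $\mathbb{R}^d$ the truncated spectral estimator is a few lines of linear algebra. The sketch below assumes the uncentred covariance $C = \mathbb{E}[x x^\top]$, whose range is the smallest subspace containing the support of $\rho$, and uses synthetic data supported on a 2-dimensional subspace of $\mathbb{R}^5$; the function name `truncated_estimator` is hypothetical.

```python
import numpy as np

def truncated_estimator(X, k):
    """Spectrum of C_n = (1/n) sum_i x_i x_i^T and projection onto its top-k eigenvectors."""
    C_n = X.T @ X / X.shape[0]            # uncentred empirical covariance
    evals, evecs = np.linalg.eigh(C_n)    # ascending order
    order = np.argsort(evals)[::-1]       # reorder: largest first
    V = evecs[:, order[:k]]               # top-k eigenvectors, shape (d, k)
    return evals[order], V @ V.T          # sorted spectrum, projection onto span(V)

# Synthetic example: rho supported on a 2-dimensional subspace of R^5.
rng = np.random.default_rng(0)
B, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # orthonormal basis of the target subspace
X = rng.standard_normal((200, 2)) @ B.T           # 200 samples lying in span(B)

evals, P_k = truncated_estimator(X, k=2)
print(np.round(evals, 3))               # two clearly positive eigenvalues, three numerically zero
print(np.linalg.norm(P_k - B @ B.T))    # close to 0: the subspace is recovered
```

In this noise-free toy case the spectrum itself reveals $k$; with real data the eigenvalues of $C_n$ decay gradually, which is why the choice of $k$ calls for the sample complexity analysis.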

21 Metrics on linear subspaces
- Generalized metric (Rudi 2013) [1]
- Reconstruction error, commonly used in the literature [2, 3]
- Metric for support estimation [4]
- Other metrics [5] (gap distance, angular distance, ...)
1. Rudi, Canas, Rosasco, On the sample complexity of subspace learning.
2. Shawe-Taylor et al., On the eigenspectrum of the Gram matrix and the generalization error of kernel-PCA.
3. Blanchard et al., Statistical properties of kernel principal component analysis.
4. De Vito et al., Learning sets with separating kernels.
5. Beer, Topologies on closed and closed convex sets.
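Assuming the generalized metric has the form $d_{\alpha,p}(U, V) = \|(P_U - P_V)\, C^{\alpha}\|_p$, with $P_U, P_V$ orthogonal projections and $\|\cdot\|_p$ the Schatten $p$-norm (our reading of reference 1; treat it as an assumption), two of the listed metrics are special cases: $\alpha = 1/2$, $p = 2$ gives the square root of the reconstruction error $\mathbb{E}\|x - P_V x\|^2$ when $U = S_\rho$, and $\alpha = 0$, $p = \infty$ gives the gap distance $\|P_U - P_V\|$. The numerical check below uses synthetic data and hypothetical helper names.

```python
import numpy as np

def psd_power(C, alpha):
    """C^alpha for a symmetric PSD matrix (tiny negative eigenvalues clipped to 0)."""
    evals, evecs = np.linalg.eigh(C)
    return (evecs * np.clip(evals, 0.0, None) ** alpha) @ evecs.T

def d_metric(P_U, P_V, C, alpha, p):
    """Assumed generalized distance ||(P_U - P_V) C^alpha||_p in the Schatten p-norm."""
    svals = np.linalg.svd((P_U - P_V) @ psd_power(C, alpha), compute_uv=False)
    return svals.max() if p == np.inf else np.sum(svals ** p) ** (1.0 / p)

def projection(V):
    """Orthogonal projection onto the column span of V."""
    Q, _ = np.linalg.qr(V)
    return Q @ Q.T

# Synthetic example: rho is the empirical distribution of 100 points spanning
# a 3-dimensional subspace S of R^4; V is an arbitrary 2-dimensional candidate.
rng = np.random.default_rng(1)
B, _ = np.linalg.qr(rng.standard_normal((4, 3)))
X = rng.standard_normal((100, 3)) @ B.T
C = X.T @ X / len(X)
P_S, P_V = B @ B.T, projection(rng.standard_normal((4, 2)))

recon = np.mean(np.sum((X - X @ P_V) ** 2, axis=1))      # E||x - P_V x||^2
print(recon, d_metric(P_S, P_V, C, alpha=0.5, p=2) ** 2)  # agree up to rounding
print(d_metric(P_S, P_V, C, alpha=0.0, p=np.inf))         # gap distance: 1, since dim V < dim S
```

The identity behind the first check is $\mathbb{E}\|(I - P_V)x\|^2 = \operatorname{Tr}\big((I - P_V) C (I - P_V)\big) = \|(I - P_V) C^{1/2}\|_2^2$, together with $P_S C^{1/2} = C^{1/2}$.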

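22 Main Result
Theorem (Rudi 2013): with probability at least $1 - \delta$, the distance between the truncated estimator and the target subspace is bounded by a quantity that depends on the $k$-th eigenvalue of $C$ and on a generalization of the intrinsic dimension.
Rudi, Canas, Rosasco, On the sample complexity of subspace learning.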

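23 Learning rates (general metric)
Corollary (Rudi 2013): when the eigenvalues of $C$ decay polynomially with order $r$ and the number of components $k$ is chosen suitably as a function of $n$, then with probability at least $1 - \delta$ the estimator attains a polynomial learning rate in $n$.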

24 Learning rates (PCA, kPCA)
[Plot: distance vs. k, the number of components.]

25 Rates comparison on PCA, kPCA
[Table: learning rates given by references 1, 2 and 3 as functions of r, the eigenvalue decay order of C.]
Note that 1 makes stronger assumptions (its best case is shown; its worst case is equal to 3).
1. Blanchard et al., Statistical properties of kernel principal component analysis.
2. Rudi, Canas, Rosasco, On the sample complexity of subspace learning.
3. Shawe-Taylor et al., On the eigenspectrum of the Gram matrix and the generalization error of kernel-PCA.

26 Summary
Intro, Set Learning, Subspace Learning, Set Learning again, Conclusions

27 Learning rates (SSE metric on subspaces)
[Plots. Left: upper bounds on the distance w.r.t. the number of components. Right: learning rate, with c depending on the eigenvalue decay order r of C.]
1. Rudi, Canas, Rosasco, On the sample complexity of subspace learning.
2. De Vito, Rosasco, Toigo, Spectral regularization for support estimation.

28 Experiments on real datasets

29 Experiments on real datasets

30 Conclusions
Contributions:
- A wide family of set learning problems has been parametrized by a kernel and cast as subspace learning.
- Study of the subspace learning problem on arbitrary Hilbert spaces, with new and sharper learning rates valid for a wide family of metrics.
Future work:
- Generalize the theory to similar geometrical problems, such as clustering.
- Analyze geometrical problems of unsupervised learning in a unified way.
Thank you!

31 Publications
Book chapters:
- Rudi, Canas, De Vito, Rosasco. Learning Sets and Subspaces. In Regularization, Optimization, Kernels, and Support Vector Machines, Chapman & Hall/CRC Machine Learning and Pattern Recognition Series, 2014.
International journals:
- Rudi, Odone, De Vito. Geometrical and computational aspects of spectral support estimation for novelty detection. Pattern Recognition Letters.
International conferences:
- Rudi, Canas, Rosasco. On the sample complexity of subspace learning. Neural Information Processing Systems (NIPS).
- Rudi, Canas, Rosasco. Subspace learning and empirical operator estimation. Regularization, Optimization, Kernels and SVMs (ROKS).
- Rudi, Chiusano, Verri. Adaptive optimization for cross validation. European Symposium on Artificial Neural Networks (ESANN).
