Convergence of Eigenspaces in Kernel Principal Component Analysis

Convergence of Eigenspaces in Kernel Principal Component Analysis
Shixin Wang, Advanced Machine Learning, April 19, 2016

Outline
1. Motivation
2. Result
3. Summary

Motivation

Kernel Principal Component Analysis
PCA: find the most relevant lower-dimensional projection of the data.
KPCA: extend PCA to data mapped into a kernel feature space.
Assumption: the target dimensionality D of the projected data is fixed.
Objective: find the span $S_D$ of the first D eigenvectors of the covariance operator.
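For reference, a minimal kernel PCA sketch (my illustration, not from the talk): form the Gram matrix, center it in feature space, and keep the top D eigenvectors of the scaled centered Gram matrix, which span the empirical D-dimensional eigenspace. The names rbf_kernel and kpca_topD are mine; centering is included here as in standard KPCA, whereas the bounds below are stated for the uncentered covariance operator.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kpca_topD(X, D, gamma=1.0):
    """Top-D empirical KPCA eigenvalues and dual coefficients (one column per direction)."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Center in feature space: Kc = (I - 11^T/n) K (I - 11^T/n).
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    # The eigenvalues of Kc / n coincide with the nonzero eigenvalues of the
    # empirical covariance operator of the centered features.
    evals, evecs = np.linalg.eigh(Kc / n)
    order = np.argsort(evals)[::-1][:D]
    lam, alpha = evals[order], evecs[:, order]
    # Rescale so the feature-space eigenvectors v_j = sum_i alpha[i, j] phi_c(x_i)
    # have unit norm: ||v_j||^2 = n * lam_j * ||alpha_j||^2.
    alpha = alpha / np.sqrt(n * lam)
    return lam, alpha

# Usage: coordinates of the sample points in the empirical D-dimensional eigenspace.
X = np.random.default_rng(0).standard_normal((200, 3))
lam, alpha = kpca_topD(X, D=2, gamma=0.5)
J = np.eye(len(X)) - np.ones((len(X), len(X))) / len(X)
scores = (J @ rbf_kernel(X, 0.5) @ J) @ alpha
```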

How reliable is our estimate?
Problem: the true covariance operator is not known and has to be estimated from the available data.
Question: how close is the D-eigenspace $\hat S_D$ of the empirical covariance operator to the D-eigenspace $S_D$ of the true covariance operator?
1. The average reconstruction error of $\hat S_D$ converges to the reconstruction error of $S_D$ (Blanchard et al. 2004, Shawe-Taylor 2005).
2. But this does not mean that $\hat S_D$ converges to $S_D$, since different subspaces can have very similar reconstruction errors.
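A toy example of point 2 (mine, not from the slides): when the top eigenvalues are nearly equal, very different subspaces reconstruct almost equally well. Take

$$C = \operatorname{diag}(1,\; 1 - \varepsilon,\; 0.1), \qquad S_1 = \operatorname{span}\{e_1\}, \qquad S_1' = \operatorname{span}\{e_2\}.$$

The average reconstruction errors are $1.1 - \varepsilon$ for $S_1$ and $1.1$ for $S_1'$, so they differ only by $\varepsilon$, yet the two subspaces are orthogonal and $P_{S_1}$, $P_{S_1'}$ are as far apart as two rank-one projectors can be. With a small sample, $\hat S_1$ can land close to either one.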

Why analyze the behavior of $\hat S_D$?
PCA or KPCA is often used merely as a preprocessing step, so the behavior of $\hat S_D$ itself matters, not just its reconstruction error.
We want to reuse the estimated eigenvectors on future data, so we need the projections $\hat x_i$ to converge to the true ones, rather than only the norms $\|x_i - \hat x_i\|$.

Result

Notation
$X$: variable of interest, taking values in a measurable space $\mathcal{X}$, with distribution $P$.
$\varphi(x) = k(x, \cdot)$: feature map of $x \in \mathcal{X}$ into a reproducing kernel Hilbert space $\mathcal{H}$.
$D$: target dimensionality of the projected data.
$C$: covariance operator of the variable $\varphi(X)$.
$\lambda_1 > \lambda_2 > \dots$: the simple nonzero eigenvalues of $C$.
$\phi_1, \phi_2, \dots$: the associated eigenvectors.
$C_n$: empirical covariance operator.

Notation (continued)
$S_D = \operatorname{span}\{\phi_1, \dots, \phi_D\}$: the D-dimensional subspace of $\mathcal{H}$ on which the projection of $\varphi(X)$ has maximum average squared norm.
$\hat S_D$: the subspace spanned by the first D eigenvectors of $C_n$.
$P_{S_D}$: the orthogonal projector onto $S_D$.
$P_{\hat S_D}$: the orthogonal projector onto $\hat S_D$.
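Spelling out the covariance operators in this notation (these expressions are used implicitly in the proof of Lemma 1 below), with $X_1, \dots, X_n$ an i.i.d. sample from $P$ and $u \otimes v$ the rank-one operator $w \mapsto \langle v, w \rangle u$:

$$C = \mathbb{E}\big[\varphi(X) \otimes \varphi(X)\big], \qquad C_n = \frac{1}{n} \sum_{i=1}^{n} \varphi(X_i) \otimes \varphi(X_i).$$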

First bound
Two steps to obtain the first bound:
1. A non-asymptotic bound on the (Hilbert-Schmidt) norm of the difference between the empirical and the true covariance operators.
2. An operator perturbation result bounding the difference between the spectral projectors of two operators by the norm of their difference.
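To make step 2 concrete, perturbation bounds of this type (Davis-Kahan style) control the distance between spectral projectors by the size of the perturbation divided by the eigenvalue gap; schematically, and with constants that depend on the exact formulation (this is an illustration, not the theorem proved in the talk),

$$\big\| P_{\hat S_D} - P_{S_D} \big\| \;\lesssim\; \frac{\|C_n - C\|}{\lambda_D - \lambda_{D+1}},$$

provided $\|C_n - C\|$ is small relative to the gap $\lambda_D - \lambda_{D+1}$. This is where the inverse-gap factor mentioned in the Summary comes from.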

Lemma 1
Lemma. Suppose that $\sup_{x \in \mathcal{X}} k(x, x) \le M$. Then, with probability greater than $1 - e^{-\xi}$,

$$\|C_n - C\| \le \frac{2M}{\sqrt{n}} \left(1 + \sqrt{\frac{\xi}{2}}\right).$$
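For a sense of scale (numbers chosen here for illustration, not from the talk): with $M = 1$, $n = 1000$ and $\xi = \log 100 \approx 4.6$ (confidence level $1 - e^{-\xi} = 0.99$), the bound evaluates to

$$\|C_n - C\| \le \frac{2}{\sqrt{1000}} \left(1 + \sqrt{\frac{4.6}{2}}\right) \approx 0.063 \times 2.52 \approx 0.16.$$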

Proof of Lemma 1
Proof. The proof uses the bounded differences (McDiarmid) inequality.

Theorem (Bounded Differences Inequality). Suppose that $X_1, \dots, X_n \in \mathcal{X}$ are independent and $f : \mathcal{X}^n \to \mathbb{R}$. Let $c_1, \dots, c_n$ satisfy

$$\sup_{x_1, \dots, x_n,\, x_i'} \big| f(x_1, \dots, x_n) - f(x_1, \dots, x_{i-1}, x_i', x_{i+1}, \dots, x_n) \big| \le c_i, \qquad i = 1, \dots, n.$$

Then

$$\mathbb{P}\big(f - \mathbb{E}[f] \ge t\big) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^{n} c_i^2}\right).$$

Proof of Lemma 1 (continued)
Apply the inequality to $f(X_1, \dots, X_n) = \|C_n - C\|$, where

$$C_n - C = \frac{1}{n} \sum_{i=1}^{n} C_{X_i} - \mathbb{E}[C_X], \qquad C_X = \varphi(X) \otimes \varphi(X), \qquad \|C_X\| = k(X, X) \le M.$$

Replacing one sample changes $f$ by at most $c_i = \frac{2M}{n}$. Taking $t = 2M \sqrt{\frac{\xi}{2n}}$, we get

$$\mathbb{P}\left( \|C_n - C\| - \mathbb{E}\big[\|C_n - C\|\big] \ge 2M \sqrt{\frac{\xi}{2n}} \right) \le \exp\left(-\frac{2 t^2}{\sum_{i=1}^{n} c_i^2}\right) = \exp\left(-\frac{4 M^2 \xi / n}{4 M^2 / n}\right) = e^{-\xi}.$$

Combining this with $\mathbb{E}\big[\|C_n - C\|\big] \le \frac{2M}{\sqrt{n}}$ (by Jensen's inequality, since $\|C_X - \mathbb{E}[C_X]\| \le 2M$) gives the bound of Lemma 1.
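Lemma 1 can be sanity-checked numerically when $C$ is known in closed form. The sketch below (my illustration, not part of the talk) takes the linear kernel $k(x, y) = \langle x, y \rangle$ with $X$ uniform on the unit sphere in $\mathbb{R}^d$, so that $M = 1$ and $C = \mathbb{E}[XX^\top] = I/d$, and compares the Hilbert-Schmidt (Frobenius) norm of $C_n - C$ with the bound over repeated draws.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n, n_trials = 5, 1000, 200
xi = np.log(100)                         # target confidence 1 - e^{-xi} = 0.99
M = 1.0                                  # sup_x k(x, x) = 1 on the unit sphere
bound = 2 * M / np.sqrt(n) * (1 + np.sqrt(xi / 2))

C = np.eye(d) / d                        # true covariance of X uniform on the sphere
violations = 0
for _ in range(n_trials):
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # sample uniformly on the sphere
    Cn = X.T @ X / n                     # empirical (uncentered) covariance
    if np.linalg.norm(Cn - C, ord="fro") > bound:
        violations += 1

print(f"bound = {bound:.3f}, violation rate = {violations / n_trials:.3f} "
      f"(should not exceed e^-xi = {np.exp(-xi):.3f})")
```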

Theorem 2
(The statement of Theorem 2 was shown as a displayed formula on the slide and is not captured in this transcription.)

Improved bound
(The improved bound was presented on two slides whose content is not captured in this transcription.)

Summary

Provides finite-sample confidence bounds for the eigenspaces of kernel PCA.
Proves a bound in which the complexity factor for estimating the eigenspace $S_D$ by its empirical counterpart depends only on the inverse gap between the $D$-th and $(D+1)$-th eigenvalues.
Restriction: the eigenvalues are assumed to be simple.