Reproducing Kernel Hilbert Spaces. Lorenzo Rosasco, 9.520 Class 03, February 9, 2011

About this class Goal: to introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert Spaces (RKHS). We will discuss several perspectives on RKHS. In particular, in this class we investigate the fundamental definition of RKHS as Hilbert spaces with bounded, continuous evaluation functionals, and the intimate connection with symmetric positive definite kernels.

Plan Part I: RKHS are Hilbert spaces with bounded, continuous evaluation functionals. Part II: Reproducing Kernels. Part III: Mercer Theorem. Part IV: Feature Maps. Part V: Representer Theorem.

Regularization The basic idea of regularization (originally introduced independently of the learning problem) is to restore well-posedness of ERM by constraining the hypothesis space $\mathcal{H}$. A possible way to do this is to consider regularized empirical risk minimization, that is, we look for solutions minimizing a two-term functional $\underbrace{ERR(f)}_{\text{empirical error}} + \lambda \underbrace{R(f)}_{\text{regularizer}}$, where the regularization parameter $\lambda$ trades off the two terms.

Tikhonov Regularization Tikhonov regularization amounts to minimizing $\frac{1}{n}\sum_{i=1}^{n} V(f(x_i), y_i) + \lambda R(f), \quad \lambda > 0 \qquad (1)$ where $V(f(x), y)$ is the loss function, that is, the price we pay when we predict $f(x)$ in place of $y$, and $R(f)$ is a regularizer, often $R(f) = \|f\|^2_{\mathcal{H}}$, the squared norm in the function space $\mathcal{H}$. The regularizer should encode some notion of smoothness of $f$.
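
To make formula (1) concrete, here is a minimal sketch in Python, assuming the square loss $V(f(x), y) = (f(x) - y)^2$, a linear hypothesis space $f(x) = \langle w, x \rangle$, and the regularizer $R(f) = \|w\|^2$; the toy data and the function name tikhonov_linear are purely illustrative.

```python
import numpy as np

def tikhonov_linear(X, y, lam):
    """Minimize (1/n) * sum_i (f(x_i) - y_i)^2 + lam * ||w||^2 over linear f(x) = <w, x>.
    With the square loss the minimizer has the closed form
    w = (X^T X / n + lam * I)^{-1} (X^T y / n)."""
    n, d = X.shape
    A = X.T @ X / n + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y / n)

# Hypothetical toy data: y = 2*x1 - x2 + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=100)

# Larger lambda pulls the solution toward a smaller norm (a "simpler" f).
for lam in [1e-3, 1e-1, 10.0]:
    print(lam, tikhonov_linear(X, y, lam))
```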

The "Ingredients" of Tikhonov Regularization The scheme we just described is very general, and by choosing different loss functions $V(f(x), y)$ we can recover different algorithms. The main point we want to discuss is how to choose a norm encoding some notion of smoothness/complexity of the solution. Reproducing Kernel Hilbert Spaces allow us to do this in a very powerful way.

Different Views on RKHS

Part I: Evaluation Functionals

Some Functional Analysis A function space $\mathcal{F}$ is a space whose elements are functions $f$, for example $f : \mathbb{R}^d \to \mathbb{R}$. A norm is a nonnegative function $\|\cdot\|$ such that for all $f, g \in \mathcal{F}$ and $\alpha \in \mathbb{R}$: (1) $\|f\| \geq 0$ and $\|f\| = 0$ iff $f = 0$; (2) $\|f + g\| \leq \|f\| + \|g\|$; (3) $\|\alpha f\| = |\alpha| \, \|f\|$. A norm can be defined via an inner product, $\|f\| = \sqrt{\langle f, f \rangle}$. A Hilbert space is a complete inner product space.

Examples Continuous functions $C[a, b]$: a norm can be established by defining $\|f\| = \max_{a \leq x \leq b} |f(x)|$ (not a Hilbert space!). Square integrable functions $L^2[a, b]$: it is a Hilbert space where the norm is induced by the inner product $\langle f, g \rangle = \int_a^b f(x) g(x) \, dx$.
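
A minimal numerical sketch of the two examples above, assuming the toy functions $f(x) = \sin(\pi x)$ and $g(x) = x$ on $[a, b] = [0, 1]$, with the integrals approximated by the trapezoidal rule:

```python
import numpy as np

# Assumed example functions f(x) = sin(pi x) and g(x) = x on [a, b] = [0, 1].
a, b = 0.0, 1.0
x = np.linspace(a, b, 10001)
f = np.sin(np.pi * x)
g = x

sup_norm_f = np.max(np.abs(f))            # ||f|| = max_{a <= x <= b} |f(x)|, the norm on C[a, b]
inner_fg = np.trapz(f * g, x)             # <f, g> = int_a^b f(x) g(x) dx, the L2 inner product
l2_norm_f = np.sqrt(np.trapz(f * f, x))   # ||f|| = sqrt(<f, f>), the norm on L2[a, b]

print(sup_norm_f, inner_fg, l2_norm_f)
```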

Hypothesis Space: Desiderata Hilbert Space. Point-wise defined functions.

RKHS An evaluation functional over the Hilbert space of functions $\mathcal{H}$ is a linear functional $F_t : \mathcal{H} \to \mathbb{R}$ that evaluates each function in the space at the point $t$, or $F_t[f] = f(t)$. Definition A Hilbert space $\mathcal{H}$ is a reproducing kernel Hilbert space (RKHS) if the evaluation functionals are bounded and continuous, i.e. if for every $t$ there exists an $M$ (possibly depending on $t$) such that $|F_t[f]| = |f(t)| \leq M \|f\|_{\mathcal{H}} \quad \forall f \in \mathcal{H}$.

Evaluation functionals Evaluation functionals are not always bounded. Consider $L^2[a, b]$: each element of the space is an equivalence class of functions that differ only on a set of measure zero, all with the same integral $\int |f(x)|^2 \, dx$; the integral remains the same if we change the function on a countable set of points. Hence the value of an element of $L^2[a, b]$ at a single point is not even well defined.

Norms in RKHS and Smoothness Choosing different kernels one can show that the norm in the corresponding RKHS encodes different notions of smoothness. Band limited functions: consider the set of functions $\mathcal{H} := \{ f \in L^2(\mathbb{R}) \mid \mathrm{supp}(F) \subseteq [-a, a], \ a < \infty \}$ with the usual $L^2$ inner product; the value of the function at every point is given by the convolution with a sinc function $\sin(ax)/(ax)$. The norm is $\|f\|^2_{\mathcal{H}} = \int f(x)^2 \, dx = \int_{-a}^{a} |F(\omega)|^2 \, d\omega$, where $F(\omega) = \mathcal{F}\{f\}(\omega) = \int f(t) e^{-i\omega t} \, dt$ is the Fourier transform of $f$.

Norms in RKHS and Smoothness Sobolev space: consider $f : [0, 1] \to \mathbb{R}$ with $f(0) = f(1) = 0$. The norm is $\|f\|^2_{\mathcal{H}} = \int (f'(x))^2 \, dx = \int \omega^2 |F(\omega)|^2 \, d\omega$. Gaussian space: the norm can be written as $\|f\|^2_{\mathcal{H}} = \frac{1}{(2\pi)^d} \int |F(\omega)|^2 \exp\!\left(\frac{\sigma^2 \omega^2}{2}\right) d\omega$.

Linear RKHS Our function space is 1-dimensional lines, $f(x) = w \, x$, where the RKHS norm is simply $\|f\|^2_{\mathcal{H}} = \langle f, f \rangle_{\mathcal{H}} = w^2$, so that our measure of complexity is the slope of the line. We want to separate two classes using lines and see how the magnitude of the slope corresponds to a measure of complexity. We will look at three examples and see that each example requires more "complicated" functions, i.e. functions with greater slopes, to separate the positive examples from the negative examples.

Linear case (cont.) [Figure: three 1-dimensional datasets, plotted as f(x) versus x, each separated by a line.] Here are three datasets; a linear function should be used to separate the classes. Notice that as the class distinction becomes finer, a larger slope is required to separate the classes.
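
A minimal sketch of the point made by the figure, assuming hypothetical one-dimensional datasets with labels $\pm 1$ and separation with margin 1: the closer the two classes, the larger the smallest admissible slope $w$, and hence the larger the RKHS norm $\|f\|^2_{\mathcal{H}} = w^2$.

```python
import numpy as np

def min_slope(x, y):
    # Smallest w such that f(x) = w * x separates the labels with margin 1,
    # i.e. y_i * (w * x_i) >= 1 for all i (feasible only when y_i * x_i > 0 for all i).
    margins = y * x
    assert np.all(margins > 0), "classes must lie on opposite sides of the origin"
    return 1.0 / margins.min()

# Hypothetical datasets: the two classes get closer and closer to each other.
for gap in [1.0, 0.5, 0.1]:
    x = np.array([-2.0, -gap, gap, 2.0])
    y = np.array([-1.0, -1.0, 1.0, 1.0])
    w = min_slope(x, y)
    print(f"gap = {gap:3.1f}  ->  minimal slope w = {w:5.1f},  ||f||_H^2 = {w**2:6.1f}")
```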

Part II: Kernels

Different Views on RKHS

Representation of Continuous Functionals Let $\mathcal{H}$ be a Hilbert space and $g \in \mathcal{H}$; then $\Phi_g(f) = \langle f, g \rangle$, $f \in \mathcal{H}$, is a continuous linear functional. Riesz representation theorem The theorem states that every continuous linear functional $\Phi$ on $\mathcal{H}$ can be written uniquely in the form $\Phi(f) = \langle f, g \rangle$ for some appropriate element $g \in \mathcal{H}$.

Reproducing kernel (rk) If $\mathcal{H}$ is an RKHS, then for each $t \in X$ there exists, by the Riesz representation theorem, a function $K_t \in \mathcal{H}$ (called the representer of $t$) with the reproducing property $F_t[f] = \langle K_t, f \rangle_{\mathcal{H}} = f(t)$. Since $K_t$ is a function in $\mathcal{H}$, by the reproducing property, for each $x \in X$, $K_t(x) = \langle K_t, K_x \rangle_{\mathcal{H}}$. The reproducing kernel (rk) of $\mathcal{H}$ is $K(t, x) := K_t(x)$.

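For a function in the span of the representers, the reproducing property and the boundedness of the evaluation functionals can be checked numerically. A minimal sketch, assuming the Gaussian kernel (one of the pd kernels listed later in this class) and hypothetical centers and coefficients:

```python
import numpy as np

def gaussian_kernel(s, t, sigma=1.0):
    # K(s, t) = exp(-|s - t|^2 / sigma^2)
    return np.exp(-np.abs(s - t) ** 2 / sigma ** 2)

rng = np.random.default_rng(0)
xs = rng.uniform(-2, 2, size=5)                   # hypothetical centers x_1, ..., x_5
alpha = rng.normal(size=5)                        # f = sum_i alpha_i K_{x_i}

K = gaussian_kernel(xs[:, None], xs[None, :])     # Gram matrix K(x_i, x_j)
norm_f = np.sqrt(alpha @ K @ alpha)               # ||f||_H^2 = alpha^T K alpha

t = 0.3
f_t = alpha @ gaussian_kernel(xs, t)              # f(t) = <K_t, f>_H  (reproducing property)

# Evaluation is bounded: |f(t)| = |<K_t, f>_H| <= ||K_t||_H ||f||_H = sqrt(K(t, t)) ||f||_H.
bound = np.sqrt(gaussian_kernel(t, t)) * norm_f
print(abs(f_t), "<=", bound)
```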

Positive definite kernels Let $X$ be some set, for example a subset of $\mathbb{R}^d$ or $\mathbb{R}^d$ itself. A kernel is a symmetric function $K : X \times X \to \mathbb{R}$. Definition A kernel $K(t, s)$ is positive definite (pd) if $\sum_{i,j=1}^{n} c_i c_j K(t_i, t_j) \geq 0$ for any $n \in \mathbb{N}$ and any choice of $t_1, \ldots, t_n \in X$ and $c_1, \ldots, c_n \in \mathbb{R}$.
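
A quick numerical check of the definition, assuming the Gaussian kernel and hypothetical points (the helper gaussian_gram is mine): for a symmetric kernel, positive definiteness in the sense above is equivalent to every Gram matrix $K_{ij} = K(t_i, t_j)$ being positive semidefinite.

```python
import numpy as np

def gaussian_gram(T, sigma=1.0):
    # Gram matrix K_ij = exp(-||t_i - t_j||^2 / sigma^2)
    sq = np.sum((T[:, None, :] - T[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / sigma ** 2)

rng = np.random.default_rng(0)
T = rng.normal(size=(50, 3))                   # hypothetical points t_1, ..., t_n in R^d
K = gaussian_gram(T)

# sum_ij c_i c_j K(t_i, t_j) = c^T K c >= 0 for every c
# iff the (symmetric) Gram matrix has no negative eigenvalues.
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # allow tiny negative round-off
c = rng.normal(size=50)
print(c @ K @ c >= 0)                          # one instance of the quadratic form
```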

RKHS and kernels The following theorem relates pd kernels and RKHS. Theorem (a) For every RKHS there exists an associated reproducing kernel which is symmetric and positive definite. (b) Conversely, every symmetric, positive definite kernel $K$ on $X \times X$ defines a unique RKHS on $X$ with $K$ as its reproducing kernel.

Sketch of proof a) We must prove that the rk $K(t, x) = \langle K_t, K_x \rangle_{\mathcal{H}}$ is symmetric and pd. Symmetry follows from the symmetry property of inner products, $\langle K_t, K_x \rangle_{\mathcal{H}} = \langle K_x, K_t \rangle_{\mathcal{H}}$. $K$ is pd because $\sum_{i,j=1}^{n} c_i c_j K(t_i, t_j) = \sum_{i,j=1}^{n} c_i c_j \langle K_{t_i}, K_{t_j} \rangle_{\mathcal{H}} = \big\| \sum_j c_j K_{t_j} \big\|^2_{\mathcal{H}} \geq 0$.

Sketch of proof (cont.) b) Conversely, given $K$ one can construct the RKHS $\mathcal{H}$ as the completion of the space of functions spanned by the set $\{ K_x \mid x \in X \}$, with an inner product defined as follows. The inner product of two functions $f$ and $g$ in $\mathrm{span}\{ K_x \mid x \in X \}$, $f(x) = \sum_{i=1}^{s} \alpha_i K_{x_i}(x)$ and $g(x) = \sum_{i=1}^{s'} \beta_i K_{x'_i}(x)$, is by definition $\langle f, g \rangle_{\mathcal{H}} = \sum_{i=1}^{s} \sum_{j=1}^{s'} \alpha_i \beta_j K(x_i, x'_j)$.
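
A minimal numerical sketch of this inner product, assuming the Gaussian kernel and hypothetical centers and coefficients: written with Gram matrices, $\langle f, g \rangle_{\mathcal{H}} = \alpha^\top K(X, X') \beta$.

```python
import numpy as np

def gaussian_kernel(s, t, sigma=1.0):
    return np.exp(-np.sum((s - t) ** 2, axis=-1) / sigma ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))       # centers x_1, ..., x_s of f = sum_i alpha_i K_{x_i}
Xp = rng.normal(size=(3, 2))      # centers x'_1, ..., x'_{s'} of g = sum_j beta_j K_{x'_j}
alpha = rng.normal(size=4)
beta = rng.normal(size=3)

# <f, g>_H = sum_i sum_j alpha_i beta_j K(x_i, x'_j) = alpha^T K(X, X') beta
K_cross = gaussian_kernel(X[:, None, :], Xp[None, :, :])
print(alpha @ K_cross @ beta)

# In particular ||f||_H^2 = alpha^T K(X, X) alpha >= 0, consistent with part a).
K_XX = gaussian_kernel(X[:, None, :], X[None, :, :])
print(alpha @ K_XX @ alpha)
```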

Examples of pd kernels Very common examples of symmetric pd kernels are the linear kernel $K(x, x') = x \cdot x'$, the Gaussian kernel $K(x, x') = e^{-\frac{\|x - x'\|^2}{\sigma^2}}$, $\sigma > 0$, and the polynomial kernel $K(x, x') = (x \cdot x' + 1)^d$, $d \in \mathbb{N}$. For specific applications, designing an effective kernel is a challenging problem.
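
The three kernels above are one-liners; a minimal sketch (the degree d = 3, the bandwidth sigma = 1, and the test points are arbitrary choices):

```python
import numpy as np

def linear_kernel(x, xp):
    return x @ xp

def gaussian_kernel(x, xp, sigma=1.0):
    return np.exp(-np.sum((x - xp) ** 2) / sigma ** 2)

def polynomial_kernel(x, xp, d=3):
    return (x @ xp + 1) ** d

x = np.array([1.0, 2.0])
xp = np.array([0.5, -1.0])
print(linear_kernel(x, xp), gaussian_kernel(x, xp), polynomial_kernel(x, xp))
```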

Examples of pd kernels Kernels are a very general concept: we can have kernels on vectors, strings, matrices, graphs, probability distributions... Combinations of kernels allow us to integrate different kinds of data. Oftentimes kernels are viewed, and designed, as similarity measures (in this case it makes sense to use normalized kernels, with $K(x, x) = 1$), so that the induced distance is $d(x, x')^2 = \|K_x - K_{x'}\|^2_{\mathcal{H}} = 2\,(1 - K(x, x'))$.
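
For instance, with the Gaussian kernel (which is already normalized, $K(x, x) = 1$), the induced distance can be computed directly; a minimal sketch with arbitrary points:

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    # normalized: K(x, x) = 1 for every x
    return np.exp(-np.sum((x - xp) ** 2) / sigma ** 2)

x = np.array([0.0, 0.0])
xp = np.array([1.0, 0.5])
K_xxp = gaussian_kernel(x, xp)

# d(x, x')^2 = ||K_x - K_x'||_H^2 = K(x, x) + K(x', x') - 2 K(x, x') = 2 (1 - K(x, x'))
d2 = gaussian_kernel(x, x) + gaussian_kernel(xp, xp) - 2 * K_xxp
print(d2, 2 * (1 - K_xxp))
```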