Online learning in Reproducing Kernel Hilbert Spaces


Pantelis Bouboulis, Member, IEEE

May 1, 1

P. Bouboulis is with the Department of Informatics and Telecommunications, University of Athens, Greece, e-mail: (see bouboulis.mysch.gr).

Chapter 1

Reproducing Kernel Hilbert Spaces

In kernel-based methods, the notion of the Reproducing Kernel Hilbert Space (RKHS) plays a crucial role. An RKHS is a rich construct (roughly, a space of functions with an inner product), which has been proven to be a very powerful tool. Kernel-based methods are utilized in an increasingly large number of scientific areas, especially where non-linear models are required. For example, in pattern analysis, a classification task of a set $X \subseteq \mathbb{R}^m$ is usually reformed by mapping the data into a higher dimensional space (possibly of infinite dimension) $\mathcal{H}$, which is a Reproducing Kernel Hilbert Space (RKHS). The advantage of such a mapping is to make the task more tractable, by employing a linear classifier in the feature space $\mathcal{H}$, exploiting Cover's theorem (see [43, 37]). This is equivalent to solving a non-linear problem in the original space.

Therefore, with the use of kernels, a new technique has been introduced to transform certain classes of non-linear tasks to equivalent linear ones, restated in a higher (even infinite) dimensional space, while avoiding the computational and generalization theory problems (also known as the "curse of dimensionality") that accompany the traditional techniques when the dimensionality of the task increases. Similar approaches have been used in principal components analysis, in Fisher's linear discriminant analysis, in clustering, regression, image processing and in many other subdisciplines. Recently, processing in RKHS is gaining in popularity within the Signal Processing community in the context of adaptive learning.

Non-linearity is usually introduced via a computationally elegant way known to the machine learning community as the kernel trick [36] (the formal definition of the positive definite kernel is given in section 1.2): Given an algorithm, which is formulated in terms of dot products, one can construct an alternative algorithm by replacing each one of the dot products with a positive definite kernel $\kappa$.
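As a small illustration of the kernel trick, the squared distance between two mapped points can be written purely in terms of inner products and then kernelized. The sketch below assumes the homogeneous polynomial kernel of degree 2 on $\mathbb{R}^2$, for which an explicit feature map is known; the function names and the sample points are hypothetical and chosen only for this example.

```python
import math

def dot(x, y):
    """Ordinary dot product in the input space."""
    return sum(a * b for a, b in zip(x, y))

def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: <x, y>^2."""
    return dot(x, y) ** 2

def poly2_feature_map(x):
    """Explicit feature map for poly2_kernel on R^2 (illustrative choice)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def feature_distance_sq(x, y, kernel):
    """||Phi(x) - Phi(y)||^2 expanded into inner products, each replaced by the kernel."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

x, y = [1.0, 2.0], [3.0, -1.0]

# Distance computed in the feature space with the explicit map ...
phi_x, phi_y = poly2_feature_map(x), poly2_feature_map(y)
explicit = sum((a - b) ** 2 for a, b in zip(phi_x, phi_y))

# ... and the same quantity computed without ever forming Phi.
implicit = feature_distance_sq(x, y, poly2_kernel)

print(explicit, implicit)
```

The two numbers agree up to rounding, although the second computation never leaves the input space. For kernels such as the Gaussian, where the feature space is infinite dimensional, only the second route is available.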
Although this trick works well for most applications, it conceals the basic mathematical steps that underlie the procedure, which are essential if one seeks a deeper understanding of the problem. These steps are:

1) Map the finite dimensionality input data from the input space $X$ (usually $X \subseteq \mathbb{R}^\nu$) into a higher dimensionality (possibly infinite) RKHS $\mathcal{H}$ (this is usually called the feature space), and
2) Perform a linear processing (e.g., adaptive filtering) on the mapped data in $\mathcal{H}$.

The procedure is equivalent to a non-linear processing (non-linear filtering) in $X$ (see figure 1.1). The specific choice of the kernel $\kappa$ defines, implicitly, an RKHS with an appropriate inner product. Moreover, the specific choice of the kernel defines the type of nonlinearity that underlies the model to be used.

1.1 A Historical overview

In the past, there have been two trends in the study of these spaces by the mathematicians. The first one originated in the theory of integral equations by J. Mercer [5, 6]. He used the term positive definite kernel to characterize a function of two points $\kappa(x,y)$ defined on $X^2$, which satisfies Mercer's law:

\[ \sum_{n,m=1}^{N} a_n a_m \kappa(x_n, x_m) \ge 0, \tag{1.1} \]

for any numbers $a_n, a_m$ and points $x_n, x_m$.

Figure 1.1: Mapping from input space $X$ to feature space $\mathcal{H}$.

Later on, Moore [7, 8, 9] found that to such a kernel there corresponds a well determined class of functions, $\mathcal{H}$, equipped with a specific inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$, with respect to which the kernel $\kappa$ possesses the so called reproducing property:

\[ f(y) = \langle f, \kappa(\cdot, y) \rangle_{\mathcal{H}}, \tag{1.2} \]

for all functions $f \in \mathcal{H}$ and $y \in X$. Those that followed this trend used to consider a specific given positive definite kernel $\kappa$ and studied it in itself, or eventually applied it in various domains (such as integral equations, theory of groups, general metric theory, interpolation, etc.). The class $\mathcal{H}$ corresponding to $\kappa$ was mainly used as a tool of research and it was usually introduced a posteriori. The work of Bochner [5, 6], which introduced the notion of the positive definite function in order to apply it in the theory of Fourier transforms, also belongs to the same path as the one followed by Mercer and Moore. These are continuous functions $\varphi$ of one variable such that $\varphi(x - y) = \kappa(x,y)$, for some positive definite kernel $\kappa$.

On the other hand, those who followed the second trend were primarily interested in the class of functions $\mathcal{H}$, while the associated kernel was employed essentially as a tool in the study of the functions of this class. This trend is traced back to the works of S. Zaremba [47, 48] during the first decade of the 20th century. He was the first to introduce the notion of a kernel, which corresponds to a specific class of functions, and to state its reproducing property. However, he did not develop any general theory, nor did he give any particular name to the kernels he introduced. In this second trend, the mathematicians were primarily interested in the study of the class of functions $\mathcal{H}$, and the corresponding kernel $\kappa$, which satisfies the reproducing property, was used as a tool in this study. To the same trend belong also the works of Bergman [4] and Aronszajn [].
Those two trends evolved separately during the first decades of the 20th century, but soon the links between them were noticed. After the second world war, it was known that the two concepts of defining a kernel, either as a positive definite kernel, or as a reproducing kernel, are equivalent. Furthermore, it was proved that there is a one-to-one correspondence between the space of positive definite kernels and the space of reproducing kernel Hilbert spaces. It has to be emphasized that examples of such kernels have been known for a long time prior to the works of Mercer and Zaremba; for example, all the Green's functions of self-adjoint ordinary differential equations belong to this type of kernels. However, some of the important properties that these kernels possess have only been realized and used in the beginning of the 20th century and since then have been the focus of research.

In the following, we will give a more detailed description of these spaces and establish their main properties, focusing on the essentials that elevate them to such a powerful tool in the context of machine learning. Most of the material presented here can also be found in more detail in several other textbooks, such as the celebrated paper of Aronszajn [], the excellent introductory text of Paulsen [31] and

the popular books of Schölkopf and Smola [37] and Shawe-Taylor and Cristianini [39]. Here, we attempt to portray both trends and to highlight the important links between them. Although the general theory applies to complex spaces, to keep the presentation as simple as possible, we will mainly focus on real spaces. The complex case will be treated at the end of this section.

1.2 Definition

We begin our study with the classic definitions of positive definite matrices and kernels as they were introduced by Mercer. Given a function $\kappa : X \times X \to \mathbb{R}$ and $x_1, \dots, x_N \in X$ (typically $X$ is a compact subset of $\mathbb{R}^\nu$, $\nu > 0$), the square matrix $K = (K_{n,m})^N$ with elements $K_{n,m} = \kappa(x_n, x_m)$, for $n, m = 1, \dots, N$, is called the Gram matrix (or kernel matrix) of $\kappa$ with respect to $x_1, \dots, x_N$. A symmetric matrix $K = (K_{n,m})^N$ satisfying

\[ c^T K c = \sum_{n=1}^{N} \sum_{m=1}^{N} c_n c_m K_{n,m} \ge 0, \]

for all $c = (c_1, \dots, c_N)^T \in \mathbb{R}^N$, where the notation $^T$ denotes the transpose matrix, is called positive definite. In matrix analysis literature, this is the definition of a positive semidefinite matrix. However, as positive definite matrices were originally introduced by Mercer and others in this context, we employ the term positive definite, as it was already defined. If the inequality is strict, for all non-zero vectors $c \in \mathbb{R}^N$, the matrix will be called strictly positive definite. A function $\kappa : X \times X \to \mathbb{R}$, which for all $N \in \mathbb{N}$ and all $x_1, \dots, x_N \in X$ gives rise to a positive definite Gram matrix $K$, is called a positive definite kernel. In the following, we will frequently refer to a positive definite kernel simply as kernel. We conclude that a positive definite kernel is symmetric and satisfies

\[ \sum_{n=1}^{N} \sum_{m=1}^{N} c_n c_m \kappa(x_n, x_m) \ge 0, \]

for all $c \in \mathbb{R}^N$ and $x_1, \dots, x_N \in X$. Formally, a Reproducing Kernel Hilbert Space is defined as follows:

Definition 1.2.1 (Reproducing Kernel Hilbert Space). Consider a linear class $\mathcal{H}$ of real valued functions, $f$, defined on a set $X$. Suppose, further, that in $\mathcal{H}$ we can define an inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ with corresponding norm $\| \cdot \|_{\mathcal{H}}$ and that $\mathcal{H}$ is complete with respect to that norm, i.e., $\mathcal{H}$ is a Hilbert space.
We call $\mathcal{H}$ a Reproducing Kernel Hilbert Space (RKHS), if there exists a function $\kappa : X \times X \to \mathbb{R}$ with the following two important properties:

1. For every $x \in X$, $\kappa(\cdot, x)$ belongs to $\mathcal{H}$ (or equivalently $\kappa$ spans $\mathcal{H}$, i.e., $\mathcal{H} = \overline{\operatorname{span}}\{\kappa(\cdot, x),\ x \in X\}$).
2. $\kappa$ has the so called reproducing property, i.e.,

\[ f(x) = \langle f, \kappa(\cdot, x) \rangle_{\mathcal{H}}, \quad \text{for all } f \in \mathcal{H},\ x \in X, \tag{1.3} \]

in particular $\kappa(x,y) = \langle \kappa(\cdot, y), \kappa(\cdot, x) \rangle_{\mathcal{H}}$.

Furthermore, $\kappa$ is a positive definite kernel and the mapping $\Phi : X \to \mathcal{H}$, with $\Phi(x) = \kappa(\cdot, x)$, for all $x \in X$, is called the feature map of $\mathcal{H}$.

To denote the RKHS associated with a specific kernel $\kappa$ we will also use the notation $\mathcal{H}(\kappa)$. Note that $\mathcal{H}$ is often called the feature space associated with kernel $\kappa$. Furthermore, under the aforementioned notations $\kappa(x,y) = \langle \Phi(y), \Phi(x) \rangle_{\mathcal{H}}$, i.e., $\kappa(x,y)$ is the inner product of $\Phi(y)$ and $\Phi(x)$ in the feature space. This is the essence of the kernel trick mentioned at the beginning of section 1. The feature map $\Phi$ transforms the data from the low dimensionality space $X$ to the higher dimensionality space $\mathcal{H}$. Linear processing in $\mathcal{H}$ involves inner products in $\mathcal{H}$, which can be calculated via the kernel $\kappa$ disregarding the actual structure of $\mathcal{H}$. Roughly speaking, one trades nonlinearities, which are often hard to handle, for an increase in the dimensionality of the space.
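The Gram matrix definition of section 1.2 can be made concrete with a few lines of code. The following is only a spot check under stated assumptions: it uses the Gaussian kernel in its $\exp(-t\|x-y\|^2)$ form from section 1.4, a hypothetical set of six random points in the plane, and it samples random vectors $c$ instead of proving the inequality for all of them.

```python
import math
import random

def gaussian_kernel(x, y, t=1.0):
    """Gaussian kernel exp(-t * ||x - y||^2) (parameter t is an illustrative choice)."""
    return math.exp(-t * sum((a - b) ** 2 for a, b in zip(x, y)))

def gram_matrix(points, kernel):
    """Gram matrix with entries K[n][m] = kernel(x_n, x_m)."""
    return [[kernel(xn, xm) for xm in points] for xn in points]

def quadratic_form(K, c):
    """The quantity c^T K c from the definition of positive definiteness."""
    N = len(c)
    return sum(c[n] * c[m] * K[n][m] for n in range(N) for m in range(N))

rng = random.Random(0)          # fixed seed so that the run is repeatable
N = 6
points = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(N)]
K = gram_matrix(points, gaussian_kernel)

# A positive definite kernel must give a symmetric Gram matrix ...
is_symmetric = all(abs(K[n][m] - K[m][n]) < 1e-12 for n in range(N) for m in range(N))

# ... whose quadratic form is non-negative; here we only sample random vectors c.
min_form = min(quadratic_form(K, [rng.gauss(0.0, 1.0) for _ in range(N)])
               for _ in range(1000))

print(is_symmetric, min_form >= -1e-9)
```

A sampled check of this kind can expose a function that is not a kernel (a negative value of the quadratic form is a counterexample), but it cannot certify positive definiteness.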

1.3 Derivation of the Definition

In the following, we consider the definition of an RKHS as a class of functions with specific properties (following the second trend) and show the key ideas that underlie definition 1.2.1. To that end, consider a linear class $\mathcal{H}$ of real valued functions, $f$, defined on a set $X$. Suppose, further, that in $\mathcal{H}$ we can define an inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ with corresponding norm $\| \cdot \|_{\mathcal{H}}$ and that $\mathcal{H}$ is complete with respect to that norm, i.e., $\mathcal{H}$ is a Hilbert space. Consider, also, a linear functional $T$, from $\mathcal{H}$ into the field $\mathbb{R}$. An important theorem of functional analysis states that such a functional is continuous, if and only if it is bounded. The space consisting of all continuous linear functionals from $\mathcal{H}$ into the field $\mathbb{R}$ is called the dual space of $\mathcal{H}$. In the following, we will frequently refer to the so called linear evaluation functional $T_y$. This is a special case of a linear functional that satisfies $T_y(f) = f(y)$, for all $f \in \mathcal{H}$.

We call $\mathcal{H}$ a Reproducing Kernel Hilbert Space (RKHS) on $X$ over $\mathbb{R}$, if for every $y \in X$, the linear evaluation functional, $T_y$, is continuous. We will prove that such a space is related to a positive definite kernel, thus providing the first link between the two trends. Subsequently, we will prove that any positive definite kernel defines implicitly an RKHS, providing the second link and concluding the equivalent definition of RKHS (definition 1.2.1), which is usually used in the machine learning literature. The following theorem establishes an important connection between a Hilbert space $\mathcal{H}$ and its dual space.

Theorem 1.3.1 (Riesz Representation). Let $\mathcal{H}$ be a general Hilbert space and let $\mathcal{H}^*$ denote its dual space. Every element $\Phi$ of $\mathcal{H}^*$ can be uniquely expressed in the form:

\[ \Phi(f) = \langle f, \varphi \rangle_{\mathcal{H}}, \]

for some $\varphi \in \mathcal{H}$. Moreover, $\|\Phi\|_{\mathcal{H}^*} = \|\varphi\|_{\mathcal{H}}$.

Following the Riesz representation theorem, we have that for every $y \in X$, there exists a unique element $\kappa_y \in \mathcal{H}$, such that for every $f \in \mathcal{H}$, $f(y) = T_y(f) = \langle f, \kappa_y \rangle_{\mathcal{H}}$. The function $\kappa_y$ is called the reproducing kernel for the point $y$ and the function $\kappa(x,y) = \kappa_y(x)$ is called the reproducing kernel of $\mathcal{H}$.
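The Riesz argument above can be traced by hand in a toy setting. The sketch below assumes a hypothetical finite set $X = \{0, 1, 2\}$, so that functions on $X$ are just lists of three numbers, and a weighted inner product $\langle f, g \rangle = \sum_x w_x f(x) g(x)$ with made-up positive weights. In that setting the representer of the evaluation functional $T_y$ is $\kappa_y(x) = \delta_{xy} / w_y$.

```python
# Hypothetical positive weights that define the inner product on functions X -> R,
# where X = {0, 1, 2} and a function is stored as the list of its three values.
weights = [2.0, 0.5, 4.0]

def inner(f, g):
    """Weighted inner product <f, g> = sum_x w_x * f(x) * g(x)."""
    return sum(w * a * b for w, a, b in zip(weights, f, g))

def kappa_section(y):
    """Riesz representer kappa_y of the evaluation functional T_y(f) = f(y)."""
    return [1.0 / weights[y] if x == y else 0.0 for x in range(len(weights))]

f = [3.0, -1.0, 0.25]

# Reproducing property: <f, kappa_y> recovers the value f(y) for every y.
reproduced = [inner(f, kappa_section(y)) for y in range(len(weights))]

# The squared norm of kappa_y equals kappa(y, y) = 1 / w_y.
norms_sq = [inner(kappa_section(y), kappa_section(y)) for y in range(len(weights))]

print(reproduced)
print(norms_sq)
```

Changing the weights changes the kernel, which is one way to see that the reproducing kernel depends on the inner product and not only on the set of functions.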
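In addition, note that $\langle \kappa_y, \kappa_x \rangle_{\mathcal{H}} = \kappa_y(x) = \kappa(x,y)$ and $\|T_y\|^2_{\mathcal{H}^*} = \|\kappa_y\|^2_{\mathcal{H}} = \langle \kappa_y, \kappa_y \rangle_{\mathcal{H}} = \kappa(y,y)$.

Proposition 1.3.1. The reproducing kernel of $\mathcal{H}$ is symmetric, i.e., $\kappa(x,y) = \kappa(y,x)$.

Proof. Observe that $\langle \kappa_y, \kappa_x \rangle_{\mathcal{H}} = \kappa_y(x) = \kappa(x,y)$ and $\langle \kappa_x, \kappa_y \rangle_{\mathcal{H}} = \kappa_x(y) = \kappa(y,x)$. As the inner product of $\mathcal{H}$ is symmetric (i.e., $\langle \kappa_y, \kappa_x \rangle_{\mathcal{H}} = \langle \kappa_x, \kappa_y \rangle_{\mathcal{H}}$), the result follows.

In the following, we will frequently identify the function $\kappa_y$ with the notation $\kappa(\cdot, y)$. Thus, we write the reproducing property of $\mathcal{H}$ as:

\[ f(y) = \langle f, \kappa(\cdot, y) \rangle_{\mathcal{H}}, \tag{1.4} \]

for any $f \in \mathcal{H}$, $y \in X$. Note that due to the uniqueness provided by the Riesz representation theorem, $\kappa$ is the unique function that satisfies the reproducing property. The following proposition establishes the first link between the positive definite kernels and the reproducing kernels.

Proposition 1.3.2. The reproducing kernel of $\mathcal{H}$ is a positive definite kernel.

Proof. Consider $N > 0$, the real numbers $a_1, a_2, \dots, a_N$ and the elements $x_1, x_2, \dots, x_N \in X$. Then

\[ \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m \kappa(x_n, x_m) = \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m \langle \kappa(\cdot, x_m), \kappa(\cdot, x_n) \rangle_{\mathcal{H}} = \left\langle \sum_{m=1}^{N} a_m \kappa(\cdot, x_m), \sum_{n=1}^{N} a_n \kappa(\cdot, x_n) \right\rangle_{\mathcal{H}} = \left\| \sum_{n=1}^{N} a_n \kappa(\cdot, x_n) \right\|^2_{\mathcal{H}} \ge 0. \]

Combining proposition 1.3.1 and the previous result, we complete the proof.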
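The identity at the heart of the proof of proposition 1.3.2 can be checked numerically whenever an explicit feature map is available. The sketch below assumes the inhomogeneous polynomial kernel $(xy + 1)^2$ on $\mathbb{R}$, whose feature map $(x^2, \sqrt{2}x, 1)$ is finite dimensional; the points and coefficients are arbitrary illustrative values.

```python
import math

def kernel(x, y):
    """Inhomogeneous polynomial kernel of degree 2 on R, with c = 1."""
    return (x * y + 1.0) ** 2

def feature_map(x):
    """Explicit map with <feature_map(x), feature_map(y)> = kernel(x, y)."""
    return [x * x, math.sqrt(2.0) * x, 1.0]

xs = [-1.0, 0.5, 2.0]     # illustrative points x_1, x_2, x_3
a = [1.0, -2.0, 0.5]      # illustrative coefficients a_1, a_2, a_3
N = len(xs)

# Left-hand side: the double sum over the Gram matrix.
quad = sum(a[n] * a[m] * kernel(xs[n], xs[m]) for n in range(N) for m in range(N))

# Right-hand side: squared norm of sum_n a_n * Phi(x_n) in the feature space.
combo = [sum(a[n] * feature_map(xs[n])[k] for n in range(N)) for k in range(3)]
norm_sq = sum(v * v for v in combo)

print(quad, norm_sq)
```

Both quantities equal 8.5 up to rounding, and the second one is a squared norm, which is why the double sum can never be negative.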

Remark 1.3.1. Generally, for a reproducing kernel, the respective Gram matrix is strictly positive definite. For if not, then there must exist at least one non-zero vector $a$ such that $\left\| \sum_{n=1}^{N} a_n \kappa(\cdot, x_n) \right\|^2_{\mathcal{H}} = 0$. Hence, for every $f \in \mathcal{H}$ we have that $\sum_n a_n f(x_n) = \left\langle f, \sum_n a_n \kappa(\cdot, x_n) \right\rangle_{\mathcal{H}} = 0$. Thus, in this case there is an equation of linear dependence between the values of every function in $\mathcal{H}$ at some finite set of points. Such examples do exist (e.g. Sobolev spaces), but in most cases the reproducing kernels define Gram matrices that are always strictly positive and invertible!

The following proposition establishes a very important fact; any RKHS, $\mathcal{H}$, can be generated by the respective reproducing kernel $\kappa$. Note that the overbar denotes the closure of a set (i.e., if $A$ is a subset of $\mathcal{H}$, $\overline{A}$ is the closure of $A$).

Proposition 1.3.3. Let $\mathcal{H}$ be an RKHS on the set $X$ with reproducing kernel $\kappa$. Then the linear span of the functions $\kappa(\cdot, x)$, $x \in X$, is dense in $\mathcal{H}$, i.e., $\mathcal{H} = \overline{\operatorname{span}}\{\kappa(\cdot, x),\ x \in X\}$.

Proof. We will prove that the only function of $\mathcal{H}$ orthogonal to $A = \operatorname{span}\{\kappa(\cdot, x),\ x \in X\}$ is the zero function. Let $f$ be such a function. Then, as $f$ is orthogonal to $A$, we have that $f(x) = \langle f, \kappa(\cdot, x) \rangle_{\mathcal{H}} = 0$, for every $x \in X$. This holds true if and only if $f = 0$. Thus $A^{\perp} = \overline{A}^{\perp} = \{0\}$. Suppose that there is $f \in \mathcal{H}$ such that $f \notin \overline{A}$. As $\overline{A}$ is a closed (convex) subspace of $\mathcal{H}$, there is a $g \in \overline{A}$ which minimizes the distance between $f$ and points in $\overline{A}$ (theorem of best approximation). For the same $g$ we have that $f - g \perp \overline{A}$. Thus, the non-zero function $h = f - g$ is orthogonal to $A$. However, we proved that there is no non-zero vector orthogonal to $A$. This leads us to conclude that $\overline{A} = \mathcal{H}$.

In the following we give some important properties of the specific spaces.

Proposition 1.3.4 (Norm convergence implies point-wise convergence). Let $\mathcal{H}$ be an RKHS on $X$ and let $\{f_n\}_{n \in \mathbb{N}} \subseteq \mathcal{H}$. If $\lim_n \|f_n - f\|_{\mathcal{H}} = 0$, then $f(x) = \lim_n f_n(x)$, for every $x \in X$. Conversely, if for any sequence $\{f_n\}_{n \in \mathbb{N}}$ of a Hilbert space $\mathcal{H}$, such that $\lim_n \|f_n - f\|_{\mathcal{H}} = 0$, we have also that $f(x) = \lim_n f_n(x)$, then $\mathcal{H}$ is an RKHS.

Proof.
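For every $x \in X$ we have that

\[ |f_n(x) - f(x)| = |\langle f_n, \kappa(\cdot, x) \rangle_{\mathcal{H}} - \langle f, \kappa(\cdot, x) \rangle_{\mathcal{H}}| = |\langle f_n - f, \kappa(\cdot, x) \rangle_{\mathcal{H}}| \le \|f_n - f\|_{\mathcal{H}} \, \|\kappa(\cdot, x)\|_{\mathcal{H}}. \]

As $\lim_n \|f_n - f\|_{\mathcal{H}} = 0$, we have that $\lim_n |f_n(x) - f(x)| = 0$, for every $x \in X$. Hence $f(x) = \lim_n f_n(x)$, for every $x \in X$.

For the converse, consider the evaluation functional $T_y : \mathcal{H} \to \mathbb{R}$, $T_y(f) = f(y)$ for some $y \in X$. We will prove that $T_y$ is continuous for all $y \in X$. To this end, consider a sequence $\{f_n\}_{n \in \mathbb{N}}$ of $\mathcal{H}$, with the property $\lim_n \|f_n - f\|_{\mathcal{H}} = 0$, i.e., $f_n$ converges to $f$ in the norm. Then $|T_y(f_n) - T_y(f)| = |f_n(y) - f(y)| \to 0$, as $f(x) = \lim_n f_n(x)$. Thus $T_y(f) = \lim_n T_y(f_n)$ for all $y \in X$ and all converging sequences $\{f_n\}_{n \in \mathbb{N}}$ of $\mathcal{H}$.

Proposition 1.3.5 (Different RKHS's cannot have the same reproducing kernel). Let $\mathcal{H}_1, \mathcal{H}_2$ be RKHS's on $X$ with reproducing kernels $\kappa_1, \kappa_2$. If $\kappa_1(x,y) = \kappa_2(x,y)$, for all $x, y \in X$, then $\mathcal{H}_1 = \mathcal{H}_2$ and $\|f\|_{\mathcal{H}_1} = \|f\|_{\mathcal{H}_2}$ for every $f$.

Proof. Let $\kappa(x,y) = \kappa_1(x,y) = \kappa_2(x,y)$ and $A_i = \operatorname{span}\{\kappa_i(\cdot, x),\ x \in X\}$, $i = 1, 2$. As shown in proposition 1.3.3, $\mathcal{H}_i = \overline{A_i}$, $i = 1, 2$. Note that for any $f \in A_i$, $i = 1, 2$, we have that $f(x) = \sum_n a_n \kappa_i(x, x_n)$, for some real numbers $a_n$, and thus the values of the function are independent of whether we regard it as in $A_1$ or $A_2$. Furthermore, for any $f \in A_i$, $i = 1, 2$, as the two kernels are identical, we have that $\|f\|^2_{\mathcal{H}_1} = \sum_{n,m} a_n a_m \kappa(x_m, x_n) = \|f\|^2_{\mathcal{H}_2}$. Thus, $\|f\|_{\mathcal{H}_1} = \|f\|_{\mathcal{H}_2}$, for all $f \in A_1 = A_2$.

Finally, we turn our attention to the limit points of $A_1$ and $A_2$. If $f \in \mathcal{H}_1$, then there exists a sequence of functions, $\{f_n\}_{n \in \mathbb{N}} \subseteq A_1$ such that $\lim_n \|f - f_n\|_{\mathcal{H}_1} = 0$. Since $\{f_n\}_{n \in \mathbb{N}}$ is a converging sequence, it is Cauchy in $A_1$ and thus it is also Cauchy in $A_2$. Therefore, there exists $g \in \mathcal{H}_2$ such that $\lim_n \|g - f_n\|_{\mathcal{H}_2} = 0$. Employing proposition 1.3.4, we take that $f(x) = \lim_n f_n(x) = g(x)$. Thus, every $f$ in $\mathcal{H}_1$ is also in $\mathcal{H}_2$ and by an analogous argument we can prove that every $g \in \mathcal{H}_2$ is also in $\mathcal{H}_1$. Hence $\mathcal{H}_1 = \mathcal{H}_2$ and as $\|f\|_{\mathcal{H}_1} = \|f\|_{\mathcal{H}_2}$ for all $f$ in a dense subset (i.e., $A_1$), we have that the norms are equal for every $f$. To prove the latter, we use the relation $\lim_n \|f_n\|_{\mathcal{H}_i} = \|f\|_{\mathcal{H}_i}$, $i = 1, 2$.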
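The inequality used in the proof of proposition 1.3.4, $|f_n(x) - f(x)| \le \|f_n - f\|_{\mathcal{H}} \sqrt{\kappa(x,x)}$, can be observed on functions that lie in the span of a few kernel sections, where the norm is computable as $\sqrt{a^T K a}$. The sketch assumes a one dimensional Gaussian kernel $\exp(-t(x-y)^2)$ and hypothetical centers, coefficients and test points.

```python
import math

def gaussian_kernel(x, y, t=1.0):
    """One dimensional Gaussian kernel exp(-t * (x - y)^2)."""
    return math.exp(-t * (x - y) ** 2)

centers = [-1.0, 0.0, 1.5]     # illustrative points x_n
coeffs = [0.7, -1.2, 0.4]      # coefficients of the limit function f

def evaluate(a, x):
    """f(x) for f = sum_n a_n * kappa(., x_n)."""
    return sum(an * gaussian_kernel(x, xn) for an, xn in zip(a, centers))

def rkhs_norm(a):
    """||f||_H = sqrt(a^T K a) for f in the span of the kernel sections."""
    q = sum(a[n] * a[m] * gaussian_kernel(centers[n], centers[m])
            for n in range(len(a)) for m in range(len(a)))
    return math.sqrt(max(q, 0.0))

test_points = [-2.0, -0.3, 0.8, 3.0]
bound_holds = True
norm_gaps = []
for k in [1, 10, 100, 1000]:
    a_k = [c * (1.0 + 1.0 / k) for c in coeffs]       # f_k converges to f in norm
    diff = [ak - c for ak, c in zip(a_k, coeffs)]     # coefficients of f_k - f
    gap = rkhs_norm(diff)
    norm_gaps.append(gap)
    for x in test_points:
        pointwise = abs(evaluate(a_k, x) - evaluate(coeffs, x))
        limit = gap * math.sqrt(gaussian_kernel(x, x))
        bound_holds = bound_holds and (pointwise <= limit + 1e-12)

print(bound_holds, norm_gaps)
```

As the norm gap shrinks, the pointwise gap at every test point is forced to shrink with it, which is the content of the first half of the proposition.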

The following theorem is the converse of proposition 1.3.2. It was proved by Moore and it gives us a characterization of reproducing kernel functions. Also, it provides the second link between the two trends that have been mentioned in section 1.1. Moore's theorem, together with proposition 1.3.2, proposition 1.3.5 and the uniqueness property of the reproducing kernel of an RKHS, establishes a one-to-one correspondence between RKHS's on a set and positive definite functions on the set.

Theorem 1.3.2 (Moore). Let $X$ be a set and let $\kappa : X \times X \to \mathbb{R}$ be a positive definite kernel. Then there exists an RKHS $\mathcal{H}$ of functions on $X$, such that $\kappa$ is the reproducing kernel of $\mathcal{H}$.

Proof. We will give only a sketch of the proof. The interested reader is referred to [31]. The first step is to define $A = \operatorname{span}\{\kappa(\cdot, x),\ x \in X\}$ and the bilinear map $P : A \times A \to \mathbb{R}$ such that

\[ P\left( \sum_m a_m \kappa(\cdot, y_m), \sum_n b_n \kappa(\cdot, y_n) \right) = \sum_{n,m} a_m b_n \kappa(y_n, y_m). \]

We prove that $P$ is well defined and that it satisfies the properties of the inner product. Then, given the vector space $A$ and the inner product $P$, one may complete the space by taking equivalence classes of Cauchy sequences from $A$ to obtain the Hilbert space $\overline{A}$. Finally, the reproducing property of the kernel $\kappa$ with respect to the inner product $P$ is proved.

In view of the aforementioned theorems, the definition 1.2.1 of the RKHS given in 1.2, which is usually used in the machine learning literature, follows naturally.

We conclude this section with a short description of the most important points of the theory developed by Mercer in the context of integral operators. Mercer considered integral operators $T_\kappa$ generated by a kernel $\kappa$, i.e., $T_\kappa : L_2(X) \to L_2(X)$, such that $(T_\kappa f)(x) := \int_X \kappa(x,y) f(y)\, dy$. He concluded the following theorems [5]:

Theorem 1.3.3 (Mercer kernels are positive definite). Let $X \subseteq \mathbb{R}^\nu$ be a nonempty set and let $\kappa : X \times X \to \mathbb{R}$ be continuous. Then $\kappa$ is a positive definite kernel if and only if

\[ \int_a^b \int_a^b f(x) \kappa(x,y) f(y)\, dx\, dy \ge 0, \]

for all continuous functions $f$ on $X$.
Moreover, if $\kappa$ is positive definite, the integral operator $T_\kappa : L_2(X) \to L_2(X)$ : $(T_\kappa f)(x) := \int_X \kappa(x,y) f(y)\, dy$ is positive definite and if $\psi_i \in L_2(X)$ are the normalized orthogonal eigenfunctions of $T_\kappa$ associated with the eigenvalues $\lambda_i > 0$ then:

\[ \kappa(x,y) = \sum_i \lambda_i \psi_i(x) \psi_i(y). \]

Note that the original form of the above theorem is more general, involving $\sigma$-algebras and probability measures. However, as in the applications concerning this manuscript such general terms are of no importance, we decided to include this simpler form. The previous theorems established that Mercer's kernels, as they are positive definite kernels, are also reproducing kernels. Furthermore, the first part of theorem 1.3.3 provides a useful tool for determining whether a specific function is actually a reproducing kernel.

Before closing this section, we should emphasize that the general theory of RKHS has been developed by the mathematicians to treat complex spaces. However, for the sake of simplicity and clarity, we decided to begin with the simplest real case. Besides, most kernel-based methods involve real data sets. Nevertheless, keep in mind that all the theorems presented here can be generalized to treat complex spaces. We will explore this issue further in section 1.8.

1.4 Examples of Kernels

Before proceeding to some more advanced topics in the theory of RKHS, it is important to give some examples of kernels that appear more often in the literature and are used in various applications. Perhaps

the most widely used reproducing kernel is the Gaussian radial basis function defined on $X \times X$, where $X \subseteq \mathbb{R}^\nu$, as:

\[ \kappa_\sigma(x,y) = \exp\left( -\frac{\|x - y\|^2}{\sigma^2} \right), \tag{1.5} \]

where $\sigma > 0$. Equivalently the Gaussian RBF function can be defined as:

\[ \kappa_t(x,y) = \exp\left( -t \|x - y\|^2 \right), \tag{1.6} \]

for $t > 0$.

Figure 1.2: (a) The Gaussian kernel for the case $X = \mathbb{R}$, $\sigma = .5$. (b) The element $\Phi(0) = \kappa(\cdot, 0)$ of the feature space induced by the Gaussian kernel for various values of the parameter $\sigma$.

Other well-known kernels defined in $X \times X$, $X \subseteq \mathbb{R}^\nu$ are:

The homogeneous polynomial kernel: $\kappa_d(x,y) = \langle x, y \rangle^d$.

The inhomogeneous polynomial kernel: $\kappa_d(x,y) = (\langle x, y \rangle + c)^d$, where $c \ge 0$ is a constant.

The spline kernel: $\kappa_p(x,y) = B_{2p+1}(\|x - y\|)$, where $B_n = \bigotimes_{i=1}^{n} I_{[-\frac{1}{2}, \frac{1}{2}]}$.

The cosine kernel: $\kappa(x,y) = \cos(\angle(x,y))$.

The Laplacian kernel: $\kappa_t(x,y) = \exp(-t \|x - y\|)$.

Figures 1.2, 1.3, 1.4, 1.5, 1.6 show some of the aforementioned kernels together with a sample of the elements $\kappa(\cdot, x)$ that span the respective RKHS's for the case $X = \mathbb{R}$. Figures 1.7, 1.8, 1.9 show some of the elements $\kappa(\cdot, x)$ that span the respective RKHS's for the case $X = \mathbb{R}^2$. Interactive figures regarding the aforementioned examples can be found in http://bouboulis.mysch.gr/kernels.html.

1.5 Properties of RKHS

In this section, we will refer to some more advanced topics on the theory of RKHS, which are useful for a deeper understanding of the underlying theory and show why RKHS's constitute such a powerful tool. We begin our study with some properties of RKHS's and conclude with the basic theorems that enable us to generate new kernels. As we work in Hilbert spaces, the two Parseval's identities are an extremely helpful tool. When $\{e_s : s \in S\}$ (where $S$ is an arbitrary set) is an orthonormal basis for a Hilbert space $\mathcal{H}$, then for any $h \in \mathcal{H}$ we have that:

\[ h = \sum_{s \in S} \langle h, e_s \rangle e_s, \tag{1.7} \]

\[ \|h\|^2 = \sum_{s \in S} |\langle h, e_s \rangle|^2. \tag{1.8} \]

Figure 1.3: (a) The homogeneous polynomial kernel for the case $X = \mathbb{R}$, $d = 1$. (b) The element $\Phi(x_0) = \kappa(\cdot, x_0)$ of the feature space induced by the homogeneous polynomial kernel ($d = 1$) for various values of $x_0$.

Figure 1.4: (a) The homogeneous polynomial kernel for the case $X = \mathbb{R}$, $d = 2$. (b) The element $\Phi(x_0) = \kappa(\cdot, x_0)$ of the feature space induced by the homogeneous polynomial kernel ($d = 2$) for various values of $x_0$.

Figure 1.5: (a) The inhomogeneous polynomial kernel for the case $X = \mathbb{R}$, $d = 2$. (b) The element $\Phi(x_0) = \kappa(\cdot, x_0)$ of the feature space induced by the inhomogeneous polynomial kernel ($d = 2$) for various values of $x_0$.

Figure 1.6: (a) The Laplacian kernel for the case $X = \mathbb{R}$, $t = 1$. (b) The element $\Phi(0) = \kappa(\cdot, 0)$ of the feature space induced by the Laplacian kernel for various values of the parameter $t$.

Figure 1.7: The element $\Phi(0) = \kappa(\cdot, 0)$ of the feature space induced by the Gaussian kernel ($X = \mathbb{R}^2$) for various values of the parameter $\sigma$. (a) $\sigma = .5$, (b) $\sigma = .8$, (c) $\sigma = 1$, (d) $\sigma = 1.5$.

Figure 1.8: The element $\Phi(x_0) = \kappa(\cdot, x_0)$ of the feature space induced by the Gaussian kernel ($X = \mathbb{R}^2$) with $\sigma = .5$. (a) $x_0 = (0, 0)^T$, (b) $x_0 = (0, 1)^T$, (c) $x_0 = (1, 0)^T$, (d) $x_0 = (1, 1)^T$.

Figure 1.9: The element $\Phi(0) = \kappa(\cdot, 0)$ of the feature space induced by the Laplacian kernel ($X = \mathbb{R}^2$) for various values of the parameter $t$. (a) $t = .5$, (b) $t = 1$, (c) $t = 2$, (d) $t = 4$.
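The kernels of section 1.4 that appear in the figures above are short to write down in code. The following sketch uses plain Python lists as vectors; the default parameter values are illustrative only, the Gaussian kernel is written in its $t$-parameterized form, and the spline kernel is left out.

```python
import math

def dot(x, y):
    """Euclidean inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm ||x||."""
    return math.sqrt(dot(x, x))

def gaussian(x, y, t=1.0):
    """Gaussian kernel exp(-t * ||x - y||^2), t > 0."""
    return math.exp(-t * norm([a - b for a, b in zip(x, y)]) ** 2)

def laplacian(x, y, t=1.0):
    """Laplacian kernel exp(-t * ||x - y||), t > 0."""
    return math.exp(-t * norm([a - b for a, b in zip(x, y)]))

def poly_homogeneous(x, y, d=2):
    """Homogeneous polynomial kernel <x, y>^d."""
    return dot(x, y) ** d

def poly_inhomogeneous(x, y, d=2, c=1.0):
    """Inhomogeneous polynomial kernel (<x, y> + c)^d."""
    return (dot(x, y) + c) ** d

def cosine(x, y):
    """Cosine of the angle between two non-zero vectors."""
    return dot(x, y) / (norm(x) * norm(y))

x, y = [1.0, 0.0], [0.0, 1.0]
values = {
    "gaussian": gaussian(x, y),
    "laplacian": laplacian(x, y),
    "poly_homogeneous": poly_homogeneous(x, y),
    "poly_inhomogeneous": poly_inhomogeneous(x, y),
    "cosine": cosine(x, y),
}
print(values)
```

Fixing the second argument, for instance `lambda x: gaussian(x, [0.0, 0.0])`, gives the element $\kappa(\cdot, x_0)$ of the feature space that the figures plot.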

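Note that these two identities hold for a general arbitrary set $S$ (not necessarily ordered). The convergence in this case is defined somewhat differently. We say that $h = \sum_{s \in S} h_s$, if for any $\epsilon > 0$, there exists a finite subset $F_0 \subseteq S$, such that for any finite set $F$: $F_0 \subseteq F \subseteq S$, we have that $\left\| h - \sum_{s \in F} h_s \right\| < \epsilon$.

Proposition 1.5.1 (Cauchy-Schwarz Inequality). If $\kappa$ is a reproducing kernel on $X$ then

\[ |\kappa(x,y)|^2 \le \kappa(x,x)\, \kappa(y,y). \]

Proof. The proof is straightforward, as $\kappa(x,y)$ is the inner product $\langle \Phi(y), \Phi(x) \rangle_{\mathcal{H}}$ of the space $\mathcal{H}(\kappa)$.

Theorem 1.5.1. Every finite dimensional class of functions defined on $X$, equipped with an inner product, is an RKHS. Let $h_1, \dots, h_N$ constitute a basis of the space and let the inner product be defined as follows

\[ \langle f, g \rangle = \sum_{n,m=1}^{N} \alpha_{n,m} \gamma_n \zeta_m, \]

for $f = \sum_{n=1}^{N} \gamma_n h_n$ and $g = \sum_{n=1}^{N} \zeta_n h_n$. Let $A = (\alpha_{n,m})^N$ and $B = (\beta_{n,m})^N$ be its inverse. Then the kernel of the RKHS is given by

\[ \kappa(x,y) = \sum_{n,m=1}^{N} \beta_{n,m} h_n(x) h_m(y). \tag{1.9} \]

Proof. The reproducing property is immediately verified by equation 1.9:

\[ \langle f, \kappa(\cdot, x) \rangle_{\mathcal{H}} = \left\langle \sum_{n=1}^{N} \gamma_n h_n, \sum_{m,k=1}^{N} \beta_{m,k} h_k(x) h_m \right\rangle_{\mathcal{H}} = \sum_{n,m=1}^{N} \alpha_{n,m} \gamma_n \sum_{k=1}^{N} \beta_{m,k} h_k(x) = \sum_{n,k=1}^{N} \left( \sum_{m=1}^{N} \alpha_{n,m} \beta_{m,k} \right) \gamma_n h_k(x) = \sum_{n=1}^{N} \gamma_n h_n(x) = f(x). \]

The following theorem gives the kernel of an RKHS (of finite or infinite dimension) in terms of the elements of an orthonormal basis.

Theorem 1.5.2. Let $\mathcal{H}$ be an RKHS on $X$ with reproducing kernel $\kappa$. If $\{e_s : s \in S \subseteq \mathbb{N}\}$ is an orthonormal basis for $\mathcal{H}$, then $\kappa(x,y) = \sum_{s \in S} e_s(y) e_s(x)$, where this series converges pointwise.

Proof. For any $y \in X$ we have that $\langle \kappa(\cdot, y), e_s \rangle_{\mathcal{H}} = \langle e_s, \kappa(\cdot, y) \rangle_{\mathcal{H}} = e_s(y)$. Hence, employing Parseval's identity (1.7), we have that $\kappa(\cdot, y) = \sum_{s \in S} e_s(y) e_s(\cdot)$, where these sums converge in the norm on $\mathcal{H}$. Since the sums converge in the norm, they converge at every point. Hence, $\kappa(x,y) = \sum_{s \in S} e_s(y) e_s(x)$.

Proposition 1.5.2. If $\mathcal{H}$ is an RKHS on $X$ with respective kernel $\kappa$ then every closed subspace $F \subseteq \mathcal{H}$ is also an RKHS. In addition, if $F_1(\kappa_1)$ and $F_2(\kappa_2)$ are complementary subspaces of $\mathcal{H}$ then $\kappa = \kappa_1 + \kappa_2$.

Proposition 1.5.3. Let $\mathcal{H}$ be an RKHS on $X$ with kernel $\kappa$ and let $\{g_n\}$ be an orthonormal system in $\mathcal{H}$.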
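Then for any sequence of numbers $\{a_n\}$ such that $\sum_n a_n^2 < \infty$ (i.e., $\{a_n\} \in \ell^2$) we have

\[ \sum_n |a_n| \, |g_n(x)| \le \kappa(x,x)^{\frac{1}{2}} \left( \sum_n |a_n|^2 \right)^{\frac{1}{2}}. \]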

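Proof. We have seen that $g_n(y) = \langle g_n, \kappa(\cdot, y) \rangle$ and that $\|\kappa(\cdot, y)\|^2_{\mathcal{H}} = \kappa(y,y)$. Thus, considering that the $g_n$'s are orthonormal and taking Parseval's identity (1.8) for $\kappa(\cdot, y)$ with respect to the orthonormal basis we have:

\[ \sum_n |g_n(y)|^2 = \sum_n |\langle g_n, \kappa(\cdot, y) \rangle_{\mathcal{H}}|^2 = \|\kappa(\cdot, y)\|^2_{\mathcal{H}} = \kappa(y,y). \]

Therefore, applying the Cauchy-Schwarz inequality we take

\[ \sum_n |a_n| \, |g_n(x)| \le \left( \sum_n |a_n|^2 \right)^{\frac{1}{2}} \left( \sum_n |g_n(x)|^2 \right)^{\frac{1}{2}} \le \kappa(x,x)^{\frac{1}{2}} \left( \sum_n |a_n|^2 \right)^{\frac{1}{2}}. \]

Theorem 1.5.3 (Representer Theorem). Denote by $\Omega : [0, +\infty) \to \mathbb{R}$ a strictly monotonic increasing function, by $X$ a nonempty set and by $L : (X \times \mathbb{R}^2)^N \to \mathbb{R} \cup \{\infty\}$ an arbitrary loss function. Then each minimizer $f \in \mathcal{H}$ of the regularized minimization problem:

\[ \min_f L\big( (x_1, y_1, f(x_1)), \dots, (x_N, y_N, f(x_N)) \big) + \Omega(\|f\|_{\mathcal{H}}), \]

admits a representation of the form $f = \sum_{n=1}^{N} a_n \kappa(\cdot, x_n)$.

Proof. We may decompose each $f \in \mathcal{H}$ into a part contained in the span of the kernels centered at the training points, i.e., $\kappa(\cdot, x_1), \dots, \kappa(\cdot, x_N)$ (which is a closed linear subspace), and a part in the orthogonal complement of the previous span. Thus each $f$ can be written as:

\[ f = \sum_{n=1}^{N} a_n \kappa(\cdot, x_n) + f_\perp. \]

Applying the reproducing property and considering that $\langle f_\perp, \kappa(\cdot, x_n) \rangle_{\mathcal{H}} = 0$, for $n = 1, \dots, N$, we take:

\[ f(x_n) = \langle f, \kappa(\cdot, x_n) \rangle_{\mathcal{H}} = \sum_{i=1}^{N} a_i \kappa(x_n, x_i) + \langle f_\perp, \kappa(\cdot, x_n) \rangle_{\mathcal{H}} = \sum_{i=1}^{N} a_i \kappa(x_n, x_i). \]

Thus, the value of the loss function $L$ depends only on the part contained in the span of the kernels centered at the training points, i.e., on $a_1, \dots, a_N$. Furthermore, for all $f_\perp$ we have:

\[ \Omega(\|f\|) = \Omega\left( \left( \left\| \sum_{n=1}^{N} a_n \kappa(\cdot, x_n) \right\|^2 + \|f_\perp\|^2_{\mathcal{H}} \right)^{\frac{1}{2}} \right) \ge \Omega\left( \left\| \sum_{n=1}^{N} a_n \kappa(\cdot, x_n) \right\| \right). \]

Thus, for any fixed $a_1, \dots, a_N$ the value of the cost function is minimized for $f_\perp = 0$. Hence, the solution of the minimization task will have to obey this property too.

Examples of loss functions $L$ such as the ones mentioned in Theorem 1.5.3 are the MSE:

\[ L\big( (x_1, y_1, f(x_1)), \dots, (x_N, y_N, f(x_N)) \big) = \sum_{n=1}^{N} (f(x_n) - y_n)^2, \]

and the $\ell_1$ mean error

\[ L\big( (x_1, y_1, f(x_1)), \dots, (x_N, y_N, f(x_N)) \big) = \sum_{n=1}^{N} |f(x_n) - y_n|. \]

The aforementioned theorem is of great importance to practical applications. Although one might be trying to solve an optimization task in an infinite dimensional RKHS $\mathcal{H}$ (such as the one that is generated by the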

14 CHAPTER 1. REPRODUCING KERNEL HILBERT SPACES 55 5 45 55 5 45 4 35..4.6.8 1. (a)..4.6.8 1. (b) 55 5 45 55 5 45..4.6.8 1. (c)..4.6.8 1. (d) 1 Figure 1.1: Solvig the regressio problem mi N f N =1 (y f(x )) + λ f H, o a set of 11 poits (a), (c) with a bias, i.e., f admits the form of (1.1) ad (b), (d) without a bias, i.e., f admits the form of (1.11). I (a) ad (b) we set σ =.15, λ =.7. I (c) ad (d) we set σ =.15, λ =.1. Observe that for λ =.1, the ubiased solutio takes values sigificatly lower compared to the values of the traiig poits. For the smaller λ =.1, the differece betwee (c) ad (d) is reduced (compared to the case λ =.7). However, oe may observe that the ubiased solutio (d) is ot as smooth as the biased solutio (c), especially ear ad 1. Gaussia kerel), the Represeter Theorem states that the solutio of the problem lies i the spa of N particular kerels, those cetered o the traiig poits. I practice, we ofte iclude a bias factor to the solutio of kerel-based regularized miimizatio tasks, that is, we assume that f admits a represetatio of the form f = θ κ(,x )+b, (1.1) =1 where b R. This has bee show to improve the performace of the respective algorithms [43, 36], for two mai reasos. Firstly, the itroductio of the bias, b, elarges the family of fuctios i which we search for a solutio, thus leadig to potetially better estimatios. Moreover, as the regularizatio factor Ω( f H ) pealizes the values of f at the traiig poits, the resultig solutio teds to take values as close to zero as possible, for large values of λ (see figure 1.1). The use of the bias factor is theoretically justified by the semi-parametric represeter theorem. Theorem 1.5.4 (Semi-parametric Represeter Theorem). Suppose that i additio to the assumptios of theorem 1.5.3, we are give a set of M real valued fuctios {ψ m } M m=1 : X R, with the property that the N M matrix (ψ m (x )),m has rak M. 
The ay f := f+h, with f H ad h spa{ψ m ; m = 1,...,M}, solvig mi f L((x 1,y 1, f(x 1 )),...,(x N,y N, f(x N ))+Ω( f H ), admits a represetatio of the form M f = θ κ(,x )+ b m ψ m ( ), (1.11) =1 m=1 with θ R, b m R, for all = 1,dots,N, m = 1,...,M. The followig results ca be used for the costructio of ew kerels.
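Returning to the biased representation (1.10), the following minimal sketch fits the regularized least-squares problem with and without a bias on a handful of points. It is an illustration under stated assumptions, not the procedure used to produce Figure 1.1: the Gaussian kernel is taken as $\exp(-(x-y)^2/(2\sigma^2))$, the data and the values of `sigma` and `lam` are hypothetical, and the helper names (`solve`, `fit`, `predict`, `demo_data`) are invented for this example. Without a bias, the expansion coefficients satisfy $(K + \lambda N I)\theta = y$; with a bias, the same system is augmented by the constraint $\sum_n \theta_n = 0$.

```python
import math


def gaussian_kernel(x, y, sigma):
    # Assumed parameterization: exp(-(x - y)^2 / (2 sigma^2)).
    return math.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def fit(xs, ys, sigma, lam, with_bias):
    """Return (theta, b) minimizing (1/N) sum (y_n - f(x_n))^2 + lam * ||f||_H^2."""
    n = len(xs)
    K = [[gaussian_kernel(a, b, sigma) for b in xs] for a in xs]
    A = [[K[i][j] + (lam * n if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    if not with_bias:
        return solve(A, list(ys)), 0.0
    # Augmented system: (K + lam*N*I) theta + b*1 = y and sum(theta) = 0.
    A = [row + [1.0] for row in A] + [[1.0] * n + [0.0]]
    z = solve(A, list(ys) + [0.0])
    return z[:n], z[n]


def predict(x, xs, theta, b, sigma):
    return sum(t * gaussian_kernel(x, xn, sigma) for t, xn in zip(theta, xs)) + b


def demo_data():
    # Hypothetical training set: 11 points on [0, 1] with values around 50.
    xs = [i / 10.0 for i in range(11)]
    ys = [50.0 + 5.0 * math.sin(2.0 * math.pi * x) for x in xs]
    return xs, ys
```

For a strongly regularized fit (large `lam`), comparing `predict` at a training point for the two variants shows the effect described above: the unbiased solution is pulled towards zero, while the biased one stays near the level of the data.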

Proposition 1.5.4 (Conformal Transformations). If $f : X \to \mathbb{R}$ is any function, then $\kappa_1(x,y) = f(x)f(y)$ is a reproducing kernel. Moreover, if $\kappa$ is any other reproducing kernel then $\kappa_2(x,y) = f(x)\kappa(x,y)f(y)$ is also a reproducing kernel.

Proof. The first part is a direct consequence of theorem 1.5.1. For the second part, consider $x_1,\dots,x_N \in X$ and $a_1,\dots,a_N \in \mathbb{R}$. Then
$$\sum_{n,m=1}^N a_n a_m f(x_n)\kappa(x_n,x_m)f(x_m) = \sum_{n,m=1}^N a_n a_m f(x_n) f(x_m) \langle \Phi(x_m), \Phi(x_n)\rangle_H = \Big\langle \sum_{m=1}^N a_m f(x_m)\Phi(x_m), \sum_{n=1}^N a_n f(x_n)\Phi(x_n)\Big\rangle_H = \Big\|\sum_{n=1}^N a_n f(x_n)\Phi(x_n)\Big\|_H^2 \ge 0.$$
Moreover, as
$$\cos(\angle(\Phi_2(x),\Phi_2(y))) = \frac{f(x)\kappa(x,y)f(y)}{\sqrt{f(x)\kappa(x,x)f(x)}\,\sqrt{f(y)\kappa(y,y)f(y)}} = \frac{\kappa(x,y)}{\sqrt{\kappa(x,x)}\,\sqrt{\kappa(y,y)}} = \cos(\angle(\Phi(x),\Phi(y))),$$
this transformation of the original kernel preserves angles in the feature space.

Theorem 1.5.5 (Restriction of a kernel). Let $H$ be a RKHS on $X$ with respective kernel $\kappa$. Then $\kappa$ restricted to the set $X_1 \subset X$ is the reproducing kernel of the class $H_1$ of all restrictions of functions of $H$ to the subset $X_1$. The norm of any such restricted function $f_1 \in H_1$ (originating from $f \in H$) is
$$\|f_1\|_{H_1} = \min\{\|f\|_H,\ f \in H : f|_{X_1} = f_1\}.$$

Proposition 1.5.5 (Normalization of a kernel). Let $H$ be a RKHS on $X$ with respective kernel $\kappa$. Then
$$\hat\kappa(x,y) = \frac{\kappa(x,y)}{\sqrt{\kappa(x,x)\kappa(y,y)}}, \qquad (1.12)$$
is also a positive definite kernel on $X$. Note that $|\hat\kappa(x,y)| \le 1$, for all $x,y \in X$.

Proof. Let $x_1,x_2,\dots,x_N \in X$ and $c_1,\dots,c_N$ be real numbers. Then
$$\sum_{n,m=1}^N c_n c_m \hat\kappa(x_n,x_m) = \sum_{n,m=1}^N \frac{c_n c_m\,\kappa(x_n,x_m)}{\sqrt{\kappa(x_n,x_n)\kappa(x_m,x_m)}} = \sum_{n,m=1}^N \frac{c_n}{\sqrt{\kappa(x_n,x_n)}}\,\frac{c_m}{\sqrt{\kappa(x_m,x_m)}}\,\kappa(x_n,x_m) \ge 0,$$
as $\kappa$ is a positive definite kernel.

Theorem 1.5.6 (Sum of kernels). Let $H_1, H_2$ be two RKHS's on $X$ with respective kernels $\kappa_1, \kappa_2$. Then $\kappa = \kappa_1 + \kappa_2$ is also a reproducing kernel. The corresponding RKHS, $H$, contains the functions $f = f_1 + f_2$, where $f_i \in H_i$, $i = 1,2$. The respective norm is defined by
$$\|f\|_H^2 = \min\{\|f_1\|_{H_1}^2 + \|f_2\|_{H_2}^2,\ \text{for all } f = f_1 + f_2,\ f_i \in H_i,\ i = 1,2\}.$$

Proof. It is trivial to show that $\kappa_1 + \kappa_2$ is a positive definite kernel. The difficult part is to associate this kernel with the specific RKHS $H$. Consider the Hilbert space $F = H_1 \times H_2$. The respective inner product and the corresponding norm are defined as
$$\langle (f_1,f_2),(g_1,g_2)\rangle_F = \langle f_1,g_1\rangle_{H_1} + \langle f_2,g_2\rangle_{H_2}, \qquad \|(f_1,f_2)\|_F^2 = \|f_1\|_{H_1}^2 + \|f_2\|_{H_2}^2,$$
for $f_1,g_1 \in H_1$ and $f_2,g_2 \in H_2$. If $H_1$ and $H_2$ have only $0$ in common, it is easy to show that there is a one-to-one correspondence between $F$ and $H = H_1 + H_2$, as each $f \in H$ can be decomposed into two parts (one belonging to $H_1$ and the other to $H_2$) uniquely. The difficult part is to discover such a relation if $H_0 = H_1 \cap H_2$ is larger than $\{0\}$.

To make this fact clear, consider this simple example: Let $H_1$ and $H_2$ be the linear classes of polynomials of orders up to 1 and up to 2 respectively. Obviously, $H = H_1 + H_2 = H_2$, as $H_1 \subset H_2$. Let $f(x) = x^2 + 5x$, $f \in H$. Then $f$ can be decomposed into two parts (one belonging to $H_1$ and the other to $H_2$) in more than one way. For example $f(x) = (x^2) + (5x)$, or $f(x) = (x^2 + 4x) + (x)$, or $f(x) = (x^2 + 2x) + (3x)$, etc. Thus, the mapping between $f = f_1 + f_2 \in H$ and $(f_1,f_2) \in F$ is not one-to-one.

However, in such cases, we can still find a smaller subspace of $F$, which can be identified with $H$. To this end, define $F_0 = \{(f,-f),\ f \in H_0\}$. It is clear that $F_0$ is a linear subspace of $F$. We will show that it is a closed one. Consider a converging sequence in $F_0$: $(f_n,-f_n) \to (\bar f_1,\bar f_2)$. Then $f_n \to \bar f_1$ and $-f_n \to \bar f_2$. Since norm convergence in a RKHS implies pointwise convergence, $\bar f_1 = -\bar f_2$ and $(\bar f_1,\bar f_2)$ is in $F_0$. As $F_0$ is a closed linear subspace of $F$, we may consider its complementary subspace $F_0^\perp$: $F = F_0 \oplus F_0^\perp$.

As a next step, consider the linear transformation $T : F \to H$ : $T(f_1,f_2) = f_1 + f_2$. The kernel of this transformation is the subspace $F_0$. Hence, there is a one-to-one correspondence between $F_0^\perp$ and $H$. Consider the inverse transformation $T^{-1}$ (of the restriction of $T$ to $F_0^\perp$) and let $T^{-1}(f) = (f',f'')$, for $f \in H$, where $f' \in H_1$ and $f'' \in H_2$, i.e., through $T^{-1}$ we decompose $f$ uniquely into two components, one in $H_1$ and the other in $H_2$.
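This decomposition enables us to define an inner product in $H$, i.e.,
$$\langle f,g\rangle_H = \langle f'+f'', g'+g''\rangle_H = \langle f',g'\rangle_{H_1} + \langle f'',g''\rangle_{H_2} = \langle (f',f''),(g',g'')\rangle_F,$$
for $f,g \in H$, where $(f',f'')$ and $(g',g'')$ denote the unique decompositions of $f$ and $g$ through $T^{-1}$. To prove that to this $H$ there corresponds the kernel $\kappa = \kappa_1 + \kappa_2$, we make the following remarks:

1. For every $y \in X$, $\kappa(\cdot,y) = \kappa_1(\cdot,y) + \kappa_2(\cdot,y) \in H$.

2. For every $y \in X$, let $T^{-1}(\kappa(\cdot,y)) = (\kappa'(\cdot,y),\kappa''(\cdot,y))$. Thus $\kappa(x,y) = \kappa'(x,y) + \kappa''(x,y) = \kappa_1(x,y) + \kappa_2(x,y)$, and consequently $\kappa_1(x,y) - \kappa'(x,y) = -(\kappa_2(x,y) - \kappa''(x,y))$. This means that $(\kappa_1(\cdot,y) - \kappa'(\cdot,y),\ \kappa_2(\cdot,y) - \kappa''(\cdot,y)) \in F_0$. Hence, for every $y \in X$ we have
$$f(y) = f'(y) + f''(y) = \langle f',\kappa_1(\cdot,y)\rangle_{H_1} + \langle f'',\kappa_2(\cdot,y)\rangle_{H_2} = \langle (f',f''),(\kappa_1(\cdot,y),\kappa_2(\cdot,y))\rangle_F$$
$$= \langle (f',f''),(\kappa'(\cdot,y),\kappa''(\cdot,y))\rangle_F + \langle (f',f''),(\kappa_1(\cdot,y) - \kappa'(\cdot,y),\ \kappa_2(\cdot,y) - \kappa''(\cdot,y))\rangle_F.$$
As $(\kappa_1(\cdot,y) - \kappa'(\cdot,y),\ \kappa_2(\cdot,y) - \kappa''(\cdot,y)) \in F_0$ and $(f',f'') \in F_0^\perp$, we conclude that
$$f(y) = \langle (f',f''),(\kappa'(\cdot,y),\kappa''(\cdot,y))\rangle_F = \langle f'+f'',\ \kappa'(\cdot,y)+\kappa''(\cdot,y)\rangle_H = \langle f,\kappa(\cdot,y)\rangle_H.$$
This is the reproducing property.

Finally, to prove the last part of the theorem, consider again $f \in H$ and let $f_i \in H_i$, $i = 1,2$, such that $f = f_1 + f_2$, and let $f' \in H_1$ and $f'' \in H_2$ be the unique decomposition of $f$ through $T^{-1}$. As $f_1 + f_2 = f' + f''$ we obtain that $f' - f_1 = -(f'' - f_2)$, which implies that $(f' - f_1, f'' - f_2) \in F_0$. Thus, we obtain:
$$\|f_1\|_{H_1}^2 + \|f_2\|_{H_2}^2 = \|(f_1,f_2)\|_F^2 = \|(f',f'')\|_F^2 + \|(f_1 - f', f_2 - f'')\|_F^2 = \|f'\|_{H_1}^2 + \|f''\|_{H_2}^2 + \|(f_1 - f', f_2 - f'')\|_F^2 = \|f\|_H^2 + \|(f_1 - f', f_2 - f'')\|_F^2.$$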

From the last relation we conclude that $\|f\|_H^2 = \|f_1\|_{H_1}^2 + \|f_2\|_{H_2}^2$ if and only if $f_1 = f'$ and $f_2 = f''$. In this case we obtain the minimum value of $\|f_1\|_{H_1}^2 + \|f_2\|_{H_2}^2$ over all possible decompositions $f = f_1 + f_2$. This completes the proof.

Besides the sum of kernels, other operations preserve reproducing kernels as well. Below, we give an extensive list of such operations. For a description of the induced RKHS and a formal proof (in the cases that are not considered here) the interested reader may refer to [, 37].

1. If $\kappa(x,y)$ is a positive definite kernel on $X$, then $\lambda\kappa(x,y)$ is also a positive definite kernel for any $\lambda \ge 0$. It is obvious that in this case $H(\lambda\kappa) = H(\kappa)$, if $\lambda > 0$. If $\lambda = 0$, then $H(0) = \{0\}$.
2. If $\kappa_1(x,y)$ and $\kappa_2(x,y)$ are positive definite kernels on $X$, then $\kappa_1(x,y) + \kappa_2(x,y)$ is also a positive definite kernel, as Theorem 1.5.6 established.
3. If $\kappa_1(x,y)$ and $\kappa_2(x,y)$ are positive definite kernels on $X$, then $\kappa_1(x,y)\cdot\kappa_2(x,y)$ is also a positive definite kernel.
4. If $\kappa_n(x,y)$ are positive definite kernels on $X$, such that $\lim_n \kappa_n(x,y) = \kappa(x,y)$, for all $x,y \in X$, then $\kappa(x,y)$ is also a positive definite kernel.
5. If $\kappa(x,y)$ is a positive definite kernel on $X$ and $p(z)$ is a polynomial with non-negative coefficients, then $p(\kappa(x,y))$ is also a positive definite kernel.
6. If $\kappa(x,y)$ is a positive definite kernel on $X$, then $e^{\kappa(x,y)}$ is also a positive definite kernel. To prove this, consider the Taylor expansion of $e^z$, which may be considered as a limit of polynomials with non-negative coefficients.
7. If $\kappa(x,y)$ is a positive definite kernel on $X$ and $\Psi : X' \to X$ is a function, then $\kappa(\Psi(x),\Psi(y))$ is a positive definite kernel on $X'$.
8. If $\kappa_1(x,y)$ and $\kappa_2(x',y')$ are positive definite kernels on $X$ and $X'$ respectively, then their tensor product $(\kappa_1 \otimes \kappa_2)(x,y,x',y') = \kappa_1(x,y)\kappa_2(x',y')$ is a kernel on $X \times X'$.
9. If $\kappa_1(x,y)$ and $\kappa_2(x',y')$ are positive definite kernels on $X$ and $X'$ respectively, then their direct sum $(\kappa_1 \oplus \kappa_2)(x,y,x',y') = \kappa_1(x,y) + \kappa_2(x',y')$ is a kernel on $X \times X'$.
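Several of the closure properties listed above can be spot-checked numerically. The sketch below builds a few derived kernels from two base kernels and evaluates the quadratic form $\sum_{n,m} c_n c_m \kappa(x_n,x_m)$ on random points and coefficients. The base kernels, the constants and all function names are choices made for this illustration only, and a non-negative result on random samples is a sanity check, not a proof of positive definiteness.

```python
import math
import random


def k_poly(x, y):
    # Inhomogeneous polynomial kernel (1 + <x, y>)^2, used as a base kernel.
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** 2


def k_gauss(x, y, sigma=1.0):
    # Gaussian kernel, assumed parameterization exp(-||x - y||^2 / (2 sigma^2)).
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def warp(x):
    # An arbitrary map Psi applied to the inputs (item 7 of the list).
    return [math.sin(v) for v in x]


DERIVED = {
    "scaled": lambda x, y: 3.0 * k_gauss(x, y),
    "sum": lambda x, y: k_poly(x, y) + k_gauss(x, y),
    "product": lambda x, y: k_poly(x, y) * k_gauss(x, y),
    "polynomial": lambda x, y: 1.0 + 2.0 * k_gauss(x, y) + k_gauss(x, y) ** 3,
    "exponential": lambda x, y: math.exp(k_gauss(x, y)),
    "composition": lambda x, y: k_poly(warp(x), warp(y)),
    "normalized": lambda x, y: k_poly(x, y) / math.sqrt(k_poly(x, x) * k_poly(y, y)),
}


def min_quadratic_form(kernel, trials=200, n_points=6, dim=2, seed=0):
    """Smallest value of sum_{n,m} c_n c_m kernel(x_n, x_m) over random draws."""
    rng = random.Random(seed)
    smallest = math.inf
    for _ in range(trials):
        pts = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
        c = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
        q = sum(c[n] * c[m] * kernel(pts[n], pts[m])
                for n in range(n_points) for m in range(n_points))
        smallest = min(smallest, q)
    return smallest
```

Calling `min_quadratic_form(DERIVED[name])` for each entry should return a value that is non-negative up to rounding error, in line with the corresponding item of the list.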
1.6 Dot product and translation invariant kernels

There are two important classes of kernels that follow certain rules and are widely used in practice. The first one includes the dot product kernels, which are functions defined as $\kappa(x,y) = f(\langle x,y\rangle)$, for some real function $f$. The second class are the translation invariant kernels, which are defined as $\kappa(x,y) = f(x - y)$, for some real function $f$ defined on $X$. The following theorems establish necessary and sufficient conditions for such functions to be reproducing kernels.

Theorem 1.6.1 (Power Series of dot product kernels). Let $f : \mathbb{R} \to \mathbb{R}$. A function $\kappa(x,y) = f(\langle x,y\rangle)$ defined on $X$, such that $f$ has the power series expansion $f(t) = \sum_n a_n t^n$, is a positive definite kernel if and only if we have $a_n \ge 0$ for all $n$.

Theorem 1.6.2 (Bochner's-Fourier Criterion for translation invariant kernels). Let $f : X \to \mathbb{R}$. A function $\kappa(x,y) = f(x - y)$ defined on $X \subseteq \mathbb{R}^\nu$ is a positive definite kernel, if the Fourier transform
$$F[\kappa](\omega) = (2\pi)^{-\nu/2} \int_X e^{-i\langle \omega,x\rangle} f(x)\,dx$$
is non-negative.

Remark 1.6.1. Bochner's theorem is more general, involving Borel measures and topological spaces. For the sake of simplicity we give only this simple form.

Employing the tools provided in this section, one can readily prove the positivity of some of the kernels given in section 1.4. For example:

- Homogeneous Polynomial Kernel: As $\langle x,y\rangle$ is a positive definite kernel and $p(z) = z^d$ is a polynomial with non-negative coefficients, $p(\langle x,y\rangle) = (\langle x,y\rangle)^d$ is a positive definite kernel.
- Inhomogeneous Polynomial Kernel: As $\langle x,y\rangle$ is a positive definite kernel, and $p(z) = (z + c)^d$ is a polynomial with non-negative coefficients (for positive $c$), $p(\langle x,y\rangle) = (c + \langle x,y\rangle)^d$ is a positive definite kernel.
- The cosine kernel: Note that $\cos(\angle(x,y)) = \frac{\langle x,y\rangle}{\|x\|\,\|y\|}$. Thus the cosine kernel is the normalization of the simple kernel $\langle x,y\rangle$.

To prove that the Gaussian and the Laplacian are positive kernels we need another set of tools. This is the topic of the next section.

1.7 The Gaussian kernel and other translation invariant kernels

As the Gaussian kernel is the most widely used in applications, we dedicate this section to presenting some of its most important properties. We begin our study by showing that the Gaussian radial basis function is indeed a reproducing kernel. To this end, we introduce some new notions.

Definition 1.7.1 (Negative Definite Kernel). Let $X$ be a set. A function $\kappa : X \times X \to \mathbb{R}$ is called a negative definite kernel if it is symmetric, i.e., $\kappa(y,x) = \kappa(x,y)$, and
$$\sum_{n,m=1}^N c_n c_m \kappa(x_n,x_m) \le 0,$$
for any $x_1,\dots,x_N \in X$ and $c_1,\dots,c_N \in \mathbb{R}$, with $\sum_{n=1}^N c_n = 0$.

Examples of negative kernels are the constant functions and all functions of the form $-\kappa$, where $\kappa$ is a positive definite kernel. Furthermore, the following proposition holds:

Proposition 1.7.1. Let $X$ be a nonempty set, the functions $\psi_k : X \times X \to \mathbb{R}$ be negative kernels and $\alpha_k > 0$, for $k \in \mathbb{N}$. Then

- Any positive combination of a finite number of negative kernels is also a negative kernel, i.e., $\psi = \sum_{k=1}^K \alpha_k \psi_k$, with $\alpha_1,\dots,\alpha_K > 0$, is a negative kernel.
- The limit of any converging sequence of negative kernels is also a negative kernel, i.e., if $\psi(x,y) = \lim_k \psi_k(x,y)$, for all $x,y \in X$, then $\psi$ is a negative kernel.

Proof. For the first part, consider the numbers $c_1,\dots,c_N$ such that $\sum_{n=1}^N c_n = 0$, $x_1,\dots,x_N \in X$ and $K \in \mathbb{N}$. Then
$$\sum_{n,m=1}^N c_n c_m \sum_{k=1}^K \alpha_k \psi_k(x_n,x_m) = \sum_{k=1}^K \alpha_k \sum_{n,m=1}^N c_n c_m \psi_k(x_n,x_m) \le 0.$$
Finally, to prove the second part we write:
$$\sum_{n,m=1}^N c_n c_m \psi(x_n,x_m) = \sum_{n,m=1}^N c_n c_m \lim_k \psi_k(x_n,x_m) = \lim_k \sum_{n,m=1}^N c_n c_m \psi_k(x_n,x_m) \le 0.$$
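Definition 1.7.1 and the first part of Proposition 1.7.1 can be illustrated with a small numerical experiment. The sketch below evaluates the quadratic form for coefficient vectors that sum to zero, using three candidate negative kernels: the squared Euclidean distance (treated in the lemma that follows in the text), the negative of a Gaussian kernel, and a positive combination of the two. The kernels, weights and function names are hypothetical choices for this example, and the random test is a sanity check rather than a proof.

```python
import math
import random


def sq_dist(x, y):
    # Squared Euclidean distance ||x - y||^2.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def neg_gauss(x, y):
    # Negative of a positive definite (Gaussian) kernel.
    return -math.exp(-sq_dist(x, y) / 2.0)


def combo(x, y):
    # Positive combination of two negative kernels.
    return 2.0 * sq_dist(x, y) + 0.5 * neg_gauss(x, y)


def max_quadratic_form(psi, trials=200, n_points=6, dim=2, seed=0):
    """Largest value of sum_{n,m} c_n c_m psi(x_n, x_m) with sum(c) = 0."""
    rng = random.Random(seed)
    largest = -math.inf
    for _ in range(trials):
        pts = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
        c = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
        mean = sum(c) / n_points
        c = [v - mean for v in c]  # enforce the constraint sum(c) = 0
        q = sum(c[n] * c[m] * psi(pts[n], pts[m])
                for n in range(n_points) for m in range(n_points))
        largest = max(largest, q)
    return largest
```

For each of the three functions, `max_quadratic_form` should return a value that is non-positive up to rounding error.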

Lemma 1.7.1. Let $X$ be a nonempty set, $V$ be a vector space equipped with an inner product and $T : X \to V$. Then the function
$$\psi(x,y) = \|T(x) - T(y)\|_V^2$$
is a negative definite kernel on $X$.

Proof. Consider the numbers $c_1,\dots,c_N$ such that $\sum_{n=1}^N c_n = 0$ and $x_1,\dots,x_N \in X$. Then
$$\sum_{n,m=1}^N c_n c_m \|T(x_n) - T(x_m)\|_V^2 = \sum_{n,m=1}^N c_n c_m \langle T(x_n) - T(x_m),\ T(x_n) - T(x_m)\rangle_V$$
$$= \sum_{n,m=1}^N c_n c_m \big(\|T(x_n)\|_V^2 + \|T(x_m)\|_V^2 - \langle T(x_n),T(x_m)\rangle_V - \langle T(x_m),T(x_n)\rangle_V\big)$$
$$= \sum_{m=1}^N c_m \sum_{n=1}^N c_n \|T(x_n)\|_V^2 + \sum_{n=1}^N c_n \sum_{m=1}^N c_m \|T(x_m)\|_V^2 - \Big\langle \sum_{n=1}^N c_n T(x_n), \sum_{m=1}^N c_m T(x_m)\Big\rangle_V - \Big\langle \sum_{m=1}^N c_m T(x_m), \sum_{n=1}^N c_n T(x_n)\Big\rangle_V.$$
As $\sum_{n=1}^N c_n = 0$, the first two terms of the summation vanish and we obtain:
$$\sum_{n,m=1}^N c_n c_m \|T(x_n) - T(x_m)\|_V^2 = -2\Big\|\sum_{n=1}^N c_n T(x_n)\Big\|_V^2 \le 0.$$
Thus $\psi$ is a negative definite kernel.

Lemma 1.7.2. Let $\psi : X \times X \to \mathbb{R}$ be a function. Fix $x_0 \in X$ and define
$$\kappa(x,y) = -\psi(x,y) + \psi(x,x_0) + \psi(x_0,y) - \psi(x_0,x_0).$$
Then $\psi$ is a negative definite kernel if and only if $\kappa$ is a positive definite kernel.

Proof. Let $x_1,\dots,x_N \in X$. For the if part, consider the numbers $c_1,\dots,c_N$ such that $\sum_{n=1}^N c_n = 0$. Then
$$\sum_{n,m=1}^N c_n c_m \kappa(x_n,x_m) = -\sum_{n,m=1}^N c_n c_m \psi(x_n,x_m) + \sum_{n,m=1}^N c_n c_m \psi(x_n,x_0) + \sum_{n,m=1}^N c_n c_m \psi(x_0,x_m) - \sum_{n,m=1}^N c_n c_m \psi(x_0,x_0)$$
$$= -\sum_{n,m=1}^N c_n c_m \psi(x_n,x_m) + \sum_{m=1}^N c_m \sum_{n=1}^N c_n \psi(x_n,x_0) + \sum_{n=1}^N c_n \sum_{m=1}^N c_m \psi(x_0,x_m) - \psi(x_0,x_0)\Big(\sum_{n=1}^N c_n\Big)^2 = -\sum_{n,m=1}^N c_n c_m \psi(x_n,x_m).$$
As $\sum_{n=1}^N c_n = 0$ and $\sum_{n,m=1}^N c_n c_m \kappa(x_n,x_m) \ge 0$, we obtain $\sum_{n,m=1}^N c_n c_m \psi(x_n,x_m) \le 0$. Thus $\psi$ is a negative definite kernel.

For the converse, take $c_1,\dots,c_N \in \mathbb{R}$ and define $c_0 = -\sum_{n=1}^N c_n$. By this simple trick, we generate the numbers $c_0,c_1,\dots,c_N \in \mathbb{R}$, which have the property $\sum_{n=0}^N c_n = 0$. As $\psi$ is a negative definite kernel, we have $\sum_{n,m=0}^N c_n c_m \psi(x_n,x_m) \le 0$, for any $x_0 \in X$. Thus,
$$0 \ge \sum_{n,m=0}^N c_n c_m \psi(x_n,x_m) = \sum_{n,m=1}^N c_n c_m \psi(x_n,x_m) + \sum_{n=1}^N c_n c_0 \psi(x_n,x_0) + \sum_{m=1}^N c_0 c_m \psi(x_0,x_m) + c_0^2\,\psi(x_0,x_0)$$
$$= \sum_{n,m=1}^N c_n c_m \psi(x_n,x_m) - \sum_{n,m=1}^N c_n c_m \psi(x_n,x_0) - \sum_{n,m=1}^N c_n c_m \psi(x_0,x_m) + \sum_{n,m=1}^N c_n c_m \psi(x_0,x_0)$$
$$= \sum_{n,m=1}^N c_n c_m \big(\psi(x_n,x_m) - \psi(x_n,x_0) - \psi(x_0,x_m) + \psi(x_0,x_0)\big) = -\sum_{n,m=1}^N c_n c_m \kappa(x_n,x_m).$$
Thus $\sum_{n,m=1}^N c_n c_m \kappa(x_n,x_m) \ge 0$ and $\kappa$ is a positive definite kernel.

Theorem 1.7.1 (Schoenberg). Let $X$ be a nonempty set and $\psi : X \times X \to \mathbb{R}$. The function $\psi$ is a negative kernel if and only if $\exp(-t\psi)$ is a positive definite kernel for all $t \ge 0$.

Proof. For the if part, recall that
$$\psi(x,y) = \lim_{t \to 0} \frac{1 - \exp(-t\psi(x,y))}{t}.$$
As $\exp(-t\psi)$ is positive definite, $1 - \exp(-t\psi)$ is negative definite and the result follows from Proposition 1.7.1. It suffices to prove the converse for $t = 1$, as if $\psi$ is a negative definite kernel so is $t\psi$, for any $t \ge 0$. Take $x_0 \in X$ and define the positive definite kernel $\kappa(x,y) = -\psi(x,y) + \psi(x,x_0) + \psi(x_0,y) - \psi(x_0,x_0)$ (Lemma 1.7.2). Then
$$e^{-\psi(x,y)} = e^{-\psi(x,x_0)}\, e^{\kappa(x,y)}\, e^{-\psi(x_0,y)}\, e^{\psi(x_0,x_0)}.$$
Let $f(x) = e^{-\psi(x,x_0)}$. Then, as $\psi$ is a negative kernel and therefore symmetric, one can readily prove that the last relation can be rewritten as
$$e^{-\psi(x,y)} = e^{\psi(x_0,x_0)} \cdot f(x)\, e^{\kappa(x,y)} f(y).$$
Since $e^{\psi(x_0,x_0)}$ is a positive number, employing the properties of positive kernels given in section 1.5, we conclude that $e^{-\psi(x,y)}$ is a positive definite kernel.

Corollary 1.7.1. The Gaussian radial basis function is a reproducing kernel.

Although not all properties of positive kernels apply to negative kernels as well (for example the product of negative kernels is not a negative kernel), there are some other operations that preserve negativity.

Proposition 1.7.2. Let $\psi : X \times X \to \mathbb{R}$ be negative definite. In this case:

1. If $\psi(x,x) \ge 0$, for all $x \in X$, then $\psi^p(x,y)$ is negative definite for any $0 < p \le 1$.
2. If $\psi(x,x) \ge 0$, for all $x \in X$, then $\log(1 + \psi(x,y))$ is negative definite.
3. If $\psi : X \times X \to (0,+\infty)$, then $\log\psi(y,x)$ is negative definite.

Proof. We give a brief description of the proofs.

1. We use the formula
$$\psi(x,y)^p = \frac{p}{\Gamma(1-p)} \int_0^\infty t^{-p-1}\big(1 - e^{-t\psi(x,y)}\big)\,dt,$$
where the Gamma function is given by $\Gamma(z) = \int_0^\infty e^{-t} t^{z-1}\,dt$. As $e^{-t\psi(x,y)}$ is positive definite (Theorem 1.7.1) and $1$, $t^{-p-1}$ are positive numbers, it is not difficult to prove that the expression inside the integral is negative definite for all $t > 0$.

2. Similarly, we use the formula
$$\log(1 + \psi(x,y)) = \int_0^\infty \frac{e^{-t}}{t}\big(1 - e^{-t\psi(x,y)}\big)\,dt.$$

3. For any $c > 0$, $\log(\psi(x,y) + 1/c) = \log(1 + c\psi(x,y)) - \log(c)$. We can prove that the right-hand side is negative definite. Then, by taking the limit $c \to \infty$, one completes the proof.

As a direct consequence, one can prove that since $\|x - y\|^2$ is a negative kernel, so is $\|x - y\|^{2p}$, for any $0 < p \le 1$. Thus, for any $0 < p \le 2$, $\|x - y\|^p$ is a negative kernel and $\exp(-t\|x - y\|^p)$ is a positive kernel for any $t > 0$. Therefore, for $p = 2$ we obtain another proof of the positivity of the Gaussian radial basis function. In addition, for $p = 1$ one concludes that the Laplacian radial basis function is also a positive kernel. Moreover, for the Gaussian kernel the following important property has been proved.

Theorem 1.7.2 (Full rank of the Gaussian RBF Gram matrices). Suppose that $x_1,\dots,x_N \in X$ are distinct points and $\sigma \ne 0$. The Gram matrix given by
$$K_{n,m} = \exp\Big(-\frac{\|x_n - x_m\|^2}{2\sigma^2}\Big)$$
has full rank.

As a consequence, for any choice of distinct points $x_1,\dots,x_N$, we have that $\sum_{m=1}^N a_m \kappa(x_n,x_m) = 0$, for all $n = 1,\dots,N$, if and only if $a_1 = \dots = a_N = 0$. However, observe that for any $a_1,\dots,a_N$
$$\sum_{m=1}^N a_m \kappa(x_n,x_m) = \sum_{m=1}^N a_m \langle \kappa(\cdot,x_m),\kappa(\cdot,x_n)\rangle_H = \Big\langle \sum_{m=1}^N a_m \kappa(\cdot,x_m),\ \kappa(\cdot,x_n)\Big\rangle_H = \langle f,\kappa(\cdot,x_n)\rangle_H,$$
where $f = \sum_{m=1}^N a_m \kappa(\cdot,x_m) \in \mathrm{span}\{\kappa(\cdot,x_m),\ m = 1,\dots,N\}$. In addition, for an $f \in \mathrm{span}\{\kappa(\cdot,x_m),\ m = 1,\dots,N\}$ we have that $\langle f,\kappa(\cdot,x_n)\rangle_H = 0$ for all $n = 1,\dots,N$, if and only if $f = 0$. Hence, if $f$ is orthogonal to all $\Phi(x_n)$, then $f = 0$. We conclude that $f = \sum_{m=1}^N a_m \kappa(\cdot,x_m) = 0$ if and only if $a_1 = \dots = a_N = 0$. Therefore, the points $\Phi(x_m) = \kappa(\cdot,x_m)$, $m = 1,\dots,N$, are linearly independent, provided that no two $x_m$ are the same. Hence, a Gaussian kernel defined on a domain of infinite cardinality produces a feature space of infinite dimension.
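The full-rank property of Theorem 1.7.2 can be observed on a small example. The sketch below forms the Gaussian Gram matrix of a few scalar points and computes the pivots of Gaussian elimination without row exchanges; for a symmetric matrix, all pivots are strictly positive exactly when the matrix is strictly positive definite. The points, the value of `sigma`, the $2\sigma^2$ parameterization and the function names are assumptions made for this illustration.

```python
import math


def gram(xs, sigma):
    # Gaussian Gram matrix, assumed form exp(-(x_n - x_m)^2 / (2 sigma^2)).
    return [[math.exp(-(a - b) ** 2 / (2.0 * sigma ** 2)) for b in xs] for a in xs]


def elimination_pivots(K):
    """Pivots of Gaussian elimination without row exchanges.

    Elimination stops at the first non-positive pivot, which signals that
    the symmetric matrix K is not strictly positive definite.
    """
    n = len(K)
    A = [row[:] for row in K]
    pivots = []
    for j in range(n):
        pivots.append(A[j][j])
        if A[j][j] <= 0.0:
            break
        for r in range(j + 1, n):
            factor = A[r][j] / A[j][j]
            for c in range(j, n):
                A[r][c] -= factor * A[j][c]
    return pivots


distinct_points = [0.0, 0.3, 0.7, 1.5, 2.0]
pivots_distinct = elimination_pivots(gram(distinct_points, sigma=0.5))

# Repeating a point makes two rows of the Gram matrix identical,
# so the matrix loses rank and a pivot vanishes.
pivots_repeated = elimination_pivots(gram(distinct_points + [0.3], sigma=0.5))
```

With distinct points every pivot is strictly positive, whereas repeating a point drives the last pivot to zero, which matches the requirement in the theorem that the points be distinct.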
Moreover, the Gram matrices defined by Gaussian kernels are always strictly positive definite and invertible. In addition, for every $x,y \in X$ we have that $\kappa(x,x) = 1$ and $\kappa(x,y) \ge 0$. This means that all $x \in X$ are mapped through the feature map $\Phi$ to points lying on the surface of the unit sphere of the RKHS $H$ and that the angle between any two mapped points $\Phi(x)$ and $\Phi(y)$ is between $0^\circ$ and $90^\circ$.

We conclude this section with the following two important formulas, which hold for the case of the RKHS induced by the Gaussian kernel. For the norm of $f \in H$, one can prove that
$$\|f\|_H^2 = \int_X \sum_{n=0}^\infty \frac{\sigma^{2n}}{n!\,2^n} \big(O^n f(x)\big)^2\,dx, \qquad (1.13)$$
with $O^{2n} = \Delta^n$ and $O^{2n+1} = \nabla\Delta^n$, $\Delta$ being the Laplacian and $\nabla$ the gradient operator. The implication of this is that a regularization term of the form $\|f\|_H^2$ (which is usually adopted in practice) penalizes the derivatives of the minimizer. This results in a very smooth solution of the regularized risk minimization problem. Finally, the Fourier transform of the Gaussian kernel $\kappa_\sigma$ is given by
$$F[\kappa](\omega) = \sigma \exp\Big(-\frac{\sigma^2\|\omega\|^2}{2}\Big). \qquad (1.14)$$

1.8 The Complex case

It has already been mentioned in section 1.2 that the general theory of RKHS was developed by the mathematicians for general complex Hilbert spaces. In this context, a positive definite matrix is defined as a Hermitian matrix $K = (K_{i,j})^N$ satisfying
$$c^H \cdot K \cdot c = \sum_{i=1,j=1}^{N,N} c_i^* c_j K_{i,j} \ge 0,$$
for all $c_i \in \mathbb{C}$, $i = 1,\dots,N$, where the notation $^*$ denotes the conjugate element and $^H$ the conjugate transpose matrix. The general theory considers linear classes of complex functions over the field of complex numbers, i.e., the scalar product $c \cdot f$ is defined with complex numbers ($c \in \mathbb{C}$), where the multiplication is the standard complex one. The definition of a complex RKHS is identical to the one given in the real case. A complex Hilbert space $H$ will be called a RKHS, if the following two important properties hold:

1. For every $x \in X$, $\kappa(\cdot,x)$ belongs to $H$.
2. $\kappa$ has the so called reproducing property, i.e., $f(x) = \langle f,\kappa(\cdot,x)\rangle_H$, for all $f \in H$; in particular $\kappa(x,y) = \langle \kappa(\cdot,y),\kappa(\cdot,x)\rangle_H$.

The main difference with the real case lies in the definition of the complex inner product, where the linearity and the symmetry properties do not hold. Recall that in the case of complex Hilbert spaces the inner product is sesqui-linear (i.e., linear in one argument and anti-linear in the other) and Hermitian:
$$\langle af + bg, h\rangle_H = a\langle f,h\rangle_H + b\langle g,h\rangle_H, \qquad \langle f, ag + bh\rangle_H = a^*\langle f,g\rangle_H + b^*\langle f,h\rangle_H, \qquad \langle f,g\rangle_H^* = \langle g,f\rangle_H,$$
for all $f,g,h \in H$, and $a,b \in \mathbb{C}$. In the real case, we established the symmetry condition $\kappa(x,y) = \langle \kappa(\cdot,y),\kappa(\cdot,x)\rangle_H = \langle \kappa(\cdot,x),\kappa(\cdot,y)\rangle_H = \kappa(y,x)$. However, since in the complex case the inner product is Hermitian, the aforementioned condition is equivalent to $\kappa(x,y) = \big(\langle \kappa(\cdot,x),\kappa(\cdot,y)\rangle_H\big)^*$.
As a consequence, almost all theorems that have been established in sections 1.2, ??, 1.5 and 1.7 for the real case are actually special cases of more general ones, that involve complex functions and numbers (excluding the ones that explicitly need real numbers, e.g. Schoenberg's theorem, Proposition 1.7.2, some of the properties mentioned in section ??, etc.). There are, however, certain differences (due to the complex inner product) that must be stressed. For example, the expansion of the kernel in terms of an orthonormal basis of $H$, which is given in Theorem 1.5.2, becomes $\kappa(x,y) = \sum_{s \in S} e_s(y)^* e_s(x)$, the kernels in Proposition 1.5.4 regarding the conformal transformations become $\kappa_1(x,y) = f(x)f(y)^*$ and $\kappa_2(x,y) = f(x)\kappa(x,y)f(y)^*$, etc.

Complex reproducing kernels that have been extensively studied by the mathematicians are, among others, the Szego kernels, i.e., $\kappa(z,w) = \frac{1}{1 - w^* z}$, for Hardy spaces on the unit disk, and the Bergman kernels, i.e., $\kappa(z,w) = \frac{1}{(1 - w^* z)^2}$, for Bergman spaces on the unit disk, where $|z|,|w| < 1$ [31]. Another complex kernel of great importance is the complex Gaussian kernel:
$$\kappa_{\sigma,\mathbb{C}^d}(z,w) := \exp\Big(-\frac{\sum_{i=1}^d (z_i - w_i^*)^2}{\sigma^2}\Big), \qquad (1.15)$$
defined on $\mathbb{C}^d \times \mathbb{C}^d$, where $z,w \in \mathbb{C}^d$, $z_i$ denotes the $i$-th component of the complex vector $z \in \mathbb{C}^d$ and $\exp$ is the extended exponential function in the complex domain. It can be shown that $\kappa_{\sigma,\mathbb{C}^d}$ is a complex valued kernel with parameter $\sigma$. Its restriction $\kappa_\sigma := \big(\kappa_{\sigma,\mathbb{C}^d}\big)|_{\mathbb{R}^d \times \mathbb{R}^d}$ is the well known real Gaussian kernel. An explicit description of the RKHSs of these kernels, together with some important properties, can be found in [4].

1.9 Differentiation in Hilbert spaces

1.9.1 Fréchet's Differentiation

In the following sections we will develop cost functions defined on RKHS that are suitable for minimization tasks related to adaptive filtering problems. As most minimization procedures involve computation of gradients or subgradients, we devote this section to the study of differentiation on Hilbert spaces. The notion of Fréchet's Differentiability, which generalizes differentiability to general Hilbert spaces, lies at the core of this analysis.

Definition 1.9.1 (Fréchet's Differential). Let $H$ be a Hilbert space on a field $F$ (typically $\mathbb{R}$ or $\mathbb{C}$), $T : H \to F$ an operator and $f \in H$. The operator $T$ is said to be Fréchet differentiable at $f$, if there exists a $\theta \in H$ such that
$$\lim_{\|h\|_H \to 0} \frac{T(f + h) - T(f) - \langle h,\theta\rangle_H}{\|h\|_H} = 0, \qquad (1.16)$$
where $\langle\cdot,\cdot\rangle_H$ is the dot product of the Hilbert space $H$ and $\|\cdot\|_H = \sqrt{\langle\cdot,\cdot\rangle_H}$ is the induced norm.

The element $\theta \in H$ is called the gradient of the operator at $f$, and is usually denoted as $\nabla T(f)$. This relates to the standard gradient operator known from Calculus in Euclidean spaces. The Fréchet's Differential is also known as Strong Differential. There is also a weaker definition of Differentiability, named Gâteaux's Differential (or Weak Differential), which is a generalization of the directional derivative. The Gâteaux differential $dT(f,\psi) \in F$ of $T$ at $f \in H$ in the direction $\psi \in H$ is defined as
$$dT(f,\psi) = \lim_{\epsilon \to 0} \frac{T(f + \epsilon\psi) - T(f)}{\epsilon}. \qquad (1.17)$$
In the following, whenever we refer to a derivative or a gradient we will mean the one produced by Fréchet's notion of differentiability.
The interested reader is addressed to [14, 3, 3, 18, 34, 44] (amongst others) for a more detailed discussion on the subject. The well known properties of the derivative of a real valued function of one variable, which are known from elementary Calculus, apply to the Fréchet's derivatives as well. Below we summarize some of these properties. For the first three we consider the operators $T_1,T_2 : H \to F$ differentiable at $f \in H$ and $\lambda \in F$:

1. Sum. $\nabla(T_1 + T_2)(f) = \nabla T_1(f) + \nabla T_2(f)$.
2. Scalar Product. $\nabla(\lambda T_1)(f) = \lambda\nabla T_1(f)$.
3. Product Rule. $\nabla(T_1 \cdot T_2)(f) = T_2(f)\cdot\nabla T_1(f) + T_1(f)\cdot\nabla T_2(f)$.
4. Chain Rule. Consider $T_1 : H \to F$ differentiable at $f \in H$ and $T_2 : F \to F$ differentiable at $y = T_1(f) \in F$, then $\nabla(T_2 \circ T_1)(f) = T_2'(T_1(f))\cdot\nabla T_1(f)$.

The following simple examples demonstrate the differentiation procedure in arbitrary spaces.

Example 1.9.1. Consider the real Hilbert space $H$, with inner product $\langle\cdot,\cdot\rangle_H$, and $T : H \to \mathbb{R}$ : $T(f) = \langle f,\psi\rangle_H$, where $\psi \in H$ is fixed. We can easily show (using Fréchet's definition) that $T$ is differentiable at any $f \in H$ and that $\nabla T(f) = \psi$.

Example 1.9.2. Consider the real Hilbert space $H$, with inner product $\langle\cdot,\cdot\rangle_H$, and $T : H \to \mathbb{R}$ : $T(f) = \langle f,f\rangle_H$. We can easily show (using Fréchet's definition) that $T$ is differentiable at any $f \in H$ and that $\nabla T(f) = 2f$.
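The two examples can be checked numerically in the simplest Hilbert space, $\mathbb{R}^3$ with the usual dot product. The sketch below evaluates the quotient of definition (1.16) for perturbations $h$ of decreasing norm. The vectors, the fixed element `PSI` and the function names are hypothetical choices for this illustration. For $T(f) = \langle f,\psi\rangle$ the quotient is zero up to rounding, and for $T(f) = \langle f,f\rangle$ with candidate gradient $2f$ it equals $\|h\|$, so it vanishes as $h \to 0$.

```python
import math


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(inner(u, u))


def frechet_ratio(T, grad, f, direction, scale):
    """|T(f + h) - T(f) - <h, grad>| / ||h|| for h = scale * direction."""
    h = [scale * d for d in direction]
    f_plus_h = [a + b for a, b in zip(f, h)]
    return abs(T(f_plus_h) - T(f) - inner(h, grad)) / norm(h)


PSI = [0.5, -1.0, 2.0]  # fixed element psi of Example 1.9.1 (hypothetical values)


def T_linear(f):
    # Example 1.9.1: T(f) = <f, psi>, with gradient psi.
    return inner(f, PSI)


def T_square(f):
    # Example 1.9.2: T(f) = <f, f>, with gradient 2f.
    return inner(f, f)


f0 = [1.0, 2.0, -0.5]
unit_direction = [0.6, 0.0, 0.8]
for scale in (1e-1, 1e-2, 1e-3):
    r_lin = frechet_ratio(T_linear, PSI, f0, unit_direction, scale)
    r_sq = frechet_ratio(T_square, [2.0 * v for v in f0], f0, unit_direction, scale)
    print(scale, r_lin, r_sq)
```

Replacing the candidate gradient of `T_square` with anything other than $2f$ leaves a quotient that does not shrink with `scale`, which is how a wrong gradient would show up in such a check.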