Online learning in Reproducing Kernel Hilbert Spaces


Pantelis Bouboulis, Member, IEEE

May 1, 1

P. Bouboulis is with the Department of Informatics and Telecommunications, University of Athens, Greece (see bouboulis.mysch.gr).


Chapter 1

Reproducing Kernel Hilbert Spaces

In kernel-based methods, the notion of the Reproducing Kernel Hilbert Space (RKHS) plays a crucial role. An RKHS is a rich construct (roughly, a space of functions with an inner product), which has been proven to be a very powerful tool. Kernel-based methods are utilized in an increasingly large number of scientific areas, especially where non-linear models are required. For example, in pattern analysis, a classification task on a set $X \subset \mathbb{R}^m$ is usually reformed by mapping the data into a higher dimensional space (possibly of infinite dimension) $H$, which is a Reproducing Kernel Hilbert Space. The advantage of such a mapping is that it makes the task more tractable, by employing a linear classifier in the feature space $H$, exploiting Cover's theorem (see [43, 37]). This is equivalent to solving a non-linear problem in the original space.

Therefore, with the use of kernels, a new technique has been introduced to transform certain classes of non-linear tasks into equivalent linear ones, restated in a higher (even infinite) dimensional space, while avoiding the accompanying computational and generalization problems (also known as the "curse of dimensionality") associated with the traditional techniques when the dimensionality of the task increases. Similar approaches have been used in principal components analysis, in Fisher's linear discriminant analysis, in clustering, regression, image processing and in many other subdisciplines. Recently, processing in RKHS has been gaining popularity within the Signal Processing community in the context of adaptive learning.

Non-linearity is usually introduced via a computationally elegant method known to the machine learning community as the kernel trick [36] (the formal definition of the positive definite kernel is given in Section 1.2):

"Given an algorithm, which is formulated in terms of dot products, one can construct an alternative algorithm by replacing each one of the dot products with a positive definite kernel $\kappa$."
Although this trick works well for most applications, it conceals the basic mathematical steps that underlie the procedure, which are essential if one seeks a deeper understanding of the problem. These steps are: 1) map the finite dimensional input data from the input space $X$ (usually $X \subset \mathbb{R}^\nu$) into a higher dimensional (possibly infinite dimensional) RKHS $H$ (this is usually called the feature space), and 2) perform a linear processing (e.g., adaptive filtering) on the mapped data in $H$. The procedure is equivalent to a non-linear processing (non-linear filtering) in $X$ (see Figure 1.1). The specific choice of the kernel $\kappa$ implicitly defines an RKHS with an appropriate inner product. Moreover, the specific choice of the kernel defines the type of non-linearity that underlies the model to be used.

Figure 1.1: Mapping from input space $X$ to feature space $H$.

1.1 A Historical Overview

In the past, there have been two trends in the study of these spaces by mathematicians. The first one originated in the theory of integral equations by J. Mercer [5, 6]. He used the term "positive definite kernel" to characterize a function of two points $\kappa(x,y)$ defined on $X \times X$, which satisfies Mercer's law:

\[ \sum_{n,m=1}^{N} a_n a_m \kappa(x_n, x_m) \geq 0, \tag{1.1} \]

for any numbers $a_n, a_m$ and points $x_n, x_m$. Later on, Moore [7, 8, 9] found that to such a kernel there corresponds a well determined class of functions, $H$, equipped with a specific inner product $\langle \cdot, \cdot \rangle_H$, with respect to which the kernel $\kappa$ possesses the so-called reproducing property:

\[ f(y) = \langle f, \kappa(\cdot, y) \rangle_H, \tag{1.2} \]

for all functions $f \in H$ and $y \in X$. Those that followed this trend used to consider a specific given positive definite kernel $\kappa$ and studied it in itself, or eventually applied it in various domains (such as integral equations, theory of groups, general metric theory, interpolation, etc.). The class $H$ corresponding to $\kappa$ was mainly used as a tool of research and it was usually introduced a posteriori. The work of Bochner [5, 6], which introduced the notion of the positive definite function in order to apply it in the theory of Fourier transforms, also belongs to the same path as the one followed by Mercer and Moore. These are continuous functions $\phi$ of one variable such that $\phi(x - y) = \kappa(x,y)$, for some positive definite kernel $\kappa$.

On the other hand, those who followed the second trend were primarily interested in the class of functions $H$, while the associated kernel, which satisfies the reproducing property, was employed essentially as a tool in the study of the functions of this class. This trend is traced back to the works of S. Zaremba [47, 48] during the first decade of the 20th century. He was the first to introduce the notion of a kernel which corresponds to a specific class of functions, and to state its reproducing property. However, he did not develop any general theory, nor did he give any particular name to the kernels he introduced. To the same trend belong also the works of Bergman [4] and Aronszajn [].
Those two trends evolved separately during the first decades of the 20th century, but soon the links between them were noticed. After the Second World War, it was known that the two concepts of defining a kernel, either as a positive definite kernel or as a reproducing kernel, are equivalent. Furthermore, it was proved that there is a one-to-one correspondence between the space of positive definite kernels and the space of reproducing kernel Hilbert spaces. It has to be emphasized that examples of such kernels had been known for a long time prior to the works of Mercer and Zaremba; for example, all the Green's functions of self-adjoint ordinary differential equations belong to this type of kernels. However, some of the important properties that these kernels possess were only realized and used at the beginning of the 20th century, and since then they have been the focus of research.

In the following, we will give a more detailed description of these spaces and establish their main properties, focusing on the essentials that elevate them to such a powerful tool in the context of machine learning. Most of the material presented here can also be found in more detail in several other textbooks, such as the celebrated paper of Aronszajn [], the excellent introductory text of Paulsen [31] and the popular books of Schölkopf and Smola [37] and Shawe-Taylor and Cristianini [39]. Here, we attempt to portray both trends and to highlight the important links between them. Although the general theory applies to complex spaces, to keep the presentation as simple as possible, we will mainly focus on real spaces. The complex case will be treated at the end of this section.

1.2 Definition

We begin our study with the classic definitions of positive definite matrices and kernels as they were introduced by Mercer. Given a function $\kappa : X \times X \to \mathbb{R}$ and $x_1, \dots, x_N \in X$ (typically $X$ is a compact subset of $\mathbb{R}^\nu$, $\nu > 0$), the square matrix $K = (K_{n,m})_{n,m=1}^{N}$ with elements $K_{n,m} = \kappa(x_n, x_m)$, for $n, m = 1, \dots, N$, is called the Gram matrix (or kernel matrix) of $\kappa$ with respect to $x_1, \dots, x_N$. A symmetric matrix $K = (K_{n,m})_{n,m=1}^{N}$ satisfying

\[ c^T K c = \sum_{n,m=1}^{N} c_n c_m K_{n,m} \geq 0, \quad \text{for all } c = (c_1, \dots, c_N)^T \in \mathbb{R}^N, \]

where the notation $^T$ denotes the transpose, is called positive definite. In the matrix analysis literature, this is the definition of a positive semidefinite matrix. However, as positive definite matrices were originally introduced by Mercer and others in this context, we employ the term positive definite, as it was already defined. If the inequality is strict for all non-zero vectors $c \in \mathbb{R}^N$, the matrix will be called strictly positive definite. A function $\kappa : X \times X \to \mathbb{R}$, which for all $N \in \mathbb{N}$ and all $x_1, \dots, x_N \in X$ gives rise to a positive definite Gram matrix $K$, is called a positive definite kernel. In the following, we will frequently refer to a positive definite kernel simply as a kernel. We conclude that a positive definite kernel is symmetric and satisfies

\[ \sum_{n,m=1}^{N} c_n c_m \kappa(x_n, x_m) \geq 0, \]

for all $c_n \in \mathbb{R}$, $n = 1, \dots, N$, and $x_1, \dots, x_N \in X$. Formally, a Reproducing Kernel Hilbert Space is defined as follows:

Definition 1.2.1 (Reproducing Kernel Hilbert Space). Consider a linear class $H$ of real valued functions, $f$, defined on a set $X$. Suppose, further, that in $H$ we can define an inner product $\langle \cdot, \cdot \rangle_H$ with corresponding norm $\| \cdot \|_H$ and that $H$ is complete with respect to that norm, i.e., $H$ is a Hilbert space.
We call $H$ a Reproducing Kernel Hilbert Space (RKHS), if there exists a function $\kappa : X \times X \to \mathbb{R}$ with the following two important properties:

1. For every $x \in X$, $\kappa(\cdot, x)$ belongs to $H$ (or equivalently $\kappa$ spans $H$, i.e., $H = \overline{\mathrm{span}}\{\kappa(\cdot, x),\ x \in X\}$).
2. $\kappa$ has the so-called reproducing property, i.e.,

\[ f(x) = \langle f, \kappa(\cdot, x) \rangle_H, \quad \text{for all } f \in H,\ x \in X, \tag{1.3} \]

in particular $\kappa(x,y) = \langle \kappa(\cdot, y), \kappa(\cdot, x) \rangle_H$.

Furthermore, $\kappa$ is a positive definite kernel and the mapping $\Phi : X \to H$, with $\Phi(x) = \kappa(\cdot, x)$, for all $x \in X$, is called the feature map of $H$. To denote the RKHS associated with a specific kernel $\kappa$ we will also use the notation $H(\kappa)$. Note that $H$ is often called the feature space associated with the kernel $\kappa$. Furthermore, under the aforementioned notation, $\kappa(x,y) = \langle \Phi(y), \Phi(x) \rangle_H$, i.e., $\kappa(x,y)$ is the inner product of $\Phi(y)$ and $\Phi(x)$ in the feature space. This is the essence of the kernel trick mentioned at the beginning of this chapter. The feature map $\Phi$ transforms the data from the low dimensional space $X$ to the higher dimensional space $H$. Linear processing in $H$ involves inner products in $H$, which can be calculated via the kernel $\kappa$, disregarding the actual structure of $H$. Roughly speaking, one trades non-linearity, which is often hard to handle, for an increase in the dimensionality of the space.
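The definitions of this section can be checked numerically on a small example. The following is a minimal sketch, not part of the original text: it builds the Gram matrix of a Gaussian kernel of the form $\exp(-t\|x-y\|^2)$ (a standard positive definite kernel, used here as an assumption), samples the quadratic form $c^T K c$, and verifies the kernel trick for the homogeneous polynomial kernel of degree 2 on $\mathbb{R}^2$ using one possible explicit feature map. All function names and the random data are illustrative.

```python
import math
import random


def dot(u, v):
    """Euclidean inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def gaussian_kernel(x, y, t=1.0):
    """Gaussian kernel exp(-t * ||x - y||^2), with t > 0."""
    diff = [a - b for a, b in zip(x, y)]
    return math.exp(-t * dot(diff, diff))


def gram_matrix(kernel, points):
    """Gram matrix K[n][m] = kernel(x_n, x_m)."""
    return [[kernel(xn, xm) for xm in points] for xn in points]


def quadratic_form(K, c):
    """c^T K c = sum_{n,m} c_n c_m K[n][m]."""
    size = len(c)
    return sum(c[n] * c[m] * K[n][m] for n in range(size) for m in range(size))


def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: <x, y>^2."""
    return dot(x, y) ** 2


def poly2_features(x):
    """One explicit feature map for poly2_kernel on R^2 (illustrative choice)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


random.seed(0)
points = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(6)]
K = gram_matrix(gaussian_kernel, points)

# For a positive definite kernel the quadratic form is never negative
# (up to rounding), whatever coefficient vector c is chosen.
min_form = min(
    quadratic_form(K, [random.uniform(-1, 1) for _ in points])
    for _ in range(1000)
)
print("smallest sampled c^T K c:", min_form)

# Kernel trick: the kernel value equals an inner product of mapped points.
x, y = points[0], points[1]
kernel_value = poly2_kernel(x, y)
feature_inner_product = dot(poly2_features(x), poly2_features(y))
print("kernel:", kernel_value, "feature space:", feature_inner_product)
```

Sampling random vectors $c$ only gives evidence of positive definiteness, not a proof; an eigenvalue computation on $K$ would be the more thorough numerical check.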

1.3 Derivation of the Definition

In the following, we consider the definition of an RKHS as a class of functions with specific properties (following the second trend) and show the key ideas that underlie Definition 1.2.1. To that end, consider a linear class $H$ of real valued functions, $f$, defined on a set $X$. Suppose, further, that in $H$ we can define an inner product $\langle \cdot, \cdot \rangle_H$ with corresponding norm $\| \cdot \|_H$ and that $H$ is complete with respect to that norm, i.e., $H$ is a Hilbert space. Consider, also, a linear functional $T$ from $H$ into the field $\mathbb{R}$. An important theorem of functional analysis states that such a functional is continuous if and only if it is bounded. The space consisting of all continuous linear functionals from $H$ into the field $\mathbb{R}$ is called the dual space of $H$. In the following, we will frequently refer to the so-called linear evaluation functional $T_y$. This is a special case of a linear functional that satisfies $T_y(f) = f(y)$, for all $f \in H$.

We call $H$ a Reproducing Kernel Hilbert Space (RKHS) on $X$ over $\mathbb{R}$, if for every $y \in X$ the linear evaluation functional $T_y$ is continuous. We will prove that such a space is related to a positive definite kernel, thus providing the first link between the two trends. Subsequently, we will prove that any positive definite kernel implicitly defines an RKHS, providing the second link and concluding the equivalent definition of the RKHS (Definition 1.2.1), which is usually used in the machine learning literature. The following theorem establishes an important connection between a Hilbert space $H$ and its dual space.

Theorem 1.3.1 (Riesz Representation). Let $H$ be a general Hilbert space and let $H^*$ denote its dual space. Every element $\Phi$ of $H^*$ can be uniquely expressed in the form

\[ \Phi(f) = \langle f, \phi \rangle_H, \]

for some $\phi \in H$. Moreover, $\|\Phi\|_{H^*} = \|\phi\|_H$.

Following the Riesz representation theorem, we have that for every $y \in X$ there exists a unique element $\kappa_y \in H$, such that for every $f \in H$, $f(y) = T_y(f) = \langle f, \kappa_y \rangle_H$. The function $\kappa_y$ is called the reproducing kernel for the point $y$ and the function $\kappa(x,y) = \kappa_y(x)$ is called the reproducing kernel of $H$.
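In addition, note that $\langle \kappa_y, \kappa_x \rangle_H = \kappa_y(x) = \kappa(x,y)$ and $\|T_y\|^2_{H^*} = \|\kappa_y\|^2_H = \langle \kappa_y, \kappa_y \rangle_H = \kappa(y,y)$.

Proposition 1.3.1. The reproducing kernel of $H$ is symmetric, i.e., $\kappa(x,y) = \kappa(y,x)$.

Proof. Observe that $\langle \kappa_y, \kappa_x \rangle_H = \kappa_y(x) = \kappa(x,y)$ and $\langle \kappa_x, \kappa_y \rangle_H = \kappa_x(y) = \kappa(y,x)$. As the inner product of $H$ is symmetric (i.e., $\langle \kappa_y, \kappa_x \rangle_H = \langle \kappa_x, \kappa_y \rangle_H$), the result follows.

In the following, we will frequently identify the function $\kappa_y$ with the notation $\kappa(\cdot, y)$. Thus, we write the reproducing property of $H$ as:

\[ f(y) = \langle f, \kappa(\cdot, y) \rangle_H, \tag{1.4} \]

for any $f \in H$, $y \in X$. Note that due to the uniqueness provided by the Riesz representation theorem, $\kappa$ is the unique function that satisfies the reproducing property. The following proposition establishes the first link between the positive definite kernels and the reproducing kernels.

Proposition 1.3.2. The reproducing kernel of $H$ is a positive definite kernel.

Proof. Consider $N > 0$, the real numbers $a_1, a_2, \dots, a_N$ and the elements $x_1, x_2, \dots, x_N \in X$. Then

\[
\begin{aligned}
\sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m \kappa(x_n, x_m)
&= \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m \langle \kappa(\cdot, x_m), \kappa(\cdot, x_n) \rangle_H \\
&= \Big\langle \sum_{m=1}^{N} a_m \kappa(\cdot, x_m), \sum_{n=1}^{N} a_n \kappa(\cdot, x_n) \Big\rangle_H \\
&= \Big\| \sum_{n=1}^{N} a_n \kappa(\cdot, x_n) \Big\|_H^2 \geq 0.
\end{aligned}
\]

Combining Proposition 1.3.1 and the previous result, we complete the proof.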

Remark 1.3.1. Generally, for a reproducing kernel, the respective Gram matrix (for distinct points $x_1, \dots, x_N$) is strictly positive definite. For if not, then there must exist at least one non-zero vector $a$ such that $\big\| \sum_{n=1}^{N} a_n \kappa(\cdot, x_n) \big\|_H^2 = 0$. Hence, for every $f \in H$ we have that $\sum_n a_n f(x_n) = \big\langle f, \sum_n a_n \kappa(\cdot, x_n) \big\rangle_H = 0$. Thus, in this case there is an equation of linear dependence between the values of every function in $H$ at some finite set of points. Such examples do exist (e.g., Sobolev spaces), but in most cases the reproducing kernels define Gram matrices that are always strictly positive and invertible.

The following proposition establishes a very important fact: any RKHS, $H$, can be generated by the respective reproducing kernel $\kappa$. Note that the overbar denotes the closure of a set (i.e., if $A$ is a subset of $H$, $\overline{A}$ is the closure of $A$).

Proposition 1.3.3. Let $H$ be an RKHS on the set $X$ with reproducing kernel $\kappa$. Then the linear span of the functions $\kappa(\cdot, x)$, $x \in X$, is dense in $H$, i.e., $H = \overline{\mathrm{span}}\{\kappa(\cdot, x),\ x \in X\}$.

Proof. We will prove that the only function of $H$ orthogonal to $A = \mathrm{span}\{\kappa(\cdot, x),\ x \in X\}$ is the zero function. Let $f$ be such a function. Then, as $f$ is orthogonal to $A$, we have that $f(x) = \langle f, \kappa(\cdot, x) \rangle_H = 0$, for every $x \in X$. This holds true if and only if $f = 0$. Thus $A^\perp = \overline{A}^\perp = \{0\}$. Suppose that there is $f \in H$ such that $f \notin \overline{A}$. As $\overline{A}$ is a closed (convex) subspace of $H$, there is a $g \in \overline{A}$ which minimizes the distance between $f$ and points in $\overline{A}$ (theorem of best approximation). For the same $g$ we have that $f - g \perp \overline{A}$. Thus, the non-zero function $h = f - g$ is orthogonal to $A$. However, we proved that there is no non-zero vector orthogonal to $A$. This leads us to conclude that $\overline{A} = H$.

In the following we give some important properties of these spaces.

Proposition 1.3.4 (Norm convergence implies point-wise convergence). Let $H$ be an RKHS on $X$ and let $\{f_n\}_{n \in \mathbb{N}} \subset H$. If $\lim_n \|f_n - f\|_H = 0$, then $f(x) = \lim_n f_n(x)$, for every $x \in X$. Conversely, if for any sequence $\{f_n\}_{n \in \mathbb{N}}$ of a Hilbert space $H$ of functions on $X$, such that $\lim_n \|f_n - f\|_H = 0$, we have also that $f(x) = \lim_n f_n(x)$, then $H$ is an RKHS.

Proof. For every $x \in X$ we have that

\[ |f_n(x) - f(x)| = |\langle f_n, \kappa(\cdot, x) \rangle_H - \langle f, \kappa(\cdot, x) \rangle_H| = |\langle f_n - f, \kappa(\cdot, x) \rangle_H| \leq \|f_n - f\|_H \, \|\kappa(\cdot, x)\|_H. \]

As $\lim_n \|f_n - f\|_H = 0$, we have that $\lim_n |f_n(x) - f(x)| = 0$, for every $x \in X$. Hence $f(x) = \lim_n f_n(x)$, for every $x \in X$.

For the converse, consider the evaluation functional $T_y : H \to \mathbb{R}$, $T_y(f) = f(y)$, for some $y \in X$. We will prove that $T_y$ is continuous for all $y \in X$. To this end, consider a sequence $\{f_n\}_{n \in \mathbb{N}}$ of $H$ with the property $\lim_n \|f_n - f\|_H = 0$, i.e., $f_n$ converges to $f$ in the norm. Then $|T_y(f_n) - T_y(f)| = |f_n(y) - f(y)| \to 0$, as $f(x) = \lim_n f_n(x)$. Thus $T_y(f) = \lim_n T_y(f_n)$ for all $y \in X$ and all converging sequences $\{f_n\}_{n \in \mathbb{N}}$ of $H$.

Proposition 1.3.5 (Different RKHS's cannot have the same reproducing kernel). Let $H_1, H_2$ be RKHS's on $X$ with reproducing kernels $\kappa_1, \kappa_2$. If $\kappa_1(x,y) = \kappa_2(x,y)$, for all $x, y \in X$, then $H_1 = H_2$ and $\|f\|_{H_1} = \|f\|_{H_2}$ for every $f$.

Proof. Let $\kappa(x,y) = \kappa_1(x,y) = \kappa_2(x,y)$ and $A_i = \mathrm{span}\{\kappa_i(\cdot, x),\ x \in X\}$, $i = 1, 2$. As shown in Proposition 1.3.3, $H_i = \overline{A_i}$, $i = 1, 2$. Note that for any $f \in A_i$, $i = 1, 2$, we have that $f(x) = \sum_n a_n \kappa_i(x, x_n)$, for some real numbers $a_n$, and thus the values of the function are independent of whether we regard it as an element of $A_1$ or $A_2$. Furthermore, for any $f \in A_i$, $i = 1, 2$, as the two kernels are identical, we have that $\|f\|^2_{H_1} = \sum_{n,m} a_n a_m \kappa(x_m, x_n) = \|f\|^2_{H_2}$. Thus, $\|f\|_{H_1} = \|f\|_{H_2}$, for all $f \in A_1 = A_2$.

Finally, we turn our attention to the limit points of $A_1$ and $A_2$. If $f \in H_1$, then there exists a sequence of functions $\{f_n\}_{n \in \mathbb{N}} \subset A_1$ such that $\lim_n \|f - f_n\|_{H_1} = 0$. Since $\{f_n\}_{n \in \mathbb{N}}$ is a converging sequence, it is Cauchy in $A_1$ and thus it is also Cauchy in $A_2$. Therefore, there exists $g \in H_2$ such that $\lim_n \|g - f_n\|_{H_2} = 0$. Employing Proposition 1.3.4, we obtain that $f(x) = \lim_n f_n(x) = g(x)$. Thus, every $f$ in $H_1$ is also in $H_2$ and by an analogous argument we can prove that every $g \in H_2$ is also in $H_1$. Hence $H_1 = H_2$ and, as $\|f\|_{H_1} = \|f\|_{H_2}$ for all $f$ in a dense subset (i.e., $A_1$), we have that the norms are equal for every $f$. To prove the latter, we use the relation $\lim_n \|f_n\|_{H_i} = \|f\|_{H_i}$, $i = 1, 2$.
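Two facts used in the proofs above lend themselves to a quick numerical illustration: the norm formula $\|f\|^2_H = \sum_{n,m} a_n a_m \kappa(x_m, x_n)$ for $f = \sum_n a_n \kappa(\cdot, x_n)$, and the bound $|f(x)| \leq \|f\|_H \sqrt{\kappa(x,x)}$ that makes point evaluation continuous. The sketch below is an illustration only; the Gaussian kernel of the form $\exp(-t(x-y)^2)$ on $X = \mathbb{R}$, the random centers and coefficients, and all function names are assumptions made for the example.

```python
import math
import random


def gaussian_kernel(x, y, t=2.0):
    """Gaussian kernel exp(-t * (x - y)^2) on the real line (assumed for the demo)."""
    return math.exp(-t * (x - y) ** 2)


def evaluate(coeffs, centers, x, kernel):
    """Value f(x) of f = sum_n a_n * kernel(., x_n)."""
    return sum(a * kernel(x, c) for a, c in zip(coeffs, centers))


def rkhs_norm(coeffs, centers, kernel):
    """||f||_H = sqrt(sum_{n,m} a_n a_m kernel(x_m, x_n)) for f in the span."""
    total = sum(
        an * am * kernel(xm, xn)
        for an, xn in zip(coeffs, centers)
        for am, xm in zip(coeffs, centers)
    )
    return math.sqrt(max(total, 0.0))  # guard against tiny negative rounding


random.seed(1)
centers = [random.uniform(-2, 2) for _ in range(5)]
coeffs = [random.uniform(-1, 1) for _ in range(5)]
norm_f = rkhs_norm(coeffs, centers, gaussian_kernel)

# |f(x)| = |<f, kappa(., x)>_H| <= ||f||_H * ||kappa(., x)||_H
#        = ||f||_H * sqrt(kappa(x, x))
test_points = [-3.0 + 0.1 * k for k in range(61)]
bound_holds = all(
    abs(evaluate(coeffs, centers, x, gaussian_kernel))
    <= norm_f * math.sqrt(gaussian_kernel(x, x)) + 1e-12
    for x in test_points
)
print("||f||_H =", norm_f, "| pointwise bound holds:", bound_holds)
```

Because the bound is uniform in $f$ for each fixed $x$, a sequence that converges in the norm of $H$ must converge at every point, which is the content of Proposition 1.3.4.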

The following theorem is the converse of Proposition 1.3.2. It was proved by Moore and it gives us a characterization of reproducing kernel functions. Also, it provides the second link between the two trends that have been mentioned in Section 1.1. Moore's theorem, together with Proposition 1.3.2, Proposition 1.3.5 and the uniqueness property of the reproducing kernel of an RKHS, establishes a one-to-one correspondence between RKHS's on a set and positive definite functions on the set.

Theorem 1.3.2 (Moore). Let $X$ be a set and let $\kappa : X \times X \to \mathbb{R}$ be a positive definite kernel. Then there exists an RKHS $H$ of functions on $X$, such that $\kappa$ is the reproducing kernel of $H$.

Proof. We will give only a sketch of the proof. The interested reader is referred to [31]. The first step is to define $A = \mathrm{span}\{\kappa(\cdot, x),\ x \in X\}$ and the bilinear map $P : A \times A \to \mathbb{R}$ such that

\[ P\Big( \sum_m a_m \kappa(\cdot, y_m), \sum_n b_n \kappa(\cdot, y_n) \Big) = \sum_{n,m} a_m b_n \kappa(y_n, y_m). \]

We prove that $P$ is well defined and that it satisfies the properties of the inner product. Then, given the vector space $A$ and the inner product $P$, one may complete the space by taking equivalence classes of Cauchy sequences from $A$ to obtain the Hilbert space $\overline{A}$. Finally, the reproducing property of the kernel $\kappa$ with respect to the inner product $P$ is proved.

In view of the aforementioned theorems, Definition 1.2.1 of the RKHS given in Section 1.2, which is usually used in the machine learning literature, follows naturally.

We conclude this section with a short description of the most important points of the theory developed by Mercer in the context of integral operators. Mercer considered integral operators $T_\kappa$ generated by a kernel $\kappa$, i.e., $T_\kappa : L_2(X) \to L_2(X)$, such that $(T_\kappa f)(x) := \int_X \kappa(x,y) f(y)\, dy$. He concluded the following [5]:

Theorem 1.3.3 (Mercer kernels are positive definite). Let $X \subseteq \mathbb{R}^\nu$ be a nonempty set and let $\kappa : X \times X \to \mathbb{R}$ be continuous. Then $\kappa$ is a positive definite kernel if and only if

\[ \int_a^b \int_a^b f(x) \kappa(x,y) f(y)\, dx\, dy \geq 0, \]

for all continuous functions $f$ on $X$.
Moreover, if $\kappa$ is positive definite, the integral operator $T_\kappa : L_2(X) \to L_2(X)$, $(T_\kappa f)(x) := \int_X \kappa(x,y) f(y)\, dy$, is positive definite and, if $\psi_i \in L_2(X)$ are the normalized orthogonal eigenfunctions of $T_\kappa$ associated with the eigenvalues $\lambda_i > 0$, then:

\[ \kappa(x,y) = \sum_i \lambda_i \psi_i(x) \psi_i(y). \]

Note that the original form of the above theorem is more general, involving $\sigma$-algebras and probability measures. However, as such general terms are of no importance in the applications concerning this manuscript, we decided to include this simpler form. The previous theorems established that Mercer's kernels, as they are positive definite kernels, are also reproducing kernels. Furthermore, the first part of Theorem 1.3.3 provides a useful tool for determining whether a specific function is actually a reproducing kernel.

Before closing this section, we should emphasize that the general theory of RKHS has been developed by mathematicians to treat complex spaces. However, for the sake of simplicity and clarity, we decided to begin with the simplest real case. Besides, most kernel-based methods involve real data sets. Nevertheless, keep in mind that all the theorems presented here can be generalized to treat complex spaces. We will explore this issue further in a later section.

1.4 Examples of Kernels

Before proceeding to some more advanced topics in the theory of RKHS, it is important to give some examples of kernels that appear more often in the literature and are used in various applications. Perhaps the most widely used reproducing kernel is the Gaussian radial basis function defined on $X \times X$, where $X \subseteq \mathbb{R}^\nu$, as:

\[ \kappa_\sigma(x,y) = \exp\left( -\frac{\|x - y\|^2}{\sigma^2} \right), \tag{1.5} \]

where $\sigma > 0$. Equivalently, the Gaussian RBF function can be defined as:

\[ \kappa_t(x,y) = \exp\left( -t \|x - y\|^2 \right), \tag{1.6} \]

for $t > 0$.

Figure 1.2: (a) The Gaussian kernel for the case $X = \mathbb{R}$, $\sigma = .5$. (b) The element $\Phi(0) = \kappa(\cdot, 0)$ of the feature space induced by the Gaussian kernel for various values of the parameter $\sigma$.

Other well-known kernels defined on $X \times X$, $X \subseteq \mathbb{R}^\nu$, are:

The homogeneous polynomial kernel: $\kappa_d(x,y) = \langle x, y \rangle^d$.
The inhomogeneous polynomial kernel: $\kappa_d(x,y) = (\langle x, y \rangle + c)^d$, where $c \geq 0$ is a constant.
The spline kernel: $\kappa_p(x,y) = B_{2p+1}(\|x - y\|)$, where $B_n = \bigotimes_{i=1}^{n} I_{[-\frac{1}{2}, \frac{1}{2}]}$ denotes the $n$-fold convolution of the indicator function of the interval $[-\frac{1}{2}, \frac{1}{2}]$.
The cosine kernel: $\kappa(x,y) = \cos(\angle(x,y))$.
The Laplacian kernel: $\kappa_t(x,y) = \exp(-t \|x - y\|)$.

Figures 1.2, 1.3, 1.4, 1.5 and 1.6 show some of the aforementioned kernels together with a sample of the elements $\kappa(\cdot, x)$ that span the respective RKHS's for the case $X = \mathbb{R}$. Figures 1.7, 1.8 and 1.9 show some of the elements $\kappa(\cdot, x)$ that span the respective RKHS's for the case $X = \mathbb{R}^2$. Interactive figures regarding the aforementioned examples are also available.

1.5 Properties of RKHS

In this section, we will refer to some more advanced topics in the theory of RKHS, which are useful for a deeper understanding of the underlying theory and show why RKHS's constitute such a powerful tool. We begin our study with some properties of RKHS's and conclude with the basic theorems that enable us to generate new kernels. As we work in Hilbert spaces, the two Parseval's identities are an extremely helpful tool. When $\{e_s : s \in S\}$ (where $S$ is an arbitrary set) is an orthonormal basis for a Hilbert space $H$, then for any $h \in H$ we have that:

\[ h = \sum_{s \in S} \langle h, e_s \rangle e_s, \tag{1.7} \]

\[ \|h\|^2 = \sum_{s \in S} |\langle h, e_s \rangle|^2. \tag{1.8} \]

Figure 1.3: (a) The homogeneous polynomial kernel for the case $X = \mathbb{R}$, $d = 1$. (b) The element $\Phi(x_0) = \kappa(\cdot, x_0)$ of the feature space induced by the homogeneous polynomial kernel ($d = 1$) for various values of $x_0$.

Figure 1.4: (a) The homogeneous polynomial kernel for the case $X = \mathbb{R}$, $d = 2$. (b) The element $\Phi(x_0) = \kappa(\cdot, x_0)$ of the feature space induced by the homogeneous polynomial kernel ($d = 2$) for various values of $x_0$.

Figure 1.5: (a) The inhomogeneous polynomial kernel for the case $X = \mathbb{R}$, $d = 2$. (b) The element $\Phi(x_0) = \kappa(\cdot, x_0)$ of the feature space induced by the inhomogeneous polynomial kernel ($d = 2$) for various values of $x_0$.

Figure 1.6: (a) The Laplacian kernel for the case $X = \mathbb{R}$, $t = 1$. (b) The element $\Phi(0) = \kappa(\cdot, 0)$ of the feature space induced by the Laplacian kernel for various values of the parameter $t$.

Figure 1.7: The element $\Phi(0) = \kappa(\cdot, 0)$ of the feature space induced by the Gaussian kernel ($X = \mathbb{R}^2$) for various values of the parameter $\sigma$ (panels (a) to (d)).

Figure 1.8: The element $\Phi(x_0) = \kappa(\cdot, x_0)$ of the feature space induced by the Gaussian kernel ($X = \mathbb{R}^2$) with $\sigma = .5$, for four different points $x_0$ (panels (a) to (d)).

Figure 1.9: The element $\Phi(0) = \kappa(\cdot, 0)$ of the feature space induced by the Laplacian kernel ($X = \mathbb{R}^2$) for various values of the parameter $t$ (panels (a) to (d)).
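The kernels listed in Section 1.4 and the elements $\kappa(\cdot, x_0)$ shown in the figures can be evaluated with a few lines of code. The following is a minimal sketch under stated assumptions: the Gaussian kernel is written in the $t$-parameterization of (1.6), the default values $t = 1$, $d = 2$ and $c = 1$ are arbitrary illustrative choices, the spline kernel is omitted, and all function names are hypothetical.

```python
import math


def dot(u, v):
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm ||u||."""
    return math.sqrt(dot(u, u))


def gaussian_kernel(x, y, t=1.0):
    """Gaussian kernel exp(-t * ||x - y||^2), t > 0."""
    diff = [a - b for a, b in zip(x, y)]
    return math.exp(-t * dot(diff, diff))


def laplacian_kernel(x, y, t=1.0):
    """Laplacian kernel exp(-t * ||x - y||), t > 0."""
    diff = [a - b for a, b in zip(x, y)]
    return math.exp(-t * norm(diff))


def homogeneous_polynomial_kernel(x, y, d=2):
    """Homogeneous polynomial kernel <x, y>^d."""
    return dot(x, y) ** d


def inhomogeneous_polynomial_kernel(x, y, d=2, c=1.0):
    """Inhomogeneous polynomial kernel (<x, y> + c)^d, with c >= 0."""
    return (dot(x, y) + c) ** d


def cosine_kernel(x, y):
    """Cosine of the angle between x and y (undefined for zero vectors)."""
    return dot(x, y) / (norm(x) * norm(y))


def feature_element(kernel, x0, grid):
    """Samples of the function kappa(., x0) on a grid of input points."""
    return [kernel(x, x0) for x in grid]


# One-dimensional inputs are stored as lists of length one.
grid = [[-2.0 + 0.5 * k] for k in range(9)]
x0 = [0.0]
gauss_values = feature_element(gaussian_kernel, x0, grid)
laplace_values = feature_element(laplacian_kernel, x0, grid)

for point, g, l in zip(grid, gauss_values, laplace_values):
    print(f"x = {point[0]:+.1f}  gaussian = {g:.4f}  laplacian = {l:.4f}")
```

Both sampled functions peak at $x = x_0$, where they equal $\kappa(x_0, x_0) = 1$; the Laplacian element decays more slowly in the tails and is not differentiable at $x_0$.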

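Note that the two Parseval identities (1.7) and (1.8) hold for a general arbitrary set $S$ (not necessarily ordered). The convergence in this case is defined somewhat differently. We say that $h = \sum_{s \in S} h_s$, if for any $\epsilon > 0$ there exists a finite subset $F_0 \subseteq S$, such that for any finite set $F$ with $F_0 \subseteq F \subseteq S$, we have that $\big\| h - \sum_{s \in F} h_s \big\| < \epsilon$.

Proposition 1.5.1 (Cauchy-Schwarz Inequality). If $\kappa$ is a reproducing kernel on $X$ then

\[ |\kappa(x,y)|^2 \leq \kappa(x,x)\, \kappa(y,y). \]

Proof. The proof is straightforward, as $\kappa(x,y)$ is the inner product $\langle \Phi(y), \Phi(x) \rangle_H$ of the space $H(\kappa)$.

Theorem 1.5.1. Every finite dimensional class of functions defined on $X$, equipped with an inner product, is an RKHS. Let $h_1, \dots, h_N$ constitute a basis of the space and let the inner product be defined as

\[ \langle f, g \rangle = \sum_{n,m=1}^{N} \alpha_{n,m} \gamma_n \zeta_m, \]

for $f = \sum_{n=1}^{N} \gamma_n h_n$ and $g = \sum_{n=1}^{N} \zeta_n h_n$. Let $A = (\alpha_{n,m})_{n,m=1}^{N}$ and let $B = (\beta_{n,m})_{n,m=1}^{N}$ be its inverse. Then the kernel of the RKHS is given by

\[ \kappa(x,y) = \sum_{n,m=1}^{N} \beta_{n,m} h_n(x) h_m(y). \tag{1.9} \]

Proof. The reproducing property is immediately verified by equation (1.9):

\[
\begin{aligned}
\langle f, \kappa(\cdot, x) \rangle_H
&= \Big\langle \sum_{n=1}^{N} \gamma_n h_n, \sum_{m,k=1}^{N} \beta_{m,k} h_k(x) h_m \Big\rangle_H
 = \sum_{n,m=1}^{N} \alpha_{n,m} \gamma_n \sum_{k=1}^{N} \beta_{m,k} h_k(x) \\
&= \sum_{n,k=1}^{N} \Big( \sum_{m=1}^{N} \alpha_{n,m} \beta_{m,k} \Big) \gamma_n h_k(x)
 = \sum_{n=1}^{N} \gamma_n h_n(x) = f(x).
\end{aligned}
\]

The following theorem gives the kernel of an RKHS (of finite or infinite dimension) in terms of the elements of an orthonormal basis.

Theorem 1.5.2. Let $H$ be an RKHS on $X$ with reproducing kernel $\kappa$. If $\{e_s : s \in S \subseteq \mathbb{N}\}$ is an orthonormal basis for $H$, then $\kappa(x,y) = \sum_{s \in S} e_s(y) e_s(x)$, where this series converges pointwise.

Proof. For any $y \in X$ we have that $\langle \kappa(\cdot, y), e_s \rangle_H = \langle e_s, \kappa(\cdot, y) \rangle_H = e_s(y)$. Hence, employing Parseval's identity (1.7), we have that $\kappa(\cdot, y) = \sum_{s \in S} e_s(y) e_s(\cdot)$, where these sums converge in the norm of $H$. Since the sums converge in the norm, they converge at every point. Hence, $\kappa(x,y) = \sum_{s \in S} e_s(y) e_s(x)$.

Proposition 1.5.2. If $H$ is an RKHS on $X$ with respective kernel $\kappa$, then every closed subspace $F \subseteq H$ is also an RKHS. In addition, if $F_1(\kappa_1)$ and $F_2(\kappa_2)$ are complementary subspaces of $H$, then $\kappa = \kappa_1 + \kappa_2$.

Proposition 1.5.3. Let $H$ be an RKHS on $X$ with kernel $\kappa$ and let $\{g_n\}$ be an orthonormal system in $H$. Then for any sequence of numbers $\{a_n\}$ such that $\sum_n |a_n|^2 < \infty$ (i.e., $\{a_n\} \in \ell^2$) we have

\[ \sum_n |a_n|\, |g_n(x)| \leq \kappa(x,x)^{\frac{1}{2}} \Big( \sum_n |a_n|^2 \Big)^{\frac{1}{2}}. \]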

Proof. We have seen that $g_n(y) = \langle g_n, \kappa(\cdot, y) \rangle_H$ and that $\|\kappa(\cdot, y)\|^2_H = \kappa(y,y)$. Thus, considering that the $g_n$'s are orthonormal and applying Parseval's identity (1.8) to $\kappa(\cdot, y)$ with respect to an orthonormal basis that extends $\{g_n\}$, we have:

\[ \sum_n |g_n(y)|^2 = \sum_n |\langle g_n, \kappa(\cdot, y) \rangle_H|^2 \leq \|\kappa(\cdot, y)\|^2_H = \kappa(y,y), \]

with equality when $\{g_n\}$ is itself an orthonormal basis. Therefore, applying the Cauchy-Schwarz inequality we obtain

\[ \sum_n |a_n|\, |g_n(x)| \leq \Big( \sum_n |a_n|^2 \Big)^{\frac{1}{2}} \Big( \sum_n |g_n(x)|^2 \Big)^{\frac{1}{2}} \leq \kappa(x,x)^{\frac{1}{2}} \Big( \sum_n |a_n|^2 \Big)^{\frac{1}{2}}. \]

Theorem 1.5.3 (Representer Theorem). Denote by $\Omega : [0, +\infty) \to \mathbb{R}$ a strictly monotonic increasing function, by $X$ a nonempty set and by $L : (X \times \mathbb{R}^2)^N \to \mathbb{R} \cup \{\infty\}$ an arbitrary loss function. Then each minimizer $f \in H$ of the regularized minimization problem

\[ \min_f\ L\big( (x_1, y_1, f(x_1)), \dots, (x_N, y_N, f(x_N)) \big) + \Omega\big( \|f\|^2_H \big), \]

admits a representation of the form $f = \sum_{n=1}^{N} a_n \kappa(\cdot, x_n)$.

Proof. We may decompose each $f \in H$ into a part contained in the span of the kernels centered at the training points, i.e., $\kappa(\cdot, x_1), \dots, \kappa(\cdot, x_N)$ (which is a closed linear subspace), and a part in the orthogonal complement of the previous span. Thus each $f$ can be written as:

\[ f = \sum_{n=1}^{N} a_n \kappa(\cdot, x_n) + f_\perp. \]

Applying the reproducing property and considering that $\langle f_\perp, \kappa(\cdot, x_n) \rangle_H = 0$, for $n = 1, \dots, N$, we obtain:

\[ f(x_n) = \langle f, \kappa(\cdot, x_n) \rangle_H = \sum_{i=1}^{N} a_i \kappa(x_n, x_i) + \langle f_\perp, \kappa(\cdot, x_n) \rangle_H = \sum_{i=1}^{N} a_i \kappa(x_n, x_i). \]

Thus, the value of the loss function $L$ depends only on the part contained in the span of the kernels centered at the training points, i.e., on $a_1, \dots, a_N$. Furthermore, for all $f_\perp$ we have:

\[ \Omega\big( \|f\|^2_H \big) = \Omega\Big( \Big\| \sum_{n=1}^{N} a_n \kappa(\cdot, x_n) \Big\|^2_H + \|f_\perp\|^2_H \Big) \geq \Omega\Big( \Big\| \sum_{n=1}^{N} a_n \kappa(\cdot, x_n) \Big\|^2_H \Big). \]

Thus, for any fixed $a_1, \dots, a_N$ the value of the cost function is minimized for $f_\perp = 0$. Hence, the solution of the minimization task will have to obey this property too.

Examples of loss functions $L$ such as the ones mentioned in Theorem 1.5.3 are the MSE:

\[ L\big( (x_1, y_1, f(x_1)), \dots, (x_N, y_N, f(x_N)) \big) = \sum_{n=1}^{N} (f(x_n) - y_n)^2, \]

and the $\ell_1$ mean error

\[ L\big( (x_1, y_1, f(x_1)), \dots, (x_N, y_N, f(x_N)) \big) = \sum_{n=1}^{N} |f(x_n) - y_n|. \]

The aforementioned theorem is of great importance to practical applications. Although one might be trying to solve an optimization task in an infinite dimensional RKHS $H$ (such as the one generated by the Gaussian kernel), the Representer Theorem states that the solution of the problem lies in the span of $N$ particular kernels, those centered on the training points.

Figure 1.10: Solving the regression problem $\min_f \frac{1}{N} \sum_{n=1}^{N} (y_n - f(x_n))^2 + \lambda \|f\|^2_H$, on a set of 11 points: (a), (c) with a bias, i.e., $f$ admits the form of (1.10), and (b), (d) without a bias, i.e., $f$ admits the form of (1.11). In (a) and (b) we set $\sigma = .15$, $\lambda = .7$. In (c) and (d) we set $\sigma = .15$, $\lambda = .1$. Observe that for $\lambda = .1$, the unbiased solution takes values significantly lower compared to the values of the training points. For the smaller $\lambda = .1$, the difference between (c) and (d) is reduced (compared to the case $\lambda = .7$). However, one may observe that the unbiased solution (d) is not as smooth as the biased solution (c), especially near 0 and 1.

In practice, we often include a bias factor in the solution of kernel-based regularized minimization tasks, that is, we assume that $f$ admits a representation of the form

\[ f = \sum_{n=1}^{N} \theta_n \kappa(\cdot, x_n) + b, \tag{1.10} \]

where $b \in \mathbb{R}$. This has been shown to improve the performance of the respective algorithms [43, 36], for two main reasons. Firstly, the introduction of the bias, $b$, enlarges the family of functions in which we search for a solution, thus leading to potentially better estimations. Moreover, as the regularization factor $\Omega(\|f\|^2_H)$ penalizes the values of $f$ at the training points, the resulting solution tends to take values as close to zero as possible, for large values of $\lambda$ (see Figure 1.10). The use of the bias factor is theoretically justified by the semi-parametric representer theorem.

Theorem 1.5.4 (Semi-parametric Representer Theorem). Suppose that in addition to the assumptions of Theorem 1.5.3, we are given a set of $M$ real valued functions $\{\psi_m\}_{m=1}^{M} : X \to \mathbb{R}$, with the property that the $N \times M$ matrix $(\psi_m(x_n))_{n,m}$ has rank $M$.
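Then any $\tilde{f} := f + h$, with $f \in H$ and $h \in \mathrm{span}\{\psi_m;\ m = 1, \dots, M\}$, solving

\[ \min_{\tilde{f}}\ L\big( (x_1, y_1, \tilde{f}(x_1)), \dots, (x_N, y_N, \tilde{f}(x_N)) \big) + \Omega\big( \|f\|^2_H \big), \]

admits a representation of the form

\[ \tilde{f} = \sum_{n=1}^{N} \theta_n \kappa(\cdot, x_n) + \sum_{m=1}^{M} b_m \psi_m(\cdot), \tag{1.11} \]

with $\theta_n \in \mathbb{R}$, $b_m \in \mathbb{R}$, for all $n = 1, \dots, N$, $m = 1, \dots, M$.

The following results can be used for the construction of new kernels.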
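Before turning to those constructions, the practical content of the Representer Theorem can be illustrated on the bias-free regression problem $\min_f \frac{1}{N} \sum_n (y_n - f(x_n))^2 + \lambda \|f\|^2_H$. Substituting $f = \sum_n a_n \kappa(\cdot, x_n)$ turns it into the finite problem $\min_a \frac{1}{N} \|y - Ka\|^2 + \lambda a^T K a$, whose gradient $\frac{2}{N} K \big( (K + N\lambda I) a - y \big)$ vanishes for $a = (K + N\lambda I)^{-1} y$. The sketch below implements this under assumptions that are not taken from the text: the training data (samples of a sine), the Gaussian kernel parameter $t$, the value of $\lambda$, and all function names are illustrative.

```python
import math


def gaussian_kernel(x, y, t=10.0):
    """Gaussian kernel exp(-t * (x - y)^2) on the real line (assumed for the demo)."""
    return math.exp(-t * (x - y) ** 2)


def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def fit_kernel_ridge(xs, ys, kernel, lam):
    """Coefficients a solving (K + N * lam * I) a = y."""
    N = len(xs)
    A = [
        [kernel(xs[n], xs[m]) + (N * lam if n == m else 0.0) for m in range(N)]
        for n in range(N)
    ]
    return solve_linear_system(A, ys)


def predict(coeffs, xs, kernel, x):
    """f(x) = sum_n a_n * kernel(x, x_n): a span of N kernels, as the theorem states."""
    return sum(a * kernel(x, xn) for a, xn in zip(coeffs, xs))


# Hypothetical training set: 11 equally spaced samples of one period of a sine.
xs = [k / 10.0 for k in range(11)]
ys = [math.sin(2.0 * math.pi * x) for x in xs]
lam = 1e-3

coeffs = fit_kernel_ridge(xs, ys, gaussian_kernel, lam)
train_error = max(
    abs(predict(coeffs, xs, gaussian_kernel, x) - y) for x, y in zip(xs, ys)
)
print("largest training residual:", train_error)
print("prediction at x = 0.25:", predict(coeffs, xs, gaussian_kernel, 0.25))
```

Although the search space $H$ is infinite dimensional for the Gaussian kernel, the whole computation involves only an $N \times N$ linear system. Increasing `lam` shrinks the fitted function towards zero, which is the effect that motivates the bias term of (1.10).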

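Proposition 1.5.4 (Conformal Transformations). If $f : X \to \mathbb{R}$ is any function, then $\kappa_1(x,y) = f(x) f(y)$ is a reproducing kernel. Moreover, if $\kappa$ is any other reproducing kernel, then $\kappa_2(x,y) = f(x) \kappa(x,y) f(y)$ is also a reproducing kernel.

Proof. The first part is a direct consequence of Theorem 1.5.1. For the second part, consider $x_1, \dots, x_N \in X$ and $a_1, \dots, a_N \in \mathbb{R}$. Then

\[
\begin{aligned}
\sum_{n,m=1}^{N} a_n a_m f(x_n) \kappa(x_n, x_m) f(x_m)
&= \sum_{n,m=1}^{N} a_n a_m f(x_n) f(x_m) \langle \Phi(x_m), \Phi(x_n) \rangle_H \\
&= \Big\langle \sum_{m=1}^{N} a_m f(x_m) \Phi(x_m), \sum_{n=1}^{N} a_n f(x_n) \Phi(x_n) \Big\rangle_H \\
&= \Big\| \sum_{n=1}^{N} a_n f(x_n) \Phi(x_n) \Big\|^2_H \geq 0.
\end{aligned}
\]

Moreover, as

\[ \cos\big( \angle(\Phi_2(x), \Phi_2(y)) \big) = \frac{f(x) \kappa(x,y) f(y)}{\sqrt{f(x) \kappa(x,x) f(x)}\, \sqrt{f(y) \kappa(y,y) f(y)}} = \frac{\kappa(x,y)}{\sqrt{\kappa(x,x)}\, \sqrt{\kappa(y,y)}} = \cos\big( \angle(\Phi(x), \Phi(y)) \big), \]

whenever $f(x) f(y) > 0$, this transformation of the original kernel preserves angles in the feature space.

Theorem 1.5.5 (Restriction of a kernel). Let $H$ be an RKHS on $X$ with respective kernel $\kappa$. Then $\kappa$ restricted to the set $X_1 \subset X$ is the reproducing kernel of the class $H_1$ of all restrictions of functions of $H$ to the subset $X_1$. The norm of any such restricted function $f_1 \in H_1$ (originating from $f \in H$) is $\|f_1\|_{H_1} = \min\{ \|f\|_H,\ f \in H : f|_{X_1} = f_1 \}$.

Proposition 1.5.5 (Normalization of a kernel). Let $H$ be an RKHS on $X$ with respective kernel $\kappa$, such that $\kappa(x,x) > 0$ for all $x \in X$. Then

\[ \hat{\kappa}(x,y) = \frac{\kappa(x,y)}{\sqrt{\kappa(x,x)\, \kappa(y,y)}}, \tag{1.12} \]

is also a positive definite kernel on $X$. Note that $|\hat{\kappa}(x,y)| \leq 1$, for all $x, y \in X$.

Proof. Let $x_1, x_2, \dots, x_N \in X$ and $c_1, \dots, c_N$ be real numbers. Then

\[ \sum_{n,m=1}^{N} c_n c_m \hat{\kappa}(x_n, x_m) = \sum_{n,m=1}^{N} \frac{c_n c_m \kappa(x_n, x_m)}{\sqrt{\kappa(x_n, x_n)\, \kappa(x_m, x_m)}} = \sum_{n,m=1}^{N} \frac{c_n}{\sqrt{\kappa(x_n, x_n)}} \frac{c_m}{\sqrt{\kappa(x_m, x_m)}} \kappa(x_n, x_m) \geq 0, \]

as $\kappa$ is a positive definite kernel.

Theorem 1.5.6 (Sum of kernels). Let $H_1, H_2$ be two RKHS's on $X$ with respective kernels $\kappa_1, \kappa_2$. Then $\kappa = \kappa_1 + \kappa_2$ is also a reproducing kernel. The corresponding RKHS, $H$, contains the functions $f = f_1 + f_2$, where $f_i \in H_i$, $i = 1, 2$. The respective norm is defined by

\[ \|f\|^2_H = \min\big\{ \|f_1\|^2_{H_1} + \|f_2\|^2_{H_2},\ \text{for all } f = f_1 + f_2,\ f_i \in H_i,\ i = 1, 2 \big\}. \]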

Proof. It is trivial to show that $\kappa_1 + \kappa_2$ is a positive definite kernel. The difficult part is to associate this kernel with the specific RKHS $H$. Consider the Hilbert space $F = H_1 \times H_2$. The respective inner product and the corresponding norm are defined as

\[ \langle (f_1, f_2), (g_1, g_2) \rangle_F = \langle f_1, g_1 \rangle_{H_1} + \langle f_2, g_2 \rangle_{H_2}, \qquad \|(f_1, f_2)\|^2_F = \|f_1\|^2_{H_1} + \|f_2\|^2_{H_2}, \]

for $f_1, g_1 \in H_1$ and $f_2, g_2 \in H_2$. If $H_1$ and $H_2$ have only $0$ in common, it is easy to show that there is a one-to-one correspondence between $F$ and $H = H_1 + H_2$, as each $f \in H$ can be decomposed into two parts (one belonging to $H_1$ and the other to $H_2$) uniquely. The difficult part is to discover such a relation if $H_0 = H_1 \cap H_2$ is larger than $\{0\}$. To make this fact clear, consider this simple example: let $H_1$ and $H_2$ be the linear classes of polynomials of orders up to 1 and up to 2 respectively. Obviously, $H = H_1 + H_2 = H_2$, as $H_1 \subset H_2$. Let $f(x) = x^2 + 5x$, $f \in H$. Then $f$ can be decomposed into two parts (one belonging to $H_1$ and the other to $H_2$) in more than one way. For example $f(x) = (x^2) + (5x)$, or $f(x) = (x^2 + 4x) + (x)$, or $f(x) = (x^2 + 2x) + (3x)$, etc. Thus, the mapping between $f = f_1 + f_2 \in H$ and $(f_1, f_2) \in F$ is not one-to-one.

However, in such cases, we can still find a smaller subspace of $F$, which can be identified with $H$. To this end, define $F_0 = \{(f, -f),\ f \in H_0\}$. It is clear that $F_0$ is a linear subspace of $F$. We will show that it is a closed one. Consider a converging sequence in $F_0$: $(f_n, -f_n) \to (\bar{f}_1, \bar{f}_2)$. Then $f_n \to \bar{f}_1$ and $-f_n \to \bar{f}_2$. Thus $\bar{f}_1 = -\bar{f}_2$ and $(\bar{f}_1, \bar{f}_2)$ is in $F_0$. As $F_0$ is a closed linear subspace of $F$, we may consider its complementary subspace $F_0^\perp$: $F = F_0 \oplus F_0^\perp$.

As a next step, consider the linear transformation $T : F \to H$, $T(f_1, f_2) = f_1 + f_2$. The kernel of this transformation is the subspace $F_0$. Hence, there is a one-to-one correspondence between $F_0^\perp$ and $H$. Consider the inverse transformation $T^{-1}$ and let $T^{-1}(f) = (f', f'')$, for $f \in H$, where $f' \in H_1$ and $f'' \in H_2$, i.e., through $T^{-1}$ we decompose $f$ uniquely into two components, one in $H_1$ and the other in $H_2$.
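This decomposition enables us to define an inner product in $H$, i.e.,

$$\langle f,g\rangle_H = \langle f'+f'', g'+g''\rangle_H = \langle f',g'\rangle_{H_1} + \langle f'',g''\rangle_{H_2} = \langle (f',f''),(g',g'')\rangle_F,$$

for $f,g \in H$. To prove that to this $H$ there corresponds the kernel $\kappa = \kappa_1 + \kappa_2$, we make the following remarks:

1. For every $y \in X$, $\kappa(\cdot,y) = \kappa_1(\cdot,y) + \kappa_2(\cdot,y) \in H$.

2. For every $y \in X$, let $T^{-1}(\kappa(\cdot,y)) = (\kappa'(\cdot,y),\kappa''(\cdot,y))$. Thus $\kappa(x,y) = \kappa'(x,y) + \kappa''(x,y) = \kappa_1(x,y) + \kappa_2(x,y)$, and consequently $\kappa_1(x,y) - \kappa'(x,y) = -(\kappa_2(x,y) - \kappa''(x,y))$. This means that $(\kappa_1(\cdot,y) - \kappa'(\cdot,y),\ \kappa_2(\cdot,y) - \kappa''(\cdot,y)) \in F_0$. Hence, for every $y \in X$ we have

$$f(y) = f'(y) + f''(y) = \langle f',\kappa_1(\cdot,y)\rangle_{H_1} + \langle f'',\kappa_2(\cdot,y)\rangle_{H_2} = \langle (f',f''),(\kappa_1(\cdot,y),\kappa_2(\cdot,y))\rangle_F$$
$$= \langle (f',f''),(\kappa'(\cdot,y),\kappa''(\cdot,y))\rangle_F + \langle (f',f''),(\kappa_1(\cdot,y) - \kappa'(\cdot,y),\ \kappa_2(\cdot,y) - \kappa''(\cdot,y))\rangle_F.$$

As $(\kappa_1(\cdot,y) - \kappa'(\cdot,y),\ \kappa_2(\cdot,y) - \kappa''(\cdot,y)) \in F_0$ and $(f',f'') \in F_0^\perp$, the last term vanishes and we conclude that

$$f(y) = \langle (f',f''),(\kappa'(\cdot,y),\kappa''(\cdot,y))\rangle_F = \langle f'+f'',\ \kappa'(\cdot,y)+\kappa''(\cdot,y)\rangle_H = \langle f,\kappa(\cdot,y)\rangle_H.$$

This is the reproducing property.

Finally, to prove the last part of the theorem, consider again $f \in H$, let $f_i \in H_i$, $i = 1,2$, be such that $f = f_1 + f_2$, and let $f' \in H_1$ and $f'' \in H_2$ be the unique decomposition of $f$ through $T^{-1}$. As $f_1 + f_2 = f' + f''$, we obtain that $f' - f_1 = -(f'' - f_2)$, which implies that $(f' - f_1, f'' - f_2) \in F_0$. Thus, we have:

$$\|f_1\|_{H_1}^2 + \|f_2\|_{H_2}^2 = \|(f_1,f_2)\|_F^2 = \|(f',f'')\|_F^2 + \|(f_1 - f', f_2 - f'')\|_F^2 = \|f'\|_{H_1}^2 + \|f''\|_{H_2}^2 + \|(f_1 - f', f_2 - f'')\|_F^2 = \|f\|_H^2 + \|(f_1 - f', f_2 - f'')\|_F^2.$$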

From the last relation we conclude that $\|f\|_H^2 = \|f_1\|_{H_1}^2 + \|f_2\|_{H_2}^2$ if and only if $f_1 = f'$ and $f_2 = f''$. In this case we attain the minimum value of $\|f_1\|_{H_1}^2 + \|f_2\|_{H_2}^2$ over all possible decompositions $f = f_1 + f_2$. This completes the proof.

Besides the sum of kernels, other operations preserve reproducing kernels as well. Below, we give an extensive list of such operations. For a description of the induced RKHS and a formal proof (in the cases that are not considered here) the interested reader may refer to [, 37].

1. If $\kappa(x,y)$ is a positive definite kernel on $X$, then $\lambda\kappa(x,y)$ is also a positive definite kernel for any $\lambda \geq 0$. It is obvious that in this case $H(\lambda\kappa) = H(\kappa)$, if $\lambda > 0$. If $\lambda = 0$, then $H(0) = \{0\}$.

2. If $\kappa_1(x,y)$ and $\kappa_2(x,y)$ are positive definite kernels on $X$, then $\kappa_1(x,y) + \kappa_2(x,y)$ is also a positive definite kernel, as the theorem on the sum of kernels established.

3. If $\kappa_1(x,y)$ and $\kappa_2(x,y)$ are positive definite kernels on $X$, then $\kappa_1(x,y)\,\kappa_2(x,y)$ is also a positive definite kernel.

4. If $\kappa_n(x,y)$, $n \in \mathbb{N}$, are positive definite kernels on $X$ such that $\lim_{n\to\infty}\kappa_n(x,y) = \kappa(x,y)$ for all $x,y \in X$, then $\kappa(x,y)$ is also a positive definite kernel.

5. If $\kappa(x,y)$ is a positive definite kernel on $X$ and $p(z)$ is a polynomial with non-negative coefficients, then $p(\kappa(x,y))$ is also a positive definite kernel.

6. If $\kappa(x,y)$ is a positive definite kernel on $X$, then $e^{\kappa(x,y)}$ is also a positive definite kernel. To prove this, consider the Taylor expansion of $e^z$, which may be considered as a limit of polynomials with non-negative coefficients.

7. If $\kappa(x,y)$ is a positive definite kernel on $X$ and $\Psi : X \to X$ is a function, then $\kappa(\Psi(x),\Psi(y))$ is a positive definite kernel on $X$.

8. If $\kappa_1(x,y)$ and $\kappa_2(x',y')$ are positive definite kernels on $X$ and $X'$ respectively, then their tensor product $(\kappa_1 \otimes \kappa_2)(x,y,x',y') = \kappa_1(x,y)\,\kappa_2(x',y')$ is a kernel on $X \times X'$.

9. If $\kappa_1(x,y)$ and $\kappa_2(x',y')$ are positive definite kernels on $X$ and $X'$ respectively, then their direct sum $(\kappa_1 \oplus \kappa_2)(x,y,x',y') = \kappa_1(x,y) + \kappa_2(x',y')$ is a kernel on $X \times X'$.
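Several of the closure properties listed above can be checked numerically on a finite sample by confirming that the resulting Gram matrices have no negative eigenvalues. The following is only a minimal sketch: the sample points, the function `g`, the helper names `gram` and `is_psd`, and the tolerance are illustrative assumptions, not part of the original text, and a finite check of this kind illustrates the statements without proving them.

```python
import numpy as np

def gram(kernel, X):
    """Gram matrix K[n, m] = kernel(X[n], X[m])."""
    return np.array([[kernel(x, y) for y in X] for x in X], dtype=float)

def is_psd(K, tol=1e-9):
    """True if the symmetric matrix K has no eigenvalue below -tol (relative to its scale)."""
    scale = max(1.0, np.abs(K).max())
    return bool(np.linalg.eigvalsh((K + K.T) / 2).min() >= -tol * scale)

rng = np.random.default_rng(0)      # arbitrary seed, only for reproducibility
X = rng.normal(size=(8, 3))         # 8 sample points in R^3 (illustrative data)

g = lambda x: 1.0 + x[0] ** 2       # an arbitrary real function on X (assumption)
k1 = lambda x, y: float(x @ y)      # linear kernel <x, y>
k2 = lambda x, y: g(x) * g(y)       # kernel of the form f(x) f(y)

derived = {
    "scaling":     lambda x, y: 2.5 * k1(x, y),                        # item 1
    "sum":         lambda x, y: k1(x, y) + k2(x, y),                   # item 2
    "product":     lambda x, y: k1(x, y) * k2(x, y),                   # item 3
    "polynomial":  lambda x, y: 1.0 + 2.0 * k1(x, y) + k1(x, y) ** 3,  # item 5
    "exponential": lambda x, y: float(np.exp(0.1 * k1(x, y))),         # item 6
}
results = {name: is_psd(gram(k, X)) for name, k in derived.items()}

# Normalisation of the kernel k1 + k2, whose diagonal k(x, x) is strictly positive.
K_sum = gram(derived["sum"], X)
d = np.sqrt(np.diag(K_sum))
K_hat = K_sum / np.outer(d, d)
results["normalised"] = is_psd(K_hat)
```

Every entry of `results` is expected to be `True`, and the normalised Gram matrix `K_hat` has unit diagonal with all entries bounded by 1 in absolute value, in line with the normalization proposition.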
1.6 Dot product and translation invariant kernels

There are two important classes of kernels that follow certain rules and are widely used in practice. The first one includes the dot product kernels, which are functions defined as $\kappa(x,y) = f(\langle x,y\rangle)$, for some real function $f$. The second class are the translation invariant kernels, which are defined as $\kappa(x,y) = f(x - y)$, for some real function $f$ defined on $X$. The following theorems establish necessary and sufficient conditions for such functions to be reproducing kernels.

Theorem (Power Series of dot product kernels). Let $f : \mathbb{R} \to \mathbb{R}$. A function $\kappa(x,y) = f(\langle x,y\rangle)$ defined on $X$, such that $f$ has the power series expansion $f(t) = \sum_{n} a_n t^n$, is a positive definite kernel if and only if $a_n \geq 0$ for all $n$.

Theorem 1.6. (Bochner's Fourier Criterion for translation invariant kernels). Let $f : X \to \mathbb{R}$. A function $\kappa(x,y) = f(x - y)$ defined on $X \subseteq \mathbb{R}^\nu$ is a positive definite kernel if the Fourier transform

$$F[\kappa](\omega) = (2\pi)^{-\nu/2} \int_X e^{-i\langle \omega,x\rangle} f(x)\,dx$$

is non-negative.

Remark. Bochner's theorem is more general, involving Borel measures and topological spaces. For the sake of simplicity we give only this simple form.
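The Fourier criterion above can be illustrated numerically in one dimension. The sketch below approximates the transform by a plain Riemann sum and compares it with the known closed forms for a Gaussian and a Laplacian profile. The parametrisation $f(x) = e^{-x^2/(2\sigma^2)}$, the value of `sigma`, the grid sizes and the helper name `fourier_transform_1d` are assumptions made for this illustration only.

```python
import numpy as np

def fourier_transform_1d(f, omegas, half_width=40.0, num=400_001):
    """Approximate (2*pi)^(-1/2) * integral of exp(-i*w*x) f(x) dx by a Riemann sum.

    For an even real f the transform is real, so only the cosine part is kept.
    """
    x = np.linspace(-half_width, half_width, num)
    dx = x[1] - x[0]
    fx = f(x)
    vals = [np.sum(np.cos(w * x) * fx) * dx for w in omegas]
    return np.array(vals) / np.sqrt(2.0 * np.pi)

sigma = 1.5                                   # illustrative width (assumption)
omegas = np.linspace(-4.0, 4.0, 41)

gauss = lambda x: np.exp(-x**2 / (2.0 * sigma**2))   # Gaussian profile
laplace = lambda x: np.exp(-np.abs(x))               # Laplacian profile

F_gauss = fourier_transform_1d(gauss, omegas)
F_laplace = fourier_transform_1d(laplace, omegas)

# Closed forms under the same (2*pi)^(-1/2) convention.
F_gauss_exact = sigma * np.exp(-sigma**2 * omegas**2 / 2.0)
F_laplace_exact = np.sqrt(2.0 / np.pi) / (1.0 + omegas**2)
```

Both numerical transforms are non-negative on the sampled frequencies and agree with the closed forms up to discretisation error, which is consistent with the Gaussian and Laplacian translation invariant kernels being positive definite.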

Employing the tools provided in this section, one can readily prove the positivity of some of the kernels given in section 1.4. For example:

Homogeneous Polynomial Kernel: As $\langle x,y\rangle$ is a positive definite kernel and $p(z) = z^d$ is a polynomial with non-negative coefficients, $p(\langle x,y\rangle) = (\langle x,y\rangle)^d$ is a positive definite kernel.

Inhomogeneous Polynomial Kernel: As $\langle x,y\rangle$ is a positive definite kernel and $p(z) = (z + c)^d$ is a polynomial with non-negative coefficients (for positive $c$), $p(\langle x,y\rangle) = (c + \langle x,y\rangle)^d$ is a positive definite kernel.

The cosine kernel: Note that $\cos(\angle(x,y)) = \frac{\langle x,y\rangle}{\|x\|\,\|y\|}$. Thus the cosine kernel is the normalization of the simple kernel $\langle x,y\rangle$.

To prove that the Gaussian and the Laplacian are positive kernels we need another set of tools. This is the topic of the next section.

1.7 The Gaussian kernel and other translation invariant kernels

As the Gaussian kernel is the most widely used in applications, we dedicate this section to presenting some of its most important properties. We begin by showing that the Gaussian radial basis function is indeed a reproducing kernel. To this end, we introduce some new notions.

Definition (Negative Definite Kernel). Let $X$ be a set. A function $\kappa : X \times X \to \mathbb{R}$ is called a negative definite kernel if it is symmetric, i.e., $\kappa(y,x) = \kappa(x,y)$, and

$$\sum_{n,m=1}^N c_n c_m \kappa(x_n,x_m) \leq 0,$$

for any $x_1,\dots,x_N \in X$ and $c_1,\dots,c_N \in \mathbb{R}$ with $\sum_{n=1}^N c_n = 0$.

Examples of negative kernels are the constant functions and all functions of the form $-\kappa$, where $\kappa$ is a positive definite kernel. Furthermore, the following proposition holds:

Proposition. Let $X$ be a non-empty set, the functions $\psi_k : X \times X \to \mathbb{R}$ be negative kernels and $\alpha_k > 0$, for $k \in \mathbb{N}$. Then:

Any positive combination of a finite number of negative kernels is also a negative kernel, i.e., $\psi = \sum_{k=1}^K \alpha_k \psi_k$, with $\alpha_1,\dots,\alpha_K > 0$, is a negative kernel.

The limit of any converging sequence of negative kernels is also a negative kernel, i.e., if $\psi(x,y) = \lim_k \psi_k(x,y)$ for all $x,y \in X$, then $\psi$ is a negative kernel.

Proof. For the first part, consider numbers $c_1,\dots,c_N$ such that $\sum_{n=1}^N c_n = 0$, points $x_1,\dots,x_N \in X$ and $K \in \mathbb{N}$. Then

$$\sum_{n,m=1}^N c_n c_m \sum_{k=1}^K \alpha_k \psi_k(x_n,x_m) = \sum_{k=1}^K \alpha_k \sum_{n,m=1}^N c_n c_m \psi_k(x_n,x_m) \leq 0.$$

Finally, to prove the second part we have:

$$\sum_{n,m=1}^N c_n c_m \psi(x_n,x_m) = \sum_{n,m=1}^N c_n c_m \lim_k \psi_k(x_n,x_m) = \lim_k \sum_{n,m=1}^N c_n c_m \psi_k(x_n,x_m) \leq 0.$$
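The role of the zero-sum constraint in the definition of a negative definite kernel can be seen on a small numerical example. The sketch below builds a positive combination of a constant kernel and $-\kappa$, with $\kappa$ the linear kernel, and evaluates the quadratic form for random zero-sum coefficient vectors. The sample points, the weights 3 and 2, and the helper name `quad_form` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed, only for reproducibility
X = rng.normal(size=(10, 2))              # 10 sample points in R^2 (illustrative data)
N = len(X)

K_lin = X @ X.T                           # Gram matrix of the positive definite kernel <x, y>
const_part = 3.0 * np.ones((N, N))        # constant function: a negative kernel
Psi = const_part + 2.0 * (-K_lin)         # positive combination of two negative kernels

def quad_form(M, c):
    """Quadratic form sum_{n,m} c_n c_m M[n, m]."""
    return float(c @ M @ c)

# With zero-sum coefficients the quadratic form never becomes positive.
values = []
for _ in range(1000):
    c = rng.normal(size=N)
    c -= c.mean()                         # enforce sum(c) = 0
    values.append(quad_form(Psi, c))
max_zero_sum = max(values)

# Without the constraint even the constant kernel gives a positive value.
unconstrained = quad_form(const_part, np.ones(N))   # equals 3 * N**2
```

Here `max_zero_sum` stays at or below zero up to round-off, whereas `unconstrained` is strictly positive, which shows why the definition restricts attention to coefficients that sum to zero.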

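Lemma. Let $X$ be a non-empty set, $V$ be a vector space equipped with an inner product and $T : X \to V$. Then the function $\psi(x,y) = \|T(x) - T(y)\|_V^2$ is a negative definite kernel on $X$.

Proof. Consider numbers $c_1,\dots,c_N$ such that $\sum_{n=1}^N c_n = 0$ and $x_1,\dots,x_N \in X$. Then

$$\sum_{n,m=1}^N c_n c_m \|T(x_n) - T(x_m)\|_V^2 = \sum_{n,m=1}^N c_n c_m \langle T(x_n) - T(x_m),\ T(x_n) - T(x_m)\rangle_V$$
$$= \sum_{n,m=1}^N c_n c_m \left( \|T(x_n)\|_V^2 + \|T(x_m)\|_V^2 - \langle T(x_n),T(x_m)\rangle_V - \langle T(x_m),T(x_n)\rangle_V \right)$$
$$= \sum_{m=1}^N c_m \sum_{n=1}^N c_n \|T(x_n)\|_V^2 + \sum_{n=1}^N c_n \sum_{m=1}^N c_m \|T(x_m)\|_V^2 - \left\langle \sum_{n=1}^N c_n T(x_n), \sum_{m=1}^N c_m T(x_m)\right\rangle_V - \left\langle \sum_{m=1}^N c_m T(x_m), \sum_{n=1}^N c_n T(x_n)\right\rangle_V.$$

As $\sum_{n=1}^N c_n = 0$, the first two terms of the summation vanish and we obtain:

$$\sum_{n,m=1}^N c_n c_m \|T(x_n) - T(x_m)\|_V^2 = -2\left\|\sum_{n=1}^N c_n T(x_n)\right\|_V^2 \leq 0.$$

Thus $\psi$ is a negative definite kernel.

Lemma. Let $\psi : X \times X \to \mathbb{R}$ be a symmetric function. Fix $x_0 \in X$ and define

$$\kappa(x,y) = -\psi(x,y) + \psi(x,x_0) + \psi(x_0,y) - \psi(x_0,x_0).$$

Then $\psi$ is a negative definite kernel if and only if $\kappa$ is a positive definite kernel.

Proof. Let $x_1,\dots,x_N \in X$. For the "if" part, consider numbers $c_1,\dots,c_N$ such that $\sum_{n=1}^N c_n = 0$. Then

$$\sum_{n,m=1}^N c_n c_m \kappa(x_n,x_m) = -\sum_{n,m=1}^N c_n c_m \psi(x_n,x_m) + \sum_{n,m=1}^N c_n c_m \psi(x_n,x_0) + \sum_{n,m=1}^N c_n c_m \psi(x_0,x_m) - \sum_{n,m=1}^N c_n c_m \psi(x_0,x_0)$$
$$= -\sum_{n,m=1}^N c_n c_m \psi(x_n,x_m) + \sum_{m=1}^N c_m \sum_{n=1}^N c_n \psi(x_n,x_0) + \sum_{n=1}^N c_n \sum_{m=1}^N c_m \psi(x_0,x_m) - \sum_{n=1}^N c_n \sum_{m=1}^N c_m \psi(x_0,x_0).$$

As $\sum_{n=1}^N c_n = 0$, the last three terms vanish, and since $\sum_{n,m=1}^N c_n c_m \kappa(x_n,x_m) \geq 0$, we obtain $\sum_{n,m=1}^N c_n c_m \psi(x_n,x_m) \leq 0$. Thus $\psi$ is a negative definite kernel.

For the converse, take $c_1,\dots,c_N \in \mathbb{R}$ and define $c_0 = -\sum_{n=1}^N c_n$. By this simple trick, we generate the numbers $c_0,c_1,\dots,c_N \in \mathbb{R}$, which have the property $\sum_{n=0}^N c_n = 0$. As $\psi$ is a negative definite kernel, we have $\sum_{n,m=0}^N c_n c_m \psi(x_n,x_m) \leq 0$, for any $x_0 \in X$. Thus,

$$\sum_{n,m=0}^N c_n c_m \psi(x_n,x_m) = \sum_{n,m=1}^N c_n c_m \psi(x_n,x_m) + \sum_{m=1}^N c_0 c_m \psi(x_0,x_m) + \sum_{n=1}^N c_n c_0 \psi(x_n,x_0) + c_0^2\,\psi(x_0,x_0)$$
$$= \sum_{n,m=1}^N c_n c_m \psi(x_n,x_m) - \sum_{n,m=1}^N c_n c_m \psi(x_0,x_m) - \sum_{n,m=1}^N c_n c_m \psi(x_n,x_0) + \sum_{n,m=1}^N c_n c_m \psi(x_0,x_0)$$
$$= \sum_{n,m=1}^N c_n c_m \left( \psi(x_n,x_m) - \psi(x_0,x_m) - \psi(x_n,x_0) + \psi(x_0,x_0) \right) = -\sum_{n,m=1}^N c_n c_m \kappa(x_n,x_m).$$

Thus $\sum_{n,m=1}^N c_n c_m \kappa(x_n,x_m) \geq 0$ and $\kappa$ is a positive definite kernel.

Theorem (Schoenberg). Let $X$ be a non-empty set and $\psi : X \times X \to \mathbb{R}$. The function $\psi$ is a negative kernel if and only if $\exp(-t\psi)$ is a positive definite kernel for all $t \geq 0$.

Proof. For the "if" part, recall that

$$\psi(x,y) = \lim_{t \to 0} \frac{1 - \exp(-t\psi(x,y))}{t}.$$

As $\exp(-t\psi)$ is positive definite, $1 - \exp(-t\psi)$ is negative definite and the result follows from the Proposition above on positive combinations and limits of negative kernels. It suffices to prove the converse for $t = 1$, since if $\psi$ is a negative definite kernel, so is $t\psi$ for any $t \geq 0$. Take $x_0 \in X$ and define the positive definite kernel $\kappa(x,y) = -\psi(x,y) + \psi(x,x_0) + \psi(x_0,y) - \psi(x_0,x_0)$ (Lemma 1.7.). Then

$$e^{-\psi(x,y)} = e^{-\psi(x,x_0)}\, e^{\kappa(x,y)}\, e^{-\psi(x_0,y)}\, e^{\psi(x_0,x_0)}.$$

Let $f(x) = e^{-\psi(x,x_0)}$. Then, as $\psi$ is a negative kernel and therefore symmetric, one can readily prove that the last relation can be rewritten as

$$e^{-\psi(x,y)} = e^{\psi(x_0,x_0)}\, f(x)\, e^{\kappa(x,y)}\, f(y).$$

Since $e^{\psi(x_0,x_0)}$ is a positive number, employing the properties of positive kernels given in section 1.5, we conclude that $e^{-\psi(x,y)}$ is a positive definite kernel.

Corollary. The Gaussian radial basis function is a reproducing kernel. Indeed, by the first Lemma of this section (with $T$ the identity map) $\|x - y\|^2$ is a negative definite kernel, so $\exp(-t\|x - y\|^2)$ is positive definite for every $t > 0$.

Although not all properties of positive kernels carry over to negative kernels (for example, the product of negative kernels is not, in general, a negative kernel), there are some other operations that preserve negativity.

Proposition. Let $\psi : X \times X \to \mathbb{R}$ be negative definite. In this case:

1. If $\psi(x,x) \geq 0$ for all $x \in X$, then $\psi^p(x,y)$ is negative definite for any $0 < p \leq 1$.

2. If $\psi(x,x) \geq 0$ for all $x \in X$, then $\log(1 + \psi(x,y))$ is negative definite.

3. If $\psi : X \times X \to (0,+\infty)$, then $\log\psi(x,y)$ is negative definite.

Proof. We give a brief description of the proofs.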

1. We use the formula

$$\psi(x,y)^p = \frac{p}{\Gamma(1-p)} \int_0^\infty t^{-p-1}\left(1 - e^{-t\psi(x,y)}\right) dt,$$

where the Gamma function is given by $\Gamma(z) = \int_0^\infty e^{-t} t^{z-1}\,dt$. As $e^{-t\psi(x,y)}$ is positive definite (Theorem 1.7.1) and the factors $\frac{p}{\Gamma(1-p)}$ and $t^{-p-1}$ are positive numbers, it is not difficult to prove that the expression inside the integral is negative definite for all $t > 0$.

2. Similarly, we use the formula

$$\log(1 + \psi(x,y)) = \int_0^\infty \frac{e^{-t}}{t}\left(1 - e^{-t\psi(x,y)}\right) dt.$$

3. For any $c > 0$, $\log(\psi(x,y) + 1/c) = \log(1 + c\,\psi(x,y)) - \log(c)$. By the second part, the right-hand side is negative definite. Then, by taking the limit $c \to \infty$, one completes the proof.

As a direct consequence, one can prove that since $\|x - y\|^2$ is a negative kernel, so is $\|x - y\|^{2p}$, for any $0 < p \leq 1$. Thus, for any $0 < p \leq 2$, $\|x - y\|^p$ is a negative kernel and $\exp(-t\|x - y\|^p)$ is a positive kernel for any $t > 0$. Therefore, for $p = 2$ we obtain another proof of the positivity of the Gaussian radial basis function. In addition, for $p = 1$ one concludes that the Laplacian radial basis function is also a positive kernel. Moreover, for the Gaussian kernel the following important property has been proved.

Theorem 1.7. (Full rank of the Gaussian RBF Gram matrices). Suppose that $x_1,\dots,x_N \in X$ are distinct points and $\sigma \neq 0$. The Gram matrix given by

$$K_{n,m} = \exp\left(-\frac{\|x_n - x_m\|^2}{2\sigma^2}\right)$$

has full rank.

As a consequence, for any choice of distinct points $x_1,\dots,x_N$, we have that $\sum_{m=1}^N a_m \kappa(x_n,x_m) = 0$ for all $n = 1,\dots,N$, if and only if $a_1 = \dots = a_N = 0$. However, observe that for any $a_1,\dots,a_N$

$$\sum_{m=1}^N a_m \kappa(x_n,x_m) = \sum_{m=1}^N a_m \langle \kappa(\cdot,x_m),\kappa(\cdot,x_n)\rangle_H = \left\langle \sum_{m=1}^N a_m \kappa(\cdot,x_m),\ \kappa(\cdot,x_n)\right\rangle_H = \langle f,\kappa(\cdot,x_n)\rangle_H,$$

where $f = \sum_{m=1}^N a_m \kappa(\cdot,x_m) \in \mathrm{span}\{\kappa(\cdot,x_m),\ m = 1,\dots,N\}$. In addition, an $f \in \mathrm{span}\{\kappa(\cdot,x_m),\ m = 1,\dots,N\}$ satisfies $\langle f,\kappa(\cdot,x_n)\rangle_H = 0$ for all $n = 1,\dots,N$ if and only if $f = 0$. Hence, if $f$ is orthogonal to all $\Phi(x_n)$, then $f = 0$. We conclude that $f = \sum_{m=1}^N a_m \kappa(\cdot,x_m) = 0$ if and only if $a_1 = \dots = a_N = 0$. Therefore, the points $\Phi(x_m) = \kappa(\cdot,x_m)$, $m = 1,\dots,N$, are linearly independent, provided that no two $x_m$ are the same. Hence, a Gaussian kernel defined on a domain of infinite cardinality produces a feature space of infinite dimension.
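The statements of this section lend themselves to a quick numerical sanity check on a finite sample. The sketch below verifies the identity used for $\|T(x) - T(y)\|^2$ (with $T$ taken as the identity map) and inspects the spectrum and rank of Gaussian and Laplacian Gram matrices. The sample points, the values of `sigma` and `t`, and the Gaussian parametrisation $\exp(-\|x-y\|^2/(2\sigma^2))$ are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)                 # arbitrary seed, only for reproducibility
X = rng.normal(size=(12, 3))                   # 12 distinct sample points in R^3
N = len(X)

# Pairwise squared distances and distances.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
dists = np.sqrt(sq_dists)

# Identity from the lemma, with T the identity map: for zero-sum c,
# sum_{n,m} c_n c_m ||x_n - x_m||^2 = -2 * ||sum_n c_n x_n||^2.
c = rng.normal(size=N)
c -= c.mean()                                  # enforce sum(c) = 0
lhs = float(c @ sq_dists @ c)
rhs = float(-2.0 * np.sum((c @ X) ** 2))

sigma, t = 1.0, 1.0                            # illustrative kernel parameters
K_gauss = np.exp(-sq_dists / (2.0 * sigma**2)) # Gaussian Gram matrix
K_laplace = np.exp(-t * dists)                 # Laplacian Gram matrix

min_eig_gauss = np.linalg.eigvalsh(K_gauss).min()
min_eig_laplace = np.linalg.eigvalsh(K_laplace).min()
rank_gauss = np.linalg.matrix_rank(K_gauss)
```

For distinct points both smallest eigenvalues come out strictly positive and `rank_gauss` equals the number of points, in agreement with the full rank theorem, while `lhs` and `rhs` coincide up to round-off and are non-positive.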
Moreover, the Gram matrices defined by Gaussian kernels are always strictly positive definite and invertible. In addition, for every $x,y \in X$ we have that $\kappa(x,x) = 1$ and $\kappa(x,y) \geq 0$. This means that all $x \in X$ are mapped through the feature map $\Phi$ to points lying on the surface of the unit sphere of the RKHS $H$ and that the angle between any two mapped points $\Phi(x)$ and $\Phi(y)$ is between $0^\circ$ and $90^\circ$.

We conclude this section with the following two important formulas, which hold for the case of the RKHS induced by the Gaussian kernel. For the norm of $f \in H$, one can prove that

$$\|f\|_H^2 = \int_X \sum_{n=0}^{\infty} \frac{\sigma^{2n}}{n!\,2^n} \left(O^n f(x)\right)^2 dx, \qquad (1.13)$$

with $O^{2n} = \Delta^n$ and $O^{2n+1} = \nabla\Delta^n$, $\Delta$ being the Laplacian and $\nabla$ the gradient operator. The implication of this is that a regularization term of the form $\|f\|_H^2$ (which is usually adopted in practice) penalizes the derivatives of the minimizer. This results in a very smooth solution of the regularized risk minimization problem. Finally, the Fourier transform of the Gaussian kernel $\kappa_\sigma$ is given by

$$F[\kappa](\omega) = \sigma \exp\left(-\frac{\sigma^2\|\omega\|^2}{2}\right). \qquad (1.14)$$

1.8 The Complex case

It has already been mentioned in section 1. that the general theory of RKHS was developed by mathematicians for general complex Hilbert spaces. In this context, a positive definite matrix is defined as a Hermitian matrix $K = (K_{i,j})^N$ satisfying

$$c^H \cdot K \cdot c = \sum_{i=1,j=1}^{N,N} c_i^* c_j K_{i,j} \geq 0,$$

for all $c_i \in \mathbb{C}$, $i = 1,\dots,N$, where the notation $^*$ denotes the conjugate element and $^H$ the conjugate transpose matrix. The general theory considers linear classes of complex functions over the field of complex numbers, i.e., the scalar product $c \cdot f$ is defined with complex numbers ($c \in \mathbb{C}$), where the multiplication is the standard complex one. The definition of a complex RKHS is identical to the one given in the real case. A complex Hilbert space $H$ will be called a RKHS if the following two important properties hold:

1. For every $x \in X$, $\kappa(\cdot,x)$ belongs to $H$.

2. $\kappa$ has the so-called reproducing property, i.e., $f(x) = \langle f,\kappa(\cdot,x)\rangle_H$, for all $f \in H$; in particular $\kappa(x,y) = \langle \kappa(\cdot,y),\kappa(\cdot,x)\rangle_H$.

The main difference with the real case lies in the definition of the complex inner product, where the linearity and the symmetry properties do not hold. Recall that in the case of complex Hilbert spaces the inner product is sesqui-linear (i.e., linear in one argument and anti-linear in the other) and Hermitian:

$$\langle af + bg, h\rangle_H = a\langle f,h\rangle_H + b\langle g,h\rangle_H,$$
$$\langle f, ag + bh\rangle_H = a^*\langle f,g\rangle_H + b^*\langle f,h\rangle_H,$$
$$\langle f,g\rangle_H^* = \langle g,f\rangle_H,$$

for all $f,g,h \in H$ and $a,b \in \mathbb{C}$. In the real case, we established the symmetry condition $\kappa(x,y) = \langle \kappa(\cdot,y),\kappa(\cdot,x)\rangle_H = \langle \kappa(\cdot,x),\kappa(\cdot,y)\rangle_H = \kappa(y,x)$. However, since in the complex case the inner product is Hermitian, the aforementioned condition becomes $\kappa(x,y) = \left(\langle \kappa(\cdot,x),\kappa(\cdot,y)\rangle_H\right)^* = \kappa^*(y,x)$.
As a consequence, almost all theorems that have been established in sections 1., ??, 1.5 and 1.7 for the real case are actually special cases of more general ones that involve complex functions and numbers (excluding the ones that explicitly need real numbers, e.g. Schoenberg's theorem, Proposition 1.7., some of the properties mentioned in section ??, etc.). There are, however, certain differences (due to the complex inner product) that must be stressed. For example, the expansion of the kernel in terms of an orthonormal basis of $H$, which is given in Theorem 1.5., becomes $\kappa(x,y) = \sum_{s \in S} e_s^*(y)\,e_s(x)$, the kernels in the Proposition regarding the conformal transformations become $\kappa_1(x,y) = f(x)f^*(y)$ and $\kappa_2(x,y) = f(x)\kappa(x,y)f^*(y)$, etc.

Complex reproducing kernels that have been extensively studied by mathematicians are, among others, the Szegő kernels, i.e., $\kappa(z,w) = \frac{1}{1 - w^* z}$, for Hardy spaces on the unit disk, and the Bergman kernels, i.e., $\kappa(z,w) = \frac{1}{(1 - w^* z)^2}$, for Bergman spaces on the unit disk, where $|z|,|w| < 1$ [31]. Another complex kernel of great importance is the complex Gaussian kernel:

$$\kappa_{\sigma,\mathbb{C}^d}(z,w) := \exp\left(-\frac{\sum_{i=1}^d (z_i - w_i^*)^2}{\sigma^2}\right), \qquad (1.15)$$

defined on $\mathbb{C}^d \times \mathbb{C}^d$, where $z,w \in \mathbb{C}^d$, $z_i$ denotes the $i$-th component of the complex vector $z \in \mathbb{C}^d$ and $\exp$ is the extended exponential function in the complex domain. It can be shown that $\kappa_{\sigma,\mathbb{C}^d}$ is a complex valued kernel with parameter $\sigma$. Its restriction $\kappa_\sigma := \left(\kappa_{\sigma,\mathbb{C}^d}\right)|_{\mathbb{R}^d \times \mathbb{R}^d}$ is the well-known real Gaussian kernel. An explicit description of the RKHSs of these kernels, together with some important properties, can be found in [4].

1.9 Differentiation in Hilbert spaces

Fréchet's Differentiation

In the following sections we will develop cost functions defined on RKHS that are suitable for minimization tasks related to adaptive filtering problems. As most minimization procedures involve the computation of gradients or subgradients, we devote this section to the study of differentiation on Hilbert spaces. The notion of Fréchet differentiability, which generalizes differentiability to general Hilbert spaces, lies at the core of this analysis.

Definition (Fréchet's Differential). Let $H$ be a Hilbert space over a field $F$ (typically $\mathbb{R}$ or $\mathbb{C}$), $T : H \to F$ an operator and $f \in H$. The operator $T$ is said to be Fréchet differentiable at $f$ if there exists a $\theta \in H$ such that

$$\lim_{\|h\|_H \to 0} \frac{T(f + h) - T(f) - \langle h,\theta\rangle_H}{\|h\|_H} = 0, \qquad (1.16)$$

where $\langle\cdot,\cdot\rangle_H$ is the dot product of the Hilbert space $H$ and $\|\cdot\|_H = \sqrt{\langle\cdot,\cdot\rangle_H}$ is the induced norm.

The element $\theta \in H$ is called the gradient of the operator at $f$ and is usually denoted as $\nabla T(f)$. This relates to the standard gradient operator known from Calculus in Euclidean spaces. The Fréchet differential is also known as the Strong Differential. There is also a weaker definition of differentiability, named Gâteaux's Differential (or Weak Differential), which is a generalization of the directional derivative. The Gâteaux differential $dT(f,\psi) \in F$ of $T$ at $f \in H$ in the direction $\psi \in H$ is defined as

$$dT(f,\psi) = \lim_{\epsilon \to 0} \frac{T(f + \epsilon\psi) - T(f)}{\epsilon}. \qquad (1.17)$$

In the following, whenever we refer to a derivative or a gradient we will mean the one produced by Fréchet's notion of differentiability.
The interested reader is referred to [14, 3, 3, 18, 34, 44] (amongst others) for a more detailed discussion of the subject. The well-known properties of the derivative of a real valued function of one variable, which are known from elementary Calculus, apply to Fréchet derivatives as well. Below we summarize some of these properties. For the first three we consider the operators $T_1,T_2 : H \to F$ differentiable at $f \in H$ and $\lambda \in F$:

1. Sum. $\nabla(T_1 + T_2)(f) = \nabla T_1(f) + \nabla T_2(f)$.

2. Scalar Product. $\nabla(\lambda T_1)(f) = \lambda\nabla T_1(f)$.

3. Product Rule. $\nabla(T_1 \cdot T_2)(f) = T_2(f)\cdot\nabla T_1(f) + T_1(f)\cdot\nabla T_2(f)$.

4. Chain Rule. Consider $T_1 : H \to F$ differentiable at $f \in H$ and $T_2 : F \to F$ differentiable at $y = T_1(f) \in F$; then $\nabla(T_2 \circ T_1)(f) = T_2'(T_1(f))\cdot\nabla T_1(f)$.

The following simple examples demonstrate the differentiation procedure in arbitrary spaces.

Example. Consider the real Hilbert space $H$, with inner product $\langle\cdot,\cdot\rangle_H$, and $T : H \to \mathbb{R}$, $T(f) = \langle f,\psi\rangle_H$, where $\psi \in H$ is fixed. We can easily show (using Fréchet's definition) that $T$ is differentiable at any $f \in H$ and that $\nabla T(f) = \psi$.

Example. Consider the real Hilbert space $H$, with inner product $\langle\cdot,\cdot\rangle_H$, and $T : H \to \mathbb{R}$, $T(f) = \langle f,f\rangle_H$. We can easily show (using Fréchet's definition) that $T$ is differentiable at any $f \in H$ and that $\nabla T(f) = 2f$.
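The two examples can be checked numerically in the finite dimensional Hilbert space $\mathbb{R}^5$ with the usual dot product, by evaluating the quotient of (1.16) for shrinking increments $h$. This is a minimal sketch: the dimension, the random vectors and the helper names `frechet_ratio` and `gateaux` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)        # arbitrary seed, only for reproducibility
f = rng.normal(size=5)                # a point of H = R^5
psi = rng.normal(size=5)              # the fixed element defining T1

T1 = lambda g: g @ psi                # T1(g) = <g, psi>, claimed gradient: psi
T2 = lambda g: g @ g                  # T2(g) = <g, g>,   claimed gradient: 2 g

def frechet_ratio(T, grad, f, h):
    """|T(f + h) - T(f) - <h, grad>| / ||h||, i.e. the quotient appearing in (1.16)."""
    return abs(T(f + h) - T(f) - h @ grad) / np.linalg.norm(h)

direction = rng.normal(size=5)
steps = [1e-1, 1e-2, 1e-3, 1e-4]
ratios_T1 = [frechet_ratio(T1, psi, f, s * direction) for s in steps]
ratios_T2 = [frechet_ratio(T2, 2.0 * f, f, s * direction) for s in steps]

def gateaux(T, f, direction, eps=1e-6):
    """Finite-difference approximation of the Gateaux differential (1.17)."""
    return (T(f + eps * direction) - T(f)) / eps

dT2 = gateaux(T2, f, direction)       # should approximate <direction, 2 f>
```

For the linear operator `T1` the quotient vanishes up to round-off, while for `T2` it equals $\|h\|$ and therefore shrinks linearly with the step, as the limit in (1.16) requires. The finite-difference value `dT2` agrees with $\langle \text{direction}, \nabla T_2(f)\rangle$, illustrating that the Gâteaux differential is recovered from the Fréchet gradient.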


More information

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology Advaced Aalysis Mi Ya Departmet of Mathematics Hog Kog Uiversity of Sciece ad Techology September 3, 009 Cotets Limit ad Cotiuity 7 Limit of Sequece 8 Defiitio 8 Property 3 3 Ifiity ad Ifiitesimal 8 4

More information

Supplemental Material: Proofs

Supplemental Material: Proofs Proof to Theorem Supplemetal Material: Proofs Proof. Let be the miimal umber of traiig items to esure a uique solutio θ. First cosider the case. It happes if ad oly if θ ad Rak(A) d, which is a special

More information

MAS111 Convergence and Continuity

MAS111 Convergence and Continuity MAS Covergece ad Cotiuity Key Objectives At the ed of the course, studets should kow the followig topics ad be able to apply the basic priciples ad theorems therei to solvig various problems cocerig covergece

More information

5 Birkhoff s Ergodic Theorem

5 Birkhoff s Ergodic Theorem 5 Birkhoff s Ergodic Theorem Amog the most useful of the various geeralizatios of KolmogorovâĂŹs strog law of large umbers are the ergodic theorems of Birkhoff ad Kigma, which exted the validity of the

More information

Lecture 3 The Lebesgue Integral

Lecture 3 The Lebesgue Integral Lecture 3: The Lebesgue Itegral 1 of 14 Course: Theory of Probability I Term: Fall 2013 Istructor: Gorda Zitkovic Lecture 3 The Lebesgue Itegral The costructio of the itegral Uless expressly specified

More information

The Method of Least Squares. To understand least squares fitting of data.

The Method of Least Squares. To understand least squares fitting of data. The Method of Least Squares KEY WORDS Curve fittig, least square GOAL To uderstad least squares fittig of data To uderstad the least squares solutio of icosistet systems of liear equatios 1 Motivatio Curve

More information

Singular value decomposition. Mathématiques appliquées (MATH0504-1) B. Dewals, Ch. Geuzaine

Singular value decomposition. Mathématiques appliquées (MATH0504-1) B. Dewals, Ch. Geuzaine Lecture 11 Sigular value decompositio Mathématiques appliquées (MATH0504-1) B. Dewals, Ch. Geuzaie V1.2 07/12/2018 1 Sigular value decompositio (SVD) at a glace Motivatio: the image of the uit sphere S

More information

Inverse Matrix. A meaning that matrix B is an inverse of matrix A.

Inverse Matrix. A meaning that matrix B is an inverse of matrix A. Iverse Matrix Two square matrices A ad B of dimesios are called iverses to oe aother if the followig holds, AB BA I (11) The otio is dual but we ofte write 1 B A meaig that matrix B is a iverse of matrix

More information

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i

More information

Chapter 7: The z-transform. Chih-Wei Liu

Chapter 7: The z-transform. Chih-Wei Liu Chapter 7: The -Trasform Chih-Wei Liu Outlie Itroductio The -Trasform Properties of the Regio of Covergece Properties of the -Trasform Iversio of the -Trasform The Trasfer Fuctio Causality ad Stability

More information

6 Integers Modulo n. integer k can be written as k = qn + r, with q,r, 0 r b. So any integer.

6 Integers Modulo n. integer k can be written as k = qn + r, with q,r, 0 r b. So any integer. 6 Itegers Modulo I Example 2.3(e), we have defied the cogruece of two itegers a,b with respect to a modulus. Let us recall that a b (mod ) meas a b. We have proved that cogruece is a equivalece relatio

More information

Introduction to Functional Analysis

Introduction to Functional Analysis MIT OpeCourseWare http://ocw.mit.edu 18.10 Itroductio to Fuctioal Aalysis Sprig 009 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. LECTURE OTES FOR 18.10,

More information

REGULARIZATION OF CERTAIN DIVERGENT SERIES OF POLYNOMIALS

REGULARIZATION OF CERTAIN DIVERGENT SERIES OF POLYNOMIALS REGULARIZATION OF CERTAIN DIVERGENT SERIES OF POLYNOMIALS LIVIU I. NICOLAESCU ABSTRACT. We ivestigate the geeralized covergece ad sums of series of the form P at P (x, where P R[x], a R,, ad T : R[x] R[x]

More information

Chapter IV Integration Theory

Chapter IV Integration Theory Chapter IV Itegratio Theory Lectures 32-33 1. Costructio of the itegral I this sectio we costruct the abstract itegral. As a matter of termiology, we defie a measure space as beig a triple (, A, µ), where

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

A survey on penalized empirical risk minimization Sara A. van de Geer

A survey on penalized empirical risk minimization Sara A. van de Geer A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the

More information

Math 451: Euclidean and Non-Euclidean Geometry MWF 3pm, Gasson 204 Homework 3 Solutions

Math 451: Euclidean and Non-Euclidean Geometry MWF 3pm, Gasson 204 Homework 3 Solutions Math 451: Euclidea ad No-Euclidea Geometry MWF 3pm, Gasso 204 Homework 3 Solutios Exercises from 1.4 ad 1.5 of the otes: 4.3, 4.10, 4.12, 4.14, 4.15, 5.3, 5.4, 5.5 Exercise 4.3. Explai why Hp, q) = {x

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

A Proof of Birkhoff s Ergodic Theorem

A Proof of Birkhoff s Ergodic Theorem A Proof of Birkhoff s Ergodic Theorem Joseph Hora September 2, 205 Itroductio I Fall 203, I was learig the basics of ergodic theory, ad I came across this theorem. Oe of my supervisors, Athoy Quas, showed

More information

Mathematical Methods for Physics and Engineering

Mathematical Methods for Physics and Engineering Mathematical Methods for Physics ad Egieerig Lecture otes Sergei V. Shabaov Departmet of Mathematics, Uiversity of Florida, Gaiesville, FL 326 USA CHAPTER The theory of covergece. Numerical sequeces..

More information

CHAPTER 10 INFINITE SEQUENCES AND SERIES

CHAPTER 10 INFINITE SEQUENCES AND SERIES CHAPTER 10 INFINITE SEQUENCES AND SERIES 10.1 Sequeces 10.2 Ifiite Series 10.3 The Itegral Tests 10.4 Compariso Tests 10.5 The Ratio ad Root Tests 10.6 Alteratig Series: Absolute ad Coditioal Covergece

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

Enumerative & Asymptotic Combinatorics

Enumerative & Asymptotic Combinatorics C50 Eumerative & Asymptotic Combiatorics Stirlig ad Lagrage Sprig 2003 This sectio of the otes cotais proofs of Stirlig s formula ad the Lagrage Iversio Formula. Stirlig s formula Theorem 1 (Stirlig s

More information

LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK)

LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK) LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK) Everythig marked by is ot required by the course syllabus I this lecture, all vector spaces is over the real umber R. All vectors i R is viewed as a colum

More information

CARLEMAN INTEGRAL OPERATORS AS MULTIPLICATION OPERATORS AND PERTURBATION THEORY

CARLEMAN INTEGRAL OPERATORS AS MULTIPLICATION OPERATORS AND PERTURBATION THEORY Kragujevac Joural of Mathematics Volume 41(1) (2017), Pages 71 80. CARLEMAN INTEGRAL OPERATORS AS MULTIPLICATION OPERATORS AND PERTURBATION THEORY S. M. BAHRI 1 Abstract. I this paper we itroduce a multiplicatio

More information

REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS

REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS 18th Feb, 016 Defiitio (Lipschitz fuctio). A fuctio f : R R is said to be Lipschitz if there exists a positive real umber c such that for ay x, y i the domai

More information

CHAPTER I: Vector Spaces

CHAPTER I: Vector Spaces CHAPTER I: Vector Spaces Sectio 1: Itroductio ad Examples This first chapter is largely a review of topics you probably saw i your liear algebra course. So why cover it? (1) Not everyoe remembers everythig

More information

TEACHER CERTIFICATION STUDY GUIDE

TEACHER CERTIFICATION STUDY GUIDE COMPETENCY 1. ALGEBRA SKILL 1.1 1.1a. ALGEBRAIC STRUCTURES Kow why the real ad complex umbers are each a field, ad that particular rigs are ot fields (e.g., itegers, polyomial rigs, matrix rigs) Algebra

More information

PRELIM PROBLEM SOLUTIONS

PRELIM PROBLEM SOLUTIONS PRELIM PROBLEM SOLUTIONS THE GRAD STUDENTS + KEN Cotets. Complex Aalysis Practice Problems 2. 2. Real Aalysis Practice Problems 2. 4 3. Algebra Practice Problems 2. 8. Complex Aalysis Practice Problems

More information

McGill University Math 354: Honors Analysis 3 Fall 2012 Solutions to selected problems

McGill University Math 354: Honors Analysis 3 Fall 2012 Solutions to selected problems McGill Uiversity Math 354: Hoors Aalysis 3 Fall 212 Assigmet 3 Solutios to selected problems Problem 1. Lipschitz fuctios. Let Lip K be the set of all fuctios cotiuous fuctios o [, 1] satisfyig a Lipschitz

More information

Physics 324, Fall Dirac Notation. These notes were produced by David Kaplan for Phys. 324 in Autumn 2001.

Physics 324, Fall Dirac Notation. These notes were produced by David Kaplan for Phys. 324 in Autumn 2001. Physics 324, Fall 2002 Dirac Notatio These otes were produced by David Kapla for Phys. 324 i Autum 2001. 1 Vectors 1.1 Ier product Recall from liear algebra: we ca represet a vector V as a colum vector;

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

TR/46 OCTOBER THE ZEROS OF PARTIAL SUMS OF A MACLAURIN EXPANSION A. TALBOT

TR/46 OCTOBER THE ZEROS OF PARTIAL SUMS OF A MACLAURIN EXPANSION A. TALBOT TR/46 OCTOBER 974 THE ZEROS OF PARTIAL SUMS OF A MACLAURIN EXPANSION by A. TALBOT .. Itroductio. A problem i approximatio theory o which I have recetly worked [] required for its solutio a proof that the

More information

ECE-S352 Introduction to Digital Signal Processing Lecture 3A Direct Solution of Difference Equations

ECE-S352 Introduction to Digital Signal Processing Lecture 3A Direct Solution of Difference Equations ECE-S352 Itroductio to Digital Sigal Processig Lecture 3A Direct Solutio of Differece Equatios Discrete Time Systems Described by Differece Equatios Uit impulse (sample) respose h() of a DT system allows

More information

Chapter 2. Periodic points of toral. automorphisms. 2.1 General introduction

Chapter 2. Periodic points of toral. automorphisms. 2.1 General introduction Chapter 2 Periodic poits of toral automorphisms 2.1 Geeral itroductio The automorphisms of the two-dimesioal torus are rich mathematical objects possessig iterestig geometric, algebraic, topological ad

More information

The second is the wish that if f is a reasonably nice function in E and φ n

The second is the wish that if f is a reasonably nice function in E and φ n 8 Sectio : Approximatios i Reproducig Kerel Hilbert Spaces I this sectio, we address two cocepts. Oe is the wish that if {E, } is a ierproduct space of real valued fuctios o the iterval [,], the there

More information

Rotationally invariant integrals of arbitrary dimensions

Rotationally invariant integrals of arbitrary dimensions September 1, 14 Rotatioally ivariat itegrals of arbitrary dimesios James D. Wells Physics Departmet, Uiversity of Michiga, A Arbor Abstract: I this ote itegrals over spherical volumes with rotatioally

More information

8. Applications To Linear Differential Equations

8. Applications To Linear Differential Equations 8. Applicatios To Liear Differetial Equatios 8.. Itroductio 8.. Review Of Results Cocerig Liear Differetial Equatios Of First Ad Secod Orders 8.3. Eercises 8.4. Liear Differetial Equatios Of Order N 8.5.

More information

Complex Analysis Spring 2001 Homework I Solution

Complex Analysis Spring 2001 Homework I Solution Complex Aalysis Sprig 2001 Homework I Solutio 1. Coway, Chapter 1, sectio 3, problem 3. Describe the set of poits satisfyig the equatio z a z + a = 2c, where c > 0 ad a R. To begi, we see from the triagle

More information

Ma 530 Introduction to Power Series

Ma 530 Introduction to Power Series Ma 530 Itroductio to Power Series Please ote that there is material o power series at Visual Calculus. Some of this material was used as part of the presetatio of the topics that follow. What is a Power

More information

If a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero?

If a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero? 2 Lebesgue Measure I Chapter 1 we defied the cocept of a set of measure zero, ad we have observed that every coutable set is of measure zero. Here are some atural questios: If a subset E of R cotais a

More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods Support Vector Machies ad Kerel Methods Daiel Khashabi Fall 202 Last Update: September 26, 206 Itroductio I Support Vector Machies the goal is to fid a separator betwee data which has the largest margi,

More information

Integrable Functions. { f n } is called a determining sequence for f. If f is integrable with respect to, then f d does exist as a finite real number

Integrable Functions. { f n } is called a determining sequence for f. If f is integrable with respect to, then f d does exist as a finite real number MATH 532 Itegrable Fuctios Dr. Neal, WKU We ow shall defie what it meas for a measurable fuctio to be itegrable, show that all itegral properties of simple fuctios still hold, ad the give some coditios

More information

Review Problems 1. ICME and MS&E Refresher Course September 19, 2011 B = C = AB = A = A 2 = A 3... C 2 = C 3 = =

Review Problems 1. ICME and MS&E Refresher Course September 19, 2011 B = C = AB = A = A 2 = A 3... C 2 = C 3 = = Review Problems ICME ad MS&E Refresher Course September 9, 0 Warm-up problems. For the followig matrices A = 0 B = C = AB = 0 fid all powers A,A 3,(which is A times A),... ad B,B 3,... ad C,C 3,... Solutio:

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

Questions and answers, kernel part

Questions and answers, kernel part Questios ad aswers, kerel part October 8, 205 Questios. Questio : properties of kerels, PCA, represeter theorem. [2 poits] Let F be a RK defied o some domai X, with feature map φ(x) x X ad reproducig kerel

More information

Chapter 10: Power Series

Chapter 10: Power Series Chapter : Power Series 57 Chapter Overview: Power Series The reaso series are part of a Calculus course is that there are fuctios which caot be itegrated. All power series, though, ca be itegrated because

More information

1 Duality revisited. AM 221: Advanced Optimization Spring 2016

1 Duality revisited. AM 221: Advanced Optimization Spring 2016 AM 22: Advaced Optimizatio Sprig 206 Prof. Yaro Siger Sectio 7 Wedesday, Mar. 9th Duality revisited I this sectio, we will give a slightly differet perspective o duality. optimizatio program: f(x) x R

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

Stochastic Matrices in a Finite Field

Stochastic Matrices in a Finite Field Stochastic Matrices i a Fiite Field Abstract: I this project we will explore the properties of stochastic matrices i both the real ad the fiite fields. We first explore what properties 2 2 stochastic matrices

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

MA131 - Analysis 1. Workbook 3 Sequences II

MA131 - Analysis 1. Workbook 3 Sequences II MA3 - Aalysis Workbook 3 Sequeces II Autum 2004 Cotets 2.8 Coverget Sequeces........................ 2.9 Algebra of Limits......................... 2 2.0 Further Useful Results........................

More information

1. Hydrogen Atom: 3p State

1. Hydrogen Atom: 3p State 7633A QUANTUM MECHANICS I - solutio set - autum. Hydroge Atom: 3p State Let us assume that a hydroge atom is i a 3p state. Show that the radial part of its wave fuctio is r u 3(r) = 4 8 6 e r 3 r(6 r).

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information