ISSN , Volume 32, Number 2

Size: px
Start display at page:

Download "ISSN , Volume 32, Number 2"

Transcription

1 ISSN , Volume 3, Number This article was published in the above mentioned Springer issue The material, including all portions thereof, is protected by copyright; all rights are held exclusively by Springer Science + Business Media The material is for personal use only; commercial use is not permitted Unauthorized reproduction, transfer and/or use may be a violation of criminal as well as civil law

2 Constr Approx (00) 3: DOI 0007/s Some Properties of Gaussian Reproducing Kernel Hilbert Spaces and Their Implications for Function Approximation and Learning Theory Ha Quang Minh Received: 7 August 008 / Revised: 6 May 009 / Accepted: 9 July 009 / Published online: 9 December 009 Springer Science+Business Media, LLC 009 Abstract We give several properties of the reproducing kernel Hilbert space induced by the Gaussian kernel, along with their implications for recent results in the complexity of the regularized least square algorithm in learning theory Keywords Reproducing kernel Hilbert spaces Gaussian kernel Eigenvalues Learning theory Regularized least square algorithm Mathematics Subject Classification (000) 68T05 68P30 Introduction The theory of reproducing kernel Hilbert spaces (RKHS) has recently emerged as a powerful framework for the problem of learning from data, both from algorithmic and theoretical perspectives (comprehensive treatments are found in, for example, [, 0, ]) Of the many kernels being utilized, the Gaussian kernel is the most widely used and in many cases gives the best performance It is thus of crucial importance, for both theoretical and practical purposes, to have a deep understanding of this kernel and the Hilbert space it induces This paper describes some properties that resulted from our study of learning theory problems While we provide several implications of these properties for learning theory and function approximation, the results are of mathematical interest in their own right For many other interesting properties of the Gaussian RKHS that have appeared elsewhere in the literature, we refer to [5, 7] and the many references therein Communicated by Wolfgang Dahmen HQ Minh ( ) Humboldt Universität zu Berlin, Invalidenstrasse 43, 05 Berlin, Germany minhhaquang@staffhu-berlinde

3 308 Constr Approx (00) 3: Reproducing Kernel Hilbert Spaces Before we state our main results, let us briefly recall RKHS The general theory of reproducing kernel Hilbert spaces was developed by Aronszajn [] Let X be an arbitrary nonempty set Let K : X X R be a symmetric function satisfying: for any finite set of points {x i } N i= in X and real numbers {a i} N i=, N a i a j K(x i,x j ) 0 i,j= K is said to be a positive definite kernel on X There exists a unique Hilbert space H K of functions on X satisfying: K x H K for all x X, where K x (t) = K(x,t); span{k x } x X is dense in H K ; the inner product, K of H K satisfies: f(x)= f,k x K (reproducing property), for all f H K H K is called the Reproducing Kernel Hilbert Space with reproducing kernel K The Gaussian kernel is K(x,t) = exp( x t ), where X R n and σ>0 σ Organization We will state the main results we wish to report in Sect and give their proofs in Sect 3 A discussion of some of their implications for learning theory and function approximation will be given in Sect 4 The proof of Theorem 0, which is necessary for Theorem 5, will be given in Appendix B Finally, Appendix C contains some technical results on the Gamma function that we will need at various points in the paper Main Results of the Paper Notation Let α = (α,,α n ) (N {0}) n, α = n j= α j, x α = x α xα n n, and Cα d = α d!!α n!, the multinomial coefficients Also, by writing Lp (X), dx, we assume that the Lebesgue measure is being used Theorem Let X R n be any set with nonempty interior Let K(x,t) = exp( x t ) Then dim(h σ K ) = and { H K = f = e x } σ w α x α : f K = k! wα (/σ ) k < () α =0 C k α =k α

4 Constr Approx (00) 3: The inner product, K on H K is given by for f = e x σ for H K is f,g K = α =0 w α x α,g = e x σ k! (/σ ) k α =k w α v α C k α α =0 v α x α H K An orthonormal basis { (/σ φ α (x) = ) k Cα k } e x σ k! x α () α =k, Remark Though an orthonormal basis for the RKHS induced by the Gaussian kernels K(x,t) = exp( x t ) has been known in the literature (for example [5] and σ references therein), our approach below using the Weyl inner product leads to a much shorter proof Following are some of the properties of the Gaussian RKHS H K that will be derived from Theorem Theorem Let X R n be any set with nonempty interior Let K(x,t) = exp( x t ) Then H σ K does not contain any polynomial on X, including the nonzero constant function This theorem may be somewhat surprising, given the fact that if X is compact, then the H K induced by the Gaussian kernel is dense in the space C(X) of continuous functions on X (see [4]) This generalizes a result from [5], which shows that H K does not contain the nonzero constant function, using a different method Theorem 3 Let X R n be any set with nonempty interior Let K(x,z) = exp( x z ) The Hilbert space H σ K induced by K on X contains the function exp( μ x ) if and only if 0 <μ< For such μ, the corresponding functions have σ norms given by ( ) exp μ x [ ] n = μ( μ) σ K To discuss Theorem 3, let us use the notation H K,σ Then exp( x ) H σ K, σ, but exp( x )/ H σ K,σ This is not necessarily surprising, since the two Hilbert spaces contain functions that decay at different rates Essentially, Theorem 3 states that the function space H K,σ contains functions with decay rates within a fixed band Remark Setting μ = 0 in Theorem 3 gives another proof that the constant function does not belong to the RKHS H K

5 30 Constr Approx (00) 3: Theorem 4 Let K(x,t) = exp( x t ) on R n R n Then H σ K L (R n ) for any σ>0 This result is in contrast to the following fact: H K (R n ( ) = {f C 0 R n ) R L ( R n) : f K = n e σ ξ (π) n (σ π) n 4 f(ξ) dξ < where f is the Fourier-Plancherel transform of f, that is, H K (R n ) is an infinite-order Sobolev space Thus functions in H K (R n ), which are smooth, are not necessarily integrable Remark 3 We will give two different proofs for Theorem 4 The first proof follows from Theorem and constructs an explicit function in H K that does not belong to L (R n ) The second proof was suggested to the author by one of the referees and invokes a general result from [3] While nonconstructive, it has the advantage of not having to use any explicit computation To state our next results, we need the following connection between the theory of reproducing kernels and integral operators, manifested via Mercer s theorem Let X be a complete, separable metric space, equipped with a finite, Borel measure μ, that is μ(x) <, with supp(μ) = X, ie, the measure of each nonempty open subset is nonzero Let K : X X R be a continuous, symmetric, positive definite kernel satisfying κ = sup K(x,x) < x X Consider the integral operator L K : L μ (X) L μ (X) defined by (L K f )(x) = K(x,t)f(t)dμ(t) X This is a self-adjoint, compact operator with eigenvalues λ λ 0, with the corresponding L μ -normalized eigenfunctions {φ k} k= forming an orthonormal basis for L μ (X) Mercer s theorem (we refer to [4] for more detail) states that K(x,t) = λ k φ k (x)φ k (t), k= where the series converges absolutely for each (x, t) X X and uniformly on compact subsets of X X It follows from Mercer s theorem that H K = Im ( { L / ) } K = f = a k φ k : f K = ak < λ k k=,λ k >0 k=,λ k >0 and the set { λ k φ k } k=,λ k >0 forms an orthonormal basis for H K },

6 Constr Approx (00) 3: Remark 4 Note that for the compactness of L K and Mercer s theorem, we can assume that X is complete and not compact, as in the original version due to Mercer (the interval [0, ],see[7]), or in the treatment given in [4] It suffices to have (see [6]) for all x X and X X K x L μ (X) K(x,t) dμ(x)dμ(t) < By assuming that sup x X K(x,x) < and μ(x) <, as we do here, both of these conditions are satisfied For the Gaussian kernel, the second condition will fail if X = R n and μ is the Lebesgue measure For the polynomial kernels, the second condition could also fail if X = R n (hence K(x,x) is unbounded), evenifμ is a probability measure Let S n ={x R n : x =} be the n-dimensional unit sphere with surface area S n = π n Ɣ( n ) Theorem 5 Let n N, n be fixed Let X = S n Let μ be the uniform measure on S n Let f 0 : S n R be defined by { if x S n + f 0 (x) = (x n 0), if x S n (x n < 0) Let K :[, ] R be a continuous function giving rise to the Mercer kernel K(x,t) = K( x,t ) on S n S n () If n 3, then f 0 / Im(L r K ) for any r n () If K(x,t) = exp( x t ), then f σ 0 / Im(L r K ) for any r>0 Remark 5 The function f 0 above is the Bayes classifier corresponding to the binary classification problem where the two classes lie on the upper and lower hemispheres, respectively, with decision boundary x n = 0, with P ( y = x S+ n ) ( =, P y = x S n ) + = 0, P ( y = x S n ) ( = 0, P y = x S n ) = Remark 6 We recall that if K is continuous, then all functions in H K are continuous (see for example [4]) Since H K = Im(L / K ), we can immediately state that as a discontinuous function, f 0 / Im(L r K ) for any r / Theorem 5 thus extends this result to all r>0 for the Gaussian kernel and all r n (n 3) for a general continuous kernel Our next results will be on the eigenvalues and eigenfunctions of L K on S n The first two give the general formula and the rate of decay of the eigenvalues corresponding to a continuous, symmetric, positive definite kernel on S n, while the

7 3 Constr Approx (00) 3: third computes the eigenvalues corresponding to the Gaussian kernel itself explicitly Together they imply Theorem 5, but the results are of interest by themselves Recall that the space of spherical harmonics of order k on S n, denoted by Y k (n), has dimension (see for example [9]) dim Y k (n) = N(n,k)= (k + n )(k + n 3)! k!(n )! and an orthonormal basis denoted by {Y k,j (n; x)} N(n,k) j= Theorem 6 Let n N, n be fixed Let K :[, ] R be a continuous function giving rise to a continuous, positive definite kernel K(x,t) = K( x,t ) on S n S n Let μ be the Lebesgue measure on S n The eigenvalues λ k of L K : L μ (Sn ) L μ (Sn ) are given by λ k = S n K(t)P k (n; t) ( n 3 t ) dt, each with multiplicity N(n,k), for k Z, k 0 The corresponding eigenfunctions for each λ k are the spherical harmonics {Y k,j (n; x)} N(n,k) j= of order k Theorem 7 Let n N, n 3 be fixed Let K :[, ] R be a continuous function giving rise to a continuous, positive definite kernel K(x,t) = K( x,t ) on S n S n Let μ be the Lebesgue measure on S n The eigenvalues λ k of L K : L μ (Sn ) L μ (Sn ) satisfy λ k κ Sn κ Sn (n )! N(n,k) (k + ) n Theorem 8 Let n N, n, be fixed Let X = S n and μ be the uniform probability distribution on S n For K(x,t) = exp( x t ), σ>0, the eigenvalues of σ L K : L μ (X) L μ (X) are: ( ) ( ) λ k = e /σ n σ n I k+n/ σ Ɣ for all k N {0}, where I denotes the modified Bessel function of the first kind In all three cases, each λ k occurs with multiplicity N(n,k) The corresponding eigenfunctions are the spherical harmonics of order k on S n The λ k s satisfy and are decreasing if σ ( n )/ λ k λ k+ >(k+ n/)σ

8 Constr Approx (00) 3: Proofs of Main Results 3 The Weyl Inner Product and Orthonormal Basis of the Gaussian RKHS Let us prove Theorem Itwasshownin[4] that for X = R n, n N, and K(x,t) = x,t d, d N, wehaveh K = H d (R n ), the linear space of all homogeneous polynomialsofdegreed in R n, with the inner product, being the Weyl inner product on H d (R n ): f,g K = w α v α Cα d α =d for f = α =d w αt α, g = α =d v αt α H K Theorem 9 (Aronszajn) Let H be a separable Hilbert space of functions over X with orthonormal basis {φ k } H is a reproducing kernel Hilbert space if and only if φ k (x) < for all x X The unique kernel K is defined by K(x,y) = φ k (x)φ k (y) Proof of Theorem We will show that the inner product, K in H K is simply a generalization of the Weyl inner product for the homogeneous polynomial space H d (R n ), d N Consider the following expansion: ) x t K(x,t) = exp ( σ = e x σ e t (/σ ) k σ Cα k k! xα t α α =k Let { H 0 = f = e x σ α =0 w α x α k! (/σ ) k w α C k α =k α } < For f H 0, g = e x σ α =0 v α x α H 0, we define the inner product f,g K,0 = k! (/σ ) k α =k w α v α Cα k Let us show that H 0 is itself a Hilbert space under, K,0 For simplicity let n = Then { H 0 = f = e x } σ w k x k k! (/σ ) k w k <

9 34 Constr Approx (00) 3: It is clear that H 0 is an inner product space under, K,0 Its completeness under the induced norm K,0 is equivalent to the completeness of the weighted l sequence space l σ {(w = k ) : ( (wk ) ) / } k! l = σ (/σ ) k w k, which is itself a Hilbert space Thus (H 0, K,0 ) is a Hilbert space If X R n has nonempty interior, then the mononomials x α, α 0, are all distinct It follows from the definition of the inner product, K,0 that the φ α s, as given in (), are orthonormal under, K,0 Since H 0 = span{φ α } α, it follows that the φ α s form an orthonormal basis for (H 0, K,0 ) By Theorem 9 and the relations φ α (x)φ α (t) = K(x,t), α =k φα (x) = K(x,x) = <, α =k it follows that (H 0, K,0 ) is a reproducing kernel Hilbert space of functions on X with kernel K(x,t) Since the RKHS induced by a kernel K on a set X is unique, we must have (H 0, K,0 ) = (H K, K ) 3 Proofs of Theorems, 3, and 4 Proof of Theorem It suffices for us to show the case n = On subsets of R with nonempty interior, we have { H K = f = e x σ w k x k : f K = } σ k k! k wk < Let d Z, d 0 be given but arbitrary Consider the polynomial p(x) = a 0 + a x + +a d x d for arbitrary coefficients a i R Then we have p(x) = e x σ p(x)e x σ = e x σ d i=0 a i x k+i σ k k! Let b j = σ j j! and b j+ = 0forj Z, j 0 Then we can rewrite p(x) as p(x) = e x σ ( 0 i d,j 0,i+j=k a i b j )x k

10 Constr Approx (00) 3: Let w k = 0 i d,j 0,i+j=k a ib j Then p(x) = e x σ w k x k Then we have σ k k! k w k = σ k ( k! k 0 i d,j 0,i+j=k a i b j ) (a) Assume for now that a i 0 for all 0 i d, with a d > 0 (what follows will also be true if a i 0 for all 0 i d, with a d < 0) Then σ k k! k w k k=d = a d σ k k! k a d b k d = a d σ (k+d) (k + d)! k+d b k σ (k+d) ( ) (k + d)! k+d σ k = a d σ d k! d (k + d)! k (k!) The inequality becomes an equality if a d > 0 and a i = 0 for 0 i d, so that this lower bound is sharp Recall Stirling s formula, which states that n! lim n = Then for c πn( n e ) n k = (k+d)!,wehaveford : k (k!) ( ) k + d lim c k + d k k = lim (k + d) d = k k π k k c k / = k For d = 0, we have lim k π In both cases we have c k =,showing that p(x) cannot be a member of H K (b) Consider now the case in which the coefficients a i s have mixed signs, with a d 0 We will make use of two elementary inequalities, the first being that (a + b) ( a b ) for all a,b R and the second, (a b) (a c) for all a c b By the first inequality, we have ( ) a i b j ( a d b k d a d b k d+ + +a 0 b k ) 0 i d,j 0,i+j=k (b) Let d be even By definition of the b j s, we have b j+ = b j+ = 0 Thus, if k d 0 is even, then b j σ (j+) a d b k d+ + +a 0 b k ( a d + + a + a 0 ) σ ( k d b k d + ) a d b k d when k satisfies k 4( a d + + a + a 0 ) + d = A σ a d d Then for k even, k max{d,a d },wehave ( ) a i b j 4 a d b k d 0 i d,j 0,i+j=k and

11 36 Constr Approx (00) 3: by the second elementary inequality above Hence we have σ k k! k w k 4 k even,k max{d,a d } σ k k! k a d b k d, which diverges as in part (a), showing that p(x) is not in H K (b) The case when d is odd is entirely similar Proof of Theorem 3 Let us first consider the case n = Then { H K = f = e x } σ w k x k : f K = σ k k! k wk < Consider the function e μx σ, which is e μx σ = e x σ e (μ )x σ = e x σ ( ) k (μ )k x k σ k k! Thus, w k = ( )k (μ ) k, and w σ k k! j = 0forj k Then σ k k! k w k = If μ 0orμ, then σ 4k (k)! (μ ) k k σ 4k (k!) = (μ ) k (k)! k (k!) (μ ) k (k)! k (k!) (k)! k (k!) = as in the Proof of Theorem above, showing that f / H K in those cases If 0 < μ<, then (μ ) k (k)! k (k )!! k (k!) = + (μ ), (k)!! which converges by the Ratio Test Hence we have σ k k! w k k <, showing that e μx σ H K for 0 <μ<, with norm e μx σ K = For any n N,wehave (μ ) k (k)! k (k!) = k= = (μ ) μ( μ) e μ x σ = e x σ e (μ ) x σ = e x σ n i= k i =0 w ki x k i i = e x σ n k,,k n i= w ki x k i i,

12 Constr Approx (00) 3: giving us e μ x σ K = n {k,,k n }=0 i= σ k i k i! k i w k i = n i= k i =0 σ k i k i! k w i k i The result then follows from the one-dimensional case above by symmetry Proof of Theorem 4 We have for n =, { H K (R) = f = e x σ w k x k : f K = } σ k k! k wk < By formula (), if f(x ) H K (R), then g = ( n j= e x j σ )f (x ) H K (R n ) Since R n R n ( n j= ( n j= e x j σ ) dx dx n = ( σ π ) n <, ) e x j ( ) π n σ dx dx n = σ <, it suffices to show that H K = H K (R) / L (R) Fork 0, let where q> Then σ k k! k w k = w k = k k! (k + ) q/ σ k, (k + ) q < f = e We will show that f/ L (R) for <q 3/ We have R f(x) dx f(x) dx = = = 0 w k e x σ x k dx 0 ( ) k + w k σ k+ Ɣ = σ 0 e x σ x σ w k x k H K w k x k dx since w k 0 for all k, by the Monotone Convergence Theorem, k k! (k + ) q/ Ɣ ( k + )

13 38 Constr Approx (00) 3: Now k!=ɣ(k + ) = π k Ɣ( k+ )Ɣ( k + ) Thus f(x) π /4 σ dx R ( Ɣ( k+ ) ) / Ɣ( k + ) (k + ) q/ By Lemma 0, wehave R f(x) (π) /4 e /4 σ dx > Ɣ( k+ ) Ɣ( k +) = Sk+ π S k > e / (k+) / It thus follows that (k + ) q+ 4 = for 0 <q 3/ Thus f/ L (R) as required This completes the proof 33 A Different Proof for Theorem 4 The proof of Theorem 4 given above uses Theorem and constructs an explicit function in H K that is not in L (R n ) Let us now present a nonconstructive proof, based on results from [3], which have a very general setting For our purpose, let X R n, and μ a Borel measure on X LetK : X X R be a measurable, positive definite kernel For a fixed p, q = p p, the function K is said to be p-bounded if: () the function K x L q μ(x) for almost all x X, () the function L K f L q μ(x), where L K f(x)= X K(x,y)f(y)dμ(y), for all f L p μ(x) We will need the following result from [3] Proposition Assume that the Hilbert space H K induced by K is separable Given p, the following two conditions are equivalent: () H K L p μ(x) () The reproducing kernel K is q-bounded, with q = p p Proof of Theorem 4 For our present setting, X = R n, and μ is the Lebesgue measure on R n We can prove that H K (R n ) is not a proper subset of L (R n ) if we can show that K is not -bounded Let us verify the two conditions for -boundedness For simplicity, it suffices for us to prove for n = and σ = For the Gaussian kernel K(x,y) = exp( (x y) ), it is obvious that K x L (R n ), so condition () is satisfied For condition (), let φ = L (R n ); then L K φ(x)= exp ( (x y) ) dy = π, R which is a constant function and obviously not a part of L (R n ) Thus condition () of -boundedness is not satisfied, and so H K (R n ) is not a proper subset of L (R n )

14 Constr Approx (00) 3: Proofs of Theorems 6, 7, and 8 Proof of Theorem 6 From the Funk-Hecke formula (see Theorem in Appendix A), it immediately follows that spherical harmonics are eigenfunctions of L K, and we also obtain the analytical formula of the corresponding eigenvalues λ k We also know that the normalized eigenfunctions of L K form an orthonormal basis for L μ (Sn ) Since the spherical harmonics {{Y k,j (n; x)} N(n,k) j= } indeed form an orthonormal basis for L μ (Sn ), they are the only eigenfunctions of L K Proof of Theorem 7 We have from the Funk-Hecke formula that λ k = S n K(t)P k (n; t) ( n 3 t ) dt We will make use of the following identities [9]: Pk (n; t) ( t ) n 3 dt = Sn S n N(n,k) ( t ) n 3 dt = Sn S n and By the Cauchy-Schwarz inequality, K(t)P k (n; t) ( n 3 t ) dt ( sup t K(t) ) ( P k (n; t) ( t ( κ P k (n; t) ( t ) )( n 3 dt = κ Sn S n S n N(n,k) S n = κ ) n 3 ) dt ( S n S n ( t ) ) n 3 dt ) N(n,k) Since the λ k s are nonnegative, it follows that λ k κ Sn N(n,k), which is the first inequality We have N(n,k) = (k + n )(k + n 3)! k!(n )! (k + )n, (n )! (k + )(k + )(k + ) (k + n 3) (n )! giving us the second inequality

15 30 Constr Approx (00) 3: Lemma Let K(t) = e rt Then K(t)P k (n; t) ( n 3 t ) dt = πɣ ( n )( ) n/ I k+n/ (r), r where I is the modified Bessel function of the first kind, defined by I ν (x) = j=0 ( ) x ν+j j!ɣ(ν + j + ) Proof We will need the following formula from [6]: ) ν / Ɣ(ν)I ν /(r), e rt( t ) ν dt = π( r where r>0 and ν>0 Recall Rodrigues rule (see [9]), which states that for f C k ([, ]), f(t)p k (n; t) ( n 3 t ) dt = R k (n) f (k) (t) ( t ) k+ n 3 dt, where R k (n) is called the Rodrigues constant and is given by: Applying Rodrigues rule, we have R k (n) = k e rt P k (n; t) ( n 3 t ) dt = R k (n)r k = R k (n)r k π e rt( t ) k+ n 3 ( r Ɣ( n ) Ɣ(k + n ) ) k+n/ Ɣ( k + n ) I k+n/ (r) Substituting in the values of R k (n) and r, we obtain the desired answer Proof of Theorem 8 On S n,wehave ) ( x t exp ( σ = exp ) σ exp ( x,t The explicit formula for λ k thus follows from Theorem 7 and Lemma, with r = σ σ )

16 Constr Approx (00) 3: By definition of the modified Bessel function, we have ( ) ( ) k+n/ ( ) j σ I k+n/ σ = σ j!ɣ(j + k + n/ + ) j=0 ( ) k+n/ ( ) j σ = σ j!(j + k + n/)ɣ(j + k + n/) j=0 ( ) k+n/ < σ k + n/ j=0 ( σ ) j j!ɣ(j + k + n/) ( ) = σ (k + n/) I k+n/ σ, from which we have the inequality λ k λ k+ >(k+ n/)σ The inequality λ k λ k+ thus is satisfied if σ (k +n/) for all k 0 It suffices to require that it holds for k = 0, that is, σ n/ σ ( n )/ 35 Spherical Harmonics Expansions on the Sphere and Proof of Theorem 5 The proof is based on the rate of decay of the eigenvalues λ k of L K : L μ (Sn ) L μ (Sn ) and the Fourier expansion of f 0 on S n in terms of spherical harmonics, which is stated below Theorem 0 Let n N,n be fixed For n =, the standard Fourier expansion on S holds: f 0 (θ) = 4 π sin(k + )θ k + = 4 π(k + ) sin(k + )θ π For n 3, the Fourier expansion of f 0 in the spherical harmonics on S n is: where f 0 (x) = S n A(0, k +,n)= A(0, k +,n)y k+,0, (n; x), π(4k + n)(k + n )! n (k + )! Ɣ( k)ɣ(k + n+ ), and {{Y k,m,j (n; x)} N(n,m) j= } k m=0 is an orthonormal basis for the space Y k(n) of spherical harmonics of order k, as described in Proposition in Appendix B For all n N, n, 3 π 3/ (k + ) S n < S n A (0, k +,n)< 5 π (k + ) 3/ S n

17 3 Constr Approx (00) 3: Proof of Theorem 0 This is given in Appendix B Proof of Theorem 5 From Theorem 0, the Fourier expansion of f 0 on S n consists of precisely one function Y k+,0, (n; x) from each spherical space Y k+ (n) of odd order k +, k 0 If f 0 Im(L r K ), then its Fourier expansion must have the form f 0 = λ r k+ a k+y k+,0, (n; x) Comparing this expansion with the Fourier expansion of f 0, we obtain S a k+ = n A(0, k +,n) λ r k+ We must have ak+ < Since Sn A (0, k +,n)> 3,itfollows that π 3/ (k+) ak+ > 3 Sn π 3/ (k + ) λ r = 3 Sn π k+ 3/ b k For part, where n 3, we have from Theorem 7 that for r n,wehave ( b k κ S n (n )! ) r λ k+ k + (k + ) =, n (k+) κ S n (n )! Thus showing that f 0 / Im(L r K ) when r n For the Gaussian kernel K(x,t) = e x t σ on S n S n in part, we have for all k N, ( n by Theorem 8, with ( ) λ k+ = e /σ σ n I k+n/ σ Ɣ ) S n, λ k+ λ k+3 >(k + n/ + )(k + n/ + )σ 4 for all k 0 We now apply the Ratio Test to the series b k : b k+ lim = lim k b k k lim k ( ) r λk+ (k + ) (k + 3) λ k+3 ( σ 4 (k + n/ + )(k + n/ + ) ) r = for any r>0, showing that the series diverges Thus f 0 / Im(L r K ) for any r>0

18 Constr Approx (00) 3: Several Implications for Learning Theory and Function Approximation 4 Implication of Theorem 4 The main implication of this theorem is the nonfeasibility of L (R n ) norm optimization or regularization in H K (R n ) However, this could still be done on subsets of finite linear combinations of the basis functions K x Furthermore, this phenomenon does not arise when we talk about H K (X), where X is a bounded subset of R n 4 Implication of Theorems 5 and 3 for the Regularized Least Square Algorithm in RKHS and the Regression and Binary Classification Problems in Learning Theory We will first need to discuss the learning setting Assume that the input space X admits as mathematical representation a subset or manifold in some Euclidean space R n, and the output space Y is a subset of the real numbers R It is assumed that there is an unknown probability distribution ρ on Z = X Y with ρ(x,y) = ρ X (x)ρ(y x) In the regression problem (see [4, 0] and references therein), we aim to estimate the regression function f ρ (x) = ydρ(y x), Y which minimizes the least square error ( ) ε(f ) = f(x) y dρ Z Let L ρ X (X) denote the Hilbert space of square integrable functions on X, with norm denoted by ρ With the assumption that f ρ L ρ X (X), for every f L ρ X (X), ε(f ) ε(f ρ ) = f f ρ ρ Let z = (x i,y i ) m i= be a finite random sample of size m, m N, drawn independently according to ρ Our task is to construct functions f z, using the finite sample z, such that lim f z f ρ m ρ = 0 with high probability In the binary classification problem, where Y ={, }, the optimal binary classifier is the Bayes classifier: sgn f ρ (x) = { ifp(y= x) P(y= x), ifp(y= x) < P(y = x)

19 34 Constr Approx (00) 3: Let f : X R be a real-valued function, which induces the binary classifier sgn(f ) : X {, }, defined by sgn(f )(x) = sgn(f (x)) The error of sgn(f ) with respect to the Bayes classifier is ρ X (X f ) = sgn(f ) sgn(f ρ ) 4 ρ, where X f ={x X : sgn(f (x)) sgn(f ρ (x))} Our task is to construct binary classifiers sgn(f z ), using the finite sample z, such that lim m sgn(f z ) sgn(f ρ ) 4 ρ = 0 with high probability The regularized least square algorithm (see [] and references therein) attempts to solve both problems of least square regression and binary classification by the following minimization procedure Algorithm Let K : X X R denote a continuous, positive definite kernel Let H K be the corresponding RKHS, with norm K For each λ>0, let { m ( ( f z,λ = arg min f x i ) } y i) + λ f f H K m K i= (A) For the least square regression problem, f z,λ is taken to be the empirical version of f ρ, which approximates f ρ in the ρ norm (B) For the binary classification problem, sgn(f z,λ ) is taken to be the empirical version of sgn(f ρ ), which approximates sgn(f ρ ) in the ρ norm For the purpose of our argument, we state here two typical results obtained recently regarding the complexity of the above algorithm (we refer to [5, 3, ] and follow-up works for detail) Assume that μ = ρ X, f ρ Im(L r K ) for 0 <r, and that y M almost surely Then according to [3], for any 0 <δ<, for an appropriate choice of λ, ( f z,λ f ρ ρ log 4 ) (κm) r/(+r) L r δ K f ρ for / <r, and ( f z,λ f ρ ρ log 4 ) (8M + 8 r κ r L r δ /(+r) ρ K f ρ ( ) r/(+r) (3) m ) ( ) r/ ρ (4) m for 0 <r /, with probability at least δ, where the quantity L r K f ρ ρ plays a crucial role This in turn will lead to convergence in the binary classification problem, via the following []: ρ X (X fz,λ ) = sgn(f z,λ ) sgn(f ρ ) q 4 ρ 4(B q+ q + ) f z,λ f ρ ρ, (5)

20 Constr Approx (00) 3: where 0 q, provided that the Tsybakov s noise condition [9] is satisfied: ρ X ({ x : f ρ (x) L }) B q L q, 0 L It is the purpose of this section to discuss the applicability of these results in certain settings We note that f 0 = f ρ is the regression function resulting from the conditional probability P ( y = x S+ n ) ( =, P y = x S n ) + = 0, P ( y = x S n ) ( = 0, P y = x S n ) = Recall that it is the Bayes classifier corresponding to the binary classification problem where the two classes lie on the upper and lower hemispheres, respectively, with decision boundary x n = 0 Theorem 5 shows that complexity results such as the ones just mentioned cannot be applied in this case at all (for the Gaussian kernel) or with very small r (for the general continuous case) However, this is a noise-free binary classification problem, which can be solved successfully precisely by the regularized least square algorithm above, with the linear kernel K(x,t) = x,t, such that ρ X (X fz,λ ) = sgn(f z,λ ) sgn(f ρ ) ( n 4 ρ O m log ) (6) δ Remark 7 We stress that expression (6) does not follow from the preceding discussion, but is a result obtained by the author in [8], where a more general result is proved In general, it has been observed that the classification problem is significantly easier than regression if the function η(x) = P(y= x) = + f ρ(x) is far away from (or equivalently, f ρ is far away from 0), with high probability (we refer to the survey [] and the extensive references therein) We remark that convergence rates of the form O(m r/(r+) ) as in (3) are optimal for the regression problem (we refer to [8] and the references therein, where r represents a different concept) It would be interesting to establish the corresponding optimality of (3) and (4), which we will leave for a future work We would like now to discuss a different complexity result from [3], where the multi-kernel least square regularization algorithm is proposed, which computes f z,λ = arg min min σ f H K σ { m m ( ( f x i ) } y i) + λ f K σ Here K σ (x, t) = exp( n (x i t i ) i= ), with σ = (σ,,σ n ) = (0, ) n σ i i=

21 36 Constr Approx (00) 3: Example of [3] states that if X R n is a domain with Lipschitz boundary and f ρ H s (X), the Hilbert-Sobolev space of order s>0, then for an appropriate choice of λ: E ( f z,λ f ρ ) ( L = O (log m) / m s n ɛ ) 4(4s n ɛ) (7) ρ X if n <s n +, 0 <ɛ<s n IfX is bounded and ρ X is the Lebesgue measure on X, then: E ( f z,λ f ρ ) ( L = O (log m) / m s ) (4s+n) (8) ρ X if 0 <s If f H s (X), with n <s n +, then f is necessarily continuous and thus bound (7) cannot be applied for our present function f ρ, which is discontinuous As for bound (8), it could be applied for our case only when 0 <s min{, n }We note that for such s, the best possible rate offered by (8)isO((log m) / m α ), where α = min{, 8+n } This is also the rate for the binary classification error ρ X(X fz,λ ), using (5), when q = One must exercise caution when reading the O notation here however, since the constants within may contain the dimension n There are two interesting issues here whose exploration we will leave for a future work The first is to find 0 <s min{, n } such that f ρ H s (S n ) The second is to investigate connections between this multi-kernel approach and our Theorem 3 By this theorem, using kernels with flexible variances allows us to capture functions with multi-level rates of decay It is also possible that it is not necessary to use a continuous set as above, but a discrete version of it 43 Derivatives in the Gaussian RKHS Let X be a closed subset of R n with nonempty interior The method of proofs for Theorems, 3, and 4 can be applied to show that t α K x (t) H K for any multinomial index α, and hence p(t)k x (t) H K for any polynomial p : X R This implies that D α K x H K for any α, where D α denotes the partial derivative with multiindex α This order of reasoning is the reverse of that in the proof of Corollary of [7], where they use D α K x H K for all α to imply that p(t)k x H K for any p Note that [7] and references therein provide general treatments of the derivatives in RKHS by analytic kernels Furthermore, if n =, then for f(t)= t d K x (t) = t d exp( (x t) ),wehave σ f K = e x σ σ d d ( ) x k (k + ) (k + d) < (9) k! σ Using this formula and the binomial theorem, one can obtain the K-norm of the derivative of any order for K x Then-dimensional case can be worked out similarly Acknowledgements The present paper developed from a part of the author s PhD thesis with Steve Smale, whose advice and support is gratefully acknowledged He also wishes to thank the referees for their many valuable comments and suggestions This work was partially supported by the Vienna Fund for Science and Technology (Wiener Wissenschafts-, Forschungs- und Technologiefonds) and the German Research Foundation (Deutsche Forschungsgemeinschaft, grant DFG:GZ WI 55/-)

22 Constr Approx (00) 3: Appendix A: The Funk-Hecke Formula For our computations on the sphere S n, we need the following fundamental result from the theory of spherical harmonics (see [9]) For consistency of notation with the spherical harmonics literature, we use ds n for the surface measure of S n Normalizing by dividing by S n, we get the uniform probability measure on S n Theorem (Funk-Hecke Formula) Let K :[, ] R be a continuous function giving rise to an inner product kernel K(x,t) = K( x,t ) on S n S n Let Y k Y k (n) for k 0 Then for any x S n, K ( x,t ) Y k (t) ds n (t) = λ k Y k (x), S n where λ k = S n K(t)P k (n; t) ( n 3 t ) dt, where P k (n; t) denotes the Legendre polynomial of degree k in dimension n Appendix B: Fourier Expansion on S n In this section, we will prove Theorem 0 in Sect 35, that is, we will compute the Fourier expansion of { if x S n + f 0 (x) = (x n 0), if x S n (x n < 0), on S n, in terms of the spherical harmonics The case n = is just the usual Fourier expansion on the circle Lemma The Fourier series of f 0 on L (S ) is given by: f 0 (θ) = 4 π sin(k + )θ k + = 4 π(k + ) sin(k + )θ π Consider n 3 Let Y k (n) denote the space of spherical harmonics of order k on the sphere S n It turns out that working directly with an explicit orthonormal basis of spherical harmonics on S n is highly complicated analytically We shall find that it is much better for us to utilize an inductive construction of orthonormal bases of Y k (n) Let us first describe one such construction, as given in [9] Let e,,e n be the canonical basis of R n Letx S n We write ( ) x(n ) x = te n + t, 0

23 38 Constr Approx (00) 3: where t [, ] and x (n ) S n, (x (n ), 0) T span{e,,e n } We then have [9] ds n ( te n + t ) ( x (n ) = t ) n 3 dt ds n ( ) x (n ), or more compactly, ds n = ( n 3 t ) dt ds n (0) Recall that the normed associated Legendre functions are defined by: A m k (n; t)= n (k + n )(k m)!(k + n + m 3)! k!ɣ( n ) Pk m (n; t), (n; t) is the associated Legendre function of degree k, order m, and dimen- where Pk m sion n Proposition (Orthonormal basis of Y k (n) [9]) Suppose that for m = 0,,,k, the orthonormal bases Y m,j, j =,,N(n,m), of Y m (n ) are given Then the functions { Yk,m,j (n; x) = A m k (n; t)y m,j (n ; x (n ) ) : j =,,,N(n,m) } form an orthonormal basis for Y k (n), starting with the Fourier basis for n = (the circle S ) We will now expand f 0 in terms of the orthonormal spherical harmonics of Y k,m,j (n; x) of Proposition Lemma 3 f0,y k,m,j (n; ) = A(m, k, n) Y m,j (n ; x (n ) )ds n, S n where A(m, k, n) = Proof By definition, we have 0 A m k (n; t)( n 3 0 t ) dt A m k (n; t)( n 3 t ) dt { ( f 0 (x) = f 0 ten + t ) if 0 t, x (n ) = if t<0,

24 Constr Approx (00) 3: independent of the first n coordinates Thus by (0), we have f 0 (x)y k,m,j (n; x)ds n S n as desired = t= ( f 0 ten + t ) x (n ) S n A m k (n; t)y m,j (n ; x (n ) ) ( n 3 t ) dt ds n = A(m, k, n) Y m,j (n ; x (n ) )ds n, S n The expression obtained in the above lemma is considerably simplified with the aid of the Funk-Hecke formula (Theorem ), which states that for α S n, Y k Y k (n), and f C([, ]), f ( α, x ) Y k (x) ds n (x) = C k Y k (α), S n where C k = S n P k (n; t)f(t) ( n 3 t ) dt In particular, for f, this implies Y k (x) ds n (x) = C k Y k (α) S n The right-hand side is thus independent of α This implies that they are both identically zero for k For k = 0, we have Y 0, and hence S n C 0 = Y 0 (x) ds n (x) = S n S n Corollary The only nonzero Fourier coefficients of f 0 are: f0,y k,0, (n; ) = A(0,k,n) S n Proof From the above Funk-Hecke formula, we have { 0 if m, Y m,j (n ; x (n ) )ds n = S n S n if m = 0 Thus S n f 0 Y k,m,j (n; x)ds n is only nonzero when m = 0 Since N(n,0) =, j takes only the value This gives us the desired result

25 330 Constr Approx (00) 3: It thus remains for us to evaluate A(0,k,n) We will do this using the Gegenbauer polynomials and the aid of [6] Definition (Gegenbauer polynomials [9]) Let 0 r<, t [, ] For each integer k 0, λ>0, the Gegenbauer polynomial Ck λ (t) is defined to be the coefficient in the expansion ( rt + r ) λ = r k Ck λ (t) Ck λ (t) is an odd function when k is odd, and even when k is even Lemma 4 [9] For n 3, let λ = n Then C λ k (k + n 3)! (t) = P k (n; t) k!(n 3)! In particular, for n = 3, we have C λ k (t) = P k(3; t) From this and the formula for A 0 k (n; t) in the definition of associated Legendre functions, we obtain, for λ = n, A 0 k (n; t)= (n 3)! Ɣ( n ) Corollary Let D k and λ be as above Then (k + n )k! n (k + n 3)! Cλ k (t) = D kc λ k (t) { 0 if k is even, A(0,k,n)= D k 0 Cλ k (t)( t ) λ dt if k is odd Proof For λ = n n 3,wehave = λ We then have A(0,k,n)= 0 = D k [ A 0 k (n; t)( n 3 0 t ) dt 0 A 0 k (n; t)( n 3 t ) dt Ck λ (t)( t ) λ 0 dt Ck λ (t)( t ) ] λ dt When k is even, Ck λ (t) is even, hence the two integrals are equal and cancel out When k is odd, Ck λ (t) is odd, thus two integrals have the same magnitude but opposite signs, giving us the desired expression Let us now evaluate 0 Cλ k (t)( t ) λ dt We have the following two results at our disposal:

26 Constr Approx (00) 3: Lemma 5 [6] Let Pν μ (t) denote the classical associated Legendre function, with μ and ν complex numbers (for natural numbers m, k, Pk m(t) = P k m (3; t)) Let λ>0 Then Ck λ (t) = λ Ɣ(λ + k)ɣ(λ + ) P λ (t) ( t ) Ɣ(λ)Ɣ(k + ) λ+k 4 λ Lemma 6 [6] Let μ C be such that Re(μ) < Then Let E k = λ 0 ( t ) μ Pν μ (t) dt = μ π Ɣ( μ+ν ν μ+3 )Ɣ( ) Ɣ(λ+k)Ɣ(λ+ ) Ɣ(λ)Ɣ(k+) 0 Then we have C λ k (t)( t ) λ dt = E k 0 P λ ( t ) λ λ+k 4 dt We now apply the above lemma with μ = λ< and ν = λ + k to obtain Ck λ (t)( t ) λ λ π dt = E k Ɣ( k )Ɣ(λ + + k ) 0 Substituting λ = n and odd values of k, we get: Corollary 3 For k 0, 0 C n k+ (t)( n 3 t ) dt = π(n+ k )!Ɣ( n ) (n 3)!(k + )!Ɣ( k)ɣ(k + n+ ) Corollary 4 Let n 3 be fixed The Fourier expansion of f 0 in the spherical harmonics on S n is: f 0 (x) = S n A(0, k +,n)y k+,0, (n; x), where A(0, k +,n)is given by: π(4k + n)(k + n )! A(0, k +,n)= n (k + )! for k N {0} Example (n = 3) We have f 0 L (S ) = S =4π and A(0, k +, 3) = 4k + 3 ( ) k (k )!! 4 6 (k + ) = Ɣ( 4k + 3 k)ɣ(k + n+ ) ( ) k (k )!! k (k + )!

27 33 Constr Approx (00) 3: We check that S A(0, k +, 3) = π 4k + 3 k+ The sum of the last series above follows from the identity [6]: 4k + 3 k+ (0 θ π ), where we set θ = 0 ( ) (k )!! = 4π = f 0 (k + )! L (S ) ( ) (k )!! P k+(cos θ)= 4θ (k + )! π, Remark 8 (n = ) For n =, noting that S 0 = {±} =, we obtain S 0 A(0, k +, ) = 4( )k, π(k + ) which differ from the Fourier coefficients in the expansion of Lemma only by the factor ( ) k B0 General Case n We now move to the general case n Let us consider two separate cases: n is even and n is odd Lemma 7 Let n = m, m We have A(0, k +, m) = 8 π(k + ) k + m k + m (k + )!! (k + m )!! (k)!! (k + m )!! Proof We have ( ) ( m + Ɣ = Ɣ m + ) = (m )!! π m Thus (k )!! (m + ) (m + k )Ɣ( m+ ) = (m )!! m (k + ) (k + m ) (m )!! π On the other hand, = m π(k + ) (k + m ) (4k + m)(k + m )! m (k + )! = 8(k + m)(k + )(k + 3) (k + m ) m Combining these two expressions, we get A(0, k +, m)

28 Constr Approx (00) 3: Lemma 8 Let n = m +, m Then A (0, k +, m + ) = Proof We have A(0, k +, m + ) = (4k + m + )(k + m )! m (k + )! 4k + m + (k + ) k + m (k + )!! (k)!! (k + m )!! (k + m)!! [(k )!!] [(k + )(k + 4) (k + m)m!] [(k )!!] = (4k + m + )(k + )(k + 3) (k + m ) [(k + m)!!] (4k + m + )(k + )(k + 3) (k + m ) [(k + )!!] = (k + ) (k + )(k + 4) (k + m) [(k)!!(k + )!!] = 4k + m + (k + 3)(k + 5) (k + m ) [(k + )!!] (k + ) k + m (k + 4)(k + 6) (k + m) [(k)!!(k + )!!], from which the desired expression follows To find upper and lower bounds for the coefficients A(0, k +,n), we will apply Lemmas 0 and in Appendix C, which are consequences of Stirling s formula Corollary 5 Let n = m, m Then 4A π(m ) / (k + ) <A(0, k +, 8A m) < π(m ) / (k + ) 3/, where A = e 5/36 ( 7 8e )/ and A = A Proof We have A(0, k +, m) = From Lemma,wehave 8 π(k + ) ( ) 4eπ / e /8 7 (k + m ) / ( π (k + m )!! < (k + m )!! <e/ ) / (k + ) / < e / ( π k + m k + m (k + )!! (k + m )!! (k)!! (k + m )!! ) / (k + m ) /, (k + )!! (k)!! <e /8 ( 7 4eπ ) / (k + ) /

29 334 Constr Approx (00) 3: Combining these gives us ( ) 8e / e 5/36 (k + ) / (k + m )!! (k + )!! < 7 (k + m ) / (k + m )!! (k)!! Now <e 5/36 ( 7 8e ) / (k + ) / (k + m ) / (m ) / (k + ) / (k + )/ (k + m ) / (m ) / and < m m k + m k + m We finally have 4A π(m ) / (k + ) <A(0, k +, 8A m) < π(m ) / (k + ) 3/, where A = e 5/36 ( 7 8e )/ and A = A as required Corollary 6 Let n = m +, m Then e /3 π(m) / (k + ) <A (0, k +, m + )< Proof We have A (0, k +, m + ) = 4k + m + (k + ) k + m ( ) 7 / 4e /9 e π(m) / (k + ) 3/ (k + )!! (k)!! (k + m )!! (k + m)!! From Lemma, ( ) e / ( ) e / (k + m )!! / < <e /6 π (k + m) / (k + m)!! π (k + m) /, ( ) / ( ) e / (k + ) / (k + )!! 7 / < <e /8 (k + ) / π (k)!! 4eπ Combining these gives us e /3 π Now (k + ) / (k + m) (k + m )!! < / (k + m)!! (k + )!! (k)!! (k + )/ (k + )/ (m) / (k + m) / (m) / and < m + m ( ) 7 / <e /9 (k + ) / e π (k + m) / 4k + m + k + m < 4

30 Constr Approx (00) 3: We finally have e /3 π(m) / (k + ) <A (0, k +, m + ) ( ) 7 / <e /9 4 e π(m) / (k + ) 3/, as required Combining both cases of n odd and even, we have: Corollary 7 For all n N, n, e /3 π(n ) / (k + ) <A (0, k +,n)< ( ) 7 / 4e /9 e π(n ) / (k + ) 3/, 3 S n < S n A π 3/ (k + ) (0, k +,n)< 5 S n π (k + ) 3/ Proof The first inequality follows by combining both cases of n even and odd For the second one, we have that S n A (0, k +,n)= S n A (0, k +,n) Sn S n Now ( ) n / ( ) e /6 < Sn n / eπ S n <e/ π Combine this with the first inequality and simplify, and we obtain the second inequality Appendix C: The Gamma Function and Stirling s Formula Consider Stirling s series for a>0: Ɣ(a + ) = ( ) a a [ πa + e a + 88a 39 ] 5840a 3 + Thus for all a>0 we can write Ɣ(a + ) = e A(a) ( ) a a πa, e where 0 <A(a)< a

31 336 Constr Approx (00) 3: Lemma 9 For all n N, ( ) n+ n n!! = Stir(n), e where Stir(n) = e A(n/) πe if n is even, and Stir(n) = e A(n/) e if n is odd Proof (a) By Stirling s formula, we have n!=ɣ(n + ) = e A(n) ( ) n n πn e It thus follows that (n)!! = n n!=e A(n) ( ) n n πn = e A(n) ( ) n n+/ πe, e e from which we have the first identity (b) From the formula Ɣ(n + ) = (n )!! π n,wehave (n + )!! = (n + ) π n Ɣ(n + /) = n+ π Ɣ(n + 3/) = n+ e A(n+/) ( n + / π(n+ /) π e = e A(n+/) n+ ( ) n + / n+ e e = e A(n+/) ( ) n + n+ e, e ) n+/ which is the second identity Lemma 0 Let n N, n be fixed Then ( ) n / ( e B(n) Sn n eπ S n <eb(n) π ) / for some B(n) satisfying 6(n ) <B(n)< 6n Consequently for all n N, n, Proof We have ( ) n / ( ) e /6 < Sn n / eπ S n <e/ π S n S n = π n Ɣ( n ) Ɣ( n ) π n = π Ɣ( n ) Ɣ( n )

32 Constr Approx (00) 3: From Stirling s formula, we have Ɣ(a + ) = e A(a) πa( a e )a for all a>0, where 0 <A(a)</a Thus for all a>0, Ɣ(a) = e A(a) π e ( ) a a / e Hence we have Ɣ(n/) Ɣ( n ) = e A(n/) π e ( e n ) n = e B(n) n A( e ) π e ( n e ) n / ( n e ) / ( + ) n, n where B(n) = A(n/) A( n ), easily seen to satisfy 6(n ) <B(n)< 6n We have that for all n, ( / + ) n <e / n Combining this with the last expression, we obtain the desired result Lemma For all n N, ( ) 4eπ / ( ) e B(n) (n)!! π / 7 (n + ) / (n + )!! <eb(n) (n + ) /, for some function B(n) satisfying n+6 <B(n)< n Consequently for all n N, Similarly, ( ) 4eπ / e /8 (n)!! 7 (n + ) / (n + )!! <e/ ( π ) / (n + ) / ( ) e / ( ) e / (n )!! / < <e /6 π (n) / (n)!! π (n) / Proof We proceed as in the above lemma From Stirling s formula, we have ( ) n+ n (n)!! = Stir(n) = e A(n) ( ) n+ n πe, e e ( ) n+ n + (n + )!! = Stir(n + ) = e A(n+/) ( ) n+ n + e e e Thus it follows that ( ) (n)!! π e / ( (n + )!! = eb(n) ) n+, n + n +

33 338 Constr Approx (00) 3: where B(n) = A(n) A(n + /) is easily seen to satisfy have that for all n, ( ) 8 / = 7 ( ) 3/ ( ) n+ < 3 n + e / n+6 <B(n)< n We Combining this with the last expression, we obtain the desired result The other inequality is proven similarly References Aronszajn, N: Theory of reproducing kernels Trans Am Math Soc 68, (950) Boucheron, S, Bousquet, O, Lugosi, G: Theory of classification: a survey of recent advances ESAIM: Prob Stat 9, (005) 3 Carmeli, C, De Vito, E, Toigo, A: Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem Anal Appl 4, (006) 4 Cucker, F, Smale, S: On the mathematical foundations of learning Bull Am Math Soc 39(), 49 (00) 5 De Vito, E, Caponnetto, A, Rosasco, L: Model selection for regularized least-squares algorithm in learning theory Found Comput Math 5(), (005) 6 Gradshteyn, IS, Ryzhik, IM: Table of Integrals, Series, Products, 6th edn Academic Press, San Diego (000) 7 Mercer, J: Functions of positive and negative type, and their connection with the theory of integral equations Philos Trans R Soc Lond, Ser A 09, (909) 8 Minh, HQ: The regularized least square algorithm and the problem of learning halfspaces Submitted preprint (007) 9 Müller, C: Analysis of Spherical Symmetries in Euclidean Spaces Applied Mathematical Sciences, vol 9 Springer, New York (997) 0 Niyogi, P, Girosi, F: Generalization bounds for function approximation from scattered noisy data Adv Comput Math 0, 5 80 (999) Poggio, T, Smale, S: The mathematics of learning: dealing with data Not Am Math Soc 50(5), (003) Schölkopf, B, Smola, AJ: Learning with Kernels MIT Press, Cambridge (00) 3 Smale, S, Zhou, DX: Learning theory estimates via integral operators and their approximations Constr Approx 6(), 53 7 (007) 4 Steinwart, I: On the influence of the kernel on the consistency of support vector machines J Mach Learn Res, (00) 5 Steinwart, I, Hush, D, Scovel, C: An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels IEEE Trans Inf Theory 5, (006) 6 Sun, HW: Mercer theorem for RKHS on noncompact sets J Complex, (005) 7 Sun, HW, Zhou, DX: Reproducing kernel Hilbert spaces associated with analytic translationinvariant Mercer kernels J Fourier Anal Appl 4, 89 0 (008) 8 Temlyakov, VN: Approximation in learning theory Constr Approx 7, (008) 9 Tsybakov, AB: Optimal aggregation of classifiers in statistical learning Ann Stat 3(), (004) 0 Vapnik, V: Statistical Learning Theory Wiley, New York (998) Wahba, G: Spline Models for Observational Data CBMS-NSF Regional Conference Series in Applied Mathematics Society for Industrial and Applied Mathematics, Philadelphia (990) Yao, Y: Early stopping in gradient descent learning Constr Approx 6(), (007) 3 Ying, Y, Zhou, DX: Learnability of Gaussians with flexible variances J Mach Learn Res 8, (007)

Mercer s Theorem, Feature Maps, and Smoothing

Mercer s Theorem, Feature Maps, and Smoothing Mercer s Theorem, Feature Maps, and Smoothing Ha Quang Minh, Partha Niyogi, and Yuan Yao Department of Computer Science, University of Chicago 00 East 58th St, Chicago, IL 60637, USA Department of Mathematics,

More information

Geometry on Probability Spaces

Geometry on Probability Spaces Geometry on Probability Spaces Steve Smale Toyota Technological Institute at Chicago 427 East 60th Street, Chicago, IL 60637, USA E-mail: smale@math.berkeley.edu Ding-Xuan Zhou Department of Mathematics,

More information

Online Gradient Descent Learning Algorithms

Online Gradient Descent Learning Algorithms DISI, Genova, December 2006 Online Gradient Descent Learning Algorithms Yiming Ying (joint work with Massimiliano Pontil) Department of Computer Science, University College London Introduction Outline

More information

RKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets Class 22, 2004 Tomaso Poggio and Sayan Mukherjee

RKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets Class 22, 2004 Tomaso Poggio and Sayan Mukherjee RKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets 9.520 Class 22, 2004 Tomaso Poggio and Sayan Mukherjee About this class Goal To introduce an alternate perspective of RKHS via integral operators

More information

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )

More information

Strictly Positive Definite Functions on a Real Inner Product Space

Strictly Positive Definite Functions on a Real Inner Product Space Strictly Positive Definite Functions on a Real Inner Product Space Allan Pinkus Abstract. If ft) = a kt k converges for all t IR with all coefficients a k 0, then the function f< x, y >) is positive definite

More information

Learnability of Gaussians with flexible variances

Learnability of Gaussians with flexible variances Learnability of Gaussians with flexible variances Ding-Xuan Zhou City University of Hong Kong E-ail: azhou@cityu.edu.hk Supported in part by Research Grants Council of Hong Kong Start October 20, 2007

More information

Reproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto

Reproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto Reproducing Kernel Hilbert Spaces 9.520 Class 03, 15 February 2006 Andrea Caponnetto About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert

More information

Strictly positive definite functions on a real inner product space

Strictly positive definite functions on a real inner product space Advances in Computational Mathematics 20: 263 271, 2004. 2004 Kluwer Academic Publishers. Printed in the Netherlands. Strictly positive definite functions on a real inner product space Allan Pinkus Department

More information

Derivative reproducing properties for kernel methods in learning theory

Derivative reproducing properties for kernel methods in learning theory Journal of Computational and Applied Mathematics 220 (2008) 456 463 www.elsevier.com/locate/cam Derivative reproducing properties for kernel methods in learning theory Ding-Xuan Zhou Department of Mathematics,

More information

Kernel Method: Data Analysis with Positive Definite Kernels

Kernel Method: Data Analysis with Positive Definite Kernels Kernel Method: Data Analysis with Positive Definite Kernels 2. Positive Definite Kernel and Reproducing Kernel Hilbert Space Kenji Fukumizu The Institute of Statistical Mathematics. Graduate University

More information

Reproducing Kernel Hilbert Spaces

Reproducing Kernel Hilbert Spaces Reproducing Kernel Hilbert Spaces Lorenzo Rosasco 9.520 Class 03 February 11, 2009 About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert

More information

Approximation Theoretical Questions for SVMs

Approximation Theoretical Questions for SVMs Ingo Steinwart LA-UR 07-7056 October 20, 2007 Statistical Learning Theory: an Overview Support Vector Machines Informal Description of the Learning Goal X space of input samples Y space of labels, usually

More information

EECS 598: Statistical Learning Theory, Winter 2014 Topic 11. Kernels

EECS 598: Statistical Learning Theory, Winter 2014 Topic 11. Kernels EECS 598: Statistical Learning Theory, Winter 2014 Topic 11 Kernels Lecturer: Clayton Scott Scribe: Jun Guo, Soumik Chatterjee Disclaimer: These notes have not been subjected to the usual scrutiny reserved

More information

Reproducing Kernel Hilbert Spaces

Reproducing Kernel Hilbert Spaces 9.520: Statistical Learning Theory and Applications February 10th, 2010 Reproducing Kernel Hilbert Spaces Lecturer: Lorenzo Rosasco Scribe: Greg Durrett 1 Introduction In the previous two lectures, we

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

Online gradient descent learning algorithm

Online gradient descent learning algorithm Online gradient descent learning algorithm Yiming Ying and Massimiliano Pontil Department of Computer Science, University College London Gower Street, London, WCE 6BT, England, UK {y.ying, m.pontil}@cs.ucl.ac.uk

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

THEOREMS, ETC., FOR MATH 515

THEOREMS, ETC., FOR MATH 515 THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every

More information

LEGENDRE POLYNOMIALS AND APPLICATIONS. We construct Legendre polynomials and apply them to solve Dirichlet problems in spherical coordinates.
