Mercer's Theorem, Feature Maps, and Smoothing


Ha Quang Minh, Partha Niyogi, and Yuan Yao

Department of Computer Science, University of Chicago, 1100 East 58th St, Chicago, IL 60637, USA
Department of Mathematics, University of California, Berkeley, 970 Evans Hall, Berkeley, CA 94720, USA
minh,niyogi@cs.uchicago.edu, yao@math.berkeley.edu

Abstract. We study Mercer's theorem and feature maps for several positive definite kernels that are widely used in practice. The smoothing properties of these kernels will also be explored.

1 Introduction

Kernel-based methods have become increasingly popular and important in machine learning. The central idea behind the so-called kernel trick is that a closed-form Mercer kernel allows one to efficiently solve a variety of non-linear optimization problems that arise in regression, classification, inverse problems, and the like. It is well known in the machine learning community that kernels are associated with feature maps, and that a kernel-based procedure may be interpreted as mapping the data from the original input space into a potentially higher-dimensional feature space, where linear methods may then be used. One finds many accounts of this idea where the input space $X$ is mapped by a feature map $\Phi : X \to \mathcal{H}$ (where $\mathcal{H}$ is a Hilbert space) so that for any two points $x, y \in X$, we have $K(x,y) = \langle \Phi(x), \Phi(y)\rangle_{\mathcal{H}}$.

Yet, while much has been written about kernels and many different kinds of kernels have been discussed in the literature, much less has been explicitly written about their associated feature maps. In general, we do not have a clear and concrete understanding of what exactly these feature maps are. Our goal in this paper is to take steps toward a better understanding of feature maps by explicitly computing them for a number of popular kernels on a variety of domains. By doing so, we hope to clarify the precise nature of feature maps in very concrete terms, so that machine learning researchers may have a better feel for them. Following are the main points and new results of our paper:

1. As we will illustrate, feature maps and feature spaces are not unique. For a given domain $X$ and a fixed kernel $K$ on $X \times X$, there exist in fact infinitely many feature maps associated with $K$. Although these maps are essentially equivalent, in a sense to be made precise in Section 4.3, there are subtleties that we wish to emphasize.

For a given kernel $K$, the feature maps of $K$ induced by Mercer's theorem depend fundamentally on the domain $X$, as will be seen in the examples of Section 2. Moreover, feature maps do not necessarily arise from Mercer's theorem, examples of which will be given in Section 4.2. The importance of Mercer's theorem, however, goes far beyond the feature maps that it induces: the eigenvalues and eigenfunctions associated with $K$ play a central role in obtaining error estimates in learning theory; see for example [8], [4]. For this reason, the determination of the spectrum of $K$, which is highly nontrivial in general, is crucially important in its own right. Theorems 2 and 3 in Section 2 give the complete spectrum of the polynomial and Gaussian kernels on $S^{n-1}$, including sharp rates of decay of their eigenvalues. Theorem 4 gives the eigenfunctions and a recursive formula for the computation of eigenvalues of the polynomial kernel on the hypercube $\{-1,1\}^n$.

2. One domain that we particularly focus on is the unit sphere $S^{n-1}$ in $\mathbb{R}^n$, for several reasons. First, it is a special example of a compact Riemannian manifold, and the problem of learning on manifolds has attracted attention recently; see for example [2], [3]. Second, its symmetric and homogeneous nature allows us to obtain complete and explicit results in many cases. We believe that $S^{n-1}$, together with kernels defined on it, is a fruitful source of examples and counterexamples for the theoretical analysis of kernel-based learning. We will point out that intuitions based on low dimensions such as $n = 2$ in general do not carry over to higher dimensions; Theorem 5 in Section 3 gives an important example along this line. We will also consider the unit ball $B^n$, the hypercube $\{-1,1\}^n$, and $\mathbb{R}^n$ itself.

3. We will also try to understand the smoothness properties of kernels on $S^{n-1}$. In particular, we will show that the polynomial and Gaussian kernels define Hilbert spaces of functions whose norms may be interpreted as smoothness functionals, similar to those of splines on $S^{n-1}$. We will obtain precise and sharp results on this question in the paper; this is the content of Section 5. The smoothness implications allow us to better understand the applicability of such kernels in solving smoothing problems.

Notation: For $X \subset \mathbb{R}^n$ and $\mu$ a Borel measure on $X$, $L^2_\mu(X) = \{f : X \to \mathbb{C} : \int_X |f(x)|^2\,d\mu(x) < \infty\}$. We will also use $L^2(X)$ for $L^2_\mu(X)$ and $dx$ for $d\mu(x)$ if $\mu$ is the Lebesgue measure on $X$. The surface area of the unit sphere $S^{n-1}$ is denoted by $|S^{n-1}| = \frac{2\pi^{n/2}}{\Gamma(n/2)}$.

2 Mercer's Theorem

One of the fundamental mathematical results underlying learning theory with kernels is Mercer's theorem.

Let $X$ be a closed subset of $\mathbb{R}^n$, $n \in \mathbb{N}$, $\mu$ a Borel measure on $X$, and $K : X \times X \to \mathbb{R}$ a symmetric function satisfying: for any finite set of points $\{x_i\}_{i=1}^N$ in $X$ and real numbers $\{a_i\}_{i=1}^N$,

$$\sum_{i,j=1}^{N} a_i a_j K(x_i, x_j) \geq 0. \qquad (1)$$

$K$ is then said to be a positive definite kernel on $X$. Assume further that

$$\int_X \int_X K(x,t)^2\, d\mu(x)\, d\mu(t) < \infty. \qquad (2)$$

Consider the induced integral operator $L_K : L^2_\mu(X) \to L^2_\mu(X)$ defined by

$$L_K f(x) = \int_X K(x,t)\, f(t)\, d\mu(t). \qquad (3)$$

This is a self-adjoint, positive, compact operator with a countable system of nonnegative eigenvalues $\{\lambda_k\}_{k=1}^\infty$ satisfying $\sum_{k=1}^\infty \lambda_k^2 < \infty$; $L_K$ is said to be Hilbert-Schmidt. The corresponding $L^2_\mu(X)$-normalized eigenfunctions $\{\phi_k\}_{k=1}^\infty$ form an orthonormal basis of $L^2_\mu(X)$. We recall that a Borel measure $\mu$ on $X$ is said to be strictly positive if the measure of every nonempty open subset of $X$ is positive, an example being the Lebesgue measure on $\mathbb{R}^n$.

Theorem 1 (Mercer). Let $X \subset \mathbb{R}^n$ be closed, $\mu$ a strictly positive Borel measure on $X$, and $K$ a continuous function on $X \times X$ satisfying (1) and (2). Then

$$K(x,t) = \sum_{k=1}^{\infty} \lambda_k\, \phi_k(x)\, \phi_k(t), \qquad (4)$$

where the series converges absolutely for each pair $(x,t) \in X \times X$ and uniformly on each compact subset of $X$.

Mercer's theorem still holds if $X$ is a finite set $\{x_i\}$, such as $X = \{-1,1\}^n$, $K$ is pointwise-defined and positive definite, and $\mu(x_i) > 0$ for each $i$.
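As a concrete illustration of Theorem 1, the following minimal Python sketch (my own, assuming NumPy; the domain, grid size, and Gaussian bandwidth are illustrative choices) discretizes $L_K$ on $X = [0,1]$ with Lebesgue measure and checks the expansion (4) numerically.

```python
# Minimal illustrative sketch: discretize L_K on a midpoint grid over [0, 1]
# and recover Mercer's expansion (4) numerically on that grid.
import numpy as np

sigma, m = 0.5, 400
x = (np.arange(m) + 0.5) / m                 # midpoint grid on [0, 1]
w = 1.0 / m                                  # quadrature weight per cell

K = np.exp(-((x[:, None] - x[None, :]) ** 2) / sigma**2)

# (L_K f)(x_i) ~ sum_j K(x_i, x_j) f(x_j) w; eigenvalues of K*w approximate
# the lambda_k, and eigenvectors give L^2-normalized phi_k(x_i) ~ v_k[i]/sqrt(w).
evals, evecs = np.linalg.eigh(K * w)
evals, phi = evals[::-1], evecs[:, ::-1] / np.sqrt(w)

K_hat = (phi * evals) @ phi.T                # sum_k lambda_k phi_k(x) phi_k(t)
print("leading eigenvalues:", np.round(evals[:5], 6))
print("max deviation from K on the grid:", np.abs(K_hat - K).max())
```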

2.1 Examples on the Sphere $S^{n-1}$

We will give explicit examples of the eigenvalues and eigenfunctions in Mercer's theorem on the unit sphere $S^{n-1}$ for the polynomial and Gaussian kernels. We need the concept of spherical harmonics, a modern and authoritative account of which is [6]. Some of the material below was first reported in the kernel learning literature in [9], where the eigenvalues for the polynomial kernels with $n = 2, 3$ were computed. In this section, we will carry out computations for a general $n \in \mathbb{N}$, $n \geq 2$.

Definition 1 (Spherical Harmonics). Let $\Delta_n = \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}$ denote the Laplacian operator on $\mathbb{R}^n$. A homogeneous polynomial of degree $k$ in $\mathbb{R}^n$ whose Laplacian vanishes is called a homogeneous harmonic of order $k$. Let $\mathbb{Y}_k(n)$ denote the subspace of all homogeneous harmonics of order $k$ restricted to the unit sphere $S^{n-1}$ in $\mathbb{R}^n$. The functions in $\mathbb{Y}_k(n)$ are called spherical harmonics of order $k$. We will denote by $\{Y_{k,j}(n;x)\}_{j=1}^{N(n,k)}$ any fixed orthonormal basis of $\mathbb{Y}_k(n)$, where

$$N(n,k) = \dim \mathbb{Y}_k(n) = \frac{(2k+n-2)(k+n-3)!}{k!\,(n-2)!}, \quad k \geq 0.$$

Theorem 2. Let $X = S^{n-1}$, $n \in \mathbb{N}$, $n \geq 2$. Let $\mu$ be the uniform probability distribution on $S^{n-1}$. For $K(x,t) = \exp(-\frac{\|x-t\|^2}{\sigma^2})$, $\sigma > 0$,

$$\lambda_k = e^{-2/\sigma^2}\, \sigma^{n-2}\, I_{k+\frac{n-2}{2}}\!\left(\frac{2}{\sigma^2}\right) \Gamma\!\left(\frac{n}{2}\right) \qquad (5)$$

for all $k \in \mathbb{N} \cup \{0\}$, where $I$ denotes the modified Bessel function of the first kind, defined below. Each $\lambda_k$ occurs with multiplicity $N(n,k)$, with the corresponding eigenfunctions being spherical harmonics of order $k$ on $S^{n-1}$. The $\lambda_k$'s are decreasing if $\sigma \geq (2/n)^{1/2}$. Furthermore,

$$A_1\, \frac{(e/\sigma^2)^k}{(k+\frac{n-2}{2})^{k+\frac{n-1}{2}}} \;<\; \lambda_k \;<\; A_2\, \frac{(e/\sigma^2)^k}{(k+\frac{n-2}{2})^{k+\frac{n-1}{2}}} \qquad (6)$$

for constants $A_1, A_2$ depending on $\sigma$ and $n$ given below.

Remark 1. $A_1 = \frac{e^{-2/\sigma^2 - 1/6}\, e^{(n-2)/2}\, \Gamma(n/2)}{\sqrt{2\pi}}$ and $A_2 = \frac{e^{-2/\sigma^2 + 1/\sigma^4}\, e^{(n-2)/2}\, \Gamma(n/2)}{\sqrt{2\pi}}$; these arise from the series expansion of $I_\nu$ and Stirling's formula (29), as in Appendix A. For $\nu, z \in \mathbb{C}$,

$$I_\nu(z) = \sum_{j=0}^{\infty} \frac{1}{j!\,\Gamma(\nu+j+1)} \left(\frac{z}{2}\right)^{\nu+2j}.$$

Theorem 3. Let $X = S^{n-1}$, $n \in \mathbb{N}$, $n \geq 2$, and $d \in \mathbb{N}$. Let $\mu$ be the uniform probability distribution on $S^{n-1}$. For $K(x,t) = (1 + \langle x,t\rangle)^d$, the nonzero eigenvalues of $L_K : L^2_\mu(X) \to L^2_\mu(X)$ are

$$\lambda_k = \frac{2^{d+n-2}\, d!\;\Gamma(d + \frac{n-1}{2})\,\Gamma(\frac{n}{2})}{\sqrt{\pi}\,(d-k)!\;\Gamma(d+k+n-1)} \qquad (7)$$

for $0 \leq k \leq d$. Each $\lambda_k$ occurs with multiplicity $N(n,k)$, with the corresponding eigenfunctions being spherical harmonics of order $k$ on $S^{n-1}$. Furthermore, the $\lambda_k$'s form a decreasing sequence and

$$\frac{B_1}{(k+d+\frac{n-1}{2})^{d+\frac{n-3}{2}}} \;<\; \lambda_k \;<\; \frac{B_2}{(k+d+\frac{n-1}{2})^{d+\frac{n-3}{2}}} \qquad (8)$$

where $0 \leq k \leq d$, for constants $B_1, B_2$ depending on $d$ and $n$ given below.

Remark 2. The constants $B_1$ and $B_2$ depend only on $d$ and $n$; like $A_1$ and $A_2$, they are obtained from Stirling's formula (29) applied to the Gamma functions in (7).
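The closed form (5) is easy to evaluate numerically. The sketch below (my own, assuming SciPy for the Bessel and Gamma functions) computes the Gaussian eigenvalues on $S^{n-1}$ and checks the trace identity $\sum_k N(n,k)\,\lambda_k = \int_{S^{n-1}} K(x,x)\,d\mu(x) = 1$, which holds since $\mu$ is a probability measure and $K(x,x) = 1$.

```python
# Evaluate the eigenvalues (5) of the Gaussian kernel on S^{n-1} and check
# the trace identity sum_k N(n,k) lambda_k = K(x,x) = 1.
from math import factorial
import numpy as np
from scipy.special import iv, gamma

def N(n, k):
    # N(n,k) = (2k+n-2)(k+n-3)! / (k! (n-2)!), with N(n,0) = 1
    if k == 0:
        return 1
    return (2 * k + n - 2) * factorial(k + n - 3) // (factorial(k) * factorial(n - 2))

def lam(n, k, sigma):
    # lambda_k = e^{-2/sigma^2} sigma^{n-2} I_{k+(n-2)/2}(2/sigma^2) Gamma(n/2)
    return (np.exp(-2 / sigma**2) * sigma ** (n - 2)
            * iv(k + (n - 2) / 2, 2 / sigma**2) * gamma(n / 2))

n, sigma = 3, 1.0
eigs = [lam(n, k, sigma) for k in range(60)]
print("lambda_0..3:", np.round(eigs[:4], 6))                   # rapidly decreasing
print("trace:", sum(N(n, k) * e for k, e in enumerate(eigs)))  # ~ 1.0
```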

2.2 Example on the Hypercube $\{-1,1\}^n$

We will now give an example on the hypercube $\{-1,1\}^n$. Let $M_k = \{\alpha = (\alpha_i)_{i=1}^n : \alpha_i \in \{0,1\},\; |\alpha| = \alpha_1 + \cdots + \alpha_n = k\}$; then the set $\{x^\alpha\}_{\alpha \in M_k,\, 0 \leq k \leq n}$ consists of the multilinear monomials $\{1, x_1, x_1x_2, \ldots, x_1 x_2 \cdots x_n\}$.

Theorem 4. Let $X = \{-1,1\}^n$. Let $d \in \mathbb{N}$, $d \leq n$, be fixed. Let $K(x,t) = (1 + \langle x,t\rangle)^d$ on $X \times X$. Let $\mu$ be the uniform distribution on $X$. Then the nonzero eigenvalues $\lambda^d_k$ of $L_K : L^2_\mu(X) \to L^2_\mu(X)$ satisfy

$$\lambda^{d+1}_k = k\,\lambda^d_{k-1} + \lambda^d_k + (n-k)\,\lambda^d_{k+1}, \qquad (9)$$

$$\lambda^d_0 \geq \lambda^d_1 \geq \cdots \geq \lambda^d_{d-1} = \lambda^d_d = d!, \qquad (10)$$

and $\lambda^d_k = 0$ for $k > d$. The corresponding $L^2_\mu(X)$-normalized eigenfunctions for each $\lambda_k$ are $\{x^\alpha\}_{\alpha \in M_k}$.

Example 1 ($d = 2$). The recurrence relation (9) involves two indices, and hence a closed analytic expression for $\lambda^d_k$ is hard to find for large $d$. It is straightforward, however, to write a computer program for computing $\lambda^d_k$; a short sketch follows this example. For $d = 2$,

$$\lambda_0 = n+1, \qquad \lambda_1 = \lambda_2 = 2,$$

with corresponding eigenfunctions $1$, $\{x_1, \ldots, x_n\}$, and $\{x_1x_2, x_1x_3, \ldots, x_{n-1}x_n\}$, respectively.
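A minimal implementation of the recursion (9) (my own sketch, plain Python; the final assertion is an additional sanity test using $\sum_k \binom{n}{k}\lambda^d_k = (1+n)^d$, which follows from taking the trace of $L_K$):

```python
# Compute the eigenvalues lambda_k^d on {-1,1}^n from the recursion (9),
# starting from the base case d = 1, where lambda_0^1 = lambda_1^1 = 1.
from math import comb

def hypercube_eigenvalues(n, d):
    lam = [0.0] * (n + 2)                  # pad so lam[k+1] always exists
    lam[0] = lam[1] = 1.0                  # d = 1
    for _ in range(d - 1):                 # (9): lam_k <- k lam_{k-1} + lam_k + (n-k) lam_{k+1}
        lam = [(k * lam[k - 1] if k > 0 else 0.0) + lam[k] + (n - k) * lam[k + 1]
               for k in range(n + 1)] + [0.0]
    return lam[: n + 1]

eigs = hypercube_eigenvalues(n=4, d=2)
print(eigs)                                # [5.0, 2.0, 2.0, 0.0, 0.0]: Example 1 with n = 4
# Sanity check: sum_k |M_k| lambda_k^d = (1 + <x,x>)^d = (1+n)^d, |M_k| = C(n,k).
assert abs(sum(comb(4, k) * e for k, e in enumerate(eigs)) - 5**2) < 1e-9
```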

2.3 Example on the Unit Ball $B^n$

Except for the homogeneous polynomial kernel $K(x,t) = \langle x,t\rangle^d$, the computation of the spectrum of $L_K$ on the unit ball $B^n$ is much more difficult analytically than that on $S^{n-1}$. For $K(x,t) = (1+\langle x,t\rangle)^d$ and small values of $d$, it is still possible, however, to obtain explicit answers.

Example 2 ($X = B^n$, $K(x,t) = (1+\langle x,t\rangle)^2$). Let $\mu$ be the uniform measure on $B^n$. The eigenspace spanned by $\{x_1, \ldots, x_n\}$ corresponds to the eigenvalue $\lambda_1 = \frac{2}{n+2}$. The eigenspace spanned by $\{\|x\|^2\, Y_{2,j}(n; \frac{x}{\|x\|})\}_{j=1}^{N(n,2)}$ corresponds to the eigenvalue $\lambda_2 = \frac{2}{(n+2)(n+4)}$. The eigenvalues that correspond to $\mathrm{span}\{1, \|x\|^2\}$ are

$$\lambda_{0,1} = \frac{(n+2)(n+5) + \sqrt{D}}{2(n+2)(n+4)}, \qquad \lambda_{0,2} = \frac{(n+2)(n+5) - \sqrt{D}}{2(n+2)(n+4)},$$

where $D = (n+2)^2(n+5)^2 - 16(n+4)$.

3 Unboundedness of Normalized Eigenfunctions

It is known that the $L^2_\mu$-normalized eigenfunctions $\{\phi_k\}$ are generally unbounded, that is, in general

$$\sup_{k \in \mathbb{N}} \|\phi_k\|_\infty = \infty.$$

This was first pointed out by Smale, with the first counterexample given in [14]. This phenomenon is very common, however, as the following result shows.

Theorem 5. Let $X = S^{n-1}$, $n \geq 3$. Let $\mu$ be the Lebesgue measure on $S^{n-1}$. Let $f : [-1,1] \to \mathbb{R}$ be a continuous function, giving rise to a Mercer kernel $K(x,t) = f(\langle x,t\rangle)$ on $S^{n-1} \times S^{n-1}$. If infinitely many of the eigenvalues of $L_K : L^2_\mu(S^{n-1}) \to L^2_\mu(S^{n-1})$ are nonzero, then for the set of corresponding $L^2_\mu$-normalized eigenfunctions $\{\phi_k\}_{k=1}^\infty$,

$$\sup_{k \in \mathbb{N}} \|\phi_k\|_\infty = \infty. \qquad (11)$$

Remark 3. This is in sharp contrast with the case $n = 2$, where we will show that $\sup_k \|\phi_k\|_\infty \leq \frac{1}{\sqrt{\pi}}$, with the supremum attained on the functions $\{\frac{\cos k\theta}{\sqrt{\pi}}, \frac{\sin k\theta}{\sqrt{\pi}}\}_{k \in \mathbb{N}}$.

Theorem 5 applies in particular to the Gaussian kernel $K(x,t) = \exp(-\frac{\|x-t\|^2}{\sigma^2})$. Hence care needs to be taken in applying analysis that requires $C_K = \sup_k \|\phi_k\|_\infty < \infty$; see for example [5].

4 Feature Maps

4.1 Examples of Feature Maps via Mercer's Theorem

A natural feature map that arises immediately from Mercer's theorem is

$$\Phi_\mu : X \to \ell^2, \qquad \Phi_\mu(x) = \left(\sqrt{\lambda_k}\,\phi_k(x)\right)_{k=1}^\infty, \qquad (12)$$

where, if only $N < \infty$ of the eigenvalues are strictly positive, then $\Phi_\mu : X \to \mathbb{R}^N$. This is the map that is most often covered in the machine learning literature.

Example 3 ($n = 2$, $d = 2$, $X = S^1$). Theorem 3 gives the eigenvalues $(\frac{3}{2}, 1, \frac{1}{4})$, with normalized eigenfunctions $(1, \sqrt{2}\,x_1, \sqrt{2}\,x_2, \sqrt{2}\,(x_1^2 - x_2^2), 2\sqrt{2}\,x_1x_2) = (1, \sqrt{2}\cos\theta, \sqrt{2}\sin\theta, \sqrt{2}\cos 2\theta, \sqrt{2}\sin 2\theta)$, where $x_1 = \cos\theta$, $x_2 = \sin\theta$, giving rise to the feature map

$$\Phi_\mu(x) = \left(\sqrt{\tfrac{3}{2}},\; \sqrt{2}\,x_1,\; \sqrt{2}\,x_2,\; \tfrac{1}{\sqrt{2}}(x_1^2 - x_2^2),\; \sqrt{2}\,x_1x_2\right).$$

Example 4 ($n = 2$, $d = 2$, $X = \{-1,1\}^2$). Theorem 4 gives

$$\Phi_\mu(x) = \left(\sqrt{3},\; \sqrt{2}\,x_1,\; \sqrt{2}\,x_2,\; \sqrt{2}\,x_1x_2\right).$$

Observation. (i) As our notation suggests, $\Phi_\mu$ depends on the particular measure $\mu$ in the definition of the operator $L_K$ and thus is not unique. Each measure $\mu$ gives rise to a different system of eigenvalues and eigenfunctions $(\lambda_k, \phi_k)_{k=1}^\infty$ and therefore a different $\Phi_\mu$. (ii) In Theorems 2 and 3, the multiplicity of the $\lambda_k$'s means that for each choice of orthonormal basis of the space $\mathbb{Y}_k(n)$ of spherical harmonics of order $k$, there is a different feature map. Thus there are infinitely many feature maps arising from the uniform probability distribution on $S^{n-1}$ alone.
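The finite-dimensional feature map of Example 4 can be verified directly; a small sketch (my own, plain Python, exhaustive over all pairs of points):

```python
# Verify Example 4: on X = {-1,1}^2, Phi(x) = (sqrt(3), sqrt(2) x_1, sqrt(2) x_2,
# sqrt(2) x_1 x_2) satisfies <Phi(x), Phi(t)> = (1 + <x,t>)^2 for all x, t.
import itertools, math

def phi(x):
    return (math.sqrt(3), math.sqrt(2) * x[0], math.sqrt(2) * x[1],
            math.sqrt(2) * x[0] * x[1])

cube = list(itertools.product((-1, 1), repeat=2))
for x in cube:
    for t in cube:
        K = (1 + x[0] * t[0] + x[1] * t[1]) ** 2
        assert abs(K - sum(a * b for a, b in zip(phi(x), phi(t)))) < 1e-12
print("Phi reproduces K on all pairs of points")
```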

4.2 Examples of Feature Maps not via Mercer's Theorem

Feature maps do not necessarily arise from Mercer's theorem. Consider any set $X$ and any pointwise-defined, positive definite kernel $K$ on $X \times X$. For each $x \in X$, let $K_x : X \to \mathbb{R}$ be defined by $K_x(t) = K(x,t)$, and let

$$\mathcal{H}_K = \overline{\mathrm{span}}\{K_x : x \in X\} \qquad (13)$$

be the Reproducing Kernel Hilbert Space (RKHS) induced by $K$, with the inner product satisfying $\langle K_x, K_t\rangle_K = K(x,t)$; see [1]. The following feature map is then immediate:

$$\Phi_K : X \to \mathcal{H}_K, \qquad \Phi_K(x) = K_x. \qquad (14)$$

In this section we discuss, via examples, two other methods for obtaining feature maps. Let $X \subset \mathbb{R}^n$ be any subset. Consider the Gaussian kernel $K(x,t) = \exp(-\frac{\|x-t\|^2}{\sigma^2})$ on $X \times X$, which admits the expansion

$$K(x,t) = e^{-\|x\|^2/\sigma^2}\, e^{-\|t\|^2/\sigma^2} \sum_{k=0}^{\infty} \frac{(2/\sigma^2)^k}{k!} \sum_{|\alpha|=k} C^k_\alpha\, x^\alpha t^\alpha, \qquad (15)$$

where $C^k_\alpha = \frac{k!}{\alpha_1!\cdots\alpha_n!}$. This implies the feature map $\Phi_g : X \to \ell^2$, where

$$\Phi_g(x) = e^{-\|x\|^2/\sigma^2} \left( \sqrt{\frac{(2/\sigma^2)^k\, C^k_\alpha}{k!}}\; x^\alpha \right)_{|\alpha|=k,\; k=0}^{\infty}.$$

Remark 4. The standard polynomial feature maps in machine learning, see for example ([7], page 8), are obtained in exactly the same way.
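Truncating the expansion (15) at a finite total degree gives a computable finite-dimensional approximation of $\Phi_g$; the sketch below (my own, plain Python/NumPy, with illustrative points and bandwidth) shows the inner products converging to the Gaussian kernel as the truncation degree grows.

```python
# Truncate the feature map Phi_g of (15) at total degree k_max and compare
# <Phi_g(x), Phi_g(t)> with the exact Gaussian kernel.
import itertools, math
import numpy as np

def phi_g(x, sigma, k_max):
    feats = []
    for k in range(k_max + 1):
        for alpha in itertools.product(range(k + 1), repeat=len(x)):
            if sum(alpha) != k:
                continue
            C = math.factorial(k) / math.prod(math.factorial(a) for a in alpha)
            coef = math.sqrt((2 / sigma**2) ** k * C / math.factorial(k))
            feats.append(coef * math.prod(xi**a for xi, a in zip(x, alpha)))
    return math.exp(-np.dot(x, x) / sigma**2) * np.array(feats)

x, t, sigma = np.array([0.3, -0.2]), np.array([0.1, 0.5]), 1.0
exact = math.exp(-np.sum((x - t) ** 2) / sigma**2)
for k_max in (1, 3, 6):
    approx = float(np.dot(phi_g(x, sigma, k_max), phi_g(t, sigma, k_max)))
    print(k_max, approx - exact)           # error -> 0 as k_max grows
```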

Consider next a special class of kernels that is widely used in practice, called convolution kernels. We recall that for a function $f \in L^1(\mathbb{R}^n)$, its Fourier transform is defined to be

$$\hat{f}(\xi) = \int_{\mathbb{R}^n} f(x)\, e^{-i\langle\xi, x\rangle}\, dx.$$

By a Fourier transform computation, it may be shown that if $\mu : \mathbb{R}^n \to \mathbb{R}$ is even and nonnegative, with $\mu, \sqrt{\mu} \in L^1(\mathbb{R}^n)$, then the kernel $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ defined by

$$K(x,t) = \int_{\mathbb{R}^n} \mu(u)\, e^{i\langle x-t,\, u\rangle}\, du \qquad (16)$$

is continuous, symmetric, and positive definite. Furthermore, for any $x, t \in \mathbb{R}^n$,

$$K(x,t) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \widehat{\sqrt{\mu}}(x-u)\; \widehat{\sqrt{\mu}}(t-u)\, du. \qquad (17)$$

The following then is a feature map of $K$ on $X \times X$, $X \subset \mathbb{R}^n$:

$$\Phi_{\mathrm{conv}} : X \to L^2(\mathbb{R}^n), \qquad (\Phi_{\mathrm{conv}}(x))(u) = \frac{1}{(2\pi)^{n/2}}\, \widehat{\sqrt{\mu}}(x-u). \qquad (18)$$

For the Gaussian kernel $e^{-\|x-t\|^2/\sigma^2} = \int_{\mathbb{R}^n} \mu(u)\, e^{i\langle x-t,\, u\rangle}\, du$, we have

$$\mu(u) = \left(\frac{\sigma}{2\sqrt{\pi}}\right)^n e^{-\sigma^2\|u\|^2/4} \qquad\text{and}\qquad (\Phi_{\mathrm{conv}}(x))(u) = \left(\frac{4}{\pi\sigma^2}\right)^{n/4} e^{-2\|x-u\|^2/\sigma^2}.$$

One can similarly obtain feature maps for the inverse multiquadric, exponential, or B-spline kernels. The identity

$$e^{-\|x-t\|^2/\sigma^2} = \left(\frac{4}{\pi\sigma^2}\right)^{n/2} \int_{\mathbb{R}^n} e^{-2\|x-u\|^2/\sigma^2}\, e^{-2\|t-u\|^2/\sigma^2}\, du$$

can also be verified directly, as done in [10], where implications of the Gaussian feature map $\Phi_{\mathrm{conv}}(x)$ above are also discussed.
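The last identity is easy to confirm by numerical quadrature; a one-dimensional check (my own sketch, assuming SciPy):

```python
# Check, in one dimension, the factorization behind Phi_conv for the Gaussian:
# exp(-(x-t)^2/sigma^2)
#   = sqrt(4/(pi sigma^2)) * Int exp(-2(x-u)^2/sigma^2) exp(-2(t-u)^2/sigma^2) du.
import numpy as np
from scipy.integrate import quad

x, t, sigma = 0.7, -0.4, 1.3
integral, _ = quad(lambda u: np.exp(-2 * (x - u) ** 2 / sigma**2)
                             * np.exp(-2 * (t - u) ** 2 / sigma**2),
                   -np.inf, np.inf)
lhs = np.exp(-((x - t) ** 2) / sigma**2)
rhs = np.sqrt(4 / (np.pi * sigma**2)) * integral
print(lhs, rhs)    # the two values agree
```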

4.3 Equivalence of Feature Maps

It is known ([7], page 39) that, given a set $X$ and a pointwise-defined, symmetric, positive definite kernel $K$ on $X \times X$, all feature maps from $X$ into Hilbert spaces are essentially equivalent. In this section, we make this statement precise. Let $\mathcal{H}$ be a Hilbert space and $\Phi : X \to \mathcal{H}$ be such that $\langle \Phi_x, \Phi_t\rangle_{\mathcal{H}} = K(x,t)$ for all $x, t \in X$, where $\Phi_x = \Phi(x)$. The evaluation functional $L_x : \mathcal{H} \to \mathbb{R}$ given by $L_x v = \langle v, \Phi_x\rangle_{\mathcal{H}}$, where $x$ varies over $X$, defines an inclusion map

$$L_\Phi : \mathcal{H} \to \mathbb{R}^X, \qquad (L_\Phi v)(x) = \langle v, \Phi_x\rangle_{\mathcal{H}},$$

where $\mathbb{R}^X$ denotes the vector space of pointwise-defined, real-valued functions on $X$. Observe that, as a vector space of functions, $\mathcal{H}_K \subset \mathbb{R}^X$.

Proposition 1. Let $\mathcal{H}_\Phi = \overline{\mathrm{span}}\{\Phi_x : x \in X\}$, a subspace of $\mathcal{H}$. The restriction of $L_\Phi$ to $\mathcal{H}_\Phi$ is an isometric isomorphism between $\mathcal{H}_\Phi$ and $\mathcal{H}_K$.

Proof. First, $L_\Phi$ is bijective from $\mathcal{H}_\Phi$ onto the image $L_\Phi(\mathcal{H}_\Phi)$, since $\ker L_\Phi = \mathcal{H}_\Phi^\perp$. Under the map $L_\Phi$, for each $x, t \in X$, $(L_\Phi \Phi_x)(t) = \langle \Phi_x, \Phi_t\rangle = K_x(t)$; thus $K_x \equiv L_\Phi \Phi_x$ as functions on $X$. This implies that $\mathrm{span}\{K_x : x \in X\}$ is isomorphic to $\mathrm{span}\{\Phi_x : x \in X\}$ as vector spaces. The isometric isomorphism of $\mathcal{H}_\Phi = \overline{\mathrm{span}}\{\Phi_x : x \in X\}$ and $\mathcal{H}_K = \overline{\mathrm{span}}\{K_x : x \in X\}$ then follows from $\langle \Phi_x, \Phi_t\rangle_{\mathcal{H}} = K(x,t) = \langle K_x, K_t\rangle_K$. This completes the proof.

Remark 5. Each choice of $\Phi$ is thus equivalent to a factorization of the map $\Phi_K : x \mapsto K_x \in \mathcal{H}_K$ through $\mathcal{H}_\Phi$; that is, the diagram

$$x \in X \;\xrightarrow{\;\Phi\;}\; \Phi_x \in \mathcal{H}_\Phi \;\xrightarrow{\;L_\Phi\;}\; K_x \in \mathcal{H}_K \qquad (19)$$

commutes: $\Phi_K = L_\Phi \circ \Phi$. We will call $\Phi_K : x \mapsto K_x \in \mathcal{H}_K$ the canonical feature map associated with $K$.

5 Smoothing Properties of Kernels on the Sphere

Having discussed feature maps, we will in this section analyze the smoothing properties of the polynomial and Gaussian kernels and compare them with those of spline kernels on the sphere $S^{n-1}$. In the spline smoothing problem on $S^1$ as described in [12], one solves the minimization problem

$$\min_f\; \frac{1}{m}\sum_{i=1}^m (f(x_i) - y_i)^2 + \lambda \int_0^1 (f^{(m)}(t))^2\, dt \qquad (20)$$

for $x_i \in [0,1]$ and $f \in W_m$, where $J_m(f) = \int_0^1 (f^{(m)}(t))^2\, dt$ is the square norm of the RKHS

$$W_m^0 = \{f : f, f', \ldots, f^{(m-1)} \text{ absolutely continuous},\; f^{(m)} \in L^2[0,1],\; f^{(j)}(0) = f^{(j)}(1),\; j = 0, 1, \ldots, m-1\}.$$

The space $W_m$ is then the RKHS

$$W_m = \{1\} \oplus W_m^0 = \left\{f : \|f\|_K^2 = \left(\int_0^1 f(t)\, dt\right)^2 + \int_0^1 (f^{(m)}(t))^2\, dt < \infty \right\}$$

induced by a kernel $K$. One particular feature of spline smoothing, on $S^1$, $S^2$, or $\mathbb{R}^n$, is that in general the RKHS $W_m$ does not have a closed-form kernel $K$ that is efficiently computable. This is in contrast with the RKHS used in kernel machine learning, all of which correspond to closed-form kernels that can be evaluated efficiently. It is not clear, however, whether the norms in these RKHS correspond to smoothness functionals. In this section, we will show that for the polynomial and Gaussian kernels on $S^{n-1}$, they do.

5.1 The Iterated Laplacian and Splines on the Sphere $S^{n-1}$

Splines on $S^{n-1}$ for $n = 2$ and $n = 3$, as treated by Wahba [11], [12], can be generalized to any $n \in \mathbb{N}$ via the iterated Laplacian (also called the Laplace-Beltrami operator) on $S^{n-1}$. The RKHS corresponding to $W_m$ in (20) is a subspace of $L^2(S^{n-1})$ described by

$$\mathcal{H}_K = \left\{f : \|f\|_K^2 = \left(\int_{S^{n-1}} f(x)\, dx\right)^2 + \int_{S^{n-1}} f(x)\, \Delta^m f(x)\, dx < \infty \right\}.$$

The Laplacian on $S^{n-1}$ has eigenvalues $\lambda_k = k(k+n-2)$, $k \geq 0$, with corresponding eigenfunctions $\{Y_{k,j}(n;x)\}_{j=1}^{N(n,k)}$, which form an orthonormal basis of the space $\mathbb{Y}_k(n)$ of spherical harmonics of order $k$. For $f \in L^2(S^{n-1})$, if we use the expansion

$$f = \frac{a_0}{\sqrt{|S^{n-1}|}} + \sum_{k=1}^{\infty}\sum_{j=1}^{N(n,k)} a_{k,j}\, Y_{k,j}(n;x),$$

then the space $\mathcal{H}_K$ takes the form

$$\mathcal{H}_K = \left\{f \in L^2(S^{n-1}) : \|f\|_K^2 = a_0^2\, |S^{n-1}| + \sum_{k=1}^{\infty}\sum_{j=1}^{N(n,k)} [k(k+n-2)]^m\, a_{k,j}^2 < \infty \right\}$$

and thus the corresponding kernel is

$$K(x,t) = 1 + \sum_{k=1}^{\infty}\sum_{j=1}^{N(n,k)} \frac{Y_{k,j}(n;x)\, Y_{k,j}(n;t)}{[k(k+n-2)]^m}, \qquad (21)$$

which is well-defined iff $m > \frac{n-1}{2}$. Let $P_k(n;t)$ denote the Legendre polynomial of degree $k$ in dimension $n$; then (21) takes the form

$$K(x,t) = 1 + \frac{1}{|S^{n-1}|}\sum_{k=1}^{\infty} \frac{N(n,k)}{[k(k+n-2)]^m}\, P_k(n; \langle x,t\rangle), \qquad (22)$$

which does not have a closed form in general; for the case $n = 3$, see [11].

Remark 6. Clearly $m$ can be replaced by any real number $s > \frac{n-1}{2}$.
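Although (22) has no closed form in general, the truncated series is straightforward to evaluate; a sketch for $n = 3$, where $P_k(3;t)$ is the classical Legendre polynomial and $N(3,k) = 2k+1$ (my own, assuming SciPy):

```python
# Evaluate the spline kernel (22) on S^2 (n = 3) by truncating the series.
# Convergence requires m > (n-1)/2 = 1; the coefficients decay like k^{1-2m}.
import numpy as np
from scipy.special import eval_legendre

def spline_kernel_S2(cos_gamma, m=2, k_max=500):
    k = np.arange(1, k_max + 1)
    coeff = (2 * k + 1) / ((k * (k + 1.0)) ** m * 4 * np.pi)
    return 1.0 + np.sum(coeff * eval_legendre(k, cos_gamma))

for c in (1.0, 0.0, -1.0):
    print(c, spline_kernel_S2(c))          # kernel as a function of <x,t> = cos(gamma)
```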

5.2 Smoothing Properties of Polynomial and Gaussian Kernels

Let $\nabla_n$ denote the gradient operator on $S^{n-1}$ (also called the first-order Beltrami operator; see [6], page 79, for a definition). Let $Y_k \in \mathbb{Y}_k(n)$, $k \geq 0$, be $L^2$-normalized; then

$$\|\nabla_n Y_k\|^2_{L^2(S^{n-1})} = \int_{S^{n-1}} |\nabla_n Y_k(x)|^2\, dS^{n-1}(x) = k(k+n-2). \qquad (23)$$

This shows that spherical harmonics of higher order are less smooth. This is particularly transparent in the case $n = 2$ with the Fourier basis functions $\{1, \cos k\theta, \sin k\theta\}_{k \in \mathbb{N}}$: as $k$ increases, the functions oscillate more rapidly. It follows that any regularization term $\|f\|_K^2$ in problems such as (20), where $K$ possesses a decreasing spectrum $\lambda_k$ ($k$ corresponds to the order of the spherical harmonics), will have a smoothing effect: the higher-order spherical harmonics, which are less smooth, will be penalized more. The decreasing-spectrum property holds for the spline kernels, the polynomial kernel $(1+\langle x,t\rangle)^d$, and the Gaussian kernel for $\sigma \geq (2/n)^{1/2}$, as we showed in Theorems 2 and 3. Hence all these kernels possess smoothing properties on $S^{n-1}$. Furthermore, Theorem 2 shows that for the Gaussian kernel, for all $k$,

$$A_1\, \frac{(e/\sigma^2)^k}{(k+\frac{n-2}{2})^{k+\frac{n-1}{2}}} \;<\; \lambda_k \;<\; A_2\, \frac{(e/\sigma^2)^k}{(k+\frac{n-2}{2})^{k+\frac{n-1}{2}}},$$

and Theorem 3 shows that for the polynomial kernel $(1+\langle x,t\rangle)^d$,

$$\frac{B_1}{(k+d+\frac{n-1}{2})^{d+\frac{n-3}{2}}} \;<\; \lambda_k \;<\; \frac{B_2}{(k+d+\frac{n-1}{2})^{d+\frac{n-3}{2}}}$$

for $0 \leq k \leq d$. Comparing these with the eigenvalues of the spline kernels,

$$\lambda_k = \frac{1}{[k(k+n-2)]^m}$$

for $k \geq 1$, we see that the Gaussian kernel has the sharpest smoothing property, as can be seen from the exponential decay of its eigenvalues. For $K(x,t) = (1+\langle x,t\rangle)^d$, if $d > 2m - \frac{n-3}{2}$, then $K$ has a sharper smoothing property than a spline kernel of order $m$. Moreover, all spherical harmonics of order greater than $d$ are filtered out; hence choosing $K$ amounts to choosing a hypothesis space of bandlimited functions on $S^{n-1}$.

A Proofs of Results

The proofs of the results on $S^{n-1}$ all make use of properties of spherical harmonics on $S^{n-1}$, which can be found in [6]. We will prove Theorem 2 (Theorem 3 is similar) and Theorem 5.

A.1 Proof of Theorem 2

Let $f : [-1,1] \to \mathbb{R}$ be a continuous function and let $Y_k \in \mathbb{Y}_k(n)$ for $k \geq 0$. Then the Funk-Hecke formula ([6], page 30) states that for any $x \in S^{n-1}$,

$$\int_{S^{n-1}} f(\langle x,t\rangle)\, Y_k(t)\, dS^{n-1}(t) = \lambda_k\, Y_k(x), \qquad (24)$$

where

$$\lambda_k = |S^{n-2}| \int_{-1}^{1} f(t)\, P_k(n;t)\, (1-t^2)^{\frac{n-3}{2}}\, dt \qquad (25)$$

and $P_k(n;t)$ denotes the Legendre polynomial of degree $k$ in dimension $n$. Since the spherical harmonics $\{\{Y_{k,j}(n;x)\}_{j=1}^{N(n,k)}\}_{k=0}^{\infty}$ form an orthonormal basis of $L^2(S^{n-1})$, an immediate consequence of the Funk-Hecke formula is that if $K$ on $S^{n-1} \times S^{n-1}$ is defined by $K(x,t) = f(\langle x,t\rangle)$ and $\mu$ is the Lebesgue measure on $S^{n-1}$, then the eigenvalues of $L_K : L^2_\mu(S^{n-1}) \to L^2_\mu(S^{n-1})$ are given precisely by (25), with the corresponding orthonormal eigenfunctions of $\lambda_k$ being $\{Y_{k,j}(n;x)\}_{j=1}^{N(n,k)}$. The multiplicity of $\lambda_k$ is therefore $N(n,k) = \dim \mathbb{Y}_k(n)$.

On $S^{n-1}$, $e^{-\|x-t\|^2/\sigma^2} = e^{-2/\sigma^2}\, e^{2\langle x,t\rangle/\sigma^2}$. Thus

$$\lambda_k = e^{-2/\sigma^2}\, |S^{n-2}| \int_{-1}^1 e^{2t/\sigma^2}\, P_k(n;t)\, (1-t^2)^{\frac{n-3}{2}}\, dt = e^{-2/\sigma^2}\, |S^{n-2}|\, \Gamma\!\left(\tfrac{1}{2}\right)\Gamma\!\left(\tfrac{n-1}{2}\right) \sigma^{n-2}\, I_{k+\frac{n-2}{2}}\!\left(\tfrac{2}{\sigma^2}\right)$$

by Lemma 1, and this equals

$$e^{-2/\sigma^2}\, \sigma^{n-2}\, I_{k+\frac{n-2}{2}}\!\left(\tfrac{2}{\sigma^2}\right) \Gamma\!\left(\tfrac{n}{2}\right) |S^{n-1}|.$$

Normalizing so that $\mu(S^{n-1}) = 1$ gives the required expression for $\lambda_k$ in (5).

Lemma 1. Let $f(t) = e^{rt}$. Then

$$\int_{-1}^1 f(t)\, P_k(n;t)\, (1-t^2)^{\frac{n-3}{2}}\, dt = \Gamma\!\left(\tfrac{1}{2}\right)\Gamma\!\left(\tfrac{n-1}{2}\right)\left(\tfrac{2}{r}\right)^{\frac{n-2}{2}} I_{k+\frac{n-2}{2}}(r). \qquad (26)$$

Proof. We apply the following, which follows from ([13], page 79, formula 9):

$$\int_{-1}^1 e^{rt}\, (1-t^2)^{\nu - \frac{1}{2}}\, dt = \Gamma\!\left(\tfrac{1}{2}\right)\Gamma\!\left(\nu + \tfrac{1}{2}\right)\left(\tfrac{2}{r}\right)^{\nu} I_\nu(r), \qquad (27)$$

and the Rodrigues rule ([6], page 3), which states that for $f \in C^k([-1,1])$,

$$\int_{-1}^1 f(t)\, P_k(n;t)\, (1-t^2)^{\frac{n-3}{2}}\, dt = R_k(n) \int_{-1}^1 f^{(k)}(t)\, (1-t^2)^{k+\frac{n-3}{2}}\, dt, \qquad (28)$$

where $R_k(n) = \frac{\Gamma(\frac{n-1}{2})}{2^k\, \Gamma(k+\frac{n-1}{2})}$. For $f(t) = e^{rt}$, we have

$$\int_{-1}^1 e^{rt}\, P_k(n;t)\, (1-t^2)^{\frac{n-3}{2}}\, dt = R_k(n)\, r^k \int_{-1}^1 e^{rt}\, (1-t^2)^{k+\frac{n-3}{2}}\, dt = R_k(n)\, r^k\, \Gamma\!\left(\tfrac{1}{2}\right)\Gamma\!\left(k+\tfrac{n-1}{2}\right)\left(\tfrac{2}{r}\right)^{k+\frac{n-2}{2}} I_{k+\frac{n-2}{2}}(r).$$

Substituting in the value of $R_k(n)$ gives the desired answer.

Lemma 2. The sequence $\{\lambda_k\}_{k=0}^{\infty}$ is decreasing if $\sigma \geq (2/n)^{1/2}$.

Proof. We will first prove that $\frac{\lambda_k}{\lambda_{k+1}} > (k + \frac{n}{2})\sigma^2$. We have

$$I_{k+\frac{n}{2}}\!\left(\tfrac{2}{\sigma^2}\right) = \left(\tfrac{1}{\sigma^2}\right)^{k+\frac{n}{2}} \sum_{j=0}^{\infty} \frac{(1/\sigma^4)^j}{j!\,\Gamma(j+k+\frac{n}{2}+1)} = \left(\tfrac{1}{\sigma^2}\right)^{k+\frac{n}{2}} \sum_{j=0}^{\infty} \frac{(1/\sigma^4)^j}{j!\,(j+k+\frac{n}{2})\,\Gamma(j+k+\frac{n}{2})}$$

$$< \frac{(1/\sigma^2)^{k+\frac{n}{2}}}{k+\frac{n}{2}} \sum_{j=0}^{\infty} \frac{(1/\sigma^4)^j}{j!\,\Gamma(j+k+\frac{n}{2})} = \frac{\sigma^{-2}}{k+\frac{n}{2}}\, I_{k+\frac{n-2}{2}}\!\left(\tfrac{2}{\sigma^2}\right),$$

which implies $\frac{\lambda_k}{\lambda_{k+1}} > (k+\frac{n}{2})\sigma^2$. The inequality $\lambda_k \geq \lambda_{k+1}$ is thus satisfied if $\sigma^2(k+\frac{n}{2}) \geq 1$ for all $k \geq 0$. It suffices to require that it holds for $k = 0$, that is, $\sigma^2 \geq \frac{2}{n}$, i.e., $\sigma \geq (2/n)^{1/2}$.

Proof (of (6)). By the definition of $I_\nu(z)$, we have for $z > 0$,

$$I_\nu(z) < \frac{(z/2)^\nu}{\Gamma(\nu+1)} \sum_{j=0}^{\infty} \frac{(z^2/4)^j}{j!} = \frac{(z/2)^\nu\, e^{z^2/4}}{\Gamma(\nu+1)}.$$

Then for $\nu = k + \frac{n-2}{2}$ and $z = \frac{2}{\sigma^2}$:

$$I_{k+\frac{n-2}{2}}\!\left(\tfrac{2}{\sigma^2}\right) < \frac{(1/\sigma^2)^{k+\frac{n-2}{2}}\, e^{1/\sigma^4}}{\Gamma(k+\frac{n}{2})}.$$

Consider Stirling's series for $a > 0$:

$$\Gamma(a+1) = \sqrt{2\pi a}\left(\frac{a}{e}\right)^a \left[1 + \frac{1}{12a} + \frac{1}{288a^2} - \frac{139}{51840a^3} - \cdots\right]. \qquad (29)$$

Thus for all $a > 0$ we can write $\Gamma(a+1) = e^{A(a)}\, \sqrt{2\pi}\left(\frac{a}{e}\right)^a \sqrt{a}$, where $0 < A(a) < \frac{1}{12a}$. Hence, for $a = k + \frac{n-2}{2}$,

$$\Gamma\!\left(k+\tfrac{n}{2}\right) = e^{A(k,n)}\, \sqrt{2\pi}\left(\frac{k+\frac{n-2}{2}}{e}\right)^{k+\frac{n-2}{2}} \sqrt{k+\tfrac{n-2}{2}},$$

where $0 < A(k,n) < \frac{1}{12(k+\frac{n-2}{2})}$. Then

$$I_{k+\frac{n-2}{2}}\!\left(\tfrac{2}{\sigma^2}\right) < \frac{e^{1/\sigma^4}\, (e/\sigma^2)^{k+\frac{n-2}{2}}}{\sqrt{2\pi}\,\left(k+\frac{n-2}{2}\right)^{k+\frac{n-1}{2}}},$$

implying the upper bound in (6). The other direction is obtained similarly.
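Lemma 1 can be spot-checked by quadrature; for $n = 3$, $P_k(3;t)$ is the classical Legendre polynomial and (26) reads $\int_{-1}^1 e^{rt}\,P_k(t)\,dt = \sqrt{\pi}\,(2/r)^{1/2}\,I_{k+1/2}(r)$ (my own sketch, assuming SciPy):

```python
# Numerical spot-check of Lemma 1 for n = 3, where (1-t^2)^{(n-3)/2} = 1 and
# Gamma(1/2) Gamma((n-1)/2) = sqrt(pi): compare the integral with the Bessel form.
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre, iv

r, k = 2.0, 3
lhs, _ = quad(lambda t: np.exp(r * t) * eval_legendre(k, t), -1.0, 1.0)
rhs = np.sqrt(np.pi) * np.sqrt(2.0 / r) * iv(k + 0.5, r)
print(lhs, rhs)    # both sides agree
```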

A.2 Proof of Theorem 5

We will first show an upper bound for $\|Y_k\|_\infty$, where $Y_k$ is any $L^2(S^{n-1})$-normalized function in $\mathbb{Y}_k(n)$, and then exhibit a one-dimensional subspace of functions in $\mathbb{Y}_k(n)$ that attain this upper bound. Observe that $Y_k$ belongs to an orthonormal basis $\{Y_{k,j}(n;x)\}_{j=1}^{N(n,k)}$ of $\mathbb{Y}_k(n)$. The following highlights the crucial difference between the cases $n = 2$ and $n \geq 3$.

Lemma 3. For any $n \geq 2$, $k \geq 0$, and all $j \in \mathbb{N}$, $j \leq N(n,k)$,

$$\|Y_{k,j}(n;\cdot)\|_\infty \leq \sqrt{\frac{N(n,k)}{|S^{n-1}|}}. \qquad (30)$$

In particular, for $n = 2$ and all $k \geq 0$: $\|Y_{k,j}(2;\cdot)\|_\infty \leq \frac{1}{\sqrt{\pi}}$.

Proof. The Addition Theorem for spherical harmonics ([6], page 8) states that for any $x, \alpha \in S^{n-1}$,

$$\sum_{j=1}^{N(n,k)} Y_{k,j}(n;x)\, Y_{k,j}(n;\alpha) = \frac{N(n,k)}{|S^{n-1}|}\, P_k(n; \langle x,\alpha\rangle),$$

which implies that for any $x \in S^{n-1}$,

$$Y_{k,j}(n;x)^2 \leq \sum_{j=1}^{N(n,k)} Y_{k,j}(n;x)^2 = \frac{N(n,k)}{|S^{n-1}|}\, P_k(n; \langle x,x\rangle) = \frac{N(n,k)}{|S^{n-1}|}\, P_k(n;1) = \frac{N(n,k)}{|S^{n-1}|},$$

giving us the first result. For $n = 2$, we have $N(2,0) = 1$, $N(2,k) = 2$ for $k \geq 1$, and $|S^1| = 2\pi$, giving us the second result.

Definition 2. Consider the group $O(n)$ of all orthogonal transformations of $\mathbb{R}^n$, that is, $O(n) = \{A \in \mathbb{R}^{n \times n} : A^TA = AA^T = I\}$. A function $f : S^{n-1} \to \mathbb{R}$ is said to be invariant under a transformation $A \in O(n)$ if $f_A(x) = f(Ax) = f(x)$ for all $x \in S^{n-1}$. Let $\alpha \in S^{n-1}$. The isotropy group $J_{n,\alpha}$ is defined by

$$J_{n,\alpha} = \{A \in O(n) : A\alpha = \alpha\}.$$

Lemma 4. Assume that $Y_k \in \mathbb{Y}_k(n)$ is invariant with respect to $J_{n,\alpha}$ and satisfies $\int_{S^{n-1}} Y_k(x)^2\, dS^{n-1}(x) = 1$. Then $Y_k$ is unique up to a multiplicative constant $C_{\alpha,n,k}$ with $|C_{\alpha,n,k}| = 1$, and

$$\|Y_k\|_\infty = |Y_k(\alpha)| = \sqrt{\frac{N(n,k)}{|S^{n-1}|}}. \qquad (31)$$

Proof. If $Y_k$ is invariant with respect to $J_{n,\alpha}$, then by ([6], Lemma 3, page 7) it must satisfy $Y_k(x) = Y_k(\alpha)\, P_k(n; \langle x,\alpha\rangle)$, showing that the subspace of $\mathbb{Y}_k(n)$ invariant with respect to $J_{n,\alpha}$ is one-dimensional. The Addition Theorem implies that for any $\alpha \in S^{n-1}$,

$$\int_{S^{n-1}} P_k(n; \langle x,\alpha\rangle)^2\, dS^{n-1}(x) = \frac{|S^{n-1}|}{N(n,k)}.$$

By assumption, we then have

$$1 = \int_{S^{n-1}} Y_k(x)^2\, dS^{n-1}(x) = Y_k(\alpha)^2 \int_{S^{n-1}} P_k(n;\langle x,\alpha\rangle)^2\, dS^{n-1}(x) = Y_k(\alpha)^2\, \frac{|S^{n-1}|}{N(n,k)},$$

giving us $Y_k(\alpha)^2 = \frac{N(n,k)}{|S^{n-1}|}$. Thus for all $x \in S^{n-1}$,

$$|Y_k(x)| = \sqrt{\frac{N(n,k)}{|S^{n-1}|}}\; |P_k(n;\langle x,\alpha\rangle)| \leq \sqrt{\frac{N(n,k)}{|S^{n-1}|}}$$

by the property $|P_k(n;t)| \leq 1$ for $|t| \leq 1$. Thus $\|Y_k\|_\infty = \sqrt{\frac{N(n,k)}{|S^{n-1}|}}$, as desired.

Proposition 2 (Orthonormal basis of $\mathbb{Y}_k(n)$ [6]). Let $n \geq 3$. Let $e_1, \ldots, e_n$ be the canonical basis of $\mathbb{R}^n$. Let $x \in S^{n-1}$. We write $x = t\,e_n + \sqrt{1-t^2}\,(x_{(n-1)}, 0)^T$, where $t \in [-1,1]$ and $x_{(n-1)} \in S^{n-2}$, $(x_{(n-1)}, 0)^T \in \mathrm{span}\{e_1, \ldots, e_{n-1}\}$. Suppose that for $m = 0, 1, \ldots, k$ the orthonormal bases $Y_{m,j}$, $j = 1, \ldots, N(n-1,m)$, of $\mathbb{Y}_m(n-1)$ are given; then an orthonormal basis of $\mathbb{Y}_k(n)$ is

$$Y_{k,m,j}(n;x) = A^m_k(n;t)\, Y_{m,j}(n-1;\, x_{(n-1)}), \quad j = 1, \ldots, N(n-1,m), \qquad (32)$$

starting with the Fourier basis for $n = 2$, where $P^m_k(n;t)$ denotes the associated Legendre function and

$$A^m_k(n;t) = \frac{1}{k!\,\Gamma(\frac{n-1}{2})}\sqrt{\frac{(2k+n-2)\,(k-m)!\,(k+n+m-3)!}{2^{n-2}}}\; P^m_k(n;t). \qquad (33)$$

Proposition 3. Let $n \in \mathbb{N}$, $n \geq 3$. Let $\mu$ be the Lebesgue measure on $S^{n-1}$. For each $k \geq 0$, any orthonormal basis of the space $\mathbb{Y}_k(n)$ of spherical harmonics of order $k$ contains an $L^2_\mu$-normalized spherical harmonic $Y_k$ such that

$$\|Y_k\|_\infty = \sqrt{\frac{N(n,k)}{|S^{n-1}|}} = \sqrt{\frac{(2k+n-2)(k+n-3)!}{k!\,(n-2)!\,|S^{n-1}|}} \to \infty \qquad (34)$$

as $k \to \infty$, where $|S^{n-1}| = \frac{2\pi^{n/2}}{\Gamma(n/2)}$ is the surface area of $S^{n-1}$.

Proof. Let $x = t\,e_n + \sqrt{1-t^2}\,(x_{(n-1)}, 0)^T$, $-1 \leq t \leq 1$. For each $k \geq 0$, the orthonormal basis of $\mathbb{Y}_k(n)$ in Proposition 2 contains the function

$$Y_{k,0,1}(n;x) = A^0_k(t)\, Y_{0,1}(n-1;\, x_{(n-1)}) = \frac{A^0_k(t)}{\sqrt{|S^{n-2}|}}, \qquad (35)$$

and one checks that

$$Y_{k,0,1}(n;x) = \sqrt{\frac{N(n,k)}{|S^{n-1}|}}\; P_k(n;t).$$

Then $Y_{k,0,1}(n;x)$ is invariant with respect to $J_{n,\alpha}$, where $\alpha = (0, \ldots, 0, 1)$. Thus $Y_k = Y_{k,0,1}$ is the desired function for this particular orthonormal basis. For an arbitrary orthonormal basis of $\mathbb{Y}_k(n)$, the result follows from Lemma 4 and rotational symmetry on the sphere.
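The Addition Theorem underlying Lemma 3 and Proposition 3 can also be checked numerically on $S^2$ ($n = 3$, $N(3,k) = 2k+1$, $|S^2| = 4\pi$), using the complex spherical harmonics available in SciPy (my own sketch; the two test points are arbitrary):

```python
# Check the Addition Theorem on S^2: sum_m Y_{km}(x) conj(Y_{km}(y))
#   = N(3,k)/|S^2| * P_k(<x,y>) = (2k+1)/(4 pi) * P_k(cos gamma).
# scipy.special.sph_harm(m, k, theta, phi) uses theta = azimuth, phi = polar angle.
import numpy as np
from scipy.special import sph_harm, eval_legendre

def unit(theta, phi):
    return np.array([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)])

k = 4
(t1, p1), (t2, p2) = (0.3, 1.1), (2.0, 0.7)
lhs = sum(sph_harm(m, k, t1, p1) * np.conj(sph_harm(m, k, t2, p2))
          for m in range(-k, k + 1))
rhs = (2 * k + 1) / (4 * np.pi) * eval_legendre(k, unit(t1, p1) @ unit(t2, p2))
print(lhs.real, rhs)    # agree; the imaginary part of lhs is ~ 0
```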

Proof (of Theorem 5). By the Funk-Hecke formula, all spherical harmonics of order $k$ are eigenfunctions corresponding to the eigenvalue $\lambda_k$ given by (25). If infinitely many of the $\lambda_k$'s are nonzero, then the corresponding set of $L^2(S^{n-1})$-orthonormal eigenfunctions $\{\phi_k\}$, being an orthonormal basis of $L^2(S^{n-1})$, contains a spherical harmonic $Y_k$ satisfying (34) for infinitely many $k$. It then follows from Proposition 3 that $\sup_k \|\phi_k\|_\infty = \infty$.

References

1. N. Aronszajn. Theory of Reproducing Kernels. Transactions of the American Mathematical Society, vol. 68, pages 337-404, 1950.
2. M. Belkin, P. Niyogi, and V. Sindhwani. Manifold Regularization: A Geometric Framework for Learning from Examples. University of Chicago Computer Science Technical Report, 2004, accepted for publication.
3. M. Belkin and P. Niyogi. Semi-supervised Learning on Riemannian Manifolds. Machine Learning, Special Issue on Clustering, vol. 56, pages 209-239, 2004.
4. E. De Vito, A. Caponnetto, and L. Rosasco. Model Selection for Regularized Least-Squares Algorithm in Learning Theory. Foundations of Computational Mathematics, vol. 5, no. 1, pages 59-85, 2005.
5. J. Lafferty and G. Lebanon. Diffusion Kernels on Statistical Manifolds. Journal of Machine Learning Research, vol. 6, pages 129-163, 2005.
6. C. Müller. Analysis of Spherical Symmetries in Euclidean Spaces. Applied Mathematical Sciences 129, Springer, New York, 1998.
7. B. Schölkopf and A.J. Smola. Learning with Kernels. The MIT Press, Cambridge, Massachusetts, 2002.
8. S. Smale and D.X. Zhou. Learning Theory Estimates via Integral Operators and Their Approximations, 2005, to appear.
9. A.J. Smola, Z.L. Ovari, and R.C. Williamson. Regularization with Dot-Product Kernels. Advances in Neural Information Processing Systems, 2000.
10. I. Steinwart, D. Hush, and C. Scovel. An Explicit Description of the Reproducing Kernel Hilbert Spaces of Gaussian RBF Kernels. Los Alamos National Laboratory Technical Report LA-UR, December 2005.
11. G. Wahba. Spline Interpolation and Smoothing on the Sphere. SIAM Journal on Scientific and Statistical Computing, vol. 2, pages 5-16, 1981.
12. G. Wahba. Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics 59, Society for Industrial and Applied Mathematics, Philadelphia, 1990.
13. G.N. Watson. A Treatise on the Theory of Bessel Functions, 2nd edition, Cambridge University Press, Cambridge, England, 1944.
14. D.X. Zhou. The Covering Number in Learning Theory. Journal of Complexity, vol. 18, pages 739-767, 2002.
