Mercer's Theorem, Feature Maps, and Smoothing
Ha Quang Minh, Partha Niyogi, and Yuan Yao

Department of Computer Science, University of Chicago, 1100 East 58th St, Chicago, IL 60637, USA
Department of Mathematics, University of California, Berkeley, 970 Evans Hall, Berkeley, CA 94720, USA
minh,niyogi@cs.uchicago.edu, yao@math.berkeley.edu

Abstract. We study Mercer's theorem and feature maps for several positive definite kernels that are widely used in practice. The smoothing properties of these kernels will also be explored.

1 Introduction

Kernel-based methods have become increasingly popular and important in machine learning. The central idea behind the so-called kernel trick is that a closed-form Mercer kernel allows one to efficiently solve a variety of non-linear optimization problems that arise in regression, classification, inverse problems, and the like. It is well known in the machine learning community that kernels are associated with feature maps, and that a kernel-based procedure may be interpreted as mapping the data from the original input space into a potentially higher-dimensional feature space, where linear methods may then be used. One finds many accounts of this idea in which the input space X is mapped by a feature map Φ : X → H (where H is a Hilbert space) so that for any two points x, y ∈ X, we have K(x, y) = ⟨Φ(x), Φ(y)⟩_H.

Yet, while much has been written about kernels, and many different kinds of kernels have been discussed in the literature, much less has been written explicitly about their associated feature maps. In general, we do not have a clear and concrete understanding of what exactly these feature maps are. Our goal in this paper is to take steps toward a better understanding of feature maps by explicitly computing them for a number of popular kernels on a variety of domains. By doing so, we hope to clarify the precise nature of feature maps in very concrete terms, so that machine learning researchers may have a better feel for them.
The following are the main points and new results of our paper:

1. As we will illustrate, feature maps and feature spaces are not unique. For a given domain X and a fixed kernel K on X × X, there exist in fact infinitely many feature maps associated with K. Although these maps are essentially equivalent, in a sense to be made precise in Section 4.3, there are subtleties
that we wish to emphasize. For a given kernel K, the feature maps of K induced by Mercer's theorem depend fundamentally on the domain X, as will be seen in the examples of Section 2. Moreover, feature maps do not necessarily arise from Mercer's theorem; examples of this will be given in Section 4.

2. The importance of Mercer's theorem, however, goes far beyond the feature maps that it induces: the eigenvalues and eigenfunctions associated with K play a central role in obtaining error estimates in learning theory; see for example [8], [4]. For this reason, the determination of the spectrum of K, which is highly nontrivial in general, is crucially important in its own right. Theorems 2 and 3 in Section 2 give the complete spectrum of the polynomial and Gaussian kernels on S^{n-1}, including sharp rates of decay of their eigenvalues. Theorem 4 gives the eigenfunctions and a recursive formula for the computation of the eigenvalues of the polynomial kernel on the hypercube {-1, 1}^n.

One domain that we particularly focus on is the unit sphere S^{n-1} in R^n, for several reasons. First, it is a special example of a compact Riemannian manifold, and the problem of learning on manifolds has attracted attention recently; see for example [2], [3]. Second, its symmetric and homogeneous nature allows us to obtain complete and explicit results in many cases. We believe that S^{n-1}, together with kernels defined on it, is a fruitful source of examples and counterexamples for the theoretical analysis of kernel-based learning. We will point out that intuitions based on low dimensions such as n = 2 in general do not carry over to higher dimensions: Theorem 5 in Section 3 gives an important example along this line. We will also consider the unit ball B^n, the hypercube {-1, 1}^n, and R^n itself.

3. We will also try to understand the smoothness properties of kernels on S^{n-1}.
In particular, we will show that the polynomial and Gaussian kernels define Hilbert spaces of functions whose norms may be interpreted as smoothness functionals, similar to those of splines on S^{n-1}. We will obtain precise and sharp results on this question in the paper. This is the content of Section 5. The smoothness implications allow us to better understand the applicability of such kernels in solving smoothing problems.

Notation: For X ⊂ R^n and µ a Borel measure on X,

  L²_µ(X) = {f : X → C : ∫_X |f(x)|² dµ(x) < ∞}.

We will also use L²(X) for L²_µ(X), and dx for dµ(x), if µ is the Lebesgue measure on X. The surface area of the unit sphere S^{n-1} is denoted by |S^{n-1}| = 2π^{n/2}/Γ(n/2).

2 Mercer's Theorem

One of the fundamental mathematical results underlying learning theory with kernels is Mercer's theorem. Let X be a closed subset of R^n, n ∈ N, µ a Borel measure on X, and K : X × X → R a symmetric function satisfying: for any
finite set of points {x_i}_{i=1}^N in X and real numbers {a_i}_{i=1}^N,

  Σ_{i,j=1}^N a_i a_j K(x_i, x_j) >= 0.   (1)

K is then said to be a positive definite kernel on X. Assume further that

  ∫_X ∫_X K(x, t)² dµ(x) dµ(t) < ∞.   (2)

Consider the induced integral operator L_K : L²_µ(X) → L²_µ(X) defined by

  L_K f(x) = ∫_X K(x, t) f(t) dµ(t).   (3)

This is a self-adjoint, positive, compact operator with a countable system of nonnegative eigenvalues {λ_k}_{k=1}^∞ satisfying Σ_{k=1}^∞ λ_k² < ∞. L_K is said to be Hilbert–Schmidt, and the corresponding L²_µ(X)-normalized eigenfunctions {φ_k}_{k=1}^∞ form an orthonormal basis of L²_µ(X). We recall that a Borel measure µ on X is said to be strictly positive if the measure of every nonempty open subset of X is positive, an example being the Lebesgue measure on R^n.

Theorem 1 (Mercer). Let X ⊂ R^n be closed, µ a strictly positive Borel measure on X, and K a continuous function on X × X satisfying (1) and (2). Then

  K(x, t) = Σ_{k=1}^∞ λ_k φ_k(x) φ_k(t),   (4)

where the series converges absolutely for each pair (x, t) ∈ X × X and uniformly on each compact subset of X.

Mercer's theorem still holds if X is a finite set {x_i}, such as X = {-1, 1}^n, K is a pointwise-defined positive definite kernel, and µ(x_i) > 0 for each i.

2.1 Examples on the Sphere S^{n-1}

We will give explicit examples of the eigenvalues and eigenfunctions in Mercer's theorem on the unit sphere S^{n-1} for the polynomial and Gaussian kernels. We need the concept of spherical harmonics, a modern and authoritative account of which is [6]. Some of the material below was first reported in the kernel learning literature in [9], where the eigenvalues for the polynomial kernels with n = 2, 3 were computed. In this section, we will carry out computations for general n ∈ N, n >= 2.

Definition 1 (Spherical Harmonics). Let Δ_n = Σ_{i=1}^n ∂²/∂x_i² denote the Laplacian operator on R^n. A homogeneous polynomial of degree k in R^n whose Laplacian vanishes is called a homogeneous harmonic of order k. Let Y_k(n)
denote the subspace of all homogeneous harmonics of order k on the unit sphere S^{n-1} in R^n. The functions in Y_k(n) are called spherical harmonics of order k. We will denote by {Y_{k,j}(n; x)}_{j=1}^{N(n,k)} any fixed orthonormal basis for Y_k(n), where

  N(n, k) = dim Y_k(n) = (2k + n - 2)(k + n - 3)! / (k! (n - 2)!),  k >= 0.

Theorem 2. Let X = S^{n-1}, n ∈ N, n >= 2. Let µ be the uniform probability distribution on S^{n-1}. For K(x, t) = exp(-‖x - t‖²/σ²), σ > 0,

  λ_k = e^{-2/σ²} σ^{n-2} I_{k+(n-2)/2}(2/σ²) Γ(n/2)   (5)

for all k ∈ N ∪ {0}, where I denotes the modified Bessel function of the first kind, defined below. Each λ_k occurs with multiplicity N(n, k), with the corresponding eigenfunctions being the spherical harmonics of order k on S^{n-1}. The λ_k's are decreasing if σ >= (2/n)^{1/2}. Furthermore,

  A_1 (2e/σ²)^k / (2k + n - 2)^{k+(n-1)/2} < λ_k < A_2 (2e/σ²)^k / (2k + n - 2)^{k+(n-1)/2}   (6)

for A_1, A_2 depending on σ and n, given below.

Remark 1. A_1 = e^{-2/σ² - 1/6} (2e)^{(n-2)/2} Γ(n/2)/√π and A_2 = e^{-2/σ² + 1/σ⁴} (2e)^{(n-2)/2} Γ(n/2)/√π. For ν, z ∈ C,

  I_ν(z) = Σ_{j=0}^∞ (1/(j! Γ(ν + j + 1))) (z/2)^{ν+2j}.

Theorem 3. Let X = S^{n-1}, n ∈ N, n >= 2, and d ∈ N. Let µ be the uniform probability distribution on S^{n-1}. For K(x, t) = (1 + ⟨x, t⟩)^d, the nonzero eigenvalues of L_K : L²_µ(X) → L²_µ(X) are

  λ_k = 2^{d+n-2} d! Γ(d + (n-1)/2) Γ(n/2) / (√π (d - k)! Γ(d + k + n - 1))   (7)

for 0 <= k <= d. Each λ_k occurs with multiplicity N(n, k), with the corresponding eigenfunctions being the spherical harmonics of order k on S^{n-1}. Furthermore, the λ_k's form a decreasing sequence, and

  B_1 / (k + d + n - 2)^{d+(n-3)/2} < λ_k < B_2 / (k + d + n - 2)^{d+(n-3)/2}   (8)

for 0 <= k <= d, with B_1, B_2 depending on d and n.

Remark 2. The constants B_1 and B_2 are explicit; they are obtained from Stirling-type bounds on the Gamma functions in (7), in the same way as A_1 and A_2 in Remark 1.

2.2 Example on the Hypercube {-1, 1}^n

We will now give an example on the hypercube {-1, 1}^n. Let M_k = {α = (α_i)_{i=1}^n : α_i ∈ {0, 1}, |α| = α_1 + ... + α_n = k}; then the set {x^α}, α ∈ M_k, 0 <= k <= n, consists of the multilinear monomials {1, x_1, x_1x_2, ..., x_1⋯x_n}.
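The closed form for the Gaussian eigenvalues in Theorem 2 can be sanity-checked numerically. The sketch below (ours, not from the paper) evaluates λ_k = e^{-2/σ²} σ^{n-2} Γ(n/2) I_{k+(n-2)/2}(2/σ²) for n = 3, both from the power series of the Bessel function and by direct Gauss–Legendre quadrature of the Funk–Hecke integral, which for n = 3 and the uniform probability measure reads λ_k = (1/2) ∫_{-1}^{1} e^{-(2-2t)/σ²} P_k(t) dt; numpy is assumed available.

```python
import numpy as np
from math import exp, gamma

def bessel_I(nu, z, terms=60):
    # Modified Bessel function of the first kind, from its power series:
    # I_nu(z) = sum_j (z/2)^(nu+2j) / (j! * Gamma(nu+j+1))
    return sum((z / 2) ** (nu + 2 * j) / (gamma(j + 1) * gamma(nu + j + 1))
               for j in range(terms))

def lam_closed(k, n, sigma):
    # Theorem 2: lam_k = e^{-2/sigma^2} sigma^{n-2} I_{k+(n-2)/2}(2/sigma^2) Gamma(n/2)
    return exp(-2 / sigma**2) * sigma ** (n - 2) * gamma(n / 2) \
        * bessel_I(k + (n - 2) / 2, 2 / sigma**2)

def lam_quadrature(k, sigma):
    # n = 3 only: Funk-Hecke with the uniform probability measure gives
    # lam_k = (1/2) * int_{-1}^{1} e^{-(2-2t)/sigma^2} P_k(t) dt
    t, w = np.polynomial.legendre.leggauss(60)
    Pk = np.polynomial.legendre.Legendre.basis(k)(t)
    return 0.5 * float(np.sum(w * np.exp(-(2 - 2 * t) / sigma**2) * Pk))

sigma = 1.0    # sigma >= sqrt(2/3), so the lam_k should be strictly decreasing
lam = [lam_closed(k, 3, sigma) for k in range(6)]
for k in range(6):
    assert abs(lam[k] - lam_quadrature(k, sigma)) < 1e-8
assert all(lam[k] > lam[k + 1] for k in range(5))
```

The same quadrature, with e^{-(2-2t)/σ²} replaced by (1 + t)^d, can be used to check the polynomial eigenvalues of Theorem 3.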
Theorem 4. Let X = {-1, 1}^n. Let d ∈ N, d <= n, be fixed, let K(x, t) = (1 + ⟨x, t⟩)^d on X × X, and let µ be the uniform distribution on X. Then the nonzero eigenvalues λ^d_k of L_K : L²_µ(X) → L²_µ(X) satisfy

  λ^{d+1}_k = k λ^d_{k-1} + λ^d_k + (n - k) λ^d_{k+1},   (9)

  λ^d_0 >= λ^d_1 >= ... >= λ^d_{d-1} = λ^d_d = d!,   (10)

and λ^d_k = 0 for k > d. The corresponding L²_µ(X)-normalized eigenfunctions for each λ^d_k are {x^α}_{α ∈ M_k}.

Example 1 (d = 2). The recurrence relation (9) runs over two indices, and hence a closed analytic expression for λ^d_k is hard to find for large d. It is straightforward, however, to write a computer program for computing λ^d_k. For d = 2,

  λ_0 = n + 1,  λ_1 = λ_2 = 2,

with corresponding eigenfunctions 1, {x_1, ..., x_n}, and {x_1x_2, x_1x_3, ..., x_{n-1}x_n}, respectively.

2.3 Example on the Unit Ball B^n

Except for the homogeneous polynomial kernel K(x, t) = ⟨x, t⟩^d, the computation of the spectrum of L_K on the unit ball B^n is much more difficult analytically than that on S^{n-1}. For K(x, t) = (1 + ⟨x, t⟩)^d and small values of d, it is still possible, however, to obtain explicit answers.

Example 2 (X = B^n, K(x, t) = (1 + ⟨x, t⟩)²). Let µ be the uniform measure on B^n. The eigenspace spanned by {x_1, ..., x_n} corresponds to the eigenvalue λ = 2/(n + 2). The eigenspace spanned by {‖x‖² Y_{2,j}(n; x/‖x‖)}_{j=1}^{N(n,2)} corresponds to the eigenvalue λ = 2/((n + 2)(n + 4)). The eigenvalues that correspond to span{1, ‖x‖²} are

  λ_{0,1} = ((n + 2)(n + 5) + √D) / (2(n + 2)(n + 4)),
  λ_{0,2} = ((n + 2)(n + 5) - √D) / (2(n + 2)(n + 4)),

where D = (n + 2)²(n + 5)² - 16(n + 4).

3 Unboundedness of Normalized Eigenfunctions

It is known that the L²_µ-normalized eigenfunctions {φ_k} are in general unbounded; that is, in general,

  sup_{k ∈ N} ‖φ_k‖_∞ = ∞.

This was first pointed out by Smale, with the first counterexample given in [14]. This phenomenon is very common, however, as the following result shows.
Theorem 5. Let X = S^{n-1}, n >= 3, and let µ be the Lebesgue measure on S^{n-1}. Let f : [-1, 1] → R be a continuous function, giving rise to a Mercer kernel K(x, t) = f(⟨x, t⟩) on S^{n-1} × S^{n-1}. If infinitely many of the eigenvalues of L_K : L²_µ(S^{n-1}) → L²_µ(S^{n-1}) are nonzero, then for the set of corresponding L²_µ-normalized eigenfunctions {φ_k}_{k=1}^∞,

  sup_{k ∈ N} ‖φ_k‖_∞ = ∞.   (11)

Remark 3. This is in sharp contrast with the case n = 2, where we will show that sup_k ‖φ_k‖_∞ <= 1/√π, with the supremum attained by the functions {(1/√π) cos kθ, (1/√π) sin kθ}_{k ∈ N}. Theorem 5 applies in particular to the Gaussian kernel K(x, t) = exp(-‖x - t‖²/σ²), which on the sphere is a function of ⟨x, t⟩. Hence care needs to be taken in applying any analysis that requires C_K = sup_k ‖φ_k‖_∞ < ∞, for example [5].

4 Feature Maps

4.1 Examples of Feature Maps via Mercer's Theorem

A natural feature map that arises immediately from Mercer's theorem is

  Φ_µ : X → ℓ²,  Φ_µ(x) = (√λ_k φ_k(x))_{k=1}^∞,   (12)

where, if only N < ∞ of the eigenvalues are strictly positive, then Φ_µ : X → R^N. This is the map that is most often covered in the machine learning literature.

Example 3 (n = 2, d = 2, X = S^1). Theorem 3 gives the eigenvalues (3/2, 1, 1/4), with eigenfunctions

  (1, √2 x_1, √2 x_2, 2√2 x_1x_2, √2 (x_1² - x_2²)) = (1, √2 cos θ, √2 sin θ, √2 sin 2θ, √2 cos 2θ),

where x_1 = cos θ, x_2 = sin θ, giving rise to the feature map

  Φ_µ(x) = (√(3/2), √2 x_1, √2 x_2, √2 x_1x_2, (x_1² - x_2²)/√2).

Example 4 (n = 2, d = 2, X = {-1, 1}²). Theorem 4 gives

  Φ_µ(x) = (√3, √2 x_1, √2 x_2, √2 x_1x_2).

Observation 1. (i) As our notation suggests, Φ_µ depends on the particular measure µ in the definition of the operator L_K and thus is not unique. Each measure µ gives rise to a different system of eigenvalues and eigenfunctions (λ_k, φ_k)_{k=1}^∞ and therefore a different Φ_µ. (ii) In Theorems 2 and 3, the multiplicity of the λ_k's means that for each choice of orthonormal basis of the space Y_k(n) of spherical harmonics of order k, there is a different feature map. Thus there are infinitely many feature maps arising from the uniform probability distribution on S^{n-1} alone.
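The hypercube eigenvalues and the resulting Mercer feature map can be reproduced in a few lines. The sketch below (ours, for illustration) implements the recursion (9), starting from the base case d = 1, where K = 1 + ⟨x, t⟩ immediately gives λ_0 = λ_1 = 1, and then checks that the finite-dimensional feature map of the form Φ_µ built from the computed eigenvalues reproduces the kernel (1 + ⟨x, t⟩)^d exactly at every pair of hypercube points:

```python
import numpy as np
from itertools import product, combinations

def poly_eigs(n, d):
    # Eigenvalues lam[k], k = 0..d, of L_K for K(x,t) = (1 + <x,t>)^d on
    # {-1,1}^n with the uniform measure, via the recursion (9) of Theorem 4.
    # Base case d = 1: K = 1 + <x,t>, so lam_0 = lam_1 = 1.
    lam = [1.0, 1.0] + [0.0] * d
    for _ in range(d - 1):
        lam = [k * (lam[k - 1] if k else 0.0) + lam[k] + (n - k) * lam[k + 1]
               for k in range(d + 1)] + [0.0]
    return lam[:d + 1]

n, d = 2, 2
lam = poly_eigs(n, d)        # [3.0, 2.0, 2.0], i.e. (n + 1, 2, 2) for d = 2

def phi(x):
    # Mercer feature map: one component sqrt(lam_k) * x^alpha per
    # multilinear monomial x^alpha with |alpha| = k.
    feats = []
    for k in range(d + 1):
        for idx in combinations(range(n), k):
            mono = 1.0
            for i in idx:
                mono *= x[i]
            feats.append(np.sqrt(lam[k]) * mono)
    return np.array(feats)

# <phi(x), phi(t)> must equal (1 + <x,t>)^d exactly on the finite domain
for x in product([-1, 1], repeat=n):
    for t in product([-1, 1], repeat=n):
        assert abs(phi(x) @ phi(t) - (1 + np.dot(x, t)) ** d) < 1e-9
```

For n = d = 2 the computed map is (√3, √2 x_1, √2 x_2, √2 x_1x_2), matching the hypercube feature map stated above.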
4.2 Examples of Feature Maps not via Mercer's Theorem

Feature maps do not necessarily arise from Mercer's theorem. Consider any set X and any pointwise-defined, positive definite kernel K on X × X. For each x ∈ X, let K_x : X → R be defined by K_x(t) = K(x, t), and let

  H_K = closure of span{K_x : x ∈ X}   (13)

be the Reproducing Kernel Hilbert Space (RKHS) induced by K, with inner product defined by ⟨K_x, K_t⟩_K = K(x, t); see [1]. The following feature map is then immediate:

  Φ_K : X → H_K,  Φ_K(x) = K_x.   (14)

In this section we discuss, via examples, two other methods for obtaining feature maps. Let X ⊂ R^n be any subset. Consider the Gaussian kernel K(x, t) = exp(-‖x - t‖²/σ²) on X × X, which admits the expansion

  K(x, t) = e^{-‖x‖²/σ²} e^{-‖t‖²/σ²} Σ_{k=0}^∞ ((2/σ²)^k / k!) Σ_{|α|=k} C^k_α x^α t^α,   (15)

where C^k_α = k!/(α_1! ⋯ α_n!). This implies the feature map Φ_g : X → ℓ², where

  Φ_g(x) = e^{-‖x‖²/σ²} ( √((2/σ²)^k C^k_α / k!) x^α )_{|α|=k, k=0,1,...}.

Remark 4. The standard polynomial feature maps in machine learning, see for example ([7], page 8), are obtained in exactly the same way.

Consider next a special class of kernels that is widely used in practice, the convolution kernels. We recall that for a function f ∈ L¹(R^n), its Fourier transform is defined to be

  f̂(ξ) = ∫_{R^n} f(x) e^{-i⟨ξ,x⟩} dx.

By a Fourier transform computation, it may be shown that if µ : R^n → R is even and nonnegative, with µ, µ̂ ∈ L¹(R^n), then the kernel K : R^n × R^n → R defined by

  K(x, t) = ∫_{R^n} µ(u) e^{i⟨x-t,u⟩} du   (16)

is continuous, symmetric, and positive definite. Furthermore, for any x, t ∈ R^n,

  K(x, t) = (2π)^{-n} ∫_{R^n} (√µ)^(x - u) (√µ)^(t - u) du,   (17)

where (√µ)^ denotes the Fourier transform of √µ. The following is then a feature map of K on X × X, X ⊂ R^n:

  Φ_conv : X → L²(R^n)   (18)
with Φ_conv(x)(u) = (2π)^{-n/2} (√µ)^(x - u). For the Gaussian kernel e^{-‖x-t‖²/σ²},

  e^{-‖x-t‖²/σ²} = ∫_{R^n} µ(u) e^{i⟨x-t,u⟩} du  with  µ(u) = (σ/(2√π))^n e^{-σ²‖u‖²/4},

and

  (Φ_conv(x))(u) = (4/(πσ²))^{n/4} e^{-2‖x-u‖²/σ²}.

One can similarly obtain feature maps for the inverse multiquadric, exponential, or B-spline kernels. The identity

  e^{-‖x-t‖²/σ²} = (4/(πσ²))^{n/2} ∫_{R^n} e^{-2‖x-u‖²/σ²} e^{-2‖t-u‖²/σ²} du

can also be verified directly, as done in [10], where implications of the Gaussian feature map Φ_conv(x) above are also discussed.

4.3 Equivalence of Feature Maps

It is known ([7], page 39) that, given a set X and a pointwise-defined, symmetric, positive definite kernel K on X × X, all feature maps from X into Hilbert spaces are essentially equivalent. In this section, we make this statement precise. Let H be a Hilbert space and Φ : X → H be such that ⟨Φ_x, Φ_t⟩_H = K(x, t) for all x, t ∈ X, where Φ_x = Φ(x). The evaluation functional L_x : H → R given by L_x v = ⟨v, Φ_x⟩_H, where x varies over X, defines an inclusion map

  L_Φ : H → R^X,  (L_Φ v)(x) = ⟨v, Φ_x⟩_H,

where R^X denotes the vector space of pointwise-defined, real-valued functions on X. Observe that, as a vector space of functions, H_K ⊂ R^X.

Proposition 1. Let H_Φ = closure of span{Φ_x : x ∈ X}, a subspace of H. The restriction of L_Φ to H_Φ is an isometric isomorphism between H_Φ and H_K.

Proof. First, L_Φ is bijective from H_Φ onto the image L_Φ(H_Φ), since ker L_Φ = H_Φ^⊥. Under the map L_Φ, for each x, t ∈ X, (L_Φ Φ_x)(t) = ⟨Φ_x, Φ_t⟩ = K_x(t); thus K_x = L_Φ Φ_x as functions on X. This implies that span{K_x : x ∈ X} is isomorphic to span{Φ_x : x ∈ X} as vector spaces. The isometric isomorphism of H_Φ = closure of span{Φ_x : x ∈ X} and H_K = closure of span{K_x : x ∈ X} then follows from ⟨Φ_x, Φ_t⟩_H = K(x, t) = ⟨K_x, K_t⟩_K. This completes the proof.

Remark 5. Each choice of Φ is thus equivalent to a factorization of the map Φ_K : x ↦ K_x ∈ H_K; that is, the following diagram commutes:

  x ∈ X --Φ_K--> K_x ∈ H_K
      \                ^
       Φ               | L_Φ
        \              |
         --> Φ_x ∈ H_Φ                    (19)

We will call Φ_K : x ↦ K_x ∈ H_K the canonical feature map associated with K.
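The series expansion of the Gaussian kernel in Section 4.2 gives a concrete handle on Φ_g: truncating the feature map at a finite total degree yields a finite-dimensional approximate feature map whose inner products converge rapidly to the kernel. A minimal sketch (ours; the points, σ, and the truncation degree 8 are arbitrary choices):

```python
import numpy as np
from itertools import product
from math import factorial, exp, sqrt

def phi_g(x, sigma, deg):
    # Truncation of Phi_g: the component for multi-index alpha is
    # e^{-|x|^2/sigma^2} * sqrt((2/sigma^2)^|alpha| / alpha!) * x^alpha,
    # kept for all alpha with |alpha| <= deg.
    n = len(x)
    feats = []
    for alpha in product(range(deg + 1), repeat=n):
        k = sum(alpha)
        if k > deg:
            continue
        alpha_fact = 1.0
        for a in alpha:
            alpha_fact *= factorial(a)
        coeff = sqrt((2 / sigma**2) ** k / alpha_fact)
        mono = 1.0
        for xi, a in zip(x, alpha):
            mono *= xi ** a
        feats.append(exp(-np.dot(x, x) / sigma**2) * coeff * mono)
    return np.array(feats)

sigma = 1.5
x, t = np.array([0.3, -0.2]), np.array([-0.1, 0.4])
K = exp(-np.dot(x - t, x - t) / sigma**2)         # exact Gaussian kernel value
approx = phi_g(x, sigma, 8) @ phi_g(t, sigma, 8)  # truncated feature inner product
assert abs(K - approx) < 1e-10
```

By the multinomial theorem, the inner product of the truncations equals e^{-(‖x‖²+‖t‖²)/σ²} Σ_{k<=8} (2⟨x, t⟩/σ²)^k / k!, i.e. a Taylor truncation of e^{2⟨x,t⟩/σ²}, which explains the rapid convergence.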
5 Smoothing Properties of Kernels on the Sphere

Having discussed feature maps, we will in this section analyze the smoothing properties of the polynomial and Gaussian kernels and compare them with those of spline kernels on the sphere S^{n-1}. In the spline smoothing problem on S^1, as described in [12], one solves the minimization problem

  min (1/m) Σ_{i=1}^m (f(x_i) - y_i)² + λ ∫_0^1 (f^{(m)}(t))² dt   (20)

for x_i ∈ [0, 1] and f ∈ W_m, where J_m(f) = ∫_0^1 (f^{(m)}(t))² dt is the square norm of the RKHS

  W⁰_m = {f : f, f', ..., f^{(m-1)} absolutely continuous, f^{(m)} ∈ L²[0, 1], f^{(j)}(0) = f^{(j)}(1), j = 0, 1, ..., m - 1}.

The space W_m is then the RKHS

  W_m = {1} ⊕ W⁰_m = {f : ‖f‖²_K = (∫_0^1 f(t) dt)² + ∫_0^1 (f^{(m)}(t))² dt < ∞}

induced by a kernel K. One particular feature of spline smoothing, on S^1, S^2, or R^n, is that in general the RKHS W_m does not have a closed-form kernel K that is efficiently computable. This is in contrast with the RKHSs used in kernel machine learning, all of which correspond to closed-form kernels that can be evaluated efficiently. It is not clear, however, whether the norms in these RKHSs correspond to smoothness functionals. In this section, we will show that for the polynomial and Gaussian kernels on S^{n-1}, they do.

5.1 The Iterated Laplacian and Splines on the Sphere S^{n-1}

Splines on S^{n-1} for n = 2 and n = 3, as treated by Wahba [11], [12], can be generalized to any n ∈ N via the iterated Laplacian Δ (the Laplace–Beltrami operator) on S^{n-1}. The RKHS corresponding to W_m in (20) is a subspace of L²(S^{n-1}) described by

  H_K = {f : ‖f‖²_K = (1/|S^{n-1}|) (∫_{S^{n-1}} f(x) dx)² + ∫_{S^{n-1}} f(x) (-Δ)^m f(x) dx < ∞}.

The Laplacian on S^{n-1} has eigenvalues λ_k = k(k + n - 2), k >= 0, with corresponding eigenfunctions {Y_{k,j}(n; x)}_{j=1}^{N(n,k)}, which form an orthonormal basis of the space Y_k(n) of spherical harmonics of order k. For f ∈ L²(S^{n-1}), if we use the expansion

  f = a_0 + Σ_{k=1}^∞ Σ_{j=1}^{N(n,k)} a_{k,j} Y_{k,j}(n; x),

then the space H_K takes the form

  H_K = {f ∈ L²(S^{n-1}) : ‖f‖²_K = a_0² |S^{n-1}| + Σ_{k=1}^∞ Σ_{j=1}^{N(n,k)} [k(k + n - 2)]^m a²_{k,j} < ∞}
and thus the corresponding kernel is

  K(x, t) = 1/|S^{n-1}| + Σ_{k=1}^∞ (1/[k(k + n - 2)]^m) Σ_{j=1}^{N(n,k)} Y_{k,j}(n; x) Y_{k,j}(n; t),   (21)

which is well-defined iff m > (n-1)/2. Let P_k(n; t) denote the Legendre polynomial of degree k in dimension n; then (21) takes the form

  K(x, t) = (1/|S^{n-1}|) [ 1 + Σ_{k=1}^∞ (N(n, k)/[k(k + n - 2)]^m) P_k(n; ⟨x, t⟩) ],   (22)

which does not have a closed form in general; for the case n = 3, see [11].

Remark 6. Clearly m can be replaced by any real number s > (n-1)/2.

5.2 Smoothing Properties of Polynomial and Gaussian Kernels

Let ∇_n denote the gradient operator on S^{n-1} (also called the first-order Beltrami operator; see [6], page 79, for a definition). Let Y_k ∈ Y_k(n), k >= 0, be L²-normalized; then

  ‖∇_n Y_k‖²_{L²(S^{n-1})} = ∫_{S^{n-1}} |∇_n Y_k(x)|² dS^{n-1}(x) = k(k + n - 2).   (23)

This shows that spherical harmonics of higher order are less smooth. This is particularly transparent in the case n = 2, with the Fourier basis functions {1, cos kθ, sin kθ}_{k ∈ N}: as k increases, the functions oscillate more rapidly. It follows that any regularization term ‖f‖²_K in problems such as (20), where K possesses a decreasing spectrum λ_k (k corresponding to the order of the spherical harmonics), will have a smoothing effect. That is, the higher-order spherical harmonics, which are less smooth, will be penalized more. The decreasing-spectrum property is true for the spline kernels, for the polynomial kernel (1 + ⟨x, t⟩)^d, and for the Gaussian kernel when σ >= (2/n)^{1/2}, as we showed in Theorems 2 and 3. Hence all these kernels possess smoothing properties on S^{n-1}. Furthermore, Theorem 2 shows that for the Gaussian kernel, for all k,

  A_1 (2e/σ²)^k / (2k + n - 2)^{k+(n-1)/2} < λ_k < A_2 (2e/σ²)^k / (2k + n - 2)^{k+(n-1)/2},

and Theorem 3 shows that for the polynomial kernel (1 + ⟨x, t⟩)^d,

  B_1 / (k + d + n - 2)^{d+(n-3)/2} < λ_k < B_2 / (k + d + n - 2)^{d+(n-3)/2}

for 0 <= k <= d. Comparing these with the eigenvalues of the spline kernels,

  λ_k = [k(k + n - 2)]^{-m}
for k >= 1, we see that the Gaussian kernel has the sharpest smoothing property, as can be seen from the exponential decay of its eigenvalues. For K(x, t) = (1 + ⟨x, t⟩)^d, if d + (n-3)/2 > 2m, that is, d > (4m - n + 3)/2, then K has a sharper smoothing property than a spline kernel of order m. Moreover, all spherical harmonics of order greater than d are filtered out; hence choosing K amounts to choosing a hypothesis space of bandlimited functions on S^{n-1}.

A Proofs of Results

The proofs of the results on S^{n-1} all make use of properties of spherical harmonics, which can be found in [6]. We will prove Theorem 2 (Theorem 3 is similar) and Theorem 5.

A.1 Proof of Theorem 2

Let f : [-1, 1] → R be a continuous function, and let Y_k ∈ Y_k(n) for k >= 0. The Funk–Hecke formula ([6], page 30) states that for any x ∈ S^{n-1},

  ∫_{S^{n-1}} f(⟨x, t⟩) Y_k(t) dS^{n-1}(t) = λ_k Y_k(x),   (24)

where

  λ_k = |S^{n-2}| ∫_{-1}^{1} f(t) P_k(n; t) (1 - t²)^{(n-3)/2} dt   (25)

and P_k(n; t) denotes the Legendre polynomial of degree k in dimension n. Since the spherical harmonics {{Y_{k,j}(n; x)}_{j=1}^{N(n,k)}}_{k=0}^∞ form an orthonormal basis for L²(S^{n-1}), an immediate consequence of the Funk–Hecke formula is that if K is defined on S^{n-1} × S^{n-1} by K(x, t) = f(⟨x, t⟩), and µ is the Lebesgue measure on S^{n-1}, then the eigenvalues of L_K : L²_µ(S^{n-1}) → L²_µ(S^{n-1}) are given precisely by (25), with the corresponding orthonormal eigenfunctions of λ_k being {Y_{k,j}(n; x)}_{j=1}^{N(n,k)}. The multiplicity of λ_k is therefore N(n, k) = dim(Y_k(n)).

On S^{n-1},

  e^{-‖x-t‖²/σ²} = e^{-2/σ²} e^{2⟨x,t⟩/σ²}.

Thus

  λ_k = e^{-2/σ²} |S^{n-2}| ∫_{-1}^{1} e^{2t/σ²} P_k(n; t) (1 - t²)^{(n-3)/2} dt
      = e^{-2/σ²} |S^{n-2}| √π Γ((n-1)/2) σ^{n-2} I_{k+(n-2)/2}(2/σ²)  (by Lemma 1)
      = e^{-2/σ²} σ^{n-2} I_{k+(n-2)/2}(2/σ²) Γ(n/2) |S^{n-1}|.

Normalizing by setting |S^{n-1}| = 1 gives the required expression for λ_k as in (5).

Lemma 1. Let f(t) = e^{rt}. Then

  ∫_{-1}^{1} f(t) P_k(n; t) (1 - t²)^{(n-3)/2} dt = √π Γ((n-1)/2) (2/r)^{(n-2)/2} I_{k+(n-2)/2}(r).   (26)
Proof. We apply the following identity, which follows from ([13], page 79, formula (9)):

  ∫_{-1}^{1} e^{rt} (1 - t²)^{ν-1/2} dt = √π (2/r)^ν Γ(ν + 1/2) I_ν(r),   (27)

and the Rodrigues rule ([6], page 3), which states that for f ∈ C^k([-1, 1]),

  ∫_{-1}^{1} f(t) P_k(n; t) (1 - t²)^{(n-3)/2} dt = R_k(n) ∫_{-1}^{1} f^{(k)}(t) (1 - t²)^{k+(n-3)/2} dt,   (28)

where R_k(n) = Γ((n-1)/2) / (2^k Γ(k + (n-1)/2)). For f(t) = e^{rt}, we have

  ∫_{-1}^{1} e^{rt} P_k(n; t) (1 - t²)^{(n-3)/2} dt = R_k(n) r^k ∫_{-1}^{1} e^{rt} (1 - t²)^{k+(n-3)/2} dt
    = R_k(n) r^k √π (2/r)^{k+(n-2)/2} Γ(k + (n-1)/2) I_{k+(n-2)/2}(r).

Substituting in the value of R_k(n) gives the desired answer.

Lemma 2. The sequence {λ_k}_{k=0}^∞ is decreasing if σ >= (2/n)^{1/2}.

Proof. We will first prove that λ_k/λ_{k+1} > (k + n/2)σ². We have

  I_{k+n/2}(2/σ²) = (1/σ²)^{k+n/2} Σ_{j=0}^∞ σ^{-4j} / (j! Γ(j + k + n/2 + 1))
    = (1/σ²)^{k+n/2} Σ_{j=0}^∞ σ^{-4j} / (j! (j + k + n/2) Γ(j + k + n/2))
    < (1/σ²)^{k+n/2} (1/(k + n/2)) Σ_{j=0}^∞ σ^{-4j} / (j! Γ(j + k + n/2))
    = (1/(σ²(k + n/2))) I_{k+n/2-1}(2/σ²),

which implies λ_k/λ_{k+1} > (k + n/2)σ². The inequality λ_k >= λ_{k+1} is thus satisfied if σ²(k + n/2) >= 1 for all k >= 0. It suffices to require that this holds for k = 0, that is, σ² >= 2/n, i.e. σ >= (2/n)^{1/2}.

Proof (of (6)). By the definition of I_ν(z), we have for z > 0

  I_ν(z) < ((z/2)^ν / Γ(ν + 1)) Σ_{j=0}^∞ (z²/4)^j / j! = ((z/2)^ν / Γ(ν + 1)) e^{z²/4}.

Then for ν = k + (n-2)/2 and z = 2/σ²,

  I_{k+(n-2)/2}(2/σ²) < (1/Γ(k + n/2)) (1/σ²)^{k+(n-2)/2} e^{1/σ⁴}.

Consider Stirling's series: for a > 0,

  Γ(a + 1) = √(2πa) (a/e)^a [1 + 1/(12a) + 1/(288a²) - 139/(51840a³) + ...].   (29)

Thus for all a > 0 we can write Γ(a + 1) = e^{A(a)} √(2π) a^{a+1/2} e^{-a}, where 0 < A(a) < 1/(12a). Hence, for a = k + (n-2)/2,

  Γ(k + n/2) = e^{A(k,n)} √(2π) (k + (n-2)/2)^{k+(n-1)/2} e^{-(k+(n-2)/2)},

where 0 < A(k, n) < 1/(6(2k + n - 2)). Then

  I_{k+(n-2)/2}(2/σ²) < e^{1/σ⁴} (2e/σ²)^{k+(n-2)/2} / (√π (2k + n - 2)^{k+(n-1)/2}),

implying the upper bound in (6). The other direction is obtained similarly.
A.2 Proof of Theorem 5

We will first show an upper bound for ‖Y_k‖_∞, where Y_k is any L²(S^{n-1})-normalized function in Y_k(n), and then exhibit a one-dimensional subspace of functions in Y_k(n) that attain this upper bound. Observe that Y_k belongs to an orthonormal basis {Y_{k,j}(n; x)}_{j=1}^{N(n,k)} of Y_k(n). The following highlights the crucial difference between the cases n = 2 and n >= 3.

Lemma 3. For any n >= 2, k >= 0, and all j ∈ N, 1 <= j <= N(n, k),

  ‖Y_{k,j}(n; ·)‖²_∞ <= N(n, k)/|S^{n-1}|.   (30)

In particular, for n = 2 and all k >= 0: ‖Y_{k,j}(n; ·)‖²_∞ <= 1/π.

Proof. The Addition Theorem for spherical harmonics ([6], page 8) states that for any x, α ∈ S^{n-1},

  Σ_{j=1}^{N(n,k)} Y_{k,j}(n; x) Y_{k,j}(n; α) = (N(n, k)/|S^{n-1}|) P_k(n; ⟨x, α⟩),

which implies that for any x ∈ S^{n-1},

  Y_{k,j}(n; x)² <= (N(n, k)/|S^{n-1}|) P_k(n; ⟨x, x⟩) = (N(n, k)/|S^{n-1}|) P_k(n; 1) = N(n, k)/|S^{n-1}|,

giving us the first result. For n = 2, we have N(n, k) = 1 for k = 0, N(n, k) = 2 for k >= 1, and |S^1| = 2π, giving us the second result.

Definition 2. Consider the group O(n) of all orthogonal transformations of R^n, that is, O(n) = {A ∈ R^{n×n} : AᵀA = AAᵀ = I}. A function f : S^{n-1} → R is said to be invariant under a transformation A ∈ O(n) if f_A(x) = f(Ax) = f(x) for all x ∈ S^{n-1}. Let α ∈ S^{n-1}. The isotropy group J_{n,α} is defined by J_{n,α} = {A ∈ O(n) : Aα = α}.

Lemma 4. Assume that Y_k ∈ Y_k(n) is invariant with respect to J_{n,α} and satisfies ∫_{S^{n-1}} Y_k(x)² dS^{n-1}(x) = 1. Then Y_k is unique up to a multiplicative constant C_{α,n,k} with |C_{α,n,k}| = 1, and

  ‖Y_k‖_∞ = |Y_k(α)| = √(N(n, k)/|S^{n-1}|).   (31)

Proof. If Y_k is invariant with respect to J_{n,α}, then by ([6], Lemma 3, page 7), it must satisfy Y_k(x) = Y_k(α) P_k(n; ⟨x, α⟩), showing that the subspace of Y_k(n) invariant with respect to J_{n,α} is one-dimensional. The Addition Theorem implies that for any α ∈ S^{n-1},
  ∫_{S^{n-1}} P_k(n; ⟨x, α⟩)² dS^{n-1}(x) = |S^{n-1}|/N(n, k).

By assumption, we then have

  1 = ∫_{S^{n-1}} Y_k(x)² dS^{n-1}(x) = Y_k(α)² ∫_{S^{n-1}} P_k(n; ⟨x, α⟩)² dS^{n-1}(x) = Y_k(α)² |S^{n-1}|/N(n, k),

giving us Y_k(α)² = N(n, k)/|S^{n-1}|. Thus, for all x ∈ S^{n-1},

  |Y_k(x)| = √(N(n, k)/|S^{n-1}|) |P_k(n; ⟨x, α⟩)| <= √(N(n, k)/|S^{n-1}|),

by the property |P_k(n; t)| <= 1 for |t| <= 1. Thus ‖Y_k‖_∞ = √(N(n, k)/|S^{n-1}|), as desired.

Proposition 2 (Orthonormal basis of Y_k(n) [6]). Let n >= 3, and let e_1, ..., e_n be the canonical basis of R^n. For x ∈ S^{n-1}, write x = t e_n + √(1 - t²) (x_{(n-1)}, 0)ᵀ, where t ∈ [-1, 1], x_{(n-1)} ∈ S^{n-2}, and (x_{(n-1)}, 0)ᵀ ∈ span{e_1, ..., e_{n-1}}. Suppose that for m = 0, 1, ..., k the orthonormal bases {Y_{m,j} : j = 1, ..., N(n-1, m)} of Y_m(n-1) are given. Then an orthonormal basis for Y_k(n) is

  Y_{k,m,j}(n; x) = A^m_k(n; t) Y_{m,j}(n-1; x_{(n-1)}),  j = 1, ..., N(n-1, m),  m = 0, ..., k,   (32)

starting with the Fourier basis for n = 2, where

  A^m_k(n; t) = c_{k,m,n} P^m_k(n; t),   (33)

with P^m_k(n; t) the associated Legendre function and c_{k,m,n} an explicit normalizing constant involving (2k + n - 2), (k - m)!, (k + n + m - 3)!, and k!; see [6].

Proposition 3. Let n ∈ N, n >= 3, and let µ be the Lebesgue measure on S^{n-1}. For each k >= 0, any orthonormal basis of the space Y_k(n) of spherical harmonics of order k contains an L²_µ-normalized spherical harmonic Y_k such that

  ‖Y_k‖²_∞ = N(n, k)/|S^{n-1}| = (2k + n - 2)(k + n - 3)! / (k! (n - 2)! |S^{n-1}|) → ∞   (34)

as k → ∞, where |S^{n-1}| = 2π^{n/2}/Γ(n/2) is the surface area of S^{n-1}.

Proof. Let x = t e_n + √(1 - t²) (x_{(n-1)}, 0)ᵀ, -1 <= t <= 1. For each k >= 0, the orthonormal basis for Y_k(n) in Proposition 2 contains the function

  Y_{k,0,1}(n; x) = A^0_k(t) Y_{0,1}(n-1; x_{(n-1)}) = A^0_k(t)/√|S^{n-2}|,   (35)

with

  Y_{k,0,1}(n; x) = √(N(n, k)/|S^{n-1}|) P_k(n; t).

Then Y_{k,0,1}(n; x) is invariant with respect to J_{n,α}, where α = (0, ..., 0, 1). Thus Y_k = Y_{k,0,1} is the desired function for this particular orthonormal basis. For an arbitrary orthonormal basis of Y_k(n), the result follows by Lemma 4 and the rotational symmetry of the sphere.
Proof (of Theorem 5). By the Funk–Hecke formula, all spherical harmonics of order k are eigenfunctions corresponding to the eigenvalue λ_k given by (25). If infinitely many of the λ_k's are nonzero, then the corresponding set of L²(S^{n-1})-orthonormal eigenfunctions {φ_k}, being an orthonormal basis of L²(S^{n-1}), contains spherical harmonics Y_k satisfying (34) for infinitely many k. It follows from Proposition 3 that sup_k ‖φ_k‖_∞ = ∞.

References

1. N. Aronszajn. Theory of Reproducing Kernels. Transactions of the American Mathematical Society, vol. 68, pages 337-404, 1950.
2. M. Belkin, P. Niyogi, and V. Sindhwani. Manifold Regularization: a Geometric Framework for Learning from Examples. University of Chicago Computer Science Technical Report, 2004, accepted for publication.
3. M. Belkin and P. Niyogi. Semi-supervised Learning on Riemannian Manifolds. Machine Learning, Special Issue on Clustering, vol. 56, pages 209-239, 2004.
4. E. De Vito, A. Caponnetto, and L. Rosasco. Model Selection for Regularized Least-Squares Algorithm in Learning Theory. Foundations of Computational Mathematics, vol. 5, no. 1, pages 59-85, 2005.
5. J. Lafferty and G. Lebanon. Diffusion Kernels on Statistical Manifolds. Journal of Machine Learning Research, vol. 6, pages 129-163, 2005.
6. C. Müller. Analysis of Spherical Symmetries in Euclidean Spaces. Applied Mathematical Sciences 129, Springer, New York, 1998.
7. B. Schölkopf and A.J. Smola. Learning with Kernels. The MIT Press, Cambridge, Massachusetts, 2002.
8. S. Smale and D.X. Zhou. Learning Theory Estimates via Integral Operators and Their Approximations, 2005, to appear.
9. A.J. Smola, Z.L. Ovari, and R.C. Williamson. Regularization with Dot-Product Kernels. Advances in Neural Information Processing Systems, 2000.
10. I. Steinwart, D. Hush, and C. Scovel. An Explicit Description of the Reproducing Kernel Hilbert Spaces of Gaussian RBF Kernels. Los Alamos National Laboratory Technical Report, December 2004.
11. G. Wahba. Spline Interpolation and Smoothing on the Sphere.
SIAM Journal on Scientific and Statistical Computing, vol. 2, pages 5-16, 1981.
12. G. Wahba. Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics 59, Society for Industrial and Applied Mathematics, Philadelphia, 1990.
13. G.N. Watson. A Treatise on the Theory of Bessel Functions, 2nd edition, Cambridge University Press, Cambridge, England, 1944.
14. D.X. Zhou. The Covering Number in Learning Theory. Journal of Complexity, vol. 18, pages 739-767, 2002.
More informationChapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.
Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space
More informationVector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms.
Vector Spaces Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. For each two vectors a, b ν there exists a summation procedure: a +
More informationKernel Principal Component Analysis
Kernel Principal Component Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationReproducing Kernel Hilbert Spaces
Reproducing Kernel Hilbert Spaces Lorenzo Rosasco 9.520 Class 03 February 9, 2011 About this class Goal In this class we continue our journey in the world of RKHS. We discuss the Mercer theorem which gives
More informationMATH 590: Meshfree Methods
MATH 590: Meshfree Methods Chapter 2 Part 3: Native Space for Positive Definite Kernels Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2014 fasshauer@iit.edu MATH
More information2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?
MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due 9/5). Prove that every countable set A is measurable and µ(a) = 0. 2 (Bonus). Let A consist of points (x, y) such that either x or y is
More informationScattered Data Interpolation with Polynomial Precision and Conditionally Positive Definite Functions
Chapter 3 Scattered Data Interpolation with Polynomial Precision and Conditionally Positive Definite Functions 3.1 Scattered Data Interpolation with Polynomial Precision Sometimes the assumption on the
More informationHilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space.
Hilbert Spaces Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Vector Space. Vector space, ν, over the field of complex numbers,
More informationEigenvalues and eigenfunctions of the Laplacian. Andrew Hassell
Eigenvalues and eigenfunctions of the Laplacian Andrew Hassell 1 2 The setting In this talk I will consider the Laplace operator,, on various geometric spaces M. Here, M will be either a bounded Euclidean
More informationLearning with Consistency between Inductive Functions and Kernels
Learning with Consistency between Inductive Functions and Kernels Haixuan Yang Irwin King Michael R. Lyu Department of Computer Science & Engineering The Chinese University of Hong Kong Shatin, N.T., Hong
More informationMath 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.
Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,
More informationWhat is an RKHS? Dino Sejdinovic, Arthur Gretton. March 11, 2014
What is an RKHS? Dino Sejdinovic, Arthur Gretton March 11, 2014 1 Outline Normed and inner product spaces Cauchy sequences and completeness Banach and Hilbert spaces Linearity, continuity and boundedness
More informationMAT 578 FUNCTIONAL ANALYSIS EXERCISES
MAT 578 FUNCTIONAL ANALYSIS EXERCISES JOHN QUIGG Exercise 1. Prove that if A is bounded in a topological vector space, then for every neighborhood V of 0 there exists c > 0 such that tv A for all t > c.
More information1 Math 241A-B Homework Problem List for F2015 and W2016
1 Math 241A-B Homework Problem List for F2015 W2016 1.1 Homework 1. Due Wednesday, October 7, 2015 Notation 1.1 Let U be any set, g be a positive function on U, Y be a normed space. For any f : U Y let
More informationKernel methods and the exponential family
Kernel methods and the exponential family Stéphane Canu 1 and Alex J. Smola 2 1- PSI - FRE CNRS 2645 INSA de Rouen, France St Etienne du Rouvray, France Stephane.Canu@insa-rouen.fr 2- Statistical Machine
More informationStrictly Positive Definite Functions on a Real Inner Product Space
Strictly Positive Definite Functions on a Real Inner Product Space Allan Pinkus Abstract. If ft) = a kt k converges for all t IR with all coefficients a k 0, then the function f< x, y >) is positive definite
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationDerivative reproducing properties for kernel methods in learning theory
Journal of Computational and Applied Mathematics 220 (2008) 456 463 www.elsevier.com/locate/cam Derivative reproducing properties for kernel methods in learning theory Ding-Xuan Zhou Department of Mathematics,
More informationBeyond the Point Cloud: From Transductive to Semi-Supervised Learning
Beyond the Point Cloud: From Transductive to Semi-Supervised Learning Vikas Sindhwani, Partha Niyogi, Mikhail Belkin Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of
More informationLearning Eigenfunctions: Links with Spectral Clustering and Kernel PCA
Learning Eigenfunctions: Links with Spectral Clustering and Kernel PCA Yoshua Bengio Pascal Vincent Jean-François Paiement University of Montreal April 2, Snowbird Learning 2003 Learning Modal Structures
More informationVectors in Function Spaces
Jim Lambers MAT 66 Spring Semester 15-16 Lecture 18 Notes These notes correspond to Section 6.3 in the text. Vectors in Function Spaces We begin with some necessary terminology. A vector space V, also
More informationSHARP BOUNDARY TRACE INEQUALITIES. 1. Introduction
SHARP BOUNDARY TRACE INEQUALITIES GILES AUCHMUTY Abstract. This paper describes sharp inequalities for the trace of Sobolev functions on the boundary of a bounded region R N. The inequalities bound (semi-)norms
More informationRepresentation theory and quantum mechanics tutorial Spin and the hydrogen atom
Representation theory and quantum mechanics tutorial Spin and the hydrogen atom Justin Campbell August 3, 2017 1 Representations of SU 2 and SO 3 (R) 1.1 The following observation is long overdue. Proposition
More informationREAL AND COMPLEX ANALYSIS
REAL AND COMPLE ANALYSIS Third Edition Walter Rudin Professor of Mathematics University of Wisconsin, Madison Version 1.1 No rights reserved. Any part of this work can be reproduced or transmitted in any
More informationFunctional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32
Functional Analysis Martin Brokate Contents 1 Normed Spaces 2 2 Hilbert Spaces 2 3 The Principle of Uniform Boundedness 32 4 Extension, Reflexivity, Separation 37 5 Compact subsets of C and L p 46 6 Weak
More informationKernels for Multi task Learning
Kernels for Multi task Learning Charles A Micchelli Department of Mathematics and Statistics State University of New York, The University at Albany 1400 Washington Avenue, Albany, NY, 12222, USA Massimiliano
More information3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?
MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable
More informationSolving the 3D Laplace Equation by Meshless Collocation via Harmonic Kernels
Solving the 3D Laplace Equation by Meshless Collocation via Harmonic Kernels Y.C. Hon and R. Schaback April 9, Abstract This paper solves the Laplace equation u = on domains Ω R 3 by meshless collocation
More informationStrictly positive definite functions on a real inner product space
Advances in Computational Mathematics 20: 263 271, 2004. 2004 Kluwer Academic Publishers. Printed in the Netherlands. Strictly positive definite functions on a real inner product space Allan Pinkus Department
More informationManifold Learning: Theory and Applications to HRI
Manifold Learning: Theory and Applications to HRI Seungjin Choi Department of Computer Science Pohang University of Science and Technology, Korea seungjin@postech.ac.kr August 19, 2008 1 / 46 Greek Philosopher
More informationTHE PROBLEMS FOR THE SECOND TEST FOR BRIEF SOLUTIONS
THE PROBLEMS FOR THE SECOND TEST FOR 18.102 BRIEF SOLUTIONS RICHARD MELROSE Question.1 Show that a subset of a separable Hilbert space is compact if and only if it is closed and bounded and has the property
More information08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms
(February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops
More informationKernel Methods. Machine Learning A W VO
Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance
More informationSOLUTIONS TO SELECTED PROBLEMS FROM ASSIGNMENTS 3, 4
SOLUTIONS TO SELECTED POBLEMS FOM ASSIGNMENTS 3, 4 Problem 5 from Assignment 3 Statement. Let be an n-dimensional bounded domain with smooth boundary. Show that the eigenvalues of the Laplacian on with
More informationOn an Eigenvalue Problem Involving Legendre Functions by Nicholas J. Rose North Carolina State University
On an Eigenvalue Problem Involving Legendre Functions by Nicholas J. Rose North Carolina State University njrose@math.ncsu.edu 1. INTRODUCTION. The classical eigenvalue problem for the Legendre Polynomials
More informationChapter 8 Integral Operators
Chapter 8 Integral Operators In our development of metrics, norms, inner products, and operator theory in Chapters 1 7 we only tangentially considered topics that involved the use of Lebesgue measure,
More informationConvergence rates of spectral methods for statistical inverse learning problems
Convergence rates of spectral methods for statistical inverse learning problems G. Blanchard Universtität Potsdam UCL/Gatsby unit, 04/11/2015 Joint work with N. Mücke (U. Potsdam); N. Krämer (U. München)
More informationSpectral theory for compact operators on Banach spaces
68 Chapter 9 Spectral theory for compact operators on Banach spaces Recall that a subset S of a metric space X is precompact if its closure is compact, or equivalently every sequence contains a Cauchy
More informationDefinition and basic properties of heat kernels I, An introduction
Definition and basic properties of heat kernels I, An introduction Zhiqin Lu, Department of Mathematics, UC Irvine, Irvine CA 92697 April 23, 2010 In this lecture, we will answer the following questions:
More informationKernel Methods. Outline
Kernel Methods Quang Nguyen University of Pittsburgh CS 3750, Fall 2011 Outline Motivation Examples Kernels Definitions Kernel trick Basic properties Mercer condition Constructing feature space Hilbert
More informationON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS
Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)
More informationCOMMON COMPLEMENTS OF TWO SUBSPACES OF A HILBERT SPACE
COMMON COMPLEMENTS OF TWO SUBSPACES OF A HILBERT SPACE MICHAEL LAUZON AND SERGEI TREIL Abstract. In this paper we find a necessary and sufficient condition for two closed subspaces, X and Y, of a Hilbert
More informationMATH 220: INNER PRODUCT SPACES, SYMMETRIC OPERATORS, ORTHOGONALITY
MATH 22: INNER PRODUCT SPACES, SYMMETRIC OPERATORS, ORTHOGONALITY When discussing separation of variables, we noted that at the last step we need to express the inhomogeneous initial or boundary data as
More informationKernel methods and the exponential family
Kernel methods and the exponential family Stephane Canu a Alex Smola b a 1-PSI-FRE CNRS 645, INSA de Rouen, France, St Etienne du Rouvray, France b Statistical Machine Learning Program, National ICT Australia
More informationI teach myself... Hilbert spaces
I teach myself... Hilbert spaces by F.J.Sayas, for MATH 806 November 4, 2015 This document will be growing with the semester. Every in red is for you to justify. Even if we start with the basic definition
More informationMATH 5640: Fourier Series
MATH 564: Fourier Series Hung Phan, UMass Lowell September, 8 Power Series A power series in the variable x is a series of the form a + a x + a x + = where the coefficients a, a,... are real or complex
More informationApproximate Kernel PCA with Random Features
Approximate Kernel PCA with Random Features (Computational vs. Statistical Tradeoff) Bharath K. Sriperumbudur Department of Statistics, Pennsylvania State University Journées de Statistique Paris May 28,
More informationTopics in Fourier analysis - Lecture 2.
Topics in Fourier analysis - Lecture 2. Akos Magyar 1 Infinite Fourier series. In this section we develop the basic theory of Fourier series of periodic functions of one variable, but only to the extent
More information1 Continuity Classes C m (Ω)
0.1 Norms 0.1 Norms A norm on a linear space X is a function : X R with the properties: Positive Definite: x 0 x X (nonnegative) x = 0 x = 0 (strictly positive) λx = λ x x X, λ C(homogeneous) x + y x +
More informationStatistical learning on graphs
Statistical learning on graphs Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr ParisTech, Ecole des Mines de Paris Institut Curie INSERM U900 Seminar of probabilities, Institut Joseph Fourier, Grenoble,
More informationKernels MIT Course Notes
Kernels MIT 15.097 Course Notes Cynthia Rudin Credits: Bartlett, Schölkopf and Smola, Cristianini and Shawe-Taylor The kernel trick that I m going to show you applies much more broadly than SVM, but we
More informationSolutions: Problem Set 4 Math 201B, Winter 2007
Solutions: Problem Set 4 Math 2B, Winter 27 Problem. (a Define f : by { x /2 if < x
More informationUnsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto
Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian
More informationHilbert Spaces. Contents
Hilbert Spaces Contents 1 Introducing Hilbert Spaces 1 1.1 Basic definitions........................... 1 1.2 Results about norms and inner products.............. 3 1.3 Banach and Hilbert spaces......................
More informationChapter 4 Euclid Space
Chapter 4 Euclid Space Inner Product Spaces Definition.. Let V be a real vector space over IR. A real inner product on V is a real valued function on V V, denoted by (, ), which satisfies () (x, y) = (y,
More informationPreconditioning techniques in frame theory and probabilistic frames
Preconditioning techniques in frame theory and probabilistic frames Department of Mathematics & Norbert Wiener Center University of Maryland, College Park AMS Short Course on Finite Frame Theory: A Complete
More informationKernel B Splines and Interpolation
Kernel B Splines and Interpolation M. Bozzini, L. Lenarduzzi and R. Schaback February 6, 5 Abstract This paper applies divided differences to conditionally positive definite kernels in order to generate
More informationDeviation Measures and Normals of Convex Bodies
Beiträge zur Algebra und Geometrie Contributions to Algebra Geometry Volume 45 (2004), No. 1, 155-167. Deviation Measures Normals of Convex Bodies Dedicated to Professor August Florian on the occasion
More informationElementary linear algebra
Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The
More informationConnection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis
Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Alvina Goh Vision Reading Group 13 October 2005 Connection of Local Linear Embedding, ISOMAP, and Kernel Principal
More informationLEGENDRE POLYNOMIALS AND APPLICATIONS. We construct Legendre polynomials and apply them to solve Dirichlet problems in spherical coordinates.
LEGENDRE POLYNOMIALS AND APPLICATIONS We construct Legendre polynomials and apply them to solve Dirichlet problems in spherical coordinates.. Legendre equation: series solutions The Legendre equation is
More informationReal Analysis Qualifying Exam May 14th 2016
Real Analysis Qualifying Exam May 4th 26 Solve 8 out of 2 problems. () Prove the Banach contraction principle: Let T be a mapping from a complete metric space X into itself such that d(tx,ty) apple qd(x,
More informationNotes on Special Functions
Spring 25 1 Notes on Special Functions Francis J. Narcowich Department of Mathematics Texas A&M University College Station, TX 77843-3368 Introduction These notes are for our classes on special functions.
More informationOnline Gradient Descent Learning Algorithms
DISI, Genova, December 2006 Online Gradient Descent Learning Algorithms Yiming Ying (joint work with Massimiliano Pontil) Department of Computer Science, University College London Introduction Outline
More information1 Functional Analysis
1 Functional Analysis 1 1.1 Banach spaces Remark 1.1. In classical mechanics, the state of some physical system is characterized as a point x in phase space (generalized position and momentum coordinates).
More informationCIS 520: Machine Learning Oct 09, Kernel Methods
CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed
More informationFunctional Analysis I
Functional Analysis I Course Notes by Stefan Richter Transcribed and Annotated by Gregory Zitelli Polar Decomposition Definition. An operator W B(H) is called a partial isometry if W x = X for all x (ker
More informationBasic Properties of Metric and Normed Spaces
Basic Properties of Metric and Normed Spaces Computational and Metric Geometry Instructor: Yury Makarychev The second part of this course is about metric geometry. We will study metric spaces, low distortion
More informationBessel Functions Michael Taylor. Lecture Notes for Math 524
Bessel Functions Michael Taylor Lecture Notes for Math 54 Contents 1. Introduction. Conversion to first order systems 3. The Bessel functions J ν 4. The Bessel functions Y ν 5. Relations between J ν and
More informationx 3y 2z = 6 1.2) 2x 4y 3z = 8 3x + 6y + 8z = 5 x + 3y 2z + 5t = 4 1.5) 2x + 8y z + 9t = 9 3x + 5y 12z + 17t = 7
Linear Algebra and its Applications-Lab 1 1) Use Gaussian elimination to solve the following systems x 1 + x 2 2x 3 + 4x 4 = 5 1.1) 2x 1 + 2x 2 3x 3 + x 4 = 3 3x 1 + 3x 2 4x 3 2x 4 = 1 x + y + 2z = 4 1.4)
More informationSemi-Supervised Learning in Reproducing Kernel Hilbert Spaces Using Local Invariances
Semi-Supervised Learning in Reproducing Kernel Hilbert Spaces Using Local Invariances Wee Sun Lee,2, Xinhua Zhang,2, and Yee Whye Teh Department of Computer Science, National University of Singapore. 2
More informationIntroduction to Signal Spaces
Introduction to Signal Spaces Selin Aviyente Department of Electrical and Computer Engineering Michigan State University January 12, 2010 Motivation Outline 1 Motivation 2 Vector Space 3 Inner Product
More informationWaves on 2 and 3 dimensional domains
Chapter 14 Waves on 2 and 3 dimensional domains We now turn to the studying the initial boundary value problem for the wave equation in two and three dimensions. In this chapter we focus on the situation
More informationThe goal of this chapter is to study linear systems of ordinary differential equations: dt,..., dx ) T
1 1 Linear Systems The goal of this chapter is to study linear systems of ordinary differential equations: ẋ = Ax, x(0) = x 0, (1) where x R n, A is an n n matrix and ẋ = dx ( dt = dx1 dt,..., dx ) T n.
More informationAnalysis in weighted spaces : preliminary version
Analysis in weighted spaces : preliminary version Frank Pacard To cite this version: Frank Pacard. Analysis in weighted spaces : preliminary version. 3rd cycle. Téhéran (Iran, 2006, pp.75.
More informationNATIONAL BOARD FOR HIGHER MATHEMATICS. Research Awards Screening Test. February 25, Time Allowed: 90 Minutes Maximum Marks: 40
NATIONAL BOARD FOR HIGHER MATHEMATICS Research Awards Screening Test February 25, 2006 Time Allowed: 90 Minutes Maximum Marks: 40 Please read, carefully, the instructions on the following page before you
More informationAnalysis Preliminary Exam Workshop: Hilbert Spaces
Analysis Preliminary Exam Workshop: Hilbert Spaces 1. Hilbert spaces A Hilbert space H is a complete real or complex inner product space. Consider complex Hilbert spaces for definiteness. If (, ) : H H
More informationwhich arises when we compute the orthogonal projection of a vector y in a subspace with an orthogonal basis. Hence assume that P y = A ij = x j, x i
MODULE 6 Topics: Gram-Schmidt orthogonalization process We begin by observing that if the vectors {x j } N are mutually orthogonal in an inner product space V then they are necessarily linearly independent.
More informationOPERATOR THEORY ON HILBERT SPACE. Class notes. John Petrovic
OPERATOR THEORY ON HILBERT SPACE Class notes John Petrovic Contents Chapter 1. Hilbert space 1 1.1. Definition and Properties 1 1.2. Orthogonality 3 1.3. Subspaces 7 1.4. Weak topology 9 Chapter 2. Operators
More informationSolutions: Problem Set 3 Math 201B, Winter 2007
Solutions: Problem Set 3 Math 201B, Winter 2007 Problem 1. Prove that an infinite-dimensional Hilbert space is a separable metric space if and only if it has a countable orthonormal basis. Solution. If
More informationMATH & MATH FUNCTIONS OF A REAL VARIABLE EXERCISES FALL 2015 & SPRING Scientia Imperii Decus et Tutamen 1
MATH 5310.001 & MATH 5320.001 FUNCTIONS OF A REAL VARIABLE EXERCISES FALL 2015 & SPRING 2016 Scientia Imperii Decus et Tutamen 1 Robert R. Kallman University of North Texas Department of Mathematics 1155
More information