Universal Kernels for Multi-Task Learning


Journal of Machine Learning Research (XXXX) Submitted XX; Published XXX

Andrea Caponnetto (caponnet@uchicago.edu)
Department of Computer Science, The University of Chicago, 1100 East 58th Street, Chicago, IL 60637, USA, and DISI, Università di Genova, Via Dodecaneso 35, Genova, Italy

Charles A. Micchelli (cam@math.albany.edu)
Department of Mathematics and Statistics, State University of New York, The University at Albany, Albany, New York 12222, USA

Massimiliano Pontil (m.pontil@cs.ucl.ac.uk)
Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK

Yiming Ying (mathying@gmail.com)
Department of Engineering Mathematics, University of Bristol, Queen's Building, Bristol, BS8 1TR, UK

Abstract

In this paper we are concerned with reproducing kernel Hilbert spaces H_K of functions from an input space into a Hilbert space Y, an environment appropriate for multi-task learning. The reproducing kernel K associated to H_K has its values as operators on Y. Our primary goal here is to derive conditions which ensure that the kernel K is universal. This means that on every compact subset of the input space, every continuous function with values in Y can be uniformly approximated by sections of the kernel. We provide various characterizations of universal kernels and highlight them with several concrete examples of some practical importance. Our analysis uses basic principles of functional analysis and especially the useful notion of vector measures, which we describe in sufficient detail to clarify our results.

Keywords: multi-task learning, operator-valued kernels, universal approximation, vector-valued reproducing kernel Hilbert spaces.

© XXXX Andrea Caponnetto, Charles A. Micchelli, Massimiliano Pontil, and Yiming Ying.

1. Introduction

The problem of studying representations and methods for learning vector-valued functions has received increasing attention in Machine Learning in recent years. This problem is motivated by several applications in which it is required to estimate a vector-valued function from a set of input/output data. For example, one is frequently confronted with situations in which multiple supervised learning tasks must be learned simultaneously. This problem can be framed as that of learning a vector-valued function f = (f_1, f_2, ..., f_n), where each of its components is a real-valued function and corresponds to a particular task. Often, these tasks are dependent on each other in that they share some common underlying structure. By making use of this structure, each task is easier to learn. Empirical studies indicate that one can benefit significantly by learning the tasks simultaneously as opposed to learning them one by one in isolation, see e.g. (Evgeniou et al., 2005) and references therein.

In this paper, we build upon the recent work of Micchelli and Pontil (2005) concerning the study of reproducing kernel Hilbert spaces (RKHS) of vector-valued functions (see also Carmeli et al., 2006, Caponnetto and De Vito, 2006, Reisert and Burkhardt, 2007). These functions are defined on a Hausdorff space X and take values in a Hilbert space Y with inner product (·, ·) (not necessarily separable or finite dimensional). The kernel associated to an RKHS is a function K defined on X × X which takes values as operators on Y and satisfies certain properties which shall be reviewed in Section 2. The RKHS is formed by taking the closure of the linear span of kernel sections {K(·, x)y : x ∈ X, y ∈ Y}, relative to the RKHS norm. We refer to such a kernel as an operator-valued kernel, and as a matrix-valued kernel if Y is a finite dimensional Hilbert space. In (Micchelli and Pontil, 2005), it was emphasized that the algebra of operator-valued kernels is, in general, non-abelian. This implies that the structure of the RKHS is richer than that obtained by considering each coordinate of the vector-valued function to be in an RKHS of real-valued functions. The flexibility in designing operator-valued kernels makes it possible to model smoothness properties relative to both the input space X and the output space Y. For example, returning to the application to multi-task learning, in which Y = ℝⁿ, it is possible to choose the kernel so that certain relationships are modeled across the tasks.

In this paper, we study conditions on the kernel K which ensure that its associated RKHS is dense in the space of continuous functions. This problem was proposed by Poggio et al. (2002) and, in the particular scalar case (Y = ℝ), studied by Micchelli and Pontil (2004), Micchelli et al. (2006), Steinwart (2001), and Zhou (2003). The condition that the RKHS with kernel K is dense in C(Z, Y), for a compact subset Z of X, means that we can approximate arbitrarily well any continuous function with a function in the RKHS. That is, for every continuous function f : Z → Y and any ε > 0 there exists a function g ∈ span{K(·, x)y : x ∈ Z, y ∈ Y} such that

max{ ‖f(x) − g(x)‖ : x ∈ Z } ≤ ε.

We then say that the kernel K is a universal kernel if, for every compact subset Z of X, its associated RKHS is dense in the space of continuous functions on Z. Because of our discussion above, the density problem for an RKHS of vector-valued functions cannot be merely reduced to studying the density problem of each scalar component.
Hence, in this paper we aim to study a unified framework for this problem which includes the scalar case as a special case.
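To make the density requirement concrete, the following minimal numerical sketch (with an arbitrary choice of matrix-valued kernel, K(x, t) = e^{−30(x−t)²} I₂ on Y = ℝ², and a plain regularized least-squares fit, none of which is prescribed by the development above) approximates a continuous ℝ²-valued function on [0, 1] by a finite combination of kernel sections and reports the uniform error on a fine grid.

```python
import numpy as np

# Hedged illustration: approximate a continuous R^2-valued target on [0, 1] by
# sections K(., x_i) c_i of the (diagonal) matrix-valued kernel K(x, t) = g(x, t) I_2.
def g(x, t):
    return np.exp(-30.0 * (x - t) ** 2)        # scalar Gaussian kernel

def target(x):
    return np.array([np.sin(2 * np.pi * x), np.abs(x - 0.5)])

centers = np.linspace(0.0, 1.0, 25)            # anchor points x_i of the kernel sections
grid = np.linspace(0.0, 1.0, 400)              # compact set on which the sup norm is measured

# Regularized least-squares for the coefficients c_i in R^2 (kernel is g * I_2,
# so the two output components decouple and share the same scalar Gram matrix).
Gram = np.array([[g(a, b) for b in centers] for a in centers])
Y = np.array([target(a) for a in centers])
C = np.linalg.solve(Gram + 1e-8 * np.eye(len(centers)), Y)

approx = lambda x: sum(g(x, c) * C[i] for i, c in enumerate(centers))
sup_err = max(np.linalg.norm(target(x) - approx(x)) for x in grid)
print(f"uniform error with 25 kernel sections: {sup_err:.4f}")
```

Since the Gaussian scalar kernel is universal, the uniform error can be driven arbitrarily close to zero by adding centers; for a non-universal kernel the error would stall at a positive value for some continuous targets.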

The density problem is important both practically and theoretically. In practice, it is useful to know whether the kernel is dense in the space of continuous functions or in some linear subspace of it. The techniques that we present in this paper can be used to study the density problem relative to an arbitrary such subspace. From a theoretical point of view, the density problem is important in order to study the consistency of learning algorithms based on RKHS. For example, in (Caponnetto and De Vito, 2006) a detailed analysis of the regularized least-squares algorithm over vector-valued RKHS was developed, showing in particular that the density property of the RKHS implies universal consistency of the algorithm.

The paper is organized as follows. In Section 2, we review the basic definition and properties of operator-valued kernels and define the notion of universal kernel. Moreover, we describe some examples of such kernels. In Section 3, we introduce the notion of feature map associated to an operator-valued kernel and show its relevance to the density problem. The main result in this section is Theorem 5, which establishes that the closure of the RKHS in the space of continuous functions is the same as the closure of the space generated by the feature map. Our analysis, which uses basic principles of functional analysis, is useful because in many circumstances it may be easier to verify universality through the feature representation than through the kernel sections themselves. In Section 4 we provide an alternate proof of Theorem 5 which uses the notion of vector measures, and we derive Corollary 12, which establishes the main tool used to verify the universality of a kernel. Finally, in Section 5 we highlight our results with several concrete examples of some practical importance.

2. RKHS of Vector-Valued Functions

In this section, we review the theory of reproducing kernel Hilbert spaces of vector-valued functions as in (Micchelli and Pontil, 2005) and introduce the notion of universal kernels. We begin by introducing some notation. We let Y be a Hilbert space with inner product (·, ·)_Y (we drop the subscript Y when confusion does not arise). The vector-valued functions will take values in Y. We denote by L(Y) the space of all bounded linear operators from Y into itself, with the operator norm ‖L‖ := sup_{‖y‖=1} ‖Ly‖, and by L₊(Y) the set of all bounded, positive semi-definite linear operators, that is, L ∈ L₊(Y) provided that, for any y ∈ Y, (y, Ly) ≥ 0. We also denote, for any L ∈ L(Y), by L* its adjoint.

Definition 1. We say that a function K : X × X → L(Y) is an operator-valued kernel on X if K(x, t)* = K(t, x) for any x, t ∈ X, and it is positive semi-definite, that is, for any m ∈ ℕ, {x_j : j ∈ ℕ_m} ⊆ X and {y_j : j ∈ ℕ_m} ⊆ Y there holds

∑_{i,j ∈ ℕ_m} (y_i, K(x_i, x_j) y_j) ≥ 0.    (1)

For any t ∈ X and y ∈ Y, we introduce the mapping K_t y : X → Y defined, for every x ∈ X, by (K_t y)(x) := K(x, t)y. We denote by H_K the RKHS of vector-valued functions associated with the operator-valued kernel K. In the spirit of the Moore–Aronszajn theorem, H_K is the completion of the linear span of kernel sections {K_t y : t ∈ X, y ∈ Y} with inner product ⟨·, ·⟩_K induced by the equation

⟨K_t y, f⟩_K = (y, f(t)),    f ∈ H_K, t ∈ X, y ∈ Y.    (2)
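When Y = ℝⁿ, the symmetry requirement K(x, t)ᵀ = K(t, x) and condition (1) can be checked numerically on a finite sample by assembling the nm × nm block Gram matrix. The sketch below is our own illustration, using an arbitrary test kernel of the separable form discussed later in Section 5.1; it is not part of the analysis above.

```python
import numpy as np

def block_gram(K, xs, n):
    """Assemble the (m*n) x (m*n) matrix with n x n blocks K(x_i, x_j), as in condition (1)."""
    m = len(xs)
    G = np.zeros((m * n, m * n))
    for i, xi in enumerate(xs):
        for j, xj in enumerate(xs):
            G[i * n:(i + 1) * n, j * n:(j + 1) * n] = K(xi, xj)
    return G

def looks_like_operator_valued_kernel(K, xs, n, tol=1e-8):
    # Symmetry: K(x, t)^T == K(t, x) on all sampled pairs (real, finite-dimensional case).
    sym = all(np.allclose(K(x, t).T, K(t, x), atol=tol) for x in xs for t in xs)
    # Positive semi-definiteness of the block Gram matrix, i.e. condition (1).
    psd = np.linalg.eigvalsh(block_gram(K, xs, n)).min() >= -tol
    return sym and psd

# Illustrative choice: K(x, t) = exp(-(x - t)^2) * B with B positive semi-definite.
B = np.array([[2.0, 1.0], [1.0, 1.0]])
K = lambda x, t: np.exp(-(x - t) ** 2) * B
print(looks_like_operator_valued_kernel(K, np.random.default_rng(0).uniform(0, 1, 8), n=2))  # True
```

The check is only over a finite sample, so it can refute but never certify the kernel property; it is nonetheless a convenient sanity test when designing matrix-valued kernels.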

The following proposition collects some useful properties of the kernel which we shall use in our development.

Proposition 2. Let K be an operator-valued kernel and, for any x ∈ X, let K_x be defined as above. The following properties hold true for every f ∈ H_K and x, t ∈ X.

(a) ‖K(x, t)‖ ≤ ‖K(x, x)‖^{1/2} ‖K(t, t)‖^{1/2}.

(b) ‖f(x)‖ ≤ ‖f‖_K ‖K(x, x)‖^{1/2}.

(c) ‖f(x) − f(t)‖ ≤ ‖K(x, x) + K(t, t) − K(x, t) − K(t, x)‖^{1/2} ‖f‖_K.

Proof. For the proof of (a) and (b) see (Micchelli and Pontil, 2005, Sec. 2). As for the last property, we observe by equation (2) that

(y, f(x) − f(t)) = ⟨(K_x − K_t)y, f⟩_K ≤ ‖K_x y − K_t y‖_K ‖f‖_K.    (3)

Moreover, we have that

‖K_x y − K_t y‖²_K = ((K(x, x) + K(t, t) − K(x, t) − K(t, x))y, y) ≤ ‖K(x, x) + K(t, t) − K(x, t) − K(t, x)‖ ‖y‖².

Therefore, combining this estimate with equation (3) and setting y = (f(x) − f(t)) / ‖f(x) − f(t)‖, the result follows.

Throughout this paper, we assume that the kernel K is continuous relative to the operator norm and, for any compact set Z ⊆ X, we introduce the finite quantity κ(Z) := sup_{x ∈ Z} ‖K(x, x)‖^{1/2}. This hypothesis implies that any function f ∈ H_K is continuous, by property (c) in Proposition 2, and that it is uniformly bounded by κ(Z) ‖f‖_K on any compact set Z, by property (b) in Proposition 2.

We now return to the formulation of the density problem. For this purpose, we recall that C(Z, Y) is the Banach space of continuous Y-valued functions on a compact subset Z of X with the maximum norm, defined by ‖f‖_{∞,Z} := sup_{x ∈ Z} ‖f(x)‖_Y. We also define, for every operator-valued kernel K, the subspace of C(Z, Y)

C_K(Z, Y) := span{K_x y : x ∈ Z, y ∈ Y},

where the closure is relative to the norm in the space C(Z, Y).

Definition 3. We say that an operator-valued kernel K is a universal kernel if, for any compact subset Z of X, C_K(Z, Y) = C(Z, Y).

In the special case that Y = ℝⁿ, the kernel function K takes values as n × n matrices. The corresponding matrix elements can be identified by the formula

(K(x, t))_{pq} = ⟨K_x e_p, K_t e_q⟩_K,    x, t ∈ X,

where e_p, e_q are the standard coordinate bases in ℝⁿ, for p, q ∈ ℕ_n.

In order to describe some of the examples of operator-valued kernels below, it is useful to first present the following generalization of the Schur product of scalar kernels to matrix-valued kernels. For this purpose, for any i ∈ ℕ_m we let y_i = (y_{1i}, y_{2i}, ..., y_{ni}) ∈ ℝⁿ, so that equation (1) is equivalent to

∑_{i,j=1}^m ∑_{p,q=1}^n y_{pi} (K(x_i, x_j))_{pq} y_{qj} ≥ 0.    (4)

From the above observation, we conclude that K is a kernel if and only if ((K(x_i, x_j))_{pq}), viewed as a new matrix with row index (p, i) and column index (q, j), is positive semi-definite.

Proposition 4. Let G and K be n × n matrix-valued kernels and let the element-wise product kernel K ⊙ G : X × X → ℝ^{n×n} be defined, for any x, t ∈ X and p, q ∈ ℕ_n, by (K ⊙ G(x, t))_{pq} := (K(x, t))_{pq} (G(x, t))_{pq}. Then, K ⊙ G is a matrix-valued kernel.

Proof. We have to check the positive semi-definiteness of K ⊙ G. To see this, for any m ∈ ℕ, {y_i ∈ ℝⁿ : i ∈ ℕ_m} and {x_i ∈ X : i ∈ ℕ_m} we observe that

∑_{i,j=1}^m (y_i, K ⊙ G(x_i, x_j) y_j) = ∑_{p,i} ∑_{q,j} y_{pi} y_{qj} (K(x_i, x_j))_{pq} (G(x_i, x_j))_{pq}.    (5)

By (4), ((K(x_i, x_j))_{pq}) is positive semi-definite as a new matrix with (p, i) and (q, j) as row and column indices, and so is ((G(x_i, x_j))_{pq}). Applying the Schur Lemma (Aronszajn, 1950) to these new matrices implies that the quantity in equation (5) is nonnegative, and hence proves the assertion.

In the following, we present some examples of operator-valued kernels. They will be used in Section 5 to illustrate the general results in Sections 3 and 4. The first example is adapted from (Micchelli and Pontil, 2005).

Example 1. If, for every j ∈ ℕ_m, the function G_j : X × X → ℝ is a scalar kernel and B_j ∈ L₊(Y), then the function

K(x, t) = ∑_{j=1}^m B_j G_j(x, t),    x, t ∈ X,    (6)

is an operator-valued kernel.

The operators B_j model smoothness across the components of the vector-valued function. For example, in the context of multi-task learning (see e.g. Evgeniou et al., 2005, and references therein), we set Y = ℝⁿ, hence the B_j are n × n matrices.¹ These matrices model the relationships across the tasks. Evgeniou et al. (2005) considered kernels of the form (6) with m = 2, B_1 a multiple of the identity matrix and B_2 a low rank matrix.

1. In the context of multi-task learning the name multi-task kernel has been used in place of matrix-valued kernel (Evgeniou et al., 2005).
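As a numerical illustration of Example 1 (the scalar kernels and matrices below are arbitrary choices of ours, not ones singled out in the text), one can assemble a kernel of the form (6) and confirm condition (4) on a finite sample.

```python
import numpy as np

# Example 1, equation (6): K(x, t) = sum_j G_j(x, t) B_j  (illustrative choices below).
G1 = lambda x, t: float(x @ t)                 # linear scalar kernel
G2 = lambda x, t: float(x @ t) ** 2            # quadratic scalar kernel
B1 = np.eye(3)                                 # treats the three tasks symmetrically
B2 = np.array([[1.0, 0.9, 0.0],                # couples tasks 1 and 2 strongly
               [0.9, 1.0, 0.0],
               [0.0, 0.0, 1.0]])

def K(x, t):
    return G1(x, t) * B1 + G2(x, t) * B2

# Check positive semi-definiteness via the block Gram matrix, as in equation (4).
rng = np.random.default_rng(1)
xs = rng.normal(size=(6, 4))                   # six inputs in R^4
n = 3
Gram = np.zeros((len(xs) * n, len(xs) * n))
for i, xi in enumerate(xs):
    for j, xj in enumerate(xs):
        Gram[i * n:(i + 1) * n, j * n:(j + 1) * n] = K(xi, xj)
print(np.linalg.eigvalsh(Gram).min() >= -1e-8)  # True: K is a matrix-valued kernel
```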

A specific case, for X = ℝ^d, is

(K(x, t))_{pq} = λ x⊤t + (1 − λ) δ_{pq} (x⊤t)²,    p, q ∈ ℕ_n,

where x⊤t is the standard inner product in ℝ^d and λ ∈ [0, 1]. This kernel has an interesting interpretation. Using only the first term in the right hand side of the above equation (λ = 1) corresponds to learning all tasks as the same task, that is, all components of the vector-valued function f = (f_1, ..., f_n) are the same function, which will be a linear function since the kernel G_1 is linear. Whereas, using only the second term (λ = 0) corresponds to learning independent tasks, that is, the components of the function f will in general be different functions. These functions will be quadratic since G_2 is a quadratic polynomial kernel. Thus, the above kernel combines two opposite kernels to form a more flexible one. By choosing the parameter λ appropriately, the learning model can be tailored to the data at hand.

We note that if the kernel K is a diagonal matrix-valued function, each component of a vector-valued function in the RKHS of K can be represented, independently of the other components, as a function in the RKHS of a scalar kernel. However, in general, an operator-valued kernel will not be diagonal and, more importantly, will not be reduced to a diagonal one by linearly transforming the output space. For example, the kernel in equation (6) cannot be reduced to a diagonal kernel, unless the matrices B_j, j ∈ ℕ_m, can all be transformed into a diagonal matrix by the same transformation. Therefore, in general, the component functions share some underlying structure which is contained in the kernel and cannot be treated as independent objects. This fact is further illustrated by the next example.

Example 2. Let X_0 be a compact Hausdorff space, T_p be a map from X to X_0 (not necessarily linear) for p ∈ ℕ_n, and G : X_0 × X_0 → ℝ be a scalar kernel. Then,

K(x, t) := (G(T_p x, T_q t))_{p,q=1}^n,    x, t ∈ X,    (7)

is a matrix-valued kernel on X.

A specific instance of the above example is described by Vazquez and Walter (2003) in the context of system identification. It corresponds to the choices X_0 = X = ℝ and T_p(x) = x + τ_p, where τ_p ∈ ℝ. In this case, the kernel K models delays between the components of the vector-valued function. Indeed, it is easy to verify that, for this choice, for all f ∈ H_K and p ∈ ℕ_n,

f_p(x) := (f(x), e_p) = h(x − τ_p),    x ∈ X,

where h is a scalar-valued function in the RKHS of kernel G.

Other choices of the maps T_p are possible and provide interesting extensions of scalar kernels. For example, we make the choice K(x, t) := (e^{σ_{pq} ⟨x,t⟩} : p, q ∈ ℕ_n), where σ = (σ_{pq}) is a positive semi-definite matrix. In view of the eigenvalue decomposition of the matrix σ = ∑_{i=1}^n λ_i u_i u_i⊤, for each i, p ∈ ℕ_n we can define the map T_p^{(i)} by T_p^{(i)} x := √λ_i u_{ip} x for any x ∈ X. Therefore, we obtain that K(x, t) = (∏_{i=1}^n e^{⟨T_p^{(i)} x, T_q^{(i)} t⟩} : p, q ∈ ℕ_n) and, so, by Proposition 4, we conclude that K is a matrix-valued kernel.
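The delay construction in Example 2 can be sketched numerically as follows; the Gaussian scalar kernel G and the delays τ_p below are arbitrary illustrative choices of ours, not values taken from the text.

```python
import numpy as np

# Example 2 with X_0 = X = R, T_p(x) = x + tau_p and a Gaussian scalar kernel G.
taus = np.array([0.0, 0.4, 0.9])               # one delay per component/task
G = lambda u, v: np.exp(-(u - v) ** 2)

def K(x, t):
    # (K(x, t))_{pq} = G(T_p x, T_q t) = G(x + tau_p, t + tau_q), cf. equation (7)
    return np.array([[G(x + tp, t + tq) for tq in taus] for tp in taus])

# Block Gram check on a few sample points, i.e. conditions (1) and (4).
xs = np.linspace(-1.0, 1.0, 7)
Gram = np.block([[K(xi, xj) for xj in xs] for xi in xs])
print(Gram.shape, np.linalg.eigvalsh(Gram).min() >= -1e-8)   # (21, 21) True
```

The block Gram matrix here coincides with the scalar Gram matrix of G on the shifted points x_i + τ_p, which is why positive semi-definiteness is inherited from the scalar kernel.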

It is interesting to note, in passing, that, although one would expect the function

K(x, t) := (e^{−σ_{pq} ‖x − t‖²})_{p,q=1}^n,    x, t ∈ X,    (8)

to be a kernel over X = ℝ^d, we will show later in Section 5 that this is not true, unless all entries of the matrix σ are the same.

The next example is taken from (Lowitzsch, 2005) and was motivated by the construction of divergence-free vector-valued radial basis functions.

Example 3. We let Y = X = ℝⁿ and, for any x = (x_p : p ∈ ℕ_n) ∈ X, G(x) = exp(−‖x‖²/σ) with σ > 0. Then, the Hessian matrix given by

K(x, t) := (−(∂_p ∂_q G)(x − t) : p, q ∈ ℕ_n),    x, t ∈ X,    (9)

is a matrix-valued kernel.

To illustrate our final example, we let L²(ℝ) be the Hilbert space of square integrable functions on ℝ with the norm ‖h‖²_{L²} = ∫_ℝ h²(x) dx. Moreover, we denote by W¹(ℝ) the Sobolev space of order 1, which is defined as the space of real-valued functions h on ℝ whose norm ‖h‖_{W¹} = (‖h‖²_{L²} + ‖h′‖²_{L²})^{1/2} is finite.

Example 4. Let Y = L²(ℝ), X = ℝ and consider the linear space of functions from ℝ to Y which have finite norm

‖f‖² = ∫_ℝ (‖f(x, ·)‖²_{W¹} + ‖∂_x f(x, ·)‖²_{W¹}) dx.

Then this is an RKHS and the corresponding operator-valued kernel is given, for every x, t ∈ X, as

(K(x, t)y)(r) = e^{−π|x−t|} ∫_ℝ e^{−π|r−s|} y(s) ds,    y ∈ Y, r ∈ ℝ.

We note that the above example is of the same form as Example 1. More general examples for the choice Y = L²(ℝ^d) will be provided in Section 5. As an application of the above example, consider the case that the input variable x represents time and the output is the heat distribution in a medium. Another application, which generalizes the multi-task example above, is one in which the input represents a parameterized task (e.g. the profile identifying a customer) and the output is the regression function associated to that task. So, this example is amenable to learning a continuum of tasks.

We postpone the proofs of these examples and the discussion about the universality of the kernels described therein to Section 5.

3. Kernels by Feature Maps

In this section, we prove that an operator-valued kernel is universal if and only if its feature representation is universal. To explain what we have in mind, we require some additional notation. We let W be a Hilbert space and L(Y, W) be the set of all bounded linear operators from Y to W. A feature representation associated with an operator-valued kernel K is a continuous function

Φ : X → L(Y, W)

8 Caponnetto, Micchelli, Pontil, and Ying such that, for every x, t X K(x, t) = Φ (x)φ(t). (10) where, we recall, for each x X, Φ (x) is the adjoint of Φ(x) and, therefore, it is in L(W, Y). Hence, we call W the feature space later on. In the case that Y = R, the condition that Φ(x) L(Y, W) can be merely viewed as saying that Φ(x) in an element of W. Therefore, at least in this case we can rewrite equation (10) as K(x, t) = (Φ(x), Φ(t)) W. (11) Another example of practical importance corresponds to the choice W = R m and Y = R n, both finite dimensional Euclidean spaces. Here we can identify Φ(x) relative to standard basis of W and Y with the m n matrix Φ(x) = (Φ rp (x) : r N m, p N n ), where Φ rp are scalar-valued continuous functions on X. Therefore, according to (10) the matrix representation of the operator-valued kernel K is, for each x, t X, (K(x, t)) pq = r N m Φ rp (x)φ rq (t), p, q N n. (12) Returning to the general case, we emphasize that we assume that the kernel K has the representation in equation (10), although if it corresponds to a compact integral operator such a representation will follow from the spectral theorem and Mercer Theorem, see e.g. (Micchelli et al., 2006). Associated with a feature representation as described above is the following closed linear subspace of C(, Y) C Φ (, Y) := { Φ ( )w : w W }, where the closure is taken relative to the norm of C(, Y), and the continuity of the functions Φ ( )w follows from the assumed continuity of K(, ) by the inequality Φ (x)w Φ (t)w 2 Φ (x) Φ (t) 2 w 2 W = (Φ (x) Φ (t))(φ(x) Φ(t)) w 2 W = K(x, x) + K(t, t) K(x, t) K(t, x) w 2 W. The meaning of the phrase the feature representation is universal is that C Φ (, Y) = C(, Y) for every compact subset of the input space X. The theorem below demonstrates, as we mention above, that the kernel K is universal if and only if its feature representation is universal. Theorem 5. If K is a continuous operator-valued kernel with feature representation Φ, then for every compact subset of X, we have that C K (, Y) = C Φ (, Y). Proof. The theorem follows straightforwardly from Lemmas 6, 7 and 8. As we know, the feature representation of a given kernel is not unique, therefore we conclude by Theorem 5 that if some feature representation of an operator-valued kernel is universal then every feature representation is universal. We shall give two different proofs of this general theorem. The first one will use a technique highlighted in (Micchelli and Pontil, 2005) and will be given in this section. The 8

9 Universal operator-valued kernels second proof will be given in the next section and uses the notion of vector measure. Both approaches adopt the point of view of Micchelli et al. (2006), in which Theorem 5 is proved in the special case that Y = R. We now begin to explain in detail our first proof. We denote the unit ball in Y by B 1 := {y : y Y, y 1} and let be a prescribed compact subset of X. Recall that B 1 is not compact in the norm topology on Y unless Y is finite dimensional, but it is compact in the weak topology on Y since Y is a Hilbert space (see e.g. Yosida, 1980). Remember that a basis for the open neighborhood of the origin in the weak topology is a set of the form {y : y Y, (y, y i ) 1, i N m }, where y 1,..., y m are arbitrary vectors in Y. We put on B 1 the weak topology and conclude, by Tychonoff theorem (see e.g. Folland, 1999, p.136), that the set B 1 is also compact in the product topology. The above observation allows us to associate Y-valued functions defined on to scalarvalued functions defined on B 1. Specifically, we introduce the map γ : C(, Y) C( B 1 ) which maps any function f C(, Y) to the function γ(f) C( B 1 ) defined by the action (x, y) (f(x), y) Y. Consequently, it follows that the map γ is isometric, since sup x f(x) Y = sup sup x y 1 (f(x), y) Y = sup sup γ(f)(x, y), (13) x y B 1 where the first equality follows by Cauchy-Schwarz inequality. Moreover, we will denote by C γ ( B 1 ) the image of C(, Y) under the mapping γ. In particular, this space is a closed linear subspace of C( B 1 ) and, hence, a Banach space. Similarly, to any operator-valued kernel K on we associate a scalar kernel J on B 1 defined, for every (x, y), (t, u) X B 1, as J((x, y), (t, u)) := (K(x, t)u, y). (14) Moreover, we denote by C J ( B 1 ) the closure in C( B 1 ) of the set of the sections of the kernel, {J((x, y), (, )) : (x, y) B 1 }. It is important to realize that whenever K is a valid operator-valued kernel, then J is a valid scalar kernel. The lemma below relates the set C K (, Y) to the corresponding set C J ( B 1 ) for the kernel J on B 1. Lemma 6. If is a compact subset of X and K is a continuous operator-valued kernel then γ(c K (, Y)) = C J ( B 1 ). Proof. The assertion follows by equation (14) and the continuity of the map γ. In order to prove Theorem 5, we also need to provide a similar lemma for the set C Φ (, Y). Before we state the next lemma, we note that knowing the features of the operator-valued kernel K leads us to the features for the scalar-kernel J associated to K. Specifically, for every (x, y), (t, u) X B 1, we have that J((x, y), (t, u)) = (Ψ(x, y), Ψ(t, u)) W, (15) where the continuous function Ψ : X B 1 W is defined as Ψ(x, y) = Φ(x)y, x X, y B 1. 9

10 Caponnetto, Micchelli, Pontil, and Ying Thus, equation (15) parallels equation (11) except that X is replaced by X B 1. We also denote by C Ψ ( B 1 ) the closed linear subspace of C( B 1 ), { (Ψ( ), w) W : w W }. Lemma 7. If is a compact subset of X and K is a continuous operator-valued kernel with feature representation Φ then γ(c Φ (, Y)) = C Ψ ( B 1 ). Proof. The proof is immediate. Indeed, for each x X, w W, y Y, we have that (Φ (x)w, y) Y = (w, Φ(x)y) W = (Ψ(x, y), w) W. Aimed at proving the last result used in the proof of Theorem 5, we review some facts about signed measures and bounded linear functionals on continuous functions. Let Ω be any prescribed compact Hausdorff space and C(Ω) the space of all real-valued continuous functions with norm,ω. We also use the notation B(Ω) to denote the Borel σ-algebra on Ω. Now, we recall the description of the dual space of C(Ω). By the Riesz representation theorem the linear functional L in the dual space of C(Ω) is identified as a regular signed Borel measure ν on Ω (see e.g. Folland (1999)), that is, L(g) = g(x)dν(x), g C(Ω). Ω The variation of ν is given, for any E B(Ω), by { } ν (E) := sup j N ν(a j) : {A j : j N} pairwise disjoint and j N A j = E, and L = ν where ν = ν (Ω) and L is the operator norm of L defined by L = sup g,ω =1 L(g). Recall that a Borel measure ν is regular if, for any E B(X ), ν(e) = inf { ν(u) : E U, U open } = sup { ν(ū) : Ū E, Ū compact}. In particular, every finite Borel measure on Ω is regular, see Folland (1999, p.217). We denote by M(Ω) the space of all regular signeded measures on Ω with total variation norm. Proposition 8. For any compact set X, and any continuous operator-valued kernel K with feature representation Φ, we have that C Ψ ( B 1 ) = C J ( B 1 ). Proof. For any compact set X, recall that B 1 is compact if B 1 is endowed with the weak topology of Y. Hence, the result follows by applying (Micchelli et al., 2006, Theorem 4) to the scalar kernel J on the set B 1 with the feature representation given by equation (15). However, for the convenience of the reader we review the step of the argument used to prove that theorem. The basic idea is the observation that two closed subspaces of a Banach space are equal if and only if whenever a continuous linear functional vanishes on either one of the subspaces, it must also vanish on the other one. This is a consequence of the Hahn-Banach Theorem (see e.g. Lax, 2002). In the case at hand, we know by the Riesz Representation Theorem that all continuous linear functionals L on C( B 1 ) are given by a regular signed Borel measure ν, that is for every F C( B 1 ), we have that L(F ) = F (x, y)dν(x, y). B 1 10

11 Universal operator-valued kernels Now, suppose that L vanishes on C J ( B 1 ), then we conclude, by (15), that 0 = (Ψ(x, y), Ψ(t, u)) W dν(x, y)dν(t, u). B 1 However, we also have that B 1 B 1 (Ψ(x, y), Ψ(t, u)) W dν(x, y)dν(t, u) = Ψ(x, y)dν(x, y) B 1 B 1 and, so, we conclude that 2 W (16) B 1 Ψ(x, y)dν(x, y) = 0. (17) The proof of equation (16) and the interpretation of the W-valued integral appearing in equation (17) is explained in detail in Micchelli et al. (2006). So, we conclude that L vanishes on C Ψ ( B 1 ). Conversely, if L vanishes on C Ψ ( B 1 ) then, for any x, y B 1, we have that ( ) J((x, y), (t, u))dν(t, u) = Ψ(x, y), Ψ(t, u)ν(t, u) = 0 B 1 B 1 that is, L vanishes on C J ( B 1 ). 4. Characterizations of Universality In this section, we provide an alternate proof of Theorem 5. We feel that this task is worthwhile as it seems appropriate to work directly in the space C(, Y) rather than bypass it and retreat to more familiar space C( B 1 ). The basic principle is the same, namely, we know that two linear subspaces of a Banach space are equal as long as whenever a bounded linear functional vanishes on one of them, it vanishes on the other. Thus, to implement this principle we are led to consider the dual space of C(, Y). To this end, we introduce the notion of vector measures and rely upon the information about them from (Diestel and Uhl, Jr., 1977). Before we get to the definition of vector measures we mention in passing that the need to have a variable description of the dual space of C(, Y) arose also in the context of the feature space perspective for learning the kernel, see (Micchelli and Pontil, 2005). To introduce our first definition, recall that throughout this paper X denotes a Hausdorff space, X any compact subset of X and B() the Borel σ-algebra of. Definition 9. A map µ : B() Y is called a Borel vector measure if µ is countably additive, that is, µ( j=1 E j) = j=1 µ(e j) in the norm of Y, for all sequences {E j : j N} of pairwise disjoint sets in B() It is important to note that the definition of vector measures given in Diestel and Uhl, Jr. (1977) only requires it to be finitely additive. For our purpose here, we only use countably additive measures and thus do not require the general concept of Diestel and Uhl, Jr. (1977). 11

12 Caponnetto, Micchelli, Pontil, and Ying For any vector measure µ, the variation of µ is defined, for any E B(), by the equation { } µ (E) := sup j N µ(a j) : {A j : j N} pairwise disjoint and j N A j = E. In our terminology we conclude from (Diestel and Uhl, Jr., 1977, p.3) that µ is a vector measure if and only if the corresponding variation µ is a scalar measure as explained in Section 3. Whenever µ () <, we call µ a vector measure of bounded variation on. Moreover, we say that a Borel vector measure µ on is regular if its variation measure µ is regular as defined in Section 3. We denote by M(, Y) the Banach space of all vector measures with bounded variation and norm µ := µ (). For any scalar measure ν M( B 1 ), we define a Y-valued function on B(), by the equation µ(e) := ydν(x, y), E B(). (18) E B 1 Let us confirm that µ is indeed a vector measure. For this purpose, choose any sequence of pairwise disjoint subsets {E j : j N} B(), and observe that j N µ(e j) Y j N dν(x, y) Y ν ( B 1 ), E j B 1 which implies that µ () is finite and, hence, µ is a regular vector measure. This observation suggests that we define, for any f C(, Y), the integral of f relative to µ as (f(x), dµ(x)) := (f(x), y)dν(x, y). (19) B 1 By the standard techniques of measure theory, for any vector measure µ M(, Y) the integral on the right-hand side is well-defined. One of our goals below is to show that given any vector measure µ, there corresponds a scalar measure ν such that equation (19) still holds. Before doing so, let us point out a useful property about the integral appearing in the left hand side of equation (19). Specifically, for any y Y, we associate to any µ M(, Y), a scalar measure µ y defined, for any E B(), by the equation µ y (E) := (y, µ(e)). Therefore, we conclude, for any f C(), that (yf(x), dµ(x)) = f(x)dµ y (x). To prepare for our description of the dual space of C(, Y), we introduce, for each f C(, Y), a linear functional L defined by, L µ f := (f(x), dµ(x)). (20) Lemma 10. If µ M(, Y) then L µ C (, Y) and L µ = µ. 12

13 Universal operator-valued kernels Proof. By the definition of the integral appearing in the right-hand side of equation it follows (20) (see e.g. Diestel and Uhl, Jr., 1977), for any f C(, Y), that L µ f µ sup f(x) Y. (21) x Therefore, we obtain that L µ µ, and thus L µ C (, Y). To show that L = µ, it remains to establish that µ L µ. To this end, for any ε > 0 and, by the definition of µ, we conclude that there exist pairwise disjoint sets {A j : j N n } such that j N n A j and µ := µ () ε + j N n µ(a j ) Y. We introduce the function g = µ(a j N j ) n µ(a j ) Y χ Aj which satisfies, for any x, the bound g(x) Y 1. Since µ is a regular measure on, applying Lusin s theorem (Folland, 1999, p.217) to the function χ Aj, there exists a real-valued continuous function f j C() such ε that f j (x) 1 for any x and f j = χ Aj, except on a set E j with µ (E j ). (n+1)2 j+1 We now define a function h : Y Y by setting h(y) = y, if y Y 1 and h(y) = y y Y, if y Y 1, and introduce another function in C(, Y) given by f := µ(a j N j ) n µ(a j ) Y f j. Therefore, the function f = h f is in C(, Y) as well, because f C(, Y) and, for any y, z Y, h(y) h(z) Y 2 y z Y. Moreover, we observe, for any x ( ) j N c, n E j that f(x) = g(x) and, for any x, that f(x) Y 1. We are now ready to estimate the total variation of µ. First, observe that f(x) g(x) Y d µ (x) (n + 1) µ (E j ) ε, j N n and consequently we obtain the inequality µ j N n µ(a j ) Y + ε = (g(x), dµ(x)) + ε (f(x) g(x), dµ(x)) + (f(x), dµ(x)) + ε f(x) g(x) Y d µ (x) + L µ + ε 2ε + L µ. This finishes the proof of the lemma. We will use a vector-valued form of the Riesz representation theorem called Dinculeanu- Singer theorem, see e.g. (Diestel and Uhl, Jr., 1977, p.182). Lemma 11. For any compact set X, the map µ L µ is an isomorphism of Banach spaces from M(, Y) to C (, Y). Proof. For each µ M(X, Y), there exists an L µ C (, Y) given by equation (20). The isometry of the map µ L µ follows from Lemma 10. Therefore, it suffices to prove, for every L C (, Y), that there is a µ M(, Y) such that L µ = L. To this end, note that L γ 1 C γ( B 1 ) since γ is an isometric map from C(, Y) onto C γ ( B 1 ). Since C γ ( B 1 ) is a closed subspace of C( B 1 ), applying the Hahn-Banach extension theorem (see e.g. Folland, 1999, p.222) yields that, 13

14 Caponnetto, Micchelli, Pontil, and Ying for any L Cγ( B 1 ), there exists an extension functional L C ( B 1 ) such that L(F ) = L γ 1 (F ) for any F C γ ( B 1 ). Moreover, recalling that B 1 is compact if B 1 is equipped with the weak topology, by the Riesz representation theorem, for any L, there exists a scalar measure ν on B 1 such that L(F ) = F (x, y)dν(x, y), F C( B 1 ). B 1 Equivalently, for any f C(, Y) there holds Lf = L γ 1 (F ) = F (x, y)dν(x, y) = B 1 (f(x), dµ(x)) = L µ f, where µ is defined i terms of ν as in Equation (18). This finishes the identification between functionals in C(, Y) and vector measures with bounded variation. It is interesting to remark that, as the consequence of the proof of Lemma 11, for every regular vector measure there corresponds a scalar measure ν on B 1 for which equation (18) holds true. Indeed, for any µ M(, Y) we have established in the proof of Lemma 11 that there exists a regular scalar measure ν on B 1 such that L µ f = (f(x), y)dν(x, y). B 1 Since we established the isometry between C (, Y) and M(, Y), it follows that equation (18) holds true. In order to provide our alternate proof of Theorem 5, we need to attend to one further issue. Specifically, we need to define the integral K(t, x)(dµ(x)) as an element in Y. For this purpose, for any µ M(, Y) and t we define a linear functional L t on Y at y Y as L t y := (K(x, t)y, dµ(x)). (22) Since its norm has the property L t ( sup x K(x, t) ) µ, by the Riesz representation lemma, we conclude that there exists a unique element ȳ in Y such that (K(x, t)y, dµ(x)) = (ȳ, y). It is this vector ȳ which we denote by the integral K(t, x)(dµ(x)). Similarly, we define the integral Φ(x)(dµ(x)) as an element in W. To do this, we note that Φ(x) = Φ (x) and Φ (x)y 2 = K(x, x)y, y. Hence, we conclude that 14

15 Universal operator-valued kernels Φ(x) K(x, x) 1 2 κ(). Consequently, the linear functional L on W defined, for any w W, by ( ) L(w) := Φ (x)w, dµ(x) satisfies the inequality L κ() µ. Hence, we conclude that there exists a unique element w W such that L(w) = ( w, w) for any w W. Now, we denote w by Φ(x)(dµ(x)) which means that ( ) ( ) Φ(x)(dµ(x)), w = Φ (x)w, dµ(x). (23) W We have now assembled all the necessary properties of vector measures to provide an alternate proof of Theorem 5 in Section 3. Proof of Theorem 5. We see from the feature representation (10) that K(t, x)(dµ(x)) = Φ (t)φ(x)(dµ(x)) = Φ (t) ( Φ(x)(dµ(x)) ), t. From this equation, we easily see that if Φ(x)(dµ(x)) = 0 then, for every t, K(t, x)(dµ(x)) = 0. On the other hand, applying (23) with the choice w = Φ(x)(dµ(x)) we get ( Φ (t) Therefore, if, for any t, by equation (23), ) Φ(x)(dµ(x)), dµ(t) = K(t, x)(dµ(x)) = 0 then (Φ (x)w, dµ(x)) = 0, w W. Φ(x)(dµ(x)) 2. W Φ(x)(dµ(x)) = 0, or equivalently, Consequently, a linear functional vanishes on C K (, Y) if and only if it vanishes on C Φ (, Y) and thus, we obtained that C K (, Y) = C Φ (, Y). We end this section with a comment that will be useful to us in developing several examples of universal kernels in Section 5. This remark is a basic principle of function analysis which has guided our approach to the study of the problem addressed in this paper. To this end, we recall the notion of the annihilator of a set V which is defined by V := { µ M(, Y) : (v(x), dµ(x)) = 0, v V }. Notice that the annihilator of the closed linear span of V is the same as that of V. Consequently, by applying the basic principle stated at the beginning of this section, we conclude that the linear span of V is dense in C(, Y), that is, span(v) = C(, Y) if and only if the annihilator V = {0}. Hence, applying this observation to the set of kernel sections K() := {K(, x)y : x, y Y} or to the set of its corresponding feature sections Φ() := {Φ ( )w : w W}, we obtain from Lemma 11 and Theorem 5, the following useful fact. 15

16 Caponnetto, Micchelli, Pontil, and Ying Corollary 12. Let be a compact subset of X, K a continuous operator-valued kernel, and Φ its feature representation. Then, C K (, Y) = C(, Y) if and only if either one of the following statements is valid. (1) K() = (2) Φ() = { µ M(, Y) : { µ M(, Y) : K(t, x)(dµ(x)) = 0, } t = { 0 } ; } Φ(x)(dµ(x)) = 0 = { 0 }. 5. Universal Kernels In this section, we prove the universality of some kernels, based on the theorems developed above. Specifically, the examples highlighted in Section 2 will be discussed in detail. 5.1 Product of Scalar Kernels and Operators Our first example is produced by coupling a scalar kernel with an operator in L + (Y). Given a scalar kernel G on X and an operator B L + (Y), we define the function K : X X L(Y) by K(x, t) = G(x, t)b, x, t X. (24) For any {x j X : j N m } and {y j Y : j N m }, we know that (G(x i, x j )) i,j N m and ((By i, y j )) i,j N m are positive semi-definite. Applying Schur s lemma implies that the matrix (G(x i, x j )(By i, y j )) i,j N m is positive semi-definite and hence, K is positive semidefinite. Moreover, K (x, t) = K(x, t) = K(t, x) for any x, t X. Therefore, we conclude by Definition 1 that K is an operator-valued kernel. Our goal below is to use the feature representation of the scalar kernel G to introduce the corresponding one of kernel K. To this end, we first let W be a Hilbert space and φ : X W a feature map of the scalar kernel G, so that G(x, t) = (φ(x), φ(t)) W, x, t X. Then, we introduce the tensor vector space W Y whose elements are tensors w y with w W and y Y satisfying, for any w 1, w 2 W, y 1, y 2 Y and c R, the multi-linear relation cw y = w cy = c(w y), (w 1 + w 2 ) y = w 1 y + w 2 y, and w (y 1 + y 2 ) = w y 1 + w y 2. This vector space W Y becomes a Hilbert space provided that it is endowed with inner product norm defined, for any w 1 y 1, w 2 y 2 W Y, by w 1 y 1, w 2 y 2 = (w 1, w 2 ) W (y 1, y 2 ) Y. (25) The above tensor product suggests us to define the map Φ : X L(Y, W Y) of kernel K by Φ(x)y := φ(x) By, x X, y Y, 16

17 Universal operator-valued kernels and it follows that Φ : X L(W Y, Y) is given by Φ (x)(w y) := (φ(x), w) W By, x X, w W, and y Y. (26) From the above observation, it is easy to check, for any x, t X and y, u Y, that (K(x, t)y, u) = Φ(x)y, Φ(t)u. Therefore, we conclude that Φ is a feature map for the operator-valued kernel K. Finally, we say that an operator B L + (Y) is positive definite if (By, y) is positive whenever y is nonzero. We are now ready to present the result on universality of kernel K. Theorem 13. Let G : X X R be a continuous scalar kernel, B L + (Y) and K be defined by equation (24). Then, K is an operator-valued universal kernel if and only if the scalar kernel G is universal and the operator B is positive definite. Proof. By Corollary 12 and the feature representation (26), we only need to show that Φ() = {0} if and only if G is universal and the operator B is positive definite. We begin with the sufficiency. Suppose that there exists a nonzero vector measure µ such that, for any w y W Y, there holds (Φ (x)(w y), dµ(x)) = (φ(x), w) W ( By, dµ(x)) = 0. (27) Here, with a little abuse of notation we interpret, for a fixed y Y, ( By, dµ(x)) as a scalar measure defined, for any E B(), by ( By, dµ(x)) = ( By, µ(e)). E Since µ M(, Y), ( By, dµ(x)) is a regular signed scalar measure. Therefore, we see from (27) that ( By, dµ(x)) φ() for any y Y. Remember that G is universal if and only if φ() = {0}, and thus we conclude from (27) that ( By, dµ(x)) = 0 for any y Y. It follows that ( By, µ(e)) = 0 for any y Y and E B(). Thus, for any fixed set E taking the choice y = Bµ(E) implies that (Bµ(E), µ(e)) = 0. Since E is arbitrary, this means µ = 0 and thus finishes the proof for the sufficiency. To prove the necessity, suppose first that G is not universal and hence, we know that, for some compact subset of X, there exists a nonzero scalar measure ν M() such that ν φ(), that is, (φ(x), w)dν(x) = 0 for any w W. This suggests us to choose, for a nonzero y 0 Y, the nonzero vector measure µ = y 0 ν defined by µ(e) := y 0 ν(e) for any E B(). Hence, the integral in equation (27) equals (Φ (x)(w y), dµ(x)) = ( By, y 0 ) (φ(x), w) W dν(x) = 0. Therefore, we conclude that there exists a nonzero vector measure µ Φ(), which implies that K is not universal. If B is not positive definite, namely, there exists a nonzero element y 1 Y such that (By 1, y 1 ) = 0. However, we observe that By 1 2 = (By 1, y 1 ) which implies that By 1 = 17

18 Caponnetto, Micchelli, Pontil, and Ying 0. This suggests us to choose a nonzero vector measure µ = y 1 ν with some nonzero scalar measure ν. Therefore, we conclude, for any w W and y Y, that (Φ (x)(w y), dµ(x)) = ( By, y 1 ) (φ(x), w) W dν(x) = (y, By 1 ) (φ(x), w) W dν(x) = 0, which implies that the nonzero vector measure µ Φ(). This finishes the proof of the theorem. In the special case Y = R n, the operator B is an n n positive semi-definite matrix. Then, Theorem 13 tells us that the matrix-valued kernel K(x, t) := G(x, t)b is universal if and only if G is universal and the matrix B is of full rank. We proceed to consider kernels produced by a finite combination of scalar kernels and operators. Specifically, we consider, for any j N m, that G j : X X R be a scalar kernel and B j L + (Y). We are interested in the kernel defined, for any x, t X, by K(x, t) := j N m G j (x, t)b j. (28) Suppose also, for each scalar kernel G j, that there exists feature Hilbert spaces W j and feature maps φ j : X W j. To explain the associated feature maps of kernel K, we need to define a Hilbert feature space. For this purpose, let H j be a Hilbert space with inner products (, ) j for j N m and we introduce the direct sum Hilbert space j N m H j as follows. The elements in this space are of the form (h 1,..., h m ) with h j H j, and its inner product is defined, for any (h 1,..., h m ), (h 1,..., h m) j N m H j, by (h 1,..., h m ), (h 1,..., h m) := j N m (h j, h j) j. This observation suggests us to define the feature space of kernel K by the direct sum Hilbert space W := j N m (W j Y), and its the map Φ : X L(Y, W), for any x X and y Y, by Φ(x)y := (φ 1 (x) B 1 y,..., φ m (x) B m y). (29) Hence, its adjoint operator Φ : X L( j N m (W j Y), Y) is given, for any (w 1 y 1,..., w m y m ) j N m (W j Y), by Φ (x)((w 1 y 1,..., w m y m )) := (φ j (x), w j ) Wj Bj y j. j N m Using the above observation, it is easy to see that, for any x, t X, K(x, t) = Φ (x)φ(t). Thus K is an operator-valued kernel and Φ is a feature map of K. We are now in a position to state the result about the universality of the kernel K. 18

19 Universal operator-valued kernels Theorem 14. Suppose that G j : X X R is a continuous scalar universal kernel, and B j L + (Y) for j N m. Then, K(x, t) := j N m G j (x, t)b j is universal if and only if j N m B j is positive definite. Proof. We only need to prove that Φ() = {0} for any compact set if and only if j N m B j is positive definite. To see this let us notice that, by the definition of Φ and Φ() associated to kernel K, µ Φ() implies, for any (w 1 y 1,..., w m y m ) j N m (W j Y), that (φ j (x), w j ) Wj ( Bj y j, dµ(x)) = 0. j N m Since w j W j is arbitrary, the above equation is equivalent to (φ j (x), w j ) Wj ( B j y j, dµ(x)) = 0, w j W j, y j Y and j N m, (30) which implies that ( B j y j, dµ(x)) φ() for any j N m. Recall that G j is universal if and only if φ() = {0}. Therefore, equation (30) holds true if and only if (µ(e), B j y j ) = ( B j µ(e), y j ) = 0, E B(), y j Y. (31) To move on to the next step, we will show that equation (31) is true if and only if (µ(e), B j µ(e)) = 0, E B(). (32) To see this, we observe, for any j N m, that B j µ(e) 2 = (µ(e), B j µ(e)). Hence, equation (32) implies equation (31). Conversely, applying equation (31) with the choice y j = µ(e) directly yields equation (32). However, we know, for any y Y and j N m, that (B j y, y) is nonnegative. Therefore, equation (32) is equivalent to that ( ) ( B j )µ(e), µ(e) = 0, E B(). (33) j N m Therefore, we conclude that µ Φ() if and only if the above equation holds true. Obviously, by equation (33), we see that if j N m B j is positive definite then µ = 0. This means that kernel K is universal. Suppose that j N m B j is not positive definite, that is, there exists a nonzero y 0 Y such that ( ) 1 j N m B 2 j y 0 2 := ( ( ) j N m B j )y 0, y 0 = 0. Hence, choosing a nonzero vector measure µ := y 0 ν, with ν any nonzero scalar measure, implies that equation (33) holds true and, thus kernel K is not universal. This finishes the proof of the theorem. We are in a position to analyze Examples 1 and 4 given in the Section 2. Example 1 (continued) Since the function K considered in Example 1 has the feature representation (29), it is an operator-valued kernel. Moreover, necessary and sufficient conditions for the universality of K are given in Theorem 14 19
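Concretely, when Y = ℝⁿ and the scalar kernels G_j are universal (for instance Gaussian kernels), the criterion of Theorem 14, and Theorem 13 as the m = 1 case, reduces to a finite-dimensional eigenvalue test on ∑_j B_j. The sketch below uses arbitrary illustrative matrices of our own choosing to make this explicit.

```python
import numpy as np

def is_universal_combination(Bs, tol=1e-12):
    """Theorem 14 criterion for Y = R^n and universal scalar kernels G_j:
    K = sum_j G_j B_j is universal iff sum_j B_j is positive definite."""
    S = sum(Bs)
    return np.linalg.eigvalsh(S).min() > tol

# Each B_j below is only positive semi-definite (rank one), yet their sum is positive
# definite, so the combined kernel is universal even though no single summand G_j B_j is.
B1 = np.outer([1.0, 0.0], [1.0, 0.0])
B2 = np.outer([1.0, 1.0], [1.0, 1.0])
print(is_universal_combination([B1, B2]))      # True: B1 + B2 = [[2, 1], [1, 1]] has full rank
print(is_universal_combination([B1]))          # False: Theorem 13 requires B itself to be full rank
```

The test only involves the output-space operators; universality of the scalar factors G_j has to be established separately, e.g. by the scalar results of Micchelli et al. (2006) or Steinwart (2001).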

20 Caponnetto, Micchelli, Pontil, and Ying We now discuss a class of kernel which include that in Example 4. To this end, we use the notation + = {0} N and, for any smooth function f : R m R and index α = (α 1,..., α m ) m +, we denote its α partial derivative by ( α f)(x) := α k αf(x) α 1 x 1... α mx m. Then, recall (Stein, 1976) that the Sobolev space W k with integer order k is the space of real valued functions with norm defined by f 2 W := k R α f(x) 2 dx. (34) m This space can be extended to any fractional index s > 0. To see this, we need the Fourier transform defined for any f L 1 (R m ) as ˆf(ξ) := e R 2π x,ξ f(x)dx, ξ R m. m It has a natural extension to L 2 (R m ) satisfying the Plancherel formula f L 2 (R m ) = ˆf L 2 (R m ). In particular, we observe, for any α d and ξ R m, that α f(ξ) = (2πiξ) α ˆf(ξ). Hence, by Plancherel formula, we see, for any f W k with k N, that its norm f W k is equivalent to ( (1 + 4π ξ R 2 ) k ˆf(ξ) 1 dξ) 2 2. m This observation suggests us to introduce fractional Sobolev space W s (see e.g. Stein (1976)) with any order s > 0 with norm defined, for any real function f, by f 2 W s := R m (1 + 4π 2 ξ 2 ) s ˆf(ξ) 2 dξ. Finally, we need the Sobolev embedding lemma (see e.g. Folland (1999), Stein (1976)) which states that, for any s > m 2, there exists an absolute constant c such that, for any f W s and any x R m, there holds f(x) c f W s. The next result applies to Example 4 in the high dimensional case. Proposition 15. Let Y = L 2 (R d ), X = R and let H be the space of real-valued functions with norm f 2 := f(x, ) 2 + W d+1 f 2 x (x, ) 2 dx. W d+1 2 R Then this is an RKHS and the corresponding operator-valued kernel is given, every x, t X as (K(x, t)y)(r) = e π x t R d e π r s y(s)ds, y Y, r R d (35) is a universal operator-valued kernel. Proof. By applying Sobolev embedding theorem there exists a constant c such that, for any y B 1 and f H, (y, f(x)) Y c f. 20

21 Universal operator-valued kernels Hence, by the Riesz representation lemma there exists a reproducing kernel that H is an RKHS (Micchelli and Pontil, 2005). In the following, our target is to show that K given by equation (35) is the kernel associated to H. To see this, it suffices to show that the reproducing property holds, that is, for any f H, y Y and x X (y, f(x)) Y = K x y, f, (36) where, denotes the inner product in H. By the Plancherel formula, we observe that the left-hand side of equation (36) equals R R ŷ(τ)e 2πi x,ξ ˆf(ξ, τ)dξdτ. d On the other hand, note that K x y(x )( ) := (K(x, x)y)( ) Y, and we denote its Fourier transform by K(, x)y)( )(ξ, τ) = R d R,ξ e 2πi x e 2πi r,τ (K(x, x)y)(r)drdx which by Plancherel formula again, is equal to e 2πi x,ξ (1 + 4π 2 ξ 2 ) d+1 2 ŷ(τ) (1 + 4π 2 τ 2 ). (37) However, the right-hand side of equation (36) is identical to R R K(, x)y)( )(ξ, τ) ˆf(ξ, τ)(1 + 4π 2 ξ 2 ) d+1 2 (1 + 4π 2 τ 2 ). d Putting (37) into the above equation, we immediately know that the reproducing property (36) holds true. This verifies that K is the kernel of the Hilbert space H. To prove the universality of this kernel, let be any prescribed compact subset of X, we define the Poisson kernel, for any x, t R, by G(x, t) := e x t and the operator B : L 2 (R d ) L 2 (R d ) by Bg(r) := e R σ r s g(s)ds, r R d. d Then, K(x, t) = e σ x t B. Note that Bg( )(ξ) = c d ĝ(ξ) (1 + 4π 2 ξ 2 ) d+1 2 Hence, by Theorem 13 it suffices to prove that G is universal and B is positive definite. The universality of G follows from Steinwart (2001). To show the positive definiteness of B, applying the Plancherel formula implies that the equation (Bg, g) = 0 is equivalent to ĝ(ξ) 2 c d = 0. R d (1 + 4π 2 ξ 2 ) d

22 Caponnetto, Micchelli, Pontil, and Ying Therefore, ĝ = 0 and, so, g = 0 almost everywhere, which means that B is positive definite. Hence, we conclude by Theorem 13, that K is universal. Let us proceed to discuss continuous parameterized kernels. Let Σ be a locally compact Hausdorff space and, for any σ Σ, B(ω) is an n n positive semi-definite matrix. We are interested in the following kernel K(x, t) = G(σ)(x, t)b(σ)dp(σ), x, t X. (38) Σ To investigate this kind of kernels, for any σ Σ we assume that G(σ) be a scalar kernel with feature representation given by, for any x, t X, G(x, t) = φ σ (x), φ g s(t) W. We introduce the Hilbert space W = L 2 (Σ, W Y, p) with norm, for any f : Σ W Y, defined by f 2 W := f(σ) 2 W Ydp(σ). Σ Next, we define a map Φ : X L(Y, L 2 (Σ, W Y, p)), for any x X and σ Σ, by Φ(x)y(σ) := φ σ (x) ( B(σ)y). By the exact argument as that before Theorem 14, we know that K is a kernel and has the feature map Φ with feature space W. We are ready to present a sufficient condition on the universality of K. Theorem 16. Let G(σ) be a continuous universal kernel and B(σ) be positive definite in the support of the probability measure p. Suppose K be defined by equation (38). Then, the operator-valued kernel K is universal. Proof. Notice the feature W = L 2 (Σ, W Y, p). By Corollary 12, for any compact set X we need to check the equation φ σ (x) B(σ)(dµ(x)) = 0 which the above equation equals zero should be interpreted as φ σ (x) B(σ)(dµ(x)) 2 dp(σ) = 0 W support(p) Therefore, there exists a σ 0 support(p) such that φ σ (x) B(σ)(dµ(x)) = 0. Equivalently, φ σ0 (x)( B(σ 0 )dµ(x), y) = 0 for any y Y. But G(σ 0 ) is universal, ap- pealing to the feature characterization in the scalar case implies that the scalar measure ( B(σ 0 )dµ(x), y) = 0. Since y Y is arbitrary, we have that µ 0 which completes the proof of this theorem. An example in the continuously parameterized scenario can be described as below. 22


More information

The Learning Problem and Regularization Class 03, 11 February 2004 Tomaso Poggio and Sayan Mukherjee

The Learning Problem and Regularization Class 03, 11 February 2004 Tomaso Poggio and Sayan Mukherjee The Learning Problem and Regularization 9.520 Class 03, 11 February 2004 Tomaso Poggio and Sayan Mukherjee About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing

More information

Reproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto

Reproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto Reproducing Kernel Hilbert Spaces 9.520 Class 03, 15 February 2006 Andrea Caponnetto About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert

More information

Bounded and continuous functions on a locally compact Hausdorff space and dual spaces

Bounded and continuous functions on a locally compact Hausdorff space and dual spaces Chapter 6 Bounded and continuous functions on a locally compact Hausdorff space and dual spaces Recall that the dual space of a normed linear space is a Banach space, and the dual space of L p is L q where

More information

Complete Nevanlinna-Pick Kernels

Complete Nevanlinna-Pick Kernels Complete Nevanlinna-Pick Kernels Jim Agler John E. McCarthy University of California at San Diego, La Jolla California 92093 Washington University, St. Louis, Missouri 63130 Abstract We give a new treatment

More information

A Concise Course on Stochastic Partial Differential Equations

A Concise Course on Stochastic Partial Differential Equations A Concise Course on Stochastic Partial Differential Equations Michael Röckner Reference: C. Prevot, M. Röckner: Springer LN in Math. 1905, Berlin (2007) And see the references therein for the original

More information

Topological vectorspaces

Topological vectorspaces (July 25, 2011) Topological vectorspaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ Natural non-fréchet spaces Topological vector spaces Quotients and linear maps More topological

More information

Spectral Theory, with an Introduction to Operator Means. William L. Green

Spectral Theory, with an Introduction to Operator Means. William L. Green Spectral Theory, with an Introduction to Operator Means William L. Green January 30, 2008 Contents Introduction............................... 1 Hilbert Space.............................. 4 Linear Maps

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

Functional Analysis I

Functional Analysis I Functional Analysis I Course Notes by Stefan Richter Transcribed and Annotated by Gregory Zitelli Polar Decomposition Definition. An operator W B(H) is called a partial isometry if W x = X for all x (ker

More information

4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan

4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan Wir müssen wissen, wir werden wissen. David Hilbert We now continue to study a special class of Banach spaces,

More information

3. (a) What is a simple function? What is an integrable function? How is f dµ defined? Define it first

3. (a) What is a simple function? What is an integrable function? How is f dµ defined? Define it first Math 632/6321: Theory of Functions of a Real Variable Sample Preinary Exam Questions 1. Let (, M, µ) be a measure space. (a) Prove that if µ() < and if 1 p < q

More information

FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES. 1. Compact Sets

FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES. 1. Compact Sets FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES CHRISTOPHER HEIL 1. Compact Sets Definition 1.1 (Compact and Totally Bounded Sets). Let X be a metric space, and let E X be

More information

1. Subspaces A subset M of Hilbert space H is a subspace of it is closed under the operation of forming linear combinations;i.e.,

1. Subspaces A subset M of Hilbert space H is a subspace of it is closed under the operation of forming linear combinations;i.e., Abstract Hilbert Space Results We have learned a little about the Hilbert spaces L U and and we have at least defined H 1 U and the scale of Hilbert spaces H p U. Now we are going to develop additional

More information

Fourier Transform & Sobolev Spaces

Fourier Transform & Sobolev Spaces Fourier Transform & Sobolev Spaces Michael Reiter, Arthur Schuster Summer Term 2008 Abstract We introduce the concept of weak derivative that allows us to define new interesting Hilbert spaces the Sobolev

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction Functional analysis can be seen as a natural extension of the real analysis to more general spaces. As an example we can think at the Heine - Borel theorem (closed and bounded is

More information

Real Analysis Notes. Thomas Goller

Real Analysis Notes. Thomas Goller Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................

More information

Stone-Čech compactification of Tychonoff spaces

Stone-Čech compactification of Tychonoff spaces The Stone-Čech compactification of Tychonoff spaces Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto June 27, 2014 1 Completely regular spaces and Tychonoff spaces A topological

More information

Analysis Preliminary Exam Workshop: Hilbert Spaces

Analysis Preliminary Exam Workshop: Hilbert Spaces Analysis Preliminary Exam Workshop: Hilbert Spaces 1. Hilbert spaces A Hilbert space H is a complete real or complex inner product space. Consider complex Hilbert spaces for definiteness. If (, ) : H H

More information

Problem Set 6: Solutions Math 201A: Fall a n x n,

Problem Set 6: Solutions Math 201A: Fall a n x n, Problem Set 6: Solutions Math 201A: Fall 2016 Problem 1. Is (x n ) n=0 a Schauder basis of C([0, 1])? No. If f(x) = a n x n, n=0 where the series converges uniformly on [0, 1], then f has a power series

More information

Mathematical Methods for Data Analysis

Mathematical Methods for Data Analysis Mathematical Methods for Data Analysis Massimiliano Pontil Istituto Italiano di Tecnologia and Department of Computer Science University College London Massimiliano Pontil Mathematical Methods for Data

More information

1 Inner Product Space

1 Inner Product Space Ch - Hilbert Space 1 4 Hilbert Space 1 Inner Product Space Let E be a complex vector space, a mapping (, ) : E E C is called an inner product on E if i) (x, x) 0 x E and (x, x) = 0 if and only if x = 0;

More information

SPECTRAL THEORY EVAN JENKINS

SPECTRAL THEORY EVAN JENKINS SPECTRAL THEORY EVAN JENKINS Abstract. These are notes from two lectures given in MATH 27200, Basic Functional Analysis, at the University of Chicago in March 2010. The proof of the spectral theorem for

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

ON MATRIX VALUED SQUARE INTEGRABLE POSITIVE DEFINITE FUNCTIONS

ON MATRIX VALUED SQUARE INTEGRABLE POSITIVE DEFINITE FUNCTIONS 1 2 3 ON MATRIX VALUED SQUARE INTERABLE POSITIVE DEFINITE FUNCTIONS HONYU HE Abstract. In this paper, we study matrix valued positive definite functions on a unimodular group. We generalize two important

More information

CHAPTER X THE SPECTRAL THEOREM OF GELFAND

CHAPTER X THE SPECTRAL THEOREM OF GELFAND CHAPTER X THE SPECTRAL THEOREM OF GELFAND DEFINITION A Banach algebra is a complex Banach space A on which there is defined an associative multiplication for which: (1) x (y + z) = x y + x z and (y + z)

More information

CHAPTER I THE RIESZ REPRESENTATION THEOREM

CHAPTER I THE RIESZ REPRESENTATION THEOREM CHAPTER I THE RIESZ REPRESENTATION THEOREM We begin our study by identifying certain special kinds of linear functionals on certain special vector spaces of functions. We describe these linear functionals

More information

LECTURE 1: SOURCES OF ERRORS MATHEMATICAL TOOLS A PRIORI ERROR ESTIMATES. Sergey Korotov,

LECTURE 1: SOURCES OF ERRORS MATHEMATICAL TOOLS A PRIORI ERROR ESTIMATES. Sergey Korotov, LECTURE 1: SOURCES OF ERRORS MATHEMATICAL TOOLS A PRIORI ERROR ESTIMATES Sergey Korotov, Institute of Mathematics Helsinki University of Technology, Finland Academy of Finland 1 Main Problem in Mathematical

More information

FRAMES AND TIME-FREQUENCY ANALYSIS

FRAMES AND TIME-FREQUENCY ANALYSIS FRAMES AND TIME-FREQUENCY ANALYSIS LECTURE 5: MODULATION SPACES AND APPLICATIONS Christopher Heil Georgia Tech heil@math.gatech.edu http://www.math.gatech.edu/ heil READING For background on Banach spaces,

More information

Your first day at work MATH 806 (Fall 2015)

Your first day at work MATH 806 (Fall 2015) Your first day at work MATH 806 (Fall 2015) 1. Let X be a set (with no particular algebraic structure). A function d : X X R is called a metric on X (and then X is called a metric space) when d satisfies

More information

Reproducing Kernel Hilbert Spaces

Reproducing Kernel Hilbert Spaces 9.520: Statistical Learning Theory and Applications February 10th, 2010 Reproducing Kernel Hilbert Spaces Lecturer: Lorenzo Rosasco Scribe: Greg Durrett 1 Introduction In the previous two lectures, we

More information

A Brief Introduction to Functional Analysis

A Brief Introduction to Functional Analysis A Brief Introduction to Functional Analysis Sungwook Lee Department of Mathematics University of Southern Mississippi sunglee@usm.edu July 5, 2007 Definition 1. An algebra A is a vector space over C with

More information

THEOREMS, ETC., FOR MATH 516

THEOREMS, ETC., FOR MATH 516 THEOREMS, ETC., FOR MATH 516 Results labeled Theorem Ea.b.c (or Proposition Ea.b.c, etc.) refer to Theorem c from section a.b of Evans book (Partial Differential Equations). Proposition 1 (=Proposition

More information

Riesz Representation Theorems

Riesz Representation Theorems Chapter 6 Riesz Representation Theorems 6.1 Dual Spaces Definition 6.1.1. Let V and W be vector spaces over R. We let L(V, W ) = {T : V W T is linear}. The space L(V, R) is denoted by V and elements of

More information

Real Analysis Qualifying Exam May 14th 2016

Real Analysis Qualifying Exam May 14th 2016 Real Analysis Qualifying Exam May 4th 26 Solve 8 out of 2 problems. () Prove the Banach contraction principle: Let T be a mapping from a complete metric space X into itself such that d(tx,ty) apple qd(x,

More information

David Hilbert was old and partly deaf in the nineteen thirties. Yet being a diligent

David Hilbert was old and partly deaf in the nineteen thirties. Yet being a diligent Chapter 5 ddddd dddddd dddddddd ddddddd dddddddd ddddddd Hilbert Space The Euclidean norm is special among all norms defined in R n for being induced by the Euclidean inner product (the dot product). A

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

6 Classical dualities and reflexivity

6 Classical dualities and reflexivity 6 Classical dualities and reflexivity 1. Classical dualities. Let (Ω, A, µ) be a measure space. We will describe the duals for the Banach spaces L p (Ω). First, notice that any f L p, 1 p, generates the

More information

Reproducing Kernel Hilbert Spaces

Reproducing Kernel Hilbert Spaces Reproducing Kernel Hilbert Spaces Lorenzo Rosasco 9.520 Class 03 February 11, 2009 About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert

More information

A VERY BRIEF REVIEW OF MEASURE THEORY

A VERY BRIEF REVIEW OF MEASURE THEORY A VERY BRIEF REVIEW OF MEASURE THEORY A brief philosophical discussion. Measure theory, as much as any branch of mathematics, is an area where it is important to be acquainted with the basic notions and

More information

Another Riesz Representation Theorem

Another Riesz Representation Theorem Another Riesz Representation Theorem In these notes we prove (one version of) a theorem known as the Riesz Representation Theorem. Some people also call it the Riesz Markov Theorem. It expresses positive

More information

l(y j ) = 0 for all y j (1)

l(y j ) = 0 for all y j (1) Problem 1. The closed linear span of a subset {y j } of a normed vector space is defined as the intersection of all closed subspaces containing all y j and thus the smallest such subspace. 1 Show that

More information

CHAPTER II THE HAHN-BANACH EXTENSION THEOREMS AND EXISTENCE OF LINEAR FUNCTIONALS

CHAPTER II THE HAHN-BANACH EXTENSION THEOREMS AND EXISTENCE OF LINEAR FUNCTIONALS CHAPTER II THE HAHN-BANACH EXTENSION THEOREMS AND EXISTENCE OF LINEAR FUNCTIONALS In this chapter we deal with the problem of extending a linear functional on a subspace Y to a linear functional on the

More information

Infinite-dimensional Vector Spaces and Sequences

Infinite-dimensional Vector Spaces and Sequences 2 Infinite-dimensional Vector Spaces and Sequences After the introduction to frames in finite-dimensional vector spaces in Chapter 1, the rest of the book will deal with expansions in infinitedimensional

More information

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define Homework, Real Analysis I, Fall, 2010. (1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define ρ(f, g) = 1 0 f(x) g(x) dx. Show that

More information

Banach Spaces V: A Closer Look at the w- and the w -Topologies

Banach Spaces V: A Closer Look at the w- and the w -Topologies BS V c Gabriel Nagy Banach Spaces V: A Closer Look at the w- and the w -Topologies Notes from the Functional Analysis Course (Fall 07 - Spring 08) In this section we discuss two important, but highly non-trivial,

More information

SOLUTIONS TO SOME PROBLEMS

SOLUTIONS TO SOME PROBLEMS 23 FUNCTIONAL ANALYSIS Spring 23 SOLUTIONS TO SOME PROBLEMS Warning:These solutions may contain errors!! PREPARED BY SULEYMAN ULUSOY PROBLEM 1. Prove that a necessary and sufficient condition that the

More information

We denote the space of distributions on Ω by D ( Ω) 2.

We denote the space of distributions on Ω by D ( Ω) 2. Sep. 1 0, 008 Distributions Distributions are generalized functions. Some familiarity with the theory of distributions helps understanding of various function spaces which play important roles in the study

More information

Chapter 8 Integral Operators

Chapter 8 Integral Operators Chapter 8 Integral Operators In our development of metrics, norms, inner products, and operator theory in Chapters 1 7 we only tangentially considered topics that involved the use of Lebesgue measure,

More information

Elementary linear algebra

Elementary linear algebra Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The

More information

Analysis Comprehensive Exam Questions Fall 2008

Analysis Comprehensive Exam Questions Fall 2008 Analysis Comprehensive xam Questions Fall 28. (a) Let R be measurable with finite Lebesgue measure. Suppose that {f n } n N is a bounded sequence in L 2 () and there exists a function f such that f n (x)

More information

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space. Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space

More information

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory Part V 7 Introduction: What are measures and why measurable sets Lebesgue Integration Theory Definition 7. (Preliminary). A measure on a set is a function :2 [ ] such that. () = 2. If { } = is a finite

More information

Functional Analysis Exercise Class

Functional Analysis Exercise Class Functional Analysis Exercise Class Week 9 November 13 November Deadline to hand in the homeworks: your exercise class on week 16 November 20 November Exercises (1) Show that if T B(X, Y ) and S B(Y, Z)

More information

MATHS 730 FC Lecture Notes March 5, Introduction

MATHS 730 FC Lecture Notes March 5, Introduction 1 INTRODUCTION MATHS 730 FC Lecture Notes March 5, 2014 1 Introduction Definition. If A, B are sets and there exists a bijection A B, they have the same cardinality, which we write as A, #A. If there exists

More information

Integration on Measure Spaces

Integration on Measure Spaces Chapter 3 Integration on Measure Spaces In this chapter we introduce the general notion of a measure on a space X, define the class of measurable functions, and define the integral, first on a class of

More information

Bernstein s inequality and Nikolsky s inequality for R d

Bernstein s inequality and Nikolsky s inequality for R d Bernstein s inequality and Nikolsky s inequality for d Jordan Bell jordan.bell@gmail.com Department of athematics University of Toronto February 6 25 Complex Borel measures and the Fourier transform Let

More information

2. Function spaces and approximation

2. Function spaces and approximation 2.1 2. Function spaces and approximation 2.1. The space of test functions. Notation and prerequisites are collected in Appendix A. Let Ω be an open subset of R n. The space C0 (Ω), consisting of the C

More information

Reproducing Kernel Hilbert Spaces

Reproducing Kernel Hilbert Spaces Reproducing Kernel Hilbert Spaces Lorenzo Rosasco 9.520 Class 03 February 12, 2007 About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert

More information

Integral Jensen inequality

Integral Jensen inequality Integral Jensen inequality Let us consider a convex set R d, and a convex function f : (, + ]. For any x,..., x n and λ,..., λ n with n λ i =, we have () f( n λ ix i ) n λ if(x i ). For a R d, let δ a

More information

Measure and integration

Measure and integration Chapter 5 Measure and integration In calculus you have learned how to calculate the size of different kinds of sets: the length of a curve, the area of a region or a surface, the volume or mass of a solid.

More information

Infinite-Dimensional Triangularization

Infinite-Dimensional Triangularization Infinite-Dimensional Triangularization Zachary Mesyan March 11, 2018 Abstract The goal of this paper is to generalize the theory of triangularizing matrices to linear transformations of an arbitrary vector

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

Introduction to Functional Analysis

Introduction to Functional Analysis Introduction to Functional Analysis Carnegie Mellon University, 21-640, Spring 2014 Acknowledgements These notes are based on the lecture course given by Irene Fonseca but may differ from the exact lecture

More information

MAT 449 : Problem Set 7

MAT 449 : Problem Set 7 MAT 449 : Problem Set 7 Due Thursday, November 8 Let be a topological group and (π, V ) be a unitary representation of. A matrix coefficient of π is a function C of the form x π(x)(v), w, with v, w V.

More information

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3 Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,

More information

Online Gradient Descent Learning Algorithms

Online Gradient Descent Learning Algorithms DISI, Genova, December 2006 Online Gradient Descent Learning Algorithms Yiming Ying (joint work with Massimiliano Pontil) Department of Computer Science, University College London Introduction Outline

More information

x n x or x = T -limx n, if every open neighbourhood U of x contains x n for all but finitely many values of n

x n x or x = T -limx n, if every open neighbourhood U of x contains x n for all but finitely many values of n Note: This document was written by Prof. Richard Haydon, a previous lecturer for this course. 0. Preliminaries This preliminary chapter is an attempt to bring together important results from earlier courses

More information

EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018

EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018 EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018 While these notes are under construction, I expect there will be many typos. The main reference for this is volume 1 of Hörmander, The analysis of liner

More information

Tools from Lebesgue integration

Tools from Lebesgue integration Tools from Lebesgue integration E.P. van den Ban Fall 2005 Introduction In these notes we describe some of the basic tools from the theory of Lebesgue integration. Definitions and results will be given

More information

Lebesgue-Radon-Nikodym Theorem

Lebesgue-Radon-Nikodym Theorem Lebesgue-Radon-Nikodym Theorem Matt Rosenzweig 1 Lebesgue-Radon-Nikodym Theorem In what follows, (, A) will denote a measurable space. We begin with a review of signed measures. 1.1 Signed Measures Definition

More information

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32 Functional Analysis Martin Brokate Contents 1 Normed Spaces 2 2 Hilbert Spaces 2 3 The Principle of Uniform Boundedness 32 4 Extension, Reflexivity, Separation 37 5 Compact subsets of C and L p 46 6 Weak

More information

L p Spaces and Convexity

L p Spaces and Convexity L p Spaces and Convexity These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis. 1. Convex functions Let I R be an interval. For I open, we say a function

More information

I teach myself... Hilbert spaces

I teach myself... Hilbert spaces I teach myself... Hilbert spaces by F.J.Sayas, for MATH 806 November 4, 2015 This document will be growing with the semester. Every in red is for you to justify. Even if we start with the basic definition

More information

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond Measure Theory on Topological Spaces Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond May 22, 2011 Contents 1 Introduction 2 1.1 The Riemann Integral........................................ 2 1.2 Measurable..............................................

More information