Elements of Positive Definite Kernel and Reproducing Kernel Hilbert Space

Similar documents
Kernel Method: Data Analysis with Positive Definite Kernels

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Theory of Positive Definite Kernel and Reproducing Kernel Hilbert Space

4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32

David Hilbert was old and partly deaf in the nineteen thirties. Yet being a diligent

What is an RKHS? Dino Sejdinovic, Arthur Gretton. March 11, 2014

EECS 598: Statistical Learning Theory, Winter 2014 Topic 11. Kernels

I teach myself... Hilbert spaces

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

CHAPTER VIII HILBERT SPACES

1 Inner Product Space

Hilbert Spaces. Contents

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms.

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

Reproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto

Introduction to Signal Spaces

OPERATOR THEORY ON HILBERT SPACE. Class notes. John Petrovic

Your first day at work MATH 806 (Fall 2015)

Spectral Theory, with an Introduction to Operator Means. William L. Green

The Learning Problem and Regularization Class 03, 11 February 2004 Tomaso Poggio and Sayan Mukherjee

MAT 578 FUNCTIONAL ANALYSIS EXERCISES

Recall that any inner product space V has an associated norm defined by

PROBLEMS. (b) (Polarization Identity) Show that in any inner product space

REAL AND COMPLEX ANALYSIS

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define

Analysis Preliminary Exam Workshop: Hilbert Spaces

C.6 Adjoints for Operators on Hilbert Spaces

Applied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books.

Chapter 8 Integral Operators

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space.

Reproducing Kernel Hilbert Spaces

CHAPTER II HILBERT SPACES

Hilbert spaces. 1. Cauchy-Schwarz-Bunyakowsky inequality

Elementary linear algebra

Your first day at work MATH 806 (Fall 2015)

RKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets Class 22, 2004 Tomaso Poggio and Sayan Mukherjee

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Infinite-dimensional Vector Spaces and Sequences

Inner Product Spaces An inner product on a complex linear space X is a function x y from X X C such that. (1) (2) (3) x x > 0 for x 0.

Reproducing Kernel Hilbert Spaces

Functional Analysis Exercise Class

1. Subspaces A subset M of Hilbert space H is a subspace of it is closed under the operation of forming linear combinations;i.e.,

Normed Vector Spaces and Double Duals

1 Functional Analysis

FUNCTIONAL ANALYSIS HAHN-BANACH THEOREM. F (m 2 ) + α m 2 + x 0

Reproducing Kernel Hilbert Spaces

October 25, 2013 INNER PRODUCT SPACES

FUNCTIONAL ANALYSIS-NORMED SPACE

Real Analysis Notes. Thomas Goller

Prof. M. Saha Professor of Mathematics The University of Burdwan West Bengal, India

An introduction to some aspects of functional analysis

MTH 503: Functional Analysis

Chapter 7: Bounded Operators in Hilbert Spaces

Problem Set 6: Solutions Math 201A: Fall a n x n,

A Concise Course on Stochastic Partial Differential Equations

Functional Analysis Exercise Class

AN INTRODUCTION TO THE THEORY OF REPRODUCING KERNEL HILBERT SPACES

Compact operators on Banach spaces

Spectral theory for compact operators on Banach spaces

Functional Analysis. AMAT 617 University of Calgary Winter semester Lecturer: Gilad Gour Notes taken by Mark Girard.

MAT 449 : Problem Set 7

THE STONE-WEIERSTRASS THEOREM AND ITS APPLICATIONS TO L 2 SPACES

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM

Functional Analysis F3/F4/NVP (2005) Homework assignment 2

Reproducing Kernel Hilbert Spaces

1 Exponential manifold by reproducing kernel Hilbert spaces

CIS 520: Machine Learning Oct 09, Kernel Methods

THEOREMS, ETC., FOR MATH 515

Fall TMA4145 Linear Methods. Solutions to exercise set 9. 1 Let X be a Hilbert space and T a bounded linear operator on X.

Exercise Solutions to Functional Analysis

An Introduction to Kernel Methods 1

Functional Analysis Exercise Class

Functional Analysis Review

Chapter 5. Basics of Euclidean Geometry

Kernel Methods. Machine Learning A W VO

MA5206 Homework 4. Group 4. April 26, ϕ 1 = 1, ϕ n (x) = 1 n 2 ϕ 1(n 2 x). = 1 and h n C 0. For any ξ ( 1 n, 2 n 2 ), n 3, h n (t) ξ t dt

FUNCTIONAL ANALYSIS LECTURE NOTES: ADJOINTS IN HILBERT SPACES

RIESZ BASES AND UNCONDITIONAL BASES

CHAPTER V DUAL SPACES

THE PROBLEMS FOR THE SECOND TEST FOR BRIEF SOLUTIONS

MATH 5640: Fourier Series

x 3y 2z = 6 1.2) 2x 4y 3z = 8 3x + 6y + 8z = 5 x + 3y 2z + 5t = 4 1.5) 2x + 8y z + 9t = 9 3x + 5y 12z + 17t = 7

Inequalities in Hilbert Spaces

The Dual of a Hilbert Space

Functional Analysis Exercise Class

On Riesz-Fischer sequences and lower frame bounds

Frame expansions in separable Banach spaces

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy

Real and Complex Analysis, 3rd Edition, W.Rudin Elementary Hilbert Space Theory

Vectors in Function Spaces

Typical Problem: Compute.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

LECTURE 7. k=1 (, v k)u k. Moreover r

In Chapter 14 there have been introduced the important concepts such as. 3) Compactness, convergence of a sequence of elements and Cauchy sequences,

The weak topology of locally convex spaces and the weak-* topology of their duals

1 Continuity Classes C m (Ω)

Introduction to Empirical Processes and Semiparametric Inference Lecture 22: Preliminaries for Semiparametric Inference

Transcription:

Elements of Positive Definite Kernel and Reproducing Kernel Hilbert Space Statistical Inference with Reproducing Kernel Hilbert Space Kenji Fukumizu Institute of Statistical Mathematics, ROIS Department of Statistical Science, Graduate University for Advanced Studies April 25, 2008 / Statistical Learning Theory II

Outline Positive definite kernel 1 Positive definite kernel Definition and properties of positive definite kernel Examples of positive definite kernel 2 Definition of Hilbert space Basic properties of Hilbert space Completion 3 2 / 48

Definition and properties of positive definite kernel Examples of positive definite kernel 1 Positive definite kernel Definition and properties of positive definite kernel Examples of positive definite kernel 2 Definition of Hilbert space Basic properties of Hilbert space Completion 3 3 / 48

Definition of positive definite kernel Definition and properties of positive definite kernel Examples of positive definite kernel Definition. Let X be a set. k : X X R is a positive definite kernel if k(x, y) = k(y, x) and for every x 1,..., x n X and c 1,..., c n R n c i c j k(x i, x j ) 0, i,j=1 i.e. the symmetric matrix k(x 1, x 1 ) k(x 1, x n ) (k(x i, x j )) n i,j=1 =..... k(x n, x 1 ) k(x n, x ) is positive semidefinite. The symmetric matrix (k(x i, x j )) n i,j=1 is often called a Gram matrix. 4 / 48

Definition: complex-valued case Definition and properties of positive definite kernel Examples of positive definite kernel Definition. Let X be a set. k : X X C is a positive definite kernel if for every x 1,..., x n X and c 1,..., c n C n c i c j k(x i, x j ) 0. i,j=1 Remark. The Hermitian property k(y, x) = k(x, y) is derived from the positive-definiteness. [Exercise] 5 / 48

Remarks on the terminology Definition and properties of positive definite kernel Examples of positive definite kernel In the matrix theory, an symmetric matrix A = (A ij ) is said to be positive definite if for any c 1,..., c n n i,j=1 c ic j A ij > 0, and A is positive semidefinite (nonnegative definite) if for any c 1,..., c n n i,j=1 c ic j A ij 0. The definition of "positive definite" kernel requires only the positive semidefiniteness (non-negative definiteness) of the Gram matrix. This unmatched terminology is caused for the historical reason. A symmetric kernel k : X X R is called strictly positive definite if for any different x 1,..., x n X and c 1,..., c n R with at least one c i non-zero, n i,j=1 c ic j k(x i, x j ) > 0, that is, the Gram matrix (k(x i, x j )) n i,j=1 is positive definite. 6 / 48

Definition and properties of positive definite kernel Examples of positive definite kernel Basic Properties of positive definite kernels Fact. Assume k : X X C is positive definite. Then, for any x, y in X, 1 k(x, x) 0. 2 k(x, y) 2 k(x, x)k(y, y). Proof. (1) is obvious. For (2), with the fact k(y, x) = k(x, y), the definition of positive definiteness implies that the eigenvalues of the hermitian matrix ( k(x, x) ) k(x, y) k(x, y) k(y, y) is non-negative, thus, its determinant k(x, x)k(y, y) k(x, y) 2 is non-negative. 7 / 48

Definition and properties of positive definite kernel Examples of positive definite kernel Operations that Preserve Positive Definiteness I Proposition 1 If k i : X X C (i = 1, 2,...) are positive definite kernels, then so are the following: 1 (positive combination) ak 1 + bk 2 (a, b 0). 2 (product) k 1 k 2 (k 1 (x, y)k 2 (x, y)). 3 (limit) lim i k i (x, y), assuming the limit exists. Remark. Proposition 1 says that the set of all positive definite kernels is closed (w.r.t. pointwise convergence) convex cone stable under multiplication. Proof. (1): Obvious. (3): Just notice that the non-negativity in the definition holds also for the limit. 8 / 48

Definition and properties of positive definite kernel Examples of positive definite kernel Operations that Preserve Positive Definiteness II (2): It suffices to show that two Hermitian matrices A and B are positive semidefinite, so is their component-wise product. This is done by the following lemma. Definition. For two matrices A and B of the same size, the matrix C with C ij = A ij B ij is called the Hadamard product of A and B. The Hadamard product of A and B is denoted by A B. Lemma 2 Let A and B be non-negative Hermitian matrices of the same size. Then, A B is also non-negative. 9 / 48

Definition and properties of positive definite kernel Examples of positive definite kernel Operations that Preserve Positive Definiteness III Proof. Let A = UΛU be the eigendecomposition of A, where U = (u 1,..., u p ): a unitary matrix Λ: diagonal matrix with non-negative entries (λ 1,..., λ p ) U = U T. Then, for arbitrary c 1,..., c p C, c i c j (A B) ij = i,j=1 p λ a c i c j u a i ū a j B ij = a=1 where ξ a = (c 1 u a 1,..., c p u a p) T C p. p λ a ξ at Bξ a, a=1 Since ξ at Bξ a and λ a are non-negative for each a, so is the sum. 10 / 48

Definition and properties of positive definite kernel Examples of positive definite kernel Basic construction of positive definite kernels I Proposition 3 Let V be an vector space with an inner product,. If we have a map Φ : X V, x Φ(x), a positive definite kernel on X is defined by k(x, y) = Φ(x), Φ(y). Proof. Let x 1,..., x n in X and c 1,..., c n C. n i,j=1 c ic j k(x i, x j ) = n i,j=1 c ic j Φ(x i ), Φ(x j ) n = i=1 c iφ(x i ), n j=1 c jφ(x j ) = n i=1 c iφ(x i ) 2 0. 11 / 48

Definition and properties of positive definite kernel Examples of positive definite kernel Basic construction of positive definite kernels II Proposition 4 Let k : X X C be a positive definite kernel and f : X C be an arbitrary function. Then, is positive definite. In particular, is a positive definite kernel. Proof is left as an exercise. k(x, y) = f(x)k(x, y)f(y) f(x)f(y) 12 / 48

Definition and properties of positive definite kernel Examples of positive definite kernel 1 Positive definite kernel Definition and properties of positive definite kernel Examples of positive definite kernel 2 Definition of Hilbert space Basic properties of Hilbert space Completion 3 13 / 48

Examples Positive definite kernel Definition and properties of positive definite kernel Examples of positive definite kernel Real valued positive definite kernels on R n : - Linear kernel - Exponential k 0 (x, y) = x T y k E (x, y) = exp(βx T y) (β > 0) - Gaussian RBF (radial basis function) kernel ( k G (x, y) = exp 1 2σ 2 x y 2) (σ > 0) - Laplacian kernel ( k L (x, y) = exp α ) n i=1 x i y i - Polynomial kernel (α > 0) k P (x, y) = (x T y + c) d (c 0, d N) 14 / 48

Definition and properties of positive definite kernel Examples of positive definite kernel Proof. Linear kernel: Proposition 3 Exponential: exp(βx T y) = 1 + βx T y + β2 2! (xt y) 2 + β3 3! (xt y) 3 + Use Proposition 1. Gaussian RBF kernel: ( exp 1 ) ( 2σ 2 x y 2) = exp ( x 2 x T y ) ) 2σ 2 exp σ 2 exp ( y 2 2σ 2. Apply Proposition 4. Laplacian kernel: The proof is shown later. Polynomial kernel: Just sum and product. 15 / 48

Definition of Hilbert space Basic properties of Hilbert space Completion 1 Positive definite kernel Definition and properties of positive definite kernel Examples of positive definite kernel 2 Definition of Hilbert space Basic properties of Hilbert space Completion 3 16 / 48

Vector space with inner product Definition of Hilbert space Basic properties of Hilbert space Completion Definition. Let V be a vector space over a field K = R or C. V is called an inner product space if it has an inner product (or scalar product, dot product) (, ) : V V K such that the following rules hold for every x, y, z V and α K; 1 (Strong positivity) (x, x) 0, and (x, x) = 0 if and only if x = 0, 2 (Addition) (x + y, z) = (x, z) + (y, z), 3 (Scalar multiplication) (αx, y) = α(x, y), 4 (Hermitian) (y, x) = (x, y). If (V, (, )) is an inner product, the norm of x V is defined by x = (x, x) 1/2, and the metric is induced by Cauchy-Schwarz inequality d(x, y) = x y. (x, y) x y. Remark: Cauchy-Schwarz inequality holds without requiring x = 0 x = 0. 17 / 48

Hilbert space I Positive definite kernel Definition of Hilbert space Basic properties of Hilbert space Completion Definition. A vector space with inner product (H, (, )) is called Hilbert space if the induced metric is complete, i.e. every Cauchy sequence converges to an element in H. Remark 1: Let (X, d) be a metric space. A sequence {x n } n=1 in X is called Cauchy sequence if d(x n, x m ) 0 for n, m. Remark 2: A Hilbert space may be either finite or infinite dimensional. Example 1. R n and C n are finite dimensional Hilbert space with the ordinary inner product (x, y) R n = n i=1 x iy i or (x, y) C n = n i=1 x iy i. 18 / 48

Hilbert space II Positive definite kernel Definition of Hilbert space Basic properties of Hilbert space Completion Example 2. L 2 (Ω, µ). Let (Ω, B, µ) is a measure space. { L = f : Ω C } f 2 dµ <. The inner product on L is define by (f, g) = fgdµ. L 2 (Ω, µ) is defined by the equivalent classes identifying f and g if their values differ only on a measure-zero set. - L 2 (Ω, µ) is complete. [See e.g. [Rud86] for the proof.] - L 2 (R n, dx) is infinite dimensional. 19 / 48

Definition of Hilbert space Basic properties of Hilbert space Completion 1 Positive definite kernel Definition and properties of positive definite kernel Examples of positive definite kernel 2 Definition of Hilbert space Basic properties of Hilbert space Completion 3 20 / 48

Orthogonality Positive definite kernel Definition of Hilbert space Basic properties of Hilbert space Completion Orthogonal complement. Let H be a Hilbert space and V be a closed subspace. V := {x H (x, y) = 0 for all y V } is a closed subspace, and called the orthogonal complement. Orthogonal projection. Let H be a Hilbert space and V be a closed subspace. Every x H can be uniquely decomposed x = y + z, y V and z V, that is, H = V V. 21 / 48

Boundedness I Positive definite kernel Definition of Hilbert space Basic properties of Hilbert space Completion Let H 1 and H 2 be Hilbert spaces. A linear transform T : H 1 H 2 is often called operator. Definition. A linear operator H 1 and H 2 is called bounded if sup T x H2 <. x H1 =1 The operator norm of a bounded operator T is defined by T = T x H2 sup T x H2 = sup. x H1 =1 x 0 x H1 Fact. If T : H 1 H 2 is bounded, T x H2 T x H1. 22 / 48

Boundedness II Positive definite kernel Definition of Hilbert space Basic properties of Hilbert space Completion Proposition 5 A linear transform is bounded if and only if it is continuous. Proof. Assume T : H 1 H 2 is bounded. Then, means continuity of T. T x T x 0 T x x 0 Assume T is continuous. For any ε > 0, there is δ > 0 such that T x < ε for all x {y H 1 y < 2δ}. Then, sup x =1 T x sup x =δ δ T x δε. 23 / 48

Riesz lemma Positive definite kernel Definition of Hilbert space Basic properties of Hilbert space Completion Definition. A linear functional is a linear transform from H to C (or R). The vector space of all the bounded linear functionals called the dual space of H, and is denoted by H. Theorem 6 (Riesz lemma) For each φ H, there is a unique y φ H such that φ(x) = (x, y φ ) ( x H). Proof. If φ(x) = 0 for all x, take y = 0. Otherwise, let V = {x H φ(x) = 0}. Since φ is a bounded linear functional, V is a closed subspace, and not equal to H. Take z V with z = 1, then obviously for any x H, φ(x)z φ(z)x V. The inner product with z shows φ(x)(z, z) φ(z)(x, z) = 0, which gives φ(x) = φ(x)(z, z) = φ(z)(x, z) Thus, y = φ(z)z gives the expression in the theorem. 24 / 48

CONS I Positive definite kernel Definition of Hilbert space Basic properties of Hilbert space Completion ONS and CONS. A subset {u i } i I of H is called an orthonormal system (ONS) if (u i, u j ) = δ ij (δ ij is Kronecker s delta). A subset {u i } i I of H is called a complete orthonormal system (CONS) if it is ONS and if (x, u i ) = 0 ( i I) implies x = 0. Theorem 7 Let {u a } a A be an ONS in a Hilbert space. Then, there is a CONS of H that contains {u a } a A. In particular, every Hilbert space has a CONS. Proof is omitted. (Use Zorn s lemma.) 25 / 48

CONS II Positive definite kernel Definition of Hilbert space Basic properties of Hilbert space Completion Theorem 8 Let H be a Hilbert space and {u i } i I be a CONS. Then, for each x H, x = i I (x, u i)u i, (Fourier expansion) and x 2 = i I (x, u i) 2. (Parseval s equality) Proof omitted. The first equality means that R.H.S. converges to x in H independent of order. Example: CONS of L 2 ([0 2π], dx) u n (t) = 1 2π e 1nt (n = 0, 1, 2,...) Then, f(t) = n=0 a nu n (t) is the (ordinary) Fourier expansion of a periodic function. 26 / 48

CONS III Positive definite kernel Definition of Hilbert space Basic properties of Hilbert space Completion Definition. A metric space is called separable if it has a countable dense subset. Theorem 9 A Hilbert space is separable if and only if it has a countable CONS. Sketch of proof. For only if part, apply Gram-Schmidt procedure. For the other direction, use the Fourier expansion with rational coefficients. Assumption In this course, a Hilbert space is assumed to be separable unless otherwise stated. 27 / 48

Definition of Hilbert space Basic properties of Hilbert space Completion 1 Positive definite kernel Definition and properties of positive definite kernel Examples of positive definite kernel 2 Definition of Hilbert space Basic properties of Hilbert space Completion 3 28 / 48

Definition of Hilbert space Basic properties of Hilbert space Completion Completion of inner product space Theorem 10 Let H 0 be an inner product space. Then, there is a Hilbert space H such that H 0 is isomorphic to a dense subspace of H. H is unique up to isomorphism, and called the completion of H 0. Outline of the proof. 1 X = { {u n } n=1 H 0 {u n } n=1is a Cauchy sequence of H }. 2 Define an equivalence relation on X by {u n } {v n } u n v n 0 (n ). 3 Show X := X/ is an inner product space by defining ([{u n }], [{v n }]) := lim n (u n, v n ). 4 Show that the map J : X X, u [{u, u, u,...}] is isometric, and the image is a dense subspace. 5 Show X is complete. (This part is the most technical.) 29 / 48

1 Positive definite kernel Definition and properties of positive definite kernel Examples of positive definite kernel 2 Definition of Hilbert space Basic properties of Hilbert space Completion 3 30 / 48

Reproducing kernel Hilbert space I Definition. Let X be a set. A reproducing kernel Hilbert space (RKHS) (over X ) is a Hilbert space H consisting of functions on X such that for each x X there is a function k x H with the property f, k x H = f(x) ( f H) (reproducing property). Write k(, x) = k x ( ). The function k is called a reproducing kernel of H. Proposition 11 (RKHS positive definite kernel) A reproducing kernel of a RKHS is a positive definite kernel on X. Proof. n i,j=1 c ic j k(x i, x j ) = n i,j=1 c ic j k(, x i ), k(, x j ) = n i=1 c ik(, x i ), n j=1 c jk(, x j ) 0 31 / 48

Reproducing kernel Hilbert space II Fact. The reproducing kernel on a Hilbert space is unique, if exists. Proof. Suppose k and k are reproducing kernels. Then, k(x, y) = k(, y), k(, x) = k(, x), k(, y) = k(y, x) = k(x, y).. Fact. k(, x) = k(x, x). Proof. k(, x) 2 = k(, x), k(, x) = k(x, x). 32 / 48

Reproducing kernel Hilbert space III Proposition 12 Let H be a Hilbert space consisting of functions on a set X. Then, H is a RKHS if and only if the evaluation map e x : H K, e x (f) = f(x), is a continuous linear functional for each x X. Proof. Assume H is a RKHS. The boundedness of e x is obvious from e x (f) = f, k x k x f. Conversely, assume the evaluation map is continuous. By Riesz lemma, there is k x H such that f, k x = e x (f) = f(x). 33 / 48

Positive definite kernel and RKHS I Theorem 13 (positive definite kernel RKHS) Let k : X X C (or R) be a positive definite kernel on a set X. Then, there uniquely exists a RKHS H k consisting of functions on X such that 1 k(, x) H k for every x X, 2 Span{k(, x) x X } is dense in H k, 3 k is a reproducing kernel on H k, i.e. f, k(, x) H = f(x) ( x X, f H k ). Remark. If we define Φ : X H k, x k(, x), then, Φ(x), Φ(y) = k(, x), k(, y) = k(x, y). RKHS associated with a pos. def. kernel k gives a desired feature space! 34 / 48

Positive definite kernel and RKHS II One-to-one correspondence between positive definite kernels and RKHS. k H k Theorem 13 gives an injective map from the positive definite kernels to RKHS. Conversely, the reproducing kernel of a RKHS is a positive definite kernel (Proposition 11). 35 / 48

Proof of Theorem 13 Proof. (Described in R case.) Construction of an inner product space: H 0 := Span{k(, x) x X }. Define an inner product on H 0 : for f = n i=1 a ik(, x i ) and g = m j=1 b jk(, y j ), f, g := n i=1 m j=1 a ib j k(x i, y j ). This is independent of the way of representing f and g from the expression f, g = m j=1 b jf(y j ) = n i=1 a ig(x i ). 36 / 48

Reproducing property on H 0 : f, k(, x) = n i=1 a ik(x i, x) = f(x). Well-defined as an inner product: It is easy to see, is bilinear form, and by the positive definiteness of f. f 2 = n i,j=1 a ia j k(x i, x j ) 0 If f = 0, from Cauchy-Schwarz inequality, 1 for all x X ; thus f = 0. f(x) = f, k(, x) f k(, x) = 0 1 Note that Cauchy-Schwarz inequality holds without assuming strong positivity of the inner product. 37 / 48

Completion: Let H be the completion of H 0. H 0 is dense in H by the completion. H is realized by functions: Let {f n} be a Cauchy sequence in H. For each x X, {f n(x)} is a Cauchy sequence, because f n(x) f m(x) = f n f m, k(, x) f n f m k(, x). Define f(x) = lim n f n(x). This value is the same for equivalent sequences, because {f n} {g n} implies f n(x) g n(x) = f n g n, k(, x) f n g n k(, x) 0. Thus, any element [{f n}] in H can be regarded as a function f on X. 38 / 48

Continuity of functions in RKHS The functions in a RKHS are "nice" functions under some conditions. Proposition 14 Let k be a positive definite kernel on a topological space X, and H k be the associated RKHS. If Re[k(x, x)] is continuous for every x X, then all the functions in H k are continuous. Proof. Let f be an arbitrary function in H k. f(x) f(y) = f, k(, x) k(, y) f k(, x) k(, y). The assertion is easy from k(, x) k(, y) 2 = k(x, x) + k(y, y) 2Re[k(x, y)]. It is also known ([BTA04]) that if k(x, y) is differentiable, then all the functions in H k are differentiable. 39 / 48

1 Positive definite kernel Definition and properties of positive definite kernel Examples of positive definite kernel 2 Definition of Hilbert space Basic properties of Hilbert space Completion 3 40 / 48

RKHS of polynomial kernel Polynomial kernel on R: k(x, y) = (xy + c) d (c > 0, d N). Fact H k is d + 1 dimensional vector space with a basis {1, x, x 2,..., x d }. Proof. Let G = Span{1, x, x 2,..., x d }. Span{k(, z) z R m } G from k(x, z) = z d x d + d C 1cz d 1 x d 1 + d C 2c 2 z d 2 x d 2 + + d C d 1 c d 1 zx+c d. Any polynomial of degree d belongs to H k, because for any (a 0,..., a d ) the linear equation z d 0 z 0 1 b 0 a 0/c d z1 d z 1 1 b 1......... = a 1/c d 1 dc d 1.. zd d z d 1 b d a d is solvable. Then, d i=0 bik(x, zi) = d i=0 aixi. 41 / 48

RKHS as a Hilbertian subspace X : set. C X : all functions on X with the pointwise-convergence topology 2. G = L 2 (T, µ), where (T, B, µ) is a measure space. Suppose H( ; x) L 2 (T, µ) for all x X. Construct a continuous embedding j : L 2 (T, µ) C X, F f(x) = F (t)h(t; x)dµ(t) = (F, H( ; x)) G. Assume Span{H(t; x) x X } is dense in L 2 (T, µ). Then, j is injective. 2 f n f f n(x) f(x) for every x. 42 / 48

RKHS as a Hilbertian subspace II Define H := Imj. Define an inner product on H by f, g H := (F, G) G where f = j(f ), g = j(g). We have j : L 2 (T, µ) = H (isomorphic) as Hilbert spaces, and H = { f C X F L 2 (T, µ), f(x) = } F (t)h(t; x)dµ(t). Proposition 15 H is a RKHS, and its reproducing kernel is k(x, y) = j(h( ; x)), j(h( ; y)) H = H(t; x)h(t; y)dµ(t). Proof. f(x) = (F, H(, x)) G = f, j(h(, x)) H. 43 / 48

by Fourier transform Special case given by Fourier transform. X = T = R. G = L 2 (R, ρ(t)dt). ρ(t): continuous, ρ(t) > 0, ρ(t)dt <. H(t; x) = e 1xt. Note: Span{H(t; x) x X } is dense L 2 (R, ρ(t)dt). - Fact. H = { f L 2 2 ˆf(t) } (R, dx) ρ(t) dt <. f, g H = ˆf(t)ĝ(t) dt. ρ(t) k(x, y) = e 1(x y)t ρ(t)dt. 3 3 We can directly confirm this a positive definite kernel. 44 / 48

by Fourier transform II Proof. Let f = j(f ). By definition, f(x) = F (t)e 1tx ρ(t)dt. (Fourier transform) Since F (t)ρ(t) L 1 (R, dt) L 2 (R, dt) 4, the Fourier isometry of L 2 (R, dt) tells f(x) L 2 (R, dx) and ˆf(t) = 1 2π f(x)e 1xt dx = F (t)ρ(t). Thus, F (t) = ˆf(t) ρ(t). By the definition of the inner product, for f = j(f ) and g = j(g), In addition, f, g H = (F, G) G = ˆf(t) ĝ(t) ρ(t) ρ(t) ρ(t)dt = ˆf(t)ĝ(t) ρ(t) dt. F L 2 (R, ρ(t)dt) ˆf(t) ρ(t) L2 (R, ρ(t)dt) ˆf(t) 2 ρ(t) dt <. 4 Because ρ(t) is bounded, F L 2 (R, ρ(t)dt) means F (t) 2 ρ(t) 2 L 1 (R, dt) 45 / 48

by Fourier transform III Examples. Gaussian RBF kernel: k(x, y) = exp { 1 2σ 2 x y 2}. Let ρ(t) = 1 σ2 exp{ 2π 2 t2 }, i.e. G = L 2 1 σ 2 (R, 2π e 2 t2 dt). Reproducing kernel = Gaussian RBF kernel: k(x, y) = 1 e 1(x y)t e σ2 2 t2 dt = 1 ( 2π σ exp 1 (x y)2) 2σ2 H = { f L 2 (R, dx) f, g = ˆf(t) 2 exp ( σ 2 } 2 t2) dt <. ˆf(t)ĝ(t) exp ( σ 2 2 t2) dt 46 / 48

by Fourier transform IV Laplacian kernel: k(x, y) = exp { β x y }. Let ρ(t) = 1 2π i.e. 1 t 2 +β 2, G = L 2 (R, dt 2π(t 2 + β 2 ) ). Reproducing kernel = Laplacian kernel: k(x, y) = 1 e 1(x y)t 1 2π t 2 + β dt = 1 ( ) 2 2β exp β x y 1 2π(t 2 +1).] [Note: the Fourier image of exp( x y ) is { H = f L 2 (R, dx) ˆf(t) } 2 (t 2 + β 2 )dt <. f, g = ˆf(t)ĝ(t)(t 2 + β 2 )dt 47 / 48

Summary of Chapter 1 and 2 We would like to use a feature vector Φ : X H to incorporate higher order moments. The inner product in the feature space must be computed efficiently. Ideally, Φ(x), Φ(y) = k(x, y). To satisfy the above relation, the kernel k must be positive definite. A positive definite kernel k defines an associated RKHS, where k is the reproducing kernel; k(, x), k(, y) = k(x, y). Use the RKHS as a feature space, and Φ : x k(, x) as the feature map. 48 / 48

References A good reference on Hilbert (and Banach) space is [Rud86]. A more advanced one on functional analysis is [RS80] among many others. For reproducing kernel Hilbert spaces, the original paper is [Aro50]. Statistical aspects are discussed in [BTA04]. Nachman Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 69(3):337 404, 1950. Alain Berlinet and Christine Thomas-Agnan. in probability and statistics. Kluwer Academic Publisher, 2004. Michael Reed and Barry Simon. Functional Analysis. Academic Press, 1980. Walter Rudin. Real and Complex Analysis (3rd ed.). McGraw-Hill, 1986. 49 / 48