Kernel Method: Data Analysis with Positive Definite Kernels

Similar documents
Elements of Positive Definite Kernel and Reproducing Kernel Hilbert Space

Theory of Positive Definite Kernel and Reproducing Kernel Hilbert Space

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Hilbert Spaces. Contents

4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define

I teach myself... Hilbert spaces

EECS 598: Statistical Learning Theory, Winter 2014 Topic 11. Kernels

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

Functional Analysis Review

Reproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto

David Hilbert was old and partly deaf in the nineteen thirties. Yet being a diligent

OPERATOR THEORY ON HILBERT SPACE. Class notes. John Petrovic

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32

Your first day at work MATH 806 (Fall 2015)

j=1 [We will show that the triangle inequality holds for each p-norm in Chapter 3 Section 6.] The 1-norm is A F = tr(a H A).

RKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets Class 22, 2004 Tomaso Poggio and Sayan Mukherjee

Hilbert spaces. 1. Cauchy-Schwarz-Bunyakowsky inequality

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms.

Analysis Preliminary Exam Workshop: Hilbert Spaces

CHAPTER VIII HILBERT SPACES

What is an RKHS? Dino Sejdinovic, Arthur Gretton. March 11, 2014

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space.

CHAPTER II HILBERT SPACES

Fall TMA4145 Linear Methods. Solutions to exercise set 9. 1 Let X be a Hilbert space and T a bounded linear operator on X.

Recall that any inner product space V has an associated norm defined by

Introduction to Signal Spaces

Spectral Theory, with an Introduction to Operator Means. William L. Green

CIS 520: Machine Learning Oct 09, Kernel Methods

Your first day at work MATH 806 (Fall 2015)

MAT 578 FUNCTIONAL ANALYSIS EXERCISES

Inner Product Spaces An inner product on a complex linear space X is a function x y from X X C such that. (1) (2) (3) x x > 0 for x 0.

AN INTRODUCTION TO THE THEORY OF REPRODUCING KERNEL HILBERT SPACES

October 25, 2013 INNER PRODUCT SPACES

1 Exponential manifold by reproducing kernel Hilbert spaces

1 Inner Product Space

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Chapter 8 Integral Operators

Elementary linear algebra

Infinite-dimensional Vector Spaces and Sequences

C.6 Adjoints for Operators on Hilbert Spaces

1 Functional Analysis

THEOREMS, ETC., FOR MATH 515

Functional Analysis Review

The Learning Problem and Regularization Class 03, 11 February 2004 Tomaso Poggio and Sayan Mukherjee

FUNCTIONAL ANALYSIS LECTURE NOTES: ADJOINTS IN HILBERT SPACES

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS

Real Analysis Notes. Thomas Goller

Inequalities in Hilbert Spaces

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM

Basic Calculus Review

Wiener Measure and Brownian Motion

10-701/ Recitation : Kernels

Reproducing Kernel Hilbert Spaces

PROBLEMS. (b) (Polarization Identity) Show that in any inner product space

Review and problem list for Applied Math I

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis

Functional Analysis Exercise Class

Kernel Methods. Machine Learning A W VO

4 Linear operators and linear functionals

REAL AND COMPLEX ANALYSIS

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

Applied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books.

An Introduction to Kernel Methods 1

Quantum Computing Lecture 2. Review of Linear Algebra

Spectral theory for compact operators on Banach spaces

Kernels A Machine Learning Overview

MAT 449 : Problem Set 7

Mathematics Department Stanford University Math 61CM/DM Inner products

The following definition is fundamental.

LECTURE 7. k=1 (, v k)u k. Moreover r

Chapter 7: Bounded Operators in Hilbert Spaces

Vectors in Function Spaces

1. Subspaces A subset M of Hilbert space H is a subspace of it is closed under the operation of forming linear combinations;i.e.,

Functional Analysis II held by Prof. Dr. Moritz Weber in summer 18

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

Chapter 5. Basics of Euclidean Geometry

Reproducing Kernel Hilbert Spaces

Support Vector Machines

Mathematical foundations - linear algebra

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination

Linear Algebra. Session 12

THE PROBLEMS FOR THE SECOND TEST FOR BRIEF SOLUTIONS

Fourier and Wavelet Signal Processing

4th Preparation Sheet - Solutions

An introduction to some aspects of functional analysis

1 Math 241A-B Homework Problem List for F2015 and W2016

Exercise Solutions to Functional Analysis

Functional Analysis F3/F4/NVP (2005) Homework assignment 2

MTH 503: Functional Analysis

Statistical Convergence of Kernel CCA

Kernel Methods. Jean-Philippe Vert Last update: Jan Jean-Philippe Vert (Mines ParisTech) 1 / 444

On Riesz-Fischer sequences and lower frame bounds

A Brief Introduction to Functional Analysis

Reproducing Kernel Hilbert Spaces

MATH 590: Meshfree Methods

Linear Algebra. Paul Yiu. Department of Mathematics Florida Atlantic University. Fall A: Inner products

Reproducing Kernel Hilbert Spaces

Transcription:

Kernel Method: Data Analysis with Positive Definite Kernels 2. Positive Definite Kernel and Reproducing Kernel Hilbert Space Kenji Fukumizu The Institute of Statistical Mathematics. Graduate University of Advanced Studies / Tokyo Institute of Technology Nov. 17-26, 2010 Intensive Course at Tokyo Institute of Technology

2 / 24 Outline Positive definite kernel Definition and examples of positive definite kernel Reproducing kernel Hilbert space RKHS and positive definite kernel Some basic properties Properties of positive definite kernels Properties of RKHS

3 / 24 Positive definite kernel Definition and examples of positive definite kernel Reproducing kernel Hilbert space RKHS and positive definite kernel Some basic properties Properties of positive definite kernels Properties of RKHS

4 / 24 Definition of Positive Definite Kernel Definition. Let X be a set. k : X X R is a positive definite kernel if k(x, y) = k(y, x) and for every x 1,..., x n X and c 1,..., c n R n c i c j k(x i, x j ) 0, i,j=1 i.e. the symmetric matrix k(x 1, x 1 ) k(x 1, x n ) (k(x i, x j )) n i,j=1 =..... k(x n, x 1 ) k(x n, x ) is positive semidefinite. The symmetric matrix (k(x i, x j )) n i,j=1 is often called a Gram matrix.

5 / 24 Definition: Complex-valued Case Definition. Let X be a set. k : X X C is a positive definite kernel if for every x 1,..., x n X and c 1,..., c n C n c i c j k(x i, x j ) 0. i,j=1 Remark. The Hermitian property k(y, x) = k(x, y) is derived from the positive-definiteness. [Exercise]

6 / 24 Some Basic Properties Facts. Assume k : X X C is positive definite. Then, for any x, y in X, 1. k(x, x) 0. 2. k(x, y) 2 k(x, x)k(y, y). Proof. (1) is obvious. For (2), with the fact k(y, x) = k(x, y), the definition of positive definiteness implies that the eigenvalues of the hermitian matrix ( k(x, x) ) k(x, y) k(x, y) k(y, y) is non-negative, thus, its determinant k(x, x)k(y, y) k(x, y) 2 is non-negative.

Examples Real valued positive definite kernels on R n : - Linear kernel 1 k 0 (x, y) = x T y - Exponential k E (x, y) = exp(βx T y) (β > 0) - Gaussian RBF (radial basis function) kernel ( k G (x, y) = exp 1 2σ 2 x y 2) (σ > 0) - Laplacian kernel ( k L (x, y) = exp α ) n i=1 x i y i (α > 0) - Polynomial kernel k P (x, y) = (x T y + c) d (c 0, d N) 1 [Exercise] prove that the linear kernel is positive definite. 7 / 24

8 / 24 Positive definite kernel Definition and examples of positive definite kernel Reproducing kernel Hilbert space RKHS and positive definite kernel Some basic properties Properties of positive definite kernels Properties of RKHS

9 / 24 Reproducing kernel Hilbert space Definition. Let X be a set. A reproducing kernel Hilbert space (RKHS) (over X ) is a Hilbert space H consisting of functions on X such that for each x X there is a function k x H with the property f, k x H = f(x) ( f H) (reproducing property). k(, x) := k x ( ) is called a reproducing kernel of H. Fact 1. A reproducing kernel is Hermitian (or symmetric). Proof. k(y, x) = k(, x), k y = k x, k y = k y, k x = k(, y), k x = k(x, y). Fact 2. The reproducing kernel is unique, if exists. [Exercise]

10 / 24 Positive Definite Kernel and RKHS I Proposition 1 (RKHS positive definite kernel) The reproducing kernel of a RKHS is positive definite. Proof. n c i c j k(x i, x j ) = i,j=1 n c i c j k(, x i ), k(, x j ) i,j=1 n = c i k(, x i ), i=1 n j=1 c j k(, x j ) = n c i k(, x i ) 2 0. i=1

11 / 24 Positive Definite Kernel and RKHS II Theorem 2 (positive definite kernel RKHS. Moore-Aronszajn) Let k : X X C (or R) be a positive definite kernel on a set X. Then, there uniquely exists a RKHS H k on X such that 1. k(, x) H k for every x X, 2. Span{k(, x) x X } is dense in H k, 3. k is the reproducing kernel on H k, i.e., f, k(, x) H = f(x) ( x X, f H k ). Proof omitted.

12 / 24 Positive Definite Kernel and RKHS III One-to-one correspondence between positive definite kernels and RKHS. k H k Proposition 1: RKHS positive definite kernel k. Theorem 2: k H k (injective).

13 / 24 RKHS as Feature Space If we define Φ : X H k, x k(, x), then, Φ(x), Φ(y) = k(, x), k(, y) = k(x, y). RKHS associated with a positive definite kernel k gives a desired feature space!! In kernel methods, the above feature map and feature space are always used.

14 / 24 Positive definite kernel Definition and examples of positive definite kernel Reproducing kernel Hilbert space RKHS and positive definite kernel Some basic properties Properties of positive definite kernels Properties of RKHS

15 / 24 Operations that Preserve Positive Definiteness I Proposition 3 If k i : X X C (i = 1, 2,...) are positive definite kernels, then so are the following: 1. (positive combination) ak 1 + bk 2 (a, b 0). 2. (product) k 1 k 2 (k 1 (x, y)k 2 (x, y)). 3. (limit) lim i k i (x, y), assuming the limit exists. Remark. From Proposition 3, the set of all positive definite kernels is a closed (w.r.t. pointwise convergence) convex cone stable under multiplication.

Operations that Preserve Positive Definiteness II Proof. (1): Obvious. (3): The non-negativity in the definition holds also for the limit. (2): It suffices to show that two Hermitian matrices A and B are positive semidefinite, so is their component-wise product. This is done by the following lemma. Definition. For two matrices A and B of the same size, the matrix C with C ij = A ij B ij is called the Hadamard product of A and B. The Hadamard product of A and B is denoted by A B. Lemma 4 Let A and B be non-negative Hermitian matrices of the same size. Then, A B is also non-negative. 16 / 24

17 / 24 Operations that Preserve Positive Definiteness III Proof. Let A = UΛU be the eigendecomposition of A, where U = (u 1,..., u p ): a unitary matrix, i.e., U = U T Λ: diagonal matrix with non-negative entries (λ 1,..., λ p ). Then, for arbitrary c 1,..., c p C, p p c i c j (A B) ij = λ a c i c j u a i ū a j B ij = λ a ξ at Bξ a, i,j=1 a=1 a=1 where ξ a = (c 1 u a 1,..., c p u a p) T C p. Since ξ at Bξ a and λ a are non-negative for each a, so is the sum.

18 / 24 Feature Map must be Positive Definite Proposition 5 Let V be an vector space with an inner product,. If we have a map Φ : X V, x Φ(x), a positive definite kernel on X is defined by k(x, y) = Φ(x), Φ(y). Proof. Let x 1,..., x n in X and c 1,..., c n C. n i,j=1 c ic j k(x i, x j ) = n i,j=1 c ic j Φ(x i ), Φ(x j ) n = i=1 c iφ(x i ), n j=1 c jφ(x j ) = n i=1 c iφ(x i ) 2 0.

Proposition 6 Modification Let k : X X C be a positive definite kernel and f : X C be an arbitrary function. Then, is positive definite. In particular, and k(x, y) = f(x)k(x, y)f(y) f(x)f(y) k(x, y) k(x, x) k(y, y) (normalized kernel) are positive definite. Proof is left as an exercise. 19 / 24

20 / 24 Proofs for Positive Definiteness of Examples Linear kernel: Proposition 5 Exponential: exp(βx T y) = 1 + βx T y + β2 2! (xt y) 2 + β3 3! (xt y) 3 + Use Proposition 3. Gaussian RBF kernel: ( exp 1 2σ 2 x y 2) = exp ( x 2 ) ( x T y ) 2σ 2 exp σ 2 exp ( y 2 ) 2σ 2. Apply Proposition 6. Laplacian kernel: The proof is shown later. Polynomial kernel: Just sum and product.

21 / 24 Positive definite kernel Definition and examples of positive definite kernel Reproducing kernel Hilbert space RKHS and positive definite kernel Some basic properties Properties of positive definite kernels Properties of RKHS

Proposition 7 Definition by Evaluation Map Let H be a Hilbert space consisting of functions on a set X. Then, H is a RKHS if and only if the evaluation map e x : H K, e x (f) = f(x), is a continuous linear functional for each x X. Proof. Assume H is a RKHS. The boundedness of e x is obvious from e x (f) = f, k x k x f. Conversely, assume the evaluation map is continuous. By Riesz lemma, there is k x H such that f, k x = e x (f) = f(x), which means H is a RKHS having k x as a reproducing kernel. 22 / 24

Continuity The functions in a RKHS are "nice" functions under some conditions. Proposition 8 Let k be a positive definite kernel on a topological space X, and H k be the associated RKHS. If Re[k(y, x)] is continuous for every x, y X, then all the functions in H k are continuous. Proof. Let f be an arbitrary function in H k. f(x) f(y) = f, k(, x) k(, y) f k(, x) k(, y). The assertion is easy from k(, x) k(, y) 2 = k(x, x) + k(y, y) 2Re[k(x, y)]. Remark. If k(x, y) is differentiable, then all the functions in H k are differentiable. c.f. L 2 space contains non-continuous functions. 23 / 24

24 / 24 Summary of Sections 1 and 2 We would like to use a feature map Φ : X H to incorporate nonlinearity or high order moments. The inner product in the feature space must be computed efficiently. Ideally, Φ(x), Φ(y) = k(x, y). To satisfy the above relation, the kernel k must be positive definite. A positive definite kernel k defines an associated RKHS, where k is the reproducing kernel; k(, x), k(, y) = k(x, y). Use a RKHS as a feature space, and Φ : x k(, x) as the feature map.

25 / 24 Appendix: Quick introduction to Hilbert spaces Definition of Hilbert space Basic properties of Hilbert space Appendix: Proofs Proof of Theorem 2

26 / 24 Vector space with inner product I Definition. V : vector space over a field K = R or C. V is called an inner product space if it has an inner product (or scalar product, dot product) (, ) : V V K such that for every x, y, z V 1. (Strong positivity) (x, x) 0, and (x, x) = 0 if and only if x = 0, 2. (Addition) (x + y, z) = (x, z) + (y, z), 3. (Scalar multiplication) (αx, y) = α(x, y) ( α K), 4. (Hermitian) (y, x) = (x, y).

Vector space with inner product II (V, (, )): inner product space. Norm of x V : x = (x, x) 1/2. Metric between x and y: d(x, y) = x y. Theorem 9 Cauchy-Schwarz inequality (x, y) x y. Remark: Cauchy-Schwarz inequality holds without requiring x = 0 x = 0. 27 / 24

Hilbert space I Definition. A vector space with inner product (H, (, )) is called Hilbert space if the induced metric is complete, i.e. every Cauchy sequence 2 converges to an element in H. Remark 1: A Hilbert space may be either finite or infinite dimensional. Example 1. R n and C n are finite dimensional Hilbert space with the ordinary inner product (x, y) R n = n i=1 x iy i or (x, y) C n = n i=1 x iy i. 2 A sequence {x n} n=1 in a metric space (X, d) is called a Cauchy sequence if d(x n, x m) 0 for n, m. 28 / 24

29 / 24 Hilbert space II Example 2. L 2 (Ω, µ). Let (Ω, B, µ) is a measure space. { L = f : Ω C } f 2 dµ <. The inner product on L is define by (f, g) = fgdµ. L 2 (Ω, µ) is defined by the equivalent classes identifying f and g if their values differ only on a measure-zero set. - L 2 (Ω, µ) is complete. - L 2 (R n, dx) is infinite dimensional.

30 / 24 Appendix: Quick introduction to Hilbert spaces Definition of Hilbert space Basic properties of Hilbert space Appendix: Proofs Proof of Theorem 2

31 / 24 Orthogonal complement. Orthogonality Let H be a Hilbert space and V be a closed subspace. V := {x H (x, y) = 0 for all y V } is a closed subspace, and called the orthogonal complement. Orthogonal projection. Let H be a Hilbert space and V be a closed subspace. Every x H can be uniquely decomposed x = y + z, y V and z V, that is, H = V V.

32 / 24 Complete orthonormal system I ONS and CONS. A subset {u i } i I of H is called an orthonormal system (ONS) if (u i, u j ) = δ ij (δ ij is Kronecker s delta). A subset {u i } i I of H is called a complete orthonormal system (CONS) if it is ONS and if (x, u i ) = 0 ( i I) implies x = 0. Fact: Any ONS in a Hilbert space can be extended to a CONS.

33 / 24 Complete orthonormal system II Separability A Hilbert space is separable if it has a countable CONS. Assumption In this course, a Hilbert space is always assumed to be separable.

Complete orthonormal system III Theorem 10 (Fourier series expansion) Let {u i } i=1 be a CONS of a separable Hilbert space. For each x H, Proof omitted. x = i=1 (x, u i)u i, (Fourier expansion) x 2 = i=1 (x, u i) 2. (Parseval s equality) Example: CONS of L 2 ([0 2π], dx) u n (t) = 1 2π e 1nt (n = 0, 1, 2,...) Then, f(t) = n=0 a nu n (t) is the (ordinary) Fourier expansion of a periodic function. 34 / 24

35 / 24 Bounded operator I Let H 1 and H 2 be Hilbert spaces. A linear transform T : H 1 H 2 is often called operator. Definition. A linear operator H 1 and H 2 is called bounded if sup T x H2 <. x H1 =1 The operator norm of a bounded operator T is defined by T = T x H2 sup T x H2 = sup. x H1 =1 x 0 x H1 (Corresponds to the largest singular value of a matrix.) Fact. If T : H 1 H 2 is bounded, T x H2 T x H1.

36 / 24 Proposition 11 Bounded operator II A linear operator is bounded if and only if it is continuous. Proof. Assume T : H 1 H 2 is bounded. Then, means continuity of T. T x T x 0 T x x 0 Assume T is continuous. For any ε > 0, there is δ > 0 such that T x < ε for all x H 1 with x < 2δ. Then, 1 sup T x = sup x =1 x =δ δ T x ε δ.

Riesz lemma I Definition. A linear functional is a linear transform from H to C (or R). The vector space of all the bounded (continuous) linear functionals called the dual space of H, and is denoted by H. Theorem 12 (Riesz lemma) For each φ H, there is a unique y φ H such that φ(x) = (x, y φ ) ( x H). Proof. Consider the case of R for simplicity. ) Obvious by Cauchy-Schwartz. 37 / 24

38 / 24 Riesz lemma II ) If φ(x) = 0 for all x, take y = 0. Otherwise, let V = {x H φ(x) = 0}. Since φ is a bounded linear functional, V is a closed subspace, and V H. Take z V with z = 1. By orthogonal decomposition, for any x H, x (x, z)z V. Apply φ, then Take y φ = φ(z)z. φ(x) (x, z)φ(z) = 0, i.e., φ(x) = (x, φ(z)z).

Riesz lemma III 39 / 24

40 / 24 Appendix: Quick introduction to Hilbert spaces Definition of Hilbert space Basic properties of Hilbert space Appendix: Proofs Proof of Theorem 2

41 / 24 Proof. (Described in R case.) Proof of Theorem 2 I Construction of an inner product space: H 0 := Span{k(, x) x X }. Define an inner product on H 0 : for f = n i=1 a ik(, x i ) and g = m j=1 b jk(, y j ), f, g := n i=1 m j=1 a ib j k(x i, y j ). This is independent of the way of representing f and g from the expression f, g = m j=1 b jf(y j ) = n i=1 a ig(x i ).

42 / 24 Proof of Theorem 2 II Reproducing property on H 0 : f, k(, x) = n i=1 a ik(x i, x) = f(x). Well-defined as an inner product: It is easy to see, is bilinear form, and f 2 = n i,j=1 a ia j k(x i, x j ) 0 by the positive definiteness of f. If f = 0, from Cauchy-Schwarz inequality, 3 f(x) = f, k(, x) f k(, x) = 0 for all x X ; thus f = 0.

Proof of Theorem 2 III Completion: Let H be the completion of H 0. H 0 is dense in H by the completion. H is realized by functions: Let {f n } be a Cauchy sequence in H. For each x X, {f n (x)} is a Cauchy sequence, because f n (x) f m (x) = f n f m, k(, x) f n f m k(, x). Define f(x) = lim n f n (x). This value is the same for equivalent sequences, because {f n } {g n } implies f n (x) g n (x) = f n g n, k(, x) f n g n k(, x) 0. Thus, any element [{f n }] in H can be regarded as a function f on X. 3 Note that Cauchy-Schwarz inequality holds without assuming strong positivity of the inner product. 43 / 24