Theory of Positive Definite Kernel and Reproducing Kernel Hilbert Space

Similar documents
Kernel Method: Data Analysis with Positive Definite Kernels

Elements of Positive Definite Kernel and Reproducing Kernel Hilbert Space

One-Dimensional Diffusion Operators

Real Variables # 10 : Hilbert Spaces II

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents

Commutative Banach algebras 79

1 Preliminaries on locally compact spaces and Radon measure

MAT 578 FUNCTIONAL ANALYSIS EXERCISES

THEOREMS, ETC., FOR MATH 515

Applied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books.

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

Hilbert Spaces. Contents

Fall TMA4145 Linear Methods. Solutions to exercise set 9. 1 Let X be a Hilbert space and T a bounded linear operator on X.

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

Chapter 8 Integral Operators

EECS 598: Statistical Learning Theory, Winter 2014 Topic 11. Kernels

Definition and basic properties of heat kernels I, An introduction

Extension of positive definite functions

REAL ANALYSIS II HOMEWORK 3. Conway, Page 49

Problem Set 6: Solutions Math 201A: Fall a n x n,

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

What is an RKHS? Dino Sejdinovic, Arthur Gretton. March 11, 2014

Your first day at work MATH 806 (Fall 2015)

RKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets Class 22, 2004 Tomaso Poggio and Sayan Mukherjee

CIS 520: Machine Learning Oct 09, Kernel Methods

1 Fourier Transformation.

4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan

Reminder Notes for the Course on Measures on Topological Spaces

Notation. General. Notation Description See. Sets, Functions, and Spaces. a b & a b The minimum and the maximum of a and b

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

Chapter 7: Bounded Operators in Hilbert Spaces

The Kernel Trick. Robert M. Haralick. Computer Science, Graduate Center City University of New York

Math The Laplacian. 1 Green s Identities, Fundamental Solution

REAL AND COMPLEX ANALYSIS

Kernel Density Estimation

be the set of complex valued 2π-periodic functions f on R such that

1 Compact and Precompact Subsets of H

CHAPTER V DUAL SPACES

Short note on compact operators - Monday 24 th March, Sylvester Eriksson-Bique

Functional Analysis Review

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

Problem Set 5. 2 n k. Then a nk (x) = 1+( 1)k

Sobolev spaces. May 18

Real Analysis Notes. Thomas Goller

Kernels A Machine Learning Overview

Chapter 1. Introduction

Potential Theory on Wiener space revisited

THE L 2 -HODGE THEORY AND REPRESENTATION ON R n

Representer theorem and kernel examples

MIT 9.520/6.860, Fall 2018 Statistical Learning Theory and Applications. Class 04: Features and Kernels. Lorenzo Rosasco

Wiener Measure and Brownian Motion

Distributions. Chapter INTRODUCTION

MTH 404: Measure and Integration

Analysis IV : Assignment 3 Solutions John Toth, Winter ,...). In particular for every fixed m N the sequence (u (n)

The Kernel Trick. Carlos C. Rodríguez October 25, Why don t we do it in higher dimensions?

2) Let X be a compact space. Prove that the space C(X) of continuous real-valued functions is a complete metric space.

Outline of Fourier Series: Math 201B

OPERATOR THEORY ON HILBERT SPACE. Class notes. John Petrovic

2. Function spaces and approximation

SOLUTIONS TO SOME PROBLEMS

Integral Jensen inequality

PROBLEMS. (b) (Polarization Identity) Show that in any inner product space

Strictly Positive Definite Functions on a Real Inner Product Space

MATH & MATH FUNCTIONS OF A REAL VARIABLE EXERCISES FALL 2015 & SPRING Scientia Imperii Decus et Tutamen 1

Math 4263 Homework Set 1

Spectral theory for compact operators on Banach spaces

10-701/ Recitation : Kernels

Geometry on Probability Spaces

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Your first day at work MATH 806 (Fall 2015)

A Concise Course on Stochastic Partial Differential Equations

Integration on Measure Spaces

Strictly positive definite functions on a real inner product space

Functional Analysis Review

Reproducing Kernel Hilbert Spaces

THE INVERSE FUNCTION THEOREM

1 Fourier transform as unitary equivalence

Kernel-Based Contrast Functions for Sufficient Dimension Reduction

Kernel Methods for Computational Biology and Chemistry

Spectral Mapping Theorem

8 Singular Integral Operators and L p -Regularity Theory

Function Space and Convergence Types

MATHS 730 FC Lecture Notes March 5, Introduction

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy

Rational and H dilation

Introduction to Machine Learning Lecture 9. Mehryar Mohri Courant Institute and Google Research

g(x) = P (y) Proof. This is true for n = 0. Assume by the inductive hypothesis that g (n) (0) = 0 for some n. Compute g (n) (h) g (n) (0)

THE FORM SUM AND THE FRIEDRICHS EXTENSION OF SCHRÖDINGER-TYPE OPERATORS ON RIEMANNIAN MANIFOLDS

FIXED POINT METHODS IN NONLINEAR ANALYSIS

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

Review and problem list for Applied Math I

x 0 + f(x), exist as extended real numbers. Show that f is upper semicontinuous This shows ( ɛ, ɛ) B α. Thus

MORE NOTES FOR MATH 823, FALL 2007

MATH 590: Meshfree Methods

Let R be the line parameterized by x. Let f be a complex function on R that is integrable. The Fourier transform ˆf = F f is. e ikx f(x) dx. (1.

are Banach algebras. f(x)g(x) max Example 7.4. Similarly, A = L and A = l with the pointwise multiplication

Fredholm Theory. April 25, 2018

Lebesgue Spaces of Matrix Functions

Reproducing Kernel Hilbert Spaces

1 Exponential manifold by reproducing kernel Hilbert spaces

Transcription:

Theory of Positive Definite Kernel and Reproducing Kernel Hilbert Space Statistical Inference with Reproducing Kernel Hilbert Space Kenji Fukumizu Institute of Statistical Mathematics, ROIS Department of Statistical Science, Graduate University for Advanced Studies June 20, 2008 / Statistical Learning Theory II

Outline Positive and negative definite kernels 1 Positive and negative definite kernels 2 3 2 / 31

1 Positive and negative definite kernels 2 3 3 / 31

Review: operations that preserve positive definiteness I Proposition 1 If k i : X X C (i = 1, 2,...) are positive definite kernels, then so are the following: 1 (positive combination) ak 1 + bk 2 (a, b 0). 2 (product) k 1 k 2 (k 1 (x, y)k 2 (x, y)). 3 (limit) lim i k i (x, y), assuming the limit exists. Remark. Proposition 1 says that the set of all positive definite kernels is closed (w.r.t. pointwise convergence) convex cone stable under multiplication. Example: If k(x, y) is positive definite, e k(x,y) = 1 + k + 1 2 k2 + 1 3! k3 + is also positive definite. 4 / 31

Review: operations that preserve positive definiteness II Proposition 2 Let k : X X C be a positive definite kernel and f : X C be an arbitrary function. Then, is positive definite. In particular, is a positive definite kernel. k(x, y) = f(x)k(x, y)f(y) f(x)f(y) 5 / 31

Review: operations that preserve positive definiteness III Corollary 3 (Normalization) Let k : X X C be a positive definite kernel. If k(x, x) > 0 for any x X, then k(x, y) k(x, y) = k(x, x)k(y, y) is positive definite. This is called normalization of k. Note that k(x, y) 1 for any x, y X. Example: Polynomial kernel k(x, y) = (x T y + c) d (c > 0). (x k(x, T y + c) d y) = (x T x + c) d/2 (y T y + c). d/2 6 / 31

1 Positive and negative definite kernels 2 3 7 / 31

Definition. A function ψ : X X C is called a negative definite kernel if it is Hermitian i.e. ψ(y, x) = ψ(x, y), and n c i c j ψ(x i, x j ) 0 i,j=1 for any x 1,..., x n (n 2) in X and c 1,..., c n C with n i=1 c i = 0. Note: a negative definite kernel is not necessarily minus pos. def. kernel because of the condition n i=1 c i = 0. 8 / 31

Properties of negative definite kernels Proposition 4 1 If k is positive definite, ψ = k is negative definite. 2 Constant functions are negative definite. (2) n i,j=1 c ic j = n i=1 c i n j=1 c j = 0. Proposition 5 If ψ i : X X C (i = 1, 2,...) are negative definite kernels, then so are the following: 1 (positive combination) aψ 1 + bψ 2 (a, b 0). 2 (limit) lim i ψ i (x, y), assuming the limit exists. The set of all negative definite kernels is closed (w.r.t. pointwise convergence) convex cone. Multiplication does not preserve negative definiteness. 9 / 31

Example of negative definite kernel Proposition 6 Let V be an inner product space, and φ : X V. Then, is a negative definite kernel on X. ψ(x, y) = φ(x) φ(y) 2 Proof. Suppose n i=1 ci = 0. n i,j=1cicj φ(xi) φ(xj) 2 = n { i,j=1 cicj φ(xi) 2 + φ(x j) 2 (φ(x i), φ(x j)) (φ(x j), φ(x } i)) = n i=1 ci φ(xi) 2 n j=1 cj + n j=1 cj φ(xj) 2 n i=1 ci ( n i=1 ciφ(xi), n j=1 cjφ(xj)) ( n j=1 cjφ(xj), n i=1 ciφ(xi)) = n i=1 ciφ(xi) 2 n i=1 ciφ(xi) 2 0 10 / 31

Relation between positive and negative definite kernels Lemma 7 Let ψ(x, y) be a hermitian kernel on X. Fix x 0 X and define ϕ(x, y) = ψ(x, y) + ψ(x, x 0 ) + ψ(x 0, y) ψ(x 0, x 0 ). Then, ψ is negative definite if and only if ϕ is positive definite. Proof. "If" part is easy (exercise). Suppose ψ is neg. def. Take any x i X and c i C (1 = 1,..., n). Define c 0 = n i=1 c i. Then, 0 n i,j=0 c ic j ψ(x i, x j ) [for x 0, x 1,..., x n ] = n i,j=1 c ic j ψ(x i, x j ) + c 0 n i=1 c iψ(x i, x 0 ) + c 0 n j=1 c iψ(x 0, x j ) + c 0 2 ψ(x 0, x 0 ) = n i,j=1 c { ic j ψ(xi, x j ) ψ(x i, x 0 ) ψ(x 0, x j ) + ψ(x 0, y 0 ) } = n i,j=1 c ic j ϕ(x i, x j ). 11 / 31

Schoenberg s theorem Theorem 8 (Schoenberg s theorem) Let X be a nonempty set, and ψ : X X C be a kernel. ψ is negative definite if and only if exp( tψ) is positive definite for all t > 0. Proof. If part: 1 exp( tψ(x, y)) ψ(x, y) = lim. t 0 t Only if part: We can prove only for t = 1. Take x 0 X and define ϕ(x, y) = ψ(x, y) + ψ(x, x 0 ) + ψ(x 0, y) ψ(x 0, x 0 ). ϕ is positive definite (Lemma 7). e ψ(x,y) = e ϕ(x,y) e ψ(x,x0) e ψ(y,x0) e ψ(x0,x0). This is also positive definite. 12 / 31

1 Positive and negative definite kernels 2 3 13 / 31

More examples I Positive and negative definite kernels Proposition 9 If ψ : X X C is negative definite and ψ(x, x) 0. Then, for any 0 < p 1, ψ(x, y) p is negative definite. Proof. Use the following formula. ψ(x, y) p = p Γ(1 p) 0 t p 1 (1 e tψ(x,y) )dt The integrand is negative definite for all t > 0.. For any 0 < p 2 and α > 0, exp( α x y p ) is positive definite on R n. α = 2 Gaussian kernel. α = 1 Laplacian kernels. 14 / 31

More examples II Positive and negative definite kernels Proposition 10 If ψ : X X C is negative definite and ψ(x, x) 0. Then, log(1 + ψ(x, y)) is negative definite. Proof. log(1 + ψ(x, y)) = 0 (1 e tψ(x,y) ) e t t dt. 15 / 31

More example III Positive and negative definite kernels Corollary 11 If ψ : X X (0, ) is negative definite. Then, log ψ(x, y) is negative definite. Proof. For any c > 0, log(ψ + 1/c) = log(1 + cψ) log c is negative definite. Take the limit of c. ψ(x, y) = x + y is negative definite on R. ψ(x, y) = log(x + y) is negative definite on (0, ). 16 / 31

More examples IV Positive and negative definite kernels Proposition 12 If ψ : X X C is negative definite and Reψ(x, y) 0. Then, for any a > 0, 1 ψ(x, y) + a is positive definite. Proof. 1 ψ(x, y) + a = 0 e t(ψ(x,y)+a) dt. The integrand is positive definite for all t > 0.. For any 0 < p 2, is positive definite on R. 1 1 + x y p 17 / 31

1 Positive and negative definite kernels 2 3 18 / 31

Positive definite functions Definition. Let φ : R n C be a function. φ is called a positive definite function (or function of positive type) if k(x, y) = φ(x y) is a positive definite kernel on R n, i.e. n i,j=1 c ic j φ(x i x j ) 0 for any x 1,..., x n X and c 1,..., c n C. A positive definite kernel of the form φ(x y) is called shift invariant (or translation invariant). Gaussian and Laplacian kernels are examples of shift-invariant positive definite kernels. 19 / 31

I The characterizes all the continuous shift-invariant kernels on R n. Theorem 13 (Bochner) Let φ be a continuous function on R n. Then, φ is positive definite if and only if there is a finite non-negative Borel measure Λ on R n such that φ(x) = e 1ω T x dλ(ω). φ is the inverse Fourier (or Fourier-Stieltjes) transform of Λ. Roughly speaking, the shift invariant functions are the class that have non-negative Fourier transform. 20 / 31

II The Fourier kernel e 1x T ω is a positive definite function for all ω R n. exp( 1(x y) T ω) = exp( 1x T ω)exp( 1y T ω). The set of all positive definite functions is a convex cone, which is closed under the pointwise-convergence topology. The generator of the convex cone is the Fourier kernels {e 1x T ω ω R n }. Example on R: (positive scales are neglected) exp( 1 2σ 2 x 2 ) exp( α x ) exp( σ2 2 ω 2 ) 1 ω 2 + α 2 is extended to topological groups and semigroups [BCR84]. 21 / 31

1 Positive and negative definite kernels 2 3 22 / 31

Integral characterization of positive definite kernels I Ω: compact Hausdorff space. µ: finite Borel measure on Ω. Proposition 14 Let K(x, y) be a continuous function on Ω Ω. K(x, y) is a positive definite kernel on Ω if and only if K(x, y)f(x)f(y)dxdy 0 for each function f L 2 (Ω, µ). Ω c.f. Definition of positive definiteness: K(x i, x j )c i c j 0. Ω i,j 23 / 31

Integral characterization of positive definite kernels II Proof. ( ). For a continuous function f, a Riemann sum satisfies i,j K(x i, x j )f(x i )f(x j )µ(e i )µ(e j ) 0. The integral is the limit of such sums, thus non-negative. For f L 2 (Ω, µ), approximate it by a continuous function. ( ). Suppose n i,j=1 c ic j K(x i, x j ) = δ < 0. By continuity of K, there is an open neighborhood U i of x i such that n i,j=1 c ic j K(z i, z j ) δ/2. for all z i U i. We can approximate c i i µ(u I i) U i by a continuous function f with arbitrary accuracy. 24 / 31

Integral Kernel Positive and negative definite kernels (Ω, B, µ): measure space. K(x, y): measurable function on Ω Ω such that K(x, y) 2 dxdy <. (square integrability) Ω Ω Define an operator T K on L 2 (Ω, µ) by (T K f)(x) = K(x, y)f(y)dy (f L 2 (Ω, µ)). T K : integral operator with integral kernel K. Fact: T K f L 2 (Ω, µ). Ω ) TK f(x) 2 dx = { K(x, y)f(y)dy } 2 dx K(x, y) 2 dy f(y) 2 dydx = K(x, y) 2 dxdy f 2 L 2. 25 / 31

Hilbert-Schmidt operator I H: separable Hilbert space. Definition. An operator T on H is called Hilbert-Schmidt if for a CONS {ϕ i } i=1 i=1 T ϕ i 2 <. For a Hilbert-Schmidt operator T, the Hilbert-Schmidt norm T HS is defined by T HS = ( i=1 T ϕ i 2) 1/2. T HS does not depend on the choice of a CONS. ) From Parseval s equality, for a CONS {ψ j } j=1, T 2 HS = i=1 T ϕ i 2 = = j=1 i=1 j=1 (ψ j, T ϕ i ) 2 i=1 (T ψ j, ϕ i ) 2 = j=1 T ψ j 2. 26 / 31

Hilbert-Schmidt operator II Fact: T T HS. Hilbert-Schmidt norm is an extension of Frobenius norm of a matrix: T 2 HS = (ψ j, T ϕ i ) 2. i=1 j=1 (ψ j, T ϕ i ) is the component of the matrix expression of T with the CONS s {ϕ i } and {ψ j }. 27 / 31

Hilbert-Schmidt operator and integral kernel I Recall (T K f)(x) = with square integrable kernel K. Theorem 15 Ω K(x, y)f(y)dy (f L 2 (Ω, µ)) Assume L 2 (Ω, µ) is separable. Then, T K is a Hilbert-Schmidt operator, and T K 2 HS = K(x, y) 2 dxdy. Proof. Let {ϕ i } be a CONS. From Parseval s equality, K(x, y) 2 dy = i (K(x, ), ϕ i) L 2 2 = i K(x, y)ϕ i(y)dy 2 = i TKϕi(x) 2. Integrate w.r.t. x, ({ϕ i } is also a CONS) K(x, y) 2 dxdy = i T Kϕ i 2 = T K 2 HS. 28 / 31

Hilbert-Schmidt operator and integral kernel II Converse is true! Theorem 16 Assume L 2 (Ω, µ) is separable. For any Hilbert-Schmidt operator T on L 2 (Ω, µ), there is a square integrable kernel K(x, y) such that T ϕ = K(x, y)ϕ(y)dy. Outline of the proof. Fix a CONS {ϕ i }. Define K n (x, y) = n i=1 (T ϕ i)(x)ϕ i (y) (n = 1, 2, 3,..., ). We can show {K n (x, y)} is a Cauchy sequence in L 2 (Ω Ω, µ µ), and the limit works as K in the statement. 29 / 31

Integral operator by positive definite kernel Ω: compact Hausdorff space. µ: finite Borel measure on Ω. K(x, y): continuous positive definite kernel on Ω. (T K f)(x) = K(x, y)f(y)dy (f L 2 (Ω, µ)) Fact: From Proposition 14 Ω (T K f, f) L2 (Ω,µ) 0 ( f L 2 (Ω, µ)). In particular, any eigenvalue of T K is non-negative. 30 / 31

Positive and negative definite kernels K(x, y): continuous positive definite kernel on Ω. {λ i } i=1, {ϕ i} i=1 : the positive eigenvalues and eigenfunctions of T K. λ 1 λ 2 > 0, lim λ i = 0. i T K ϕ i = λ i ϕ i, K(x, y)ϕi (y)dy = λ i ϕ i (x). Theorem 17 (Mercer) K(x, y) = λ i ϕ i (x)ϕ i (y), i=1 where the convergence is absolute and uniform over Ω Ω. Proof is omitted. See [RSN65], Section 98, or [Ito78], Chapter 13. 31 / 31

References I Christian Berg, Jens Peter Reus Christensen, and Paul Ressel. Harmonic Analysis on Semigroups. Springer-Verlag, 1984. Seizo Ito. Kansu-Kaiseki III (Iwanami kouza Kiso-suugaku). Iwanami Shoten, 1978. Frigyes Riesz and Béla Sz.-Nagy. Functional Analysis (2nd ed.). Frederick Ungar Publishing Co, 1965. 32 / 31