Introduction to Kernel methods

1 Introduction to Kernel Methods
ML Workshop, ISI Kolkata
Chiranjib Bhattacharyya, Machine Learning Lab, Dept of CSA, IISc
19th Oct, 2012

2 Introduction
Kernel methods make machine learning more widely applicable.
Kernels are similarity measures.
Kernels can help integrate different sources of data.

3 Agenda
1. Kernel trick: SVM and non-linear classification
2. Definition of kernel functions
3. Kernels and Hilbert spaces: RKHS, representer theorem, etc.

4 PART 1: KERNEL TRICK

6 Binary classification
Classifier: $f : X \to \{-1, 1\}$, $f(x) = \mathrm{sign}(w^\top x + b)$.
Data: $D = \{(x_i, y_i) \mid i = 1, \dots, m\}$, $x_i \in X$, $y_i \in \{1, -1\}$.
Goal: find $f$ from $D$.

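7 Review of C-SVM
C-SVM formulation (primal):
$\min_{w,b} \; C \sum_{i=1}^{m} \max(1 - y_i(w^\top x_i + b), 0) + \frac{1}{2}\|w\|^2$
Dual:
$\max_{\alpha} \; -\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j + \sum_{i} \alpha_i$ subject to $0 \le \alpha_i \le C$, $\sum_{i} \alpha_i y_i = 0$.
At optimality $w = \sum_{i} \alpha_i y_i x_i$, so $f(x) = \mathrm{sign}\left(\sum_{i} \alpha_i y_i \, x_i^\top x + b\right)$.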
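
The relation between the two C-SVM problems can be checked numerically on a tiny data set. The sketch below is illustrative only: the two training points, the multipliers `alpha`, the offset `b` and the value of `C` are hand-picked assumptions, not values from the slides. It rebuilds $w$ from the multipliers and compares the primal and dual objective values.

```python
def dot(u, v):
    """Plain Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))

def primal_objective(w, b, X, y, C):
    """C * sum of hinge losses + 0.5 * ||w||^2."""
    hinge = sum(max(1.0 - yi * (dot(w, xi) + b), 0.0) for xi, yi in zip(X, y))
    return C * hinge + 0.5 * dot(w, w)

def dual_objective(alpha, X, y):
    """-0.5 * sum_ij a_i a_j y_i y_j <x_i, x_j> + sum_i a_i."""
    n = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return -0.5 * quad + sum(alpha)

def weights_from_dual(alpha, X, y):
    """w = sum_i alpha_i y_i x_i."""
    dim = len(X[0])
    return [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X))
            for k in range(dim)]

# Hypothetical toy problem: two points, one per class.
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [1, -1]
alpha = [0.25, 0.25]   # assumed multipliers; they satisfy sum_i alpha_i y_i = 0
b = 0.0
C = 1.0

w = weights_from_dual(alpha, X, y)
primal_value = primal_objective(w, b, X, y, C)
dual_value = dual_objective(alpha, X, y)
print(w, primal_value, dual_value)
```

For this toy problem the two objective values coincide, which is what one expects at an optimal primal-dual pair.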

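8 C-SVM in feature spaces
Let us work with a feature map $\Phi(x)$. The dual becomes
$\max_{\alpha} \; -\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \Phi(x_i)^\top \Phi(x_j) + \sum_{i} \alpha_i$ subject to $0 \le \alpha_i \le C$, $\sum_{i} \alpha_i y_i = 0$,
and our classifier is $f(x) = \mathrm{sign}\left(\sum_{i} \alpha_i y_i \, \Phi(x_i)^\top \Phi(x) + b\right)$.
Let the dot product between any pair of examples computed in the feature space be denoted by $K(x,z) = \Phi(x)^\top \Phi(z)$.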

9 C-SVM in feature spaces
With $K(x,z) = \Phi(x)^\top \Phi(z)$, the dual depends on the data only through $K$:
$\max_{\alpha} \; -\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j) + \sum_{i} \alpha_i$ subject to $0 \le \alpha_i \le C$, $\sum_{i} \alpha_i y_i = 0$,
and our classifier is $f(x) = \mathrm{sign}\left(\sum_{i} \alpha_i y_i \, K(x_i, x) + b\right)$.
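
A minimal sketch of the kernelized classifier, assuming an XOR-style data set and hand-picked multipliers (the data, `alphas` and `b` are hypothetical and chosen for illustration, not produced by a solver). It shows that swapping the linear kernel for a quadratic one changes the decision function without ever forming $\Phi$.

```python
def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def quadratic_kernel(x, z):
    return linear_kernel(x, z) ** 2

def score(x, support, alphas, labels, b, kernel):
    """sum_i alpha_i y_i K(x_i, x) + b, the argument of sign(.)."""
    return sum(a * y * kernel(xi, x)
               for a, y, xi in zip(alphas, labels, support)) + b

def classify(x, support, alphas, labels, b, kernel):
    return 1 if score(x, support, alphas, labels, b, kernel) >= 0 else -1

# Hypothetical XOR-style data: label is the sign of x1 * x2.
support = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, -1, -1]
alphas = [0.125, 0.125, 0.125, 0.125]   # assumed values, sum_i alpha_i y_i = 0
b = 0.0

quad_scores = [score(x, support, alphas, labels, b, quadratic_kernel)
               for x in support]
lin_scores = [score(x, support, alphas, labels, b, linear_kernel)
              for x in support]
new_label = classify((2.0, 1.5), support, alphas, labels, b, quadratic_kernel)
print(quad_scores, lin_scores, new_label)
```

With these assumed coefficients the quadratic-kernel score reduces to $x_1 x_2$, so every training point gets a score of $\pm 1$ with the correct sign, while the linear-kernel score is zero everywhere.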

10 An example
Let $x \in \mathbb{R}^2$ and $\Phi(x) = [x_1^2, \; x_2^2, \; \sqrt{2}\, x_1 x_2]^\top$. Then
$K(x,z) = \Phi(x)^\top \Phi(z) = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2 = \langle x, z \rangle^2$.
In general, $K(x,z) = (x^\top z)^r$ is a dot product in a $\binom{d+r-1}{r}$-dimensional feature space for $x, z \in \mathbb{R}^d$. If $d = 256$, $r = 4$, the feature space size is $\binom{259}{4} = 183{,}181{,}376$.
However, if we know $K$, we can still solve the SVM formulation without explicitly evaluating $\Phi$.
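
The identity in this example is easy to verify numerically. The following sketch uses two arbitrary test vectors (an assumption for illustration) to compare the explicit feature-space dot product with the squared dot product in input space, and a hypothetical helper `feature_dim` to count degree-$r$ monomials in $d$ variables with a binomial coefficient.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map for the degree-2 kernel on R^2."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2.0) * x1 * x2)

def feature_dim(d, r):
    """Number of degree-r monomials in d variables: C(d + r - 1, r)."""
    return math.comb(d + r - 1, r)

x, z = (1.0, 2.0), (3.0, -1.0)      # arbitrary test vectors
explicit = dot(phi(x), phi(z))      # dot product computed in feature space
implicit = dot(x, z) ** 2           # kernel evaluated in input space

print(explicit, implicit)
print(feature_dim(2, 2), feature_dim(256, 4))
```

The kernel evaluation costs one dot product in $\mathbb{R}^d$ regardless of how large `feature_dim(d, r)` becomes, which is the point of the kernel trick.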

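11 Kernel function
$K : X \times X \to \mathbb{R}$ is a kernel function if
$K$ is symmetric: $K(x,z) = K(z,x)$;
$K$ is positive semidefinite: for all $n$ and all $x_1, \dots, x_n \in X$, the matrix $K_{ij} = K(x_i, x_j)$ is psd.
Recall that a symmetric matrix $K \in \mathbb{R}^{d \times d}$ is psd if $u^\top K u \ge 0$ for all $u \in \mathbb{R}^d$.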

13 Examples of kernel function
$K(x,z) = \Phi(x)^\top \Phi(z)$, where $\Phi : E \to \mathbb{R}^d$.
$K$ is symmetric, i.e. $K(x,z) = K(z,x)$.
Positive semidefinite: let $D = \{x_1, x_2, \dots, x_n\}$ be a set of $n$ arbitrarily chosen elements of $E$, and define $K_{ij} = \Phi(x_i)^\top \Phi(x_j)$. For any $u \in \mathbb{R}^n$ it is straightforward to see that
$u^\top K u = \|\Phi(D) u\|^2 \ge 0$, where $\Phi(D) = [\Phi(x_1), \dots, \Phi(x_n)]$.

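14 Examples of kernel functions
Linear: $K(x,z) = x^\top z$, with $\Phi(x) = x$.
Polynomial: $K(x,z) = (x^\top z)^r$, with $\Phi_{t_1 t_2 \dots t_d}(x) = \sqrt{\frac{r!}{t_1! \, t_2! \cdots t_d!}} \; x_1^{t_1} x_2^{t_2} \cdots x_d^{t_d}$, where $\sum_{i=1}^{d} t_i = r$.
Gaussian: $K(x,z) = e^{-\gamma \|x - z\|^2}$.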
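
As a sanity check on these examples, the sketch below builds Gram matrices for the three kernels on a few random points and tests the psd condition $u^\top K u \ge 0$ on random vectors $u$. The sample points, the parameter values and the helper `looks_psd` are assumptions made for illustration. Passing a randomized test is only a necessary condition, not a proof. A symmetric function that is not a kernel, the negative squared distance, is included for contrast.

```python
import math
import random

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, r=3):
    return linear_kernel(x, z) ** r

def gaussian_kernel(x, z, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def negative_sq_distance(x, z):
    """Symmetric, but not a valid kernel."""
    return -sum((a - b) ** 2 for a, b in zip(x, z))

def gram_matrix(points, kernel):
    return [[kernel(p, q) for q in points] for p in points]

def quadratic_form(K, u):
    n = len(u)
    return sum(u[i] * K[i][j] * u[j] for i in range(n) for j in range(n))

def looks_psd(K, trials=200, tol=1e-9, seed=0):
    """Randomized necessary check: symmetry and u^T K u >= 0."""
    rng = random.Random(seed)
    n = len(K)
    symmetric = all(math.isclose(K[i][j], K[j][i], abs_tol=tol)
                    for i in range(n) for j in range(n))
    nonneg = all(
        quadratic_form(K, [rng.gauss(0.0, 1.0) for _ in range(n)]) >= -tol
        for _ in range(trials))
    return symmetric and nonneg

rng = random.Random(1)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(6)]

results = {name: looks_psd(gram_matrix(points, k))
           for name, k in [("linear", linear_kernel),
                           ("polynomial", polynomial_kernel),
                           ("gaussian", gaussian_kernel)]}

# With u = (1, ..., 1) the quadratic form of the non-kernel is negative.
bad = gram_matrix(points, negative_sq_distance)
bad_value = quadratic_form(bad, [1.0] * len(points))
print(results, bad_value)
```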

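16 Kernel construction
Let $K_1$ and $K_2$ be two valid kernels. The following are also valid kernels:
$K(x,y) = \Phi(x)^\top \Phi(y)$
$K(u,v) = K_1(u,v) \, K_2(u,v)$
$K = \alpha K_1 + \beta K_2$, with $\alpha, \beta \ge 0$
$\hat{K}(x,y) = \frac{K(x,y)}{\sqrt{K(x,x) \, K(y,y)}}$
Applying these rules in turn:
$K(x,y) = x^\top y$
$K(x,y) = (x^\top y)^i$
$K(x,y) = \lim_{N \to \infty} \sum_{i=0}^{N} \frac{(x^\top y)^i}{i!} = e^{x^\top y}$
$\hat{K}(x,y) = e^{-\frac{1}{2} \|x - y\|^2}$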
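
The last step of this construction, normalizing the exponential kernel to obtain the Gaussian kernel, can be confirmed numerically. This is a sketch with arbitrary test vectors and a truncated series (30 terms is an assumption that is ample for small $|x^\top y|$). The helper names are hypothetical.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def exponential_kernel(x, y):
    return math.exp(dot(x, y))

def exponential_kernel_series(x, y, terms=30):
    """Truncated sum_i (x^T y)^i / i!, a nonnegative combination of polynomial kernels."""
    s = dot(x, y)
    return sum(s ** i / math.factorial(i) for i in range(terms))

def normalized(kernel):
    """K_hat(x, y) = K(x, y) / sqrt(K(x, x) K(y, y))."""
    def k_hat(x, y):
        return kernel(x, y) / math.sqrt(kernel(x, x) * kernel(y, y))
    return k_hat

def gaussian_kernel(x, y):
    return math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)))

x, y = (0.3, -1.2), (1.0, 0.5)      # arbitrary test vectors
series_value = exponential_kernel_series(x, y)
closed_value = exponential_kernel(x, y)
normalized_value = normalized(exponential_kernel)(x, y)
gaussian_value = gaussian_kernel(x, y)
print(series_value, closed_value, normalized_value, gaussian_value)
```

The agreement follows from $x^\top y - \frac{1}{2}\|x\|^2 - \frac{1}{2}\|y\|^2 = -\frac{1}{2}\|x - y\|^2$.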

17 Kernel function and feature map
A theorem due to Mercer guarantees a feature map for symmetric, psd kernel functions.
Loosely stated: for a symmetric function $K : X \times X \to \mathbb{R}$, there exists an expansion $K(x,z) = \Phi(x)^\top \Phi(z)$ iff
$\int_X \int_X g(x) \, g(z) \, K(x,z) \, dx \, dz \ge 0$ for every square-integrable $g$.

18 PART 2: KERNELS AND HILBERT SPACES

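19 What is a dot product (aka inner product)?
Let $X$ be a vector space. A dot product $\langle \cdot, \cdot \rangle$ on $X$ satisfies:
Symmetry: $\langle u, v \rangle = \langle v, u \rangle$ for all $u, v \in X$.
Bilinearity: $\langle \alpha u + \beta v, w \rangle = \alpha \langle u, w \rangle + \beta \langle v, w \rangle$ for all $u, v, w \in X$.
Positive definiteness: $\langle u, u \rangle \ge 0$ for all $u \in X$, and $\langle u, u \rangle = 0$ iff $u = 0$.
Norm: $\|x\| = \sqrt{\langle x, x \rangle}$, and $\|x\| = 0 \implies x = 0$.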

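20 Examples of dot products
$X = \mathbb{R}^n$, $\langle u, v \rangle = u^\top v$.
$X = \mathbb{R}^n$, $\langle u, v \rangle = \sum_{i=1}^{n} \lambda_i u_i v_i$ with $\lambda_i > 0$ (strict positivity is needed so that $\langle u, u \rangle = 0$ implies $u = 0$).
$X = L_2(X) = \{ f : \int_X f(x)^2 \, dx < \infty \}$; for $f, g \in X$, $\langle f, g \rangle = \int_X f(x) \, g(x) \, dx$.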

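21 Cauchy-Schwarz inequality
Let $X$ be an inner product space. Then $|\langle x, z \rangle| \le \|x\| \, \|z\|$ for all $x, z \in X$, and equality holds iff $x = \alpha z$ for some scalar $\alpha$.
Proof (for $z \ne 0$; the case $z = 0$ is trivial): for all $\alpha \in \mathbb{R}$,
$\|x - \alpha z\|^2 \ge 0 \implies \|x\|^2 - 2\alpha \langle x, z \rangle + \alpha^2 \|z\|^2 \ge 0$.
Let $\alpha = \frac{\langle x, z \rangle}{\|z\|^2}$; this gives $\|x\|^2 - \frac{\langle x, z \rangle^2}{\|z\|^2} \ge 0$, and the inequality follows by taking square roots. The claim about equality follows from the definition of the norm.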

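22 Hilbert space: basic facts
Definition: an inner product space $(H, \langle \cdot, \cdot \rangle_H)$ is a Hilbert space if it is separable and complete. We denote the norm by $\| \cdot \|_H$.
The orthogonal complement of a subspace $M \subseteq H$ is defined as $M^\perp = \{ z \mid \langle x, z \rangle_H = 0 \;\; \forall x \in M \}$.
Hilbert space projection theorem: let $M$ be a closed subspace of a Hilbert space $(H, \langle \cdot, \cdot \rangle_H)$. For every $x \in H$ the following holds:
There exists a unique $\Pi_M(x) \in M$ such that $\Pi_M(x) = \arg\min_{z \in M} \|x - z\|_H$.
$x - \Pi_M(x) \in M^\perp$, i.e. $\langle z, x - \Pi_M(x) \rangle_H = 0$ for all $z \in M$.
$\|x\|_H^2 = \|\Pi_M(x)\|_H^2 + \|y\|_H^2$, where $x = \Pi_M(x) + y$ and $y \in M^\perp$.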
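
The three statements of the projection theorem can be illustrated in the simplest Hilbert space, $\mathbb{R}^3$ with the usual dot product, taking $M$ to be the line spanned by a single vector. The vectors below are arbitrary choices for illustration, and `project_onto_line` is a hypothetical helper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_onto_line(x, v):
    """Orthogonal projection of x onto M = span{v}."""
    scale = dot(x, v) / dot(v, v)
    return [scale * vi for vi in v]

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

x = [3.0, 1.0, 2.0]
v = [1.0, 1.0, 0.0]

p = project_onto_line(x, v)                  # Pi_M(x)
y = [xi - pi for xi, pi in zip(x, p)]        # component in the complement of M

orthogonal = dot(y, v)                       # should be 0
pythagoras_gap = dot(x, x) - (dot(p, p) + dot(y, y))

# Pi_M(x) is at least as close to x as any other sampled point t * v of M.
closest = all(distance(x, p) <= distance(x, [t * vi for vi in v]) + 1e-12
              for t in [-2.0, 0.0, 0.5, 1.9, 2.1, 5.0])
print(p, y, orthogonal, pythagoras_gap, closest)
```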

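23 Reproducing kernel Hilbert space (RKHS)
Let $K$ be any kernel function. Consider the following set:
$H = \{ f \mid f(\cdot) = \sum_{i=1}^{m} \alpha_i K(\cdot, x_i), \; x_i \in X, \; m \in \mathbb{N} \}$.
Dot product: for any $f, g \in H$ with $f(\cdot) = \sum_{i=1}^{m_1} \alpha_i K(\cdot, x_i)$ and $g(\cdot) = \sum_{j=1}^{m_2} \beta_j K(\cdot, x_j)$, define
$\langle f, g \rangle_H = \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} \alpha_i \beta_j K(x_i, x_j)$.
Is it a dot product?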

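24 Reproducing kernel Hilbert space (RKHS)
As $K$ is symmetric, $\langle f, g \rangle_H = \langle g, f \rangle_H$.
$\langle f, f \rangle_H = \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j K(x_i, x_j)$. Recall that the matrix $K_{ij} = K(x_i, x_j)$ is psd if $K$ is a kernel function, and so $\langle f, f \rangle_H \ge 0$.
Reproducing property: for any $f \in H$,
$f(x) = \sum_{i=1}^{m} \alpha_i K(x, x_i) = \left\langle \sum_{i=1}^{m} \alpha_i K(\cdot, x_i), K(\cdot, x) \right\rangle_H = \langle f, K(\cdot, x) \rangle_H$.
Applying the Cauchy-Schwarz inequality, $|f(x)| \le \sqrt{\langle f, f \rangle_H} \, \sqrt{K(x,x)}$ holds, leading to $f(x) = 0$ whenever $\langle f, f \rangle_H = 0$.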
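
A small numerical sketch of the RKHS inner product and the reproducing property, assuming a one-dimensional Gaussian kernel and an arbitrary function `f` given by hand-picked coefficients and centers (all hypothetical). A function in $H$ is represented as a pair (coefficients, centers), so that $K(\cdot, x)$ is the pair `([1.0], [x])`.

```python
import math

def kernel(x, z, gamma=0.5):
    """Gaussian kernel on the real line (an assumed choice of K)."""
    return math.exp(-gamma * (x - z) ** 2)

def evaluate(f, x):
    """f(x) = sum_i alpha_i K(x, x_i)."""
    alphas, centers = f
    return sum(a * kernel(x, c) for a, c in zip(alphas, centers))

def inner(f, g):
    """<f, g>_H = sum_i sum_j alpha_i beta_j K(x_i, z_j)."""
    (alphas, xs), (betas, zs) = f, g
    return sum(a * b * kernel(xi, zj)
               for a, xi in zip(alphas, xs)
               for b, zj in zip(betas, zs))

f = ([1.0, -2.0, 0.5], [0.0, 1.0, 2.5])   # hypothetical element of H
x = 0.7
k_x = ([1.0], [x])                        # the function K(., x)

direct = evaluate(f, x)                   # f(x)
reproduced = inner(f, k_x)                # <f, K(., x)>_H
bound = math.sqrt(inner(f, f)) * math.sqrt(kernel(x, x))
print(direct, reproduced, bound)
```

The two evaluations agree, and $|f(x)|$ stays below the Cauchy-Schwarz bound, which is how the norm in $H$ controls pointwise values.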

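26 Representer theorem
Let $K$ be a valid kernel defined on $X$ and $H$ be the corresponding RKHS. Let $\Omega$ be an increasing function. The optimization problem
$\min_{g \in H} \; G(g) = \sum_{i=1}^{m} l(g(x_i), y_i) + \Omega(\|g\|_H^2)$
is solved when $g = \sum_{i=1}^{m} \alpha_i K(\cdot, x_i)$.
Proof: let $M = \{ \sum_{i=1}^{m} \alpha_i K(\cdot, x_i) \}$, the span of $K(\cdot, x_i)$, $i = 1, \dots, m$. Clearly $M$ is a subspace of $H$. Take any $g \in H$ and write $g = g_M + g_\perp$ with $g_M \in M$ and $g_\perp \in M^\perp$. Then
$g(x_i) = \langle g, K(\cdot, x_i) \rangle_H = \langle g_M + g_\perp, K(\cdot, x_i) \rangle_H = \langle g_M, K(\cdot, x_i) \rangle_H + \langle g_\perp, K(\cdot, x_i) \rangle_H = \langle g_M, K(\cdot, x_i) \rangle_H = g_M(x_i)$.
Since $\|g\|_H^2 = \|g_M\|_H^2 + \|g_\perp\|_H^2$ and $\Omega$ is an increasing function, $\Omega(\|g\|_H^2) \ge \Omega(\|g_M\|_H^2)$. The loss terms are identical for $g$ and $g_M$, so $G(g) \ge G(g_M)$.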
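
One concrete instance of the theorem is kernel ridge regression, used here as an assumed example: squared loss $l(g(x_i), y_i) = (g(x_i) - y_i)^2$ and $\Omega(t) = \lambda t$. Restricting $g$ to $\sum_i \alpha_i K(\cdot, x_i)$, the objective becomes $\|K\alpha - y\|^2 + \lambda \alpha^\top K \alpha$, which is minimized by any $\alpha$ solving $(K + \lambda I)\alpha = y$. The data, the Gaussian kernel and $\lambda$ below are illustrative assumptions, and `solve_linear` is a small hand-written solver, not a library call.

```python
import math

def gaussian_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * (x - z) ** 2)

def solve_linear(A, b):
    """Solve A a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (M[i][n] - tail) / M[i][i]
    return a

def objective(alpha, K, y, lam):
    """G restricted to the span: ||K alpha - y||^2 + lam * alpha^T K alpha."""
    n = len(y)
    preds = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    loss = sum((p - t) ** 2 for p, t in zip(preds, y))
    norm_sq = sum(alpha[i] * K[i][j] * alpha[j]
                  for i in range(n) for j in range(n))
    return loss + lam * norm_sq

# Hypothetical 1-D regression data.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [math.sin(2.0 * x) for x in xs]
lam = 0.1
n = len(xs)

K = [[gaussian_kernel(a, b) for b in xs] for a in xs]
A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
alpha = solve_linear(A, ys)

def g(x):
    """The fitted function g = sum_i alpha_i K(., x_i)."""
    return sum(a * gaussian_kernel(x, xi) for a, xi in zip(alpha, xs))

best = objective(alpha, K, ys, lam)
print(alpha, best, g(0.25))
```

Perturbing any single coefficient of `alpha` increases the objective, consistent with the minimizer lying in the span of the $K(\cdot, x_i)$.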
