Lecture 9: Multi Kernel SVM


1 Lecture 9: Multi Kernel SVM. Stéphane Canu. Sao Paulo 2014. April 6, 2014

2 Roadmap
Tuning the kernel: MKL
The multiple kernel problem
Sparse kernel machines for regression: SVR
SimpleMKL: the multiple kernel solution

3 Standard Learning with Kernels
Stéphane Canu (INSA Rouen - LITIS), April 6, 2014
[Diagram: the user supplies a kernel k; the kernel and the data feed the learning machine, which outputs f.]

4 Learning Kernel framework
[Diagram: the user supplies a kernel family $k_m$; the kernel family and the data feed the learning machine, which outputs f and the kernel k(.,.).]

5 from SVM
SVM: single kernel k
$$f(x) = \sum_{i=1}^{n} \alpha_i\, k(x, x_i) + b$$

6 from SVM to Multiple Kernel Learning (MKL)
SVM: single kernel k
MKL: set of M kernels $k_1, \dots, k_m, \dots, k_M$
learn classifier and combination weights
can be cast as a convex optimization problem
$$f(x) = \sum_{i=1}^{n} \alpha_i \sum_{m=1}^{M} d_m\, k_m(x, x_i) + b, \qquad \sum_{m=1}^{M} d_m = 1 \ \text{ and } \ 0 \le d_m$$

7 from SVM to Multiple Kernel Learning (MKL)
SVM: single kernel k
MKL: set of M kernels $k_1, \dots, k_m, \dots, k_M$
learn classifier and combination weights
can be cast as a convex optimization problem
$$f(x) = \sum_{i=1}^{n} \alpha_i \sum_{m=1}^{M} d_m\, k_m(x, x_i) + b, \qquad \sum_{m=1}^{M} d_m = 1 \ \text{ and } \ 0 \le d_m$$
$$f(x) = \sum_{i=1}^{n} \alpha_i\, K(x, x_i) + b \quad \text{with} \quad K(x, x_i) = \sum_{m=1}^{M} d_m\, k_m(x, x_i)$$

8 Multiple Kernel: the model
$$f(x) = \sum_{i=1}^{n} \alpha_i \sum_{m=1}^{M} d_m\, k_m(x, x_i) + b, \qquad \sum_{m=1}^{M} d_m = 1 \ \text{ and } \ 0 \le d_m$$
Given M kernel functions $k_1, \dots, k_M$ that are potentially well suited for a given problem, find a positive linear combination of these kernels such that the resulting kernel k is "optimal":
$$k(x, x') = \sum_{m=1}^{M} d_m\, k_m(x, x'), \quad \text{with } d_m \ge 0, \ \sum_m d_m = 1$$
Learning together: the kernel coefficients $d_m$ and the SVM parameters $\alpha_i$, b.
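
The kernel combination on this slide can be illustrated numerically. The following is a minimal sketch, assuming NumPy and three Gaussian kernels with arbitrarily chosen bandwidths; the helper names `rbf_gram` and `combine_kernels` are hypothetical and do not come from the lecture or from any toolbox.

```python
import numpy as np

def rbf_gram(X, gamma):
    """Gram matrix of the Gaussian kernel exp(-gamma * ||x - x'||^2)."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(dist2, 0.0))

def combine_kernels(grams, d):
    """Return K = sum_m d_m K_m for weights d on the simplex."""
    d = np.asarray(d, dtype=float)
    if np.any(d < 0.0) or not np.isclose(d.sum(), 1.0):
        raise ValueError("weights must satisfy d_m >= 0 and sum_m d_m = 1")
    return sum(dm * Km for dm, Km in zip(d, grams))

# Toy data and three candidate kernels (bandwidths are illustrative assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
grams = [rbf_gram(X, g) for g in (0.1, 1.0, 10.0)]
K = combine_kernels(grams, [0.5, 0.3, 0.2])
```

Because each $K_m$ is positive semi-definite and the weights are non-negative, the combined matrix K is again a valid Gram matrix, which is what lets a standard SVM solver be reused on it.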

9 Multiple Kernel: illustration
[Figure]

10 Multiple Kernel Strategies
Wrapper method (Weston et al., 2000; Chapelle et al., 2002): solve the SVM, then do gradient descent on $d_m$ for a criterion (margin criterion or span criterion)
Kernel Learning & Feature Selection: use kernels as a dictionary
Embedded Multi Kernel Learning (MKL)

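11 Multiple Kernel functional Learning
The problem (for given C):
$$\min_{f \in \mathcal{H},\, b,\, \xi,\, d} \ \frac{1}{2}\|f\|_{\mathcal{H}}^2 + C \sum_i \xi_i \quad \text{with } y_i\big(f(x_i) + b\big) \ge 1 - \xi_i, \ \ \xi_i \ge 0 \ \ \forall i, \qquad \sum_{m=1}^{M} d_m = 1, \ \ d_m \ge 0 \ \ \forall m$$
$$f = \sum_{m} f_m \quad \text{and} \quad k(x, x') = \sum_{m=1}^{M} d_m\, k_m(x, x'), \ \text{with } d_m \ge 0$$
The functional framework:
$$\mathcal{H} = \bigoplus_{m=1}^{M} \mathcal{H}'_m, \qquad \langle f, g \rangle_{\mathcal{H}'_m} = \frac{1}{d_m} \langle f, g \rangle_{\mathcal{H}_m}$$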

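12 Multiple Kernel functional Learning
The problem (for given C):
$$\min_{\{f_m\},\, b,\, \xi,\, d} \ \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i \quad \text{with } y_i\Big(\sum_m f_m(x_i) + b\Big) \ge 1 - \xi_i, \ \ \xi_i \ge 0 \ \ \forall i, \qquad \sum_m d_m = 1, \ \ d_m \ge 0 \ \ \forall m$$
Treated as a bi-level optimization task:
$$\min_{d \in \mathbb{R}^M} \left\{ \begin{array}{ll} \min\limits_{\{f_m\},\, b,\, \xi} & \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i \\ \text{with} & y_i\big(\sum_m f_m(x_i) + b\big) \ge 1 - \xi_i, \ \ \xi_i \ge 0 \ \ \forall i \end{array} \right. \qquad \text{s.t. } \sum_m d_m = 1, \ \ d_m \ge 0 \ \ \forall m$$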

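13 Multiple Kernel representer theorem and dual
The Lagrangian:
$$L = \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i - \sum_i \alpha_i \Big( y_i \Big( \sum_m f_m(x_i) + b \Big) - 1 + \xi_i \Big) - \sum_i \beta_i \xi_i$$
Associated KKT stationarity conditions:
$$\nabla_{f_m} L = 0 \ \Leftrightarrow \ \frac{1}{d_m} f_m(\cdot) = \sum_{i=1}^{n} \alpha_i y_i\, k_m(\cdot, x_i), \qquad m = 1, \dots, M$$
Representer theorem:
$$f(\cdot) = \sum_m f_m(\cdot) = \sum_{i=1}^{n} \alpha_i y_i \underbrace{\sum_m d_m\, k_m(\cdot, x_i)}_{K(\cdot, x_i)}$$
We have a standard SVM problem with respect to function f and kernel K.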
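
One way to see why the problem reduces to a standard SVM is to plug the stationarity condition back into the weighted norm. The short check below assumes $d_m > 0$ and uses the reproducing property $\langle k_m(\cdot, x_i), k_m(\cdot, x_j) \rangle_{\mathcal{H}_m} = k_m(x_i, x_j)$; it is a sketch added for illustration, not a step taken from the slides.

```latex
\frac{1}{d_m}\,\|f_m\|_{\mathcal{H}_m}^2
  = \frac{1}{d_m}\,\Big\| d_m \sum_{i=1}^{n} \alpha_i y_i\, k_m(\cdot, x_i) \Big\|_{\mathcal{H}_m}^2
  = d_m \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j\, k_m(x_i, x_j)
\\[6pt]
\sum_{m=1}^{M} \frac{1}{d_m}\,\|f_m\|_{\mathcal{H}_m}^2
  = \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \sum_{m=1}^{M} d_m\, k_m(x_i, x_j)
  = \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j\, K(x_i, x_j)
```

The regularizer therefore equals the usual SVM quadratic form computed with the combined kernel K.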

14 Multiple Kernel Algorithm
Use a Reduced Gradient Algorithm to solve
$$\min_{d \in \mathbb{R}^M} J(d) \quad \text{s.t. } \sum_m d_m = 1, \ \ d_m \ge 0$$
SimpleMKL algorithm:
  set $d_m = \frac{1}{M}$ for $m = 1, \dots, M$
  while stopping criterion not met do
    compute $J(d)$ using a QP solver with $K = \sum_m d_m K_m$
    compute $\frac{\partial J}{\partial d_m}$, and the projected gradient as a descent direction $D$
    $\gamma \leftarrow$ compute optimal stepsize
    $d \leftarrow d + \gamma D$
  end while
Improvement reported using the Hessian.
Rakotomamonjy et al., JMLR 08
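
The loop on this slide can be sketched in a few lines of NumPy. This is a simplified illustration under several assumptions, not the SimpleMKL implementation: the QP solver is replaced by projected gradient ascent on a bias-free SVM dual, the optimal step size is replaced by backtracking from the largest admissible step, the reference component of the reduced gradient is taken to be the largest weight, and the gradient is computed as $-\frac{1}{2}\alpha^\top Q_m \alpha$ with labels folded into $Q_m$. All function names are hypothetical.

```python
import numpy as np

def rbf_gram(X, gamma):
    """Gram matrix of the Gaussian kernel exp(-gamma * ||x - x'||^2)."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(dist2, 0.0))

def svm_dual(Q, C, n_steps=300):
    """Approximately maximise e'a - 0.5 a'Qa over 0 <= a <= C (bias dropped)."""
    step = 1.0 / (np.linalg.norm(Q, 2) + 1e-12)    # 1 / largest eigenvalue
    a = np.zeros(Q.shape[0])
    for _ in range(n_steps):
        a = np.clip(a + step * (1.0 - Q @ a), 0.0, C)
    return a, a.sum() - 0.5 * a @ Q @ a

def mkl_sketch(grams, y, C=1.0, n_iter=20):
    Qs = [np.outer(y, y) * K for K in grams]       # labels folded into each Gram matrix

    def solve(d):
        return svm_dual(sum(dm * Q for dm, Q in zip(d, Qs)), C)

    d = np.full(len(Qs), 1.0 / len(Qs))            # d_m = 1/M
    alpha, J = solve(d)
    history = [J]
    for _ in range(n_iter):
        grad = np.array([-0.5 * alpha @ Q @ alpha for Q in Qs])
        mu = int(np.argmax(d))                     # reference component
        D = grad[mu] - grad                        # descent direction, D[mu] is 0 here
        D[(d <= 0.0) & (D < 0.0)] = 0.0            # keep zero weights at zero
        D[mu] = -D.sum()                           # enforce sum(D) == 0
        shrinking = D < 0.0
        if not shrinking.any():                    # D is zero: nothing left to improve
            break
        gamma = np.min(-d[shrinking] / D[shrinking])   # largest step with d >= 0
        accepted = False
        while gamma > 1e-8:                        # backtracking instead of exact line search
            d_try = np.maximum(d + gamma * D, 0.0)
            d_try /= d_try.sum()
            a_try, J_try = solve(d_try)
            if J_try < J:
                d, alpha, J, accepted = d_try, a_try, J_try, True
                break
            gamma *= 0.5
        if not accepted:
            break
        history.append(J)
    return d, alpha, history

# Toy XOR-like problem with three candidate bandwidths (illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)
grams = [rbf_gram(X, g) for g in (0.01, 1.0, 100.0)]
d, alpha, history = mkl_sketch(grams, y)
```

The returned `d` stays on the simplex by construction, and `history` records the objective value after each accepted step, so it can be inspected to confirm that the weights are moving in a descent direction.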

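15 Computing the reduced gradient
At the optimum, the primal cost equals the dual cost:
$$\underbrace{\frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i}_{\text{primal cost}} = \underbrace{e^\top \alpha - \frac{1}{2} \alpha^\top G \alpha}_{\text{dual cost}} \quad \text{with } G = \sum_m d_m G_m \ \text{ where } \ G_{m,ij} = y_i y_j\, k_m(x_i, x_j)$$
The dual cost is easier for the gradient:
$$\nabla_{d_m} J(d) = -\frac{1}{2} \alpha^\top G_m \alpha$$
Reduce (or project) to satisfy the constraints: $\sum_m d_m = 1 \ \Rightarrow \ \sum_m D_m = 0$
$$D_m = \nabla_{d_1} J(d) - \nabla_{d_m} J(d) \quad \text{and} \quad D_1 = -\sum_{m=2}^{M} D_m$$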
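
A short check that a direction of this form is admissible may help. The derivation below assumes the sign convention $D_m = g_1 - g_m$ for $m \ge 2$, with $g_m$ denoting $\nabla_{d_m} J(d)$ and component 1 used as the reference; it is a sketch added for illustration.

```latex
g_m = \nabla_{d_m} J(d), \qquad
D_m = g_1 - g_m \ \ (m \ge 2), \qquad
D_1 = -\sum_{m=2}^{M} D_m
\\[6pt]
\sum_{m=1}^{M} D_m = D_1 + \sum_{m=2}^{M} D_m = 0
\\[6pt]
\sum_{m=1}^{M} g_m D_m
  = -g_1 \sum_{m=2}^{M} (g_1 - g_m) + \sum_{m=2}^{M} g_m (g_1 - g_m)
  = -\sum_{m=2}^{M} (g_1 - g_m)^2 \ \le \ 0
```

The first identity shows that a step along D keeps $\sum_m d_m = 1$, and the second shows that D is a descent direction for J whenever the partial derivatives are not all equal.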

16 Complexity
For each iteration:
SVM training: $O(n\, n_{sv} + n_{sv}^3)$.
Inverting $K_{sv,sv}$ is $O(n_{sv}^3)$, but might already be available as a by-product of the SVM training.
Computing H: $O(M n_{sv}^2)$
Finding d: $O(M^3)$.
The number of iterations is usually less than 10.
When $M < n_{sv}$, computing d is not more expensive than QP.


17 MKL on the 101-caltech dataset
[Figure]

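18 Support vector regression (SVR)
The t-insensitive loss:
$$\min_{f \in \mathcal{H}} \ \frac{1}{2} \|f\|_{\mathcal{H}}^2 \quad \text{with } |f(x_i) - y_i| \le t, \ \ i = 1, \dots, n$$
The support vector regression introduces slack variables:
$$\text{(SVR)} \qquad \min_{f \in \mathcal{H}} \ \frac{1}{2} \|f\|_{\mathcal{H}}^2 + C \sum_i \xi_i \quad \text{with } |f(x_i) - y_i| \le t + \xi_i, \ \ 0 \le \xi_i, \ \ i = 1, \dots, n$$
a typical multi parametric quadratic program (mpQP)
piecewise linear regularization path:
$$\alpha(C, t) = \alpha(C_0, t_0) + \Big( \frac{1}{C} - \frac{1}{C_0} \Big) u + \frac{1}{C_0} (t - t_0)\, v$$
2d Pareto's front (the tube width and the regularity)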
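
The slack variables above measure how far each residual falls outside the tube of half-width t. A minimal sketch of that loss, assuming NumPy and made-up predictions and targets (the function name `t_insensitive_loss` is hypothetical):

```python
import numpy as np

def t_insensitive_loss(pred, y, t):
    """Per-sample loss max(|pred - y| - t, 0): zero inside the tube, linear outside."""
    residual = np.abs(np.asarray(pred, dtype=float) - np.asarray(y, dtype=float))
    return np.maximum(residual - t, 0.0)

# Illustrative values only: two points inside the tube, two outside.
pred = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.05, 2.5, 2.0, 4.0])
losses = t_insensitive_loss(pred, y, t=0.1)
```

At the optimum of the SVR problem each $\xi_i$ takes exactly this value, which is why the constrained form and the regularized form with the max are interchangeable.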

19 Support vector regression illustration
[Figure: two plots titled "Support Vector Machine Regression", y against x, one for C large and one for C small.]
There exist other formulations such as LP SVR...

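20 Multiple Kernel Learning for regression
The problem (for given C and t):
$$\min_{\{f_m\},\, b,\, \xi,\, d} \ \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i \quad \text{s.t. } \Big| \sum_m f_m(x_i) + b - y_i \Big| \le t + \xi_i \ \ \forall i, \quad \xi_i \ge 0 \ \ \forall i, \quad \sum_m d_m = 1, \ \ d_m \ge 0 \ \ \forall m$$
Regularization formulation:
$$\min_{\{f_m\},\, b,\, d} \ \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \max\Big( \Big| \sum_m f_m(x_i) + b - y_i \Big| - t, \ 0 \Big), \qquad \sum_m d_m = 1, \ \ d_m \ge 0 \ \ \forall m$$
Equivalently:
$$\min_{\{f_m\},\, b,\, \xi,\, d} \ \sum_i \max\Big( \Big| \sum_m f_m(x_i) + b - y_i \Big| - t, \ 0 \Big) + \frac{1}{2C} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + \mu \sum_m |d_m|$$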
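
The regularization formulation can be evaluated directly once the functions are written in kernel form. The sketch below assumes, by analogy with the classification case, that $f_m = d_m \sum_i \beta_i k_m(\cdot, x_i)$ for signed coefficients $\beta_i$, so that $\sum_m \frac{1}{d_m}\|f_m\|^2 = \beta^\top K \beta$ with $K = \sum_m d_m K_m$. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def mkl_svr_objective(grams, d, beta, b, y, C, t):
    """0.5 * beta' K beta + C * sum_i max(|K beta + b - y|_i - t, 0), K = sum_m d_m K_m."""
    K = sum(dm * Km for dm, Km in zip(d, grams))
    pred = K @ beta + b                              # sum_m f_m(x_i) + b on the training points
    reg = 0.5 * beta @ K @ beta                      # weighted-norm regularizer
    loss = np.maximum(np.abs(pred - y) - t, 0.0).sum()
    return reg + C * loss

# Two tiny positive semi-definite Gram matrices, chosen by hand for illustration.
grams = [np.eye(2), np.ones((2, 2))]
d = np.array([0.5, 0.5])
beta = np.array([1.0, -1.0])
y = np.array([1.0, -0.5])
value = mkl_svr_objective(grams, d, beta, b=0.0, y=y, C=2.0, t=0.1)
```

With these numbers the regularizer is 0.5 and only the first point lies outside the tube, with loss 0.4, so the objective is 0.5 + 2.0 * 0.4 = 1.3.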

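21 Multiple Kernel functional Learning
The problem (for given C and t):
$$\min_{\{f_m\},\, b,\, \xi,\, d} \ \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i \quad \text{s.t. } \Big| \sum_m f_m(x_i) + b - y_i \Big| \le t + \xi_i \ \ \forall i, \quad \xi_i \ge 0 \ \ \forall i, \quad \sum_m d_m = 1, \ \ d_m \ge 0 \ \ \forall m$$
Treated as a bi-level optimization task:
$$\min_{d \in \mathbb{R}^M} \left\{ \begin{array}{ll} \min\limits_{\{f_m\},\, b,\, \xi} & \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i \\ \text{s.t.} & \big| \sum_m f_m(x_i) + b - y_i \big| \le t + \xi_i \ \ \forall i, \quad \xi_i \ge 0 \ \ \forall i \end{array} \right. \qquad \text{s.t. } \sum_m d_m = 1, \ \ d_m \ge 0 \ \ \forall m$$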

22 Multiple Kernel experiments
[Figure: four test signals plotted against x: LinChirp, Wave, Blocks, Spikes.]
Columns: Data Set | Single Kernel: Norm. MSE (%) | Kernel Dil: #Kernel, Norm. MSE | Kernel Dil-Trans: #Kernel, Norm. MSE
LinChirp: .46 ± ± ± 0.20
Wave: 0.98 ± ± ± 0.07
Blocks: .96 ± ± ± 0.3
Spike: 6.85 ± ± ± 0.84
Table: Normalized Mean Square error averaged over 20 runs.

23 Conclusion on multiple kernel (MKL)
MKL: kernel tuning, variable selection...
extension to classification and one class SVM
SVM KM: an efficient Matlab toolbox (available at MLOSS)
Multiple Kernels for Image Classification: Software and Experiments on Caltech-101
new trend: Multi kernel, Multi task and number of kernels

24 Bibliography
A. Rakotomamonjy, F. Bach, S. Canu & Y. Grandvalet. SimpleMKL. J. Mach. Learn. Res. 2008, 9:
M. Gönen & E. Alpaydin. Multiple kernel learning algorithms. J. Mach. Learn. Res. 2011, 12:
