Kernel Methods: Support Vector Machines

1 Kernel Methods: Support Vector Machines. Marco Trincavelli, Mobile Robotics and Olfaction Lab, AASS Research Centre, Örebro University. State of the Art Methods of Data Modeling and Machine Learning, IMRIS program, Fall 2012.

2 Acknowledgments. These slides have been adapted from the slides used in previous years for the Machine Learning course at Örebro University. My gratitude to the former teachers of this course, who provided me their slides and greatly simplified my work: Thorsteinn Rögnvaldsson, Erik Berglund.

3 Repetition. 1. Repetition 2. Model complexity 3. Support Vector Machine (SVM) 4. SVM for non-linearly separable data 5. Nonlinear SVM

4 Nonlinear regression. We assume a nonlinear process: $y(x) = g(x) + e$, with i.i.d. noise $e$. We use a nonlinear model family $F$: $y(x) = f(x, w)$.

5 Generalized Linear Model. $f(x; w) = w_0 + w_1 h_1(x) + \dots + w_M h_M(x)$. Linear in the parameters. Reduces to the linear regression case, but with more variables. Requires a good guess on the basis functions $h_k(x)$: polynomials, trigonometric functions, Bessel functions, and many more...
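
Because the model is linear in the parameters, it can be fitted with ordinary least squares on the transformed inputs. A minimal sketch, assuming polynomial basis functions $h_k(x) = x^k$ and synthetic data (neither is from the slides):

```python
# Generalized linear model: y ~ w0 + w1*h1(x) + ... + wM*hM(x),
# fitted by least squares. Basis choice and data are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)  # noisy target

M = 5                                         # polynomial degree
H = np.vander(x, M + 1, increasing=True)      # design matrix [1, x, ..., x^M]
w, *_ = np.linalg.lstsq(H, y, rcond=None)     # linear in w -> closed form

print("training MSE:", np.mean((y - H @ w) ** 2))
```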

6 Example: Polynomial model. The true function is a Bessel function.

7 K nearest neighbour regression. The prediction equals the $y$ of the nearest neighbour (K=1). The prediction equals the average, mode, median, etc... of the $y$ of the K nearest neighbours. The prediction equals the weighted average of the $y$ of the K nearest neighbours: $\hat{y}(x) = \sum_{k=1}^{K} r_k(x)\, y(m_k)$, where $m_k$ is the index of the $k$:th neighbour and $r_k$ is a weight depending on the distance $\|x - x(m_k)\|$.
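
A minimal sketch of K nearest neighbour regression with uniform weights $r_k = 1/K$; the tiny 1-D data set is made up for illustration:

```python
# K nearest neighbour regression: average the y of the K closest points.
import numpy as np

def knn_regress(x_train, y_train, x_query, K=3):
    d = np.abs(x_train - x_query)       # distances to the query point
    idx = np.argsort(d)[:K]             # neighbour indices m_1 ... m_K
    return np.mean(y_train[idx])        # uniform weights r_k = 1/K

x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.1, 0.9, 2.2, 2.8, 4.1])
print(knn_regress(x_train, y_train, x_query=2.4, K=3))  # mean of 3 nearest y
```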

8 3 nearest neighbours. $r_k(x) = 1/K$ for $k \le K$, $0$ otherwise. $m_1 = 2;\; m_2 = 4;\; m_3 = 5$. $\hat{y} = \left( y(x_2) + y(x_4) + y(x_5) \right) / 3$.

9 Quadratic Gaussian Classifier. Assume $p(x \mid c_k)$ Gaussian with different means $u_k$ and different covariance matrices $\Sigma_k$. $D$ is the dimension of the input space. Estimate means and covariance matrices for the categories by maximizing the likelihood of the dataset $p(\mathcal{D} \mid u_k, \Sigma_k)$: $p(x \mid c_k) = \frac{1}{(2\pi)^{D/2} \det(\Sigma_k)^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - u_k)^T \Sigma_k^{-1} (x - u_k)\right)$, with $\hat{u}_k = \frac{1}{N_k} \sum_{x \in c_k} x$ and $\hat{\Sigma}_k = \frac{1}{N_k} \sum_{x \in c_k} (x - \hat{u}_k)(x - \hat{u}_k)^T$.
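
A minimal sketch of the quadratic Gaussian classifier: per-class maximum likelihood estimates of mean and covariance, then classification by the largest log-density, assuming equal priors and made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], 100)
X1 = rng.multivariate_normal([2, 2], [[0.5, 0.0], [0.0, 2.0]], 100)

def fit_gaussian(X):                      # ML estimates for one class
    u = X.mean(axis=0)
    S = (X - u).T @ (X - u) / len(X)      # divide by N_k, not N_k - 1
    return u, S

def log_density(x, u, S):                 # log Gaussian up to a constant
    d = x - u
    return -0.5 * (np.linalg.slogdet(S)[1] + d @ np.linalg.solve(S, d))

params = [fit_gaussian(X) for X in (X0, X1)]
x = np.array([1.0, 1.0])
print("class:", int(np.argmax([log_density(x, u, S) for u, S in params])))
```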

10 Example: Quadratic Gaussian Classifier. Training error = 0.07%, Test error = 0.03%.

11 Example: Quadratic Gaussian Classifier. Training error = 0.07%, Test error = 0.03%.

12 K nearest neighbours classification. Estimate the posterior probabilities according to the neighbours: $\hat{p}(c_j \mid x) = K_j / K$. Maximum a posteriori classification: $\hat{c} = \arg\max_{c_j} \hat{p}(c_j \mid x)$.

13 Example: 5-NN classifier. Test error = 0.4%.

14 Decision Trees. Split into smaller and smaller subsets. Each split increases node purity (measured e.g. by entropy). Splits are usually made along variable axes; this generates a subdivision into hypercubes. Backwards pruning is important.

15 Example: Decision Tree.

16 The Multilayer Perceptron. Combine several single layer perceptrons. Each single layer perceptron uses a sigmoid shaped transfer function like the logistic or hyperbolic tangent function: $\varphi(z) = \frac{1}{1 + \exp(-z)}$ or $\varphi(z) = \tanh(z)$.

17 Training a Multilayer Perceptron. The simplest algorithm for training a multilayer perceptron is the backpropagation algorithm: 1. Select small random weights. 2. Until halting condition: 1. Select a random training example. 2. Calculate the output of the hidden layer (Forward Step). 3. Calculate the output of the output layer. 4. Calculate the error for the output layer (Backwards Step). 5. Calculate the error for the hidden layer. 6. Update the weights.
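
A minimal sketch of the steps above for one hidden layer (tanh hidden units, logistic output, squared error), on the XOR problem with a bias input appended; the architecture and learning rate are illustrative choices, and convergence depends on the random initialization:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], float)  # + bias
t = np.array([0.0, 1.0, 1.0, 0.0])              # XOR targets

W1 = 0.5 * rng.standard_normal((3, 4))          # 1. small random weights
W2 = 0.5 * rng.standard_normal(5)               # (+1 for hidden bias unit)

def forward(x):
    h = np.append(np.tanh(x @ W1), 1.0)         # hidden outputs + bias
    return h, 1.0 / (1.0 + np.exp(-h @ W2))

for _ in range(20000):                          # 2. until halting condition
    n = rng.integers(len(X))                    #    select a random example
    h, y = forward(X[n])                        #    forward step
    d2 = (y - t[n]) * y * (1 - y)               #    output layer error
    d1 = (1 - h[:4] ** 2) * (W2[:4] * d2)       #    hidden layer error
    W2 -= 0.5 * d2 * h                          #    update weights
    W1 -= 0.5 * np.outer(X[n], d1)

print([round(forward(x)[1], 2) for x in X])     # should approach 0,1,1,0
```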

18 Training a Multilayer Perceptron. First order methods use only the first derivative of the error: Backpropagation, Backpropagation with Momentum, Bold Driver. Second order methods use the first and the second derivative of the error: Levenberg-Marquardt.

19 Model Complexity. 1. Repetition 2. Model complexity 3. Support Vector Machine (SVM) 4. SVM for non-linearly separable data 5. Nonlinear SVM

20 Bias & Variance. $\text{Error} = \text{Bias}^2 + \text{Variance} + \sigma_\varepsilon^2$

21 Model Selection. The model with the best generalization is a bias vs. variance trade-off. (Figure: bias and variance as a function of the order of fit; the optimal order lies where the two balance.)

22 What is model complexity? A complex model family $F$ contains many models: the cardinality of the model space (recall what we said regarding PAC reasoning). The model complexity can be measured by the Vapnik-Chervonenkis dimension (VC dimension).

23 The VC dimension. The VC-dimension of a model family $F$ is the size of the largest data set that can be shattered by the family $F$. A data set is shattered by a model family $F$ if all dichotomies of the data set can be realized by functions $f \in F$.

24 Example: Linear Classifiers. There are 8 possible dichotomies of a set of 3 points in 2 dimensions. They can all be realized with a line. $F = \{\text{linear classifiers}\}$ shatters a set of 3 points in 2 dimensions.

25 Example: Linear Classifiers. There are 8 possible dichotomies of a set of 3 points in 2 dimensions. They can all be realized with a line. $F = \{\text{linear classifiers}\}$ shatters a set of 3 points in 2 dimensions. (Ignored exception: degenerate configurations such as three collinear points.)

26 Example: Linear Classifiers. A set of 4 points in 2 dimensions is not shattered by $F = \{\text{linear classifiers}\}$. However, 87.5% of the dichotomies are linearly separable.

27 Example: Linear Classifiers. A set of 4 points in 2 dimensions is not shattered by $F = \{\text{linear classifiers}\}$. However, 87.5% of the dichotomies are linearly separable. Not separable with a line (recall the XOR function).

28 VC dimension of linear classifiers. The VC dimension of linear classifiers is $D+1$. This means that if $N \le D+1$, all the dichotomies can be realized, so the problem is trivial for a linear classifier. The capacity of linear classifiers is $2(D+1)$. This means that if $N \le 2(D+1)$, at least half of the dichotomies can be realized, so the problem is as good as trivial for a linear classifier.

29 VC dimension of linear classifiers. The VC dimension of linear classifiers is $D+1$: if $N \le D+1$, all the dichotomies can be realized and the problem is trivial for a linear classifier. The capacity of linear classifiers is $2(D+1)$: if $N \le 2(D+1)$, at least half of the dichotomies can be realized and the problem is as good as trivial for a linear classifier. To ensure good generalization, keep $N \gg$ VC-dimension, otherwise the model will overfit to the data.

30 P(linear | N, D). What is the probability that a data set with $D$ variables (i.e. embedded in $D$ dimensions) and $N$ observations is linearly separable? Approximately $P(\text{linear} \mid N, D) \approx 0.5 + 0.5\,\mathrm{erf}\!\left(\frac{2(D+1) - N}{\sqrt{2N}}\right)$: the transition from trivial to hard gets more and more abrupt with increasing dimension.

31 VC dimension of other classifiers. Multilayer perceptron: VC dimension proportional to the number of weights. K nearest neighbour: infinite VC dimension. Decision tree: infinite VC dimension if features are real valued.

32 VC dimension of other classifiers. Multilayer perceptron: VC dimension proportional to the number of weights. K nearest neighbour: infinite VC dimension. Decision tree: infinite VC dimension if features are real valued. Need for mechanisms to control model complexity!

33 How to control model complexity? Regularization: combine the data error (e.g. the SSE) with a parameter regularization term (a "prior"): $E = \sum_{n=1}^{N} \left( y(x_n) - \hat{y}(x_n) \right)^2 + \lambda R(W)$. Committees: train $L$ different models and make predictions using the average of the predictions made by each model (bagging); or train the base classifiers in sequence, each classifier trained using a weighted form of the dataset in which the weights associated to data points depend on the performance of the previous classifiers (boosting).

34 How to control model complexity? Regularization: combine the data error (e.g. the SSE) with a parameter regularization term $\lambda R(W)$, which depends only on the model parameters $W$. The regularization parameter $\lambda$ has to be selected! Committees: train $L$ different models and average their predictions (bagging), or train the base classifiers in sequence on reweighted data (boosting).
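
A minimal sketch of the regularization idea with a quadratic penalty $R(W) = \|w\|^2$, which gives the closed-form ridge solution; the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.1 * rng.standard_normal(30)

lam = 0.1                                  # regularization parameter
# minimize ||y - Xw||^2 + lam ||w||^2  =>  w = (X^T X + lam I)^{-1} X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)
print(np.round(w, 2))                      # irrelevant weights shrink toward 0
```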

35 Selection of λ. Lines represent averages over the cross-validation sets. Error bars are the 95% significance limits for the averages. (Figure: cross-validation error as a function of $\log_{10} \lambda$.)

36 Regularization: Bias & Variance. Bias increases (a little) and variance decreases.

37 Committees: Bias and Variance. Bias ~unchanged, but variance decreases. It is usually a good idea to use a committee.

38 Support Vector Machine (SVM). 1. Repetition 2. Model complexity 3. Support Vector Machine (SVM) 4. SVM for non-linearly separable data 5. Nonlinear SVM

39 Linearly separable problem. There are infinitely many lines (decision boundaries) that have zero training error. Is there one of them that is preferable?

40 Linearly separable problem. There are infinitely many lines (decision boundaries) that have zero training error. Is there one of them that is preferable? The line with the largest margin.

41 Note on margin and error. There is a PAC-like / VC-like theorem that states that the generalization error for a consistent hypothesis $f$ with margin $\gamma$ goes like $\text{err}(f) \sim 1/\gamma^2$. This means that by maximizing the margin we are minimizing the generalization error.

42 Maximum margin classifier. The margin is defined to be the smallest distance between the decision boundary and any of the samples.

43 Maximum margin classifier. The margin is defined to be the smallest distance between the decision boundary and any of the samples. Support Vectors: the closest points to the decision boundary.

44 Notation. Linear model: $y(x) = w^T x + b$. Decision boundary: $y(x) = 0$. Target values: $t_n = +1$ blue dots, $t_n = -1$ red dots. Linear separability: $\forall n,\; t_n y(x_n) > 0$.

45 The optimization problem. Maximizing the margin is equal to minimizing $\|w\|$ subject to the constraints: $w^T x_n + b \ge +1$ for blue dots, $w^T x_n + b \le -1$ for red dots.

46 Computing the margin. The hyperplane separating the two classes (solid black line) is defined by: $w^T x + a = 0$. The dashed hyperplanes are given by: $w^T x + a = +b$ and $w^T x + a = -b$.

47 Computing the margin. Divide by $b$ (define a new scale): $\tfrac{w^T}{b} x + \tfrac{a}{b} = 0$ and $\tilde{w}^T x + \tilde{a} = \pm 1$, where $\tilde{w} = w/b$ and $\tilde{a} = a/b$.

48 Computing the margin. We obtain the following system of equations: $\tilde{w}^T x_1 + \tilde{a} = +1$, $\tilde{w}^T x_2 + \tilde{a} = -1$, $x_1 = x_2 + \text{margin} \cdot \tilde{w} / \|\tilde{w}\|$ (the two points lie on opposite margin hyperplanes, separated along the normal direction).

49 Computing the margin. Let's solve the system for the variable margin: subtracting the second equation from the first gives $\tilde{w}^T (x_1 - x_2) = 2$.

50 Computing the margin. Substituting $x_1 - x_2 = \text{margin} \cdot \tilde{w} / \|\tilde{w}\|$ gives $\text{margin} \cdot \|\tilde{w}\| = 2$, i.e. $\text{margin} = \frac{2}{\|\tilde{w}\|}$.

51 The optimization problem. Maximizing the margin is equal to minimizing $\|w\|$, or $\frac{1}{2}\|w\|^2$, subject to the constraints: $w^T x_n + b \ge +1$ for blue dots, $w^T x_n + b \le -1$ for red dots.

52 The optimization problem. Maximizing the margin is equal to minimizing $\|w\|$, or $\frac{1}{2}\|w\|^2$, subject to the same constraints. We turn this into a quadratic programming problem.
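
A minimal sketch of the primal problem on a four-point separable toy set, handing the quadratic objective and the linear constraints to a generic constrained solver (a dedicated QP solver would normally be used):

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [4.0, 2.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])            # separable labels

def objective(p):                               # p = (w1, w2, b)
    return 0.5 * np.dot(p[:2], p[:2])           # (1/2)||w||^2

cons = [{"type": "ineq",                        # t_n (w^T x_n + b) - 1 >= 0
         "fun": lambda p, n=n: t[n] * (X[n] @ p[:2] + p[2]) - 1.0}
        for n in range(len(X))]

res = minimize(objective, np.zeros(3), constraints=cons, method="SLSQP")
w, b = res.x[:2], res.x[2]
print("w =", np.round(w, 3), "b =", np.round(b, 3))
print("margin = 2/||w|| =", round(2 / np.linalg.norm(w), 3))
```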

53 Convex optimization problems. Minimizing a convex function over a convex set: $\frac{1}{2}\|w\|^2$ is a convex function, and the constraints $w^T x_n + b \ge +1$ and $w^T x_n + b \le -1$ define a convex set (an intersection of half-spaces is a convex set). Thus, SVM is a convex optimization problem. In particular, since $\frac{1}{2}\|w\|^2$ is a quadratic function, SVM is a Quadratic Programming (QP) problem. Convex optimization problems exhibit nice properties: if a local minimum exists then it is a global minimum; we can certify when a minimum is reached (KKT conditions).

54 Quadratic programming problem. Write the constraints compactly as: $t_n (w^T x_n + b) - 1 \ge 0$. Minimize the cost (Lagrangian), where $w$ and $b$ are free parameters: $L_P(w, b, \lambda) = \frac{1}{2}\|w\|^2 - \sum_{n=1}^{N} \lambda_n \left[ t_n (w^T x_n + b) - 1 \right]$, $\lambda_n \ge 0$, where we have introduced non-negative Lagrange multipliers $\lambda_n$ that express the constraints $t_n (w^T x_n + b) - 1 \ge 0$.

55 Quadratic programming problem. Write the constraints compactly as: $t_n (w^T x_n + b) - 1 \ge 0$. Minimize the cost (Lagrangian), where $w$ and $b$ are free parameters: $L_P(w, b, \lambda) = \frac{1}{2}\|w\|^2 - \sum_{n=1}^{N} \lambda_n \left[ t_n (w^T x_n + b) - 1 \right]$, $\lambda_n \ge 0$. The ½ is included for later convenience.

56 Dual problem. The direct solution of this problem would be very complex. We shall convert it into an equivalent problem that is much easier to solve. At the solution the derivatives of $L_P$ w.r.t. $w$ and $b$ are equal to zero: $\frac{\partial L_P}{\partial w} = w - \sum_{n=1}^{N} \lambda_n t_n x_n = 0 \;\Rightarrow\; w = \sum_{n=1}^{N} \lambda_n t_n x_n$; $\frac{\partial L_P}{\partial b} = -\sum_{n=1}^{N} \lambda_n t_n = 0 \;\Rightarrow\; \sum_{n=1}^{N} \lambda_n t_n = 0$.

57 Dual problem. Using the two conditions just obtained we can eliminate $w$ and $b$ from $L_P(w, b, \lambda)$ and obtain the dual representation.

58 Dual problem. Using the two conditions just obtained, $w = \sum_n \lambda_n t_n x_n$ and $\sum_n \lambda_n t_n = 0$, we can eliminate $w$ and $b$ from $L_P(w, b, \lambda)$ and obtain the dual representation.

59 Dual problem. Using the two conditions just obtained we can eliminate $w$ and $b$ from $L_P(w, b, \lambda)$ and obtain the dual representation: $L_D(\lambda) = \sum_{n=1}^{N} \lambda_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \lambda_n \lambda_m t_n t_m\, x_n^T x_m$.

60 Primal and Dual problem. For convex optimization problems the minimization of the primal problem $L_P$ is equivalent to the maximization of the dual problem $L_D$: $L_P(w, b, \lambda) = \frac{1}{2}\|w\|^2 - \sum_n \lambda_n \left[ t_n (w^T x_n + b) - 1 \right]$, $\lambda_n \ge 0$; $L_D(\lambda) = \sum_n \lambda_n - \frac{1}{2} \sum_n \sum_m \lambda_n \lambda_m t_n t_m\, x_n^T x_m$, $\lambda_n \ge 0$, $\sum_n \lambda_n t_n = 0$. Notice that we have a $\lambda_n$ for every training sample: $\lambda_n = 0$ for all the points that are not support vectors and $\lambda_n > 0$ for all the support vectors.

61 Primal and Dual problem. For convex optimization problems the minimization of the primal problem $L_P$ is equivalent to the maximization of the dual problem $L_D$. We have a $\lambda_n$ for every training sample: $\lambda_n = 0$ for the points that are not support vectors, $\lambda_n > 0$ for the support vectors. IMPORTANT: only dot products of the inputs appear in the problem!

62 Prediction. The output of an SVM in the prediction phase is calculated as follows: $y(x) = \mathrm{sgn}\left( w^T x + b \right) = \mathrm{sgn}\left( \sum_{n \in \Omega_s} \lambda_n t_n\, x_n^T x + b \right)$, where $\Omega_s$ is the set of support vectors.

63 Prediction. The output of an SVM in the prediction phase is calculated as follows: $y(x) = \mathrm{sgn}\left( \sum_{n \in \Omega_s} \lambda_n t_n\, x_n^T x + b \right)$, where $\Omega_s$ is the set of support vectors. IMPORTANT: only dot products of the input appear in the prediction phase!
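
The formula can be checked against a library implementation. A minimal sketch using scikit-learn's SVC (linear kernel, large C to approximate the hard margin); `dual_coef_` stores $\lambda_n t_n$ for the support vectors, so the sum below is exactly the expression above:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, t = make_blobs(n_samples=40, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1e6).fit(X, t)     # large C ~ hard margin

x_new = X[0]
# y(x) = sum_{n in SVs} lambda_n t_n x_n^T x + b
score = clf.dual_coef_[0] @ (clf.support_vectors_ @ x_new) + clf.intercept_[0]
print(score, clf.decision_function([x_new])[0])  # the two values match
```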

64 SVM for non-linearly separable data. 1. Repetition 2. Model complexity 3. Linear Support Vector Machine (SVM) 4. SVM for non-linearly separable data 5. Nonlinear SVM

65 Non-linearly separable problem. There are no lines (decision boundaries) that have zero training error. In this case there is no solution to the optimization problem presented in the previous section.

66 Non-linearly separable problem. There are no lines (decision boundaries) that have zero training error, so there is no solution to the optimization problem presented in the previous section. We need to allow some misclassification.

67 The soft margin SVM. Each data point is allowed to violate the hard constraint by an amount $\xi_n$ (slack variables, individual for each point). We must keep a limit on the sizes of the slack variables, so we include one more term in the cost we minimize. The slack variables are supposed to take care of the noise in the data. Hard constraint: $t_n (w^T x_n + b) \ge 1$. Soft constraint: $t_n (w^T x_n + b) \ge 1 - \xi_n$.

68 The slack variables. $\xi_n = 0$ for points that are on or inside the correct side of the margin boundary; $\xi_n = |t_n - y(x_n)|$ for other points. $0 < \xi_n \le 1$ for points between the margin boundary and the decision boundary; $\xi_n = 1$ for points on the decision boundary; $\xi_n > 1$ for points on the wrong side of the decision boundary (misclassified points).

69 Soft margin QP problem. Our goal is to maximize the margin while softly penalizing points that lie on the wrong side of the margin boundary. We therefore minimize: $C \sum_{n=1}^{N} \xi_n + \frac{1}{2}\|w\|^2$, subject to the constraints $t_n (w^T x_n + b) \ge 1 - \xi_n$ and $\xi_n \ge 0$. The parameter $C > 0$ controls the trade-off between the slack variable penalty and the margin.

70 Soft margin QP problem: Lagrangian. The new Lagrangian $L_P$ then is: $L_P(w, b, \xi, \lambda, \mu) = \frac{1}{2}\|w\|^2 + C \sum_n \xi_n - \sum_n \lambda_n \left[ t_n (w^T x_n + b) - 1 + \xi_n \right] - \sum_n \mu_n \xi_n$. At the solution the derivatives of $L_P$ w.r.t. $w$, $b$ and $\xi$ are equal to zero: $\frac{\partial L_P}{\partial w} = 0 \Rightarrow w = \sum_n \lambda_n t_n x_n$; $\frac{\partial L_P}{\partial b} = 0 \Rightarrow \sum_n \lambda_n t_n = 0$; $\frac{\partial L_P}{\partial \xi_n} = 0 \Rightarrow \lambda_n = C - \mu_n$.

71 Dual soft margin problem. Substituting the three relationships obtained by setting the partial derivatives of $L_P$ w.r.t. $w$, $b$, and $\xi$ to zero, we obtain the dual problem $L_D$: $L_D(\lambda) = \sum_{n=1}^{N} \lambda_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \lambda_n \lambda_m t_n t_m\, x_n^T x_m$, $0 \le \lambda_n \le C$, $\sum_n \lambda_n t_n = 0$. We can notice that the function to be optimized is the same as in the hard margin case, although the constraints on $\lambda_n$ changed: now they are box constraints, bounded by 0 and C. This is due to the equation $\lambda_n = C - \mu_n$ and the constraints $\mu_n \ge 0$. With $C \to \infty$ we recover the hard margin problem.
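
A minimal sketch of the soft-margin dual on synthetic data: maximize $L_D$ (here: minimize $-L_D$) under the box constraints and the equality constraint, with a generic solver standing in for a real QP routine:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(2.5, 1, (10, 2))])
t = np.array([-1.0] * 10 + [1.0] * 10)
C = 1.0
G = (t[:, None] * X) @ (t[:, None] * X).T       # G_nm = t_n t_m x_n^T x_m

def neg_dual(lam):                              # -L_D(lambda)
    return 0.5 * lam @ G @ lam - lam.sum()

res = minimize(neg_dual, np.zeros(len(X)), method="SLSQP",
               bounds=[(0.0, C)] * len(X),      # 0 <= lambda_n <= C
               constraints=[{"type": "eq", "fun": lambda lam: lam @ t}])
lam = res.x
sv = lam > 1e-6                                 # lambda_n > 0: support vectors
w = (lam[sv] * t[sv]) @ X[sv]                   # w = sum lambda_n t_n x_n
print("support vectors:", np.where(sv)[0], " w =", np.round(w, 3))
```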

72 How is an SVM trained? Usually by maximizing the dual $L_D$; a standard QP solver can do the trick. Algorithms that do not perform any matrix operation have been developed for handling the case of large data sets; the most famous algorithm is Sequential Minimal Optimization (SMO). LIBSVM is based on an improved version of SMO. Methods have also been developed for performing the training by minimizing the primal $L_P$; these methods are relevant in case an approximate solution is good enough, but their implementation is more complex than SMO.

73 Sequential Minimal Optimization. Computational complexity scales somewhere between linear and quadratic in the training set size. Required memory is linear in the training set size. Solves the dual problem; the constraints are $0 \le \lambda_n \le C$ and $\sum_n \lambda_n t_n = 0$. Starts from a feasible point (all $\lambda_n$ equal to 0) and maintains feasibility by modifying two $\lambda$ at each iteration.

74 Sequential Minimal Optimization. Solves a series of QP problems involving two $\lambda$ analytically: the minimization of a quadratic function of two box-constrained variables. Efficient heuristic for choosing the two lambdas to optimize. Most of the time is spent optimizing $0 < \lambda_n < C$: those points are support vectors ($\lambda_n > 0$) that are not misclassified ($\lambda_n < C$). Stops when each sample satisfies the KKT conditions: $\lambda_n = 0 \Rightarrow t_n y(x_n) \ge 1$; $0 < \lambda_n < C \Rightarrow t_n y(x_n) = 1$; $\lambda_n = C \Rightarrow t_n y(x_n) \le 1$.

75 Nonlinear SVM. 1. Repetition 2. Model complexity 3. Linear Support Vector Machine (SVM) 4. SVM for non-linearly separable data 5. Nonlinear SVM

76 How can we make SVM nonlinear? Project the data into a high-dimensional space $Z$, where we know, due to the VC dimension of linear classifiers, that it will be linearly separable: $z = \varphi(x)$. When projecting back into the original space $X$ the decision boundary will be in general nonlinear. Key fact: we don't need to know the projection!

77 SVM: Training and Predicting. Training an SVM by maximizing the dual: $L_D(\lambda) = \sum_n \lambda_n - \frac{1}{2} \sum_n \sum_m \lambda_n \lambda_m t_n t_m\, \varphi(x_n)^T \varphi(x_m)$, $0 \le \lambda_n \le C$, $\sum_n \lambda_n t_n = 0$. Making predictions with an SVM: $y(x) = \mathrm{sgn}\left( \sum_{n \in \Omega_s} \lambda_n t_n\, \varphi(x_n)^T \varphi(x) + b \right)$, where $\Omega_s$ is the set of support vectors.

78 SVM: Training and Predicting. Only dot products of the input appear, both in the training and in the prediction phase!

79 Example: from 2D to 3D. Nonlinear problem → linear problem: $p = (x, y) \mapsto \varphi(p) = (x^2,\; y^2,\; \sqrt{2}\,xy)$.

80 Example: Mapping and Kernel. $K(p_1, p_2) = \varphi(p_1)^T \varphi(p_2) = x_1^2 x_2^2 + y_1^2 y_2^2 + 2 x_1 y_1 x_2 y_2 = (x_1 x_2 + y_1 y_2)^2 = (p_1^T p_2)^2$, with $\varphi(p) = (x^2, y^2, \sqrt{2}\,xy)$. In practice, for most kernels the mapping $\varphi(p)$ is not even known!
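
The identity on this slide can be verified numerically: with $\varphi(p) = (x^2, y^2, \sqrt{2}\,xy)$, the dot product in the 3D feature space equals the kernel $(p_1^T p_2)^2$ computed directly in the 2D input space. A minimal check:

```python
import numpy as np

def phi(p):                         # explicit feature map from the slide
    x, y = p
    return np.array([x**2, y**2, np.sqrt(2) * x * y])

p1, p2 = np.array([1.0, 2.0]), np.array([0.5, -1.5])
print(phi(p1) @ phi(p2))            # dot product in feature space: 6.25
print((p1 @ p2) ** 2)               # kernel in input space: also 6.25
```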

81 Video: Mapping and Kernel. Thanks to Udi Aharoni. YouTube link for the video:

82 Dot product: the Kernel trick. Recall that both in the training and in the prediction phase of the SVM only dot products of the input are needed. If we can find a kernel function $K$ such that $K(x_n, x_m) = \varphi(x_n)^T \varphi(x_m)$, then we don't even have to know the mapping $\varphi$ to solve the problem... This has two advantages: 1. Save a lot of computation by not having to compute the mapping and then train in the high dimensional space... 2. The data can be projected into a deliberately high dimensional space, even infinite... (we have to be careful with this!)

83 Valid Kernels: Mercer's theorem. Define the matrix (Gram matrix): $K = \begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_N) \\ \vdots & \ddots & \vdots \\ K(x_N, x_1) & \cdots & K(x_N, x_N) \end{pmatrix}$. If $K$ is symmetric, $K = K^T$, and positive semi-definite, then $K(x_i, x_j)$ is a valid kernel, i.e., there exists a mapping $z = \varphi(x)$ such that $z_i^T z_j = K(x_i, x_j)$.
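
A minimal sketch of checking the Mercer condition empirically: build the Gram matrix of the Gaussian kernel on a few random points and confirm symmetry and non-negative eigenvalues (positive semi-definiteness):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
sigma = 1.0

D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
K = np.exp(-D2 / (2 * sigma**2))                      # Gram matrix

print("symmetric:", np.allclose(K, K.T))
print("eigenvalues:", np.round(np.linalg.eigvalsh(K), 4))  # all >= 0
```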

84 Examples of Kernels. Gaussian: $K(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)$, parameter $\sigma$. Polynomial: $K(x_i, x_j) = (x_i^T x_j + 1)^d$, parameter $d$. Sigmoidal: $K(x_i, x_j) = \tanh(\kappa\, x_i^T x_j - \delta)$, parameters $\kappa > 0$, $\delta > 0$.

85 Examples of Kernels. Not every parameter configuration generates a valid kernel!

86 Constructing new kernels. Given valid kernels $K_1(x, x')$ and $K_2(x, x')$, new kernels can be constructed by applying allowed operations: $K(x, x') = K_1(x, x') + K_2(x, x')$; $K(x, x') = K_1(x, x') K_2(x, x')$; $K(x, x') = g\, K_1(x, x')$ with $g > 0$. More in the textbook...

87 Gaussian Kernel. $K(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)$. Probably the most commonly used kernel. Much work in the literature about the properties of SVM with the Gaussian kernel. It has infinite VC dimension. As $\sigma \to \infty$ it behaves like a linear kernel.

88 Sigmoid Kernel. $K(x_i, x_j) = \tanh(\kappa\, x_i^T x_j - \delta)$. The SVM reduces to a two-layer multilayer perceptron, with automatic selection of the number of hidden neurons: number of hidden neurons = number of support vectors (the input-to-hidden weight matrix has number of inputs × number of support vectors entries). Weights of the multilayer perceptron: the support vectors $x_n$ are the weights connecting the inputs with the hidden layer; the Lagrange multipliers $\lambda_n$ are the weights connecting the hidden layer with the output. Activation functions: sigmoid activation function at the hidden layer, linear activation function at the output.

89 Kernels for structured data. Kernels for time series: Dynamic Time Warping (DTW) Kernel, Global Alignment (GA) Kernel, Autoregressive (AR) Kernel. Kernel for discrete inputs and boolean expressions: the Disjunctive Normal Form (DNF) Kernel, $K(u, v) = -1 + \prod_{j=1}^{D} \left( 1 + u_j v_j + \bar{u}_j \bar{v}_j \right)$, where $u, v$ are boolean expressions in DNF. Corresponds to a mapping into a space of $3^D$ dimensions, where $D$ is the number of variables in the boolean function. Efficient learning: the computation of $K$ scales with $D$, not $3^D$.

90 Model Selection. Model selection with SVM means finding the best values for the kernel parameters and the parameter C (stiffness of the margin): how non-consistent with the data can our hypothesis be? Often done with cross-validation and a lattice search (no likelihood function to optimize). Heuristics exist for simplifying the search (we will consider two).

91 SVM with Gaussian Kernel. The most common SVM. Lattice search using cross-validation.
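
A minimal sketch of the lattice search with cross-validation, using scikit-learn; note that sklearn parametrizes the Gaussian kernel as $\exp(-\gamma \|x_i - x_j\|^2)$, i.e. $\gamma = 1/(2\sigma^2)$. The data set and grid are illustrative:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, t = make_moons(n_samples=200, noise=0.2, random_state=0)
grid = {"C": np.logspace(-2, 3, 6),          # lattice over C ...
        "gamma": np.logspace(-3, 2, 6)}      # ... and kernel width
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X, t)
print(search.best_params_, round(search.best_score_, 3))
```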

92 Example: SVM Gaussian Kernel, fixed C. $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$. SVM with $\sigma = .3$. Train error = 0.48%, Test error = 0.64%. Training time: seconds.

93 Example: SVM Gaussian Kernel, fixed C. SVM with $\sigma = 0.7$. Train error = 0.46%, Test error = 0.40%. Training time: 3 seconds.

94 Example: SVM Gaussian Kernel, fixed C. SVM with $\sigma = 0.$. Train error = 0.3%, Test error = 0.3%. Training time: seconds.

95 Example: SVM Gaussian Kernel, fixed C. SVM with $\sigma = 0.07$. Train error = 0.6%, Test error = 0.36%. Training time: seconds.

96 Example: SVM Gaussian Kernel, fixed C. SVM with $\sigma = 0.0$. Train error = 0.9%, Test error = 0.0%. Training time: seconds.

97 Heuristic: searching the lattice. Proposed by Keerthi and Lin in: "Asymptotic Behaviors of Support Vector Machines with Gaussian Kernel", S. Sathiya Keerthi and Chih-Jen Lin, Neural Computation 2003, 15:7.

98 Heuristic: searching the lattice. Search for the best C for a linear SVM and call it H. Search for the best $(C, \sigma)$ satisfying $\log(2\sigma^2) = \log(C) - \log(H)$. The search on a 2D lattice is reduced to a search on a line. Asymptotically justified heuristic.

99 Heuristic: searching the lattice. Search for the best C for a linear SVM and call it H. Search for the best $(C, \sigma)$ satisfying $\log(2\sigma^2) = \log(C) - \log(H)$. The search on a 2D lattice is reduced to a search on a line. Asymptotically justified heuristic.

100 Heuristic: gradient based adaptation. Minimize a smooth validation performance function: calculate the gradient of the validation function with respect to the hyperparameters. The advantages are already evident for the search of two parameters, but think about when you have to optimize many more. One example is the ARD kernel: $K(x_i, x_j) = \exp\!\left( -\sum_{t=1}^{D} \frac{(x_{i,t} - x_{j,t})^2}{2\sigma_t^2} \right)$: a different scaling parameter for each feature (filters out irrelevant inputs). Here the number of parameters to optimize is $D+1$, where $D$ is the input dimensionality.

101 Heuristic: gradient based adaptation. SVM with Gaussian Kernel, lattice search, 8-fold CV. "An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models", S. Sathiya Keerthi, Vikas Sindhwani, Olivier Chapelle. In Proceedings of NIPS 2006.

102 Multiclass SVM. The support vector machine is fundamentally a two-class classifier, but many problems involve $K > 2$ classes. The most commonly used approaches to tackle multiclass problems with SVM: One-versus-the-rest: train $K$ separate SVMs where the data from class $k$ are the positive examples and the data from the other $K-1$ classes are the negative examples. One-versus-one: train $K(K-1)/2$ separate SVMs on all the possible pairs of classes; the predicted class is the one with the highest number of votes.
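
A minimal sketch of the two schemes on a made-up 3-class problem; scikit-learn's SVC uses one-versus-one internally, while OneVsRestClassifier wraps the one-versus-the-rest strategy:

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, t = make_blobs(n_samples=150, centers=3, random_state=0)
ovo = SVC(kernel="rbf").fit(X, t)                       # K(K-1)/2 machines
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, t)  # K machines
print(ovo.predict(X[:5]), ovr.predict(X[:5]))
```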

103 SVM for regression. Sparse solution for regression problems. Cost function to minimize: $C \sum_{n=1}^{N} (\xi_n + \zeta_n) + \frac{1}{2}\|w\|^2$, subject to the constraints: $t_n \le y(x_n) + \varepsilon + \xi_n$, $t_n \ge y(x_n) - \varepsilon - \zeta_n$, $\xi_n \ge 0$, $\zeta_n \ge 0$. The constraints form an ε-insensitive error function; this is for obtaining sparse solutions. Support vectors are the points that lie on the boundary of the ε-tube or outside it.
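
A minimal sketch of ε-insensitive support vector regression on synthetic data; points strictly inside the ε-tube incur no penalty, which is what makes the solution sparse:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, (60, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(60)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("support vectors:", len(svr.support_), "of", len(X))  # sparse in samples
```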

104 Book Readings (Bishop): Ch. 6.1, 6.2, 6.3; Ch. 7.1. Additional reference (advanced): "A Tutorial on Support Vector Machines for Pattern Recognition", Christopher J. C. Burges, Data Mining and Knowledge Discovery 2:121-167, 1998. Sections: 1, 2, 3, 4, 6.
