CS 1675 Introduction to Machine Learning. Lecture 12: Support vector machines

CS 1675 Introduction to Machine Learning. Lecture 12: Support vector machines. Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Midterm exam: October 9, 2017. In-class exam, closed book. Study material: lecture notes, the corresponding chapters in Bishop, and the homework assignments.

Midterm exam. Possible questions: Derivations (e.g. derive a ML solution); Computations (errors, SENS); General knowledge (e.g. properties of the different ML solutions); Algorithms. No Matlab code. All of the above can occur as separate problems or as part of multiple-choice or T/F questions. T/F answers may require justification: why yes or why no?

Outline. Algorithms for the linear decision boundary. Support vector machines: the maximum margin hyperplane; support vectors; extensions to the linearly non-separable case; kernel functions.

Linear decision boundaries. What models define linear decision boundaries? [Figure: 2D plots of linear decision boundaries $g(\mathbf{x}) = 0$ separating two classes.]

Logistic regression model. A model for binary class classification, defined by discriminant functions: $g_1(\mathbf{x}) = 1/(1+e^{-\mathbf{w}^T\mathbf{x}})$ and $g_0(\mathbf{x}) = 1 - 1/(1+e^{-\mathbf{w}^T\mathbf{x}})$, where $\mathbf{x} = (1, x_1, \ldots, x_d)^T$ is the input vector and $g(z) = 1/(1+e^{-z})$ is the logistic function applied to $z = \mathbf{w}^T\mathbf{x}$.
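As a quick illustration of these discriminant functions, a minimal NumPy sketch (the weight and input values here are made up):

import numpy as np

def g1(w, x):
    # logistic discriminant g_1(x) = 1 / (1 + exp(-w^T x))
    return 1.0 / (1.0 + np.exp(-(w @ x)))

w = np.array([0.5, 1.0, -2.0])   # (w_0, w_1, w_2), illustrative values
x = np.array([1.0, 0.3, 0.7])    # input with a leading 1 for the bias term
print(g1(w, x), 1.0 - g1(w, x))  # g_1(x) and g_0(x) sum to 1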

Linear discriminant analysis (LDA). When the class covariances are the same, $p(\mathbf{x} \mid y=0) \sim N(\boldsymbol{\mu}_0, \Sigma)$ and $p(\mathbf{x} \mid y=1) \sim N(\boldsymbol{\mu}_1, \Sigma)$, the resulting decision boundary is linear.

Linearly separable classes. Linearly separable classes: there is a hyperplane $\mathbf{w}^T\mathbf{x} + w_0 = 0$ that separates the training instances with no error. $\mathbf{w}$ is the normal (direction) of the plane. Class +1: $\mathbf{w}^T\mathbf{x} + w_0 > 0$. Class -1: $\mathbf{w}^T\mathbf{x} + w_0 < 0$.

Learning linearly separable sets. Finding the weights for linearly separable classes: linear program (LP) solution (see the sketch below). It finds weights $\mathbf{w}, w_0$ that satisfy the following constraints: $\mathbf{w}^T\mathbf{x}_i + w_0 \ge 0$ for all $i$ such that $y_i = +1$, and $\mathbf{w}^T\mathbf{x}_i + w_0 \le 0$ for all $i$ such that $y_i = -1$; together, $y_i(\mathbf{w}^T\mathbf{x}_i + w_0) \ge 0$. Property: if there is a hyperplane separating the examples, the linear program finds the solution.
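Since the slide describes the LP only abstractly, here is a minimal sketch (assuming NumPy and SciPy; the toy data are made up, and the constant 1 on the right-hand side is a common device to rule out the trivial solution $\mathbf{w} = 0$):

import numpy as np
from scipy.optimize import linprog

# Four linearly separable points and their labels.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Variables z = (w_1, w_2, w_0); the feasibility constraints
# -y_i (x_i^T w + w_0) <= -1 encode y_i (w^T x_i + w_0) >= 1.
A_ub = -y[:, None] * np.hstack([X, np.ones((len(y), 1))])
b_ub = -np.ones(len(y))
res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 3)
print(res.status, res.x)  # status 0 means a separating hyperplane was found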

Optimal separating hyperplane. Problem: there are multiple hyperplanes that separate the data points. Which one should we choose?

Optimal separating hyperplane. Problem: multiple separating hyperplanes exist; which one to choose? Maximum margin choice: maximize the distance $d_+ + d_-$, where $d_+$ is the shortest distance of a positive example from the hyperplane (and similarly $d_-$ for negative examples). Note: a margin classifier is a classifier for which we can calculate the distance of each example from the decision boundary.

Maximum margin hyperplane. For the maximum margin hyperplane, only the examples on the margin matter (only these affect the distances). These are called support vectors.

Finding maximum margin hyperplanes. Assume the examples in the training set are $(\mathbf{x}_i, y_i)$ such that $y_i \in \{+1, -1\}$. Assume all data satisfy: $\mathbf{w}^T\mathbf{x}_i + w_0 \ge 1$ for $y_i = +1$, and $\mathbf{w}^T\mathbf{x}_i + w_0 \le -1$ for $y_i = -1$. The inequalities can be combined as $y_i(\mathbf{w}^T\mathbf{x}_i + w_0) \ge 1$ for all $i$. The equalities define two hyperplanes: $\mathbf{w}^T\mathbf{x} + w_0 = 1$ and $\mathbf{w}^T\mathbf{x} + w_0 = -1$.

Finding the maximum margin hyperplane. Geometrical margin: $\rho(\mathbf{x}, y) = y(\mathbf{w}^T\mathbf{x} + w_0)/\|\mathbf{w}\|_2$ measures the distance of a point from the hyperplane; $\mathbf{w}/\|\mathbf{w}\|_2$ is the normal to the hyperplane and $\|\mathbf{w}\|_2$ is the Euclidean norm. For points satisfying $y_i(\mathbf{w}^T\mathbf{x}_i + w_0) = 1$, the distance is $1/\|\mathbf{w}\|_2$. Width of the margin: $d_+ + d_- = 2/\|\mathbf{w}\|_2$.
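A one-line derivation of the margin width, a step the slide leaves implicit: a point on either margin hyperplane satisfies $\mathbf{w}^T\mathbf{x} + w_0 = \pm 1$, so its distance from the decision boundary is

\[
d_{\pm} = \frac{|\mathbf{w}^T\mathbf{x} + w_0|}{\|\mathbf{w}\|_2} = \frac{1}{\|\mathbf{w}\|_2},
\qquad\text{hence}\qquad
d_+ + d_- = \frac{2}{\|\mathbf{w}\|_2}.
\]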

Maximum margin hyperplane. We want to maximize $d_+ + d_- = 2/\|\mathbf{w}\|_2$. We do it by minimizing $\|\mathbf{w}\|_2^2/2 = \mathbf{w}^T\mathbf{w}/2$ with respect to the variables $\mathbf{w}, w_0$. But we also need to enforce the constraints on the data instances: $y_i(\mathbf{w}^T\mathbf{x}_i + w_0) - 1 \ge 0$.

Maximum margin hyperplane. Solution: incorporate the constraints into the optimization. Optimization problem (Lagrangian): $J(\mathbf{w}, w_0, \boldsymbol{\alpha}) = \mathbf{w}^T\mathbf{w}/2 - \sum_i \alpha_i [y_i(\mathbf{w}^T\mathbf{x}_i + w_0) - 1]$, where $\alpha_i \ge 0$ are the Lagrange multipliers. Minimize with respect to $\mathbf{w}, w_0$ (primal variables); maximize with respect to $\boldsymbol{\alpha}$ (dual variables). What happens to $\alpha_i$ for a data instance: if $y_i(\mathbf{w}^T\mathbf{x}_i + w_0) > 1$ then $\alpha_i = 0$; else $\alpha_i > 0$ (active constraint).

Max margin hyperplane solution. Set the derivatives to 0 (Kuhn-Tucker conditions): $\partial J/\partial \mathbf{w} = 0 \Rightarrow \mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$, and $\partial J/\partial w_0 = 0 \Rightarrow \sum_i \alpha_i y_i = 0$. Now we need to solve for the Lagrange parameters (Wolfe dual): maximize $J(\boldsymbol{\alpha}) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j$, subject to the constraints $\alpha_i \ge 0$ for all $i$ and $\sum_i \alpha_i y_i = 0$. This is a quadratic optimization problem; the solution is $\hat{\alpha}_i$ for all $i$.
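A minimal sketch of solving this dual numerically (assuming NumPy and SciPy; the toy data are made up, and a general-purpose SLSQP solver stands in for a dedicated QP solver):

import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
Q = (y[:, None] * y[None, :]) * (X @ X.T)  # Q_ij = y_i y_j x_i^T x_j

def neg_dual(a):
    # negative of J(alpha) = sum_i a_i - 1/2 a^T Q a, so we can minimize it
    return -(a.sum() - 0.5 * a @ Q @ a)

res = minimize(neg_dual, np.zeros(len(y)), method="SLSQP",
               bounds=[(0.0, None)] * len(y),
               constraints={"type": "eq", "fun": lambda a: a @ y})
alpha = res.x
w_hat = (alpha * y) @ X  # w-hat = sum_i alpha_i y_i x_i
print(alpha.round(3), w_hat.round(3))  # nonzero alphas mark support vectors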

Maximum margin solution. The resulting parameter vector can be expressed as $\hat{\mathbf{w}} = \sum_i \hat{\alpha}_i y_i \mathbf{x}_i$, where $\hat{\alpha}_i$ is the solution of the dual optimization. The parameter $\hat{w}_0$ is obtained from $y_i(\hat{\mathbf{w}}^T\mathbf{x}_i + \hat{w}_0) - 1 = 0$ for any support vector. Solution properties: $\hat{\alpha}_i = 0$ for all points that are not on the margin. The decision boundary: $\hat{\mathbf{w}}^T\mathbf{x} + \hat{w}_0 = \sum_{i \in SV} \hat{\alpha}_i y_i \mathbf{x}_i^T\mathbf{x} + \hat{w}_0 = 0$. The decision boundary is defined by the support vectors only.

Support vector machines: solution property. The decision boundary is defined by a set of support vectors (SV) and their $\alpha$ values (the Lagrange multipliers). Support vectors are the subset of datapoints in the training data that define the margin: $\hat{\mathbf{w}}^T\mathbf{x} + \hat{w}_0 = \sum_{i \in SV} \hat{\alpha}_i y_i \mathbf{x}_i^T\mathbf{x} + \hat{w}_0$. Classification decision for a new $\mathbf{x}$: $\hat{y} = \mathrm{sign}\left(\sum_{i \in SV} \hat{\alpha}_i y_i \mathbf{x}_i^T\mathbf{x} + \hat{w}_0\right)$. Note that we do not have to explicitly compute $\hat{\mathbf{w}}$; this will be important for the nonlinear (kernel) case. See the sketch below.
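A minimal check of this property with scikit-learn (assuming it is available; SVC stores $\hat{\alpha}_i y_i$ in dual_coef_ and $\hat{w}_0$ in intercept_, so the support-vector sum below reproduces its decision):

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.hstack([np.ones(20), -np.ones(20)])
clf = SVC(kernel="linear", C=1e3).fit(X, y)

x_new = np.array([0.5, 1.0])
# f(x) = sum_{i in SV} alpha_i y_i (x_i^T x) + w_0, using support vectors only
f = np.sum(clf.dual_coef_[0] * (clf.support_vectors_ @ x_new)) + clf.intercept_[0]
print(np.sign(f), clf.predict([x_new])[0])  # the two decisions agree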

Support vector machines. The decision boundary: $\hat{\mathbf{w}}^T\mathbf{x} + \hat{w}_0 = \sum_{i \in SV} \hat{\alpha}_i y_i (\mathbf{x}_i^T\mathbf{x}) + \hat{w}_0$. Classification decision: $\hat{y} = \mathrm{sign}\left(\sum_{i \in SV} \hat{\alpha}_i y_i (\mathbf{x}_i^T\mathbf{x}) + \hat{w}_0\right)$.

Support vector machines: inner product. The decision on a new $\mathbf{x}$ depends on the inner product between pairs of examples. The decision boundary: $\hat{\mathbf{w}}^T\mathbf{x} + \hat{w}_0 = \sum_{i \in SV} \hat{\alpha}_i y_i (\mathbf{x}_i^T\mathbf{x}) + \hat{w}_0$. Classification decision: $\hat{y} = \mathrm{sign}\left(\sum_{i \in SV} \hat{\alpha}_i y_i (\mathbf{x}_i^T\mathbf{x}) + \hat{w}_0\right)$. Similarly, the optimization depends on the inner products: $J(\boldsymbol{\alpha}) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i^T\mathbf{x}_j)$.

Inner product of two vectors. The decision boundary for the SVM and its optimization depend on the inner product of two datapoint vectors: $\mathbf{x}^T\mathbf{x}' = \sum_j x_j x'_j$. [The slide works a small numeric example of this computation on two 3-dimensional vectors.]

Inner product of two vectors. The inner product equals $\mathbf{x}^T\mathbf{x}' = \|\mathbf{x}\| \|\mathbf{x}'\| \cos\theta$. If the angle between them is 0 then $\mathbf{x}^T\mathbf{x}' = \|\mathbf{x}\| \|\mathbf{x}'\|$. If the angle between them is 90 degrees then $\mathbf{x}^T\mathbf{x}' = 0$. The inner product measures how similar the two vectors are.
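A minimal NumPy sketch of this similarity view (the vector values are made up):

import numpy as np

x, xp = np.array([6.0, 5.0, 3.0]), np.array([2.0, 5.0, 1.0])
dot = x @ xp                                   # inner product x^T x'
cos_theta = dot / (np.linalg.norm(x) * np.linalg.norm(xp))
print(dot, cos_theta)  # cos near 1: similar direction; near 0: orthogonal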

Extension to a linearly non-separable case. Idea: allow some flexibility in crossing the separating hyperplane.

Linearly non-separable case. Relax the constraints with slack variables $\xi_i \ge 0$: $\mathbf{w}^T\mathbf{x}_i + w_0 \ge 1 - \xi_i$ for $y_i = +1$, and $\mathbf{w}^T\mathbf{x}_i + w_0 \le -1 + \xi_i$ for $y_i = -1$. An error occurs if $\xi_i \ge 1$, so $\sum_i \xi_i$ is an upper bound on the number of errors. Introduce a penalty for the errors (soft margin): minimize $\|\mathbf{w}\|^2/2 + C\sum_i \xi_i$ subject to the constraints above. $C$ is set by the user; a larger $C$ leads to a larger penalty for an error.

Linearly non-separable case. Minimize $\|\mathbf{w}\|^2/2 + C\sum_i \xi_i$ subject to $\mathbf{w}^T\mathbf{x}_i + w_0 \ge 1 - \xi_i$ for $y_i = +1$, $\mathbf{w}^T\mathbf{x}_i + w_0 \le -1 + \xi_i$ for $y_i = -1$, and $\xi_i \ge 0$. Rewrite using $\xi_i = \max(0,\, 1 - y_i(\mathbf{w}^T\mathbf{x}_i + w_0))$: minimize $\|\mathbf{w}\|^2/2 + C\sum_i \max(0,\, 1 - y_i(\mathbf{w}^T\mathbf{x}_i + w_0))$, i.e. a regularization penalty plus the hinge loss; a sketch of optimizing this objective directly follows.
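A minimal sketch of the hinge-loss view (this subgradient-descent route is an alternative to the lecture's quadratic-program formulation; the toy data are made up):

import numpy as np

def train_soft_margin(X, y, C=1.0, lr=0.01, epochs=500):
    w, w0 = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + w0)
        viol = margins < 1  # points inside or on the wrong side of the margin
        # subgradients: d/dw = w - C sum_viol y_i x_i, d/dw0 = -C sum_viol y_i
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_w0 = -C * y[viol].sum()
        w, w0 = w - lr * grad_w, w0 - lr * grad_w0
    return w, w0

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, w0 = train_soft_margin(X, y)
print(np.sign(X @ w + w0))  # recovers the training labels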

Linearly non-separable case. Lagrange multiplier form (primal problem): $J(\mathbf{w}, w_0, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\mu}) = \mathbf{w}^T\mathbf{w}/2 + C\sum_i \xi_i - \sum_i \alpha_i[y_i(\mathbf{w}^T\mathbf{x}_i + w_0) - 1 + \xi_i] - \sum_i \mu_i \xi_i$. Dual form (after $\mathbf{w}$ is expressed and the $\xi_i$, $\mu_i$ cancel out): maximize $J(\boldsymbol{\alpha}) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j$, subject to $0 \le \alpha_i \le C$ for all $i$ and $\sum_i \alpha_i y_i = 0$. Solution: $\hat{\mathbf{w}} = \sum_i \hat{\alpha}_i y_i \mathbf{x}_i$. The difference from the separable case: $0 \le \alpha_i \le C$. The parameter $\hat{w}_0$ is obtained through the KKT conditions.

Support vector machines: solution. The solution of the linearly non-separable case has the same properties as the linearly separable case: the decision boundary is defined only by a set of support vectors (points that are on the margin or that cross the margin), and the decision boundary and the optimization can be expressed in terms of the inner product between pairs of examples: $\hat{\mathbf{w}}^T\mathbf{x} + \hat{w}_0 = \sum_{i \in SV}\hat{\alpha}_i y_i(\mathbf{x}_i^T\mathbf{x}) + \hat{w}_0$, $\hat{y} = \mathrm{sign}\left(\sum_{i \in SV}\hat{\alpha}_i y_i(\mathbf{x}_i^T\mathbf{x}) + \hat{w}_0\right)$, $J(\boldsymbol{\alpha}) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j(\mathbf{x}_i^T\mathbf{x}_j)$.

Nonlinear decision boundary. So far we have seen how to learn a linear decision boundary. But what if a linear decision boundary is not good enough? How can we learn non-linear decision boundaries with the SVM?

Nonlinear decision boundary. The non-linear case can be handled by using a set of features: essentially, we map input vectors to (larger) feature vectors, $\mathbf{x} \to \boldsymbol{\phi}(\mathbf{x})$. Note that feature expansions are typically high-dimensional (for example, polynomial expansions). Given the nonlinear feature mapping, we can use the linear SVM on the expanded feature vectors via the inner products $\boldsymbol{\phi}(\mathbf{x})^T\boldsymbol{\phi}(\mathbf{x}')$. Kernel function: $K(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^T\boldsymbol{\phi}(\mathbf{x}')$.

Support vector machines: solution for nonlinear decision boundaries. The decision boundary: $\hat{\mathbf{w}}^T\boldsymbol{\phi}(\mathbf{x}) + \hat{w}_0 = \sum_{i \in SV}\hat{\alpha}_i y_i K(\mathbf{x}_i, \mathbf{x}) + \hat{w}_0$. Classification: $\hat{y} = \mathrm{sign}\left(\sum_{i \in SV}\hat{\alpha}_i y_i K(\mathbf{x}_i, \mathbf{x}) + \hat{w}_0\right)$. The decision on a new $\mathbf{x}$ requires computing the kernel function defining the similarity between the examples. Similarly, the optimization depends on the kernel: $J(\boldsymbol{\alpha}) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$.

Kernel trick. The non-linear case maps input vectors to a larger feature space, $\mathbf{x} \to \boldsymbol{\phi}(\mathbf{x})$; feature expansions are typically high-dimensional (for example, polynomial expansions). The kernel function defines the inner product of the expanded high-dimensional feature vectors and lets us use the SVM: $K(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^T\boldsymbol{\phi}(\mathbf{x}')$. Problem: after the expansion we would need to perform inner products in a very high-dimensional space. Kernel trick: if we choose the kernel function wisely, we can compute the linear separation in the high-dimensional feature space implicitly, by working in the original input space!

Kernel function example. Assume $\mathbf{x} = [x_1, x_2]$ and a feature mapping that maps the input to a quadratic feature set: $\boldsymbol{\phi}(\mathbf{x}) = [x_1^2,\; x_2^2,\; \sqrt{2}x_1x_2,\; \sqrt{2}x_1,\; \sqrt{2}x_2,\; 1]$. Kernel function for this feature space:

Kernel function example. Assume $\mathbf{x} = [x_1, x_2]$ and the quadratic feature mapping $\boldsymbol{\phi}(\mathbf{x}) = [x_1^2,\; x_2^2,\; \sqrt{2}x_1x_2,\; \sqrt{2}x_1,\; \sqrt{2}x_2,\; 1]$. Kernel function for the feature space: $K(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^T\boldsymbol{\phi}(\mathbf{x}') = x_1^2x_1'^2 + x_2^2x_2'^2 + 2x_1x_2x_1'x_2' + 2x_1x_1' + 2x_2x_2' + 1 = (x_1x_1' + x_2x_2' + 1)^2 = (\mathbf{x}^T\mathbf{x}' + 1)^2$. The computation of the linear separation in the higher-dimensional space is performed implicitly in the original input space.
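A minimal NumPy check of this identity (the test vectors are made up):

import numpy as np

def phi(x):
    # the quadratic feature map from the slide
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1**2, x2**2, s * x1 * x2, s * x1, s * x2, 1.0])

x, xp = np.array([0.7, -1.2]), np.array([2.0, 0.5])
print(phi(x) @ phi(xp), (x @ xp + 1.0) ** 2)  # the two values match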

Kernel function example. [Figure: a linear separator in the expanded feature space corresponds to a non-linear separator in the input space.]

Nonlinear extension. Kernel trick: replace the inner product with a kernel. A well-chosen kernel leads to efficient computation.

Kernel functions. Linear kernel: $K(\mathbf{x}, \mathbf{x}') = \mathbf{x}^T\mathbf{x}'$. Polynomial kernel: $K(\mathbf{x}, \mathbf{x}') = [1 + \mathbf{x}^T\mathbf{x}']^k$. Radial basis kernel: $K(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{1}{2}\|\mathbf{x} - \mathbf{x}'\|^2\right)$.
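The three kernels above as a minimal NumPy sketch (the bandwidth parameter sigma in the radial basis kernel is an added generalization; the slide's version fixes it to 1):

import numpy as np

def linear_kernel(x, xp):
    return x @ xp

def polynomial_kernel(x, xp, k=2):
    return (1.0 + x @ xp) ** k

def radial_basis_kernel(x, xp, sigma=1.0):
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma**2))

x, xp = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(x, xp), polynomial_kernel(x, xp), radial_basis_kernel(x, xp))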

Kernels. ML researchers have proposed kernels for the comparison of a variety of objects: strings, trees, graphs. The cool thing: the SVM algorithm can now be applied to classify a variety of objects.