CS 1571 Introduction to AI
Lecture 6
Binary classification: Support Vector Machines
Milos Hauskrecht
milos@cs.pitt.edu
5329 Sennott Square

Supervised learning
- Data: $D = \{D_1, D_2, \ldots, D_n\}$, a set of $n$ examples
  - $D_i = \langle \mathbf{x}_i, y_i \rangle$
  - $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,d})$ is an input vector of size $d$
  - $y_i$ is the desired output (given by a teacher)
- Objective: learn the mapping $f: X \to Y$ such that $y_i \approx f(\mathbf{x}_i)$ for all $i = 1, \ldots, n$
- Regression: $Y$ is continuous
  - Example: earnings, product orders -> company stock price
- Classification: $Y$ is discrete
  - Example: handwritten digit in binary form -> digit label
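
Discriminant functions: review
- A classification model is typically defined using discriminant functions
- Idea:
  - For each class $i$ define a function $g_i(\mathbf{x})$ mapping $X \to \mathbb{R}$
  - When a decision on input $\mathbf{x}$ should be made, choose the class with the highest value of $g_i(\mathbf{x})$:
    $\text{class} = \arg\max_i g_i(\mathbf{x})$
- Works for binary and multi-class classification

Discriminant functions: review
- Assume a binary classification problem with classes 0 and 1
- Discriminant functions $g_0(\mathbf{x})$ and $g_1(\mathbf{x})$
- [Figure: the input space split into a region where $g_1(\mathbf{x}) \ge g_0(\mathbf{x})$ and a region where $g_1(\mathbf{x}) \le g_0(\mathbf{x})$; the decision boundary is the set of points where $g_1(\mathbf{x}) = g_0(\mathbf{x})$]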
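
The arg-max rule can be sketched in a few lines of Python. This is only an illustration: the helper `classify` and the two linear discriminants (with their weights) are hypothetical and do not come from the lecture.

```python
def classify(x, discriminants):
    """Return the class label whose discriminant g_i(x) is largest."""
    return max(discriminants, key=lambda label: discriminants[label](x))

# Hypothetical discriminants for classes 0 and 1 (illustrative weights only).
discriminants = {
    0: lambda x: -1.0 * x[0] + 0.5,
    1: lambda x: 1.0 * x[0] - 0.5,
}

label_a = classify([2.0], discriminants)   # g_1 = 1.5, g_0 = -1.5
label_b = classify([-1.0], discriminants)  # g_0 = 1.5, g_1 = -1.5
```

The same function works unchanged for more than two classes, since it only compares discriminant values.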
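
Logistic regression model: review
- Model for binary (2 class) classification
- Defined by discriminant functions:
  - $g_1(\mathbf{x}) = g(\mathbf{w}^T \mathbf{x})$
  - $g_0(\mathbf{x}) = 1 - g(\mathbf{w}^T \mathbf{x})$
  - where $g(z) = 1 / (1 + e^{-z})$ is the logistic function
- [Figure: the input vector $\mathbf{x}$ with $d$ components is combined into $z = \mathbf{w}^T \mathbf{x}$, which is passed through the logistic function to give $g_1(\mathbf{x})$]

CS 2750 Machine Learning

Logistic regression model. Decision boundary
- The logistic regression model defines a linear decision boundary
- Example: 2 classes (blue and red points)
- [Figure: blue and red points separated by the line $g_1(\mathbf{x}) = g_0(\mathbf{x})$, with $g_1(\mathbf{x}) \ge g_0(\mathbf{x})$ on one side and $g_1(\mathbf{x}) \le g_0(\mathbf{x})$ on the other]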
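
A minimal sketch of the logistic discriminant, assuming an explicit bias term `w0` kept separate from the weight vector (the slides fold it into $\mathbf{w}^T \mathbf{x}$). The function names and the example weights are illustrative assumptions.

```python
import math

def logistic(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def g1(x, w, w0):
    """Discriminant for class 1: g(w^T x + w0)."""
    z = w0 + sum(wj * xj for wj, xj in zip(w, x))
    return logistic(z)

def predict(x, w, w0):
    """Choose class 1 when g_1(x) >= g_0(x) = 1 - g_1(x), i.e. g_1(x) >= 0.5."""
    return 1 if g1(x, w, w0) >= 0.5 else 0

w, w0 = [1.0, -1.0], 0.0           # hypothetical weights
on_boundary = g1([1.0, 1.0], w, w0)  # z = 0, so the two discriminants tie
```

Because $g$ is monotone, $g_1(\mathbf{x}) \ge 0.5$ exactly when $z \ge 0$, which is why the boundary is linear in $\mathbf{x}$.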
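
Decision boundary
- An alternative way to define discriminant functions with a linear decision boundary:
  - Class +1: $g_{+1}(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$
  - Class -1: $g_{-1}(\mathbf{x}) = -(\mathbf{w}^T \mathbf{x} + w_0)$
- Decision boundary: $\mathbf{w}^T \mathbf{x} + w_0 = 0$

Linearly separable classes
- Linearly separable classes: there is a hyperplane $\mathbf{w}^T \mathbf{x} + w_0 = 0$ that separates the training instances with no error
  - Class (+1): $\mathbf{w}^T \mathbf{x} + w_0 > 0$
  - Class (-1): $\mathbf{w}^T \mathbf{x} + w_0 < 0$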

Learning linearly separable sets
- Finding weights for linearly separable classes: linear program (LP) solution
- It finds weights that satisfy the following constraints:
  - $\mathbf{w}^T \mathbf{x}_i + w_0 \ge 0$ for all $i$ such that $y_i = +1$
  - $\mathbf{w}^T \mathbf{x}_i + w_0 \le 0$ for all $i$ such that $y_i = -1$
  - Together: $y_i (\mathbf{w}^T \mathbf{x}_i + w_0) \ge 0$
- Property: if there is a hyperplane separating the examples, the linear program finds the solution
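
The combined constraint $y_i (\mathbf{w}^T \mathbf{x}_i + w_0) \ge 0$ is easy to check for a candidate hyperplane. The sketch below is not an LP solver; it only tests feasibility of given weights. The function names and the toy examples are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def satisfies_constraints(w, w0, examples):
    """True if y_i * (w^T x_i + w0) >= 0 holds for every (x_i, y_i)."""
    return all(y * (dot(w, x) + w0) >= 0 for x, y in examples)

# Hypothetical linearly separable data with labels in {+1, -1}.
examples = [([2.0, 1.0], +1), ([3.0, 3.0], +1),
            ([-1.0, 0.0], -1), ([-2.0, -2.0], -1)]

ok = satisfies_constraints([1.0, 1.0], 0.0, examples)    # separates the data
bad = satisfies_constraints([-1.0, 0.0], 0.0, examples)  # wrong orientation
```

Many different weight vectors pass this check on the same data, which is the ambiguity the maximum margin criterion resolves.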

Optimal separating hyperplane
- Problem: multiple hyperplanes that separate the data exist
  - Which one to choose?
- Maximum margin choice: maximize the distance $d_+ + d_-$
  - where $d_+$ is the shortest distance of a positive example from the hyperplane (similarly $d_-$ for negative examples)

Maximum margin hyperplane
- For the maximum margin hyperplane only examples on the margin matter (only these affect the distances)
- These are called support vectors
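
Finding maximum margin hyperplanes
- Assume that examples in the training set are $(\mathbf{x}_i, y_i)$ such that $y_i \in \{+1, -1\}$
- Assume that all data satisfy:
  - $\mathbf{w}^T \mathbf{x}_i + w_0 \ge 1$ for $y_i = +1$
  - $\mathbf{w}^T \mathbf{x}_i + w_0 \le -1$ for $y_i = -1$
- The inequalities can be combined as:
  $y_i (\mathbf{w}^T \mathbf{x}_i + w_0) - 1 \ge 0$ for all $i$
- The equalities define two hyperplanes:
  $\mathbf{w}^T \mathbf{x}_i + w_0 = 1$ and $\mathbf{w}^T \mathbf{x}_i + w_0 = -1$

Finding the maximum margin hyperplane
- Geometrical margin: $\rho_{\mathbf{w}, w_0}(\mathbf{x}, y) = y (\mathbf{w}^T \mathbf{x} + w_0) / \|\mathbf{w}\|_{L_2}$
  - measures the distance of a point $\mathbf{x}$ from the hyperplane
  - $\mathbf{w}$ is the normal to the hyperplane
  - $\|\cdot\|_{L_2}$ is the Euclidean norm
- For points satisfying $y_i (\mathbf{w}^T \mathbf{x}_i + w_0) - 1 = 0$ the distance is $1 / \|\mathbf{w}\|_{L_2}$
- Width of the margin: $d_+ + d_- = 2 / \|\mathbf{w}\|_{L_2}$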
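
As a quick numeric check of the geometrical margin and the margin width, here is a small sketch. The weights and the point are hypothetical values chosen so that the point lies exactly on the hyperplane $\mathbf{w}^T \mathbf{x} + w_0 = 1$.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def geometric_margin(x, y, w, w0):
    """Signed distance y * (w^T x + w0) / ||w|| of x from the hyperplane."""
    return y * (dot(w, x) + w0) / math.sqrt(dot(w, w))

def margin_width(w):
    """Width of the margin, d_+ + d_- = 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))

w, w0 = [0.5, 0.5], 0.0                       # hypothetical hyperplane
dist = geometric_margin([1.0, 1.0], +1, w, w0)  # point with w^T x + w0 = 1
width = margin_width(w)
```

Here `dist` equals $1 / \|\mathbf{w}\|$ and `width` is twice that, matching the two formulas.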
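
Maximum margin hyperplane
- We want to maximize $d_+ + d_- = 2 / \|\mathbf{w}\|_{L_2}$
- We do it by minimizing $\|\mathbf{w}\|_{L_2}^2 / 2 = \mathbf{w}^T \mathbf{w} / 2$
  - $\mathbf{w}, w_0$ are the variables
- But we also need to enforce the constraints on the points:
  $y_i (\mathbf{w}^T \mathbf{x}_i + w_0) - 1 \ge 0$

Maximum margin hyperplane
- Solution: incorporate the constraints into the optimization
- Optimization problem (Lagrangian):
  $J(\mathbf{w}, w_0, \alpha) = \|\mathbf{w}\|^2 / 2 - \sum_{i=1}^{n} \alpha_i [y_i (\mathbf{w}^T \mathbf{x}_i + w_0) - 1]$
  - $\alpha_i \ge 0$ are the Lagrange multipliers
- Minimize with respect to $\mathbf{w}, w_0$ (primal variables)
- Maximize with respect to $\alpha$ (dual variables)
- What happens to $\alpha$:
  - if $y_i (\mathbf{w}^T \mathbf{x}_i + w_0) - 1 > 0$ then $\alpha_i \to 0$
  - else $\alpha_i > 0$ (active constraint)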
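
Max margin hyperplane solution
- Set the derivatives to 0 (Kuhn-Tucker conditions):
  - $\nabla_{\mathbf{w}} J(\mathbf{w}, w_0, \alpha) = \mathbf{w} - \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i = 0$
  - $\partial J(\mathbf{w}, w_0, \alpha) / \partial w_0 = -\sum_{i=1}^{n} \alpha_i y_i = 0$
- Now we need to solve for the Lagrange parameters (Wolfe dual): maximize
  $J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i^T \mathbf{x}_j)$
  - subject to the constraints $\alpha_i \ge 0$ for all $i$, and $\sum_{i=1}^{n} \alpha_i y_i = 0$
- Quadratic optimization problem: solution $\hat{\alpha}_i$ for all $i$

Maximum margin solution
- The resulting parameter vector $\hat{\mathbf{w}}$ can be expressed as:
  $\hat{\mathbf{w}} = \sum_{i=1}^{n} \hat{\alpha}_i y_i \mathbf{x}_i$, where $\hat{\alpha}_i$ is the solution of the optimization
- The parameter $w_0$ is obtained from $\hat{\alpha}_i [y_i (\hat{\mathbf{w}}^T \mathbf{x}_i + w_0) - 1] = 0$
- Solution properties:
  - $\hat{\alpha}_i = 0$ for all points that are not on the margin
- The decision boundary:
  $\hat{\mathbf{w}}^T \mathbf{x} + w_0 = \sum_{i \in SV} \hat{\alpha}_i y_i (\mathbf{x}_i^T \mathbf{x}) + w_0 = 0$
- The decision boundary is defined by the support vectors only ($\hat{\alpha}_i > 0$); all other points have $\hat{\alpha}_i = 0$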
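
To make the dual concrete, consider a hypothetical two-point training set, $\mathbf{x}_1 = (1, 1)$ with $y_1 = +1$ and $\mathbf{x}_2 = (-1, -1)$ with $y_2 = -1$. The constraint $\sum_i \alpha_i y_i = 0$ forces $\alpha_1 = \alpha_2 = a$, the dual reduces to $J = 2a - 4a^2$, and its maximum is at $a = 1/4$. The sketch below evaluates the dual and recovers $\hat{\mathbf{w}}$ and $w_0$ from these multipliers; all names and data are illustrative assumptions, and the multipliers are plugged in by hand rather than found by a QP solver.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alphas, xs, ys):
    """J(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i^T x_j."""
    n = len(xs)
    quad = sum(alphas[i] * alphas[j] * ys[i] * ys[j] * dot(xs[i], xs[j])
               for i in range(n) for j in range(n))
    return sum(alphas) - 0.5 * quad

def recover_w(alphas, xs, ys):
    """w = sum_i alpha_i y_i x_i."""
    dim = len(xs[0])
    return [sum(a * y * x[k] for a, x, y in zip(alphas, xs, ys))
            for k in range(dim)]

xs = [[1.0, 1.0], [-1.0, -1.0]]   # hypothetical training points
ys = [+1, -1]
alphas = [0.25, 0.25]             # maximizer of 2a - 4a^2, set by hand

w = recover_w(alphas, xs, ys)
# For a support vector, y_i (w^T x_i + w0) = 1, so w0 = y_i - w^T x_i.
w0 = ys[0] - dot(w, xs[0])
best = dual_objective(alphas, xs, ys)
```

The recovered hyperplane has $\|\hat{\mathbf{w}}\| = 1/\sqrt{2}$, so the margin width $2 / \|\hat{\mathbf{w}}\|$ equals the distance between the two points, as expected when both are support vectors.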
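
Support vector machines
- The decision boundary:
  $\hat{\mathbf{w}}^T \mathbf{x} + w_0 = \sum_{i \in SV} \hat{\alpha}_i y_i (\mathbf{x}_i^T \mathbf{x}) + w_0 = 0$
- Classification decision:
  $\hat{y} = \text{sign}\left[ \sum_{i \in SV} \hat{\alpha}_i y_i (\mathbf{x}_i^T \mathbf{x}) + w_0 \right]$

Support vector machines: solution property
- The decision boundary is defined by the set of support vectors SV and their alpha values
  - Support vectors = a subset of data points in the training data that define the margin
- Classification decision:
  $\hat{y} = \text{sign}\left[ \sum_{i \in SV} \hat{\alpha}_i y_i (\mathbf{x}_i^T \mathbf{x}) + w_0 \right]$
- Note that we do not have to explicitly compute $\hat{\mathbf{w}}$
  - This will be important for the nonlinear (kernel) case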
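
The classification rule can be evaluated directly from the support vectors, without forming $\hat{\mathbf{w}}$. A minimal sketch follows; the support vectors, their multipliers and `w0` are hypothetical values for a two-point toy problem, and points exactly on the boundary are assigned to class +1 by convention here.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def decision_value(x, support, w0):
    """sum over support vectors of alpha_i * y_i * (x_i^T x), plus w0."""
    return sum(alpha * y * dot(xi, x) for alpha, y, xi in support) + w0

def classify(x, support, w0):
    """Sign of the decision value (ties go to +1 in this sketch)."""
    return 1 if decision_value(x, support, w0) >= 0 else -1

# Hypothetical support vectors as (alpha_i, y_i, x_i) triples.
support = [(0.25, +1, [1.0, 1.0]), (0.25, -1, [-1.0, -1.0])]
w0 = 0.0

value_pos = decision_value([2.0, 0.0], support, w0)
value_neg = decision_value([-3.0, 1.0], support, w0)
```

Only inner products between the new point and the support vectors are needed, which is the property the kernel case relies on.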
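
Support vector machines: inner product
- The decision on a new $\mathbf{x}$ depends on the inner product between two examples
- The decision boundary:
  $\hat{\mathbf{w}}^T \mathbf{x} + w_0 = \sum_{i \in SV} \hat{\alpha}_i y_i (\mathbf{x}_i^T \mathbf{x}) + w_0 = 0$
- Classification decision:
  $\hat{y} = \text{sign}\left[ \sum_{i \in SV} \hat{\alpha}_i y_i (\mathbf{x}_i^T \mathbf{x}) + w_0 \right]$
- Similarly, the optimization depends on $(\mathbf{x}_i^T \mathbf{x}_j)$:
  $J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i^T \mathbf{x}_j)$

Inner product of two vectors
- The decision boundary for the SVM and its optimization depend on the inner product of two data points (vectors): $\mathbf{x}^T \mathbf{x}'$
- Question: given two numeric vectors $\mathbf{x}$ and $\mathbf{x}'$, what is $\mathbf{x}^T \mathbf{x}'$?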
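
Inner product of two vectors
- The decision boundary for the SVM and its optimization depend on the inner product of two data points (vectors): $\mathbf{x}^T \mathbf{x}'$
- The inner product multiplies matching components and sums the results:
  $\mathbf{x}^T \mathbf{x}' = \sum_{j} x_j x'_j$

Inner product of two vectors
- The inner product is also equal to
  $\mathbf{x}^T \mathbf{x}' = \|\mathbf{x}\|_{L_2} \, \|\mathbf{x}'\|_{L_2} \cos \theta$
- If the angle between them is 0 then: $\mathbf{x}^T \mathbf{x}' = \|\mathbf{x}\|_{L_2} \, \|\mathbf{x}'\|_{L_2}$
- If the angle between them is 90 degrees then: $\mathbf{x}^T \mathbf{x}' = 0$
- The inner product measures how similar the two vectors are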
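
Both views of the inner product, the component-wise sum and the norm-times-cosine form, can be checked numerically. The vectors below are hypothetical and are not the ones from the lecture example; they are chosen to be parallel in one case and perpendicular in the other.

```python
import math

def dot(u, v):
    """Inner product: sum of products of matching components."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean (L2) norm."""
    return math.sqrt(dot(u, u))

def cosine(u, v):
    """cos(theta) = u^T v / (||u|| * ||v||)."""
    return dot(u, v) / (norm(u) * norm(v))

u = [3.0, 4.0]
parallel = [6.0, 8.0]        # same direction as u, angle 0
perpendicular = [-4.0, 3.0]  # angle of 90 degrees with u

dot_parallel = dot(u, parallel)
dot_perpendicular = dot(u, perpendicular)
```

For the parallel pair the inner product equals the product of the norms, and for the perpendicular pair it is zero, in line with the two special cases on the slide.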
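
Extension to a linearly non-separable case
- Idea: allow some flexibility on crossing the separating hyperplane

Extension to the linearly non-separable case
- Relax the constraints with variables $\xi_i \ge 0$:
  - $\mathbf{w}^T \mathbf{x}_i + w_0 \ge 1 - \xi_i$ for $y_i = +1$
  - $\mathbf{w}^T \mathbf{x}_i + w_0 \le -1 + \xi_i$ for $y_i = -1$
- An error occurs if $\xi_i \ge 1$; $\sum_{i=1}^{n} \xi_i$ is an upper bound on the number of errors
- Introduce a penalty for the errors:
  minimize $\|\mathbf{w}\|^2 / 2 + C \sum_{i=1}^{n} \xi_i$
  subject to the constraints above
- $C$ is set by the user; a larger $C$ leads to a larger penalty for an error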
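
For fixed $\mathbf{w}$ and $w_0$, the smallest slack that satisfies the relaxed constraint for example $i$ is $\xi_i = \max(0, 1 - y_i (\mathbf{w}^T \mathbf{x}_i + w_0))$. The sketch below uses that fact to evaluate the penalized objective; the weights, data and value of `C` are hypothetical, and no optimization over $\mathbf{w}$ is performed.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def slack(x, y, w, w0):
    """Smallest xi >= 0 with y * (w^T x + w0) >= 1 - xi."""
    return max(0.0, 1.0 - y * (dot(w, x) + w0))

def soft_margin_objective(w, w0, examples, C):
    """||w||^2 / 2 + C * sum of slacks."""
    total_slack = sum(slack(x, y, w, w0) for x, y in examples)
    return 0.5 * dot(w, w) + C * total_slack

w, w0, C = [1.0, 0.0], 0.0, 10.0   # hypothetical values
examples = [([2.0, 0.0], +1),      # outside the margin
            ([0.5, 0.0], +1),      # inside the margin, still correct
            ([-0.5, 0.0], +1),     # on the wrong side of the hyperplane
            ([-2.0, 0.0], -1)]     # outside the margin

slacks = [slack(x, y, w, w0) for x, y in examples]
errors = sum(1 for s in slacks if s >= 1.0)
objective = soft_margin_objective(w, w0, examples, C)
```

In this toy case one point counts as an error while the slacks sum to 2.0, consistent with the sum of slacks being an upper bound on the number of errors.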
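
Support vector machines: solution
- The solution of the linearly non-separable case has the same properties as the linearly separable case
  - The decision boundary is defined only by a set of support vectors (points that are on the margin or that cross the margin)
  - The decision boundary and the optimization can be expressed in terms of the inner product between pairs of examples
- Decision boundary:
  $\hat{\mathbf{w}}^T \mathbf{x} + w_0 = \sum_{i \in SV} \hat{\alpha}_i y_i (\mathbf{x}_i^T \mathbf{x}) + w_0 = 0$
- Classification decision:
  $\hat{y} = \text{sign}\left[ \sum_{i \in SV} \hat{\alpha}_i y_i (\mathbf{x}_i^T \mathbf{x}) + w_0 \right]$
- Optimization:
  $J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i^T \mathbf{x}_j)$

Nonlinear decision boundary
- So far we have seen how to learn a linear decision boundary
- But what if the linear decision boundary is not good?
- How can we learn non-linear decision boundaries with the SVM?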

Nonlinear decision boundary
- The non-linear case can be handled by using a set of features. Essentially we map input vectors to (larger) feature vectors:
  $\mathbf{x} \to \varphi(\mathbf{x})$
- Note that feature expansions are typically high dimensional
  - Examples: polynomial expansions
- Given the nonlinear feature mappings, we can use the linear SVM on the expanded feature vectors:
  $(\mathbf{x}^T \mathbf{x}') \to \varphi(\mathbf{x})^T \varphi(\mathbf{x}')$
- Kernel function:
  $K(\mathbf{x}, \mathbf{x}') = \varphi(\mathbf{x})^T \varphi(\mathbf{x}')$

Nonlinear case
- The linear case requires computing $(\mathbf{x}_i^T \mathbf{x}')$
- The non-linear case is handled by the feature mapping $\mathbf{x} \to \varphi(\mathbf{x})$, so the SVM formalism is applied to feature vectors:
  $\varphi(\mathbf{x})^T \varphi(\mathbf{x}')$
- Kernel function:
  $K(\mathbf{x}, \mathbf{x}') = \varphi(\mathbf{x})^T \varphi(\mathbf{x}')$
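
Support vector machines: solution for nonlinear decision boundaries
- The decision boundary:
  $\sum_{i \in SV} \hat{\alpha}_i y_i K(\mathbf{x}_i, \mathbf{x}) + w_0 = 0$
- Classification:
  $\hat{y} = \text{sign}\left[ \sum_{i \in SV} \hat{\alpha}_i y_i K(\mathbf{x}_i, \mathbf{x}) + w_0 \right]$
- The decision on a new $\mathbf{x}$ requires computing the kernel function, which defines the similarity between the examples
- Similarly, the optimization depends on the kernel:
  $J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$

Kernel trick
- The non-linear case maps input vectors to a (larger) feature space:
  $\mathbf{x} \to \varphi(\mathbf{x})$
- Note that feature expansions are typically high dimensional
  - Examples: polynomial expansions
- The kernel function defines the inner product in the expanded high dimensional feature space and lets us use the SVM:
  $(\mathbf{x}^T \mathbf{x}') \to K(\mathbf{x}, \mathbf{x}') = \varphi(\mathbf{x})^T \varphi(\mathbf{x}')$
- Problem: after the expansion we need to perform inner products in a very high dimensional space
- Kernel trick:
  - If we choose the kernel function wisely, we can compute the linear separation in the high dimensional feature space implicitly by working in the original input space!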
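
A small sketch of the kernelized decision rule on the XOR problem, which no linear boundary in the input space can solve. It assumes the quadratic kernel $K(\mathbf{x}, \mathbf{x}') = (1 + \mathbf{x}^T \mathbf{x}')^2$, multipliers $\hat{\alpha}_i = 1/8$ for all four points and $w_0 = 0$. These values are plugged in by hand (they are the commonly quoted textbook solution for this toy problem), not computed by an optimizer, and all names are illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def quadratic_kernel(x, z):
    """K(x, z) = (1 + x^T z)^2."""
    return (1.0 + dot(x, z)) ** 2

def decision_value(x, support, w0, kernel):
    """sum over support vectors of alpha_i * y_i * K(x_i, x), plus w0."""
    return sum(alpha * y * kernel(xi, x) for alpha, y, xi in support) + w0

def classify(x, support, w0, kernel):
    """Sign of the decision value (ties go to +1 in this sketch)."""
    return 1 if decision_value(x, support, w0, kernel) >= 0 else -1

# XOR-style data: the label is +1 when the two coordinates differ in sign.
support = [(0.125, -1, [-1.0, -1.0]), (0.125, +1, [-1.0, 1.0]),
           (0.125, +1, [1.0, -1.0]), (0.125, -1, [1.0, 1.0])]
w0 = 0.0

predictions = [classify(xi, support, w0, quadratic_kernel)
               for _, _, xi in support]
value = decision_value([0.5, -2.0], support, w0, quadratic_kernel)
```

With these multipliers the decision value works out to $-x_1 x_2$, a non-linear boundary in the input space obtained without ever building the feature vectors.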
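
Kernel function example
- Assume $\mathbf{x} = [x_1, x_2]^T$ and a feature mapping that maps the input into a quadratic feature set:
  $\mathbf{x} \to \varphi(\mathbf{x}) = [x_1^2, \; x_2^2, \; \sqrt{2} x_1 x_2, \; \sqrt{2} x_1, \; \sqrt{2} x_2, \; 1]^T$
- Kernel function for the feature space:
  $K(\mathbf{x}', \mathbf{x}) = \varphi(\mathbf{x}')^T \varphi(\mathbf{x})$
  $= x_1^2 x_1'^2 + x_2^2 x_2'^2 + 2 x_1 x_2 x_1' x_2' + 2 x_1 x_1' + 2 x_2 x_2' + 1$
  $= (x_1 x_1' + x_2 x_2' + 1)^2$
  $= (1 + \mathbf{x}^T \mathbf{x}')^2$
- The computation of the linear separation in the higher dimensional space is performed implicitly in the original input space

Kernel function example
- [Figure: a linear separator in the feature space corresponds to a non-linear separator in the input space]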
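
The identity $\varphi(\mathbf{x}')^T \varphi(\mathbf{x}) = (1 + \mathbf{x}^T \mathbf{x}')^2$ can be verified numerically. The sketch below computes both sides for one pair of hypothetical input vectors; the function names are illustrative.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Quadratic feature map for a 2-dimensional input [x1, x2]."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0]

def quadratic_kernel(x, z):
    """K(x, z) = (1 + x^T z)^2, evaluated in the 2-dimensional input space."""
    return (1.0 + dot(x, z)) ** 2

x = [1.0, 2.0]    # hypothetical inputs
z = [3.0, -1.0]

explicit = dot(phi(x), phi(z))    # inner product of 6-dimensional features
implicit = quadratic_kernel(x, z)  # same value from 2-dimensional inputs
```

The explicit route needs six multiplications per pair after building the features, while the kernel needs only the two-dimensional inner product; the gap grows quickly with the input dimension and the polynomial degree.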
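
Kernel functions
- Linear kernel:
  $K(\mathbf{x}, \mathbf{x}') = \mathbf{x}^T \mathbf{x}'$
- Polynomial kernel:
  $K(\mathbf{x}, \mathbf{x}') = [1 + \mathbf{x}^T \mathbf{x}']^k$
- Radial basis kernel:
  $K(\mathbf{x}, \mathbf{x}') = \exp\left[ -\frac{1}{2} \|\mathbf{x} - \mathbf{x}'\|^2 \right]$

Kernels
- Kernel: a similarity between pairs of objects
- Kernels can be defined for more complex objects:
  - Strings
  - Graphs
  - Images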
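
The three kernels can be written as plain functions. This is a minimal sketch: the function names are illustrative, and the radial basis kernel takes a width parameter `gamma` (an assumption of this sketch) whose default of 0.5 corresponds to a factor of $\frac{1}{2}$ in the exponent.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(x, z):
    """K(x, z) = x^T z."""
    return dot(x, z)

def polynomial_kernel(x, z, k=2):
    """K(x, z) = (1 + x^T z)^k."""
    return (1.0 + dot(x, z)) ** k

def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2); gamma is an assumed parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x = [1.0, 2.0]    # hypothetical inputs
z = [3.0, -1.0]

k_lin = linear_kernel(x, z)
k_poly2 = polynomial_kernel(x, z, k=2)
k_poly3 = polynomial_kernel(x, z, k=3)
k_rbf_same = rbf_kernel(x, x)
```

Any of these functions can be passed where a kernel is expected, since each takes two inputs and returns a single similarity value; the radial basis kernel is largest (equal to 1) when the two inputs coincide.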