An Introduction to Support Vector Machine


Support Vector Machine (SVM): a classifier derived from statistical learning theory by Vapnik et al. in 1992. SVM became famous when, using images as input, it gave accuracy comparable to neural networks with hand-designed features on a handwriting recognition task. Currently, SVM is widely used in object detection & recognition, content-based image retrieval, text recognition, biometrics, speech recognition, etc. It is also used for regression.

Outline: Linear Discriminant Function; Large Margin Linear Classifier; Nonlinear SVM: The Kernel Trick.

Linear Discriminant Function: $g(\mathbf{x})$ is a linear function, $g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$, i.e. a hyperplane in the feature space: $\mathbf{w}^T\mathbf{x} + b = 0$. The (unit-length) normal vector of the hyperplane is $\mathbf{n} = \mathbf{w}/\|\mathbf{w}\|$. Points with $\mathbf{w}^T\mathbf{x} + b > 0$ fall on one side of the hyperplane, points with $\mathbf{w}^T\mathbf{x} + b < 0$ on the other.
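To make the decision rule concrete, here is a minimal NumPy sketch that evaluates $g(\mathbf{x})$ and classifies by its sign; the weight vector and bias are illustrative values, not taken from the slides:

```python
import numpy as np

# Illustrative parameters (not learned from data).
w = np.array([2.0, -1.0])   # normal vector of the hyperplane
b = -0.5                    # bias / offset

def g(x):
    """Linear discriminant function g(x) = w^T x + b."""
    return w @ x + b

def classify(x):
    """Predict +1 if g(x) > 0, else -1."""
    return 1 if g(x) > 0 else -1

x = np.array([1.0, 0.5])
print(g(x), classify(x))    # the sign of g(x) gives the class
```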

Linear Discriminant Function: How would you classify these points using a linear discriminant function in order to minimize the error rate? (In the figure, one symbol denotes the class +1, the other denotes -1.) There are an infinite number of answers! Which one is the best? [Figure, repeated over several slides: a 2-D dataset with the two classes and various candidate separating lines.]

Large Margin Linear Classifier: The linear discriminant function (classifier) with the maximum margin is the best. The margin is defined as the width by which the boundary can be increased before hitting a data point. Why is it the best? It is robust to outliers and thus has strong generalization ability, and it is good according to PAC (Probably Approximately Correct) theory. [Figure: the separating hyperplane with its margin forming a "safe zone" between the +1 and -1 classes.]

Maximum Margin Classification: The distance from a point $\mathbf{x}$ to the hyperplane is $r = \frac{\mathbf{w}^T\mathbf{x} + b}{\|\mathbf{w}\|}$. The examples closest to the hyperplane are the support vectors. The margin $M$ of the classifier is the distance between the support vectors on the two sides. Only the support vectors matter; the other training points are ignorable. [Figure: support vectors $\mathbf{x}^+$ and $\mathbf{x}^-$ on the margin boundaries, with margin width $M$.]
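As a quick numeric check of the distance formula, here is a small sketch; $\mathbf{w}$, $b$, and the test point are illustrative values chosen so the arithmetic is easy to verify by hand:

```python
import numpy as np

w = np.array([3.0, 4.0])    # ||w|| = 5
b = -5.0

def distance_to_hyperplane(x):
    """Signed distance r = (w^T x + b) / ||w|| from the slide."""
    return (w @ x + b) / np.linalg.norm(w)

print(distance_to_hyperplane(np.array([3.0, 4.0])))  # (9 + 16 - 5) / 5 = 4.0
```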

Large Margin Linear Classifier: Given a set of data points $\{(\mathbf{x}_i, y_i)\},\ i = 1, 2, \ldots, n$, with $y_i \in \{+1, -1\}$, we require $\mathbf{w}^T\mathbf{x}_i + b \ge M/2$ if $y_i = +1$ and $\mathbf{w}^T\mathbf{x}_i + b \le -M/2$ if $y_i = -1$. With a scale transformation on both $\mathbf{w}$ and $b$, the above is equivalent to: for $y_i = +1$, $\mathbf{w}^T\mathbf{x}_i + b \ge 1$; for $y_i = -1$, $\mathbf{w}^T\mathbf{x}_i + b \le -1$.

Large Margin Linear Classifier: We know that $\mathbf{w}^T\mathbf{x}^+ + b = 1$ and $\mathbf{w}^T\mathbf{x}^- + b = -1$ for support vectors $\mathbf{x}^+$ and $\mathbf{x}^-$ on the two margin boundaries. Thus $\mathbf{w}^T(\mathbf{x}^+ - \mathbf{x}^-) = 2$, and the margin width is $M = (\mathbf{x}^+ - \mathbf{x}^-) \cdot \mathbf{n} = (\mathbf{x}^+ - \mathbf{x}^-) \cdot \frac{\mathbf{w}}{\|\mathbf{w}\|} = \frac{2}{\|\mathbf{w}\|}$.

Large Margin Linear Classifier, formulation: maximize the margin $\frac{2}{\|\mathbf{w}\|}$ such that, for $y_i = +1$, $\mathbf{w}^T\mathbf{x}_i + b \ge 1$, and for $y_i = -1$, $\mathbf{w}^T\mathbf{x}_i + b \le -1$.

Equivalently (maximizing $2/\|\mathbf{w}\|$ is the same as minimizing $\frac{1}{2}\|\mathbf{w}\|^2$): minimize $\frac{1}{2}\|\mathbf{w}\|^2$ subject to the same constraints.

Finally, the two constraints can be combined into one: minimize $\frac{1}{2}\|\mathbf{w}\|^2$ such that $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$.

Solving the Optimization Problem: This is quadratic programming with linear constraints: minimize $\frac{1}{2}\|\mathbf{w}\|^2$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$. Introducing a Lagrange multiplier $\alpha_i$ for each constraint gives the Lagrangian function: minimize $L_p(\mathbf{w}, b, \alpha_i) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \right)$ s.t. $\alpha_i \ge 0$.

Solving the Optimization Problem: minimize $L_p(\mathbf{w}, b, \alpha_i) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \right)$ s.t. $\alpha_i \ge 0$. Setting $\frac{\partial L_p}{\partial b} = 0$ gives $\sum_{i=1}^{n} \alpha_i y_i = 0$; setting $\frac{\partial L_p}{\partial \mathbf{w}} = 0$ gives $\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i$.

Solving the Optimization Problem: Substituting these back into $L_p$ yields the Lagrangian dual problem: maximize $\sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \,\mathbf{x}_i^T\mathbf{x}_j$ s.t. $\alpha_i \ge 0$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$.
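This dual is a standard quadratic program, so any off-the-shelf QP solver can handle it. Below is a sketch using the cvxopt package (an assumption; the slides only say "many software packages available"), with a made-up four-point toy dataset. cvxopt minimizes $\frac{1}{2}\boldsymbol{\alpha}^T P \boldsymbol{\alpha} + \mathbf{q}^T\boldsymbol{\alpha}$ subject to $G\boldsymbol{\alpha} \le \mathbf{h}$ and $A\boldsymbol{\alpha} = \mathbf{b}$, so we negate the slide's maximization:

```python
import numpy as np
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False

# Toy, linearly separable data (illustrative only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

P = matrix(np.outer(y, y) * (X @ X.T))   # P_ij = y_i y_j x_i . x_j
q = matrix(-np.ones(n))                  # negated sum of alpha_i
G = matrix(-np.eye(n))                   # -alpha_i <= 0, i.e. alpha_i >= 0
h = matrix(np.zeros(n))
A = matrix(y.reshape(1, -1))             # sum_i alpha_i y_i = 0
b = matrix(0.0)

alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

# Recover w and b from the support vectors (alpha_i > 0).
sv = alpha > 1e-6                        # illustrative tolerance
w = ((alpha * y)[:, None] * X).sum(axis=0)
b0 = y[sv][0] - w @ X[sv][0]             # from y_k (w . x_k + b) = 1
print(alpha, w, b0)
```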

Solving the Optimization Problem: From the KKT (Karush-Kuhn-Tucker) condition, we know that $\alpha_i \left( y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \right) = 0$. Thus, only support vectors have $\alpha_i \neq 0$, and the solution has the form $\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i = \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i$. We get $b$ from $y_k(\mathbf{w}^T\mathbf{x}_k + b) - 1 = 0$, where $\mathbf{x}_k$ is any support vector; thus $b = y_k - \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i^T\mathbf{x}_k$ for any $\alpha_k > 0$.

Solving the Optimization Problem: The linear discriminant function is $g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i^T\mathbf{x} + b$. That is, there is no need to compute $\mathbf{w}$ explicitly for classification. Notice that it relies on a dot product between the test point $\mathbf{x}$ and the support vectors $\mathbf{x}_i$. Also keep in mind that solving the optimization problem involved computing the dot products $\mathbf{x}_i^T\mathbf{x}_j$ between all pairs of training points.

Large Margin Linear Classifier: What if the data is not linearly separable (noisy data, outliers, etc.)? Slack variables $\xi_i$ can be added to allow misclassification of difficult or noisy data points. [Figure: points on the wrong side of the margin, with slacks $\xi_1$ and $\xi_2$.]

Large Margin Linear Classifier, formulation: minimize $\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}\xi_i$ such that $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$. The parameter $C$ can be viewed as a way to control over-fitting: it trades off the relative importance of maximizing the margin and fitting the training data. For large values of $C$, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly. Conversely, a very small value of $C$ will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.
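The effect of $C$ is easy to see empirically. A brief scikit-learn sketch (assumed package; the data and the $C$ values are illustrative) showing how the number of support vectors and training accuracy move as $C$ varies:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two overlapping blobs, so some slack is unavoidable.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # Larger C tolerates fewer margin violations (narrower margin,
    # usually fewer support vectors); smaller C gives a wider margin.
    print(C, clf.n_support_, clf.score(X, y))
```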

Solving the Optimization Problem, formulation (Lagrangian dual problem): maximize $\sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \,\mathbf{x}_i^T\mathbf{x}_j$ such that $0 \le \alpha_i \le C$ and $\sum_{i=1}^{n}\alpha_i y_i = 0$.

Solving the Optimization Problem: Again, the $\mathbf{x}_i$ with non-zero $\alpha_i$ will be the support vectors. The solution to the dual problem is $\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i = \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i$, with $b = y_k(1 - \xi_k) - \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i^T\mathbf{x}_k$ for any $k$ such that $\alpha_k > 0$. Again, we don't need to compute $\mathbf{w}$ explicitly for classification: $g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i^T\mathbf{x} + b$.

Non-linear SVMs: Datasets that are linearly separable with some noise work out great. But what are we going to do if the dataset is just too hard? How about mapping the data to a higher-dimensional space? [Figure: a one-dimensional dataset on the $x$-axis; an easy separable case, a hard non-separable case, and the hard case made separable after mapping to a higher-dimensional space, e.g. $x \mapsto (x, x^2)$.]

Non-linear SVMs: Feature Space. General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable: $\Phi: \mathbf{x} \mapsto \varphi(\mathbf{x})$.

Nonlinear SVMs: The Kernel Trick. With this mapping, our discriminant function is now $g(\mathbf{x}) = \mathbf{w}^T\varphi(\mathbf{x}) + b = \sum_{i \in SV} \alpha_i y_i \varphi(\mathbf{x}_i)^T\varphi(\mathbf{x}) + b$. There is no need to know this mapping explicitly, because we only use the dot product of feature vectors in both training and testing. A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space: $K(\mathbf{x}_i, \mathbf{x}_j) \equiv \varphi(\mathbf{x}_i)^T\varphi(\mathbf{x}_j)$.

Nonlinear SVMs: The Kernel Trick. An example: for 2-dimensional vectors $\mathbf{x} = [x_1\ x_2]$, let $K(\mathbf{x}_i, \mathbf{x}_j) = (1 + \mathbf{x}_i^T\mathbf{x}_j)^2$. We need to show that $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i)^T\varphi(\mathbf{x}_j)$:
$K(\mathbf{x}_i, \mathbf{x}_j) = (1 + \mathbf{x}_i^T\mathbf{x}_j)^2 = 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2}$
$= [1\ \ x_{i1}^2\ \ \sqrt{2}\,x_{i1}x_{i2}\ \ x_{i2}^2\ \ \sqrt{2}\,x_{i1}\ \ \sqrt{2}\,x_{i2}]^T\,[1\ \ x_{j1}^2\ \ \sqrt{2}\,x_{j1}x_{j2}\ \ x_{j2}^2\ \ \sqrt{2}\,x_{j1}\ \ \sqrt{2}\,x_{j2}] = \varphi(\mathbf{x}_i)^T\varphi(\mathbf{x}_j)$,
where $\varphi(\mathbf{x}) = [1\ \ x_1^2\ \ \sqrt{2}\,x_1x_2\ \ x_2^2\ \ \sqrt{2}\,x_1\ \ \sqrt{2}\,x_2]$. This slide is courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt
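The algebra above can be checked numerically. In this sketch, `K` and `phi` follow the slide's definitions, and the test vectors are arbitrary:

```python
import numpy as np

def K(x, z):
    """Polynomial kernel (1 + x.z)^2 from the slide."""
    return (1 + x @ z) ** 2

def phi(x):
    """Explicit feature map phi(x) = [1, x1^2, sqrt(2) x1 x2, x2^2,
    sqrt(2) x1, sqrt(2) x2]."""
    x1, x2 = x
    s = np.sqrt(2)
    return np.array([1, x1**2, s * x1 * x2, x2**2, s * x1, s * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(K(x, z), phi(x) @ phi(z))   # both print 4.0
```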

Nonlinear SVMs: The Kernel Trick. Examples of commonly used kernel functions: Linear kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T\mathbf{x}_j$. Polynomial kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = (1 + \mathbf{x}_i^T\mathbf{x}_j)^p$. Gaussian (Radial Basis Function, RBF) kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right)$. Sigmoid kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\beta_0\,\mathbf{x}_i^T\mathbf{x}_j + \beta_1)$. Mercer's theorem: every semi-positive definite symmetric function is a kernel.
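For reference, the four kernels can be written directly as functions; the default values of `p`, `sigma`, `beta0`, and `beta1` below are illustrative, not prescribed by the slides:

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def polynomial_kernel(xi, xj, p=2):
    return (1 + xi @ xj) ** p

def rbf_kernel(xi, xj, sigma=1.0):
    # Gaussian kernel exp(-||xi - xj||^2 / (2 sigma^2)) from the slide.
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

def sigmoid_kernel(xi, xj, beta0=1.0, beta1=-1.0):
    return np.tanh(beta0 * (xi @ xj) + beta1)
```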

Nonlinear SVM: Optimization. Formulation (Lagrangian dual problem): maximize $\sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$ such that $0 \le \alpha_i \le C$ and $\sum_{i=1}^{n}\alpha_i y_i = 0$. The solution for the discriminant function is $g(\mathbf{x}) = \sum_{i \in SV} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$. The optimization technique is the same.

Support Vector Machine: Algorithm
1. Choose a kernel function.
2. Choose a value for C.
3. Solve the quadratic programming problem (many software packages are available; see the sketch below).
4. Construct the discriminant function from the support vectors.
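As a concrete instance of these four steps, here is a sketch using scikit-learn's SVC, one of the "many software packages available"; the dataset and parameter values are illustrative:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

clf = SVC(kernel='rbf',   # step 1: choose a kernel function
          C=1.0,          # step 2: choose a value for C
          gamma=0.5)
clf.fit(X, y)             # step 3: solve the QP (done internally)

# Step 4: the discriminant g(x) is built from the support vectors;
# decision_function returns g(x) for the given points.
print(len(clf.support_vectors_), clf.decision_function(X[:3]))
```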

Some Issues. Choice of kernel: the Gaussian or polynomial kernel is the default; if these prove ineffective, more elaborate kernels are needed; domain experts can give assistance in formulating appropriate similarity measures. Choice of kernel parameters: e.g. $\sigma$ in the Gaussian kernel; $\sigma$ is roughly the distance between the closest points with different classifications; in the absence of reliable criteria, applications rely on a validation set or cross-validation to set such parameters (see the sketch below). Optimization criterion, hard margin vs. soft margin: typically a lengthy series of experiments in which various parameters are tested. This slide is courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt
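Here is a sketch of the cross-validation approach the slide recommends, using scikit-learn's GridSearchCV (assumed package; the grid values are illustrative). Note that scikit-learn parameterizes the Gaussian kernel as $\exp(-\gamma\|\mathbf{x}_i - \mathbf{x}_j\|^2)$, so $\gamma = 1/(2\sigma^2)$:

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Search over C and the RBF width by 5-fold cross-validation.
grid = GridSearchCV(SVC(kernel='rbf'),
                    param_grid={'C': [0.1, 1, 10, 100],
                                'gamma': [0.01, 0.1, 1, 10]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```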

Summary: Support Vector Machine
1. Large Margin Classifier: better generalization ability and less over-fitting.
2. The Kernel Trick: map data points to a higher-dimensional space in order to make them linearly separable; since only the dot product is used, we do not need to represent the mapping explicitly.

Demo of LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/

References on SVM and Stock Prediction:
http://www.svms.org/finance/HuangNakamoriWang2005.pdf
http://cs229.stanford.edu/proj2012/ShenJiangZhang-StockMarketForecastingusingMachineLearningAlgorithms.pdf
http://research.ijcaonline.org/volume4/number3/pxc3877555.pdf
and other references online.