An Introduction to Support Vector Machine

Barrie Kennedy
5 years ago

1 An Introduction to Support Vector Machine

2 Support Vector Machine (SVM)
- A classifier derived from statistical learning theory by Vapnik et al. in 1992.
- SVM became famous when, using images as input, it gave accuracy comparable to a neural network with hand-designed features on a handwriting recognition task.
- Currently, SVM is widely used in object detection and recognition, content-based image retrieval, text recognition, biometrics, speech recognition, etc.
- Also used for regression.

3 Outline
- Linear Discriminant Function
- Large Margin Linear Classifier
- Nonlinear SVM: The Kernel Trick

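4 Linear Discriminant Function
- $g(\mathbf{x})$ is a linear function: $g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$
- The decision boundary $\mathbf{w}^T\mathbf{x} + b = 0$ is a hyperplane in the feature space, with $\mathbf{w}^T\mathbf{x} + b > 0$ on one side and $\mathbf{w}^T\mathbf{x} + b < 0$ on the other.
- (Unit-length) normal vector of the hyperplane: $\mathbf{n} = \mathbf{w} / \|\mathbf{w}\|$
[Figure: hyperplane and its normal vector $\mathbf{n}$ in the $(x_1, x_2)$ plane]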
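
As a minimal sketch of the definitions on this slide, the snippet below evaluates $g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$, classifies by its sign, and computes the unit normal $\mathbf{w}/\|\mathbf{w}\|$. The function names and the numeric values of `w` and `b` are illustrative assumptions, not taken from the slides.

```python
import math

def g(w, x, b):
    """Linear discriminant g(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, x, b):
    """Return +1 if g(x) > 0, otherwise -1 (ties go to -1, an arbitrary choice)."""
    return 1 if g(w, x, b) > 0 else -1

def unit_normal(w):
    """Unit-length normal vector n = w / ||w|| of the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]

# Illustrative hyperplane (assumed values): 3*x1 + 4*x2 - 5 = 0
w, b = [3.0, 4.0], -5.0
print(classify(w, [2.0, 2.0], b))  # g = 6 + 8 - 5 = 9, positive side
print(classify(w, [0.0, 0.0], b))  # g = -5, negative side
print(unit_normal(w))
```

Only the sign of $g(\mathbf{x})$ matters for the class label, which is why rescaling `w` and `b` together leaves the classifier unchanged.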

5 Linear Discriminant Function
- How would you classify these points using a linear discriminant function in order to minimize the error rate?
- Infinite number of answers!
[Figure: two classes of points, denoted +1 and -1, in the $(x_1, x_2)$ plane]

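8 Linear Discriminant Function
- How would you classify these points using a linear discriminant function in order to minimize the error rate?
- Infinite number of answers!
- Which one is the best?
[Figure: two classes of points, denoted +1 and -1, in the $(x_1, x_2)$ plane]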

9 Large Margin Linear Classifier
- The linear discriminant function (classifier) with the maximum margin is the best.
- Margin is defined as the width that the boundary could be increased by before hitting a data point.
- Why is it the best?
  - Robust to outliers and thus strong generalization ability
  - Good according to PAC (Probably Approximately Correct) theory
[Figure: classes +1 and -1 in the $(x_1, x_2)$ plane, with the margin shown as a "safe zone" around the boundary]

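10 Maximum Margin Classification
- Distance from point $\mathbf{x}_i$ to the hyperplane is: $r = \dfrac{\mathbf{w}^T\mathbf{x}_i + b}{\|\mathbf{w}\|}$
- Examples closest to the hyperplane are support vectors.
- Margin $M$ of the classifier is the distance between support vectors on both sides.
- Only support vectors matter; other training points are ignorable.
[Figure: support vectors $\mathbf{x}^+$ and $\mathbf{x}^-$ on the margin boundaries, margin $M$, in the $(x_1, x_2)$ plane]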
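
The distance formula $r = (\mathbf{w}^T\mathbf{x}_i + b)/\|\mathbf{w}\|$ can be sketched as below. The hyperplane and the three points are assumed values chosen for easy arithmetic; the point with the smallest absolute distance is the one that would act as a support vector candidate.

```python
import math

def signed_distance(w, b, x):
    """Signed distance r = (w^T x + b) / ||w|| from point x to the hyperplane."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return (dot + b) / math.sqrt(sum(wi * wi for wi in w))

# Assumed hyperplane 3*x1 + 4*x2 - 5 = 0 and assumed sample points.
w, b = [3.0, 4.0], -5.0
points = [[3.0, 4.0], [1.0, 3.0], [0.0, 0.0]]

for p in points:
    print(p, signed_distance(w, b, p))

# The point closest to the hyperplane (smallest |r|).
closest = min(points, key=lambda p: abs(signed_distance(w, b, p)))
print("closest:", closest)
```

The sign of $r$ tells which side of the hyperplane the point lies on; its magnitude is the geometric distance.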

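11 Large Margin Linear Classifier
- Given a set of data points $\{(\mathbf{x}_i, y_i)\}$, $i = 1, 2, \ldots, n$, where
  $$\mathbf{w}^T\mathbf{x}_i + b \ge M/2 \text{ if } y_i = +1, \qquad \mathbf{w}^T\mathbf{x}_i + b \le -M/2 \text{ if } y_i = -1$$
- With a scale transformation on both $\mathbf{w}$ and $b$, the above is equivalent to
  $$\text{for } y_i = +1:\ \mathbf{w}^T\mathbf{x}_i + b \ge 1, \qquad \text{for } y_i = -1:\ \mathbf{w}^T\mathbf{x}_i + b \le -1$$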

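12 Large Margin Linear Classifier
- We know that
  $$\mathbf{w}^T\mathbf{x}^+ + b = 1, \qquad \mathbf{w}^T\mathbf{x}^- + b = -1$$
- Thus $\mathbf{w}^T(\mathbf{x}^+ - \mathbf{x}^-) = 2$
- The margin width is:
  $$M = (\mathbf{x}^+ - \mathbf{x}^-) \cdot \mathbf{n} = (\mathbf{x}^+ - \mathbf{x}^-) \cdot \frac{\mathbf{w}}{\|\mathbf{w}\|} = \frac{2}{\|\mathbf{w}\|}$$
[Figure: support vectors $\mathbf{x}^+$ and $\mathbf{x}^-$ on the margin boundaries, margin $M$, in the $(x_1, x_2)$ plane]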
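
A quick numeric check of the margin formula, under assumed values: the hyperplane `w`, `b` and the two support vectors below are chosen so that $\mathbf{w}^T\mathbf{x}^+ + b = 1$ and $\mathbf{w}^T\mathbf{x}^- + b = -1$. Projecting $\mathbf{x}^+ - \mathbf{x}^-$ onto the unit normal should then give $2/\|\mathbf{w}\|$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Assumed canonical hyperplane and one support vector on each side.
w, b = [1.0, 1.0], -3.0
x_plus, x_minus = [2.0, 2.0], [1.0, 1.0]

# Both points sit exactly on the margin boundaries.
print(dot(w, x_plus) + b, dot(w, x_minus) + b)

norm_w = math.sqrt(dot(w, w))
diff = [p - m for p, m in zip(x_plus, x_minus)]

margin_by_projection = dot(diff, w) / norm_w  # (x+ - x-) . w / ||w||
margin_by_formula = 2.0 / norm_w              # 2 / ||w||
print(margin_by_projection, margin_by_formula)
```

Because the margin is $2/\|\mathbf{w}\|$, making the margin larger is the same as making $\|\mathbf{w}\|$ smaller.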

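13 Large Margin Linear Classifier
- Formulation:
  $$\text{maximize } \frac{2}{\|\mathbf{w}\|}$$
  such that
  $$\text{for } y_i = +1:\ \mathbf{w}^T\mathbf{x}_i + b \ge 1, \qquad \text{for } y_i = -1:\ \mathbf{w}^T\mathbf{x}_i + b \le -1$$

14 Large Margin Linear Classifier
- Formulation:
  $$\text{minimize } \frac{1}{2}\|\mathbf{w}\|^2$$
  such that
  $$\text{for } y_i = +1:\ \mathbf{w}^T\mathbf{x}_i + b \ge 1, \qquad \text{for } y_i = -1:\ \mathbf{w}^T\mathbf{x}_i + b \le -1$$

15 Large Margin Linear Classifier
- Formulation:
  $$\text{minimize } \frac{1}{2}\|\mathbf{w}\|^2$$
  such that
  $$y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$$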

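16 Solving the Optimization Problem
- Quadratic programming with linear constraints:
  $$\text{minimize } \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t. } y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$$
- Lagrangian function:
  $$\text{minimize } L_p(\mathbf{w}, b, \alpha_i) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \right) \quad \text{s.t. } \alpha_i \ge 0$$

17 Solving the Optimization Problem
$$\text{minimize } L_p(\mathbf{w}, b, \alpha_i) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \right) \quad \text{s.t. } \alpha_i \ge 0$$
- $\dfrac{\partial L_p}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i$
- $\dfrac{\partial L_p}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0$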

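18 Solving the Optimization Problem
$$\text{minimize } L_p(\mathbf{w}, b, \alpha_i) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \right) \quad \text{s.t. } \alpha_i \ge 0$$
- Lagrangian dual problem:
  $$\text{maximize } \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j \quad \text{s.t. } \alpha_i \ge 0 \text{ and } \sum_{i=1}^{n} \alpha_i y_i = 0$$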
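
The step from $L_p$ to the dual objective is not spelled out on the slide. A sketch of the standard substitution follows, using the two stationarity conditions $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ and $\sum_i \alpha_i y_i = 0$ obtained by setting the partial derivatives of $L_p$ with respect to $\mathbf{w}$ and $b$ to zero.

```latex
\begin{aligned}
L_p &= \tfrac{1}{2}\mathbf{w}^T\mathbf{w}
      - \sum_{i=1}^{n} \alpha_i y_i \mathbf{w}^T\mathbf{x}_i
      - b \sum_{i=1}^{n} \alpha_i y_i
      + \sum_{i=1}^{n} \alpha_i \\
    &= \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j
      - \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j
      - b \cdot 0
      + \sum_{i=1}^{n} \alpha_i \\
    &= \sum_{i=1}^{n} \alpha_i
      - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j
\end{aligned}
```

The result depends on the training points only through the dot products $\mathbf{x}_i^T\mathbf{x}_j$, which is what later makes the kernel substitution possible.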

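19 Solving the Optimization Problem
- From the KKT (Karush-Kuhn-Tucker) condition, we know:
  $$\alpha_i \left( y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \right) = 0$$
- Thus, only support vectors have $\alpha_i \ne 0$.
- The solution has the form:
  $$\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i = \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i$$
- Get $b$ from $y_k(\mathbf{w}^T\mathbf{x}_k + b) - 1 = 0$, where $\mathbf{x}_k$ is any support vector. Thus
  $$b = y_k - \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i^T\mathbf{x}_k \quad \text{for any } \alpha_k > 0$$
[Figure: support vectors $\mathbf{x}^+$ and $\mathbf{x}^-$ in the $(x_1, x_2)$ plane]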
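
A minimal sketch of recovering $\mathbf{w}$ and $b$ from the dual variables. The three training points are an assumed toy set, and the `alpha` values are its dual solution worked out by hand (the two points nearest the boundary get $\alpha = 0.5$, the far point gets $\alpha = 0$); the helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Assumed toy data: two support vectors and one point far from the boundary.
X = [[1.0, 0.0], [-1.0, 0.0], [3.0, 1.0]]
y = [1, -1, 1]
alpha = [0.5, 0.5, 0.0]  # hand-derived dual solution for this toy set

def recover_w(X, y, alpha):
    """w = sum_i alpha_i * y_i * x_i (only alpha_i > 0 contribute)."""
    w = [0.0] * len(X[0])
    for xi, yi, ai in zip(X, y, alpha):
        if ai > 0:
            for d in range(len(w)):
                w[d] += ai * yi * xi[d]
    return w

def recover_b(X, y, alpha, k):
    """b = y_k - sum_i alpha_i * y_i * x_i^T x_k, for a support vector index k."""
    return y[k] - sum(ai * yi * dot(xi, X[k]) for xi, yi, ai in zip(X, y, alpha))

w = recover_w(X, y, alpha)
k = next(i for i, a in enumerate(alpha) if a > 0)  # any support vector
b = recover_b(X, y, alpha, k)
print(w, b)

# KKT complementary slackness: alpha_i * (y_i * (w^T x_i + b) - 1) = 0 for all i.
print([a * (yi * (dot(w, xi) + b) - 1.0) for xi, yi, a in zip(X, y, alpha)])
```

The third point has $y_i(\mathbf{w}^T\mathbf{x}_i + b) = 3 > 1$, so its multiplier is zero and it does not influence the hyperplane.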

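20 Solving the Optimization Problem
- The linear discriminant function is:
  $$g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i^T\mathbf{x} + b$$
- That is, there is no need to compute $\mathbf{w}$ explicitly for classification.
- Notice that it relies on a dot product between the test point $\mathbf{x}$ and the support vectors $\mathbf{x}_i$.
- Also keep in mind that solving the optimization problem involved computing the dot products $\mathbf{x}_i^T\mathbf{x}_j$ between all pairs of training points.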
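
To illustrate that $\mathbf{w}$ never has to be formed, the sketch below evaluates $g(\mathbf{x})$ twice: once from dot products with the support vectors, once from an explicit $\mathbf{w}$. The support vectors, multipliers, and bias are assumed toy values for which $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i = (1, 0)$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Assumed toy solution: two support vectors with their labels and multipliers.
support_vectors = [[1.0, 0.0], [-1.0, 0.0]]
sv_labels = [1, -1]
sv_alphas = [0.5, 0.5]
b = 0.0

def g_dual(x):
    """g(x) = sum over support vectors of alpha_i * y_i * x_i^T x, plus b."""
    return sum(a * yi * dot(xi, x)
               for a, yi, xi in zip(sv_alphas, sv_labels, support_vectors)) + b

# Explicit weight vector for comparison: w = sum_i alpha_i * y_i * x_i.
w = [1.0, 0.0]

def g_primal(x):
    """g(x) = w^T x + b."""
    return dot(w, x) + b

x_test = [2.0, 5.0]
print(g_dual(x_test), g_primal(x_test))
```

The dual form costs one dot product per support vector, which is the form that carries over unchanged once dot products are replaced by kernel evaluations.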

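21 Large Margin Linear Classifier
- What if the data is not linearly separable? (noisy data, outliers, etc.)
- Slack variables $\xi_i$ can be added to allow misclassification of difficult or noisy data points.
[Figure: classes +1 and -1 in the $(x_1, x_2)$ plane, with slack distances $\xi_1$, $\xi_2$ marked for points on the wrong side of their margin]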

22 Large Margin Linear Classifier
- Formulation:
  $$\text{minimize } \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n} \xi_i$$
  such that
  $$y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0$$
- Parameter $C$ can be viewed as a way to control over-fitting: it trades off the relative importance of maximizing the margin and fitting the training data.
- For large values of $C$, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly.
- Conversely, a very small value of $C$ will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.
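
The trade-off controlled by $C$ can be made concrete by evaluating the soft-margin objective for two candidate hyperplanes. Everything below is an assumed illustration: the 1-D data set, and the two hand-picked candidates `wide` (large margin, one point inside the margin) and `narrow` (small margin, zero slack). Neither is the output of an optimizer; the sketch only compares their objective values.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_margin_objective(w, b, X, y, C):
    """0.5*||w||^2 + C * sum_i xi_i, with xi_i = max(0, 1 - y_i*(w^T x_i + b))."""
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(slacks)

# Assumed 1-D data; the point 0.5 (class +1) sits close to the negative class.
X = [[3.0], [4.0], [0.5], [-3.0], [-4.0]]
y = [1, 1, 1, -1, -1]

# Hand-picked candidates (hypothetical, for comparison only).
wide = ([1.0 / 3.0], 0.0)            # margin boundaries at x = -3 and x = +3
narrow = ([4.0 / 7.0], 5.0 / 7.0)    # margin boundaries at x = -3 and x = 0.5

def preferred(C):
    """Name of the candidate with the lower objective value for this C."""
    obj_wide = soft_margin_objective(wide[0], wide[1], X, y, C)
    obj_narrow = soft_margin_objective(narrow[0], narrow[1], X, y, C)
    return "wide" if obj_wide < obj_narrow else "narrow"

for C in (0.01, 10.0):
    print(C, preferred(C))
```

With a small $C$ the slack of the awkward point is cheap, so the wide-margin candidate scores better; with a large $C$ the same slack dominates and the narrow-margin candidate wins.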

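23 Solving the Optimization Problem
- Formulation (Lagrangian dual problem):
  $$\text{maximize } \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j$$
  such that
  $$0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0$$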

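24 Solving the Optimization Problem
- Again, $\mathbf{x}_i$ with non-zero $\alpha_i$ will be support vectors.
- Solution to the dual problem is:
  $$\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i = \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i$$
  $$b = y_k(1 - \xi_k) - \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i^T\mathbf{x}_k \quad \text{for any } k \text{ s.t. } \alpha_k > 0$$
- Again, we don't need to compute $\mathbf{w}$ explicitly for classification:
  $$g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i^T\mathbf{x} + b$$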

25 Non-linear SVMs
- Datasets that are linearly separable with noise work out great.
- But what are we going to do if the dataset is just too hard?
- How about mapping the data to a higher-dimensional space?
[Figures: 1-D data on the $x$ axis; the same data plotted against $x$ and $x^2$]

26 Non-linear SVMs: Feature Space
- General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable:
  $$\Phi: \mathbf{x} \to \varphi(\mathbf{x})$$
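
A small sketch of the idea, on assumed 1-D data where the positive class lies on both sides of the negative class. No single threshold on $x$ separates the classes, but after the mapping $\varphi(x) = (x, x^2)$ a linear discriminant does. The data, the brute-force threshold check, and the chosen `w`, `b` are all illustrative assumptions.

```python
# Assumed 1-D data: class +1 on the outside, class -1 in the middle.
X = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [1, 1, -1, -1, -1, 1, 1]

def separable_by_threshold(xs, ys):
    """True if some threshold t and orientation s classify every 1-D point correctly."""
    pts = sorted(xs)
    thresholds = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    for t in thresholds:
        for s in (1, -1):
            if all((1 if s * (x - t) > 0 else -1) == yi for x, yi in zip(xs, ys)):
                return True
    return False

def phi(x):
    """Map a 1-D input to the 2-D feature space (x, x^2)."""
    return [x, x * x]

# A linear discriminant in feature space: g = 0*x + 1*x^2 - 2.5 (hand-picked).
w, b = [0.0, 1.0], -2.5

def g(x):
    f = phi(x)
    return w[0] * f[0] + w[1] * f[1] + b

print(separable_by_threshold(X, y))                            # linear in input space?
print(all((1 if g(x) > 0 else -1) == yi for x, yi in zip(X, y)))  # linear in feature space?
```

The boundary $x^2 = 2.5$ is a straight line in the $(x, x^2)$ plane but corresponds to two cut points in the original space.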

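27 Nonlinear SVMs: The Kernel Trick
- With this mapping, our discriminant function is now:
  $$g(\mathbf{x}) = \mathbf{w}^T\varphi(\mathbf{x}) + b = \sum_{i \in SV} \alpha_i y_i \varphi(\mathbf{x}_i)^T\varphi(\mathbf{x}) + b$$
- There is no need to know this mapping explicitly, because we only use the dot product of feature vectors in both training and testing.
- A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space:
  $$K(\mathbf{x}_i, \mathbf{x}_j) \equiv \varphi(\mathbf{x}_i)^T\varphi(\mathbf{x}_j)$$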

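28 Nonlinear SVMs: The Kernel Trick
- An example: 2-dimensional vectors $\mathbf{x} = [x_1\ x_2]$; let $K(\mathbf{x}_i, \mathbf{x}_j) = (1 + \mathbf{x}_i^T\mathbf{x}_j)^2$.
- Need to show that $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i)^T\varphi(\mathbf{x}_j)$:
  $$\begin{aligned}
  K(\mathbf{x}_i, \mathbf{x}_j) &= (1 + \mathbf{x}_i^T\mathbf{x}_j)^2 \\
  &= 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2} \\
  &= [1\ \ x_{i1}^2\ \ \sqrt{2}\,x_{i1}x_{i2}\ \ x_{i2}^2\ \ \sqrt{2}\,x_{i1}\ \ \sqrt{2}\,x_{i2}]^T\,[1\ \ x_{j1}^2\ \ \sqrt{2}\,x_{j1}x_{j2}\ \ x_{j2}^2\ \ \sqrt{2}\,x_{j1}\ \ \sqrt{2}\,x_{j2}] \\
  &= \varphi(\mathbf{x}_i)^T\varphi(\mathbf{x}_j)
  \end{aligned}$$
  where $\varphi(\mathbf{x}) = [1\ \ x_1^2\ \ \sqrt{2}\,x_1x_2\ \ x_2^2\ \ \sqrt{2}\,x_1\ \ \sqrt{2}\,x_2]^T$
This slide is courtesy of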
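
The identity on this slide can be checked numerically. The sketch below computes $(1 + \mathbf{x}_i^T\mathbf{x}_j)^2$ directly in the 2-D input space and compares it with the dot product of the explicit 6-D feature vectors; the two test vectors are arbitrary assumed values.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kernel(x, z):
    """K(x, z) = (1 + x^T z)^2, computed in the original 2-D input space."""
    return (1.0 + dot(x, z)) ** 2

def phi(x):
    """Explicit 6-D feature map whose dot product reproduces the kernel."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, x1 * x1, r2 * x1 * x2, x2 * x2, r2 * x1, r2 * x2]

# Arbitrary test vectors (assumed values).
xi, xj = [1.0, 2.0], [3.0, 4.0]
print(kernel(xi, xj))          # one 2-D dot product, then a square
print(dot(phi(xi), phi(xj)))   # one 6-D dot product
```

Both routes give the same number up to floating-point rounding, but the kernel route never builds the 6-D vectors.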

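29 Nonlinear SVMs: The Kernel Trick
- Examples of commonly used kernel functions:
  - Linear kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T\mathbf{x}_j$
  - Polynomial kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = (1 + \mathbf{x}_i^T\mathbf{x}_j)^p$
  - Gaussian (Radial Basis Function, RBF) kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\dfrac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right)$
  - Sigmoid: $K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\beta_0\,\mathbf{x}_i^T\mathbf{x}_j + \beta_1)$
- Mercer's theorem: every positive semi-definite symmetric function is a kernel.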
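
The four kernels listed on this slide, written as plain functions. This is a sketch: the function names and the default parameter values (`p`, `sigma`, `beta0`, `beta1`) are assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(x, z):
    """K(x, z) = x^T z."""
    return dot(x, z)

def polynomial_kernel(x, z, p=2):
    """K(x, z) = (1 + x^T z)^p."""
    return (1.0 + dot(x, z)) ** p

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sigmoid_kernel(x, z, beta0=1.0, beta1=0.0):
    """K(x, z) = tanh(beta0 * x^T z + beta1)."""
    return math.tanh(beta0 * dot(x, z) + beta1)

x, z = [1.0, 2.0], [3.0, 4.0]
for k in (linear_kernel, polynomial_kernel, gaussian_kernel, sigmoid_kernel):
    print(k.__name__, k(x, z))
```

Each function takes two input vectors and returns a scalar similarity, so any of them can stand in wherever the dual problem uses $\mathbf{x}_i^T\mathbf{x}_j$.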

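30 Nonlinear SVM: Optimization
- Formulation (Lagrangian dual problem):
  $$\text{maximize } \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$$
  such that
  $$0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0$$
- The solution of the discriminant function is
  $$g(\mathbf{x}) = \sum_{i \in SV} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$$
- The optimization technique is the same.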
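
A sketch of the kernelized discriminant $g(\mathbf{x}) = \sum_{i \in SV} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$ on an assumed 1-D toy set with points $-1, 0, +1$ and labels $+1, -1, +1$, using the kernel $(1 + \mathbf{x}^T\mathbf{z})^2$. The multipliers $(1, 2, 1)$ and $b = -1$ were worked out by hand for this toy set (they satisfy $\sum_i \alpha_i y_i = 0$ and put all three points exactly on the margin); they are not the output of a solver.

```python
def poly_kernel(x, z):
    """K(x, z) = (1 + x^T z)^2."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2

# Assumed toy problem, not linearly separable in 1-D.
support_vectors = [[-1.0], [0.0], [1.0]]
sv_labels = [1, -1, 1]
sv_alphas = [1.0, 2.0, 1.0]  # hand-derived dual solution for this toy set
b = -1.0

def g(x):
    """g(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors."""
    return sum(a * yi * poly_kernel(xi, x)
               for a, yi, xi in zip(sv_alphas, sv_labels, support_vectors)) + b

for x in ([-1.0], [0.0], [1.0], [0.5], [2.0]):
    print(x, g(x))
```

For these values $g(x)$ simplifies to $2x^2 - 1$, so the decision boundary in the input space is the pair of points $x = \pm 1/\sqrt{2}$.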

31 Support Vector Machine: Algorithm
1. Choose a kernel function
2. Choose a value for C
3. Solve the quadratic programming problem (many software packages available)
4. Construct the discriminant function from the support vectors
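
The four steps can be sketched end to end in a few lines. This is a simplified illustration, not how a production package solves the problem: step 3 is replaced by plain coordinate ascent on the dual, and the bias $b$ is dropped (the polynomial kernel already contains a constant feature), which removes the equality constraint $\sum_i \alpha_i y_i = 0$ and changes the $\alpha$ values relative to the formulation with $b$. The function names, the toy data, and the choice `C = 10.0` are assumptions.

```python
# Step 1: choose a kernel function (polynomial, degree 2).
def poly_kernel(x, z, p=2):
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** p

# Step 3 (simplified): coordinate ascent on the bias-free dual
#   maximize sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j K_ij
#   subject to 0 <= alpha_i <= C
def train_dual(X, y, kernel, C, sweeps=200):
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Partial derivative of the dual objective with respect to alpha_i.
            grad = 1.0 - sum(alpha[j] * y[i] * y[j] * K[i][j] for j in range(n))
            # Exact one-dimensional maximization, then clip to the box [0, C].
            alpha[i] = min(C, max(0.0, alpha[i] + grad / K[i][i]))
    return alpha

# Step 4: build the discriminant from the points with alpha_i > 0.
def make_discriminant(X, y, alpha, kernel, tol=1e-9):
    sv = [(a, yi, xi) for a, yi, xi in zip(alpha, y, X) if a > tol]
    return lambda x: sum(a * yi * kernel(xi, x) for a, yi, xi in sv)

# Assumed toy data: not linearly separable in 1-D.
X = [[-1.0], [0.0], [1.0]]
y = [1, -1, 1]

C = 10.0  # Step 2: choose a value for C
alpha = train_dual(X, y, poly_kernel, C)
g = make_discriminant(X, y, alpha, poly_kernel)

print(alpha)
print([1 if g(x) > 0 else -1 for x in X])  # predicted labels on the training set
```

In practice step 3 would be handed to an existing QP or SVM package; the loop here only shows that the kernel matrix, the labels, and $C$ are all the solver needs.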

32 Some Issues
Choice of kernel
- Gaussian or polynomial kernel is the default
- If ineffective, more elaborate kernels are needed
- Domain experts can give assistance in formulating appropriate similarity measures
Choice of kernel parameters
- e.g. $\sigma$ in the Gaussian kernel
- $\sigma$ is the distance between closest points with different classifications
- In the absence of reliable criteria, applications rely on the use of a validation set or cross-validation to set such parameters
Optimization criterion: hard margin vs. soft margin
- A lengthy series of experiments in which various parameters are tested
This slide is courtesy of

33 Summary: Support Vector Machine
1. Large Margin Classifier
- Better generalization ability and less over-fitting
2. The Kernel Trick
- Map data points to a higher-dimensional space in order to make them linearly separable.
- Since only the dot product is used, we do not need to represent the mapping explicitly.

34 Demo of LibSVM

35 References on SVM and Stock Prediction
- StockMarketForecastingusingMachineLearningAlgorithms.pdf
- pxc pdf
- and other references online
