Chapter 6 Support vector machine. Séparateurs à vaste marge

Rosalind Nichols
5 years ago
1 Chapter 6: Support vector machine (Séparateurs à vaste marge, i.e. large-margin separators)

2 A binary classification method based on learning. Introduced by Vladimir Vapnik in 1995. Relies on the existence of a linear classifier. Supervised learning. Efficient in terms of computation time and accuracy.

3 BASIC IDEA: Find a linear classifier (hyperplane) that separates the data of a plane into two categories: Class 1 (+) for the points with y > 0 and Class 2 (-) for the points with y < 0. Maximize the separation distance between these two classes.

4 Discriminant Function: $g(x)$ can be an arbitrary function of $x$, such as: Nearest Neighbor; Decision Tree; Linear Functions, $g(x) = w^T x + b$; Nonlinear Functions.

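5 Linear Discriminant Function: $g(x)$ is a linear function, $g(x) = w^T x + b$. The set $w^T x + b = 0$ is a hyperplane in the feature space, with $w^T x + b > 0$ on one side and $w^T x + b < 0$ on the other. The (unit-length) normal vector of the hyperplane is $n = w / \|w\|$. [Figure: the hyperplane and its normal vector in the $(x_1, x_2)$ plane.]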
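To make the slide above concrete, here is a minimal Python sketch of a linear discriminant, the sign-based classification rule, and the unit normal vector. The weight vector, bias and test points are illustrative values chosen for this example, not taken from the slides.

```python
import math

def g(w, b, x):
    """Linear discriminant g(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 on the positive side of the hyperplane, -1 otherwise."""
    return 1 if g(w, b, x) > 0 else -1

def unit_normal(w):
    """Unit-length normal vector n = w / ||w||."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]

# Illustrative hyperplane 3*x1 + 4*x2 - 5 = 0 (assumed values).
w, b = [3.0, 4.0], -5.0
print(classify(w, b, [2.0, 1.0]))  # g = 6 + 4 - 5 = 5, positive side
print(classify(w, b, [0.0, 0.0]))  # g = -5, negative side
print(unit_normal(w))              # w divided by its norm 5
```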

6 Linear Discriminant Function: How would you classify these points using a linear discriminant function in order to minimize the error rate? There is an infinite number of answers! [Figure: points labeled +1 and -1 in the $(x_1, x_2)$ plane.]

9 Linear Discriminant Function: Among the infinite number of linear discriminant functions that separate the +1 points from the -1 points, which one is the best?

10 Large Margin Linear Classifier: The linear discriminant function (classifier) with the maximum margin is the best. The margin is defined as the width by which the boundary could be increased before hitting a data point (the "safe zone"). Why is it the best? It is robust to outliers and thus has strong generalization ability. [Figure: points labeled +1 and -1 with the margin around the boundary in the $(x_1, x_2)$ plane.]

11 Large Margin Linear Classifier: Given a set of data points $\{(x_i, y_i)\}$, $i = 1, \dots, n$, where for $y_i = +1$, $w^T x_i + b > 0$, and for $y_i = -1$, $w^T x_i + b < 0$. With a scale transformation on both $w$ and $b$, the above is equivalent to: for $y_i = +1$, $w^T x_i + b \ge 1$; for $y_i = -1$, $w^T x_i + b \le -1$.
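The scale transformation mentioned above can be sketched as follows: divide $w$ and $b$ by the smallest value of $|w^T x_i + b|$ over the data, so that the closest points land exactly on $\pm 1$. This is a minimal illustration that assumes the given $(w, b)$ already separates the data; the points and the helper name `canonical_scale` are made up for the example.

```python
def g(w, b, x):
    """Linear discriminant g(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def canonical_scale(w, b, points):
    """Rescale (w, b) so that the closest point satisfies |g(x)| = 1.

    Assumes no point lies exactly on the hyperplane (s > 0).
    """
    s = min(abs(g(w, b, x)) for x in points)
    return [wi / s for wi in w], b / s

# Illustrative data (assumed values): two points per class.
points = [[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]]
labels = [1, 1, -1, -1]

w, b = [2.0, 2.0], 0.0                      # separates the data, arbitrary scale
w_c, b_c = canonical_scale(w, b, points)    # same hyperplane, canonical scale

# After rescaling, every point satisfies y_i * g(x_i) >= 1.
print(w_c, b_c)
print([y * g(w_c, b_c, x) for x, y in zip(points, labels)])
```

The hyperplane itself is unchanged by the rescaling; only the numerical scale of $g$ changes, which is what allows the constraints to be written with the constant 1.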

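12 Large Margin Linear Classifier: Formulation: maximize $2 / \|w\|$ such that for $y_i = +1$, $w^T x_i + b \ge 1$, and for $y_i = -1$, $w^T x_i + b \le -1$. Here $x^+$ and $x^-$ are points on the two margin boundaries, so $w^T x^+ + b = 1$ and $w^T x^- + b = -1$, and the margin width is $(x^+ - x^-) \cdot n = 2 / \|w\|$. [Figure: margin between $x^+$ and $x^-$ along the normal $n$.]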

13 Large Margin Linear Classifier: Formulation: minimize $\frac{1}{2} \|w\|^2$ such that for $y_i = +1$, $w^T x_i + b \ge 1$, and for $y_i = -1$, $w^T x_i + b \le -1$.

14 Large Margin Linear Classifier: Formulation: minimize $\frac{1}{2} \|w\|^2$ such that $y_i (w^T x_i + b) \ge 1$.
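As a small numerical illustration of this formulation, the sketch below checks the constraint $y_i (w^T x_i + b) \ge 1$ and computes the margin width $2 / \|w\|$ for two candidate weight vectors. The data and both candidates are assumed values chosen for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_feasible(w, b, X, y):
    """True if every point satisfies y_i (w^T x_i + b) >= 1."""
    return all(yi * (dot(w, xi) + b) >= 1 for xi, yi in zip(X, y))

def margin_width(w):
    """Margin width 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))

# Illustrative data (assumed values): one point per class.
X = [[1.0, 1.0], [-1.0, -1.0]]
y = [1, -1]

# Both candidates satisfy the constraints, but the one with the
# smaller norm gives the wider margin.
for w in ([1.0, 1.0], [0.5, 0.5]):
    print(w, is_feasible(w, 0.0, X, y), margin_width(w))
```

Minimizing $\frac{1}{2}\|w\|^2$ subject to the constraints therefore selects the second candidate here, whose margin equals the distance between the two points.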

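15 Solving the Optimization Problem: Quadratic programming with linear constraints: minimize $\frac{1}{2} \|w\|^2$ s.t. $y_i (w^T x_i + b) \ge 1$. Lagrangian function: minimize $L_p(w, b, \alpha_i) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i (w^T x_i + b) - 1 \right)$ s.t. $\alpha_i \ge 0$.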

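16 Solving the Optimization Problem: minimize $L_p(w, b, \alpha_i) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i (w^T x_i + b) - 1 \right)$ s.t. $\alpha_i \ge 0$. Setting the derivatives to zero: $\partial L_p / \partial w = 0$ gives $w = \sum_{i=1}^{n} \alpha_i y_i x_i$, and $\partial L_p / \partial b = 0$ gives $\sum_{i=1}^{n} \alpha_i y_i = 0$.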

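17 Solving the Optimization Problem: minimize $L_p(w, b, \alpha_i) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i (w^T x_i + b) - 1 \right)$ s.t. $\alpha_i \ge 0$. Lagrangian Dual Problem: maximize $\sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j$ s.t. $\alpha_i \ge 0$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$.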
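The step from the Lagrangian to the dual is not spelled out on the slides. A short derivation, using only the two stationarity conditions $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$, could read as follows:

```latex
\begin{align*}
L_p &= \tfrac{1}{2} w^T w
      - \sum_{i=1}^{n} \alpha_i y_i \, w^T x_i
      - b \sum_{i=1}^{n} \alpha_i y_i
      + \sum_{i=1}^{n} \alpha_i \\
    &= \tfrac{1}{2} w^T w - w^T w - 0 + \sum_{i=1}^{n} \alpha_i
      && \text{since } \textstyle\sum_i \alpha_i y_i x_i = w
         \text{ and } \sum_i \alpha_i y_i = 0 \\
    &= \sum_{i=1}^{n} \alpha_i - \tfrac{1}{2} w^T w \\
    &= \sum_{i=1}^{n} \alpha_i
      - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
        \alpha_i \alpha_j y_i y_j \, x_i^T x_j .
\end{align*}
```

The result depends on the data only through the dot products $x_i^T x_j$.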

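18 Solving the Optimization Problem: From the KKT condition, we know: $\alpha_i \left( y_i (w^T x_i + b) - 1 \right) = 0$. Thus, only support vectors have $\alpha_i \ne 0$. The solution has the form $w = \sum_{i=1}^{n} \alpha_i y_i x_i = \sum_{i \in SV} \alpha_i y_i x_i$; get $b$ from $y_i (w^T x_i + b) - 1 = 0$, where $x_i$ is a support vector. [Figure: the support vectors $x^+$ and $x^-$ on the margin boundaries in the $(x_1, x_2)$ plane.]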

19 Solving the Optimization Problem: The linear discriminant function is: $g(x) = w^T x + b = \sum_{i \in SV} \alpha_i y_i x_i^T x + b$. Notice that it relies on a dot product between the test point $x$ and the support vectors $x_i$. Also keep in mind that solving the optimization problem involved computing the dot products $x_i^T x_j$ between all pairs of training points.
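The following Python sketch shows how $w$, $b$ and $g(x)$ are recovered once the multipliers are known. The two-point data set and the multipliers $\alpha = (0.25, 0.25)$ are assumptions made for the example; they were picked so that the KKT conditions hold for this toy problem, and no optimizer is run here.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy problem (assumed values): one training point per class.
X = [[1.0, 1.0], [-1.0, -1.0]]
y = [1, -1]
alpha = [0.25, 0.25]  # multipliers assumed known; both points are support vectors

# w = sum_i alpha_i y_i x_i
w = [sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X)) for d in range(2)]

# b from y_s (w^T x_s + b) - 1 = 0 for any support vector x_s,
# which gives b = y_s - w^T x_s because y_s is +1 or -1.
support = [i for i, a in enumerate(alpha) if a > 0]
s = support[0]
b = y[s] - dot(w, X[s])

def g(x):
    """Discriminant written with dot products against the support vectors only."""
    return sum(alpha[i] * y[i] * dot(X[i], x) for i in support) + b

print(w, b)
print(g([1.0, 1.0]), g([-1.0, -1.0]), g([2.0, 0.0]))
```

Both training points evaluate to exactly $\pm 1$, as expected for support vectors lying on the margin boundaries.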

20 Solution of the optimization problem (* denotes an estimate; $(x_s, y_s)$ is any support point): $w^* = \sum_{i=1}^{m} \alpha_i^* y_i x_i$, $w_0^* = y_s - \sum_{i=1}^{m} \alpha_i^* y_i (x_i \cdot x_s)$, and $D(x) = (w^* \cdot x) + w_0^*$. Only the $\alpha_i$ corresponding to the closest points are nonzero. These points are called support points. They determine the optimal hyperplane.

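21 Geometric interpretation. [Figure: training points of Class 1 and Class 2, each labeled with its multiplier $\alpha_i$. Most points have $\alpha_i = 0$; the legible nonzero labels are $\alpha_6 = 1.4$ and $\alpha_1 = 0.8$.]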

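22 Large Margin Linear Classifier: What if the data is not linearly separable (noisy data, outliers, etc.)? Slack variables $\xi_i$ can be added to allow misclassification of difficult or noisy data points. [Figure: points labeled +1 and -1 in the $(x_1, x_2)$ plane, with slack distances $\xi_i$ for points on the wrong side of their margin boundary.]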

23 Large Margin Linear Classifier: Formulation: minimize $\frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i$ such that $y_i (w^T x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$. The parameter $C$ can be viewed as a way to control over-fitting.
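For a fixed $(w, b)$, the smallest slack that satisfies both constraints is $\xi_i = \max(0, 1 - y_i (w^T x_i + b))$. The sketch below evaluates these slacks and the soft-margin objective for two values of $C$. The data, the hyperplane and the values of $C$ are assumptions made for illustration; nothing is optimized.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slacks(w, b, X, y):
    """Smallest feasible slack per point: max(0, 1 - y_i (w^T x_i + b))."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]

def soft_margin_objective(w, b, X, y, C):
    """(1/2) ||w||^2 + C * sum_i xi_i."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, X, y))

# Illustrative data (assumed values). The third point lies inside the
# margin; the fourth is on the wrong side of the boundary.
X = [[2.0, 2.0], [-2.0, -2.0], [0.5, 0.5], [1.0, 1.0]]
y = [1, -1, 1, -1]
w, b = [0.5, 0.5], 0.0

print(slacks(w, b, X, y))
for C in (1.0, 10.0):
    print(C, soft_margin_objective(w, b, X, y, C))
```

A larger $C$ makes each unit of slack more expensive, so the optimizer is pushed toward fitting the training points more closely.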

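24 Large Margin Linear Classifier: Formulation (Lagrangian Dual Problem): maximize $\sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j$ such that $0 \le \alpha_i \le C$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$.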

25 Non-linear SVMs: Datasets that are linearly separable with noise work out great. But what are we going to do if the dataset is just too hard? How about mapping the data to a higher-dimensional space? [Figure: one-dimensional data along the $x$ axis, then the same data mapped to two dimensions.] This slide is courtesy of

26 Non-linear SVMs: Feature Space. General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable: $\Phi: x \to \varphi(x)$. This slide is courtesy of

27 Nonlinear SVMs: The Kernel Trick. With this mapping, our discriminant function is now: $g(x) = w^T \varphi(x) + b = \sum_{i \in SV} \alpha_i y_i \varphi(x_i)^T \varphi(x) + b$. There is no need to know this mapping explicitly, because we only use the dot product of feature vectors in both training and testing. A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space: $K(x_i, x_j) \equiv \varphi(x_i)^T \varphi(x_j)$.

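28 Nonlinear SVMs: The Kernel Trick. An example: 2-dimensional vectors $x = [x_1 \; x_2]$; let $K(x_i, x_j) = (1 + x_i^T x_j)^2$. We need to show that $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$: $K(x_i, x_j) = (1 + x_i^T x_j)^2 = 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2} = [1 \; x_{i1}^2 \; \sqrt{2} x_{i1} x_{i2} \; x_{i2}^2 \; \sqrt{2} x_{i1} \; \sqrt{2} x_{i2}]^T [1 \; x_{j1}^2 \; \sqrt{2} x_{j1} x_{j2} \; x_{j2}^2 \; \sqrt{2} x_{j1} \; \sqrt{2} x_{j2}] = \varphi(x_i)^T \varphi(x_j)$, where $\varphi(x) = [1 \; x_1^2 \; \sqrt{2} x_1 x_2 \; x_2^2 \; \sqrt{2} x_1 \; \sqrt{2} x_2]$. This slide is courtesy of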
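The identity on this slide can be checked numerically. The sketch below evaluates the kernel $(1 + x^T z)^2$ directly and through the explicit six-dimensional feature map, for one pair of illustrative vectors (assumed values).

```python
import math

def poly2_kernel(x, z):
    """K(x, z) = (1 + x^T z)^2 for 2-dimensional inputs."""
    return (1.0 + x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit feature map whose dot product reproduces poly2_kernel."""
    r2 = math.sqrt(2.0)
    return [1.0, x[0] ** 2, r2 * x[0] * x[1], x[1] ** 2, r2 * x[0], r2 * x[1]]

# Illustrative inputs (assumed values).
x, z = [1.0, 2.0], [3.0, -1.0]

direct = poly2_kernel(x, z)                               # works in 2 dimensions
via_map = sum(a * b for a, b in zip(phi(x), phi(z)))      # works in 6 dimensions

print(direct, via_map)  # equal up to floating-point rounding
```

The kernel evaluation needs one 2-dimensional dot product, while the explicit route needs the 6-dimensional vectors; that saving is the point of the kernel trick.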

29 Nonlinear SVMs: The Kernel Trick. Examples of commonly-used kernel functions: Linear kernel: $K(x_i, x_j) = x_i^T x_j$. Polynomial kernel: $K(x_i, x_j) = (1 + x_i^T x_j)^p$. Gaussian (Radial-Basis Function, RBF) kernel: $K(x_i, x_j) = \exp\left( -\|x_i - x_j\|^2 / (2 \sigma^2) \right)$. Sigmoid: $K(x_i, x_j) = \tanh(\beta_0 x_i^T x_j + \beta_1)$. In general, functions that satisfy Mercer's condition can be kernel functions.
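The four kernels listed above translate almost directly into code. This is a minimal sketch; the default parameter values (`p=2`, `sigma=1.0`, `beta0=1.0`, `beta1=0.0`) are arbitrary choices for the example, and the Gaussian kernel is written with the $2\sigma^2$ normalization.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, p=2):
    return (1.0 + dot(x, z)) ** p

def gaussian_kernel(x, z, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sigmoid_kernel(x, z, beta0=1.0, beta1=0.0):
    return math.tanh(beta0 * dot(x, z) + beta1)

# Two orthogonal unit vectors (illustrative inputs).
x, z = [1.0, 0.0], [0.0, 1.0]
for k in (linear_kernel, polynomial_kernel, gaussian_kernel, sigmoid_kernel):
    print(k.__name__, k(x, z))
```

Any of these functions can be passed wherever the dual problem or the discriminant function needs $K(x_i, x_j)$, since both touch the data only through kernel values.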

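30 Nonlinear SVM: Optimization. Formulation (Lagrangian Dual Problem): maximize $\sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$ such that $0 \le \alpha_i \le C$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$. The solution of the discriminant function is $g(x) = \sum_{i \in SV} \alpha_i y_i K(x_i, x) + b$. The optimization technique is the same.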

31 Support Vector Machine: Algorithm. 1. Choose a kernel function. 2. Choose a value for C. 3. Solve the quadratic programming problem (many software packages are available). 4. Construct the discriminant function from the support vectors.
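The four steps can be sketched end to end on a tiny XOR-style data set. To keep the example short, step 3 does not call a real quadratic programming package: it uses projected gradient ascent on the dual with the bias fixed to $b = 0$, which removes the constraint $\sum_i \alpha_i y_i = 0$ and leaves only the box constraint $0 \le \alpha_i \le C$. That simplification, the step size, the iteration count and the function names are all assumptions of this sketch, not the method of the slides.

```python
def poly_kernel(x, z, p=2):
    """Step 1: polynomial kernel (1 + x^T z)^p."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** p

def train_dual(X, y, kernel, C, step=0.1, iters=200):
    """Step 3 (simplified): projected gradient ascent on the dual, b fixed to 0.

    The step size must be small enough for the data at hand; 0.1 works
    for the example below but is not a general-purpose choice.
    """
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(iters):
        # Gradient of sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K_ij
        grad = [1.0 - y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(n))
                for i in range(n)]
        # Ascent step followed by projection onto the box [0, C]
        alpha = [min(C, max(0.0, a + step * gi)) for a, gi in zip(alpha, grad)]
    return alpha

def discriminant(x, X, y, alpha, kernel):
    """Step 4: g(x) = sum over support vectors of alpha_i y_i K(x_i, x), with b = 0."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X) if a > 0)

# XOR-style data: not linearly separable in the input space.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [-1, 1, 1, -1]

alpha = train_dual(X, y, poly_kernel, C=10.0)   # step 2: C = 10 (assumed value)
preds = [1 if discriminant(x, X, y, alpha, poly_kernel) > 0 else -1 for x in X]
print(alpha)
print(preds)
```

On this data all four points end up as support vectors and the quadratic kernel separates the two classes, which no linear discriminant in the input space can do. For real problems a dedicated solver (for example the one in LibSVM) would replace `train_dual`.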

32 Some Issues
Choice of kernel
- Gaussian or polynomial kernel is the default
- if ineffective, more elaborate kernels are needed
- domain experts can give assistance in formulating appropriate similarity measures
Choice of kernel parameters
- e.g. σ in the Gaussian kernel
- σ is the distance between the closest points with different classifications
- in the absence of reliable criteria, applications rely on the use of a validation set or cross-validation to set such parameters
Optimization criterion: hard margin vs. soft margin
- a lengthy series of experiments in which various parameters are tested
This slide is courtesy of
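The rule of thumb for σ stated above (the distance between the closest points with different classifications) is easy to compute. The sketch below does so on illustrative data; the function name and the data are assumptions, and the result would normally serve only as a starting value to be refined on a validation set or by cross-validation.

```python
import math

def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def sigma_heuristic(X, y):
    """Distance between the closest pair of points with different labels."""
    return min(distance(a, b)
               for a, ya in zip(X, y)
               for b, yb in zip(X, y)
               if ya != yb)

# Illustrative data (assumed values): two points per class.
X = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
y = [1, 1, -1, -1]

sigma = sigma_heuristic(X, y)
print(sigma)  # closest opposite-class pair is (0, 0) and (3, 0)
```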

33 Summary: Support Vector Machine. 1. Large Margin Classifier: better generalization ability and less over-fitting. 2. The Kernel Trick: map data points to a higher-dimensional space in order to make them linearly separable. Since only the dot product is used, we do not need to represent the mapping explicitly.

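34 Solution of the new optimization problem. The decision function then becomes $D(x) = \sum_{i=1}^{m_S} \alpha_i u_i K(x_i, x) + w_0$, where $m_S$ is the number of support points (on this slide $u_i$ denotes the class label and $w_0$ the bias, written $y_i$ and $b$ elsewhere).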

35 OPERATING SCHEME of SVMs. [Figure, read from input to output: input vector $x$; sample $x_1, x_2, x_3, \dots$; comparison $K(x_i, x)$ with each sample point; output $\mathrm{sgn}\left( \sum_i \alpha_i u_i K(x_i, x) + w_0 \right)$.]

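36 Architecture of SVMs: Nonlinear classifier (using a kernel). The decision function is computed as the solution of a quadratic program: $f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} v_i \left( \Phi(x) \cdot \Phi(x_i) \right) + b \right) = \mathrm{sgn}\left( \sum_{i=1}^{l} v_i k(x, x_i) + b \right)$, substituting $v_i = \alpha_i y_i$ for each training example $x_i$.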

37 Matlab example

    % Note: svmtrain and svmclassify come from older MATLAB releases;
    % newer releases provide fitcsvm and predict instead.
    load fisheriris
    data = [meas(:,1), meas(:,2)];
    % Extract the Setosa class
    groups = ismember(species,'setosa');
    % Randomly select training and test sets
    [train, test] = crossvalind('HoldOut',groups);
    % Use a linear support vector machine classifier
    svmStruct = svmtrain(data(train,:),groups(train),'showplot',true);
    classes = svmclassify(svmStruct,data(test,:),'showplot',true);
    % See how well the classifier performed
    cp = classperf(groups);
    classperf(cp,classes,test);
    cp.CorrectRate

Performance measures: sensitivity, or true positive rate (TPR); specificity (SPC), or true negative rate.
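The performance measures named on this slide can be computed from the true and predicted labels as follows. This sketch is in Python rather than Matlab and does not reproduce `classperf`; the label vectors are made-up values for illustration.

```python
def performance(y_true, y_pred):
    """Return (correct rate, sensitivity, specificity) for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    correct_rate = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)   # true positive rate (TPR)
    specificity = tn / (tn + fp)   # true negative rate (SPC)
    return correct_rate, sensitivity, specificity

# Illustrative labels (assumed values): True marks the positive class.
y_true = [True, True, True, False, False, False, False, False]
y_pred = [True, True, False, False, False, False, True, False]

correct_rate, sensitivity, specificity = performance(y_true, y_pred)
print(correct_rate, sensitivity, specificity)
```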

38 [Figure: plot produced by the Matlab example. Legend: 0 (training), 0 (classified), 1 (training), 1 (classified), Support Vectors.]

40 Additional Resource

41 Demo of LibSVM
