Machine Learning. What is a good Decision Boundary? Support Vector Machines

Machine Learning 10-701/15-781, Spring 2010
Support Vector Machines
Eric Xing
Lecture 7, March 5, 2010
Reading: Chap. 6&7, C.B. book and listed papers
Eric Xing @ CMU, 2006-2010

What is a good Decision Boundary?
Consider a binary classification task with $y = \pm 1$ labels (not 0/1 as before). When the training examples are linearly separable, we can set the parameters of a linear classifier so that all the training examples are classified correctly.
Many decision boundaries! Generative classifiers, logistic regression, ...
Are all decision boundaries equally good?
[Figure: training points from Class 1 and Class 2]
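To make "many decision boundaries" concrete, here is a minimal pure-Python sketch that tests two different linear classifiers $\operatorname{sign}(w^T x + b)$ on the same linearly separable toy set. The four points, both $(w, b)$ pairs, and the helper names `predict` and `training_errors` are hypothetical, chosen only for illustration.

```python
# Hypothetical 2-D training set with labels y in {-1, +1}; it is linearly separable.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -3.0)]
y = [1, 1, -1, -1]


def predict(w, b, x):
    """Linear classifier sign(w^T x + b); a score of exactly 0 is sent to +1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def training_errors(w, b):
    """Number of training examples misclassified by the boundary w^T x + b = 0."""
    return sum(1 for xi, yi in zip(X, y) if predict(w, b, xi) != yi)


# Two different boundaries (w, b), both picked by hand.
candidates = {"diagonal": ((1.0, 1.0), 0.0), "vertical": ((1.0, 0.0), 0.5)}
errors = {name: training_errors(w, b) for name, (w, b) in candidates.items()}
print(errors)  # {'diagonal': 0, 'vertical': 0}
```

Both boundaries classify every training example correctly, so zero training error alone cannot tell them apart.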

What is a good Decision Boundary?
[Figure]

Not All Decision Boundaries Are Equal!
Why may we have such boundaries? Irregular distributions, imbalanced training sizes, outliers.

Classification and Margin
Parameterizing the decision boundary: let $w$ denote a vector orthogonal to the decision boundary, and $b$ denote a scalar "offset" term; then we can write the decision boundary as
$$w^T x + b = 0$$
[Figure: Class 1 and Class 2 on either side of the boundary, with distances $d^-$ and $d^+$ to the closest points]
Margin:
$$w^T x_i + b > +c \quad \text{for all } x_i \text{ in class 2}$$
$$w^T x_i + b < -c \quad \text{for all } x_i \text{ in class 1}$$
Or more compactly:
$$(w^T x_i + b)\, y_i > c$$
The margin is the sum of the distances from the boundary to the closest point on each side: $m = d^- + d^+$.
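A minimal sketch of computing $d^-$, $d^+$ and $m = d^- + d^+$ for a given boundary, assuming the hypothetical toy data below and using $y_i(w^T x_i + b)/\|w\|$ as the distance of $x_i$ from the hyperplane. Label $+1$ plays the role of class 2 and $-1$ of class 1, matching the inequalities above.

```python
import math

# Hypothetical toy data: +1 stands for class 2, -1 for class 1.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -3.0)]
y = [1, 1, -1, -1]


def margin(w, b):
    """Return (d_minus, d_plus, m) for the boundary w^T x + b = 0.

    y_i (w^T x_i + b) / ||w|| is the distance of x_i from the boundary
    (positive when x_i lies on its own class's side); d_plus and d_minus
    are the smallest such distances within class +1 and class -1.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    dist = [yi * (sum(wi * xij for wi, xij in zip(w, xi)) + b) / norm
            for xi, yi in zip(X, y)]
    d_plus = min(d for d, yi in zip(dist, y) if yi == 1)
    d_minus = min(d for d, yi in zip(dist, y) if yi == -1)
    return d_minus, d_plus, d_minus + d_plus


print(margin((1.0, 1.0), 0.0))  # diagonal boundary
print(margin((1.0, 0.0), 0.5))  # vertical boundary
```

Both boundaries separate the data, but the diagonal one has margin $2\sqrt{2} \approx 2.83$ against $2.0$ for the vertical one.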

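Maximum Margin Classification
A point with $w^T x + b = \pm c$ lies at distance $c/\|w\|$ from the boundary, so the "minimum" permissible margin is
$$m = \frac{2c}{\|w\|}$$
Here is our Maximum Margin Classification problem:
$$\max_{w,b}\ \frac{2c}{\|w\|} \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge c,\ \forall i$$

Maximum Margin Classification, con'd.
The optimization problem:
$$\max_{w,b}\ \frac{c}{\|w\|} \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge c,\ \forall i$$
But note that the magnitude of $c$ merely scales $w$ and $b$, and does not change the classification boundary at all! (Why? Dividing $w$, $b$ and $c$ by the same positive constant leaves every constraint and the set $\{x : w^T x + b = 0\}$ unchanged.)
So we instead work on this cleaner problem:
$$\max_{w,b}\ \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1,\ \forall i$$
The solution to this leads to the famous Support Vector Machines, believed by many to be the best "off-the-shelf" supervised learning algorithm.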
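The claim that $c$ only rescales the problem can be checked numerically. In this sketch (hypothetical toy data and helper names), scaling $(w, b)$ by $s > 0$ multiplies $c$ by $s$ but leaves the predicted labels and $c/\|w\|$ unchanged; choosing $s = 1/c$ (here $s = 0.5$, since $c = 2$ for $w = (1, 1)$, $b = 0$) gives the canonical form with $c = 1$.

```python
import math

X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -3.0)]   # hypothetical toy data
y = [1, 1, -1, -1]


def scores(w, b):
    return [sum(wi * xij for wi, xij in zip(w, xi)) + b for xi in X]


def functional_margin(w, b):
    """Largest c such that y_i (w^T x_i + b) >= c holds for every i."""
    return min(yi * s for yi, s in zip(y, scores(w, b)))


def geometric_margin(w, b):
    """Smallest distance from a training point to the boundary: c / ||w||."""
    return functional_margin(w, b) / math.sqrt(sum(wi * wi for wi in w))


w, b = (1.0, 1.0), 0.0
for s in (0.5, 1.0, 3.0):
    ws, bs = tuple(s * wi for wi in w), s * b
    same_side = [v > 0 for v in scores(ws, bs)] == [v > 0 for v in scores(w, b)]
    # c scales with s; c / ||w|| and the predicted labels do not.
    print(s, functional_margin(ws, bs), geometric_margin(ws, bs), same_side)
```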

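Support vector machine
This is equivalent to a convex quadratic programming problem with linear constraints:
$$\max_{w,b}\ \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1,\ \forall i$$
(maximizing $1/\|w\|$ is the same as minimizing the quadratic $\frac{1}{2} w^T w$).
The attained margin is now $d^+ = d^- = \frac{1}{\|w\|}$ on each side, i.e. $m = \frac{2}{\|w\|}$.
Only a few of the classification constraints are relevant: the support vectors.
Constrained optimization: we can directly solve this using commercial quadratic programming (QP) code. But we want to take a more careful look at Lagrange duality, and at the solution of the above in its dual form, for deeper insight (support vectors, kernels, ...) and a more efficient algorithm.

Digression to Lagrangian Duality
The Primal Problem:
$$\min_w f(w) \quad \text{s.t.} \quad g_i(w) \le 0,\ i = 1, \ldots, k; \qquad h_i(w) = 0,\ i = 1, \ldots, l$$
The generalized Lagrangian:
$$\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w)$$
The $\alpha$'s ($\alpha_i \ge 0$) and $\beta$'s are called the Lagrangian multipliers.
Lemma:
$$\max_{\alpha, \beta:\ \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta) = \begin{cases} f(w) & \text{if } w \text{ satisfies the primal constraints} \\ \infty & \text{otherwise} \end{cases}$$
A re-written Primal:
$$\min_w \max_{\alpha, \beta:\ \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)$$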
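The lemma can be checked numerically on a one-dimensional toy problem of our own choosing (not from the lecture): minimize $f(x) = x^2$ subject to $g(x) = 1 - x \le 0$. A finite grid of $\alpha$ values stands in for the maximum over $\alpha \ge 0$, so this is only an illustration of the two cases.

```python
def f(x):
    return x * x            # objective f(x) = x^2


def g(x):
    return 1.0 - x          # inequality constraint g(x) <= 0, i.e. x >= 1


def lagrangian(x, alpha):
    return f(x) + alpha * g(x)


def sup_over_alpha(x, alphas=(0.0, 1.0, 10.0, 100.0, 1000.0)):
    """Crude stand-in for max over alpha >= 0: the largest value on a finite grid."""
    return max(lagrangian(x, a) for a in alphas)


print(sup_over_alpha(2.0), f(2.0))  # feasible x: the maximum is at alpha = 0, i.e. f(x)
print(sup_over_alpha(0.5))          # infeasible x: grows without bound as the grid grows
```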

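Lagrangian Duality, cont.
Recall the Primal Problem:
$$p^* = \min_w \max_{\alpha, \beta:\ \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)$$
The Dual Problem:
$$d^* = \max_{\alpha, \beta:\ \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta)$$
Theorem (weak duality):
$$d^* = \max_{\alpha, \beta:\ \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta) \le \min_w \max_{\alpha, \beta:\ \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta) = p^*$$
Theorem (strong duality): iff there exists a saddle point of $\mathcal{L}(w, \alpha, \beta)$, we have $d^* = p^*$.

A sketch of strong and weak duality
Now, ignoring $h(w)$ for simplicity, let's look at what's happening graphically in the duality theorems:
$$d^* = \max_{\alpha_i \ge 0} \min_w \left[f(w) + \alpha^T g(w)\right] \le \min_w \max_{\alpha_i \ge 0} \left[f(w) + \alpha^T g(w)\right] = p^*$$
[Figure: $f(w)$ plotted against $g(w)$, with $d^*$ and $p^*$ marked]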
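For the toy problem $\min x^2$ s.t. $1 - x \le 0$ (our own example), the dual function is $q(\alpha) = \min_x [x^2 + \alpha(1 - x)] = \alpha - \alpha^2/4$, maximized at $\alpha = 2$, so $d^* = 1 = p^*$ (attained at $x = 1$). The sketch below approximates both sides on finite grids; the grids are an assumption, and because $x = 1$ lies on the $x$ grid, the grid value of $d^*$ can never exceed $p^*$.

```python
def lagrangian(x, alpha):
    # Toy problem: f(x) = x^2 subject to g(x) = 1 - x <= 0
    return x * x + alpha * (1.0 - x)


xs = [i / 100 for i in range(-500, 501)]      # grid over x in [-5, 5]
alphas = [i / 100 for i in range(0, 501)]     # grid over alpha in [0, 5]

# p* = min f(x) over the feasible x (those with g(x) <= 0)
p_star = min(x * x for x in xs if 1.0 - x <= 0)
# d* = max over alpha >= 0 of the dual function min_x L(x, alpha)
d_star = max(min(lagrangian(x, a) for x in xs) for a in alphas)
print(p_star, d_star)
```

Weak duality $d^* \le p^*$ holds for any problem; the gap closes here because the toy problem is convex and has a saddle point at $(x, \alpha) = (1, 2)$.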


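The KKT conditions
If there exists some saddle point of $\mathcal{L}$, then the saddle point satisfies the following "Karush-Kuhn-Tucker" (KKT) conditions:
$$\frac{\partial}{\partial w_i} \mathcal{L}(w, \alpha, \beta) = 0, \quad i = 1, \ldots, n$$
$$\frac{\partial}{\partial \beta_i} \mathcal{L}(w, \alpha, \beta) = 0, \quad i = 1, \ldots, l$$
$$\alpha_i g_i(w) = 0, \quad i = 1, \ldots, k \qquad \text{(complementary slackness)}$$
$$g_i(w) \le 0, \quad i = 1, \ldots, k \qquad \text{(primal feasibility)}$$
$$\alpha_i \ge 0, \quad i = 1, \ldots, k \qquad \text{(dual feasibility)}$$
Theorem: if $w^*$, $\alpha^*$ and $\beta^*$ satisfy the KKT conditions of a convex problem (such as the SVM), then $w^*$ is also a solution to the primal problem and $(\alpha^*, \beta^*)$ a solution to the dual problem.

Solving optimal margin classifier
Recall our opt problem:
$$\max_{w,b}\ \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1,\ \forall i$$
This is equivalent to
$$\min_{w,b}\ \frac{1}{2} w^T w \quad \text{s.t.} \quad 1 - y_i(w^T x_i + b) \le 0,\ \forall i \qquad (*)$$
Write the Lagrangian:
$$\mathcal{L}(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{m} \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]$$
Recall that $(*)$ can be reformulated as
$$\min_{w,b} \max_{\alpha_i \ge 0} \mathcal{L}(w, b, \alpha)$$
Now we solve its dual problem:
$$\max_{\alpha_i \ge 0} \min_{w,b} \mathcal{L}(w, b, \alpha)$$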
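The KKT conditions can be checked term by term on the same kind of toy problem (our own example): minimize $x^2$ subject to $g(x) = 1 - x \le 0$, with $\mathcal{L}(x, \alpha) = x^2 + \alpha(1 - x)$. The helper name `kkt_check` is hypothetical; the sketch evaluates each condition at a candidate pair $(x, \alpha)$.

```python
def kkt_check(x, alpha):
    """KKT conditions for: minimize x^2 subject to g(x) = 1 - x <= 0.

    There are no equality constraints, so stationarity is dL/dx = 0 with
    L(x, alpha) = x^2 + alpha * (1 - x).
    """
    g = 1.0 - x
    return {
        "stationarity": 2.0 * x - alpha,          # dL/dx, should be 0
        "complementary_slackness": alpha * g,     # alpha * g(x), should be 0
        "primal_feasible": g <= 0,
        "dual_feasible": alpha >= 0,
    }


print(kkt_check(1.0, 2.0))   # all four conditions hold
print(kkt_check(2.0, 0.0))   # feasible, but stationarity fails (dL/dx = 4)
```

Since the problem is convex, the point $(x^*, \alpha^*) = (1, 2)$ that passes every check solves both the primal and the dual.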

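The Dual Problem
We minimize $\mathcal{L}$ with respect to $w$ and $b$ first:
$$\nabla_w \mathcal{L}(w, b, \alpha) = w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0 \qquad (1)$$
$$\frac{\partial}{\partial b} \mathcal{L}(w, b, \alpha) = -\sum_{i=1}^{m} \alpha_i y_i = 0 \qquad (2)$$
Note that (1) implies
$$w = \sum_{i=1}^{m} \alpha_i y_i x_i \qquad (3)$$
Plugging (3) back into $\mathcal{L}$ and using (2) (so the term $b \sum_i \alpha_i y_i$ vanishes, and $\frac{1}{2} w^T w - \sum_i \alpha_i y_i w^T x_i = -\frac{1}{2} w^T w$), we have
$$\mathcal{L}(w, b, \alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i^T x_j)$$

The Dual problem, cont.
Now we have the following dual opt problem:
$$\max_\alpha J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i^T x_j)$$
$$\text{s.t.} \quad \alpha_i \ge 0,\ i = 1, \ldots, m; \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$$
This is, again, a quadratic programming problem, and $J$ is concave (the matrix $[y_i y_j x_i^T x_j]$ is positive semidefinite), so a global maximum over the $\alpha_i$ can always be found.
But what's the big deal? Note two things:
1. $w$ can be recovered by $w = \sum_{i=1}^{m} \alpha_i y_i x_i$ (see next).
2. The "kernel" $x_i^T x_j$ (more later).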
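A minimal sketch that evaluates $J(\alpha)$ and recovers $w$ on a hypothetical four-point data set. The multipliers `alpha_star` were worked out by hand for this toy set and are an assumption of the example: with $w = (0.5, 0.5)$ and $b = 0$, the two points with nonzero $\alpha_i$ sit exactly at $y_i(w^T x_i + b) = 1$ and the other two lie beyond the margin, so the KKT conditions hold.

```python
# Hypothetical toy data (+1 = class 2, -1 = class 1).
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -3.0)]
y = [1, 1, -1, -1]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def dual_objective(alpha):
    """J(alpha) = sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j x_i^T x_j."""
    m = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(m) for j in range(m))
    return sum(alpha) - 0.5 * quad


def recover_w(alpha):
    """w = sum_i alpha_i y_i x_i."""
    return tuple(sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X))
                 for d in range(len(X[0])))


alpha_star = [0.25, 0.0, 0.25, 0.0]   # dual optimum for this toy set, worked out by hand
w = recover_w(alpha_star)
print(w, dual_objective(alpha_star), 0.5 * dot(w, w))   # (0.5, 0.5) 0.25 0.25
```

The dual value $J(\alpha^*)$ equals the primal value $\frac{1}{2} w^T w$, as strong duality predicts; nearby feasible points such as $\alpha = (0.2, 0, 0.2, 0)$ give a smaller $J$.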

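Support vectors
Note the KKT condition: only a few $\alpha_i$'s can be nonzero!
$$\alpha_i g_i(w) = 0, \quad i = 1, \ldots, m$$
Since $g_i(w) = 1 - y_i(w^T x_i + b)$, $\alpha_i > 0$ forces $y_i(w^T x_i + b) = 1$: only points lying exactly on the margin can have nonzero multipliers.
[Figure: Class 1 and Class 2 training points; only a few have nonzero $\alpha_i$]
Call the training data points whose $\alpha_i$'s are nonzero the support vectors (SV).

Support vector machines
Once we have the Lagrange multipliers $\{\alpha_i\}$, we can reconstruct the parameter vector $w$ as a weighted combination of the training examples:
$$w = \sum_{i \in SV} \alpha_i y_i x_i$$
For testing with a new data point $z$, compute
$$w^T z + b = \sum_{i \in SV} \alpha_i y_i (x_i^T z) + b$$
and classify $z$ as class 2 ($y = +1$) if the sum is positive, and as class 1 ($y = -1$) otherwise.
Note: $w$ need not be formed explicitly.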
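A minimal sketch of testing with only the support vectors, assuming a hypothetical toy data set and a dual solution `alpha` for it that was worked out by hand. The offset $b$ is taken from one support vector $s$ via $y_s(w^T x_s + b) = 1$; the sketch also cross-checks the kernel-only score against the explicit $w$.

```python
# Hypothetical toy data and an assumed dual solution for it (+1 = class 2).
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -3.0)]
y = [1, 1, -1, -1]
alpha = [0.25, 0.0, 0.25, 0.0]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Support vectors: the training points with nonzero alpha_i.
sv = [i for i, a in enumerate(alpha) if a > 1e-8]

# A support vector s satisfies y_s (w^T x_s + b) = 1, so b = y_s - w^T x_s,
# where w^T x_s is again written with inner products only.
s = sv[0]
b = y[s] - sum(alpha[i] * y[i] * dot(X[i], X[s]) for i in sv)


def decision(z):
    """w^T z + b = sum over SV of alpha_i y_i (x_i^T z) + b; w is never formed."""
    return sum(alpha[i] * y[i] * dot(X[i], z) for i in sv) + b


def classify(z):
    return 1 if decision(z) > 0 else -1   # +1: class 2, -1: class 1


# Cross-check against the explicit w = sum_i alpha_i y_i x_i.
w = tuple(sum(alpha[i] * y[i] * X[i][d] for i in sv) for d in range(2))
z = (3.0, 0.0)
print(sv, b, decision(z), dot(w, z) + b, classify((0.0, -4.0)))
```

Only the two support vectors enter `decision`; the other training points could be discarded without changing any prediction.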

Interpretation of support vector machines
The optimal $w$ is a linear combination of a small number of data points. This "sparse" representation can be viewed as data compression, as in the construction of a kNN classifier.
To compute the weights $\{\alpha_i\}$, and to use support vector machines, we need to specify only the inner products (or kernel) between the examples, $x_i^T x_j$.
We make decisions by comparing each new example $z$ with only the support vectors:
$$y^* = \operatorname{sign}\left( \sum_{i \in SV} \alpha_i y_i (x_i^T z) + b \right)$$

Non-linearly Separable Problems
[Figure: Class 1 and Class 2 overlapping]
We allow "error" $\xi_i$ in classification; it is based on the output of the discriminant function $w^T x + b$.
$\sum_i \xi_i$ approximates the number of misclassified samples.
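To see how the errors $\xi_i$ relate to misclassifications, this sketch fixes a hypothetical discriminant $f(x) = wx + b$ on made-up 1-D data and computes, for each point, the smallest $\xi_i \ge 0$ with $y_i f(x_i) \ge 1 - \xi_i$. The data, $w$ and $b$ are assumptions for illustration only.

```python
# Hypothetical 1-D data that no threshold separates: x = 0.5 carries label -1.
X = [-2.0, -1.0, 1.0, 2.0, 0.5, 0.6]
y = [-1, -1, 1, 1, -1, 1]
w, b = 1.0, 0.0   # a fixed discriminant f(x) = w x + b, chosen by hand


def f(x):
    return w * x + b


# Smallest slack_k satisfying y_k f(x_k) >= 1 - slack_k and slack_k >= 0.
slack = [max(0.0, 1.0 - yk * f(xk)) for xk, yk in zip(X, y)]
# Misclassified: y_k f(x_k) <= 0 (a point on the boundary counts as an error).
errors = sum(1 for xk, yk in zip(X, y) if yk * f(xk) <= 0)
print(slack, errors, sum(slack))
```

The point at $x = 0.6$ is classified correctly but lies inside the margin, so $0 < \xi < 1$; only the misclassified point at $x = 0.5$ has $\xi \ge 1$, which is why the sum of the slacks bounds the error count from above.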

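Soft Margin Hyperplane
Now we have a slightly different opt problem:
$$\min_{w,b}\ \frac{1}{2} w^T w + C \sum_{i=1}^{m} \xi_i$$
$$\text{s.t.} \quad y_i(w^T x_i + b) \ge 1 - \xi_i,\ \forall i; \qquad \xi_i \ge 0,\ \forall i$$
The $\xi_i$ are "slack variables" in the optimization. Note that $\xi_i = 0$ if $x_i$ satisfies the margin constraint, while a misclassified $x_i$ has $\xi_i \ge 1$; hence $\sum_i \xi_i$ is an upper bound on the number of training errors.
$C$: tradeoff parameter between error and margin.

The Optimization Problem
The dual of this new constrained optimization problem is
$$\max_\alpha J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i^T x_j)$$
$$\text{s.t.} \quad 0 \le \alpha_i \le C,\ i = 1, \ldots, m; \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$$
This is very similar to the optimization problem in the linearly separable case, except that there is now an upper bound $C$ on each $\alpha_i$. Once again, a QP solver can be used to find the $\alpha_i$.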
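The role of $C$ can be illustrated by evaluating the soft-margin primal objective for two hand-picked candidates on hypothetical 1-D data with one mislabeled outlier. This sketch only compares the two candidates; it does not solve the optimization problem. For fixed $(w, b)$ the best slack is $\xi_i = \max(0, 1 - y_i(w x_i + b))$.

```python
X = [-2.0, -1.0, 1.0, 2.0, 0.5]     # hypothetical 1-D data; x = 0.5 is a mislabeled outlier
y = [-1, -1, 1, 1, -1]


def soft_margin_objective(w, b, C):
    """1/2 w^2 + C * sum_k slack_k, with each slack at its smallest feasible value."""
    # Smallest slack_k with y_k (w x_k + b) >= 1 - slack_k and slack_k >= 0
    slacks = [max(0.0, 1.0 - yk * (w * xk + b)) for xk, yk in zip(X, y)]
    return 0.5 * w * w + C * sum(slacks)


wide = (1.0, 0.0)     # small ||w||, but the outlier needs a slack of 1.5
narrow = (4.0, -3.0)  # separates every point, at the cost of a large ||w||
for C in (1.0, 10.0):
    print(C, soft_margin_objective(*wide, C), soft_margin_objective(*narrow, C))
```

With $C = 1$ the wide-margin candidate scores better (2.0 against 8.0); with $C = 10$ the ranking flips (15.5 against 8.0), since errors are now penalized more than a small margin.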

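The SMO algorithm
Consider solving the unconstrained opt problem:
$$\max_\alpha W(\alpha_1, \alpha_2, \ldots, \alpha_m)$$
We've already seen three opt algorithms! ?, ?, ?
Coordinate ascent: repeatedly maximize $W$ over a single $\alpha_i$ while holding all the other $\alpha_j$ fixed, cycling through $i = 1, \ldots, m$ until convergence.

Coordinate ascent
[Figure: coordinate ascent]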
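A minimal sketch of coordinate ascent on a made-up concave quadratic standing in for $W$: $W(\alpha) = c^T \alpha - \frac{1}{2}\alpha^T Q \alpha$ with $Q$ symmetric positive definite. Each one-coordinate maximization has a closed form, so a sweep over all coordinates is cheap; the function name and the example $Q$, $c$ are hypothetical.

```python
def coordinate_ascent(Q, c, sweeps=100):
    """Maximize W(alpha) = c^T alpha - 1/2 alpha^T Q alpha one coordinate at a time.

    Q must be symmetric positive definite, so W is strictly concave. Setting
    dW/dalpha_i = c_i - sum_j Q_ij alpha_j to zero gives the exact 1-D maximizer
    used in the update below.
    """
    n = len(c)
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            rest = sum(Q[i][j] * alpha[j] for j in range(n) if j != i)
            alpha[i] = (c[i] - rest) / Q[i][i]
    return alpha


Q = [[2.0, 1.0], [1.0, 2.0]]
c = [1.0, 1.0]
print(coordinate_ascent(Q, c))   # converges to about [1/3, 1/3], the solution of Q alpha = c
```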

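Sequential minimal optimization
Constrained optimization:
$$\max_\alpha J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i^T x_j)$$
$$\text{s.t.} \quad 0 \le \alpha_i \le C,\ i = 1, \ldots, m; \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$$
Question: can we do coordinate ascent along one direction at a time (i.e., hold all $\alpha_{[-i]}$ fixed, and update $\alpha_i$)?
No: the equality constraint gives $\alpha_i = -y_i \sum_{j \ne i} \alpha_j y_j$, so $\alpha_i$ cannot move while the others are fixed. This is why SMO updates two multipliers at a time.

The SMO algorithm
Repeat till convergence:
1. Select some pair $\alpha_i$ and $\alpha_j$ to update next (using a heuristic that tries to pick the two that will allow us to make the biggest progress towards the global maximum).
2. Re-optimize $J(\alpha)$ with respect to $\alpha_i$ and $\alpha_j$, while holding all the other $\alpha_k$'s ($k \ne i, j$) fixed.
Will this procedure converge?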
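The equality constraint argument can be checked directly. In this sketch (hypothetical labels and multipliers chosen so that $\sum_i \alpha_i y_i = 0$), a single $\alpha_i$ is fully determined by the others, while a pair move that shifts $\alpha_i$ by $\delta$ and $\alpha_j$ by $-y_i y_j \delta$ keeps the constraint satisfied.

```python
y = [1, -1, 1, -1]
alpha = [0.5, 0.25, 0.125, 0.375]   # hypothetical; sum_i alpha_i y_i = 0


def constraint(alpha):
    """sum_i alpha_i y_i, which must stay 0."""
    return sum(a * yi for a, yi in zip(alpha, y))


def pinned_value(i, alpha):
    """The only alpha_i consistent with the other multipliers.

    sum_j alpha_j y_j = 0 and y_i**2 = 1 give alpha_i = -y_i * sum_{j != i} alpha_j y_j.
    """
    return -y[i] * sum(a * yj for j, (a, yj) in enumerate(zip(alpha, y)) if j != i)


def pair_move(alpha, i, j, delta):
    """Shift alpha_i by delta and alpha_j by -y_i y_j delta, so the sum is unchanged."""
    new = list(alpha)
    new[i] += delta
    new[j] -= y[i] * y[j] * delta
    return new


print(pinned_value(0, alpha) == alpha[0])   # True: alpha_0 cannot move on its own
moved = pair_move(alpha, 0, 1, 0.125)
print(moved, constraint(moved))             # [0.625, 0.375, 0.125, 0.375] 0.0
```

A real SMO step must also keep both multipliers inside $[0, C]$, which limits how large $\delta$ can be.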

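Convergence of SMO
Let's hold $\alpha_3, \ldots, \alpha_m$ fixed and re-optimize $J$ w.r.t. $\alpha_1$ and $\alpha_2$:
$$\max_{\alpha_1, \alpha_2} J(\alpha) \quad \text{s.t.} \quad 0 \le \alpha_k \le C,\ k = 1, \ldots, m; \qquad \sum_{k=1}^{m} \alpha_k y_k = 0$$
KKT:
$$\alpha_i = 0 \Rightarrow y_i(w^T x_i + b) \ge 1$$
$$\alpha_i = C \Rightarrow y_i(w^T x_i + b) \le 1$$
$$0 < \alpha_i < C \Rightarrow y_i(w^T x_i + b) = 1$$

Convergence of SMO
The constraints:
$$\alpha_1 y_1 + \alpha_2 y_2 = \zeta := -\sum_{i=3}^{m} \alpha_i y_i, \qquad 0 \le \alpha_1, \alpha_2 \le C$$
The objective: substituting $\alpha_1 = (\zeta - \alpha_2 y_2) y_1$ gives a one-dimensional quadratic in $\alpha_2$,
$$J\left( (\zeta - \alpha_2 y_2) y_1,\ \alpha_2,\ \alpha_3, \ldots, \alpha_m \right)$$
Constrained opt: maximize this quadratic in $\alpha_2$, then clip the maximizer to the interval $[L, H]$ of values allowed by $0 \le \alpha_1, \alpha_2 \le C$.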
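The two-variable step can be sketched with the analytic update commonly used in simplified SMO implementations (a Platt-style rule), here for a linear kernel. The function name `smo_pair_step`, the convention $f(x) = \sum_k \alpha_k y_k x_k^T x + b$, and the two-point data set are our assumptions. The step moves $\alpha_j$ to the unconstrained maximizer along the constraint line, clips it to $[L, H]$, adjusts $\alpha_i$ to keep $\alpha_i y_i + \alpha_j y_j$ fixed, and updates $b$.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def smo_pair_step(X, y, alpha, b, i, j, C):
    """One analytic update of (alpha_i, alpha_j) with a linear kernel.

    Uses f(x) = sum_k alpha_k y_k x_k^T x + b. Returns the new alpha list and b
    (unchanged if this pair cannot make progress).
    """
    def K(p, q):
        return dot(X[p], X[q])

    def f(k):
        return sum(alpha[t] * y[t] * K(t, k) for t in range(len(alpha))) + b

    E_i, E_j = f(i) - y[i], f(j) - y[j]            # prediction errors
    # Segment [L, H] for alpha_j allowed by 0 <= alpha <= C and
    # alpha_i y_i + alpha_j y_j = const (the other alphas are fixed).
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = K(i, i) + K(j, j) - 2.0 * K(i, j)        # curvature along that line
    if L >= H or eta <= 0:
        return list(alpha), b
    # Unconstrained 1-D maximizer, then clip to [L, H].
    a_j = min(H, max(L, alpha[j] + y[j] * (E_i - E_j) / eta))
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)
    # Pick b so that f(x_k) = y_k for a multiplier strictly inside (0, C).
    b1 = b - E_i - y[i] * (a_i - alpha[i]) * K(i, i) - y[j] * (a_j - alpha[j]) * K(i, j)
    b2 = b - E_j - y[i] * (a_i - alpha[i]) * K(i, j) - y[j] * (a_j - alpha[j]) * K(j, j)
    if 0 < a_i < C:
        b_new = b1
    elif 0 < a_j < C:
        b_new = b2
    else:
        b_new = (b1 + b2) / 2
    new = list(alpha)
    new[i], new[j] = a_i, a_j
    return new, b_new


X = [(1.0, 1.0), (-1.0, -1.0)]   # hypothetical two-point data set
y = [1, -1]
alpha, b = smo_pair_step(X, y, [0.0, 0.0], 0.0, 0, 1, C=10.0)
print(alpha, b)                  # [0.25, 0.25] 0.0
clipped, _ = smo_pair_step(X, y, [0.0, 0.0], 0.0, 0, 1, C=0.1)
print(clipped)                   # [0.1, 0.1]: the box constraint is active
```

Each such step cannot decrease $J$, since it exactly maximizes $J$ over the feasible segment for the chosen pair.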

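Cross-validation error of SVM
The leave-one-out cross-validation error can be bounded by a quantity that does not depend on the dimensionality of the feature space, but only on the number of support vectors:
$$\text{Leave-one-out CV error} \le \frac{\#\ \text{support vectors}}{\#\ \text{of training examples}}$$
(Leaving out a point that is not a support vector does not change the solution, and that point is still classified correctly.)

Summary
Max-margin decision boundary
Constrained convex optimization
Duality
The KKT conditions and the support vectors
Non-separable case and slack variables
The SMO algorithm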