Machine Learning. What is a good Decision Boundary? Support Vector Machines


Machine Learning 10-701/15-781, Spring 2010
Support Vector Machines
Eric Xing, Lecture 17, March 15, 2010
Reading: Chap. 6 & 7, C.B. book, and listed papers

What is a good Decision Boundary?
Consider a binary classification task with y = ±1 labels (not 0/1 as before). When the training examples are linearly separable, we can set the parameters of a linear classifier so that all the training examples are classified correctly. Many decision boundaries do this (generative classifiers, logistic regression, ...). Are all decision boundaries equally good? [Figure: linearly separable points from Class 1 and Class 2, with several candidate boundaries.]

What is a good Decision Boundary? [Figure: the same data with two different separating boundaries.]

Not All Decision Boundaries Are Equal!
Why may we have such boundaries? Irregular distributions, imbalanced training sizes, outliers. [Figure: boundaries skewed toward one class by such effects.]

Classification and Margin
Parameterizing the decision boundary: let $w$ denote a vector orthogonal to the decision boundary, and $b$ denote a scalar "offset" term; then we can write the decision boundary as

$$ w^T x + b = 0 $$

[Figure: two classes separated by the hyperplane, with distances $d_-$ and $d_+$ to the closest points on each side.]

Margin: $w^T x_i + b > +c$ for all $x_i$ in class 2, and $w^T x_i + b < -c$ for all $x_i$ in class 1. Or, more compactly: $(w^T x_i + b)\, y_i > c$. The margin between any two points is $m = d_- + d_+$.
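As a quick illustration of this geometry (a minimal sketch; the hyperplane and points below are made up, not from the slides), the quantity $(w^T x + b)/\|w\|$ is each point's signed distance to the boundary, with the sign indicating the side:

```python
import numpy as np

# Hypothetical hyperplane parameters and points (illustration only).
w = np.array([2.0, 1.0])   # vector orthogonal to the decision boundary
b = -1.0                   # scalar "offset" term
X = np.array([[1.0, 2.0],  # a point on the class-2 side
              [0.0, 0.0]]) # a point on the class-1 side

# Signed distance of each point to the hyperplane w^T x + b = 0;
# the sign tells the side, the magnitude is the distance d_- or d_+.
dist = (X @ w + b) / np.linalg.norm(w)
print(dist)  # positive for class 2, negative for class 1
```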

Maximum Margin Classification
The "minimum" permissible margin is

$$ m = \frac{2c}{\|w\|} $$

Here is our maximum margin classification problem:

$$ \max_{w,b} \; \frac{2c}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge c, \; \forall i $$

Maximum Margin Classification, cont'd.
But note that the magnitude of $c$ merely scales $w$ and $b$, and does not change the classification boundary at all! (Why?) So we instead work on this cleaner problem:

$$ \max_{w,b} \; \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \; \forall i $$

The solution to this leads to the famous Support Vector Machines, believed by many to be the best "off-the-shelf" supervised learning algorithm.

Support vector machine
A convex quadratic programming problem with linear constraints:

$$ \max_{w,b} \; \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \; \forall i $$

The attained margin is now given by $1/\|w\|$. Only a few of the classification constraints are relevant: the support vectors.

Constrained optimization: we can directly solve this using commercial quadratic programming (QP) code (a minimal sketch of that route follows). But we want to take a more careful investigation of Lagrangian duality, and the solution of the above in its dual form: deeper insight (support vectors, kernels) and a more efficient algorithm.
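A hedged sketch of the direct QP route, using the open-source cvxpy modeling library (my choice; the slides only say "QP code") and the equivalent $\min \frac{1}{2}\|w\|^2$ form derived a few slides later, on a made-up separable toy set:

```python
import numpy as np
import cvxpy as cp

# Tiny linearly separable toy data (hypothetical, for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Maximizing 1/||w|| is equivalent to minimizing (1/2)||w||^2,
# a convex QP with the linear margin constraints.
w = cp.Variable(2)
b = cp.Variable()
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```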

Digression to Lagrangian Duality
The Primal Problem:

$$ \min_w f(w) \quad \text{s.t.} \quad g_i(w) \le 0, \; i = 1, \dots, k; \qquad h_i(w) = 0, \; i = 1, \dots, l $$

The generalized Lagrangian:

$$ L(w, \alpha, \beta) = f(w) + \sum_{i=1}^k \alpha_i g_i(w) + \sum_{i=1}^l \beta_i h_i(w) $$

where the $\alpha$'s ($\alpha_i \ge 0$) and $\beta$'s are called the Lagrangian multipliers.

Lemma:

$$ \max_{\alpha, \beta:\, \alpha_i \ge 0} L(w, \alpha, \beta) = \begin{cases} f(w) & \text{if } w \text{ satisfies the primal constraints} \\ \infty & \text{otherwise} \end{cases} $$

A re-written primal: $\min_w \max_{\alpha, \beta:\, \alpha_i \ge 0} L(w, \alpha, \beta)$.

Lagrangian Duality, cont.
Recall the re-written primal problem. The Dual Problem: $\max_{\alpha, \beta:\, \alpha_i \ge 0} \min_w L(w, \alpha, \beta)$.

Theorem (weak duality):

$$ d^* = \max_{\alpha, \beta:\, \alpha_i \ge 0} \min_w L(w, \alpha, \beta) \;\le\; \min_w \max_{\alpha, \beta:\, \alpha_i \ge 0} L(w, \alpha, \beta) = p^* $$

Theorem (strong duality): iff there exists a saddle point of $L(w, \alpha, \beta)$, we have $d^* = p^*$.

A sketch of strong and weak duality
Now, ignoring $h(x)$ for simplicity, let's look at what's happening graphically in the duality theorems:

$$ d^* = \max_{\alpha \ge 0} \min_w \left[ f(w) + \alpha^T g(w) \right] \;\le\; \min_w \max_{\alpha \ge 0} \left[ f(w) + \alpha^T g(w) \right] = p^* $$

[Three figure slides sketch this inequality graphically in the $(g(w), f(w))$ plane.]
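The slides state weak duality without proof; as a supplement (not on the slides), the one-step argument written out in LaTeX is just that a min never exceeds an evaluation, which never exceeds a max:

```latex
% For every w' and every (\alpha, \beta) with \alpha \ge 0:
%   \min_{w} L(w,\alpha,\beta) \le L(w',\alpha,\beta)
%     \le \max_{\alpha',\beta':\,\alpha' \ge 0} L(w',\alpha',\beta').
% The left end does not depend on w', and the right end does not depend
% on (\alpha,\beta), so we may take the max over (\alpha,\beta) on the
% left and the min over w' on the right, preserving the inequality:
\[
  d^* \;=\; \max_{\alpha,\beta:\,\alpha_i \ge 0}\;\min_{w} L(w,\alpha,\beta)
  \;\le\;
  \min_{w}\;\max_{\alpha,\beta:\,\alpha_i \ge 0} L(w,\alpha,\beta) \;=\; p^* .
\]
```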

The KKT conditions
If there exists some saddle point of $L$, then the saddle point satisfies the following "Karush-Kuhn-Tucker" (KKT) conditions:

$$ \frac{\partial}{\partial w_i} L(w, \alpha, \beta) = 0, \quad i = 1, \dots, n $$
$$ \frac{\partial}{\partial \beta_i} L(w, \alpha, \beta) = 0, \quad i = 1, \dots, l $$
$$ \alpha_i g_i(w) = 0, \quad i = 1, \dots, k \qquad \text{(complementary slackness)} $$
$$ g_i(w) \le 0, \quad i = 1, \dots, k \qquad \text{(primal feasibility)} $$
$$ \alpha_i \ge 0, \quad i = 1, \dots, k \qquad \text{(dual feasibility)} $$

Theorem: if $w^*$, $\alpha^*$, and $\beta^*$ satisfy the KKT conditions, then they are also a solution to the primal and the dual problems.

Solving the optimal margin classifier
Recall our optimization problem:

$$ \max_{w,b} \; \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \; \forall i $$

This is equivalent to

$$ \min_{w,b} \; \frac{1}{2} w^T w \quad \text{s.t.} \quad 1 - y_i (w^T x_i + b) \le 0, \; \forall i \qquad (*) $$

Write the Lagrangian:

$$ L(w, b, \alpha) = \frac{1}{2} w^T w - \sum_i \alpha_i \left[ y_i (w^T x_i + b) - 1 \right] $$

Recall that $(*)$ can be reformulated as $\min_{w,b} \max_{\alpha_i \ge 0} L(w, b, \alpha)$. Now we solve its dual problem: $\max_{\alpha_i \ge 0} \min_{w,b} L(w, b, \alpha)$.

The Dual Problem
We minimize $L$ with respect to $w$ and $b$ first. Setting the gradients to zero:

$$ \nabla_w L(w, b, \alpha) = w - \sum_i \alpha_i y_i x_i = 0 \quad (*) \qquad \frac{\partial}{\partial b} L(w, b, \alpha) = -\sum_i \alpha_i y_i = 0 \quad (**) $$

Note that $(*)$ implies $w = \sum_i \alpha_i y_i x_i$. Plugging this back into $L$, and using $(**)$, we have:

$$ L(w, b, \alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j $$

The Dual Problem, cont.
Now we have the following dual optimization problem:

$$ \max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j \quad \text{s.t.} \quad \alpha_i \ge 0, \; \forall i; \qquad \sum_i \alpha_i y_i = 0 $$

This is, again, a quadratic programming problem. A global maximum of $\alpha$ can always be found. But what's the big deal?? Note two things:
1. $w$ can be recovered by $w = \sum_i \alpha_i y_i x_i$ (see next slide).
2. The "kernel" $x_i^T x_j$ (more later).
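For concreteness, a hedged sketch of solving this dual with cvxpy (again my choice of tool, on the same made-up toy data), then recovering $w$ from $w = \sum_i \alpha_i y_i x_i$ and $b$ from a point on the margin:

```python
import numpy as np
import cvxpy as cp

# Same hypothetical toy data as in the primal sketch.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)

alpha = cp.Variable(m)
# The quadratic term: sum_ij a_i a_j y_i y_j x_i^T x_j = ||X^T (y * a)||^2,
# which keeps the objective concave in a DCP-recognizable way.
quad = cp.sum_squares(X.T @ cp.multiply(y, alpha))
objective = cp.Maximize(cp.sum(alpha) - 0.5 * quad)
constraints = [alpha >= 0, y @ alpha == 0]
cp.Problem(objective, constraints).solve()

# Recover w = sum_i alpha_i y_i x_i, and b from any support vector,
# using y_sv (w^T x_sv + b) = 1 on the margin.
a = alpha.value
w = X.T @ (a * y)
sv = int(np.argmax(a))          # an index with alpha_i > 0
b = y[sv] - X[sv] @ w
print(np.round(a, 3), w, b)     # alpha should come out sparse
```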

Support vectors
Note the KKT condition: only a few $\alpha_i$'s can be nonzero!!

$$ \alpha_i g_i(w) = 0, \quad i = 1, \dots, k $$

[Figure: ten training points from Class 1 and Class 2; only the three on the margin have nonzero multipliers, e.g. $\alpha_8 = 0.6$, $\alpha_6 = 1.4$, $\alpha_{10} = 0.8$; all other $\alpha_i = 0$.]

Call the training data points whose $\alpha_i$'s are nonzero the support vectors (SV).

Support vector machines
Once we have the Lagrange multipliers $\{\alpha_i\}$, we can reconstruct the parameter vector $w$ as a weighted combination of the training examples:

$$ w = \sum_{i \in SV} \alpha_i y_i x_i $$

For testing with a new data point $z$, compute

$$ w^T z + b = \sum_{i \in SV} \alpha_i y_i \, (x_i^T z) + b $$

and classify $z$ as class 1 if the sum is positive, and class 2 otherwise. Note: $w$ need not be formed explicitly.

Interpretation of support vector machines
The optimal $w$ is a linear combination of a small number of data points. This "sparse" representation can be viewed as data compression, as in the construction of a kNN classifier. To compute the weights $\{\alpha_i\}$, and to use support vector machines, we need to specify only the inner products (or kernel) between the examples, $x_i^T x_j$. We make decisions by comparing each new example $z$ with only the support vectors:

$$ y^* = \operatorname{sign}\left( \sum_{i \in SV} \alpha_i y_i \, (x_i^T z) + b \right) $$

Non-linearly Separable Problems
[Figure: overlapping Class 1 and Class 2 points; some fall on the wrong side of the margin.]
We allow "errors" $\xi_i$ in classification; they are based on the output of the discriminant function $w^T x + b$, and $\xi_i$ approximates the number of misclassified samples.
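A minimal sketch of this decision rule (the function name and arguments are hypothetical, not from the slides); note that $w$ never appears, only inner products with the support vectors:

```python
import numpy as np

def svm_predict(z, sv_x, sv_y, sv_alpha, b, kernel=np.dot):
    """Classify a new example z using only the support vectors:
    y* = sign( sum_{i in SV} alpha_i y_i K(x_i, z) + b ).
    kernel defaults to the plain inner product; swapping in another
    kernel function is the hook the slides allude to with "more later"."""
    score = sum(a * yi * kernel(xi, z)
                for a, yi, xi in zip(sv_alpha, sv_y, sv_x))
    return int(np.sign(score + b))
```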

Soft Margin Hyperplane
Now we have a slightly different optimization problem:

$$ \min_{w,b} \; \frac{1}{2} w^T w + C \sum_i \xi_i \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \; \forall i $$

The $\xi_i$ are "slack variables" in the optimization. Note that $\xi_i = 0$ if there is no error for $x_i$, and $\sum_i \xi_i$ is an upper bound on the number of errors. $C$ is a tradeoff parameter between error and margin.

The Optimization Problem
The dual of this new constrained optimization problem is

$$ \max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \; \forall i; \qquad \sum_i \alpha_i y_i = 0 $$

This is very similar to the optimization problem in the linearly separable case, except that there is now an upper bound $C$ on $\alpha_i$. Once again, a QP solver can be used to find $\alpha$.
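A quick illustration of the role of $C$, as a sketch using scikit-learn's SVC (my choice of solver, not the lecture's) on made-up overlapping data:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical overlapping two-class data (not from the slides).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),
               rng.normal(1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# C trades margin width against slack: small C tolerates many margin
# violations (wide margin, many support vectors); large C penalizes them.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:<6} support vectors: {clf.support_.size}")
```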

The SMO algorithm
Consider solving an unconstrained optimization problem $\max_\alpha J(\alpha)$. We've already seen three optimization algorithms in this course. Coordinate ascent is another: maximize over one coordinate at a time, holding the others fixed.

Coordinate ascent
[Figure: contour plot of a quadratic objective; coordinate ascent moves in axis-parallel steps toward the maximum.]

Sequential minimal optimization
Constrained optimization:

$$ \max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \; \forall i; \qquad \sum_i \alpha_i y_i = 0 $$

Question: can we do coordinate ascent along one direction at a time, i.e., hold all $\alpha_{[-i]}$ fixed and update $\alpha_i$? (No: the equality constraint $\sum_i \alpha_i y_i = 0$ pins down $\alpha_i$ once all the others are fixed, so at least two coordinates must move together.)

The SMO algorithm
Repeat till convergence:
1. Select some pair $\alpha_i$ and $\alpha_j$ to update next (using a heuristic that tries to pick the two that will allow us to make the biggest progress towards the global maximum).
2. Re-optimize $J(\alpha)$ with respect to $\alpha_i$ and $\alpha_j$, while holding all the other $\alpha_k$'s ($k \ne i, j$) fixed.

Will this procedure converge? (A minimal sketch of the pair update appears below.)
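Below is a minimal sketch of that pair update, in the style of the "simplified SMO" often used for teaching: it picks the second index at random instead of with the heuristic in step 1, so it illustrates the update rather than an efficient solver. All names are mine, not the lecture's.

```python
import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, seed=0):
    """Simplified SMO for the linear-kernel soft-margin dual (a sketch)."""
    rng = np.random.default_rng(seed)
    m = len(y)
    K = X @ X.T                     # Gram matrix of inner products
    alpha = np.zeros(m)
    b = 0.0
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(m):
            Ei = (alpha * y) @ K[:, i] + b - y[i]   # prediction error on x_i
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = rng.choice([k for k in range(m) if k != i])
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Box ends keeping alpha_j in [0, C] while the pair's
                # constraint alpha_i y_i + alpha_j y_j stays fixed.
                if y[i] != y[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if L == H or eta >= 0:
                    continue
                # One-dimensional quadratic maximization, then clipping.
                alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj_old) < 1e-5:
                    continue
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                # Recompute b from the KKT conditions on the updated pair.
                b1 = b - Ei - y[i] * (alpha[i] - ai_old) * K[i, i] \
                     - y[j] * (alpha[j] - aj_old) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai_old) * K[i, j] \
                     - y[j] * (alpha[j] - aj_old) * K[j, j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```

Production implementations keep the same clipped pair update but add the working-set heuristic and kernel caching; the update itself is exactly the one-dimensional constrained maximization described on the convergence slides.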

Convergence of SMO
Let's hold $\alpha_3, \dots, \alpha_m$ fixed and re-optimize $J$ w.r.t. $\alpha_1$ and $\alpha_2$:

$$ \max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j \quad \text{s.t.} \quad 0 \le \alpha_k \le C, \; \forall k; \qquad \sum_i \alpha_i y_i = 0 $$

The KKT conditions of this subproblem provide the convergence check.

Convergence of SMO, cont.
The constraints: $\alpha_1 y_1 + \alpha_2 y_2 = -\sum_{k=3}^m \alpha_k y_k = \zeta$ (a constant), with $0 \le \alpha_1, \alpha_2 \le C$, so the pair $(\alpha_1, \alpha_2)$ lives on a line segment inside the box $[0, C]^2$. The objective: substituting $\alpha_1 = (\zeta - \alpha_2 y_2)\, y_1$ makes $J$ a one-dimensional quadratic in $\alpha_2$. Constrained optimization: maximize this quadratic analytically, then clip $\alpha_2$ to the feasible segment.

Cross-validation error of SVM
The leave-one-out cross-validation error does not depend on the dimensionality of the feature space, but only on the number of support vectors!

$$ \text{Leave-one-out CV error} \;\le\; \frac{\#\,\text{support vectors}}{\#\,\text{training examples}} $$

Summary
- Max-margin decision boundary
- Constrained convex optimization
- Duality
- The KKT conditions and the support vectors
- Non-separable case and slack variables
- The SMO algorithm
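A one-liner illustration of the bound (made-up data; scikit-learn's fitted SVC exposes the support-vector indices, so the slide's ratio is immediate):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data, as in the earlier soft-margin sketch.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),
               rng.normal(1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
# The slide's bound: LOO CV error <= (#SV) / (#training examples).
print("LOO CV error bound:", clf.support_.size / len(y))
```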