Machine Learning. Support Vector Machines. Eric Xing, Fall 2015. Lecture 9, October 8, 2015


Machine Learning 10-701, Fall 2015. Support Vector Machines. Eric Xing. Lecture 9, October 8, 2015. Reading: Chap. 6 & 7, C.B. book, and listed papers.

What is a good Decision Boundary? Consider a binary classification task with y = ±1 labels (not 0/1 as before). When the training examples are linearly separable, we can set the parameters of a linear classifier so that all the training examples are classified correctly. Many decision boundaries! Generative classifiers, logistic regressions... Are all decision boundaries equally good? (Figure: two classes, Class 1 and Class 2, with several candidate boundaries.)

Not All Decision Boundaries Are Equal! Why may we have such boundaries? Irregular distribution. Imbalanced training sizes. Outliers.

Classification and Margin. Parameterizing the decision boundary: let $\mathbf{w}$ denote a vector orthogonal to the decision boundary, and $b$ denote a scalar "offset" term; then we can write the decision boundary as $\mathbf{w}^T\mathbf{x} + b = 0$. (Figure: Class 1 and Class 2 points on either side of the boundary, at distances $d^-$ and $d^+$.)

Classification and Margin. Parameterizing the decision boundary: let $\mathbf{w}$ denote a vector orthogonal to the decision boundary and $b$ a scalar "offset" term; then the decision boundary is $\mathbf{w}^T\mathbf{x} + b = 0$. Margin: $(\mathbf{w}^T\mathbf{x}_i + b)/\|\mathbf{w}\| > +c/\|\mathbf{w}\|$ for all $\mathbf{x}_i$ in class 2, and $(\mathbf{w}^T\mathbf{x}_i + b)/\|\mathbf{w}\| < -c/\|\mathbf{w}\|$ for all $\mathbf{x}_i$ in class 1. Or more compactly: $y_i(\mathbf{w}^T\mathbf{x}_i + b)/\|\mathbf{w}\| > c/\|\mathbf{w}\|$. The margin between any two points: $m = d^- + d^+ = 2c/\|\mathbf{w}\|$.

Maximum Margin Classification. The minimum permissible margin is $m^* = d^{+*} + d^{-*} = 2c/\|\mathbf{w}\|$. Here is our maximum margin classification problem: $\max_{\mathbf{w}} \; \frac{2c}{\|\mathbf{w}\|}$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge c, \; \forall i$.

Maximum Margin Classification, cont'd. The optimization problem: $\max_{\mathbf{w},b} \; \frac{2c}{\|\mathbf{w}\|}$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge c, \; \forall i$. But note that the magnitude of $c$ merely scales $\mathbf{w}$ and $b$, and does not change the classification boundary at all! (Why?) So we instead work on this cleaner problem: $\max_{\mathbf{w},b} \; \frac{1}{\|\mathbf{w}\|}$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1, \; \forall i$. The solution to this leads to the famous Support Vector Machines, believed by many to be the best "off-the-shelf" supervised learning algorithm.

Support vector machine. A convex quadratic programming problem with linear constraints: $\max_{\mathbf{w},b} \; \frac{1}{\|\mathbf{w}\|}$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1, \; \forall i$. The attained margin is now given by $1/\|\mathbf{w}\|$. Only a few of the classification constraints are relevant: the support vectors. (Figure: the separating hyperplane with margins at distances $d^+$ and $d^-$.) Constrained optimization: we can directly solve this using commercial quadratic programming (QP) code. But we want to take a more careful investigation of Lagrange duality, and the solution of the above in its dual form. Deeper insight: support vectors, kernels. More efficient algorithms.
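The slides defer to commercial QP code; as a concrete illustration, here is a minimal sketch using the open-source cvxopt solver (an assumption, not the course's tooling), solving the equivalent form $\min_{\mathbf{w},b} \frac{1}{2}\|\mathbf{w}\|^2$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$:

```python
# Sketch: hard-margin primal SVM as a QP, using cvxopt (assumed installed).
# Variables z = [w; b]; minimize (1/2) w^T w  s.t.  y_i (w^T x_i + b) >= 1.
import numpy as np
from cvxopt import matrix, solvers

def primal_svm(X, y):
    """X: (n, d) data, y: (n,) labels in {-1, +1}. Returns (w, b)."""
    n, d = X.shape
    P = np.zeros((d + 1, d + 1))
    P[:d, :d] = np.eye(d)            # quadratic term acts on w only
    P[d, d] = 1e-8                   # tiny ridge on b so P stays numerically PSD
    q = np.zeros(d + 1)
    # y_i (w^T x_i + b) >= 1  <=>  -y_i [x_i; 1]^T z <= -1
    G = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    h = -np.ones(n)
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    z = np.array(sol['x']).ravel()
    return z[:d], z[d]
```

Points whose constraints are active at the solution are exactly the support vectors discussed below.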

Digression to Lagrangian Duality: The Primal Problem. Primal: $\min_{w} f(w)$ s.t. $g_i(w) \le 0, \; i = 1, \dots, k$, and $h_i(w) = 0, \; i = 1, \dots, l$. The generalized Lagrangian: $\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w)$; the $\alpha$'s ($\alpha_i \ge 0$) and $\beta$'s are called the Lagrangian multipliers. Lemma: $\max_{\alpha, \beta:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta) = f(w)$ if $w$ satisfies the primal constraints, and $\infty$ otherwise. A re-written primal: $\min_w \max_{\alpha, \beta:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)$.

Lagrangian Duality, cont. Recall the primal problem: $p^* = \min_w \max_{\alpha, \beta:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)$. The dual problem: $d^* = \max_{\alpha, \beta:\, \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta)$. Theorem (weak duality): $d^* = \max_{\alpha, \beta:\, \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta) \;\le\; \min_w \max_{\alpha, \beta:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta) = p^*$. Theorem (strong duality): iff there exists a saddle point of $\mathcal{L}(w, \alpha, \beta)$, we have $d^* = p^*$.

A sketch of strong and weak duality. Now, ignoring $h(x)$ for simplicity, let's look at what's happening graphically in the duality theorems: $d^* = \max_{\alpha \ge 0} \min_w \big[ f(w) + \alpha\, g(w) \big] \;\le\; \min_w \max_{\alpha \ge 0} \big[ f(w) + \alpha\, g(w) \big] = p^*$. (Figure: the $(g(w), f(w))$ region with supporting lines illustrating $d^*$ and $p^*$.)


The KKT conditions. If there exists some saddle point of $\mathcal{L}$, then the saddle point satisfies the following Karush-Kuhn-Tucker (KKT) conditions:
$\frac{\partial}{\partial w_i} \mathcal{L}(w, \alpha, \beta) = 0, \; i = 1, \dots, n$
$\frac{\partial}{\partial \beta_i} \mathcal{L}(w, \alpha, \beta) = 0, \; i = 1, \dots, l$
$\alpha_i g_i(w) = 0, \; i = 1, \dots, k$ (complementary slackness)
$g_i(w) \le 0, \; i = 1, \dots, k$ (primal feasibility)
$\alpha_i \ge 0, \; i = 1, \dots, k$ (dual feasibility)
Theorem: if $w^*$, $\alpha^*$ and $\beta^*$ satisfy the KKT conditions, then they are also a solution to the primal and the dual problems.

Solving the optimal margin classifier. Recall our opt problem: $\max_{\mathbf{w},b} \; \frac{1}{\|\mathbf{w}\|}$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1, \; \forall i$. This is equivalent to $\min_{\mathbf{w},b} \; \frac{1}{2}\mathbf{w}^T\mathbf{w}$ s.t. $1 - y_i(\mathbf{w}^T\mathbf{x}_i + b) \le 0, \; \forall i$ (*). Write the Lagrangian: $\mathcal{L}(\mathbf{w}, b, \alpha) = \frac{1}{2}\mathbf{w}^T\mathbf{w} - \sum_i \alpha_i \big[ y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \big]$. Recall that (*) can be reformulated as $\min_{\mathbf{w},b} \max_{\alpha_i \ge 0} \mathcal{L}(\mathbf{w}, b, \alpha)$. Now we solve its dual problem: $\max_{\alpha_i \ge 0} \min_{\mathbf{w},b} \mathcal{L}(\mathbf{w}, b, \alpha)$.

The Dual Problem. $\max_{\alpha_i \ge 0} \min_{\mathbf{w},b} \mathcal{L}(\mathbf{w}, b, \alpha)$. We minimize $\mathcal{L}$ with respect to $\mathbf{w}$ and $b$ first: setting $\frac{\partial}{\partial \mathbf{w}} \mathcal{L} = 0$ implies $\mathbf{w}^* = \sum_i \alpha_i y_i \mathbf{x}_i$ (*), and setting $\frac{\partial}{\partial b} \mathcal{L} = 0$ implies $\sum_i \alpha_i y_i = 0$ (**). Plugging (*) back into $\mathcal{L}$ and using (**), we have: $\mathcal{L}(\mathbf{w}, b, \alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$.

The Dual Problem, cont. Now we have the following dual opt problem: $\max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$ s.t. $\alpha_i \ge 0, \; i = 1, \dots, k$, and $\sum_i \alpha_i y_i = 0$. This is, again, a quadratic programming problem. A global maximum of $\alpha_i$ can always be found. But what's the big deal?? Note two things: 1. $\mathbf{w}$ can be recovered by $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ (see next). 2. The "kernel" $\mathbf{x}_i^T \mathbf{x}_j$ (more later).
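A minimal sketch of this dual QP, again with cvxopt (an assumption; any QP solver works). cvxopt minimizes, so we negate $J(\alpha)$; the equality $\sum_i \alpha_i y_i = 0$ becomes the `A`, `b` pair:

```python
# Sketch: hard-margin dual SVM. Minimize (1/2) a^T Q a - 1^T a
# s.t. a >= 0 and y^T a = 0, where Q_ij = y_i y_j x_i^T x_j.
import numpy as np
from cvxopt import matrix, solvers

def dual_svm(X, y):
    """X: (n, d), y: (n,) in {-1, +1}. Returns alpha, shape (n,)."""
    n = X.shape[0]
    Yx = y[:, None] * X
    Q = Yx @ Yx.T                                # Q_ij = y_i y_j x_i^T x_j
    P = matrix(Q + 1e-8 * np.eye(n))             # tiny ridge for numerical PSD
    q = matrix(-np.ones(n))
    G = matrix(-np.eye(n))                       # -alpha <= 0
    h = matrix(np.zeros(n))
    A = matrix(y.astype(float)[None, :])         # y^T alpha = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol['x']).ravel()
```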

Support vectors. Note the KKT condition: only a few $\alpha_i$'s can be nonzero!! $\alpha_i g_i(w) = 0, \; i = 1, \dots, k$. Call the training data points whose $\alpha_i$'s are nonzero the support vectors (SV). (Figure: a separable two-class data set where most points have $\alpha_i = 0$ and only the points on the margin carry nonzero weights, e.g. $\alpha_1 = 0.8$, $\alpha_6 = 1.4$, $\alpha_8 = 0.6$.)

Support vector machines. Once we have the Lagrange multipliers $\{\alpha_i\}$, we can reconstruct the parameter vector $\mathbf{w}$ as a weighted combination of the training examples: $\mathbf{w} = \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i$. For testing with a new data point $\mathbf{z}$: compute $\mathbf{w}^T \mathbf{z} + b = \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i^T \mathbf{z} + b$, and classify $\mathbf{z}$ as class 1 if the sum is positive, and class 2 otherwise. Note: $\mathbf{w}$ need not be formed explicitly.
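Continuing the hypothetical `dual_svm` sketch above, prediction can use only the support vectors; `b` is recovered from any margin point, where $y_i(\mathbf{w}^T\mathbf{x}_i + b) = 1$:

```python
# Sketch: predict from the dual solution without forming w explicitly.
import numpy as np

def predict(X_train, y_train, alpha, z, tol=1e-6):
    sv = alpha > tol                           # support-vector mask
    # b from any support vector i: b = y_i - sum_j alpha_j y_j x_j^T x_i
    i = np.argmax(sv)                          # index of the first support vector
    b = y_train[i] - np.sum(alpha[sv] * y_train[sv] * (X_train[sv] @ X_train[i]))
    score = np.sum(alpha[sv] * y_train[sv] * (X_train[sv] @ z)) + b
    return 1 if score > 0 else -1
```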

Interpretation of support vector machines. The optimal $\mathbf{w}$ is a linear combination of a small number of data points. This "sparse" representation can be viewed as data compression, as in the construction of a kNN classifier. To compute the weights $\{\alpha_i\}$, and to use support vector machines, we need to specify only the inner products (or kernel) between the examples $\mathbf{x}_i^T \mathbf{x}_j$. We make decisions by comparing each new example $\mathbf{z}$ with only the support vectors: $y^* = \mathrm{sign}\big( \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i^T \mathbf{z} + b \big)$.

Non-linearly Separable Problems. (Figure: Class 1 and Class 2 with overlapping points.) We allow "error" $\xi_i$ in classification; it is based on the output of the discriminant function $\mathbf{w}^T\mathbf{x} + b$. $\sum_i \xi_i$ approximates the number of misclassified samples.

Non-linear Decision Boundary. So far we have only considered large-margin classifiers with a linear decision boundary. How to generalize it to become nonlinear? Key idea: transform $\mathbf{x}_i$ to a higher dimensional space to "make life easier". Input space: the space the points $\mathbf{x}_i$ are located in. Feature space: the space of $\phi(\mathbf{x}_i)$ after transformation. Why transform? Linear operation in the feature space is equivalent to non-linear operation in input space. Classification can become easier with a proper transformation. In the XOR problem, for example, adding a new feature $x_1 x_2$ makes the problem linearly separable (homework); see the sketch below.
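A quick illustrative sketch (mine, not the slides') of the XOR remark: with inputs in $\{-1,+1\}^2$ and labels $y = x_1 x_2$, no linear classifier on $(x_1, x_2)$ separates the classes, but the single added feature $x_1 x_2$ does:

```python
# Sketch: the XOR-style problem becomes linearly separable after adding x1*x2.
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = X[:, 0] * X[:, 1]            # labels +1, -1, -1, +1: not linearly separable

Phi = np.hstack([X, (X[:, 0] * X[:, 1])[:, None]])   # add the feature x1*x2
w = np.array([0.0, 0.0, 1.0])    # a separating hyperplane in feature space
assert np.all(y * (Phi @ w) > 0)  # every example on the correct side
```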

Non-linear Decision Boundary. (Slide figure.)

Transforming the Data. Input space to feature space, via $\phi(\cdot)$. Note: the feature space is of higher dimension than the input space in practice. (Figure: points mapped from input space to feature space.)

The Kernel Trick. Recall the SVM optimization problem: $\max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$ s.t. $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$. The data points only appear as inner products. As long as we can calculate the inner product in the feature space, we do not need the mapping explicitly. Many common geometric operations (angles, distances) can be expressed by inner products. Define the kernel function $K$ by $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)$.

An Example of feature mapping and kernels. Consider an input $\mathbf{x} = [x_1, x_2]$. Suppose $\phi(\cdot)$ is given as follows: $\phi([x_1, x_2]) = \big[ 1, \sqrt{2}\,x_1, \sqrt{2}\,x_2, x_1^2, x_2^2, \sqrt{2}\,x_1 x_2 \big]$. An inner product in the feature space is $\phi(\mathbf{x})^T \phi(\mathbf{x}') = (1 + \mathbf{x}^T \mathbf{x}')^2$. So, if we define the kernel function as $K(\mathbf{x}, \mathbf{x}') = (1 + \mathbf{x}^T \mathbf{x}')^2$, there is no need to carry out $\phi(\cdot)$ explicitly.
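A small numerical check (a sketch, not part of the slides) that this $\phi$ and $K$ agree:

```python
# Sketch: verify phi(x)^T phi(x') == (1 + x^T x')^2 for the mapping above.
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2, np.sqrt(2) * x1 * x2])

rng = np.random.default_rng(0)
x, xp = rng.normal(size=2), rng.normal(size=2)
assert np.isclose(phi(x) @ phi(xp), (1.0 + x @ xp) ** 2)
```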

More examples of kernel functions. Linear kernel (we've seen it): $K(\mathbf{x}, \mathbf{x}') = \mathbf{x}^T \mathbf{x}'$. Polynomial kernel (we just saw an example): $K(\mathbf{x}, \mathbf{x}') = (1 + \mathbf{x}^T \mathbf{x}')^p$, where $p = 2, 3, \dots$ To get the feature vectors, we concatenate all $p$th order polynomial terms of the components of $\mathbf{x}$ (weighted appropriately). Radial basis kernel: $K(\mathbf{x}, \mathbf{x}') = \exp\big( -\frac{1}{2} \|\mathbf{x} - \mathbf{x}'\|^2 \big)$. In this case the feature space consists of functions and results in a non-parametric classifier.
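For concreteness, a sketch of these three kernels as plain functions (the names are mine, not the slides'):

```python
# Sketch: the three kernels from the slide, operating on 1-D numpy vectors.
import numpy as np

def linear_kernel(x, xp):
    return x @ xp

def poly_kernel(x, xp, p=2):
    return (1.0 + x @ xp) ** p

def rbf_kernel(x, xp):
    return np.exp(-0.5 * np.sum((x - xp) ** 2))
```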

The essence of kernel. Feature mapping, but without paying a cost. E.g., for the polynomial kernel: how many dimensions have we got in the new space? How many operations does it take to compute $K(\mathbf{x}, \mathbf{x}')$? Kernel design: any principle? $K(\mathbf{x}, \mathbf{z})$ can be thought of as a similarity function between $\mathbf{x}$ and $\mathbf{z}$. This intuition can be well reflected in the following Gaussian function: $K(\mathbf{x}, \mathbf{z}) = \exp\big( -\frac{\|\mathbf{x} - \mathbf{z}\|^2}{2\sigma^2} \big)$. Similarly, one can easily come up with other $K(\cdot, \cdot)$ in the same spirit. Does this necessarily lead to a "legal" kernel? (In the above particular case, $K$ is a legal one; do you know how many dimensions the feature space has?)

Kernel matrix. Suppose for now that $K$ is indeed a valid kernel corresponding to some feature mapping $\phi$; then for $\mathbf{x}_1, \dots, \mathbf{x}_m$ we can compute an $m \times m$ matrix $K = \{K_{ij}\}$, where $K_{ij} = \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)$. This is called a kernel matrix! Now, if a kernel function is indeed a valid kernel, and its elements are dot-products in the transformed feature space, it must satisfy: Symmetry, $K = K^T$ (proof?). Positive semidefiniteness (proof?).
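A quick empirical sketch of both properties for the Gaussian kernel (eigenvalues of a kernel matrix should be non-negative up to round-off; this checks, it does not prove):

```python
# Sketch: build a Gaussian kernel matrix and check symmetry / PSD numerically.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq)                           # K_ij = exp(-||x_i - x_j||^2 / 2)

assert np.allclose(K, K.T)                      # symmetry
assert np.min(np.linalg.eigvalsh(K)) > -1e-10   # eigenvalues >= 0, up to round-off
```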

Mercer kernel. (Slide figure.)

SVM examples. (Slide figures.)

Examples for Non-Linear SVMs: Gaussian Kernel. (Slide figures.)

Soft Margin Hyperplane. Now we have a slightly different opt problem: $\min_{\mathbf{w},b} \; \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \sum_i \xi_i$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0, \; \forall i$. The $\xi_i$ are "slack variables" in optimization. Note that $\xi_i = 0$ if there is no error for $\mathbf{x}_i$; $\sum_i \xi_i$ is an upper bound on the number of errors. $C$: tradeoff parameter between error and margin.

The Optimization Problem. The dual of this new constrained optimization problem is: $\max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$ s.t. $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$. This is very similar to the optimization problem in the linearly separable case, except that there is an upper bound $C$ on $\alpha_i$ now. Once again, a QP solver can be used to find $\alpha$.
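In the hypothetical `dual_svm` sketch earlier, the only change for the soft margin is the box constraint $0 \le \alpha_i \le C$, i.e. a second block of inequality rows:

```python
# Sketch: inequality blocks for the soft-margin dual, 0 <= alpha <= C.
import numpy as np
from cvxopt import matrix

n, C = 100, 1.0                                   # example sizes (assumed)
G = matrix(np.vstack([-np.eye(n), np.eye(n)]))    # -alpha <= 0  and  alpha <= C
h = matrix(np.concatenate([np.zeros(n), C * np.ones(n)]))
# P, q, A, b are unchanged from the hard-margin dual sketch.
```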

The SMO algorithm. Consider solving the unconstrained opt problem: $\max_\alpha W(\alpha_1, \alpha_2, \dots, \alpha_m)$. We've already seen three opt algorithms! (Which?) Coordinate ascent: loop until convergence, and for each $i$ in turn set $\alpha_i := \arg\max_{\hat\alpha_i} W(\alpha_1, \dots, \alpha_{i-1}, \hat\alpha_i, \alpha_{i+1}, \dots, \alpha_m)$, holding all $\alpha_j$, $j \ne i$, fixed.

Coordinate ascent. (Figure: contours of a quadratic objective; coordinate ascent proceeds in axis-aligned steps toward the maximum.)
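A toy sketch (mine, not the slides') of coordinate ascent maximizing a concave quadratic $W(\alpha) = -\frac{1}{2}\alpha^T Q \alpha + c^T \alpha$, where each one-dimensional update has a closed form:

```python
# Sketch: coordinate ascent on W(a) = -0.5 a^T Q a + c^T a (Q symmetric PD).
import numpy as np

def coordinate_ascent(Q, c, n_iters=100):
    a = np.zeros_like(c)
    for _ in range(n_iters):
        for i in range(len(c)):
            # Maximize W over a[i] alone: a[i] = (c_i - sum_{j!=i} Q_ij a_j) / Q_ii
            a[i] = (c[i] - Q[i] @ a + Q[i, i] * a[i]) / Q[i, i]
    return a

Q = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([1.0, 1.0])
assert np.allclose(coordinate_ascent(Q, c), np.linalg.solve(Q, c), atol=1e-6)
```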

Sequential minimal optimization. Constrained optimization: $\max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$ s.t. $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$. Question: can we do coordinate ascent along one direction at a time, i.e., hold all $\alpha_{[-i]}$ fixed and update $\alpha_i$? No: the equality constraint determines $\alpha_i$ exactly once the others are fixed, so at least a pair of $\alpha$'s must be updated together.

The SMO algorithm. Repeat till convergence: 1. Select some pair $\alpha_i$ and $\alpha_j$ to update next (using a heuristic that tries to pick the two that will allow us to make the biggest progress towards the global maximum). 2. Re-optimize $J(\alpha)$ with respect to $\alpha_i$ and $\alpha_j$, while holding all the other $\alpha_k$'s ($k \ne i, j$) fixed. Will this procedure converge?

Convergence of SMO. Let's hold $\alpha_3, \dots, \alpha_m$ fixed and re-optimize $J$ w.r.t. $\alpha_1$ and $\alpha_2$: $\max_\alpha \; J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$, s.t. $0 \le \alpha_i \le C$ and $\sum_{i=1}^{m} \alpha_i y_i = 0$. KKT: the equality constraint then pins $\alpha_1 y_1 + \alpha_2 y_2 = -\sum_{i \ge 3} \alpha_i y_i = \zeta$, a constant.

Convergence of SMO. The constraints: $0 \le \alpha_1, \alpha_2 \le C$ and $\alpha_1 y_1 + \alpha_2 y_2 = \zeta$, so $\alpha_1 = (\zeta - \alpha_2 y_2)\, y_1$. The objective: substituting for $\alpha_1$ makes $J$ a one-dimensional quadratic in $\alpha_2$. Constrained opt: maximize the quadratic analytically, then clip $\alpha_2$ to the feasible segment allowed by the box constraints.
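A hedged sketch of the analytic pair update (following the standard simplified-SMO derivation, not code from the slides; variable names are mine). `K` is the Gram matrix and `E[i] = f(x_i) - y_i` the current prediction errors:

```python
# Sketch: one SMO pair update for (i, j), soft-margin parameter C.
import numpy as np

def smo_pair_update(alpha, y, K, E, i, j, C):
    """Re-optimize alpha[i], alpha[j] holding the rest fixed. Returns True if changed."""
    if y[i] != y[j]:                                # feasible segment [L, H] for alpha[j]
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = 2.0 * K[i, j] - K[i, i] - K[j, j]         # second derivative along the line
    if L >= H or eta >= 0:
        return False
    a_j = alpha[j] - y[j] * (E[i] - E[j]) / eta     # unconstrained 1-D optimum...
    a_j = np.clip(a_j, L, H)                        # ...clipped to the segment
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j) # keep sum_i alpha_i y_i fixed
    alpha[i], alpha[j] = a_i, a_j
    return True
```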

Cross-validation error of SVM. The leave-one-out cross-validation error does not depend on the dimensionality of the feature space, but only on the number of support vectors! $\text{Leave-one-out CV error} \le \dfrac{\#\,\text{support vectors}}{\#\,\text{of training examples}}$.

Summary. Max-margin decision boundary. Constrained convex optimization. Duality. The KKT conditions and the support vectors. Non-separable case and slack variables. The kernel trick. The SMO algorithm.