SVMs: Duality and Kernel Trick. SVMs as quadratic programs


SVMs: Duality and Kernel Trick
Machine Learning 10-601, Geoff Gordon, Miroslav Dudík [partly based on slides of Ziv Bar-Joseph]
http://www.cs.cmu.edu/~ggordon/10601/
November 18, 2009

SVMs as quadratic programs

Two optimization problems, for the separable and the non-separable case:

Separable:      \min_{w,b} \tfrac{1}{2}\|w\|^2 \quad \text{s.t. } y_i(w \cdot x_i + b) \ge 1 \text{ for all } i

Non-separable:  \min_{w,b,\varepsilon} \tfrac{1}{2}\|w\|^2 + C \sum_i \varepsilon_i \quad \text{s.t. } y_i(w \cdot x_i + b) \ge 1 - \varepsilon_i,\ \varepsilon_i \ge 0 \text{ for all } i
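Both problems are standard quadratic programs, so any off-the-shelf QP solver can handle them directly. Below is a minimal sketch of the non-separable (soft-margin) primal as a generic QP; the cvxopt solver, the toy dataset, and the value of C are all assumptions for illustration, not the lecture's code.

```python
# Sketch: soft-margin primal SVM as a generic QP (assumes cvxopt; toy data made up).
import numpy as np
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False

X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, d = X.shape
C = 1.0

# Variable z = [w (d entries), b (1 entry), eps (n entries)]
# Objective: (1/2)||w||^2 + C*sum(eps)  ->  (1/2) z'Pz + q'z
P = np.zeros((d + 1 + n, d + 1 + n))
P[:d, :d] = np.eye(d)
q = np.hstack([np.zeros(d + 1), C * np.ones(n)])

# Constraints as G z <= h:
#   y_i (w.x_i + b) >= 1 - eps_i   ->   -y_i x_i.w - y_i b - eps_i <= -1
#   eps_i >= 0                     ->   -eps_i <= 0
G = np.vstack([np.hstack([-y[:, None] * X, -y[:, None], -np.eye(n)]),
               np.hstack([np.zeros((n, d + 1)), -np.eye(n)])])
h = np.hstack([-np.ones(n), np.zeros(n)])

z = np.array(solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))['x']).ravel()
w, b = z[:d], z[d]
print('w =', w, 'b =', b)
```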

Dual for separable case

Start from the separable primal:

\min_{w,b} \tfrac{1}{2}\|w\|^2 \quad \text{s.t. } y_i(w \cdot x_i + b) \ge 1 \text{ for all } i

Introduce a Lagrange multiplier \alpha_i \ge 0 for each constraint:

L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i(w \cdot x_i + b) - 1 \right]

The primal problem is \min_{w,b} \max_{\alpha \ge 0} L(w, b, \alpha); the dual swaps the order: \max_{\alpha \ge 0} \min_{w,b} L(w, b, \alpha).

At the inner minimum of the dual, set the gradients of L with respect to w and b to zero:

\nabla_w L = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i, \qquad \partial L / \partial b = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0

Optimality of \alpha (complementary slackness): for all i,

\alpha_i \left[ y_i(w \cdot x_i + b) - 1 \right] = 0

Together with \alpha_i \ge 0 and the primal constraints y_i(w \cdot x_i + b) \ge 1, these are the KKT conditions.

Dual formulation

Substituting w = \sum_i \alpha_i y_i x_i back into L gives the dual for the separable case:

\max_\alpha \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t. } \alpha_i \ge 0,\ \sum_i \alpha_i y_i = 0

Its solution must satisfy the KKT conditions above.
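The dual is again a QP, now in \alpha rather than (w, b). A minimal sketch under the same assumptions as before (cvxopt, the same made-up toy data):

```python
# Sketch: separable dual QP. Minimize (1/2)a'Qa - 1'a with Q_ij = y_i y_j x_i.x_j.
import numpy as np
from cvxopt import matrix, solvers

X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

Q = (y[:, None] * X) @ (y[:, None] * X).T
sol = solvers.qp(matrix(Q), matrix(-np.ones(n)),
                 matrix(-np.eye(n)), matrix(np.zeros(n)),   # alpha_i >= 0
                 matrix(y[None, :]), matrix(0.0))           # sum_i alpha_i y_i = 0
alpha = np.array(sol['x']).ravel()

w = ((alpha * y)[:, None] * X).sum(axis=0)   # KKT: w = sum_i alpha_i y_i x_i
sv = alpha > 1e-6                            # support vectors: alpha_i > 0
b = float(np.mean(y[sv] - X[sv] @ w))        # from y_i (w.x_i + b) = 1 on the margin
```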

Dual SVM - interpretation

w = \sum_i \alpha_i y_i x_i

Only the x_i whose \alpha_i is not 0 contribute to w; these are the support vectors.

Dual SVM for linearly separable case

Our dual target function:

\max_\alpha \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)   <- dot product across all pairs of training samples

To evaluate a new sample x we need to compute:

w \cdot x + b = \sum_i \alpha_i y_i (x_i \cdot x) + b   <- dot product with all training samples

This might be too much work! e.g. when lifting x into high dimensions.
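In code, this form of the decision function touches the training data only through dot products, never through w itself; that is exactly the structure the kernel trick will exploit. A sketch reusing alpha and b from the dual solution above:

```python
def decision(x_new, X, y, alpha, b):
    # w.x + b rewritten as sum_i alpha_i y_i (x_i . x) + b:
    # only dot products with the training samples are needed
    return float((alpha * y * (X @ x_new)).sum() + b)
```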

Classifying in 1-d

Can an SVM correctly classify this data? What about this? And now?

[Figures: three 1-d datasets of labeled points on a line; the first can be split by a single threshold, the later ones cannot.]

Non-linear SVMs in 2-d

The original input space x can be mapped to some higher-dimensional feature space φ(x) where the training set is separable:

x = (x_1, x_2) \;\to\; φ(x) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)

If data is mapped into a sufficiently high dimension, then samples will in general be linearly separable: N data points are in general separable in a space of N-1 dimensions or more!

φ : x \to φ(x)

[Figure: points that are not separable in the (x_1, x_2) plane become separable in the feature space.]

This slide is courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt

Transformation of Inputs

Possible problems:
- High computation burden due to high dimensionality
- Many more parameters

SVM solves these two issues simultaneously:
- Kernel tricks for efficient computation
- Dual formulation only assigns parameters to samples, not features

[Figure: input space mapped by φ into feature space.]
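As a tiny illustration of the lifting idea (a hypothetical example, not from the slides): 1-d data whose label depends on |x| cannot be split by any single threshold, but the map x -> (x, x^2) makes it linearly separable in 2-d.

```python
import numpy as np

x = np.array([-3.0, -2.0, -0.5, 0.5, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, -1, -1])      # inner points +1, outer -1: no 1-d threshold works

phi = np.stack([x, x ** 2], axis=1)       # lift to 2-d: phi(x) = (x, x^2)
# In feature space the rule "x^2 < 2" is linear: w = (0, -1), b = 2
w, b = np.array([0.0, -1.0]), 2.0
print(np.sign(phi @ w + b) == y)          # all True: separable after lifting
```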

Polynomials of degree two

While working in higher dimensions is beneficial, it also increases our running time because of the dot product computation. However, there is a neat trick we can use in

\max_\alpha \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)

Consider all quadratic terms for x_1, x_2, \ldots, x_m (the \sqrt{2} factors will become clear on the next slide):

Φ(x) = (1,\ \sqrt{2}x_1, \ldots, \sqrt{2}x_m,\ x_1^2, \ldots, x_m^2,\ \sqrt{2}x_1x_2, \ldots, \sqrt{2}x_{m-1}x_m)

- m+1 linear terms (m is the number of features in each vector)
- m quadratic terms
- m(m-1)/2 pairwise terms

Dot product for polynomials of degree two

How many operations do we need for the dot product?

Φ(x) \cdot Φ(z) = 1 + \sum_i 2 x_i z_i + \sum_i x_i^2 z_i^2 + \sum_{i<j} 2 x_i x_j z_i z_j

m + m + m(m-1)/2 \approx m^2 operations

Polynomials of degree d in m variables

Original formulation:

\min_{w,b} \tfrac{1}{2}\|w\|^2 \quad \text{s.t. } y_i(w \cdot φ(x_i) + b) \ge 1

with one weight per feature of φ(x), so the number of parameters grows rapidly with d.

Dual formulation:

\max_\alpha \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (φ(x_i) \cdot φ(x_j))

with one \alpha_i per training sample, regardless of the dimension of φ(x).

The kernel trick

How many operations do we need for the dot product?

Φ(x) \cdot Φ(z) = 1 + \sum_i 2 x_i z_i + \sum_i x_i^2 z_i^2 + \sum_{i<j} 2 x_i x_j z_i z_j:   m + m + m(m-1)/2 \approx m^2 operations

There is structure to this dot product; we can do this faster!

Φ(x) \cdot Φ(z) = 1 + 2(x \cdot z) + (x \cdot z)^2 = (x \cdot z + 1)^2

We only need m operations! Note that to evaluate a new sample we are also using dot products, so we save there as well.

Where we are

Our dual target function:

\max_\alpha \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)

\approx m n^2 operations to evaluate all the coefficients (an m-dimensional dot product for each of the n^2 pairs of training samples).

To evaluate a new sample x we need to compute:

\sum_i \alpha_i y_i K(x_i, x) + b

\approx m r operations, where r is the number of support vectors (\alpha_i > 0).
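A quick numerical check of this identity (a hypothetical sketch; phi2 enumerates exactly the degree-two terms listed two slides back):

```python
import numpy as np
from itertools import combinations

def phi2(x):
    # Explicit degree-2 feature map: 1, sqrt(2)x_i, x_i^2, sqrt(2)x_i x_j (i<j)
    m = len(x)
    pairs = [np.sqrt(2) * x[i] * x[j] for i, j in combinations(range(m), 2)]
    return np.concatenate([[1.0], np.sqrt(2) * x, x ** 2, pairs])

rng = np.random.default_rng(0)
x, z = rng.normal(size=5), rng.normal(size=5)
print(np.isclose(phi2(x) @ phi2(z), (x @ z + 1) ** 2))  # True: O(m^2) map vs O(m) kernel
```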

Other kernels

Beyond polynomials there are other very high dimensional basis functions that can be made practical by finding the right kernel function:

- Radial Basis Function: K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)
- kernel functions for discrete objects: graphs, strings, etc.

Kernels measure similarity: K(x, z) = φ(x) \cdot φ(z)

Decision rule for a new sample x:

\mathrm{sign}\left(\sum_i \alpha_i y_i K(x_i, x) + b\right)
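For concreteness, a minimal sketch of the RBF kernel and the kernelized decision rule; sigma, the function names, and the reuse of alpha and b from the earlier dual sketch are all assumptions:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def predict(x_new, X, y, alpha, b, sigma=1.0):
    # sign(sum_i alpha_i y_i K(x_i, x) + b)
    k = rbf_kernel(X, x_new[None, :], sigma).ravel()
    return np.sign((alpha * y * k).sum() + b)
```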

[Figure: margins and support vectors; this slide is courtesy of Hastie-Tibshirani-Friedman, 2nd ed.]

Dual formulation for non-separable case

Dual target function:

\max_\alpha \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t. } 0 \le \alpha_i \le C,\ \sum_i \alpha_i y_i = 0

To evaluate a new sample x we need to compute:

w \cdot x + b = \sum_i \alpha_i y_i (x_i \cdot x) + b

The only difference from the separable case is that the \alpha_i's are now bounded above by C.
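Continuing the dual QP sketch from before (same assumed cvxopt setup, Q, y, n), the only change is the box constraint on alpha:

```python
# Soft-margin dual: identical to the separable dual sketch, plus alpha_i <= C.
C = 1.0                                            # arbitrary illustrative choice
G = np.vstack([-np.eye(n), np.eye(n)])             # -alpha <= 0  and  alpha <= C
h = np.hstack([np.zeros(n), C * np.ones(n)])
sol = solvers.qp(matrix(Q), matrix(-np.ones(n)), matrix(G), matrix(h),
                 matrix(y[None, :]), matrix(0.0))
alpha = np.array(sol['x']).ravel()                 # entries are now capped at C
```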

Why do SVMs work?

If we are using huge feature spaces (with kernels), how come we are not overfitting the data?
- We maximize the margin!
- We minimize loss + regularization

Software

A list of SVM implementations can be found at http://www.kernel-machines.org/software.html
- Some implementations, such as LIBSVM, can handle multiclass classification
- SVMLight is among the earliest implementations of SVM
- Several Matlab toolboxes for SVM are also available

Multi-class classification with SVMs

What if we have data from more than two classes?

Most common solution: One vs. all
- create a classifier for each class against all other data
- for a new point, run all classifiers and compare the margins of the selected classes

Note that this is not necessarily valid, since this is not what we trained the SVM for, but it often works well in practice. (A minimal sketch follows below.)

Applications of SVMs
- Bioinformatics
- Machine Vision
- Text Categorization
- Ranking (e.g. Google searches)
- Handwritten Character Recognition
- Time series analysis

Lots of very successful applications!
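A minimal one-vs-all sketch (hypothetical; train_binary_svm stands in for any of the trainers above and is assumed to return a decision function f(x) = w.x + b):

```python
import numpy as np

def one_vs_all(X, y, classes, train_binary_svm):
    # One binary SVM per class: that class relabeled +1, everything else -1
    return {c: train_binary_svm(X, np.where(y == c, 1.0, -1.0)) for c in classes}

def predict(x_new, models):
    # Pick the class whose classifier gives the largest signed margin
    return max(models, key=lambda c: models[c](x_new))
```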

Handwritten digit recognition

[Figure: example handwritten digits.]

Important points
- Difference between regression classifiers and SVMs
- Maximum margin principle
- Target function for SVMs
- Linearly separable and non-separable cases
- Dual formulation of SVMs
- Kernel trick and computational complexity