UVA CS 6316 Fall 2015 Graduate: Machine Learning. Lecture 9: Support Vector Machine (Cont. Revised Advanced Version)


UVA CS 6316 Fall 2015 Graduate: Machine Learning
Lecture 9: Support Vector Machine (Cont. Revised Advanced Version)
Dr. Yanjun Qi, University of Virginia, Department of Computer Science
ATT: there exist some inconsistencies of math notation across slides; notations in each slide are mostly self-consistent.

Where are we? Five major sections of this course:
- Regression (supervised)
- Classification (supervised)
- Unsupervised models
- Learning theory
- Graphical models

Where are we? Three major sections for classification. We can divide the large variety of classification approaches into roughly three major types:
1. Discriminative: directly estimate a decision rule/boundary, e.g., support vector machine, decision tree
2. Generative: build a generative statistical model, e.g., Bayesian networks
3. Instance-based classifiers: use observations directly (no models), e.g., K nearest neighbors

A dataset for binary classification. Output as binary class label: 1 or -1.
- Data/points/instances/examples/samples/records: [rows]
- Features/attributes/dimensions/independent variables/covariates/predictors/regressors: [columns, except the last]
- Target/outcome/response/label/dependent variable: the special column to be predicted [last column]

Max margin classifiers. Instead of fitting all points, focus on boundary points. Learn a boundary that leads to the largest margin from points on both sides. Why? It is intuitive and makes sense; there is some theoretical support; and it works well in practice.

When linearly separable: the decision boundary should be as far away from the data of both classes as possible. w is a p-dimensional vector; b is a scalar.

Today: Support Vector Machine (SVM)
- History of SVM
- Large-margin linear classifier
- Define margin (M) in terms of model parameters
- Optimization to learn model parameters (w, b)
- Non-linearly separable case
- Optimization with dual form
- Non-linear decision boundary
- Practical guide

Optimization step, i.e. learning the optimal parameters for SVM. The decision boundary is w^T x + b = 0; predict class +1 where w^T x + b ≥ +1 and class -1 where w^T x + b ≤ -1. The margin between the planes w^T x + b = +1 and w^T x + b = -1 is M = 2 / √(w^T w).

Min (w^T w)/2 subject to the following constraints:
- for all x_i in class +1: w^T x_i + b ≥ 1
- for all x_i in class -1: w^T x_i + b ≤ -1
(a total of n constraints if we have n input samples)

Equivalently:  argmin_{w,b} Σ_{j=1}^{p} w_j²  subject to  y_i (x_i^T w + b) ≥ 1 for all (x_i, y_i) ∈ D_train.

Optimization review: ingredients. Objective function, variables, constraints. Find values of the variables that minimize or maximize the objective function while satisfying the constraints.

Optimization with quadratic programming (QP). Quadratic programming solves optimization problems of the following form:

min_u u^T R u + d^T u + c   (u^T R u is the quadratic term)

subject to n inequality constraints:
a_11 u_1 + a_12 u_2 + ... ≤ b_1
...
a_n1 u_1 + a_n2 u_2 + ... ≤ b_n

and k equivalency constraints:
a_{n+1,1} u_1 + a_{n+1,2} u_2 + ... = b_{n+1}
...
a_{n+k,1} u_1 + a_{n+k,2} u_2 + ... = b_{n+k}

When a problem can be specified as a QP problem, we can use solvers that are better than gradient descent or simulated annealing.

SVM as a QP problem. Take R as the identity matrix I, d as the zero vector, and c as the value 0 in the QP template. Then:

Min (w^T w)/2 subject to the following inequality constraints:
- for all x_i in class +1: w^T x_i + b ≥ 1
- for all x_i in class -1: w^T x_i + b ≤ -1
(a total of n constraints if we have n input samples; there are no equality constraints)

Today: SVM roadmap — history; large-margin linear classifier; defining margin M; optimization for (w, b); non-linearly separable case; dual form; non-linear decision boundary; practical guide.
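To make the QP mapping concrete, the following is a minimal sketch (not from the slides) that feeds the hard-margin primal to the generic QP solver in cvxopt; the toy dataset and the tiny ridge on the b entry are assumptions for illustration.

```python
# Hard-margin linear SVM solved as a QP (sketch).
# cvxopt.solvers.qp minimizes (1/2) u^T P u + q^T u  s.t.  G u <= h.
# With u = [w; b], the constraint y_i (w^T x_i + b) >= 1 becomes -y_i [x_i; 1]^T u <= -1.
import numpy as np
from cvxopt import matrix, solvers

# Toy linearly separable data (assumption for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, p = X.shape

P = np.zeros((p + 1, p + 1))
P[:p, :p] = np.eye(p)        # yields (1/2) w^T w; b is not penalized
P[p, p] = 1e-8               # tiny ridge so P stays positive definite (assumption)
q = np.zeros(p + 1)
G = -y[:, None] * np.hstack([X, np.ones((n, 1))])   # rows: -y_i [x_i; 1]
h = -np.ones(n)

sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
u = np.array(sol["x"]).ravel()
w, b = u[:p], u[p]
print("w =", w, "b =", b, "margin M =", 2 / np.linalg.norm(w))
```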

Non-linearly separable case. So far we assumed that a linear plane can perfectly separate the points, but this is not usually the case (noise, outliers). How can we convert this to a QP problem?
- Minimize training errors? min w^T w together with min #errors is hard to solve (two minimization problems)
- Penalize training errors: min w^T w + C·(#errors) is hard to encode in a QP problem

Non-linearly separable case (cont.). Instead of minimizing the number of misclassified points, we can minimize the distance between these points and their correct plane (a slack ε_i measured from the +1 plane or the -1 plane, respectively). The new optimization problem is:

min_w (w^T w)/2 + C Σ_{i=1}^{n} ε_i

subject to the following inequality constraints:
- for all x_i in class +1: w^T x_i + b ≥ 1 − ε_i
- for all x_i in class -1: w^T x_i + b ≤ −1 + ε_i

Wait. Are we missing something?

Final optimization for the non-linearly separable case. The new optimization problem is:

min_w (w^T w)/2 + C Σ_{i=1}^{n} ε_i

subject to the following inequality constraints:
- for all x_i in class +1: w^T x_i + b ≥ 1 − ε_i
- for all x_i in class -1: w^T x_i + b ≤ −1 + ε_i
(a total of n constraints)
- for all i: ε_i ≥ 0
(another n constraints)

Where we are. Two optimization problems, for the separable and non-separable cases:
- Separable: min_w (w^T w)/2, s.t. for all x_i in class +1, w^T x_i + b ≥ 1; for all x_i in class -1, w^T x_i + b ≤ −1
- Non-separable: min_w (w^T w)/2 + C Σ_{i=1}^{n} ε_i, with the slack constraints above and ε_i ≥ 0 for all i

Model selection: find the right C. Select the "right" penalty parameter C (a larger C penalizes training errors more heavily; a smaller C allows more slack).

Today: SVM roadmap — history; large-margin linear classifier; defining margin M; optimization for (w, b); non-linearly separable case; dual form; non-linear decision boundary; practical guide.
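For the C-selection step above, cross-validated grid search is the usual tool. A sketch with scikit-learn; the synthetic dataset and the candidate grid are assumptions:

```python
# Choosing the penalty C by k-fold cross-validation (sketch).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
grid = GridSearchCV(
    SVC(kernel="linear"),
    param_grid={"C": [0.01, 0.1, 1, 10, 100]},  # candidate penalties (assumption)
    cv=5,                                       # 5-fold CV on the training data
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```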

Where we are. Two optimization problems: min_w (w^T w)/2 for the separable case, and min_w (w^T w)/2 + C Σ_{i=1}^{n} ε_i with ε_i ≥ 0 for the non-separable case. Instead of solving these QPs directly, we will solve a dual formulation of the SVM optimization problem. The main reason for switching to this type of representation is that it allows us to use a neat trick that will make our lives easier (and the run time faster).

Optimization review: constrained optimization. Example: min_u u² subject to u ≥ b. Case 1: the allowed region contains the global minimum u = 0 (b ≤ 0), so the constraint is inactive and the constrained minimum equals the global minimum. Case 2: the allowed region excludes the global minimum (b > 0), so the constrained minimum sits on the boundary u = b. (Figures.)

Optimization review: constrained optimization with Lagrange. When there are equality constraints — optimize f(x) subject to g_i(x) = 0 — the method of Lagrange multipliers converts this to a higher-dimensional problem:

minimize  f(x) + Σ_{i=1}^{k} λ_i g_i(x)   w.r.t.  (x_1, ..., x_n; λ_1, ..., λ_k)

Introduce a Lagrange multiplier for each constraint, and construct the Lagrangian for the original optimization problem.

Optimization review: Lagrangian (more general standard form). [Figure from Stanford "Convex Optimization", Boyd & Vandenberghe.]

Worked figure: min_u u² subject to u ≥ b, shown first in its primal form and then alongside the corresponding dual; the primal sweeps over u while the dual sweeps over the multiplier. (Figures.)

Optimization review: Lagrangian duality. The primal problem:

min_w f_0(w)
s.t.  f_i(w) ≤ 0,  i = 1, ..., k
      h_i(w) = 0,  i = 1, ..., l

The generalized Lagrangian:

L(w, α, β) = f_0(w) + Σ_{i=1}^{k} α_i f_i(w) + Σ_{i=1}^{l} β_i h_i(w)

The α_i's (α_i ≥ 0) and β_i's are called the Lagrangian multipliers.

Lemma:
max_{α,β: α_i ≥ 0} L(w, α, β) = f_0(w) if w satisfies the primal constraints, and ∞ otherwise.

A re-written primal:
min_w max_{α,β: α_i ≥ 0} L(w, α, β)

[Eric Xing @ CMU, 2006-2008]

Optimization review: dual function. Recall that Lagrange multipliers can be applied to turn the constrained problem above into a problem over the multipliers; the resulting Lagrange dual function is developed on the following slides.

Optimization review: Lagrangian duality, cont. Recall the primal problem:

min_w max_{α,β: α_i ≥ 0} L(w, α, β)

The dual problem:

max_{α,β: α_i ≥ 0} min_w L(w, α, β)

Theorem (weak duality):
d* = max_{α,β: α_i ≥ 0} min_w L(w, α, β) ≤ min_w max_{α,β: α_i ≥ 0} L(w, α, β) = p*

Theorem (strong duality): iff there exists a saddle point of L(w, α, β), we have d* = p*.

Optimization review: the Lagrange dual function. inf(·): the greatest lower bound. [Figure.]

An alternative representation of the SVM QP. We will start with the linearly separable case. Instead of encoding the correct classification rule as explicit constraints, we will use Lagrange multipliers to encode it as part of our minimization problem.

Min (w^T w)/2  s.t.  (w^T x_i + b) y_i ≥ 1

Recall that Lagrange multipliers can be applied to turn this problem into:

L_primal = (1/2)‖w‖² − Σ_{i=1}^{N} α_i [ y_i (w·x_i + b) − 1 ]

i.e.

min_{w,b} max_α  (w^T w)/2 − Σ_i α_i [ (w^T x_i + b) y_i − 1 ],   α_i ≥ 0

The dual problem: max_{α: α_i ≥ 0} min_{w,b} L(w, b, α). We minimize L with respect to w and b first:

∇_w L(w, b, α) = w − Σ_{i∈train} α_i y_i x_i = 0    (*)
∂L/∂b = −Σ_{i∈train} α_i y_i = 0                     (**)

Note that (*) implies:

w = Σ_{i∈train} α_i y_i x_i                          (***)

Plugging (***) back into L, and using (**), we have:

L(w, b, α) = Σ_{i=1}^{N} α_i − (1/2) Σ_{i,j=1}^{N} α_i α_j y_i y_j (x_i^T x_j)

Summary: dual for SVM. Solving for the w that gives maximum margin:
1. Combine the objective function and the constraints into a new objective function, using Lagrange multipliers α_i:
   L_primal = (1/2)‖w‖² − Σ_{i=1}^{N} α_i [ y_i (w·x_i + b) − 1 ]
2. To minimize this Lagrangian, we take the derivatives with respect to w and b and set them to 0.

Summary: dual for SVM (cont.)
3. Substituting and rearranging gives the dual of the Lagrangian:
   L_dual = Σ_{i=1}^{N} α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j x_i·x_j
   which we try to maximize (not minimize).
4. Once we have the α_i, we can substitute into the previous equations to get w and b.
5. This defines w and b as linear combinations of the training data.

Optimization review: dual problem. Solve the dual problem if the dual form is easier than the primal form. You need to change primal minimization into dual maximization (or primal maximization into dual minimization). This is only valid when the original optimization problem is convex/concave (strong duality). Primal problem, e.g.: x* = argmin_x f(x) subject to h(x) = c. Dual problem, e.g.: λ* = argmax_λ g(λ), where g(λ) = inf_x ( f(x) + λ(h(x) − c) ); strong duality ties the two together.

Summary: dual SVM for the linearly separable case. Substituting w into our target function and using the additional constraint, we get the dual formulation:

max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j
s.t.  Σ_i α_i y_i = 0,  α_i ≥ 0

This is easier than the original QP, and more efficient algorithms exist to find the α_i.

Optimization review: inf(·) is the greatest lower bound. [Figure.]
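Here is a minimal sketch of solving this dual with the generic QP solver from cvxopt and then recovering w and b from the support vectors; the toy dataset and the small numerical ridge are illustration-only assumptions:

```python
# Dual of the hard-margin SVM as a QP (sketch).
# max_a sum(a) - (1/2) a^T Q a with Q_ij = y_i y_j x_i^T x_j,  a >= 0,  y^T a = 0
# becomes min_a (1/2) a^T Q a - 1^T a for cvxopt.
import numpy as np
from cvxopt import matrix, solvers

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

Q = (y[:, None] * X) @ (y[:, None] * X).T     # Q_ij = y_i y_j x_i . x_j
Q += 1e-8 * np.eye(n)                         # numerical ridge (assumption)
sol = solvers.qp(matrix(Q), matrix(-np.ones(n)),
                 matrix(-np.eye(n)), matrix(np.zeros(n)),   # alpha_i >= 0
                 matrix(y[None, :]), matrix(np.zeros(1)))   # sum_i alpha_i y_i = 0
alpha = np.array(sol["x"]).ravel()

w = ((alpha * y)[:, None] * X).sum(axis=0)    # w = sum_i alpha_i y_i x_i
sv = alpha > 1e-6                             # support vectors: nonzero alpha
b = np.mean(y[sv] - X[sv] @ w)                # from y_i (w^T x_i + b) = 1 on SVs
print("alpha =", alpha.round(3), "w =", w, "b =", b)
```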

Optimization review: key for SVM dual. [Figure: the complementary-slackness step of the KKT conditions.]

KKT ⇒ support vectors. Note the KKT condition — only a few α_i's can be nonzero:

α_i [ y_i (w·x_i + b) − 1 ] = 0,   i = 1, ..., n

Call the training data points whose α_i's are nonzero the support vectors (SV). (Figure: two classes; most points have α_i = 0, and only points the margin pushes against, here α_6 = 1.4 and α_1 = 0.8, are nonzero.)

Dual SVM — interpretation:

w = Σ_i α_i y_i x_i

For the α_i's that are 0, the corresponding training points have no influence on w.

Dual SVM for the linearly separable case. Our dual target function:

max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j
s.t.  Σ_i α_i y_i = 0,  α_i ≥ 0

To evaluate a new sample x_test we need to compute

w^T x_test + b = Σ_i α_i y_i x_i^T x_test + b
y_test = sign( Σ_{i∈SV} α_i y_i x_i^T x_test + b )

— a dot product with the training samples, in fact only with the support vectors.
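A small numpy sketch of this evaluation rule, keeping only the support vectors; the function name and the tolerance for "nonzero α" are assumptions, and the inputs are assumed to come from a dual solver like the one above:

```python
# Classify a new point using only the support vectors (sketch).
import numpy as np

def svm_predict(x_test, alpha, y, X, b, tol=1e-6):
    """y_test = sign( sum_{i in SV} alpha_i y_i x_i^T x_test + b )."""
    sv = alpha > tol                              # keep points with nonzero alpha
    score = (alpha[sv] * y[sv]) @ (X[sv] @ x_test)
    return np.sign(score + b)

# e.g.: svm_predict(np.array([1.0, 1.5]), alpha, y, X, b)
```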

Dual formulation for the linearly non-separable case. Dual target function:

max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j
s.t.  Σ_i α_i y_i = 0,  C ≥ α_i ≥ 0

The hyperparameter C should be tuned through k-fold CV. The only difference is that the α_i are now bounded: this is very similar to the optimization problem in the linearly separable case, except that there is an upper bound C on α_i now. To evaluate a new sample x_j we need to compute w^T x_j + b = Σ_i α_i y_i x_i^T x_j + b. Once again, efficient algorithms exist to find the α_i.

Fast SVM implementations (a usage sketch follows below):
- SMO: Sequential Minimal Optimization
- SVM-Light
- LibSVM
- BSVM
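In practice one rarely hand-rolls this QP: scikit-learn's SVC, for example, wraps LIBSVM (an SMO-based solver). A usage sketch on synthetic data (the dataset is an assumption):

```python
# Soft-margin SVM via scikit-learn's SVC, which wraps LIBSVM (SMO solver).
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = SVC(kernel="linear", C=1.0)   # C is the upper bound on the dual alphas
clf.fit(X, y)
print("num support vectors:", clf.support_vectors_.shape[0])
print("dual coefs (alpha_i * y_i):", clf.dual_coef_[:, :5])
print(clf.predict(X[:3]))
```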

SMO: Sequential Minimal Optimization. Key idea: divide the large QP problem of SVM into a series of smallest-possible QP problems, which can be solved analytically, and thus avoid a time-consuming numerical QP in the loop (a kind of SQP method). Space complexity: O(n). Since the QP is greatly simplified, the most time-consuming part of SMO is the evaluation of the decision function; therefore it is very fast for linear SVMs and sparse data.

SMO. At each step, SMO chooses two Lagrange multipliers to jointly optimize, finds the optimal values for these multipliers, and updates the SVM to reflect the new optimal values. Three components:
- An analytic method to solve for the two Lagrange multipliers
- A heuristic for choosing which multipliers to optimize
- A method for computing b at each step, so that the KKT conditions are fulfilled for both of the two examples (corresponding to the two multipliers)

Choosing which multipliers to optimize. First multiplier: iterate over the entire training set and find an example that violates the KKT condition. Second multiplier: maximize the size of the step taken during the joint optimization, |E_1 − E_2|, where E_i is the error on the i-th example.

Today: SVM roadmap — history; large-margin linear classifier; defining margin M; optimization for (w, b); non-linearly separable case; dual form; non-linear decision boundary; practical guide.

Classifying in 1-d. Can an SVM correctly classify this data (two classes on a line X, separable by a threshold)? What about this (classes interleaved along X, not separable by any threshold)?

Classifying in 1-d (cont.). And now? Extend with a polynomial basis: mapping each point x to features such as (x, x²) lifts the data into a space where a linear separator exists.

RECAP: polynomial regression — introduce basis functions. [Dr. Nando de Freitas's tutorial slide.]

Non-linear SVMs: 2D. The original input space (x) can be mapped to some higher-dimensional feature space (φ(x)) where the training set is separable:

x = (x_1, x_2)  →  φ(x) = (x_1², x_2², √2 x_1 x_2)

[This slide is courtesy of ...]

Non-linear SVMs: 2D (cont.). If data is mapped into a sufficiently high dimension, then samples will in general be linearly separable: N data points are in general separable in a space of N−1 dimensions or more!

A little bit of theory: Vapnik-Chervonenkis (VC) dimension. The VC dimension of the set of oriented lines in R² is 3. It can be shown that the VC dimension of the family of oriented separating hyperplanes in R^N is at least N+1.

Transformation of inputs. Possible problems: a high computational burden due to high dimensionality, and many more parameters. SVM solves these two issues simultaneously: kernel tricks give efficient computation, and the dual formulation only assigns parameters to samples, not features. (Figure: input space mapped by Φ(·) into feature space.)

Quadratic kernels. While working in higher dimensions is beneficial, it also increases our running time because of the dot product computation. However, there is a neat trick we can use. In the dual

max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j Φ(x_i)^T Φ(x_j),   s.t. Σ_i α_i y_i = 0, α_i ≥ 0,

consider all quadratic terms for x_1, x_2, ..., x_m:

Φ(x) = ( 1, √2 x_1, ..., √2 x_m, x_1², ..., x_m², √2 x_1 x_2, ..., √2 x_{m−1} x_m )

— m+1 constant/linear terms, m quadratic terms, and m(m−1)/2 pairwise terms, where m is the number of features in each vector (the √2 weights on the terms will become clear on the next slide). Define K(x, z) := Φ(x)^T Φ(z).

Dot product for quadratic kernels. How many operations do we need for the dot product?

Φ(x)^T Φ(z) = 1 + 2 Σ_i x_i z_i + Σ_i x_i² z_i² + 2 Σ_i Σ_{j=i+1} x_i x_j z_i z_j

— roughly m + m + m(m−1)/2 ≈ m² operations.

The kernel trick. However, we can obtain dramatic savings by noting that

(x^T z + 1)² = (x·z)² + 2(x·z) + 1
             = ( Σ_i x_i z_i )² + 2 Σ_i x_i z_i + 1
             = Σ_i x_i² z_i² + 2 Σ_i Σ_{j>i} x_i x_j z_i z_j + 2 Σ_i x_i z_i + 1
             = Φ(x)^T Φ(z)

We only need m operations! So, if we define the kernel function as K(x, z) = (x^T z + 1)², there is no need to carry out the basis function Φ(·) explicitly.
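A quick numeric check of this identity, as a sketch for m = 2 using the explicit quadratic map from the previous slides:

```python
# Verify (x.z + 1)^2 == phi(x).phi(z) for the quadratic feature map (sketch, m = 2).
import numpy as np

def phi(x):
    # (1, sqrt(2) x1, sqrt(2) x2, x1^2, x2^2, sqrt(2) x1 x2)
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2, np.sqrt(2) * x1 * x2])

x = np.array([0.5, -1.0])
z = np.array([2.0, 0.3])
lhs = (x @ z + 1.0) ** 2      # m operations: the kernel trick
rhs = phi(x) @ phi(z)         # ~m^2 operations: explicit feature map
print(lhs, rhs, np.isclose(lhs, rhs))   # identical up to rounding
```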

Where we are. Our dual target function:

max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j Φ(x_i)^T Φ(x_j)
s.t.  Σ_i α_i y_i = 0,  α_i ≥ 0

— m·n operations at each iteration. To evaluate a new sample x_j we need to compute

w^T Φ(x_j) + b = Σ_i α_i y_i Φ(x_i)^T Φ(x_j) + b

— m·r operations, where r is the number of support vectors (those with α_i > 0). Using a kernel function to avoid carrying out Φ(·) explicitly is known as the kernel trick: with K(x, z) = (x^T z + 1)² there is no need to compute the Φ(·) representation explicitly.

Summary: modification due to kernel function. Change all inner products to kernel functions. For training:
- Original (linear):  max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j
- With kernel function (non-linear):  max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j)
both subject to Σ_i α_i y_i = 0 and C ≥ α_i ≥ 0, i ∈ train.

Summary: modification due to kernel function. For testing with new data x_test:
- Original (linear):  y_test = sign( Σ_{i∈train} α_i y_i x_i^T x_test + b )
- With kernel function (non-linear):  y_test = sign( Σ_{i∈train} α_i y_i K(x_i, x_test) + b )

More examples of kernel functions:
- Linear kernel (we've seen it): K(x, x') = x^T x'
- Polynomial kernel (we just saw an example): K(x, x') = (1 + x^T x')^p, where p = 2, 3, ... To get the feature vectors we concatenate all p-th order polynomial terms of the components of x (weighted appropriately).
- Radial basis kernel: K(x, x') = exp( −‖x − x'‖² / (2r²) ), where r is a hyperparameter. The feature space of the RBF kernel has an infinite number of dimensions. Never represent features explicitly; compute dot products in closed form. Very interesting theory — Reproducing Kernel Hilbert Spaces — not covered in detail here.
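Minimal numpy sketches of these three kernels, following the parameterizations written above (the function names are illustrative):

```python
# The three kernels from this slide, as plain numpy functions (sketch).
import numpy as np

def k_linear(x, z):
    return x @ z

def k_poly(x, z, p=2):
    return (1.0 + x @ z) ** p

def k_rbf(x, z, r=1.0):
    # exp(-||x - z||^2 / (2 r^2)); the implicit feature space is infinite-dimensional
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * r ** 2))

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(k_linear(x, z), k_poly(x, z), k_rbf(x, z))
```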

Kernel function: implicit basis representation. For some kernels (e.g. RBF) the implicit transform basis φ(x) is infinite-dimensional! But calculations with the kernel are done in the original space, so the computational burden and the curse of dimensionality aren't a problem.

An example: support vector machines with a polynomial kernel. [Figure.]

Kernel functions. In practical use of SVM, only the kernel function (and not Φ(·)) is specified. The kernel function can be thought of as a similarity measure between the input objects. Not every similarity measure can be used as a kernel function; however, Mercer's condition states that any positive semi-definite kernel K(x, y) — i.e. one whose kernel matrix [K(x_i, x_j)] is positive semi-definite for every finite set of points — can be expressed as a dot product in a high-dimensional space.

Choosing the kernel function. Probably the trickiest part of using SVM. The kernel function is important because it creates the kernel matrix, which summarizes all the data. Many principles have been proposed (diffusion kernel, Fisher kernel, string kernel, tree kernel, graph kernel, ...). The kernel trick has allowed non-traditional data like strings and trees to be used as input to SVM, instead of feature vectors. In practice, a low-degree polynomial kernel or an RBF kernel with a reasonable width is a good initial try for most applications.
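Mercer's condition can be sanity-checked numerically: build the kernel (Gram) matrix on a sample and verify that its eigenvalues are non-negative. A sketch on random data (the data and the kernel choice are assumptions):

```python
# Check positive semi-definiteness of a kernel (Gram) matrix (sketch).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

K = (1.0 + X @ X.T) ** 2            # polynomial kernel matrix, K_ij = K(x_i, x_j)
eigvals = np.linalg.eigvalsh(K)     # symmetric matrix, so use eigvalsh
print("min eigenvalue:", eigvals.min())   # >= 0 (up to rounding) for a valid kernel
```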

Why do SVMs work? If we are using huge feature spaces (e.g., with kernels), how come we are not overfitting the data?
- The number of parameters remains the same (and most are set to 0)
- While we have a lot of input values, in the end we only care about the support vectors, and these are usually a small group of samples
- The minimization (or the maximizing of the margin) function acts as a sort of regularization term, leading to reduced overfitting

Why SVM works? Vapnik argues that the fundamental problem is not the number of parameters to be estimated; rather, the problem is the flexibility of a classifier. The flexibility of a classifier should not be characterized by the number of parameters, but by the capacity of the classifier. This is formalized by the "VC-dimension" of a classifier. The SVM objective can also be justified by structural risk minimization: the empirical risk (training error), plus a term related to the generalization ability of the classifier, is minimized. Another view: the SVM loss function is analogous to ridge regression; the term (1/2)‖w‖² "shrinks" the parameters towards zero to avoid overfitting.

Today: SVM roadmap — history; large-margin linear classifier; defining margin M; optimization for (w, b); non-linearly separable case; dual form; non-linear decision boundary; practical guide.

Software. A list of SVM implementations can be found at ... Some implementations (such as LIBSVM) can handle multi-class classification. SVMLight is among the earliest implementations of SVM. Several Matlab toolboxes for SVM are also available.

Summary: steps for using SVM in HW.
- Prepare the feature-data matrix
- Select the kernel function to use
- Select the parameters of the kernel function and the value of C (you can use the values suggested by the SVM software, or you can set apart a validation set to determine the values of the parameters)
- Execute the training algorithm and obtain the α_i
- Unseen data can be classified using the α_i and the support vectors

Practical guide to SVM, from the authors of LIBSVM: "A Practical Guide to Support Vector Classification", Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin, 2003-2010 (guide.pdf).

LIBSVM
- Developed by Chih-Jen Lin et al.
- Tools for support vector classification
- Also supports multi-class classification
- C++/Java/Python/Matlab/Perl wrappers
- Linux/UNIX/Windows
- SMO implementation — fast!
[A Practical Guide to Support Vector Classification]

(a) Data file formats for LIBSVM. Training.dat and Testing.dat share the same sparse format: each line is "<label> <index>:<value> <index>:<value> ...", where zero-valued features may be omitted — for example, a line of the form "+1 1:0.7 2:1 4:-0.3".
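This sparse "index:value" format is what LIBSVM and SVMLight read; scikit-learn can write and read it too, as in this sketch (the file name and toy matrix are assumptions):

```python
# Write/read the LIBSVM/SVMLight sparse data format (sketch).
import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

X = np.array([[0.7, 1.0, 0.0], [0.0, -0.3, 0.5]])
y = np.array([1, -1])

dump_svmlight_file(X, y, "train.dat", zero_based=False)  # 1-based feature indices
X2, y2 = load_svmlight_file("train.dat")                 # X2 comes back sparse (CSR)
print(open("train.dat").read())
print(X2.toarray(), y2)
```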

(b) Feature preprocessing. (1) Categorical features: recommend using m numbers to represent an m-category attribute. Only one of the m numbers is one, and the others are zero. For example, a three-category attribute such as {red, green, blue} can be represented as (0,0,1), (0,1,0), and (1,0,0). [A Practical Guide to Support Vector Classification]

Feature preprocessing. (2) Scaling before applying SVM is very important:
- to avoid attributes in greater numeric ranges dominating those in smaller numeric ranges
- to avoid numerical difficulties during the calculation
Recommend linearly scaling each attribute to the range [−1, +1] or [0, 1]. (Both steps are sketched in code below.)
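Both recommendations expressed with scikit-learn, as a sketch (the toy data are assumptions; note that the scaler should be fit on the training data only and then reused on test data):

```python
# (1) one-hot encode an m-category attribute; (2) scale features to [-1, +1] (sketch).
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

colors = np.array([["red"], ["green"], ["blue"], ["green"]])
onehot = OneHotEncoder().fit_transform(colors).toarray()
print(onehot)   # categories sorted alphabetically (blue, green, red): red -> (0,0,1)

X_train = np.array([[1.0, 200.0], [3.0, 400.0], [2.0, 250.0]])
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_train)  # fit on train only
print(scaler.transform(X_train))
```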

Feature preprocessing. [Figure: scaling example from the guide.]

Feature preprocessing. (3) Missing values — very, very tricky!
- Easy way: substitute the missing values by the mean value of the variable
- A little bit harder: imputation using nearest neighbors
- Even more complex: e.g. EM-based (beyond the scope)
(The first two options are sketched below.)
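The first two options as a scikit-learn sketch (the toy matrix is an assumption):

```python
# Mean imputation and nearest-neighbor imputation for missing values (sketch).
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [4.0, 5.0]])

print(SimpleImputer(strategy="mean").fit_transform(X))  # easy way: column means
print(KNNImputer(n_neighbors=2).fit_transform(X))       # harder: nearest neighbors
```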

(c) Model selection. [Figure.]

(c) Model selection (e.g. for a linear kernel): select the "right" penalty parameter C.

(c) Model selection. Three parameters for a polynomial kernel (in LIBSVM's parameterization K(u, v) = (γ u^T v + coef0)^degree: the degree, γ, and coef0), plus the penalty C. [A Practical Guide to Support Vector Classification]

(d) Pipeline procedures:
(1) train/test
(2) k-fold cross-validation
(3) k-fold CV on train to choose hyperparameters, then test (sketched below)
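Procedure (3) as a sketch: hold out a test set, grid-search the RBF kernel's hyperparameters by k-fold CV on the training part only, then report test accuracy once. The dataset and the grids are assumptions:

```python
# k-fold CV on train to choose hyperparameters, then evaluate once on test (sketch).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

pipe = make_pipeline(MinMaxScaler(feature_range=(-1, 1)), SVC(kernel="rbf"))
grid = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10, 100],
                           "svc__gamma": [0.01, 0.1, 1]}, cv=5)
grid.fit(X_tr, y_tr)                      # CV happens on the training split only
print(grid.best_params_, grid.score(X_te, y_te))
```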

Evaluation choice I: train and test. The training dataset consists of input-output pairs. Evaluation: apply the learned f to a test input x?, and measure the loss on the pair ( f(x?), y? ).

Evaluation choice II: cross-validation. Problem: we don't have enough data to set aside a test set. Solution: each data point is used both as train and as test. Common types:
- k-fold cross-validation (e.g. k = 5, k = 10)
- 2-fold cross-validation
- leave-one-out cross-validation (LOOCV)
A good practice: randomly shuffle all training samples before splitting (see the sketch below).
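A sketch of the shuffled k-fold practice with scikit-learn (the dataset is an assumption):

```python
# k-fold cross-validation with shuffling before splitting (sketch).
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)   # shuffle, then split
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=cv)
print(scores, scores.mean())
```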

Why maximum margin for SVM? (Figure: "+1" and "-1" points; f(x, w, b) = sign(w·x − b); support vectors are those datapoints that the margin pushes up against. The maximum margin linear classifier is the linear classifier with the, um, maximum margin; this is the simplest kind of SVM, called an LSVM.)
1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary (it's been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. LOOCV is easy, since the model is immune to removal of any non-support-vector datapoints.
4. There is some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
5. Empirically it works very, very well.

Evaluation choice III: a basic solution for the HW question, and a more advanced solution for the HW question. [Figures from "A Practical Guide to Support Vector Classification".]

[Figure from "A Practical Guide to Support Vector Classification".]

Today: review & practical guide. Support Vector Machine (SVM):
- History of SVM
- Large-margin linear classifier
- Define margin (M) in terms of model parameters
- Optimization to learn model parameters (w, b)
- Non-linearly separable case
- Optimization with dual form
- Non-linear decision boundary
- Practical guide: file format / LIBSVM; feature preprocessing; model selection; pipeline procedure

References
- Elements of Statistical Learning, by Hastie, Tibshirani and Friedman
- ...'s slides
- Tutorial slides from Dr. Tie-Yan Liu, MSR Asia
- "A Practical Guide to Support Vector Classification", Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin, 2003-2010
- Tutorial slides from Stanford "Convex Optimization I", Boyd & Vandenberghe
