Lecture: SVM (cont.), 2008
What we have done so far

We have established that we want to find a linear decision boundary whose margin is the largest.

We know how to measure the margin of a linear decision boundary: it is the minimum geometric margin over all training examples.

Geometric margin of a training example = functional margin normalized by the magnitude of w:

$$\gamma^i = \frac{y^i (w \cdot x^i + b)}{\|w\|}$$

where the numerator $y^i (w \cdot x^i + b)$ is the functional margin.

How do we find the linear decision boundary that has the largest margin?
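To make the two margin definitions concrete, here is a minimal sketch in plain Python. The toy data set and the candidate boundary (w, b) are made up for illustration and do not come from the lecture. It also checks that rescaling (w, b) changes the functional margin but not the geometric margin.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def functional_margin(w, b, x, y):
    """y * (w . x + b); depends on the scale of (w, b)."""
    return y * (dot(w, x) + b)

def geometric_margin(w, b, x, y):
    """Functional margin normalized by the magnitude of w."""
    return functional_margin(w, b, x, y) / math.sqrt(dot(w, w))

def classifier_margin(w, b, X, Y):
    """Margin of the boundary: the minimum geometric margin over all examples."""
    return min(geometric_margin(w, b, x, y) for x, y in zip(X, Y))

# Hypothetical toy data: two positive and two negative examples.
X = [(2.0, 2.0), (3.0, 1.0), (-1.0, -1.0), (-2.0, 0.0)]
Y = [1, 1, -1, -1]

# A hand-picked candidate boundary, and the same boundary rescaled by 10.
margin = classifier_margin((1.0, 1.0), 0.0, X, Y)
margin_scaled = classifier_margin((10.0, 10.0), 0.0, X, Y)

print(margin, margin_scaled)  # both equal sqrt(2), up to rounding
```

The scale invariance shown here is what later allows the functional margin to be fixed at 1 without changing the decision boundary.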
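Maximum Margin Classifier

This can be formulated as a constrained optimization problem:

$$\max_{\gamma, w, b} \; \gamma \quad \text{subject to: } y^i (w \cdot x^i + b) \ge \gamma, \; i = 1, \ldots, N, \quad \|w\| = 1$$

This optimization problem is in a nasty form (quadratic constraints), so we need to do some rewriting.

Eventually we will get the following:

$$\min_{w, b} \; \tfrac{1}{2}\|w\|^2 \quad \text{subject to: } y^i (w \cdot x^i + b) \ge 1, \; i = 1, \ldots, N$$

Maximizing the geometric margin is equivalent to minimizing the magnitude of w subject to maintaining a functional margin of at least 1.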
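The rewriting that the slide skips can be sketched in a few steps. This is the standard textbook argument, not necessarily the exact route the lecture takes. The symbol $\hat\gamma$ for the functional margin and the factor $\tfrac{1}{2}$ in the last line are conventions assumed here.

```latex
\begin{align*}
&\max_{\gamma, w, b}\ \gamma
  \quad \text{s.t. } y^i (w \cdot x^i + b) \ge \gamma,\ \|w\| = 1
  && \text{(geometric margin, original form)} \\
&\max_{\hat\gamma, w, b}\ \frac{\hat\gamma}{\|w\|}
  \quad \text{s.t. } y^i (w \cdot x^i + b) \ge \hat\gamma
  && \text{(functional margin } \hat\gamma = \gamma \|w\| \text{, norm constraint dropped)} \\
&\max_{w, b}\ \frac{1}{\|w\|}
  \quad \text{s.t. } y^i (w \cdot x^i + b) \ge 1
  && \text{(rescale } (w, b) \text{ so that } \hat\gamma = 1 \text{)} \\
&\min_{w, b}\ \tfrac{1}{2}\|w\|^2
  \quad \text{s.t. } y^i (w \cdot x^i + b) \ge 1
  && \text{(same minimizer, quadratic objective, linear constraints)}
\end{align*}
```

The key step is the third line: scaling (w, b) by a constant does not move the decision boundary, so the functional margin can be pinned to 1 for free.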
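Solving the Optimization Problem

$$\min_{w, b} \; \tfrac{1}{2}\|w\|^2 \quad \text{subject to: } y^i (w \cdot x^i + b) \ge 1, \; i = 1, \ldots, N$$

This is a quadratic programming (QP) problem, i.e., optimizing a quadratic function with linear inequality constraints.

This is a well-known class of mathematical programming problems for which several (non-trivial) algorithms exist.

In practice, we can just regard the QP solver as a black box without bothering about how it works.

You will be spared the excruciating details and jump to the solution.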
The solution

Hold on a sec: we cannot really give you a closed-form solution into which you can directly plug the numbers and compute for arbitrary data sets.

But the crystal ball tells us that the solution can always be written in the following form:

$$w = \sum_{i=1}^{N} \alpha_i y^i x^i, \quad \text{s.t. } \sum_{i=1}^{N} \alpha_i y^i = 0$$

This is the form of the solution for w; b can be calculated accordingly using some additional steps.

The weight vector is a linear combination of all the training examples.

Importantly, many of the $\alpha_i$'s are zero.

The training examples that have non-zero $\alpha_i$'s are called the support vectors.
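As a small worked instance of this solution form, the sketch below rebuilds w from a set of α values, picks out the support vectors, and recovers b from one of them. The three training points and their α values are a hand-constructed toy case, assumed here in place of the output of a real QP solver. The formula b = y_s − w · x_s is one common way to do the additional steps and follows from the support vector having functional margin 1.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical toy problem: the values of alpha are assumed, as if a QP solver returned them.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0)]
Y = [1, 1, -1]
alpha = [0.25, 0.0, 0.25]

# Constraint from the solution form: sum_i alpha_i * y_i = 0.
balance = sum(a * y for a, y in zip(alpha, Y))

# w = sum_i alpha_i * y_i * x_i, computed one coordinate at a time.
w = [sum(a * y * x[d] for a, y, x in zip(alpha, Y, X)) for d in range(2)]

# Support vectors are the examples with non-zero alpha.
support = [i for i, a in enumerate(alpha) if a > 1e-12]

# A support vector has functional margin 1, so w . x_s + b = y_s.
s = support[0]
b = Y[s] - dot(w, X[s])

# Every training example should now have functional margin >= 1.
margins = [y * (dot(w, x) + b) for x, y in zip(X, Y)]

print(w, b, support, margins)
```

The point (2, 2) lies farther from the boundary than (1, 1), so it gets α = 0 and plays no role in w.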
An example

[Figure: ten training examples from Class 1 and Class 2, each labeled with its α value: α_1 = 0.8, α_6 = 1.4, α_8 = 0.6, and α_2 = α_3 = α_4 = α_5 = α_7 = α_9 = α_10 = 0.]
A few important notes regarding the geometric interpretation

$w \cdot x + b = 0$ gives the decision boundary.

Positive support vectors lie on the line $w \cdot x + b = 1$.

Negative support vectors lie on the line $w \cdot x + b = -1$.

All support vectors have a functional margin of 1.

We can now think of a decision boundary as a tube of a certain width; no points can be inside the tube.

Learning involves adjusting the location and orientation of the tube to find the largest tube that fits the given training set.
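The tube picture can be checked numerically. The sketch below assumes the canonical scaling in which support vectors have functional margin 1, so the tube edges are w · x + b = ±1 and the tube width is 2/||w||. That width formula is a standard consequence of the scaling, not something stated on the slide. The data and (w, b) are the same kind of hypothetical toy values used for illustration elsewhere.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical toy solution (assumed, not produced by a solver here).
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0)]
w, b = (0.5, 0.5), 0.0

# Distance between the lines w.x + b = +1 and w.x + b = -1.
tube_width = 2.0 / math.sqrt(dot(w, w))

# Points strictly inside the tube would violate the margin constraint.
inside = [x for x in X if abs(dot(w, x) + b) < 1.0 - 1e-12]

# Points exactly on a tube edge are the support vectors.
on_edge = [x for x in X if abs(abs(dot(w, x) + b) - 1.0) < 1e-12]

print(tube_width, inside, on_edge)
```

A smaller ||w|| gives a wider tube, which is why minimizing the magnitude of w maximizes the margin.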
Summary So Far

We defined margin (functional, geometric).

We demonstrated that we prefer linear classifiers with a large geometric margin.

We formulated the problem of finding the maximum margin linear classifier as a quadratic optimization problem.

This problem can be solved using efficient QP algorithms that are available.

The solutions are very nicely formed.

Do we have our perfect classifier yet?
Non-separable Data and Noise

What if the data is not linearly separable?

We may have noise in the data, and the maximum margin classifier is not robust to noise!
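Soft Margin

Allow functional margins to be less than 1.

Originally, functional margins need to satisfy:

$$y^i (w \cdot x^i + b) \ge 1$$

Now we allow them to be less than 1:

$$y^i (w \cdot x^i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

The objective function also changes to:

$$\min_{w, b} \; \tfrac{1}{2}\|w\|^2 + c \sum_{i=1}^{N} \xi_i$$

[Figure: positive class and negative class, with some examples inside the margin]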
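A short sketch of how slack behaves for a fixed boundary. For given (w, b), the smallest slack that satisfies the relaxed constraint is max(0, 1 − functional margin), which is what the code computes. The data, the boundary, and the 1/2 factor on the norm term are assumptions chosen for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def slack(w, b, x, y):
    """Smallest xi >= 0 such that y * (w . x + b) >= 1 - xi."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))

def soft_margin_objective(w, b, X, Y, c):
    """0.5 * ||w||^2 + c * (sum of slacks), assuming the 1/2 convention."""
    total_slack = sum(slack(w, b, x, y) for x, y in zip(X, Y))
    return 0.5 * dot(w, w) + c * total_slack

# Hypothetical data: the third point sits inside the margin,
# the last point is a noisy negative on the wrong side of the boundary.
X = [(2.0, 2.0), (1.0, 1.0), (0.5, 0.5), (-1.0, -1.0), (0.2, 0.2)]
Y = [1, 1, 1, -1, -1]
w, b = (0.5, 0.5), 0.0

slacks = [slack(w, b, x, y) for x, y in zip(X, Y)]
objective_small_c = soft_margin_objective(w, b, X, Y, c=1.0)
objective_large_c = soft_margin_objective(w, b, X, Y, c=10.0)

print(slacks, objective_small_c, objective_large_c)
```

A slack between 0 and 1 means the example is inside the margin but still correctly classified; a slack above 1 means it is misclassified.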
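Soft Margin Maximization

Hard margin:

$$\min_{w, b} \; \tfrac{1}{2}\|w\|^2 \quad \text{subject to: } y^i (w \cdot x^i + b) \ge 1, \; i = 1, \ldots, N$$

Soft margin:

$$\min_{w, b} \; \tfrac{1}{2}\|w\|^2 + c \sum_{i=1}^{N} \xi_i \quad \text{subject to: } y^i (w \cdot x^i + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \ldots, N$$

Introduce slack variables $\xi_i$ to allow some examples to have functional margins smaller than 1.

Effect of parameter c:
Controls the tradeoff between maximizing the margin and fitting the training examples.
Large c: slack variables incur a large penalty, so the optimal solution will try to avoid them.
Small c: small cost for slack variables, so we can sacrifice a few training examples to ensure that the classifier margin is large.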
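The lecture treats the QP solver as a black box. For readers who want to see the soft-margin objective being minimized, the sketch below swaps in a different and much cruder technique: plain subgradient descent on the equivalent hinge-loss form 0.5·||w||² + c·Σ max(0, 1 − y(w · x + b)). This is not the QP approach referred to above. The function names, step size, epoch count, and toy data are all assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def train_soft_margin(X, Y, c=1.0, lr=0.01, epochs=500):
    """Subgradient descent on 0.5*||w||^2 + c * sum(max(0, 1 - y*(w.x + b)))."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0          # gradient of 0.5*||w||^2 is w
        for x, y in zip(X, Y):
            if y * (dot(w, x) + b) < 1.0:      # example currently needs slack
                for d in range(dim):
                    grad_w[d] -= c * y * x[d]
                grad_b -= c * y
        w = [wd - lr * gd for wd, gd in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Classify x by the sign of w . x + b."""
    return 1 if dot(w, x) + b > 0 else -1

# Hypothetical, cleanly separable toy data.
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
Y = [1, 1, -1, -1]

w, b = train_soft_margin(X, Y, c=1.0)
predictions = [predict(w, b, x) for x in X]
print(w, b, predictions)
```

With a fixed step size the iterate does not settle exactly; on this data it should hover near the max-margin solution w = (0.25, 0.25), b = 0. A real QP solver returns the exact optimum and also yields the α values.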
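Solutions to SVM

No soft margin:

$$w = \sum_{i=1}^{N} \alpha_i y^i x^i, \quad \text{s.t. } \sum_{i=1}^{N} \alpha_i y^i = 0$$

With soft margin:

$$w = \sum_{i=1}^{N} \alpha_i y^i x^i, \quad \text{s.t. } \sum_{i=1}^{N} \alpha_i y^i = 0 \text{ and } 0 \le \alpha_i \le c$$

c controls the tradeoff between maximizing the margin and fitting the training data.

Its effect is to put a box constraint on α, the weights of the support vectors.

It limits the influence of individual support vectors (which may be outliers).

In practice, c can be set by cross-validation.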
How to make predictions?

For classifying a new input z, compute:

$$s = w \cdot z + b = \Big( \sum_{i=1}^{N} \alpha_i y^i x^i \Big) \cdot z + b = \sum_{i=1}^{N} \alpha_i y^i (x^i \cdot z) + b$$

Classify z as + if s is positive, and as − otherwise.

Note: w need not be formed explicitly; we can classify z by taking inner products with the support vectors.

Further, both the learning of w and the prediction using w can be achieved using inner products between pairs of input points. This lends itself naturally to handling cases that are not linearly separable, by replacing the inner product with something called a kernel function.
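The note that w never has to be formed can be demonstrated directly. The sketch below scores a new point using only inner products with the support vectors and compares the result with the explicit w · z + b. The α values, b, and training points are assumed toy values, not output from a real solver.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical trained model: alpha and b are assumed.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0)]
Y = [1, 1, -1]
alpha = [0.25, 0.0, 0.25]
b = 0.0

def score(z):
    """s = sum_i alpha_i * y_i * (x_i . z) + b, skipping examples with alpha = 0."""
    return sum(a * y * dot(x, z) for a, y, x in zip(alpha, Y, X) if a > 0) + b

def classify(z):
    """+1 if the score is positive, -1 otherwise."""
    return 1 if score(z) > 0 else -1

# Same score computed the explicit way, for comparison only.
w = [sum(a * y * x[d] for a, y, x in zip(alpha, Y, X)) for d in range(2)]
z = (3.0, 0.0)
explicit_score = dot(w, z) + b

print(score(z), explicit_score, classify(z), classify((-0.5, -2.0)))
```

Only the two support vectors contribute to the sum, so prediction cost scales with the number of support vectors rather than with the size of the training set.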
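Non-linear SVMs

Datasets that are linearly separable with some noise work out great:

[Figure: one-dimensional data points along the x axis]

But what are we going to do if the dataset is just too hard?

[Figure: one-dimensional data points along the x axis]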
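Mapping the input to a higher-dimensional space can solve the linearly inseparable cases:

$$x \rightarrow (x, x^2)$$

[Figure: the one-dimensional data on the x axis, replotted in two dimensions with axes x and x^2]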
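A minimal sketch of this idea on a made-up one-dimensional data set: no single threshold on x separates the classes, but after mapping x to (x, x²) a straight line does. The data, the helper `separable_1d`, and the hand-picked boundary in feature space are all illustrative assumptions. Nothing here is learned.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(x):
    """Map a 1-D input to the 2-D feature space (x, x^2)."""
    return (x, x * x)

def separable_1d(X, Y):
    """Brute-force check: does any threshold on x (in either direction) classify all points?"""
    pts = sorted(X)
    cuts = [pts[0] - 1.0] + [(p + q) / 2.0 for p, q in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    for t in cuts:
        for sign in (1, -1):
            if all((sign if x > t else -sign) == y for x, y in zip(X, Y)):
                return True
    return False

# Hypothetical data: positives on both ends, negatives in the middle.
X = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
Y = [1, 1, -1, -1, -1, 1, 1]

# Hand-picked linear boundary in feature space: x^2 - 2.5 = 0.
w, b = (0.0, 1.0), -2.5
predictions = [1 if dot(w, phi(x)) + b > 0 else -1 for x in X]

print(separable_1d(X, Y), predictions == Y)
```

The boundary is linear in (x, x²) but corresponds to two thresholds, at x = ±√2.5, in the original input space.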
Non-linear SVMs: Feature Spaces

General idea: for any data set, the original input space can always be mapped to some higher-dimensional feature space such that the data is linearly separable:

$$x \rightarrow \Phi(x)$$
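Example: Quadratic Feature Space

Assume m input dimensions: $x = (x_1, x_2, \ldots, x_m)$

$$\Phi(x) = \big(1,\; \sqrt{2}x_1, \ldots, \sqrt{2}x_m,\; x_1^2, \ldots, x_m^2,\; \sqrt{2}x_1x_2, \sqrt{2}x_1x_3, \ldots, \sqrt{2}x_{m-1}x_m\big)$$

Number of quadratic terms: $1 + m + m + m(m-1)/2 = O(m^2)$

The number of dimensions increases rapidly!

You may be wondering about the $\sqrt{2}$'s. At least they won't hurt anything!

You will find out why they are there soon!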
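To get a feel for how quickly the feature space grows, the count 1 + m + m + m(m−1)/2 can simply be evaluated for a few input dimensions. The function name below is made up for this sketch. The closed form (m+1)(m+2)/2 used as a cross-check is the same count rearranged.

```python
def num_quadratic_features(m):
    """Constant term + m linear terms + m squared terms + m(m-1)/2 cross terms."""
    return 1 + m + m + m * (m - 1) // 2

# Evaluate the count for a few input dimensions.
counts = {m: num_quadratic_features(m) for m in (2, 10, 100, 1000)}

for m, n in counts.items():
    # Cross-check against the rearranged closed form (m+1)(m+2)/2.
    assert n == (m + 1) * (m + 2) // 2
    print(m, n)
```

At m = 1000 inputs the explicit feature vector already has about half a million entries, which is the motivation for avoiding the explicit mapping.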
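Dot product in quadratic feature space

$$\Phi(a) \cdot \Phi(b) = 1 + 2\sum_{i=1}^{m} a_i b_i + \sum_{i=1}^{m} a_i^2 b_i^2 + 2\sum_{i=1}^{m}\sum_{j=i+1}^{m} a_i a_j b_i b_j$$

Now let's just look at another interesting function of $(a \cdot b)$:

$$
\begin{aligned}
(a \cdot b + 1)^2 &= (a \cdot b)^2 + 2(a \cdot b) + 1 \\
&= \Big(\sum_{i=1}^{m} a_i b_i\Big)^2 + 2\sum_{i=1}^{m} a_i b_i + 1 \\
&= \sum_{i=1}^{m}\sum_{j=1}^{m} a_i b_i a_j b_j + 2\sum_{i=1}^{m} a_i b_i + 1 \\
&= \sum_{i=1}^{m} (a_i b_i)^2 + 2\sum_{i=1}^{m}\sum_{j=i+1}^{m} a_i b_i a_j b_j + 2\sum_{i=1}^{m} a_i b_i + 1
\end{aligned}
$$

They are the same! And the latter only takes O(m) to compute!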
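The identity can be verified numerically. The sketch below builds the explicit quadratic feature map (constant, √2-scaled linear terms, squares, √2-scaled cross terms, the standard construction assumed here) and compares its dot product with (a · b + 1)². The two test vectors are arbitrary.

```python
import math
from itertools import combinations

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(x):
    """Explicit quadratic feature map with the sqrt(2) scaling."""
    r2 = math.sqrt(2.0)
    feats = [1.0]                                               # constant term
    feats += [r2 * xi for xi in x]                              # linear terms
    feats += [xi * xi for xi in x]                              # squared terms
    feats += [r2 * xi * xj for xi, xj in combinations(x, 2)]    # cross terms, i < j
    return feats

def quad_kernel(a, b):
    """(a . b + 1)^2, computed in O(m) without building phi."""
    return (dot(a, b) + 1.0) ** 2

a = (1.0, 2.0, 3.0)
b = (0.5, -1.0, 2.0)

explicit = dot(phi(a), phi(b))   # O(m^2) work
implicit = quad_kernel(a, b)     # O(m) work

print(len(phi(a)), explicit, implicit)
```

The √2 factors are exactly what makes the cross terms and linear terms pick up the coefficient 2 that appears in the expansion of the square.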
Kernel Functions

If every data point is mapped into a high-dimensional space via some transformation $x \rightarrow \phi(x)$, the inner product that we need to compute for classifying a point x becomes:

$$\langle \phi(x^i) \cdot \phi(x) \rangle \quad \text{for all support vectors } x^i$$

A kernel function is a function that is equivalent to an inner product in some feature space:

$$k(a, b) = \langle \phi(a) \cdot \phi(b) \rangle$$

We have seen the example:

$$k(a, b) = (a \cdot b + 1)^2$$

This is equivalent to mapping to the quadratic space!
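More kernel functions

Linear kernel: $k(a, b) = (a \cdot b)$

Polynomial kernel: $k(a, b) = (a \cdot b + 1)^d$

Radial Basis Function kernel: $k(a, b) = \exp\!\big(-\|a - b\|^2 / (2\sigma^2)\big)$

In this case, the corresponding mapping $\phi(x)$ is infinite-dimensional!

Lucky that we don't have to compute the mapping explicitly!

$$s = w \cdot \Phi(z) + b = \sum_{i=1}^{N} \alpha_i y^i \big(\Phi(x^i) \cdot \Phi(z)\big) + b = \sum_{i=1}^{N} \alpha_i y^i K(x^i, z) + b$$

Note: we will not get into the details, but the learning of w can be achieved by using kernel functions as well!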
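The three kernels and the kernelized decision function can be written in a few lines. This is a sketch under stated assumptions: the RBF kernel is parameterized by a bandwidth `sigma` as exp(−||a − b||² / (2σ²)), which is one common convention, and the α values, b, and data are hypothetical toy values standing in for a trained model.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def linear_kernel(a, b):
    return dot(a, b)

def polynomial_kernel(a, b, d=2):
    return (dot(a, b) + 1.0) ** d

def rbf_kernel(a, b, sigma=1.0):
    """exp(-||a - b||^2 / (2 sigma^2)); sigma is an assumed bandwidth parameter."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def decision(z, X, Y, alpha, b, kernel):
    """s = sum_i alpha_i * y_i * K(x_i, z) + b over the support vectors."""
    return sum(a * y * kernel(x, z) for a, y, x in zip(alpha, Y, X) if a > 0) + b

# Hypothetical trained model (values assumed, valid for the linear kernel).
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0)]
Y = [1, 1, -1]
alpha = [0.25, 0.0, 0.25]
b = 0.0

s_linear = decision((3.0, 0.0), X, Y, alpha, b, linear_kernel)
k_poly = polynomial_kernel((1.0, 2.0), (3.0, 4.0), d=2)
k_rbf_same = rbf_kernel((1.0, 2.0), (1.0, 2.0))
k_rbf_far = rbf_kernel((0.0, 0.0), (10.0, 10.0))

print(s_linear, k_poly, k_rbf_same, k_rbf_far)
```

Swapping the `kernel` argument changes the feature space without touching the decision function. In practice the α values would have to be re-learned for each kernel, so reusing the linear-kernel α values with another kernel here would only be a syntax demonstration.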
Nonlinear SVM summary

Map the input space to a high-dimensional feature space and learn a linear decision boundary in the feature space.

The decision boundary will be nonlinear in the original input space.

There are many possible choices of kernel functions. How to choose? The most frequently used method is cross-validation.