Chapter 7. Support Vector Machine


Table of Contents: Margin and support vectors; SVM formulation; Slack variables and hinge loss; SVM for multiple classes; SVM with kernels; Relevance Vector Machine.

Support Vector Machine (SVM): Like LDA, the traditional SVM is a linear, binary classifier. Unlike least squares (LSQ) and the Fisher criterion, SVM approaches the 2-class classification problem using the concepts of margin and support vectors.

Margin and Support Vectors: The margin is defined to be the smallest distance between the decision boundary and any of the samples. Support vectors are the data points located on the margin lines.

Support Vector Machine: Pick the decision boundary with the largest margin! The linear hyperplane is defined by the support vectors: moving the other points does not affect the decision boundary, and only the support vectors need to be stored to predict labels of new points.

Two-class Classification with a Linear Model: $y(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$

SVM Formulation: Two-class classification with the linear model is $y(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$. Given the targets $t_n \in \{-1, +1\}$, the distance of a point to the decision surface is given by $\frac{t_n\, y(\mathbf{x}_n)}{\|\mathbf{w}\|} = \frac{t_n(\mathbf{w}^\top \mathbf{x}_n + b)}{\|\mathbf{w}\|}$. SVM finds the model parameters $\mathbf{w}, b$ by maximizing the margin, i.e., $\arg\max_{\mathbf{w},b} \left\{ \frac{1}{\|\mathbf{w}\|} \min_n \left[ t_n(\mathbf{w}^\top \mathbf{x}_n + b) \right] \right\}$.
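A minimal numpy sketch of this distance computation, using hypothetical toy data and hand-picked (non-optimal) parameters w and b:

```python
import numpy as np

# Hypothetical toy data: rows of X are the points x_n, t holds targets in {-1, +1}.
X = np.array([[2.0, 2.0], [2.0, 0.0], [0.0, -2.0], [-2.0, -2.0]])
t = np.array([1, 1, -1, -1])

# Illustrative candidate parameters (not the maximum-margin solution).
w = np.array([1.0, 1.0])
b = 0.0

# Signed distance of each point to the decision surface: t_n (w^T x_n + b) / ||w||.
distances = t * (X @ w + b) / np.linalg.norm(w)

# The margin is the smallest such distance over the training set.
print("margin:", distances.min())
```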

Parameterizing the Decision Boundary: Data $(\mathbf{x}_1, t_1), (\mathbf{x}_2, t_2), \ldots, (\mathbf{x}_N, t_N)$, where $t_n \in \{-1, +1\}$; the two classes correspond to $t_n = 1$ and $t_n = -1$. The "confidence" of a prediction is $(\mathbf{w}^\top \mathbf{x}_n + b)\, t_n$.

Maximizing the Margin: $\max_{\mathbf{w},b} \gamma = \frac{2a}{\|\mathbf{w}\|}$ s.t. $t_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge a$. The 2 is added for mathematical convenience.

Support Vector Machines: Let $a = 1$. Then $\max_{\mathbf{w},b} \gamma = \frac{2}{\|\mathbf{w}\|}$ s.t. $t_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge 1$.

Support Vector Machines: Maximizing the margin is equivalent to minimizing the norm of $\mathbf{w}$:
$\max_{\mathbf{w},b} \frac{2}{\|\mathbf{w}\|}$ s.t. $t_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge 1$
$\iff \min_{\mathbf{w},b} \frac{\|\mathbf{w}\|}{2}$ s.t. $t_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge 1$
$\iff \min_{\mathbf{w},b} \frac{\|\mathbf{w}\|^2}{2}$ s.t. $t_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge 1$.

Support Vector Machines: $\min_{\mathbf{w},b} \frac{1}{2}\|\mathbf{w}\|^2$ s.t. $t_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge 1$. This can now be solved by standard quadratic programming. Introducing the Lagrange multipliers $a_n \ge 0$, we have $L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{n=1}^{N} a_n \left[ t_n(\mathbf{w}^\top \mathbf{x}_n + b) - 1 \right]$. Only a few $a_n$ are greater than 0, and these correspond to the support vectors: $\mathbf{w} = \sum_{n=1}^{N} a_n t_n \mathbf{x}_n = \sum_{n \in SV} a_n t_n \mathbf{x}_n$, and $b$ can be recovered by averaging $t_n - \mathbf{w}^\top \mathbf{x}_n$ over the $N_{SV}$ support vectors, where $N_{SV}$ is the number of support vectors.
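As a sketch of this solution structure, one can approximate the hard-margin problem with scikit-learn's SVC by using a very large C, and recover $\mathbf{w} = \sum_n a_n t_n \mathbf{x}_n$ from the support vectors; the toy data here is hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 2.0], [2.0, 0.0], [0.0, -2.0], [-2.0, -2.0]])
t = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin problem.
clf = SVC(kernel="linear", C=1e6).fit(X, t)

# Recover w = sum_n a_n t_n x_n; sklearn stores the products a_n t_n in dual_coef_.
w = clf.dual_coef_ @ clf.support_vectors_
print("w:", w.ravel(), "b:", clf.intercept_[0])
print("support vectors:\n", clf.support_vectors_)
```

Only the support vectors appear in clf.support_vectors_; moving any other point leaves the fitted boundary unchanged.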

Data Is Still Not Linearly Separable - Soft Margin: $\min_{\mathbf{w},b,\boldsymbol{\xi}} \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_n \xi_n$ s.t. $t_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge 1 - \xi_n$, $\xi_n \ge 0$. The soft-margin method will choose a hyperplane that splits the examples as cleanly as possible, while still maximizing the distance to the nearest cleanly split examples.

Slack Variables and Hinge Loss: The slack variables are $\xi_n = \left[ 1 - t_n(\mathbf{w}^\top \mathbf{x}_n + b) \right]_+ = \max\left(0,\, 1 - t_n(\mathbf{w}^\top \mathbf{x}_n + b)\right)$, i.e., the hinge loss of point $n$. The problem $\min_{\mathbf{w},b} \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_n \xi_n$ s.t. $t_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge 1 - \xi_n$, $\xi_n \ge 0$ is therefore equivalent to minimizing a regularized hinge loss.
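Since the constrained problem is equivalent to unconstrained minimization of $\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_n [1 - t_n(\mathbf{w}^\top \mathbf{x}_n + b)]_+$, a simple subgradient-descent sketch can illustrate it (the function name, learning rate, and epoch count are illustrative choices, not part of the slides):

```python
import numpy as np

def soft_margin_sgd(X, t, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on (1/2)||w||^2 + C * sum_n max(0, 1 - t_n (w^T x_n + b))."""
    _, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = t * (X @ w + b)
        viol = margins < 1  # points with nonzero hinge loss
        # Subgradient: w from the regularizer, plus -t_n x_n for each violating point.
        grad_w = w - C * (t[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * t[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```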

SVM for Multiple Classes

Multi-class SVM: One possibility is to use $N$ two-way discriminant functions (one-vs.-rest): each function discriminates one class from the rest. Another possibility is to use $N(N-1)/2$ two-way discriminant functions (one-vs.-one): each function discriminates between two particular classes. A third option is a single multi-class SVM.

One-vs.-the-rest

One-versus-one: Another approach is to train $K(K-1)/2$ different 2-class SVMs on all possible pairs of classes, and then to classify test points according to which class has the highest number of votes. This can lead to ambiguities in the resulting classification, and it requires significantly more training time for large $K$.

Single Multi-class SVM

Multi-class SVM: Although the application of SVMs to multi-class classification problems remains an open issue, in practice the one-versus-the-rest approach is the most widely used, in spite of its ad-hoc formulation and its practical limitations.
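A minimal scikit-learn sketch of both decompositions (the Iris data is used purely as an illustrative dataset):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# One-versus-the-rest: K binary SVMs, one per class.
ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)

# One-versus-one: K(K-1)/2 binary SVMs, majority vote at prediction time.
ovo = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X, y)

print(ovr.predict(X[:5]))
print(ovo.predict(X[:5]))
```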

SVM with Kernels for Non-linear Classification: The original optimal hyperplane is a linear classifier. The kernel trick was introduced to create non-linear SVM classifiers: it allows the algorithm to fit the maximum-margin hyperplane in a high-dimensional transformed feature space, where the classes are linearly separable.

Dual SVM Form: Minimizing $L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{n=1}^{N} a_n \left[ t_n(\mathbf{w}^\top \mathbf{x}_n + b) - 1 \right]$ gives $\mathbf{w} = \sum_{n=1}^{N} a_n t_n \mathbf{x}_n$ and $\sum_{n=1}^{N} a_n t_n = 0$. Substituting these back into $L$ yields the dual form $\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m\, \mathbf{x}_n^\top \mathbf{x}_m = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m\, k(\mathbf{x}_n, \mathbf{x}_m)$, subject to $a_n \ge 0$ and $\sum_{n=1}^{N} a_n t_n = 0$, where $k(\mathbf{x}_n, \mathbf{x}_m) = \mathbf{x}_n^\top \mathbf{x}_m$ is the kernel.
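A direct numpy transcription of the dual objective, assuming a precomputed kernel (Gram) matrix K with entries K[n, m] = k(x_n, x_m):

```python
import numpy as np

def dual_objective(a, t, K):
    """Dual objective: sum_n a_n - (1/2) sum_{n,m} a_n a_m t_n t_m k(x_n, x_m)."""
    at = a * t  # elementwise products a_n t_n
    return a.sum() - 0.5 * at @ K @ at
```

Maximizing this subject to $a_n \ge 0$ and $\sum_n a_n t_n = 0$ is what a quadratic-programming solver (or SVC internally) does.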

Kernel Tricks: Some common kernels include: Polynomial (homogeneous): $k(\mathbf{x}_n, \mathbf{x}_m) = (\mathbf{x}_n^\top \mathbf{x}_m)^d$. Gaussian or Radial Basis Function (RBF): $k(\mathbf{x}_n, \mathbf{x}_m) = \exp(-\gamma \|\mathbf{x}_n - \mathbf{x}_m\|^2)$ for $\gamma > 0$, sometimes parametrized using $\gamma = 1/(2\sigma^2)$. More exist.
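Minimal numpy implementations of these two kernels, as a sketch (the parameter defaults are arbitrary illustrative values):

```python
import numpy as np

def polynomial_kernel(x, z, d=3):
    """Homogeneous polynomial kernel: k(x, z) = (x^T z)^d."""
    return (x @ z) ** d

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian / RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2), gamma > 0."""
    return np.exp(-gamma * np.sum((x - z) ** 2))
```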

SVM Parameter Selection: The effectiveness of SVM depends on the selection of the kernel, the kernel's parameters, and the soft-margin parameter $C$. Typically, each combination of parameter choices is checked using cross-validation, and the parameters with the best cross-validation accuracy are picked. The final model, which is used for testing and for classifying new data, is then trained on the whole training set using the selected parameters.
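This procedure maps directly onto a grid search with cross-validation, e.g. in scikit-learn (the grid values and dataset here are illustrative, not a recommendation):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values for the soft-margin parameter C and the RBF kernel width gamma.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}

# 5-fold cross-validation over every parameter combination.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)  # with refit=True (default), retrains on the whole set using the best parameters

print(search.best_params_, search.best_score_)
```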

Relevance Vector Machine (RVM): RVM for regression; RVM for classification.