An Improved Support Vector Machine Using Class-Median Vectors *


Zhenzhen Kou, Jianhua Xu, Xuegong Zhang and Liang Ji
State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Tsinghua University, Beijing 100084, P.R.C.
zhenzhenkou00@mails.tsinghua.edu.cn

Abstract

Support vector machine builds the final discriminant function on only a small part of the training samples, which may make the decision rule too sensitive to noise and outliers. Inspired by the idea of the central support vector machine, or CSVM, we present an improved method based on the class median, called the Median Support Vector Machine, or MSVM, in this paper. The experiment results show that MSVM is a promising and robust algorithm, especially when the outliers are far from the class center.

Keywords: Pattern Classification, Support Vector Machine, Median Support Vector Machine

* This work is supported by the National Natural Science Foundation of China (Project number 69885004).

1 Introduction

Support vector machine, or SVM, is a new pattern recognition technique developed by Dr. Vapnik and his co-researchers [1-3]. The basic idea of SVM is to design a linear classifier with maximal classification margin while minimizing the training error. Maximizing the margin plays the role of capacity control, so that the learning machine will not only have small empirical risk but also hold good generalization ability [1-3]. Usually the final discriminant function of SVM depends on only part of the training samples, which are called support vectors. This property makes the final decision function sensitive to certain specific samples in the set. Thus the decision function obtained by SVM may be badly contorted if the samples are polluted by noise and outliers and some of these specific samples are unfortunately taken as support vectors, which is often true in practical applications. This problem has been attracting more and more attention [4-6]. In [5] Zhang proposed a modified version named the central support vector machine, or CSVM, which not only takes advantage of capacity control but also introduces the class center to overcome this weakness of SVM. It was proved effective and promising in some practical cases [5-6]. However, CSVM may not always work.
In some cases, especially when the outliers stand far from the class center, the mean vector will be pulled away from the center by these outliers. But the class median is a more robust representation of the class than the class center. In this paper we present a method which follows the important ideas of SVM and CSVM but tries to prevent the classifier from becoming too sensitive to outliers, by building the classifier on both the class-median vectors and the support vectors. The method is referred to as the Median Support Vector Machine, or MSVM. Experiments on toy data show its usefulness and its advantages over SVM, and also over CSVM in certain cases.

The remaining part of the paper is arranged in the following way. The idea and algorithm of MSVM are presented in sections 2 and 3. Section 4 discusses MSVM in non-separable cases and section 5 applies the idea of kernels to make MSVM nonlinear. A simplified implementation of MSVM is given in section 6 to utilize previous algorithms for SVM. Section 7 illustrates some experiment results to evaluate MSVM. More discussions and conclusions are given in section 8.

2 Median Support Vector Machine

In this paper the class median is defined as a vector each of whose components is the median of the corresponding component of all samples. In practice, if the number of samples is odd, the components of the class median are the middle values; otherwise its components are the averages of the two middle values. When outliers occur in the sample set, they change the median little but affect the class center severely, as shown in Fig. 1.
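The robustness of the component-wise median can be checked numerically. The sketch below uses toy data of our own (not the paper's) to compare how far a single distant outlier moves the class center versus the class median:

```python
import numpy as np

# Five clean samples of one class, plus one outlier far from the class center.
clean = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.0], [0.9, 0.8]])
polluted = np.vstack([clean, [[10.0, 10.0]]])

# Component-wise mean (class center) and median (class median), as defined above;
# with an even sample count, np.median averages the two middle values.
center_shift = np.linalg.norm(polluted.mean(axis=0) - clean.mean(axis=0))
median_shift = np.linalg.norm(np.median(polluted, axis=0) - np.median(clean, axis=0))

print(center_shift)  # about 2.13: the center is dragged toward the outlier
print(median_shift)  # 0.05: the median barely moves
```

A single outlier moves the center by a distance comparable to the class spread, while the median shifts only by half the gap between the two middle samples.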

If there exists no outlier, the central vector is close to the median vector. However, when an outlier occurs, the central vector is pulled far from the true central vector while the median vector changes only a little. So it can be concluded that the class median is more robust than the class center, especially when the outliers are located far from the class center.

Fig. 1: Diagram of center and median

Like SVM and CSVM, MSVM is to find an optimal separating hyperplane

(w · x) + b = 0    (1)

according to the sample set

(x_i, y_i), i = 1, ..., n, x_i ∈ R^d, y_i ∈ {+1, -1}    (2)

The decision function is then

f(x) = sgn{(w · x) + b}    (3)

For the sake of simplicity, we first consider linearly separable cases. According to [1-3], for the linear classifier to be optimal, all samples in (2) should be correctly classified by (3) and the separation margin should be maximized. The former requirement can be written as:

y_i[(w · x_i) + b] ≥ ξ > 0, i = 1, ..., n    (4)

where ξ > 0 is a slack variable that controls the training errors. This requirement guarantees that the empirical risk can be minimized. The latter requirement performs the role of capacity control, which makes the learning machine generalize well [1-3].

In SVM, the separation margin is defined by the distances from the nearest samples to the classification boundary. Actually it is this definition of the separation margin that decides the famous property that the trained classifier depends on only a small part of the training samples. To avoid this, we define another kind of margin, namely the distances from the two class medians to the separation boundary. Figure 2 illustrates the basic idea of MSVM. Denote the median of class +1 as x⁺ and the median of class -1 as x⁻. The distances from the class medians to the classification boundary are:

d⁺ = y⁺((w · x⁺) + b) / ||w||    (5a)
d⁻ = y⁻((w · x⁻) + b) / ||w||    (5b)

where y⁺ = +1, y⁻ = -1 are the class labels of x⁺ and x⁻. Then we can formulate our version of the margin as:

d = d⁺ + d⁻ = (w · (x⁺ - x⁻)) / ||w||    (6)

Fig. 2: Median separation margin

We call the margin defined by (6) the median separation margin, and call the classifier that maximizes this margin the Median Support Vector Machine, or MSVM. It is not easy to maximize (6) directly, so following the scheme used in standard SVM, we normalize the numerator of the median separation margin d to 1, i.e.

(w · (x⁺ - x⁻)) = 1    (7)

The reason why we can do this is that the magnitude of w can be scaled arbitrarily without affecting the classification result.
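Equations (5)-(6) are straightforward to evaluate for a candidate hyperplane. A minimal sketch (function name and example data our own):

```python
import numpy as np

def median_margin(w, b, med_pos, med_neg):
    """Median separation margin of eqs. (5)-(6):
    d = d+ + d- = (w . (x+ - x-)) / ||w||; the offset b cancels in the sum."""
    w = np.asarray(w, dtype=float)
    d_pos = (np.dot(w, med_pos) + b) / np.linalg.norm(w)   # eq. (5a), y+ = +1
    d_neg = -(np.dot(w, med_neg) + b) / np.linalg.norm(w)  # eq. (5b), y- = -1
    return d_pos + d_neg

# Hyperplane x1 = 0.5 with class medians at (2, 0) and (-1, 0): margin is 3.
print(median_margin([1.0, 0.0], -0.5, [2.0, 0.0], [-1.0, 0.0]))  # 3.0
```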
Thus we construct the optimization problem of our MSVM method as

min ψ(w) = (1/2)||w||² = (1/2)(w · w)    (8)

subject to constraints (4) and (7). This form is similar to the primal problem of SVM, with one more constraint. The optimization goal of (8) can be explained as maximizing the median margin while keeping all the training data not only correctly classified but also away from the separating hyperplane. We can name the two hyperplanes (w · x) + b = ±ξ the separation boundaries and call the region between them the boundary zone.
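Problem (8) is a quadratic program; before handing it to a solver, the constraints (4) and (7) for a candidate (w, b) can be checked directly. A small sketch under our own naming:

```python
import numpy as np

def msvm_feasible(w, b, X, y, med_pos, med_neg, xi=1e-6, tol=1e-9):
    """Check constraint (4), y_i[(w . x_i) + b] >= xi, and the
    normalization (7), (w . (x+ - x-)) = 1, for a candidate (w, b)."""
    w = np.asarray(w, dtype=float)
    margins = np.asarray(y, dtype=float) * (np.asarray(X, dtype=float) @ w + b)
    c4 = bool(np.all(margins >= xi))                                         # eq. (4)
    diff = np.asarray(med_pos, dtype=float) - np.asarray(med_neg, dtype=float)
    c7 = bool(abs(np.dot(w, diff) - 1.0) <= tol)                             # eq. (7)
    return c4 and c7

X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]
print(msvm_feasible([0.5, 0.0], 0.0, X, y, [1.0, 0.0], [-1.0, 0.0]))  # True
```

With w = (0.5, 0) the median difference (2, 0) gives (w · (x⁺ - x⁻)) = 1, so (7) holds; doubling w would violate it.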

3 The Dual Form of MSVM

Following a scheme similar to SVM, the Lagrange function of the primal problem can be written as:

L_p = (1/2)(w · w) - Σ_{i=1}^{n} α_i { y_i[(w · x_i) + b] - ξ } - β { (w · (x⁺ - x⁻)) - 1 }    (9)

And the dual form of the above problem is:

max L_D = ξ Σ_{i=1}^{n} α_i + β - (1/2) Σ_{i,j=1}^{n} α_i α_j y_i y_j (x_i · x_j) - β Σ_{i=1}^{n} α_i y_i (x_i · (x⁺ - x⁻)) - (1/2) β² ((x⁺ - x⁻) · (x⁺ - x⁻))    (10)

subject to:

Σ_{i=1}^{n} α_i y_i = 0    (11a)
α_i ≥ 0, i = 1, ..., n    (11b)

which is a quadratic programming problem in α_i and β. The solution weight vector is a linear combination of the training samples:

w = Σ_{i=1}^{n} α_i y_i x_i + β (x⁺ - x⁻)    (12)

And the decision function is

f(x) = sgn{(w · x) + b}    (13)

in which the threshold b can be decided from any sample for which the equality in (4) holds, or be calculated from the two class medians. Similarly to the SVM case, from the Kuhn-Tucker conditions we can also find that only the α_i corresponding to those samples which make the equality in (4) hold are non-zero. These samples are also the ones nearest to the separation boundary, corresponding to the support vectors. From (12) we can see that the final classifier of MSVM is decided by both the support vectors and the two class-median vectors. This can lead to a simplified implementation of MSVM, discussed in section 6.

4 Considerations for Nonseparable Cases

When considering the nonseparable case, again following the SVM ideas, we introduce slack variables into condition (4) to include the samples that violate this condition:

y_i[(w · x_i) + b] ≥ ξ - ξ_i, ξ > 0, ξ_i ≥ 0, i = 1, ..., n    (14)

and define the objective function as:

minimize ψ(w, ξ) = (1/2)(w · w) + C Σ_{i=1}^{n} ξ_i    (15)

Parameter C controls the penalty on the errors and also controls the trade-off between the classification information contained in the medians of the samples and in the individual samples near the boundary. For this new problem, the dual problem turns out to be the same as defined by (10) and (11), except that condition (11b) is modified to:

0 ≤ α_i < C, i = 1, ..., n    (16)

5 Kernel Version

Like SVM, we can also apply the idea of kernels to make MSVM nonlinear, by substituting (x_i · x_j) with some kernel K(x_i, x_j) to realize the mapping from the attribute space to the high-dimensional feature space. In this case, the final decision function will be

f(x) = sgn{ Σ_{i=1}^{n} α_i y_i K(x_i, x) + β (K(x⁺, x) - K(x⁻, x)) + b }    (17)

Here we should pay especial attention to x⁺ and x⁻. Can the median vectors in the attribute space be used directly here?
From section 2 it can be seen that the basic idea of using class medians to build the support vector machine is that the class median is less sensitive to outliers and thus can be a more robust representation of the class. Since this is true in the attribute space, we believe that the class median in the attribute space can be a good representative of the samples. At the same time, the medians in the feature space are usually unknown, because the feature space may not be, or need not be, known. Therefore it is reasonable and feasible to directly use the median vectors in the attribute space here; that is, to use the image of the median vectors in the attribute space as the medians in the feature space. Thus MSVM can be implemented in the nonlinear case in a simplified way.

6 A Simplified Implementation of MSVM

From the expression of the solution in (12), we can find that the final decision function can be considered as the combination of two parts: the support vectors and the class-median vectors. This fact leads to a simplified implementation of MSVM, which can be obtained through modification of the standard SVM result. That is, after getting the weight w_svm of a standard SVM classifier, we can adjust the hyperplane by adding the difference vector between the two class medians with a certain factor, e.g.,

w_new = (1 - λ) w_svm + λ (x⁺ - x⁻)    (18)

where w_svm is the weight obtained by the standard SVM and 0 ≤ λ ≤ 1 is a constant controlling the balance between the classification information contained in the support vectors and in the class medians. By taking this shortcut, previous algorithms for SVM can be utilized. In practice, λ can be selected according to an a priori estimation of our belief in the specific samples and the class medians. For example, if there is severe noise in the sample set, a larger λ can be chosen to make the class medians play a more important role. If λ = 0, the standard SVM is obtained. In practice, since w_svm is scaled in the standard SVM, (x⁺ - x⁻) may need scaling to be comparable with w_svm. When a kernel is adopted for the nonlinear transform, we can use the image of the median vectors in the attribute space as the medians in the feature space, such that

f(x) = sgn{ (1 - λ) Σ_{i=1}^{n} α_i y_i K(x_i, x) + λ (K(x⁺, x) - K(x⁻, x)) + b }    (19)

This is our simplified implementation of the kernel version of MSVM.

7 Experiment Results and Analysis

In order to evaluate the performance of MSVM and compare MSVM with SVM and CSVM, we designed two toy data sets. Experiment results and analysis are presented as follows.

As shown in Fig. 3, the first example is to compare the performance of MSVM with that of CSVM and SVM under the existence of outliers not far from the class center. In Fig. 3 (a), there are two outliers in the training set and the separating hyperplane is obviously not the optimal one. Fig. 3 (b) shows the optimal result when the training set is noiseless; the separating hyperplane changes a lot from the one in (a). Fig. 3 (c) is the result obtained by CSVM, using its simplified implementation [5]. Fig. 3 (d) illustrates the result obtained by MSVM, using its simplified implementation, with the same λ as in (c). We can find that although the outliers exist in the training set, the result of MSVM is almost the same as the one obtained when the training set is clean. Note that here we did not attempt to identify and remove the outliers, but simply made the result less sensitive to outliers by taking the class-median information into account. It can also be seen that when the outliers are not far from the class centers, MSVM and CSVM achieve similar performance.

Fig. 3: A comparison experiment of SVM, CSVM and MSVM. Pluses and circles denote the training samples of the two classes. (a) the standard SVM result in the existence of outliers; (b) the standard SVM result excluding the outliers; (c) and (d) the hyperplanes of CSVM and MSVM when the outliers exist, respectively.

Figure 4 shows the experiment results of the comparison between CSVM and MSVM under the existence of outliers far from the class center. Fig. 4 (a) shows the result obtained by CSVM and Fig. 4 (b) the result by MSVM.

Fig. 4: A comparison experiment between CSVM and MSVM. Pluses and circles denote the training samples of the two classes. (a) and (b) are the results of CSVM and MSVM when the outliers are far from the class centers, respectively.
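The simplified implementation of eq. (18) used in these experiments amounts to blending the standard-SVM weight with the median difference vector. A sketch of our own (the rescaling step the paper mentions is done here by normalizing both directions, which is our implementation choice):

```python
import numpy as np

def msvm_weight(w_svm, med_pos, med_neg, lam):
    """Eq. (18): w_new = (1 - lam) * w_svm + lam * (x+ - x-), 0 <= lam <= 1.
    w_svm comes from any standard SVM trainer; lam = 0 recovers the standard SVM.
    Both directions are normalized first so the two terms are comparable in
    scale (the paper notes such rescaling may be needed)."""
    w_svm = np.asarray(w_svm, dtype=float)
    diff = np.asarray(med_pos, dtype=float) - np.asarray(med_neg, dtype=float)
    w_svm = w_svm / np.linalg.norm(w_svm)
    diff = diff / np.linalg.norm(diff)
    return (1.0 - lam) * w_svm + lam * diff

# lam = 0 keeps the (normalized) SVM direction; lam = 1 uses only the medians.
print(msvm_weight([2.0, 0.0], [1.0, 1.0], [1.0, -1.0], 0.0))  # [1. 0.]
print(msvm_weight([2.0, 0.0], [1.0, 1.0], [1.0, -1.0], 1.0))  # [0. 1.]
```

Intermediate λ values interpolate between the two directions, matching the trade-off described above.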

It can be seen that when the outliers are far from the class center, they change the class center greatly, so that the result obtained by CSVM is far from the really optimal one. But the class median is less sensitive to such outliers, and therefore the result is more reliable. So it can be concluded that MSVM is a more robust algorithm.

8 Discussion and Conclusion

Support vector machine, or SVM, as a new technique for pattern recognition, possesses good generalization ability and thus has been receiving more and more attention. However, some authors have observed that in cases where there exist severe noise and outliers in the training data, SVM makes its result too sensitive to a few specific samples and thus less reliable. As a step toward a more robust and more practically applicable modification of SVM, we proposed the median support vector machine, or MSVM. We endeavor to combine the class-median information with the standard SVM to make the algorithm more robust. Further, according to the form of the MSVM solution, a simpler implementation is introduced, which is simply the combination of the standard SVs and the class medians by a factor. Experiments illustrate the advantage of MSVM over standard SVM, and also over CSVM in certain cases. However, further analysis of the method is deserved, and we are also trying to apply it to some practical problems.

9 Acknowledgement

The authors would like to thank Dr. Jianhua Xu and Han Ke for their helpful discussions.

References

[1] Vapnik V N. Statistical Learning Theory, New York: John Wiley & Sons, 1998.
[2] Vapnik V N. The Nature of Statistical Learning Theory, NY: Springer-Verlag, 1995.
[3] X. Zhang, Introduction to statistical learning theory and support vector machines, Acta Automatica Sinica, 2000, 26(1):32-42 (in Chinese).
[4] B. Boser, I. Guyon, V. Vapnik. A training algorithm for optimal margin classifiers, presented at the 5th Annual Workshop on Computational Learning Theory, Pittsburgh: ACM Press, 1992.
[5] X. Zhang, Using class-center vectors to build support vector machines, in Yu-Hen Hu, Jan Larsen, et al. (eds.), Neural Networks for Signal Processing IX, Proceedings of the 1999 IEEE Signal Processing Society Workshop, New York: The Institute of Electrical and Electronics Engineers, Inc., pp. 3-11.
[6] X. Zhang, H. Ke, Central support vector machines and its application to cancer classification. Submitted to IEEE Trans. on Neural Networks.
[7] Steve Gunn, Support Vector Machines for Classification and Regression, ISIS Technical Report, University of Southampton, 1998.