MACHINE LEARNING USING SUPPORT VECTOR MACHINES

M. Palaniswami*, A. Shilton*, D. Ralph** and B.D. Owen*


*Department of Electrical and Electronic Engineering, The University of Melbourne, Victoria 3010, Australia.
**The Judge Institute of Management Studies, University of Cambridge, Trumpington St, Cambridge CB2 1AG, UK.

Abstract

Machine learning invokes the imagination of many scientific minds due to its potential to solve complex and difficult real-world problems. This paper gives methods of constructing machine learning tools using Support Vector Machines (SVMs). We first give a simple example to illustrate the basic concept and then demonstrate further with a practical problem. The practical problem is concerned with electronic monitoring of fishways for automatic counting of different fish species for the purpose of environmental management in Australian rivers. The results illustrate the power of the SVM approaches on the sample problem and their computational attractiveness for practical implementations.

1. INTRODUCTION

Machine learning is an attractive field in the domain of Artificial Intelligence (AI), with the scope to learn from presented experimental data for the purpose of intelligent interpretation when the system is confronted with unseen situations. In the field of artificial neural networks, several network architectures have been presented with a view to generating generalised mappings from input to output in a robust manner. In this paper, we give a technique that is increasing in popularity under the name Support Vector Machines [1, 4], which is also a universal feedforward approximator, much like the layered feedforward networks and Radial Basis Function networks. We give the basic concept behind this emerging paradigm and illustrate it with a practical example.

2. SUPPORT VECTOR MACHINES

2.1 Background

A common problem observed in many AI engineering applications is pattern recognition [4]. The problem is as follows: given a training set of vectors, each belonging to some known category, the machine must learn, based on the information implicitly contained in this set, how to classify vectors of unknown type into one of the specified categories. Support vector machines (SVMs) provide one means of tackling this problem. In order to provide a basis for classification, SVMs implicitly map the training data into a high-dimensional feature space. A hyperplane is then constructed in this feature space which maximises the margin of separation between the plane and those points lying nearest to it (called the support vectors). The plane so constructed can then be used as a basis for classifying vectors of uncertain type.

2.2 Linearly Separable Data

For simplicity, we consider the problem of two-category classification. Consider the training pairs (x_i, d_i), where x_i is a training vector and d_i in {+1, -1} is its category. Here i runs from 1 to N, the number of training points. Assume that a hyperplane which divides the training data can be found without mapping to feature space. The decision surface (a hyperplane) is defined by a discriminant function

    g(x) = w^T x + b

where the vector w, of dimension equal to that of x, and the scalar b are chosen such that

    w^T x_i + b > 0   if d_i = +1
    w^T x_i + b < 0   if d_i = -1

(Note that we assume the classification is strict: no data point lies on the decision surface g(x) = 0.) Classification of an unknown vector x into class membership d (+1 or -1) is done using the discriminant function:

    d = sgn(g(x))
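As a concrete illustration of the discriminant function above, the following minimal Python/NumPy sketch evaluates g(x) = w^T x + b and classifies by its sign. The weight vector, bias and test points are hypothetical values chosen for illustration only, not quantities from the paper.

import numpy as np

# Hypothetical separating hyperplane w^T x + b = 0
w = np.array([1.0, -2.0])
b = 0.5

def g(x):
    # Discriminant function g(x) = w^T x + b
    return np.dot(w, x) + b

def classify(x):
    # Class membership d = sgn(g(x))
    return 1 if g(x) > 0 else -1

print(classify(np.array([3.0, 1.0])))  # g = 1.5 > 0, so d = +1
print(classify(np.array([0.0, 1.0])))  # g = -1.5 < 0, so d = -1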

The support vectors are those training vectors which lie closest to this plane. The notation (x^(s), d^(s)) refers to a training pair (x_i, d_i) such that x_i is a support vector. Any vector x can be expressed as

    x = x_P + r w / ||w||

where x_P is the projection of x onto the decision surface and r = g(x) / ||w|| is the algebraic distance from x to that surface. By scaling w and b, we can ensure that

    d_i g(x_i) >= 1

where equality implies that the point is a support vector. Hence, for the support vectors, g(x^(s)) = d^(s), so r^(s) = d^(s) / ||w||. Based on this, we define the margin of separation between the two classes as

    ρ = 2 / ||w||

So, in order to maximise the margin of separation, we must minimise ||w||, or for convenience (1/2) ||w||^2, subject to the constraints given previously. To summarise, the classification problem is a quadratic program [3] in the variables w and b:

    Minimise:   Ψ(w) = (1/2) w^T w
    Subject to: d_i (w^T x_i + b) >= 1

2.3 Nonlinear Decision Surface

For complex problems, we must first map our data into feature space prior to constructing our decision surface. Suppose that this is done via some arbitrary mapping ϕ into an arbitrary dimension m:

    ϕ(x) = [ϕ_1(x), ϕ_2(x), ..., ϕ_m(x)]^T

The classification problem in feature space is now:

    Minimise:   Ψ(w) = (1/2) w^T w
    Subject to: d_i (w^T ϕ(x_i) + b) >= 1

where the variable vector w is now an m-dimensional vector and the variable b is a scalar as before. Given the optimal w and b, the discriminant function coming out of this support vector machine is

    g(x) = w^T ϕ(x) + b

2.3.1 Non-separable data

If the training data is not separable in the chosen feature space, then the above approach will fail, as the conditions can never be met (causing ||w|| to diverge). To deal with this, a soft-margin technique can be used. Non-negative slack variables ξ_i are introduced which allow the constraints to be weakened when they cannot be met. The re-formulated quadratic program has variables w, b and ξ, and uses a constant C:

    Minimise:   Ψ(w, ξ) = (1/2) w^T w + C Σ_i ξ_i
    Subject to: d_i (w^T ϕ(x_i) + b) >= 1 - ξ_i,   ξ_i >= 0

C is an arbitrary constant that may be used to control the trade-off between machine complexity (correctly classifying outliers, possibly at the expense of misclassifying some good data) and robustness (limiting the ability of outliers to distort the separating plane). See Vapnik [1] for an elegant theoretical underpinning of this idea.
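To make the soft-margin quadratic program concrete, the sketch below hands it to SciPy's general-purpose SLSQP solver with the identity feature map ϕ(x) = x and a small hypothetical training set. This is only an illustration of the optimisation problem stated above, under assumed data and parameter values; it is not the solver used in the paper, and dedicated QP or SVM solvers are preferable in practice.

import numpy as np
from scipy.optimize import minimize

# Hypothetical 2-D training set with labels d_i in {+1, -1}
X = np.array([[2.0, 2.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
N, C = len(d), 10.0  # C controls the complexity/robustness trade-off

# Decision variables packed as z = [w_1, w_2, b, xi_1 .. xi_N]
def objective(z):
    w, xi = z[:2], z[3:]
    return 0.5 * np.dot(w, w) + C * np.sum(xi)

# Margin constraints d_i (w^T x_i + b) >= 1 - xi_i, written as expressions >= 0
cons = [{'type': 'ineq',
         'fun': lambda z, i=i: d[i] * (np.dot(z[:2], X[i]) + z[2]) - 1 + z[3 + i]}
        for i in range(N)]
bounds = [(None, None)] * 3 + [(0, None)] * N  # slack variables xi_i >= 0

res = minimize(objective, np.zeros(3 + N), method='SLSQP',
               bounds=bounds, constraints=cons)
w, b = res.x[:2], res.x[2]
print('w =', w, 'b =', b)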

2.3.2 Kernel functions and the Wolfe dual

In feature space, the dot product is

    ϕ(x) . ϕ(y) = Σ_{j=1..m} ϕ_j(x) ϕ_j(y)

The inner product kernel is defined as

    K(x, y) = ϕ(x) . ϕ(y)                                    (1)

This can be treated as a generalised form of the dot product for a curved input space. Note that the feature space may have infinite dimension, so long as we never have to refer to the mapping to feature space explicitly. For the linearly separable case, we can simply use the standard dot product K(x, y) = x^T y. Any arbitrary function can be used as a kernel, but good theoretical properties rely on it being related to an inner product on the feature space. This will be true if the kernel is symmetric and satisfies Mercer's condition:

    ∫∫ K(x, y) ψ(x) ψ(y) dx dy >= 0

for all ψ for which ∫ ψ(x)^2 dx is defined.

The primal problem appears to be impractical when feature spaces of very high dimension are used (and impossible if the feature space has infinite dimension). To overcome this we solve the Wolfe dual quadratic program instead of the primal quadratic program. A derivation of the following Wolfe dual can be found elsewhere, in Burges [2] for example, and more generally in [3]. The Wolfe dual can be written:

    Minimise:   Q(α) = (1/2) α^T G α - Σ_i α_i
    Subject to: Σ_i d_i α_i = 0,   0 <= α_i <= C

where the G matrix has rows and columns given by

    G_ij = d_i d_j K(x_i, x_j)

Having solved the dual problem to obtain an optimal dual vector α, we can easily recover the discriminant function g, as we describe next. What is interesting, and very important from the point of view of implementation, is that this is possible without needing to know what the feature mapping ϕ is, only that it exists and satisfies the kernel equation (1). It can be shown that the optimal w and b are given by

    w = Σ_i α_i d_i ϕ(x_i)
    b = d_j - Σ_i α_i d_i K(x_i, x_j)

where (x_j, d_j) is any training pair such that 0 < α_j < C; the fact that α_j is positive can be shown to imply that x_j is a support vector. Thus

    g(x) = w^T ϕ(x) + b = Σ_i α_i d_i K(x_i, x) + b

which is expressed without reference to ϕ. As usual, a new data vector x is classified by d = sgn(g(x)). This overcomes the problems inherent in dealing with the high-dimensional feature spaces that are intrinsic to a kernel mapping.
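The Wolfe dual above can likewise be sketched with a generic solver. The routine below builds the G matrix from a kernel, solves the dual, and recovers b and the discriminant exactly as in the equations of this section. It is a minimal sketch under stated assumptions: SciPy's SLSQP stands in for the active set solvers used later in the paper, and the helper names are hypothetical. Applied to the XOR data of Section 2.4 below with the quadratic kernel K(x, y) = (1 + x^T y)^2, it recovers α_i = 1/8 up to solver tolerance.

import numpy as np
from scipy.optimize import minimize

def K(x, y):
    # Quadratic inner-product kernel; any Mercer kernel can be swapped in
    return (1.0 + np.dot(x, y)) ** 2

def train_dual(X, d, C=10.0):
    # Wolfe dual: minimise (1/2) a^T G a - sum_i a_i
    # subject to sum_i d_i a_i = 0 and 0 <= a_i <= C
    N = len(d)
    G = np.array([[d[i] * d[j] * K(X[i], X[j]) for j in range(N)]
                  for i in range(N)])
    res = minimize(lambda a: 0.5 * a @ G @ a - a.sum(), np.zeros(N),
                   method='SLSQP', bounds=[(0.0, C)] * N,
                   constraints=[{'type': 'eq', 'fun': lambda a: np.dot(d, a)}])
    alpha = res.x
    # Recover b from a margin support vector (any j with 0 < alpha_j < C)
    j = int(np.argmax((alpha > 1e-6) & (alpha < C - 1e-6)))
    b = d[j] - sum(alpha[i] * d[i] * K(X[i], X[j]) for i in range(N))
    return alpha, b

def g(x, X, d, alpha, b):
    # g(x) = sum_i alpha_i d_i K(x_i, x) + b, with no reference to the map phi
    return sum(alpha[i] * d[i] * K(X[i], x) for i in range(len(d))) + b

# XOR example (Section 2.4): alpha comes out near [1/8, 1/8, 1/8, 1/8]
X = np.array([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.]])
d = np.array([-1., 1., 1., -1.])
alpha, b = train_dual(X, d)
print(alpha, b, np.sign(g(np.array([-0.5, 0.5]), X, d, alpha, b)))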

2.4 Example simulating an XOR gate

We have adapted an example from [4]. The training set is:

    x_i        d_i
    (-1, -1)   -1
    (-1, +1)   +1
    (+1, -1)   +1
    (+1, +1)   -1

Choose the quadratic kernel K(x, y) = (1 + x^T y)^2. We can compute

    G = [  9  -1  -1   1 ]
        [ -1   9   1  -1 ]
        [ -1   1   9  -1 ]
        [  1  -1  -1   9 ]

Solving the Wolfe dual gives the solution α_i = 1/8 for all i, so every training point is a support vector. The feature mapping associated with the above kernel can be constructed explicitly [4], leading to the following compact representation of the decision surface:

    g(x) = -x_1 x_2 = 0

This is sensible, as all it does is divide the quadrants of the two-dimensional input space. Explicitly, the decision surface represented here classifies:

    x_1   x_2   d
     -     -    -1
     -     +    +1
     +     -    +1
     +     +    -1

3. FISH CLASSIFICATION USING SUPPORT VECTOR MACHINES

Fishways are constructed in rivers to help migratory fish get over obstacles (dams, weirs, etc.). In order to study their success, it is important to monitor both the number and species of fish using them. It is very time consuming to do this manually (by viewing video footage and counting by hand). A better alternative would be to train a machine to recognise the different species and use this machine to do the count in real time. The following results were obtained to assess the feasibility of using support vector machines for this purpose.

3.1 Training data and experimental details

Results were obtained based on fish in a tank. The species involved were silver perch, giant danio, danio, tiger barb and rainbow fish (species 1-5 respectively). Multiple images of each species were taken, and feature extraction was used to reduce each image to a fixed-length feature vector. We have coded a set of feature extractors from various sources: 3 shape features [5], length features [6], area ratios [7], 5 binary moments (up to 4th order) and the corresponding 5 shade moments, moment invariants [8]; the remaining features are some of the previous features normalised such that the isolated fish has unit area. A linear transform is found such that the final training features are normalised to lie between 0 and 1. The same linear transform is then applied to all the testing features.

(Figure: example fish image showing the perimeter, length and height measurements used as shape features.)

1000 training and testing vectors were used. Of these, 800 were chosen at random to train the support vector machine and the remainder were used to evaluate the results. For consistency, the process was repeated 10 times.
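The normalisation step described above amounts to a single min-max linear transform fitted on the training features and then reused, unchanged, on the test features. A minimal sketch follows; the array names are hypothetical.

import numpy as np

def fit_minmax(F_train):
    # Fit the linear transform F -> (F - lo) / (hi - lo) on training data only,
    # so that every training feature lies between 0 and 1
    lo, hi = F_train.min(axis=0), F_train.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return lambda F: (F - lo) / scale

# The identical transform is then applied to the testing features:
# transform = fit_minmax(F_train)
# F_train_n, F_test_n = transform(F_train), transform(F_test)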

Multiclass (5 species) classification was achieved using a simple winner-takes-all method. This corresponds to a baseline accuracy of 20% (i.e. 20% accuracy would be achieved by randomly assigning a species to each test vector).

3.2 Results using the support vector machine

The first set of results were generated using a simple polynomial kernel, namely

    K(x, y) = (1 + x^T y)^p

(Figure: classification accuracy (% correct) versus the polynomial degree p.)

A radial basis function kernel was also tried:

    K(x, y) = exp(-||x - y||^2 / (2 σ^2))

(Figure: classification accuracy (% correct) versus the kernel parameter σ.)

4. BATCH VERSUS INCREMENTAL SOLUTION

The traditional approach to solving the Wolfe dual has been to use a batch solving method. The main disadvantage of this approach is that it is not possible to incorporate new training data as it comes to hand (we must solve again from scratch). An alternative approach is that of incremental learning. This makes the assumption that the new data contains only a small amount of information; hence, our current solution is very close to the optimal solution, and can be made optimal through relatively minor modifications. Active set methods for quadratic programs [3] present a natural framework in which to investigate the effect of perturbations of the problem data on the optimal value and optimal solution (known as sensitivity analysis) and then to derive incremental methods. The active set technique can be traced back to pivotal methods like the Simplex method [3] for linear programming.

For this problem, we compared the computational cost of repeated batch solving and incremental solving using a quadratic kernel. The method was as follows. The SVM was initially trained using N - 10 training pairs; 10 more training pairs were then added, and the resulting problem was solved using both incremental and batch methods. Both the batch and incremental methods used were active set methods. In the incremental method, the alpha corresponding to each of the new data points was set to 0 or C depending on whether the new training pair was correctly classified by the old SVM or not; the batch method simply solved from scratch. The following graph shows the difference between the flop counts for the incremental and batch solves.

(Figure: computational cost, incremental vs batch solving; flop counts for the batch and incremental methods versus training set size.)
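The warm-start rule used by the incremental method is simple to state in code. The sketch below uses hypothetical names and reuses the train_dual and g routines sketched in Section 2.3.2: the old multipliers are kept, and each new α is initialised to 0 or C according to whether the old SVM already classifies the new pair correctly. An active set solver started from this point typically needs only minor pivoting to restore optimality, which is the source of the flop-count savings.

import numpy as np

def warm_start_alphas(alpha_old, b_old, X_old, d_old, X_new, d_new, C, g):
    # Keep the old multipliers; each new alpha starts at 0 if the old SVM
    # classifies the new pair correctly, and at C otherwise
    alpha_add = [0.0 if dn * g(xn, X_old, d_old, alpha_old, b_old) > 0 else C
                 for xn, dn in zip(X_new, d_new)]
    return np.concatenate([alpha_old, np.array(alpha_add)])

# The augmented alpha vector then seeds the next (incremental) dual solve,
# instead of restarting the optimisation from alpha = 0 as a batch solve would.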

5. CONCLUSION

This paper has given methods of constructing Support Vector Machines for the purpose of machine learning. The classification performance is illustrated with an on-going practical application problem in computer vision and environmental management. In many practical situations, new information is added, and this information is to be incorporated efficiently without significantly affecting the old information. For this purpose, results are provided to demonstrate the usefulness of incremental algorithms that deal with incremental data.

6. REFERENCES

[1] Vapnik, V.N., The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
[2] Burges, C.J.C., "A Tutorial on Support Vector Machines for Pattern Recognition", Knowledge Discovery and Data Mining, 2(2), 1998.
[3] Fletcher, R., Practical Methods of Optimization, 2nd ed., John Wiley & Sons, 1987.
[4] Haykin, S., Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999.
[5] Castignolles, N., Cattoen, M. and Larinier, M., "Automatic system for monitoring fish passage at dams", Proceedings of the SPIE - The International Society for Optical Engineering, 1994.
[6] Strachan, N.J.C., Nesvadba, P. and Allen, A.R., "Fish species recognition by shape analysis of images", Pattern Recognition, vol. 23, pp. 539-544, 1990.
[7] Strachan, N.J.C., "Recognition of fish species by colour and shape", Image and Vision Computing, vol. 11, pp. 2-10, 1993.
[8] Reiss, T.H., "The revised fundamental theorem of moment invariants", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 830-834, 1991.
