X-Attributes Classifier (XAC): A New Multiclass Classification Method by Using Simple Linear Regression and Its Geometrical Properties

Proceedings of the World Congress on Engineering and Computer Science 2015 Vol II, WCECS 2015, October 21-23, 2015, San Francisco, USA

Jeremias T. Lalis, Member, IAENG

Abstract: This paper proposes a new multiclass classification method. During training, simple linear regression is used to find the linear relationship between each pair of attributes, together with the centroid of that pair, in every class. Three points, taken from the linear function, the centroid, and the input values, are then used to determine the class membership of a new object by computing the area of the triangle they form. Five standard, public datasets from the UCI machine learning repository were used to evaluate the performance of the proposed algorithm using 5-fold cross-validation. Empirical results show satisfactory performance of the XAC algorithm on linearly and nonlinearly separable classes with small training size and/or high dimension.

Index Terms: data mining, multiclass classification, simple linear regression, geometric properties

I. INTRODUCTION

Uncovering hidden, useful knowledge within large datasets is the main goal of data mining. It helps people make proactive, knowledge-driven decisions. Hence, various data mining techniques have emerged in different research topics such as sequential rules, pattern recognition, clustering, regression, and classification. Among these topics, data classification has become one of the major research areas because of its wide range of applications [1], [3], such as biomedical and biological modeling.

Classification is a supervised learning method: a set of data containing observations is analyzed in order to learn a model or function that can assign a new observation to one of the predefined classes. It has been an active research topic not only in machine learning, but also in statistics [2]. Early work on classification focused on finding which variables discriminate between two or more classes, an approach known as discriminant function analysis (DFA).
The underlying idea of DFA is to use the predictor variables from the training set to construct discriminant functions, such as linear functions, that determine the group membership of an unseen object. Modern classification approaches focus on the automatic generation of rules (e.g., decision trees), the use of conditional probabilities (e.g., Naïve Bayes), the calculation of distances in the feature space (e.g., K-nearest neighbor), and even linear and nonlinear regression (e.g., support vector machines) to create more flexible models.

In this paper, the researcher presents a new and simple classification method that uses simple linear regression to find the linear relationships between an object's attributes, and a geometrical property, the area of a triangle, to calculate the distance of a new object from the predetermined classes. This study also shows the applicability of simple linear regression to linearly and nonlinearly separable multiclass classification problems. Five standard datasets from the UCI machine learning repository were used to measure and evaluate the performance of the proposed algorithm.

Manuscript received May 2015; revised July 2015. Jeremias T. Lalis is an Assistant Professor in the College of Computer Studies and the Director of the Institutional Research and Publication Office of La Salle University, Ozamiz City, Philippines (e-mail: jeremias.lalis@gmail.com).

ISBN: 978-988-14047-2-5 ISSN: 2078-0958 (Print); ISSN: 2078-0966 (Online) WCECS 2015

II. RELATED WORK

A. Simple Linear Regression

Linear regression is the task of finding the best-fitting straight line, also known as the regression line, through the feature space [12]. The main idea of this technique is to reveal the linear relationship, or derive a linear function, that links variables x and y, denoted as

$$ y = mx + b \tag{1} $$

where y is the criterion variable, m is the slope, x is the predictor variable, and b is the y-intercept of the trendline. Because the value of y is predicted from the single variable x, the technique is called simple linear regression. Some linear, binary classification methods, such as the perceptron and support vector machines (SVM), apply this technique to classify linearly separable classes. In these methods, however, the best-fitting line is used to separate the two classes and is called a hyperplane.

B. Linear Classification via Hyperplane

Regression and classification are both learning techniques in data mining that are used to create predictive models from the presented data. However, they produce different kinds of output and are therefore used differently. Since regression takes continuous values as output, it is used to estimate or predict a response. Classification, on the other hand, takes class labels as output, so it is used to find the class membership of an object. Some classification methods, such as the perceptron and SVM, nevertheless adopt the concept of linear regression to classify objects, but in a different and far more complex manner.

1) Perceptron

The perceptron is one of the earliest algorithms for linear classification, invented by Frank Rosenblatt at the Cornell Aeronautical Laboratory in 1957 [4]. It is also considered a simple model of a neuron that has a set of external inputs x_i, which can be of any number, an internal input b, and one Boolean output value. The main idea of this method is to find suitable values for the weights w in the separating hyperplane, f(x), so that the training examples are correctly classified. The hyperplane is geometrically defined as

$$ f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2} $$

However, the separating hyperplane is only guaranteed to be found if the learning set is linearly separable; otherwise, the training process never stops. This major drawback makes the algorithm less applicable to many pattern recognition problems.

2) Support Vector Machine (SVM)

Like the perceptron, the support vector machine (SVM) is a hyperplane-based classifier, but it is backed by solid theoretical grounding [5]. The objective of this method is to find an optimal hyperplane, w.x + b = 0, that separates the two classes with the largest margin, meaning that this hyperplane has the largest minimum distance to the training set. The hyperplane can be formally defined as

$$ f(x) = \operatorname{sign}\left( w^{T} x + b \right) \tag{3} $$

where w is the weight vector and b is the bias, both of which can be computed from the training data points by solving a constrained quadratic optimization problem. The final decision function can then be derived and defined as

$$ f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i y_i \,(x_i \cdot x) + b \right) \tag{4} $$

This function depends only on the support vectors, the training points with non-zero α_i, which are often a small fraction of the original dataset.

III. XAC ALGORITHM

The main objective of this study is to use the linear function, f(x) = m.x + b, in classifying linearly and nonlinearly separable multiclass objects. In general, the proposed algorithm has two stages: the training phase and the classification phase.

A. Training Phase

For any classifier to identify the correct class membership of a new object, it must first be trained using the training set to create a predictive model. Fig. 1 shows the block diagram of the x-attributes classifier (XAC) training procedure.

Fig. 1. Block diagram of the proposed training procedure of x-attributes classifier.

1) Given the training dataset with n attributes and k tuples, find the linear relationship between the pairs of attributes in each class,

$$ \begin{aligned} f(x_1) &= \alpha_1 x_1 + \beta_1 \\ f(x_2) &= \alpha_2 x_2 + \beta_2 \\ &\;\;\vdots \\ f(x_{n-1}) &= \alpha_{n-1} x_{n-1} + \beta_{n-1} \end{aligned} \tag{5} $$

where f(x_i) is the linear function between attributes x_i and x_{i+1}, α_i is the slope, and β_i is the offset. Writing x_{i,j} for the value of attribute x_i in the j-th training tuple of the class, the slope α_i can be computed as:

$$ \alpha_i = \frac{k \sum_{j=1}^{k} x_{i,j}\, x_{i+1,j} - \sum_{j=1}^{k} x_{i,j} \sum_{j=1}^{k} x_{i+1,j}}{k \sum_{j=1}^{k} x_{i,j}^{2} - \left( \sum_{j=1}^{k} x_{i,j} \right)^{2}} \tag{6} $$

while the offset β_i is computed as:

$$ \beta_i = \frac{\sum_{j=1}^{k} x_{i+1,j} - \alpha_i \sum_{j=1}^{k} x_{i,j}}{k} \tag{7} $$

The resulting values of α_i and β_i for the paired attributes in each class are then used as internal inputs to calculate the output value during the classification stage.

2) Calculate the centroid C of the paired variables x_i and x_{i+1} for each class, denoted as C(x̄_i, x̄_{i+1}):

$$ C(\bar{x}_i, \bar{x}_{i+1}) = \left( \frac{1}{k} \sum_{j=1}^{k} x_{i,j},\; \frac{1}{k} \sum_{j=1}^{k} x_{i+1,j} \right) \tag{8} $$

Figs. 2, 3 and 4 illustrate the scatter plot of each pair of attributes, as well as its corresponding regression line, for each class in the Iris flower dataset.

Fig. 2. Scatter plot and linear relationships between sepal length x_1 and sepal width x_2 of the three classes.

Fig. 3. Scatter plot and linear relationships between sepal width x_2 and petal length x_3 of the three classes.

Fig. 4. Scatter plot and linear relationships between petal length x_3 and petal width x_4 of the three classes.

B. Classification Phase

After the training process, the resulting model can be used to classify a new object. Fig. 5 shows the classification process of the XAC algorithm.

Fig. 5. Block diagram of the classification process of XAC.

To determine the class membership of the input object:

1) Find the first point of the triangle for every attribute pair of each class by using the previously calculated linear functions f(x_1), f(x_2), f(x_3), ..., f(x_{n-1}) and the corresponding input values x_1, x_2, x_3, ..., x_{n-1}. The resulting xy-coordinates are of the form (external input x_i, internal input f(x_i)).

2) The previously computed centroid C of each attribute pair in the class serves as the second point of the corresponding triangle, in the form (internal input x̄_i, internal input x̄_{i+1}).

3) Pair the input values, e.g. (external input x_i, external input x_{i+1}), to obtain the third point of the corresponding triangle in the class.

4) Use the three points of each attribute pair in the class to calculate the area of the corresponding triangle. With (a_1, b_1) = (x_i, f(x_i)), (a_2, b_2) = (x̄_i, x̄_{i+1}), and (a_3, b_3) = (x_i, x_{i+1}),

$$ \mathrm{Area}_i = \frac{1}{2} \left| a_1 (b_2 - b_3) + a_2 (b_3 - b_1) + a_3 (b_1 - b_2) \right| \tag{9} $$

5) Calculate the distance of the input object from the feature vectors of every class by summing the areas of all its attribute pairs,

$$ \mathrm{dist} = \sum_{i=1}^{n-1} \mathrm{Area}_i \tag{10} $$

where n is the number of attributes.

6) The class with the least distance is declared the winner, that is, the class membership of the new object.

IV. EXPERIMENTS

A. Dataset

To measure and validate the performance of the proposed algorithm, five public datasets from the UCI Machine Learning Repository were considered: Iris Flower [6], Wheat Seed Kernel [7], Breast Tissue [8], Breast Cancer Wisconsin (Diagnostic) [9], and One Hundred Plant Species Leaves [10]. Table I shows the characteristics of each dataset used in the experiments.

TABLE I
DATASET CHARACTERISTICS

Dataset         Training Size      Testing Size       # of Classes   Dim
Iris Flower     10 per class       40 per class       3              4
Wheat Seed      14 per class       56 per class       3              7
Breast Tissue   4 for class 1      17 for class 1     4              4
                10 for class 2     39 for class 2
                3 for class 3      11 for class 3
                4 for class 4      18 for class 4
Breast Cancer   71 for class 1     286 for class 1    2              30
                42 for class 2     170 for class 2
Leaves-Shape    3 per class        13 per class       5              64

B. Evaluation

To evaluate the performance of the proposed method, 5-fold cross-validation was used in each experiment. The training and testing steps were performed five times by partitioning the dataset into five mutually exclusive subsets or folds.
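As an illustration of what is trained and tested in each fold, the training and classification phases of Section III can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author's implementation: the function names (`fit_pair`, `train`, `triangle_area`, `distance`, `classify`) and the toy data are hypothetical, the slope and offset use the standard least-squares formulas, the triangle area uses the standard coordinate formula, and the flat-line fallback for a constant attribute is an added assumption.

```python
from typing import Dict, List, Sequence, Tuple

# One entry per attribute pair (x_i, x_{i+1}) of a class:
# (slope, offset, centroid_x, centroid_y)
PairModel = Tuple[float, float, float, float]
Point = Tuple[float, float]


def fit_pair(xs: Sequence[float], ys: Sequence[float]) -> PairModel:
    """Least-squares line ys ~ slope * xs + offset, plus the pair centroid."""
    k = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = k * sxx - sx * sx
    # Assumption: if the attribute is constant, fall back to a flat line.
    slope = (k * sxy - sx * sy) / denom if denom else 0.0
    offset = (sy - slope * sx) / k
    return slope, offset, sx / k, sy / k


def train(rows: Sequence[Sequence[float]],
          labels: Sequence[str]) -> Dict[str, List[PairModel]]:
    """Training phase: fit every adjacent attribute pair separately per class."""
    by_class: Dict[str, List[Sequence[float]]] = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    model: Dict[str, List[PairModel]] = {}
    for label, members in by_class.items():
        n = len(members[0])
        model[label] = [
            fit_pair([m[i] for m in members], [m[i + 1] for m in members])
            for i in range(n - 1)
        ]
    return model


def triangle_area(p1: Point, p2: Point, p3: Point) -> float:
    """Area of the triangle p1, p2, p3 from its vertex coordinates."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0


def distance(pairs: List[PairModel], obj: Sequence[float]) -> float:
    """Sum of triangle areas over all attribute pairs of one class."""
    total = 0.0
    for i, (slope, offset, cx, cy) in enumerate(pairs):
        on_line = (obj[i], slope * obj[i] + offset)  # first point
        centroid = (cx, cy)                          # second point
        observed = (obj[i], obj[i + 1])              # third point
        total += triangle_area(on_line, centroid, observed)
    return total


def classify(model: Dict[str, List[PairModel]], obj: Sequence[float]) -> str:
    """Classification phase: the class with the least distance wins."""
    return min(model, key=lambda label: distance(model[label], obj))


# Hypothetical toy data: two classes, three attributes each.
toy_rows = [(1, 2, 3), (2, 3, 5), (3, 5, 6), (6, 1, 9), (7, 2, 8), (8, 1, 7)]
toy_labels = ["a", "a", "a", "b", "b", "b"]
toy_model = train(toy_rows, toy_labels)
print(classify(toy_model, (2, 3, 4)))  # a
print(classify(toy_model, (7, 1, 8)))  # b
```

Because the first and third points share the same x-coordinate, each area reduces to half the product of the regression residual and the horizontal offset from the centroid, so an object whose attribute value coincides with a class centroid contributes zero area for that pair regardless of its residual.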
Accuracy, precision, recall, and F-score were also used to measure the correctness, exactness, completeness, and retrieval performance, respectively, of the model produced by XAC in every experiment.

V. RESULTS AND DISCUSSION

The summary of the experimental results using the five datasets is reported in Table II.

TABLE II
EXPERIMENTS RESULTS SUMMARY

Dataset          Mean Accuracy   Mean Precision   Mean Recall   Mean F-Score
Iris Flower      94.50           95.05            94.50         94.77
Wheat Seed       89.7            89.7             89.3          89.48
Breast Tissue    75.9            75.3             67.88         7.3
Breast Cancer    88.55           89.9             85.9          87.87
Leaves (Shape)   83.69           86.47            83.63         85.03

As we can see, the XAC algorithm performs best with the Iris flower dataset compared with the other four datasets. This result supports the applicability of simple linear regression in classifying not only linearly separable classes but also nonlinearly separable ones. Next are the results of the experiments conducted with the breast cancer dataset, with a mean precision of 89.9. Note that the division of this dataset into training and testing sets is slightly imbalanced: 62% comes from the benign class and the rest from the malignant class. However, the results from the experiments using the wheat seed dataset are better in terms of mean accuracy, mean recall, and mean F-score than the results with the breast cancer dataset.

It is also notable that the algorithm was able to produce an acceptable result for the leaves dataset, with a precision of 86.47, despite the limited training set of three samples per class and the high dimensionality. Adding to the difficulty of the classification problem in this dataset is that many of the sub-species closely resemble other major species, and many sub-species look radically different from their own major species [11]. Furthermore, the results show the robustness of the approach when only the shape-based features are used during training and testing.

The results given by XAC on the breast tissue dataset are the lowest, especially in terms of completeness at 67.88. This is due to the imbalance in the number of training and testing samples in each class, where 48% of the total comes from one class only. In general, the proposed algorithm performs satisfactorily even with a small training set of 20% of the total size of each dataset.

VI. CONCLUSION

This paper has presented a new method for multiclass classification problems with linearly and nonlinearly separable classes. It is based on simple linear regression, a technique whose use in classification has been limited to binary problems with linearly separable classes. Empirical results from the experiments conducted using the five standard, public datasets taken from the UCI machine learning repository showed the satisfactory performance of the proposed algorithm. For future work, several avenues for improvement can still be considered, such as using nonlinear regression to handle paired attributes with a nonlinear relationship.

REFERENCES

[1] V. S. M. Tseng and C. Lee, "CBS: A new classification method by using sequential patterns," in Proc. 2005 SIAM International Data Mining Conference, CA, 2005, pp. 596-600.
[2] A. An, Classification methods. CA: Idea Group Inc, 2005, pp. -6.
[3] A. Arakelyan, L. Nersisyan, A. Gevorgyan, and A. Boyajyan, "Geometric approach for Gaussian-kernel bolstered error estimation for linear classification in computational biology," International Journal Information Theories and Applications, pp. 70-8-, 2014.
[4] F. Rosenblatt, "The perceptron - a perceiving and recognizing automaton," Cornell Aeronautical Laboratory, New York, Report 85-460-1, 1957.
[5] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[6] UCI Machine Learning Repository, "Iris data set," 1988. [Online]. Available: archive.ics.uci.edu/ml/datasets/iris
[7] UCI Machine Learning Repository, "Seeds data set," 2012. [Online]. Available: archive.ics.uci.edu/ml/datasets/seeds
[8] UCI Machine Learning Repository, "Breast tissue data set," 2010. [Online]. Available: archive.ics.uci.edu/ml/datasets/breast+tissue
[9] UCI Machine Learning Repository, "Breast cancer wisconsin (diagnostic)," 1995. [Online]. Available: archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic)
[10] UCI Machine Learning Repository, "One-hundred plant species leaves data set," 2012. [Online]. Available: archive.ics.uci.edu/ml/datasets/onehundred+plant+species+leaves+data+set
[11] C. Mallah, "Probabilistic Classification from a K-Nearest-Neighbor Classifier," Computational Research, pp. -9, 2013.
[12] Wikipedia, "Linear regression," 2015. [Online]. Available: http://en.wikipedia.org/wiki/linear_regression