Enhancing Performance of MLP/RBF Neural Classifiers via a Multivariate Data Distribution Scheme

Halis Altun, Gökhan Gelen
Nigde University, Electrical and Electronics Engineering Department, Nigde, Turkey
haltun@nigde.edu.tr  ggelen@nigde.edu.tr

Abstract

In this study, the performance of two neural classifiers, namely the Multi Layer Perceptron (MLP) and the Radial Basis Function (RBF) network, is compared for a multivariate classification problem. MLP and RBF are two of the most widely used neural network architectures in the literature for classification and have successfully been employed for a variety of applications. A nonlinear scaling scheme for multivariate data is proposed prior to the training process in order to improve the performance of both neural classifiers. The proposed scheme modifies the gaussian multivariate data and produces uniformly distributed multivariate data. It is shown that the proposed scaling scheme increases the performance of the neural classifiers.

1. Introduction

In recent years, a great deal of attention has been paid to utilizing neural classifiers in the field of pattern recognition. The reasons for this success essentially come from the fact that neural networks can be implemented as nonlinear discriminant functions and universal approximators. MLP and RBF are two of the most widely used neural network architectures in the literature for classification and regression problems [1-9]. The performance of MLP/RBF neural classifiers has been studied in the literature. In [6-7] a hybrid MLP/RBF neural classifier is proposed which produces far better classification and regression results when compared with advanced RBF or MLP architectures. A normalized RBF net is applied to a nonparametric classification problem in [5]; it is obtained by dividing each radial function in the RBF network by the sum of all radial functions. In the study of Reyneri and Sgarbi [4], a neuro-fuzzy unification algorithm which mixes MLP and RBF has been tested as a pattern classifier. It has been shown that the proposed algorithm reaches a performance which is comparable to or better than other traditional neural algorithms and can be trained much faster.
Raudys has studied the generalization of the neural classifier in terms of the relation between the performance and the complexity of the classifier [8]. It has been shown that neural learning is highly dependent on the presentation of the training data [10]. This dependency will be exploited in order to achieve higher performance in a classification problem. The aim of this study is to apply a nonlinear scaling to the gaussian distributed multivariate input data to produce more uniformly distributed training data. It will be shown that appropriate distribution characteristics result in improved classification performance.

2. Neural Network Classifiers

Multilayer perceptron (MLP) and Radial Basis Function (RBF) networks have become the most widely used network architectures in signal processing and pattern recognition. Both types of neural network structures are good at pattern classification problems. They are robust classifiers with the ability to generalize for imprecise input data. The general difference between RBF and MLP is that RBF learning is localist, responsive only to a limited section of the input space, whereas MLP is a more distributed approach. The output of an MLP is produced by linear combinations of the outputs of the hidden layer nodes, in which every neuron maps a weighted average of the inputs through a sigmoid function. In a one-hidden-layer RBF network, the hidden nodes map distances between input vectors and center vectors to outputs through a nonlinear kernel or radial function.

2.1 Multi Layer Perceptron (MLP) Neural Network

MLP networks consist of an input layer, one or more hidden layers and an output layer. Each layer has a number of processing units and each unit is fully interconnected with weighted connections to units in the subsequent layer. The MLP transforms n inputs to l outputs through some nonlinear functions. The output of the network is determined by the activation of the units in the output layer as follows:
x_o = f( Σ_h x_h w_ho )   (1)

where f(·) is the activation function, x_h is the activation of the h-th hidden layer node and w_ho is the interconnection weight between the h-th hidden layer node and the o-th output layer node. The most commonly used activation function is the sigmoid, given as follows:

x_o = 1 / ( 1 + exp( -Σ_h x_h w_ho ) )   (2)

The activation level of the nodes in the hidden layer is determined in a similar fashion. Based on the differences between the calculated output and the target value, an error is defined as follows:

E = (1/2) Σ_{s=1..N} Σ_{o=1..L} ( t_o^(s) - x_o^(s) )²   (3)

where N is the number of patterns in the data set and L is the number of output nodes. The aim is to reduce the error by adjusting the interconnections between the layers. The weights are adjusted using the gradient descent Backpropagation (BP) algorithm. The algorithm requires a training data set that consists of corresponding input and target pattern values t. During the training process, the MLP starts with a random set of initial weights and training continues until the set of w_ih and that of w_ho are optimized so that a predefined error threshold is met between x_o and t. According to the BP algorithm, each interconnection between the nodes is adjusted by the amount of the weight update value as follows:

Δw_ho = -η ∂E/∂w_ho = η δ_o x_h   (4)

Δw_ih = -η ∂E/∂w_ih = η δ_h x_i   (5)

where E is the error cost function given in (3), δ_o = x_o′ (t_o - x_o) and δ_h = x_h′ Σ_o δ_o w_ho, with x_o′ = x_o (1 - x_o) and x_h′ = x_h (1 - x_h) when a sigmoid activation function is used.

2.2 Radial Basis Function (RBF) Neural Network

The structure of the RBF neural network is similar to that of the MLP. It consists of layers of neurons. The main distinction is that the RBF network has a hidden layer which contains nodes called RBF units. Each one of the RBF units in the hidden layer has two parameters, the center and the width of a gaussian basis function, which determine the output of the unit. The gaussian basis function has the following form:

φ_j(x) = exp( -‖x - μ_j‖² / (2σ_j²) )   (6)

where x and μ_j are the input and the center of the unit, respectively, and σ_j is the spread of the gaussian basis function. The output of the i-th neuron in the output layer of the RBF network is determined by the linear combination of the outputs of the units in the hidden layer as follows:

y_i(x) = Σ_{j=1..M} w_ij φ_j(x) + b_i   (7)

where w_ij are the weights between the RBF units and the output nodes of the RBF neural network, and b_i is the bias term. The weights are optimized using the least mean square (LMS) algorithm once the centers of the RBF units are determined. The centers can be chosen randomly or using clustering algorithms.

3. Data Sets

Two classification problems are considered in the experiments. In the first experiment, 2-Dimensional and 3-Dimensional multivariate random values are produced using the MATLAB Statistical Toolbox. The first set of problems is easily separable, as seen in Figure 1 for the 2-D classification problem. The data have been chosen so that each of the independent variables x1, x2 (and x3 for 3-D) has a gaussian distribution, as seen in Figure 2a-b. In the second experiment, a more difficult classification problem is constructed for both the 2-D and 3-D cases by choosing the mean values of the independent variables close to each other. Figure 3 shows the (x1-x2) plane for the 2-D data and Figure 4a the distribution of the 3-D data in (x1-x2-x3) space. As seen from Figure 3, the data form highly inseparable classes. A similar observation can be made for the data distributions on the (x1, x2) and (x1, x3) planes, which are given in Figure 4b-c. For the 2-D classification problem, two sets of data with identical statistical characteristics are produced randomly. Each set has 200 samples and each class (CLASS I and CLASS II) in the data sets is represented equally, with 100 samples per class. The first set is labeled SET T (T stands for training) and is used to train the neural classifiers. The second set is kept for testing purposes and is labeled SET TS (TS stands for testing).
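The RBF forward pass of eqs. (6)-(7) can be sketched as follows; the centers, spreads, weights and bias below are illustrative placeholders, not values taken from the paper.

```python
import numpy as np

def rbf_forward(x, centers, sigmas, W, b):
    """Evaluate eqs. (6)-(7): gaussian units followed by a linear output layer."""
    # phi_j(x) = exp(-||x - mu_j||^2 / (2 sigma_j^2))   -- eq. (6)
    d2 = np.sum((centers - x) ** 2, axis=1)
    phi = np.exp(-d2 / (2.0 * sigmas ** 2))
    # y_i(x) = sum_j w_ij phi_j(x) + b_i                -- eq. (7)
    return W @ phi + b

# Illustrative 2-input, 3-unit, 2-output network
centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
sigmas = np.array([1.0, 1.0, 1.0])
W = np.array([[0.5, -0.2, 0.1],
              [-0.3, 0.4, 0.2]])
b = np.array([0.0, 0.1])

y = rbf_forward(np.array([1.0, 0.5]), centers, sigmas, W, b)
```

Note that a unit's response is 1 at its own center and decays with distance at a rate set by its spread σ_j, which is the localist behavior described above.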
Figure 1. Easily separable classification problem on the (x1-x2) plane

Figure 2a. Distribution of the easily separable classification problem on the x1 plane

Figure 2b. Distribution of the easily separable classification problem on the x2 plane

Figure 3. A highly inseparable classification problem on the (x1-x2) plane

Figure 4a. Distribution of the data in (x1, x2, x3) space

Figure 4b. Distribution of the data on the (x1, x2) plane

Figure 4c. Distribution of the data on the (x1, x3) plane

For the 3-D classification problems as well, data are produced as explained above. All data sets are subjected to the scaling procedure explained in Appendix II. The scaling is performed so that it modifies the distribution characteristics of the independent variables: x1, x2 for the 2-D problems, and x1, x2 and x3 for the 3-D problems. The resulting distribution has a uniform characteristic, as illustrated in Figure A1 in the Appendix section.
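The data sets described above can be reproduced in outline as follows. NumPy stands in here for the MATLAB Statistical Toolbox, and the class means, covariance and seed are illustrative assumptions (the actual class means used in the paper are listed in Table A1).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_two_class_set(mean1, mean2, n_per_class=100, cov=None):
    """Draw two equally represented gaussian classes, as in SET T / SET TS."""
    dim = len(mean1)
    cov = np.eye(dim) if cov is None else cov
    x1 = rng.multivariate_normal(mean1, cov, n_per_class)  # CLASS I
    x2 = rng.multivariate_normal(mean2, cov, n_per_class)  # CLASS II
    X = np.vstack([x1, x2])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

# Easily separable 2-D problem: well-separated class means (illustrative)
X_train, y_train = make_two_class_set([0.0, 2.0], [4.0, 4.5])  # SET T
X_test, y_test = make_two_class_set([0.0, 2.0], [4.0, 4.5])    # SET TS
```

Drawing the training and test sets independently from the same distributions gives the "identical statistical characteristics" the paper requires of SET T and SET TS.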
4. Experiments

Suitable MLP and RBF structures are constructed to solve the 2-D and 3-D classification problems. In the first experiment, the gaussian distributed data are used to train the MLP and RBF neural classifiers; in the second experiment, the neural classifiers are trained using the modified data. For comparison purposes, all parameters of the MLP neural classifier (i.e. initial weights, learning rate, network structure, etc.) are kept constant. In the same way, the RBF parameters are chosen identically so that the results can be evaluated fairly. The MLP and RBF neural classifiers are created as explained in Appendix I.

Table 1 shows the performance of the neural classifiers in the first experiment. As the classes are easily separable, MLP and RBF reach high scores for the training and test data. MLP correctly classifies 94 test data out of 100 for Class I and misclassifies only 12 test data for Class II. The total score for the 2-D classification problem reaches 94% for training data and 91% for test data. For the 3-D problem the scores are 86% and 85%, respectively. The performance of RBF is found slightly inferior to MLP for the 2-D classification problem, at 93% and 89%, respectively. However, a great degradation is found in the RBF solution of the 3-D classification problem: the recorded scores are 97% for training data and only 62% for test data.

Table 1. Simulation results of the neural classifiers in the first phase of the experiments (easily separable classification problem)

Problem              Net   P.   Class I   Class II   Train   Test
2-D classification   MLP   C       94        88       188     182
                           F        6        12        12      18
                           %       94        88        94      91
                     RBF   C       90        88       187     178
                           F       10        12        13      22
                           %       90        88        93      89
3-D classification   MLP   C       88        83       172     171
                           F       12        17        28      29
                           %       88        83        86      85
                     RBF   C       64        59       195     123
                           F       36        41         5      77
                           %       64        59        97      62

An experiment with the redistributed data is also carried out to delineate the effect of the data distribution on the performance of the neural classifiers. For a fair comparison, all parameters and structures of the neural classifiers are kept intact. Table 2 summarizes the findings. As seen, a great increase in the performance of MLP is obtained for the 2-D problem: MLP classifies all test patterns correctly. For the 3-D problem, the MLP performance remains nearly the same, at 84%.
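The per-class correct (C), false (F) and percentage rows of Tables 1-4 can be tabulated from classifier predictions as sketched below; the label arrays are hypothetical toy data, not the paper's results.

```python
import numpy as np

def score_table(y_true, y_pred, n_classes=2):
    """Per-class and total correct/false counts and percentages, as in Tables 1-4."""
    rows = {}
    for c in range(n_classes):
        mask = y_true == c
        correct = int(np.sum(y_pred[mask] == c))
        rows[c] = {"C": correct,
                   "F": int(mask.sum()) - correct,
                   "%": 100.0 * correct / int(mask.sum())}
    total_correct = int(np.sum(y_true == y_pred))
    rows["total"] = {"C": total_correct,
                     "F": len(y_true) - total_correct,
                     "%": 100.0 * total_correct / len(y_true)}
    return rows

# Toy example: 4 patterns per class, one misclassification in each class
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 0, 1])
table = score_table(y_true, y_pred)
```

Running this once on the training set and once on the test set yields the Train and Test total columns of the tables.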
On the other hand, the performance of RBF improves slightly, from 89% to 90%, on test patterns in solving the 2-D classification problem. As for the 3-D classification problem, a great increase is experienced, from 62% to 74%.

Table 2. Simulation results of the neural classifiers in the first phase of the experiments with modified input data (easily separable classification problem)

Problem              Net   P.   Class I   Class II   Train   Test
2-D classification   MLP   C      100       100       200     200
                           F        0         0         0       0
                           %      100       100       100     100
                     RBF   C       88        93       184     181
                           F       12         7        16      19
                           %       88        93        92      90
3-D classification   MLP   C       88        81       170     169
                           F       12        19        30      31
                           %       88        81        85      84
                     RBF   C       76        72       168     148
                           F       24        28        32      52
                           %       76        72        84      74

In the second phase of the experiments, the MLP and RBF neural classifiers are employed with the same structures and initial conditions to solve the more complex classification problems. The results are given in Table 3 and Table 4 for the original data and the modified data sets, respectively. As expected, the performance of both classifiers is reduced greatly due to the complexity of the classification problem at hand.

Table 3. Simulation results of the neural classifiers in the second phase of the experiments (complex classification problem)

Problem              Net   P.   Class I   Class II   Train   Test
2-D classification   MLP   C       79        75       135     134
                           F       21        25        65      66
                           %       79        75        68      67
                     RBF   C       67        52       138     119
                           F       33        48        62      81
                           %       67        52        69      59
3-D classification   MLP   C       35        59       123      94
                           F       65        41        77     106
                           %       35        59        61      47
                     RBF   C       46        53       138      99
                           F       54        47        62     101
                           %       46        53        69      49

When the modified data set is used to train the neural classifiers, an enhancement in the classification performance of MLP and RBF is witnessed. An increase of 25 percentage points, from 47% to 72%, is encountered in
the performance of the MLP neural classifier for the 3-D problem. The performance of RBF is also improved, from 49% to 53%, in solving the 3-D problem. As for the 2-D classification problem, RBF shows a higher accuracy, with an increase from 59% to 63%, while a slight decrease takes place in the performance of MLP, from 67% to 65%.

Table 4. Simulation results of the neural classifiers in the second phase of the experiments with modified input data (complex classification problem)

Problem              Net   P.   Class I   Class II   Train   Test
2-D classification   MLP   C       78        52       135     130
                           F       22        48        65      70
                           %       78        52        67      65
                     RBF   C       70        76       141     126
                           F       30        24        59      74
                           %       70        76        70      63
3-D classification   MLP   C       68        76       149     144
                           F       32        24        51      56
                           %       68        76        74      72
                     RBF   C       49        55       129     104
                           F       51        45        71      96
                           %       49        55        64      52

5. Conclusion

A nonlinear scaling has been proposed to improve the performance of MLP and RBF neural classifiers. 2-D and 3-D classification problems have been solved, and the results have shown the validity of the proposed nonlinear scaling scheme. Better classification performance is obtained in most of the cases. As an extreme case, an increase from 47% to 72% is obtained in the performance of the MLP neural classifier for the 3-D problem. The proposed scaling takes only the mean value of the data into account to produce uniformly distributed data. This is believed to result in a negligible inconsistency in the improvement of the performance. Hence the nonlinear scaling scheme may further be enhanced to augment the uniform distribution.

6. Appendix I: Network Structures

The neural networks are constructed and trained in the MATLAB 6.5 environment using the Neural Network Toolbox. The MLP neural network is created using the newff built-in function and has 2 nodes (for the 2-D classification problem) or 3 nodes (for the 3-D classification problem) in the input layer, 10 nodes in the hidden layer, and 2 nodes, corresponding to the classes, in the output layer. Logsig (the sigmoid function) is used as the activation function for all neurons. Training is carried out by the backpropagation algorithm with momentum, using the built-in function traingdm.
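A minimal NumPy analog of the newff/traingdm setup described in Appendix I is sketched below: a 2-10-2 sigmoid network trained by gradient descent with momentum, following eqs. (1)-(5). The learning rate, momentum value, toy data and the omission of biases are assumptions for brevity, not settings taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyMLP:
    """2-10-2 sigmoid MLP trained by gradient descent with momentum."""
    def __init__(self, n_in=2, n_hidden=10, n_out=2):
        self.W1 = rng.normal(0.0, 0.5, (n_hidden, n_in))   # w_ih
        self.W2 = rng.normal(0.0, 0.5, (n_out, n_hidden))  # w_ho
        self.V1 = np.zeros_like(self.W1)  # momentum accumulators
        self.V2 = np.zeros_like(self.W2)

    def forward(self, x):
        self.xh = sigmoid(self.W1 @ x)        # hidden activations, eq. (2)
        self.xo = sigmoid(self.W2 @ self.xh)  # output activations, eq. (1)
        return self.xo

    def train_step(self, x, t, eta=0.3, mu=0.8):
        xo = self.forward(x)
        delta_o = xo * (1.0 - xo) * (t - xo)                          # eq. (4)
        delta_h = self.xh * (1.0 - self.xh) * (self.W2.T @ delta_o)   # eq. (5)
        self.V2 = mu * self.V2 + eta * np.outer(delta_o, self.xh)
        self.V1 = mu * self.V1 + eta * np.outer(delta_h, x)
        self.W2 += self.V2
        self.W1 += self.V1

# Tiny two-class toy set with one-hot targets (purely illustrative)
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
T = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

net = TinyMLP()
err_before = sum(np.sum((t - net.forward(x)) ** 2) for x, t in zip(X, T))
for _ in range(2000):
    for x, t in zip(X, T):
        net.train_step(x, t)
err_after = sum(np.sum((t - net.forward(x)) ** 2) for x, t in zip(X, T))
```

The sum-of-squares error here is the cost function of eq. (3), accumulated over the training patterns before and after training.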
The RBF neural classifier is created using the MATLAB built-in function newrb, with 2 or 3 nodes in the input layer, 15 RBF units in the hidden layer and 2 nodes in the output layer.

Appendix II: Modified Distribution of the Data Set: Nonlinear Scaling

The distribution of the independent variables x1, x2 and x3 is selected to be gaussian, as seen from the figures. A nonlinear scaling which modifies the gaussian distribution to a uniform one is performed to seek an enhanced performance of the neural classifiers. To this end, the following function, which maps X_i to X_o, is proposed, with m being the mean of the data, given in Table A1 for each set of data produced:

X_o = 2 / ( 1 + e^(-2(X_i - m)) ) - 1   (A.1)

where x_i is the data being modified, x_o is the modified data, and m is the mean value of the data. As an example, Figure A1 illustrates the modified data for the 2-D classification problem given in Figure 1.

Table A1. The mean value m of the independent variables x1, x2 and x3 (both for training and test data) for both the easily separable (ES) and more complex (MC) problems

                          Training Data           Test Data
Problem         Class I     Class II     Class I     Class II
2-D ES   x1     -0.304       4.220        0.152       4.19
         x2      1.753       4.631        2.230       5.27
2-D MC   x1      1.157       1.919        1.144       2.03
         x2      2.216       3.271        2.045       3.26
3-D ES   x1      0.935       1.069        0.971       1.12
         x2      2.008       2.129        1.935       1.97
         x3      3.156       5.030        2.847       5.17
3-D MC   x1      2.165       2.061        2.055       1.97
         x2      1.945       2.082        2.036       2.05
         x3      1.988       1.920        1.9475      1.937
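The mapping in (A.1) is algebraically equal to tanh(X_i - m), so it squashes a mean-centered gaussian variable into (-1, 1) and flattens its distribution. A sketch of the scaling applied to a gaussian sample (sample size, mean and spread are illustrative):

```python
import numpy as np

def nonlinear_scale(x):
    """Eq. (A.1): map gaussian-distributed data toward a more uniform
    distribution on (-1, 1), using only the sample mean m."""
    m = x.mean()
    return 2.0 / (1.0 + np.exp(-2.0 * (x - m))) - 1.0  # equals tanh(x - m)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=1000)  # a gaussian variable, e.g. x1
x_scaled = nonlinear_scale(x)
```

Because only the mean is used (and not the spread), the result is only approximately uniform, which is the "negligible inconsistency" acknowledged in the Conclusion.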
Figure A1. The scaled version of the data shown in Figure 1.

7. References

[1] W. Loh, T. Lim, "A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty Three Old and New Classification Algorithms", Machine Learning, 40(3), 2000, pp. 203-238.
[2] A. Ribert, A. Ennaji, Y. Lecourtier, "Generalisation Capabilities of a Distributed Neural Classifier", ESANN'1999 - European Symposium on Artificial Neural Networks, Bruges (Belgium), 1999, pp. 269-268.
[3] Kenneth J. McGarry, Stefan Wermter, and John MacIntyre, "Knowledge Extraction from Radial Basis Function Networks and Multi Layer Perceptrons", International Journal of Computational Intelligence and Applications, 1(3), 2001, pp. 369-382.
[4] L.M. Reyneri, M. Sgarbi, "Performance of Weighted Radial Basis Function Classifiers", in Proc. of ESANN'97, European Symposium on Artificial Neural Networks, Bruges (B), April 1997, pp. 19-25.
[5] B. Kegl, A. Krzyzak and H. Niemann, "Radial basis function networks in nonparametric classification and function learning", ICPR'98: International Conference on Pattern Recognition, Brisbane, Australia, August 1998, pp. 565-570.
[6] S. Cohen and N. Intrator, "Automatic model selection in a hybrid perceptron/radial network", Information Fusion: Special issue on multiple experts, 3(4), 2002, pp. 259-266.
[7] Shimon Cohen, Nathan Intrator, "A Study of Ensemble of Hybrid Networks with Strong Regularization", Multiple Classifier Systems, 2003, pp. 227-235.
[8] Sarunas Raudys, "Generalization of classifiers", The 2nd International Conference on Neural Networks and Artificial Intelligence, ICNNAI'01, Belarus, 2001.
[9] L. Ramirez, W. Pedrycz, N. Pizzi, "Severe storm cell classification using support vector machines and radial basis function approaches", Proc. IEEE Canadian Conf. Electrical and Computer Engineering, Toronto, Canada, May 13-16, 2001, pp. 87-92.
[10] H. Altun and K. M. Curtis, "Exploiting the Statistical Characteristics of the Speech Signals for Improved Neural Learning in Neural Networks", IEEE Workshop on Neural Networks for Signal Processing, NNSP'98, Cambridge, UK, August 1998, pp. 547-556.