Systematic Selection of Parameters in the development of Feedforward Artificial Neural Network Models through Conventional and Intelligent Algorithms

THALES Project No. 65/3

Research Team: G.-C. Vosniakos, T. Giannakakis, A. Krimpenis, P. G. Benardos
National Technical University of Athens, School of Mechanical Engineering, Manufacturing Technology Division

1. Introduction

Feedforward artificial neural networks (ANNs) are currently used in a variety of applications (Figure 1) with great success. The reason behind this widespread adoption can be found in two very important abilities that they exhibit. ANNs can be trained to learn through examples (memorization ability) and can respond to cases that are similar but not identical to the ones that they have been trained with (generalization ability).

Figure 1. Applications of ANNs: image processing, control, robotics, pattern recognition, signal processing, finance, simulation, medicine.

The building block of a feedforward ANN is called a neuron; its mathematical model is shown in Figure 2a. Each neuron receives input in the form of weighted signals (where p is the input signal matrix and W is the weight coefficient matrix), sums them along with a bias term and applies a function f, called the activation function (usually non-linear), to determine its own output signal, denoted by y. A typical feedforward ANN is composed of several such neurons, which are arranged in layers as depicted in Figure 2b. The mathematical notation used is also given.
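The neuron model described above can be sketched in a few lines. This is a minimal illustration, assuming the hyperbolic tangent as the activation function f (the text only states that f is usually non-linear); the function name and the numeric weights, inputs and bias are hypothetical and chosen only for the example.

```python
import math


def neuron_output(weights, inputs, bias, activation=math.tanh):
    """Return y = f(w . p + b) for a single neuron."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    # Weighted sum of the input signals plus the bias term.
    net = sum(w * p for w, p in zip(weights, inputs)) + bias
    # The activation function turns the net input into the output signal.
    return activation(net)


# Hypothetical values, chosen only for illustration.
w = [0.5, -0.25, 0.1, 0.0]
p = [1.0, 2.0, 3.0, 4.0]
y = neuron_output(w, p, bias=0.2)
```

Passing a different callable as `activation` (for example a linear function for an output neuron) changes the neuron type without touching the summation step.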

Figure 2. Mathematical model of (a) a typical neuron, $y = f(wp + b)$, and (b) a typical feedforward ANN.

$x_i$: output signal of the i-th neuron in the input layer
$k_j$: output signal of the j-th neuron in the hidden layer
$y$: ANN's output signal
$IW_{j,i}$: weight coefficient between the i-th input neuron and the j-th hidden neuron
$b_j$: bias of the j-th hidden neuron
$LW_{1,j}$: weight coefficient between the j-th hidden neuron and the output neuron
$b_y$: bias of the output neuron
tansig(x): hyperbolic tangent function
$i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$

The development of a feedforward ANN model involves several stages, from gathering the necessary data to creating a satisfactory model (Figure 3). The most important aspect of this process is the selection of certain parameters that are crucial for the model's performance, notably the number of hidden layers and neurons. Since there is no theoretical method to determine the appropriate architecture, a repetitive trial-and-error procedure is involved that is both time-consuming and uncertain in its results. The outcome is mainly based on the experience of the researcher regarding ANNs and the studied phenomenon.

Figure 3. Development of an ANN model: data collection, data preprocessing, selection of training algorithm (TA), selection of TA parameters, selection of ANN parameters, ANN training, performance checking, satisfactory ANN model.

The present work attempts to deal with the above problem by developing a systematic way to select such parameters. The emphasis is placed on two phases of ANN model building, namely the initialization of the network's weights and the determination of the most suitable architecture.

2. Initialization of weight coefficients

Every training procedure starts by initializing the weight coefficients, i.e. by assigning values to them. The training's goal is to find the weight values that minimize the network's error function. Since the initial values of the weights define the starting point of the training algorithm on the error function, they affect both the training speed and the achieved training error. The effect depends on whether this point is close to the global minimum or located in an area with many local minima.

The most common methods used to initialize the weights are either to randomly select values from a predefined value field (usually centered around zero) or to use a statistical distribution (usually the Gaussian or uniform distribution). Pre-processing of the training data (normalisation, scaling etc.) is also used in conjunction with these techniques.

2.1 The approach

The approach adopted is a combination of analytical and random calculation of weight values. Referring to Figure 2b, the network's output is calculated as follows:

$y = LW_{1,1} k_1 + LW_{1,2} k_2 + \dots + LW_{1,m} k_m + b_y$    (1)

The output of each hidden neuron is in turn given by the following equation:

$k_j = \mathrm{tansig}(IW_{j,1} x_1 + IW_{j,2} x_2 + \dots + IW_{j,n} x_n + b_j)$    (2)

If a multiple linear regression is performed on the training data, then the resulting analytical model would be:

$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n = \sum_{i=1}^{n} a_i x_i + a_0$    (3)

Comparing equations 2 and 3, it is concluded that the argument of the hyperbolic tangent function can be replaced by the analytical model of the multiple linear regression. This is accomplished if:

$b_j = a_0$    (4)

$IW_{j,i} = a_i$    (5)

The remaining weights between the hidden and the output layer and the respective bias are initialized randomly so that the starting point of the training algorithm is slightly different each time the training is repeated.
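The initialization scheme of equations (1) to (5) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the regression is solved with the normal equations and Gauss-Jordan elimination, the random output-layer values are drawn uniformly from a hypothetical interval [-spread, spread], and all function and parameter names (`fit_mlr`, `init_weights`, `forward`, `spread`) are invented for the example.

```python
import math
import random


def fit_mlr(X, y):
    """Least-squares fit of y = a0 + a1*x1 + ... + an*xn (normal equations)."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented system (A^T A | A^T y).
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("regression matrix is singular")
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    coeffs = [M[i][k] / M[i][i] for i in range(k)]
    return coeffs[0], coeffs[1:]  # a0, [a1, ..., an]


def init_weights(X, y, m, rng=None, spread=0.5):
    """Hidden layer from the regression, output layer at random."""
    rng = rng or random.Random()
    a0, a = fit_mlr(X, y)
    IW = [list(a) for _ in range(m)]  # eq. (5): IW[j][i] = a_i
    b = [a0] * m                      # eq. (4): b_j = a_0
    # Assumption: uniform draws in [-spread, spread] for LW and b_y.
    LW = [rng.uniform(-spread, spread) for _ in range(m)]
    b_y = rng.uniform(-spread, spread)
    return IW, b, LW, b_y


def forward(x, IW, b, LW, b_y):
    """Network output following eqs. (1) and (2), with tansig = tanh."""
    k = [math.tanh(sum(w * xi for w, xi in zip(IW[j], x)) + b[j])
         for j in range(len(b))]
    return sum(lw * kj for lw, kj in zip(LW, k)) + b_y
```

As equations (4) and (5) are written, every hidden neuron starts from the same regression coefficients, so only the randomly drawn output-layer values differ between repeated trainings.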

2.2 Results

The approach was tested by comparing its results to the Nguyen-Widrow method. Data originating from a bar turning process were used to develop an ANN model, and the number of required epochs and the achieved training error (mean squared error) were examined. Three different architectures were investigated, namely 5x10x1, 5x6x1 and 5x3x1, and the training results are given in Table 1.

No.  5x10x1         5x6x1          5x3x1          Initialization type
1    466,43E-25     2325,5E-29     5000,26E-06    N-W
2    734,54E-28     344 2,07E-28   5000,60E-06    N-W
3    670,78E-29     0000 2,45E-07  675 2,69E-06   N-W
4    68 2,24E-25    9377,95E-26    5000 6,90E-07  N-W
5    753 8,30E-26   397 3,79E-24   5000,06E-06    N-W
6    765 7,63E-28   0000 3,9E-08   5000 8,38E-07  N-W
7    983 5,90E-24   2565,37E-26    62 8,0E-07     N-W
8    255,E-27       262,92E-28     5000 7,69E-07  N-W
9    256,27E-3      70 6,85E-28    5000 9,3E-07   N-W
10   39 4,74E-24    0000 3,23E-07  5000,60E-06    N-W
     ,E-24          6,08E-08       ,22E-06

No.  5x10x1         5x6x1          5x3x1          Initialization type
1    95 5,6E-3      235 4,97E-27   6393,03E-06    MLR
2    686,72E-3      473,25E-30     6576,03E-06    MLR
3    750,67E-28     2294 2,8E-29   650,03E-06     MLR
4    328 8,47E-26   666 3,74E-30   0000 6,94E-07  MLR
5    004 5,54E-3    957 3,73E-29   0000 8,28E-07  MLR
6    985,86E-26     247,09E-30     6752,03E-06    MLR
7    032 2,2E-26    926 6,32E-28   6990,03E-06    MLR
8    899 2,52E-28   996 7,54E-3    0000 8,28E-07  MLR
9    903,04E-28     926 2,50E-30   0000 8,28E-07  MLR
10   860 3,7E-3     09 8,82E-29    7030,03E-06    MLR
     ,25E-26        5,76E-28       9,34E-07

Table 1. Initialization method results (epochs and mean squared error per architecture; N-W: Nguyen-Widrow, MLR: multiple linear regression)

It is observed that there is improvement in both of the examined parameters, and that the improvement grows with the complexity of the architecture.

3. Determination of ANN's architecture

An ANN's architecture is directly related to the complexity of the solution space that it represents. A network that is fairly simple might not be able to learn the interactions underlying the training data, while a very complex network will memorize them to such an extent that it will no longer be able to respond to unknown data. Obtaining the right architecture is the most crucial stage in the development of an ANN model and, given that there is no theory as to what this architecture is or how to obtain it, it is also one of the most difficult stages to perform.

Current practice involves a trial-and-error approach, but many research efforts try to address this problem using evolutionary algorithms as well as constructive/deconstructive analytical techniques.

3.1 The approach

If the described problem is viewed as a problem of multi-parametric optimization, then a genetic algorithm can be used. The aim is to find the appropriate architecture, i.e. the number of hidden layers and the number of neurons in each one of them, which results in an ANN model with good performance. In order to satisfy this, criteria that quantify the performance of the model are developed and are consequently integrated in the objective function to be minimized. These criteria are:

i. Training error criterion: $E_{training}$, the training error, computed from the target value and the ANN's response for each training data vector, over all training data.

ii. Generalization error criterion: $E_{generalization}$, the generalization error, computed from the target value and the ANN's predicted value for each testing data vector, over all testing data.

iii. Feedforward architecture criterion:

FFAC = 1,                               1 hidden layer and m ≤ 10
FFAC = 1 + (m-10)*0.1,                  1 hidden layer and m > 10
FFAC = 2,                               2 hidden layers and m ≤ 10 and n ≤ 10
FFAC = 2 + (m-10)*0.1,                  2 hidden layers and m > 10 and n ≤ 10
FFAC = 2 + (m-10)*0.1 + (n-10)*0.2,     2 hidden layers and m > 10 and n > 10

where m and n: number of neurons in the 1st and 2nd hidden layer respectively.

iv. Training speed criterion: trspeed, which takes one of two constant values depending on whether the number of training epochs is below or above a threshold.

v. Solution space consistency criterion:

solspc = 1 + x*0.33 + y

where x: number of test cases for which the absolute value of the relative error is in the interval [5, 25] and y: number of test cases for which the absolute value of the relative error is in the interval (25, ∞).
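Two of the criteria above are simple enough to sketch directly. The sketch below rests on assumptions that should be checked against the original publication: a threshold of 10 neurons per hidden layer, increments of 0.1 and 0.2 per extra neuron, solspc = 1 + 0.33x + y, and relative errors expressed in percent. The function names are hypothetical, and the way the criteria are combined into the objective function is not specified in the text, so it is not shown.

```python
def ffac(hidden_layers, m, n=0):
    """Feedforward architecture criterion (assumed thresholds and increments).

    m and n are the neuron counts of the 1st and 2nd hidden layer.
    """
    if hidden_layers == 1:
        return 1.0 if m <= 10 else 1.0 + (m - 10) * 0.1
    if hidden_layers == 2:
        if m <= 10 and n <= 10:
            return 2.0
        if m > 10 and n <= 10:
            return 2.0 + (m - 10) * 0.1
        if m > 10 and n > 10:
            return 2.0 + (m - 10) * 0.1 + (n - 10) * 0.2
        # The text lists no case for m <= 10 with n > 10.
        raise ValueError("case m <= 10 and n > 10 is not defined")
    raise ValueError("only 1 or 2 hidden layers are covered")


def solspc(relative_errors):
    """Solution space consistency criterion (assumed form 1 + 0.33*x + y).

    relative_errors: relative error of each test case, assumed in percent.
    """
    abs_errors = [abs(e) for e in relative_errors]
    x = sum(1 for e in abs_errors if 5 <= e <= 25)  # moderate misses
    y = sum(1 for e in abs_errors if e > 25)        # large misses
    return 1.0 + x * 0.33 + y
```

Under these assumptions, a larger network or a less consistent set of predictions yields a larger criterion value, which is the behaviour a minimized objective function needs in order to favour small, well-generalizing architectures.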

3.2 Results

Using the same data as for the initialization method testing, the developed method was compared to the results of an experienced researcher who followed the trial-and-error approach. The model achieved by the experienced human analyst was 5x3x1. The best objective function value versus the number of generations is shown in Figure 4 and the number of neurons in each layer is given in Table 2.

Figure 4. Best objective function value history (log10(f(x)) versus generation; Best = 0.20897)

1st hidden layer / 2nd hidden layer: 0 3 3 5 2 4 3 4 0 0 6 3 2 0 7 4 9 0 0 9 0

Table 2. Number of neurons in each hidden layer

As can be seen from the above table, the developed methodology performs as well as a human expert, but at the same time it offers advantages such as no required experience, shorter development time and systematic selection of the network's parameters.

4. Conclusions

By using the described methodologies, the development of an ANN model is facilitated and, more importantly, it is carried out following a systematic procedure, rather than a repetitive trial-and-error procedure with uncertain results. In both fields (weight initialization and architecture determination), the results show an improvement over current practices. Furthermore, in the latter case, the focus is primarily on the generalization performance of the ANN and on network size, which guarantee accurate and consistent model predictions.

Publications

1. Initialisation improvement in engineering feedforward ANN models, 13th European Symposium on Artificial Neural Networks, 27-29 April 2005, Bruges, Belgium.
2. Optimising feedforward artificial neural network architecture, Engineering Applications of Artificial Intelligence, submitted for publication.