Genetically Engineered Adaptive Resonance Theory (ART) Neural Network Architectures

University of Central Florida Electronic Theses and Dissertations, Doctoral Dissertation (Open Access): Genetically Engineered Adaptive Resonance Theory (ART) Neural Network Architectures, 2006, Ahmad Al-Daraiseh

University of Central Florida Libraries http://library.ucf.edu

Find similar works at: University of Central Florida Libraries. Part of the Computer Engineering Commons.

STARS Citation: Al-Daraiseh, Ahmad, "Genetically Engineered Adaptive Resonance Theory (ART) Neural Network Architectures" (2006). Electronic Theses and Dissertations.

This Doctoral Dissertation (Open Access) is brought to you for free and open access by STARS. It has been accepted for inclusion in Electronic Theses and Dissertations by an authorized administrator of STARS. For more information, please contact lee.dotson@ucf.edu.

GENETICALLY ENGINEERED ADAPTIVE RESONANCE THEORY (ART) NEURAL NETWORK ARCHITECTURES

by

AHMAD A. AL-DARAISEH
B.S. Yarmouk University, 1998
M.S. University of Central Florida, 2001

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the School of Electrical Engineering and Computer Science in the College of Engineering and Computer Science at the University of Central Florida, Orlando, Florida

Spring Term 2006

Major Professor: Michael Georgiopoulos

© 2006 Ahmad A. Al-Daraiseh

ABSTRACT

Fuzzy ARTMAP (FAM) is currently considered to be one of the premier neural network architectures for solving classification problems. One of the limitations of Fuzzy ARTMAP that has been extensively reported in the literature is the category proliferation problem. That is, Fuzzy ARTMAP has a tendency to increase its network size as it is confronted with more and more data, especially if the data are of a noisy and/or overlapping nature. To remedy this problem, a number of researchers have designed modifications to the training phase of Fuzzy ARTMAP that had the beneficial effect of reducing this phenomenon. In this thesis we propose a new approach to handle the category proliferation problem in Fuzzy ARTMAP by evolving trained FAM architectures. We refer to the resulting FAM architectures as GFAM. We demonstrate through extensive experimentation that an evolved FAM (GFAM) exhibits good (sometimes optimal) generalization and small size (sometimes optimal size), and requires reasonable computational effort to produce an optimal or sub-optimal network. Furthermore, comparisons of GFAM with other approaches proposed in the literature that address the FAM category proliferation problem illustrate that GFAM has a number of advantages (i.e., it produces smaller or equal size architectures, with better or equally good generalization, at reduced computational complexity). In addition, in this dissertation we have extended the approach used with Fuzzy ARTMAP to other ART architectures, such as Ellipsoidal ARTMAP (EAM) and Gaussian ARTMAP (GAM), that also suffer from the ART category proliferation problem. Thus, we have designed and experimented with genetically engineered EAM and GAM architectures, named GEAM and GGAM. Comparisons of GEAM and GGAM with other ART architectures introduced in the ART literature to address the category proliferation problem illustrate advantages similar to those observed for GFAM (i.e., GEAM and GGAM produce smaller size ART architectures with better generalization at reduced computational complexity). Moreover, to optimally cover the input space of a problem, we proposed a genetically engineered ART architecture that combines the category structures of two different ART networks, FAM and EAM. We named this architecture UART (Universal ART). We analyzed the order of search in UART, that is, the order according to which a FAM category or an EAM category is accessed in UART. This analysis allowed us to better understand UART's functionality. Experiments were also conducted to compare UART with other ART architectures, in a similar fashion as GFAM and GEAM were compared. Similar conclusions were drawn from this comparison as from the comparison of GFAM and GEAM with other ART architectures. Finally, we analyzed the computational complexity of the genetically engineered ART architectures and compared it with the computational complexity of other ART architectures introduced into the literature. This analytical comparison verified our claim that the genetically engineered ART architectures produce better generalization and smaller size ART structures, at reduced computational complexity, compared to other ART approaches. In review, a methodology was introduced for combining the answers (categories) of ART architectures using genetic algorithms. This methodology was successfully applied to FAM, to EAM, and to combined FAM and EAM ART architectures, resulting in ART neural networks that outperformed other ART architectures previously introduced into the literature, and that quite often attained optimal classification results at reduced computational complexity.

I dedicate this work to the greatest woman on earth, my Mother; to the greatest Father; to my soul mate, my wife; to my daughter NOOR, the light of my life; to my son MO'MEN, the faith of it; and finally to my yet unborn baby (Abdallah if male, Tasneem if female). I love you all.

ACKNOWLEDGMENTS

I would like to express my gratitude to my advisor, Dr. Michael Georgiopoulos, who gave me his time and advice to help me finish this dissertation. Without his encouragement, support, and guidance, this dissertation would not have been published. I would like to thank my committee members, Dr. Ronald F. Demara, Dr. Kent Williams, Dr. Sheau-Dong Lang, and Dr. Takis C. Kasparis, for their support and willingness to serve on my defense examination. I would like to thank Dr. Gerd Brummel, my manager at Siemens PG. Dr. Brummel was one of those who always encouraged me and stood by my side until I finished this work. Thank you very much, Dr. Brummel. I would like to thank my parents and wife for their help and support, and my kids Noor and Mo'men for their disturbance. I also would like to thank my friends and brothers who helped me and prayed for me. Special thanks to UCF and to those who are working hard to improve it and make it better; please don't raise your tuition fees anymore.

TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ACRONYMS/ABBREVIATIONS
1. INTRODUCTION
  ART, Features and Limitations
  Genetic Algorithms and Neural Networks Combination
  Using GA with MLP NN
  Using GAs with Other NN Models (other than MLP-NNs)
  Using GAs with ART NNs
  Motivation
  Research Overview
  Using GAs to Evolve ART Architectures
  Universal ART (UART)
  Experiments and Comparisons
  Analysis
  User Interface Development
2. BACKGROUND
  Fuzzy ARTMAP (FAM)
  FAM Category Geometrical Representation
  FAM Operations and Parameters
  Ellipsoidal ARTMAP (EAM)
  EAM Category Geometrical Representation
  EAM Operations and Parameters
  Gaussian ARTMAP (GAM)
  GAM Operations and Parameters
  Genetic Algorithms
  Chromosome Representation
  Genetic Operators
  Selection
3. GENETIC FUZZY ARTMAP (GFAM)
  Justification of the Evolutionary Choices for GFAM
  Justification of the Fitness Function Choice for GFAM
  Justification of the Genetic Operators Choices for GFAM
  Experiments with GFAM
  Databases
  Experimental Procedure
  Experimental Results
  GFAM Performance
  Performance Comparisons of GFAM and other ART Networks
  Performance Comparisons of GFAM and Other Neural Networks
  GFAM Summary and Conclusions
4. GEAM AND GGAM
  Genetic Ellipsoidal ARTMAP (GEAM)
  GEAM Experiments and Results
  GEAM Performance
  Performance Comparisons of GEAM and other ART Networks
  Summary/Conclusions
  Genetic Gaussian ARTMAP (GGAM)
  GGAM Experiments and Results
  GGAM Performance
  Performance Comparisons of GGAM and other ART Networks
  Summary/Conclusions
5. UNIVERSAL ART (UART)
  UART Design
  Performance Phase of UART
  Geometry Selection Phase (Genetic Phase) of UART
  Results of UART
  UART Performance
  Performance Comparisons of UART and other ART Networks
  Performance Comparisons of UART and other Genetic ART Networks
  UART Summary
6. ANALYSIS
  UART Order of Search Analysis
  Time Complexity Analysis
7. USER INTERFACE
  GFAM User Interface
  GFAM Controls
  GFAM UI Abstract Design
  GFAMForm Object
  GraphForm Object
  FNode Object
  PtrnNode Object
  Chrom Object
  Cat Object
  AddCatForm Object
  GEAM User Interface
  GEAM Controls
  GEAM UI Abstract Design
  GEAMForm Object
  GraphForm Object
  ENode Object
  PtrnNode Object
  Chrom Object
  Cat Object
  AddCatForm Object
  GGAM User Interface
  GGAM Controls
  GGAM UI Abstract Design
  GGAMForm Object
  GraphForm Object
  GNode Object
  PtrnNode Object
  Chrom Object
  Cat Object
  AddCatForm Object
  UART User Interface
  UART Controls
  UART UI Abstract Design
  UARTForm Object
  GraphForm Object
  FNode, ENode and PtrnNode Objects
  Chrom Object
  Cat Object
  AddCatForm Object
8. SUMMARY/CONTRIBUTIONS, AND FUTURE WORK
  Summary/Contributions
  Future Work
APPENDIX A: TERMINOLOGY
APPENDIX B: FAM STEP-BY-STEP TRAINING & TESTING
APPENDIX C: EAM STEP-BY-STEP TRAINING & TESTING
APPENDIX D: GAM STEP-BY-STEP TRAINING & TESTING
APPENDIX E: USER MANUAL
REFERENCES

LIST OF FIGURES
Figure 2-1: A simple FAM architecture
Figure 2-2: A rectangle representation of a FAM category that learned seven input patterns
Figure 2-3: The distance of an input pattern from the rectangle R is the minimum distance of the pattern from the border of the rectangle R
Figure 2-4: FAM learning: a. A category with 0 size; b. Introducing a new pattern 2; c. The category expands to include 2; d. Since 3 is inside the category, the category doesn't change its size; e. Pattern 4 is presented; f. Since 4 is outside the category, the category is expanded to include 4 within its boundaries
Figure 2-5: A simple EAM architecture
Figure 2-6: An EAM category that encodes 3 patterns
Figure 2-7: Creation and expansion of an EAM category
Figure 2-8: A 2D GAM category that encodes 5 patterns within 2 standard deviations
Figure 2-9: One-point crossover
Figure 2-10: Two-point crossover
Figure 2-11: Uniform crossover
Figure 3-1: GFAM chromosome structure
Figure 3-2: Crossover implementation
Figure 3-3: 3D plot of log(fit(p))
Figure 3-4: a: Problem 1 (Four squares in a square problem), b: Problem 2 (Asymmetric squares within a square problem), c: Problem 3 (Two circles in a square problem)
Figure 3-5: Average fitness value of the best FAM produced by GFAM for Problem 1. The average is computed over the 50 runs. The average fitness values are shown with respect to all pairs of category add and category delete probabilities. The different colored curves correspond to the three different values of the mutation probability
Figure 3-6: Average fitness value of the best FAM produced by GFAM for Problem 2. The average is computed over the 50 runs. The average fitness values are shown with respect to all pairs of category add and category delete probabilities. The different colored curves correspond to the three different values of the mutation probability
Figure 3-7: Average fitness value of the best FAM produced by GFAM for Problem 3. The average is computed over the 50 runs. The average fitness values are shown for all pairs of category add and category delete probabilities. The different colored curves correspond to the three different values of the mutation probability
Figure 3-8: Average fitness value of the best FAM produced by GFAM for Problems 1, 2 and 3. The average is computed over the 50 runs. The average fitness values are shown for all pairs of category add and category delete probabilities. The different colored curves correspond to the two different non-zero values of the mutation probability
Figure 3-9: Gaussian databases (2-dimensional; 2, 4 or 6 classes; 5, 15, 25 and 40% overlap)
Figure 3-10: Structures within Structure databases
Figure 3-11: Performance and size comparison of GFAM vs ssfam
Figure 3-11b: Performance and size comparison of GFAM vs sseam
Figure 3-11c: Performance and size comparison of GFAM vs ssgam
Figure 3-11d: Performance and size comparison of GFAM vs microartmap
Figure 4-1: GEAM chromosome structure
Figure 4-2: Performance and size comparison of GEAM vs ssfam
Figure 4-2b: Performance and size comparison of GEAM vs sseam
Figure 4-2c: Performance and size comparison of GEAM vs ssgam
Figure 4-2d: Performance and size comparison of GEAM vs microartmap
Figure 4-3: GGAM chromosome structure
Figure 4-4: Performance and size comparison of GGAM vs ssfam
Figure 4-4b: Performance and size comparison of GGAM vs sseam
Figure 4-4c: Performance and size comparison of GGAM vs ssgam
Figure 4-4d: Performance and size comparison of GGAM vs microartmap
Figure 5-1: These figures show what happens when using unsuitable classifiers for a certain problem
Figure 5-2: A classification problem where the boundaries can't be optimally covered by FAM, GAM or EAM categories
Figure 5-3: Using UART to solve the problem in Figure 5-2; notice that parts of the problem space are not covered, but because UART encourages smaller size, it might sacrifice a little accuracy to get optimal size
Figure 5-4: A simple UART structural diagram during the training phase
Figure 5-5: A simple UART structural diagram during the performance phase
Figure 5-6: GFAM chromosome structure
Figure 5-7: Crossover implementation
Figure 5-8: Performance and size comparison of UART vs ssfam
Figure 5-8b: Performance and size comparison of UART vs sseam
Figure 5-8c: Performance and size comparison of UART vs ssgam
Figure 5-8d: Performance and size comparison of UART vs microartmap
Figure 5-9: Performance and size comparison of UART vs GFAM
Figure 5-9b: Performance and size comparison of UART vs GEAM
Figure 5-9c: Performance and size comparison of UART vs GGAM
Figure 6-1: Case #1, pattern inside both a FAM category and an EAM category
Figure 6-2: Case #2, pattern inside a FAM category but outside an EAM category
Figure 6-3: Case #3, pattern inside an EAM category but outside a FAM category
Figure 6-4: Case #4, pattern outside both FAM and EAM categories
Figure 7-1: GFAM user interface
Figure 7-2: An open dialogue window that allows the user to select the training, validating and testing files
Figure 7-3: A dialogue box that displays the results after an interruption of the process
Figure 7-4: A 2D graph that displays the data points as well as the categories
Figure 7-5: Same graph as in Figure 7-4 but displaying the categories only
Figure 7-6: A 2D graph displaying the classification borders of this GFAM as well as the categories
Figure 7-7: After pushing the Del All button, the GFAM does not have any more categories
Figure 7-8: This figure shows Figure 7-7, but only displaying the categories (none in this case)
Figure 7-9: An add category dialogue box
Figure 7-10: This figure is the same as Figure 7-9 but after filling in some values
Figure 7-11: This figure shows the manually added category
Figure 7-12: This figure shows the classification borders of the manually added category
Figure 7-13: This figure shows the values of the endpoints of the second manually added category
Figure 7-14: This figure shows the two manually added categories
Figure 7-15: This figure shows the classification borders of the manually added categories
Figure 7-16: GEAM user interface
Figure 7-17: A 2D graph displaying the GEAM network; note that here ellipsoids are represented by circles
Figure 7-18: A 2D graph showing the classification borders of the GEAM network
Figure 7-19: This figure shows the GEAM network after pushing the Del All button
Figure 7-20: An add GEAM category dialogue box
Figure 7-21: Manually filling in values for the first category
Figure 7-22: GEAM network after manually adding a category
Figure 7-23: Classification borders of the manually added category
Figure 7-24: Manually adding a new category
Figure 7-25: GEAM network after adding the second category
Figure 7-26: Classification borders of the GEAM network
Figure 7-27: GGAM user interface
Figure 7-28: A 2D graph showing the GGAM network; note that here a GGAM category is represented by a large dot
Figure 7-29: A 2D graph showing the classification boundaries of the GGAM network
Figure 7-30: After deleting all the categories
Figure 7-31: An add GGAM category dialogue box
Figure 7-32: Add category dialogue box with numbers in the available boxes
Figure 7-33: A figure showing the manually added category
Figure 7-34: The classification boundaries corresponding to the manually added category
Figure 7-35: Filling in numbers for the second category
Figure 7-36: The GGAM network after adding the second category
Figure 7-37: The classification boundaries of the GGAM after adding two categories
Figure 7-38: UART user interface
Figure 7-39: A UART network after randomly mixing FAM categories with EAM categories
Figure 7-40: A 2D graph showing only the categories
Figure 7-41: The classification borders of the UART network after the random mixing
Figure 7-42: After deleting all the categories of UART
Figure 7-43: An add category dialogue box
Figure 7-44: Filling in data for an EAM category
Figure 7-45: UART after manually adding an EAM category
Figure 7-46: The classification boundaries of UART after addition
Figure 7-47: Filling in data for a FAM category
Figure 7-48: UART after manually adding an EAM and a FAM category
Figure 7-49: UART classification boundaries after the manual addition
Figure E-1: Error message

LIST OF TABLES
Table 3-1: The values of the probabilities for mutation, category add, and category delete used in the experiments to determine good values for the GA parameters
Table 3-2: For each problem (database) we ran 3 experiments. For each experiment we used the depicted combinations of number of generations and population size (3 combinations). We evolved the trained Fuzzy ARTMAPs 50 different times (50 random seeds), and each time we used the combinations of probability values shown in Table 3-1. Hence, the FAMs were evolved 1350 times for each problem, or a total of 4050 times for all the problems
Table 3-3: Databases used in the Genetic ARTMAP experiments
Table 3-4: Accuracy and size results achieved by GFAM and other ART networks. Note that: safe uam: Safe microartmap; FAM: Fuzzy ARTMAP; EAM: Ellipsoidal ARTMAP; GAM: Gaussian ARTMAP; ss*: semi-supervised version
Table 3-5: Accuracy and size results achieved by GFAM and other ART networks. Note: dfam: Distributed Fuzzy ARTMAP; FasART; dfasart: Distributed FasART; GFAM: Genetic Fuzzy ARTMAP
Table 4-1: Accuracy and size results achieved by GEAM and other ART networks. Note that: safe uam: Safe microartmap; FAM: Fuzzy ARTMAP; EAM: Ellipsoidal ARTMAP; GAM: Gaussian ARTMAP; ss*: semi-supervised version
Table 4-2: Accuracy and size results achieved by GGAM and other ART networks. Note that: safe uam: Safe microartmap; FAM: Fuzzy ARTMAP; EAM: Ellipsoidal ARTMAP; GAM: Gaussian ARTMAP; ss*: semi-supervised version
Table 5-1: UART performance and size compared to other ART architectures
Table 5-2: UART performance and size compared to other genetic ART architectures

LIST OF ACRONYMS/ABBREVIATIONS
NN: Neural Network
ART: Adaptive Resonance Theory
FAM: Fuzzy ARTMAP
Category Proliferation: An ART limitation that causes the network to create more categories when confronted with noisy/overlapping data
GA: Genetic Algorithm
GFAM: Genetic Fuzzy ARTMAP
EAM: Ellipsoidal ARTMAP
GEAM: Genetic Ellipsoidal ARTMAP
GAM: Gaussian ARTMAP
GGAM: Genetic Gaussian ARTMAP
UART: Universal ART
Stability-Plasticity Dilemma: The problem of creating a stable neural network that can learn new inputs without forgetting what it has already learned (i.e., without relearning the old inputs)
MLP: Multi-Layer Perceptron
BP: Back Propagation
RBF: Radial Basis Function
UI: User Interface
Committed node: A node that has established a connection to an output node (class)
CCF: Category Choice Function
CMF: Category Match Function
PCC: Percent Correct Classification

1. INTRODUCTION

The main focus of this dissertation is to present a methodology for using genetic algorithms (GAs) to construct optimal or sub-optimal ART networks that remedy deficiencies of current ART models, such as category proliferation, poor coverage of the problem's input space, and a large dependency on parameters. This task is accomplished while at the same time producing genetically engineered ART architectures that improve generalization at reduced computational cost. The sections of this introduction give the reader a better understanding of the main components of this dissertation, along with some related literature.

1.1 ART, Features and Limitations

Adaptive Resonance Theory (ART) was developed by Grossberg (Grossberg, 1976) as a solution to the stability-plasticity dilemma. One of the most celebrated ART architectures is FAM (short for Fuzzy ARTMAP) (Carpenter et al., 1992), which has been successfully used in the literature for solving a variety of classification problems. Among the advantages of Fuzzy ARTMAP are that it can solve arbitrarily complex classification problems; it converges quickly to a solution (within a few presentations of the list of input/output patterns belonging to the training set); it has the ability to recognize novelty in the input patterns presented to it; it can operate in an on-line fashion (new input/output patterns can be learned by the system without re-training with the old input/output patterns); and it produces answers that can be explained with relative ease. Despite all the great features that Fuzzy ARTMAP possesses, it suffers from the category proliferation problem, especially when it is confronted with data of a noisy and/or overlapping nature. Quite often the category proliferation problem observed in Fuzzy ARTMAP architectures is connected with the issue of overtraining. Overtraining happens when Fuzzy ARTMAP tries to learn the training data perfectly at
the expense of degraded generalization performance (i.e., classification accuracy on unseen data) and also at the expense of creating many categories to represent the training data (leading to the category proliferation problem). Another reason for the category proliferation problem is that each ART architecture relies on a single category structure (e.g., hyper-rectangles in FAM) to represent the data in the input space. Genetically engineered ART architectures that combine more than one category structure (such as hyper-rectangles and ellipsoids) address this problem.

A number of authors have tried to address the category proliferation problem in Fuzzy ARTMAP by tackling either the training method or the geometrical representation that FAM architectures use. Amongst them we refer to the work by Marriott (Marriott and Harrison, 1995), where the authors eliminate the match tracking mechanism of Fuzzy ARTMAP when dealing with noisy data; the work by Charalampidis (Charalampidis, et al., 2001), where the Fuzzy ARTMAP equations are appropriately modified to compensate for noisy data; the work by Verzi (Verzi, et al., 2001), (Anagnostopoulos, et al., 2003 & 2001), and (Gomez-Sanchez, et al., 2002 & 2001), where different ways are introduced of allowing the Fuzzy ARTMAP categories to encode patterns that are not necessarily mapped to the same label; the work by Koufakou (Koufakou, et al., 2001), where cross-validation is employed to avoid the overtraining/category proliferation problem in Fuzzy ARTMAP; and the work by Carpenter (Carpenter, 1998), Williamson (Williamson, 1997), and Parrado-Hernandez (Parrado-Hernandez, et al., 2003), where the ART structure is changed from a winner-take-all to a distributed version and slow learning is simultaneously employed, with the intent of creating fewer ART categories and reducing the effects of noisy patterns.

Another limitation of ART architectures is that their performance depends on a number of network parameters, and sometimes it becomes computationally expensive to discover a set of network parameters that produces a network with good generalization and
small size. What makes this issue more difficult is that the best network parameters are problem dependent. As will be seen later, genetically engineered ART architectures solve this problem as well.

1.2 Genetic Algorithms and Neural Networks Combination

Genetic algorithms are a class of population-based stochastic search algorithms that are developed from ideas and principles of natural evolution. An important feature of these algorithms is their population-based search strategy: individual chromosomes in a population compete, exchange and modify information with each other in order to perform certain tasks. Genetic algorithms have been widely used to evolve artificial neural networks. For a thorough exposition of the available research literature on evolving neural networks, the interested reader is advised to consult (Yao, 1999). In (Yao, 1999) the author distinguishes three different strategies for evolving neural networks: the first searches for the weights of the neural network, the second designs the structure of the network, and the third evolves the learning rules of the neural network. Examples of the first strategy are ones where the weights found by the genetic algorithm are used in the neural network without further refinement (Whitley, et al., 1989 and 1990). An alternative to this strategy is to use the pertinent neural network learning algorithm to further refine the weights that the GA produces (Belew, et al., 1991; Lee, 1996). Besides searching for weights, GAs may also be used to select the features that are input to the neural network. Since the pioneering work by Siedlecki and Sklansky (Siedlecki and Sklansky, 1989), genetic algorithms have been used for many feature selection problems involving neural networks (Brotherton and Simpson, 1994; Yang and Honavar, 1998) and other classifiers, such as decision trees (Bala, et al., 1996), k-nearest neighbors (Kelly and Davis, 1991; Punch, et al., 1993), and naïve Bayes classifiers (Inza, et al., 1999; Cantu-Paz, 2002). As has been
verified in the literature, the topology of a neural network is crucial to its performance. If a network has too few nodes it might not be able to learn the required task. On the other hand, if the network has too many nodes it may overfit the training data and thus exhibit poor generalization. Miller, Todd and Hegde (Miller, et al., 1989) defined two major approaches to using GAs to design the topology of neural networks: direct encoding, which specifies every connection of the network, or evolution of an indirect specification of the connectivity. The direct encoding GA approach implies that every connection between every pair of nodes is directly represented in a chromosomal string. Direct encoding has been used effectively to prune neural networks with good results (Whitley, et al., 1990; Hancock, 1992). On the other hand, a simple indirect encoding method is to commit to a particular topology (e.g., a feed-forward or recurrent NN) and learning algorithm (e.g., the back-propagation learning algorithm), and then use a GA to find the parameter values that complete the network specification. For example, in the feed-forward neural network approach the GA can search for the number of layers and the number of units (nodes) per layer. The indirect encoding scheme is far more sophisticated, while being theoretically capable of representing complicated topologies with finesse: it encodes the most important parameters and leaves the remainder to be determined elsewhere. Harp, et al. (see Harp, et al., 1989) used two-part segments in an encoding scheme entitled blueprints. The first segment held parameter specifications, including the address, organization and number of nodes of an area, and the learning parameters associated with those nodes. The second segment described the connections between areas by specifying the density between the current area and the target area, the target area's address, the organization of the connections, and the learning parameters associated with the connection weights.

1.2.1 Using GA with MLP NN

A thorough study of the literature shows that most of the articles that involved GAs and NNs used Multi-Layer Perceptron (MLP) NNs. For instance, Hruschka (Hruschka, et al.,
2000) used a GA to extract classification rules from an MLP NN (specifically, they used the activations of the hidden nodes to extract the rules); the genetic algorithm was utilized to cluster these activations and hence to extract the rules. Krasniewicz (Krasniewicz, et al., 2000) investigated different methods of constructing a GA/MLP NN in a distributed environment. It is a known fact that training an NN can take a long time, and combining NNs with GAs amplifies this problem (since many NNs need to be trained when GAs are involved). This problem triggered the idea of using a distributed (parallel) system in (Krasniewicz, et al., 2000). Santos (Santos, et al., 2000) used GAs to extract comprehensible rules (IF-THEN statements) from MLP NNs; in this work the GA was used to find the NN that provides the best rules. Yen (Yen, et al., 2000) used a hierarchical GA to construct an MLP NN. This method, as Yen claimed, solved the network feasibility problem, as well as the problem of multiple genotype-to-phenotype mappings that appears when using GAs to construct NNs. Fieldsend (Fieldsend, et al., 2005) proposed a new method called Pareto ENN (Evolutionary Neural Network) to construct a multi-objective optimized NN that is capable of time series forecasting. The idea of this approach was to keep a set F of chromosomes (NNs) at each generation, each of which is the best performer with respect to a specific measure. In the next generation, if some chromosomes are better than their counterparts in F, they replace those chromosomes in F. This process continues until a maximum number of generations has passed. Manic (Manic, et al., 2002) used a combination of GAs with gradient descent to find the best set of weights. In this approach the GA is used to find a sub-optimal set of weights, and then the gradient descent method is used to fine-tune this set. Leung (Leung, et al., 2003) proposed a modified GA and a modified BP-NN, and used the new GA method to tune the parameters and the structure of the NN. The GA generates four offspring from each pair of parents, while the MLP NN uses switches for its links.
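
The hybrid weight-search strategy described above (a GA finds an approximate weight vector, which is then refined by gradient descent, as in Belew, et al., 1991, Lee, 1996, and Manic, et al., 2002) can be sketched in a few lines of Python. This is a hypothetical illustration, not the method of any cited author. A single logistic neuron stands in for an MLP, and the toy AND data set, truncation selection, uniform crossover, Gaussian mutation, and every parameter value are assumptions made for the example.

```python
import math
import random


def predict(w, x):
    """Single logistic neuron (stand-in for an MLP); w = [bias, w1, w2]."""
    z = w[0] + w[1] * x[0] + w[2] * x[1]
    z = max(-60.0, min(60.0, z))  # keep math.exp in a safe range
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, data):
    """Mean squared error of the neuron over (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def ga_search(data, pop_size=30, generations=40, sigma=0.5, seed=0):
    """Phase 1 (assumed scheme): elitist GA over 3-element weight vectors."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: loss(w, data))   # lower loss = fitter
        parents = pop[: pop_size // 2]          # truncation selection; parents survive
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]  # uniform crossover
            child = [c + rng.gauss(0.0, sigma) for c in child]                 # Gaussian mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda w: loss(w, data))


def gradient_refine(w, data, lr=0.5, steps=500):
    """Phase 2: gradient descent on the MSE, accepting only non-increasing steps."""
    w = list(w)
    current = loss(w, data)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y in data:
            p = predict(w, x)
            d = 2.0 * (p - y) * p * (1.0 - p) / len(data)  # dLoss/dz for this pattern
            grad[0] += d
            grad[1] += d * x[0]
            grad[2] += d * x[1]
        candidate = [wi - lr * gi for wi, gi in zip(w, grad)]
        cand_loss = loss(candidate, data)
        if cand_loss <= current:
            w, current = candidate, cand_loss
        else:
            lr *= 0.5  # step overshot: shrink the learning rate and retry
    return w


if __name__ == "__main__":
    and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w_ga = ga_search(and_data)
    w_final = gradient_refine(w_ga, and_data)
    print("GA loss:", loss(w_ga, and_data), "refined loss:", loss(w_final, and_data))
```

The GA keeps the better half of each generation, so its best loss never increases. The refinement step rejects any move that raises the loss. Together these make the second phase a pure "fine-tuning" of the GA's answer, which mirrors the division of labor described in the text.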

1.2.2 Using GAs with Other NN Models (other than MLP-NNs)

Various authors have used other NN models with GAs. Xiuju (Xiuju, et al., 2002) proposed a new RBF (Radial Basis Function) NN that uses a GA to select class-dependent features (i.e., different hidden nodes are connected to different features based on the output of the GA). Ling-Hsun (Ling-Hsun, et al., 2002) proposed a new Intelligent Control System (ICS) based on a multi-objective genetic algorithm; the idea is to use GAs to find optimal or sub-optimal actions when needed. Based on the principles of the nearest neighbor algorithm and NNs, Ishibuchi (Ishibuchi, et al., 1997) proposed a method to construct a rule-based classification system using a GA. The goal was to generate a set of fuzzy rules that minimizes the error, the rejection ratio and the number of rules. To create the set of rules, every training pattern is used as a separate rule; when there are many training patterns, designers usually use other means to reduce the number of training patterns (the rules' centers). Ghosh (Ghosh, et al., 2002) presented a new method to construct Error Back-Propagation Neural Networks (EBP NN). The idea is to use a mix of a Genetic Algorithm and the Least Squares Method to tune the weights and the number of hidden nodes of the network. Rovithakis (Rovithakis, et al., 2004) presented a procedural method to construct HONNs (High-Order Neural Networks) using GAs, and this method was then used to construct a HONN for function approximation purposes. Chia-Feng (Chia-Feng, et al., 2004) used a new evolutionary method to train a recurrent neural network, in which a mixture of a GA and PSO (particle swarm optimization) was used to evolve the weights of the NN. The idea is to have the GA search for an optimal solution through its crossover and mutation operators, and then to use PSO to enhance the generation, creating an even better generation of solutions.

1.2.3 Using GAs with ART NNs

There are only a handful of articles that used ART NNs with GAs. One of the methods of combining NNs with GAs is to use the NN as a fitness function evaluator for the
GA. Burton (Burton, et al., 1997) used an Adaptive Resonance Theory (ART) NN as a fitness evaluator. The authors used a GA to compose new musical rhythms; to see how similar the newly created rhythms are to existing ones, the authors used an ART network to classify the new rhythms. It is a well-known fact that the larger the number of features in a specific problem, the more complex and time-consuming the classification process becomes. Quite often a large number of features compromises the accuracy of the classifier as well. Palaniappan (Palaniappan, et al., 2002) developed a new method that uses Genetic Algorithms to select features that are then used as the input to a Fuzzy ARTMAP classifier. This method was applied to a real-world problem. Hui (Hui, et al., 2003) claimed that different features have different effects, and hence multiplying some features by a certain factor (an impulsive force) could improve the generalization of the network. Hence, Hui proposed a new ART architecture called Impulsive Fuzzy ART (IFART), and used a GA to find the right impulsive forces.

All the articles presented above (sections 1.2.1 to 1.2.3) share one common fact: using GAs to evolve NNs yields optimal or sub-optimal neural network structures. It was also common for the results of using GAs with MLP-NNs to be very successful; when compared to the original network, the genetic model always gave better results.

1.3 Motivation

After a careful study of the literature presented above and in the next chapter, it is evident that ARTMAP architectures suffer from the category proliferation problem. It is also evident that using GAs to evolve NNs has been a very successful approach to find
23 optiml topologies nd weight sets. For the bove resons, we decided to investigte the use of GA s to evolve ART rchitectures. 1.4 Reserch Overview In this disserttion the work ws divided into five mor efforts, presented in the following five sections Using GAs to Evolve ART Architectures In this proect, GAs were successfully used to simultneously evolve the weights, s well s the topology of three ART neurl networks, nmely: FAM, EAM nd GAM. But in contrst to the feed-forwrd neurl networks tht hve been extensively evolved, ART neurl networks hve number of topologicl constrints, such s () they consist of one hidden lyer of nodes, nd (b) every interconnection weight vlue from every node of the input lyer to node in the hidden lyer is importnt (representing the minimum or the mximum of the vlues of input ptterns cross every dimension tht were encoded by this node). Consequently, the only element of n ART topology tht cn be evolved is the number of nodes in the hidden lyer. Furthermore, lthough we could strt from n initil popultion of rndomly chosen number nd vlues of the weights, in our ppliction we strt with popultion of trined ART networks, whose number of nodes in the hidden lyer nd the vlues of the interconnection weights converging to these nodes re fully determined (t the beginning of the evolution) by the specific ART trining rules. To this initil popultion of networks, GA s re pplied to modify these trined network s rchitectures (number of nodes in the hidden lyer, nd vlues of the interconnection weights) in wy tht encourges better generliztion nd smller size rchitectures. It is worth reminding the reder tht s with mny neurl network rchitectures, the knowledge in ART networks is stored in their interconnection weights tht hve very 8

24 interesting geometricl interprettion (see Angnostopoulos, et l., 2001). For exmple, the interconnection weights in FAM (converging to the nodes in the hidden lyer) represent the lower nd upper end-points of hyper-rectngles (referred to s ctegories) tht enclose within their boundries clusters of dt tht re mpped to the sme lbel. Eventully, the evolution of these trined networks produces n ART rchitecture, referred to s GFAM, GEAM or GGAM, nd extrcted from the lst genertion s the network tht ttined the highest fitness vlue. The GFAM, GEAM or GGAM network ttined better generliztion performnce nd smller size thn the networks tht we strted with in the initil GA popultion. It is pprent tht in evolving neurl network rchitectures one hs to decide on the genotype representtion scheme for the neurl network rchitecture under considertion, on the genetic opertors used to evolve these neurl network rchitectures nd on the fitness function used to guide this evolution. In this disserttion we ddress these issues in mnner tht fits the chrcteristics of the ART neurl networks nd our ultimte obective of reducing ctegory prolifertion in ART, while we preserve good (sometimes optiml) generliztion performnce Universl ART (UART) In the first section of this introduction, we relted the ctegory prolifertion problem in prt to the fct tht ny ART or ARTMAP module, introduced into the literture, uses only one ctegory representtion to cover the input spce of the problem. In this effort new ARTMAP rchitecture is proposed. Universl ART (UART) is new rchitecture tht hs the potentil of combining multiple ctegory representtions in one network. The current version combined both EAM nd FAM to crete network tht covers the input spce with hyperrectngles nd/or hyper-ellipsoids when needed. This rchitecture benefits from the use of GAs to select the best ctegories for specific problem. 9

UART trains half of the GA population as FAM networks and the other half as EAM networks. Through a process called shuffling, the categories of all the networks are then redistributed amongst the chromosomes to ensure a fair fitness evaluation of the initial population. UART then uses standard GA steps to find an optimal or sub-optimal network that consists of FAM categories only, EAM categories only, or possibly a combination of both types. The selected categories depend heavily on the nature of the problem at hand and on the random seeds used during the GA search.

1.4.3 Experiments and Comparisons
Extensive experimentation was conducted with GFAM, GEAM, GGAM and UART. The goal of this experimentation was to show the superiority of these models over their existing ART counterparts. The comparison was based on the accuracy and the size of the architectures produced by these techniques, as well as on the computational effort involved in producing them. Another goal of the experimentation was to discover a good default set of GA parameter values with which to evolve the ART neural networks.

1.4.4 Analysis
This effort is actually divided into two separate sub-efforts:
a. An order-of-search analysis of the UART architecture, and
b. A time complexity analysis of the genetic approach to finding an optimal network, compared with the traditional approach of finding the best network, which we refer to as exhaustive search.
In effort a, we proved four theorems that explain the order in which a FAM versus an EAM category is searched. These theorems explain which category will represent an input pattern if the input pattern is
1. inside both categories,
2. inside the FAM category but outside the EAM category,
3. inside the EAM category but outside the FAM category, or
4. outside both categories.
We then drew twelve results based on these theorems.

In effort b, we present the pseudo code for the genetic approach and then analyze its time complexity. We then present the exhaustive search pseudo code and analyze its time complexity. This comparison shows that the time complexity of the genetic approach, $O(N^4)$, is much better than that of the exhaustive search, $O(N^7)$.

1.4.5 User Interface Development
A major effort throughout this research was devoted to the design, implementation and testing of the user interface (UI). In fact, four different programs were developed, namely GFAM UI, GEAM UI, GGAM UI and UART UI. The following are a few of the many requirements these programs had to satisfy:
- Capable of creating a variable number of ARTMAP networks (chromosomes).
- Capable of coding ARTMAP networks into chromosomes and vice versa.
- Capable of applying genetic algorithms to the created chromosomes.
- Can run one or multiple generations at a time (user defined).
- Can display 2D graphs of the categories and the input patterns (2D problems only).
- Can display a 2D graph of the classification borders of a specific network (2D problems only).
- Allows the user to insert and delete specific categories from a network.
- Can log different levels of detail to log files.
These requirements, among others, were successfully designed, implemented and tested for all four architectures.

The organization of this dissertation is as follows. In Chapter 2 we present background information related to the FAM, EAM and GAM architectures, and we also present the evolutionary computation (EC) concept, its variations and its applicability. In Chapter 3, we describe all the necessary elements of evolving FAM architectures, as well as the experiments and results. In Chapter 4, we introduce GEAM and GGAM, along with their results and associated comparisons. In Chapter 5, we propose the new architecture UART. In Chapter 6, we present the time complexity and the order-of-search analysis. In Chapter 7, we go through the details of developing the UI programs used to evolve the different ART architectures. In Chapter 8, we summarize our contributions and provide directions for future research.

2. BACKGROUND
In this chapter the main building blocks of this research are presented in detail. In particular, the following sections give the reader a thorough understanding of FAM, EAM, GAM and EAs. It is worth mentioning that there are semi-supervised versions of the above ART modules, namely ssFAM, ssEAM and ssGAM, whose performance was compared with the performance of the genetically engineered ART networks. The only difference between FAM and ssFAM, EAM and ssEAM, or GAM and ssGAM is that the semi-supervised versions allow a category to encode patterns that belong to a mixture of labels, provided that the majority label is above a certain threshold. This semi-supervised feature reduced the size of the ART network and increased its generalization accuracy (see Anagnostopoulos, et al., 2003). In our comparisons of the evolved ART models, we compared against these semi-supervised ART models.

2.1 Fuzzy ARTMAP (FAM)
The Fuzzy ARTMAP architecture consists of three layers or fields of nodes (see Figure 2-1).

Figure 2-1: Simple FAM Architecture (input layer $F_1^a$ receiving $I = (\mathbf{a}, \mathbf{a}^c)$; category nodes $F_2^a$ with templates $\mathbf{w}_j$; output labels $F_2^b$ reached through the weights $\mathbf{W}^{ab}$; match tracking)

These layers are the input layer ($F_1^a$), the category representation layer ($F_2^a$), and the output layer ($F_2^b$). The input layer of FAM is the layer where inputs are applied. An input applied to $F_1^a$ is a vector $I$ of dimensionality $2M$ of the following form,

$$I = (\mathbf{a}, \mathbf{a}^c) = (a_1, \ldots, a_M, a_1^c, \ldots, a_M^c); \qquad a_i^c = 1 - a_i, \quad 1 \le i \le M \qquad (2\text{-}1)$$

where $\mathbf{a}$ is a vector whose components lie in the interval $[0, 1]$. Thus, layer $F_1^a$ contains $2M$ nodes, one node for each component of the input pattern $I$. The index $i$ ($1 \le i \le 2M$) designates a generic node in layer $F_1^a$. The layer $F_2^a$ of FAM is referred to as the category representation layer, because this is where categories (or groups) of input patterns are formed. Finally, the output layer ($F_2^b$) is the layer that produces the outputs of the network. Every node in the output layer of FAM represents one of the labels of the pattern recognition task. The index $k$ ($1 \le k \le N_b$) designates a generic node in $F_2^b$; $N_b$ is the highest index needed to represent all the labels of the pattern classification task at hand.

FAM stores the learned knowledge in its interconnection weights. There are two vectors of FAM weights that are worth mentioning: (a) the vector of weights $\mathbf{w}_j = (w_{j1}, w_{j2}, \ldots, w_{j,2M})$, called a template, whose components emanate from node $j$ in $F_2^a$ and converge to all the nodes in $F_1^a$; $\mathbf{w}_j$ represents the group of input patterns that chose node $j$ in the category representation layer of FAM as their representative node and were encoded by this node; and (b) the vector of weights $\mathbf{W}_j^{ab} = (W_{j1}^{ab}, W_{j2}^{ab}, \ldots, W_{j,N_b}^{ab})$, emanating from every node $j$ in the $F_2^a$ layer of FAM and converging to all the nodes in $F_2^b$. In FAM, if all the components of $\mathbf{W}_j^{ab}$ are equal to 0 except component $W_{jk}^{ab}$, this is an indication that node $j$ in $F_2^a$ is mapped to label $k$ in $F_2^b$.
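Complement coding (Eq. 2-1) takes only a few lines. The sketch below is a minimal illustration in plain Python, not the dissertation's implementation; the function name `complement_code` is hypothetical. It also shows why the $L_1$ norm of every complement-coded input equals $M$, the constant that appears in the FAM choice and match functions (Eqs. 2-2 and 2-3).

```python
def complement_code(a):
    """Return the FAM input I = (a, a^c), where a^c_i = 1 - a_i (Eq. 2-1)."""
    if any(not 0.0 <= x <= 1.0 for x in a):
        raise ValueError("FAM inputs must lie in [0, 1]")
    return list(a) + [1.0 - x for x in a]


a = [0.25, 0.5]              # an M = 2 input pattern
I = complement_code(a)       # the corresponding 2M = 4 dimensional FAM input
print(I)                     # [0.25, 0.5, 0.75, 0.5]
print(sum(I))                # 2.0, i.e. |I|_1 = M for every complement-coded input
```

Because each pair $a_i + a_i^c$ sums to 1, the input norm carries no information about $\mathbf{a}$ itself; this is what lets FAM normalize its match function by the constant $M$.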

2.1.1 FAM Category Geometrical Representation
It is very important to point out that the templates in FAM (i.e., the $\mathbf{w}_j$'s) have an interesting geometrical interpretation: every template in FAM can be thought of as a hyper-rectangle whose boundaries enclose all the input patterns that were encoded by the template during FAM's training phase. For instance, in Figure 2-2, the 2D hyper-rectangle $R_j$ of template $\mathbf{w}_j$ is shown as including within its boundaries 7 input patterns encoded by it. As can be seen from Figure 2-2, $R_j$ is completely defined by its two endpoints $\mathbf{u}_j$ and $\mathbf{v}_j$, and it can be shown that $\mathbf{w}_j = (\mathbf{u}_j, \mathbf{v}_j^c)$.

Figure 2-2: A rectangle representation of a FAM category that learned seven input patterns (rectangle $R_j$ with lower endpoint $\mathbf{u}_j$ and upper endpoint $\mathbf{v}_j$)

To geometrically describe the FAM equations we need the concepts of the size of a hyper-rectangle and of the distance of an input pattern $I$ from the hyper-rectangle $R_j$. The size of the hyper-rectangle $R_j$ (denoted by $s(\mathbf{w}_j)$) is equal to the $L_1$ norm of the vector $\mathbf{v}_j - \mathbf{u}_j$ (in other words, the sum of the lengths of all its sides). The distance of an input pattern $I = (\mathbf{a}, \mathbf{a}^c)$ from the rectangle $R_j$ (denoted by $dis(I, \mathbf{w}_j)$) is equal to the minimum $L_1$ distance of $\mathbf{a}$ from any point of the rectangle $R_j$ (so it is zero when $\mathbf{a}$ lies inside $R_j$). Please refer to Figure 2-3 for an illustration of the size of a rectangle and of the distance of an input pattern from this rectangle in the case where $M = 2$.
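The size and distance definitions can be checked numerically. The sketch below uses hypothetical helper names, plain Python lists and $M = 2$: it builds a template $\mathbf{w}_j = (\mathbf{u}_j, \mathbf{v}_j^c)$ from a rectangle, computes $s(\mathbf{w}_j)$ and $dis(I, \mathbf{w}_j)$ directly from the geometry, and confirms the identity $|I \wedge \mathbf{w}_j|_1 = M - s(\mathbf{w}_j) - dis(I, \mathbf{w}_j)$, which links the geometric form of the FAM equations to the usual fuzzy-min form.

```python
def template_from_rectangle(u, v):
    """Template w = (u, v^c) for the rectangle with lower corner u and upper corner v."""
    return list(u) + [1.0 - x for x in v]


def rectangle_from_template(w):
    """Recover the corners (u, v) from a complement-coded template w."""
    M = len(w) // 2
    return list(w[:M]), [1.0 - x for x in w[M:]]


def size(w):
    """s(w): L1 norm of v - u, i.e. the sum of the rectangle's side lengths."""
    u, v = rectangle_from_template(w)
    return sum(vi - ui for ui, vi in zip(u, v))


def distance(I, w):
    """dis(I, w): minimum L1 distance from a (first half of I) to the rectangle; 0 inside."""
    u, v = rectangle_from_template(w)
    a = I[:len(u)]
    return sum(max(ui - ai, 0.0, ai - vi) for ai, ui, vi in zip(a, u, v))


w = template_from_rectangle(u=[0.25, 0.25], v=[0.5, 0.75])
I = [0.75, 0.5, 0.25, 0.5]                         # complement-coded a = (0.75, 0.5)
M = len(I) // 2
fuzzy_and = sum(min(x, y) for x, y in zip(I, w))   # |I ∧ w|_1

print(size(w), distance(I, w))                      # 0.75 0.25
print(fuzzy_and == M - size(w) - distance(I, w))    # True
```

Along each dimension at most one of $u_i - a_i$ and $a_i - v_i$ can be positive, which is why a single `max` per dimension gives the $L_1$ distance to the rectangle.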

Figure 2-3: The distance of an input pattern from the rectangle $R_j$ is the minimum distance of the pattern from the border of the rectangle $R_j$

2.1.2 FAM Operations and Parameters
FAM can operate in two distinct phases: the training phase and the performance phase. In the training phase of FAM, a list of input pattern/output label pairs, for example $\{(I^1, O(I^1)), \ldots, (I^r, O(I^r)), \ldots, (I^{PT}, O(I^{PT}))\}$, is repeatedly presented to FAM until FAM learns the required mapping. The task is considered accomplished (i.e., learning is complete) when the weights do not change during a list presentation or when a maximum number of list presentations is reached. The performance phase of FAM works as follows: given a list of input patterns, such as $\tilde{I}^1, \tilde{I}^2, \ldots, \tilde{I}^{PS}$, we want to find the FAM output produced when each one of these test patterns is presented at its $F_1^a$ layer. To achieve this goal we present the test list to the trained FAM architecture and observe the network's output (i.e., label). (Appendix B gives a step-by-step procedure for FAM training and performance.)

The operation of FAM is affected by two user-defined network parameters, the choice parameter $\beta$ and the baseline vigilance parameter $\bar{\rho}$. The choice parameter $\beta$ takes values in the interval $(0, \infty)$, while the baseline vigilance parameter $\bar{\rho}$ assumes values in the interval $[0, 1]$. Both of these parameters affect the number of nodes created in the category representation layer of FAM. The parameter $\beta$ controls the order in which nodes are accessed in Fuzzy ARTMAP. At the same time, $\beta$ affects how many nodes will be created in the category representation layer of Fuzzy ARTMAP during FAM's training phase (larger values of $\beta$ tend to produce a larger number of category nodes in FAM). The parameter $\bar{\rho}$ also affects the number of nodes created in the category representation layer of Fuzzy ARTMAP (larger values of $\bar{\rho}$ produce a larger number of nodes in the category representation layer). There are two other network parameters in FAM that are controlled by the algorithm, namely the vigilance parameter $\rho$ and the number of nodes $N_a$ in the category representation layer of FAM. The vigilance parameter $\rho$ takes values in the interval $[\bar{\rho}, 1]$ and its initial value is set equal to $\bar{\rho}$. The number of nodes $N_a$ in the category representation layer of FAM corresponds to the number of committed nodes (nodes that have established a connection with nodes in the $F_2^b$ layer) in FAM plus one uncommitted node. Prior to initiating the training phase of FAM, the top-down weights (the components of the $\mathbf{w}_j$'s) are chosen equal to 1, and the inter-ART weights (the $W_{jk}^{ab}$'s) are chosen equal to 0.

There are three major operations that take place during the presentation of a training input/output pair (e.g., $(I^r, O^r)$) to Fuzzy ARTMAP.

Operation 1: Calculating the Category Choice Function (CCF) Value
In this operation FAM calculates the category choice function (CCF) value (i.e., $T_j(I)$) for every category $j$ in its category representation layer $F_2^a$, as follows:

$$T_j(I) = \frac{M - s(\mathbf{w}_j) - dis(I, \mathbf{w}_j)}{\beta + M - s(\mathbf{w}_j)} \qquad (2\text{-}2)$$

After calculating the choice function values, the node $J$ with the maximum choice function value proceeds to Operation 2.
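Eq. (2-2) and the role of $\beta$ in ordering the search can be illustrated with a short plain-Python sketch (hypothetical names). It evaluates $T_j(I)$ through the identities $M - s(\mathbf{w}_j) = |\mathbf{w}_j|_1$ and $M - s(\mathbf{w}_j) - dis(I, \mathbf{w}_j) = |I \wedge \mathbf{w}_j|_1$, which hold for complement-coded templates, and sorts the categories into the order in which Operation 2 would examine them. With a small $\beta$ the large category that already contains the pattern is tried first; with a large $\beta$ the small nearby category is tried first.

```python
def choice_value(I, w, beta):
    """FAM category choice function T_j(I) of Eq. (2-2)."""
    M = len(I) // 2
    s = M - sum(w)                                        # s(w) = M - |w|_1
    dis = sum(w) - sum(min(x, y) for x, y in zip(I, w))   # dis(I, w) = |w|_1 - |I ∧ w|_1
    return (M - s - dis) / (beta + M - s)


def search_order(I, templates, beta):
    """Indices of the categories sorted by decreasing CCF value."""
    return sorted(range(len(templates)),
                  key=lambda j: choice_value(I, templates[j], beta),
                  reverse=True)


I = [0.5, 0.5, 0.5, 0.5]            # complement-coded a = (0.5, 0.5)
w_big = [0.25, 0.25, 0.25, 0.25]    # rectangle [0.25, 0.75] x [0.25, 0.75]; contains a
w_small = [0.625, 0.5, 0.375, 0.5]  # point category at (0.625, 0.5); dis = 0.125

print(search_order(I, [w_big, w_small], beta=0.01))   # [0, 1]
print(search_order(I, [w_big, w_small], beta=10.0))   # [1, 0]
```

A large $\beta$ favors small categories, which are more likely to fail the vigilance test or to be expanded only slightly; this is consistent with the observation above that larger $\beta$ values tend to produce more category nodes.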

Operation 2: Calculating the Category Match Function (CMF) Value
The node $J$ with the largest CCF value is examined to determine whether it passes the vigilance criterion. A node $J$ (category) passes the vigilance criterion if its category (node) match function value (i.e., $\rho(I \mid \mathbf{w}_J)$) is not smaller than the vigilance parameter value $\rho$, that is, if

$$\rho(I \mid \mathbf{w}_J) = \frac{M - s(\mathbf{w}_J) - dis(I, \mathbf{w}_J)}{M} \ge \rho \qquad (2\text{-}3)$$

If the vigilance criterion is passed, we proceed with Operation 3. Otherwise, node $J$ is disqualified and we find the next node in $F_2^a$ that maximizes the CCF value. Eventually we will end up with a node $J$ that maximizes the CCF value and satisfies the vigilance criterion (notice that this could be the uncommitted node, in which case the number of committed nodes in FAM, and hence $N_a$, increases by 1).

Operation 3: Match Tracking Mechanism/Change of the Weights
This operation is implemented only after we have found a node $J$ that maximizes the CCF value among the remaining (in the competition) $F_2^a$ nodes and passes the vigilance criterion. Operation 3 determines whether this node $J$ passes the prediction test. The prediction test checks whether the inter-ART weight vector emanating from node $J$ (i.e., $\mathbf{W}_J^{ab} = (W_{J1}^{ab}, W_{J2}^{ab}, \ldots, W_{J,N_b}^{ab})$) matches exactly the desired output vector $O$ (note that $O$ is the output pattern that the input pattern $I$ is supposed to be mapped to). If it does, we say that the node passed the prediction test. If the node does not pass the prediction test, the vigilance parameter $\rho$ is increased to the level of the match value $\rho(I \mid \mathbf{w}_J)$ of Eq. (2-3), node $J$ is disqualified, and the next node $J$ that maximizes the CCF value and passes the vigilance criterion is chosen (this action is referred to as the match tracking mechanism). If node $J$ passes the prediction test, the weights $\mathbf{w}_J$ in FAM are modified as Figure 2-4 demonstrates, according to the following equation:

$$\mathbf{w}_J^{new} = \mathbf{w}_J^{old} \wedge I \qquad (2\text{-}4)$$

where $\wedge$ is the fuzzy min operator, which outputs a vector whose components are equal to the minimum of the corresponding components of its arguments, and

$$\mathbf{W}_J^{ab} = O.$$

Figure 2-4: FAM learning. a. A category with 0 size; b. a new pattern 2 is introduced; c. the category expands to include pattern 2; d. since pattern 3 is inside the category, the category does not change its size; e. pattern 4 is presented; f. since pattern 4 is outside the category, the category is expanded to include pattern 4 within its boundaries.

Figure 2-4 covers all the possible learning scenarios in FAM: a category learns the first input pattern, then a second input pattern, and then further input patterns, both for the case where such a pattern is inside the rectangle that the category defines and for the case where it is outside that rectangle. FAM training is considered complete if and only if, after repeated presentations of all training input/output pairs to FAM, where Operations 1-3 are recursively applied for every input/output pair, we find ourselves in a situation where a complete cycle through all the input/output pairs produced no weight changes, or if we reached a maximum number of list presentations; throughout all experiments we used 10 for this number. In the performance phase of FAM only Operations 1 and 2 are implemented for every input pattern presented to FAM. By registering the network output for every test input presented to FAM, and by comparing it to the desired output, we can calculate the network's performance (i.e., Percent Correct Classification, or PCC).
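Operations 1-3 and the stopping rule can be combined into a compact training loop. The sketch below is a simplified, hypothetical plain-Python implementation, not the code used in this dissertation. Its assumptions: labels are stored as integers rather than as vectors $\mathbf{W}_j^{ab}$; the uncommitted node is represented only by its CCF value $M/(\beta + 2M)$ (its template is all ones) and is assumed to always pass; and match tracking adds a small $\varepsilon$ to the match value, as many FAM implementations do, so that the raised vigilance lies strictly above the match value of the rejected node. The example reproduces the scenario of Figure 2-4 and then adds a pattern with a different label.

```python
def complement_code(a):
    return list(a) + [1.0 - x for x in a]


def choice_value(I, w, beta):
    """T_j(I) of Eq. (2-2), written as |I ∧ w|_1 / (beta + |w|_1)."""
    return sum(min(x, y) for x, y in zip(I, w)) / (beta + sum(w))


def match_value(I, w):
    """rho(I | w) of Eq. (2-3), written as |I ∧ w|_1 / M."""
    return sum(min(x, y) for x, y in zip(I, w)) / (len(I) // 2)


def learn_pattern(I, label, templates, labels, beta, rho_bar, eps=1e-6):
    """Operations 1-3 for one training pair; returns True if any weight changed."""
    M = len(I) // 2
    rho = rho_bar                        # vigilance starts at the baseline value
    t_uncommitted = M / (beta + 2 * M)   # CCF of the uncommitted (all-ones) node
    active = set(range(len(templates)))
    while True:
        J = max(active, key=lambda j: choice_value(I, templates[j], beta), default=None)
        if J is None or choice_value(I, templates[J], beta) < t_uncommitted:
            templates.append(list(I))    # uncommitted node wins: Eq. (2-4) with w = 1
            labels.append(label)
            return True
        active.discard(J)
        match = match_value(I, templates[J])
        if match < rho:                  # Operation 2: vigilance test failed
            continue
        if labels[J] != label:           # Operation 3: prediction test failed
            rho = match + eps            # match tracking (eps is an assumption)
            continue
        new_w = [min(x, y) for x, y in zip(I, templates[J])]   # Eq. (2-4)
        changed = new_w != templates[J]
        templates[J] = new_w
        return changed


def train(patterns, beta=0.01, rho_bar=0.0, max_epochs=10):
    """Repeat list presentations until no weight changes or max_epochs is reached."""
    templates, labels = [], []
    for _ in range(max_epochs):
        changed = False
        for a, label in patterns:
            changed |= learn_pattern(complement_code(a), label, templates, labels,
                                     beta, rho_bar)
        if not changed:
            break
    return templates, labels


def predict(a, templates, labels, beta=0.01, rho_bar=0.0):
    """Performance phase (Operations 1 and 2); None if no node passes vigilance."""
    I = complement_code(a)
    order = sorted(range(len(templates)), key=lambda j: -choice_value(I, templates[j], beta))
    for J in order:
        if match_value(I, templates[J]) >= rho_bar:
            return labels[J]
    return None


patterns = [((0.25, 0.25), 0), ((0.5, 0.5), 0), ((0.375, 0.375), 0),
            ((0.75, 0.5), 0), ((0.875, 0.875), 1)]
templates, labels = train(patterns, beta=0.01, rho_bar=0.5)
print(templates)   # [[0.25, 0.25, 0.25, 0.5], [0.875, 0.875, 0.125, 0.125]]
print(labels)      # [0, 1]
print(predict((0.5, 0.5), templates, labels, rho_bar=0.5))   # 0
```

The first template ends as the rectangle with corners $(0.25, 0.25)$ and $(0.75, 0.5)$: pattern 3 leaves it unchanged, while pattern 4 stretches it, as in panels d and f of Figure 2-4. The label-1 pattern fails the vigilance test of that node and therefore commits a new node.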

2.2 Ellipsoidal ARTMAP (EAM)
The Ellipsoidal ARTMAP (EAM) architecture is very similar to that of FAM. The major difference between EAM and FAM is that EAM covers the space of the input patterns with ellipsoids instead of rectangles (hyper-ellipsoids instead of hyper-rectangles in problems with more than two dimensions). The EAM architecture, like the FAM architecture, consists of three layers or fields of nodes (Figure 2-5).

Figure 2-5: Simple EAM Architecture (input layer $F_1^a$ receiving $I = \mathbf{a}$; category nodes $F_2^a$ with templates $\mathbf{w}_j$; output layer $F_2^b$ reached through the weights $\mathbf{W}^{ab}$)

These are the input layer ($F_1^a$), the category representation layer ($F_2^a$), and the output layer ($F_2^b$). The input layer of EAM is the layer where inputs are applied. An input applied to $F_1^a$ is a vector $I$ of dimensionality $M$ of the following form,

$$I = \mathbf{a} = (a_1, \ldots, a_M); \qquad 1 \le i \le M \qquad (2\text{-}5)$$

where $\mathbf{a}$ is a vector whose components lie in the interval $(-\infty, +\infty)$. Thus, layer $F_1^a$ contains $M$ nodes, one node for each component of the input pattern $I$. The index $i$ ($1 \le i \le M$) designates a generic node in layer $F_1^a$. The layer $F_2^a$ of EAM is referred to as the category representation layer; this layer contains all the categories (or groups) of input patterns in the network. Finally, the output layer ($F_2^b$) is the layer where the outputs of the network can be found. Nodes in the output layer of EAM represent the labels or classes of the pattern recognition task. The index $k$ ($1 \le k \le N_b$) designates a generic node in $F_2^b$; $N_b$ is the number of all classes in the problem domain and so is the highest index in this classification task.

EAM, like FAM, stores the learned knowledge in its interconnection weights: (a) the vector of weights $\mathbf{w}_j = (\mathbf{m}_j, \mathbf{d}_j, r_j)$, called a template, where $\mathbf{m}_j$, $\mathbf{d}_j$ and $r_j$ represent the center, the direction of the major axis, and the radius of the major axis of the ellipsoid that node $j$ creates; its components emanate from node $j$ in $F_2^a$ and converge to all the nodes in $F_1^a$, and $\mathbf{w}_j$ encompasses the group of input patterns that were selected by node $j$ in the category representation layer of EAM; and (b) the vector of weights $\mathbf{W}_j^{ab} = (W_{j1}^{ab}, W_{j2}^{ab}, \ldots, W_{j,N_b}^{ab})$, emanating from every node $j$ in the $F_2^a$ layer of EAM and converging to all the nodes in $F_2^b$. In EAM, if all the components of $\mathbf{W}_j^{ab}$ are equal to 0 except component $W_{jk}^{ab}$, it is an indication that node $j$ in $F_2^a$ is mapped to label $k$ in $F_2^b$.

2.2.1 EAM Category Geometrical Representation
As pointed out earlier, the major difference between FAM and EAM lies in the shape of their categories (FAM uses hyper-rectangles, while EAM uses hyper-ellipsoids). In both cases, though, the structure created for a category (in FAM or EAM) encloses within its boundaries all the input patterns that chose and were encoded by this category. In Figure 2-6 a 2D hyper-ellipsoid category is shown as including within its boundaries 3 input patterns that it has already encoded. The ellipsoid grows to include the patterns that it encodes.

Figure 2-6: An EAM category that encodes 3 patterns (center $\mathbf{m}$, major-axis direction $\mathbf{d}$, major-axis radius $r$, minor-axis radius $\mu r$; patterns $I_1$, $I_2$, $I_3$)

The size of an EAM category is defined as the maximum Mahalanobis distance between two patterns inside the representation region of the category; since this distance equals $2r_j$, the size of the category is $s(\mathbf{w}_j) = 2r_j$. The distance in EAM is defined as the minimum distance between the pattern $I$ and the boundary of the category if the pattern is outside the category; otherwise the distance equals zero. Hence, the distance of an input pattern $I$ from a category represented by $\mathbf{w}_j$ is

$$dis(I, \mathbf{w}_j) = \max\left\{\|I - \mathbf{m}_j\|_{C_j},\ r_j\right\} - r_j \qquad (2\text{-}6)$$

and

$$\|I - \mathbf{m}_j\|_{C_j}^2 = (I - \mathbf{m}_j)^T C_j (I - \mathbf{m}_j) = \frac{1}{\mu^2}\left(\|I - \mathbf{m}_j\|_2^2 - (1 - \mu^2)\left[\mathbf{d}_j^T (I - \mathbf{m}_j)\right]^2\right) \qquad (2\text{-}7)$$

where $C_j$ is the shape matrix of an EAM category, $\mu$ is the ratio of the minor axis of the hyper-ellipsoid to its major axis, and $\|\cdot\|_2$ is the Euclidean ($L_2$) norm of its argument vector.

2.2.2 EAM Operations and Parameters
EAM can operate in two distinct phases: the training phase and the performance phase. In the training phase of EAM, a list of input pattern/output label pairs, for example $\{(I^1, O(I^1)), \ldots, (I^r, O(I^r)), \ldots, (I^{PT}, O(I^{PT}))\}$, is repeatedly presented to EAM until EAM learns the required mapping. The task is considered accomplished (i.e., learning is complete) when the weights do not change during a list presentation or when a user-defined maximum number of list presentations is reached. The performance phase of EAM works as follows: given a list of input patterns, such as $\tilde{I}^1, \tilde{I}^2, \ldots, \tilde{I}^{PS}$, we would like to find the EAM output produced when each one of these test patterns is presented at its $F_1^a$ layer. This goal is reached by presenting the test list to the trained EAM architecture and observing the network's output (i.e., label).

A few parameters affect the operation of EAM. Among them are $\beta$ (the choice parameter) and $\bar{\rho}$ (the baseline vigilance parameter), which affect the performance of FAM as well. The other two are $\mu$, the minor-to-major axis length ratio (common to every EAM category), and $D$, which corresponds to FAM's variable $M$ and should be greater than zero. The choice and vigilance parameters affect the number of nodes created in the category representation layer of EAM. The parameter $\beta$ controls the order in which nodes are accessed in EAM. At the same time, $\beta$ affects how many nodes will be created in the category representation layer of EAM during EAM's training phase (larger values of $\beta$ tend to produce a larger number of category nodes in EAM). The parameter $\bar{\rho}$ also affects the number of nodes created in the category representation layer of Ellipsoidal ARTMAP (larger values of $\bar{\rho}$ produce a larger number of nodes in the category representation layer). The parameter $\mu$ ranges from 0 to 1, and finally $D$ is chosen equal to $\sqrt{M}/\mu$. Note that the parameter $D$ also affects the number of nodes created in the category representation layer of EAM (smaller values of $D$ tend to produce more nodes in the category representation layer of EAM, and consequently result in less compression of the input patterns presented to EAM). There are two other network parameters that EAM uses internally: the vigilance parameter $\rho$ and the number of nodes $N_a$ in the category representation layer of EAM. The vigilance parameter $\rho$ takes values in the interval $[\bar{\rho}, 1]$ and its initial value is set equal to $\bar{\rho}$. The number of nodes $N_a$ in the category representation layer of EAM corresponds to the number of committed nodes in EAM plus one uncommitted node. Prior to initiating the training phase of EAM, $\mathbf{m}_j$, $\mathbf{d}_j$ and $r_j$ are chosen equal to 0, and the inter-ART weights (the $W_{jk}^{ab}$'s) are also chosen equal to 0.

There are three major operations that take place during the training phase of Ellipsoidal ARTMAP.

Operation 1: Calculating the Category Choice Function (CCF) Value
To select the winning category, the network calculates the category (node) choice function (CCF) value (i.e., $T_j(I)$) for every node (category) $j$ in $F_2^a$, as follows:

$$T_j(I) = \frac{D - s(\mathbf{w}_j) - dis(I, \mathbf{w}_j)}{\beta + D - s(\mathbf{w}_j)} \qquad (2\text{-}8)$$

After calculating the choice function values for every node in the category representation layer of EAM, the node $J$ with the maximum choice function value is chosen, and EAM proceeds with Operation 2.

Operation 2: Calculating the Category Match Function (CMF) Value
After finding the node $J$ with the largest CCF value, this node is examined to determine whether it passes the vigilance criterion. A node $J$ (category) passes the vigilance criterion if its category (node) match function value (i.e., $\rho(I \mid \mathbf{w}_J)$) exceeds the vigilance parameter value $\rho$, that is, if

$$\rho(I \mid \mathbf{w}_J) = \frac{D - s(\mathbf{w}_J) - dis(I, \mathbf{w}_J)}{D} > \rho \qquad (2\text{-}9)$$
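Eqs. (2-6) to (2-9) translate directly into code. The following plain-Python sketch (hypothetical function names, a 2D example, and illustrative parameter values that are not taken from the experiments) evaluates the $C_j$-norm of Eq. (2-7), the distance of Eq. (2-6), and the resulting choice and match values for a category with center $\mathbf{m}_j = (0.5, 0.5)$, major axis along $\mathbf{d}_j = (1, 0)$, $r_j = 0.25$ and $\mu = 0.5$. Note that $\|\cdot\|_{C_j}$ equals the Euclidean length along $\mathbf{d}_j$ and $1/\mu$ times the Euclidean length perpendicular to it, so the category is an ellipse with semi-axes $r_j$ and $\mu r_j$.

```python
import math


def c_norm(x, m, d, mu):
    """||x - m||_C of Eq. (2-7); d is a unit vector along the major axis."""
    diff = [xi - mi for xi, mi in zip(x, m)]
    sq = sum(t * t for t in diff)                  # ||x - m||_2^2
    proj = sum(di * t for di, t in zip(d, diff))   # d^T (x - m)
    return math.sqrt(max(sq - (1.0 - mu ** 2) * proj ** 2, 0.0)) / mu


def eam_distance(x, m, d, r, mu):
    """dis(I, w) of Eq. (2-6): zero inside the ellipsoid."""
    return max(c_norm(x, m, d, mu), r) - r


def eam_choice(x, m, d, r, mu, beta, D):
    """T_j(I) of Eq. (2-8), with s(w) = 2r."""
    s = 2.0 * r
    return (D - s - eam_distance(x, m, d, r, mu)) / (beta + D - s)


def eam_match(x, m, d, r, mu, D):
    """rho(I | w) of Eq. (2-9)."""
    return (D - 2.0 * r - eam_distance(x, m, d, r, mu)) / D


m, d, r, mu = [0.5, 0.5], [1.0, 0.0], 0.25, 0.5
D, beta = 3.0, 0.01                     # illustrative values only

inside, outside = [0.625, 0.5], [0.5, 0.75]
print(eam_distance(inside, m, d, r, mu))    # 0.0
print(eam_distance(outside, m, d, r, mu))   # 0.25
print(eam_match(outside, m, d, r, mu, D))   # 0.75
```

The point $(0.5, 0.75)$ lies only $0.25$ away from the center in Euclidean terms, but because it sits on the minor axis its $C_j$-norm is $0.5$, which places it $0.25$ outside the category.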

If the node passes the vigilance criterion, the third operation starts. Otherwise, node $J$ is disqualified and we find the next node in $F_2^a$ that maximizes the CCF value. Eventually we will end up with a node $J$ that maximizes the CCF value and satisfies the vigilance criterion, which could be an uncommitted node.

Operation 3: Match Tracking Mechanism/Change of the Weights
Operation 3 determines whether the chosen node $J$ passes the prediction test. The prediction test is simply a comparison between the inter-ART weight vector emanating from node $J$ (i.e., $\mathbf{W}_J^{ab} = (W_{J1}^{ab}, W_{J2}^{ab}, \ldots, W_{J,N_b}^{ab})$) and the desired output vector $O$. If they are equal, the node is said to pass the prediction test. If the node does not pass the prediction test, the vigilance parameter $\rho$ is increased to the level of the match value $\rho(I \mid \mathbf{w}_J)$ of Eq. (2-9), node $J$ is disqualified, and a new search starts for the next node $J$ that maximizes the CCF value and passes the vigilance criterion (this action is referred to as the match tracking mechanism). If node $J$ passes the prediction test, the weights $\mathbf{w}_J$ in EAM are modified according to the following equations, which are explained pictorially in Figure 2-7:

$$\mathbf{m}_J^{new} = \mathbf{m}_J^{old} + \frac{\gamma}{2}\left(1 - \frac{\min\left\{r_J^{old},\ \|I - \mathbf{m}_J^{old}\|_{C_J^{old}}\right\}}{\|I - \mathbf{m}_J^{old}\|_{C_J^{old}}}\right)(I - \mathbf{m}_J^{old}) \qquad (2\text{-}10)$$

$$r_J^{new} = r_J^{old} + \frac{\gamma}{2}\left(\max\left\{r_J^{old},\ \|I - \mathbf{m}_J^{old}\|_{C_J^{old}}\right\} - r_J^{old}\right) \qquad (2\text{-}11)$$

where $\gamma$ is the learning factor; in our experiments we used fast learning (i.e., $\gamma = 1$).
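The learning rule of Eqs. (2-10) and (2-11) can be sketched as follows (plain Python, hypothetical names, fast learning $\gamma = 1$ by default). One detail is not spelled out in this section and is an assumption of the sketch: while a category still holds a single pattern ($r_j = 0$, $\mathbf{d}_j = \mathbf{0}$), the next pattern it learns fixes the major-axis direction $\mathbf{d}_j$ as the unit vector from $\mathbf{m}_j$ towards that pattern. The example follows the sequence of Figure 2-7: creation, a second pattern, a pattern inside the ellipse (no change), and a pattern outside it.

```python
import math


def c_norm(x, m, d, mu):
    """||x - m||_C of Eq. (2-7); d is the unit major-axis direction."""
    diff = [xi - mi for xi, mi in zip(x, m)]
    sq = sum(t * t for t in diff)
    proj = sum(di * t for di, t in zip(d, diff))
    return math.sqrt(max(sq - (1.0 - mu ** 2) * proj ** 2, 0.0)) / mu


def eam_create(x):
    """A new category: center at the pattern, zero radius, no direction yet."""
    return list(x), [0.0] * len(x), 0.0


def eam_update(x, m, d, r, mu, gamma=1.0):
    """Return (m, d, r) after learning pattern x (Eqs. 2-10 and 2-11)."""
    diff = [xi - mi for xi, mi in zip(x, m)]
    if r == 0.0 and not any(d):
        # Assumption of this sketch: the second pattern fixes the major-axis direction.
        length = math.sqrt(sum(t * t for t in diff))
        if length == 0.0:
            return m, d, r
        d = [t / length for t in diff]
    delta = c_norm(x, m, d, mu)
    if delta == 0.0:                    # pattern at the center: nothing to learn
        return m, d, r
    factor = (gamma / 2.0) * (1.0 - min(r, delta) / delta)
    m_new = [mi + factor * t for mi, t in zip(m, diff)]
    r_new = r + (gamma / 2.0) * (max(r, delta) - r)
    return m_new, d, r_new


mu = 0.5
m, d, r = eam_create([0.25, 0.5])               # first pattern
m, d, r = eam_update([0.75, 0.5], m, d, r, mu)
print(m, d, r)                                   # [0.5, 0.5] [1.0, 0.0] 0.25
m, d, r = eam_update([0.5, 0.5625], m, d, r, mu)
print(m, r)                                      # [0.5, 0.5] 0.25  (inside: unchanged)
m, d, r = eam_update([0.5, 0.75], m, d, r, mu)
print(m, r)                                      # [0.5, 0.5625] 0.375
```

With $\gamma = 1$ the updated ellipse is the smallest one, with the same direction and axis ratio, that contains the old ellipse and the new pattern: after the last step the pattern $(0.5, 0.75)$ lies exactly on the boundary ($C$-norm equal to the new radius $0.375$).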

Figure 2-7: Creation and expansion of an EAM category (patterns $I_1$ to $I_4$; the category starts with $\mathbf{m} = I_1$ and grows along the direction $\mathbf{d}$)

Figure 2-7 covers all the possible scenarios as node $J$ first learns the first input pattern, then a second input pattern, and then a third input pattern; in the case where the third pattern is inside the ellipsoid, the ellipsoid is not updated and hence stays the same. EAM training is considered complete if and only if, after repeated presentations of all training input/output pairs to EAM, where Operations 1-3 are recursively applied for every input/output pair, we find ourselves in a situation where a complete cycle through all the input/output pairs produced no weight changes. As in FAM, in the performance phase of EAM only Operations 1 and 2 are implemented for every input pattern presented to EAM. By registering the network output for every test input presented to EAM, and by comparing it to the desired output, we can calculate the network's performance (i.e., Percent Correct Classification, or PCC).

2.3 Gaussian ARTMAP (GAM)
Similar to FAM and EAM, the GAM architecture consists of three layers or fields of nodes (see Figure 2-5): the input layer ($F_1^a$), the category representation layer ($F_2^a$), and the output layer ($F_2^b$). The input layer of GAM is the layer where inputs are applied. An input pattern applied to $F_1^a$ is a vector $I$ (of dimensionality $M$) of the following form,

$$I = \mathbf{a} = (a_1, \ldots, a_M); \qquad 1 \le i \le M \qquad (2\text{-}12)$$

where $\mathbf{a}$ is a vector whose components lie in the interval $(-\infty, +\infty)$. Thus, layer $F_1^a$ contains $M$ nodes, one node for each component of the input pattern $I$. The index $i$ ($1 \le i \le M$) designates a generic node in layer $F_1^a$. The layer $F_2^a$ of GAM is referred to as the category representation layer, because this is where categories (or groups) of input patterns are formed. Finally, the output layer ($F_2^b$) is the layer that produces the outputs of the network. Every node in the output layer of GAM represents one of the labels of the pattern recognition task. The index $k$ ($1 \le k \le N_b$) designates a generic node in $F_2^b$; $N_b$ represents the number of different classes in the problem and is the highest index needed to represent all the labels of the pattern classification task at hand.

GAM stores the learned knowledge in its interconnection weights. There are two types of GAM weights: (a) the vector of weights $\mathbf{w}_j = (\boldsymbol{\mu}_j, \boldsymbol{\sigma}_j, n_j)$, called a template, whose components emanate from node $j$ in $F_2^a$ and converge to all the nodes in $F_1^a$. The vector $\boldsymbol{\mu}_j$ is an $M$-dimensional vector with components equal to the average values of the components of the input patterns that accessed and were encoded by category $j$. The vector $\boldsymbol{\sigma}_j$ is an $M$-dimensional vector with components equal to the standard deviations of the components of the input patterns that accessed and were encoded by category $j$. Finally, $n_j$ is a scalar equal to the number of input patterns that accessed and were encoded by category $j$ in GAM. (b) The vector of weights $\mathbf{W}_j^{ab} = (W_{j1}^{ab}, W_{j2}^{ab}, \ldots, W_{j,N_b}^{ab})$, emanating from every node $j$ in the $F_2^a$ layer of GAM and converging to all the nodes in $F_2^b$. In GAM, if all the components of $\mathbf{W}_j^{ab}$ are equal to 0 except component $W_{jk}^{ab}$, it is an indication that node $j$ in $F_2^a$ is mapped to label $k$ in $F_2^b$.

Contrary to FAM and EAM, GAM does not create enclosed structures (such as hyper-rectangles or hyper-ellipsoids) that contain within their boundaries all the input patterns that were encoded by these structures. In GAM a category is represented by a Gaussian (bell-shaped) curve, whose mean vector and standard deviation vector correspond to the mean and standard deviation vectors of all the input patterns that chose and were encoded by this category during GAM's training phase. Furthermore, every GAM category has another parameter, $n_j$, associated with it, which is equal to the number of patterns that were encoded by this category and, as such, defines how important this bell-shaped curve is in representing the input patterns that are presented to GAM. For instance, see Figure 2-8, where a few input patterns are shown (in one dimension) and the Gaussian curve that they define is depicted.

Figure 2-8: A 2D GAM category that encodes 5 patterns within 2 standard deviations

2.3.1 GAM Operations and Parameters
In the training phase of GAM, a list of input pattern/output label pairs, for example $\{(I^1, O(I^1)), \ldots, (I^r, O(I^r)), \ldots, (I^{PT}, O(I^{PT}))\}$, is repeatedly presented to GAM until GAM learns the required mapping. The task is considered done (i.e., learning is complete) when a user-defined maximum number of list presentations is reached. The performance phase of GAM works as follows: we have a list of input patterns, such as $\tilde{I}^1, \tilde{I}^2, \ldots, \tilde{I}^{PS}$, and we want to find the GAM output produced when each one of these test patterns is presented at the $F_1^a$ layer of GAM. To attain this goal we present the test list to the trained GAM architecture and observe the network's output (i.e., label).

The operation of GAM is affected by two network parameters, the initial standard deviation parameter $\gamma$ and the baseline vigilance parameter $\bar{\rho}$. When a GAM category encodes an input pattern for the first time, the standard deviation vector of the Gaussian curve that it defines has components that are all equal to the initial standard deviation parameter $\gamma$. Both $\gamma$ and $\bar{\rho}$ assume values in the interval $[0, 1]$, and both affect the number of nodes created in the category representation layer of GAM. Higher values of $\bar{\rho}$ create more nodes in the category representation layer of Gaussian ARTMAP and consequently produce less compression of the input patterns. There are two other network parameter values in GAM that are worth mentioning: the vigilance parameter $\rho$ and the number of nodes $N_a$ in the category representation layer of GAM. The vigilance parameter $\rho$ takes values in the interval $[\bar{\rho}, 1]$ and its initial value is set equal to $\bar{\rho}$. The number of nodes $N_a$ in the category representation layer of GAM corresponds to the number of committed nodes in GAM plus one uncommitted node. Prior to initiating the training phase of GAM, $\boldsymbol{\mu}_j$ is set to 0, $\boldsymbol{\sigma}_j$ is set equal to $\gamma$, and the inter-ART weights (the $W_{jk}^{ab}$'s) are chosen equal to 0. There are three major operations that take place during the presentation of a training input/output pair (e.g., $(I^r, O^r)$) to Gaussian ARTMAP.

Operation 1: Calculating the Category Choice Function (CCF) Value
The category (node) choice function value (i.e., $T_j(I)$) for every node (category) $j$ in $F_2^a$ is derived from the posterior probability

$$P(j \mid I) = \frac{P(I \mid j)\,P(j)}{P(I)} \qquad (2\text{-}13)$$

where $P(I \mid j)$ is the conditional density of $I$ given category $j$, which equals

$$P(I \mid j) = \frac{1}{(2\pi)^{M/2}\prod_{i=1}^{M}\sigma_{ji}}\exp\left(-\frac{1}{2}\sum_{i=1}^{M}\left(\frac{\mu_{ji} - I_i}{\sigma_{ji}}\right)^2\right) \qquad (2\text{-}14)$$

and the a priori probability of category $j$ is

$$P(j) = \frac{n_j}{\sum_{k=1}^{N} n_k} \qquad (2\text{-}15)$$

where $N$ is the number of categories in the system. Since $P(I)$ is the same for all categories it is ignored, and hence, after taking logarithms, the equation for $T_j(I)$ becomes

$$T_j(I) = \log\big(P(I \mid j)\,P(j)\big) = -\frac{1}{2}\sum_{i=1}^{M}\left(\frac{\mu_{ji} - I_i}{\sigma_{ji}}\right)^2 - \log\left((2\pi)^{M/2}\prod_{i=1}^{M}\sigma_{ji}\right) + \log P(j) \qquad (2\text{-}16)$$

After calculating the category choice function values, the node $J$ with the maximum CCF value is chosen.

Operation 2: Calculating the Category Match Function (CMF) value

The node $J$ with the largest CCF value is examined to determine whether it passes the vigilance criterion. A node $J$ (category) passes the vigilance criterion if its category (node) match function value (i.e., $\rho_J(I)$) exceeds the vigilance parameter value $\rho$, that is, if

$$\rho_J(I) = \log\left((2\pi)^{M/2}\prod_{i=1}^{M}\sigma_{Ji}\;P(I \mid J)\right) > \rho \qquad (2\text{-}17)$$

If the vigilance criterion is passed we proceed with Operation 3. Otherwise, node $J$ is disqualified and we find the next-in-sequence node in $F_2$ that maximizes the CCF value. Eventually we will end up with a node $J$ that maximizes the CCF value and satisfies the vigilance criterion.

Operation 3: Match Tracking Mechanism/Change of the Weights

This operation is implemented only after we have found a node $J$ that maximizes the CCF value among the remaining (still in the competition) $F_2$ nodes and passes the vigilance criterion. Operation 3 determines whether this node $J$ passes the prediction test. The prediction test checks if the inter-ART weight vector emanating from node $J$ (i.e., $W^b_J = (W^b_{J1}, W^b_{J2}, \ldots, W^b_{JN_b})$) matches exactly the desired output vector $O$ (if it does, this is
referred to as the node passing the prediction test). If the node does not pass the prediction test, the vigilance parameter $\rho$ is increased to the level of $\rho_J(I)$, node $J$ is disqualified, and the next-in-sequence node $J$ that maximizes the CCF value and passes the vigilance criterion is chosen (this action is referred to as the match tracking mechanism). If node $J$ does pass the prediction test, the weights $w_J$ in GAM are modified in a way that includes this pattern in the category. The algebraic equations that define the learning that $w_J$ undergoes are as follows:

$$n_J := n_J + 1 \qquad (2\text{-}18)$$

$$\mu_J := \left(1 - \frac{1}{n_J}\right)\mu_J + \frac{1}{n_J}\,I \qquad (2\text{-}19)$$

$$\sigma_{Ji} := \begin{cases}\sqrt{\left(1 - \dfrac{1}{n_J}\right)\sigma_{Ji}^2 + \dfrac{1}{n_J}\left(\mu_{Ji} - I_i\right)^2} & \text{if } n_J > 1\\[1ex] \gamma & \text{otherwise}\end{cases} \qquad (2\text{-}20)$$

GAM training is considered complete if and only if, after repeated presentations of all training input/output pairs to GAM, where Operations 1-3 are repeatedly applied for every input/output pair, we reach the maximum number of list presentations defined by the user. In the performance phase of GAM only Operations 1 and 2 are implemented for every input pattern presented to GAM. By registering the network output for every test input presented to GAM, and by comparing it to the desired output, we can calculate the network's performance (i.e., Percent Correct Classification or PCC).

2.4 Genetic Algorithms

One of the most widely used EA techniques is the Genetic Algorithm (GA). Experiments have shown that GAs are very powerful search techniques. They have been used successfully in many areas, such as optimization (Annie S. Wu, et al., 2004), scheduling (Annie Wu, Han Yu, et al., 2004), pattern recognition (Auwatanamongkol, S., 2000), robot control (Davidor
Y., 1990), and many others. The implementation of a GA follows the same steps as the general framework for EAs. A GA has the following components:

Population of Chromosomes: Chromosomes represent solutions to the problem at hand. Chromosomes can be encoded in many different ways, the most popular of which is binary encoding.
Fitness Function: Used to evaluate the chromosomes, where each chromosome's evaluation determines how close to, or far from, the optimum this solution is.
Selection Function: Used to select parent chromosomes (exploits partial solutions).
Genetic Operators: Exchange and modify knowledge amongst the selected parent chromosomes to create new offspring (explores the search space).

Chromosome Representation

Many different chromosome representation approaches have been introduced in the literature, of which the following are the most common (Mitchell T., 1997):

Binary: Uses a string of bits to represent the solution; each gene is represented by its equivalent binary code and all genes are then arranged in one string of bits.
Location Independent:
a. Messy: Uses a string of cells, where each cell consists of the location and the value of the bit.
b. Floating Representation: Uses building blocks that consist of a tag and a body, where the tag is a building block identifier and the body is the actual building block (a group of bits adjacent to the tag).
Redundant: Every building block is repeated many times in the chromosome.
Non-Coding: In this scheme a chromosome consists of building blocks along with a set of non-coding blocks.
Integer: Uses integers instead of binary numbers.
Floating Point: Uses real numbers instead of binary numbers.
Problem Specific: Uses structures that are problem dependent; it could use any of the above representations or a mixture of them.

All of the above representations have advantages and disadvantages (the interested reader can refer to Mitchell T., 1997, for more information about the pros and cons of each representation method).

Genetic Operators

The purpose of genetic operators is to explore new regions in the solution space by discovering new solutions that preserve some of the current knowledge. Two well-known operators are used in GAs:

a. Crossover: A method to exchange information among two (or more) parent chromosomes. In order to produce one (or more) offspring from the parents, it uses random numbers to select crossing points. Three different types of crossover are common:
One-point: Selects a crossing point at random from each parent and exchanges the subsections of both chromosomes after the crossover point.

Figure 2-9: One-point crossover (Parent a, Parent b)

Two-point/multi-point: Selects two (or more) random crossover points and exchanges subsections of both chromosomes (see the example in Figure 2-10). If more than two crossover points are chosen the same principle applies.

Figure 2-10: Two-point crossover (Parent a, Parent b)

Uniform: This method uses a pre-specified probability that each bit will be swapped, which is similar to using a mask of bits where 0 means no swap and 1 means swap, or vice versa.

Figure 2-11: Uniform crossover (Mask, Parent a, Parent b)

Each of the above types has advantages and disadvantages (for more information see Mitchell T., 1997).
b. Mutation: Mutation is a method of creating new offspring by modifying a parent. Binary mutation is quite simple: it is done by flipping a bit from 0 to 1, or the other way round, according to a specific probability. Floating point mutation can be accomplished in more than one way; one of them is to add a randomly selected number (for example, drawn from a Gaussian distribution) to a randomly selected gene. It is important to mention that a low mutation rate results in less exploration, while a high mutation rate could be disruptive.

The crossover and mutation operators have many variations reported in the literature (see T. Bäck, et al., 1997, and T. Mitchell, 1997). It is also important to mention that some other genetic operators, beyond crossover and mutation, have been used in the GA literature (for an example see Ling-Hsun, et al., 2002).

Selection

Selection is the process of choosing the parents from which new offspring are created. This process directs the search toward the promising areas of the solution surface and is usually based on the fitness of the individual chromosomes. Many selection methods have been introduced in the literature, some of which are reported below:
Fitness Proportional Selection: every individual is selected with a probability proportional to its fitness, so that its expected number of offspring is the ratio of its fitness to the average fitness of all individuals.
Stochastic Universal Sampling: spins a wheel with N equally spaced markers once, so that only one randomly generated number is needed to select all N parents.
Sigma Scaling: The fitness value is modified according to the equation $f' = f - (\bar{f} - c\,\sigma)$ (where $f$ is the original fitness value, $\bar{f}$ is the mean, $\sigma$ is the standard deviation, and $c$ is a constant), either to reduce large differences in the fitness values or to amplify small, unclear differences; hence, it maintains the selection pressure over the length of a run, thus minimizing the effects of convergence on reproductive selection.
Rank Selection: Ranks the individuals according to their fitness and then calculates the number of offspring based on the rank rather than the fitness.
Tournament Selection: Selects two individuals at random, and then generates another random number. If this random number is greater than a specific value, the individual with the higher fitness is chosen; otherwise the individual with the lower fitness is chosen. In this approach more than two individuals could be selected.
Elitism: Preserves the best performers from the current generation, and then uses another selection method to generate the rest of the individuals.

The interested reader can find more information about these and other selection strategies in J. H. Holland, 1975, Baker, J., 1987, and Mitchell T., 1997.
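Two of the strategies listed above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, fitness values are assumed to be non-negative, and the tournament is written with `p_fitter`, the probability of keeping the fitter individual, which is equivalent to the threshold formulation above with `p_fitter = 1 - threshold` when the random number is uniform on [0, 1).

```python
import random
from typing import List, Sequence


def stochastic_universal_sampling(fitness: Sequence[float], n: int,
                                  rng: random.Random) -> List[int]:
    """Select n parent indices with one spin of a wheel that has n equally spaced pointers."""
    total = sum(fitness)
    if n <= 0 or total <= 0 or any(f < 0 for f in fitness):
        raise ValueError("need n > 0 and non-negative fitness values with a positive sum")
    spacing = total / n
    start = rng.random() * spacing  # the single random number, in [0, spacing)
    selected: List[int] = []
    i, cumulative = 0, fitness[0]
    for k in range(n):
        pointer = start + k * spacing
        # Advance to the first individual whose cumulative fitness exceeds the pointer.
        while pointer >= cumulative and i < len(fitness) - 1:
            i += 1
            cumulative += fitness[i]
        selected.append(i)
    return selected


def binary_tournament(fitness: Sequence[float], rng: random.Random,
                      p_fitter: float = 0.75) -> int:
    """Pick two distinct individuals; keep the fitter one with probability p_fitter."""
    a, b = rng.sample(range(len(fitness)), 2)
    fitter, weaker = (a, b) if fitness[a] >= fitness[b] else (b, a)
    return fitter if rng.random() < p_fitter else weaker
```

With equal fitness values, stochastic universal sampling selects every individual exactly once, which is the property that distinguishes it from repeated independent roulette-wheel spins.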

3. GENETIC FUZZY ARTMAP (GFAM)

GFAM (Genetic Fuzzy ARTMAP) is an evolved FAM network that is produced by repeatedly applying genetic operators to an initial population of trained FAM networks. GFAM uses tournament selection with elitism, as well as genetic operators, including crossover and mutation. In addition, GFAM uses two special operators, $Cat_{add}$ and $Cat_{del}$. To better understand how GFAM is designed we resort to a step-by-step description of this design. It is instructive, though, to first introduce some terminology that is included in Appendix A. The design of GFAM can be articulated through a sequence of steps, defined succinctly below and explained in detail later.

Step 1: Initialize $Pop_{size}$ FAM networks, each one of them operating with a different value for the baseline vigilance parameter $\bar{\rho}$ and a different order of training pattern presentation.
Step 2: Train each one of the $Pop_{size}$ initialized FAM networks, using the training set, for a maximum number of iterations.
Step 3: Convert the $Pop_{size}$ trained FAM networks into chromosomes. Crop all chromosomes so that no one-point categories exist.
Step 4: Evolve the chromosomes of the current generation by executing the following sub-steps:
Sub-Step 4a: Calculate the fitness for all chromosomes of the current generation.
Sub-Step 4b: Initialize an empty generation (referred to as the temporary generation).
Sub-Step 4c: Move the $NC_{best}$ best chromosomes from the current generation to the temporary generation.
Sub-Step 4d: Select chromosomes for crossover from the current generation to populate the remainder of the temporary generation.
Sub-Step 4e: With probability $P(Cat_{add})$ apply the $Cat_{add}$ operator to every individual generated in Sub-Step 4d.
Sub-Step 4f: With probability $P(Cat_{del})$ apply the $Cat_{del}$ operator to every individual generated in Sub-Step 4e.
Sub-Step 4g: With probability $P(mut)$ apply the mutation operator to every individual generated in Sub-Step 4f.
Sub-Step 4h: Replace the current generation with the members of the temporary generation.
Step 5: If evolution has reached the maximum number of iterations, $Gen_{max}$, then calculate the performance of the best-fitness FAM network on the test set and report the classification accuracy and the number of categories that this best-fitness FAM network possesses. If the maximum number of iterations has not been reached yet, go to Step 4 to evolve one more population of chromosomes.

Each one of the aforementioned steps of the algorithm is now described in more detail, as needed.

Step 1 (More Details): The algorithm starts by training $Pop_{size}$ FAM networks, each one of them trained with a different value of the baseline vigilance parameter $\bar{\rho}$. In particular, we first define

$$\rho_{inc} = \frac{\rho_{max} - \rho_{min}}{Pop_{size} - 1}$$

and then the baseline vigilance parameter of every network is determined by the equation $\rho_{min} + i \cdot \rho_{inc}$, where $i \in \{0, 1, \ldots, Pop_{size} - 1\}$. In our experiments with GFAM we chose $\rho_{min} = 0.1$ and $\rho_{max} = \ldots$. Meanwhile, GFAM allows the user to change the order of pattern presentation automatically and randomly.

Step 2 (More Details): We assume that the reader is familiar with how training of FAM networks is accomplished, and thus the details are omitted here.
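The initialization of Step 1 (one baseline vigilance per network, spaced by $\rho_{inc}$, plus a random presentation order) can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function and argument names are hypothetical, the presentation order is a plain random shuffle of pattern indices, and $\rho_{max}$ is passed in as a parameter, so the values used in the check are only illustrative.

```python
import random
from typing import List, Tuple


def initial_population_settings(pop_size: int, rho_min: float, rho_max: float,
                                n_train: int, seed: int = 0) -> List[Tuple[float, List[int]]]:
    """Return (baseline vigilance, presentation order) for each of the pop_size FAMs.

    rho_inc = (rho_max - rho_min) / (pop_size - 1), and network i gets
    rho_min + i * rho_inc for i = 0, 1, ..., pop_size - 1.
    """
    if pop_size < 2:
        raise ValueError("pop_size must be at least 2 for rho_inc to be defined")
    rho_inc = (rho_max - rho_min) / (pop_size - 1)
    rng = random.Random(seed)
    settings = []
    for i in range(pop_size):
        order = list(range(n_train))  # indices of the training patterns
        rng.shuffle(order)            # a different random presentation order per network
        settings.append((rho_min + i * rho_inc, order))
    return settings
```

Each (vigilance, order) pair would then be handed to the FAM training routine of Step 2, which is not shown here.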

Step 3 (More Details): Once the $Pop_{size}$ networks are trained, they need to be converted to chromosomes so that they can be manipulated by the genetic operators. GFAM uses a real number representation to encode the networks. Each FAM chromosome consists of two levels: level 1 contains all the categories of the FAM network, and level 2 contains the lower and upper endpoints of every category in level 1, as well as the label of that category (see Figure 3-1). We denote the $j$-th category of the trained FAM network with index $p$ ($1 \le p \le Pop_{size}$) by $w_j(p)$, where $w_j(p) = (u_j(p), (v_j(p))^c)$, and the label of this category by $l_j(p)$, for $1 \le j \le N(p)$.

Figure 3-1: GFAM chromosome structure (level 1: categories $w_1(p), w_2(p), \ldots, w_{N(p)}(p)$ of chromosome $p$; level 2: $u_j(p)$, $v_j(p)$ and $l_j(p)$ of each category)

In this step, we eliminate all single-point categories in the trained FAM networks, which we refer to as cropping the chromosomes. Since our ultimate objective is to design a FAM network that reduces the network size and improves generalization, we discourage the creation of single-point categories at this stage. Our experiments have shown that cropping single-point categories is beneficial because it speeds up the convergence of the GA.

Step 4 (More Details): In this step GFAM applies a GA to the population of trained FAMs.

Sub-step 4a (More Details): Calculate the fitness of each chromosome (trained FAM). This is accomplished by feeding the validation set into each trained FAM and by calculating the percentage of correct classification exhibited by each one of these trained FAM networks. In particular, if $PCC(p)$ designates the percentage of correct classification exhibited by the $p$-th
FAM, and this FAM network possesses $N(p)$ nodes in its category representation layer, then its fitness function value is defined by:

$$Fit(p) = \frac{PCC(p)^2\,\bigl(Cat_{max} - N(p)\bigr)}{100 - PCC(p) + \varepsilon\,\dfrac{N(p)}{Cat_{min}}} \qquad (3\text{-}1)$$

where $Cat_{min}$ and $Cat_{max}$ are the minimum and maximum number of categories that a FAM network is allowed to have during the evolutionary process ($Cat_{min}$ is chosen equal to 1, or equal to the number of classes in the classification problem under consideration, while $Cat_{max}$ is chosen to be a relatively large number for the classification problem at hand). The constant $\varepsilon$ in the denominator of the above equation is a small positive constant, and it is needed to make sure that the denominator is not zero in the case when $N(p) = Cat_{min}$ and $PCC(p) = 100$.

Sub-step 4b (More Details): Obvious; no further explanation is needed.

Sub-step 4c (More Details): The algorithm searches for the $NC_{best}$ best chromosomes of the current generation and copies them to the temporary generation.

Sub-step 4d (More Details): The remaining $Pop_{size} - NC_{best}$ chromosomes in the temporary generation are created by crossing over pairs of parents from the current generation. The parents are chosen using deterministic tournament selection, as follows: randomly select two groups of four chromosomes each from the current generation, and use as a parent, from each group, the chromosome with the best fitness value in the group. If the same chromosome happens to be chosen from both groups, then we choose from one of the groups the chromosome with the second best fitness value. If two parents with indices $p$ and $p'$ are crossed over, two random numbers $n$ and $n'$ are generated from the index sets $\{1, 2, \ldots, N(p)\}$ and $\{1, 2, \ldots, N(p')\}$, respectively. Then, all the categories with index greater than $n'$ in the chromosome with index $p'$ and all the categories with index less
than $n$ in the chromosome with index $p$ are moved into an empty chromosome within the temporary generation. Notice that crossover is done on level 1 of the chromosome. This operation is pictorially illustrated in Figure 3-2.

Figure 3-2: Crossover implementation (parents $p$ and $p'$ with five categories each; the offspring consists of $w_1(p), w_2(p), w_4(p'), w_5(p')$)

Sub-step 4e (More Details): The operator $Cat_{add}$ adds a new category to every chromosome created in Sub-step 4d with probability $P(Cat_{add})$. The new category has lower and upper endpoints $u, v$ that are randomly generated as follows: for every dimension of the input feature space ($M$ dimensions in total) we generate two random numbers uniformly distributed in the interval [0, 1]; the smaller of the two random numbers is associated with the $u$ coordinate along this dimension, while the larger of these numbers is associated with the $v$ coordinate along this dimension. The label of this newly created category is chosen randomly amongst the $N_b$ classes of the pattern classification task under consideration. A chromosome does not add a category if the addition of this category causes the number of categories of this chromosome to exceed the designated maximum number of categories, $Cat_{max}$.

Sub-step 4f (More Details): The operator $Cat_{del}$ deletes one of the categories of every chromosome created in Sub-step 4e with probability $P(Cat_{del})$. A chromosome does not delete
a category if the deletion of this category results in the number of categories of this chromosome falling below the designated minimum number of categories, $Cat_{min}$.

Sub-step 4g (More Details): In GFAM, every chromosome created by Sub-step 4f gets mutated as follows: with probability $P(mut)$ every category is mutated. If a category is chosen to be mutated, either its $u$ or its $v$ endpoint is selected randomly (50% probability), and then every component of this selected vector gets mutated by adding to it a small number. This number is drawn from a Gaussian distribution with mean 0 and standard deviation …; if a component of the chosen vector becomes smaller than 0 or greater than 1 (after mutation), it is set back to 0 or 1, respectively. Notice that mutation is applied to level 2 of the chromosome structure. The label of a category is not mutated because our initial GA population consists of trained FAMs, and consequently we have a lot of confidence in the labels of the categories that these trained FAMs have discovered through the FAM training process.

Sub-step 4h (More Details): Obvious; no more details are needed.

Step 5 (More Details): Obvious; no more details are needed.

3.1 Justification of the Evolutionary Choices for GFAM

Justification of the Fitness Function Choice for GFAM

We have chosen to use a fitness function that is provided by equation 3-1:

$$Fit(p) = \frac{PCC(p)^2\,\bigl(Cat_{max} - N(p)\bigr)}{100 - PCC(p) + \varepsilon\,\dfrac{N(p)}{Cat_{min}}}$$

As a reminder, $Cat_{max}$ is the maximum number of categories that an evolved FAM can have, and $Cat_{min}$ is the minimum number of categories that an evolved FAM can have. The parameter $Cat_{min}$ is chosen to be equal to 1. It seems that a more natural choice would have been to choose $Cat_{min}$ equal to the number of different classes in the problem at hand;
however, this choice, although it makes GFAM converge faster to a solution, occasionally compromises generalization. The parameter $PCC(p)$ is the percentage of correct classification of an evolved FAM on the validation set, and $N(p)$ is the actual number of categories of the evolved FAM. Finally, $\varepsilon$ is a small positive number.

The chosen fitness function has a number of good properties. First, it depends on both measures of performance: the size of the FAM network and its accuracy on the validation set. It depends on accuracy in such a way that higher accuracy leads to larger fitness values, as Figures 3-3a and 3-3b to 3-3e demonstrate. It depends on size in such a way that smaller size leads to larger fitness values, everything else kept fixed, as Figures 3-3a and 3-3f to 3-3i demonstrate. Note that if the size decreases, the numerator of the fitness increases and the denominator decreases, provided that everything else is kept fixed. Similarly, if the accuracy increases, the numerator increases and the denominator decreases, provided that everything else is kept fixed. It is also worth noting that when the size is equal to the minimum size and the accuracy is equal to the highest accuracy, the denominator of the fitness function practically approaches zero and the fitness function assumes a very high value, as plots 3-3b through 3-3e illustrate. Hence, the fitness function shows a strong preference towards the creation of minimum-size, highest-accuracy networks, as it should. Finally, in addition to the 2-d plots in Figures 3-3b to 3-3e (plots of fitness versus accuracy for networks of different size) and the 2-d plots in Figures 3-3f to 3-3i (plots of fitness versus size for networks of different accuracy), a 3-d plot that shows the fitness values as both size and accuracy change is provided in Figure 3-3a.

Figure 3-3a: 3D plot of log(Fit(p))
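The qualitative claims about the fitness function (larger with higher accuracy, larger with fewer categories, and very large at minimum size with 100% accuracy) can be checked numerically with a small Python sketch. The sketch assumes that equation 3-1 reads Fit(p) = PCC(p)² (Cat_max − N(p)) / (100 − PCC(p) + ε N(p)/Cat_min); that reading of the equation, the function name, and the default ε = 0.001 are assumptions made for illustration, not values taken from the experiments.

```python
def gfam_fitness(pcc: float, n_cat: int, cat_min: int, cat_max: int,
                 eps: float = 1e-3) -> float:
    """Fitness of one evolved FAM under the assumed form of equation 3-1:

        Fit = PCC^2 * (Cat_max - N) / (100 - PCC + eps * N / Cat_min)
    """
    if not 0.0 <= pcc <= 100.0:
        raise ValueError("pcc is a percentage in [0, 100]")
    if not 1 <= cat_min <= n_cat <= cat_max:
        raise ValueError("need 1 <= cat_min <= n_cat <= cat_max")
    # Numerator grows with accuracy and with the number of unused categories;
    # the denominator shrinks towards eps as PCC -> 100 and N -> Cat_min.
    return pcc ** 2 * (cat_max - n_cat) / (100.0 - pcc + eps * n_cat / cat_min)
```

For example, with Cat_min = 1 and an illustrative Cat_max = 300, a 1-category network at 100% accuracy scores about 3 × 10⁹, while the same accuracy with 2 categories scores roughly half of that, which is the strong preference described above.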

Figure 3-3b: CATmin = 2 (log(Fit(p)) versus PCC(p), one curve per network size N(p))
Figure 3-3c: CATmin = 4 (log(Fit(p)) versus PCC(p), one curve per network size N(p))

Figure 3-3d: CATmin = 6 (log(Fit(p)) versus PCC(p), one curve per network size N(p))
Figure 3-3e: log(Fit(p)) versus PCC(p), one curve per network size N(p)
Figures 3-3f (CATmin = 2), 3-3g, 3-3h (CATmin = 8) and 3-3i: log(Fit(p)) versus N(p), one curve per accuracy PCC(p) = 95, 85, 75, 55, 35

Justification of the Genetic Operators Choices for GFAM

In the previous section, we introduced a number of typical genetic operators, such as mutation and crossover. We also introduced two genetic operators, $Cat_{add}$ and $Cat_{del}$, that are more pertinent to the type of problem on which we are focusing (co-optimizing the
number of categories and the generalization performance of the GFAM network that the evolution of the trained FAM architectures produces). These operators were explained in the previous section. In this section, we focus on justifying why these special genetic operators ($Cat_{add}$ and $Cat_{del}$) are needed for the evolution of FAM architectures. We also provide good default values for the $P(Cat_{add})$, $P(Cat_{del})$ and $P(mut)$ probabilities.

Our approach can be summarized as follows. We have chosen three classification problems to work with, which are described below (we refer to these problems as Problems 1, 2, and 3). For each one of these problems we generated a number of trained FAM architectures and we evolved these architectures for a number of generations. In particular, we generated 20 trained FAMs and evolved them for 500 generations (Experiment 1), we also generated 40 trained FAMs and evolved them for 250 generations (Experiment 2), and we finally generated 100 trained FAMs and evolved them for 100 generations (Experiment 3). Hence, in all of these experiments the product of the number of trained FAMs and the number of generations for which these trained FAMs were evolved was constant (i.e., $Pop_{size} \times Gen_{max} = 10{,}000$). Note that each one of Experiments 1, 2, and 3 was run 50 times with a different random seed each time, and the average fitness value of the best trained FAM network over these 50 runs was reported. For each (problem, experiment) pair, we used three different values for each of the probabilities $P(mut)$, $P(Cat_{add})$ and $P(Cat_{del})$. More specifically, for the mutation probability we used the following three values: 0, $1/N$, and $\min(5/N, 1)$, where $N$ represents the number of categories that the FAM network possesses. The mutation probability of $5/N$ seems to be large, especially when $N$ is small, but the mutation effect on the categories is fairly small. This is due to the fact that 95% of the random numbers, indicating how much a category should be mutated, lie within an interval whose endpoints are 0.01 away from zero.
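The GFAM-specific steps described in Sub-steps 4d to 4g (deterministic tournament over two groups of four, level-1 crossover, $Cat_{add}$, $Cat_{del}$ and level-2 mutation) can be sketched as follows. This is an illustrative Python sketch, not the dissertation's implementation: the `Category` container, the function names, 0-based class labels, the fallback used when crossover would produce an empty chromosome, and the mutation standard deviation `sigma` are all assumptions.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Category:
    u: List[float]  # lower endpoint (level 2)
    v: List[float]  # upper endpoint (level 2)
    label: int      # class label; never mutated


Chromosome = List[Category]  # level 1: the ordered list of categories


def _copy(c: Category) -> Category:
    return Category(list(c.u), list(c.v), c.label)


def select_parents(fitness: Sequence[float], rng: random.Random,
                   group_size: int = 4) -> Tuple[int, int]:
    """Deterministic tournament: the best member of each of two random groups."""
    ranked = [sorted(rng.sample(range(len(fitness)), group_size),
                     key=lambda k: fitness[k], reverse=True) for _ in range(2)]
    first, second = ranked[0][0], ranked[1][0]
    if first == second:  # same winner twice: use the runner-up of the second group
        second = ranked[1][1]
    return first, second


def crossover(p1: Chromosome, p2: Chromosome, rng: random.Random) -> Chromosome:
    """Level-1 crossover: categories of p1 with index < n, then those of p2 with index > n'."""
    n = rng.randint(1, len(p1))   # 1-based crossing point in p1
    n2 = rng.randint(1, len(p2))  # 1-based crossing point in p2
    child = [_copy(c) for c in p1[:n - 1]] + [_copy(c) for c in p2[n2:]]
    return child if child else [_copy(c) for c in p1]  # assumption: avoid an empty chromosome


def cat_add(chrom: Chromosome, p_add: float, dim: int, n_classes: int,
            cat_max: int, rng: random.Random) -> None:
    """With probability p_add append a random box, unless that would exceed cat_max."""
    if rng.random() < p_add and len(chrom) < cat_max:
        pairs = [sorted((rng.random(), rng.random())) for _ in range(dim)]
        chrom.append(Category([lo for lo, _ in pairs], [hi for _, hi in pairs],
                              rng.randrange(n_classes)))


def cat_del(chrom: Chromosome, p_del: float, cat_min: int, rng: random.Random) -> None:
    """With probability p_del delete a random category, unless that would go below cat_min."""
    if rng.random() < p_del and len(chrom) > cat_min:
        del chrom[rng.randrange(len(chrom))]


def mutate(chrom: Chromosome, p_mut: float, sigma: float, rng: random.Random) -> None:
    """Level-2 mutation: perturb u or v (50/50) of each selected category, clipped to [0, 1]."""
    for c in chrom:
        if rng.random() < p_mut:
            vec = c.u if rng.random() < 0.5 else c.v
            for i in range(len(vec)):
                vec[i] = min(1.0, max(0.0, vec[i] + rng.gauss(0.0, sigma)))
```

In the setting eventually chosen for GFAM, a chromosome with N categories would use `p_mut = min(5 / N, 1)` and `p_add = p_del = 0.1`.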
For $P(Cat_{add})$, we used the following
three values: 0, 0.1 and 0.3. For $P(Cat_{del})$, we used the following three values: 0, 0.1, and 0.3. Obviously, if $P(Cat_{del}) = 0$ then the $Cat_{del}$ operator is not used in the evolution of FAMs. Similarly, if $P(Cat_{add}) = 0$ then the $Cat_{add}$ operator is not used in the evolution of FAMs. The values used for the mutation probability, category add probability and category delete probability are depicted in Table 3-1. In Table 3-2, we depict (in tabular form) the number of simulation runs, the number of generations per simulation run, the number of FAMs in the initial population of each of the simulation runs, and the number of combinations of probabilities for mutation, category delete and category add that were tested for each one of the simulation runs. Note that for each set of $P(mut)$, $P(Cat_{add})$ and $P(Cat_{del})$ values, we performed 50 simulation runs for (a) $Gen_{max} = 500$, $Pop_{size} = 20$, (b) $Gen_{max} = 250$, $Pop_{size} = 40$, and (c) $Gen_{max} = 100$, $Pop_{size} = 100$. Hence, we evolved the trained FAM networks a total of 1,350 (= 50 × 27) times for each one of the experiments mentioned above, and 4,050 times for each one of the problems mentioned below.

Table 3-1: The values of the probabilities for mutation, category add, and category delete used in the experiments to determine good values for the GA parameters

Value          P(mut)   P(Cat_add)   P(Cat_del)
Not Selected   0        0            0
Low Level      1/N      0.1          0.1
High Level     5/N      0.3          0.3

Note: these best values are based on previous experiments

Table 3-2: For each problem (database) we ran 3 experiments. For each experiment we used the depicted combinations of number of generations and population size (3 combinations). We evolved the trained Fuzzy ARTMAPs 50 different times (50 random seeds), and for each time we used the combinations of probability values shown in Table 3-1. Hence, the FAMs were evolved 4,050 times for each problem, or a total of 12,150 times for all the problems.

               Gen_max   Pop_size   # Random Seeds   # of (P(mut), P(Cat_add), P(Cat_del)) combinations   # of Runs
Experiment 1   500       20         50               27                                                    27 × 50
Experiment 2   250       40         50               27                                                    27 × 50
Experiment 3   100       100        50               27                                                    27 × 50
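The bookkeeping behind Tables 3-1 and 3-2 can be reproduced with a few lines of Python, which makes the 27 probability combinations and the run totals explicit. The variable names are illustrative; the mutation probabilities are kept as labels because they depend on the number of categories N of each chromosome.

```python
from itertools import product

P_MUT = ["0", "1/N", "min(5/N, 1)"]               # evaluated per chromosome during evolution
P_ADD = [0.0, 0.1, 0.3]
P_DEL = [0.0, 0.1, 0.3]
EXPERIMENTS = [(500, 20), (250, 40), (100, 100)]  # (Gen_max, Pop_size)
SEEDS = 50
PROBLEMS = 3

combos = list(product(P_MUT, P_ADD, P_DEL))       # every (P(mut), P(Cat_add), P(Cat_del)) triple
runs_per_experiment = len(combos) * SEEDS
runs_per_problem = runs_per_experiment * len(EXPERIMENTS)
runs_total = runs_per_problem * PROBLEMS
```

Every (Gen_max, Pop_size) pair multiplies to the same 10,000 evaluations budget, which is what makes the three experiments comparable.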

The three problems that we chose to experiment with are:

Problem 1 (Four squares in a square problem): In this problem we have four squares (smaller squares) symmetrically located within a square (larger square), as Figure 3-4a demonstrates. The probability that a data-point falls inside each one of the smaller squares, or inside the larger square but outside the smaller squares, is equal to 0.2. Once a point is chosen to lie within a region, its location within the region is chosen according to a uniform distribution.

Problem 2 (Varying size squares within a square problem): In this problem we have seven squares, all enclosed within the square [0, 1] × [0, 1], as Figure 3-4b demonstrates. Some of the smaller squares completely or partially overlap with the bigger squares. The probability that a point falls in any of the smaller squares is 1/7. The probability that a point falls in any of the bigger squares, but not in the region defined by the overlapping smaller squares, is also 1/7. Once a point is chosen to lie within a region, its location within the region is chosen according to a uniform distribution.

Problem 3 (Two circles in a square problem): In this problem we have two circles (of different size) within the square [0, 1] × [0, 1], as Figure 3-4c demonstrates. The probability that a data-point falls inside the small circle, inside the large circle, or inside the square but outside the circles is equal to 0.2, 0.3, and 0.5, respectively. Once a point is chosen to lie within a region, its location within the region is chosen according to a uniform distribution.

Figure 3-4: a: Problem 1 (Four squares in a square problem), b: Problem 2 (Asymmetric squares within a square problem), c: Problem 3 (Two circles in a square problem)
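Problems 1 to 3 share one sampling recipe: first pick a region with the stated probability, then place the point uniformly inside that region. A minimal Python sketch for Problem 3 follows, using rejection sampling on the unit square. The circle centres and radii are hypothetical (the text gives only the region probabilities 0.2, 0.3 and 0.5), and the same function would serve Problems 1 and 2 once `REGIONS` and `PROBS` describe their squares instead.

```python
import random
from typing import Callable, List, Tuple

Region = Callable[[float, float], bool]


def in_circle(cx: float, cy: float, r: float) -> Region:
    return lambda x, y: (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2


# Hypothetical geometry: the text does not give the centres or radii.
SMALL = in_circle(0.25, 0.25, 0.15)
LARGE = in_circle(0.65, 0.65, 0.25)
REGIONS: List[Region] = [SMALL, LARGE, lambda x, y: not SMALL(x, y) and not LARGE(x, y)]
PROBS = [0.2, 0.3, 0.5]  # small circle, large circle, rest of the unit square


def sample_problem3(n: int, rng: random.Random) -> List[Tuple[float, float, int]]:
    """Draw n labelled points: pick a region by PROBS, then a uniform point inside it."""
    data = []
    for _ in range(n):
        label = rng.choices(range(len(REGIONS)), weights=PROBS)[0]
        while True:  # rejection sampling yields a uniform point within the chosen region
            x, y = rng.random(), rng.random()
            if REGIONS[label](x, y):
                break
        data.append((x, y, label))
    return data
```

Because the class is drawn first, the class frequencies match the stated probabilities regardless of how large each region is, which is exactly what decouples class priors from region areas in these problems.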

These problems, without necessarily being representative of the type of classification problems that we encounter in real-world applications, have several advantages: they deal with 2-D data (for which visualization of the results is easy); they are multi-class classification problems where the number of classes ranges from 3 (Problem 3) to 5 (Problem 1) to 7 (Problem 2); they correspond to problems that have symmetry (Problem 1) or not (Problems 2 and 3); they are problems for which the class boundaries have different structures (Problems 1 and 2 have classes with rectangular boundaries, while Problem 3 has classes with circular boundaries); they are problems where one class's boundary is completely or partially included within another class's boundary (thus making the problems non-trivial); and finally they correspond to problems whose optimal decision boundaries and classification accuracy are known.

In Figure 3-5, we show the average fitness value (averaged over the 50 runs) of the best FAM network produced by GFAM for Problem 1. The vertical axis in Figure 3-5 shows the average fitness value, while the horizontal axis has discrete ticks that correspond to all possible combinations of the $P(Cat_{add})$ and $P(Cat_{del})$ probabilities that we chose to experiment with (e.g., one of the ticks is 0.0, 0.1, indicating $P(Cat_{add})$ and $P(Cat_{del})$ probabilities equal to 0.0 and 0.1, respectively). Furthermore, there are three curves depicted in Figure 3-5 (with different colors and markers) that correspond to the three different values of the mutation probability $P(mut)$. In a similar fashion, in Figure 3-6, we depict the average fitness value (averaged over 50 runs) of the best FAM network produced by GFAM for Problem 2. The results in Figure 3-6 are presented in the same way as those in Figure 3-5. Finally, in Figure 3-7 we show the average fitness value (averaged over the 50 runs) of the best FAM network produced by GFAM for Problem 3, again presented in the same way as in Figures 3-5 and 3-6.

Two obvious observations that can be extracted from the three figures (Figures 3-5, 3-6 and 3-7) are: (a) a zero value of $P(mut)$ is a non-optimal choice, and (b) zero values for both $P(Cat_{add})$ and $P(Cat_{del})$ are also a non-optimal choice. If we exclude the zero mutation probability and compute the average of the fitness values for the remaining two competing mutation probabilities ($1/N$ and $5/N$), we end up with the curves depicted in Figure 3-8. From Figure 3-8, it is obvious that a $P(mut)$ of $5/N$ gave better results for the following combinations of $P(Cat_{add})$ and $P(Cat_{del})$: (0, 0), (0, 0.1), (0, 0.3), (0.1, 0), (0.1, 0.1). On the other hand, a $P(mut)$ of $1/N$ gave better results for the following combinations of $P(Cat_{add})$ and $P(Cat_{del})$: (0.1, 0.3), (0.3, 0), (0.3, 0.1), (0.3, 0.3). However, both of these mutation probabilities produced good results when $P(Cat_{add})$ and $P(Cat_{del})$ were both equal to 0.1. Hence, for our future experiments with GFAM we chose to experiment only with values of $P(mut)$, $P(Cat_{add})$ and $P(Cat_{del})$ equal to $5/N$, 0.1 and 0.1, respectively.

Figure 3-5: Average fitness value of the best FAM produced by GFAM for Problem 1. The average is computed over the 50 runs. The average fitness values are shown with respect to all pairs of category add and category delete probabilities. The different colored curves correspond to the three different values of the mutation probability.

Figure 3-6: Average fitness value of the best FAM produced by GFAM for Problem 2. The average is computed over the 50 runs. The average fitness values are shown with respect to all pairs of category add and category delete probabilities. The different colored curves correspond to the three different values of the mutation probability.

Figure 3-7: Average fitness value of the best FAM produced by GFAM for Problem 3. The average is computed over the 50 runs. The average fitness values are shown for all pairs of category add and category delete probabilities. The different colored curves correspond to the three different values of the mutation probability.

Figure 3-8: Average fitness value of the best FAM produced by GFAM for Problems 1, 2 and 3. The average is computed over the 50 runs. The average fitness values are shown for all pairs of category add and category delete probabilities. The different colored curves correspond to the two different non-zero values of the mutation probability.

3.2 Experiments with GFAM

We have performed a number of experiments with GFAM. The purpose of these experiments was two-fold: first, to examine the performance of GFAM on a variety of classification problems (some of them simulated, some of them real) with respect to the resulting network accuracy (generalization performance) and the resulting network size; and second, to compare the performance of GFAM with other ART network classifiers that have been proposed in the literature with the intent of addressing the category proliferation problem in FAM. We undertake the first task (performance of GFAM) in this section and the second task (comparisons of GFAM and other ART classifiers) in the following section.

Databases

We experimented with both artificial and real databases. The specifics of these databases are given in Table ….

1. Gaussian Databases: These are artificial databases, where we created 2-dimensional, Gaussian-distributed data sets belonging to 2-class, 4-class, and 6-class problems. In each one of these databases we varied the amount of overlap between data belonging to different classes. In particular, we considered 5%, 15%, 25%, and 40% overlap. Note that 5% overlap means that the optimal Bayesian classifier would have a 5% misclassification rate on the Gaussian-distributed data. There are a total of 3 × 4 = 12 Gaussian databases. We name the databases G#c-##, where the first number is the number of classes and the second number is the class overlap. For example, G2c-05 denotes the 2-class Gaussian database with 5% overlap.
2. Structures within Structure databases: These are artificial databases inspired by the circle (structure) in the square (structure) problem, which has been extensively examined in both the ART and the non-ART neural network literature. Eight different datasets were generated by changing the structures (type, number and probability) that we were dealing with. The data points within each structure of these artificial datasets are uniformly distributed within the structure. The number of points within each structure is chosen so that the probability of finding a point within this structure equals a pre-specified number.
4Ci/Sq: This is the four-circles-in-a-square problem, a five-class classification problem. The probability of finding a data point within any given circle, or inside the square and outside the circles, is equal to 1/5.
4Sq/Sq: This is the four-squares (inside squares) in-a-square (outside square) problem, a five-class classification problem. The probability of finding a data point within any given inside square, or outside the inside squares and inside the outside square, is equal to 1/5.

7Sq: This is a seven-class problem, with four squares and three rectangle-like shapes. The probability of finding a data point within any of the seven shapes is equal to 1/7.
1Ci/Sq: This is the one-circle-in-a-square problem, a two-class classification problem. The probability of finding a data point within the circle, or inside the square and outside the circle, is equal to 1/2. The area inside the circle equals the area outside the circle and inside the square. This is the benchmark circle-in-the-square problem.
1Ci/Sq/0.3:0.7: This is a one-circle-in-a-square problem, a two-class classification problem. The probability of finding a data point within the circle, or inside the square and outside the circle, is equal to 0.3 and 0.7, respectively. The areas inside the circle, and outside the circle but inside the square, are 0.3 and 0.7, respectively.
5Ci/Sq: This is the five-concentric-circles-in-a-square problem, a six-class classification problem. The probability of finding a data point within each one of the concentric circles, or inside the square and outside the circles, is equal to 1/6.
2Ci/Sq/5:25:70: This is a two-circles-in-a-square problem, a three-class classification problem. One of the circles is smaller than the other. The probability of finding a data point within the small circle, within the large circle, and outside the circles but inside the square is 0.05, 0.25, and 0.7, respectively.
2Ci/Sq/20:30:50: This is a two-circles-in-a-square problem, a three-class classification problem. One of the circles is smaller than the other. The probability of finding a data point within the small circle, within the large circle, and outside the circles but inside the square is 0.2, 0.3, and 0.5, respectively.
Figures 3-9 and 3-10 show plots of the simulated databases.

3. Modified Iris Database (MOD-IRIS): In this database we started from the three-class IRIS dataset (Hettich et al. [16]). We eliminated the data corresponding to the class that is linearly separable from the others, leaving 100 data points. Of the four input attributes of the IRIS dataset we kept only two (attributes 3 and 4), because they appear to have enough discriminatory power to separate the 2-class data. Finally, in order to create a reasonably sized dataset from these 100 points (so that we can reliably perform cross-validation to identify the optimal ART and GFAM networks), we created noisy data around each one of these 100 data points (the noise was Gaussian with zero mean and small variance), ending up with approximately 10,000 points. We named this database Modified Iris.
4. Modified Abalone Database (ABALONE): This database was originally used to predict the age of an abalone (Hettich et al. [16]). It contains 4177 instances, each with 7 numerical attributes, 1 categorical attribute, and 1 numerical target output (age). We discarded the categorical attribute in our experiments, and grouped the target output values into 3 classes: 8 and lower (class 1), 9-10 (class 2), and 11 and greater (class 3). This grouping of output values has been reported in the literature before.
5. Page Blocks Database (PAGE): This database represents the problem of classifying the blocks of the page layout in a document (Hettich et al. [16]). It contains 5473 examples coming from 54 distinct documents. Each example has 10 numerical attributes (e.g., height, length, and eccentricity of the block) and one target (output) attribute, representing the type of the block (text, horizontal line, graphic, vertical line, or picture). A noteworthy point about this database is that its major class (text) has a high probability of occurrence (above 80%). The dataset has five classes, four of which make up only 9% of the total instances.
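Some of the database constructions above are simple enough to sketch in code. The Python below is illustrative only: the unit-square coordinates, the class numbering, the function names and the one-dimensional Gaussian setting are our assumptions, not the generators used in the thesis. It shows (i) sampling a 1Ci/Sq-style dataset, where a centred circle of radius sqrt(1/(2π)) covers half of the unit square so that each class has probability 1/2; (ii) the three-way grouping of the ABALONE target; and (iii) for the simplest case of two equal-prior, equal-variance one-dimensional Gaussians, the separation of the means that yields a given Bayes error, such as the 5% "overlap" of the Gaussian databases.

```python
import math
import random
from statistics import NormalDist


def circle_in_square(n_points, seed=0):
    """Sample a 1Ci/Sq-style dataset: points uniform in the unit square,
    class 1 inside a centred circle covering half the square, class 2 outside.
    (Unit square, centring and class numbers are illustrative assumptions.)"""
    rng = random.Random(seed)
    radius = math.sqrt(1.0 / (2.0 * math.pi))  # pi * r**2 = 0.5
    data = []
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        inside = (x - 0.5) ** 2 + (y - 0.5) ** 2 <= radius ** 2
        data.append(((x, y), 1 if inside else 2))
    return data


def abalone_class(target):
    """Group the ABALONE target value into the three classes used in the text:
    8 and lower -> 1, 9-10 -> 2, 11 and greater -> 3."""
    if target <= 8:
        return 1
    if target <= 10:
        return 2
    return 3


def separation_for_bayes_error(error):
    """Distance between the means, in standard deviations, of two equal-prior,
    equal-variance 1-D Gaussians whose Bayes (optimal) error equals `error`.
    The optimal threshold is the midpoint, so error = Phi(-d / 2)."""
    return -2.0 * NormalDist().inv_cdf(error)


print(round(separation_for_bayes_error(0.05), 2))  # prints 3.29
```

The 4-class and 6-class Gaussian databases involve more than one pairwise boundary, so the closed-form separation above applies only to the 2-class case.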

The data in each one of the above databases were split into a training set, a validation set, and a test set. The percentage of each class in each one of these subsets resembled the percentage of that class in the original dataset. The specifics of each database are summarized in Table 3-3. The training set was used to train the FAM networks that initialized the population in GFAM, the validation set was used to assess the performance of the FAM networks during their evolution, and the test set was used to report the performance of the best FAM network (called GFAM) at the completion of the evolutionary process.

Table 3-3: Databases used in the Genetic ARTMAP experiments.
Columns: index, database name, # training instances, # validation instances, # test instances, # numerical attributes, # classes (N_b), % major class (A_0).
Rows: the twelve Gaussian databases (G2c, G4c and G6c), the structures-within-structure databases (including the SqWN and 5Ci/SqWN variants), MOD-IRIS, ABALONE and PAGE.
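The split described above, in which every subset preserves the class proportions of the full dataset, is a stratified split. The following is a minimal Python sketch; the 50/25/25 fractions, the function name and the fixed seed are assumptions for illustration, since the thesis reports subset sizes in Table 3-3 rather than as fractions.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.5, 0.25, 0.25), seed=0):
    """Return (train, validation, test) index lists whose class proportions
    follow those of `labels`. The fractions are placeholder values."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)            # random order within each class
        n = len(indices)
        n_train = round(fractions[0] * n)
        n_val = round(fractions[1] * n)
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to the test set
    return train, val, test
```

Splitting each class separately, rather than the pooled data, is what guarantees that a rare class (such as the minority classes of PAGE) appears in every subset in roughly its original proportion.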

Figure 3-9: Gaussian databases (2-dimensional; 2, 4 or 6 classes; 5, 15, 25 and 40% overlap). Panels: G2c_5, G2c_15, G2c_25, G2c_40, G4c_5, G4c_15, G4c_25, G4c_40, G6c_5, G6c_15, G6c_25, G6c_40.

Figure 3-10: Structures within Structure databases. Panels: 4Ci/Sq, 4Sq/Sq, 7Sq, 1Ci/Sq, 1Ci/Sq/0.3:0.7, 5Ci/Sq, 2Ci/Sq/20:30:50.

3.2.2 Experimental Procedure

Experimental Results

Earlier in this chapter we experimented extensively with GFAM to identify a good initialization of the GA process and a good set of parameters for the evolution of trained FAMs. From this point on, GFAM is produced by first initializing a population of 20 trained FAM networks (trained with different values of the baseline vigilance parameter and different orders of training pattern presentation), and then evolving them for 500 generations. In particular, the GA parameters used for the creation of GFAM were: ρ_min = 0.1, ρ_max = 0.95, β = 0.1, Pop_size = 20, Gen_max = 500, NC_best = 3, Cat_min = 1, Cat_max = 300, P(Cat_add) = 0.1, P(Cat_del) = 0.1, P(mut) = 5/N. GFAM is the FAM network that attains the highest value of the fitness function at generation 500 of the evolutionary process.

3.3 GFAM Performance

In this section we report the performance of GFAM on each one of the database problems described earlier and summarized in Table 3-3. The performance of GFAM is assessed by reporting the size of GFAM and the accuracy attained by GFAM on the test set. The results are reported in Table 3-4, where the first column is the index of the database and the second column is the database name. Columns 3 and 4 contain the accuracy and size of the GFAM network for the designated database. Several observations from Table 3-4 support the quality of GFAM's performance. For instance, GFAM's performance on databases 1-12 (Gaussian datasets with a known amount of overlap) is nearly optimal: the best possible classifier for the G6c-15 problem (6-class Gaussian dataset with 15% overlap) has 6 categories and 85% correct classification, and GFAM is a classifier with 6 categories and 84.71% correct classification. Similarly, in the 7Sq problem the optimal classifier would require 7 categories and attain 100% correct classification; GFAM is a 7-category classifier exhibiting 97.2% correct classification. Finally, two of the real problems reported here, MOD-IRIS and PAGE, also gave very good results, almost 95% correct classification, with only two categories. Notice that GFAM ignored one class in the ABALONE dataset (a 3-class problem), since the number of samples from this class is very small compared to the number of samples pertaining to the other two classes. Hence, GFAM's fitness function was higher for a network with 2 categories and 58.73% generalization accuracy than for a network with 3 categories and slightly higher generalization accuracy.

Performance Comparisons of GFAM and other ART Networks

We compared GFAM's performance with that of the following networks: ssFAM, ssEAM, ssGAM, and safe micro-ARTMAP. We chose these networks because each one of them, at the time of its introduction into the literature, was presented as addressing the category proliferation problem in ART. More details about the specifics of each one of these networks can be found in the associated references (Anagnostopoulos et al., 2003; Verzi et al., 2001; Gomez et al., 2002). For the purposes of this thesis it suffices to know that ssEAM covers the space of the input patterns with ellipsoids, while ssGAM covers it with bell-shaped curves. Furthermore, ssFAM, ssEAM, and ssGAM allow a category (hyper-rectangle, ellipsoid, or hyper-dimensional bell-shaped curve) to encode patterns of different labels, provided that the plurality label of the category exceeds a certain user-specified threshold. Finally, safe micro-ARTMAP allows the encoding of patterns of different labels by a single category, provided that the entropy of the category does not exceed a certain user-defined threshold.

The comparisons of GFAM and the aforementioned ART networks are depicted in Table 3-4, where the first column is the index of the database and the second column is the database name, as reported in earlier sections. Columns 3-10 of Table 3-4 contain the performance of the designated ART networks: the accuracy of the best ART network on the test set and the number of categories created by that network. The reported accuracy and size correspond to the ART network that attained the highest value of the fitness function (computed from the accuracy of the ART network on the cross-validation set and from its size). Note that for networks other than GFAM, the best ART network was determined after extensive experimentation with the network's parameter values (e.g., for ssFAM the best network was determined after training ssFAM networks with different values of the choice parameter, the vigilance parameter, the order of pattern presentation, and the amount of label mixture allowed within a category; a total of 20,000 ssFAM networks were trained and their performance examined). On the other hand, the performance of GFAM is the one obtained after evolving 20 trained FAM networks for 500 generations with the GA values indicated in Section 3.2.2. According to the results in Table 3-4, in all instances (with minor exceptions) the accuracy (generalization performance) of GFAM is higher than that of the other ART networks (ssFAM, ssEAM, ssGAM, or safe micro-ARTMAP), and in all instances (with no exceptions) the size of GFAM is smaller than that of the other ART networks, sometimes by a factor of 15. For example, the generalization performance of GFAM can be as much as 13% better than that of ssFAM, while its size can be up to 4 times smaller than that of ssFAM. Also, the generalization performance of GFAM can be as much as 13% better than that of ssEAM, while its size can be up to 4.5 times smaller than that of ssEAM.
Furthermore, the generalization performance of GFAM can be as much as 6% better than that of ssGAM, while its size can be up to 15 times smaller than that of ssGAM. Finally, the generalization performance of GFAM can be as much as 10% better than that of safe micro-ARTMAP, while its size can be up to 3 times smaller than that of safe micro-ARTMAP.

The comparison results between GFAM and the other ART networks are also depicted in Figures 3-11a to 3-11d. Each figure shows the accuracy and the size of GFAM and of one other ART network (e.g., ssFAM), so that the one-to-one comparison can be quickly assessed. Most worth pointing out is that GFAM's better performance is attained with fewer computations than those needed by the alternative methods (ssFAM, ssEAM, ssGAM, safe micro-ARTMAP). Specifically, the performance attained by ssFAM, ssEAM, ssGAM and safe micro-ARTMAP required training these networks for a large number of network parameter settings (at least 20,000 experiments) and then choosing the network that achieved the highest value of the fitness function introduced earlier in Section 3.1.1. Of course, one can argue that such extensive experimentation might not be needed, especially if one is familiar with the functionality of these networks and chooses to experiment only with a limited set of parameter values. However, a practitioner in the field might lack the expertise to carefully choose the network parameters to experiment with, and consequently might need to experiment extensively to come up with a good network. In Appendix B, we show in more detail how much more computationally efficient GFAM is compared to ssFAM, ssEAM, ssGAM and safe micro-ARTMAP. The comparison is based on the assumption that extensive experimentation with the network parameters of ssFAM, ssEAM, ssGAM or safe micro-ARTMAP is needed to obtain a good-performing ssFAM, ssEAM, ssGAM or safe micro-ARTMAP network, respectively.

Figure 3-11a: Performance and size comparison of GFAM vs. ssFAM

Figure 3-11b: Performance and size comparison of GFAM vs. ssEAM

Figure 3-11c: Performance and size comparison of GFAM vs. ssGAM

Figure 3-11d: Performance and size comparison of GFAM vs. micro-ARTMAP

Table 3-4: Accuracy and size results achieved by GFAM and other ART networks. Note: Safe µAM: safe micro-ARTMAP; FAM: Fuzzy ARTMAP; EAM: Ellipsoidal ARTMAP; GAM: Gaussian ARTMAP; ss*: semi-supervised version.
Columns: index, database name, and the accuracy and size of GFAM, Safe µAM, ssFAM, ssEAM and ssGAM.
Rows: the Gaussian databases (G2c, G4c, G6c), the structures-within-structure databases (including SqWN and 5Ci/SqWN), MOD-IRIS, ABALONE and PAGE.

Performance Comparisons of GFAM and Other Neural Networks

The comparison of GFAM with ssFAM, ssEAM and ssGAM provided in the previous section is fair because it used the same databases, and the same datasets per database, for training, validation and testing of these architectures, and the same criterion for finding the best of these ART architectures (maximizing the fitness function defined in Section 3.1.1). However, some of the structures-within-structure artificial databases examined above have also been used to assess the performance of other ART architectures, such as distributed Fuzzy ARTMAP (dFAM), FasART, and distributed FasART (dFasART) (see Parrado-Hernández et al., 2003). Distributed Fuzzy ARTMAP differs from Fuzzy ARTMAP in that more than one category is activated to represent an input pattern during the training phase. FasART uses a different activation function from the one used by Fuzzy ARTMAP. Finally, distributed FasART is the distributed version of FasART, in the same way that distributed Fuzzy ARTMAP is the distributed version of Fuzzy ARTMAP. More details about the functionality of these ART networks can be found in Parrado-Hernández et al. (2003); they are beyond the scope of this thesis.

We avoided an extensive comparison of GFAM with dFAM, FasART, and dFasART for a reason. Although some of the databases used to assess the performance of dFAM, FasART, and dFasART in Parrado-Hernández et al. (2003) are the same as those used to assess GFAM, the actual data used for training and testing GFAM are not the same as those used for training and testing dFAM, FasART, and dFasART. Furthermore, network parameter optimization with a validation set (e.g., to optimize a fitness function) was not conducted for FasART, dFAM, and dFasART. In fact, the results reported in Parrado-Hernández et al. are averages of the performance of dFAM, FasART, and dFasART on a test set of 5,000 points for a specific set of network parameter values (which we take to be a good set). These averages correspond to the performance attained over 100 different training sets of 2,000 points each.
The comparison between the performance of GFAM and that of dFAM, FasART, and dFasART can be deduced from the summarized numbers of Table 3-5. Based on this table, we can state that GFAM's performance is better than the average performance attained by dFAM, FasART and dFasART, at least for the databases contained in Table 3-5.

Table 3-5: Accuracy and size results achieved by GFAM and other ART networks. Note: dFAM: distributed Fuzzy ARTMAP; FasART; dFasART: distributed FasART; GFAM: Genetic Fuzzy ARTMAP.
Columns: database index, database name, dFAM, FasART, dFasART, GFAM.
Rows: 4Ci/Sq, 4Sq/Sq, 1Ci/Sq, 1Ci/Sq/0.3:0.7, 5Ci/Sq, 2Ci/Sq/20:30:50.

GFAM Summary and Conclusions

Adaptive Resonance Theory (ART) neural networks were introduced into the literature by Carpenter, Grossberg and their colleagues at Boston University, as well as by other researchers in the field. The consensus regarding ART networks is that they converge fast to a solution for arbitrary classification problems, they can provide explanations for the answers they produce, they can function in an on-line training mode, and they effectively solve a variety of classification problems. However, these benefits sometimes come at the expense of creating unnecessarily many categories to solve the problem at hand, referred to as the category proliferation problem in ART. This problem is more acute when ART is confronted with classification problems that involve noisy or highly overlapping data. To alleviate this problem, a number of researchers have proposed solutions, such as ssFAM, ssEAM, ssGAM (see Anagnostopoulos et al., 2003; Verzi et al., 2001) and safe micro-ARTMAP (see Gomez et al., 2002), to mention only a few. In this thesis, we have introduced yet another method of solving the category proliferation problem in ART. This method relies on evolving a population of trained ART networks, and more specifically Fuzzy ARTMAP (FAM) neural networks. The evolution of trained FAMs creates an ART network, referred to as GFAM. We have experimented with a number of databases that helped us identify good default parameter settings for the evolution of FAM. We defined a fitness function that emphasizes the creation of small FAM networks that exhibit good generalization. In the evolution of trained FAMs we also defined and justified the use of unique operators, such as the delete-category and add-category operators. The GFAM network identified at the end of the evolutionary process (last generation) was the FAM network that attained the highest fitness value. Our method for creating GFAM resulted in a FAM network that performed well on a number of classification problems. In particular, GFAM was found superior to a number of other ART techniques (ssFAM, ssEAM, ssGAM, safe micro-ARTMAP) that have been introduced into the literature to address the category proliferation problem in FAM. More specifically, GFAM gave better generalization performance (in almost all problems tested) and a smaller network (in all problems tested) than these other ART techniques. It is also worth mentioning that GFAM outperformed these other ART techniques while requiring only a fraction of the computations they need. The method introduced to evolve trained FAMs can be extended to other ART architectures, such as EAM and GAM, amongst others, without any significant changes in the approach, and that is the issue we tackle in Sections 4 and 5 of this thesis.

4. GEAM AND GGAM

4.1 Genetic Ellipsoidal ARTMAP (GEAM)

GEAM (Genetic Ellipsoidal ARTMAP) is an evolved EAM network that is produced by repeatedly applying genetic operators to an initial population of trained EAM networks. GEAM uses tournament selection with elitism, as well as genetic operators including crossover and mutation. In addition, GEAM uses two special operators, Cat_add and Cat_del. To explain how GEAM is designed, we give a step-by-step description of its design. Please refer to the terminology introduced in Appendix A before reading the step-by-step description of GEAM. The design of GEAM can be articulated through a sequence of steps, defined succinctly below and explained in detail later.
Step 1: Initialize Pop_size EAM networks, each one of them operating with a different value of the baseline vigilance parameter ρ, and possibly with different orders of pattern presentation.
Step 2: Train each one of the Pop_size initialized EAM networks, using the training set, for a maximum number of iterations (Gen_max).
Step 3: Convert the Pop_size trained EAM networks into chromosomes. Crop all chromosomes so that no one-point categories exist.
Step 4: Evolve the chromosomes of the current generation by executing the following sub-steps:
Sub-Step 4a: Calculate the fitness of all chromosomes of the current generation.
Sub-Step 4b: Initialize an empty generation (referred to as the temporary generation).
Sub-Step 4c: Move the best NC_best chromosomes from the current generation to the temporary generation.

Sub-Step 4d: Select chromosomes for crossover from the current generation and thus further populate the temporary generation.
Sub-Step 4e: With probability P(Cat_add) apply the Cat_add operator to every individual generated in Sub-Step 4d.
Sub-Step 4f: With probability P(Cat_del) apply the Cat_del operator to every individual generated in Sub-Step 4e.
Sub-Step 4g: With probability P(mut) apply the mutation operator to every individual generated in Sub-Step 4f.
Sub-Step 4h: Replace the current generation with the members of the temporary generation.
Step 5: If evolution has reached the maximum number Gen_max of iterations, then calculate the performance of the best-fitness EAM network on the test set and report the classification accuracy and the number of categories that this best-fitness EAM network possesses. If the maximum number of iterations has not been reached yet, go to Step 4 to evolve one more population of chromosomes.

Each one of the aforementioned steps of the algorithm is now described in more detail, as needed.

Step 1 (More Details): The algorithm starts by training Pop_size EAM networks, each one of them trained with a different value of the baseline vigilance parameter ρ. In particular, we first define

$$\rho_{inc} = \frac{\rho_{max} - \rho_{min}}{Pop_{size} - 1},$$

and then the baseline vigilance parameter of every network is determined by the equation

$$\rho_i = \rho_{min} + i\,\rho_{inc}, \qquad i \in \{0, 1, \ldots, Pop_{size} - 1\}.$$

In our experiments with GFAM we chose ρ_min = 0.1 and ρ_max = 0.95. In addition, GEAM can automatically and randomly change the order of pattern presentation.
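Step 1, together with the parameter values reported for GFAM and GEAM, translates directly into code. In the hypothetical Python sketch below the field names are our own; β is carried over as listed without interpreting its role, and P(mut) = 5/N is omitted because N is defined outside this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GAConfig:
    """GA settings reported for GFAM and GEAM (field names are hypothetical)."""
    rho_min: float = 0.1    # smallest baseline vigilance
    rho_max: float = 0.95   # largest baseline vigilance
    beta: float = 0.1       # listed value; its role is defined elsewhere
    pop_size: int = 20      # Pop_size
    gen_max: int = 500      # Gen_max
    nc_best: int = 3        # NC_best elite chromosomes copied unchanged
    cat_min: int = 1        # Cat_min
    cat_max: int = 300      # Cat_max
    p_cat_add: float = 0.1  # P(Cat_add)
    p_cat_del: float = 0.1  # P(Cat_del)


def vigilance_schedule(cfg):
    """Baseline vigilance of each initial network (Step 1):
    rho_i = rho_min + i * rho_inc, rho_inc = (rho_max - rho_min) / (Pop_size - 1)."""
    rho_inc = (cfg.rho_max - cfg.rho_min) / (cfg.pop_size - 1)
    return [cfg.rho_min + i * rho_inc for i in range(cfg.pop_size)]
```

With these defaults the 20 networks receive baseline vigilances evenly spaced from 0.1 to 0.95 in steps of 0.85/19 ≈ 0.0447, so the initial population spans coarse to fine category granularity.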

84 Step 2 (More Detils): We ssume tht the reder is fmilir of how trining of EAM networks is ccomplished, nd thus the detils here re omitted. Step 3 (More Detils): Once the Pop size networks re trined they need to be converted to chromosomes, so tht they cn be mnipulted by the genetic opertors. GEAM uses mix of rel numbers representtion to encode the networks. Ech EAM chromosome consists of two levels, level 1 contining ll the ctegories of the EAM network, nd level 2 contining the center, the direction, the rdius nd the xis rtio, s well s the lbel of tht ctegory (see Figure 4-1). We denote the ctegory of trined EAM network with index p 1 p Pop ) ( size by w ( p), where w ( p) = ( m, d, r ), the xis rtio by µ ( p), nd the lbel of this ctegory by l ( p) for 1 N ( p). In this step we re lso eliminting single-point ctegories in the trined EAM networks, referred to s cropping the chromosomes. Since our ultimte obective is to design n EAM network tht reduces the network size nd improves generliztion we re discourging t this stge the cretion of single-point ctegories. w ( ) w ( ) w ( p) ( p) 1 p 2 p w N m ( p) d ( p) r ( p) µ ( p) l ( p) Figure 4-1: GEAM chromosome structure Step 4 (More Detils): In this step the GFAM pplies GA to the popultion of trined EAMs. Sub-step 4 (More Detils): Clculte the fitness of ech chromosome (trined EAM). This is ccomplished by feeding into ech trined EAM the vlidtion set nd by clculting the percentge of correct clssifiction exhibited by ech one of these trined EAM networks. In 69

85 prticulr, if PCC( p) designtes the percentge of correct clssifiction, exhibited by the p-th EAM, nd this EAM network possesses N ( p) nodes in its ctegory representtion lyer, then its fitness function vlue is defined by: ( Ctmx N ( p)) PCC ( p) Fit( p) = 100 PCC( p) + ε Ct N ( p) min where, Ct Ctmin nd mx re the minimum nd mximum number of ctegories tht FAM network is llowed to hve during the evolutionry process ( Ct min is chosen equl to 1, or equl to the number of clsses in the clssifiction problem under considertion, while Ctmx is chosen to be reltively lrge number for the clssifiction problem t hnd). The constnt ε in the denomintor of the bove eqution is smll positive constnt nd it is needed to mke sure tht the denomintor would not be zero in the cse when N ( p) = Ct nd PCC ( p) = 100. min Sub-step 4b (More Detils): Obvious, no further explntions re needed. Sub-step 4c (More Detils): The lgorithm serches for the best the current genertion nd copies them to the temporry genertion. NCbest chromosomes from Sub-step 4d (More Detils): The remining Pop NC chromosomes in the temporry size best genertion re creted by crossing over pirs of prents from the current genertion. The prents re chosen using deterministic tournment selection, s follows: Rndomly select two groups of four chromosomes ech from the current genertion, nd use s prent, from ech group, the chromosome with the best fitness vlue in the group. If it hppens tht from both groups the sme chromosome is chosen then we choose from one of the groups the chromosome with the second best fitness vlue. If two prents with indices p, p re crossed over two rndom numbers n, n re generted from the index sets { 1, 2,..., N ( p)} nd { 1, 2,..., N ( p )}, respectively. Then, ll the ctegories with index 70

86 greter thn index n in the chromosome with index p nd ll the ctegories with index less thn index n in the chromosome with index p re moved into n empty chromosome within the temporry genertion. Notice tht crossover is done on level 1 of the chromosome. This opertion is pictorilly illustrted in Figure 3-2. Sub-step 4e (More Detils): The opertor Ctdd dds new ctegory to every chromosome creted in step 4d with probbility P Ct ). The new ctegory hs center m, direction ( dd vector d, rdius r, n xis rtio µ, nd lbel l tht re prtilly rndomly generted s follows: For every dimension of the input feture spce ( M dimensions totl) we generte rndom number uniformly distributed in the intervl [0, 1]; we ssign m these vlues, the direction vector d is ssigned zeros (i.e. it will ct s if it is circle, lthough µ could be mutted to vlue other thn 1 in the next genertions), the xis rtio µ is ssigned 1, the rdius r is given rndom numbers in the intervl [0, 1], nd the lbel of this newly creted ctegory is chosen rndomly mongst the N b ctegories of the pttern clssifiction tsk under considertion. A chromosome does not dd ctegory if the ddition of this ctegory results in number of ctegories for this chromosome tht exceeds the designted mximum number of ctegoriesct. mx Sub-step 4f (More Detils): The opertor Ct del deletes one of the ctegories of every chromosome creted in step 4e with probbility P Ct ). A chromosome does not delete ( del ctegory if the deletion of this ctegory results in the number of ctegories for this chromosome to fll below the designted minimum number of ctegoriesct. Sub-Step 4g (More Detils): In GEAM, every chromosome creted by step 4f gets mutted s follows: with probbility P (mut) every ctegory is mutted. If ctegory is chosen, then every component of m gets mutted by dding to it smll number. This number is drwn from Gussin distribution with men 0 nd stndrd devition If the component of min 71

87 the chosen vector becomes smller thn 0 or greter thn 1 (fter muttion), it is set bck to 0 or 1, respectively. Furthermore, the ctegory s xis rtio µ or rdius r is selected (50% probbility); we then dd smll number drwn from Gussin distribution to the selected item with the sme rules s bove, here though, if µ gets greter thn 1 we set it to one, otherwise, if it becomes zero or less we set its vlue to , lso, if the rdius r becomes zero or less wet it bck to Notice tht muttion is pplied on level 2 of the chromosome structure, but the lbel of the chromosome is not mutted (the reson being tht our initil GA popultion consists of trined EAMs, nd consequently we hve lot of confidence in the lbels of the ctegories tht these trined EAMs hve discovered through the EAM trining process). Step 5 (More Detils): Obvious, no more detils re needed GEAM Experiments nd Results We used the sme defult set of prmeters used for GFAM to run ll the experiments of GEAM nd the results were very good. Hence, in GEAM s cse we voided the experimenttion (pplied to GFAM) to choose good defult vlues for the GA prmeters. Hence, GEAM is produced by first initilizing popultion of 20 trined EAM networks (they were trined with different vlues of the bseline vigilnce prmeter nd different orders of trining pttern presenttions), nd by evolving them for 500 genertions. In min mx prticulr, the GA prmeters used for the cretion of GEAM were: ρ = 0.1, ρ = 0.95, β =0.1, Pop size = 20, Gen mx = 500, NC best = 3, Ct min = 1, mx Ct = 300, P Ct ) ( dd =0.1, P Ct ) =0.1, P (mut) = 5/N. GEAM is the EAM network tht ttins the highest ( del vlue of the fitness function t genertion 500 of the evolutionry process GEAM Performnce In this section we re reporting the performnce of GEAM on ech one of the dtbse problems, described in Section Similr to tht of GFAM, the performnce of 72

88 GEAM is ssessed by reporting the size of GEAM nd the ccurcy ttined by GEAM on the test set. The results re reported in Tble 4-1. In Tble 4-1, the first column is the index of the dtbse tht we re experimenting with. The second column is the ctul dtbse nme, s reported in section , nd summrized in Tble 3-3. Columns 3 nd 4 contin the size nd ccurcy of the GEAM network for the designted dtbse. The performnce of GEAM, s it is evidenced by the results in Tble 4-1, is verified by some obvious observtions. For instnce, GEAM s performnce on dtbses 1-12 (Gussin dtsets of known mount of overlp) is nerly optiml; for exmple the best performnce on the G6c-40 problem (6 clss Gussin dtset of 40% overlp) is clssifier with 6 ctegories nd 60% correct clssifiction, nd GEAM is clssifier with 6 ctegories nd 59.35% of correct clssifiction. Similrly, in the CINS problem the optiml clssifier would require 2 ctegories nd ttin 100% correct clssifiction; GEAM is 2 ctegory clssifier exhibiting 99.9% of correct clssifiction. Finlly, two of the rel problems reported here, MOD-IRIS nd PAGE, lso gve very good results ttining 94.8% nd 94.12% of correct clssifiction, while creting only two nd three ctegories, respectively Performnce Comprisons of GEAM nd other ART Networks We compred GEAM s performnce with the performnce of other ART networks, such s sseam, sseam, ssgam, nd sfe micro-artmap. The comprisons of GEAM nd the forementioned ART networks re depicted in Tble 4-1, where the first column is the index of the dtbse tht we re experimenting with. The second column is the ctul dtbse nme, s reported in erlier sections. Columns 3-10 of Tble 4-1 contin the performnce of the designted ART networks. The performnce reported includes the ccurcy of the best ART network on the test set. The performnce lso includes the number of ctegories creted of the designted ART network. The reported numbers of ccurcy nd size of the best network correspond to the ART network tht ttined the 73

highest value of the fitness function (this value was computed based on the accuracy of the ART network on the cross-validation set, and on the size of the ART network). Note that for networks other than GEAM, the best ART network was determined after extensive experimentation with the ART network's parameter values (e.g., in ssEAM the best network was determined after training ssEAM networks with different values of the choice parameter, vigilance parameter, order of pattern presentation, and amount of mixture of labels allowed within a category; a total of more than 20,000 ssEAM networks were trained and their performance was examined). On the other hand, the performance of GEAM is the one calculated after the evolution of 20 trained EAM networks for 500 generations with the GA parameter values indicated in Section …. According to the results in Table 4-1, in all instances (with minor exceptions) the accuracy of GEAM (generalization performance) is higher than the accuracy of the other ART networks (ssFAM, ssEAM, ssGAM or safe micro-ARTMAP). According to the results in Table 4-1, in all instances (with no exceptions) the size of GEAM is smaller than the size of the other ART networks, sometimes by as much as a factor of 12. For example, the generalization performance of GEAM can be as much as 13% better than that of ssFAM, while its size can be as much as 4 times smaller than the size of ssFAM. Also, the generalization performance of GEAM can be as much as 15% better than that of ssEAM, while its size can be as much as 6.5 times smaller. Furthermore, the generalization performance of GEAM can be as much as 9% better than that of ssGAM, while its size can be as much as 12 times smaller. Finally, the generalization performance of GEAM can be as much as 10% better than that of safe micro-ARTMAP, while its size can be as much as 4 times smaller.

The comparison results between GEAM and the other ART networks are also depicted pictorially in Figures 4-2a to 4-2d. Each one of these figures shows the accuracy and the size of GEAM and of one other ART network (e.g., ssEAM), so that the one-to-one comparison of GEAM and the other ART network can be quickly assessed. What is most worth pointing out is that the better performance of GEAM is attained with reduced computations compared with the computations needed by the alternate methods (ssFAM, ssEAM, ssGAM, safe micro-ARTMAP). Specifically, the performance attained by ssFAM, ssEAM, ssGAM and safe micro-ARTMAP required training these networks for a large number of network parameter settings (at least 20,000 experiments) and then choosing the network that achieved the highest value of the fitness function that we introduced earlier in Section …. Of course, one can argue that such extensive experimentation with these networks might not be needed, especially if one is familiar with the functionality of these networks and chooses to experiment only with a limited set of network parameter values. However, the practitioner in the field might lack the expertise to carefully choose the network parameters to experiment with, and consequently might need to experiment extensively to come up with a good network. In Appendix B, we show in more detail how much more computationally efficient GFAM is compared to ssFAM, ssEAM, ssGAM and safe micro-ARTMAP. The computational complexity of GEAM is given by equations similar to those for the GFAM computational complexity, calculated in Appendix B. The comparison between GFAM (and GEAM) and the rest of the ART networks is based on the assumption that extensive experimentation with the network parameters of ssFAM, ssEAM, ssGAM or safe micro-ARTMAP is needed to obtain a good performing ssFAM, ssEAM, ssGAM or safe micro-ARTMAP network, respectively.
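The GEAM mutation operator described at the start of this section (just before "Step 5 (More Details)") can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's code: each EAM category is assumed to store two generic vectors u and v (the layout used later for the UART chromosome), and the Gaussian noise level and the reset values for µ and r, which are not given in this copy of the text, are hypothetical parameters (`noise_sd`, `mu_reset`, `r_reset`).

```python
import random
from dataclasses import dataclass, replace


@dataclass
class EAMCategory:
    """One level-2 gene of a GEAM chromosome (illustrative field names)."""
    u: list      # first generic vector (assumed layout)
    v: list      # second generic vector (assumed layout)
    mu: float    # axis ratio, kept in (0, 1]
    r: float     # radius, kept > 0
    label: int   # never mutated


def mutate_eam_category(cat, rng, noise_sd, mu_reset, r_reset):
    """Return a mutated copy of one EAM category.

    noise_sd, mu_reset and r_reset are hypothetical parameters standing in
    for values that are not given in this copy of the text.
    """
    # Pick u or v (50% each), perturb every component, clip back into [0, 1].
    which = "u" if rng.random() < 0.5 else "v"
    vec = [min(1.0, max(0.0, x + rng.gauss(0.0, noise_sd)))
           for x in getattr(cat, which)]
    new = replace(cat, **{which: vec})

    # Independently pick the axis ratio or the radius (50% each) and perturb it.
    if rng.random() < 0.5:
        mu = new.mu + rng.gauss(0.0, noise_sd)
        if mu > 1.0:
            mu = 1.0          # the axis ratio cannot exceed 1
        elif mu <= 0.0:
            mu = mu_reset     # hypothetical reset value
        new = replace(new, mu=mu)
    else:
        r = new.r + rng.gauss(0.0, noise_sd)
        new = replace(new, r=r if r > 0.0 else r_reset)
    return new  # the label is left untouched


def mutate_chromosome(categories, rng, p_mut, **kwargs):
    """Mutate each category of a chromosome independently with probability p_mut."""
    return [mutate_eam_category(c, rng, **kwargs) if rng.random() < p_mut else c
            for c in categories]
```

For example, `mutate_chromosome(cats, random.Random(0), p_mut=0.1, noise_sd=0.05, mu_reset=0.01, r_reset=0.01)` returns a new list in which roughly one category in ten has been perturbed; these numeric values are illustrative only. Returning copies rather than mutating in place keeps the elite chromosomes of the previous generation intact.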

Figure 4-2a: Performance and size comparison of GEAM vs. ssFAM
Figure 4-2b: Performance and size comparison of GEAM vs. ssEAM

Figure 4-2c: Performance and size comparison of GEAM vs. ssGAM
Figure 4-2d: Performance and size comparison of GEAM vs. micro-ARTMAP

Table 4-1: Accuracy and size results achieved by GEAM and other ART networks. Note: Safe µAM: safe micro-ARTMAP; FAM: Fuzzy ARTMAP; EAM: Ellipsoidal ARTMAP; GAM: Gaussian ARTMAP; ss*: semi-supervised version.
Columns: Database (index, name), GEAM, Safe µAM, ssFAM, ssEAM, ssGAM.
Rows: databases 1-12 (G2c, G4c and G6c Gaussian datasets, four overlap levels each), followed by Ci/Sq, Sq/Sq, Sq Ci/Sq, Ci/Sq/0.3:, Ci/Sq, Ci/Sq/20:30:, SqWN, Ci/SqWN, MOD-IRIS, ABALONE and PAGE.
Summary/Conclusions
In this section, we have introduced yet another method of solving the category proliferation problem in ART. This method relies on evolving a population of trained ART

networks, and more specifically Ellipsoidal ARTMAP (EAM) neural networks. The evolution of trained EAMs creates an ART network, referred to as GEAM. In chapter 3 we defined a methodology for evolving trained FAM networks, resulting in GFAM. This methodology was also applied successfully to the evolution of EAM networks, resulting in GEAM. In chapter 3 we experimented with a number of databases that helped us identify good default parameter settings for the evolution of FAM networks. The same parameters and settings used in chapter 3 for the evolution of FAM networks (GFAM) were also used for the evolution of EAM networks (GEAM). Our experiments indicate that GEAM is superior to a number of other ART techniques (ssFAM, ssEAM, ssGAM, safe micro-ARTMAP) that have been introduced in the literature to address the category proliferation problem in ART. More specifically, GEAM gave better generalization performance (in almost all problems tested) and a smaller network (in all problems tested), compared to these other ART techniques. It is also worth mentioning that GEAM outperformed these other ART techniques while requiring only a fraction of the computations needed by these other networks.
4.2 Genetic Gaussian ARTMAP (GGAM)
GGAM (Genetic Gaussian ARTMAP) is an evolved GAM network that is produced by repeatedly applying genetic operators on an initial population of trained GAM networks. GGAM uses tournament selection with elitism, as well as genetic operators including crossover and mutation. In addition, GGAM uses two special operators, Cat_add and Cat_del. To better understand how GGAM is designed we resort to a step-by-step description of this design. Please refer to the terminology introduced in Appendix A before reading the step-by-step description of GGAM. The design of GGAM can be articulated through a sequence of steps, defined succinctly below, and explained in detail later.

Step 1: Initialize Pop_size GAM networks, each one of them operating with a different value of the baseline vigilance parameter ρ, and possibly different orders of pattern presentation.
Step 2: Train each one of the Pop_size initialized GAM networks, using the training set.
Step 3: Convert the Pop_size trained GAM networks into chromosomes. Crop all chromosomes so that no one-point categories exist.
Step 4: Evolve the chromosomes of the current generation by executing the following sub-steps:
Sub-Step 4a: Calculate the fitness of all chromosomes of the current generation.
Sub-Step 4b: Initialize an empty generation (referred to as the temporary generation).
Sub-Step 4c: Move the best (NC_best) chromosomes from the current generation to the temporary generation.
Sub-Step 4d: Select chromosomes for crossover from the current generation and thus further populate the temporary generation.
Sub-Step 4e: With probability P(Cat_add) apply the Cat_add operator on every individual generated in sub-step 4d.
Sub-Step 4f: With probability P(Cat_del) apply the Cat_del operator on every individual generated in sub-step 4e.
Sub-Step 4g: With probability P(mut) apply the mutation operator on every individual generated in sub-step 4f.
Sub-Step 4h: Replace the current generation with the members of the temporary generation.
Step 5: If evolution has reached the maximum number Gen_max of iterations, then calculate the performance of the best-fitness GAM network on the test set and report the classification

accuracy and the number of categories that this best-fitness GAM network possesses. If the maximum number of iterations has not been reached yet, go to Step 4 to evolve one more population of chromosomes.
Each one of the aforementioned steps of the algorithm is now described in more detail, as needed.
Step 1 (More Details): The algorithm starts by training Pop_size GAM networks, each one of them trained with a different value of the baseline vigilance parameter ρ. In particular, we first define
$$\rho_{inc} = \frac{\rho_{max} - \rho_{min}}{Pop_{size} - 1},$$
and then the baseline vigilance parameter of network i is determined by the equation
$$\rho_i = \rho_{min} + i \cdot \rho_{inc}, \qquad i \in \{0, 1, \ldots, Pop_{size} - 1\}.$$
In our experiments with GFAM we chose ρ_min = 0.1 and ρ_max = …. Meanwhile, GGAM allows the user to change the order of pattern presentation automatically and randomly.
Step 2 (More Details): We assume that the reader is familiar with how GAM networks are trained, and thus the details here are omitted.
Step 3 (More Details): Once the Pop_size networks are trained they need to be converted to chromosomes, so that they can be manipulated by the genetic operators. GGAM uses a mixed real-number representation to encode the networks. Each GAM chromosome consists of two levels: level 1 contains all the categories of the GAM network, and level 2 contains, for each category, the mean, the standard deviation and the number of patterns encoded by the category (during training), as well as the label of that category (see Figure 4-3). We denote category j of the trained GAM network with index p (1 ≤ p ≤ Pop_size) by w_j(p), where w_j(p) = (µ_j(p), σ_j(p), n_j(p)), and the label of this category by l_j(p), for 1 ≤ j ≤ N(p). In this step we also eliminate single-point categories in the trained GAM networks, which we refer to as cropping the chromosomes. Since our

ultimate objective is to design a GAM network that reduces the network size and improves generalization, we discourage at this stage the creation of single-point categories.
Figure 4-3: GGAM chromosome structure (level 1: categories w_1(p), w_2(p), …, w_N(p)(p); level 2: µ_j(p), σ_j(p), n_j(p), l_j(p))
Step 4 (More Details): In this step GGAM applies the GA to the population of trained GAMs.
Sub-step 4a (More Details): Calculate the fitness of each chromosome (trained GAM). This is accomplished by feeding the validation set into each trained GAM and by calculating the percentage of correct classification exhibited by each one of these trained GAM networks. In particular, if PCC(p) designates the percentage of correct classification exhibited by the p-th GAM, and this GAM network possesses N(p) nodes in its category representation layer, then its fitness function value is defined by:
$$Fit(p) = \frac{PCC(p)\,\bigl(Cat_{max} - N(p)\bigr)}{100 - PCC(p) + N(p) - Cat_{min} + \varepsilon}$$
where Cat_min and Cat_max are the minimum and maximum number of categories that a GAM network is allowed to have during the evolutionary process (Cat_min is chosen equal to 1, or equal to the number of classes in the classification problem under consideration, while Cat_max is chosen to be a relatively large number for the classification problem at hand). The constant ε in the denominator of the above equation is a small positive constant, needed to make sure that the denominator is not zero in the case when N(p) = Cat_min and PCC(p) = 100.
Sub-step 4b (More Details): Obvious; no further explanations are needed.
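The vigilance schedule of Step 1 and the fitness of Sub-step 4a can be sketched in a few lines of Python. This is a sketch, not the author's code. In particular, it assumes the reading of the fitness equation in which PCC(p)·(Cat_max − N(p)) is divided by a single denominator 100 − PCC(p) + N(p) − Cat_min + ε, which (without ε) vanishes only when PCC(p) = 100 and N(p) = Cat_min, as the text requires; the default `eps` value is a hypothetical choice.

```python
def vigilance_schedule(rho_min, rho_max, pop_size):
    """Baseline vigilance of each initial network:
    rho_i = rho_min + i * rho_inc, with rho_inc = (rho_max - rho_min) / (pop_size - 1)."""
    if pop_size < 2:
        raise ValueError("pop_size must be at least 2")
    rho_inc = (rho_max - rho_min) / (pop_size - 1)
    return [rho_min + i * rho_inc for i in range(pop_size)]


def fitness(pcc, n_cat, cat_min, cat_max, eps=1e-3):
    """Fitness of a chromosome with validation accuracy pcc (in %) and n_cat categories.

    Higher accuracy raises the numerator and shrinks the denominator; more
    categories lower the numerator and enlarge the denominator.
    eps keeps the denominator positive when pcc == 100 and n_cat == cat_min.
    """
    return pcc * (cat_max - n_cat) / (100.0 - pcc + n_cat - cat_min + eps)
```

With the GGAM settings ρ_min = 0.1, ρ_max = 0.75 and Pop_size = 20, `vigilance_schedule(0.1, 0.75, 20)` spaces the 20 baseline vigilances 0.65/19 ≈ 0.034 apart, from 0.1 up to 0.75. Both functions are pure, so the fitness of a whole generation is just a list comprehension over (PCC, N) pairs measured on the validation set.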

Sub-step 4c (More Details): The algorithm searches for the best NC_best chromosomes in the current generation and copies them to the temporary generation.
Sub-step 4d (More Details): The remaining Pop_size − NC_best chromosomes in the temporary generation are created by crossing over two parents from the current generation. The parents are chosen using the deterministic tournament selection method, as follows: randomly select two groups of four chromosomes each from the current generation, and use as a parent from each group the chromosome with the best fitness value in the group. If the same chromosome is chosen from both groups, then we choose from one of the groups the chromosome with the second-best fitness value. If two parents with indices p and p' are crossed over, two random numbers n and n' are generated from the index sets {1, 2, ..., N(p)} and {1, 2, ..., N(p')}, respectively. Then, all the categories with index greater than n in the chromosome with index p and all the categories with index less than n' in the chromosome with index p' are moved into an empty chromosome within the temporary generation. Notice that crossover is done on level 1 of the chromosome. This operation is pictorially illustrated in Figure 3-2.
Sub-step 4e (More Details): The operator Cat_add adds a new category to every chromosome created in Sub-step 4d with probability P(Cat_add). The new category has a mean vector µ, a standard deviation vector σ, a probability (number of encoded patterns) n, and a label l that are randomly generated as follows: for every dimension of the input feature space (M dimensions in total) we generate a random number uniformly distributed in the interval [0, 1]; each one of these numbers is chosen to be one of the components of µ. In a similar fashion, we choose the components of the standard deviation vector σ; however, σ's values are chosen in the interval [0.1, 0.9]. Furthermore, n is chosen to be a positive, uniformly distributed random real number, while the label of this newly created category is chosen

randomly amongst the N_b labels of the pattern classification task under consideration. A chromosome does not add a category if the addition of this category results in a number of categories for this chromosome that exceeds the designated maximum number of categories Cat_max.
Sub-step 4f (More Details): The operator Cat_del deletes one of the categories of every chromosome created in Sub-step 4e with probability P(Cat_del). A chromosome does not delete a category if the deletion of this category would cause the number of categories for this chromosome to fall below the designated minimum number of categories Cat_min.
Sub-Step 4g (More Details): In GGAM, every chromosome created by Sub-step 4f gets mutated as follows: with probability P(mut) every category is mutated. If a category is chosen, its mean vector µ or its standard deviation vector σ is selected randomly (50% probability). Then every component of this selected vector gets mutated by adding to it a small number, drawn from a Gaussian distribution with mean 0 and standard deviation …. If a component of the chosen vector becomes smaller than 0 or greater than 1 (after mutation), it is set back to 0 or 1, respectively. The probability n of the category is also mutated by adding a small number drawn from a Gaussian distribution, under the same rules as above. Notice that mutation is applied on level 2 of the chromosome structure, but the category labels are not mutated (the reason being that our initial GA population consists of trained GAMs, and consequently we have a lot of confidence in the labels of the categories that these trained GAMs have discovered through the GAM training process).
Step 5 (More Details): Obvious; no more details are needed.
GGAM Experiments and Results
We used the same default set of parameters used for GFAM to run all the experiments of GGAM, and the results were very good. Hence, in GGAM's case we avoided the

experimentation (applied to GFAM) needed to choose good default values for the GA parameters. GGAM is produced by first initializing a population of 20 trained GAM networks (trained with different values of the baseline vigilance parameter and different orders of training pattern presentation), and by evolving them for 500 generations. In particular, the GA parameters used for the creation of GGAM were: ρ_min = 0.1, ρ_max = 0.75, β = 0.1, Pop_size = 20, Gen_max = 500, NC_best = 3, Cat_min = 1, Cat_max = 300, P(Cat_add) = 0.1, P(Cat_del) = 0.1, P(mut) = 5/N. GGAM is the GAM network that attains the highest value of the fitness function at generation 500 of the evolutionary process.
GGAM Performance
In this section we report the performance of GGAM on each one of the database problems (described in Section …). Similar to that of GFAM, the performance of GGAM is assessed by reporting the size of GGAM and the accuracy attained by GGAM on the test set. The results are reported in Table 4-2. In Table 4-2, the first column is the index of the database that we are experimenting with. The second column is the actual database name, as reported in Section … and summarized in Table 3-3. Columns 3 and 4 contain the size and accuracy of the GGAM network for the designated database. The results in Table 4-2 support a number of observations about GGAM's performance. For instance, GGAM's performance on databases 1-12 (Gaussian datasets of known amount of overlap) is nearly optimal: the best performance on the G6c-40 problem (6-class Gaussian dataset with 40% overlap) is a classifier with 6 categories and 60% correct classification, and GGAM is a classifier with 6 categories and 59.43% correct classification. Similarly, in the CINS problem the optimal classifier would require 2 categories and attain 100% correct classification; GGAM is a 2-category classifier exhibiting 99.77% correct classification. Finally, two of the real problems reported here,

MOD-IRIS and PAGE, also gave very good results, attaining 94.83% and 95.02% correct classification, respectively, while creating only two categories.
Performance Comparisons of GGAM and other ART Networks
As was the case with GFAM, we compared GGAM's performance with the performance of the following networks: ssFAM, ssEAM, ssGAM, and safe micro-ARTMAP. The comparisons of GGAM and the aforementioned ART networks are depicted in Table 4-2, where the first column is the index of the database that we are experimenting with. The second column is the actual database name, as reported in earlier sections. Columns 3-10 of Table 4-2 contain the performance of the designated ART networks. The performance reported includes the accuracy of the best ART network on the test set, as well as the number of categories created by the designated ART network. The reported numbers of accuracy and size of the best network correspond to the ART network that attained the highest value of the fitness function (this value was computed based on the accuracy of the ART network on the cross-validation set, and on the size of the ART network). Note that for networks other than GGAM, the best ART network was determined after extensive experimentation with the ART network's parameter values (e.g., in ssFAM the best network was determined after training ssFAM networks with different values of the choice parameter, vigilance parameter, order of pattern presentation, and amount of mixture of labels allowed within a category; a total of 20,000 ssFAM networks were trained and their performance was examined). On the other hand, the performance of GGAM is the one calculated after the evolution of 20 trained GAM networks for 500 generations with the GA parameter values indicated in Section …. According to the results in Table 4-2, in all instances (with minor exceptions) the accuracy of GGAM (generalization performance) is higher than the accuracy of the other ART networks (ssFAM, ssEAM, ssGAM or safe micro-ARTMAP). According

to the results in Table 4-2, in all instances (with no exceptions) the size of GGAM is smaller than the size of the other ART networks (ssFAM, ssEAM, ssGAM or safe micro-ARTMAP), sometimes by as much as a factor of 12. For example, the generalization performance of GGAM can be as much as 15% better than that of ssFAM, while its size can be as much as 4 times smaller than the size of ssFAM. Also, the generalization performance of GGAM can be as much as 14% better than that of ssEAM, while its size can be as much as 4 times smaller. Furthermore, the generalization performance of GGAM can be as much as 8% better than that of ssGAM, while its size can be as much as 12 times smaller. Finally, the generalization performance of GGAM can be as much as 10% better than that of safe micro-ARTMAP, while its size can be as much as 4 times smaller.
The comparison results between GGAM and the other ART networks are also depicted pictorially in Figures 4-4a to 4-4d. Each one of these figures shows the accuracy and the size of GGAM and of one other ART network (e.g., ssEAM), so that the one-to-one comparison of GGAM and the other ART networks can be quickly assessed. What is most worth pointing out is that the better performance of GGAM is attained with reduced computations compared with the computations needed by the alternate methods (ssFAM, ssEAM, ssGAM, safe micro-ARTMAP).
Specifically, the performance attained by ssFAM, ssEAM, ssGAM and safe micro-ARTMAP required training these networks for a large number of network parameter settings (at least 20,000 experiments) and then choosing the network that achieved the highest value of the fitness function that we introduced earlier in Section …. Of course, one can argue that such extensive experimentation with these networks might not be needed, especially if one is familiar with the functionality of these

networks and chooses to experiment only with a limited set of network parameter values. However, the practitioner in the field might lack the expertise to carefully choose the network parameters to experiment with, and consequently might need to experiment extensively to come up with a good network. The computational complexity of GGAM is given by equations similar to those for the GFAM computational complexity, calculated in Appendix B. The comparison between the computational complexity of GFAM (and GGAM) and that of the rest of the ART networks is based on the assumption that extensive experimentation with the network parameters of ssFAM, ssEAM, ssGAM or safe micro-ARTMAP is needed to obtain a good performing ssFAM, ssEAM, ssGAM or safe micro-ARTMAP network, respectively.
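Sub-steps 4c-4h of the GGAM design (elitism, deterministic tournament selection, level-1 crossover, Cat_add, Cat_del and level-2 mutation) can be put together in a compact Python sketch. It illustrates the described procedure under assumptions and is not the author's implementation: fitness values are assumed to be computed beforehand on the validation set (Sub-step 4a); labels are numbered 0 to N_b − 1; the category probability n is drawn in (0, 1] when created and clipped to [0, 1] when mutated; the mutation noise level `noise_sd`, whose value is not given in this copy, is a hypothetical parameter; and an empty crossover child, a case the text does not address, is replaced by a copy of the first parent.

```python
import random
from dataclasses import dataclass, replace


@dataclass
class GaussCategory:
    """Level-2 gene of a GGAM chromosome (illustrative field names)."""
    mu: list       # mean vector
    sigma: list    # standard deviation vector
    n: float       # category probability (number of encoded patterns)
    label: int     # never mutated


def tournament_parents(pop, fit, rng, group=4):
    """Best of each of two random groups; runner-up of group 2 if both pick the same one."""
    rank1 = sorted(rng.sample(range(len(pop)), group), key=lambda i: fit[i], reverse=True)
    rank2 = sorted(rng.sample(range(len(pop)), group), key=lambda i: fit[i], reverse=True)
    a, b = rank1[0], rank2[0]
    if a == b:
        b = rank2[1]
    return pop[a], pop[b]


def crossover(pa, pb, rng):
    """Categories of pa with (1-based) index > n plus those of pb with index < n2."""
    n, n2 = rng.randint(1, len(pa)), rng.randint(1, len(pb))
    child = pa[n:] + pb[:n2 - 1]
    return child if child else list(pa)  # empty-child fallback is an assumption


def cat_add(chrom, rng, p_add, dim, n_labels, cat_max):
    """With probability p_add append a random category, unless Cat_max would be exceeded."""
    if rng.random() >= p_add or len(chrom) + 1 > cat_max:
        return chrom
    new = GaussCategory(
        mu=[rng.uniform(0.0, 1.0) for _ in range(dim)],
        sigma=[rng.uniform(0.1, 0.9) for _ in range(dim)],
        n=1.0 - rng.random(),            # uniform in (0, 1]
        label=rng.randrange(n_labels),
    )
    return chrom + [new]


def cat_del(chrom, rng, p_del, cat_min):
    """With probability p_del drop a random category, unless Cat_min would be violated."""
    if rng.random() >= p_del or len(chrom) - 1 < cat_min:
        return chrom
    k = rng.randrange(len(chrom))
    return chrom[:k] + chrom[k + 1:]


def clip01(x):
    return min(1.0, max(0.0, x))


def mutate(chrom, rng, p_mut, noise_sd):
    """With probability p_mut per category, perturb mu or sigma (50% each) and n."""
    out = []
    for c in chrom:
        if rng.random() < p_mut:
            which = "mu" if rng.random() < 0.5 else "sigma"
            vec = [clip01(x + rng.gauss(0.0, noise_sd)) for x in getattr(c, which)]
            c = replace(c, **{which: vec}, n=clip01(c.n + rng.gauss(0.0, noise_sd)))
        out.append(c)
    return out


def next_generation(pop, fit, rng, *, nc_best, p_add, p_del, p_mut, noise_sd,
                    dim, n_labels, cat_min, cat_max):
    """Sub-steps 4b-4h: elitism, then crossover + Cat_add + Cat_del + mutation."""
    elite = sorted(range(len(pop)), key=lambda i: fit[i], reverse=True)[:nc_best]
    temp = [pop[i] for i in elite]                 # 4c: copy the NC_best elite
    while len(temp) < len(pop):                    # 4d-4g: fill with offspring
        pa, pb = tournament_parents(pop, fit, rng)
        child = crossover(pa, pb, rng)
        child = cat_add(child, rng, p_add, dim, n_labels, cat_max)
        child = cat_del(child, rng, p_del, cat_min)
        temp.append(mutate(child, rng, p_mut, noise_sd))
    return temp                                    # 4h: becomes the current generation
```

With the GGAM settings reported above (Pop_size = 20, NC_best = 3, Cat_min = 1, Cat_max = 300, P(Cat_add) = P(Cat_del) = 0.1), calling `next_generation` once per generation for Gen_max = 500 generations, with the fitness recomputed on the validation set in between, mirrors Steps 4 and 5. The operators return new lists rather than editing parents in place, so the elite copied in Sub-step 4c is never altered by later sub-steps.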

Table 4-2: Accuracy and size results achieved by GGAM and other ART networks. Note: Safe µAM: safe micro-ARTMAP; FAM: Fuzzy ARTMAP; EAM: Ellipsoidal ARTMAP; GAM: Gaussian ARTMAP; ss*: semi-supervised version.
Columns: Database (index, name), GGAM, Safe µAM, ssFAM, ssEAM, ssGAM.
Rows: databases 1-12 (G2c, G4c and G6c Gaussian datasets, four overlap levels each), followed by Ci/Sq, Sq/Sq, Sq Ci/Sq, Ci/Sq/0.3:, Ci/Sq, Ci/Sq/20:30:, SqWN, Ci/SqWN, MOD-IRIS, ABALONE and PAGE.

Figure 4-4a: Performance and size comparison of GGAM vs. ssFAM
Figure 4-4b: Performance and size comparison of GGAM vs. ssEAM

Figure 4-4c: Performance and size comparison of GGAM vs. ssGAM
Figure 4-4d: Performance and size comparison of GGAM vs. micro-ARTMAP

4.2.2 Summary/Conclusions
In this section, we have introduced yet another method of solving the category proliferation problem in ART. This method relies on evolving a population of trained ART networks, and more specifically Gaussian ARTMAP (GAM) neural networks. The evolution of trained GAMs creates an ART network, referred to as GGAM. In chapter 3 we defined a methodology for evolving trained FAM networks, resulting in GFAM. This methodology was also applied successfully to the evolution of GAM networks, resulting in GGAM. In chapter 3 we experimented with a number of databases that helped us identify good default parameter settings for the evolution of FAM networks. The same parameters and settings used in chapter 3 for the evolution of FAM networks (GFAM) were also used for the evolution of GAM networks (GGAM). Our experiments indicate that GGAM is superior to a number of other ART techniques (ssFAM, ssEAM, ssGAM, safe micro-ARTMAP) that have been introduced in the literature to address the category proliferation problem in ART. More specifically, GGAM gave better generalization performance (in almost all problems tested) and a smaller network (in all problems tested), compared to these other ART techniques. It is also worth mentioning that GGAM outperformed these other ART techniques while requiring only a fraction of the computations needed by these other networks.

5. UNIVERSAL ART (UART)
In all our previous genetically engineered ART architectures we evolved only one type of ART architecture, such as FAM, EAM or GAM. For some classification problems it would be advantageous to be able to evolve a mixture of ART architectures, such as FAM and EAM, EAM and GAM, FAM and GAM, or FAM, EAM and GAM. The evolution of a mixture of ART architectures leads us to a genetically designed ART network that we call Universal ART (UART). The motivation behind the creation of UART can be presented through a number of examples. For some datasets one of the FAM, EAM, GAM architectures will produce the best results, while for another dataset a different architecture will do best. Furthermore, it could also be the case that GAM is better at describing the input patterns in one portion of the input space, while FAM does a better job in another portion of the input space (of the same dataset). These observations gave birth to the idea of UART, which does not determine a priori which category structure (hyper-rectangle, hyper-ellipsoid, Gaussian) could best represent the data in the various portions of the input pattern space. It is reasonable to think that a problem space might not be suitably covered by only one of the geometrical shapes mentioned above, and this could explain the extra nodes created, and the lack of accuracy attained, by a specific ART architecture. Figure 5-1 illustrates this point: it shows three different classification problems represented in a 2-dimensional space.

Figure 5-1: These figures show what happens when using unsuitable classifiers for a certain problem.
In Figure 5-1 (a), we see a 2-class problem that is represented by two rectangles; the geometrical shapes created by FAM are best suited to solve it, and it would be wasteful to use EAM. On the other hand, Figure 5-1 (b) shows an example of a 2-class problem for which the geometrical shapes created by EAM are best suited, and it would be wasteful to use FAM. Figure 5-1 (c) shows a similar scenario when using GAM to classify the two-rectangle problem (GAM categories are not really circles, but they are depicted here as such to illustrate the point). Furthermore, there might be classification problems where no single ART classifier is the best choice in every portion of the input space (see Figure 5-2).
Figure 5-2: A classification problem where the boundaries cannot be optimally covered by FAM, GAM or EAM categories.
For this problem a genetically engineered ART architecture, such as UART, might perform well because it has the ability to create three different geometrical structures in the input space (see Figure 5-3). It is expected that if the input space is covered by the minimum

possible number of correct structures by an ART neural network, the generalization performance of the network on unseen data might improve too.
Figure 5-3: Using UART to solve the problem in Figure 5-2. Notice that parts of the problem space are not covered, but because UART encourages smaller size, it might sacrifice a little accuracy to get an optimal size.
5.1 UART Design
UART can operate in three distinct phases: the training phase, the geometry selection phase (or genetic phase) and the performance phase. In the training phase of UART, the user defines the number of networks the system should generate, referred to as Pop_size. The system then creates Pop_size trained ART networks, half of them FAM and the other half EAM networks. These FAM and EAM networks are generated by using different values of the baseline vigilance parameter and different orders of training pattern presentation, as we did for GFAM and GEAM. For the training of this initial population of Pop_size ART networks, a list of input pattern/output label pairs, i.e., {(I^1, O(I^1)), …, (I^r, O(I^r)), …, (I^PT, O(I^PT))}, is repeatedly presented to the FAM/EAM network until it learns the required mapping. Training is over when a user-defined maximum number of list presentations is reached. After creating a trained FAM or EAM network, UART converts it into a chromosome and saves it in the FAM/EAM network container of the UART architecture. A pictorial illustration of the UART architecture in its training phase, consisting of two independently operating FAM and EAM architectures and the associated FAM/EAM chromosome container, is shown in Figure 5-4.

In the geometry selection phase, the goal is to find an ART network (UART network) which contains the best types and the smallest number of ART categories (rectangles, ellipsoids, or a combination of the two) and achieves good generalization. This UART network is found by starting from an initial population of Pop_size UART networks and by applying the GA to this initial population, in the same way that we applied it to produce GFAM and GEAM (see earlier sections). The distinct and important difference between the initial populations of FAMs and EAMs that we started with then and the initial population of UARTs that we start with here is that each chromosome in the initial population of FAMs or EAMs contained categories of a single geometrical structure (rectangles or ellipsoids). Now, each chromosome in the initial population of UARTs represents an ART network whose categories are a mixture of rectangles and ellipsoids. Each member of this initial population chooses the rectangles and ellipsoids contained in its chromosome randomly from the population of rectangles and ellipsoids stored in the UART container during UART's training phase. It is also important to mention that in the geometry selection phase (or genetic phase) UART is called upon to calculate its output for each input pattern in the validation data, i.e., UART is called upon to operate in the performance mode. The steps that UART goes through to produce an output label for an input pattern presented at its input during UART's performance mode are included below. The UART architecture, when it operates in the performance mode, is different from the UART architecture when it operates in the training mode (see Figure 5-5). The UART architecture, in its performance phase, consists of three main layers. These are: the input layer (U_1), the category representation layer (U_2), and the output layer (U_2^b).
The input layer of UART has 2M nodes. All of them (nodes 1 through 2M) are connected to the U_2 nodes that represent a FAM category, while only the nodes 1 to M are connected to those categories in the U_2 layer that represent an EAM category (remember that EAM does not require complement encoding).
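The wiring between U_1 and U_2 can be illustrated with a short Python sketch: the pattern a is complement coded into I = (a, 1 − a), FAM nodes see all 2M components, and EAM nodes see only the first M components (the original pattern a). The function names are illustrative and do not come from the source.

```python
def complement_code(a):
    """I = (a, a^c) with a^c_i = 1 - a_i; a has M components in [0, 1]."""
    return list(a) + [1.0 - x for x in a]


def inputs_seen_by(category_type, I):
    """Part of the 2M-node input layer U_1 that feeds a U_2 node of the given type."""
    M = len(I) // 2
    if category_type == "FAM":
        return I          # nodes 1..2M: complement-coded input
    if category_type == "EAM":
        return I[:M]      # nodes 1..M: EAM needs no complement coding
    raise ValueError(f"unknown category type: {category_type!r}")
```

Because each pair (a_i, 1 − a_i) sums to 1, every complement-coded input has the same L1 norm M, which is the normalization that complement coding provides for the FAM categories.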

During the performance phase, an input pattern a is fed to U_1 in its complement-coded version, I = (a, a^c), which occupies the 2M nodes of U_1, so that a occupies the first M nodes of U_1. Layer U_2 of UART represents all the categories that the UART network possesses, hence the name category representation layer. This layer could have all of its nodes as FAM categories, all as EAM categories, or a mixture of both (these nodes represent categories that were randomly chosen from the mixture of FAM/EAM categories stored in the FAM/EAM container of UART's architecture at the end of its training phase). The nodes in the category representation layer are connected to the nodes in the U_2^b layer, as shown in Figure 5-5. Finally, the output layer (layer U_2^b) is the layer that produces the outputs of the network. Every node in the output layer of UART represents one of the labels of the pattern recognition task. The index k (1 ≤ k ≤ N_b) designates a generic node in U_2^b; N_b represents the highest index needed to represent all the labels of the pattern classification task at hand. The performance steps of UART are delineated below, and then the genetic phase of UART is explained in more detail, as was done for GFAM and GEAM.
Performance Phase of UART
This phase is similar to those of FAM or EAM. The process can be summarized in the following steps:
1. Present an input pattern (from the validation or test set) to the UART network.
2. Calculate the CCF, corresponding to this input pattern, for all the nodes in the U_2 layer according to the following equations:
a. For a FAM category:
$$T_j(I) = \frac{M - s(w_j) - dis(I, w_j)}{\beta + M - s(w_j)}$$
b. For an EAM category:
$$T_j(I) = \frac{D - s(w_j) - dis(I, w_j)}{\beta + D - s(w_j)}$$
Find the node J that has the maximum CCF value.

3. Check the label of this node J. This will be the predicted label of the UART network for this input pattern.
4. If more patterns are still in the list (validation or test set), present the next input pattern to the UART network. Otherwise, the performance phase is completed.
Through this sequence of four steps UART is able to produce predictions for all the input patterns of the validation set (during UART's genetic phase) or for all the input patterns of the test set (during UART's performance phase, for the UART chosen as having the best fitness value at the last generation of the genetic phase).
Figure 5-4: A simple UART structural diagram during the training phase (layers F_1, F_2 and F_2^b; input I = (a, a^c) with 2M components)
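Steps 2 and 3 of the UART performance phase can be sketched in Python as follows. The FAM part uses the standard hyper-rectangle geometry of Fuzzy ARTMAP: for a category with corners u ≤ v, s(w_j) is the sum of the side lengths and dis(I, w_j) is the L1 distance from a to the rectangle. The EAM part assumes the usual Ellipsoid ARTMAP geometry (center m_j, unit direction d_j, axis ratio µ_j, radius R_j, with s(w_j) = 2R_j and dis(I, w_j) = max(‖a − m_j‖_C − R_j, 0)); since those definitions appear earlier in the dissertation rather than in this section, treat them as assumptions. The dictionary keys are hypothetical.

```python
import math


def fam_ccf(a, u, v, beta):
    """FAM category = hyper-rectangle [u, v]: T = (M - s - dis) / (beta + M - s)."""
    M = len(a)
    s = sum(vi - ui for ui, vi in zip(u, v))                     # rectangle size
    dis = sum(max(ui - ai, 0.0) + max(ai - vi, 0.0)              # L1 distance to box
              for ai, ui, vi in zip(a, u, v))
    return (M - s - dis) / (beta + M - s)


def eam_ccf(a, m, d, mu, R, beta, D):
    """EAM category = hyper-ellipsoid: T = (D - s - dis) / (beta + D - s).

    Assumed geometry: ||x||_C^2 = (|x|^2 - (1 - mu^2) (d . x)^2) / mu^2 with d a
    unit vector, s = 2R and dis = max(||a - m||_C - R, 0).
    """
    diff = [ai - mi for ai, mi in zip(a, m)]
    proj = sum(di * xi for di, xi in zip(d, diff))
    q = (sum(x * x for x in diff) - (1.0 - mu * mu) * proj * proj) / (mu * mu)
    dis = max(math.sqrt(max(q, 0.0)) - R, 0.0)
    s = 2.0 * R
    return (D - s - dis) / (beta + D - s)


def uart_predict(a, categories, beta, D):
    """Evaluate the CCF of every U_2 node, pick the maximum node J, return its label."""
    best_label, best_t = None, -math.inf
    for c in categories:
        if c["type"] == "FAM":
            t = fam_ccf(a, c["u"], c["v"], beta)
        else:  # "EAM"
            t = eam_ccf(a, c["m"], c["d"], c["mu"], c["R"], beta, D)
        if t > best_t:
            best_label, best_t = c["label"], t
    return best_label
```

The FAM branch works on the raw pattern a and the rectangle corners, which is algebraically the same as the complement-coded choice function |I ∧ w_j| / (β + |w_j|); the values of β and D are passed in explicitly so that the same function serves both the validation pass of the genetic phase and the final test pass.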

Figure 5-5: A simple UART structural diagram during the performance phase

Geometry Selection Phase (Genetic Phase) of UART
As was the case for GFAM and GEAM, we start with an initial population of UARTs that we evolve. The GA parameters used for the evolution of FAM and EAM networks are also used for UART networks. This process consists of the following steps:
Step 1: A chromosome in the initial population of Pop_size UARTs contains FAM and EAM categories randomly chosen from the list of FAM and EAM categories created during the training phase of UART and contained in the FAM/EAM container (mixture module). At this stage all one-point categories are cropped.
Step 2: Evolve the chromosomes of the current generation by executing the following sub-steps:
Sub-Step 2a: Calculate the fitness of all chromosomes of the current generation.
Sub-Step 2b: Initialize an empty generation (referred to as the temporary generation).

Sub-Step 2c: Move the best (NC_best) chromosomes from the current generation to the temporary generation.
Sub-Step 2d: Select chromosomes for crossover from the current generation and thus further populate the temporary generation.
Sub-Step 2e: With probability P(Cat_add) apply the Cat_add operator on every individual generated in sub-step 2d.
Sub-Step 2f: With probability P(Cat_del) apply the Cat_del operator on every individual generated in sub-step 2e.
Sub-Step 2g: With probability P(Mut) apply the mutation operator on every individual generated in sub-step 2f.
Sub-Step 2h: Replace the current generation with the members of the temporary generation.
Step 3: If evolution has reached the maximum number of iterations Gen_max, calculate the performance of the best-fitness UART network on the test set and report the classification accuracy and the number of categories that this best-fitness UART network possesses. If the maximum number of iterations has not been reached yet, go to Step 2 to evolve one more population of chromosomes.
Each one of the aforementioned steps of the algorithm is now described in more detail, as needed.
Step 1 (More Details): We use a real-number representation to encode the UART networks. Each UART chromosome consists of two levels: level 1 contains all the categories of the UART network, and level 2 contains the following components: two generic vectors u and v, an integer l for the label, a double µ for the axis ratio, a double r for the radius, and an integer t for the type of the category. In the case of a FAM category, the generic vectors are the vectors u and v used to represent the endpoints of the hyper-rectangle, l

represents the label, r and µ are not used, and t is equal to 1; otherwise (i.e., for an EAM category) the generic vectors are the center m and the direction d, l represents the label, r encodes the radius, µ encodes the axis ratio, and t is equal to 0 (see Figure 5-6). We denote the j-th category of the trained UART network with index p (1 ≤ p ≤ Pop_size) by $w_j(p)$, where $w_j(p) = w_j^{FAM}(p) = (u_j(p), (v_j(p))^c)$ or $w_j(p) = w_j^{EAM}(p) = (m_j(p), d_j(p), r_j(p))$, and the label of this category by $l_j(p)$, for $1 \le j \le N(p)$. In this step we also eliminate single-point categories in the trained FAM and EAM networks, which we refer to as cropping the chromosomes; this is done by deleting FAM categories of size zero and EAM categories of radius zero. Since our ultimate objective is to design a UART network that reduces the network size and improves generalization, we discourage the creation of single-point categories at this stage. We also randomly redistribute the categories amongst the chromosomes; this step is necessary to eliminate any advantage of a FAM network over an EAM network, or vice versa. This is done by putting all the categories of all chromosomes in one group and then reassigning to each chromosome a randomly chosen set of categories.
[Figure 5-6: level 1 holds the categories w_1(p), w_2(p), ..., w_{N(p)}(p) of chromosome p; at level 2 each category is either an EAM category (u = m(p), v = d(p), r = r(p), µ = µ(p), l = l(p), t = 0) or a FAM category (u = u(p), v = v(p), r = NA, µ = NA, l = l(p), t = 1).]
Figure 5-6: UART chromosome structure
Step 2 (More Details): In this step UART applies the GA to the population of UARTs.
Sub-step 2a (More Details): Using the steps described in the performance phase above, calculate the fitness of each chromosome. This is accomplished by converting each chromosome into a UART network, then feeding into it the validation set, and by

calculating the percentage of correct classification exhibited by each one of these UART networks. In particular, if PCC(p) designates the percentage of correct classification exhibited by the p-th UART, and this UART network possesses N(p) nodes in its category representation layer, then its fitness function value is defined by:
$$Fit(p) = \frac{(Ct_{max} - N(p))\,PCC(p)}{(100 - PCC(p))\,(N(p) - Ct_{min}) + \varepsilon}$$
where Ct_min and Ct_max are the minimum and maximum number of categories that a network is allowed to have during the evolutionary process (Ct_min is chosen equal to 1, or equal to the number of classes in the classification problem under consideration, while Ct_max is chosen to be a relatively large number for the classification problem at hand). The constant ε in the denominator of the above equation is a small positive constant; it is needed to make sure that the denominator is not zero when N(p) = Ct_min and PCC(p) = 100.
Sub-step 2b (More Details): Obvious; no further explanation is needed.
Sub-step 2c (More Details): The algorithm searches for the best NC_best chromosomes of the current generation and copies them to the temporary generation.
Sub-step 2d (More Details): The remaining Pop_size - NC_best chromosomes in the temporary generation are created by crossing over two parents from the current generation. The parents are chosen using the deterministic tournament selection method, as follows: randomly select two groups of four chromosomes each from the current generation, and use as a parent from each group the chromosome with the best fitness value in the group. If the same chromosome is chosen from both groups, then we choose from one of the groups the chromosome with the second best fitness value. If two parents with indices p and p' are crossed over, two random numbers n and n' are generated from the index sets {1, 2, ..., N(p)} and

{1, 2, ..., N(p')}, respectively. Then, all the categories with index greater than n in the chromosome with index p and all the categories with index less than n' in the chromosome with index p' are moved into an empty chromosome within the temporary generation. Notice that crossover is done on level 1 of the chromosome. This operation is pictorially illustrated in Figure 5-7.
Figure 5-7: Crossover implementation
Sub-step 2e (More Details): The operator Cat_add adds a new category to every chromosome created in sub-step 2d with probability P(Cat_add). With probability 0.5 the new category is chosen to be a FAM category, and otherwise an EAM category. If the chosen category is a FAM category, the category gets lower and upper endpoints u, v that are randomly generated as follows: for every dimension of the input feature space (M dimensions in total) we generate two random numbers uniformly distributed in the interval [0, 1]; the larger of these numbers is assigned to the v coordinate along this dimension, while the smaller is assigned to the u coordinate along this dimension. If the chosen category is an EAM category, then the category has a center m, direction vector d, radius r, and axis ratio µ that are randomly generated as follows: for every dimension of the input feature space (M dimensions in total) we generate a random number uniformly distributed in the interval [0, 1] and assign these values to the individual components of m; the direction vector d is assigned zeros, and the axis ratio µ and the radius r are given positive real random

numbers in the interval [0, 1]. The label of this newly created category is chosen randomly amongst the N_b labels of the pattern classification task under consideration. A chromosome does not add a category if the addition of this category results in a number of categories for this chromosome that exceeds the designated maximum number of categories Ct_max.
Sub-step 2f (More Details): The operator Cat_del deletes one of the categories of every chromosome created in sub-step 2e with probability P(Cat_del). A chromosome does not delete a category if the deletion of this category would make the number of categories for this chromosome fall below the designated minimum number of categories Ct_min.
Sub-step 2g (More Details): In UART, every chromosome created by sub-step 2f is mutated as follows: with probability P(mut) every category is mutated. If a FAM category is chosen, its u or v endpoint is selected randomly (50% probability), and then every component of the selected vector is mutated by adding to it a small number drawn from a Gaussian distribution with mean 0 and a small standard deviation. If a component of the chosen vector becomes smaller than 0 or greater than 1 after mutation, it is set back to 0 or 1, respectively. If an EAM category is chosen, its center m or direction d is selected randomly (50% probability), and then every component of the selected vector is mutated by adding to it a small number drawn from a Gaussian distribution with mean 0 and a small standard deviation. If a component of the chosen vector becomes smaller than 0 or greater than 1 after mutation, it is set back to 0 or 1, respectively. Furthermore, its axis ratio µ or its radius r is selected (50% probability), and we also add to the selected item a small number drawn from a Gaussian distribution, with the same rules as above. Notice that mutation is applied on level 2 of the chromosome structure, but the label of a category is not mutated (the reason being that our initial GA population consists of trained FAMs and

EAMs, and consequently we have a lot of confidence in the labels of the categories that these trained networks have discovered through their training process).
Step 3 (More Details): Obvious; no more details are needed.

5.2 Results of UART
We used the same default set of parameters used for GFAM to run all the experiments of UART, with one modification, and the results were excellent. Hence, in UART's case we skipped the experimentation (applied to GFAM) for choosing good default values of the GA parameters. UART is produced by first initializing a population of 20 trained FAM and EAM networks (trained with different values of the baseline vigilance parameter and different orders of training pattern presentation), and by evolving them for 500 generations. In particular, the GA parameters used for the creation of UART were: ρ_min = 0.1, ρ_max = 0.75, β = 0.1, Pop_size = 20, Gen_max = 500, NC_best = 3, Ct_min = 1, Ct_max = 300, P(Cat_add) = 0.1, P(Cat_del) = 0.1, P(mut) = 1/N. UART is the network that attains the highest value of the fitness function at generation 500 of the evolutionary process.

UART Performance
In this section we report the performance of UART on each one of the database problems described earlier. As for GFAM, the performance of UART is assessed by reporting the size and the accuracy attained by UART on the test set. The results are reported in Table 5-1. In Table 5-1, the first column is the index of the database that we are experimenting with, and the second column is the actual database name, as reported earlier and summarized in Table 3-3. Columns 3 and 4 contain the size and accuracy of the UART network for the designated database. Several observations follow from the results in Table 5-1. For instance, UART's performance on databases 1-12 (Gaussian datasets of known amount of overlap) is

nearly optimal; for example, the best performance on the G6c-40 problem (6-class Gaussian dataset of 40% overlap) is attained by a classifier with 6 categories and 60% correct classification, and UART is a classifier with 6 categories and 59.83% correct classification. Similarly, in the CINS problem the optimal classifier would require 2 categories and attain 100% correct classification; UART is a 2-category classifier exhibiting 99.37% correct classification. Finally, UART also gave very good results on all of the real problems reported here, MOD-IRIS, ABALONE and PAGE: 94.88%, 64.1% and 95.73% correct classification, respectively, while creating only 2, 3 and 3 categories, respectively.

Performance Comparisons of UART and other ART Networks
As was the case with GFAM, we compared UART's performance with the performance of the following networks: ssFAM, ssEAM, ssGAM, and safe micro-ARTMAP. The comparisons of UART and the aforementioned ART networks are depicted in Table 5-1, where the first column is the index of the database that we are experimenting with and the second column is the actual database name, as reported in earlier sections. Columns 3-10 of Table 5-1 contain the performance of the designated ART networks: the accuracy of the best ART network on the test set and the number of categories created by that network. The reported accuracy and size of the best network correspond to the ART network that attained the highest value of the fitness function (this value was computed based on the accuracy of the ART network on the cross-validation set and on the size of the ART network). Note that for networks other than UART, the best ART network was determined after extensive experimentation with the network's parameter values (e.g., in ssFAM the best network was determined after training ssFAM networks with different values of the choice parameter, vigilance parameter, order of pattern presentation, and

amount of mixture of labels allowed within a category; a total of 20,000 ssFAM networks were trained and their performance was examined). On the other hand, the performance of UART is the one calculated after the evolution of 20 trained FAM and EAM networks for 500 generations, with the GA parameters listed in Section 5.2. According to the results in Table 5-1, in all instances (with minor exceptions) the accuracy (generalization performance) of UART is higher than the accuracy of the other ART networks (ssFAM, ssEAM, ssGAM or safe micro-ARTMAP). Also according to Table 5-1, in all instances (with no exceptions) the size of UART is smaller than the size of the other ART networks, sometimes even by a factor of 15. For example, the generalization performance of UART can be as much as 12% better than that of ssFAM, while its size can be 4 times smaller than the size of ssFAM. Also, the generalization performance of UART can be as much as 14% better than that of ssEAM, while its size can be 4 times smaller than the size of ssEAM. Furthermore, the generalization performance of UART can be as much as 9% better than that of ssGAM, while its size can be 17 times smaller than the size of ssGAM. Finally, the generalization performance of UART can be as much as 9% better than that of safe micro-ARTMAP, while its size can be 4 times smaller than the size of safe micro-ARTMAP. The comparison results between UART and the other ART networks are also depicted in Figures 5-8 to 5-8d. Each of these figures shows the accuracy and the size of UART together with the accuracy and the size of one other ART network (e.g., ssEAM), so that the one-to-one comparison of UART with each of the other ART networks can be quickly assessed.

Most worth pointing out is that the better performance of UART is attained with fewer computations than those needed by the alternative methods (ssFAM, ssEAM, ssGAM, safe micro-ARTMAP). Specifically, the performance attained by ssFAM, ssEAM, ssGAM and safe micro-ARTMAP required training these networks for a large number of network parameter settings (at least 20,000 experiments) and then choosing the network that achieved the highest value of the fitness function introduced earlier. Of course, one can argue that such extensive experimentation with these networks might not be needed, especially if one is familiar with the functionality of these networks and chooses to experiment only with a limited set of network parameter values. However, a practitioner in the field might lack the expertise to carefully choose the network parameters to experiment with, and consequently might need to experiment extensively to come up with a good network. The computational complexity of UART is given by equations similar to those of the GFAM computational complexity, calculated in Appendix B. The comparison between the computational complexity of GFAM (and UART) and that of the other ART networks is based on the assumption that extensive experimentation with the network parameters of ssFAM, ssEAM, ssGAM or safe micro-ARTMAP is needed to obtain a well-performing ssFAM, ssEAM, ssGAM or safe micro-ARTMAP network, respectively.
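To make the cost argument concrete: the genetic phase described earlier in this chapter evaluates the fitness of Pop_size chromosomes per generation, i.e., roughly Pop_size × Gen_max validation passes (20 × 500 with the default parameters above), instead of training tens of thousands of networks. The Python sketch below illustrates Sub-steps 2a-2h under several assumptions: a chromosome is a plain list of category dictionaries (t = 1 for FAM, t = 0 for EAM, as in Figure 5-6), the fitness function is supplied by the caller, the mutation standard deviation `sigma` is a caller-supplied placeholder because its value is not given here, P(mut) = 1/N is read as one over the chromosome's number of categories, and an empty crossover offspring falls back to a copy of the first parent. All function names are hypothetical.

```python
import copy
import random


def random_category(M, n_labels, rng):
    """Sub-step 2e: a random FAM (t = 1) or EAM (t = 0) category, 50/50."""
    label = rng.randrange(n_labels)  # label drawn among the N_b labels
    if rng.random() < 0.5:
        pairs = [sorted((rng.random(), rng.random())) for _ in range(M)]
        return {"t": 1, "u": [p[0] for p in pairs], "v": [p[1] for p in pairs],
                "label": label}
    return {"t": 0, "m": [rng.random() for _ in range(M)], "d": [0.0] * M,
            "r": rng.random(), "mu": rng.random(), "label": label}


def tournament(pop, fits, rng, group=4):
    """Sub-step 2d: best chromosome of each of two random groups of four."""
    g1 = rng.sample(range(len(pop)), group)
    g2 = sorted(rng.sample(range(len(pop)), group), key=lambda i: fits[i], reverse=True)
    p1 = max(g1, key=lambda i: fits[i])
    p2 = g2[0] if g2[0] != p1 else g2[1]  # same parent twice: take second best
    return pop[p1], pop[p2]


def crossover(c1, c2, rng):
    """Level-1 crossover: categories with index > n in c1 and index < n' in c2."""
    n, n2 = rng.randint(1, len(c1)), rng.randint(1, len(c2))
    child = copy.deepcopy(c1[n:] + c2[:n2 - 1])
    return child if child else copy.deepcopy(c1)  # assumption: never empty


def clip01(x):
    return min(1.0, max(0.0, x))


def mutate(chrom, p_mut, sigma, rng):
    """Sub-step 2g: level-2 Gaussian mutation clipped to [0, 1]; labels untouched.
    Note: u > v may occur after FAM mutation; the sketch leaves that unchanged."""
    for c in chrom:
        if rng.random() >= p_mut:
            continue
        vec = rng.choice(("u", "v")) if c["t"] == 1 else rng.choice(("m", "d"))
        c[vec] = [clip01(x + rng.gauss(0.0, sigma)) for x in c[vec]]
        if c["t"] == 0:  # EAM: also perturb either mu or r
            s = rng.choice(("mu", "r"))
            c[s] = clip01(c[s] + rng.gauss(0.0, sigma))


def evolve(population, fitness, M, n_labels, sigma, gen_max=500, nc_best=3,
           ct_min=1, ct_max=300, p_add=0.1, p_del=0.1, seed=0):
    """Returns the best chromosome and the number of fitness evaluations."""
    rng = random.Random(seed)
    pop = [copy.deepcopy(c) for c in population]
    evaluations = 0
    for _ in range(gen_max):
        fits = [fitness(c) for c in pop]  # 2a: one validation pass per chromosome
        evaluations += len(pop)
        order = sorted(range(len(pop)), key=lambda i: fits[i], reverse=True)
        temp = [copy.deepcopy(pop[i]) for i in order[:nc_best]]  # 2b-2c: elitism
        while len(temp) < len(pop):
            child = crossover(*tournament(pop, fits, rng), rng)  # 2d
            if rng.random() < p_add and len(child) < ct_max:  # 2e
                child.append(random_category(M, n_labels, rng))
            if rng.random() < p_del and len(child) > ct_min:  # 2f
                child.pop(rng.randrange(len(child)))
            mutate(child, 1.0 / len(child), sigma, rng)  # 2g
            temp.append(child)
        pop = temp  # 2h
    fits = [fitness(c) for c in pop]  # final generation
    evaluations += len(pop)
    best = max(range(len(pop)), key=lambda i: fits[i])
    return pop[best], evaluations
```

For instance, running `evolve` on a population of 20 chromosomes with `gen_max=5` performs 120 fitness evaluations (six generations of 20), which is the quantity that grows as Pop_size × Gen_max in the complexity comparison.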

Table 5-1: UART performance and size compared to other ART architectures
[Columns: Database, Name, UART, Safe µARTMAP, ssFAM, ssEAM, ssGAM. Rows: the Gaussian databases G2c, G4c and G6c (four variants each, databases 1-12), the structure-within-structure databases (Ci/Sq, Sq/Sq, SqWN, Ci/SqWN and variants), MOD-IRIS, ABALONE and PAGE.]

Figure 5-8: Performance and size comparison of UART vs ssFAM
Figure 5-8b: Performance and size comparison of UART vs ssEAM

Figure 5-8c: Performance and size comparison of UART vs ssGAM
Figure 5-8d: Performance and size comparison of UART vs safe micro-ARTMAP

5.2.3 Performance Comparisons of UART and other Genetic ART Networks
In this section we compare the size and accuracy of UART with the other three genetic ART modules, namely GFAM, GEAM and GGAM. Table 5-2 shows the results of all of the genetic modules on all the databases presented earlier (one-to-one comparison figures are also depicted in Figures 5-9 to 5-9c). We used the same method to choose the best genetic ART network in all of the genetic modules. A quick look at Table 5-2 shows that on the Gaussian databases all of the genetic ART modules performed very well, with minor differences. On the structure-within-structure databases, however, the performance differed. Table 5-2 shows that GFAM gave better accuracy on databases 14, 15, 18 and 21, while GEAM and GGAM gave better accuracy on databases 13, 16, 17, 19 and 20. Investigating further, we find that GFAM gave better accuracy when the problem did not contain a circle, whereas GEAM and GGAM gave better results on the problems that contain circles; this was the reason behind the development of UART. UART's performance compares very well to the best of the three on all problems. For example, UART gave 99.37% and 99.3% for databases 16 and 17, respectively, with only 2 categories in each case, which is very close to the 99.99% with 2 categories attained by GEAM and GGAM on the same databases. On the other hand, UART gave 98.5% accuracy on database 15 with only 7 categories, while GFAM gave 97.2% with 7 categories. The bottom line is that UART performed very well on all the databases described earlier.

Table 5-2: UART performance and size compared to other genetic ART architectures
[Columns: Database, Name, UART, GFAM, GEAM, GGAM. Rows: the Gaussian databases G2c, G4c and G6c (four variants each, databases 1-12), the structure-within-structure databases (Ci/Sq, Sq/Sq, SqWN, Ci/SqWN and variants), MOD-IRIS, ABALONE and PAGE.]

Figure 5-9: Performance and size comparison of UART vs GFAM
Figure 5-9b: Performance and size comparison of UART vs GEAM


Figure 5-9c: Performance and size comparison of UART vs GGAM

5.3 UART Summary
UART is a novel approach that mixes different types of ART categories to obtain a better coverage of the input space. UART uses two types of categories, namely FAM categories and EAM categories, to enhance its performance, and it uses genetic algorithms to solve the category proliferation problem in ART. This method relies on evolving a population of trained ART networks, more specifically Ellipsoidal ARTMAP (EAM) and Fuzzy ARTMAP (FAM) neural networks. The evolution of trained FAMs and EAMs creates an ART network referred to as UART. In Chapter 3 we defined a methodology for evolving trained FAM networks, resulting in GFAM. This methodology was also applied successfully to the evolution of a mixture of FAM and EAM networks, resulting in UART. In Chapter 3 we experimented with a number of databases that

helped us identify good default parameter settings for the evolution of FAM networks. The same parameters and settings used in Chapter 3 for the evolution of FAM networks (GFAM) were also used for the evolution of FAM and EAM networks (UART). Our experiments with UART indicate that UART is superior to a number of other ART techniques (ssFAM, ssEAM, ssGAM, safe micro-ARTMAP) that have been introduced in the literature to address the category proliferation problem in ART. More specifically, compared to these other ART techniques, UART gave better generalization performance (in almost all problems tested) and a smaller network size (in all problems tested). It is also worth mentioning that UART outperformed these other ART techniques while requiring only a fraction of the computations they need. Based on a simple comparison, UART also outperformed the other genetic ART modules introduced in Chapters 3 and 4 on many databases and gave almost as good results on the rest of the databases.

6. ANALYSIS
In UART two types of categories compete for the attention of an input pattern. In Georgiopoulos et al., a very comprehensive analysis was presented that explained the order according to which categories are chosen in Fuzzy ARTMAP. We repeat the analysis here for the case where a category in the category representation layer of UART is either an ellipsoid (EAM category) or a rectangle (FAM category). To simplify the analysis we assume that the ellipsoid is a circle (a special case of an ellipsoid). We also compare the computational complexity associated with GFAM with the computational complexity associated with the other ART algorithms that we have experimented with in the previous sections. This comparison shows analytically that the superior performance of GFAM is attained with fewer computations than those needed by the other ART architectures.

6.1 UART Order of Search Analysis
During the training phase of UART, the order according to which categories are accessed depends on the type of categories considered, the closeness of an input pattern to the categories, the size of the categories, and the value of the choice parameter β. Before we present a number of theorems that explain the order according to which categories are accessed, we state a few assumptions and definitions.
Assumptions: Only circles, which are a special case of an ellipsoid, are considered here. The D parameter of EAM is equal to M (the dimensionality of the input patterns).
Definitions: The size of a FAM category R is
$$size(R) = |R| = |v - u|. \qquad (6\text{-}1)$$
The size of an EAM category E is
$$size(E) = 2r, \qquad (6\text{-}2)$$
where r is the radius of E.

Theorem 1: If an input pattern I is presented to a FAM category R and an EAM category E, and I lies inside both categories, then I will be represented by E iff size(E) < size(R), and will be represented by R iff size(E) > size(R) (i.e., by the category with the smaller size).
Figure 6-1: Case #1, pattern inside both a FAM category and an EAM category
Theorem 1 Proof: Let us consider the first case, where I will be represented by E iff size(E) < size(R); this implies that $2r < |R|$. Let us start by calculating the CCF of both categories:
$$T_E = \frac{D - r - \max\{r, |I - m|_C\}}{D - 2r + \beta}, \qquad (6\text{-}3)$$
where $|I - m|_C$ is here the Euclidean distance between two points, since we are dealing with circles (i.e., $|I - m|_C = \frac{1}{\mu}|I - m|_2$ with $\mu = 1$), and β is a small positive number, and
$$T_F = \frac{|I \wedge w^{old}|}{|w^{old}| + \beta}. \qquad (6\text{-}4)$$
Since the pattern is inside both categories, $\max\{r, |I - m|_C\} = r$ and $|I \wedge w^{old}| = |w^{old}|$.

And so
$$T_E = \frac{D - r - \max\{r, |I - m|_C\}}{D - 2r + \beta} = \frac{M - 2r}{M - 2r + \beta} \qquad (6\text{-}5)$$
and
$$T_F = \frac{|I \wedge w^{old}|}{|w^{old}| + \beta} = \frac{|w^{old}|}{|w^{old}| + \beta} = \frac{M - |R|}{M - |R| + \beta}. \qquad (6\text{-}7)$$
If E is to be chosen before R, then $T_E$ should be greater than $T_F$. Indeed,
$$\frac{M - 2r}{M - 2r + \beta} > \frac{M - |R|}{M - |R| + \beta},$$
because $M - 2r > M - |R|$ and the function
$$f(x) = \frac{x}{x + \beta} \qquad (6\text{-}8)$$
is an increasing function of its argument x; hence the result is proven. The proof of the reverse scenario is straightforward and is omitted.
Theorem 2: If an input pattern I is presented to a FAM category R and an EAM category E, and I lies inside R but outside E, then I will be represented by E iff $d_E < g(\beta)$, where $d_E = dis(I, E)$,
$$g(\beta) = b - \frac{a(b + \beta)}{a + \beta}, \qquad a = M - |R|, \qquad b = M - 2r.$$
Figure 6-2: Case #2, pattern inside a FAM category but outside an EAM category

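Theorem 2 Proof: Let us start by calculating the CCF of both categories as in (6-3) and (6-4), respectively:
$$T_E = \frac{D - r - \max\{r, |I - m|_C\}}{D - 2r + \beta}, \qquad T_F = \frac{|I \wedge w^{old}|}{|w^{old}| + \beta}.$$
Since the pattern is inside R and outside E, $\max\{r, |I - m|_C\} = |I - m|_C$ and $|I \wedge w^{old}| = |w^{old}|$. Consequently, with $d_E = |I - m|_2 - r$ the distance of I from E,
$$T_E = \frac{D - r - |I - m|_C}{D - 2r + \beta} = \frac{D - r - |I - m|_2}{D - 2r + \beta} = \frac{M - 2r - d_E}{M - 2r + \beta} \qquad (6\text{-}9)$$
and
$$T_F = \frac{|I \wedge w^{old}|}{|w^{old}| + \beta} = \frac{|w^{old}|}{|w^{old}| + \beta} = \frac{M - |R|}{M - |R| + \beta}. \qquad (6\text{-}10)$$
By substituting a and b in the above equations we get
$$T_E = \frac{b - d_E}{b + \beta} \qquad (6\text{-}11)$$
and
$$T_F = \frac{a}{a + \beta}. \qquad (6\text{-}12)$$
If E is to represent the input pattern, then $T_E$ should be greater than $T_F$; hence
$$\frac{b - d_E}{b + \beta} > \frac{a}{a + \beta}, \qquad (6\text{-}13)$$
which implies that
$$d_E < b - \frac{a(b + \beta)}{a + \beta} = g(\beta).$$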
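The closed form of Theorem 2 can be sanity-checked numerically. The short Python sketch below is our own illustration, not part of the original analysis: it compares $T_E$ and $T_F$ directly for a pattern inside R and outside E, confirms that E wins exactly when $d_E < g(\beta)$, and evaluates g near β → 0 and β → ∞, the limits used later in Results 1b and 2b. The values M = 2, |R| = 0.6 and r = 0.1 are arbitrary choices satisfying 2r < |R|, and the function names are hypothetical.

```python
def t_e(b, d_e, beta):
    """EAM CCF with D = M, written with b = M - 2r, as in (6-9)/(6-11)."""
    return (b - d_e) / (b + beta)


def t_f(a, d_f, beta):
    """FAM CCF with a = M - |R|; d_f = 0 when I lies inside R, as in (6-12)."""
    return (a - d_f) / (a + beta)


def g(a, b, beta):
    """Threshold of Theorem 2; algebraically equal to beta * (b - a) / (a + beta)."""
    return b - a * (b + beta) / (a + beta)


def e_wins_direct(a, b, d_e, beta):
    # Pattern inside R (d_F = 0) but outside E: compare the CCF values directly
    return t_e(b, d_e, beta) > t_f(a, 0.0, beta)


def e_wins_theorem2(a, b, d_e, beta):
    return d_e < g(a, b, beta)


# Illustrative values (arbitrary): M = 2, |R| = 0.6, r = 0.1, so a = 1.4, b = 1.8
M, R_size, r = 2.0, 0.6, 0.1
a, b = M - R_size, M - 2 * r
```

Writing g as β(b - a)/(a + β) makes both limits visible at once: it vanishes as β → 0 and approaches b - a = |R| - 2r as β → ∞.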

Theorem 3: If an input pattern I is presented to a FAM category R and an EAM category E, and I lies inside E but outside R, then I will be represented by R iff $d_F < k(\beta)$, where $d_F = dis(I, R)$,
$$k(\beta) = a - \frac{b(a + \beta)}{b + \beta}, \qquad a = M - |R|, \qquad b = M - 2r.$$
Figure 6-3: Case #3, pattern inside an EAM category but outside a FAM category
Theorem 3 Proof: Let us start by calculating the CCF of both categories as in (6-3) and (6-4), respectively:
$$T_E = \frac{D - r - \max\{r, |I - m|_C\}}{D - 2r + \beta}, \qquad T_F = \frac{|I \wedge w^{old}|}{|w^{old}| + \beta}.$$
Since the pattern is inside E, $\max\{r, |I - m|_C\} = r$. Furthermore, since the pattern is outside R, $|I \wedge w^{old}| = |w^{new}| = M - |R^{new}| = M - |R| - d_F$. Consequently,

$$T_E = \frac{D - r - \max\{r, |I - m|_C\}}{D - 2r + \beta} = \frac{M - 2r}{M - 2r + \beta} \qquad (6\text{-}16)$$
and
$$T_F = \frac{|I \wedge w^{old}|}{|w^{old}| + \beta} = \frac{M - |R| - d_F}{M - |R| + \beta}. \qquad (6\text{-}17)$$
By substituting a and b in (6-16) and (6-17) we get
$$T_E = \frac{b}{b + \beta} \qquad (6\text{-}18)$$
and
$$T_F = \frac{a - d_F}{a + \beta}.$$
If R is to represent the pattern, then $T_F$ should be greater than $T_E$; hence
$$\frac{a - d_F}{a + \beta} > \frac{b}{b + \beta},$$
which implies that
$$d_F < a - \frac{b(a + \beta)}{b + \beta}. \qquad (6\text{-}20)$$
Theorem 4: If an input pattern I is presented to a FAM category R and an EAM category E, and I lies outside both R and E, then I will be represented by E iff $d_E < h(\beta)$, and will be represented by R iff $d_F < l(\beta)$, where $d_E = dis(I, E)$, $d_F = dis(I, R)$,
$$h(\beta) = b - \frac{(a - d_F)(b + \beta)}{a + \beta}, \qquad l(\beta) = a - \frac{(b - d_E)(a + \beta)}{b + \beta}, \qquad a = M - |R|, \qquad b = M - 2r.$$

Figure 6-4: Case #4, pattern outside both the FAM and the EAM category
Theorem 4 Proof: Let us start by calculating the CCF of both categories as in (6-3) and (6-4), respectively:
$$T_E = \frac{D - r - \max\{r, |I - m|_C\}}{D - 2r + \beta}, \qquad T_F = \frac{|I \wedge w^{old}|}{|w^{old}| + \beta}.$$
Since the pattern is outside both categories, $\max\{r, |I - m|_C\} = |I - m|_C$ and $|I \wedge w^{old}| = |w^{new}| = M - |R^{new}| = M - |R| - d_F$. Consequently,
$$T_E = \frac{D - r - |I - m|_C}{D - 2r + \beta} = \frac{D - r - |I - m|_2}{D - 2r + \beta} = \frac{M - 2r - d_E}{M - 2r + \beta} \qquad (6\text{-}21)$$
and
$$T_F = \frac{|I \wedge w^{old}|}{|w^{old}| + \beta} = \frac{M - |R| - d_F}{M - |R| + \beta}. \qquad (6\text{-}22)$$
By substituting a and b in (6-21) and (6-22) we get

$$T_E = \frac{b - d_E}{b + \beta} \qquad (6\text{-}23)$$
and
$$T_F = \frac{a - d_F}{a + \beta}.$$
If E is to represent the input pattern, then $T_E$ should be greater than $T_F$; hence
$$\frac{b - d_E}{b + \beta} > \frac{a - d_F}{a + \beta},$$
which implies that
$$d_E < b - \frac{(a - d_F)(b + \beta)}{a + \beta}. \qquad (6\text{-}25)$$
The second case is obvious.
We now present three results for three general ranges of β, namely: very small values of β (close to zero), very large values of β (close to infinity), and intermediate values of β (any value between 0 and infinity). To produce these results we rely on the four theorems proven above.
Result 1: If an input pattern is presented to a UART architecture with a very small β (close to zero), then, between a FAM category R and an EAM category E, the input pattern will make the following choices:
a. If the pattern is inside both categories, the pattern will be represented by the category with the smaller size.
b. If the pattern is inside the FAM category R and outside the EAM category E, the pattern will be represented by R.
c. If the pattern is inside the EAM category E and outside the FAM category R, the pattern will be represented by E.

d. If the pattern is outside both categories, then it will be represented by E iff $d_E < d_F \frac{b}{a}$; as a special case, if the size of R is equal to the size of E, then the pattern will be represented by the category that is closer to the pattern (i.e., the category whose distance to the pattern is smallest).
Proof:
Result 1a: Theorem 1 states that if the pattern is inside both categories it will be represented by the category with the smaller size, regardless of the value of β.
Result 1b: Theorem 2 states that the pattern will be represented by E iff $d_E < g(\beta)$, but $\lim_{\beta \to 0} g(\beta) = 0$, and since $d_E$ can never be less than zero, R will represent the pattern.
Result 1c: Theorem 3 states that the pattern will be represented by R iff $d_F < k(\beta)$, but $\lim_{\beta \to 0} k(\beta) = 0$, and since $d_F$ can never be less than zero, E will represent the pattern.
Result 1d: Theorem 4 states that the pattern will be represented by E iff $d_E < h(\beta)$, but $\lim_{\beta \to 0} h(\beta) = \frac{d_F\, b}{a} = d_F \frac{M - 2r}{M - |R|}$. Furthermore, if $2r = |R|$, then $\lim_{\beta \to 0} h(\beta)$ becomes $d_F$.
Result 2: If an input pattern is presented to a UART architecture with a very large β (close to infinity), then, between a FAM category R and an EAM category E, the input pattern will make the following choices:
a. If the pattern is inside both categories, the pattern will be represented by the category with the smaller size.

b. If the pattern is inside the FAM category R and outside the EAM category E, the pattern will be represented by E iff $d_E < b - a = |R| - 2r$, or equivalently $d_E + 2r < |R|$.
c. If the pattern is inside the EAM category E and outside the FAM category R, the pattern will be represented by R iff $d_F < a - b = 2r - |R|$, or equivalently $d_F + |R| < 2r$.
d. If the pattern is outside both categories, then it will be represented by E iff $d_E < b - a + d_F$, or equivalently $d_E + 2r < d_F + |R|$. As a special case, if the size of R is equal to the size of E, then the pattern will be represented by the category that is closer to the pattern (i.e., the category whose distance to the pattern is smallest).
Proof:
Result 2a: Theorem 1 states that the pattern, if inside both categories, will be represented by the one with the smaller size, regardless of the value of β.
Result 2b: Theorem 2 states that the pattern will be represented by E iff $d_E < g(\beta)$, but $\lim_{\beta \to \infty} g(\beta) = b - a$. Equivalently, this rule says that the pattern will be represented by E iff $d_E + 2r < |R|$. This rule reinforces the well-known fact that ART architectures prefer structures of smaller size.
Result 2c: Theorem 3 states that the pattern will be represented by R iff $d_F < k(\beta)$, but $\lim_{\beta \to \infty} k(\beta) = a - b$. Equivalently, this rule says that the pattern will be represented by R iff $d_F + |R| < 2r$. This rule reinforces the well-known fact that ART architectures prefer structures of smaller size.
Result 2d: Theorem 4 states that the pattern will be represented by E iff $d_E < h(\beta)$, but $\lim_{\beta \to \infty} h(\beta) = b - a + d_F$. Equivalently, this rule says that the pattern will be represented by E iff $d_E + 2r < d_F + |R|$. This rule reinforces the well-known

fact that ART architectures prefer structures of smaller size. Note that if the size of E is equal to the size of R, then $2r = |R|$, and consequently E will be chosen over R if and only if the distance of the pattern from E is smaller than its distance from R. This statement also reinforces the common knowledge that in ART the distances of input patterns from the ART structures matter (the smaller the distance of a pattern from a structure, the more likely it is for this structure to represent the input pattern).
Result 3: If an input pattern is presented to a UART architecture with intermediate β values (i.e., $0 < \beta < \infty$), then, between a FAM category R and an EAM category E, the input pattern will make the following choices:
a. If the pattern is inside both categories, the pattern will be represented by the category with the smaller size.
b. If the pattern is inside the FAM category R and outside the EAM category E, the pattern will be represented by E iff $d_E < g(\beta) = b - \frac{a(b + \beta)}{a + \beta}$, where $g(\beta)$ is a non-decreasing function of β and $0 < g(\beta) < b - a$ (assuming that size(E) < size(R)).
c. If the pattern is inside the EAM category E and outside the FAM category R, the pattern will be represented by R iff $d_F < k(\beta) = a - \frac{b(a + \beta)}{b + \beta}$, where $k(\beta)$ is a non-decreasing function of β and $0 < k(\beta) < a - b$ (assuming that size(E) > size(R)).

d. If the pattern is outside both categories, then it will be represented by E iff $d_E < h(\beta) = b - \frac{(a - d_F)(b+\beta)}{a+\beta}$. Here $h(\beta)$ is a non-decreasing function of β and $\frac{d_F b}{a} < h(\beta) < b - a + d_F$ (assuming that $\text{size}(E) < \text{size}(R)$).

Proof:
Result 3a: Theorem 1 states that the pattern, if inside both categories, will be represented by the one with the smaller size regardless of the value of β.
Result 3b: Theorem 2 states that the pattern will be represented by E iff $d_E < g(\beta)$. To prove that $g(\beta)$ is a non-decreasing function, we take its derivative: $\frac{dg(\beta)}{d\beta} = \frac{a(b-a)}{(a+\beta)^2}$. This is always positive, given the assumption $\text{size}(E) < \text{size}(R) \Rightarrow b > a$ and the fact that $a > 0$ and $b > 0$. Since $\lim_{\beta \to 0} g(\beta) = 0$ and $\lim_{\beta \to \infty} g(\beta) = b - a$, it follows that $0 < g(\beta) < b - a$.
Result 3c: Theorem 3 states that the pattern will be represented by R iff $d_F < k(\beta)$. To prove that $k(\beta)$ is a non-decreasing function, we take its derivative: $\frac{dk(\beta)}{d\beta} = \frac{b(a-b)}{(b+\beta)^2}$. This is always positive, given the assumption $\text{size}(E) > \text{size}(R) \Rightarrow b < a$. Since $\lim_{\beta \to 0} k(\beta) = 0$ and $\lim_{\beta \to \infty} k(\beta) = a - b$, it follows that $0 < k(\beta) < a - b$.
Result 3d: Theorem 4 states that the pattern will be represented by E iff $d_E < h(\beta)$. To prove that $h(\beta)$ is a non-decreasing function, we take the derivative of $h(\beta)$ as

follows: $\frac{dh(\beta)}{d\beta} = \frac{(b-a)(a-d_F)}{(a+\beta)^2}$. This is always positive, given the assumption $\text{size}(E) < \text{size}(R) \Rightarrow b > a$ and the fact that $a > d_F$. Since $\lim_{\beta \to 0} h(\beta) = \frac{d_F b}{a}$ and $\lim_{\beta \to \infty} h(\beta) = b - a + d_F$, it follows that $\frac{d_F b}{a} < h(\beta) < b - a + d_F$.

6.2 Time Complexity Analysis

The process of finding an optimal or even suboptimal solution can be very lengthy and complicated. In this research we used a GA to obtain this solution.

In previous research we used a simpler method. We trained the network a large number of times, starting from an initial value for every parameter of the network and incrementing each parameter by a fixed step, in an effort to exhaust all the combinations allowed by those steps.

Both approaches gave good results, although the GFAM approach found better solutions in most cases. Let us assume that $T_{train}$ of a FAM network is $O(N^3)$ and $T_{test}$ is $O(N^2)$, where N is the number of nodes. GFAM works roughly as follows:

    For 1 to Pop_size                      // # of chromosomes in population
        Adjust parameters
        Train FAM network
    For 1 to Gen_max                       // # of generations in run
        For 1 to Pop_size
            Apply GA operators             // T_operator, O(N)
            Decode chromosome to FAM       // T_decode, O(N)

            Test FAM                       // T_test
            Encode FAM to chromosome       // T_encode, O(N)

Hence, the time complexity of GFAM is roughly

$Pop_{size} \cdot T_{train} + Gen_{max} \cdot Pop_{size} \cdot (T_{operator} + T_{decode} + T_{test} + T_{encode})$

$Pop_{size}$ and $Gen_{max}$ are variables. We also assume that they are proportional to N: $Pop_{size} = c_8 N$ and $Gen_{max} = c_9 N$. Substituting the individual complexities gives the GFAM time as

$c_8 N \cdot T_{train} + c_9 N \cdot c_8 N \cdot (N + N + T_{test} + N)$   (6-26)

Substituting the values of $T_{train}$ and $T_{test}$ gives

$c_8 N \cdot c_4 N^3 + c_9 N \cdot c_8 N \cdot (N + N + c_7 N^2 + N)$

which can be simplified as

$c_{10} N^4 + c_{11} N^3 \approx c_{12} N^4$   (6-27)

which is clearly $O(N^4)$.

For the exhaustive search (linear increment of parameters) approach, the algorithm is roughly as follows:

    For 1 to P1                            // P1: number of increments of parameter 1
        Increment parameter 1
        For 1 to P2                        // P2: number of increments of parameter 2
            Increment parameter 2
            ...
                For 1 to Pn                // Pn: number of increments of parameter n
                                           // (n is the number of parameters of the classifier)
                    Increment parameter n

                    Train FAM
                    Test FAM

In the case of ssFAM we have four parameters (epsilon, order, rho and alpha). So there are four nested FOR statements, and the exhaustive search time is

$P_1 \cdot P_2 \cdot P_3 \cdot P_4 \cdot (T_{train} + T_{test})$   (6-28)

Since $P_1, \ldots, P_4$ all depend on the number of increments, we assume them to be proportional to N. Then (6-28) becomes

$c_{p1} N \cdot c_{p2} N \cdot c_{p3} N \cdot c_{p4} N \cdot (T_{train} + T_{test}) = c_p N^4 (T_{train} + T_{test})$   (6-29)

where $c_p, c_{p1}, \ldots, c_{p4}$ are all constants. Substituting the values of $T_{train}$ and $T_{test}$ into (6-29) gives

$c_p N^4 (c_4 N^3 + c_7 N^2) = c_{13} N^7 + c_{14} N^6$   (6-30)

which is $O(N^7)$.

Equations (6-27) and (6-30) show a large difference between the time complexity of GFAM and that of the linear increment of parameters approach. The above analysis is only a rough estimate, meant to show the difference between the two approaches.

Here we considered only ssFAM, which happens to have four parameters to change. Other networks could have five or more parameters, so their time complexity would be even worse than that of ssFAM. The GFAM framework, in contrast, is not affected by the number of parameters.

We also assumed that the number of nodes N is constant, which is absolutely not the case. In the training process N grows from 1 to a large number, while in the testing process N is constant. It is also important to note that in the GFAM approach N tends to get smaller and smaller as the generation count increases

(since we optimize on it), until it converges to a very small number. This leads to even shorter times when testing the chromosomes (FAM networks). One could argue that the number of increments for each parameter in the linear approach could be chosen to be small. This is true, but it defeats the purpose of the approach, which was meant to try as many parameter combinations as possible to get the best performing network. The larger the number of increments of every parameter, the better the chance of finding an optimal or suboptimal solution.
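Equations (6-26) to (6-30) can be illustrated numerically by evaluating the two cost models directly. This is a minimal sketch of our own, not code from the GFAM UI. It sets every constant ($c_4, c_7, c_8, c_9, c_{p1}, \ldots, c_{p4}$) to 1 purely for illustration, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative cost models for Section 6.2 (hypothetical helper names).
// All constants (c4, c7, c8, c9, cp1..cp4) are set to 1 for simplicity.
using Cost = std::uint64_t;

Cost trainCost(Cost n) { return n * n * n; }  // T_train ~ N^3
Cost testCost(Cost n) { return n * n; }       // T_test ~ N^2
Cost linearCost(Cost n) { return n; }         // T_operator, T_decode, T_encode ~ N

// GFAM: Pop_size * T_train + Gen_max * Pop_size * (T_op + T_dec + T_test + T_enc),
// with Pop_size = Gen_max = N.
Cost gfamCost(Cost n) {
    const Cost popSize = n, genMax = n;
    const Cost perChromosome = linearCost(n) + linearCost(n) + testCost(n) + linearCost(n);
    return popSize * trainCost(n) + genMax * popSize * perChromosome;
}

// Exhaustive search: numParams nested loops of N increments each, with one
// train + test per parameter combination (numParams = 4 for ssFAM).
Cost exhaustiveCost(Cost n, unsigned numParams = 4) {
    assert(n > 0);
    Cost combinations = 1;
    for (unsigned p = 0; p < numParams; ++p) combinations *= n;
    return combinations * (trainCost(n) + testCost(n));
}
```

For N = 10 this gives 23,000 cost units for GFAM against 11,000,000 for the exhaustive search. The ratio grows roughly like $N^3$, as the $O(N^4)$ versus $O(N^7)$ estimates predict, and each additional parameter multiplies the exhaustive cost by another factor of N.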

7. USER INTERFACE

It would have been difficult to achieve any of the results in this dissertation without a custom-built, easy-to-use, flexible and scalable user interface. In this chapter we present four user interface programs, similar in concept but different in functionality, that were used to achieve the goals of this work. GFAM UI, GEAM UI, GGAM UI and UART UI were developed and tested to find the best GFAM, GEAM, GGAM and UART networks, respectively. All four programs were written in C++ using the Borland C++Builder 6 software development kit. The user manual is presented in Appendix E.

7.1 GFAM User Interface

The GFAM UI is a Windows program with the following capabilities:
Build a generation of trained FAM networks with any number of individuals.
Select the training, validation and testing files.
Use different parameters to train the networks.
Define the genetic parameters and run the genetic operators.
Display the categories of any chromosome along with the testing or validation patterns.
Display the classification borders of any chromosome.
Manually delete one or all of the categories of a specific chromosome.
Manually add categories to any chromosome.

The user can run the GA for a specific number of generations without stopping, define a number of generations after which the program pauses and then continues on command, or step one generation at a time. The program shows the results of the best network whenever it stops, and also shows the results of individuals when they are displayed.

The program logs a variable amount of information to an automatically named file. In the directory where the program resides it looks for a file called GFAM0.csv; if it finds it, it then looks for GFAM1.csv, and so on, until it looks for a file called GFAMn.csv (where n is an integer) and does not find it;

it then creates this file and logs the data to it. The logged data can be customized by the user. Figure 7-1 shows the GFAM interface program.

Figure 7-1: GFAM user interface

7.1.1 GFAM Controls

We now define the controls on the window of Figure 7-1.
# Features: The user enters the number of features in the problem at hand.
Training File: The user enters the path of the file that contains the training data.
Validation File: The user enters the path of the file that contains the validation data.
Testing File: The user enters the path of the file that contains the testing data.
Browse: These three buttons open an Open File dialogue that allows the user to select the file from its location.


Figure 7-2: An Open dialogue window that allows the user to select the training, validation and testing files

Min RHO: Used to train the first FAM network in a generation, and also used to determine the RHO increment for the following networks. The default is 0.1.
Max RHO: Used to train the last FAM network in a generation, and also used to determine the RHO increment for the following networks. The default is
Beta: Used to set the FAM network β parameter. The default is 0.1.
RHO: Displays the ρ parameter of the current network or chromosome.
PopSize: Determines the number of chromosomes in a single generation. The default is 20.
CatMin: The minimum number of categories a chromosome is allowed to have; used in the fitness function calculations. The default is 1.
CatMax: The maximum number of categories a chromosome is allowed to have; used in the fitness function calculations. The default is 300.
GenMax: The maximum number of generations the program processes before it stops. The default is 500.
Mutate (check box): If checked, the program uses mutation in its calculations. Checked by default.

AddCat (check box): If checked, the program uses the CatAdd operator in its calculations. Checked by default.
DelCat (check box): If checked, the program uses the CatDel operator in its calculations. Checked by default.
Random Ptrns Order (check box): If checked, the program presents the patterns in random order when training the networks of the initial generation. Checked by default.
Cat Mut Prob (x / # of Cats): A positive integer used to calculate the probability of category mutation. This number divided by the number of categories in the chromosome is the probability of mutation, which is never allowed to exceed 1.0. The default number is 5.
Add Cat Prob: The probability of adding a category to a specific chromosome. The default is 0.1.
Del Cat Prob: The probability of deleting a category from a specific chromosome. The default is 0.1.
# of Elites: The number of chromosomes (usually the best ones in a generation) that are transferred unmodified from one generation to the next. The default is 3.
Interrupt Every (check box): If checked, the program stops after running a specific number of generations, which the user enters in the accompanying edit box. Checked by default, with the interval set to 50 generations.
Current Generation: Displays the number of the current generation.
Log Number of Categories (check box): If checked, the program logs the number of categories of every chromosome in every generation. Not checked by default.

Log Individual PCC (check box): If checked, the program logs the accuracy (on the validation set) of every chromosome in every generation. Not checked by default.
Log Individual Fitness (check box): If checked, the program logs the value of the fitness function of every chromosome in every generation. Not checked by default.
Create Init Pop (Button): When clicked, the program trains Pop_size FAM networks.
Optimize (Button): When clicked, the program starts the genetic optimization process. If the Interrupt Every check box is not checked, the process runs until GenMax is reached. If it is checked, the program stops every 50 generations (the default interval) and displays the results of the best chromosome (see, e.g., Figure 7-3).

Figure 7-3: A dialogue box that displays the results after an interruption of the process

If the user clicks the No button, the program resumes execution; otherwise it stops at the current generation. If the Optimize button is clicked again, the program resumes until it reaches GenMax or is interrupted again.
Next Gen (Button): Runs the optimization process one generation at a time.
Finish (Button): Performs some cleanup and finishes logging information to the file.

Draw Chroms (Button): Shows the categories that every chromosome has so far, so it can be used to monitor the progress of the process and to see how well each chromosome is doing. Figure 7-4 shows the outcome when this button is clicked. The button displays categories only in 2D: if a problem has more than 2 features, only the first 2 components of each vector are displayed. The graph window that appears when the user clicks this button also has additional functionality, explained later.

Figure 7-4: A 2D graph that displays the data points as well as the categories
Figure 7-5: Same graph as in Figure 7-4 but displaying the categories only

Get Boundaries (Button): Lets the user see visually how the network classifies the input space. This feature works only for 2D problems. Figure 7-6 shows the outcome when this button is clicked. The program generates a grid of patterns with a 0.01 increment on both the x and y axes, feeds them to the network, and colors each pattern according to the class the network assigns it. In the figure two colors were needed since this was a two-class problem.

Figure 7-6: A 2D graph displaying the classification borders of this GFAM as well as the categories

Run on Testing File (Button): Runs the performance phase on the selected testing file.
Clear GFAM (Button): Clears the memory so that another problem can be examined.

These are the controls of the main window. The graphing window (Figure 7-4) has some further functionality, described below.

Draw Cats Only (Check Box): If checked, the window graphs only the categories and not the patterns; otherwise, both the categories and the patterns (of the validation set) are displayed.

Previous GFAM (Chromosome) (Button): Displays the content of the previous network.
Next GFAM (Chromosome) (Button): Displays the content of the next network.
Status Bar: Shows the chromosome's accuracy and number of categories at the bottom of the window.
Del All (Button): Deletes all the categories of the current chromosome. This is used together with manual addition of categories, to see how added categories affect the classification accuracy and boundaries.
Del Cat: Deletes only one category from the current chromosome.
Add Cat: Opens a dialogue box with a few controls that let the user manually add a category to the current chromosome. Figure 7-9 explains this feature.

After clicking the Del All button:

Figure 7-7: After pushing the Del All button, the GFAM does not have any more categories


Figure 7-8: This figure shows Figure 7-7, but displaying only the categories (none in this case)

After clicking the Add Cat button:

Figure 7-9: An add category dialogue box

Here point 1 is the u vector and point 2 is the v vector. The Ok button adds the category to the chromosome:

Figure 7-10: The same dialogue as in Figure 7-9, after filling in some values


After clicking the Ok button:

Figure 7-11: This figure shows the manually added category

The boundaries now look as follows:

Figure 7-12: This figure shows the classification borders of the manually added category

The accuracy here was 50% because there are only two classes and the patterns are divided equally between them. With a single category, every pattern is assigned that category's class, so only the patterns of that class are classified correctly. After adding another category:


Figure 7-13: This figure shows the values of the endpoints of the second manually added category
Figure 7-14: This figure shows the two manually added categories
Figure 7-15: This figure shows the classification borders of the manually added categories
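The Get Boundaries feature, and the 50% accuracy obtained with a single manually added category, can be illustrated with a small sketch. It assumes the standard Fuzzy ARTMAP choice function with complement coding, $T_j = |I \wedge w_j| / (\beta + |w_j|)$ with $I = (x, 1 - x)$ and $w_j = (u_j, 1 - v_j)$. It also assumes winner-take-all labelling without a vigilance check. The GFAM code's own choice computation (calc_ccf) may differ in detail. BoxCategory, choiceValue, classify and boundaryGrid are hypothetical names, not identifiers from the GFAM UI.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D FAM category: rectangle with lower corner u ("point 1")
// and upper corner v ("point 2"), as entered in the Add Cat dialogue.
struct BoxCategory {
    double u[2];
    double v[2];
    int label;
};

// Standard Fuzzy ARTMAP choice value with complement coding (assumed):
// T = |I ^ w| / (beta + |w|), ^ = component-wise minimum, |.| = L1 norm.
double choiceValue(const BoxCategory& c, const double x[2], double beta) {
    double match = 0.0, weight = 0.0;
    for (int i = 0; i < 2; ++i) {
        match += std::min(x[i], c.u[i]) + std::min(1.0 - x[i], 1.0 - c.v[i]);
        weight += c.u[i] + (1.0 - c.v[i]);
    }
    return match / (beta + weight);
}

// Label of the category with the largest choice value (first one on ties).
int classify(const std::vector<BoxCategory>& cats, const double x[2], double beta) {
    assert(!cats.empty());
    std::size_t best = 0;
    double bestT = choiceValue(cats[0], x, beta);
    for (std::size_t j = 1; j < cats.size(); ++j) {
        const double t = choiceValue(cats[j], x, beta);
        if (t > bestT) { bestT = t; best = j; }
    }
    return cats[best].label;
}

// Classify the grid (0,0), (0,0.01), ..., (0,1), (0.01,0), ..., (1,1):
// 101 x 101 points, first coordinate in the outer loop. Integer steps are
// scaled by 0.01 to avoid accumulating rounding error.
std::vector<int> boundaryGrid(const std::vector<BoxCategory>& cats, double beta) {
    std::vector<int> labels;
    labels.reserve(101 * 101);
    for (int i = 0; i <= 100; ++i) {
        for (int j = 0; j <= 100; ++j) {
            const double x[2] = {i * 0.01, j * 0.01};
            labels.push_back(classify(cats, x, beta));
        }
    }
    return labels;
}
```

With only one category, every grid point (and every validation pattern) receives that category's label. This is why a two-class problem with evenly split patterns stays at 50% accuracy until a second category of the other class is added.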

7.1.2 GFAM UI Abstract Design

GFAM UI consists of 7 main objects:
GFAMForm object: The main window object, which shows most of the controls and handles most of the user interaction.
GraphForm object: The graph window. It displays categories and classification boundaries, and allows manual deletion and addition of categories for any chromosome.
FNode object: Represents a category in a FAM network.
PtrnNode object: Represents an input pattern.
Chrom object: Represents a chromosome, a collection of categories encoded in a special format.
Cat object: Represents a FAM category after encoding.
AddCatForm object: The form that allows the user to enter the properties of a category.

We now present these seven objects in more detail.

GFAMForm Object

This is the main window object, and it has most of the controls listed above. Its functionality is briefly described below:
Validates the inputs.
Creates the trained FAMs and stores them.
Optimizes the generation of chromosomes to get the best GFAM.
Displays the results and prepares the data for the graphing object.

The most important methods of this object are:
openfile: Opens the files, reads the data, creates PtrnNode objects and stores them in vectors.
initpopclick: Creates the FAM parameters and starts the loop that creates the FAMs.
buildfam: Trains a FAM network.

crop: Deletes all the categories that encode only one pattern.
decode: Takes a chromosome as input and converts it to a FAM network.
optimize: Optimizes the initial generation of trained FAMs.
crossover: Performs the crossover process as described above.
addcat: Adds a category to a given chromosome.
delcat: Deletes a category from a given chromosome.
mutation: Mutates a given chromosome as described earlier.
PrepBoundaries: Creates a list of patterns forming a 2D grid with the values (0,0), (0,0.01), ..., (0,1), (0.01,0), ..., (0.01,1), ..., (1,1). It also initializes some other variables for the GraphForm object to use.
drawcats: Initializes some variables and calls the GraphForm object.
showboundaries: Initializes some variables and calls the GraphForm object.
~GFAMForm: Destructor function.

The most important properties (variables) of this object are:
A vector of FNode objects: Stores FNode objects as a FAM network.
A vector of strings: Stores the names of the classes in a problem.
A vector of PtrnNode objects: Stores the training patterns.
A vector of PtrnNode objects: Stores the testing patterns.
A vector of Chrom object pointers: Stores pointers to all the created chromosomes in a specific generation.
A vector of Chrom object pointers: Stores pointers to all the chromosomes in a temporary generation.
A vector of integers: Stores the predicted labels after running the performance phase.
An integer: Stores the number of created categories for a specific network.

These variables and many others are used in this object. Since a detailed design of this UI is not the purpose of this dissertation, the details are not shown here.

GraphForm Object

This object is much simpler than the GFAMForm object. It shows the categories, the validation patterns and the classification boundaries of a specific chromosome. It also lets the user manually add and delete categories from the chromosome. Its main methods and variables are:
pbpaint: Displays the categories along with the validation patterns, or the grid of automatically generated patterns along with the categories (the classification boundaries). This is one of the most difficult functions in this UI, since the origin of the graphing object has to be converted manually so that the (0,0) point is the bottom left corner.
nextclick: Displays the graph of the next chromosome.
previousclick: Displays the graph of the previous chromosome.
addcclick: Calls the AddCatForm object so that the user can enter the information of the new category.
delallcat: Invokes the Del function of the Chrom object, which deletes all the categories of that chromosome.
~GraphForm: Destructor function.

The main variables of this object are:
An integer c: The index of the current chromosome.
An array of Point object pointers: Stores the points that are displayed in the graphing area.

A pointer to a GFAMForm object: Gives the GraphForm object access to many functions and variables of the main window.
A boolean bounds: Tells the object whether the categories or the classification boundaries are to be displayed.

FNode Object

This object represents a FAM category. Its main methods and variables are listed below.
FNode: The constructor. It creates a new FNode object and initializes it either as a new category (a single point) or from a template whose information is passed as an argument to this function.
calc_scaled_cmf: Calculates the CMF value of this category.
calc_ccf: Calculates the CCF value of this category.
update: Updates the category during the training session.
~FNode: Destructor function.

The main variables are:
Integer M: The number of features.
Double Beta: The β parameter.
Double learning_rate: Set to 1 (fast learning scenario).
Array of Doubles U: Represents the u vector in FAM.
Array of Doubles V: Represents the v vector in FAM.
Integer Label: Represents the category label.

PtrnNode Object

This is a very simple object that represents an input pattern. Its functions and variables are:

PtrnNode: The constructor. It initializes the object from the input arguments it receives.
~PtrnNode: Destructor function.

This object has the following variables:
An array of doubles w: Holds the feature values of an input pattern.
Integer classindex: Holds the label of the pattern.
Integer wc: The number of features of this pattern.

Chrom Object

This object represents a chromosome, which is a collection of categories as shown in Figure 4-1. A chromosome object has the following functions:
Chrom: The constructor. It has an important role: converting FNode objects (FAM categories) into Cat objects that are suitable for genetic manipulation. Another version of this function creates an empty Chrom object.
Del: Deletes all the categories of the Chrom object. This operation is used when manual addition and deletion of categories is invoked.
~Chrom: Destructor function.

A Chrom object has the following variables:
A vector of Cat object pointers: Holds pointers to all the Cat objects of this chromosome.
Double fit: Holds the fitness value of this chromosome.
Double rho: Holds the vigilance parameter of the FAM network represented by this chromosome.
Integer size: The number of categories in this chromosome.
Double accuracy: Holds the accuracy of this network.
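The Chrom fields fit and rho, together with the Min RHO / Max RHO, Cat Mut Prob and # of Elites controls described earlier, suggest the per-generation bookkeeping sketched below. This is an illustration under stated assumptions, not the GFAM UI source. ChromSummary, initialRhos, categoryMutationProb and selectElites are hypothetical names. The linear spacing of ρ between Min RHO and Max RHO is our assumption about how the "RHO increment" is derived.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the Chrom object: only the fields used here.
struct ChromSummary {
    double fit;  // fitness value
    double rho;  // vigilance used to train the corresponding FAM
    int size;    // number of categories
};

// Assumption: rho is spaced linearly, so chromosome 0 is trained with Min RHO
// and the last chromosome with Max RHO.
std::vector<double> initialRhos(double minRho, double maxRho, std::size_t popSize) {
    assert(popSize >= 1 && minRho <= maxRho);
    std::vector<double> rhos(popSize, minRho);
    if (popSize == 1) return rhos;
    const double step = (maxRho - minRho) / static_cast<double>(popSize - 1);
    for (std::size_t i = 0; i < popSize; ++i) {
        rhos[i] = minRho + step * static_cast<double>(i);
    }
    return rhos;
}

// Cat Mut Prob: x divided by the number of categories, never above 1.0.
double categoryMutationProb(int x, int numCategories) {
    assert(x > 0 && numCategories > 0);
    return std::min(1.0, static_cast<double>(x) / static_cast<double>(numCategories));
}

// # of Elites: the numElites fittest chromosomes, best first, which are
// copied unchanged into the next generation.
std::vector<ChromSummary> selectElites(std::vector<ChromSummary> pop, std::size_t numElites) {
    numElites = std::min(numElites, pop.size());
    const auto middle = pop.begin() + static_cast<std::ptrdiff_t>(numElites);
    std::partial_sort(pop.begin(), middle, pop.end(),
                      [](const ChromSummary& a, const ChromSummary& b) { return a.fit > b.fit; });
    pop.resize(numElites);
    return pop;
}
```

With Min RHO = 0.1, Max RHO = 0.9 and a population of 5, this assumed spacing yields ρ = 0.1, 0.3, 0.5, 0.7, 0.9. With the default Cat Mut Prob of 5, a chromosome with 20 categories has a category mutation probability of 0.25, while one with 3 categories is capped at 1.0.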

Cat Object

This object represents an encoded category that is ready for genetic manipulation. Its functions are:
Cat: The constructor. It takes an FNode object as input, extracts the data from it and saves it in its internal variables. Another version takes a Cat object pointer as input and copies all the information of the original object to the newly created one.
pointcat: Returns a Boolean indicating whether or not this category is a point category.
~Cat: Destructor function.

The following variables exist in this object:
Array of Doubles v: The v vector of the category.
Array of Doubles u: The u vector of the category.
Integer l: The class label.
Integer s: The number of features.

AddCatForm Object

This object is a small form with six controls that allow the user to enter values for manually added categories. Besides the constructor it has one function, which returns one if some values were entered.

In the following sections, the user interfaces of GEAM, GGAM and UART are discussed only briefly, since they are very similar to the UI of GFAM. We focus only on the differences between the UIs.

7.2 GEAM User Interface

GEAM UI is a Windows program that allows the user to build a generation of trained EAM networks with any number of individuals, select the training, validation and testing files,


use different parameters to train the networks, define the genetic parameters, and run the genetic operators. It can also display the categories of any chromosome along with the testing or validation patterns, display the classification borders of any chromosome, manually delete one or all of the categories of a specific chromosome, and manually add categories to any chromosome.

The user can run a specific number of generations without stopping, run a specific number of generations after which the program pauses and then continues on command, or run one generation at a time. The program shows the results of the best network whenever it stops, and also shows the results of individuals when they are displayed.

The program logs a variable amount of information to an automatically named file. In the directory where the program resides it looks for a file called GEAM0.csv; if it finds it, it looks for a file called GEAM1.csv, and so on, until it looks for a file called GEAMn.csv (where n is an integer) and does not find it. It then creates this file and logs the data to it. The logged data can be customized by the user. Figure 7-16 shows the GEAM interface program.

Figure 7-16: GEAM user interface
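The automatic naming of the log file (GEAM0.csv, GEAM1.csv, and so on, with GFAM0.csv, GFAM1.csv for the GFAM UI) amounts to a search for the first unused index. The sketch below uses C++17 std::filesystem, which the original Borland C++Builder 6 programs could not have used. It therefore illustrates the naming logic rather than reproducing the authors' code. nextLogFile and openNextLog are hypothetical names.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

// Returns the first path <dir>/<prefix>n.csv, n = 0, 1, 2, ..., that does not exist yet.
std::filesystem::path nextLogFile(const std::filesystem::path& dir, const std::string& prefix) {
    assert(!prefix.empty());
    for (unsigned n = 0;; ++n) {
        const std::filesystem::path candidate = dir / (prefix + std::to_string(n) + ".csv");
        if (!std::filesystem::exists(candidate)) {
            return candidate;
        }
    }
}

// Creates that file and returns a stream ready for logging.
std::ofstream openNextLog(const std::filesystem::path& dir, const std::string& prefix) {
    return std::ofstream(nextLogFile(dir, prefix));
}
```

Like the procedure described in the text, this check-then-create approach assumes a single running instance of the program. Two instances started at the same moment could pick the same file name.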

7.2.1 GEAM Controls

Here we define only the controls on the window of Figure 7-16 that differ from those of the GFAM UI.
Mu: The axis ratio of EAM. The default is 1, meaning that the ellipsoids in EAM are actually circles (hyperspheres).
Beta: Sets the EAM network β parameter. The default is 0.1.
Create Init Pop (Button): When clicked, the program trains Pop_size EAM networks.
Draw Chroms (Button): Shows the categories that every chromosome has so far, so it can be used to monitor the progress of the GA process and to see how well each chromosome is doing. Figure 7-17 shows the outcome when this button is clicked. The button displays categories only in 2D; if a problem has more than 2 features, the first 2 components of each vector are displayed. The graph window that appears when the user clicks this button also has significant functionality, explained later.

Figure 7-17: A 2D graph displaying a GEAM network; note that here the ellipsoids are represented by circles

Get Boundaries (Button): Lets the user see visually how the network classifies the input space. This feature works only for 2D problems. Figure 7-18

shows the outcome when this button is clicked. The program generates a grid of patterns with a 0.01 increment on both the x and y axes, feeds them to the network, and colors each pattern according to the class the network assigns it. In the figure two colors were needed since this was a two-class problem.

Figure 7-18: A 2D graph showing the classification borders of the GEAM network

Clear GEAM (Button): Clears the memory so that another problem can be examined.

The graphing window of GEAM UI (Figure 7-17) has functionality similar to that of GFAM UI. The differences are:
Previous GEAM (Chromosome) (Button): Displays the content of the previous network.
Next GEAM (Chromosome) (Button): Displays the content of the next network.
Add Cat: Opens a dialogue box with some controls that let the user manually add a category to the current chromosome. Figure 7-20 explains this feature.

After clicking the Del All button:


Figure 7-19: This figure shows the GEAM network after pushing the Del All button

After clicking the Add Cat button, we get the following screen:

Figure 7-20: An add GEAM category dialogue box

Here Center is the m vector, Direction is the d vector, the Class edit box is the label of the category, the Mu edit box is the axis ratio, and the Rad edit box is the radius. The Ok button adds the category to the chromosome:

Figure 7-21: Manually filling in values for the first category

After clicking the Ok button, we get the following screen:

Figure 7-22: GEAM network after manually adding a category

The network's classification boundaries now look as follows:

Figure 7-23: Classification borders of the manually added category

The accuracy here was 50% because there are only two classes and the patterns are divided equally between them.

Figure 7-24: Manually adding a new category

After adding another category we obtain the following screen:

Figure 7-25: GEAM network after adding the second category
Figure 7-26: Classification borders of the GEAM network

7.2.2 GEAM UI Abstract Design

GEAM UI consists of 7 main objects:
GEAMForm object: The main window object, which shows most of the controls and handles most of the user interaction.
GraphForm object: The graph window. It displays categories and classification boundaries, and allows manual deletion and addition of categories for any chromosome.
ENode object: Represents a category in an EAM network.

PtrnNode object: Represents an input pattern.
Chrom object: Represents a chromosome, a collection of categories encoded in a special format.
Cat object: Represents an EAM category after encoding.
AddCatForm object: The form that allows the user to enter the properties of a category.

We now present these seven objects in more detail.

GEAMForm Object

This is the main window object, and it has most of the controls listed above. Its functionality is briefly described below:
Validates the inputs.
Creates the trained EAMs and stores them.
Optimizes the generation of chromosomes to get the best GEAM.
Displays the results and prepares the data for the graphing object.

The most important methods of this object that differ from those described in the GFAM UI are:
initpopclick: Creates the EAM parameters and starts the loop that creates the EAMs.
buildeam: Trains an EAM network.
decode: Takes a chromosome as input and converts it to an EAM network.
optimize: Optimizes the initial generation of trained EAMs.
mutation: Mutates a given chromosome as described earlier in this manuscript.
~GEAMForm: Destructor function.

This object has one variable that is not among those described for the GFAM UI:

A vector of ENode objects: Stores ENode objects as an EAM network.

GraphForm Object

This object is very similar to the GraphForm object in the GFAM UI. However, it draws each category as an ellipsoid rather than a rectangle.

ENode Object

This object represents an EAM category. Its main methods and variables are:
ENode: The constructor. It creates a new ENode object and initializes it either as a new category (a single point) or from a template whose information is passed as an argument to this function.
calc_scaled_cmf: Calculates the CMF value of this category.
calc_ccf: Calculates the CCF value of this category.
update: Updates the category during the training session.
~ENode: Destructor function.

The main variables are:
Integer M: The number of features.
Double Beta: The β parameter.
Double learning_rate: Set to 1, corresponding to fast learning.
Array of Doubles m: Represents the m vector in EAM.
Array of Doubles d: Represents the d vector in EAM.
A double r: Represents the radius.
A double mu: Represents the axis ratio.
Integer Label: Represents the category label.
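The ENode variables m, d, r and mu define the ellipsoid that the GraphForm object draws. To make the geometry concrete, the sketch below computes the distance of a pattern from an EAM category. It assumes the ellipsoidal norm as we recall it from the EAM literature: with $z = x - m$ and d a unit vector along the major axis, $\|z\|_C = \frac{1}{\mu}\sqrt{\|z\|^2 - (1 - \mu^2)(d^T z)^2}$. It also assumes the distance is $\max(0, \|x - m\|_C - r)$. This chunk does not restate the thesis's own calc_ccf or calc_scaled_cmf, so check the sketch against the EAM definitions given earlier in the dissertation. EamCategory, ellipsoidalNorm and distanceToCategory are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal stand-in for the ENode fields m, d, r and mu.
struct EamCategory {
    std::vector<double> m;  // centre
    std::vector<double> d;  // unit direction of the major axis
    double r;               // radius (half the major-axis length)
    double mu;              // axis ratio, 0 < mu <= 1 (mu = 1 gives a hypersphere)
};

// Assumed EAM ellipsoidal norm of z = x - m:
// ||z||_C = (1/mu) * sqrt(||z||^2 - (1 - mu^2) * (d . z)^2).
double ellipsoidalNorm(const EamCategory& c, const std::vector<double>& x) {
    assert(x.size() == c.m.size() && c.d.size() == c.m.size() && c.mu > 0.0);
    double sq = 0.0, proj = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double z = x[i] - c.m[i];
        sq += z * z;
        proj += c.d[i] * z;
    }
    // Clamp at zero to guard against tiny negative values from rounding.
    const double inner = std::max(0.0, sq - (1.0 - c.mu * c.mu) * proj * proj);
    return std::sqrt(inner) / c.mu;
}

// Distance of a pattern from the category: zero inside the ellipsoid,
// otherwise how far the pattern lies beyond its boundary in the C-norm.
double distanceToCategory(const EamCategory& c, const std::vector<double>& x) {
    return std::max(0.0, ellipsoidalNorm(c, x) - c.r);
}
```

With mu = 1 the norm reduces to the Euclidean distance, and the category is the circle seen in Figure 7-17. With mu < 1, offsets perpendicular to d are stretched by 1/mu, so the category becomes an ellipse elongated along d.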

PtrnNode Object

This object is a copy of the PtrnNode object used in the GFAM UI.

Chrom Object

This object represents a chromosome, which is a collection of categories as shown in Figure 4-1. The only function that differs from the Chrom object of the GFAM UI is:
Chrom: The constructor. It has an important role: converting ENode objects (EAM categories) into Cat objects that are suitable for genetic manipulation. Another version of this function creates an empty Chrom object.

This object has the same variables as the Chrom object in the GFAM UI.

Cat Object

This object represents an encoded category that is ready for genetic manipulation. Its functions are:
Cat: The constructor. It takes an ENode object as input, extracts the data from it and saves it in its internal variables. Another version takes a Cat object pointer as input and copies all the information of the original object to the newly created one.
pointcat: Returns a Boolean indicating whether or not this category is a point category.
~Cat: Destructor function.

The following variables exist in this object:
Array of Doubles m: The center of the EAM category.
Array of Doubles d: The direction vector of the EAM category.
Double r: The radius of the EAM category.
Double mu: The axis ratio of the EAM category.

Integer l: The class label of the EAM category.
Integer s: The number of features (i.e., the problem dimensionality).

AddCatForm Object
This object is a small form with eight controls that allow the user to insert values for manually added categories. Apart from the constructor, it has one function, which returns one if some values were inserted.

7.3 GGAM User Interface
GGAM UI is a Windows program that allows the user to build a generation of trained GAM networks with any number of individuals, select the training, validation and testing files, use different parameters to train the networks, define the genetic parameters, run the genetic operators, display the categories of any chromosome along with the testing or validation patterns, display the classification borders of any chromosome, manually delete one or all of the categories of a specific chromosome, and manually add categories to any chromosome. The user can run a specific number of generations without stopping, define a generation at which the program stops and then continue on command, or step through one generation at a time. The program shows the results of the best network whenever it stops, and also shows the results of individual networks when they are displayed. The program logs a variable amount of information to an automatically named file, as follows. In the directory where the program resides, it looks for a file called GGAM0.csv; if that file exists, it looks for GGAM1.csv, and so on, until it reaches the first file GGAMn.csv (where n is an integer) that does not exist. The program then creates this file and logs the data to it. The logged data can be customized by the user. Figure 7-27 shows the GGAM interface program.
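The naming rule above amounts to choosing the first unused index. The following Python sketch is a hypothetical illustration, not the UI's actual code: the helper name `next_log_path` and its `directory` and `prefix` parameters are assumptions. It probes GGAM0.csv, GGAM1.csv, and so on, in order and returns the first path that does not exist yet.

```python
import os

def next_log_path(directory: str, prefix: str = "GGAM") -> str:
    """Return the first path <prefix>0.csv, <prefix>1.csv, ... that does not exist."""
    n = 0
    # Probe GGAM0.csv, GGAM1.csv, ... until a free name is found.
    while os.path.exists(os.path.join(directory, f"{prefix}{n}.csv")):
        n += 1
    return os.path.join(directory, f"{prefix}{n}.csv")
```

Passing `prefix="UART"` gives the same behaviour that is described later for UART UI. Checking for a name and then creating the file is not atomic; that is harmless for a single desktop program but would matter if several instances logged into the same directory.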

Figure 7-27: GGAM user interface

7.3.1 GGAM Controls
Here we define the controls on the window of Figure 7-27 that are different from those found in GFAM UI.
Create Init Pop (Button): When clicked, the program trains $Pop_{size}$ GAM networks.
Draw Chroms (Button): This button allows the user to see the categories that every chromosome has so far; hence, it can be used to monitor the progress of the process and observe how well every chromosome is doing. Figure 7-28 shows the outcome when this button is clicked. The button displays categories only in 2D: if a problem has more than 2 features, the first 2 components of each vector are displayed. The graph window that appears when the user clicks this button also has significant functionality, explained later. Since a bell-shaped normal distribution is hard to graph, we show the center of each category as a big dot.

Figure 7-28: A 2D graph showing a GGAM network; note that here a GGAM category is represented by a large dot
Get Boundaries (Button): This feature allows the user to visually see how the network is classifying the input space. It works only for 2D problems. Figure 7-29 shows the outcome when this button is clicked. To achieve this goal, the program generates a list of patterns as a grid with an increment of 0.01 on both x and y, feeds this list to the network, and displays the way the network has classified these patterns by coloring them differently (in the figure two colors suffice, since this is a two-class problem).
Figure 7-29: A 2D graph showing the classification boundaries of the GGAM network
Clear GGAM (Button): Clears the memory so that another problem can be checked.
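The Get Boundaries procedure can be sketched as follows, assuming the two features are normalized to [0, 1] (the same 0-to-1 grid that the UART PrepBoundaries method, described later, produces) and using a hypothetical `predict` callable in place of the trained network's performance phase. Coordinates are built from integer indices so that repeated additions of 0.01 do not accumulate rounding error.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]

def boundary_grid(step: float = 0.01) -> List[Point]:
    """Points of the unit square: (0,0), (0,0.01), ..., (0,1), (0.01,0), ..., (1,1)."""
    n = round(1.0 / step)  # 100 intervals -> 101 values per axis
    # Build coordinates from integer indices to avoid floating-point drift.
    return [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]

def classify_grid(predict: Callable[[Point], int],
                  step: float = 0.01) -> List[Tuple[Point, int]]:
    """Pair every grid point with the label the network assigns to it."""
    return [(p, predict(p)) for p in boundary_grid(step)]

if __name__ == "__main__":
    # Hypothetical stand-in classifier: class 1 to the right of x = 0.5.
    labeled = classify_grid(lambda p: 1 if p[0] >= 0.5 else 0)
    print(len(labeled), sum(label for _, label in labeled))  # 10201 5151
```

Each labeled point would then be painted in the color of its class; with two classes, two colors suffice, as in Figure 7-29.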


The graphing window of GGAM UI (Figure 7-28) has functionality similar to that of GFAM UI; the differences are as follows.
Previous GGAM (Chromosome) (Button): Displays the contents of the previous network.
Next GGAM (Chromosome) (Button): Displays the contents of the next network.
Add Cat: Brings up a dialogue box with controls that allow the user to manually add a category to the current chromosome. Figures 7-30 and 7-31 illustrate this feature: after clicking the Del All button, all the categories are deleted (Figure 7-30), and after clicking the Add Cat button, the dialogue box appears (Figure 7-31).
Figure 7-30: After deleting all the categories
Figure 7-31: An add GGAM category dialogue box

In this dialogue box, Mean is the µ vector, Direction is the σ vector, the Class edit box holds the label of the category, and the Prob edit box holds the probability of the category. The Ok button adds the category to the chromosome.
Figure 7-32: Add category dialogue box with numbers in the available boxes
After clicking the Ok button, the category is added (Figure 7-33), and the classification boundaries with the added category are shown in Figure 7-34.
Figure 7-33: A figure showing the manually added category


Figure 7-34: The classification boundaries corresponding to the manually added category
With a single category, every pattern is assigned that category's label; the accuracy of the network is therefore 50%, because there are two classes and the patterns are divided equally between them. Figures 7-35 and 7-36 show the addition of a second category.
Figure 7-35: Filling in numbers for the second category
Figure 7-36: The GGAM network after adding the second category
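The 50% figure (and the 20% figure reported later for the five-class UART example) follows from a short calculation: a network with a single category assigns every pattern that category's label, so its accuracy equals the share of patterns belonging to that class. A minimal Python check, where `constant_prediction_accuracy` is a hypothetical helper and the label lists are made-up balanced examples:

```python
from typing import Sequence

def constant_prediction_accuracy(labels: Sequence[int], predicted: int) -> float:
    """Accuracy (in %) of a network that outputs the same label for every pattern."""
    correct = sum(1 for y in labels if y == predicted)
    return 100.0 * correct / len(labels)

if __name__ == "__main__":
    two_classes = [0] * 50 + [1] * 50                        # balanced two-class set
    five_classes = [c for c in range(5) for _ in range(20)]  # balanced five-class set
    print(constant_prediction_accuracy(two_classes, 1))   # 50.0
    print(constant_prediction_accuracy(five_classes, 3))  # 20.0
```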

Figure 7-37: The classification boundaries of the GGAM after adding two categories

GGAM UI Abstract Design
GGAM UI consists of 7 main objects, listed below.
GGAMForm object: The main window object, which shows most of the controls and handles most of the user interaction.
GraphForm object: The graph window that displays categories and classification boundaries, and allows manual deletion and addition of categories to any chromosome.
GNode object: Represents a category in a GAM network.
PatrnNode object: Represents an input pattern.
Chrom object: Represents a chromosome, that is, a collection of categories encoded in a special format.
Cat object: Represents a GAM category after encoding.
AddCatForm object: The form that allows the user to insert the category's properties.
We now discuss these seven objects in more detail.

GGAMForm Object
This is the main window object, which has most of the controls described above. It has a number of features and employs a number of methods that allow it to accomplish its functionality, briefly described as follows:
Validates the inputs.
Creates the trained GAMs and stores them.
Optimizes the generation of chromosomes to get the best GGAM.
Displays the results and prepares the data for the graphing object.
The most important methods of this object that differ from those described in GFAM UI are:
initpopclick: Creates the GAM parameters and starts the loop that creates the GAMs.
buildgam: Trains a GAM network.
decode: Takes a chromosome as input and converts it to a GAM network.
optimize: Optimizes the initial generation of trained GAMs.
mutation: Mutates a given chromosome, as described earlier.
~GGAMForm: Destructor function.
The main variable of this object that differs from those described in GFAM UI is:
A vector of GNode objects: Stores the GNode objects that make up a GAM network.

GraphForm Object
The GraphForm Object is very similar to the GraphForm Object in GFAM UI; however, this one displays a big dot at the center of each category, rather than a rectangle.

GNode Object
This object represents a GAM category. The main methods it employs and the variables it uses are listed below.

GNode: The constructor. It creates a new GNode object and initializes it either as a new category (a single point) or as a template whose information is passed as an argument to this function.
calc_scaled_cmf: Calculates the CMF value of this category.
calc_ccf: Calculates the CCF value of this category.
update: Updates the category during the training session.
~GNode: Destructor function.
The main variables are:
Integer M: The number of features.
Array of Doubles micro: The µ vector in GAM.
Array of Doubles sigma: The σ vector in GAM.
Double prob: The probability of the category in GAM.
Integer Label: The category label.

PatrnNode Object
This object is a clone of the PatrnNode object of GFAM UI; no further discussion is needed.

Chrom Object
This object represents a chromosome, which is a collection of categories, as shown in Figure 4-3. The only function that differs from those of the Chrom Object in GFAM UI is:
Chrom: The constructor. It performs the important task of converting the GNode objects (GAM categories) into Cat objects that are suitable for genetic manipulation. Another version of this function creates an empty Chrom object.
This object has the same variables as the Chrom Object in GFAM UI.

Cat Object
This object represents an encoded category that is ready for genetic manipulation. Its functions are:
Cat: The constructor. It takes a GNode object as input, extracts its data, and saves the data in its internal variables. Another version takes a Cat object pointer as input and copies all the information of the original object into the newly created one.
pointcat: Returns a Boolean indicating whether or not this category is a point category.
~Cat: Destructor function.
Its variables are:
Array of Doubles micro: The µ vector of the GAM category.
Array of Doubles sigma: The σ vector of the GAM category.
Double prob: The probability of this category (i.e., the number of encoded patterns over the total number of training patterns).
Integer l: The class label.
Integer s: The number of features (i.e., the problem's input dimensionality).

AddCatForm Object
This object is a small form with eight controls that allow the user to insert values for manually added categories. Apart from the constructor, it has one function, which returns one if some values were inserted.

7.4 UART User Interface
UART UI is a Windows program that allows the user to build a generation of trained FAM and trained EAM networks with any even number of individuals, select the training, validation and testing files, use different parameters to train the networks, define the


genetic parameters, run the genetic operators, display the categories of any chromosome along with the testing or validation patterns, display the classification boundaries of any chromosome, manually delete one or all of the categories of a specific chromosome, and manually add categories to any chromosome. The user can run a specific number of generations without stopping, define a number of generations after which the program stops and then continue on command, or step through one generation at a time. The program shows the results of the best network whenever it stops, and also shows the results of individual networks when they are displayed. The program logs a variable amount of information to an automatically named file, as follows. In the directory where the program resides, it looks for a file called UART0.csv; if that file exists, it looks for UART1.csv, and so on, until it reaches the first file UARTn.csv (where n is an integer) that does not exist. The program then creates this file and logs the data to it. The logged data can be customized by the user. Figure 7-38 shows the UART interface program.
Figure 7-38: UART user interface
Although the look and feel of this UI is similar to that of GFAM UI and GEAM UI, the operations and design of UART UI are very different. We now define the controls on the window of Figure 7-38 that are different from those in GEAM UI.

7.4.1 UART Controls
Number of features (the "# Features" field): The number of features, used by both the EAM and FAM networks.
Training File: The path of the file that contains the training patterns, used by both the EAM and FAM networks.
Validation File: The path of the file that contains the validation patterns, used by both the EAM and FAM networks.
Beta: Sets the FAM and EAM β parameter. The default is 0.1.
RHO: Displays the ρ parameter of the current network or chromosome.
PopSize: Determines the number of chromosomes in a single generation. The default is 20 (10 FAM and 10 EAM networks).
Create Init Pop (Button): When this button is clicked, the program trains $Pop_{size}/2$ FAM networks and $Pop_{size}/2$ EAM networks.
Draw Chroms (Button): This button allows the user to see the categories that every chromosome has so far; hence, it can be used to monitor the progress of the process and see how well every chromosome is doing. Figure 7-39 shows the outcome when this button is clicked. The button displays categories only in 2D: if a problem has more than 2 features, the first 2 components of each vector are displayed. The graph window that appears when the user clicks this button also has significant functionality, explained later.


Figure 7-39: A UART network after randomly mixing FAM categories with EAM categories
Figure 7-40: A 2D graph showing only the categories
Get Boundaries (Button): This feature allows the user to visually see how the network is classifying the input data. It works only for 2D problems (Figure 7-41 shows the outcome when this button is clicked). To achieve this goal, the program generates a list of patterns as a grid with an increment of 0.01 on both x and y, feeds this list to the network, and displays the way the network has classified these patterns by coloring them differently.


Figure 7-41: The classification borders of the UART network after the random mixing
Clear UART (Button): Clicking this button clears the memory so that another problem can be checked.
The above are the controls of the main window and their functionality. The graphing window (Figure 7-39) has some very interesting functionality of its own, discussed below.
Previous UART (Chromosome) (Button): Displays the contents of the previous network.
Next UART (Chromosome) (Button): Displays the contents of the next network.
Add Cat: Brings up a dialogue box with controls that allow the user to manually add a category to the current chromosome. The form uses the same controls to add either a FAM category or an EAM category, depending on the radio button at the bottom right corner of the form. Figures 7-42 and 7-43 illustrate this feature.


Figure 7-42: After deleting all the categories of UART (after clicking the Del All button)
Figure 7-43: An add category dialogue box (after clicking the Add Cat button)
In this dialogue box, U/Direction is the u vector of a FAM category or the direction vector of an EAM category, and V/Center is the v vector of a FAM category or the center vector of an EAM category. The Ok button adds the category to the chromosome.

Figure 7-44: Filling in data for an EAM category
After clicking the Ok button, the category is added to the network (Figure 7-45).
Figure 7-45: UART after manually adding an EAM category
After adding the above category, the classification boundaries look as shown in the following figure.

Figure 7-46: The classification boundaries of UART after addition
The accuracy here is 20%: with a single category, every pattern is assigned that category's label, and there are five classes with the patterns divided equally among them. Figures 7-47 and 7-48 show the addition of another (FAM) category.
Figure 7-47: Filling in data for a FAM category

Figure 7-48: UART after manually adding an EAM and a FAM category
Figure 7-49: UART classification boundaries after the manual addition

UART UI Abstract Design
UART UI consists of 8 main objects, which are:
UARTForm object: The main window object, which shows most of the controls and handles most of the user interaction.

GraphForm object: The graph window, which displays categories and classification boundaries, and allows manual deletion and addition of categories to any chromosome.
FNode object: Represents a category in a FAM network.
ENode object: Represents a category in an EAM network.
PatrnNode object: Represents an input pattern.
Chrom object: Represents a chromosome, a collection of categories encoded in a special format.
Cat object: Represents a FAM or an EAM category after encoding.
AddCatForm object: The form that allows the user to insert the category's properties.
We now present these eight objects in more detail.

UARTForm Object
This is the main window object, which has most of the controls described above. It has many features and employs methods that allow it to perform its functionality, briefly described as follows:
Validates the inputs.
Creates the trained FAMs and EAMs and stores them.
Optimizes the generation of chromosomes to get the best UART.
Displays the results and prepares the data for the graphing object.
The most important methods of this object are:
openfile: Opens the files, reads the data, creates PatrnNode objects, and stores them in vectors.

initpopclick: Creates the FAM and EAM parameters and starts the loop that creates $Pop_{size}/2$ FAMs, then loops one more time to create $Pop_{size}/2$ EAM networks.
buildfam: Trains a FAM network.
buildeam: Trains an EAM network.
crop: Deletes all the categories that encoded only one pattern.
redistributecats: Copies all the categories generated in the initial generation into a container, deletes all the categories of all the chromosomes, shuffles the pooled categories, and redistributes them to the chromosomes as follows. If the total number of categories is $N_{all}$, each chromosome gets $\lfloor N_{all}/Pop_{size} \rfloor$ categories: the first chromosome gets the first set of $\lfloor N_{all}/Pop_{size} \rfloor$ categories, the second chromosome gets the second set, and so on. The last chromosome, however, gets $\lfloor N_{all}/Pop_{size} \rfloor + (N_{all} \bmod Pop_{size})$ categories.
decode: Takes a chromosome as input and converts it to a UART network.
optimize: Optimizes a generation of trained UART networks.
crossover: Performs the crossover process, as described above.
addcat: Adds a category to a given chromosome; if a chromosome is chosen, the added category has a 50% probability of being a FAM category and a 50% probability of being an EAM category.
delcat: Deletes a category from a given chromosome.
mutation: Mutates a given chromosome. If a chromosome is selected, every category in it is mutated: if the category is a FAM category, the FAM

mutation procedure (described earlier) is applied; otherwise, the EAM mutation procedure is applied.
PrepBoundaries: Creates a list of patterns forming a 2D grid, with values (0,0), (0,0.01), …, (0,1), (0.01,0), …, (0.01,1), …, (1,1). It also initializes some other variables for the GraphForm object to use.
drawcats: Initializes some variables and calls the GraphForm object.
showboundaries: Initializes some variables and calls the GraphForm object.
~UARTForm: Destructor function.
This object also has many properties (variables); the most important ones are:
A vector of Node objects: Stores the FNode and ENode objects that make up a UART network. Node is the base class of both FNode and ENode.
A vector of strings: Stores the names of the classes in a problem.
A vector of PatrnNode objects: Stores the training patterns.
A vector of PatrnNode objects: Stores the testing patterns.
A vector of Chrom object pointers: Stores pointers to all the chromosomes created in a specific generation.
A vector of Chrom object pointers: Stores pointers to all the chromosomes in a temporary generation.
A vector of integers: Stores the predicted labels after running the performance phase.
An integer: Stores the number of categories created for a specific network.
These variables and many others are used in this object. However, since a detailed design of this UI is not the purpose of this dissertation, the details are omitted.
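The redistribution rule of redistributecats can be sketched in Python as follows. This is an illustration under stated assumptions, not the UI's code: categories are treated as opaque items, `redistribute` is a hypothetical name, and a seeded `random.Random` stands in for whatever shuffling routine the program uses.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def redistribute(categories: Sequence[T], pop_size: int,
                 rng: random.Random) -> List[List[T]]:
    """Pool, shuffle and deal categories: floor(N_all/Pop_size) each, remainder to the last."""
    pooled = list(categories)        # container holding every category of the generation
    rng.shuffle(pooled)              # shuffle the pooled categories in place
    share = len(pooled) // pop_size  # floor(N_all / Pop_size)
    chroms = [pooled[k * share:(k + 1) * share] for k in range(pop_size - 1)]
    # The last chromosome takes everything left: share + (N_all mod Pop_size) categories.
    chroms.append(pooled[(pop_size - 1) * share:])
    return chroms

if __name__ == "__main__":
    dealt = redistribute(range(23), pop_size=5, rng=random.Random(0))
    print([len(c) for c in dealt])  # [4, 4, 4, 4, 7]
```

Slicing guarantees that every category lands in exactly one chromosome. If $N_{all} < Pop_{size}$, the share is 0 and all but the last chromosome start empty; the text does not say how the program handles that case.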

GraphForm Object
This object is much simpler than the UARTForm object. It is responsible for showing the categories, the validation patterns, and the classification boundaries of a specific chromosome. It is also responsible for allowing the user to manually add categories to and delete categories from the chromosome. The main methods this object employs and the variables it uses are listed below.
pbpaint: Displays the categories along with the validation patterns, or the grid of automatically generated patterns along with the categories (classification boundaries). This is one of the most difficult functions in this UI, since the origin of the graphing object has to be converted manually so that the (0,0) point is the bottom left corner. The method also has to differentiate between a FAM category (displayed as a rectangle) and an EAM category (displayed as an ellipsoid).
nextclick: Displays the graph of the next chromosome.
previousclick: Displays the graph of the previous chromosome.
addcclick: Calls the AddCatForm object so that the user can insert the information of the new category.
delallcat: Invokes the Del function of the Chrom object, which deletes all the categories of that chromosome.
~GraphForm: Destructor function.
A few variables are used in this object. The main ones are:
An Integer c: The index of the current chromosome.
An array of Point object pointers: Stores the points that are displayed in the graphing area.
A pointer to the UARTForm object: Gives the GraphForm object access to many functions and variables of the main window.

A boolean bounds: Tells the object whether the categories or the classification boundaries are to be displayed.

FNode, ENode and PatrnNode Objects
The FNode and PatrnNode objects are identical to those found in GFAM UI, while ENode is identical to that of GEAM UI.

Chrom Object
This object represents a chromosome, which is a collection of categories, as shown in Figure 5-6. A Chrom object has the following functions:
Chrom: The constructor. It performs the important task of converting the FNode objects (FAM categories) and ENode objects (EAM categories) into Cat objects that are suitable for genetic manipulation. Another version of this function creates an empty Chrom object.
Del: Deletes all the categories of the chromosome. This operation is used when manual addition and deletion of categories is invoked.
~Chrom: Destructor function.
It has the following variables:
A vector of Cat object pointers: Holds pointers to all the Cat objects of this chromosome.
Double fit: The fitness value of this chromosome.
Double rho: The vigilance parameter of the FAM network represented by this chromosome.
Integer size: The number of categories in this chromosome.
Double accuracy: The accuracy of this network.
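The origin conversion that makes pbpaint difficult (described in the GraphForm Object above) is a flip of the vertical axis, because window coordinates grow downward. The following is a hedged Python sketch, assuming inputs normalized to [0, 1] and a drawing area of `width` × `height` pixels (both hypothetical parameters). It also shows how a FAM hyperbox with lower corner u and upper corner v maps to a screen rectangle.

```python
from typing import Sequence, Tuple

def to_screen(x: float, y: float, width: int, height: int) -> Tuple[int, int]:
    """Map a point of the unit square to pixels, with (0,0) at the bottom-left corner."""
    px = round(x * (width - 1))
    py = round((1.0 - y) * (height - 1))  # screen y grows downward, so flip it
    return px, py

def fam_rectangle(u: Sequence[float], v: Sequence[float],
                  width: int, height: int) -> Tuple[int, int, int, int]:
    """Screen (left, top, right, bottom) of the 2D FAM box with lower corner u, upper corner v."""
    left, top = to_screen(u[0], v[1], width, height)      # upper-left corner in data space
    right, bottom = to_screen(v[0], u[1], width, height)  # lower-right corner in data space
    return left, top, right, bottom

if __name__ == "__main__":
    print(to_screen(0.0, 0.0, 101, 101))                    # (0, 100)
    print(fam_rectangle((0.2, 0.3), (0.6, 0.7), 101, 101))  # (20, 30, 60, 70)
```

An EAM ellipsoid would need its center mapped with `to_screen` and its axes scaled by the same factors; that part is omitted because the ellipse geometry is not spelled out in this section.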

Cat Object
This object represents an encoded category, which can be a FAM or an EAM category and is ready for genetic manipulation. Its functions are:
Cat: The constructor. It takes an FNode or an ENode object as input, extracts its data, and saves the data in its internal variables. Another version takes a Cat object pointer as input and copies all the information of the original object into the newly created one. The constructor also sets the ntype variable to 1 if the input category is an FNode, and to 2 otherwise.
pointcat: Returns a Boolean indicating whether or not this category is a point category.
~Cat: Destructor function.
Its variables are:
Array of Doubles v: The v vector of a FAM category, or the center vector of an EAM category.
Array of Doubles u: The u vector of a FAM category, or the direction vector of an EAM category.
Integer l: The class label.
Integer ntype: The type of the category (1 for a FAM category, 2 for an EAM category).
Integer s: The number of features (the problem dimensionality).
Double mu: The axis ratio of an EAM category.
Double r: The radius of an EAM category.

AddCatForm Object
This object is a small form with ten controls that allow the user to insert values for manually added categories. Apart from the constructor, it has one function, which returns one if some values were inserted.
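The dual role of the UART Cat object can be sketched as a small Python data class whose field names follow the variables listed above. The `pointcat` rules are assumptions for illustration. The FAM rule uses the fact that a FAM category that has encoded a single pattern has coinciding corners (u = v). The EAM rule assumes that a category is a point when its radius r is 0, which this section does not state.

```python
from dataclasses import dataclass
from typing import List

FAM, EAM = 1, 2  # ntype codes set by the Cat constructor

@dataclass
class Cat:
    u: List[float]   # FAM: u vector (lower corner); EAM: direction vector
    v: List[float]   # FAM: v vector (upper corner); EAM: center vector
    l: int           # class label
    ntype: int       # FAM (1) or EAM (2)
    mu: float = 1.0  # EAM axis ratio; unused for FAM (default is arbitrary)
    r: float = 0.0   # EAM radius; unused for FAM

    def pointcat(self) -> bool:
        """True if the category covers a single point of the input space."""
        if self.ntype == FAM:
            return self.u == self.v  # hyperbox collapsed to a point
        return self.r == 0.0         # assumption: a zero-radius ellipsoid is a point

if __name__ == "__main__":
    chrom = [Cat([0.2, 0.3], [0.2, 0.3], 0, FAM),
             Cat([0.1, 0.1], [0.5, 0.6], 1, FAM),
             Cat([1.0, 0.0], [0.4, 0.4], 1, EAM, mu=0.5, r=0.1)]
    print([c.pointcat() for c in chrom])  # [True, False, False]
```

Keeping FAM and EAM categories in one record type with a type tag lets the genetic operators treat a UART chromosome as a single list and dispatch on `ntype` only where the two models differ (for example, in mutation and in drawing).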

8. SUMMARY/CONTRIBUTIONS, AND FUTURE WORK
8.1 Summary/Contributions
In this dissertation we accomplished the following:
Designed a methodology to create genetically engineered ART networks. We used this methodology to create genetically engineered FAMs, EAMs and GAMs, called GFAM, GEAM and GGAM.
Experimented extensively with GFAM, GEAM and GGAM, and managed to create ART architectures that occasionally were optimal classifiers (i.e., they achieved the highest possible classification accuracy while using the minimum possible number of categories). GFAM, GEAM and GGAM compared favorably with other ART approaches that were introduced in the literature to solve the ART category proliferation problem.
Extended the methodology for designing genetically engineered FAMs, EAMs and GAMs to the case of a genetically engineered combined FAM and EAM architecture, called UART. Once more, UART's performance was compared with other ART architectures, as well as with GFAM and GEAM. The conclusion from this comparison was that UART has merit and should be investigated further.
Produced analytical results that explain the order of search of FAM versus EAM categories in UART. This analysis helped us better understand how UART makes category choices.
Calculated analytically the computational complexity of GFAM, GEAM, GGAM and UART. This analytical calculation demonstrated that the computational complexity of genetically engineered ART architectures compares very favorably with the complexity of other ART architectures introduced in the literature.

Tested the genetic ART modules on a set of real and artificial databases.
In the process of running extensive experiments with the genetically engineered ART architectures, we created four user-friendly interfaces: GFAM UI, GEAM UI, GGAM UI, and finally UART UI. These interfaces not only allowed us to conduct this extensive experimentation effort in a timely manner, but also allowed us to visualize the results, which helps one better understand the network's functionality.
8.2 Future Work
We see a number of directions in which this research can be extended; they are briefly discussed below.
Choose the GA parameters of GEAM, GGAM and UART through experimentation. In this dissertation only the GFAM parameters were chosen through experimentation; GEAM, GGAM and UART use the GA parameters found in the GFAM experiments.
Compare GFAM and the other genetically engineered ART networks with some state-of-the-art classifiers, such as support vector machines (SVMs). Preliminary results have shown that GFAM is very competitive with the SVM approach.
Extend the ideas presented in UART to other combinations of ART networks, such as FAM and GAM, EAM and GAM, and finally FAM, EAM and GAM.
Extend the analysis of the order of search of FAM versus EAM categories to the order of search of FAM versus GAM categories, EAM versus GAM categories, and finally FAM versus EAM versus GAM categories.
Conduct a study to uncover the effect of using genetic algorithms on the size of the training and validation sets needed for the accurate training of ART architectures.

APPENDIX A: TERMINOLOGY

FAM: Fuzzy ARTMAP.
EAM: Ellipsoidal ARTMAP.
GAM: Gaussian ARTMAP.
ssfam, sseam, ssgam: Semi-supervised Fuzzy ARTMAP, Semi-supervised Ellipsoidal ARTMAP, Semi-supervised Gaussian ARTMAP.
$M$: The dimensionality of the input patterns in the training, validation and test sets provided by the classification problem under consideration.
Training Set: The collection of input/output pairs used in the training of the FAMs that constitute the initial FAM population in GFAM.
Validation Set: The collection of input/output pairs used to validate the performance of the FAM networks during the evolution of FAMs from generation to generation.
Test Set: The collection of input/output pairs used to assess the performance of the chosen FAM network after the evolution of FAMs is completed.
$PT$: Number of points in the training set.
$PV$: Number of points in the validation set.
$\bar{\rho}_{min}$: The lower limit of the baseline vigilance parameter used in the training of the FAM networks that comprise the initial population of FAM networks.
$\bar{\rho}_{max}$: The upper limit of the baseline vigilance parameter used in the training of the FAM networks that comprise the initial population of FAM networks.
$\beta$: The choice parameter used in the training of the FAM networks that comprise the initial population of FAM networks. This parameter is fixed and set equal to 1.0.
$Pop_{size}$: The number of chromosomes (trained FAM networks) in each generation.

$N_a(p)$: The number of categories in the $p$-th FAM network among the $Pop_{size}$ trained FAM networks in a generation.
$w_j^a(p) = (u_j^a(p), (v_j^a(p))^c)$: The weight vector corresponding to category $j$ of the $p$-th FAM network among the $Pop_{size}$ trained FAM networks in a generation; $u_j^a(p)$ corresponds to the lower endpoint of the hyperbox that the weight vector $w_j^a(p)$ defines, and $v_j^a(p)$ corresponds to the upper endpoint of this hyperbox.
$l_j(p)$: The label of category $j$ of the $p$-th FAM network among the $Pop_{size}$ trained FAM networks in a generation.
$PCC(p)$: The percentage of correct classification on the validation set exhibited by the $p$-th FAM network among the $Pop_{size}$ trained FAM networks in a generation.
$Gen_{max}$: The maximum number of generations allowed for the FAM networks to evolve. When this maximum number is reached, evolution stops and the FAM with the best fitness value on the validation set is reported.
$NC_{best}$: The number of best chromosomes that GFAM transfers from the old generation to the new generation (elitism).
$Cat_{min}$, $Cat_{max}$: The minimum and maximum number of categories that a chromosome is allowed to have during the evolutionary process that GFAM undergoes.
$Cat_{add}$, $Cat_{del}$: New genetic operators that add a category to and delete a category from a chromosome.
$P(Cat_{add})$, $P(Cat_{del})$, $P(Mut)$: The probabilities of adding, deleting and mutating a category.
$PT$: Number of points in the training set.

$PV$: Number of data points in the validation set.
$PTes$: Number of points in the test set.
$PS$: Number of network parameter settings used to produce the best ART network (where ART is ssfam, sseam, ssgam or safe micro-ARTMAP).
$I^r$: Input pattern from the training or test collection.
$O^r$: Output pattern from the training or test collection; output pattern $O^r$ corresponds to input pattern $I^r$.
$S(I^r)$: The set of committed nodes in the $F_2^a$ layer of GAM during the presentation of the input/output pair $(I^r, O^r)$.
$S_C(I^r)$: The set of committed nodes in the $F_2^a$ layer of GAM, during the presentation of the input/output pair $(I^r, O^r)$, that are still competing to represent the input pattern $I^r$. The set $S_C(I^r)$ is the subset of $S(I^r)$ consisting of nodes that pass the vigilance threshold and have not been deactivated due to match tracking.
$\bar{\rho}$: Baseline vigilance parameter value. This value determines how high the level of match between an input pattern and an $F_2^a$ node template must be for this template to be an acceptable candidate for encoding the input pattern.
$\rho$: Vigilance parameter. It starts at the value $\bar{\rho}$ and then, as the training of an input/output pair progresses, it might change (increase) its value due to match tracking.
Epsilon ($\epsilon$): A small positive constant that designates how much the vigilance parameter value will increase beyond the weighted vigilance value of the deactivated nodes after match tracking is enforced.

$w_j$: Template of node $j$ in the $F_2^a$ layer of dfam. A template $w_j$ is comprised of two vectors: (a) the vector $\mu_j$, which is the mean of all the input patterns that chose node $j$ as their representative node and were coded by this node, and (b) the vector $\sigma_j$, which is the standard deviation vector of all the input patterns that chose node $j$ as their representative node and were coded by this node.
$\rho(I^r, w_j)$: Match function value of input pattern $I^r$ and node $j$ with template $w_j$.
$T(I^r, w_j)$: Choice function (bottom-up input) value of node $j$ when the dfam input is input pattern $I^r$.
$W_j^b$: The inter-ART weight vector from node $j$ in $F_2^a$ to all the nodes in $F_2^b$. When node $j$ is first committed by (say) input pattern $I^r$, this vector becomes equal to $O^r$, and its value does not change any more during training. In essence, only one of the components of $W_j^b$ is 1 and the rest of its components are 0. The component of $W_j^b$ that is 1 identifies the label (output pattern) that node $j$ is mapped to.
$v_j$: A vector whose components correspond to the second moment of the components of the input patterns that chose node $j$ as their representative node and were encoded by node $j$.
$\gamma$: The initial value of the standard deviation $\sigma_j$ at the time that node $j$ has encoded a single input pattern.
$n_j$: The number of times that node $j$ was activated by an input pattern and coded this input pattern.
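The GAM template quantities defined above ($\mu_j$, $v_j$, $\sigma_j$, $\gamma$, $n_j$) can be maintained incrementally, together with the category probability used by the GGAM Cat object (encoded patterns over total training patterns). The Python sketch below is illustrative only. The class and function names are hypothetical. The rule $\sigma_j = \sqrt{v_j - \mu_j^2}$ once $n_j > 1$ is an assumption consistent with these definitions, not necessarily the update used in the dissertation's GAM implementation; other formulations update $\sigma_j$ recursively instead.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class GamCategory:
    gamma: float                                      # initial std of a one-pattern node
    n: int = 0                                        # n_j: patterns coded by this node
    mu: List[float] = field(default_factory=list)     # mean vector mu_j
    v: List[float] = field(default_factory=list)      # second-moment vector v_j
    sigma: List[float] = field(default_factory=list)  # standard-deviation vector sigma_j

    def code(self, x: List[float]) -> None:
        """Encode input pattern x into this node (running averages with weight 1/n)."""
        self.n += 1
        if self.n == 1:
            self.mu = list(x)
            self.v = [xi * xi for xi in x]
            self.sigma = [self.gamma] * len(x)  # a single pattern: sigma starts at gamma
            return
        w = 1.0 / self.n
        self.mu = [(1 - w) * m + w * xi for m, xi in zip(self.mu, x)]
        self.v = [(1 - w) * s + w * xi * xi for s, xi in zip(self.v, x)]
        # Assumed rule: variance = second moment - squared mean (clipped at 0 for rounding).
        self.sigma = [math.sqrt(max(s - m * m, 0.0)) for s, m in zip(self.v, self.mu)]

def category_probability(cat: GamCategory, total_patterns: int) -> float:
    """Probability of a category: encoded patterns over total training patterns."""
    return cat.n / total_patterns
```

For example, a node created with `gamma=0.05` that codes the one-feature patterns `[0.2]` and then `[0.4]` ends with a mean of about 0.3 and a standard deviation of about 0.1. If 8 training patterns exist in total, its probability is 0.25.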

APPENDIX B: FAM STEP-BY-STEP TRAINING & TESTING

Training Phase of FAM

The step-by-step implementation of the off-line training in FAM is presented below.

Step 1: Set the baseline vigilance parameter $\bar{\rho}_a$ to a value from the interval $[0, 1]$. Also, initialize the parameter $\alpha$.

The weight values corresponding to node $j$ in $F_2^a$ are the following:
- $\mathbf{w}_j^a$: a weight vector equal to $(\mathbf{u}_j, (\mathbf{v}_j)^c)$. Here $\mathbf{u}_j$ is the vector of the minima of the components of the patterns that chose this node as their representative, and $\mathbf{v}_j$ is the vector of the maxima of the components of those patterns.
- the inter-ART weights $\mathbf{W}_j^{ab}$, with components $W_{jk}^{ab}$; $j = 1, \ldots, N_a$, $k = 1, \ldots, N_b$.

As training progresses, every committed vector $\mathbf{W}_j^{ab}$ has one of its components equal to 1 and the other components equal to zero. The component of $\mathbf{W}_j^{ab}$ that is equal to 1 designates the label that node $j$ is mapped to.

The number of nodes in the $F_1^a$ layer is denoted by $M$. The number of committed nodes in the $F_2^a$ layer is denoted by $N_a$. The number of nodes in the $F_2^b$ layer, denoted by $N_b$, corresponds to the number of output classes; for example, if there are 4 output classes, the $F_2^b$ output layer has 4 nodes.

The index $r$ of the input/output pairs is set to 1. The set $S(\mathbf{I}^r)$ represents the set of committed nodes in FAM and is initially set equal to the empty set.

Step 2: Present the $r$-th input/output pair $(\mathbf{I}^r, \mathbf{O}^r)$ to FAM. That is, the input pattern $\mathbf{I}^r$ is presented at the input layer $F_1^a$ and the output pattern $\mathbf{O}^r$ is presented at the output layer $F_2^b$. The vigilance parameter $\rho_a$ is set to the baseline vigilance value $\bar{\rho}_a$.

Step 3: $S(\mathbf{I}^r)$ designates the set of all the committed nodes in $F_2^a$. If $S(\mathbf{I}^r)$ is the empty set (i.e., there are no committed nodes), go to Step 7a to learn the input/output pair with an uncommitted node. Otherwise, calculate the match function for all the committed nodes $j$ in $S(\mathbf{I}^r)$. The match function $\rho(\mathbf{I}^r|\mathbf{w}_j^a)$ is calculated as follows:

$$\rho(\mathbf{I}^r|\mathbf{w}_j^a) = \frac{|\mathbf{I}^r \wedge \mathbf{w}_j^a|}{|\mathbf{I}^r|} = \frac{|\mathbf{I}^r \wedge \mathbf{w}_j^a|}{M}$$

Step 4: From the set $S(\mathbf{I}^r)$, find the subset of nodes, designated $S_C(\mathbf{I}^r)$, whose match function exceeds the vigilance $\rho_a$. If $S_C(\mathbf{I}^r)$ is the empty set, go to Step 7a to learn the input/output pair with an uncommitted node. Otherwise, for every node $j \in S_C(\mathbf{I}^r)$, calculate the choice function (if it has not been calculated before) as follows:

$$T_j(\mathbf{I}^r|\mathbf{w}_j^a) = \frac{|\mathbf{I}^r \wedge \mathbf{w}_j^a|}{\alpha + |\mathbf{w}_j^a|}$$

Step 5: Find the node $J$ that has the maximum choice function (bottom-up input) value, that is,

$$T_J(\mathbf{I}^r|\mathbf{w}_J^a) = \max_{j \in S_C(\mathbf{I}^r)} \{T_j(\mathbf{I}^r|\mathbf{w}_j^a)\}$$

Note: If more than one node index maximizes the choice function, choose the lowest index.

Step 6: Find the prediction of the activated node $J$. The prediction $K$ of the input pattern $\mathbf{I}^r$ is the node $K$ in $F_2^b$ for which $W_{JK}^{ab} = 1$. We now distinguish two cases:
- If label $K$ is the correct label, move to Step 7b to do learning.
- If label $K$ is the incorrect label, enforce match tracking by setting

$$\rho_a = \rho(\mathbf{I}^r|\mathbf{w}_J^a) + \varepsilon$$

where $\varepsilon$ is a very small positive value. Also redefine the set $S(\mathbf{I}^r)$ as follows:

$$S(\mathbf{I}^r) = S(\mathbf{I}^r) \setminus \{J\}$$

Then go back to Step 4.

Step 7: We now distinguish two cases.

Step 7a: An uncommitted node is chosen to learn the input/output pair. Call it node $J$; $J$ is always one index higher than the highest index of a committed node. Then

$$\mathbf{w}_J^a = \mathbf{I}^r, \qquad \mathbf{W}_J^{ab} = \mathbf{O}^r$$

and $S(\mathbf{I}^{r+1})$ is set to the set of all committed nodes in $F_2^a$ (if $r$ is the last index, $r+1$ is the first index). Go to Step 8.

Step 7b: A committed node $J$ is chosen to learn the input/output pair. For this node $J$ we have

$$\mathbf{w}_J^a = \mathbf{I}^r \wedge \mathbf{w}_J^a$$

and $S(\mathbf{I}^{r+1})$ is set to the set of all committed nodes in $F_2^a$ (if $r$ is the last index, $r+1$ is the first index). Go to Step 8.

Step 8: An epoch is one complete presentation of all the input/output pairs in the training set. If we are not at the end of an epoch, increment $r$ to $r+1$ and go back to Step 2 to present the $(r+1)$-th input/output pair. If we are at the end of an epoch, two cases can be distinguished:
(a) During the previous list presentation at least one component of the top-down weights or the inter-ART weights changed, and the maximum number of list presentations allowed has not been reached. In this case, set $r$ to 1, go back to Step 2, and present the first input/output pair in the set.
(b) During the previous list presentation no changes occurred in the top-down weights or the inter-ART weights, or the maximum number of list presentations allowed has been reached. Training is then considered complete. When training stops because no weights changed, the network is considered to have learned the training patterns perfectly.

Performance Phase of FAM

The step-by-step implementation of FAM's performance phase is described below.

Step 1: Initialize the weights $\mathbf{w}_j^a$ ($j = 1, \ldots, N_a$) and $W_{jk}^{ab}$ ($j = 1, \ldots, N_a$, $k = 1, \ldots, N_b$) to the values that they had at the end of the training phase of FAM.

Step 2: Present the $r$-th input/output pair $(\mathbf{I}^r, \mathbf{O}^r)$ to FAM. That is, the input pattern $\mathbf{I}^r$ is presented at the input layer $F_1^a$ and the output pattern $\mathbf{O}^r$ is presented at the output layer $F_2^b$. The vigilance parameter $\rho_a$ is set to the baseline vigilance value $\bar{\rho}_a$.

Step 3: For every node $j$, calculate the choice function as follows:

$$T_j(\mathbf{I}^r|\mathbf{w}_j^a) = \frac{|\mathbf{I}^r \wedge \mathbf{w}_j^a|}{\alpha + |\mathbf{w}_j^a|}$$

Step 4: Find the node $J$ that has the maximum choice function (bottom-up input) value, that is,

$$T_J(\mathbf{I}^r|\mathbf{w}_J^a) = \max_{j} \{T_j(\mathbf{I}^r|\mathbf{w}_j^a)\}$$

Note: If more than one node index maximizes the choice function, choose the lowest index.

Step 5: Find the prediction of the activated node $J$. The prediction $K$ of the input pattern $\mathbf{I}^r$ is the node $K$ in $F_2^b$ for which $W_{JK}^{ab} = 1$.

Step 6: If not all the test patterns in the test set have been applied to the network, go back to Step 2 and present the next input/output test pair in the sequence. Once all the input/output test pairs have been presented, analyze the results to find the misclassification error and other such statistics.
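The FAM training and performance steps above can be condensed into a short program. The sketch below is a minimal pure-Python illustration written for this appendix; it is not the dissertation's GFAM code. The names (`FuzzyARTMAP`, `fit`, `predict`, `complement_code`, `fuzzy_and_norm`) and the default values of $\alpha$, $\varepsilon$ and the epoch limit are hypothetical. It makes the following further assumptions:
- inputs are already normalized to $[0, 1]$ and are complement coded;
- "exceeds the vigilance" is treated as $\geq$;
- fast learning is used in Step 7b.

```python
from typing import List


def complement_code(x: List[float]) -> List[float]:
    """Complement coding (x, 1 - x); assumes every component of x lies in [0, 1]."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and_norm(a: List[float], b: List[float]) -> float:
    """|a ^ b|: L1 norm of the component-wise minimum (fuzzy AND)."""
    return sum(min(u, v) for u, v in zip(a, b))


class FuzzyARTMAP:
    """Hypothetical sketch of off-line FAM training/testing (Appendix B steps)."""

    def __init__(self, rho_bar=0.0, alpha=0.01, epsilon=1e-6, max_epochs=100):
        self.rho_bar = rho_bar          # baseline vigilance
        self.alpha = alpha              # choice parameter
        self.epsilon = epsilon          # match-tracking increment
        self.max_epochs = max_epochs    # maximum number of list presentations
        self.w: List[List[float]] = []  # templates w_j^a of committed nodes
        self.labels: List[int] = []     # index K with W_jK^ab = 1

    def _choice(self, I, j):
        return fuzzy_and_norm(I, self.w[j]) / (self.alpha + sum(self.w[j]))

    def _winner(self, I, nodes):
        # Largest choice value; ties go to the lowest node index.
        return max(nodes, key=lambda j: (self._choice(I, j), -j))

    def _learn(self, I, label):
        """Steps 2-7 for one pair; returns True if any weight changed."""
        rho = self.rho_bar                       # Step 2
        S = set(range(len(self.w)))              # Step 3: committed nodes
        norm_I = sum(I)                          # |I|
        while True:
            # Step 4: S_C = nodes whose match value reaches the vigilance
            S_C = [j for j in S if fuzzy_and_norm(I, self.w[j]) / norm_I >= rho]
            if not S_C:                          # Step 7a: commit a new node
                self.w.append(list(I))
                self.labels.append(label)
                return True
            J = self._winner(I, S_C)             # Step 5
            if self.labels[J] == label:          # Step 7b: w_J = I ^ w_J
                new_w = [min(u, v) for u, v in zip(I, self.w[J])]
                changed = new_w != self.w[J]
                self.w[J] = new_w
                return changed
            # Step 6: match tracking, drop J, and return to Step 4
            rho = fuzzy_and_norm(I, self.w[J]) / norm_I + self.epsilon
            S.discard(J)

    def fit(self, X, y):
        inputs = [complement_code(x) for x in X]
        for _ in range(self.max_epochs):         # Step 8: list presentations
            changed = False
            for I, label in zip(inputs, y):
                changed = self._learn(I, label) or changed
            if not changed:
                break
        return self

    def predict(self, x):
        """Performance phase: winner over all committed nodes, no vigilance test."""
        I = complement_code(x)
        return self.labels[self._winner(I, range(len(self.w)))]


if __name__ == "__main__":
    X = [[0.1, 0.2], [0.15, 0.25], [0.8, 0.9], [0.85, 0.8]]
    y = [0, 0, 1, 1]
    model = FuzzyARTMAP(rho_bar=0.0).fit(X, y)
    print(len(model.w), model.predict([0.12, 0.22]), model.predict([0.9, 0.85]))  # prints: 2 0 1
```

On this toy data the third pattern first selects the node committed to label 0. Match tracking then removes that node and a second node is committed. The second list presentation changes no weights, so training stops with two categories.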

APPENDIX C: EAM STEP-BY-STEP TRAINING & TESTING

Training Phase of EAM

The step-by-step implementation of the off-line training in EAM is presented below.

Step 1: The weight values corresponding to node $j$ in $F_2^a$ are:
- $\mathbf{m}_j$: the center of the ellipsoid corresponding to node $j$;
- $\mathbf{d}_j$: the direction vector of the major axis of that ellipsoid;
- $R_j$: half of the length of the major axis of that ellipsoid.

All these weight values are represented by the generic vector $\mathbf{w}_j^a$. We also have the inter-ART weights $\mathbf{W}_j^{ab}$, with components $W_{jk}^{ab}$; $j = 1, \ldots, N_a$, $k = 1, \ldots, N_b$. As training progresses, every committed vector $\mathbf{W}_j^{ab}$ has one of its components equal to 1 and the other components equal to zero. The component of $\mathbf{W}_j^{ab}$ that is equal to 1 designates the label that node $j$ is mapped to.

The number of nodes in the $F_1^a$ layer is denoted by $M$. The number of committed nodes in the $F_2^a$ layer is denoted by $N_a$. The number of nodes in the $F_2^b$ layer, denoted by $N_b$, corresponds to the number of output classes; for example, if there are 4 output classes, the $F_2^b$ output layer has 4 nodes.

The index $r$ of the input/output pairs is set to 1. The set $S(\mathbf{I}^r)$ represents the set of committed nodes in EAM and is initially set equal to the empty set.

Set the following parameters:
- the baseline vigilance parameter $\bar{\rho}_a$, to a value from the interval $[0, 1]$;
- the parameter $\mu$, to a value from the interval $(0, 1]$; in practice, the default value $\mu = 0.5$ is used;
- the parameter $\alpha$, to a value in $(0, \infty)$; typical values for $\alpha$ are small positive constants;
- the parameter $D = \sqrt{M}$ (in UART, $D = \sqrt{M}/\mu$).

Normalize the data so that all components take values in the interval $[0, 1]$. EAM does not require complement coding of the inputs.

Step 2: Present the $r$-th input/output pair $(\mathbf{I}^r, \mathbf{O}^r)$ to EAM. That is, the input pattern $\mathbf{I}^r$ is presented at the input layer $F_1^a$ and the output pattern $\mathbf{O}^r$ is presented at the output layer $F_2^b$. The vigilance parameter $\rho_a$ is set to the baseline vigilance value $\bar{\rho}_a$.

Step 3: $S(\mathbf{I}^r)$ designates the set of all the committed nodes in $F_2^a$. If $S(\mathbf{I}^r)$ is the empty set (i.e., there are no committed nodes), go to Step 7a to learn the input/output pair with an uncommitted node. Otherwise, calculate the match function for all the committed nodes $j$ in $S(\mathbf{I}^r)$. The match function $\rho(\mathbf{I}^r|\mathbf{w}_j^a)$ is calculated as follows:

$$\rho(\mathbf{I}^r|\mathbf{w}_j^a) = \frac{D - R_j - \max\{R_j, \|\mathbf{I}^r - \mathbf{m}_j\|_{C_j}\}}{D}$$

where

$$\|\mathbf{I}^r - \mathbf{m}_j\|_{C_j} = \frac{1}{\mu}\sqrt{\|\mathbf{I}^r - \mathbf{m}_j\|_2^2 - (1 - \mu^2)\left[\mathbf{d}_j^T(\mathbf{I}^r - \mathbf{m}_j)\right]^2}$$

and $\|\mathbf{I}^r - \mathbf{m}_j\|_2^2$ stands for the square of the Euclidean distance between $\mathbf{I}^r$ and $\mathbf{m}_j$.

Step 4: From the set $S(\mathbf{I}^r)$, find the subset of nodes, designated $S_C(\mathbf{I}^r)$, whose match function exceeds the vigilance $\rho_a$. If $S_C(\mathbf{I}^r)$ is the empty set, go to Step 7a to learn the input/output pair with an uncommitted node. Otherwise, for every node $j \in S_C(\mathbf{I}^r)$, calculate the choice function (if it has not been calculated before) as follows:

$$T_j(\mathbf{I}^r|\mathbf{w}_j^a) = \frac{D - R_j - \max\{R_j, \|\mathbf{I}^r - \mathbf{m}_j\|_{C_j}\}}{D - 2R_j + \alpha}$$

Step 5: Find the node $J$ that has the maximum choice function (bottom-up input) value, that is,

$$T_J(\mathbf{I}^r|\mathbf{w}_J^a) = \max_{j \in S_C(\mathbf{I}^r)} \{T_j(\mathbf{I}^r|\mathbf{w}_j^a)\}$$

Note: If more than one node index maximizes the choice function, choose the lowest index.

Step 6: Find the prediction of the activated node $J$. The prediction $K$ of the input pattern $\mathbf{I}^r$ is the node $K$ in $F_2^b$ for which $W_{JK}^{ab} = 1$. We now distinguish two cases:
- If label $K$ is the correct label, move to Step 7b to do learning.
- If label $K$ is the incorrect label, enforce match tracking by setting

$$\rho_a = \rho(\mathbf{I}^r|\mathbf{w}_J^a) + \varepsilon$$

where $\varepsilon$ is a very small positive value. Also redefine the set $S(\mathbf{I}^r)$ as follows:

$$S(\mathbf{I}^r) = S(\mathbf{I}^r) \setminus \{J\}$$

Then go back to Step 4.

Step 7: We now distinguish two cases.

Step 7a: An uncommitted node is chosen to learn the input/output pair. Call it node $J$; $J$ is always one index higher than the highest index of a committed node. Then

$$\mathbf{m}_J = \mathbf{I}^r, \qquad \mathbf{d}_J = \mathbf{0}, \qquad R_J = 0, \qquad \mathbf{W}_J^{ab} = \mathbf{O}^r$$

and $S(\mathbf{I}^{r+1})$ is set to the set of all committed nodes in $F_2^a$ (if $r$ is the last index, $r+1$ is the first index). Go to Step 8.

Step 7b: A committed node $J$ is chosen to learn the input/output pair. The network weights are then updated as follows:

$$\mathbf{m}_J = \mathbf{m}_J^{old} + \frac{1}{2}\,\frac{\max\{\|\mathbf{I}^r - \mathbf{m}_J^{old}\|_{C_J}, R_J^{old}\} - R_J^{old}}{\|\mathbf{I}^r - \mathbf{m}_J^{old}\|_{C_J}}\,(\mathbf{I}^r - \mathbf{m}_J^{old})$$

If $\mathbf{I}^r$ is not the second pattern encoded by node $J$:

$$\mathbf{d}_J = \mathbf{d}_J^{old}, \qquad R_J = R_J^{old} + \frac{1}{2}\left(\max\{\|\mathbf{I}^r - \mathbf{m}_J^{old}\|_{C_J}, R_J^{old}\} - R_J^{old}\right)$$

If $\mathbf{I}^r$ is the second pattern encoded by node $J$:

$$\mathbf{d}_J = \frac{\mathbf{I}^r - \mathbf{m}_J^{old}}{\|\mathbf{I}^r - \mathbf{m}_J^{old}\|_2}, \qquad R_J = \frac{\|\mathbf{I}^r - \mathbf{m}_J^{old}\|_2}{2}$$

In either case, $S(\mathbf{I}^{r+1})$ is set to the set of all committed nodes in $F_2^a$ (if $r$ is the last index, $r+1$ is the first index). Go to Step 8.

Step 8: An epoch is one complete presentation of all the input/output pairs in the training set. If we are not at the end of an epoch, increment $r$ to $r+1$ and go back to Step 2 to present the $(r+1)$-th input/output pair. If we are at the end of an epoch, two cases can be distinguished:
(a) During the previous list presentation at least one component of the top-down weights or the inter-ART weights changed, and the maximum number of list presentations allowed has not been reached. In this case, set $r$ to 1, go back to Step 2, and present the first input/output pair in the set.
(b) During the previous list presentation no changes occurred in the top-down weights or the inter-ART weights, or the maximum number of list presentations allowed has been reached. Training is then considered complete. When training stops because no weights changed, the network is considered to have learned the training patterns perfectly.
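The EAM formulas of Steps 3, 4 and 7 can be checked numerically with a small sketch. What follows is a minimal illustration, not the dissertation's GEAM implementation. The names `ellipsoid_distance`, `EllipsoidNode`, `match`, `choice` and `learn` are hypothetical, and so is the `count` field, which detects "the second pattern encoded by node $J$". The sketch also assumes that a pattern at the center ($\|\mathbf{I}^r - \mathbf{m}_J\|_{C_J} = 0$) leaves $\mathbf{m}_J$ unchanged, since the center update is $0/0$ there.

```python
import math
from typing import List


def ellipsoid_distance(x: List[float], m: List[float], d: List[float], mu: float) -> float:
    """||x - m||_C = (1/mu) * sqrt(||x - m||_2^2 - (1 - mu^2) * (d^T (x - m))^2)."""
    diff = [xi - mi for xi, mi in zip(x, m)]
    sq = sum(v * v for v in diff)
    proj = sum(di * v for di, v in zip(d, diff))
    # max(..., 0.0) only guards against tiny negative values caused by rounding
    return math.sqrt(max(sq - (1.0 - mu * mu) * proj * proj, 0.0)) / mu


class EllipsoidNode:
    """Weights (m_j, d_j, R_j) of one committed EAM node, as created by Step 7a."""

    def __init__(self, x: List[float]):
        self.m = list(x)             # center
        self.d = [0.0] * len(x)      # major-axis direction (zero for a one-pattern node)
        self.R = 0.0                 # half length of the major axis
        self.count = 1               # patterns coded so far (hypothetical helper field)


def match(node: EllipsoidNode, x: List[float], mu: float, D: float) -> float:
    """Step 3: rho = (D - R - max{R, ||x - m||_C}) / D."""
    dist = ellipsoid_distance(x, node.m, node.d, mu)
    return (D - node.R - max(node.R, dist)) / D


def choice(node: EllipsoidNode, x: List[float], mu: float, D: float, alpha: float) -> float:
    """Step 4: T = (D - R - max{R, ||x - m||_C}) / (D - 2R + alpha)."""
    dist = ellipsoid_distance(x, node.m, node.d, mu)
    return (D - node.R - max(node.R, dist)) / (D - 2.0 * node.R + alpha)


def learn(node: EllipsoidNode, x: List[float], mu: float) -> None:
    """Step 7b: update a committed node; every right-hand side uses the old weights."""
    dist = ellipsoid_distance(x, node.m, node.d, mu)
    diff = [xi - mi for xi, mi in zip(x, node.m)]
    R_old = node.R
    if dist > 0.0:  # a pattern at the center leaves m_J unchanged
        step = 0.5 * (max(dist, R_old) - R_old) / dist
        node.m = [mi + step * v for mi, v in zip(node.m, diff)]
    if node.count == 1:  # x is the second pattern coded by this node
        norm = math.sqrt(sum(v * v for v in diff))
        if norm > 0.0:
            node.d = [v / norm for v in diff]
        node.R = norm / 2.0
    else:  # d_J is kept; R_J grows only if x lies outside the current ellipsoid
        node.R = R_old + 0.5 * (max(dist, R_old) - R_old)
    node.count += 1
```

Suppose a node is created at $(0, 0)$ and then learns $(0.6, 0.8)$ with $\mu = 0.5$. It ends up with center $(0.3, 0.4)$, direction $(0.6, 0.8)$ and $R = 0.5$: the second pattern moves the center to the midpoint and fixes the major axis. A later pattern such as $(0.45, 0.6)$, which lies inside that ellipsoid, leaves all three weights unchanged.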

Performance Phase of EAM

The step-by-step implementation of EAM's performance phase is described below.

Step 1: Initialize the weights $\mathbf{m}_j$, $\mathbf{d}_j$, $R_j$ and $W_{jk}^{ab}$ to the values that they had at the end of the training phase of EAM. Also initialize all the other EAM parameters (baseline vigilance, choice parameter, $\mu$, etc.).

Step 2: Present the $r$-th input/output pair $(\mathbf{I}^r, \mathbf{O}^r)$ to EAM. That is, the input pattern $\mathbf{I}^r$ is presented at the input layer $F_1^a$ and the output pattern $\mathbf{O}^r$ is presented at the output layer $F_2^b$. The vigilance parameter $\rho_a$ is set to the baseline vigilance value $\bar{\rho}_a$.

Step 3: For every node $j$, calculate the choice function as follows:

$$T_j(\mathbf{I}^r|\mathbf{w}_j^a) = \frac{D - R_j - \max\{R_j, \|\mathbf{I}^r - \mathbf{m}_j\|_{C_j}\}}{D - 2R_j + \alpha}$$

Step 4: Find the node $J$ that has the maximum choice function (bottom-up input) value, that is,

$$T_J(\mathbf{I}^r|\mathbf{w}_J^a) = \max_{j} \{T_j(\mathbf{I}^r|\mathbf{w}_j^a)\}$$

Note: If more than one node index maximizes the choice function, choose the lowest index.

Step 5: Find the prediction of the activated node $J$. The prediction $K$ of the input pattern $\mathbf{I}^r$ is the node $K$ in $F_2^b$ for which $W_{JK}^{ab} = 1$.

Step 6: If not all the test patterns in the test set have been applied to the network, go back to Step 2 and present the next input/output test pair in the sequence. Once all the input/output test pairs have been presented, analyze the results to find the misclassification error and other such statistics.

APPENDIX D: GAM STEP-BY-STEP TRAINING & TESTING

Training Phase of GAM

The step-by-step implementation of the off-line training in GAM is presented below.

Step 1: Set the baseline vigilance parameter $\bar{\rho}_a$ to a value from the interval $[0, 1]$. Also, initialize the parameter $\gamma$.

The weight values corresponding to node $j$ in $F_2^a$ are:
- $\boldsymbol{\mu}_j$: the mean of the data that have activated and were encoded by node $j$;
- $\boldsymbol{\sigma}_j$: the standard deviation vector of the data that have activated and were encoded by node $j$;
- $n_j$: the number of training input patterns that were encoded by node $j$;
- the inter-ART weights $\mathbf{W}_j^{ab}$, with components $W_{jk}^{ab}$; $j = 1, \ldots, N_a$, $k = 1, \ldots, N_b$.

As training progresses, every committed vector $\mathbf{W}_j^{ab}$ has one of its components equal to 1 and the other components equal to zero. The component of $\mathbf{W}_j^{ab}$ that is equal to 1 designates the label that node $j$ is mapped to.

The number of nodes in the $F_1^a$ layer is denoted by $M$. The number of committed nodes in the $F_2^a$ layer is denoted by $N_a$. The number of nodes in the $F_2^b$ layer, denoted by $N_b$, corresponds to the number of output classes; for example, if there are 4 output classes, the $F_2^b$ output layer has 4 nodes.

The index $r$ of the input/output pairs is set to 1. The set $S(\mathbf{I}^r)$ represents the set of committed nodes in GAM and is initially set equal to the empty set.

Note: There is another parameter vector, $\mathbf{v}_j$, that we need to keep track of for every committed node $j$ in $F_2^a$. The components of $\mathbf{v}_j$ are the experimental (sample) means of the squared values of the input patterns that chose node $j$ as their representative node. The relationships are:

$$\sigma_{ji}^2 = v_{ji} - \mu_{ji}^2 \qquad \text{and} \qquad v_{ji} = \sigma_{ji}^2 + \mu_{ji}^2$$

for every component $i$ of the vectors $\mathbf{v}_j$, $\boldsymbol{\sigma}_j$, $\boldsymbol{\mu}_j$.

Step 2: Present the $r$-th input/output pair $(\mathbf{I}^r, \mathbf{O}^r)$ to GAM. That is, the input pattern $\mathbf{I}^r$ is presented at the input layer $F_1^a$ and the output pattern $\mathbf{O}^r$ is presented at the output layer $F_2^b$. The vigilance parameter $\rho_a$ is set to the baseline vigilance value $\bar{\rho}_a$.

Step 3: $S(\mathbf{I}^r)$ designates the set of all the committed nodes in $F_2^a$. If $S(\mathbf{I}^r)$ is the empty set (i.e., there are no committed nodes), go to Step 7a to learn the input/output pair with an uncommitted node. Otherwise, calculate the match function for all the committed nodes $j$ in $S(\mathbf{I}^r)$. The match function $\rho(\mathbf{I}^r|\mathbf{w}_j^a)$ is calculated as follows:

$$\rho(\mathbf{I}^r|\mathbf{w}_j^a) = G_j(\mathbf{I}^r) = \exp\left[-\frac{1}{2}\sum_{i=1}^{M}\left(\frac{I_i^r - \mu_{ji}}{\sigma_{ji}}\right)^2\right]$$

Note: It may be computationally more attractive to calculate the logarithm of the match function, as follows:

$$\log_e \rho(\mathbf{I}^r|\mathbf{w}_j^a) = \log_e G_j(\mathbf{I}^r) = -\frac{1}{2}\sum_{i=1}^{M}\left(\frac{I_i^r - \mu_{ji}}{\sigma_{ji}}\right)^2$$

When the logarithm of the match function is used, compare $\log_e \rho(\mathbf{I}^r|\mathbf{w}_j^a)$ of every node with $\log_e \rho_a$ to determine whether the node's match function value exceeds the vigilance threshold (see the beginning of Step 4).

Step 4: From the set $S(\mathbf{I}^r)$, find the subset of nodes, designated $S_C(\mathbf{I}^r)$, whose match function exceeds the vigilance $\rho_a$. If $S_C(\mathbf{I}^r)$ is the empty set, go to Step 7a to learn the input/output pair with an uncommitted node. Otherwise, for every node $j \in S_C(\mathbf{I}^r)$, calculate the choice function (if it has not been calculated before) as follows:

$$T_j(\mathbf{I}^r|\mathbf{w}_j^a) = g_j(\mathbf{I}^r) = \frac{n_j / \sum_{l=1}^{N_a} n_l}{\prod_{i=1}^{M}\sigma_{ji}}\, G_j(\mathbf{I}^r)$$

Note: In this calculation of $T_j(\mathbf{I}^r|\mathbf{w}_j^a)$ we have omitted a term in the denominator, equal to $(2\pi)^{M/2}$. The complete expression is:

$$T_j(\mathbf{I}^r|\mathbf{w}_j^a) = g_j(\mathbf{I}^r) = \frac{n_j}{\sum_{l=1}^{N_a} n_l}\;\frac{G_j(\mathbf{I}^r)}{(2\pi)^{M/2}\prod_{i=1}^{M}\sigma_{ji}}$$

In this expression, the term $\frac{G_j(\mathbf{I}^r)}{(2\pi)^{M/2}\prod_{i=1}^{M}\sigma_{ji}}$ represents the class-conditional probability of the input pattern belonging to class $j$. The term $\frac{n_j}{\sum_{l=1}^{N_a} n_l}$ represents the a-priori estimate of the probability that a pattern belongs to class $j$.

Because the $T_j$ (or $g_j$) values are used to calculate normalized activations, terms common to all nodes can be ignored when evaluating $T_j$ (or $g_j$). These terms are $(2\pi)^{M/2}$ and $\sum_{l=1}^{N_a} n_l$. In our calculation of $T_j$ (or $g_j$), we have ignored the term $(2\pi)^{M/2}$ but included the term $\sum_{l=1}^{N_a} n_l$.

Step 5: Find the node $J$ that has the maximum choice function (bottom-up input) value, that is,

$$T_J(\mathbf{I}^r|\mathbf{w}_J^a) = \max_{j \in S_C(\mathbf{I}^r)} \{T_j(\mathbf{I}^r|\mathbf{w}_j^a)\}$$

Note: If more than one node index maximizes the choice function, choose the lowest index.

Step 6: Find the prediction of the activated node $J$. The prediction $K$ of the input pattern $\mathbf{I}^r$ is the node $K$ in $F_2^b$ for which $W_{JK}^{ab} = 1$. We now distinguish two cases:
- If label $K$ is the correct label, move to Step 7b to do learning.
- If label $K$ is the incorrect label, enforce match tracking by setting

$$\rho_a = \rho(\mathbf{I}^r|\mathbf{w}_J^a) + \varepsilon$$

where $\varepsilon$ is a very small positive value. Also redefine the set $S(\mathbf{I}^r)$ as follows:

$$S(\mathbf{I}^r) = S(\mathbf{I}^r) \setminus \{J\}$$

Then go back to Step 4.

Step 7: We now distinguish two cases.

Step 7a: An uncommitted node is chosen to learn the input/output pair. Call it node $J$; $J$ is always one index higher than the highest index of a committed node. Then, for every component $i$,

$$n_J = 1, \qquad \boldsymbol{\mu}_J = \mathbf{I}^r, \qquad v_{Ji} = (I_i^r)^2 + \gamma^2, \qquad \sigma_{Ji} = \gamma, \qquad \mathbf{W}_J^{ab} = \mathbf{O}^r$$

and $S(\mathbf{I}^{r+1})$ is set to the set of all committed nodes in $F_2^a$ (if $r$ is the last index, $r+1$ is the first index). Go to Step 8.

Step 7b: A committed node $J$ is chosen to learn the input/output pair. For this node $J$ we have

$$n_J = n_J + 1$$

$$\boldsymbol{\mu}_J = \left(1 - \frac{1}{n_J}\right)\boldsymbol{\mu}_J + \frac{1}{n_J}\,\mathbf{I}^r$$

$$v_{Ji} = \left(1 - \frac{1}{n_J}\right)v_{Ji} + \frac{1}{n_J}\,(I_i^r)^2$$

$$\sigma_{Ji}^2 = v_{Ji} - \mu_{Ji}^2$$

and $S(\mathbf{I}^{r+1})$ is set to the set of all committed nodes in $F_2^a$ (if $r$ is the last index, $r+1$ is the first index). Go to Step 8.

Step 8: An epoch is one complete presentation of all the input/output pairs in the training set. If we are not at the end of an epoch, increment $r$ to $r+1$ and go back to Step 2 to present the $(r+1)$-th input/output pair. If we are at the end of an epoch, two cases can be distinguished:
(a) During the previous list presentation at least one component of the top-down weights or the inter-ART weights changed, and the maximum number of list presentations allowed has not been reached. In this case, set $r$ to 1, go back to Step 2, and present the first input/output pair in the set.
(b) During the previous list presentation no changes occurred in the top-down weights or the inter-ART weights, or the maximum number of list presentations allowed has been reached. Training is then considered complete. When training stops because no weights changed, the network is considered to have learned the training patterns perfectly.

Performance Phase of GAM

The step-by-step implementation of GAM's performance phase is described below.

Step 1: Initialize the weights $\boldsymbol{\mu}_j$, $\boldsymbol{\sigma}_j$, $n_j$ ($j = 1, \ldots, N_a$) and $W_{jk}^{ab}$ ($j = 1, \ldots, N_a$, $k = 1, \ldots, N_b$) to the values that they had at the end of the training phase of Gaussian ARTMAP.

Step 2: Present the $r$-th input/output pair $(\mathbf{I}^r, \mathbf{O}^r)$ to GAM. That is, the input pattern $\mathbf{I}^r$ is presented at the input layer $F_1^a$ and the output pattern $\mathbf{O}^r$ is presented at the output layer $F_2^b$. The vigilance parameter $\rho_a$ is set to the baseline vigilance value $\bar{\rho}_a$.

Step 3: For every node $j$, calculate the choice function as follows:

$$T_j(\mathbf{I}^r|\mathbf{w}_j^a) = g_j(\mathbf{I}^r) = \frac{n_j / \sum_{l=1}^{N_a} n_l}{\prod_{i=1}^{M}\sigma_{ji}}\, G_j(\mathbf{I}^r)$$

Step 4: Find the node $J$ that has the maximum choice function (bottom-up input) value, that is,

$$T_J(\mathbf{I}^r|\mathbf{w}_J^a) = \max_{j} \{T_j(\mathbf{I}^r|\mathbf{w}_j^a)\}$$

Note: If more than one node index maximizes the choice function, choose the lowest index.

Step 5: Find the prediction of the activated node $J$. The prediction $K$ of the input pattern $\mathbf{I}^r$ is the node $K$ in $F_2^b$ for which $W_{JK}^{ab} = 1$.

Step 6: If not all the test patterns in the test set have been applied to the network, go back to Step 2 and present the next input/output test pair in the sequence. Once all the input/output test pairs have been presented, analyze the results to find the misclassification error and other such statistics.
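The GAM training and performance steps can be illustrated with the following minimal pure-Python sketch, written for this appendix rather than taken from the GGAM program. The names (`GaussianNode`, `choice`, `winner`, `train_gam`, `predict_gam`) and the default values of $\bar{\rho}_a$, $\gamma$ and $\varepsilon$ are hypothetical. The sketch makes these further assumptions:
- the vigilance test is done in the log domain, as suggested in Step 3, and "exceeds" is treated as $\geq$;
- the choice function omits $(2\pi)^{M/2}$, as in Step 4;
- prediction is winner-take-all, as in the performance phase above;
- training performs a single list presentation, because $n_J$ changes on every learning step, so the "no weight change" rule of Step 8 would never stop training.

```python
import math
from typing import List


class GaussianNode:
    """Weights (n_j, mu_j, v_j, sigma_j) and label of one GAM node, as set by Step 7a."""

    def __init__(self, x: List[float], label: int, gamma: float):
        self.n = 1
        self.mu = list(x)
        self.v = [xi * xi + gamma * gamma for xi in x]  # v = sigma^2 + mu^2
        self.sigma = [gamma] * len(x)
        self.label = label

    def log_match(self, x: List[float]) -> float:
        """log_e G_j(I) = -1/2 * sum_i ((I_i - mu_ji) / sigma_ji)^2."""
        return -0.5 * sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, self.mu, self.sigma))

    def learn(self, x: List[float]) -> None:
        """Step 7b: running updates of n_J, mu_J, v_J and sigma_J."""
        self.n += 1
        f = 1.0 / self.n
        self.mu = [(1.0 - f) * m + f * xi for m, xi in zip(self.mu, x)]
        self.v = [(1.0 - f) * vi + f * xi * xi for vi, xi in zip(self.v, x)]
        self.sigma = [math.sqrt(max(vi - m * m, 0.0)) for vi, m in zip(self.v, self.mu)]


def choice(node: GaussianNode, x: List[float], total_n: int) -> float:
    """g_j(I) = (n_j / sum_l n_l) * G_j(I) / prod_i sigma_ji, with (2*pi)^(M/2) omitted."""
    return (node.n / total_n) * math.exp(node.log_match(x)) / math.prod(node.sigma)


def winner(nodes: List[GaussianNode], x: List[float], candidates) -> int:
    """Largest choice value among the candidates; ties go to the lowest index."""
    total_n = sum(nd.n for nd in nodes)
    return max(candidates, key=lambda j: (choice(nodes[j], x, total_n), -j))


def train_gam(X, y, rho_bar=0.01, gamma=0.1, epsilon=1e-6) -> List[GaussianNode]:
    """One list presentation of Steps 2-7."""
    nodes: List[GaussianNode] = []
    for x, label in zip(X, y):
        log_rho = math.log(rho_bar) if rho_bar > 0 else -math.inf  # Step 2
        S = set(range(len(nodes)))                                  # Step 3
        while True:
            S_C = [j for j in S if nodes[j].log_match(x) >= log_rho]  # Step 4
            if not S_C:                                              # Step 7a
                nodes.append(GaussianNode(x, label, gamma))
                break
            J = winner(nodes, x, S_C)                                # Step 5
            if nodes[J].label == label:                              # Step 7b
                nodes[J].learn(x)
                break
            # Step 6: rho_a = rho(I|w_J) + epsilon, stored in the log domain
            log_rho = math.log(math.exp(nodes[J].log_match(x)) + epsilon)
            S.discard(J)
    return nodes


def predict_gam(nodes: List[GaussianNode], x: List[float]) -> int:
    """Performance phase: label of the node with the largest choice value."""
    return nodes[winner(nodes, x, range(len(nodes)))].label
```

For example, `train_gam([[0.1, 0.1], [0.15, 0.12], [0.9, 0.85], [0.88, 0.9]], [0, 0, 1, 1])` builds two nodes with two patterns each. The first node ends with $\mu_{11} = 0.125$ and $\sigma_{11} = 0.075$. The $\gamma^2/n_J$ term carried in $v_{Ji}$ keeps every $\sigma_{Ji}$ strictly positive.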

APPENDIX E: USER MANUAL


All the programs developed in this research have similar user interfaces; hence, one user manual is sufficient for all four of them. The manual is presented as a series of steps, as follows.

Locate one of the following executable files (GFAM.exe, GEAM.exe, GGAM.exe or UART.exe) and double-click it.

When the selected program runs, it expects the user to supply at least the following: the number of features, the training file, the validation file, and the testing file. If any of these is not supplied, the program pops up a message box to warn the user, as in Figure E-1.

Figure E-1: Error message

All four programs provide default values for all of the other parameters; the user may, however, change some of these default values.

The user needs to click the Create Init Pop button, which prompts the program to take the following actions:
• Reads the files and extracts the pattern information into objects called PtrnNodes.
• Stores the PtrnNode objects into vectors.
• Enters the following loop, which runs as long as the chromosome counter is less than $Pop_{size}$:
  o Uses $\rho_{max}$ and $\rho_{min}$ to calculate the value of $\rho_{inc}$. In particular, we first define $\rho_{inc} = \frac{\rho_{max} - \rho_{min}}{Pop_{size} - 1}$. The vigilance parameter of every network is then determined by the equation $\rho_{min} + i \cdot \rho_{inc}$, where $i \in \{0, 1, \ldots, Pop_{size} - 1\}$.
  o Invokes the network-specific training algorithm (not discussed here). When training finishes, the specific network is created; this network is essentially a group of Category objects along with their properties.
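The vigilance spacing used when the initial population is created can be written out in a few lines. This is a minimal sketch, not code from the four programs. The function name `vigilance_schedule` is hypothetical, and so is the handling of a population of size one, where $\rho_{inc}$ would be undefined.

```python
from typing import List


def vigilance_schedule(rho_min: float, rho_max: float, pop_size: int) -> List[float]:
    """Vigilance of network i: rho_min + i * rho_inc, for i = 0, ..., pop_size - 1."""
    if pop_size < 1:
        raise ValueError("pop_size must be at least 1")
    if pop_size == 1:
        return [rho_min]  # rho_inc = (rho_max - rho_min) / 0 is undefined here
    rho_inc = (rho_max - rho_min) / (pop_size - 1)
    return [rho_min + i * rho_inc for i in range(pop_size)]
```

With $\rho_{min} = 0$, $\rho_{max} = 0.9$ and $Pop_{size} = 4$, the four initial networks are trained with vigilance values 0, 0.3, 0.6 and 0.9, up to floating-point rounding. The first and last networks therefore use exactly the two end points of the vigilance range.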
