Constrained-Storage Variable-Branch Neural Tree for. Classification

Similar documents
Solution in semi infinite diffusion couples (error function analysis)

Outline. Probabilistic Model Learning. Probabilistic Model Learning. Probabilistic Model for Time-series Data: Hidden Markov Model

Variants of Pegasos. December 11, 2009

Clustering (Bishop ch 9)

Robust and Accurate Cancer Classification with Gene Expression Profiling

V.Abramov - FURTHER ANALYSIS OF CONFIDENCE INTERVALS FOR LARGE CLIENT/SERVER COMPUTER NETWORKS

Introduction to Boosting

Introduction ( Week 1-2) Course introduction A brief introduction to molecular biology A brief introduction to sequence comparison Part I: Algorithms

Robustness Experiments with Two Variance Components

Learning Objectives. Self Organization Map. Hamming Distance(1/5) Introduction. Hamming Distance(3/5) Hamming Distance(2/5) 15/04/2015

TSS = SST + SSE An orthogonal partition of the total SS

An introduction to Support Vector Machine

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.

GENERATING CERTAIN QUINTIC IRREDUCIBLE POLYNOMIALS OVER FINITE FIELDS. Youngwoo Ahn and Kitae Kim

( ) () we define the interaction representation by the unitary transformation () = ()

Testing a new idea to solve the P = NP problem with mathematical induction

Advanced Machine Learning & Perception

Bayes rule for a classification problem INF Discriminant functions for the normal density. Euclidean distance. Mahalanobis distance

Dynamic Team Decision Theory. EECS 558 Project Shrutivandana Sharma and David Shuman December 10, 2005

Chapter 4. Neural Networks Based on Competition

Chapter 6: AC Circuits

UNIVERSITAT AUTÒNOMA DE BARCELONA MARCH 2017 EXAMINATION

Genetic Algorithm in Parameter Estimation of Nonlinear Dynamic Systems

On One Analytic Method of. Constructing Program Controls

CHAPTER 10: LINEAR DISCRIMINATION

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 4

MANY real-world applications (e.g. production

A Novel Iron Loss Reduction Technique for Distribution Transformers. Based on a Combined Genetic Algorithm - Neural Network Approach

Comb Filters. Comb Filters

Linear Response Theory: The connection between QFT and experiments

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window

Tight results for Next Fit and Worst Fit with resource augmentation

New M-Estimator Objective Function. in Simultaneous Equations Model. (A Comparative Study)

Lecture VI Regression

Appendix H: Rarefaction and extrapolation of Hill numbers for incidence data

Lecture 6: Learning for Control (Generalised Linear Regression)

2/20/2013. EE 101 Midterm 2 Review

Attribute Reduction Algorithm Based on Discernibility Matrix with Algebraic Method GAO Jing1,a, Ma Hui1, Han Zhidong2,b

Cubic Bezier Homotopy Function for Solving Exponential Equations

Parameter Estimation of Three-Phase Induction Motor by Using Genetic Algorithm

Lecture 11 SVM cont

An Effective TCM-KNN Scheme for High-Speed Network Anomaly Detection

Computing Relevance, Similarity: The Vector Space Model

Single-loop System Reliability-Based Design & Topology Optimization (SRBDO/SRBTO): A Matrix-based System Reliability (MSR) Method

In the complete model, these slopes are ANALYSIS OF VARIANCE FOR THE COMPLETE TWO-WAY MODEL. (! i+1 -! i ) + [(!") i+1,q - [(!

. The geometric multiplicity is dim[ker( λi. number of linearly independent eigenvectors associated with this eigenvalue.

Ordinary Differential Equations in Neuroscience with Matlab examples. Aim 1- Gain understanding of how to set up and solve ODE s

WiH Wei He

Neural Networks-Based Time Series Prediction Using Long and Short Term Dependence in the Learning Process

Machine Learning Linear Regression

CS 536: Machine Learning. Nonparametric Density Estimation Unsupervised Learning - Clustering

Boosted LMS-based Piecewise Linear Adaptive Filters

Efficient Asynchronous Channel Hopping Design for Cognitive Radio Networks

Online Supplement for Dynamic Multi-Technology. Production-Inventory Problem with Emissions Trading

Detection of Waving Hands from Images Using Time Series of Intensity Values

( ) [ ] MAP Decision Rule

. The geometric multiplicity is dim[ker( λi. A )], i.e. the number of linearly independent eigenvectors associated with this eigenvalue.

FTCS Solution to the Heat Equation

THE PREDICTION OF COMPETITIVE ENVIRONMENT IN BUSINESS

Anisotropic Behaviors and Its Application on Sheet Metal Stamping Processes

Approximate Analytic Solution of (2+1) - Dimensional Zakharov-Kuznetsov(Zk) Equations Using Homotopy

[ ] 2. [ ]3 + (Δx i + Δx i 1 ) / 2. Δx i-1 Δx i Δx i+1. TPG4160 Reservoir Simulation 2018 Lecture note 3. page 1 of 5

Anomaly Detection. Lecture Notes for Chapter 9. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar

HEAT CONDUCTION PROBLEM IN A TWO-LAYERED HOLLOW CYLINDER BY USING THE GREEN S FUNCTION METHOD

Theoretical Analysis of Biogeography Based Optimization Aijun ZHU1,2,3 a, Cong HU1,3, Chuanpei XU1,3, Zhi Li1,3

Relative controllability of nonlinear systems with delays in control

Math 128b Project. Jude Yuen

J i-1 i. J i i+1. Numerical integration of the diffusion equation (I) Finite difference method. Spatial Discretization. Internal nodes.

Department of Economics University of Toronto

EEL 6266 Power System Operation and Control. Chapter 5 Unit Commitment

Advanced time-series analysis (University of Lund, Economic History Department)

M. Y. Adamu Mathematical Sciences Programme, AbubakarTafawaBalewa University, Bauchi, Nigeria

Performance Analysis for a Network having Standby Redundant Unit with Waiting in Repair

Li An-Ping. Beijing , P.R.China

Appendix to Online Clustering with Experts

Graduate Macroeconomics 2 Problem set 5. - Solutions

FI 3103 Quantum Physics

F-Tests and Analysis of Variance (ANOVA) in the Simple Linear Regression Model. 1. Introduction

Using Fuzzy Pattern Recognition to Detect Unknown Malicious Executables Code

12d Model. Civil and Surveying Software. Drainage Analysis Module Detention/Retention Basins. Owen Thornton BE (Mech), 12d Model Programmer

Lecture 18: The Laplace Transform (See Sections and 14.7 in Boas)

Video-Based Face Recognition Using Adaptive Hidden Markov Models

Discrete Markov Process. Introduction. Example: Balls and Urns. Stochastic Automaton. INTRODUCTION TO Machine Learning 3rd Edition

Let s treat the problem of the response of a system to an applied external force. Again,

Hidden Markov Models Following a lecture by Andrew W. Moore Carnegie Mellon University

Including the ordinary differential of distance with time as velocity makes a system of ordinary differential equations.

THEORETICAL AUTOCORRELATIONS. ) if often denoted by γ. Note that

Attributed Graph Matching Based Engineering Drawings Retrieval

DEEP UNFOLDING FOR MULTICHANNEL SOURCE SEPARATION SUPPLEMENTARY MATERIAL

Particle Swarm Optimization Algorithm with Reverse-Learning and Local-Learning Behavior

Modeling and Solving of Multi-Product Inventory Lot-Sizing with Supplier Selection under Quantity Discounts

ECE 366 Honors Section Fall 2009 Project Description

Machine Learning 2nd Edition

Lecture 2 L n i e n a e r a M od o e d l e s

Time-interval analysis of β decay. V. Horvat and J. C. Hardy

e-journal Reliability: Theory& Applications No 2 (Vol.2) Vyacheslav Abramov

January Examinations 2012

Algorithm Research on Moving Object Detection of Surveillance Video Sequence *

RADIAL BASIS FUNCTION PROCESS NEURAL NETWORK TRAINING BASED ON GENERALIZED FRÉCHET DISTANCE AND GA-SA HYBRID STRATEGY

Fuzzy Set Theory in Modeling Uncertainty Data. via Interpolation Rational Bezier Surface Function

Transcription:

Consraned-Sorage Varable-Branch Neural Tree for Classfcaon Shueng-Ben Yang Deparmen of Dgal Conen of Applcaon and Managemen Wenzao Ursulne Unversy of Languages 900 Mnsu s oad Kaohsng 807, Tawan. Tel : 886-7-34603 E-mal: 9800@mal.wzu.edu.w ABSTACT In hs sudy, he consraned-sorage varable-branch neural ree (CSVBNT) s proposed for paern classfcaon. In he CSVBNT, each nernal node s desgned as a sngle layer neural newor () ha s used o classfy he npu samples. The genec algorhm (GA) s proposed o search for he proper number of oupu nodes n he oupu layer of he. Furhermore, he growng mehod s proposed o deermne whch node has he hghes prory o spl n he CSVBNT because of sorage consran. The growng mehod selecs a node o spl n he CSVBNT accordng o he classfcaon error rae and compung complexy of he CSVBNT. In he expermens, CSVBNT has lower classfcaon error rae han oher NTs when hey have he same compung me. Key words: Neural rees, neural newor, genec algorhm. I. INTODUCTION Decson rees (DT) ([] - [6]) are commonly used echnues n machne learnng, recognon and classfcaon sysems. The advanage of DTs s ha decson nodes are easly desgned and have low compung complexy. However, he dsadvanage of DTs s ha he classfcaon error rae s hgh when a node s no a pure class. Therefore, large DTs are preferred o reduce he classfcaon error rae, and hen he DTs become deep rees and low performance. Neural rees (NT) ([7] []) provde a soluon for combnng boh decson rees and neural newors (NN), hus offerng he advanages of boh DTs and NNs. ecenly, NTs have been appled o he recognon of characer ses [], human faces [3], range mages [4], large paern ses [5] and complex scenes [6].

ecenly, he sae-of-he-ar NTs have been proposed. A neural ree wh mul-layer percepron (MLP) a he nernal nodes was proposed by Guo and Gelfand [7]. In [7], he expermenal resuls show ha he NT wh MLPs allows ha he mode uses a smaller number of nodes and leaves. However, he dsadvanage of NTs wh MLPs s ha he compung complexy s ncreased, snce large number of parameers o be uned and hgher rs of overfng. In [8], he adapve hgh order neural ree (AHNT) s proposed. Nodes are hgh-order perceprons (HOP) [9] whose order depends on he varaon of he global error rae. Frs-order nodes dvde he npu space wh hyperplanes, whle HOPs dvde he npu space arbrarly. The drawbac of he AHNT s ncreased complexy and, hus, hgher compuaonal cos. In [0], he MLP s used o desgn he neural newor ree (NNTree). Insead of usng nformaon gan rao as splng creron, a new creron s nroduced for NNTree desgn. The new creron capures well he goal of reducng he classfcaon error rae. However, he man drawbac s he necessy of an ad hoc defnon of some parameers for each conex and ranng se, such as he number of hdden layers, number of nodes for each layer n he neural newors. Alhough he NTs wh MLPs prevously descrbed allow o dvde he npu space wh arbrary hypersurfaces, he compung complexy of a MLP s abou several mes he compung complexy of he sngle-layer neural newor (). In [], he sngle-layer percepron ree was presened. In [], Sra and Nadal presened a bnary srucure of NT by usng he s o solve wo-class problems. Fores and Mchelon [3] proposed a generalzed neural ree (GNT), n whch each node of he GNT s composed of wo pars: he, and he normalzer. The learnng rule s calculaed by mnmzng a cos funcon whch represens a measure of he overall classfcaon error of he GNT. In 0, he balanced neural ree (BNT) was proposed o reduce ree sze and mprove classfcaon wh respec o classcal neural ree [4]. The s also used o desgn each node n he BNT. However, here are wo common drawbacs o hese sae-of-he-ar NTs, ncludng he AHNT, NNTree, GNT and BNT. Frs, he growh sraeges of hese NTs only consder how o reduce he classfcaon error rae, bu do no consder how o reduce he compung complexy. Then, each NT s desgned as a deep and large ree-srucured NT o reduce he classfcaon error rae, whch resuls n

he need for greaer compung complexy. Therefore, an opmal NT mus ae no accoun how o reduce boh he classfcaon error rae and compung complexy when he deph or he number of nodes s lmed n NT. The nex drawbac of hese exsng NTs s ha he number of oupu nodes n he neural newor (NN) s se o wo (such as BNT) or he number of classes needed o be dsngushed (such as GNT, AHNT and NNTree). In many cases, he number of oupu nodes n he oupu layer of NN s se o he number of classes needed o be dsngushed, s no always he bes sregy o desgn he NT. If he number of oupu node n NN s oo small, he samples belongng o dfferen classes may be classfed o he same oupu node n NN, and hen a narrower and deeper NT wll be generaed. Oherwse, he compung complexy of NN s ncreased when he number of oupu node n NN s oo many. Therefore, fndng he proper number of oupu nodes n NN s an mporan research wor n hs sudy. Wegh assgnmen s one of he mos mporan facors durng NN desgn. Bascally, he weghs are conrolled by boh newor archecure and he parameers of he learnng algorhm. In addon, usng several layers and nodes n hdden layers causes he newor o be much more complex. Oher NN parameers such as npus, he number of hdden layers and her nodes, he number of memory aps and he learnng raes also affec NN performance. Genec algorhm (GA) ([5]-[34]) has been a popular approach o adng neural newor learnng. In [30], GA s appled o he desgn of boh NN srucure evoluon and wegh adapaon. Mahmoudabad e al. opmzed an NN wh he Levenberg-Maruard (LM) and GA n order o mprove s performance for grade esmaon purposes [3]. Samana e al. appled smulaed annealng for NN ranng [3]. Chaeree e al. appled GA n NNs, and showed ha hs combnaon can resul n beer NN performance [33]. Tahmaseb and Hezarhan used GA o opmze NN parameers and opology, and obaned mproved resuls [34]. However, he above GAs assume ha he number of oupu nodes n an NN s nown and can be se by he user n advance. Ths sudy proposes he CSVBNT whch, unle prevously appled GAs, s able o auomacally generae he proper number of oupu nodes n an NN. Tha s, when usng he proposed GA, he user does no need o se he number of oupu nodes of an NN n advance. 3

The conrbuons of hs sudy are summarzed as follows: () Ths sudy proposes he CSVBNT. To mprove he frs drawbac of exsng NTs, descrbed above, he CSVBNT s desgned o be opmzed accordng o boh he classfcaon error rae and compung complexy of he CSVBNT. To farly compare he performance of he CSVBNT and oher NTs, he sorage (.e. he number of nodes n he NTs) s consraned. Because of he sorage lmaon, deermnng whch node has he hghes prory o spl n he CSVBNT s an mporan desgn ssue. The expermens conduced n hs research demonsrae ha he CSVBNT has a lower classfcaon error rae han oher exsng NTs when hey have he same compung me. () To mprove he second drawbac of he exsng NTs descrbed above, GA s proposed o desgn he for each nernal node n he CSVBNT. The proposed GA has he ably o auomacally generae he weghs and deermne he proper number of oupu nodes n he accordng o boh he classfcaon error rae and compung complexy of he CSVBNT. Then, he CSVBNT ends o be opmzed. The remander of hs paper s organzed as follows. The concep desgn of our proposed mehods s descrbed n Secon II, and he desgn of CSVBNT wh he sorage consran s presened n Secon III. Secon IV presens he desgn of he GA. The expermens are descrbed n Secon V, and Secon VI presens he conclusons. II. CONCEPT DESIGN OF OU POPOSED METHODS In order o undersand he enre desgn of he CSVBNT, he followng descrbes he concep desgn of CSVBNT. The CSVBNT s a ree-srucured classfer, n whch each nernal node s desgned as an node, and each leaf node denoes he oupu space. Noably, he s a fully conneced sngle-layer NN and each oupu node n he oupu layer of represens a branch n he CSVBNT. The wo man conrbuons of he CSVBNT are descrbed below: () The growh sraegy of he CSVBNT s desgned based on boh he classfcaon error rae and compung complexy, and hen he CSVBNT ends o be opmzed. Fgure shows an example of 4

he performance of wo NTs, NT and NT. In Fg., f he deph of he NT s ncreased, he average classfcaon error rae s reduced and he average compuaon me s ncreased. From Fg., alhough he same classfcaon error rae e can be acheved when NT and NT have suffcen deph, NT sll has beer performance han NT. Ths s because he classfcaon error rae of NT s smaller han ha of NT (e <e 3 ) when hey have he same compung me,. Tha s, NT ouperforms NT when he deph or he number of nodes s lmed. Ths suaon means ha f he sorage space (.e. he number of nernal nodes n he NT) s lmed, boh he classfcaon error rae and compung complexy mus be used o denfy he bes NT. Therefore, he growh sraegy of he CSVBNT aes no accoun how o reduce boh he classfcaon error rae and compung complexy, raher han he growh sraeges of oher NTs ha only emphasze reducng he classfcaon error rae. Fgure. An example o llusrae wo NTs. () The nex conrbuons of he CSVBNT s ha he proposed GA s enable o auomacally search for he proper number of oupu nodes for each accordng o boh he classfcaon error rae and compung complexy of he CSVBNT. In he exsng NTs, he number of oupu nodes n he oupu layer of NN s usually se o he number of classes needed o be dsngushed, s no always he bes growh sregy o desgn he NT. Fgure shows a smple example o llusrae hs suaon. Fgure (a) shows he orgnal daa se conssng of hree classes of samples, Fg. (b) shows he parons of he daa se by he exsng NTs, and Fg. (c) shows he correspondng ree classfer. Inally, he daa se ncludes he hree classes of samples ( a, b and c ) n Fg. (a). Snce he daa se has hree classes of samples, he NN conaned n he node s hen desgned o have hree 5

oupu nodes n Fg. (c). Smlarly, node 3 n Fg. (c) consss of wo classes of samples, a and c, and he NN conaned n he node 3 s desgned o have wo oupu nodes o dsngush he wo classes of samples. Therefore, an npu sample classfed as shown n Fg. (c) mus be calculaed wh an average of wo NNs. However, Fg. (d) shows anoher paron of daa se, and Fg. (e) shows he correspondng NT ha he node s desgned o have four oupu nodes. In Fg. (e), he classfcaon of he npu sample reures an average of.7 NNs (< NNs). Thus, he NT shown n Fg. (e) han he NT shown n Fg. (c) has a shorer compung me when hey have he same classfcaon error rae. Tha s, he NN conaned n he node ends o be desgned wh four oupu nodes nsead of hree oupu nodes. Snce he proposed GA can fnd he proper number of oupu nodes for he accordng o boh he classfcaon error rae and compung complexy of he CSVBNT, he CSVBNT becomes a varable-branch ree classfer ha ends o be opmzed. (a) Orgnal daa se. (b) Paron of he daa se by he exsng NTs. (c) The exsng NTs. (d) Oher paron of daa se. (e) Oher beer NT. Fgure. An example o llusrae he NTs. 6

The followng descrbes he enre desgn flow of CSVBNT. In he ranng phase, he Desgn_CSVBNT algorhm descrbed n Secon III s proposed o desgn he CSVBNT. Fgure 3(a) shows he flow char of he Desgn_CSVBNT. In Fg. 3(a), he Desgn_CSVBNT algorhm s appled o desgn he CSVBNT by employng he GA descrbed n Secon IV o desgn he s n he CSVBNT. In he Desgn_CSVBNT, he number of nernal nodes ( nodes) denoes he sorage space reured o sore he CSVBNT. The Desgn_CSVBNT s connued o desgn he CSVBNT from op o boom unl he sorage space s reached. Afer he Desgn_CSVBNT s fnshed n he ranng phase, he CSVBNT can be obaned. Furhermore, he Desgn_CSVBNT(recursve) algorhm s proposed o mprove he effcency of Desgn_CSVBNT n fndng he bes soluon for he GA. The Desgn_CSVBNT(recursve) s a recursve verson of Desgn_CSVBNT, and hey all use he same GA o desgn he s n he CSVBNT. Fgure 3(b) presens he flow char of he Desgn_CSVBNT(recursve). In Fg. 3(b), he GA_ s used o employ he GA n a recursve way. In he sudy, he Desgn_CSVBNT(recursve) nsead of he Desgn_CSVBNT s used o generae he CSVBNT. In he esng phase, he Tes_CSVBNT algorhm descrbed n Secon III s used o explan how he npu sample ravels o he CSVBNT and derves he fnal classfcaon resul. Fg. 3(c) shows an example of he CSVBNT. In Fg. 3(c), he nernal nodes, A, B and C, are desgned as s, and he leaf nodes, L, L, L3, L4, L5 and L6, are used o poson he oupu space. Each oupu node n he has a relaed branch n he CSVBNT. In Fg 3(c), node A consss of hree branches because he A consss of hree oupu nodes n he oupu layer. Also, B and C conss of wo and hree oupu nodes, respecvely. Le he npu sample be npu o node A and classfed o he oupu node, O 3, n he A. The npu sample hen races he relaed branch of O 3 and reaches node C. Smlarly, he npu sample connues o be classfed n he C, and reaches node L 4. Fnally, he npu sample s posoned n he same class as he arrvng leaf node, L 4. The deals of he esng algorhm, Tes_CSVBNT, are descrbed n Secon III. 7

(a) The flow char of he Desgn_CSVBNT (b) The flow char of he Desgn_CSVBNT(recursve) (c) The example of CSVBNT ( : node : leaf node) Fgure 3. The concep desgn of he CSVBNT. 8

In order o enable readers o easly undersand he proposed mehods n hs sudy, Table lss he summary of mporan symbols used n Secons III, IV, and V. Table. Summary of symbols used n hs sudy Symbols Descrpon T CSVBNT desgned for he nernal node, V( ) compung complexy of he leaf node, (E. ()) CC ( ) compung complexy of (E. ()) f number of npus n r number of oupus n H se of all leaf nodes n he CSVBNT, T P( ) probably of fallng a node (E. (3)) V(T) compung complexy of he CSVBNT, T (E. (3)) Class( X ) class of ranng sample, X h cluser C S cener of cluser C m number of ranng samples conaned n he node P ( C, X ) probably ha X s classfed o C (E. (5)) M oupu space of node (E. (6)) ) classfcaon error rae of node (E. (7)) ( (T) classfcaon error rae of he CSVBNT, T (E. (8)) λ ( ) slope of he classfcaon error rae and compung complexy for he leaf node, (E. (9)) sorage space (Desgn_CSVBNT),, M varables used o record he range of soluon space (Desgn_CSVBNT(recursve)) θ mnmum soluon space sze (Desgn_CSVBNT(recursve)) L generaed n he lef subspace (Desgn_CSVBNT(recursve)) generaed n he rgh subspace (Desgn_CSVBNT(recursve)) h acvaon hreshold w, wegh beween he h npu node and he h oupu node n he O value of he h oupu node n he (E. ()) N populaon sze encoded by he h srng, λ () changes n boh he compung me and he classfcaon error rae of CSVBNT afer he for node s generaed by he srng, (E. (3)) ε random value used o change he weghs n he muaon phase (E. (5)) v(, ) wavele coeffcen a locaon (, ) n he subband (E. (7)) n sze of he subband (E. (6) and E. (7)) P c crossover rae muaon rae P m 9

III. DESIGN OF THE CSVBNT In he desgn of CSVBNT, boh of he compung complexy and classfcaon error rae of CSVBNT are emphaszed o be as small as possble. Before he desgn of CSVBNT s descrbed, boh of he compung complexy and classfcaon error rae of CSVBNT are defned n he followng. The compung complexy of CSVBNT s defned as follows. Le T denoe he CSVBNT, and le H denoe he se of all leaf nodes n T, ncludng leaf node. Le L( ) denoe he se of nernal nodes ha are reured o be calculaed when he npu sample ravels he CSVBNT from he roo node o he leaf node. Then, he compung complexy of he leaf node, V( ), s defned as V( )= CC ( ) () L( ),where CC ( ) denoe he compung complexy of for he nernal node,. Le he conan f npus and r oupu nodes. Then, he CC ( ), s defned as CC( ) fr. () Le P( ) represen he probably on he ranng samples n he leaf node complexy of T, V(T), s defned as H. The compung V ( T) P( ) V ( ) (3) The classfcaon error rae of CSVBNT s descrbed as follows. Le he leaf node conan m ranng samples, X ( x, x,..., x ) f f, for m, and le Class( X ) denoe he class of ranng sample, X. Le C = {( X, Class( X )) m } be a cluser ha collecs he ranng samples conaned n he leaf node. Le S be he cener of cluser C. The cener S of cluser C s defned as follows. S m X. (4) m 0

Thus, he probably ha X s classfed o C, P C, X ), s defned as ( P( C, X K ), (5) S X K [ ] S X h H h K where denoes he Eucldean dsance. Noably, P C, X ) sasfes he consrans, ( 0 P ( C, X ) and P ( C, X ). Then, he represenave of posoned n he oupu space s gven as H h h M P( C, X ( X, Class( X )) C m ( X, Class( X )) C m ) Class( X P( C, X ) ). (6) Then, he classfcaon error rae of leaf node, ), s hus defned as ( ( ) P( C, X )( Class( X ) M ). (7) ( X, Class( X )) C m Fnally, he classfcaon error rae of T, (T), s defned as ( T) P( ) ( ). (8) H In he desgn of CSVBNT, how o grow he ree srucure of CSVBNT s descrbed as follows. The growng mehod of CSVBNT selecs he bes leaf node o spl a a me. Tha s, he GA descrbed n Secon III aemps o spl each leaf node, and fnds he bes leaf node o branch n he CSVBNT. The followng explans how o deermne whch one s he bes leaf node. Le he leaf node conan m ranng samples. If he leaf node wll be spl n he CSVBNT, he GA s appled o hese m samples o desgn an for he node. Each oupu node n he oupu layer of he dnoes he correspondng chld node of node. Le he ree T ndcae he ree T afer r chld nodes,,, of are generaed by he GA. Then, he slope of he classfcaon error rae and,,,...,, r

compung complexy for he leaf node,, beween T and T s: λ( ( T) - ( T ) ). (9) V V ( T ) -V ( T) In he growng mehod of CSVBNT, he bes leaf node s defned as he node wh he larges value n E. (9). Therefore, le he se H conss of he leaf nodes n T. Afer each leaf node n H s desgned as an, we can oban u arg max λ( ). (0) H Then, he node u s he bes leaf node ha has he hghes prory o be spl n T. Fgure 4 shows an example o llusrae he growng mehod of CSVBNT. In Fg. 4, T denoes he CSVBNT afer he leaf node s spl n T, and T denoes he CSVBNT afer he leaf node s spl n T. The growng mehod selecs one node a a me o spl n T. From Fg. 4, T s beer han T because he classfcaon error rae of T s smaller han ha of T when he compung me s he same. Thus, he am of he desgn of CSVBNT s o maxmze he slope of he classfcaon error rae and compung complexy. Fgure 4. The relaon beween he classfcaon error rae and compung me n CSVBNT. The ranng and esng phases of CSVBNT are descrbed below. Tranng Phase: In he ranng phase, wo algorhms, Desgn_CSVBNT and Desgn_CSVBNT(recursve), are proposed o desgn he CSVBNT n hs sudy. The Desgn_CSVBNT(recursve) s a recursve verson of Desgn_CSVBNT. Before descrbng he desgn of Desgn_CSVBNT(recursve), he Desgn_CSVBNT s gven as follows. Fgure 3(a) also shows he flow char of he Desgn_CSVBNT.

Noably, he sorage space s defned as he number of nernal nodes of he CSVBNT n he Desgn_CSVBNT. Algorhm: Desgn_CSVBNT Inpu: The sorage speac and he ranng daa se. Oupu: The CSVBNT wh he sorage consran. Sep. Le he roo node,, of ree T conan all of he ranng samples n he ranng daa se. Se STOAGE= 0 and H={}. Sep. Whle Sep.. STOAGE For each node H and ( ) 0, perform he followng. Sep.. The GA descrbed n Secon IV s appled o he samples conaned n he node, and hen he s produced for node. Sep.. Le he conan r oupu nodes n he oupu layer. Each oupu node n he denoes a chld node of node,. Calculae he value of λ ) as E. (9). ( Sep.. Calculae u arg max λ( ) as E. (0). The node, u, s he beas leaf noode ha H has he hghes prory o be desgned as an, and hen he new CSVBNT, u T u, s produced. Le here be w chld nodes for node u. Delee he node u n he se of H and add hese w new nodes o he se of H. Sep.3. Se T= T and STOAGE=STOAGE+. Sep 3. Oupu he CSVBNT, T. u In Sep, he roo node conans all he ranng samples. The se H represens a collecon ha conans all nodes ha can be branched n he CSVBNT. Inally, se H conans only he roo node, n Sep. In he frs loop of Sep, he se H={} conans only he roo node, and hen he roo node, s seleced o be desgned as an node by he GA algorhm n Sep.. In Sep., assume ha 3

he roo node s spl no hree chld nodes,, and 3. Then, he se H ={,, 3 } can be obaned. In Sep.3, he CSVBNT s bul no a ree, T, ha conans only one roo node and hree leaf nodes, and he STOAGE value s ncreased o one. Fgure 5(a) shows he CSVBNT, T, afer he frs loop of Sep s complee. In he second loop of Sep, he GA algorhm s appled o nodes, and 3, conaned n H, hen he hree nodes, and, and hree values, λ ( ) 3, λ ( ) and λ ) ( 3, are produced n Sep.. In Sep., he bes node s he one wh he maxmal value of λ. Assume ha he value of λ ) s larger han boh values of λ ) and λ ) ( ( ( 3. Node s seleced o be desgned as he bul no he CSVBNT. Assume ha node conans wo chld nodes, and. Then, se H s updaed o {,,, 3 }. In Sep.3, he STOAGE value s ncreased o wo, and he new CSVBNT ree, T, replaces ree T. Fgure 5(b) shows he CSVBNT, T, afer he second loop of Sep s complee. In Sep of each loop, he bes node s seleced from se H o be desgned as he node n he CSVBNT, and hs sep s repeaed unl he sorage space lm s reached. (a) T (b) T Fgure 5. The example llusraes he Desgn_CSVBNT ( : node : leaf node). In hs sudy, he Desgn_CSVBNT(recursve), s also proposed o mprove he effcency of he Desgn_CSVBNT. The desgn of Desgn_CSVBNT(recursve) s descrbed as follows. Fgure 6 shows he dfference n callng GA o fnd he bes beween Desgn_CSVBNT and Desgn_CSVBNT(recursve). In Sep.. of he Desgn_CSVBNT algorhm, he number of samples m conaned n node ndcaes ha ha number of oupu nodes n he can be up o m. 4

If he value of m s large, he GA should spend more me searchng for he proper number of oupu nodes n he from m possble soluons, as shown n Fg. 6(a). To avod he GA havng o search a large soluon space o fnd he soluon, he overall soluon space can be dvded no wo or more smaller subspaces n a recursve way, and hen he GA can effcenly fnd he bes soluon n he smaller subspaces. Therefore, he Desgn_CSVBNT algorhm can be rewren as a recursve verson, namely Desgn_CSVBNT(recursve), whch uses he dvde-and-conuer sraegy o desgn he by callng he recursve algorhm GA_, as shown n Fg. 6(b). Fgure 3(b) also shows he flow char of he Desgn_CSVBNT(recursve). (a) The Desgn_CSVBNT (b) The Desgn_CSVBN(recursve) Fgure 6. The dfference n callng GA o fnd he bes beween Desgn_CSVBNT and Desgn_CSVBNT(recursve). ( : Call, : eurn) The Desgn_CSVBNT(recursve) algorhm s descrbed as follows. Algorhm: Desgn_CSVBNT(recursve) Inpu: The sorage speac and he ranng daa se. Oupu: The CSVBNT wh he sorage consran. 5

Sep. Le he roo node,, of ree T conan all of he ranng samples n he ranng daa se. Se STOAGE= 0 and H={}. Sep. Whle Sep.. STOAGE For each node n H and ( ) 0, perform he followng. Sep.. Le m be he number of samples conaned n he node. Call he GA_( =, = m ) o oban he for he leaf node. Sep.. Le he conan r oupu nodes n he oupu layer. Each oupu node n he denoes a chld node of he node,. Calculae he value of λ ) as E. (9). ( Sep.. Calculae arg max λ( ) as E. (0). The node,, s he beas leaf noode ha H has he hghes prory o be desgned as an, and hen he new CSVBNT, T, s produced. Le here be w chld nodes for he node. Delee he node n he se of H and add hese w new nodes o he se of H. Sep.3. Se T= T and STOAGE=STOAGE+. Sep 3. Oupu he CSVBNT, T. Algorhm: GA_ (Inpu:,Inpu: ) Sep. If, hen GA s appled o hese m samples conaned n he leaf node, o desgn he whose number of oupu nodes n he oupu layer s whn he range [, ]. Se Bes_= and Bes_F= he value of he bes fness. eurn boh he Bes_ and Bes_F. End. Sep. If, do he followng. Sep. Calculae M. 6

Sep. Call GA_(, M -). Then, boh he L and Bes_F L can be obaned. Sep.3 Call GA_( M, ). Then, boh he and Bes_F can be obaned. Sep 3. If Bes_F > Bes_F L Then Bes_= and Bes_F= Bes_F. Oherwse Bes_= and Bes_F= Bes_F L. eurn boh he Bes_ and Bes_F. End. L In Sep.. of Desgn_CSVBNT(recursve), GA_( =, = m ) s a recursve algorhm. Afer callng he GA_( =, = m ), he wh he number of oupu nodes whn he range [ =, = m ] can be generaed by he GA. Noably, wo classes ( =) are he mnmum number of classes ha need o be dsngushed, and he maxmal number of oupu nodes n s eual o m ( = m ) such ha each sample s regarded as he only member of s own class. Therefore, he generaed by he GA sll ends o be opmal because he GA has a global search whn he range [, m ]. In Sep of he GA_, f he range of he soluon space, [, ], s small, he GA s drecly used o desgn he wh he number of oupu nodes n he oupu layer whn he range [, ]. Then, he GA_ reurns boh he bes and he maxmal fness value generaed by he GA. In he Sep of he GA_, f he range of he soluon space, [, ], s large, he soluon space, [, ], s dvded no wo subspaces, [, M -] and [ M, Tha s, he GA s used o desgn he s n hese wo subspace, [, M -] and [ M, 7 ], n a recursve way. ], nsead of he whole soluon space, [, ]. In Sep 3, he GA_ reurns boh of he varables, Bes_ and Bes_F, recordng he bes and he maxmal value of he fness generaed by GA n hese wo subspaces, respecvely.

Noably, n he GA_, he hreshold θ s no a crcal value for users. If he hreshold θ s small, he overall soluon space can be dvded no smaller (or more) subspaces, and hen he GA can effcenly fnd he bes soluon n he smaller subspaces. Oherwse, he soluon space wll be cu no larger (or fewer) subspaces. However, f he hreshold θ s large enough, he Sep wll only be performed n GA_, and hen Desgn_CSVBNT(recursve) and Desgn_CSVBNT become he same algorhm. Afer he ranng phase s fnshed, he CSVBNT can be obaned. In he CSVBNT, each leaf node s used o poson he oupu class of he npu samples. However, once here wll be many dfferen classes of ranng samples classfed o he same leaf node n he ranng phase. Then, he oupu class represenng a leaf node s defned as he class o whch he larges number of ranng samples belongs n he leaf node. Tesng Phase: The esng algorhm, Tes_CSVBNT, s proposed o classfy an npu sample n he CSVBNT. Before he Tes_CSVBNT s gven, he followng descrbes how o classfy an npu sample n he for he nernal node. Le he conan f npus and r oupu nodes. The values of for 0< <, r denoe he acvaon hresholds. The values of w, for 0< w, <, f, and r, ndcae he weghs beween he npu and oupu layers. Le he sample, X ( x, x,..., x ) f f, be he npu o he. In he, he value of he h oupu node, O, s he sum of s weghed npus as follows. O f ( w x ) r (), Le u arg max{ O r }. () The npu sample, X, s hen classfed o he uh oupu node n he. 8

The Tes_CSVBNT algorhm s gven as follows. Algorhm: Tes_CSVBNT Inpu: The npu sample X ( x, x,..., x ) f f and ree classfer, CSVBNT. Oupu: The class of X. Sep. Le be he roo node of he CSVBNT. Sep. Whle s no a leaf node Sep.. The npu sample, X, s he npu o he. Calculae he values, O, for r, as E. (). Sep.. Calculae u arg max{ O r } as E. (). Sep.3. The npu sample, X, s classfed o he uh oupu node n he. Then, he npu sample, X, owards he correspondng chld node, u, of he node,, n he Sep.3. Se = u CSVBNT. Sep 3. Oupu he represenave class of node. IV. DESIGN OF THE GA There are wo desgn ssues for he GA. Frs, he GA searches for he weghs n he. Nex, he GA can auomacally search for he proper number of oupu nodes n he oupu layer of he based on he classfcaon error rae and compung complexy of CSVBNT. Le T be he CSVBNT, and le be he node ha conans m npu samples n T. The man goal of GA s o desgn he for node. Each oupu node n he oupu layer of he s desgned as a correspondng chld node of node, n he CSVBNT. The GA s desgned based on he genec algorhm, whch ncludes he nalzaon sep and hree phases: reproducon, crossover, and muaon. They are descrbed as follows. 9

Inalzaon In he nalzaon sep of GA, a populaon of N srngs,,...,, N, s randomly generaed. The lengh of each srng s se o be smaller han, or eual o, he value of m,.e. he lengh of each srng s varan n GA. Tha s, he number of chld nodes of node generaed by he GA s whn he range [, m ]. Le he srng encode he, whch has f npus and r oupu nodes, for N. The values of denoe he h acvaon hresholds for 0< <, r. The values of w, ndcae he weghs beween he npu and oupu layers for 0< w, <, f and r. The srng hen encodes he, as follows. =(O_node (),O_node (),, O_node ( r )), where O_node ()=(, w,, r. w f ), for The followng s an example of he nalzaon sep. Le here be wo elemens (f=) for each npu sample, and le as follows. r =3 be a random value generaed whn [, m ]. hen encodes he soluon =(, w, w,, w, w, 3, w 3, w 3 ) =(O_node (), O_node (), O_node (3)),where O_node ()=(, w, w ), O_node ()=(, w, w ), and O_node (3)=( 3, w 3, w 3 ). Fgure. 7 also shows he correspondng encoded by he srng. Fgure 7. The correspondng encoded by he srng. 0

eproducon: In he reproducon phase, he defnon of fness depends on boh he classfcaon error rae and compung complexy of he CSVBNT. The followng descrbes he defnon of fness for he srng. Before he fness of srng s defned, he followng descrbes how o classfy a ranng sample n he. Le X ( x, x,..., x ) f f be a sample conaned n node. In he, he values, O f ( w x ), for r, are calculaed accordng o E. (), and he value of u arg max{ O r } s calculaed va E. (). Sample X s hen classfed o he uh oupu node n he classfed o hese. Afer all of hese m ranng samples conaned n node have been r oupu nodes, he ranng samples classfed o he same oupu nodes can be colleced o be desgned as chld nodes of. Tha s, r chld nodes,,..., of node are generaed based on hese r oupu nodes n he. Le he ree T ndcae he ree T afer he r chld nodes of node have been generaed. The fness funcon of srng s hen defned as:, Fness( )= λ (), for N, (3) where λ () s defned n E. (9) as represenave of changes n boh he compung me and he classfcaon error rae of CSVBNT. Afer he fness values of hese N srngs,,...,, N, n he populaon have been obaned, he probably of each srng beng seleced for he nex generaon can hen be calculaed as follows: Prob( )= ( ) N l ( ) l, for N. (4) Noably, Prob( ) sasfes 0 Prob ( ) for N, and Prob ( ). In he reproducon phase, he reproducon operaor selecs N srngs as he new populaon n he nex generaon accordng N

o he probably values, Prob( ) for N. The followng shows an example of he reproducon phase. Le here be fve probably values, Prob( )=0., Prob( )=0.5, Prob( 3 )=0., Prob( 4 )=0.5 and Prob( 5 )=0.4, for fve srngs, for 5, n he populaon. Fgure 8 shows he dsrbuon of hese fve probably values n he range [0, ]. In he reproducon phase, he reproducon operaor generaes fve random values whn [0, ] such as 0.5, 0.3, 0.5, 0.7 and 0.9, o deermne whch srngs are seleced for he nex generaon. From Fg. 8, he reproducon operaor selecs,, 4, 5, and 5, o be he new populaon n he nex generaon. Noably, he srng 5 s seleced wce and 3 s omed. The meanng of he reproducon phase s ha he srng wh hgher fness has a greaer probably of beng repeaedly seleced and he srngs wh lower fness may be removed n he nex generaon. Prob( )=0. Prob( )=0.5 Prob( 3 )=0. Prob( 4 )=0.5 Prob( 5 )=0.4 0 0. 0.35 0.45 0.6 0.5 0.3 0.5 0.7 0.9 Fgure 8. The dsrbuon of fve probably values n he range, [0. ]. Crossover: In he crossover phase, he crossover operaor s appled o he populaon of N srngs. Then, a par of srngs, x and y, s seleced o do crossover operaor. Then, wo random negers, a and b, are generaed o deermne whch peces of he srngs are o be nerchanged. Noably, f he number of oupu nodes conaned n he srng s ousde he range [, m ] afer he crossover operaor s compleed, he wo values, a and b, should be randomly generaed agan. Afer he crossover operaor s fnshed, wo new srngs, ˆ x and ˆ y, replace he srngs, x and y, n he populaon. The sgnfcance of he crossover phase s ha exchanges he oupu nodes ncludng he conneced weghs beween dfferen srngs, o yeld varous neural newors. The followng shows an example of he crossover phase. Le here be wo elemens (f=) for each sample. Le conan hree oupu nodes, and le conan four oupu nodes. Then, afer he

crossover operaor wh wo random negers (a= and b=) s appled o he par of srngs, and, wo new srngs, ˆ and ˆ, are generaed n he nex generaon., where, where a= = ( O _ node (), O _ node ()) O_node ()=(, w, w ), O_node ()=(, w, w ). b= =( O _ node (), O _ node (), O _ node (3)) O_node ()=(, w, w ), O_node ()=(, w, w ), O_node (3)=( 3, w 3, w 3). Afer he crossover operaor s appled o and, wo new srngs, ˆ and ˆ, are generaed as follows. ˆ = ( O _ node (), _ ( ) O node, O _ node()) ˆ = ( O _ node( ), O _ node(3)) Muaon: In he muaon phase, he weghs of he srngs n he populaon are randomly chosen wh a probably. Each chosen wegh s added by he mulplcaon of he chosen wegh and a random value (0<ε<). The followng shows an example of he muaon phase. Le be presened as follows: = ( O _ node (), O _ node ()), where O_node ()=(, w, w ), O_node ()=(, w, w ). 3

If he wegh w s chosen o perform he muaon, he new wegh w replaces wegh w as follows: w = w ±w *ε, (5) where he value, ε, s a random value whn he range [0, ]. Afer he muaon phase, he new srng can be obaned and replace he orgnal srng. The user may specfy he number of generaons over whch o run n he GA. Suppose ha he srng ˆ x wh he bes fness generaes he wh rˆ x oupu nodes. Then, rˆ x chld nodes of node are generaed n T, accordng o hese rˆ x oupu nodes conaned n he. V. EXPEIMENTS A. Expermenal seng In he expermens, he Desgn_CSVBNT(recursve) algorhm s used o desgn he CSVBNT wh he sorage consran. The proposed CSVBNT s compared wh oher NTs under he same number of nernal nodes. A run of weny mes s carred ou for accuraely proposng he resuls of he CSVBNT and oher NTs. In our proposed mehods, he hreshold, m 0, s appled o he GA_ snce he one-enh sze of he overall soluon space s suffcenly small for he GA o effcenly fnd he bes soluon. Also, he parameers used n he GA are as follows: populaon of 300, crossover rae, Pc = 80%, and muaon rae, Pm = 5%. Fve hundred generaons are run n he GA, and he bes soluon s reaned. All he expermens are carred ou on personal compuers. Four daa ses: speech, raffc sgn mages, naural mages, and chessboard daa ses are used o es he CSVBNT and oher NTs n he expermens. These four daa ses are descrbed as follows. () In he speech daa se, he ISOLET daabase usng he 6 leers of he Englsh alphabe s used n he solaed word recognon es. The speech daa se consss of 640 uerances ncludng 4000 ranng uerances and 40 esng uerances, from 0 speaers. Each uerance s sampled a 6 Hz wh a 6-b resoluon. A Hammng wndow of 0 ms wh 50% overlap s used o process each uerance furher by Fas Fourer Transform (FFT). Each uerance s dvded no 5 Hammng wndows, wh each represened by 3 FFT coeffcens; ha s, each uerance consss of 480 4

feaures. () The raffc sgn mages daa se s obaned from he GTSB daabase [35]. The ranng se consss of 5000 mages, and he oher 5000 mages are used o es he mehods n our expermens. All mages belong o fory classes. The acual raffc sgn s no always cenered whn he mage; s boundng box s par of he annoaons. We crop all he mages and process hem only whn he boundng box, and resze hem o acheve suare boundng boxes. All raffc sgn mages are reszed o 48x48 pxels. (3) The naural mages of 3x3 pxels are obaned from he CIFA0 daabase [36]. The CIFA0 daabase consss of en classes of mages, each wh 5000 ranng mages and 000 esng mages. Images vary grealy whn each class. They are no necessarly cenered, and may conan only pars of he obec, and show dfferen bacgrounds. (4) The symmercally dsrbued four-class chessboard daa se s used o es he CSVBNT and oher curren mehods. Fgure 4(a) shows he chessboard daa se ha consss of 400 paerns eually dsrbued among four classes. A fve-fold cross-valdaon s performed, and average resuls are presened. In boh he raffc sgn and naural mages daa ses, all color mages should be ransformed no he gray mages n he expermens. Then, each mage s dvded no blocs of 6 6 pxels. Each bloc s hen ransformed by a Haar wavele ransform [37] o oban four subbands. The mean values (mv) and sandard devaons (sd) of he four subbands are calculaed as follows: mv n n v(, ),, (6) n sd, (7) n v(, mv ), where n denoes he sze of he subband, whch s se o 8 n hs expermen, and v(, ) denoes he wavele coeffcen a locaon (, ) n he subband. Therefore, each bloc, whch conans four subbands, can be represened by a feaure vecor wh egh values snce each subband s assocaed wh wo values, sd and mv. 5

B. Performance of CSVBNT Before esng he performance of CSVBNT, he sensvy of hese wo parameers, Pc and Pm, n he GA s descrbed as follows. If Pc s se o 50%, he GA reures wce he number of generaons o ge a smlar or poor soluon n four daa ses, compared wh when Pc s se o 80% or 90%. Ths s because he use of a oo small Pc wll affec he effcency of he GA. Also, when he probably Pm s se o less han 0%, he GA can oban smlar resuls under he same number of generaons. However, f Pm s se o 5%, he GA usually does no converge. Ths s because he srngs are changed oo much n he GA, so ha he GA canno converge o a soluon. Therefore, boh parameers, Pc = 80% and Pm = 5%, are appled o he GA n he expermens. In Table, he dfferen sorages of are used o desgn he CSVBNT on four daases. In hs sudy, he sorage s defned as he number of nernal nodes n he CSVBNT. Noably, because each nernal node conans an, he sorage,, also denoes he oal number of s dsgned n he CSVBNT. In Table, he Num_O denoes he average number of chld nodes of an nernal node (or he average number of oupu nodes n he oupu layer of an ), he DEP denoes he average deph of he CSVBNT, and he CE denoes he average classfcaon error rae on he ranng daase. The classfcaon error rae s defned as E. (8). When he value of s se o, represens ha he sorage space s unlmed, and hen he CSVBNT s grown unl he classfcaon error rae s less han a small hreshold ( = %), or he ranng error rae exhbs no obvous decrease. In he expermens, he proposed CSVBNT s compared wh four NTs: GNT [3], AHNT [8], NNTree [0] and BNT [4]. To farly compare our proposed CSVBNT and oher NTs, he CSVBNT and oher NTs have he same sorage space (.e., he same number of nernal nodes) n he expermen. In Table 3, he TIME denoes he average compung me for a esng sample, and he CE denoes he average classfcaon error rae on he esng daase. In Table 3, we observe ha boh of he AHNT and NNTree han he proposed CSVBNT have lower classfcaon error rae when he sorage space s he same. The reason s ha he MLP allows o dvde he npu space wh arbrary hypersurfaces n he AHNT and NNTree. Tha s, f he dsrbuon of samples conaned n he node s 6

complex (non-lnear dsrbuon), he node s preferred n desgnng an MLP, and hen he classfcaon error rae of boh he AHNT and NNTree can be decreased. However, we also observe ha boh of he AHNT and NNTree usng he MLPs han he CSVBNT usng he s ae more compung me when hey have he same sorage n Table 3. The reason s ha he han he MLP has lower compung complexy. The compung complexy of an MLP s usually larger han wce he compung complexy of he. Fgure 9 shows he expermenal resuls proposed n Table 3, and Fgure 0 shows he classfcaon resuls obaned by he CSVBNT on he chessboard daa se. From Fg. 9, we observe ha he CSVBNT han oher NTs has lower classfcaon error rae when hey have he same compung me. Two reasons are offered as follows. () The proposed growng sraegy desgns he CSVBNT accordng o he classfcaon error rae and compung complexy of CSVBNT, whle he oher NTs ncludng GNT, AHNT, NNTree and BNT, only consder he reducon of he classfcaon error rae. () The GA s capable of searchng for he proper number of oupu nodes n he accordng o he classfcaon error are and compung complexy of CSVBNT. Fgures and has shown ha he characerscs of he CSVBNT dffer from hose of oher NTs. Table. The effcency of he CSVBNT on he ranng daases. Daa Ses Speech Traffc sgn mages Naural mages Chessboard Num_O DEP CE (%) 5 3.5 3.43.36 8 3.8 5.5.37 3.5 7.46 8.9 4 3.5 8.35 4. 3.33 4.5.5 5 5.35 3.43.4 8 5.5 5.54 6.3 5.53 6.34 8.74 4 5.54 8.5 3.7 5.5 3.78.39 5.44 3.36 5.43 8.64 4.46 8.48.5 6.64 3.35 4.56 8.63 7.6.67.38.78 3.00 3.47 8.35 6.00 4.53 3.48 9.00 5.35 7.48.00 7.45 3.49.00 0.58.9 7

Table 3. The performance he CSVBNT and oher NTs on he esng daases Daa Ses Speech Traffc sgn mages Naural mages Chessboard CSVBNT GNT [3] BNT [4] NNTree [0] AHNT [8] CE (%) TIME (sec) CE (%) TIME (sec) CE (%) TIME (sec) CE (%) TIME (sec) CE (%) TIME (sec) 5 7.4 5.5 3.8 5.43 3.8 5.3 4.5 3.37..63 8 8.64 0.7.5 0.5.35.8 4.3 0.38 3.6 8.34.3 8.5 6. 7.85 5.7 8.33 0.35 3.48 9.9 8.5 4 6.3 5. 8.6 3.3 8.3 5.36 5.33 48.37 4.3 4.3. 47.85.6 43.43.53 45.43. 65.83.08 59.3 5 5.4 6.33 8.3 6.3 7.33 6.5.4 3.3 8.4 3.3 8 9..4.44 0.34.65.34 4. 4.45 3.5 3.5 3.4 8.37 6.3 6.64 6. 7.4 8.53 39.4 8.33 37.5 4 7.4 5.38 9.8.3 9.63 3.33 4.7 58.34 4.4 54.84.9 48..5 47.3.3 5.5.9 68.3.86 64.5 5 33.4 6.38 34. 6. 34. 7.4 3.5 4.3 3.3 3. 8 5.84.9 7.8.44 7.7.94.85 9.34.5 8.35 8.65 9.66 0.4 8.4 0..3 4.36 4.3 3.6 40.3 4.8 6.3 5.95 5.73 5.5 7.74 9.8 60.44 9.3 57.4 3.9 58.7 3.9 57.3 3. 6.35.4 78.33. 73.3 3 5. 6.43 6. 6.3 6.4 6.3 0..34 9..54 6 8.54.9 0.34. 0.8.5 4.8 8.48 4.8 6.38 9.65 9.4 3.76 8.76 3.8 8.74 9.54 46.33 9.44 4.33 6.46 5.34 8.33 4.4 8. 4. 5.6 58. 5.3 53.9.9 54.8.94 53.63.7 55.3.5 79.54.5 77.4 8

(a) Speech daase (b) Traffc sgn mages daase (c) Naural mages daase (d) Chessboard daase Fgure 9. The performance of CSVBNT and oher NTs. (a) Chessboard daa se (b) The resul obaned by he CSVBNT. Fgure 0. The classfcaon resul by he CSVBNT on he chessboard daase. 9

VI. CONCLUSIONS Ths sudy proposes he CSVBNT based on GA. The CSVBNT ends o be opmal because s desgn aes no accoun how o reduce boh he classfcaon error rae and compung complexy. The CSVBNT s also a varable-branch neural ree because he number of oupu nodes n he oupu layer of each node s auomacally deermned by he GA accordng o boh he classfcaon error rae and compung complexy of he CSVBNT. Furhermore, hs sudy proposes Desgn_CSVBNT (recursve), whch operaes smlarly o he dvde-and-conuer sraegy for effcen desgn. Desgn_CSVBNT(recursve) s able o deermne whch node has he hghes prory o be seleced o spl n he CSVBNT under he sorage consran. The expermen resuls n hs sudy demonsrae ha he CSVBNT has lower classfcaon error rae han exsng NTs when hey have he same compung me. 30

Dsclosure of Poenal Conflcs of Ineres Shueng-Ben Yang s he correspondng auhor of hs manuscrp led " Consraned-Sorage Varable-Branch Neural Tree for Classfcaon". He declare ha here s no conflc of neres n hs manuscrp. 3

EFEENCES [] Gelfand,S. B., avshanar,c. S., and Delp, E. J. (99). an Ierave Growng and Prunng Algorhm for Classfcaon Tree Desgn. IEEE Trans. Paern Analyss and Machne Inell., 3(), 63-74. [] Yldz, O. T. and Alpaydn, E. (00). Omnvarae Decson Trees. IEEE Trans. Neural Newor, (6), 539-546. [3] Zhao, H. and S. am (004). Consraned Cascade Generalzaon of Decson Trees. IEEE Trans. Knowledgemen and daa Engneerng, 6(6), 77-739. [4] Gonzalo, M. M. and Albero, S. (004). Usng All Daa o Generae Decson Tree Ensembles. IEEE Trans. Sysems, Man, Cybernecs, C, Applcaons and evews, 34(4), 393-397. [5] Wold, P. and Zenon, A. S. (005). C-fuzzy Decson Trees. IEEE Trans. Sysems, Man, Cybernecs, C, Applcaons and evews, 35(4), 498-5. [6] Wang, X. B. Chen, G. Q., and Ye, F. (000). On he Opmzaon of Fuzzy Decson Trees. Fuzzy Ses and Sysems, (3), 7-5. [7] Deffuan, G., Neural uns recrumen algorhm for generaon of decson rees., Proceedngs of he nernaonal on conference on neural newors, (990) 637 64. [8] Lppmann,., An nroducon o compung wh neural nes., IEEE Acouscs, Speech, and Sgnal Processng Magazne. 4() (987) 4. [9] Sanar, A., & Mammone,., Neural ree newors. In Neural newor: heory and applcaon, San Dego, CA, USA: Academc Press Professonal, Inc. 99, pp. 8 30. [0] Seh, I. K. and Yoo, J., Srucure-drven nducon of decson ree classfers hrough neural learnng, Paern ecognon. 30() (997) 893 904. [] Sra, J., & Nadal, J., Neural rees: a new ool for classfcaon, Neural Newor. (990) 43 448. [] T. L, Y. Y. Tang, and F. Y. Fang, A srucure-parameer-adapve (SPA) neural ree for he recognon of large characer se, Paern ecogn. 8(3) (995) 35 39. [3] M. Zhang and J. Fulcher, Face recognon usng arfcal neural newors group-based adapve olerance (GAT) rees, IEEE Trans. Neural Newors. 7 (996) 555 567. [4] G. L. Fores and G. G. Peron, Explong neural rees n range mage undersandng, Paern ecogn. Le. 9(9) (998) 869 878. [5] H. H. Song and S.W. Lee, Aself-organzng neural ree for large se paern classfcaon, IEEE 3

Trans. Neural Newors. 9 (998) 369 380. [6] G. L. Fores, Oudoor scene classfcaon by a neural ree based approach, Paern Anal. Applc. (999) 9 4. [7] H. Guo and S. B. Gelfand, Classfcaon rees wh neural newors feaure exracon, IEEE Trans. Neural Newors. (99) 93 933. [8] G. L. Fores, An adapve hgh-order neural ree for paern recognon, IEEE Trans. Sysems, man, cybernecs-par B: cybernecs. 34 (004) 988-996. [9] G. L. Gles and T. Maxwell, Learnng, nvarance, and varable-branchzaon n hgh-order neural newors, 6 (987) 497 4978. [0] P. Ma, Effcen desgn of neural newor ree usng a sngle splng creron, Nerocompung. 7 (008) 787 800. [] P. E. Ugoff, Percepron ree: a case sudy n hybrd concep represenaon, Proc. VII Na. Conf. Arfcal Inellgence. (998) 60 605. [] J. A. Sra and J. P. Nadal, Neural ree: a new ool for classfcaon, Newor. (990) 43 438. [3] G. L. Fores and C. Mchelon, Generalzed neural rees for paern classfcaon, IEEE Trans. Neural Newors. 3 (00) 540 547. [4] C. Mchelon, A. an, S. Kumarb, G. L. Fores, A balanced neural ree for paern classfcaon, Neural Newors. 7 (0) 8-90. [5] D. Goldberg, Genec Algorhms n Search, Opmzaon, and Machne Learnng. eadng, MA: Addson-Wesley, 989. [6] J. Koza, Genec Programmng. Cambrdge, MA: MIT Press, 99. [7] S. Grossberg, Ed., Neural Newors and Naural Inellgence. Cambrdge, MA: MIT Press, 988. [8] D. umelhar and J. McClelland, Eds., Parallel Dsrbued Processng: Exploraons n Mcrosrucure of Cognon. Cambrdge, MA: MIT Press, 986. [9] J. M. Zurada, Ed., Inroducon o Neural Sysems. S. Paul, MN:Wes, 99. [30] P. J. Angelne, G. M. Saunders, and J. B. Pollac, An evoluonary algorhm ha consrucs 33

recurren neural newors, IEEE Trans. Neural New., vol. 5, no., pp. 54 64, Jan. 994. [3] Mahmoudabad H., Izad M., Menha M.B. A hybrd mehod for grade esmaon usng genec algorhm and neural newors. Compuaonal Geoscences. 009;3:9 0. [3] Samana B., Bandopadhyay S., Gangul. Daa segmenaon and genec algorhms for sparse daa dvson n Nome placer gold grade esmaon usng neural newor and geosascs. Mnng Exploraon Geology. 004;( 4):69 76. [33] Chaeree S., Bandopadhyay S., Machuca D. Ore grade predcon usng a genec algorhm and cluserng based ensemble neural newor model. Mahemacal Geoscences. 00;4(3):309 36. [34] Tahmaseb P., Hezarhan A. IAMG09. Sanford Unversy; Calforna: 009. (Applcaon of Opmzed Neural Newor by Genec Algorhm). [35] J. Sallamp, M. Schlpsng, J. Salmen, and C. Igel. The German Traffc Sgn ecognon Benchmar: A mul-class classfcaon compeon. In Inernaonal Jon Conference on Neural Newors, 0. [36] A. Krzhevsy. Learnng mulple layers of feaures from ny mages. Maser s hess, Compuer Scence Deparmen, Unversy of Torono, 009. [37]. C. Gonzalez and. E. Woods. Dgal Image Processng. Addson Wesley, Boson, MA, 99. 34