Pattern Classification using Simplified Neural Networks with Pruning Algorithm


S. M. Kamruzzaman¹ and Ahmed Ryadh Hasan²

¹ Assistant Professor, Department of Computer Science and Engineering, Manarat International University, Dhaka-1212, Bangladesh. Email: sk_iiuc@yahoo.com
² School of Communication, Independent University Bangladesh, Chittagong, Bangladesh. Email: ryadh78@yahoo.com

Abstract: In recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. This paper presents an approach for classifying patterns from simplified NNs. Although the predictive accuracy of ANNs is often higher than that of other methods or human experts, it is often said that ANNs are practically black boxes, due to the complexity of the networks. In this paper, we have attempted to open up these black boxes by reducing the complexity of the network. The factor that makes this possible is the pruning algorithm. By eliminating redundant weights, redundant input and hidden units are identified and removed from the network. Using the pruning algorithm, we have been able to prune networks such that only a few input units, hidden units and connections are left, yielding a simplified network. Experimental results on several benchmark problems in neural networks show the effectiveness of the proposed approach with good generalization ability.

Keywords: Artificial Neural Network, Pattern Classification, Pruning Algorithm, Weight Elimination, Penalty Function, Network Simplification.

1 Introduction

In recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems [2] [3] [18]. Among them, the class of multi-layer feed-forward networks is most popular. Methods using standard back-propagation perform gradient descent only in the weight space of a network with fixed topology [13]. In general, this approach is useful only when the network architecture is chosen correctly [9]. Too small a network cannot learn the problem well, while too large a size will lead to over-fitting and poor generalization [1]. Artificial neural networks are considered efficient computing models and universal approximators [4]. Although the predictive accuracy of neural networks is higher than that of other methods or human experts, it is generally difficult to understand how a network arrives at a particular decision due to the complexity of a particular architecture [6] [15]. One of the major criticisms is their being black boxes, since no satisfactory explanation of their behavior has been offered. This is because of the complexity of the interconnections between layers and the network size [18]. As such, an optimal network size with a minimal number of interconnections will give insight into how the neural network performs. Another motivation for network simplification and pruning is the time complexity of learning [7] [8].

2 Pruning Algorithm

Network pruning offers another approach for dynamically determining an appropriate network topology. Pruning techniques [11] begin by training a larger-than-necessary network and then eliminate weights and neurons that are deemed redundant. Typically, methods for removing weights involve adding a penalty term to the error function [5]. The hope is that, with a penalty term added to the error function, unnecessary connections will have smaller weights, so the complexity of the network can be significantly reduced. This paper aims at pruning the network size, both in the number of neurons and in the number of interconnections between the neurons. The pruning strategies, along with the penalty function, are described in the subsequent sections.

2.1 Penalty Function

When a network is to be pruned, it is common practice to add a penalty term to the error function during training [16]. The penalty term usually suggested in the literature is

P(w, v) = \epsilon_1 \left( \sum_{m=1}^{h} \sum_{l=1}^{n} \frac{\beta w_{lm}^2}{1 + \beta w_{lm}^2} + \sum_{m=1}^{h} \sum_{p=1}^{o} \frac{\beta v_{mp}^2}{1 + \beta v_{mp}^2} \right) + \epsilon_2 \left( \sum_{m=1}^{h} \sum_{l=1}^{n} w_{lm}^2 + \sum_{m=1}^{h} \sum_{p=1}^{o} v_{mp}^2 \right)    (1)

where \epsilon_1, \epsilon_2 and \beta are positive penalty parameters. Given an n-dimensional example x^i, i \in \{1, 2, \ldots, k\}, as input, let w_{lm} be the weight for the connection from input unit l, l \in \{1, 2, \ldots, n\}, to hidden unit m, m \in \{1, 2, \ldots, h\}, and let v_{mp} be the weight for the connection from hidden unit m to output unit p, p \in \{1, 2, \ldots, o\}. The p-th output of the network for example x^i is obtained by computing

S_p^i = \sigma \left( \sum_{m=1}^{h} \alpha_m^i v_{mp} \right)    (2)

\alpha_m^i = \delta \left( \sum_{l=1}^{n} x_l^i w_{lm} \right), \qquad \delta(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}    (3)

The target output for an example x^i that belongs to class C_j is an o-dimensional vector t^i, where t_p^i = 0 if p \neq j and t_p^i = 1 if p = j, p = 1, 2, \ldots, o. The back-propagation algorithm is applied to update the weights (w, v) and minimize the following function:

\theta(w, v) = F(w, v) + P(w, v)    (4)

where F(w, v) is the cross-entropy function

F(w, v) = -\sum_{i=1}^{k} \sum_{p=1}^{o} \left( t_p^i \log S_p^i + (1 - t_p^i) \log (1 - S_p^i) \right)    (5)

and P(w, v) is the penalty term described in (1), used for weight decay.
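To make equations (1)-(5) concrete, the following minimal NumPy sketch computes the forward pass and the penalized training objective. It is an illustration rather than the authors' code: sigma is taken to be the logistic sigmoid (which the cross-entropy in (5) presupposes), and the values of eps1, eps2 and beta are placeholders.

```python
import numpy as np

def forward(X, W, V):
    """Network output of eqs. (2)-(3). X: (k, n) examples as rows;
    W: (n, h) input-to-hidden weights; V: (h, o) hidden-to-output weights."""
    alpha = np.tanh(X @ W)                     # eq. (3): delta(x) = tanh(x)
    return 1.0 / (1.0 + np.exp(-(alpha @ V)))  # eq. (2): sigmoid of hidden sums

def penalty(W, V, eps1=1e-1, eps2=1e-4, beta=10.0):
    """Penalty term P(w, v) of eq. (1); eps1, eps2, beta are placeholders."""
    shrink = (np.sum(beta * W**2 / (1.0 + beta * W**2))
              + np.sum(beta * V**2 / (1.0 + beta * V**2)))
    decay = np.sum(W**2) + np.sum(V**2)
    return eps1 * shrink + eps2 * decay

def objective(X, T, W, V, tiny=1e-12):
    """Training objective theta = F + P of eqs. (4)-(5), one-hot targets T."""
    S = forward(X, W, V)
    F = -np.sum(T * np.log(S + tiny) + (1 - T) * np.log(1 - S + tiny))
    return F + penalty(W, V)
```

Note the shape of the first penalty term: beta*w^2 / (1 + beta*w^2) saturates for large weights, so it drives small weights toward zero without heavily penalizing the weights the network actually needs, while the eps2 term is plain weight decay.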

2.2 Redundant Weight Pruning

The penalty function is used for weight decay. As such, redundant weights can be eliminated with the following weight elimination algorithm, as suggested in the literature [12] [14] [17].

2.2.1 Weight Elimination Algorithm:

1. Let \eta_1 and \eta_2 be positive scalars such that \eta_1 + \eta_2 < 0.5.
2. Pick a fully connected network and train this network until the error condition is satisfied by all input patterns. Let (w, v) be the weights of this network.
3. For each w_{lm}, if

   \max_p |v_{mp} w_{lm}| \le 4\eta_1    (6)

   then remove w_{lm} from the network.
4. For each v_{mp}, if

   |v_{mp}| \le 4\eta_2    (7)

   then remove v_{mp} from the network.
5. If no weight satisfies condition (6) or condition (7), then remove the weight w_{lm} with the smallest product \max_p |v_{mp} w_{lm}|.
6. Retrain the network. If the classification rate of the network falls below an acceptable level, then stop. Otherwise, go to Step 3.

2.3 Input and Hidden Node Pruning

A node-pruning algorithm is presented below to remove redundant nodes in the input and hidden layers; a code sketch covering both this procedure and the weight elimination algorithm follows the steps.

2.3.1 Input and Hidden Node Pruning Algorithm:

Step 1: Create an initial network with as many input neurons as required by the specific problem description and with one hidden unit. Randomly initialize the connection weights of the network within a certain range.
Step 2: Partially train the network on the training set for a certain number of training epochs using a training algorithm. The number of training epochs, \tau, is specified by the user.
Step 3: Eliminate the redundant weights using the weight elimination algorithm described in section 2.2.
Step 4: Test this network. If the accuracy of the network falls below an acceptable range, then add one more hidden unit and go to Step 2.
Step 5: If there is any input node x_l with w_{lm} = 0 for m = 1, 2, \ldots, h, then remove this node.
Step 6: Test the generalization ability of the network on the test set. If the network successfully converges, then terminate; otherwise, go to Step 1.
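A minimal sketch, assuming the weight layout of section 2.1 (W of shape n x h, V of shape h x o), of how conditions (6)-(7) and the node removal tests might be realized; the smallest-product fallback of step 5 and the retraining of step 6 belong in an outer loop and are only noted in comments.

```python
import numpy as np

def eliminate_weights(W, V, eta1, eta2):
    """One sweep of steps 3-4 of the weight elimination algorithm:
    zero out connections that satisfy conditions (6) and (7)."""
    # condition (6): remove w_lm if max_p |v_mp * w_lm| <= 4*eta1
    max_prod = np.abs(W[:, :, None] * V[None, :, :]).max(axis=2)  # (n, h)
    W = np.where(max_prod <= 4.0 * eta1, 0.0, W)
    # condition (7): remove v_mp if |v_mp| <= 4*eta2
    V = np.where(np.abs(V) <= 4.0 * eta2, 0.0, V)
    # step 5 (remove the smallest-product weight when neither condition
    # fires) and step 6 (retraining) are handled by the caller.
    return W, V

def prune_nodes(W, V):
    """Remove hidden units left with no surviving connections on either
    side, then input nodes whose outgoing weights are all zero
    (step 5 of section 2.3.1: w_lm = 0 for m = 1..h)."""
    hidden_alive = (np.abs(W).sum(axis=0) > 0) & (np.abs(V).sum(axis=1) > 0)
    W, V = W[:, hidden_alive], V[hidden_alive, :]
    input_alive = np.abs(W).sum(axis=1) > 0
    return W[input_alive, :], V, input_alive
```

Zeroing a weight and deleting the corresponding row or column are equivalent here; the masks simply make the removal of whole nodes explicit, matching the node counts reported later in Table 2.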

3 Experimental Results and Discussions

In this experiment we used three benchmark classification problems: breast cancer diagnosis, classification of glass types, and the Pima Indians diabetes diagnosis problem [10] [19]. All the data sets were obtained from the UCI machine learning benchmark repository. Brief characteristics of the data sets are listed in Table 1.

Table 1: Characteristics of the data sets.

Data set | Input attributes | Output units | Output classes | Training examples | Validation examples | Test examples | Total examples
Cancer1  | 9 | 2 | 2 | 350 | 175 | 174 | 699
Glass    | 9 | 6 | 6 | 107 | 54  | 54  | 215
Diabetes | 8 | 2 | 2 | 384 | 192 | 192 | 768

The experimental results for the different data sets are shown in Table 2 and in Figures 1, 2 and 3. For the cancer data set, we found that a fully connected network with a 9-3-2 architecture has a classification accuracy of 97.143%. After pruning the network with the weight elimination algorithm and the input and hidden node pruning algorithm, we obtained a simplified network with a 3-1-2 architecture and a classification accuracy of 96.644%. A graphical representation of the simplified network is given in Figure 1. It shows that only the input attributes I1, I6 and I9, together with a single hidden unit, are adequate for this problem.

[Figure 1: Simplified network for the breast cancer diagnosis problem. Only inputs I1, I6 and I9 remain connected, through a single hidden unit, to the outputs O1 and O2. Surviving weights: W1 = -21.992443, W6 = -13.802489, W9 = -13.802464 (input to hidden); V1 = 3.035398, V2 = -3.035398 (hidden to output). Pruned neurons and weights are drawn dotted.]

[Figure 2: Simplified network for the Pima Indians diabetes diagnosis problem. All eight inputs remain, feeding two hidden units and the outputs O1 and O2. Surviving weights: W12 = -204.159255, W13 = 74.090849, W14 = -52.965123, W18 = 52.965297, W21 = 47.038678, W23 = 52.469025, W24 = W25 = W26 = 46.967161, W27 = W28 = -46.967363 (input to hidden); V11 = -1.152618, V12 = 1.152618, V21 = -32.078753, V22 = 32.084780 (hidden to output). Pruned neurons and weights are drawn dotted.]
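As a quick arithmetic check on condition (6), using the values reported in Figure 1 for the single hidden unit: the surviving connection W1 gives max_p |V_p W1| = |3.035398 x (-21.992443)| ≈ 66.76, whereas the threshold 4η₁ can never exceed 2, since step 1 of section 2.2.1 requires η₁ + η₂ < 0.5. Such a connection lies far from the elimination region; only connections whose products fall below 4η₁ are zeroed out.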

For the Pima Indians diabetes data set, we found that a fully connected network with an 8-3-2 architecture has a classification accuracy of 77.344%. After pruning the network with the weight elimination algorithm and the input and hidden node pruning algorithm, we obtained a simplified network with an 8-2-2 architecture and a classification accuracy of 75.260%. A graphical representation of the simplified network is given in Figure 2. It shows that no input attribute could be removed, but a hidden node along with some redundant connections has been removed; these are shown with dotted lines in Figure 2.

[Figure 3: Simplified network for the glass classification problem, with inputs I1-I9 and outputs O1-O6. Pruned neurons and weights are drawn dotted.]

For the glass classification data set, we found that a fully connected network with a 9-4-6 architecture has a classification accuracy of 65.277%. After pruning the network with the weight elimination algorithm and the input and hidden node pruning algorithm, we obtained a simplified network with a 9-3-6 architecture and a classification accuracy of 63.289%. A graphical representation of the simplified network is given in Figure 3. It shows that no input attribute could be removed, but a hidden node along with some redundant connections has been removed; these are shown with dotted lines in Figure 3.

Table 2: Experimental results.

Results                                 | Cancer1 | Diabetes | Glass
Learning rate                           | 0.1     | 0.1      | 0.1
No. of epochs                           | 500     | 1200     | 650
Initial architecture                    | 9-3-2   | 8-3-2    | 9-4-6
Input nodes removed                     | 6       | 0        | 1
Hidden nodes removed                    | 2       | 1        | 2
Total connections removed               | 24      | 13       | 16
Simplified architecture                 | 3-1-2   | 8-2-2    | 9-3-6
Accuracy (%) of fully connected network | 97.143  | 77.344   | 65.277
Accuracy (%) of simplified network      | 96.644  | 75.260   | 63.289

From the experimental results discussed above, it can be said that not all input attributes and weights are equally important. Moreover, it is difficult to determine the appropriate number of hidden nodes in advance; with the pruning approach, an appropriate number can be determined automatically. Redundant nodes and connections can be removed without sacrificing significant accuracy using the network pruning approach discussed in sections 2.2 and 2.3. As such, the simplified networks reduce computational cost. A sketch of how this pruning-and-evaluation loop could be organized is given below.
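For completeness, a sketch of how the pruning loop behind Table 2 could be wired together. It reuses the hypothetical forward, eliminate_weights and prune_nodes helpers from the earlier sketches, and min_acc stands in for the "acceptable level" of step 6 in section 2.2.1; none of these names come from the paper.

```python
import numpy as np

def accuracy(S, T):
    """Fraction of examples whose largest output matches the target class."""
    return np.mean(S.argmax(axis=1) == T.argmax(axis=1))

def prune_while_acceptable(W, V, X_test, T_test, eta1, eta2, min_acc):
    """Alternate weight and node pruning while test accuracy stays above
    min_acc; returns the last acceptable network and its input mask."""
    keep = np.ones(X_test.shape[1], dtype=bool)   # surviving input columns
    while True:
        W2, V2 = eliminate_weights(W, V, eta1, eta2)
        W2, V2, alive = prune_nodes(W2, V2)
        keep2 = keep.copy()
        keep2[np.where(keep)[0][~alive]] = False  # drop pruned input nodes
        # retraining (step 6 of section 2.2.1) would go here
        if W2.shape == W.shape and np.allclose(W2, W) and np.allclose(V2, V):
            return W, V, keep                     # nothing left to prune
        if accuracy(forward(X_test[:, keep2], W2, V2), T_test) < min_acc:
            return W, V, keep                     # keep previous network
        W, V, keep = W2, V2, keep2
```

Run once per data set, this mirrors the protocol of Table 2: record the accuracy of the fully connected network, prune until the accuracy criterion would be violated, and report the simplified architecture and its accuracy.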

4 Future Work

In future work we will use this network pruning approach for rule extraction and feature selection. These pruning strategies will also be examined for function approximation and regression problems.

5 Conclusions

In this paper we proposed an efficient network simplification algorithm using pruning strategies. Using this approach we obtain an optimal network architecture with a minimal number of connections and neurons without significantly deteriorating the performance of the network. Experimental results show that the performance of the simplified networks remains acceptable compared to the fully connected networks. This simplification of the network ensures both reliability and reduced computational cost.

References

[1] T. Ash, Dynamic node creation in backpropagation networks, Connection Science, vol. 1, pp. 365-375, 1989.
[2] R. W. Brause, Medical Analysis and Diagnosis by Neural Networks, J. W. Goethe-University, Computer Science Dept., Frankfurt a. M., Germany.
[3] J. W. Smith, J. E. Everhart, W. C. Dickson, W. C. Knowler and R. S. Johannes, Using the ADAP learning algorithm to forecast the onset of diabetes mellitus, Proc. Symp. on Computer Applications and Medical Care (Piscataway, NJ: IEEE Computer Society Press), pp. 261-265, 1988.

[4] S. E. Fahlman and C. Lebiere, The cascade-correlation learning architecture, in Advances in Neural Information Processing Systems 2, D. S. Touretzky, Ed. San Mateo, CA: Morgan Kaufmann, pp. 524-532, 1990.
[5] Simon Haykin, Neural Networks: A Comprehensive Foundation, Second Edition, Pearson Education Asia, Third Indian Reprint, 2002.
[6] T. Y. Kwok and D. Y. Yeung, Constructive algorithms for structure learning in feedforward neural networks for regression problems, IEEE Trans. Neural Networks, vol. 8, pp. 630-645, 1997.
[7] M. Monirul Islam and K. Murase, A new algorithm to design compact two-hidden-layer artificial neural networks, Neural Networks, vol. 14, pp. 1265-1278, 2001.
[8] M. Monirul Islam, M. A. H. Akhand, M. Abdur Rahman and K. Murase, Weight freezing to reduce training time in designing artificial neural networks, Proceedings of 5th ICCIT, EWU, pp. 132-136, 27-28 December 2002.
[9] R. Parekh, J. Yang, and V. Honavar, Constructive neural network learning algorithms for pattern classification, IEEE Trans. Neural Networks, vol. 11, no. 2, March 2000.
[10] L. Prechelt, Proben1: A set of neural network benchmark problems and benchmarking rules, University of Karlsruhe, Germany, 1994.
[11] R. Reed, Pruning algorithms: a survey, IEEE Trans. Neural Networks, vol. 4, pp. 740-747, 1993.
[12] R. Setiono and L. C. K. Hui, Use of a quasi-Newton method in a feedforward neural network construction algorithm, IEEE Trans. Neural Networks, vol. 6, no. 1, pp. 273-277, Jan. 1995.
[13] R. Setiono and Huan Liu, Understanding neural networks via rule extraction, in Proceedings of the International Joint Conference on Artificial Intelligence, pp. 480-485, 1995.
[14] R. Setiono and Huan Liu, Improving backpropagation learning with feature selection, Applied Intelligence, vol. 6, no. 2, pp. 129-140, 1996.
[15] R. Setiono, Extracting rules from pruned networks for breast cancer diagnosis, Artificial Intelligence in Medicine, vol. 8, no. 1, pp. 37-51, 1996.
[16] R. Setiono, A penalty-function approach for pruning feedforward neural networks, Neural Computation, vol. 9, no. 1, pp. 185-204, 1997.
[17] R. Setiono, Techniques for extracting rules from artificial neural networks, Plenary Lecture at the 5th International Conference on Soft Computing and Information Systems, Iizuka, Japan, October 1998.
[18] R. Setiono, W. K. Leow and J. M. Zurada, Extraction of rules from artificial neural networks for nonlinear regression, IEEE Trans. Neural Networks, vol. 13, no. 3, pp. 564-577, 2002.
[19] W. H. Wolberg and O. L. Mangasarian, Multisurface method of pattern separation for medical diagnosis applied to breast cytology, Proceedings of the National Academy of Sciences, USA, vol. 87, pp. 9193-9196, December 1990.