COMPARISON STUDY OF SENSITIVITY DEFINITIONS OF NEURAL NETWORKS
CHUN-GUO LI 1, HAI-FENG LI 2
1 Machine Learning Center, Faculty of Mathematics and Computer Science, Hebei University, Baoding 071002, China
2 Department of Educational Administration, Hebei University, Baoding 071002, China
licg@hbu.cn, lihf@hbu.cn

Abstract: This paper compares the sensitivity definitions of a neural network's output to input and weight perturbations. Based on the essence of the sensitivity definitions, the authors classify these definitions into 3 categories: Noise-to-Signal Ratio, geometrical property of the derivative, and angle perturbation in geometry space. The characteristics of these 3 categories of sensitivity definition are discussed respectively. Classifying the definitions by their essence is sensible because it can help researchers find new sensitivity definitions of neural networks.

Keywords: Sensitivity definition; categories; neural networks; comparison

1. Introduction

Sensitivity analysis is an important issue in neural network research. A large number of sensitivity definitions for different neural networks have emerged over the past decades [1-15]. Sensitivity analysis of a neural network's output to input and weight perturbations is expected to settle several application problems [16-19]. It can be used to predict the effect of perturbations on the network's output, quantify the error-tolerance and generalization abilities, guide the design of a robust network, and so on. Therefore, more and more researchers are putting effort into the sensitivity analysis of neural networks. The proposed sensitivity definitions are based on distinct methodologies and on different neural networks. On the one hand, some sensitivity definitions are meant to apply to an ensemble of neural networks, while others apply to one special neural network. Many sensitivity definitions have been brought forward. For one special neural network (e.g. the multilayer perceptron), it is necessary to find some way to classify these definitions.
Consequently, people can adopt different definitions to settle different problems and get the best results. There are many ways to classify the sensitivity definitions. Some authors held that sensitivity is defined via two approaches [5]: the analytical approach and the statistical approach. This classification is based on the measure space of the sensitivity definition. Another way to classify the sensitivity definitions is based on the neural networks that a definition is applied to. Some definitions can be applied to the MLP while some are limited to Madalines; some sensitivity definitions can be used for linear neural networks while others are for nonlinear neural networks. These ways of classifying the sensitivity definitions do not distinguish the essences of the definitions. In this paper, we study the sensitivity definitions of a neural network's output to input and weight perturbations. According to their essence, we classify the sensitivity definitions into 3 categories: Noise-to-Signal Ratio, geometrical property of the derivative, and angle perturbation in geometry space. Some people proposed these ideas separately, and others defined sensitivity definitions by following and improving those ideas. This classification should help people find new sensitivity definitions of neural networks.

2. Categories of sensitivity definitions of neural networks

Since the study of neural network sensitivity emerged, many ideas have been proposed to define the sensitivity of a neural network's output to input and weight perturbations. Some of these ideas improved the original ones, while others were applied to practical problems, the ideas being reformulated before application. Subsequently, many new expression formulae appeared. Gathering these formulae together and examining their essences, we classify the sensitivity definitions into 3 approaches according to the ideas they borrow. We describe them in detail in the following 3 sections.

2.1. Noise-to-Signal Ratio
Piché [1] raised the idea of the Noise-to-Signal Ratio for the selection of weight accuracies for Madalines. This sensitivity definition is based on the ratio of noise to signal. Noise, produced by the environment of the signal, affects the transformation of the signal, and the Noise-to-Signal Ratio quantifies how much. Based on this idea, Piché regarded the noise on the signal as the perturbations to the inputs and weights of the neural network. He took into account the mathematical characteristics of random variables and proposed the sensitivity definition of neural networks. He defined the sensitivity on an Adaline and extended it to the Madaline. Yeung et al. extended Piché's work on Madalines to the MLP and the RBFNN [2-8]. Given a structured neural network, all the parameters of the network are considered as random variables. If the output of the network is denoted by y, the sensitivity of the network to input and weight perturbations can be defined as:

    NSR = Var(Δy) / Var(y)    (1)

where Var(Δy) is the variance of the output errors caused by weight or first-layer input perturbations, and Var(y) is the variance of the output. Piché derived the calculating formulae of Eq.(1) for the input and weight sensitivity of a Madaline under the following conditions:

1. It is assumed that an ensemble of possible ideal neural networks can be identified. These ideal networks may represent a general set of Madalines.
2. The architecture of all ideal neural networks is assumed to be identical. The network has L layers with N_l neurons fully connected to the next layer. There is no bias input in the network. The output of the network is nonlinear, and all output operators are odd functions.
3. All the weights associated with each node of the network are modeled as a vector of random variables W. Over the ensemble of networks, the weights of a particular layer all have the same variance. The mean value of each weight of the network over the ensemble is zero.
All in all, the components of the weight vector W are considered to be zero-mean independent identically distributed (iid) random variables. The distribution of these random variables is not particularly important.
4. The inputs of the network are also modeled as a vector of random variables X, again zero-mean iid random variables whose distribution is not particularly important. (Combining points 3 and 4, the components of the input and output vectors of each layer are iid random variables.)
5. The mathematical expectation of the noise is assumed to be zero.

Yeung et al. extended Piché's work on Madalines to the MLP and the RBFNN [2-8]. Although the calculating formulae are reformulated, the sensitivity definition is still based on the noise-to-signal ratio; the differences lie in the procedures used to derive the expressions for the numerator and the denominator. For example, Ng et al. [2] extended the sensitivity definition from the Madaline to the RBFNN. They considered the sensitivity of the network's output to input perturbations. The relative sensitivity is defined as:

    S_RD^k = D(y_k(X + ΔX) − y_k(X)) / D(y_k(X))    (2)

where X is the input vector of the neural network, ΔX the perturbation of the input, and D(y_k) the variance of y_k, the k-th output of the network. Ng et al. deduced the calculating formulae of Eq.(2) from the calculating properties of the variance for the RBFNN. Yeung et al. [7] reformulated the sensitivity definition and considered the multilayer perceptron with an antisymmetric squashing activation function. They introduced both input and weight perturbations of the neural network. The sensitivity is defined as:

    S_RD = D(g(X + ΔX, W + ΔW) − g(X, W)) / D(g(X, W))    (3)

where g is the function expression of the network. Eqs.(2-3) are relative sensitivities, as their authors noted. Actually, the two sensitivity formulae are defined in the same way: they are expression formulae of Eq.(1).
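As a concrete illustration, a relative sensitivity of the Eq. (2) type can be estimated by Monte Carlo simulation: draw iid inputs, perturb them, and take the ratio of the variance of the output error to the variance of the output. The following sketch is hypothetical; the one-hidden-layer tanh network, its sizes, and the noise level are illustrative assumptions, not values from the cited papers.

```python
# Monte Carlo estimate of a relative sensitivity of the Eq. (2) type:
# variance of the output error under an input perturbation, divided by
# the variance of the output. All sizes and noise levels are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden = 8, 16
W1 = rng.normal(0.0, 1.0, (n_hidden, n_in))   # fixed random weights
W2 = rng.normal(0.0, 1.0, (1, n_hidden))

def net(X):
    # tanh is an odd (antisymmetric) squashing function, matching the
    # assumptions used in the Sec. 2.1 derivations.
    return np.tanh(W2 @ np.tanh(W1 @ X))

n_samples = 20000
X  = rng.normal(0.0, 1.0, (n_in, n_samples))  # zero-mean iid inputs
dX = rng.normal(0.0, 0.1, (n_in, n_samples))  # small iid input perturbation

y  = net(X)
dy = net(X + dX) - y                          # output error

S_rel = dy.var() / y.var()                    # noise-to-signal style ratio
print(f"estimated relative sensitivity: {S_rel:.4f}")
```

Because both numerator and denominator are plain sample variances, the same loop also yields an Eq. (1)-style NSR estimate when the perturbation is applied to the weights instead of the inputs.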
Although these definitions are based on the same idea as Eq.(1), other authors expressed the formulae differently. For example, besides the relative sensitivity definitions, Ng et al. [2,7] also defined absolute sensitivity definitions, without the denominators of Eqs.(2-3):

    S^k = D(y_k(X + ΔX) − y_k(X))    (4)

    S = D(g(X + ΔX, W + ΔW) − g(X, W))    (5)

These absolute sensitivity definitions follow from the meaning of the variance as a mathematical characteristic of a random variable: the variance measures the expected squared deviation of the value of a random variable from its expected value.
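The absolute variant can be estimated the same way, simply by skipping the normalisation. The sketch below is hypothetical (a single tanh neuron with illustrative noise levels); it computes an Eq. (5)-style absolute sensitivity under a weight perturbation and, for comparison, its relative counterpart.

```python
# Absolute sensitivity in the style of Eq. (5): the variance of the output
# error under a weight perturbation, with no normalisation by the output
# variance. The single tanh neuron and noise levels are illustrative.
import numpy as np

rng = np.random.default_rng(1)

n_in = 5
W = rng.normal(0.0, 1.0, n_in)            # zero-mean random weights

def neuron(X, w):
    return np.tanh(w @ X)                 # odd squashing activation

X  = rng.normal(0.0, 1.0, (n_in, 50000))  # zero-mean iid inputs
dW = rng.normal(0.0, 0.05, n_in)          # small weight perturbation

y  = neuron(X, W)
dy = neuron(X, W + dW) - y                # output error from the weight error

S_abs = dy.var()            # absolute: variance of the output error alone
S_rel = S_abs / y.var()     # relative: same numerator, normalised
print(f"absolute: {S_abs:.5f}  relative: {S_rel:.5f}")
```

The absolute value reads in output units, while the relative value compares the error to the signal, which is why the relative form matches the noise-to-signal reading of Eq. (1).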
2.2. Geometrical property of the derivative

First, let us recall some definitions from mathematical analysis. A differentiable continuous function f(x) has a first derivative f'(x) with respect to x. f'(x_0) is the slope of the function f(x) at the point x_0, denoting the rate of change around x_0. Correspondingly, f'(x) is the slope function over the domain of x. Based on this meaning, Zurada et al. [9] proposed that the sensitivity of a neural network's output to input perturbations can be defined as the derivative of the network. Zurada et al. considered an MFNN with a single hidden layer. The network is regarded as a nonlinear and differentiable function y = f(x), where y = (y_1, y_2, ..., y_m), x = (x_1, x_2, ..., x_n), and m and n are the numbers of output and input neurons respectively. Since y is differentiable at x, the Jacobian matrix of Eq.(6) is derived first:

    J = [ ∂y_k / ∂x_i ],  k = 1, ..., m;  i = 1, ..., n    (6)

The matrix can be evaluated over the entire training set in several ways, such as the mean square average, the absolute value average, and the maximum value. Following this sensitivity definition, similar definitions were proposed to improve and extend it. Wang et al. [8] extended this definition to a random measure space and defined the sensitivity of an RBFNN's output to its input, regarding the input vector as a random vector. The characteristics of this sensitivity definition are the following:

1. The neural networks are required to be differentiable and feedforward. This sensitivity definition is unsuitable for the perceptron, because the activation function of a perceptron is non-differentiable.
2. The sensitivity is defined for neural networks with continuous-valued input vectors. If the input vector is a random vector, the sensitivity definition must be modified, as in [8].
3. Zurada et al. defined the sensitivity of the network's outputs to input perturbations. It can be extended to the sensitivity to weight perturbations if the network is differentiable with respect to the weight variables.
4.
This definition makes full use of the training set to calculate the sensitivity of each input variable.
5. The sensitivity of the inputs of the network is calculated for a trained network. Therefore, the training algorithm of the network affects the absolute sensitivity of the inputs.

2.3. Angle perturbation in geometry space

Maryhelen Stevenson et al. [13] proposed this idea to define the sensitivity in geometry space. They took the input and weight perturbations as errors in the inputs and weights, and defined the sensitivity as the probability of a decision error due to input and weight error. The sensitivity definition is extended from the Adaline to the Madaline. Consider an Adaline with n+1 variable inputs X = (x_0, x_1, x_2, ..., x_n), including a threshold input; the inputs take on binary values of either +1 or -1. Associated with the Adaline are n+1 adjustable analog weights: w_0, w_1, w_2, ..., w_n. The input-output map of the Adaline can be summarized as

    output = +1 if W·X ≥ 0;  -1 if W·X < 0    (7)

A Madaline is a layered network of Adaline elements; therefore the sensitivity definition of the Adaline can be extended to the Madaline. Stevenson et al. first defined the sensitivity of the Adaline. Assuming binary-valued inputs, there are 2^n possible input patterns for an Adaline with n variable inputs. Each input pattern corresponds to a point in n-space, which lies on a hypersphere of radius √n centered at the origin. There is a hyperplane H_W, perpendicular to W, which passes through the origin: H_W = {X = (x_0, x_1, x_2, ..., x_n) : W·X = 0}. This hyperplane separates the X for which W·X ≥ 0 from those X for which W·X < 0 (i.e. it separates the input vectors which yield a +1 response from those which yield a -1 response), and divides the hypersphere into two hemi-hyperspheres. If a randomly oriented perturbation vector ΔW is added to W, the perturbed weight vector W_p corresponds to another hyperplane H_p. W_p would probably produce decision errors: a decision error is made when an arbitrary input vector is mapped into opposite output categories by W and W_p.
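The decision-error event just described is straightforward to simulate: perturb the weight vector of an Adaline and count how often a random ±1 input pattern falls on opposite sides of the two hyperplanes. The sketch below is a hypothetical illustration; the dimension and the perturbation size are assumptions, not values from Stevenson et al.

```python
# Empirical probability of a decision error for an Adaline whose weight
# vector W is perturbed to Wp = W + dW. Dimension and perturbation size
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

n = 25                                  # number of variable inputs
W  = rng.normal(size=n)
dW = 0.2 * rng.normal(size=n)           # random weight perturbation
Wp = W + dW

# Angle between the original and the perturbed weight vectors.
cos_t = W @ Wp / (np.linalg.norm(W) * np.linalg.norm(Wp))
theta = np.arccos(np.clip(cos_t, -1.0, 1.0))

# Fraction of random binary (+1/-1) patterns mapped to opposite output
# categories by W and Wp, i.e. the empirical decision-error probability.
X = rng.choice([-1.0, 1.0], size=(n, 200000))
err = np.mean(np.sign(W @ X) != np.sign(Wp @ X))
print(f"theta/pi = {theta/np.pi:.4f}  empirical error = {err:.4f}")
```

For moderately large n the empirical rate closely tracks theta/pi, the angular separation of the two hyperplanes as a fraction of a half-turn, which is the quantity Stevenson et al.'s probability-of-decision-error definition captures.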
The probability of a decision error due to a weight perturbation is defined as:
    P(Decision Error) = θ/π    (8)

where θ is the angle between W and W_p. This is the sensitivity definition with respect to weight errors. For practical applications, the probability of a decision error is derived in terms of the weight perturbation ratio |ΔW|/|W|. This definition is based on the Hoff hypersphere-area approximation, which states that as n gets large, the points corresponding to the 2^n input patterns are approximately uniformly distributed over the surface of a hypersphere in n-space. Consequently, the percentage of input patterns corresponding to points in a selected region of the hypersphere can be approximated by the ratio of the surface content of that region to the surface content of the entire hypersphere. This sensitivity definition makes it convenient to calculate the sensitivity to weight perturbations, and the calculating formulae do not depend on the structure of the network; what the sensitivity utilizes is the geometric distribution of the training patterns. The characteristics of this sense of sensitivity definition are the following:

1. The input vectors are required to be binary-valued, and the output activation function of the network is required to be a hard-limiting function. The activation function is non-differentiable. Recalling the limitation of the derivative-based sensitivity definition, the definition here is an alternative when the activation function of the network is non-differentiable.
2. When n is small (e.g. n = 10), the points corresponding to the 2^n input patterns are hardly uniformly distributed over the surface of a hypersphere in n-space. In this condition, the probability calculated by Eq.(8) would be hard to trust.
3. This study focused on the sensitivity of the Adaline's and Madaline's output to weight errors. The sensitivity of their output to input errors is left as further research, or requires another way of defining the sensitivity.
And for other neural networks, we should consider whether this way of defining the sensitivity is suitable or not.
4. The approximate computation formulae hold only for large enough n and small enough perturbations.

Cheng and Yeung [14] also used a geometrical technique to investigate the sensitivity of Neocognitrons by modifying the hypersphere model. Unfortunately, the hypersphere model they used fails in the case of the MLP because its input and weight vectors generally do not span a hypersphere surface or volume.

3. Comparison of sensitivity categories

The above 3 categories of sensitivity definitions are modeled on distinct methodologies. Each has advantages and disadvantages in applications. Table 1 gives a direct comparison of the 3 categories.

Table 1. Comparison of sensitivity definitions of different methodologies.

    Methodology            | Input or weight | Ensemble of          | Requirement of      | Input vector
                           | perturbation    | neural networks      | the network         |
    -----------------------+-----------------+----------------------+---------------------+------------------
    Noise-to-Signal Ratio  | Input, Weight   | Adaline, Madaline,   | Input and output    | Random
                           |                 | Multilayer           | vectors are iid     |
    Geometrical property   | Input           | Differentiable       | Differentiable      | Continuous valued
    of derivative          |                 | feedforward networks |                     |
    Angle perturbation     | Weight          | Adaline, Madaline    | Binary-valued input | Discrete valued
                           |                 |                      | and output          |

From the table we can see clearly that different sensitivity definitions suit different ensembles of networks. In an application, maybe 2 of the 3 sensitivity definitions are suitable for defining the sensitivity of a network's output to input or weight perturbations. For example, if we are studying the sensitivity of an Adaline's output to weight perturbations, both the Noise-to-Signal Ratio and the angle perturbation can serve the sensitivity analysis; which of them is better is left to further research.

4. Conclusion

This paper compares several sensitivity definitions of
a neural network's output to input and weight perturbations. The sensitivity definitions are classified into 3 categories according to their essence. These 3 categories of sensitivity definitions deal with different ensembles of neural networks. This way of classifying the sensitivity definitions can help in discovering new sensitivity definitions of a neural network's output to input and weight perturbations. There are still many points to focus on. From Table 1, we can see that the sensitivity of a feedforward neural network's output to weight perturbations can be defined based on all 3 categories. Which category is best for the sensitivity analysis in this situation is left for further research; plenty of experiments are needed to help discover the answer.

References

[1] Stephen W. Piché: The Selection of Weight Accuracies for Madalines. IEEE Transactions on Neural Networks, Vol. 6, No. 2, March 1995.
[2] Wing W.Y. Ng, Daniel S. Yeung, Q. Ran, Eric C.C. Tsang: Statistical Output Sensitivity to Input and Weight Perturbations of Radial Basis Function Neural Networks. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2002.
[3] Jin Young Choi, Chong-Ho Choi: Sensitivity Analysis of Multilayer Perceptron with Differentiable Activation Functions. IEEE Transactions on Neural Networks, Vol. 3, No. 1, January 1992.
[4] Wing W.Y. Ng, D.S. Yeung: Electronics Letters, 15th May 2003, Vol. 39, No. 10.
[5] Xiaoqin Zeng, Daniel S. Yeung: Sensitivity Analysis of Multilayer Perceptron to Input and Weight Perturbations. IEEE Transactions on Neural Networks, Vol. 12, No. 6, November 2001.
[6] Xiaoqin Zeng, Daniel S. Yeung: A Quantified Sensitivity Measure for Multilayer Perceptron to Input Perturbation. Neural Computation, 15, 2003.
[7] Daniel S. Yeung, Xuequan Sun: Using Function Approximation to Analyze the Sensitivity of MLP with Antisymmetric Squashing Activation Function. IEEE Transactions on Neural Networks, Vol.
13, No. 1, January 2002.
[8] Xizhao Wang, Chunguo Li: A New Definition of Sensitivity for RBFNN and Its Applications to Feature Reduction. Second International Symposium on Neural Networks, LNCS 3496, 2005.
[9] Jacek M. Zurada, Aleksander Malinowski, Shiro Usui: Perturbation method for deleting redundant inputs of perceptron networks. Neurocomputing, 14 (1997).
[10] Sherif Hashem: Sensitivity Analysis for Feedforward Artificial Neural Networks with Differentiable Activation Functions. Proceedings of the 1992 International Joint Conference on Neural Networks, Baltimore, IEEE, New Jersey, Vol. I.
[11] Masato Koda: Neural Network Learning Based on Stochastic Sensitivity Analysis. IEEE Transactions on Systems, Man and Cybernetics-Part B: Cybernetics, Vol. 27, No. 1, February 1997.
[12] Lale Ozyilmaz, Tulay Yildirim: Sensitivity Analysis for Conic Section Function Neural Networks. IJCNN (4), 2000.
[13] Maryhelen Stevenson, Rodney Winter, Bernard Widrow: Sensitivity of Feedforward Neural Networks to Weight Errors. IEEE Transactions on Neural Networks, Vol. 1, No. 1, March 1990.
[14] Antony Y. Cheng, Daniel S. Yeung: Sensitivity Analysis of Neocognitron. IEEE Transactions on Systems, Man and Cybernetics-Part C: Applications and Reviews, Vol. 29, No. 2, May 1999.
[15] Cesare Alippi, Vincenzo Piuri, Mariagiovanna Sami: Sensitivity to Errors in Artificial Neural Networks: A Behavioral Approach. IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, Vol. 42, No. 6, June 1995.
[16] D. Shi, D.S. Yeung, J. Gao: Sensitivity analysis applied to the construction of radial basis function networks. Neural Networks, 18 (2005).
[17] Wing W.Y. Ng, Daniel S. Yeung: Input dimensionality reduction for Radial Basis Neural Network classification problems using sensitivity analysis. Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 2002.
[18] Wing W.Y. Ng, Rocky K.C. Chang, Daniel S. Yeung: Dimensionality Reduction for Denial of Service Detection Problems Using RBFNN Output Sensitivity.
Proceedings of the Second International Conference on Machine Learning and Cybernetics, Xi'an, 2003.
[19] Nicolaos B. Karayiannis: Reformulated Radial Basis Neural Networks Trained by Gradient Descent. IEEE Transactions on Neural Networks, Vol. 10, No. 3, May 1999.
More informationArtificial neural networks
Artificial neural networks Chapter 8, Section 7 Artificial Intelligence, spring 203, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 8, Section 7 Outline Brains Neural
More informationCS:4420 Artificial Intelligence
CS:4420 Artificial Intelligence Spring 2018 Neural Networks Cesare Tinelli The University of Iowa Copyright 2004 18, Cesare Tinelli and Stuart Russell a a These notes were originally developed by Stuart
More informationA Novel Rejection Measurement in Handwritten Numeral Recognition Based on Linear Discriminant Analysis
009 0th International Conference on Document Analysis and Recognition A Novel Rejection easurement in Handwritten Numeral Recognition Based on Linear Discriminant Analysis Chun Lei He Louisa Lam Ching
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationCONTINUOUS SPATIAL DATA ANALYSIS
CONTINUOUS SPATIAL DATA ANALSIS 1. Overview of Spatial Stochastic Processes The ke difference between continuous spatial data and point patterns is that there is now assumed to be a meaningful value, s
More informationFault Masking By Probabilistic Voting
OncuBilim Algorithm And Sstems Labs. Vol.9, Art.No:,(9) Fault Masking B Probabilistic Voting B. Bakant ALAGÖZ Abstract: In this stud, we introduced a probabilistic voter, regarding smbol probabilities
More informationWeight Initialization Methods for Multilayer Feedforward. 1
Weight Initialization Methods for Multilayer Feedforward. 1 Mercedes Fernández-Redondo - Carlos Hernández-Espinosa. Universidad Jaume I, Campus de Riu Sec, Edificio TI, Departamento de Informática, 12080
More informationCourse 395: Machine Learning - Lectures
Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture
More informationECE521 Lecture 7/8. Logistic Regression
ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression
More informationCSE 546 Midterm Exam, Fall 2014
CSE 546 Midterm Eam, Fall 2014 1. Personal info: Name: UW NetID: Student ID: 2. There should be 14 numbered pages in this eam (including this cover sheet). 3. You can use an material ou brought: an book,
More informationSingle layer NN. Neuron Model
Single layer NN We consider the simple architecture consisting of just one neuron. Generalization to a single layer with more neurons as illustrated below is easy because: M M The output units are independent
More informationECLT 5810 Classification Neural Networks. Reference: Data Mining: Concepts and Techniques By J. Hand, M. Kamber, and J. Pei, Morgan Kaufmann
ECLT 5810 Classification Neural Networks Reference: Data Mining: Concepts and Techniques By J. Hand, M. Kamber, and J. Pei, Morgan Kaufmann Neural Networks A neural network is a set of connected input/output
More informationCombination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters
Combination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters Kyriaki Kitikidou, Elias Milios, Lazaros Iliadis, and Minas Kaymakis Democritus University of Thrace,
More informationNeural networks. Chapter 20, Section 5 1
Neural networks Chapter 20, Section 5 Chapter 20, Section 5 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 20, Section 5 2 Brains 0 neurons of
More informationNeural Networks Introduction
Neural Networks Introduction H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011 H. A. Talebi, Farzaneh Abdollahi Neural Networks 1/22 Biological
More informationStability Analysis of a Geometrically Imperfect Structure using a Random Field Model
Stabilit Analsis of a Geometricall Imperfect Structure using a Random Field Model JAN VALEŠ, ZDENĚK KALA Department of Structural Mechanics Brno Universit of Technolog, Facult of Civil Engineering Veveří
More informationDesign Collocation Neural Network to Solve Singular Perturbed Problems with Initial Conditions
Article International Journal of Modern Engineering Sciences, 204, 3(): 29-38 International Journal of Modern Engineering Sciences Journal homepage:www.modernscientificpress.com/journals/ijmes.aspx ISSN:
More informationArtificial Neural Network : Training
Artificial Neural Networ : Training Debasis Samanta IIT Kharagpur debasis.samanta.iitgp@gmail.com 06.04.2018 Debasis Samanta (IIT Kharagpur) Soft Computing Applications 06.04.2018 1 / 49 Learning of neural
More informationMultilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)
Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w x + w 2 x 2 + w 0 = 0 Feature x 2 = w w 2 x w 0 w 2 Feature 2 A perceptron can separate
More informationOn Node-Fault-Injection Training of an RBF Network
On Node-Fault-Injection Training of an RBF Network John Sum 1, Chi-sing Leung 2, and Kevin Ho 3 1 Institute of E-Commerce, National Chung Hsing University Taichung 402, Taiwan pfsum@nchu.edu.tw 2 Department
More informationCSC242: Intro to AI. Lecture 21
CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages
More informationIntroduction to Neural Networks
Introduction to Neural Networks What are (Artificial) Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning
More information1 History of statistical/machine learning. 2 Supervised learning. 3 Two approaches to supervised learning. 4 The general learning procedure
Overview Breiman L (2001). Statistical modeling: The two cultures Statistical learning reading group Aleander L 1 Histor of statistical/machine learning 2 Supervised learning 3 Two approaches to supervised
More informationArtificial Neural Networks. Edward Gatt
Artificial Neural Networks Edward Gatt What are Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning Very
More informationMultilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)
Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w 1 x 1 + w 2 x 2 + w 0 = 0 Feature 1 x 2 = w 1 w 2 x 1 w 0 w 2 Feature 2 A perceptron
More informationChapter 6: Backpropagation Nets
Chapter 6: Bacpropagation Nets Architecture: at least one laer of non-linear hidden units Learning: supervised, error driven, generalized delta rule Derivation of the weight update formula (with gradient
More informationArtificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011!
Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011! 1 Todayʼs lecture" How the brain works (!)! Artificial neural networks! Perceptrons! Multilayer feed-forward networks! Error
More informationArtificial Neural Networks
Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks
More informationOn Weight-Noise-Injection Training
On Weight-Noise-Injection Training Kevin Ho 1, Chi-sing Leung 2, and John Sum 3 1 Department of Computer Science and Communication Engineering, Providence University, Sha-Lu, Taiwan. ho@pu.edu.tw 2 Department
More informationNeural networks. Chapter 20. Chapter 20 1
Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms
More informationAN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009
AN INTRODUCTION TO NEURAL NETWORKS Scott Kuindersma November 12, 2009 SUPERVISED LEARNING We are given some training data: We must learn a function If y is discrete, we call it classification If it is
More informationArtificial Intelligence
Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory Announcements Be making progress on your projects! Three Types of Learning Unsupervised Supervised Reinforcement
More informationMultilayer Neural Networks
Multilayer Neural Networks Multilayer Neural Networks Discriminant function flexibility NON-Linear But with sets of linear parameters at each layer Provably general function approximators for sufficient
More informationRecursive Least Squares for an Entropy Regularized MSE Cost Function
Recursive Least Squares for an Entropy Regularized MSE Cost Function Deniz Erdogmus, Yadunandana N. Rao, Jose C. Principe Oscar Fontenla-Romero, Amparo Alonso-Betanzos Electrical Eng. Dept., University
More informationModified Learning for Discrete Multi-Valued Neuron
Proceedings of International Joint Conference on Neural Networks, Dallas, Texas, USA, August 4-9, 2013 Modified Learning for Discrete Multi-Valued Neuron Jin-Ping Chen, Shin-Fu Wu, and Shie-Jue Lee Department
More informationMachine Learning (CSE 446): Neural Networks
Machine Learning (CSE 446): Neural Networks Noah Smith c 2017 University of Washington nasmith@cs.washington.edu November 6, 2017 1 / 22 Admin No Wednesday office hours for Noah; no lecture Friday. 2 /
More informationVISION TRACKING PREDICTION
VISION RACKING PREDICION Eri Cuevas,2, Daniel Zaldivar,2, and Raul Rojas Institut für Informati, Freie Universität Berlin, ausstr 9, D-495 Berlin, German el 0049-30-83852485 2 División de Electrónica Computación,
More informationSupervised Learning. George Konidaris
Supervised Learning George Konidaris gdk@cs.brown.edu Fall 2017 Machine Learning Subfield of AI concerned with learning from data. Broadly, using: Experience To Improve Performance On Some Task (Tom Mitchell,
More informationA Novel Activity Detection Method
A Novel Activity Detection Method Gismy George P.G. Student, Department of ECE, Ilahia College of,muvattupuzha, Kerala, India ABSTRACT: This paper presents an approach for activity state recognition of
More informationNonlinear Classification
Nonlinear Classification INFO-4604, Applied Machine Learning University of Colorado Boulder October 5-10, 2017 Prof. Michael Paul Linear Classification Most classifiers we ve seen use linear functions
More informationGraphic Representation Method and Neural Network Recognition of Time-Frequency Vectors of Speech Information
Programming and Computer Software, Vol. 29, No. 4, 2003, pp. 210 218. Translated from Programmirovanie, Vol. 29, No. 4, 2003. Original Russian Text Copright 2003 b Zhirkov, Kortchagine, Lukin, Krlov, Baakovskii.
More informationLast update: October 26, Neural networks. CMSC 421: Section Dana Nau
Last update: October 26, 207 Neural networks CMSC 42: Section 8.7 Dana Nau Outline Applications of neural networks Brains Neural network units Perceptrons Multilayer perceptrons 2 Example Applications
More informationChapter 2 Single Layer Feedforward Networks
Chapter 2 Single Layer Feedforward Networks By Rosenblatt (1962) Perceptrons For modeling visual perception (retina) A feedforward network of three layers of units: Sensory, Association, and Response Learning
More informationEVALUATION OF COMPLEX PERMITTIVITIES OF MULTILAYER DIELECTRIC SUBSTRATES AT MICROWAVE FREQUENCIES USING WAVEGUIDE MEASUREMENTS
EVALUATION OF COMPLEX PERMITTIVITIES OF MULTILAYER DIELECTRIC SUBSTRATES AT MICROWAVE FREQUENCIES USING WAVEGUIDE MEASUREMENTS R. L. Crave, M. D. Deshpande, C. J. Redd 3, and P. I. Tiemsin () NASA Langle
More informationIntroduction: The Perceptron
Introduction: The Perceptron Haim Sompolinsy, MIT October 4, 203 Perceptron Architecture The simplest type of perceptron has a single layer of weights connecting the inputs and output. Formally, the perceptron
More informationIntroduction to Differential Equations. National Chiao Tung University Chun-Jen Tsai 9/14/2011
Introduction to Differential Equations National Chiao Tung Universit Chun-Jen Tsai 9/14/011 Differential Equations Definition: An equation containing the derivatives of one or more dependent variables,
More informationArtificial Neural Network
Artificial Neural Network Contents 2 What is ANN? Biological Neuron Structure of Neuron Types of Neuron Models of Neuron Analogy with human NN Perceptron OCR Multilayer Neural Network Back propagation
More informationVibration of Plate on Foundation with Four Edges Free by Finite Cosine Integral Transform Method
854 Vibration of Plate on Foundation with Four Edges Free b Finite Cosine Integral Transform Method Abstract The analtical solutions for the natural frequencies and mode shapes of the rectangular plate
More informationLecture 4: Feed Forward Neural Networks
Lecture 4: Feed Forward Neural Networks Dr. Roman V Belavkin Middlesex University BIS4435 Biological neurons and the brain A Model of A Single Neuron Neurons as data-driven models Neural Networks Training
More informationObject Recognition Using a Neural Network and Invariant Zernike Features
Object Recognition Using a Neural Network and Invariant Zernike Features Abstract : In this paper, a neural network (NN) based approach for translation, scale, and rotation invariant recognition of objects
More informationPattern Classification
Pattern Classification All materials in these slides were taen from Pattern Classification (2nd ed) by R. O. Duda,, P. E. Hart and D. G. Stor, John Wiley & Sons, 2000 with the permission of the authors
More informationThe perceptron learning algorithm is one of the first procedures proposed for learning in neural network models and is mostly credited to Rosenblatt.
1 The perceptron learning algorithm is one of the first procedures proposed for learning in neural network models and is mostly credited to Rosenblatt. The algorithm applies only to single layer models
More information