Mixed Transfer Function Neural Networks for Knowledge Acquisition

M. Imad Khan, Yakov Frayman and Saeid Nahavandi

Abstract: Modeling helps to understand and predict the outcome of complex systems. Inductive modeling methodologies are beneficial for modeling systems where the uncertainties involved do not permit an accurate physical model to be obtained. However, inductive models such as Artificial Neural Networks (ANNs) may suffer from a few drawbacks, including over-fitting and the difficulty of easily understanding the model itself. This can result in user reluctance to accept the model or even complete rejection of the modeling results. Thus, it becomes highly desirable to make such inductive models more comprehensible and to automatically determine the model complexity to avoid over-fitting. In this paper, we propose a novel type of ANN, a Mixed Transfer Function Artificial Neural Network (MTFANN), which aims to improve the complexity fitting and comprehensibility of the most popular type of ANN (MLP, a Multilayer Perceptron).

Index Terms: inductive modeling, neural networks, mixed transfer functions, over-fitting, model complexity

This work was supported by the Co-operative Research Centre for Alloy and Solidification Technology (CAST). M. I. Khan is with the Institute of Technology Research and Innovation (ITRI), Deakin University, Waurn Ponds Campus, Geelong 3217, Australia (e-mail: ik@deakin.edu.au). Y. Frayman is with the Institute of Technology Research and Innovation (ITRI), Deakin University, Burwood Campus, Elgar Road, Burwood, VIC 3125, Australia (e-mail: fra@deakin.edu.au). S. Nahavandi is with the Institute of Technology Research and Innovation (ITRI), Deakin University, Waurn Ponds Campus, Geelong 3217, Australia (e-mail: nahavand@deakin.edu.au).

I. INTRODUCTION

Inductive modeling methodologies are useful where the uncertainties involved in the system do not permit an accurate physical model to be obtained. Artificial Neural Networks (ANNs) are one such method, successfully applied to model a wide variety of problems by exploiting their universal approximation property [1, 2, 3]. The most commonly used type of ANN is the Multi-Layer Perceptron (MLP) which, although powerful, can make it difficult to answer basic questions such as how the model was learnt and what it has learnt, due to the use of complex transfer functions like sigmoids, which can also lead to over-fitting. This can result, and has been observed in practice, in user reluctance to accept the model or even a complete rejection of the modeling results. The difficulty of understanding what and how the MLP model has learnt about the problem stems from the fact that the knowledge represented by the MLP model is concentrated in the weights and the transfer functions (TFs) of its neurons. However, weights, being numbers, are not easily interpretable. The TFs used in current MLP networks, being complex functions, are capable of approximating complex problems with a fair number of neurons and layers, but are also not easily interpretable. Normally, all neurons in an MLP network use the same transfer function (e.g. sigmoid or hyperbolic tangent), which also limits the model flexibility and can lead to over-fitting. It is highly desirable to make such neural network models more comprehensible and to automatically determine the appropriate complexity of the model to avoid over-fitting. An improvement in comprehensibility has the potential to help in understanding the underlying relationships between the inputs and the outputs of the system, which can improve existing knowledge about the modeled system. In the remainder of this paper we describe the current mono-transfer function MLP and the proposed MTFANN.
We also present an overview of existing knowledge extraction methods for MLP networks and demonstrate the expected benefits of using MTFANN on a numerical example from DELVE.

II. MONO-TRANSFER FUNCTION MULTI-LAYER PERCEPTRON

The Multi-Layer Perceptron, the most common type of Artificial Neural Network, is a numeric data processing structure consisting of connections and nodes that uses a divide-and-conquer strategy to process the information it receives from its environment. Mathematically this data structure can be represented as:

y_p = \left[ \Phi\left( \sum_{i=1}^{n} x_i \theta_{ij} \right) \right]_{\gamma = 1,\ldots,L;\; \eta = 1,\ldots,N_L} + r_p, \quad p = 1,\ldots,N    (2.1)

Here y is an output, p is the pattern number, r represents noise and n is the number of output components (neurons in the output layer); θ is the parameter of the model, Φ is the transfer function, γ indexes the layers (γ = 1, ..., L), η indexes the neurons of a layer (η = 1, ..., N_L), i and j are the indexes of neurons in the current and the next layer under consideration, and θ_ij is the parameter associated with the i-th and j-th neurons. Equation (2.1) represents a connection from i to j with the parameter value (weight) θ_ij, which represents the strength of the connection between the two neurons.
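As a concrete illustration of (2.1), the following minimal NumPy sketch (layer sizes, weights and the choice of a sigmoid Φ are arbitrary here, not taken from the paper) computes the output of a single-hidden-layer mono-transfer-function MLP, where every neuron applies the same Φ to the weighted sum of its inputs:

```python
import numpy as np

def sigmoid(z):
    # One fixed transfer function Phi shared by every neuron (cf. eq. 2.5).
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, theta_hidden, theta_output):
    """Forward pass of a mono-transfer-function MLP (illustrative sketch only).

    x            : (n_inputs,) input pattern
    theta_hidden : (n_hidden, n_inputs) weights from inputs to the hidden layer
    theta_output : (n_outputs, n_hidden) weights from the hidden to the output layer
    """
    h = sigmoid(theta_hidden @ x)    # hidden layer: Phi(sum_i x_i * theta_ij)
    y = sigmoid(theta_output @ h)    # output layer: same Phi applied again
    return y

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
print(mlp_forward(x, rng.normal(size=(5, 8)), rng.normal(size=(1, 5))))
```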

The notation can be explained further. y_p is a combination of the mathematical expectation of the output and the noise:

y_p = E(y_p) + r_p    (2.2)

y_p = R(x_p) + r_p, \quad r_p = y_p - E(y_p)    (2.3)

The mathematical expectation is regressed [4] as a regression function R(x), while the noise r is added as it is to each modeled output, for both the expectation and the regression forms, equations (2.2) and (2.3). Ideally, the problem is modeled as the regression expression R in equation (2.3), and the noise generally results in a prediction error in an MLP model of the problem. The MLP uses a divide-and-conquer strategy and applies TFs to the weighted sums passed from one layer to the next to reach the solution after being trained. Several examples of data elements, or patterns, are shown to the network during the training session until the network reaches a desired level of accuracy determined by the energy function, by applying a training/learning algorithm. In the case of the most commonly used feed-forward MLP, the back-propagation training algorithm [5] is normally used. The weights are modified during learning as determined by the training algorithm. The energy functions commonly used are the Sum of Squared Error (SSE), the Mean Squared Error (MSE) and the Root Mean Squared Error (RMSE):

\mathrm{RMSE} = \sqrt{ \frac{ \sum_{i=1}^{N} (o_i - y_i)^2 }{ N } }    (2.4)

Here N is the total number of data elements passed through the network. Current types of MLPs normally have a fixed TF in any one layer; Φ in equation (2.1) is the fixed TF. The fixed TFs generally used are either the sigmoid:

f(x) = \frac{1}{1 + e^{-x}}    (2.5)

or the hyperbolic tangent function:

f(x) = \tanh(x)    (2.6)

The major problem with these kinds of networks is their black-box behavior. Such MLPs have the capability to model a problem quite well, but their acceptability is often questioned by domain experts due to the difficulty of comprehending the network.

III. EXISTING KNOWLEDGE EXTRACTION METHODS FROM NEURAL NETWORKS

To address the lack of comprehensibility of MLPs, several knowledge extraction methods have been developed to convert mono-transfer function MLP models into a more user-friendly format. Most of this work is based on the acquisition of symbolic IF-THEN and/or predicate logic rules, although attempts have also been made to extract more equation-like rules [6, 7]. For example, attempts have been made to extract the knowledge from MLPs [7] and to redesign MLPs to simplify them [8]. Setiono et al. [6], for example, have extracted linear regression rules by a linear approximation of the sigmoid TF. The obtained rules consisted of an antecedent in the data domain and a consequent in the form of a linear regression related to the cluster of data represented by the antecedent. Towell and Shavlik's [9] subset algorithm attempted to find all subsets of weights that exceed the bias of a given neuron. However, this algorithm faces a computational complexity problem: an increase in the size of the network is accompanied by an increase in the number of weights and hence an increase in the search space. To tackle the search space complexity problem, heuristics were applied that, while able to substantially reduce the algorithmic search complexity, can result in incomplete and unsound rules. Additionally, a good heuristic requires sound knowledge of the problem domain, which is not always available and is a primary motivator for applying inductive modeling approaches in the first place. Garcez, Broda and Gabbay [10] have applied a partial ordering to the input vector set and then used pedagogical extraction with pruning of the search space to reduce the search space complexity. A pedagogical approach to rule extraction still treats the network as a black box and extracts rules by querying it.
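As a toy illustration of this pedagogical, query-based idea (the black-box model and the probing scheme below are our own placeholders, not the algorithm of [10]), one can probe a trained model along one input dimension and record where its predicted class flips, which directly yields a simple threshold rule:

```python
import numpy as np

def black_box(x):
    # Stand-in for a trained network's decision function (illustrative only).
    return 1 if 0.8 * x[0] - 0.5 * x[1] > 0.2 else 0

def query_thresholds(model, base, dim, grid):
    """Query the black box along one input dimension and report the points
    where the predicted class changes (a pedagogical-style probe)."""
    flips = []
    prev = model(np.array(base, dtype=float))
    for v in grid:
        probe = np.array(base, dtype=float)
        probe[dim] = v
        cur = model(probe)
        if cur != prev:
            flips.append((float(v), prev, cur))
            prev = cur
    return flips

# Scan input 0 while holding input 1 fixed at 0: the flip point becomes a rule
# of the form "IF x0 > 0.25 THEN class 1".
print(query_thresholds(black_box, [0.0, 0.0], dim=0, grid=np.linspace(-1, 1, 201)))
```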
Garcez et al. [10] support their approach with strong theorems proving that the extraction algorithm is sound and complete for regular networks, which are defined as networks whose entire set of weights from the hidden to the output layer is all positive or all negative, which is not always the case in reality. The authors further devise a methodology to extend their algorithm to non-regular networks. They define Basic Neural Structures (BNS) and then prove a series of theorems to conclude that their method is sound and complete. A BNS is a simple form of MLP with no hidden neurons and only one output neuron; it is basically defined to be extracted as a sub-part of an MLP. Mathematically it is defined as follows: let N be a neural network with m input neurons, r hidden neurons and q output neurons. A sub-network N_0 of N is a Basic Neural Structure (BNS) if and only if either N_0 contains exactly m input neurons, 1 hidden neuron and 0 output neurons of N, or N_0 contains exactly 0 input neurons, r hidden neurons and 1 output neuron of N, depending on which layer of N is under consideration to be modelled as a BNS. Note that a BNS is considered not to have any hidden layers of its own, but may contain hidden layer neurons from the parent network. The number of BNS in a network is the sum of the number of hidden layer and output layer neurons, r + q.
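To make the BNS decomposition concrete, the short sketch below (our own illustration, not code from [10]) enumerates the r + q basic neural structures of a single-hidden-layer m-r-q network: one BNS per hidden neuron, fed by the m inputs, and one per output neuron, fed by the r hidden neurons:

```python
def basic_neural_structures(m, r, q):
    """List the BNS of an m-r-q network as (input, hidden, output) neuron counts."""
    hidden_bns = [(m, 1, 0)] * r   # one BNS per hidden neuron: m inputs -> 1 hidden
    output_bns = [(0, r, 1)] * q   # one BNS per output neuron: r hidden -> 1 output
    return hidden_bns + output_bns

bns = basic_neural_structures(m=8, r=5, q=1)
print(len(bns), bns)   # len(bns) == r + q == 6
```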

The work by Setiono and Azcarraga [11], Setiono, Leow and Zurada [6], and Saito and Nakano [12] is interesting from the function approximation point of view. The algorithm of Setiono et al. [6, 11] consists of pruning the final network to obtain a simplified model, clustering the data, and then approximating the transfer function with piecewise linear functions, with IF-THEN rules capturing the non-monotonicity of the network and the problem domain. A finer division of the input space results in more accurate rules but significantly increases the number of rules. Normally a balance should be maintained between the number of rules and their complexity. The authors assumed that the comprehensibility of the rules lies in their simplicity, measured by the number of rules versus their accuracy; however, simplicity can be difficult to define in the context of rule extraction. For example, the degree of simplicity depends on the target audience of the extracted rules. The comprehensibility of the rules may also depend on the rule form, such as, for example, crisp propositional/predicate logic rules [7], fuzzy rules [13, 14], regression-equation-based rules [6], Mealy machine automata [15], finite state automata [16] and decision trees [17]. Hence it can be concluded that the comprehensibility of the rules depends on their form as well as their audience. Local function networks such as Radial Basis Function (RBF) networks have a suitable architecture for rule extraction, but the task of extraction is not trivial because of the nonlinear functions or high input dimensionality [18] and the fact that hidden units are shared across several output classes or may not contribute to any of them. The REX and hREX rule extraction algorithms presented by the authors in [18] extract rules from hidden neurons associated with a single output class and from shared hidden neurons, respectively. It is possible to improve these algorithms by negating the sign of a negative weight (by taking its absolute value) and applying a negation to the activation value of the hidden neuron from which the negative weight connection originated, which is similar to the regularization of network weights in [10]. The idea of dropping a negative weight altogether is not very elegant, as it can result in the loss of some valuable knowledge represented by the MLP model. In the case of RBF networks with shared hidden units, this can affect all the formats of the rules. For example, consider the case where hidden units H_1 ... H_n are associated with class A, i.e. we know that the pattern belongs to class A if these units are activated. The current hREX algorithm [18], if used for the extraction of inequations, will result in inequations of the form:

If (H_1 AND H_2 AND ... AND H_n) then Class A    (3.1)

If (w_1 h_1 + w_2 h_2 + ... + w_n h_n >= T_A) then Class A    (3.2)

Here h_1 ... h_n represent the activations of the respective neurons H_1 ... H_n, w_1 ... w_n are the weights connecting the shared neurons H_1 ... H_n with the class A output neuron, and T_A is the threshold required to classify a pattern as class A. Now, we argue, contrary to [18], that negative weights are important because they can lead to the preservation of some important knowledge. To show our argument, let us suppose that w_2 is negative; the last rule then becomes:

Let V = -w_2    (3.3)

If (w_1 h_1 + V h_2 + ... + w_n h_n >= T_A) then Class A    (3.4)

If (w_1 h_1 + w_2 (-h_2) + ... + w_n h_n >= T_A) then Class A    (3.5)

Here it can be seen that a negative value in the activation (or in the input data itself) can be excitatory, provided that there are negative weight values connecting it with the next layer. This is an important discovery about the network's inputs and negative activation values that has been disregarded by the current hREX extraction algorithm. We can further simplify the inequality, writing it in a more formal and simpler form, if we substitute variables x, y, z in place of the activation values (or inputs) and replace the weight values with constants a, b, c:

If (ax + by + cz >= T_A) then Class A    (3.6)

It can be noticed that the knowledge that a negative input (or activation) can be excitatory is of significant importance, because it could be the case that the network has never seen a negative value during training. If we can extract this information from a trained network, it can serve as a warning that an incoming negative value can result in an unreliable model prediction.
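The point can be checked numerically in two lines (the weights, activations and threshold below are arbitrary illustrative values): with a negative weight on a connection, a negative activation on that same connection raises the weighted sum in (3.2) and can tip the rule into firing.

```python
import numpy as np

w = np.array([0.7, -0.4, 0.9])       # w2 is negative
T_A = 0.8                            # class-A threshold

h_pos = np.array([0.6, 0.5, 0.3])    # positive activation on unit 2
h_neg = np.array([0.6, -0.5, 0.3])   # negative activation on unit 2

print(w @ h_pos, (w @ h_pos) >= T_A)   # ~0.49 -> rule does not fire
print(w @ h_neg, (w @ h_neg) >= T_A)   # ~0.89 -> the negative h2 is excitatory
```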
From the above discussion it can be seen that the area of extracting knowledge from MLP networks uses advanced algorithms which are computationally intensive, which can lead to sacrificing some modelling accuracy for computational feasibility. It can be argued that there is a need for a scheme that trivializes knowledge extraction from the network, so that there is minimal or no need for a special knowledge extraction algorithm. It is also necessary to use an automatic network construction mechanism, since it is important to obtain a network model of the lowest complexity, which adds to the comprehensibility of the model and also helps to avoid over-fitting. The proposed neural network and the construction algorithm that address the above aims are presented in the next section.

IV. MIXED TRANSFER FUNCTION ARTIFICIAL NEURAL NETWORK

If we remove the limitation of using only a mono-transfer function in an MLP network and instead use a number of transfer functions of various complexity within the same network, the knowledge extraction algorithm can become minimal, and the result has the potential to be more comprehensible than the methods discussed in the previous section. Consequently, we propose a novel type of MLP, a Mixed Transfer Function Artificial Neural Network (MTFANN), which aims to improve the complexity fitting and comprehensibility of the most popular type of MLP, the mono-transfer function feed-forward network (FFN) described previously.

The main motivation for MTFANN is to create a neural network which is comprehensible, capable of modeling a wide range of problems, and at least comparable to the current MLP in terms of accuracy and generalization. One important goal is to maintain the accuracy of the model, as opposed to existing knowledge extraction methods for neural networks, which generally compromise accuracy for higher comprehensibility. The nature of the problem is that it is generally very difficult to obtain greater comprehensibility of a neural network model without losing some accuracy. Thus it is very important that MTFANN is supplemented with an automatic network construction algorithm to ensure a network architecture of minimal complexity (number of nodes and layers) that can solve the problem with a high degree of accuracy [19] (referred to as the optimal architecture in most of the existing literature). The optimal network architecture ensures good generalization [19] and a minimal number of components of acquired knowledge (equations, rules, etc.) in the knowledge representation, which in turn boosts comprehensibility. MTFANN essentially has the same properties as described in Section II, but with the use of TFs of different approximation complexities in a hidden layer:

y_p = \left[ \Phi_\eta\left( \sum_{i=1}^{n} x_i \theta_{ij} \right) \right]_{\gamma = 1,\ldots,L;\; \eta = 1,\ldots,N_L} + r_p, \quad p = 1,\ldots,N    (4.1)

Φ_η in the above equation represents TFs of various approximation complexities, while all the other symbols are essentially the same as in Section II. The TFs that can be used are, for example, linear, polynomial, logarithmic, exponential, sigmoid, hyperbolic tangent and so on. Simply speaking, any computable mathematical function can be used as a TF. The proposed network construction algorithm is a Transfer Function Selection and Allocation Algorithm. The algorithm begins with a dataset D defined as:

D = \{ (x_p, y_p) \}, \quad p = 1,\ldots,N, \quad N = \mathrm{Size}(D)    (4.2)

The domain and range sets can be separated from D. For inputs:

I = \{ x_m \mid m = 1, 2, 3, \ldots, l \}    (4.3)

For outputs:

Y = \{ y_n \mid n = 1, 2, 3, \ldots, o \}    (4.4)

The output set is an expectation of the output Y and the noise:

y_p = E(y_p) + r_p    (4.5)

y_p = R(x_p) + r_p    (4.6)

The problem can be formulated as a search problem over the set of available transfer functions, which in general is infinite:

T = \{ \varphi_1, \varphi_2, \varphi_3, \ldots \}    (4.7)

We can take a partially ordered finite subset of T over the complexity domain, which means that the set is ordered according to the complexity of the TFs, which can be a mathematical complexity or comprehensibility. Generally, comprehensibility and mathematical complexity go together.

T_q = \{ \varphi_1, \varphi_2, \varphi_3, \ldots, \varphi_q \}    (4.8)

We define the TF selection operator Γ, operating on its four parameters and resulting in a TF, which in the general case is defined as:

\Gamma(I, Y, T_q, \varepsilon) = \varphi_a, \quad \varphi_0 \le \varphi_a \le \varphi_q    (4.9)

\Gamma(x_p, r_p, T_q, S(\varepsilon)) = \varphi_a, \quad l \le a \le q    (4.10)

\Gamma_k(x_p, r_p, T_k, S(\varepsilon)) = \varphi_k    (4.11)

\Gamma_{k+1}(x_p, r_p, T_{k+1}, S(\varepsilon)) = \varphi_{k+1}, \quad l \le k+1 \le q    (4.12)

If required, a network simplification can be done on the basis of the following criteria:

N(\varphi_{k+1}) = \varphi_k \quad \text{if} \quad \varphi_{k+1} - \varphi_k \to 0    (4.13)

\varphi_{k+1} - \varphi_k > 0, \quad \varphi_0 \le \varphi_k \le \varphi_{k+1} \le \varphi_q    (4.14)

Thus, MTFANN contains neurons with transfer functions (TFs) of different approximation complexities, in contrast to the TFs of the same approximation complexity widely used in sigmoid and hyperbolic tangent based FFNs. MTFANN uses an add-node based algorithm that automatically constructs the network, starting with simple transfer functions such as linear ones and iteratively adding nodes with increasing TF complexity as required. Due to this add-node strategy, MTFANN is less likely to over-fit the data in comparison with the current sigmoid and hyperbolic-tangent FFNs.
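A minimal sketch of the add-node idea is given below. It assumes a crude random-search fit of each candidate node against the current residual and a fixed 5% relative error-reduction threshold as the retention test; these particular choices (and the three-function candidate list) are ours for illustration only and are not the Transfer Function Selection and Allocation Algorithm itself.

```python
import numpy as np

# Candidate transfer functions, ordered from simple to more complex (cf. T_q in (4.8)).
TRANSFER_FUNCTIONS = [
    ("linear", lambda z: z),
    ("sin",    np.sin),
    ("exp",    np.exp),
]

def fit_node(X, residual, tf, tries=200):
    """Fit one hidden node w_out * tf(X @ w_in) to the current residual by
    random search over w_in with a least-squares output weight."""
    rng = np.random.default_rng(0)
    best, best_err = None, np.inf
    for _ in range(tries):
        w_in = rng.normal(size=X.shape[1])
        h = tf(X @ w_in)
        w_out = float(h @ residual) / (float(h @ h) + 1e-12)
        err = float(np.mean((residual - w_out * h) ** 2))
        if err < best_err:
            best, best_err = (w_in, w_out), err
    return best, best_err

def build_mtfann(X, y, improvement=0.05, max_nodes=10):
    """Add nodes in order of increasing TF complexity; keep a node only if it
    reduces the residual error by at least `improvement`, otherwise discard it."""
    residual, nodes = y.copy(), []
    err = float(np.mean(residual ** 2))
    for name, tf in TRANSFER_FUNCTIONS:
        while len(nodes) < max_nodes:
            (w_in, w_out), new_err = fit_node(X, residual, tf)
            if new_err > (1.0 - improvement) * err:
                break                      # no significant gain: node is not kept
            nodes.append((name, w_in, w_out))
            residual = residual - w_out * tf(X @ w_in)
            err = new_err
    return nodes, err

# Usage on a toy nonlinear target; the printed list shows which TFs were retained.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 0.5 * X[:, 0] + np.sin(2.0 * X[:, 1]) + 0.05 * rng.normal(size=200)
nodes, final_err = build_mtfann(X, y)
print([name for name, _, _ in nodes], final_err)
```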
The usage of mixed TFs permits obtaining input-output relationships in the form of regression equations without the need for any specific knowledge extraction algorithm. It also makes it possible to compare the influence of inputs on the outputs with respect to the degree of transfer complexity, which cannot be done if all the neurons in the network have the same TFs, as is the case with current FFNs. Thus, MTFANN provides us with a framework for (i) fitting the complexity of the system being modeled from lower to higher, (ii) straightforward knowledge acquisition to obtain relationships between inputs and outputs in the problem domain, and (iii) increased comprehensibility and thus increased user acceptability of inductive modeling results.
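As a small illustration of point (ii), a mixed-TF model stored simply as a list of named nodes with their weights can be rendered directly as a regression equation string, with no separate extraction algorithm; the data structure and the hand-made example model below are our own, shown only to make the claim concrete.

```python
import numpy as np

def model_to_equation(nodes, n_inputs):
    """Render a mixed-TF model (list of (tf_name, w_in, w_out)) as an additive
    regression equation over inputs x1..xn."""
    xs = [f"x{i + 1}" for i in range(n_inputs)]
    terms = []
    for name, w_in, w_out in nodes:
        inner = " ".join(f"{w:+.3f}*{x}" for w, x in zip(w_in, xs))
        body = f"({inner})" if name == "linear" else f"{name}({inner})"
        terms.append(f"{w_out:+.3f}*{body}")
    return "y = " + " ".join(terms)

# Illustrative hand-made model: one linear node and one sine node.
nodes = [
    ("linear", np.array([0.62, -0.59, 0.84]), 0.22),
    ("sin",    np.array([0.03, -0.16, 9.95]), 1.36),
]
print(model_to_equation(nodes, n_inputs=3))
```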

V. A NUMERICAL EXAMPLE

To evaluate both the performance of MTFANN and the quality of the extracted knowledge, the Pumadyn family of data sets from DELVE [20] was used. The Pumadyn family of datasets is a realistic simulation of the dynamics of a Puma 560 robot arm. The task is to predict the angular acceleration of one of the robot arm's links. The inputs include angular positions, velocities and torques of the robot arm. These data sets have been specifically generated for the DELVE environment, so the individual data sets span the corners of a cube whose dimensions represent: (a) the number of inputs (8 or 32); (b) the degree of non-linearity (fairly linear or non-linear); (c) the amount of noise in the output (moderate or high). DELVE defines the amount of noise in the output as the fraction of the variance that would remain unexplained if a universal approximator were used on an infinite training set. If this residual variance exceeds 25%, the noise is considered high. A task is defined by DELVE as highly non-linear if a linear method would leave more than 40% of the residual variance unexplained on an infinite training set. In this work, we have used the Pumadyn-8nh dataset, where 8 stands for 8 inputs and nh stands for high non-linearity and high noise.

The MTFANN construction algorithm of the previous section was applied, beginning with a simple linear node. Then another linear node was added. The addition of more linear nodes did not result in further error reduction. Then a linear node with bias was added. The addition of the linear-with-bias node did not result in a significant reduction of error, so it was removed. The first non-linear Sin node was then added, which resulted in a significant decrease in error. After the addition of two further Sin nodes, a fourth Sin node did not result in any significant reduction in error and was removed. The addition of an Exponential node also did not result in any significant reduction in error. Thus the final model contained two linear and three Sin neurons.

The performance of MTFANN was also compared with eight other machine learning methods from DELVE. The methods used are me-ese-1 (mixtures of experts trained with early stopping), hme-ese-1 (hierarchical mixtures of experts trained with early stopping), lin-1 (linear regression with least squares), me-el-1 (mixtures of experts trained with Bayesian methods, ensemble learning), mlp-ese-1 (ensembles of multilayer perceptrons using early stopping, with a single hidden layer, hyperbolic tangent transfer functions, linear output neurons and a conjugate gradient learning mechanism), mars3.6-bag-1 (Multivariate Adaptive Regression Splines (MARS) version 3.6 with bagging), hme-grow-1 (hierarchical mixtures of experts trained using growing and early stopping) and hme-el-1 (hierarchical mixtures of experts trained with Bayesian methods, ensemble learning). Table 1 shows that MTFANN performs better than almost all the other machine learning methods on the Puma dataset with high non-linearity and high noise. The only exception is the MLP ensemble. The MTFANN residual error is still lower than that of the MLP ensemble; however, the t-test does not demonstrate a significant difference between the two methods. However, an MLP ensemble consists of multiple neural networks of highly complex neurons (hyperbolic tangents) and is thus very difficult to comprehend, in contrast to the proposed MTFANN, as we demonstrate below.
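The t-test column in Table 1 below compares each DELVE method against MTFANN. The exact test procedure is not spelled out in the text, so the snippet below simply assumes a paired t-test on per-case squared errors (SciPy's ttest_rel) over synthetic stand-in data, to show how such a p-value can be produced:

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(2)

# Per-case squared errors of two hypothetical models on the same 100 test cases.
sq_err_a = rng.normal(scale=0.3, size=100) ** 2
sq_err_b = rng.normal(scale=0.5, size=100) ** 2

print("SSE A:", sq_err_a.sum(), "  SSE B:", sq_err_b.sum())
t_stat, p_value = ttest_rel(sq_err_a, sq_err_b)
print("paired t-test p-value:", p_value)
```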
  #  Method         SSE          t-test
  1  MTFANN          9.968650    N/A
  2  me-ese-1       12.837039    0.000342092448
  3  hme-ese-1      14.82068     0.000602500055
  4  lin-1          19.9536449   0.000078556
  5  me-el-1        11.3862586   0.0558037
  6  mlp-ese-1      10.8294249   0.099372097
  7  mars3.6-bag-1  10.8294249   0.023404077
  8  hme-grow-1     16.8052      0.002628025
  9  hme-el-1       11.4985066   0.00423040496

Table 1: Performance of MTFANN against other methods on the Pumadyn-8nh dataset from DELVE.

To convert the obtained network for the Pumadyn-8nh problem into the comprehensible regression equation format, it is useful to first divide the network so as to obtain a regression equation from each regressor (a hidden neuron), and then to add those equations together to obtain the complete representation of the acquired model in the regression equation format. For example, to obtain the equation from the 1st linear regressor (the 1st hidden neuron), the network can be represented as in Figure 1.

Figure 1: The network view to obtain the regression equation for the output of the first hidden neuron.

From Figure 1, the following expression (the equation for the first neuron) can be extracted:

y_1 = 0.22073 (-0.47556 x_8 - 0.51226 x_1 + 0.62305 x_2 - 0.59345 x_7 + 0.2887 x_6 - 0.35547 x_5 + 0.42708 x_4 + 0.84282 x_3)

Thus we can obtain a regression equation that links the first component of the angular acceleration of the robot arm with the angular positions of the respective links (x_1, x_2, x_3), the angular velocities of links 1-3 (x_4, x_5, x_6), and the torques at joints 1 and 2 (x_7, x_8), respectively.
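To show how such an extracted equation is used on its own, the snippet below evaluates the first hidden neuron's contribution for an arbitrary input vector, with the coefficients taken from the equation above in x_1..x_8 order; the input values are made up purely for illustration.

```python
import numpy as np

# Coefficients of the first (linear) hidden neuron, ordered x1..x8, and the
# output weight, as printed in the equation above.
w_in = np.array([-0.51226, 0.62305, 0.84282, 0.42708,
                 -0.35547, 0.2887, -0.59345, -0.47556])
w_out = 0.22073

x = np.array([0.1, -0.2, 0.05, 0.3, 0.0, -0.1, 0.25, -0.15])  # arbitrary x1..x8
print("first-neuron contribution:", w_out * float(w_in @ x))
```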

The equation for the second linear hidden neuron can be obtained in the same fashion. The regression equation obtained from the third hidden neuron is a sine-based equation, since the neurons from this position onwards contain sine transfer functions:

y_3 = 1.357 SIN(0.1 (0.03297 x_8 - 0.585 x_7 - 0.04995 x_6 - 0.5068 x_5 - 0.24 x_4 + 9.95252 x_3 + 0.5742 x_1 + 0.0606 x_2 + 0.0575))

Here y_3 represents the third component of the angular acceleration of the robot arm. The factor of 0.1 is a sine factor in the implementation of the sine function in the Stuttgart Neural Network Simulator (SNNS) used in the experiments. Two further sine-based equations can be obtained from the other Sin-based neurons in a similar fashion. The final equation, which is an additive combination of the equations for each hidden neuron, while being long, consists of simple, easily comprehensible elements.

Additionally, we have compared MTFANN with a sigmoid MLP of the same complexity (5 hidden neurons), since Table 1 apparently shows that MTFANN is similar in modeling accuracy to an ensemble of MLPs, which has a much lower comprehensibility than MTFANN. The results of these experiments are summarized in Table 2.

  Method       SSE        t-test
  MTFANN       9.968650   0.000658435
  Sigmoid MLP  25.02799

Table 2: Comparison between MTFANN and an MLP of the same complexity (5 hidden nodes) on the Pumadyn-8nh dataset from DELVE.

VI. CONCLUSION

In this paper, we have proposed a novel type of ANN, a Mixed Transfer Function Artificial Neural Network (MTFANN), which aims to improve the complexity fitting and comprehensibility of the most popular type of ANN (MLP, a Multilayer Perceptron). The results of applying MTFANN to a realistic example of the dynamics of a Puma 560 robot arm with a high degree of non-linearity and high noise have demonstrated its ability to obtain a highly accurate model, which is in most cases much more accurate than the more complex machine learning methods used for comparison. At the same time, the simple regression equations obtained from the MTFANN model have also shown the effectiveness of the simple knowledge extraction approach presented, which is able to convert the model into a simple and comprehensible format without any loss in modeling accuracy. Finally, it was shown that MTFANN is able to obtain a model with much better accuracy than a comparable MLP model of the same complexity.
VII. REFERENCES

[1] K. Hornik, M. Stinchcombe and H. White, "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks," Neural Networks, vol. 3, pp. 551-560, 1990.
[2] M. I. Khan, Y. Frayman and S. Nahavandi, "High-Pressure Die-Casting Process Modeling Using Neural Networks," in J. Kamruzzaman, R. Begg and R. Sarker (eds), Artificial Neural Networks in Finance and Manufacturing, pp. 82-98, 2006.
[3] I. Gabrijel and A. Dobnikar, "On-line identification and reconstruction of finite automata with generalized neural networks," Neural Networks, vol. 16, pp. 101-120, 2003.
[4] I. Rivals and L. Personnaz, "Neural-network construction and selection in nonlinear modeling," IEEE Transactions on Neural Networks, vol. 14, no. 4, pp. 804-819, 2003.
[5] D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning Internal Representations by Error Propagation," MIT Press.
[6] R. Setiono, W. K. Leow and J. M. Zurada, "Extraction of Rules from Artificial Neural Networks for Nonlinear Regression," IEEE Transactions on Neural Networks, vol. 13, no. 3, 2002.
[7] R. Andrews, J. Diederich and A. Tickle, "Survey and critique of techniques for extracting rules from trained artificial neural networks," Knowledge-Based Systems, vol. 8, no. 6, pp. 373-389, 1995.
[8] B. J. Park, S. K. Oh and W. Pedrycz, "The Hybrid Multi-layer Inference Architecture and Algorithm of FPNN Based on FNN and PNN," 9th IFSA World Congress, pp. 1361-1366, 2001.
[9] G. Towell and J. Shavlik, "The extraction of refined rules from knowledge-based neural networks," Machine Learning, vol. 13, no. 1, pp. 71-101, 1993.
[10] A. S. d'Avila Garcez, K. Broda and D. M. Gabbay, "Symbolic knowledge extraction from trained neural networks: A sound approach," Artificial Intelligence, vol. 125, pp. 155-207, 2001.
[11] R. Setiono and A. Azcarraga, "Generating concise sets of linear regression rules from Artificial Neural Networks," International Journal of Artificial Intelligence Tools, vol. 11, no. 2, pp. 189-202, 2002.
[12] K. Saito and R. Nakano, "Extracting regression rules from neural networks," Neural Networks, vol. 15, pp. 1279-1288, 2002.
[13] W. Duch, R. Adamczak and K. Grabczewski, "A new methodology of extraction, optimization and application of crisp and fuzzy logical rules," IEEE Transactions on Neural Networks, vol. 11, no. 2, 2000.
[14] J. M. Besada-Juez and M. A. Sanz-Bobi, "Extraction of Fuzzy Rules Using Sensibility Analysis in a Neural Network," ICANN 2002, pp. 395-400.
[15] P. Tino and J. Sajda, "Learning and extracting initial mealy automata with a modular neural network model," Neural Computation, vol. 7, no. 4, pp. 822-844, 1995.
[16] C. L. Giles, C. B. Miller, D. Chen, H. H. Chen, G. Z. Sun and Y. C. Lee, "Learning and extracting finite state automata with second-order recurrent neural networks," Neural Computation, vol. 4, no. 3, pp. 393-405, 1992.
[17] M. W. Craven and J. W. Shavlik, "Extracting Tree-Structured Representations of Trained Networks," Advances in Neural Information Processing Systems, vol. 8, MIT Press, 1996.
[18] K. McGarry, S. Wermter and J. MacIntyre, "The Extraction and Comparison of Knowledge from Local Function Networks," International Journal of Computational Intelligence and Applications, vol. 8, no. 10, 2001.
[19] N. K. Treadgold and T. D. Gedeon, "Exploring Constructive Cascade Networks," IEEE Transactions on Neural Networks, vol. 10, no. 6, 1999.
[20] http://www.cs.toronto.edu/~delve/