Journal of Mathematics and Statistics 2 (2): 395-400, 2006
ISSN 1549-3644
© 2006 Science Publications

Fuzzy Automata Induction Using Construction Method

1,2 Mo Zhi Wen and 2 Wan Min
1 Center of Intelligent Control and Development, Southwest Jiaotong University, China
2 College of Mathematics and Software Science, Sichuan Normal University, China

Abstract: Recurrent neural networks have recently been demonstrated to have the ability to learn simple grammars. In particular, networks using second-order units have been successful at this task. However, it is often difficult to predict the optimal neural network size for inducing an unknown automaton from examples. Instead of just adjusting the weights in a network of fixed topology, we adopt dynamic networks (i.e., networks whose topology and weights can be changed simultaneously during training) for this application. We apply the idea of maximizing correlation from the cascade-correlation algorithm to a second-order single-layer recurrent neural network to generate a new construction algorithm and use it to induce fuzzy finite-state automata. Experiments indicate that such a dynamic network performs well.

Key words: Fuzzy automata, construction method, dynamic network

INTRODUCTION

Choosing an appropriate architecture for a learning task is an important issue in training neural networks. Because general methods for determining a good network architecture prior to training are generally lacking, algorithms that adapt the network architecture during training have been developed. A constructive algorithm determines both the architecture of the network and the parameters (weights, thresholds, etc.) necessary for learning the data. Compared to algorithms such as backpropagation, constructive algorithms have the following potential advantages:
* They grow the architecture of the neural network in addition to finding the parameters required.
* They can reduce training to single-node learning, substantially simplifying and accelerating the training process.
Constructive algorithms such as cascade-correlation can achieve at least an order of magnitude improvement in learning speed over backpropagation.
* Some constructive algorithms are guaranteed to converge, unlike, for example, the backpropagation algorithm.
* Algorithms such as backpropagation can suffer from catastrophic interference when learning new data; that is, the storage of new information seriously disrupts the retrieval of previously stored data. The incremental learning strategy of constructive algorithms offers a possible solution to this problem.

Rather than growing neural networks, destructive or pruning [1] algorithms remove neurons or weights from large trained networks and retrain the reduced networks in an attempt to improve their generalization performance. Approaches include removing the smallest-magnitude or least sensitive weights, or adding a penalty term to the energy function to encourage weight decay. However, it is difficult to guess an initial network that is guaranteed to solve the problem.

Recurrent neural networks have been demonstrated to have the ability to learn simple grammars [2-6]. However, the problem associated with recurrent networks of fixed topology is apparent: the number of hidden neurons must be chosen empirically, as there is no general method for determining an appropriate topology for a specific application. Consequently, convergence cannot be guaranteed. Constructive or destructive methods that add or subtract neurons, layers, connections, etc. may offer a solution to this problem, and the two approaches are complementary. We propose a construction algorithm for second-order single-layer recurrent neural networks that adds one second-order recurrent hidden neuron at a time, adapting the topology of the network in addition to changing its weights. We use it to learn fuzzy finite-state machines and find it effective experimentally.
Fuzzy finite state automata: We begin with the class of fuzzy automata we are interested in learning.

Definition: A fuzzy finite-state automaton (FFA) is a 6-tuple < Σ, Q, Z, q0, δ, ω >, where Σ is a finite input alphabet, Q is the set of states, Z is a finite output alphabet, q0 ∈ Q is the initial state, δ: Σ × Q × [0,1] → Q is the fuzzy transition map and ω: Q → Z is the output map.

Corresponding Author: Mo Zhi Wen, Center of Intelligent Control and Development, Southwest Jiaotong University
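The FFA definition above can be sketched in code. The following minimal Python sketch is an illustration, not the paper's representation: it assumes the common max-min path semantics for string membership (the paper instead computes membership via the equivalent DFA of the next section), and the two-state automaton at the end is hypothetical.

```python
# Minimal sketch of a fuzzy finite-state automaton (FFA).
# Assumption: membership of a string is the max over accepting paths
# of the min transition weight along the path (max-min semantics).
class FFA:
    def __init__(self, start, transitions, accepting):
        # transitions: (state, symbol) -> list of (next_state, weight in [0,1])
        self.start = start
        self.transitions = transitions
        self.accepting = accepting  # set of accepting states

    def membership(self, string):
        # Propagate a frontier of (state -> degree) pairs through the string.
        frontier = {self.start: 1.0}
        for sym in string:
            nxt = {}
            for state, deg in frontier.items():
                for q, w in self.transitions.get((state, sym), []):
                    nxt[q] = max(nxt.get(q, 0.0), min(deg, w))
            frontier = nxt
        return max((d for q, d in frontier.items() if q in self.accepting),
                   default=0.0)

# Hypothetical two-state automaton: reading 'a' from state 0 reaches
# accepting state 1 with degree 0.5; 'a' loops on state 1 with degree 0.9.
ffa = FFA(start=0,
          transitions={(0, 'a'): [(1, 0.5)], (1, 'a'): [(1, 0.9)]},
          accepting={1})
print(ffa.membership("a"))   # 0.5
print(ffa.membership("aa"))  # min(0.5, 0.9) = 0.5
```

The frontier dictionary plays the role of the fuzzy transition map δ applied step by step; an empty frontier or no accepting state yields membership 0.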
It should be noted that a fuzzy finite-state automaton reduces to a conventional (crisp) one when all the transition degrees are equal to 1. The following result is the basis for mapping FFAs into corresponding deterministic recurrent neural networks [7]:

Theorem: Given an FFA, there exists a deterministic finite-state automaton (DFA) M with output alphabet Z ⊆ {θ : θ is a production weight} ∪ {0} which computes the membership function µ: Σ* → [0,1] of the language L(M).

An example of the FFA-to-DFA transformation is shown in Fig. 1a and b [8]. Notice that all transitions in the DFA have weight one.

Fig. 1: Example of a transformation of a specific FFA into its corresponding DFA. (a) A fuzzy finite-state automaton with weighted state transitions. State 1 is the automaton's start state; accepting states are drawn with double circles. A transition from state q_j to q_i on input symbol a_k with weight θ is represented as a directed arc from q_j to q_i labeled a_k/θ. (b) The corresponding deterministic finite-state automaton, which computes the membership function of strings. The accepting states are labeled with the degree of membership.

Second-order recurrent neural network used to induce FFA: A second-order recurrent neural network linked to an output layer was shown to be a good technique for fuzzy automaton induction in [9]. It consists of three parts (Fig. 2): an input layer, a single recurrent layer with N neurons and two output layers, both with M neurons, where M is determined by the level of accuracy desired in the output. If r is the required accuracy (r = 0.1 if the accuracy is decimal, r = 0.01 if it is centesimal, etc.), then M = 1/r + 1. The hidden recurrent layer is activated by both the input neurons and the recurrent layer itself. The first output layer receives the values of the recurrent neurons at time m, where m is the length of the input string. The neurons in this layer then compete, and the winner determines the output of the network. The dynamics of the neural network are as follows:

S_i^{t+1} = g(Ξ_i^t + b_i),  Ξ_i^t = Σ_{j,k} W_{ijk} S_j^t I_k^t,  i = 0, ..., N-1  (1)

O_o = g(σ_o),  σ_o = Σ_{i=0}^{N-1} u_{oi} S_i^m,  o = 0, ..., M-1  (2)

out_o = 1 if O_o = max{O_0, ..., O_{M-1}}, and out_o = 0 otherwise  (3)

where g is a sigmoid discriminant function, b_i is the bias associated with hidden recurrent state neuron S_i, u_{oi} connects S_i to neuron O_o of the first output layer and W_{ijk} is the second-order connection weight linking the recurrent layer and the input layer. The membership degree associated with the input string is determined by the last output layer: µ_L(x) = i·r, where r is the desired accuracy and i is the index of the non-zero neuron of the last output layer.

The weights are updated by a pseudo-gradient descent algorithm. The error of each output neuron is E_i = (1/2)(T_i - O_i)^2, where T_i is the expected value for the ith output neuron and O_i is the provisional value given by equation (2). The total error for the network is

E = Σ_{i=0}^{M-1} E_i  (4)

When E > ε, the weights are modified as follows:

Δu_ij = -α ∂E/∂u_ij,  Δw_lon = -α ∂E/∂w_lon

where ε is the error tolerance of the neural network and α is the learning rate. Further,

∂E_i/∂u_ij = -(T_i - O_i) O_i (1 - O_i) S_j^m
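Equations (1)-(3) and the output-weight update can be sketched as follows, assuming a logistic sigmoid for g; the network sizes, random weights, one-hot symbol encoding and target vector are illustrative assumptions, not values from the paper.

```python
# Sketch of the forward pass (eqs. 1-3) and one pseudo-gradient step on
# the output weights u_ij. Shapes and data are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(W, U, b, inputs):
    """W: (N, N, L) second-order weights, U: (M, N) output weights,
    b: (N,) biases, inputs: sequence of one-hot symbol vectors (L,)."""
    N = W.shape[0]
    S = np.zeros(N)
    S[0] = 1.0                                     # conventional start state
    for I in inputs:                               # eq. (1)
        S = sigmoid(np.einsum('ijk,j,k->i', W, S, I) + b)
    O = sigmoid(U @ S)                             # eq. (2)
    out = np.zeros_like(O)
    out[np.argmax(O)] = 1.0                        # eq. (3): winner-take-all
    return S, O, out

def update_U(U, S, T, O, alpha):
    """u_ij += alpha * (T_i - O_i) O_i (1 - O_i) S_j, i.e. one descent
    step on E = 0.5 * sum_i (T_i - O_i)^2."""
    delta = (T - O) * O * (1.0 - O)
    return U + alpha * np.outer(delta, S)

rng = np.random.default_rng(0)
N, L, M = 3, 2, 11             # M = 11 output neurons for decimal accuracy
W = 0.5 * rng.normal(size=(N, N, L))
U = 0.5 * rng.normal(size=(M, N))
b = np.zeros(N)
x = [np.eye(L)[0], np.eye(L)[1]]                   # encodes the string "ab"
S, O, out = forward(W, U, b, x)
T = np.eye(M)[5]               # hypothetical target: membership 0.5
U = update_U(U, S, T, O, alpha=0.3)
print(int(out.sum()))          # 1: exactly one winning neuron
```

After the update, recomputing the output layer with the new U gives a strictly smaller error E on this example, which is what the pseudo-gradient step is designed to do.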
Fig. 2: Neural network used for fuzzy grammar inference

∂E/∂w_lon = -Σ_{r=0}^{M-1} (T_r - O_r) g'(σ_r) Σ_{j=0}^{N-1} u_rj ∂S_j^m/∂w_lon

with the recursion

∂S_j^m/∂w_lon = g'(Ξ_j^{m-1}) [ δ_jl S_o^{m-1} I_n^{m-1} + Σ_{i=0}^{N-1} Σ_{k=0}^{L-1} w_jik I_k^{m-1} ∂S_i^{m-1}/∂w_lon ]

initialized with ∂S_j^0/∂w_lon = 0.

Dynamic adaptation of recurrent neural network architecture: Constructive algorithms dynamically grow the structure during the network's training. Various types of network-growing methods have been proposed [10-14]; however, few of them are for recurrent networks. The approach we propose in this paper is a constructive algorithm for second-order single-layer recurrent neural networks. In the course of inducing an unknown fuzzy finite-state automaton from training samples, our algorithm topologically changes the network in addition to adjusting the weights of the second-order recurrent neural network. We utilize the idea of maximizing correlation from the cascade-correlation algorithm to obtain prior knowledge at each stage of the dynamic training, which is then used to initialize the newly generated network for the subsequent learning. Before describing the constructive learning, the following criteria need to be kept in mind:
* When the network structure needs to change
* How to connect the newly created neuron to the existing network
* How to assign initial values to the newly added connection weights

We initialize the network with only one hidden unit. The sizes of the input and output layers are determined by the I/O representation the experimenter has chosen. We present the whole set of training examples to the recurrent neural network. When no significant error reduction has occurred, or the training epochs have reached a preset number, we measure the error over the entire training set to get the residual error according to (4). Here an epoch is defined as a training cycle in which the network sees all training samples.
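The growth criterion above (keep adding hidden units while training stalls with residual error above the tolerance) can be illustrated with a toy stand-in. Everything here is hypothetical: the "network" is a placeholder whose error simply halves per training round, so only the control flow of the constructive loop is modeled, not the recurrent network itself.

```python
# Toy illustration of the constructive loop: train to a stall, compare
# the residual error with the tolerance eps, add a hidden unit if it is
# still too large. ToyNet is a stand-in, not the paper's RNN.
class ToyNet:
    def __init__(self):
        self.num_hidden = 1      # start with a single hidden unit
        self.error = 80.0        # stand-in residual error, eq. (4)

    def train(self):             # stand-in for training until a stall
        self.error *= 0.5

    def add_hidden_unit(self):
        self.num_hidden += 1

def grow(net, eps, max_units=10):
    net.train()
    while net.error > eps and net.num_hidden < max_units:
        net.add_hidden_unit()    # criterion 1: error still above tolerance
        net.train()
    return net

net = grow(ToyNet(), eps=2.5e-5)
print(net.num_hidden, net.error)
```

With these toy numbers the loop stops at the unit budget rather than the tolerance; in the real algorithm each added unit is trained with the unit-creation procedure of the next section.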
If the residual error is small enough, we stop; if not, we attempt to add new hidden units one by one to the single hidden layer, using the unit-creation algorithm to maximize the magnitude of the correlation between the residual error and the candidate unit's output. For the second criterion, the newly added hidden neuron receives second-order connections from the inputs and pre-existing hidden units and then outputs to the output layer and the recurrent layer, as shown in Fig. 3.

Fig. 3: The network starts with neuron 1 and grows incrementally to n neurons

Now we turn to the unit-creation algorithm in our method; it differs from that of cascade-correlation because of the different structure we use. We define S as in [10]:

S = Σ_o | Σ_p (V_p - V̄)(E_{p,o} - Ē_o) |  (5)

where V_p is the candidate unit's output value when the network processes the pth training example and E_{p,o} is the residual error observed at output neuron o when the network structure needs to change; V̄ and Ē_o are the values of V and E averaged over the whole set of training examples. During the maximization of S, all the pre-existing weights are frozen; only the input weights of the candidate unit and its connections with the hidden units are trained. The following training algorithm updates the weights at the end of the presentation of the whole set of training examples. Suppose we are ready to add the hth neuron and denote by l_p the length of the pth example; then V_p = S_h^{l_p}, i.e., the output of the hth hidden neuron at time l_p:

Δw_hjk = β ∂S/∂w_hjk = β Σ_{p,o} σ_o (E_{p,o} - Ē_o) ∂V_p/∂w_hjk,  j = 1, ..., h, k = 1, ..., Σ  (6)

Δw_ihk = β ∂S/∂w_ihk = β Σ_{p,o} σ_o (E_{p,o} - Ē_o) ∂V_p/∂w_ihk,  i = 1, ..., h-1, k = 1, ..., Σ  (7)

where σ_o is the sign of the correlation between the candidate's value and output o (owing to the modulus in (5)), β is the learning rate, w_hjk are the trainable input weights of the newly added neuron h, w_ihk connect h to the pre-existing ith hidden unit and Σ is the number of input units. According to (1), we further derive:

∂V_p/∂w_hjk = g'(Ξ_h^{l_p - 1}) [ S_j^{l_p - 1} I_k^{l_p - 1} + Σ_{f=1}^{h} Σ_{m=1}^{Σ} w_hfm I_m^{l_p - 1} ∂S_f^{l_p - 1}/∂w_hjk ]  (8)

∂V_p/∂w_ihk = g'(Ξ_h^{l_p - 1}) Σ_{f=1}^{h} Σ_{m=1}^{Σ} w_hfm I_m^{l_p - 1} ∂S_f^{l_p - 1}/∂w_ihk  (9)

where the derivatives of the state neurons are expanded by the same recursion as for w_lon, all derivatives are initialized to zero at t = 0 and g' is the derivative of the pre-existing or candidate unit's activation function with respect to the sum of its inputs.

Fig. 4: The evolution of the training process for the 1-hidden-neuron network

Fig. 5: The evolution of the training process for the current network

We present the whole set of training examples to the current network and train w_hjk and w_ihk according to the update rules (6)-(9) until we reach a preset number of training epochs or S stops increasing. At this point, we fix the trained w_hjk and w_ihk as the initial input weights of the newly added neuron; its connections to the output layer are set to zero or very small random numbers so that they initially have little or no effect on training. The old weights start with their previously trained values. Compared with cascade-correlation, in which only the output weights of the new neuron are trainable, we allow all the weights to be trained. In order to preserve as much of the knowledge already learned as possible, we use different learning rates for the previously obtained weights and for the output weights connecting the new neuron to the output layer, the latter being larger than the former. With this, the third criterion is resolved.
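The unit-creation step (train each candidate's input weights to maximize the correlation S of eq. (5), then install the best-scoring candidate from the pool) can be sketched as follows. This is an illustrative approximation: for brevity the gradient of S is taken numerically rather than through the recursions (6)-(9), the candidate output is a plain sigmoid of a fixed per-example feature vector, and all sizes and data are made up.

```python
# Sketch of candidate-unit training by gradient ascent on the
# correlation score S of eq. (5). Illustrative stand-in, not the
# paper's exact recursion-based update.
import numpy as np

def S_score(w, X, E):
    """S = sum_o | sum_p (V_p - Vbar)(E_{p,o} - Ebar_o) |."""
    V = 1.0 / (1.0 + np.exp(-X @ w))        # candidate output per example
    return np.abs((V - V.mean()) @ (E - E.mean(axis=0))).sum()

def train_candidate(w, X, E, beta=0.1, steps=100, h=1e-5):
    """Numerical gradient *ascent* on S; returns the best weights seen."""
    best_w, best_S = w.copy(), S_score(w, X, E)
    for _ in range(steps):
        grad = np.array([(S_score(w + h * e, X, E) -
                          S_score(w - h * e, X, E)) / (2 * h)
                         for e in np.eye(w.size)])
        w = w + beta * grad
        s = S_score(w, X, E)
        if s > best_S:
            best_w, best_S = w.copy(), s
    return best_w, best_S

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))                # per-example candidate inputs
E = rng.normal(size=(30, 11))               # residual errors, M = 11 outputs
# Pool of candidates from different random initializations; install the
# one with the best correlation score.
inits = [0.1 * rng.normal(size=4) for _ in range(4)]
pool = [train_candidate(w0, X, E) for w0 in inits]
w_best, S_best = max(pool, key=lambda c: c[1])
print(S_best >= max(S_score(w0, X, E) for w0 in inits))  # True
```

Because each candidate keeps the best weights it has seen, the installed score can never be worse than any candidate's starting score, mirroring the "install the best candidate" rule.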
Instead of a single candidate unit, a pool of candidates may be used in the unit-creation algorithm, trained in parallel from different random initial weights. They all receive the same input signals and see the same residual error for each training example. Whenever we decide that no further progress in S is being made, we install the candidate whose correlation score is best, together with its trained input weights.

Fig. 6: The extracted automaton from the learned network

Simulation experiment: We have tested the construction algorithm on fuzzy grammar inference, where the training examples are generated from the fuzzy automaton shown in Fig. 1a. The experiment was designed in the same way as in [9].

1. Generation of examples: We start from the fuzzy automaton M shown in Fig. 1a, from which we generate a set of 200 examples recognized by M. Each example consists of a pair (P_i, µ_i), where P_i is a random sequence formed by symbols of the alphabet Σ = {a, b} and µ_i is its membership degree in the fuzzy language L(M); see [9] for details. The sample strings are ordered according to their length.

2. Forgetting M and training a dynamic network to induce an unknown automaton which recognizes the training examples: The initial recurrent neural network is composed of 1 recurrent hidden neuron and 11 neurons in the output layer, i.e., a decimal accuracy is required. The parameters for the network learning are: learning rate α = 0.3 and error tolerance ε = 2.5×10^-5. Figure 4 shows the evolution of the training process for the 1-hidden-unit network. After 285 epochs, no significant error reduction has occurred and the residual error is 77.3695, much larger than ε, so we add a neuron to the hidden layer. We use 8 candidate units, each with a different set of random initial weights, with learning rate β set to 0.3. Following (5)-(9), we maximize S in parallel until a preset number of epochs is reached or S stops increasing, and install the candidate whose correlation score is best. After 225 epochs, no significant increase in S has appeared. We initialize the new network as described earlier and train it with a learning rate of 0.15 for the weights connecting the newly added neuron to the output layer and 0.08 for the rest. The neural network learned all the examples in 348 epochs. Figure 5 shows the evolution of the error during the training process. We adopt extraction algorithm 1 in [8] (partitioning of the output space) to extract the fuzzy finite automaton from the learned network; the result is shown in Fig. 6.

CONCLUSION

An appropriate topology made up of a second-order recurrent neural network linked to an output layer was proposed in [9] to perform fuzzy grammatical inference. However, an appropriate number of hidden units is difficult to predict. Instead of just adjusting the weights in a network of fixed topology, we adopt the construction method for this application. We apply the idea of maximizing correlation from the cascade-correlation algorithm to the second-order recurrent neural network to generate a new construction method and use it to infer fuzzy grammars from examples. At every stage of the dynamic training, we obtain prior knowledge with which to initialize the newly generated network, and train it until no further decrease of the error is achieved. This removes the need for empirical guesswork in the fixed-topology case and generates a suitable topology automatically. The experiment shows that this algorithm is practical.

REFERENCES

1. Mozer, M.C. and P. Smolensky, 1989. Skeletonization: A technique for trimming the fat from a network via relevance assessment. Connection Sci., 1: 3-26.
2. Giles, C.L., C.B. Miller, D. Chen, H.H. Chen, G.Z. Sun and Y.C. Lee, 1992. Learning and extracting finite state automata with second-order recurrent neural networks. Neural Computation, 4: 393-405.
3. Zeng, Z., R. Goodman and P. Smyth, 1993. Learning finite state machines with self-clustering recurrent networks. Neural Computation, 5: 976-990.
4. Elman, J.L., 1991. Distributed representations, simple recurrent networks and grammatical structure. Machine Learning, 7: 195-225.
5. Cleeremans, A., D. Servan-Schreiber and J.L. McClelland, 1989. Finite state automata and simple recurrent networks. Neural Computation, 1: 372-381.
6. Williams, R.J. and D. Zipser, 1989. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1: 270-280.
7. Thomason, M. and P. Marinos, 1974. Deterministic acceptors of regular fuzzy languages. IEEE Trans. Systems, Man and Cybernetics, 3: 228-230.
8. Omlin, C.W., K.K. Thornber and C.L. Giles, 1998. Fuzzy finite-state automata can be deterministically encoded into recurrent neural networks. IEEE Trans. Fuzzy Systems, 6: 76-89.
9. Blanco, A., M. Delgado and M.C. Pegalajar, 2001. Fuzzy automaton induction using neural network. Intl. J. Approximate Reasoning, 27: 1-26.
10. Fahlman, S.E., 1990. The cascade-correlation learning architecture. In: Advances in Neural Information Processing Systems 2 (Touretzky, D., Ed.). Morgan Kaufmann, San Mateo, CA, pp: 524-532.
11. Ash, T., 1989. Dynamic node creation in back-propagation networks. Connection Sci., 1: 365-375.
12. Frean, M., 1990. The upstart algorithm: A method for constructing and training feedforward neural networks. Neural Computation, 2: 198-209.
13. Gallant, S.I., 1986. Three constructive algorithms for network learning. Proc. 8th Ann. Conf. Cognitive Science Society, pp: 652-660.
14. Hirose, Y., K. Yamashita and S. Hijiya, 1991. Back-propagation algorithm which varies the number of hidden units. Neural Networks, 4: 61-66.