
Journal of Mathematics and Statistics 2 (2): 395-400, 2006
ISSN 1549-3644
© 2006 Science Publications

Fuzzy Automata Induction using Construction Method

1,2 Mo Zhi Wen and 2 Wan Min
1 Center of Intelligent Control and Development, Southwest Jiaotong University, China
2 College of Mathematics and Software Science, Sichuan Normal University, China
Corresponding Author: Mo Zhi Wen, Center of Intelligent Control and Development, Southwest Jiaotong University, China

Abstract: Recurrent neural networks have recently been demonstrated to have the ability to learn simple grammars. In particular, networks using second-order units have been successful at this task. However, it is often difficult to predict the optimal neural network size needed to induce an unknown automaton from examples. Instead of just adjusting the weights in a network of fixed topology, we adopt dynamic networks (i.e., networks whose topology and weights can be changed simultaneously during training) for this application. We apply the idea of maximizing correlation from the cascade-correlation algorithm to a second-order single-layer recurrent neural network to obtain a new construction algorithm and use it to induce fuzzy finite-state automata. The experiment indicates that such a dynamic network performs well.

Key words: Fuzzy automaton, construction method, dynamic network

INTRODUCTION

Choosing an appropriate architecture for a learning task is an important issue in training neural networks. Because general methods for determining a good network architecture prior to training are lacking, algorithms that adapt the network architecture during training have been developed. A constructive algorithm determines both the architecture of the network and the parameters (weights, thresholds, etc.) necessary for learning the data. Compared to algorithms such as backpropagation, constructive algorithms have the following potential advantages:

* They grow the architecture of the neural network in addition to finding the required parameters.
* They can reduce training to single-node learning, substantially simplifying and accelerating the training process. Constructive algorithms such as cascade-correlation show at least an order-of-magnitude improvement in learning speed over backpropagation.
* Some constructive algorithms are guaranteed to converge, unlike the backpropagation algorithm, for example.
* Algorithms such as backpropagation can suffer from catastrophic interference when learning new data, that is, storage of new information seriously disrupts the retrieval of previously stored data. The incremental learning strategy of constructive algorithms offers a possible solution to this problem.

Rather than growing neural networks, destructive or pruning [1] algorithms remove neurons or weights from large trained networks and retrain the reduced networks in an attempt to improve their generalization performance. Approaches include removing the smallest-magnitude or least-sensitive weights, or adding a penalty term to the energy function to encourage weight decay. However, it is difficult to guess an initial network that is certain to solve the problem.

Recurrent neural networks have been demonstrated to have the ability to learn simple grammars [2-6]. However, the problem associated with recurrent networks of fixed topology is apparent: the number of hidden neurons has to be chosen empirically, since there is no general method for determining an appropriate topology for a specific application. Consequently, convergence cannot be guaranteed. Constructive or destructive methods that add or remove neurons, layers, connections, etc. offer a complementary way to address this problem.
We propose a construction algorithm for a second-order single-layer recurrent neural network that adds one second-order recurrent hidden neuron at a time, so that the topology of the network is adapted in addition to its weights. We use it to learn fuzzy finite-state machines and find it experimentally effective.

Fuzzy finite-state automata: We begin with the class of fuzzy automata we are interested in learning.

Definition: A fuzzy finite-state automaton (FFA) is a 6-tuple $\langle \Sigma, Q, Z, q_0, \delta, \omega \rangle$, where $\Sigma$ is a finite input alphabet, $Q$ is the set of states, $Z$ is a finite output alphabet, $q_0$ is the initial state, $\delta: \Sigma \times Q \times [0,1] \rightarrow Q$ is the fuzzy transition map and $\omega: Q \rightarrow Z$ is the output map.
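To make the 6-tuple concrete, here is a minimal Python sketch that stores an FFA as a table of weighted transitions. The state names, symbols and weights are illustrative placeholders; they are not the values of Fig. 1a, which is not reproduced in the text.

```python
from dataclasses import dataclass, field

@dataclass
class FuzzyAutomaton:
    """Illustrative container for the 6-tuple <Sigma, Q, Z, q0, delta, omega>."""
    sigma: set                        # finite input alphabet
    states: set                       # Q
    start: str                        # q0, the initial state
    # delta: (symbol, source state) -> list of (target state, weight in [0,1])
    delta: dict = field(default_factory=dict)
    # omega: state -> output symbol (here, a membership degree attached to the state)
    omega: dict = field(default_factory=dict)

# A toy two-state automaton over {a, b}; weights are placeholders only.
ffa = FuzzyAutomaton(
    sigma={"a", "b"},
    states={"q1", "q2"},
    start="q1",
    delta={("a", "q1"): [("q2", 0.5)],
           ("b", "q2"): [("q1", 0.7)]},
    omega={"q1": 0.0, "q2": 0.5},
)
```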

It should be noted that a fuzzy finite-state automaton reduces to a conventional (crisp) one when all the transition degrees are equal to 1. The following result is the basis for mapping FFAs into corresponding deterministic recurrent neural networks [7]:

Theorem: Given an FFA, there exists a deterministic finite-state automaton (DFA) M with output alphabet $Z \subseteq \{\theta : \theta \text{ is a production weight}\} \cup \{0\}$ which computes the membership function $\mu: \Sigma^* \rightarrow [0,1]$ of the language $L(M)$.

An example of the FFA-to-DFA transformation is shown in Fig. 1a and b [8].

Fig. 1: Example of a transformation of a specific FFA into its corresponding DFA. (a) A fuzzy finite-state automaton with weighted state transitions. State 1 is the automaton's start state; accepting states are drawn with double circles. A transition from state $q_j$ to $q_i$ on input symbol $a$ with weight $\theta$ is represented as a directed arc from $q_j$ to $q_i$ labeled $a/\theta$. (b) The corresponding deterministic finite-state automaton, which computes the membership function of strings. The accepting states are labeled with the degree of membership. Notice that all transitions in the DFA have weight one.

Second-order recurrent neural network used to induce FFA: A second-order recurrent neural network linked to an output layer is shown to be a good technique for fuzzy automaton induction in [9]. It consists of three parts (Fig. 2): an input layer, a single recurrent layer with N neurons and two output layers, both with M neurons, where M is determined by the level of accuracy desired in the output. If r is the required accuracy (r = 0.1 for decimal accuracy, r = 0.01 for centesimal accuracy, etc.), then M = 1/r + 1. The hidden recurrent layer is activated by both the input neurons and the recurrent layer itself. The first output layer receives the values of the recurrent neurons at time m, where m is the length of the input string. The neurons in this layer then compete, and the winner determines the output of the network. The dynamics of the neural network are as follows:

$$S_i^{t+1} = g(\Xi_i^t), \qquad \Xi_i^t = b_i + \sum_{j,k} W_{ijk}\, S_j^t I_k^t, \qquad i = 1, \ldots, N \qquad (1)$$

$$O_k = g(\sigma_k), \qquad \sigma_k = \sum_{i=0}^{N-1} u_{ki}\, S_i^m, \qquad k = 0, \ldots, M-1 \qquad (2)$$

$$\text{out}_k = \begin{cases} 1 & \text{if } O_k = \max\{O_0, \ldots, O_{M-1}\} \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where g is a sigmoid discriminant function and $b_i$ is the bias associated with hidden recurrent state neuron $S_i$; $u_{ki}$ connects $S_i$ to neuron $O_k$ of the first output layer and $W_{ijk}$ is the second-order connection weight between the recurrent layer and the input layer. The membership degree associated with the input string is determined by the last output layer: $\mu_L(x) = i \cdot r$, where r is the desired accuracy and i is the index of the non-zero neuron of the last output layer.
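As an illustration of the dynamics (1)-(3), the following sketch runs a second-order recurrent layer over a one-hot encoded string and applies the winner-take-all output layer. The sigmoid plays the role of g above; the weight shapes, the random initialization and the choice of initial state vector are assumptions made for the example, not values from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(string, W, b, U, symbols=("a", "b")):
    """W: (N, N, L) second-order weights, b: (N,) biases, U: (M, N) output weights."""
    N = W.shape[0]
    S = np.zeros(N)
    S[0] = 1.0                          # assumed initial state vector
    for ch in string:
        I = np.zeros(len(symbols))      # one-hot input I^t
        I[symbols.index(ch)] = 1.0
        # Eq. (1): S_i^{t+1} = g(b_i + sum_{j,k} W_{ijk} S_j^t I_k^t)
        S = sigmoid(b + np.einsum("ijk,j,k->i", W, S, I))
    O = sigmoid(U @ S)                  # Eq. (2): first output layer
    return int(np.argmax(O))            # Eq. (3): winner-take-all second output layer

# Example: N = 3 hidden neurons, M = 11 outputs (decimal accuracy, r = 0.1)
rng = np.random.default_rng(0)
W, b, U = rng.normal(size=(3, 3, 2)), rng.normal(size=3), rng.normal(size=(11, 3))
print(forward("abba", W, b, U) * 0.1)   # decoded membership degree mu_L(x) = i * r
```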

Fig. 2: Neural network used for fuzzy grammar inference

The weights are trained by a pseudo-gradient descent algorithm. The error of each output neuron is

$$E_i = \frac{1}{2}(T_i - O_i)^2,$$

where $T_i$ is the expected value for the ith output neuron and $O_i$ is the value given by equation (2). The total error for the network is

$$E = \sum_{i=0}^{M-1} E_i \qquad (4)$$

When $E > \varepsilon$, the weights are modified as follows:

$$\Delta u_{ij} = -\alpha \frac{\partial E}{\partial u_{ij}}, \qquad \Delta w_{lon} = -\alpha \frac{\partial E}{\partial w_{lon}},$$

where $\varepsilon$ is the error tolerance of the neural network and $\alpha$ is the learning rate. Expanding the derivatives,

$$\frac{\partial E}{\partial u_{ij}} = -(T_i - O_i)\, O_i (1 - O_i)\, S_j^m,$$

$$\frac{\partial E}{\partial w_{lon}} = -\sum_{r=0}^{M-1} (T_r - O_r)\, g'(\sigma_r) \sum_{j=0}^{N-1} u_{rj}\, \frac{\partial S_j^m}{\partial w_{lon}},$$

where the derivatives of the state neurons are obtained recursively from (1),

$$\frac{\partial S_j^m}{\partial w_{lon}} = g'(\Xi_j^{m-1}) \left[ \delta_{jl}\, S_o^{m-1} I_n^{m-1} + \sum_{i,k} w_{jik}\, I_k^{m-1}\, \frac{\partial S_i^{m-1}}{\partial w_{lon}} \right],$$

initialized with $\partial S_j^0 / \partial w_{lon} = 0$.

Dynamic adaptation of the recurrent neural network architecture: Constructive algorithms dynamically grow the structure during the network's training. Various types of network-growing methods have been proposed [10-14]; however, few of them are for recurrent networks. The approach we propose in this paper is a constructive algorithm for second-order single-layer recurrent neural networks. In the course of inducing an unknown fuzzy finite-state automaton from training samples, our algorithm topologically changes the network in addition to adjusting the weights of the second-order recurrent neural network. We use the idea of maximizing correlation from the cascade-correlation algorithm to obtain prior knowledge at each stage of the dynamic training, which is then used to initialize the newly generated network for the subsequent learning. Before describing the constructive learning, the following criteria need to be kept in mind:

* When does the network structure need to change?
* How is the newly created neuron connected to the existing network?
* How are initial values assigned to the newly added connection weights?

We initialize the network with only one hidden unit. The sizes of the input and output layers are determined by the I/O representation the experimenter has chosen. We present the whole set of training examples to the recurrent neural network. When no significant error reduction has occurred, or the training epochs have reached a preset number, we measure the error over the entire training set to obtain the residual error signal according to (4). Here an epoch is defined as a training cycle in which the network sees all training samples. If the residual error is small enough, we stop; if not, we attempt to add new hidden units one by one to the single hidden layer, using the unit-creation algorithm to maximize the magnitude of the correlation between the residual error and the candidate unit's output. For the second criterion, the newly added hidden neuron receives second-order connections from the inputs and the pre-existing hidden units and then outputs to the output layer and the recurrent layer, as shown in Fig. 3.

We now turn to the unit-creation algorithm in our method; it differs from the cascade-correlation case because of the different structure we use. We define S as in [10]:

$$S = \sum_{o} \left| \sum_{p} (V_p - \bar{V})(E_{p,o} - \bar{E}_o) \right| \qquad (5)$$

where $V_p$ is the candidate unit's output value when the network processes the pth training example and $E_{p,o}$ is the residual error observed at output neuron o when the network structure needs to change; $\bar{V}$ and $\bar{E}_o$ are the values of V and E averaged over the whole set of training examples. During the maximization of S, all the pre-existing weights are frozen except the input weights of the candidate unit and its connections with the hidden units. The training algorithm updates the weights at the end of the presentation of the whole set of training examples. Suppose we are ready to add the hth neuron and denote by $l_p$ the length of the pth example; then $V_p = S_h^{l_p}$, i.e. the output of the hth hidden neuron at time $l_p$.
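The correlation (5), and the signs $\sigma_o$ used in the update rules (6)-(7) below, can be computed directly from the stored candidate outputs and residual errors. A small sketch follows; the array names are illustrative.

```python
import numpy as np

def correlation_S(V, E):
    """Eq. (5): S = sum_o | sum_p (V_p - V_bar)(E_{p,o} - E_bar_o) |.

    V: shape (P,)   candidate unit output for each of the P training examples
    E: shape (P, M) residual error at each of the M output neurons, per example
    """
    Vc = V - V.mean()             # V_p - V_bar
    Ec = E - E.mean(axis=0)       # E_{p,o} - E_bar_o
    return np.abs(Vc @ Ec).sum()  # sum over outputs of the absolute covariances

def sigma_signs(V, E):
    """sigma_o: the sign of the inner sum in (5), needed because of the modulus."""
    return np.sign((V - V.mean()) @ (E - E.mean(axis=0)))
```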

The candidate weights are then updated by gradient ascent on S:

$$\Delta w_{hjk} = \beta \frac{\partial S}{\partial w_{hjk}}, \qquad \frac{\partial S}{\partial w_{hjk}} = \sum_{p,o} \sigma_o (E_{p,o} - \bar{E}_o)\, \frac{\partial V_p}{\partial w_{hjk}}, \qquad j = 1, \ldots, h, \; k = 1, \ldots, |\Sigma| \qquad (6)$$

$$\Delta w_{ihk} = \beta \frac{\partial S}{\partial w_{ihk}}, \qquad \frac{\partial S}{\partial w_{ihk}} = \sum_{p,o} \sigma_o (E_{p,o} - \bar{E}_o)\, \frac{\partial V_p}{\partial w_{ihk}}, \qquad i = 1, \ldots, h-1, \; k = 1, \ldots, |\Sigma| \qquad (7)$$

where $\sigma_o$ is the sign of the correlation between the candidate's value and output o (due to the modulus in (5)), $\beta$ is the learning rate, $w_{hjk}$ are the trainable input weights to the newly added neuron h, $w_{ihk}$ connects h to the pre-existing ith hidden unit and $|\Sigma|$ is the number of input units. According to (1), the required derivatives of $V_p = S_h^{l_p}$ are obtained recursively:

$$\frac{\partial V_p}{\partial w_{hfm}} = g'\!\left(\Xi_h^{l_p-1}\right)\!\left[S_f^{l_p-1} I_m^{l_p-1} + \sum_{j,k} w_{hjk}\, I_k^{l_p-1}\, \frac{\partial S_j^{l_p-1}}{\partial w_{hfm}}\right], \qquad f \neq h \qquad (8)$$

$$\frac{\partial V_p}{\partial w_{hhm}} = g'\!\left(\Xi_h^{l_p-1}\right)\!\left[S_h^{l_p-1} I_m^{l_p-1} + \sum_{j,k} w_{hjk}\, I_k^{l_p-1}\, \frac{\partial S_j^{l_p-1}}{\partial w_{hhm}}\right] \qquad (9)$$

with all derivatives initialized to zero. Here $g'$ is the derivative, for the pth example, of the pre-existing or candidate unit's activation function with respect to the sum of its inputs, and $\alpha$ is the learning rate.

Fig. 3: The network starts with neuron 1 and grows incrementally to n neurons

Fig. 4: The evolution of the training process (error vs. epochs) for the 1-hidden-neuron network

Fig. 5: The evolution of the training process (error vs. epochs) for the current network

We present the whole set of training examples to the current network and train $w_{hjk}$ and $w_{ihk}$ according to the update rules described in (6)-(9) until we reach a preset number of training epochs or S stops increasing. At that point, we fix the trained $w_{hjk}$ and $w_{ihk}$ as the initial weights of the newly added neuron; its connections to the output layer are set to zero or very small random numbers so that they initially have little or no effect on training. The old weights start with their previously trained values. Compared with cascade-correlation, in which only the output weights of the new neuron are trainable, we allow all the weights to be trainable. In order to preserve as much of the knowledge already learned as possible, we use different learning rates for the previously obtained weights and for the output weights connecting the new neuron to the output layer, the latter being larger than the former. This resolves the third criterion.

Instead of a single candidate unit, a pool of candidates may be used for the unit-creation algorithm; the candidates are trained in parallel from different random initial weights. They all receive the same input signals and see the same residual error for each training example. Whenever we decide that no further progress in S is being made, we install the candidate whose correlation score is the best, together with its trained input weights.
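Putting the steps together, the constructive procedure described above can be summarized by the following sketch. All of the helper callables (train_network, train_candidates, total_error) and the network attributes are placeholders standing in for the steps the paper describes; the thresholds mirror the values quoted in the experiment but are otherwise arbitrary.

```python
def grow_network(net, examples, train_network, train_candidates, total_error,
                 eps=2.5e-5, max_hidden=10, pool_size=8):
    """Constructive loop: train the current network, measure the residual error (4),
    and add a hidden unit from a candidate pool while the error is still too large."""
    while True:
        train_network(net, examples)                 # pseudo-gradient descent on (1)-(4)
        if total_error(net, examples) < eps or net.num_hidden >= max_hidden:
            return net
        # Train a pool of candidate units from different random initial weights to
        # maximize the correlation S of Eq. (5); pre-existing weights stay frozen.
        candidates = train_candidates(net, examples, pool_size)
        best = max(candidates, key=lambda c: c.S)    # best-correlated candidate
        # Keep its trained input weights; start its output-layer weights at zero
        # (or very small values); the old weights keep their trained values.
        net.add_hidden_unit(best, output_weight_init="zero")
```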

Fig. 6: The automaton extracted from the learned network

Simulation experiment: In this part we test the construction algorithm on fuzzy grammar inference, where the training examples are generated from the fuzzy automaton shown in Fig. 1a. The experiment has been designed in the same way as in [9].

1. Generation of examples: We start from a fuzzy automaton M as shown in Fig. 1a, from which we generate a set of 200 examples recognized by M. Each example consists of a pair $(P_i, \mu_i)$, where $P_i$ is a random sequence formed by symbols of the alphabet $\Sigma = \{a, b\}$ and $\mu_i$ is its membership degree in the fuzzy language L(M) (see [9] for details). The sample strings are ordered according to their length.

2. Forgetting M and training a dynamic network to induce an unknown automaton which recognizes the training examples: The initial recurrent neural network is composed of 1 recurrent hidden neuron and 11 neurons in the output layer, i.e., decimal accuracy is required. The parameters for the network learning are: learning rate α = 0.3 and error tolerance ε = 2.5×10⁻⁵. Figure 4 shows the evolution of the training process for the 1-hidden-unit network. After 285 epochs, no significant error reduction has occurred and the residual error signal is 77.3695, much larger than ε, so we add a neuron to the hidden layer.

We use 8 candidate units, each with a different set of random initial weights. The learning rate β is set to 0.3. Following (5)-(9), S is maximized in parallel until a preset number of epochs is reached or S stops increasing, and we install the candidate whose correlation score is the best. After 225 epochs, no significant increase of S has appeared. We initialize the new network as described earlier and train it with learning rate 0.15 for the weights connecting the newly added neuron to the output layer and 0.08 for the rest, respectively. The neural network learned all the examples in 348 epochs. Figure 5 shows the evolution of the error during the training process. We adopt extraction algorithm 1 in [8] (partitioning of the output space) to extract the fuzzy finite-state automaton from the learned network; the result is shown in Fig. 6.
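The example-generation step can be sketched as follows. The membership function is passed in as a stand-in, because the transition weights of the automaton in Fig. 1a are not reproduced in the text; the maximum string length and the random seed are likewise assumptions.

```python
import random

def generate_examples(membership, alphabet=("a", "b"), n=200, max_len=8, seed=0):
    """Create n pairs (P_i, mu_i): a random string over the alphabet and its
    membership degree in L(M), sorted by string length as in the paper."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        length = rng.randint(1, max_len)
        P = "".join(rng.choice(alphabet) for _ in range(length))
        examples.append((P, membership(P)))   # membership() would come from the FFA M
    examples.sort(key=lambda pair: len(pair[0]))
    return examples

# A stand-in membership function is used here, since Fig. 1a's weights are not given.
examples = generate_examples(lambda s: round(s.count("a") / max(len(s), 1), 1))
print(examples[:3])
```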

CONCLUSION

An appropriate topology made up of a second-order recurrent neural network linked to an output layer is proposed in [8] to perform fuzzy grammatical inference. However, an appropriate number of hidden units is difficult to predict. Instead of just adjusting the weights in a network of fixed topology, we adopt the construction method for this application. We apply the idea of maximizing correlation from the cascade-correlation algorithm to the second-order recurrent neural network to generate a new construction method and use it to infer a fuzzy grammar from examples. At every stage of the dynamic training, we obtain prior knowledge with which to initialize the newly generated network and train it until no further decrease of the error is made. This removes the empirical guesswork required in the fixed-topology case and generates an optimal topology automatically. An experiment shows that this algorithm is practical.

REFERENCES

1. Mozer, M.C. and P. Smolensky, 1989. Skeletonization: A technique for trimming the fat from a network via relevance assessment. Connection Sci., 11: 3-26.
2. Giles, C.L., C.B. Miller, D. Chen, H.H. Chen, G.Z. Sun and Y.C. Lee, 1992. Learning and extracting finite state automata with second-order recurrent neural networks. Neural Computation, 4: 393-405.
3. Zeng, Z., R. Goodman and P. Smyth, 1993. Learning finite state machines with self-clustering recurrent networks. Neural Computation, 5: 976-990.
4. Elman, J.L., 1991. Distributed representations, simple recurrent networks and grammatical structure. Machine Learning, 7: 195-225.
5. Cleeremans, A., D. Servan-Schreiber and J.L. McClelland, 1989. Finite state automata and simple recurrent networks. Neural Computation, 1: 372-381.
6. Williams, R.J. and D. Zipser, 1989. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1: 270-280.
7. Thomason, M. and P. Marinos, 1974. Deterministic acceptors of regular fuzzy languages. IEEE Trans. Systems, Man and Cybernetics, 3: 228-230.
8. Omlin, C.W., K.K. Thornber and C.L. Giles, 1998. Fuzzy finite-state automata can be deterministically encoded into recurrent neural networks. IEEE Trans. Fuzzy Systems, 6 (1).
9. Blanco, A., M. Delgado and M.C. Pegalajar, 2001. Fuzzy automaton induction using neural network. Intl. J. Approximate Reasoning, 27: 1-26.
10. Fahlman, S.E., 1990. The cascade-correlation learning architecture. In: Advances in Neural Information Processing Systems 2 (D. Touretzky, ed.). Morgan Kaufmann, San Mateo, CA, pp: 524-532.
11. Ash, T., 1989. Dynamic node creation in back-propagation networks. Connection Sci., 1: 365-375.
12. Frean, M., 1990. The upstart algorithm: A method for constructing and training feedforward neural networks. Neural Computation, 2: 198-209.
13. Gallant, S.I., 1986. Three constructive algorithms for network learning. Proc. 8th Ann. Conf. Cognitive Science Society, pp: 652-660.
14. Hirose, Y., K. Yamashita and S. Hijiya, 1991. Back-propagation algorithm which varies the number of hidden units. Neural Networks, 4: 61-66.