
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 6, NOVEMBER 1999

Stable Dynamic Backpropagation Learning in Recurrent Neural Networks

Liang Jin and Madan M. Gupta, Fellow, IEEE

Manuscript received September 15, 1998; revised May 28, 1999. L. Jin is with the Microelectronics Group, Lucent Technologies Inc., Allentown, PA 18103 USA. M. M. Gupta is with the Intelligent Systems Research Laboratory, College of Engineering, University of Saskatchewan, Saskatoon, Sask., Canada S7N 5A9. Publisher Item Identifier S 1045-9227(99)09400-X.

Abstract: The conventional dynamic backpropagation (DBP) algorithm proposed by Pineda does not necessarily imply the stability of the dynamic neural model in the sense of Lyapunov during a dynamic weight learning process. A difficulty with the DBP learning process is thus associated with the stability of the equilibrium points, which has to be checked by simulating the set of dynamic equations, or else by verifying the stability conditions, after the learning has been completed. To avoid unstable phenomena during the learning process, two new learning schemes, called the multiplier and constrained learning rate algorithms, are proposed in this paper to provide stable adaptive updating processes for both the synaptic and somatic parameters of the network. In the multiplier method, the explicit stability conditions are introduced into the iterative error index, so that the new updating formulations contain a set of inequality constraints. In the constrained learning rate algorithm, the learning rate is updated at each iterative instant by an equation derived using the stability conditions. With these stable DBP algorithms, any analog target pattern may be implemented by a steady output vector which is a nonlinear vector function of the stable equilibrium point. The applicability of the approaches presented is illustrated through both analog and binary pattern storage examples.

Index Terms: Adaptive algorithm, dynamic backpropagation algorithm, dynamic neural networks, Lyapunov stability, nonlinear dynamics.

I. INTRODUCTION

DYNAMIC neural networks (DNN's), which contain both feedforward and feedback connections between the neural layers, play an important role in visual processing, pattern recognition, neural computing, and control [36], [37]. In neural associative memory, DNN's which deal with a static target pattern can be divided into two classes according to how the pattern in the network is expressed [6]-[8], [21]: 1) the target pattern (input pattern) is given as an initial state of the network, or 2) the target pattern is given as a constant input to the network. In both cases, the DNN must be designed such that the state of the network converges ultimately to a locally or globally stable equilibrium point which depends only on the target pattern [10]-[12]. In an earlier paper on neural associative memory, Hopfield [3], [4] proposed a well-known DNN for a binary vector pattern. In this model, every memory vector is an equilibrium point of the dynamic network, and the stability of the equilibrium point is guaranteed by the stable learning process. Many alternative techniques for storing binary vectors using both continuous and discrete-time dynamic networks have appeared since then [9], [13], [14], [17], [20], [21]. For the analog vector storage problem, Sudharsanan and Sundareshan [13] developed a systematic synthesis procedure for constructing a continuous-time dynamic neural network in which a given set of analog vectors can be stored as the stable equilibrium points. Marcus et al. [35] discussed
an associative memory in a so-called analog iterated-map neural network using both the Hebb rule and the pseudoinverse rule. Atiya and Abu-Mostafa [16] recently proposed a new method using the Hopfield continuous-time network, and a set of static weight learning formulations was developed in their paper. An excellent survey of some previous work on the design of associative memories using the Hopfield continuous-time model was given by Michel and Farrell [22].

A dynamic learning algorithm for the first class of DNN's, where the analog target pattern is directly stored at an equilibrium point of the network, was first proposed by Pineda [1] for a class of continuous-time networks. At the same time, a dynamic learning algorithm was described by Almeida [5]. In order to improve the capability of storing multiple patterns in such an associative memory, a modified algorithm for the dynamic learning process was later developed by Pineda [2]. Two dynamic phenomena in the dynamic learning process were isolated into primitive architectural components which perform the operations of continuous nonlinear transformation and autoassociative recall. The dynamic learning techniques for programming the architectural components were presented in a formalism appropriate for a collective nonlinear dynamic neural system [2]. This dynamic learning process was named dynamic backpropagation (DBP) by Narendra [38], [39] because of the application of the gradient descent method. More recently, this method was applied to nonlinear functional approximation with a dynamic network, using a dynamic algorithm for both the synaptic and somatic parameters, by Tawel [15]. Some control applications of the DBP learning algorithm in recurrent neural networks may be found in the survey papers [40], [41]. However, the problem of a DBP learning algorithm for discrete-time dynamic neural networks has received little attention in the literature.

In the DBP method, the dynamic network is designed using a dynamic learning process so that each given target vector becomes an equilibrium state of the network. The stability is easily ensured for a standard continuous-time Hopfield
network during the learning process if the synaptic weight matrix is set as a symmetric matrix with zero diagonal elements [3], [4] at each learning instant. Generally speaking, though, the dynamic learning algorithm does not guarantee the asymptotic stability of the equilibrium points, and a checking phase must be added to the learning process by simulating the set of dynamic equations, or else by verifying the stability conditions, after the learning has been completed. If some of the equilibrium points are unstable, the learning process must be repeated using a learning rate that is small enough, which is a very time-consuming process. An interesting topic is thus to develop a systematic way of ensuring the asymptotic stability of the equilibrium points during such a DBP learning procedure.

It is important to note that another issue associated with a stable DBP learning process is the stability criteria of a dynamic neural network. The stability conditions of continuous-time Hopfield neural networks have been extensively studied [24]-[29] during the past few years. Recently, several stability criteria were proposed for discrete-time recurrent networks. A global stability condition for a so-called iterated-map neural network with a symmetric weight matrix was proposed in [34] by Marcus and Westervelt using Lyapunov's function method and eigenvalue analysis, and the condition was used in the associative memory learning algorithms in [35] by Marcus et al. Recently, the stability and bifurcation properties of some simple discrete-time neural networks were analyzed by Blum and Wang [32], [33], and the stability of the fixed points was studied for a class of discrete-time recurrent networks by Li [29] using the norm condition of a matrix; changes in the stable region of the fixed points due to changes in the neuron gain were also obtained. More recently, the problem of the absolute stability of a general class of discrete-time recurrent neural networks was discussed in [30] and [31] by Jin, Nikiforuk, and Gupta, and some absolute stability conditions which are expressed directly in terms of the synaptic weights were derived. From the stability theory point of view, the stability of discrete-time dynamic neural networks can be evaluated using Lyapunov's first or second methods. The stability analysis method using Lyapunov's first method and the well-known Gersgorin theorem [43] will be incorporated to develop a stable DBP learning process in this paper.

A stable equilibrium point learning problem associated with discrete-time dynamic neural networks is studied in this paper using stable dynamic backpropagation (SDBP) algorithms. It is assumed that the analog vector is to be implemented by a steady output vector which is a nonlinear vector function of the network state, and is not directly stored at an equilibrium point of the network. Two new learning algorithms, 1) the multiplier method and 2) the constrained learning rate method, are developed for the purpose of ensuring the stability of the network during the DBP learning phase. The dynamics and the Gersgorin theorem-based global stability conditions for a general class of discrete-time DNN's are discussed in Section II. A conventional DBP algorithm is extended to discrete-time networks which have nonlinear output equations in Section III. In Section IV, the explicit stability conditions are introduced into the weight updating formulations using the multiplier concept which has been used in optimal control theory [42], and a set of stable DBP formulations is constructed by a gradient algorithm with the inequality constraints. The constrained learning rate algorithm is proposed in Section V, where the learning rate is adapted by an equation derived using the stability conditions. The applicability of the approaches is illustrated using examples in Section VI, and some conclusions are given in Section VII.

II. DISCRETE-TIME DYNAMIC NEURAL NETWORKS

A. Network Models

Consider a general form of a dynamic recurrent neural network described by a discrete-time nonlinear system of the form (1), in which the state vector of the dynamic neural network collects the internal states of the individual neurons, a real-valued matrix of synaptic connection weights and a threshold (so-called somatic) vector parameterize the dynamics, and an observation or output vector is obtained from the state through a known continuous differentiable vector-valued function; the state-transition map is likewise a continuous differentiable vector-valued function, and the two are, respectively, bounded and uniformly bounded. The recurrent neural network consists of both feedforward and feedback connections between the layers and neurons, forming complicated dynamics. In fact, each weight represents a synaptic connection parameter between the ith neuron and the jth neuron, and each threshold is associated with a single neuron. Hence, the nonlinear vector-valued function on the right side of system (1) may be represented componentwise as in (2)-(4). Equation (2) indicates that the dynamics of the ith neuron in the network are associated with all the states of the network, the synaptic weights, and the somatic threshold parameter.
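The explicit state and output equations (1) and (2) did not survive the transcription. Purely as an illustrative reconstruction, assuming a Model-I-type sigmoidal network consistent with the definitions above (the symbols x(k), W, theta, y(k), and sigma are introduced here for illustration and may differ from the paper's notation), the system could be written as:

    % Hypothetical sketch of the general model (1) and its output equation;
    % sigma denotes a sigmoidal activation such as tanh.
    x(k+1) = F[x(k), W, \theta], \qquad y(k) = g[x(k)],
    \qquad
    x_i(k+1) = \sigma\Big(\sum_{j=1}^{n} w_{ij}\, x_j(k) + \theta_i\Big),
    \quad i = 1, \dots, n.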

TABLE I
FOUR DISCRETE-TIME DYNAMIC NEURAL MODELS

The four main types of discrete-time dynamic neural models are given in Table I. These neural models describe different dynamic properties owing to their different neural state equations. Models I and II consist of completely nonlinear difference equations; Models III and IV are, however, semi-nonlinear equations which contain linear terms on the right-hand side of the models. In these neural models, the parameters are the synaptic connection weight matrix, the neural gain of each neuron, the time constant (or linear feedback gain) of each neuron, and the threshold at each neuron. The neural activation function may be chosen as a continuous differentiable nonlinear sigmoidal function satisfying the following conditions: 1) it tends to finite limits as its argument tends to plus or minus infinity; 2) it is bounded, with an upper bound and a lower bound; 3) it crosses zero at a unique point; 4) its derivative tends to zero as the argument tends to plus or minus infinity; and 5) its derivative has a global maximal value. Typical examples of such a function include the usual sigmoidal functions and their saturating forms defined through a sign function; all of the above nonlinear activation functions are bounded, monotonic, nondecreasing functions.

B. Gersgorin's Theorem-Based Stability Conditions

The equilibrium points of the system (1) are defined by the nonlinear algebraic equation (5). Without loss of generality, one can assume that there exists at least one solution of (5); that is, the system (1) has at least one equilibrium point. In fact, there exists at least one equilibrium point for every neural model given in Table I. Moreover, one can estimate that the regions of the equilibrium points of Models I-IV are, respectively, n-dimensional hypercubes.

Gersgorin's theorem [43] has often been used to derive stability conditions by several authors [13], [16], [31]. For a known real or complex matrix, Gersgorin's theorem provides an effective approach for determining the positions of the eigenvalues of the matrix. In order to analyze the stability of the equilibrium points of system (1), let the Jacobian of the state-transition function with respect to the state be evaluated at the equilibrium point. Based on Lyapunov's first method, if condition (6) or (7) holds, then the equilibrium point is a locally stable equilibrium point. Furthermore, if the elements of the Jacobian are uniformly bounded, there exist bounding functions such that (8) holds. In this case, if (9) or (10) holds, the equilibrium point is the unique globally stable equilibrium point of the system.
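The inequalities (6)-(10) themselves are not reproduced above. As a minimal sketch of the kind of row-sum test they express, assuming a Model-I-type network x(k+1) = tanh(Wx(k) + theta) whose activation derivative is bounded by one (the function name, the slope bound, and the example values below are illustrative assumptions, not the paper's exact conditions):

    import numpy as np

    def gersgorin_row_sum_stable(W, slope_bound=1.0):
        # Sufficient global-stability test in the spirit of Table II: for a
        # network x(k+1) = sigma(W x(k) + theta) with |sigma'| <= slope_bound,
        # the equilibrium is globally stable if every absolute row sum of W,
        # scaled by the slope bound, stays below one.
        row_sums = np.sum(np.abs(W), axis=1)
        return bool(np.all(slope_bound * row_sums < 1.0))

    # Usage: rescale a random 4-neuron weight matrix until the test passes.
    rng = np.random.default_rng(0)
    W = rng.uniform(-1.0, 1.0, size=(4, 4))
    W *= 0.9 / np.max(np.sum(np.abs(W), axis=1))
    print(gersgorin_row_sum_stable(W))   # True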

TABLE II
GLOBAL STABILITY CONDITIONS OF THE NEURAL MODELS IN TABLE I

Using the uniformly bounded property of the derivative, the global stability conditions of the neural models shown in Table I are summarized in Table II. If the threshold is treated as an external input to each neuron in these neural models, the stability conditions given in Table II are called absolute stability conditions [24], [31], because the threshold is not involved in the stability conditions. In the later sections, these stability conditions will be incorporated to develop the stable dynamic learning algorithms.

III. GENERALIZED DBP LEARNING ALGORITHM

The DBP algorithm for a class of continuous-time recurrent neural networks was first proposed by Pineda [1], [2]. A DBP learning algorithm for a general class of dynamic neural systems with nonlinear output equations will be developed in this section for the purpose of analog target pattern storage. Let an analog target pattern be given which is desired to be implemented by a steady-state output vector, a nonlinear vector function of an equilibrium point of the neural system (1). The purpose of the learning procedure is to adjust the synaptic weights and the somatic threshold parameters such that the target can be realized by this nonlinear function. Define an error function as in (11). Next, we discuss the learning formulations of the synaptic weights and somatic parameters. After performing gradient descent on the error function, the incremental change term of the weights is given as (12), where the coefficient is a learning rate associated with the synaptic weights. On the other hand, for the somatic parameter, the incremental formulation is given as (13), where the coefficient is a learning rate associated with the somatic parameters. The incremental terms themselves are derived in the Appendix; see (14) and (15).
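Equations (11)-(13) are not reproduced above. A standard form consistent with this description (written here with illustrative symbols d for the target, x^f for the equilibrium state, g for the output map, and eta_w, eta_theta for the learning rates; none of these are guaranteed to match the paper's notation) would be:

    % Hypothetical sketch of the error index and gradient-descent increments.
    E = \tfrac{1}{2}\,\bigl\| d - g(x^{f}) \bigr\|^{2}, \qquad
    \Delta w_{ij} = -\eta_{w}\,\frac{\partial E}{\partial w_{ij}}, \qquad
    \Delta \theta_{i} = -\eta_{\theta}\,\frac{\partial E}{\partial \theta_{i}}.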

TABLE III
DYNAMIC BACKPROPAGATION (DBP) LEARNING ALGORITHM

Equation (16) is said to be the steady adjoint equation. Hence, the synaptic and somatic learning formulations are obtained as (17)-(20). In fact, the steady states involved are the equilibrium points of the dynamic systems (21) and (22), respectively, and (22) is said to be the adjoint equation associated with (21). The updating rules (17) and (18) are not able to guarantee the stability of both systems (21) and (22), so a checking procedure for the stability of both (21) and (22) is needed in such a dynamic learning process.

Two primary approaches may be used to check the stability of the network during a dynamic learning process. In the first approach, verification of the stability condition of the equilibrium is carried out after the whole dynamic learning process has been completed. In this case, if the network is unstable, the learning phase must be repeated, and the steady states must be solved from the nonlinear algebraic equations (19) and (20) at each iterative instant. In the second approach, the stability of the network is verified at each iterative instant. When the network is unstable, in other words, when (19) and (20) do not converge to stable equilibrium points, the iterative process needs to be repeated with adjusted learning rates until the solutions of (19) and (20) converge to the stable equilibrium points as time becomes large. Both of these methods for stability studies are very time consuming.

For the neural models given in Table I, the DBP learning algorithms for the synaptic weights and the thresholds are, respectively, derived in Table III. The steady adjoint equations corresponding to the neural models are also given in Table III.
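Since the explicit formulations (14)-(22) and the entries of Table III are not reproduced, the following Python sketch only illustrates the overall structure of one such DBP iteration, assuming a Model-I-type tanh network with an identity output map (all function and variable names here are illustrative, not the paper's); the linear solve plays the role of the steady adjoint equation described above.

    import numpy as np

    def steady_state(W, theta, iters=500):
        # Iterate the network map to its fixed point; this presumes the
        # stability conditions of Section II hold so the iteration converges.
        x = np.zeros(len(theta))
        for _ in range(iters):
            x = np.tanh(W @ x + theta)
        return x

    def dbp_step(W, theta, d, lr=0.05):
        # One generalized-DBP update: solve for the equilibrium, solve the
        # adjoint (sensitivity) system, then apply gradient descent to the
        # synaptic weights W and somatic thresholds theta.
        x = steady_state(W, theta)
        D = np.diag(1.0 - x ** 2)             # activation slope at the fixed point
        A = np.eye(len(x)) - D @ W            # linearized fixed-point operator
        q = D @ np.linalg.solve(A.T, x - d)   # adjoint-style variable
        return W - lr * np.outer(q, x), theta - lr * q, 0.5 * np.sum((d - x) ** 2)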

IV. STABLE DBP LEARNING: MULTIPLIER METHOD

The learning algorithm given in the last section performs a storage process in which the target pattern is stored at an equilibrium point of the neural system (1). In fact, the target pattern is desired to be stored at a stable equilibrium point of the neural system (1); that is, after finishing the learning procedure, the synaptic weights and the somatic parameters should satisfy either one of the local stability conditions (6) and (7) or one of the global stability conditions (9) and (10), so that the stored pattern is a locally or globally stable equilibrium point. Toward this goal, one may require that the systems (21) and (22) be stable at each learning instant, or that, after finishing the dynamic learning procedure, the neural system with the learned synaptic and somatic parameters be stable in the sense of Lyapunov.

The multiplier method was used effectively to deal with inequality constraints on functions of the control and state variables in optimal control by Bryson and Ho [42]. In dynamic optimal control problems, since the multipliers associated with the inequalities appear in the augmented Hamiltonian, which is similar to the error index in the neural parameter learning process, the difficulties due to the inequality constraints on both the control and state variables are overcome successfully. For the sake of simplicity, only the row-sum local and global stability conditions are addressed in the stable DBP learning algorithms of this paper. The multiplier method will now be used to develop the stable learning algorithm for the neural system (1).

A. Local Stable DBP (LSDBP) Learning

Let the local stability condition (6) be incorporated so as to develop the local stable DBP (LSDBP) learning algorithm, which guarantees the local stability of the equilibrium at the end of the learning. It is seen that the original neural equation (21) and the corresponding adjoint equation (22) have the same Jacobian; the local and global stability conditions (6), (7) and (9), (10) of system (21) are thus sufficient conditions for the local and global stability of the adjoint system (22), respectively. Hence, in order to guarantee the stability of the equilibrium point which is used to represent the desired analog vector, the stability condition (6) may be considered as a set of inequality constraints on the synaptic weights and the somatic parameters. Using the multiplier method for such constrained neural learning, an augmented learning error index is defined as in (23), where the second term on the right side will guarantee that the equilibrium point becomes locally asymptotically stable as the iterative time increases. In other words, the introduction of the second term in the error index will force the trained system to satisfy the local stability condition. The multiplier is required to be positive while the corresponding stability constraint is violated and zero otherwise. Hence, the incremental term of the weights is derived from (23) as (24) and (25), and the partial derivatives are obtained from the Appendix as (26) and (27). The main difficulty with this algorithm is associated with computing the required matrix at each iterative instant; as the number of neurons becomes large, this computation becomes very time consuming. This shortcoming can be avoided using the global stable fixed-point learning algorithm, which will be described now.

B. Global Stable DBP (GSDBP) Learning

It is to be noted that the global stability conditions (9) and (10) of the neural system (21) are also the global stability conditions of the adjoint system (22). If the steady states are to be guaranteed to be the globally stable equilibrium points of the systems (21) and (22), respectively, after the learning process, the global stability condition (9) may then be used in the augmented error index; that is, as in (28), where the second term on the right side of the equation will guarantee that the trained system satisfies the global stability condition (9) as the iterative time increases.

TABLE IV
LSDBP LEARNING ALGORITHMS WITH MULTIPLIERS

The multiplier again satisfies the additional requirement that it is positive while the stability condition is violated and zero otherwise. Hence, the incremental terms of the synaptic and somatic parameters are, respectively, derived from (28) as (29)-(32). Using the formulations obtained in the last section, the stable synaptic and somatic learning algorithms are described by the corresponding updating equations, in which the equilibrium states are determined by (19) and (20), and the last terms in the updating equations are said to be the additional incremental terms due to the global stability condition (9). Since the stability of the network is only gradually ensured in this algorithm as the iterative time increases, the equilibrium states have to be solved from the nonlinear algebraic equations (19) and (20) at each iterative instant. Based on the global stability conditions shown in Table II, the additional incremental terms for the neural models given in Table I are presented in Table IV, where only the row-sum conditions are considered in the derivations. Since the global stability conditions are independent of the threshold for the networks given in Table I, the updating formulations of the threshold derived in the last section remain unchanged. It is obvious that the computational requirement of the GSDBP learning algorithm is not significantly increased compared with that of the conventional DBP algorithm.
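Because the expressions (23)-(32) and the entries of Table IV are not reproduced, the following Python fragment is only a minimal sketch of the extra gradient term such a multiplier introduces, assuming a Model-I-type network and the row-sum condition of Section II (the multiplier value, margin, and slope bound are illustrative assumptions):

    import numpy as np

    def multiplier_penalty_grad(W, mu=0.1, margin=0.05, slope_bound=1.0):
        # Additional incremental term in the spirit of the GSDBP multiplier
        # method: the multiplier of row i is switched on only while that row
        # violates the row-sum stability condition, so the penalty vanishes
        # once the trained network is stable.
        row_sums = slope_bound * np.sum(np.abs(W), axis=1)
        active = (row_sums >= 1.0 - margin).astype(float)
        return mu * slope_bound * active[:, None] * np.sign(W)

    # Usage inside a DBP iteration (dE_dW is the ordinary gradient):
    #   W -= learning_rate * (dE_dW + multiplier_penalty_grad(W))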

V. CONSTRAINED LEARNING RATE METHOD

The learning rate in the DBP algorithm not only plays an important role in the convergence of the updating process, but it also influences the stability of the network. During a dynamic learning process, if the learning rate is too large, the network may become unstable, while if the learning rate is too small, the convergence of the learning phase will be too slow to be practical. Indeed, because the incremental terms of the weights are time-varying during the learning process, the scale of a suitable learning rate needs to be adjusted at each learning instant so that the stability of the system is ensured. In fact, this is the reason that, if a fixed learning rate is used, an unstable situation may occur in the conventional DBP learning. For the purpose of stability, a criterion for determining an adaptive learning rate using the known stability conditions of the network at each iterative instant will therefore be considered in this section, using the DBP formulations given in Section III.

A. LSDBP Learning

First, an LSDBP learning algorithm will be considered using the local stability condition. Let the equilibrium point of system (1) be stable at the iterative time k, and let the row-sum local stability condition (6) be satisfied; that is, (33). In order to determine the learning rates such that the stability condition (6) is also satisfied at the iterative time k + 1, it can be assumed that (34) holds, where the incremental terms used in the last section are modified using the time-varying learning rates as in (35)-(37) and (42). For convenience, the iterative time is not shown in the following derivation. Using a Taylor series expansion to first-order terms of the increments, (34) may be represented as (38). Furthermore, based on the equilibrium point equation (5), (39) follows, and the state increment may be solved as (40); substituting this result into (38) yields (41). With the quantities defined in (43) and (45), the fixed point at time k + 1 remains locally stable if (44) holds, and the stable learning rate is obtained as (46). Obviously, the computational complexity of the above learning algorithm arises from the computation of a matrix inverse at each iterative time.
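The closed-form learning rate bounds (33)-(46) and Table V are not reproduced here. As a minimal sketch of the underlying idea only, shrink the rate until the updated weights still satisfy a chosen row-sum stability condition, under the same Model-I assumptions as before; the paper instead derives the admissible rate in closed form from a first-order expansion, so this backtracking loop is only a stand-in.

    import numpy as np

    def constrained_learning_rate(W, dE_dW, eta0=0.1, slope_bound=1.0, margin=1e-3):
        # Backtracking stand-in for the constrained learning rate: halve the
        # nominal rate eta0 until W - eta * dE_dW keeps every scaled absolute
        # row sum below one, so the updated network stays stable.
        eta = eta0
        while eta > 1e-8:
            W_next = W - eta * dE_dW
            if np.all(slope_bound * np.sum(np.abs(W_next), axis=1) < 1.0 - margin):
                return eta
            eta *= 0.5
        return 0.0   # no admissible rate found at this resolution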

B. Global Stable DBP (GSDBP) Learning

If the equilibrium point of the network at each iterative instant is required to be globally stable, another constraint formulation of the learning rate may then be derived using the global stability condition. In Section II, one of the absolute stability conditions was given as (47).

TABLE V
GSDBP LEARNING WITH CONSTRAINED LEARNING RATE ALGORITHM

Let the system at the iterative time k satisfy the above condition; that is, (48). Consider the stability of the system at the iterative time k + 1, and let (49) hold, where the increments are given by (36) and (37). Expanding the terms on the left side of the above inequality to first-order incremental terms produces (50). It is to be noted that the iterative time is omitted above and in the following formulations. Moreover, if both the synaptic learning and the somatic learning have the same learning rate, then the stable learning rate at instant k + 1 is obtained as (51). The computational formulations of the stable learning rates for the neural models given in Table I are summarized in Table V. Since the threshold is not involved in the stability conditions given in Table II, the learning rate algorithms given in Table V are independent of the threshold. For each neural model, it is easy to see that the constraint equation of the learning rate has a simple form; the computation of the learning rate can therefore be easily implemented, together with the DBP routines given in Section III, at each iterative instant.

VI. IMPLEMENTATION EXAMPLES

Example 1 (Analog Vector Storage): In this example, the proposed global stable learning algorithms are used to study an analog vector storage procedure. Let a two-dimensional analog vector storage problem be considered using a two-neuron model of the form (52). The target pattern is realized by a steady output vector whose components are nonlinear functions of an equilibrium state of the network. The GSDBP algorithms with the multiplier and constrained learning rate methods are used to carry out the storage process. In this example, the four possible desired equilibrium states can easily be obtained from the known target vector and the output equations. Hence, the target analog vector may be implemented by any one of these equilibrium states of the network; this shows that the storage capacity of a dynamic network may be increased by introducing suitable nonlinear output equations. The initial values of the weights and thresholds were chosen randomly in the interval [-0.5, 0.5], and an initial learning rate was selected. Using the multiplier method presented in Section IV, a dynamic learning process was then achieved after a finite number of learning steps, so that the analog vector was realized by an equilibrium point; the weights and thresholds were obtained at the end of the learning. Similar to the well-known static BP learning for feedforward networks, the computational procedure shows that a suitable choice of the values of the learning rates and the multipliers in
the multiplier method is the first step for a successful dynamic learning process. On the other hand, using the constrained learning rate algorithm, the analog vector was perfectly realized by an equilibrium state, and the set of weights and thresholds was computed at the end of the corresponding total learning time. The decrement curves of the error indexes are given in Fig. 1 for the two dynamic learning processes, and the trajectories of the equilibrium points at each iterative instant are depicted in Figs. 2 and 3, respectively.

Fig. 1. The error index curves during both the multiplier and constrained learning rate procedures.
Fig. 2. The phase-plane diagram of the equilibrium point of the network during the multiplier learning; the globally stable equilibrium point is x_1^f = 0.70705 and x_2^f = 0.54772.
Fig. 3. The phase-plane diagram of the equilibrium point of the network during the constrained learning rate learning; the globally stable equilibrium point is x_1^f = -0.70714 and x_2^f = 1.22757.

The results show that even though the constrained learning rate method required more iterations than the multiplier method, the computational time of the former was less than that of the latter. The reason is that, in the multiplier method, the equilibrium points of the network and the adjoint equation had to be solved from a set of nonlinear algebraic equations at each iterative instant because of the instability of the network during the initial iterative steps. In the constrained learning rate method, however, the equilibrium points may be found through a simple iterative procedure for the dynamic equations of the network and the adjoint system with the fixed weights and thresholds, because both the network and the adjoint equations are stable at each iterative instant. Indeed, the learning rate may take a very small value at some iterative instants for the purpose of global stability of the network, so the convergence speed may become somewhat slow in the constrained learning rate algorithm; this drawback appeared in these simulation studies.

Example 2 (Binary Image Storage): A binary image pattern storage process is discussed in this example to illustrate the applicability of the algorithms, where the target pattern is a 10 x 10 binary image as shown in Fig. 4(a). The equations of the dynamic network have the form (53), where double subscripts are introduced to represent the two-dimensional binary image: each neuron state corresponds to one image unit, each weight connects a pair of such neurons, and each neuron has a threshold. The neural activation function was chosen to be sigmoidal. In this example, the binary target pattern is desired to be stored directly at a stable equilibrium point of the network, and the global stability conditions of the neural model (53) may be obtained in row-sum or column-sum form. Let the GSDBP algorithms be used to deal with this problem. For the initial stability of the network, all the iterative initial values of the weights and thresholds were chosen
Fig. 4. The binary patterns corresponding to the equilibrium point of the network (53) during the learning process using the multiplier method: (a) the target pattern; (b) k = 0; (c) k = 50; (d) k = 100; (e) k = 150; (f) k = 200; (g) k = 250; (h) k = 300.

randomly in the interval [-0.001, 0.001], the same initial learning rate was selected in both the multiplier and constrained learning rate algorithms, and fixed multipliers were selected in the multiplier method. In order to observe the change of the binary image corresponding to the equilibrium state of the network during the learning process, a pattern operator was employed to represent the binary image at each iterative instant, assigning each pixel one of the two binary values according to the sign of the corresponding steady state. Using the multiplier method, the dynamic learning process was completed after a finite number of iterations, and the 10 x 10 binary pattern was perfectly stored at a globally stable equilibrium point of the network. The binary images recovered at several iterative instants from the analog state vector using this instant pattern operator are depicted in Fig. 4, and the error index curve during the dynamic learning process is given in Fig. 6. The results obtained in Figs. 4 and 6 show that the multiplier method has satisfactory convergence, even though a set of high-dimensional nonlinear algebraic equations must be solved at each iterative instant. The binary image pattern was also perfectly stored at a globally stable equilibrium point using the constrained learning rate method. The changing procedure of the binary image corresponding to the steady-state vector of the network is shown in Fig. 5, and the error index curve given in Fig. 6 shows the converging procedure of both the weight and threshold learning processes. The total computational time of the learning process using the constrained learning rate method was much smaller than that of the learning process using the multiplier method. The results show that the GSDBP learning algorithms can be used effectively for large-scale pattern storage problems.

Example 3 (Gray-Scale Image Storage): A 32 x 32 girl image with 256 gray scales, given in Fig. 7, is used to show the effectiveness of the SDBP algorithms developed in this paper for the purpose of associative memory. It is easy to show that a fully connected neural network would need 1 048 576 synaptic weights (1024 x 1024) in order to store such an image. To reduce the computational requirement, the weight matrix is simplified as in (54), and a simplified version of the neural network is given by (55). The neural structure represented by (55) can be viewed as a multilayered neural network with two visible layers and several hidden layers, where each layer has the same number of neural units. It is assumed here that the multilayered structure (55) has two visible layers and six hidden layers, that each of the eight layers has 128 neural units, and that the network therefore contains 1024 neural units in total. It is easy to show that this eight-layered structure involves only 131 072 synaptic weights, which is only 12.5 percent of the one-layered structure. The significant reduction in the number of connection weights may reduce the computational complexity, such as the computing time and the memory requirement, of the weight learning. For associative memory synthesis, the simulation results indicate that the computing time for the multilayered structure using the algorithm presented in this paper is only 5% of that for a single-layer Hopfield network using the pseudoinverse algorithm.
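As a quick check of the parameter counts quoted above (the counts follow from the 32 x 32 image and the 128-unit layers; the grouping into eight 128 x 128 weight blocks mirrors the number quoted in the text):

    # Parameter-count check for Example 3.
    units = 32 * 32                            # 1024 neurons for a 32 x 32 image
    full_weights = units * units               # 1 048 576 weights, fully connected
    blocks, per_layer = 8, 128                 # eight 128 x 128 weight blocks
    layered_weights = blocks * per_layer ** 2  # 131 072 weights
    print(layered_weights, layered_weights / full_weights)   # 131072, 0.125 (12.5%)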
Three blurred girl images, given in Figs. 8(a), 9(a), and 10(a), were distorted, respectively, by removing two parts of the original image, by a 1 x 10 motion blur without noise, and by adding white Gaussian noise with a 0-dB signal-to-noise ratio (SNR). These image patterns were input, respectively, to the multilayered network to test the recall capability of the associative memory. The recalled results are given, respectively,
in Figs. 8(b), 9(b), and 10(b).

Fig. 5. The binary patterns corresponding to the equilibrium point of the network (53) during the learning process using the constrained learning rate method: (a) the target pattern; (b) k = 0; (c) k = 50; (d) k = 100; (e) k = 150; (f) k = 200; (g) k = 250; (h) k = 300; (i) k = 350; (j) k = 400.
Fig. 6. The error index curves in Example 2 during both the multiplier and constrained learning rate learning procedures.
Fig. 7. Original girl image pattern (32 x 32).
Fig. 8. (a) The girl image with two parts removed; (b) recalled image.
Fig. 9. (a) The girl image with 1 x 10 motion; (b) recalled image.
Fig. 10. (a) The girl image with white Gaussian noise; (b) recalled image.

It is seen that the distributed associative memory was able to perfectly recall the stored image when presented with the blurred images given in Figs. 8(a) and 9(a); however, the noise reduction capability of such a memory structure is somewhat poor, as shown in Fig. 10(b).

VII. CONCLUSION

The conventional DBP algorithm, which has been used exclusively for the adjustment of the parameters of dynamic neural networks, is extended in this paper using two new stable learning concepts, the multiplier and the constrained learning rate methods. They are proposed for the purposes of LSDBP learning and GSDBP learning. These dynamic learning schemes make use of the conventional DBP version proposed by Pineda [1], [2] together with some additional formulations which are due to the known stability conditions of the network. The standard DBP routines can therefore still be used, and the stability of the network is ensured after the dynamic learning has been completed. It is important to note that the
computational requirement of the GSDBP learning algorithms is not significantly increased as compared with that of the conventional DBP algorithm. The effectiveness of the proposed schemes was tested using both analog and binary pattern memory problems. The nonlinear dynamic neural models discussed in this paper have the potential for application to image restoration, adaptive control, and visual processing. Since the multiplier and constrained learning rate formulations are derived based on the explicit expressions of the network, a better estimation of the stability condition of a dynamic network may be of benefit to the learning algorithms. The sets of Gersgorin's theorem-based stability conditions for a general class of dynamic networks are applied to design the learning algorithms in this paper.

APPENDIX

In this Appendix, the procedure for deriving (14)-(16) is given. Based on the equilibrium point equation (5), the partial derivative of the equilibrium state with respect to a synaptic weight results in the expression (56), in which the Kronecker delta function appears. Furthermore, let a new variable be introduced as in (57); then (56) can be represented as (58). For convenience, let a matrix be introduced whose elements are defined by (59) and (60), and let its inverse be available. The state sensitivity may then be solved as (61), and hence the incremental changes in the weights and thresholds are expressed as (62) and (63). It can be shown, furthermore, that these quantities satisfy (64)-(66), so that (62) and (63) can finally be represented by (67) and (68).

REFERENCES

[1] F. J. Pineda, "Generalization of backpropagation to recurrent neural networks," Phys. Rev. Lett., vol. 59, no. 19, pp. 2229-2232, 1987.
[2] F. J. Pineda, "Dynamics and architecture for neural computation," J. Complexity, vol. 4, pp. 216-245, 1988.
[3] J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," in Proc. Nat. Academy Sci. USA, vol. 79, 1982, pp. 2554-2558.
[4] J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," in Proc. Nat. Academy Sci. USA, vol. 81, 1984, pp. 3088-3092.
[5] L. B. Almeida, "A learning rule for asynchronous perceptrons with feedback in a combinatorial environment," in Proc. IEEE 1st Conf. Neural Networks, San Diego, CA, June 21-24, 1987, vol. II, pp. 609-618.
[6] S. Amari, "Learning patterns and pattern sequences by self-organizing nets of threshold elements," IEEE Trans. Comput., vol. C-21, pp. 1197-1206, Nov. 1972.
[7] S. Amari, "Neural theory of association and concept formation," Biol. Cybern., vol. 26, pp. 175-185, 1977.
[8] M. A. Arbib, Brains, Machines, and Mathematics. New York: McGraw-Hill, 1988.
[9] T. Kohonen, Associative Memory: A System Theoretical Approach. New York: Springer-Verlag, 1977.
[10] B. Kosko, "Bidirectional associative memories," IEEE Trans. Syst., Man, Cybern., vol. SMC-18, pp. 42-60, 1988.
[11] Y. Chauvin, "Dynamic behavior of constrained backpropagation networks," in Advances in Neural Information Processing Systems, D. S. Touretzky, Ed., vol. 2. San Mateo, CA: Morgan Kaufmann, 1990, pp. 519-526.
[12] A. Guez, V. Protopopescu, and J. Barhen, "On the stability, storage capacity, and design of continuous nonlinear neural networks," IEEE Trans. Syst., Man, Cybern., vol. 18, pp. 80-87, Jan./Feb. 1988.
[13] S. I. Sudharsanan and M. K. Sundareshan, "Equilibrium characterization of dynamical neural networks and a systematic synthesis procedure for associative memories," IEEE Trans. Neural Networks, vol. 2, pp. 509-521, Sept. 1991.
[14] R. Kamimura, "Activated hidden connections to accelerate the learning in recurrent neural networks," in Proc. Int. Joint Conf. Neural Networks (IJCNN), 1992, pp. I-693-700.
[15] R. Tawel, "Nonlinear functional approximation with networks using adaptive neurons," in Proc. Int. Joint Conf. Neural Networks (IJCNN), 1992, pp. III-491-496.

[16] A. Atiya and Y. S. Abu-Mostafa, "An analog feedback associative memory," IEEE Trans. Neural Networks, vol. 4, no. 1, pp. 117-126, Jan. 1993.
[17] Y. S. Abu-Mostafa and J.-M. St. Jacques, "Information capacity of the Hopfield model," IEEE Trans. Inform. Theory, vol. IT-31, pp. 461-464, 1984.
[18] M. Morita, "Associative memory with nonmonotone dynamics," Neural Networks, vol. 6, no. 1, pp. 115-126, 1993.
[19] A. H. Gee, S. V. B. Aiyer, and R. W. Prager, "An analytical framework for optimizing neural networks," Neural Networks, vol. 6, no. 1, pp. 79-98, 1993.
[20] S. V. B. Aiyer, M. Niranjan, and F. Fallside, "A theoretical investigation into the performance of the Hopfield model," IEEE Trans. Neural Networks, vol. 1, pp. 204-215, 1990.
[21] J. Farrell and A. Michel, "A synthesis procedure for Hopfield's continuous-time associative memory," IEEE Trans. Circuits Syst., vol. 37, pp. 877-884, 1990.
[22] A. Michel and J. Farrell, "Associative memories via artificial neural networks," IEEE Contr. Syst. Mag., pp. 6-17, Apr. 1990.
[23] S. Grossberg, "Nonlinear neural networks: Principles, mechanisms, and architectures," Neural Networks, vol. 1, no. 1, pp. 17-61, 1988.
[24] M. A. Cohen and S. Grossberg, "Absolute stability of global pattern formation and parallel memory storage by competitive neural networks," IEEE Trans. Syst., Man, Cybern., vol. SMC-13, pp. 815-826, 1983.
[25] A. Guez, V. Protopopescu, and J. Barhen, "On the stability, storage capacity, and design of nonlinear continuous neural networks," IEEE Trans. Syst., Man, Cybern., vol. SMC-18, pp. 80-87, 1988.
[26] D. G. Kelly, "Stability in contractive nonlinear neural networks," IEEE Trans. Biomed. Eng., vol. 37, pp. 231-242, 1990.
[27] J. A. Anderson, J. W. Silverstein, S. A. Ritz, and R. S. Jones, "Distinctive features, categorical perception, and probability learning: Some applications of a neural model," in Neurocomputing: Foundations of Research, J. A. Anderson and E. Rosenfeld, Eds. Cambridge, MA: MIT Press, 1988.
[28] A. N. Michel, J. Si, and G. Yen, "Analysis and synthesis of a class of discrete-time neural networks described on hypercubes," IEEE Trans. Neural Networks, vol. 2, pp. 32-46, 1991.
[29] L. K. Li, "Fixed point analysis for discrete-time recurrent neural networks," in Proc. IJCNN, June 1992, vol. IV, pp. 134-139.
[30] L. Jin, P. N. Nikiforuk, and M. M. Gupta, "Absolute stability conditions for discrete-time recurrent neural networks," IEEE Trans. Neural Networks, vol. 5, pp. 954-964, 1994.
[31] L. Jin and M. M. Gupta, "Globally asymptotical stability of discrete-time analog neural networks," IEEE Trans. Neural Networks, vol. 7, pp. 1024-1031, 1996.
[32] L. Jin and M. M. Gupta, "Equilibrium capacity of analog feedback neural networks," IEEE Trans. Neural Networks, vol. 7, pp. 782-787, 1996.
[33] E. K. Blum and X. Wang, "Stability of fixed points and periodic orbits and bifurcations in analog neural networks," Neural Networks, vol. 5, no. 4, pp. 577-587, 1992.
[34] C. M. Marcus and R. M. Westervelt, "Dynamics of iterated-map neural networks," Phys. Rev. A, vol. 40, no. 1, pp. 577-587, 1989.
[35] C. M. Marcus, F. R. Waugh, and R. M. Westervelt, "Associative memory in an analog iterated-map neural network," Phys. Rev. A, vol. 41, no. 6, pp. 3355-3364, 1990.
[36] D. E. Rumelhart and J. L. McClelland, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. Cambridge, MA: MIT Press, 1986.
[37] R. Hecht-Nielsen, "Theory of the backpropagation neural network," in Proc. Int. Joint Conf. Neural Networks, June 1989, pp. I-593-605.
[38] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Trans. Neural Networks, vol. 1, pp. 4-27, Mar. 1990.
[39] K. S. Narendra and K. Parthasarathy, "Gradient methods for the optimization of dynamical systems containing neural networks," IEEE Trans. Neural Networks, vol. 2, pp. 252-262, 1991.
[40] K. J. Hunt, D. Sbarbaro, R. Zbikowski, and P. J. Gawthrop, "Neural networks for control systems: A survey," Automatica, vol. 28, no. 6, pp. 1083-1112, 1992.
[41] D. R. Hush and B. G. Horne, "Progress in supervised neural networks: What's new since Lippmann?," IEEE Signal Processing Mag., no. 1, pp. 8-39, Jan. 1993.
[42] A. E. Bryson and Y. C. Ho, Applied Optimal Control. New York: Blaisdell, 1969.
[43] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1985.

Liang Jin received the B.S. and M.Sc. degrees in electrical engineering from the Changsha Institute of Technology, China, in 1982 and 1985, respectively. He received the Ph.D. degree in electrical engineering from the Chinese Academy of Space Technology, China, in 1989.
From 1989 to 1991, he was a Research Scientist of the Alexander von Humboldt (AvH) Foundation at the University of the Bundeswehr, Munich, Germany. From 1991 to 1995, he was a Research Scientist in the Intelligent Systems Research Laboratory at the University of Saskatchewan, Saskatoon, Canada. He was with SED Systems Inc. in Saskatoon, Canada, from 1995 to 1996 as a Design Engineer. From 1996 to 1999, he was a Member of Scientific Staff at Nortel Networks in Ottawa, Canada. He has been with the Microelectronics Group of Lucent Technologies, Allentown, PA, since 1999 as a Member of Technical Staff. He has published more than 30 conference and journal papers in the areas of neural networks, digital signal processing, and control and communication systems. He holds four U.S. patents (pending). His current research interests include intelligent information systems, digital signal processing with its applications to wireless communications, and neural networks and their applications to communication and control systems.

Madan M. Gupta (M'63-SM'76-F'90) received the B.Eng. (Hons.) and M.Sc. degrees in electronics-communications engineering from the Birla Engineering College (now the BITS), Pilani, India, in 1961 and 1962, respectively. He received the Ph.D. degree from the University of Warwick, U.K., in 1967 in adaptive control systems. In 1998, he received an earned Doctor of Science (D.Sc.) degree from the University of Saskatchewan, Canada, for his research in the fields of adaptive control systems, neural networks, fuzzy logic, neuro-control systems, neuro-vision systems, and the early detection and diagnosis of cardiac ischemic disease.
He is currently Professor of Engineering and the Director of the Intelligent Systems Research Laboratory and the Centre of Excellence on Neuro-Vision Research at the University of Saskatchewan, Canada. In addition to publishing over 650 research papers, he has coauthored two books on fuzzy logic (with Japanese translations) and has edited 21 volumes in the fields of adaptive control systems, fuzzy logic/computing, neuro-vision, and neuro-control systems. His present research interests have expanded to the areas of neuro-vision, neuro-controls, and the integration of fuzzy-neural systems, neuronal morphology of biological vision systems, intelligent and cognitive robotic systems, cognitive information, new paradigms in information processing, and chaos in neural systems. He is also developing new architectures of computational neural networks (CNN's) and computational fuzzy neural networks (CFNN's) for applications to advanced robotic systems.
Dr. Gupta has served the engineering community worldwide in various capacities through societies such as IFSA, IFAC, SPIE, NAFIP, UN, CANS-FINS, and ISUMA. He is a Fellow of the SPIE. In June 1998, he was honored by the award of the prestigious Kaufmann Prize and Gold Medal for Research into Fuzzy Logic. He has been elected as a Visiting Professor and a Special Advisor in the areas of high technology to the European Centre for Peace and Development (ECPD), University for Peace, which was established by the United Nations.