Input-Output Stability of Recurrent Neural Networks with Delays using Circle Criteria

Jochen J. Steil and Helge Ritter, University of Bielefeld, Faculty of Technology, Neuroinformatics Group, P.O.-Box 3, D-335 Bielefeld, Germany, {jsteil,helge}@techfak.uni-bielefeld.de

Abstract

We present a frequency domain analysis of additive recurrent neural networks based on the passivity approach to input-output stability. For normal weight matrices we apply graphical Circle Criteria, which result in effectively computable stability bounds, including for systems with delay. Approximation techniques yield a further generalisation to arbitrary matrices.

Keywords: recurrent neural network, input-output stability, delay, circle criteria.

1 Introduction

One strong motivation for research on recurrent neural network (RNN) models is their capability to model arbitrary temporal behaviour. Because a number of learning procedures are available which incrementally adapt an RNN to perform a desired transform from time-varying inputs to time-varying outputs [11], recurrent networks are found in a number of application areas and are frequently used as components in larger systems [3, 9]. In this setting we regard a network as an operator acting on inputs, and a basic requirement is then to avoid unbounded responses. To assure this we develop criteria for input-output stability, i.e., we bound the function norm of the output relative to the input norm, where input and output functions are usually taken from the space $L_2$ of square integrable functions. Thus the concept of input-output stability refers to the whole time development of the inputs and outputs and not to internal states of the system [1, 10, 6, 16]. Regarding a neural network as an input-output system is natural from the point of view of technical application; on the other hand it is artificial, because neural network models are by definition given in state space, with states corresponding to the activity of formal neurons.
The dynamics of the network are given by a set of differential equations, and we can use Lyapunov methods to find a globally asymptotically stable (GAS) equilibrium in the state space [2, 4, 7, 8]. However, there is no contradiction between the two approaches; they use different mathematical methods to address different aspects of the system's stability behaviour. The Laplace transform allows us to generate an input-output formulation of a state space model, and it is possible to use equivalence theorems to switch between the two concepts [2, 6]. Therefore we believe that the introduction of input-output methods from system theory can enrich recurrent network stability theory. We take this approach especially because it integrates un-delayed and delayed systems in one common framework, yields graphical Circle Criteria for stability conditions, and can easily be applied to time-varying systems as well. Time-varying systems occur in RNN theory if on-line adaptation of the weight matrix introduces uncertain, changing parameters. To perform such an adaptation it is of great interest to know weight ranges in which networks operate stably despite weight changes within that range. Including this aspect, our problem can be stated as a classical absolute stability problem for recurrent networks: Find a class of nonlinear transfer functions and a set of weight matrices such that, for all choices of the transfer function within the class and all weight matrices within the set, the resulting network is input-output stable.

In Section 2 we derive the input-output formulation for the RNN model and its frequency domain description. In Sections 3 and 4 we introduce the basic tools from nonlinear feedback system theory, and in Section 5 we apply them to simple and delayed RNN models. In Section 6 we highlight the connections to Lyapunov theory, in Section 7 we give an illustrative example, and finally we discuss our results.

Figure 1. The RNN as feedback circuit with linear forward path $G$ and sector bounded feedback $\varphi$.

2 The input-output framework

We derive an input-output formulation for the type of networks with state space equation
$$\dot{x} = -x + W\varphi(x) + \tilde{u}, \qquad (1)$$
where $W \in \mathbb{R}^{n \times n}$ is the weight matrix and $\varphi$ the vector of nonlinear transfer functions. The aim is to rewrite the system (1) as a nonlinear feedback circuit $(G, \varphi)$ of the type shown in Fig. 1. The forward path in Fig. 1 consists of a linear operator $G$ and is given either in the time domain by a convolution kernel $G$ or in the frequency domain as transfer function $G(s)$. The frequency function $G(s)$ can be found directly from (1) using the Laplace transform,
$$y(s) = x(s) = -\frac{1}{s + 1} W e(s) = G(s) e(s), \qquad (2)$$
where $e = u - \varphi(x)$ and $u = -W^{-1}\tilde{u}$; on the imaginary axis $s = i\omega$. We denote the output by $y = x$ for compatibility with the usual control theory notation. To obtain the time domain kernel $G$, which is the impulse response function describing the system's behaviour for a delta peak at $t = 0$ as input, we apply the inverse Laplace transform $G(t) = \mathcal{L}^{-1}(G(s))$ and get a corresponding linear convolution operator. In the time domain, equation (2) becomes
$$y(t) = (Ge)(t) = (G * e)(t) = \int_0^t G(t - \tau) e(\tau)\, d\tau,$$
and in either domain we can eliminate $e$ and write the input-output feedback system equation as
$$y = G(u - \varphi(y)).$$

In the feedback path we require the nonlinear function $\varphi(x(t), t) = (\varphi_1(x_1(t), t), \ldots, \varphi_n(x_n(t), t))^T$ to be subject to 'incremental sector conditions'
$$\alpha_i \le \frac{\varphi_i(\xi(t), t) - \varphi_i(\xi'(t), t)}{\xi(t) - \xi'(t)} \le \beta_i, \qquad (3)$$
for all $\xi(t) \neq \xi'(t)$, all $\varphi_i(\cdot, t)$ and $\alpha_i < \beta_i \in \mathbb{R}$. The sector conditions (3) bound the slope of $\varphi_i$ and restrict its graph to lie between the straight lines $y = \alpha_i x$ and $y = \beta_i x$, as indicated in the lower block in Fig. 1.
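The incremental sector conditions (3) can be probed numerically for a concrete transfer function. The following is a minimal sketch, assuming the scalar nonlinearity `tanh` (a common choice that the text does not prescribe) and a hypothetical sample grid; the helper names are illustrative. A sampled check of this kind can refute a sector claim but cannot prove it.

```python
import math

def incremental_slopes(phi, points):
    """Difference quotients (phi(x) - phi(y)) / (x - y) over all sampled pairs."""
    slopes = []
    for i, x in enumerate(points):
        for y in points[i + 1:]:
            slopes.append((phi(x) - phi(y)) / (x - y))
    return slopes

def in_incremental_sector(phi, alpha, beta, points):
    """True if every sampled slope lies in [alpha, beta] (a sampled check, not a proof)."""
    return all(alpha <= s <= beta for s in incremental_slopes(phi, points))

# Sample grid on [-5, 5]; all grid points are distinct, so x - y is never zero.
grid = [k / 10.0 for k in range(-50, 51)]

# tanh has slopes between 0 and 1, so the sector [0, 1] should pass ...
tanh_in_unit_sector = in_incremental_sector(math.tanh, 0.0, 1.0, grid)
# ... while the narrower sector [0, 0.5] is violated near the origin.
tanh_in_half_sector = in_incremental_sector(math.tanh, 0.0, 0.5, grid)
```

The same helper can be applied to any scalar component $\varphi_i$ to estimate candidate bounds $\alpha_i$, $\beta_i$ before they are used in the criteria of the later sections.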
We write $\varphi \in [A, B]$, $A = \mathrm{diag}\{\alpha_i\}$, $B = \mathrm{diag}\{\beta_i\}$, if $\varphi$ belongs to the function class defined by $\alpha_i$, $\beta_i$ and (3), and abbreviate this as $\varphi \in [\alpha, \beta]$ if $A = \alpha I$, $B = \beta I$. As larger sectors cover smaller ones, we always have $\varphi \in [A, B] \Rightarrow \varphi \in [\min_i \alpha_i, \max_i \beta_i]$, i.e., the sector conditions can be made uniform. Apart from the sector conditions we do not require further properties for $\varphi$ such as saturation limits, monotonicity or time-invariance. Finally, also by virtue of (3), solutions exist and are unique [6].

An input-output setting for recurrent networks has previously been used only in Guzelis & Chua [5], where algebraic conditions based on the small gain theorem are presented. The approach taken in [5] differs in the position of the weight matrix in the loop: it is situated in the feedback path and is included in a modified $\tilde{\varphi} = W\varphi$. That approach simplifies the evaluation of conditions on the forward path but does not yield the graphical methods and simplifications we present in this paper. The problem is that sector conditions for $\tilde{\varphi} = W\varphi$ have to be stated in a multivariable fashion referring to matrix cones and do not simply follow from the sector conditions on $\varphi$, as was incorrectly assumed in [5].

3 $L_2$-stability, passivity and similarity transformations

$L_2$-stability

In the sequel we assume that the input and output functions and the various operators are defined
for the function space $L_2$ of square integrable functions. $L_2$-stability of the loop $(G, \varphi)$ in Fig. 1 means that the $L_2$-norm of the output, defined by $\|y\|_2^2 = \int_0^\infty \langle y(t), y(t)\rangle\, dt$, must not exceed the $L_2$-norm of the input by more than a constant gain factor $\gamma$:
$$\|y\|_2 \le \gamma \|u\|_2. \qquad (4)$$

Passivity

In the passivity approach to feedback system stability we analyse the loop $(G, \varphi)$ in terms of properties of the feedforward operator $G$ and the feedback $\varphi$, regarded as independent of each other. We require both paths to be passive, which in analogy to physical systems can be interpreted as dissipating the system's energy. Formally the passivity conditions on $G$ and $\varphi$ are stated as
$$\langle Gx, x\rangle \ge \delta \|x\|_2^2, \qquad \langle \varphi x, x\rangle \ge \epsilon \|x\|_2^2, \qquad (5)$$
where $\delta > 0$, $\epsilon \ge 0$, and $\langle \cdot, \cdot \rangle$ denotes the scalar product on $L_2$:
$$\langle x, y\rangle = \int_0^\infty \langle x(t), y(t)\rangle\, dt.$$
The conditions (5) are sufficient for $L_2$-stability of $(G, \varphi)$ if $G$ is also $L_2$-stable: $\|Gx\|_2 \le \gamma \|x\|_2$.

Similarity transformations

It is well known that a coordinate transform $z = Px$ of the system (1) does not change the sector conditions (3) if $P = \mathrm{diag}\{p_i\} > 0$ [8]. It leads to the input-output system $(PGP^{-1}, \varphi)$, because $\varphi$ only represents the class defined by (3), which does not change. We show now that unitary similarity transformations define a second important class of matrices which leave uniform sector conditions invariant and thus can be applied if $\varphi \in [\alpha, \beta]$.

Lemma 1. $\varphi \in [\alpha, \beta] \Leftrightarrow U\varphi U^* \in [\alpha, \beta]$, where $U$ is an arbitrary unitary matrix.

Proof. The Lemma follows from the fact that the sector conditions (3) can be expressed as the scalar product
$$\big\langle (\varphi(x) - \varphi(x')) - \alpha (x - x'),\ (\varphi(x) - \varphi(x')) - \beta (x - x') \big\rangle \le 0.$$
Now we substitute $U^* z = x$ and multiply both sides of the scalar product by $U$ to get
$$\big\langle (\varphi'(z) - \varphi'(z')) - \alpha (z - z'),\ (\varphi'(z) - \varphi'(z')) - \beta (z - z') \big\rangle \le 0,$$
where $\varphi' = U\varphi U^*$.

In view of Lemma 1 we can analyse stability of the loop $(G, \varphi)$ in terms of any loop $(G', \varphi')$ with unitarily transformed $G' = UGU^*$.

Figure 2. The Circle Criterion: The eigenloci $\lambda_i(\omega)$ must be inside the critical circle for $\alpha < 0 < \beta$.
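The invariance stated in the Lemma can be illustrated numerically. The sketch below is only a plausibility check under stated assumptions: the nonlinearity is a componentwise `tanh` on $\mathbb{R}^2$ (so the uniform sector is taken as $[0, 1]$), the unitary matrix is a real rotation by a hypothetical angle `theta`, and the scalar product is the pointwise Euclidean one rather than the $L_2$ product. None of these choices come from the paper.

```python
import math
import random

ALPHA, BETA = 0.0, 1.0  # uniform sector assumed for componentwise tanh

def rotate(theta, v):
    """Apply the 2x2 rotation U(theta), a real unitary matrix, to the vector v."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def phi(x):
    """Componentwise nonlinearity in the sector [0, 1]."""
    return (math.tanh(x[0]), math.tanh(x[1]))

def phi_transformed(theta, z):
    """phi' = U phi U^*: rotate back with U^*, apply phi, rotate forward with U."""
    return rotate(theta, phi(rotate(-theta, z)))

def sector_product(f, x, y):
    """<(f(x)-f(y)) - ALPHA (x-y), (f(x)-f(y)) - BETA (x-y)>, non-positive inside the sector."""
    fx, fy = f(x), f(y)
    total = 0.0
    for i in range(2):
        d_f = fx[i] - fy[i]
        d_x = x[i] - y[i]
        total += (d_f - ALPHA * d_x) * (d_f - BETA * d_x)
    return total

random.seed(0)
theta = 0.7  # hypothetical rotation angle
worst_original = float("-inf")
worst_transformed = float("-inf")
for _ in range(1000):
    x = (random.uniform(-3, 3), random.uniform(-3, 3))
    y = (random.uniform(-3, 3), random.uniform(-3, 3))
    worst_original = max(worst_original, sector_product(phi, x, y))
    worst_transformed = max(
        worst_transformed,
        sector_product(lambda z: phi_transformed(theta, z), x, y),
    )
```

Both `worst_original` and `worst_transformed` stay non-positive up to rounding, because the rotation preserves the Euclidean scalar product. Note that the transformed function is no longer componentwise, which is why the uniform sector is needed.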
4 The Circle Criterion

To evaluate passivity of $G$ and $\varphi$ graphically it is necessary to transform the loop $(G, \varphi)$ by scaling and addition of linear auxiliary operators into a loop $(G', \varphi')$, which is equivalent with respect to stability in the sense that $(G, \varphi)$ is $L_2$-stable if and only if $(G', \varphi')$ is. Our further development relies on a set of loop transformations introduced in Harris & Valenca ([6], p.222), which result in
$$G' = (I + BG)(I + AG)^{-1}, \qquad \varphi' = (\varphi - A)(B - \varphi)^{-1}.$$
In [6] it is further shown that $\varphi'$ is in the infinite sector $[0, \infty]$ and therefore passive, whereas passivity of $G'$ remains to be proven graphically. As the Circle Criterion was originally developed for scalar frequency functions, its application to a multivariable $G(\omega)$ requires diagonalising $G(\omega)$ into $\mathrm{diag}\{\lambda_i(G(\omega))\}$, such that the criterion can be applied to the scalar functions $\lambda_i(\omega) = \lambda_i(G(\omega))$. The problem is to carry out the diagonalisation without affecting the sector conditions, which is in general only possible if $G$ is normal, i.e. $G^* G = G G^*$. Then $G$ has a full set of orthogonal eigenvectors, can be diagonalised by a unitary matrix
$U$ and, according to Lemma 1, we can perform the respective similarity transform $UGU^*$ without changing the stability behaviour. To state the graphical stability condition it remains to connect the sector bounds $\alpha, \beta$ to the critical circle $C(-\frac{1}{b}, -\frac{1}{a})$ in the complex $\mathrm{Re}[\lambda_i(\omega)]$/$\mathrm{Im}[\lambda_i(\omega)]$-plane shown in Fig. 2. It has its centre on the real line and passes through the points $(-\frac{1}{b}, 0)$ and $(-\frac{1}{a}, 0)$ on the real axis. For $a \to 0$ the critical circle degenerates to the abscissa $y = -\frac{1}{b}$.

Theorem 1. (Circle Criterion) Let $G$ be normal. Then $G'$ is passive if
(i) all eigenloci $\lambda_i(\omega) = \lambda_i(G(\omega))$ lie inside and do not touch the critical circle $C(-\frac{1}{b}, -\frac{1}{a})$ for $a < 0 < b$,
(ii) all eigenloci $\lambda_i(\omega)$ lie outside the critical circle for $0 < a < b$,
where $\varphi \in [a, b]$.

If $G$ is normal, e.g. symmetric, the real numbers $a, b$ can be chosen as $a = \alpha$, $b = \beta$, i.e. the sector conditions (3) directly define the critical circle. To extend the method to non-normal operators $G$ we use an approximation method also proposed in [6]. It replaces the original $G$ in the forward loop by a normal $G_n$ using a number of preliminary loop transformations. If the operator $(G_n - G)$, representing the approximation error, can also be bounded by sector conditions of the type (3), i.e. $(G_n - G) \in [m, r]$, then we choose $a = (m + \alpha)$, $m < 0$ and $b = (r + \beta)(1 + \epsilon)$, $r > 0$, $\epsilon > 0$. It follows that the transformed feedback path is passive by virtue of (3) and $(G_n - G) \in [m, r]$, and the Circle Criterion can be applied to the eigenloci of $G_n$ with modified $a, b$. The respective change in size of the critical circle $C(-\frac{1}{b}, -\frac{1}{a})$ is proportional to the estimate of the approximation error $(G_n - G)$ given by the sector bounds $[m, r]$. Though there is no systematic procedure to find the best normal approximation $G_n$ for a given $G$, it can always be chosen as the diagonal, symmetric or antisymmetric part of $G$.
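The geometric test in case (i) of the Circle Criterion reduces to a point-in-disc check once the eigenloci are sampled. The sketch below assumes the standard placement of the critical circle, with real-axis crossings at $-1/b$ and $-1/a$, and covers only sectors with $a < 0 < b$; the function names and the sample points are hypothetical.

```python
def critical_circle(a, b):
    """Centre and radius of the circle crossing the real axis at -1/b and -1/a.

    Only the case a < 0 < b is covered, where -1/b < 0 < -1/a.
    """
    if not a < 0.0 < b:
        raise ValueError("this sketch only covers sectors with a < 0 < b")
    left, right = -1.0 / b, -1.0 / a
    centre = complex((left + right) / 2.0, 0.0)
    radius = (right - left) / 2.0
    return centre, radius

def strictly_inside(points, a, b):
    """True if every sampled eigenlocus point lies inside the critical circle without touching it."""
    centre, radius = critical_circle(a, b)
    return all(abs(p - centre) < radius for p in points)

# Symmetric sector [-2, 2]: the critical circle is centred at the origin with radius 0.5.
centre, radius = critical_circle(-2.0, 2.0)
inside = strictly_inside([0.4j, complex(-0.3, 0.2)], -2.0, 2.0)
touching = strictly_inside([complex(0.5, 0.0)], -2.0, 2.0)  # a boundary point is rejected
```

For case (ii), $0 < a < b$, the inequality would be reversed; that case is left out here to keep the sketch short.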
This technique is especially well suited to account for uncertainties and noise in systems with symmetric matrices, because then the approximation error is small and can e.g. be estimated by the variance of the noise process.

5 Results for RNN

The application of the Circle Criterion to the RNN case is especially simple because we make use of the fact that the eigenloci of the forward operator $G(\omega)$ are circles in the complex plane which depend on the eigenvalues of $W$ only. This graphically simple form also allows conclusions for delayed systems of the form
$$\dot{x}(t) = -x(t) + W\varphi(x(t - \tau)) + \tilde{u}(t) \qquad (6)$$
with frequency function $G^\tau(\omega) = G(\omega) e^{-i\omega\tau}$. The points of the delayed eigenloci are those of the undelayed $\lambda_i(\omega)$, rotated around the origin by an amount proportional to $\omega$ and $\tau$ at every frequency $\omega$. The most important properties of the eigenloci are summarised in (i)-(iii), see also Fig. 3.

(i) $G(\omega)$ is normal if $W$ is normal.
(ii) The eigenloci $\lambda_i(G_n(\omega))$ are circles with centre $(-\frac{1}{2}\mathrm{Re}[\lambda_i(W_n)], -\frac{1}{2}\mathrm{Im}[\lambda_i(W_n)])$ and radius $\frac{1}{2}|\lambda_i|$, denoted by $C(\lambda_i)$.
(iii) The delayed eigenloci $\lambda_i^\tau(\omega)$ of $G_n^\tau(\omega)$ lie inside circles centred at the origin with radius $|\lambda_i|$ for all $\tau$.

Figure 3. The eigenvalue circle of the RNN model together with the part $\omega > 0$ of its delayed version for $\tau$ = :3.

These properties together with the generalised Circle Criterion yield the following stability theorems for the recurrent networks in the input-output form of Fig. 1.

Theorem 2. Consider the RNN system (2), where $\varphi \in [\alpha, \beta]$. Then the system is $L_2$-stable if all circles $C(\lambda_i)$ lie inside and do not touch the critical circle $C(-\frac{1}{b}, -\frac{1}{a})$, where $a = \alpha$, $b = \beta$ if $W$ is normal, and $a = (-r + \alpha)$, $b = (r + \beta)(1 + \epsilon)$ if $W$ is approximated by a normal $W_n$ and $\max_i \lambda_i\big((W_n - W)^T (W_n - W)\big) \le r^2$.
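Property (ii) and the normal case of Theorem 2 can be checked with a few lines of arithmetic. The sketch below assumes the forward path $G(i\omega) = -W/(1 + i\omega)$, which is one reading of (2), and the critical circle through $-1/b$ and $-1/a$ with $a < 0 < b$. The eigenvalue `lam` and the sector values are hypothetical and are not taken from the paper's example.

```python
def eigenlocus(lam, omega):
    """Eigenvalue of G(i*omega) = -W / (1 + i*omega) for an eigenvalue lam of W (sign convention assumed)."""
    return -lam / complex(1.0, omega)

def eigen_circle(lam):
    """Centre and radius of the circle C(lam) traced by the eigenlocus over all omega."""
    return -lam / 2.0, abs(lam) / 2.0

def critical_circle(a, b):
    """Circle crossing the real axis at -1/b and -1/a, for a < 0 < b."""
    left, right = -1.0 / b, -1.0 / a
    return complex((left + right) / 2.0, 0.0), (right - left) / 2.0

def theorem2_normal_case(eigenvalues, a, b):
    """True if every C(lam) lies strictly inside the critical circle (normal W, a < 0 < b)."""
    c_out, r_out = critical_circle(a, b)
    for lam in eigenvalues:
        c_in, r_in = eigen_circle(lam)
        # A disc lies strictly inside another iff centre distance plus inner radius < outer radius.
        if abs(c_in - c_out) + r_in >= r_out:
            return False
    return True

lam = complex(1.0, 0.5)  # hypothetical eigenvalue of a normal W
centre, radius = eigen_circle(lam)
# Largest distance of sampled eigenlocus points from the claimed circle.
max_deviation = max(
    abs(abs(eigenlocus(lam, w / 10.0) - centre) - radius) for w in range(-500, 501)
)
stable_narrow = theorem2_normal_case([lam, lam.conjugate()], -1.0, 0.8)
stable_wide = theorem2_normal_case([lam, lam.conjugate()], -1.0, 1.5)
```

Working with the circles $C(\lambda_i)$ instead of sampled frequencies avoids any dependence on the frequency grid: the test needs only the eigenvalues of $W$.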
Theorem 3. (Delays) If all circles centred at the origin with radius $|\lambda_i|$ are entirely inside the critical circle, then the system is stable for all delays $\tau > 0$. If the un-delayed system is stable according to Theorem 2, but not all circles with radius $|\lambda_i|$ are inside the critical circle, then the system is stable for all delays smaller than a finite $\tau_{\max}$.

The main drawback of the method is the need to choose the largest sector $[\min_i \alpha_i, \max_i \beta_i]$ for regularisation of the sector conditions. If $\alpha = 0$, which can be assumed for RNN applications, then Theorem 2 can be modified in order not to lose the information contained in the coordinate-wise upper sector bounds $\beta_i$ in (3). We rewrite the system (1) as
$$\dot{x} = -x + WBB^{-1}\varphi(x) + \tilde{u}.$$
Now the modified feedback $\varphi' = B^{-1}\varphi$ is in the sector $[0, 1]$ and the following holds.

Theorem 4. Consider the input-output RNN (2) (the delayed system (6)) with $\varphi \in [0, B]$. Then the system is $L_2$-stable if Theorem 2 (Theorem 3) holds with $W$ replaced by $WB$, i.e., if $WB$ is normal and the eigenloci $\lambda_i$ ($\lambda_i^\tau$) lie to the left of the critical line for the sector $[0, 1]$, or if $WB$ is approximated by a normal $W_n$, the approximation error is in $[-r, r]$ and the graph of $\lambda_i$ ($\lambda_i^\tau$) lies to the left of the critical line for $b = (1 + r)(1 + \epsilon)$ for some $\epsilon > 0$.

Stable weight ranges

To define a suitable set of weight matrices which parametrises a manifold of stable systems, we assume that a network is stable for all $\varphi$ in some class $[\alpha, \beta]$. If we now employ a nonlinearity $\tilde{\varphi}$ which is known to be of a narrower class $[\tilde{\alpha}, \tilde{\beta}]$, we can define an interval $[k_{\min}, k_{\max}]$ such that for all matrices $K = \mathrm{diag}\{k_i\}$, $k_i \in [k_{\min}, k_{\max}]$, the product $K\tilde{\varphi}$ remains in the original class $[\alpha, \beta]$, i.e., parametrises a stable system.
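The delay-independent part of Theorem 3 amounts to comparing the largest eigenvalue modulus with the distance from the origin to the nearest point of the critical circle. The sketch below assumes the critical circle through $-1/b$ and $-1/a$ with $a < 0 < b$ and the delayed eigenlocus $-\lambda e^{-i\omega\tau}/(1 + i\omega)$; the eigenvalues and sector values are hypothetical.

```python
import cmath

def delayed_eigenlocus(lam, omega, tau):
    """Undelayed eigenlocus rotated by the delay factor exp(-i*omega*tau) (sign convention assumed)."""
    return -lam * cmath.exp(-1j * omega * tau) / complex(1.0, omega)

def stable_for_all_delays(eigenvalues, a, b):
    """Theorem 3 style test: origin-centred circles of radius |lam| must lie strictly inside the critical circle."""
    if not a < 0.0 < b:
        raise ValueError("this sketch only covers sectors with a < 0 < b")
    # The critical circle crosses the real axis at -1/b and -1/a, so its point
    # nearest to the origin is at distance min(1/b, -1/a).
    clearance = min(1.0 / b, -1.0 / a)
    return max(abs(lam) for lam in eigenvalues) < clearance

eigs = [complex(1.0, 0.5), complex(1.0, -0.5)]  # hypothetical, |lam| = sqrt(5)/2
all_delays_tight = stable_for_all_delays(eigs, -0.89, 0.89)
all_delays_loose = stable_for_all_delays(eigs, -0.9, 0.9)

# Property (iii): the delayed eigenlocus never leaves the disc of radius |lam|.
bounded = all(
    abs(delayed_eigenlocus(eigs[0], w / 10.0, 0.5)) <= abs(eigs[0]) + 1e-12
    for w in range(-500, 501)
)
```

The finite bound $\tau_{\max}$ of the second part of the theorem is not computed here; it would require tracking where the rotated eigenlocus first touches the critical circle.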
We rewrite the system (1) as
$$\dot{x} = -x + WK\tilde{\varphi}(x) + \tilde{u} = -x + \tilde{W}\tilde{\varphi}(x) + \tilde{u} \qquad (7)$$
and find that the system (7) is stable for all weight matrices $\tilde{W}(t)$ in the matrix set
$$M := \{\tilde{W} \mid \tilde{W} = W\,\mathrm{diag}\{k_i\},\ k_i \in [k_{\min}, k_{\max}]\}. \qquad (8)$$

6 Relation to Lyapunov theory

In state space the stability concept corresponding to input-output stability is global asymptotic stability of an equilibrium, which can be taken to be the origin without loss of generality. Recently a number of strong conditions based on matrix measures have been developed [2, 4, 7, 8] which give sharper stability bounds than earlier results for matrix norms reviewed e.g. in [7]. All these results rely on time-invariant feedback $\varphi(x(t), t) = \varphi(x(t))$ and explicit construction of Lyapunov functions. It can be shown that these results can also be derived in the input-output framework [4] using a multivariable Popov criterion and the Kalman-Yakubovich-Lemma [2], even if the weight matrix is not invertible.

In general there is a far-reaching equivalence between the input-output and the state space approaches. If for a given $u(t)$ the input-output system is $L_2$-stable and incrementally bounded, and the state space is uniformly observable and reachable, then the corresponding solution trajectory in state space is globally asymptotically stable, regardless of whether it is an equilibrium or a non-stationary solution. If $W$ is invertible, these assumptions are satisfied by the RNN models by virtue of the sector bounds (3). Thus our Theorems 2-4 implicitly also provide new conditions for state space systems.

Theorem 5. Assume the system (2) with time-varying feedback $\varphi(x(t), t)$ is $L_2$-stable according to Theorems 2-4 and let a corresponding stable weight matrix set $M$ be defined by (8). Then for the time-varying state space system (1) under zero input $\tilde{u}(t) = 0$ the origin is globally asymptotically stable, and under non-zero $\tilde{u}(t)$ the corresponding state-space trajectory is globally asymptotically stable for all $W(t) \in M$.
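The admissible gain interval $[k_{\min}, k_{\max}]$ behind the matrix set (8) follows from simple sector arithmetic in a common special case. The sketch below assumes that the employed nonlinearity lies in a sector $[0, \tilde{\beta}]$ (for example `tanh` with $\tilde{\beta} = 1$) and that the stable class satisfies $\alpha < 0 < \beta$. Both assumptions are illustrative and not stated in this form in the text, and the function names are hypothetical.

```python
def gain_interval(alpha, beta, beta_tilde):
    """Gains k for which k * phi_tilde stays in [alpha, beta], given phi_tilde in [0, beta_tilde].

    For k >= 0 the scaled slopes lie in [0, k * beta_tilde], so k <= beta / beta_tilde.
    For k < 0 they lie in [k * beta_tilde, 0], so k >= alpha / beta_tilde.
    """
    if not (alpha < 0.0 < beta and beta_tilde > 0.0):
        raise ValueError("sketch assumes alpha < 0 < beta and beta_tilde > 0")
    return alpha / beta_tilde, beta / beta_tilde

def gains_admissible(gains, alpha, beta, beta_tilde):
    """True if every diagonal gain k_i lies in the admissible interval."""
    k_min, k_max = gain_interval(alpha, beta, beta_tilde)
    return all(k_min <= k <= k_max for k in gains)

# Hypothetical numbers: stable class [-1, 0.9], nonlinearity in [0, 1].
k_min, k_max = gain_interval(-1.0, 0.9, 1.0)
ok = gains_admissible([0.5, -0.8, 0.9], -1.0, 0.9, 1.0)
too_large = gains_admissible([1.0], -1.0, 0.9, 1.0)
```

Any time-varying choice of gains $k_i(t)$ that stays inside the interval keeps the product $K\tilde{\varphi}$ in the original class, which is the property the weight set (8) relies on.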
The analysis of delays with Lyapunov methods is quite difficult and has been included in more general studies on systems with uncertain parameters with known bounds, which correspond to the sector bounds (3). These systems are regarded as defining a polytope in the linear matrix space, and stability for the whole class can be proven by solving a Lyapunov equation for every corner of that polytope in parallel [7, 3]. This leads to complicated proofs and much computational effort, e.g. in [7] a large number of $2^n$ auxiliary linear systems must be checked for stability, or simplifications are made which result in conservative inequalities.

7 Example

We consider a simple example where $W_n$ is normal for every choice of the parameters $p, q, r$, but
is neither symmetric nor antisymmetric, W n = r q p 4 4 B r p q C @ q p r A ; i:e: B 4 4 C @ A 4 4 p q r 4 4 with eigenvalues ;2 = and 3;4 = ( ). 2

Figure 4. The eigenvalue circles $\lambda_{1,3}(\omega)$ define the stability range through the abscissa $\beta$ = :6 or the critical circle for $\alpha$ = ; $\beta$ = :25.

Figure 5. The $\lambda_3$ circle with and without delay $\tau$ = :5. The delayed loop is stable for $\alpha$ = , $\beta$ = :9.

As complex conjugate eigenloci lead to the same circle conditions, we show in Fig. 4 only the eigenvalue circles $C(\lambda_1)$ and $C(\lambda_3)$, together with the critical circle corresponding to the minimal $\alpha$ = and the critical line for $\alpha = 0$. From Fig. 5 we see how in the delayed case for $\tau$ = :5 the sector defined by $\beta$ shrinks. Fig. 5 also illustrates Theorem 3, from which it follows that for = p 5 = max j 2 i j the system is stable for all delays. To apply the normal approximation method we add to $W$ a disturbance matrix $\Delta W$ with jw ij j < :3. The matrix $\Delta W$ defines the approximation error W = (G!+!+ n G) in the feedback path and is in the symmetric sector [ :2; :2]. In the table we show some sectors derived from the various criteria presented above. Obviously the larger we choose $\alpha$, the larger $\beta$ will be. However, from Fig. 4 it is obvious that the largest overall sector is achieved if $\alpha$ is at its minimum.

System a b
W n .94
W n -.88
W n -.2 .93
W n, all > p 2 p 2 5 5
W n ; = :3 .924
W n ; = :5 .97
W n ; = .9
W n + W .85
W n + W -.89 .8

Figure 6. Stability sectors for different systems with and without delays and disturbances.

8 Discussion

We provide an input-output framework to introduce a number of powerful methods of non-linear system theory into the research on recurrent network stability. The main gain lies in the possibility to handle time-invariant and time-varying systems with and without delay all in a unified manner.
Furthermore, conceptually simple manipulations of the input-output loop provide many tools to reshape the system for better application of the theory, which is much simpler than developing a suitable Lyapunov function for every special case. We show that the input-output theory also enriches the much more developed state space theory. Especially interesting is the link to on-line weight adaptation, provided by transferring the degree of freedom in the choice of the non-linear feedback to the freedom to choose weights from some stable sets. We demonstrated a simple and effective graphical method to find such stability ranges for normal matrices and normal approximations of
general matrices. Further research will concentrate on the task of finding structurally `well behaved' networks which yield large stability sectors; this includes developing techniques for finding good normal approximations. In developing the approach we hope to draw more on the known methods of feedback design in control theory, e.g. to add stabilisers to a neural controller to give more internal freedom in the choice of the weights.

References

[1] C. Desoer and M. Vidyasagar. Feedback Systems: Input-Output Properties. Academic Press, New York, 1975.
[2] Y. Fang and T. G. Kincaid. Stability analysis of dynamical neural networks. IEEE Transactions on Neural Networks, 7(4):996{5, 1996.
[3] Y. Fang, K. A. Loparo, and X. Feng. Sufficient conditions for the stability of interval matrices. Int. J. Control, 58(4):969-977, 1993.
[4] M. Forti and A. Tesi. New conditions for global stability of neural networks with application to linear and quadratic programming problems. IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, 42(7):354-366, 1995.
[5] C. Guzelis and L. Chua. Stability analysis of generalized cellular neural networks. International Journal of Circuit Theory and Applications, 21(1):1-33, 1993.
[6] C. Harris and J. Valenca. The Stability of Input-Output Dynamical Systems. Academic Press, London, 1983.
[7] X. Liang and L. Wu. Global exponential stability of Hopfield-type neural network and its applications. Science in China (Series A), 38(6):757-768, 1995.
[8] K. Matsuoka. Stability conditions for nonlinear continuous neural networks with asymmetric connection weights. Neural Networks, 5:495-500, 1992.
[9] K. S. Narendra. Neural networks for control: Theory and practice. Proceedings of the IEEE, 84():385{, 1996.
[10] K. S. Narendra and J. H. Taylor. Frequency Domain Criteria for Absolute Stability. Academic Press, New York, 1973.
[11] B. A. Pearlmutter. Gradient calculations for dynamic recurrent neural networks: A survey. IEEE Transactions on Neural Networks, 6(5):1212-1228, 1995.
[12] A. Rantzer. On the Kalman-Yakubovich-Popov lemma. Systems & Control Letters, 28:7-10, 1996.
[13] J.-J. C. Slotine and R. M. Sanner. Stable adaptive control of robot manipulators using neural networks. Neural Computation, 7(4):753-790, 1995.
[14] J. J. Steil and H. Ritter. Input-output vs. Lyapunov stability for continuous time recurrent neural networks. 1998. Submitted to NIPS 98.
[15] K. Tanaka. An approach to stability criteria of neural-network control systems. IEEE Transactions on Neural Networks, 7(3):629-642, 1996.
[16] M. Vidyasagar. Nonlinear Systems Analysis. Prentice Hall, second edition, 1993.
[17] K. Wang and A. M. Michel. Stability analysis of differential inclusions in Banach space with applications to nonlinear systems with time delays. IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, 43(8):617-626, 1996.