Embedding Recurrent Neural Networks into Predator-Prey Models. Yves Moreau, Stéphane Louies, Joos Vandewalle, Léon Brenig. Katholieke Universiteit Leuven / Université Libre de Bruxelles.
Katholieke Universiteit Leuven
Departement Elektrotechniek ESAT-SISTA/TR

Embedding Recurrent Neural Networks into Predator-Prey Models

Yves Moreau (2), Stéphane Louies (3), Joos Vandewalle (2), Léon Brenig (3)

January 1998

This report is available by anonymous ftp from ftp.esat.kuleuven.ac.be in the directory pub/sista/moreau/reports/.

(2) ESAT, Katholieke Universiteit Leuven, Kardinaal Mercierlaan 94, B-3001 Leuven (Heverlee), Belgium. E-mail: yves.moreau@esat.kuleuven.ac.be.
(3) Université Libre de Bruxelles, Campus Plaine, CP 234, Blvd du Triomphe, B-1050 Bruxelles, Belgium. E-mail: lbrenig@ulb.ac.be.
Embedding Recurrent Neural Networks into Predator-Prey Models

Yves Moreau (1), Stéphane Louies (2), Joos Vandewalle (1), Léon Brenig (2)

(1) Katholieke Universiteit Leuven, Department of Electrical Engineering, Kardinaal Mercierlaan 94, B-3001 Leuven, Belgium
(2) Université Libre de Bruxelles, Boulevard du Triomphe, B-1050 Bruxelles, Belgium

Running Title: Embedding Recurrent Neural Networks

Acknowledgments: Yves Moreau is a Research Assistant with the F.W.O.-Vlaanderen. Joos Vandewalle is Full Professor at the K.U.Leuven. Their work is supported by several institutions: the Flemish Government, GOA-MIPS (Model-based Information Processing Systems); the FWO Research Community ICCoS (Identification and Control of Complex Systems); the Belgian State, DWTC, Interuniversity Poles of Attraction Programme IUAP P4-2 (1997-2001): Modeling, Identification, Simulation and Control of Complex Systems. Léon Brenig and Stéphane Louies are grateful to the CATHODE ESPRIT Working Group (WG #2449) for scientific and financial support. They would also like to thank A. Figueiredo and T. M. da Rocha from the Physics Department of the University of Brasilia for fruitful scientific discussions.

January 1998

Requests for reprints should be addressed to Yves Moreau, Department of Electrical Engineering, Katholieke Universiteit Leuven, B-3001 Leuven, Belgium. E-mail address: yves.moreau@esat.kuleuven.ac.be.
Abstract

We study changes of coordinates that allow the embedding of the ordinary differential equations describing continuous-time recurrent neural networks into differential equations describing predator-prey models, also called Lotka-Volterra systems. We do this by transforming the equations for the neural network first into quasi-monomial form (Brenig, 1988), where we express the vector field of the dynamical system as a linear combination of products of powers of the variables. In practice, this transformation is possible only if the activation function is the hyperbolic tangent or the logistic sigmoid. From this quasi-monomial form, we can directly transform the system further into Lotka-Volterra equations. The resulting Lotka-Volterra system is of higher dimension than the original system, but the behavior of its first variables is equivalent to the behavior of the original neural network. We expect that this transformation will permit the application of existing techniques for the analysis of Lotka-Volterra systems to recurrent neural networks. Furthermore, our result shows that Lotka-Volterra systems are universal approximators of dynamical systems, just as continuous-time neural networks are.

Keywords: Continuous-time neural networks, Equivalence of dynamical systems, Lotka-Volterra systems, Predator-prey models, Quasi-monomial forms.
1 Introduction

Although major advances have been made, the field of application of dynamical system theory to recurrent neural networks still has much uncharted territory. Cohen and Grossberg studied a large class of competitive systems (which include many types of neural networks) with a symmetric interaction matrix, for which they were able to construct a Lyapunov function guaranteeing global convergence (Cohen & Grossberg, 1983). Hirsch considered cascades of networks and proved a cascade decomposition theorem (Hirsch, 1989) giving conditions under which convergence of the individual networks in isolation guarantees the convergence of the whole cascade. For systems with inputs, Sontag, Sussman, and Albertini have studied their properties from the point of view of nonlinear control system theory (Albertini & Sontag, 1993; Sontag & Sussman, 1997). But for the analysis of many properties of recurrent networks, simple computational tools are still missing. Recently, we have started to investigate how changes of coordinates allow us to find equivalences between different neural networks. We have shown in (Moreau & Vandewalle, 1997) how to use linear changes of coordinates to compute a transformation that maps a continuous-time recurrent neural network with a hidden layer onto a neural network without hidden layer and with an extra output map. In this paper, we will show how to embed different types of neural networks (having the hyperbolic tangent or the logistic sigmoid as activation function) into predator-prey models (also called Lotka-Volterra systems) of the form:

$$\dot{z}_i = \lambda_i z_i + z_i \sum_{j=1}^{m} M_{ij} z_j, \quad i = 1, \ldots, m.$$

The Lotka-Volterra system will be of higher dimension than the original system, but its first $n$ variables will have a behavior equivalent or identical to that of the original system (if $n$ was the number of variables of the original system). Such a Lotka-Volterra representation of neural networks is of interest for several reasons.
Lotka-Volterra systems are a central tool in mathematical biology for the modeling of competition between species and a classical subject of dynamical system theory (MacArthur, 1969; May & Leonard, 1975). They have therefore been the object of intense scrutiny. These systems have simple quadratic nonlinearities (not all quadratic terms are possible in each equation), which might be easier to analyze than the sigmoidal saturations of the neural networks. We thus expect that this representation will allow us to apply methods and results from the study of Lotka-Volterra systems to neural networks. Indeed, early work by Grossberg (Grossberg, 1978) exploited one of the changes of variables that we present here to study convergence and oscillations in generalized competitive networks and generalized Lotka-Volterra systems. Furthermore, since we know that dynamical neural networks can approximate arbitrary dynamical systems (Sontag, 1992; Funahashi & Nakamura, 1993) for any finite time because of their equivalence with dynamical single-hidden-layer perceptrons (Moreau & Vandewalle, 1997), our result also serves as a simple
proof that Lotka-Volterra systems enjoy the same approximation property; and we might want to investigate them further from that point of view. We have structured this work as follows. We will first consider the simple case of a dynamical perceptron. We will transform it into what we call a quasi-monomial form in a first step. This form will allow us to transform the system further into a Lotka-Volterra system. Second, we will consider the case of the dynamical perceptron with self-connections, that is, where each neuron has a linear connection to itself. We will perform the same transformations, first into a quasi-monomial form and then into a Lotka-Volterra system. Finally, we will show that the same transformations apply to much more general types of continuous-time recurrent neural networks with hyperbolic tangent or logistic activation functions. In Appendices A and B, we describe in detail the structure of the quasi-monomial forms for the dynamical perceptron and the dynamical perceptron with self-connections. In Appendix C, we clarify what the exact constraints on the activation function are, and we show that both the hyperbolic tangent and the logistic sigmoid satisfy them.

2 The dynamical perceptron

We consider first an n-dimensional neural network of the following form, which we call a dynamical perceptron:

$$\dot{x} = \tilde{\sigma}(Ax + a_0), \tag{1}$$

where $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{n \times n}$, $a_0 \in \mathbb{R}^n$, and $\tilde{\sigma}$ is defined as

$$\tilde{\sigma}(x) = (\sigma(x_1), \sigma(x_2), \ldots, \sigma(x_n))^T,$$

with $\sigma$ a continuous sigmoidal function from $\mathbb{R}$ to $\mathbb{R}$ such that $\lim_{x \to -\infty} \sigma(x) = a$ and $\lim_{x \to +\infty} \sigma(x) = b$ ($a, b \in \mathbb{R}$ and $a < b$). We consider this type of system because its simplicity makes our calculations easier to follow, yet it permits us to illustrate the essential points of our method. In this paper, we do all developments for the special case of $\sigma(x) = \tanh(x)$. Another possibility is the use of the logistic sigmoid $\sigma(x) = 1/(1 + e^{-x})$. We will later discuss what constrains the choice of the activation function.
For the dynamical perceptron (1), the choice of the logistic sigmoid would be meaningless, since the vector field would then have positive components everywhere and the topology of the flow would be trivial (the hyperbolic tangent, on the other hand, guarantees the existence of an equilibrium). For the types of networks we describe in the following sections, this restriction to the hyperbolic tangent does not apply. But since the developments depend on the choice of the activation function, we limit ourselves to the hyperbolic tangent; all developments can be done similarly for the logistic sigmoid. We first apply the change of coordinates $y = Ax + a_0$ (often called the Sigma-Pi transformation (Grossberg, 1988)) and get the following equation:

$$\dot{y} = A\tilde{\sigma}(y).$$
As long as $A$ is invertible ($A$ is $n \times n$), we can directly revert the change of coordinates. Moreover, if $A$ is singular, we in fact have an r-dimensional dynamics, where $r$ is the rank of the matrix $A$. Indeed, if we look at two different trajectories $x(t)$ and $\tilde{x}(t)$ of the system $\dot{x} = \tilde{\sigma}(Ax + a_0)$ having the property that their initial conditions $x_0$ and $\tilde{x}_0$ are such that $Ax_0 = A\tilde{x}_0$, the initial conditions $y_0 = Ax_0 + a_0$ and $\tilde{y}_0 = A\tilde{x}_0 + a_0$ for the system $\dot{y} = A\tilde{\sigma}(y)$ are equal. This means that $y(t) = \tilde{y}(t)$ for all times. But the vector field of the initial system is $\dot{x} = \tilde{\sigma}(Ax + a_0) = \tilde{\sigma}(y)$, thus $\dot{x}(t) = \dot{\tilde{x}}(t)$ for all times. This means that the trajectories $x(t)$ and $\tilde{x}(t)$ through $x_0$ and $\tilde{x}_0$ always remain parallel. Since the kernel of $A$ is of dimension $n - r$, the dynamics is in fact r-dimensional. If we define $\pi(x_0) = x_0 - A^{\dagger}Ax_0$, where $A^{\dagger}$ is the Moore-Penrose pseudo-inverse of $A$, then $x(t) = A^{\dagger}(y(t) - a_0) + \pi(x_0)$ with the initial condition $y_0 = Ax_0 + a_0$. From here on, it will be necessary to look at the equations coordinate by coordinate, so that we have

$$\dot{y}_i = \sum_{j=1}^{n} A_{ij}\,\sigma(y_j), \quad i = 1, \ldots, n. \tag{2}$$

Now, we decide to apply the change of coordinates $z_i = \sigma(y_i)$; we get

$$\dot{z}_i = \sigma'(y_i)\,\dot{y}_i. \tag{3}$$

It is here that we make use of our special choice of activation function, $\sigma(y) = \tanh(y)$. The essential property of the activation function we need is that we can express its derivative as a polynomial of the activation function itself:

$$\sigma'(y) = \tanh'(y) = 1/\cosh^2(y) = 1 - \tanh^2(y) = 1 - \sigma^2(y). \tag{4}$$

Substituting (4) and (2) in Equation (3), we have

$$\dot{z}_i = (1 - z_i^2)\sum_{j=1}^{n} A_{ij}z_j. \tag{5}$$

We see that this form begins to resemble the Lotka-Volterra model, except for the factor $(1 - z_i^2)$ instead of $z_i$. This means that the system is polynomial of degree three. However, the vector field of the system has now become a linear combination of products of powers of the variables, which is what we call a quasi-monomial form.

Example. We will follow our developments at the hand of an example to clarify the procedure.
We look at the case n = 3. The equations of the recurrent neural network are

$$\begin{cases} \dot{x}_1 = \tanh(A_{11}x_1 + A_{12}x_2 + A_{13}x_3 + a_1) \\ \dot{x}_2 = \tanh(A_{21}x_1 + A_{22}x_2 + A_{23}x_3 + a_2) \\ \dot{x}_3 = \tanh(A_{31}x_1 + A_{32}x_2 + A_{33}x_3 + a_3) \end{cases}$$
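The change of variables underlying Equation (5) can be checked numerically. The sketch below (a Python illustration; the weight matrix and initial condition are arbitrary values, not taken from the text) integrates the y-system $\dot{y} = A\tilde{\sigma}(y)$ and the transformed z-system side by side and confirms that the relation $z = \tanh(y)$ is preserved along the trajectories:

```python
import math

# Numerical check of Eq. (5): if y solves  y_i' = sum_j A_ij tanh(y_j),
# then z_i = tanh(y_i) solves  z_i' = (1 - z_i^2) sum_j A_ij z_j.
# A and the initial condition are arbitrary illustrative values.
A = [[-1.0, 0.5], [0.3, -0.8]]
y = [0.4, -0.2]
z = [math.tanh(v) for v in y]

dt, steps = 1e-4, 20000  # simple forward-Euler integration over t in [0, 2]
for _ in range(steps):
    dy = [sum(A[i][j] * math.tanh(y[j]) for j in range(2)) for i in range(2)]
    dz = [(1 - z[i] ** 2) * sum(A[i][j] * z[j] for j in range(2)) for i in range(2)]
    y = [y[i] + dt * dy[i] for i in range(2)]
    z = [z[i] + dt * dz[i] for i in range(2)]

# The two trajectories stay related by z = tanh(y) up to integration error:
err = max(abs(z[i] - math.tanh(y[i])) for i in range(2))
print(err)
```

The discrepancy is only the forward-Euler discretization error; a higher-order integrator would shrink it further.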
We can check that the equations become, after the affine and sigmoidal changes of coordinates,

$$\begin{cases} \dot{z}_1 = (1 - z_1^2)(A_{11}z_1 + A_{12}z_2 + A_{13}z_3) \\ \dot{z}_2 = (1 - z_2^2)(A_{21}z_1 + A_{22}z_2 + A_{23}z_3) \\ \dot{z}_3 = (1 - z_3^2)(A_{31}z_1 + A_{32}z_2 + A_{33}z_3) \end{cases}$$

3 Transformation into a quasi-monomial form

By quasi-monomial form, we mean a dynamical system that we can write as

$$\dot{z}_i = z_i \left( \sum_{j=1}^{r} A_{ij} \prod_{k=1}^{n} z_k^{B_{jk}} \right), \quad i = 1, \ldots, n. \tag{6, 7}$$

The number of different products of powers of the variables is $r$, and we define $A = (A_{ij}) \in \mathbb{R}^{n \times r}$ and $B = (B_{jk}) \in \mathbb{R}^{r \times n}$. It is important to note that the exponents $B_{jk}$ are not restricted to integer values, hence the name quasi-monomial form. Now, to reach this form, we write Equation (5) as

$$\dot{z}_i = z_i \left[ \sum_{j=1}^{n} A_{ij}(z_i^{-1}z_j - z_i z_j) \right]. \tag{8}$$

We then need to collect all possible monomials and use their exponents to construct the matrix $B$, and collect all the coefficients of the monomials correspondingly to form the matrix $A$. We do this in a systematic fashion in Appendix A, and we illustrate it on the following example. Brenig (Brenig, 1988) has previously shown that we can transform such a quasi-monomial form into a Lotka-Volterra system.

Example (cont'd). To illustrate this somewhat obtuse notation (7), we take again the case n = 3. Expanding the expression (6), we get

$$\begin{cases} \dot{z}_1 = z_1(A_{11} - A_{11}z_1^2 + A_{12}z_1^{-1}z_2 - A_{12}z_1z_2 + A_{13}z_1^{-1}z_3 - A_{13}z_1z_3) \\ \dot{z}_2 = z_2(A_{21}z_2^{-1}z_1 - A_{21}z_1z_2 + A_{22} - A_{22}z_2^2 + A_{23}z_2^{-1}z_3 - A_{23}z_2z_3) \\ \dot{z}_3 = z_3(A_{31}z_3^{-1}z_1 - A_{31}z_1z_3 + A_{32}z_3^{-1}z_2 - A_{32}z_2z_3 + A_{33} - A_{33}z_3^2) \end{cases}$$

We see that there are $2n^2 = 18$ terms, but only $r = (3n^2 - n + 2)/2 = 13$ different monomials. The matrix $B$ is then

$$B = \begin{pmatrix} 2&0&0 \\ 1&1&0 \\ 1&0&1 \\ 0&2&0 \\ 0&1&1 \\ 0&0&2 \\ -1&1&0 \\ -1&0&1 \\ 1&-1&0 \\ 0&-1&1 \\ 1&0&-1 \\ 0&1&-1 \\ 0&0&0 \end{pmatrix},$$

while the matrix $A$ is

$$A = \begin{pmatrix} -A_{11} & -A_{12} & -A_{13} & 0 & 0 & 0 & A_{12} & A_{13} & 0 & 0 & 0 & 0 & A_{11} \\ 0 & -A_{21} & 0 & -A_{22} & -A_{23} & 0 & 0 & 0 & A_{21} & A_{23} & 0 & 0 & A_{22} \\ 0 & 0 & -A_{31} & 0 & -A_{32} & -A_{33} & 0 & 0 & 0 & 0 & A_{31} & A_{32} & A_{33} \end{pmatrix}.$$
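This bookkeeping can be verified mechanically. The sketch below (with arbitrary illustrative values for the entries of $A$) encodes the 13 monomials of the n = 3 example and their coefficients, and checks at a test point that the quasi-monomial form reproduces the vector field $(1 - z_i^2)\sum_j A_{ij}z_j$:

```python
# Check of the n = 3 example: the quasi-monomial pair (A_qm, B_qm) reproduces
# the vector field  z_i' = (1 - z_i^2) sum_j A_ij z_j.
# The entries of A below are arbitrary illustrative values.
A = [[0.2, -0.5, 0.1], [0.4, 0.3, -0.2], [-0.1, 0.6, 0.5]]

# Rows of B_qm: exponents of the 13 monomials (order: z1^2, z1z2, z1z3,
# z2^2, z2z3, z3^2, z1^-1 z2, z1^-1 z3, z2^-1 z1, z2^-1 z3, z3^-1 z1,
# z3^-1 z2, constant).
B_qm = [[2,0,0],[1,1,0],[1,0,1],[0,2,0],[0,1,1],[0,0,2],
        [-1,1,0],[-1,0,1],[1,-1,0],[0,-1,1],[1,0,-1],[0,1,-1],[0,0,0]]
A_qm = [
 [-A[0][0], -A[0][1], -A[0][2], 0, 0, 0, A[0][1], A[0][2], 0, 0, 0, 0, A[0][0]],
 [0, -A[1][0], 0, -A[1][1], -A[1][2], 0, 0, 0, A[1][0], A[1][2], 0, 0, A[1][1]],
 [0, 0, -A[2][0], 0, -A[2][1], -A[2][2], 0, 0, 0, 0, A[2][0], A[2][1], A[2][1 + 1]],
]

z = [0.3, -0.7, 0.5]  # arbitrary test point with nonzero coordinates
lhs = [(1 - z[i]**2) * sum(A[i][j] * z[j] for j in range(3)) for i in range(3)]
monoms = [z[0]**e[0] * z[1]**e[1] * z[2]**e[2] for e in B_qm]
rhs = [z[i] * sum(A_qm[i][j] * monoms[j] for j in range(13)) for i in range(3)]
ok = max(abs(lhs[i] - rhs[i]) for i in range(3)) < 1e-12
print(ok)
```

The two sides agree to machine precision, since the rewriting is an exact algebraic identity.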
How should we interpret these matrices? If we look at the first row of $B$, we have the row vector $(2, 0, 0)$; and if we look at the first column of $A$, we have the column vector $(-A_{11}, 0, 0)^T$. This means that the expression for the system contains a monomial of the form $z_1^2 z_2^0 z_3^0 = z_1^2$, and that it appears in the first equation with a coefficient of $-A_{11}$ and is absent from the second and third equations. Checking this for the whole matrices, we indeed have $\dot{z}_i = z_i \sum_{j=1}^{13} A_{ij} \prod_{k=1}^{3} z_k^{B_{jk}}$ with $i = 1, 2, 3$.

4 Transformation into a Lotka-Volterra system

The whole transformation relies now on a simple trick: we look at each product of powers of the variables that appears in the quasi-monomial form as a new variable of our dynamical system. With $r$ the number of different monomials in the quasi-monomial form, we define the supplementary variables $z_{n+1}, \ldots, z_{n+r}$ as

$$z_{n+k} = \prod_{l=1}^{n} z_l^{B_{kl}}, \quad k = 1, \ldots, r.$$

If we now look at the system of $n + r$ variables $z_i$, taking into account these new definitions, we have for the first $n$ variables

$$\dot{z}_i = z_i \sum_{j=1}^{r} A_{ij} z_{n+j}, \quad i = 1, \ldots, n.$$

Because of our special definition of the new variables, many simplifications occur in the expression of their time derivative, as seen in Equation (9). We first use the formula for the derivative of a product (10). Then we try to reconstruct the original variables, since they are almost preserved by the derivation (11); the necessary correction leads to the appearance of a logarithmic derivative (12), which in turn transforms the exponents of the monomials into multiplicative coefficients (13). Lastly, the remaining correction produces the new expression of the vector fields of the first $n$ variables, except for the disappearance of the multiplicative variable at the front (14). Writing these computations down, we have, for the $r$ supplementary variables,

$$\dot{z}_{n+j} = \frac{d}{dt}\Big(\prod_{k=1}^{n} z_k^{B_{jk}}\Big), \quad j = 1, \ldots, r \tag{9}$$
$$= \sum_{l=1}^{n} \Big(\prod_{k \neq l} z_k^{B_{jk}}\Big)\,\frac{d}{dt}\big(z_l^{B_{jl}}\big) \tag{10}$$
$$= \sum_{l=1}^{n} \big(z_{n+j}/z_l^{B_{jl}}\big)\,\frac{d}{dt}\big(z_l^{B_{jl}}\big) \tag{11}$$
$$= z_{n+j} \sum_{l=1}^{n} \frac{d}{dt}\ln\big(z_l^{B_{jl}}\big) \tag{12}$$
$$= z_{n+j} \sum_{l=1}^{n} B_{jl}\,\dot{z}_l/z_l \tag{13}$$
$$= z_{n+j} \sum_{l=1}^{n} B_{jl} \sum_{m=1}^{r} A_{lm} z_{n+m} \tag{14}$$
$$= z_{n+j} \sum_{m=1}^{r} \Big(\sum_{l=1}^{n} B_{jl}A_{lm}\Big) z_{n+m}. \tag{15}$$

So, we can regroup all these terms by defining the $(n+r) \times (n+r)$ matrix $M$ as

$$M = \begin{pmatrix} O_{n \times n} & A \\ O_{r \times n} & BA \end{pmatrix}$$

since $\sum_{l=1}^{n} B_{jl}A_{lm} = (BA)_{jm}$; and the total system then becomes of the form

$$\dot{z}_i = z_i \sum_{j=1}^{n+r} M_{ij} z_j, \quad i = 1, \ldots, n+r,$$

which is precisely an $(n+r)$-dimensional Lotka-Volterra system! Supposing we had a solution $z(t) = (z_1(t), z_2(t), \ldots, z_{n+r}(t))^T$, we could then retrieve the behavior of the original system by transforming back the first $n$ variables (if the matrix $A$ is invertible):

$$x(t) = A^{-1}\big(\tilde{\sigma}^{-1}\big((z_1(t), \ldots, z_n(t))^T\big) - a_0\big).$$

If the matrix is not invertible, we can apply the argument from Section 2 to retrieve the solution.

5 The dynamical perceptron with self-connections

One criticism that we can formulate about the dynamical perceptron is that its global behavior is often similar to the behavior around its fixed point. Although we can obtain complex dynamics by a careful choice of parameters, this is a delicate operation. The dynamics of such systems appear to be essentially limited. One way to obtain complex oscillatory behaviors more easily is to add a proportional self-connection to each variable:

$$\dot{x}_i = -\lambda_i x_i + \sigma\Big(\sum_{j=1}^{n} A_{ij}x_j + a_i\Big). \tag{16}$$

The reason why we can obtain oscillatory behaviors more easily is that we can choose $A$ to make one of the fixed points unstable and then choose the $\lambda_i$
sufficiently large so that the behavior of the system always remains bounded. A further remark is that this kind of system can have multiple isolated equilibria, while the previous system either had to have a single equilibrium or a connected infinity of equilibria. The dynamics of the systems with inhibitory connections is thus richer. Also, the activation function for this system is not limited to the hyperbolic tangent anymore; in Appendix C, we discuss other possible choices and conclude that we could use the logistic sigmoid as well. We will embed this system into a Lotka-Volterra system. The first step is to define the following auxiliary variables:

$$w_i = \sigma\Big(\sum_{j=1}^{n} A_{ij}x_j + a_i\Big).$$

Looking at the derivative of $w$, we then have the following dynamics for the system of $x$ and $w$ variables:

$$\dot{x}_i = w_i - \lambda_i x_i$$
$$\dot{w}_i = (1 - w_i^2)\sum_{j=1}^{n} A_{ij}(w_j - \lambda_j x_j)$$

This dynamics is essentially in quasi-monomial form, and we can rewrite it as:

$$\dot{x}_i = x_i(-\lambda_i + x_i^{-1}w_i)$$
$$\dot{w}_i = w_i \sum_{j=1}^{n}\big(-\lambda_j A_{ij}x_j w_i^{-1} + \lambda_j A_{ij}x_j w_i + A_{ij}w_i^{-1}w_j - A_{ij}w_i w_j\big)$$

In Appendix B, we show how to write the matrices $A$ and $B$ for the quasi-monomial form of this system. Again, we simply have to carefully collect all the exponents of the monomials in the matrix $B$ and all their coefficients in the matrix $A$. After this transformation into quasi-monomial form, the transformation of Section 4 applies directly in the same fashion, except that the variable $x$ is directly part of the Lotka-Volterra system and does not need to be retrieved by a change of coordinates (this advantage is paid for by having a larger Lotka-Volterra system than for the dynamical perceptron).

6 Embedding a dynamical multi-layer perceptron into a Lotka-Volterra system

The transformations we have shown for the dynamical perceptron and the dynamical perceptron with self-connections also apply to the dynamical Multi-Layer Perceptron (MLP).
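Before treating the MLP in detail, the embedding of the self-connected perceptron can be sanity-checked numerically. The sketch below (with arbitrary illustrative values for $\lambda$, $A$, and $a$) integrates the original network and the enlarged $(x, w)$ system side by side and compares the $x$ trajectories:

```python
import math

# Check of the Section 5 embedding: the (x, w) system with
#   x_i' = w_i - lam_i x_i,   w_i' = (1 - w_i^2) sum_j A_ij (w_j - lam_j x_j)
# reproduces  x_i' = -lam_i x_i + tanh(sum_j A_ij x_j + a_i).
# lam, A, a and the initial state are arbitrary illustrative values.
lam = [1.0, 1.5]
A = [[0.5, -2.0], [2.0, 0.5]]
a = [0.1, -0.2]

x1 = [0.3, -0.1]                                   # original system
x2 = x1[:]                                         # embedded system
w = [math.tanh(sum(A[i][j]*x2[j] for j in range(2)) + a[i]) for i in range(2)]

dt = 1e-4
for _ in range(20000):  # t in [0, 2], forward Euler
    d1 = [-lam[i]*x1[i] + math.tanh(sum(A[i][j]*x1[j] for j in range(2)) + a[i])
          for i in range(2)]
    dx = [w[i] - lam[i]*x2[i] for i in range(2)]
    dw = [(1 - w[i]**2) * sum(A[i][j]*(w[j] - lam[j]*x2[j]) for j in range(2))
          for i in range(2)]
    x1 = [x1[i] + dt*d1[i] for i in range(2)]
    x2 = [x2[i] + dt*dx[i] for i in range(2)]
    w = [w[i] + dt*dw[i] for i in range(2)]

err = max(abs(x1[i] - x2[i]) for i in range(2))
print(err)
```

The remaining discrepancy is only the discretization error of the integrator.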
To avoid an unnecessary clutter of notation, we will take the specific example of a two-hidden-layer continuous-time recurrent MLP:

$$\dot{x} = \tilde{\sigma}(C\tilde{\sigma}(B\tilde{\sigma}(Ax + a_0) + b_0) + c_0)$$
where we have $n$ states, $p$ hidden units in the first layer, $q$ hidden units in the second layer, and the activation function is the hyperbolic tangent. This means $x(t) \in \mathbb{R}^n$, $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{q \times p}$, $C \in \mathbb{R}^{n \times q}$, $a_0 \in \mathbb{R}^p$, $b_0 \in \mathbb{R}^q$, $c_0 \in \mathbb{R}^n$. The argument applies to any kind of MLP by doing the same type of computations again. If we write this coordinate by coordinate, we get

$$\dot{x}_i = \sigma\Big(\sum_{l=1}^{q} C_{il}\,\sigma\Big(\sum_{k=1}^{p} B_{lk}\,\sigma\Big(\sum_{j=1}^{n} A_{kj}x_j + a_k\Big) + b_l\Big) + c_i\Big), \quad i = 1, \ldots, n.$$

We define the following vector of $p$ new variables: $y = \tilde{\sigma}(Ax + a_0)$. Again, we will want to discuss the rank of the matrix $A$. If the matrix $A$ is of rank $n$, then we can uniquely retrieve $x$ from $y = \tilde{\sigma}(Ax + a_0)$. If the rank $r$ of the matrix $A$ is inferior to $n$, we have the same argument as in Section 2: two initial conditions $x_0$ and $\tilde{x}_0$ such that $Ax_0 = A\tilde{x}_0$ will have the same dynamics in the new variables, $y(t) = \tilde{y}(t)$ for all times, thus $\dot{x}(t) = \tilde{\sigma}(C\tilde{\sigma}(By(t) + b_0) + c_0) = \dot{\tilde{x}}(t)$ for all times. This means that the trajectories $x(t)$ and $\tilde{x}(t)$ through $x_0$ and $\tilde{x}_0$ always remain parallel. Since the kernel of $A$ is of dimension $n - r$, the dynamics is in fact r-dimensional. The dynamics of the variables $y_i = \sigma(\sum_{j=1}^{n} A_{ij}x_j + a_i)$ is thus

$$\dot{y}_i = \sigma'\Big(\sum_{j=1}^{n} A_{ij}x_j + a_i\Big)\Big(\sum_{j=1}^{n} A_{ij}\dot{x}_j\Big) = (1 - y_i^2)\sum_{j=1}^{n} A_{ij}\,\sigma\Big(\sum_{l=1}^{q} C_{jl}\,\sigma\Big(\sum_{k=1}^{p} B_{lk}y_k + b_l\Big) + c_j\Big).$$

We can remark that the form of the dynamics of $y$ closely resembles that of the dynamics of $x$, but that the first hidden layer of the MLP has disappeared. This encourages us to define, in the same fashion as before, a new vector of $q$ variables: $z = \tilde{\sigma}(By + b_0)$. The dynamics of the variables $z_i = \sigma(\sum_{k=1}^{p} B_{ik}y_k + b_i)$ is thus

$$\dot{z}_i = \sigma'\Big(\sum_{k=1}^{p} B_{ik}y_k + b_i\Big)\Big(\sum_{k=1}^{p} B_{ik}\dot{y}_k\Big) = (1 - z_i^2)\sum_{k=1}^{p} B_{ik}(1 - y_k^2)\sum_{j=1}^{n} A_{kj}\,\sigma\Big(\sum_{l=1}^{q} C_{jl}z_l + c_j\Big).$$

Enthusiastically, we pursue further by defining a new vector of $n$ variables: $w = \tilde{\sigma}(Cz + c_0)$.
The dynamics of the variables $w_i = \sigma(\sum_{l=1}^{q} C_{il}z_l + c_i)$ is thus

$$\dot{w}_i = \sigma'\Big(\sum_{l=1}^{q} C_{il}z_l + c_i\Big)\Big(\sum_{l=1}^{q} C_{il}\dot{z}_l\Big) = (1 - w_i^2)\sum_{l=1}^{q} C_{il}(1 - z_l^2)\sum_{k=1}^{p} B_{lk}(1 - y_k^2)\sum_{j=1}^{n} A_{kj}w_j.$$
Now, substituting the definitions of $y, z, w$ in the previous equations and noting that, because of the same definitions, $\dot{x} = w$, we get the following system:

$$\dot{x}_i = w_i$$
$$\dot{y}_i = (1 - y_i^2)\sum_{j=1}^{n} A_{ij}w_j$$
$$\dot{z}_i = (1 - z_i^2)\sum_{k=1}^{p} B_{ik}(1 - y_k^2)\sum_{j=1}^{n} A_{kj}w_j$$
$$\dot{w}_i = (1 - w_i^2)\sum_{l=1}^{q} C_{il}(1 - z_l^2)\sum_{k=1}^{p} B_{lk}(1 - y_k^2)\sum_{j=1}^{n} A_{kj}w_j$$

The dynamics of the first $n$ variables of this $(n + p + q + n)$-dimensional system (with appropriate initial conditions: $x = x_0$, $y_0 = \tilde{\sigma}(Ax_0 + a_0)$, $z_0 = \tilde{\sigma}(By_0 + b_0)$, $w_0 = \tilde{\sigma}(Cz_0 + c_0)$!) will be identical to the dynamics of the original system. Further, we can write this system in quasi-monomial form:

$$\dot{x}_i = x_i\,(x_i^{-1}w_i)$$
$$\dot{y}_i = y_i\Big(\sum_{j=1}^{n} A_{ij}(y_i^{-1}w_j - y_i w_j)\Big)$$
$$\dot{z}_i = z_i\Big(\sum_{k=1}^{p}\sum_{j=1}^{n} B_{ik}A_{kj}(z_i^{-1}w_j - y_k^2 z_i^{-1}w_j - z_i w_j + y_k^2 z_i w_j)\Big)$$
$$\dot{w}_i = w_i\Big(\sum_{l=1}^{q}\sum_{k=1}^{p}\sum_{j=1}^{n} C_{il}B_{lk}A_{kj}(1 - y_k^2)(1 - z_l^2)(w_i^{-1} - w_i)w_j\Big)$$

We see that we have written the ordinary differential equation for each variable as a product of the variable itself with a linear combination of products of powers of the variables, which is precisely the definition of the quasi-monomial form. A last remark is that the convergence of $x$ implies the convergence of the enlarged system $(x, y, z, w)$, since $\dot{x} = 0$ implies $\dot{y} = 0$, $\dot{z} = 0$, and $\dot{w} = 0$. The main difference is that, to check the convergence of the original system, we only need to check that all the trajectories of the enlarged system starting on the manifold of initial conditions

$$m(x_0) = \big(x_0,\ \tilde{\sigma}(Ax_0 + a_0),\ \tilde{\sigma}(B\tilde{\sigma}(Ax_0 + a_0) + b_0),\ \tilde{\sigma}(C\tilde{\sigma}(B\tilde{\sigma}(Ax_0 + a_0) + b_0) + c_0)\big)$$

converge. The transformation into a Lotka-Volterra system proceeds exactly in the same way as in Appendices A and B. However, writing down the matrices $A$ and $B$ is a lengthy and tedious procedure and does not bring any further insight. We therefore do not perform it here. This computation will be helpful only when we use a computer algebra system to analyze the resulting Lotka-Volterra system.
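The agreement between the original MLP and the enlarged $(x, y, z, w)$ system can be checked numerically. The following sketch uses a small two-hidden-layer network ($n = p = q = 2$) with arbitrary illustrative weights:

```python
import math

# Check of the Section 6 enlargement for a small two-hidden-layer MLP
# (n = p = q = 2): the (x, y, z, w) system reproduces the original
# trajectory when started on the manifold of initial conditions m(x0).
# All weights and initial values are arbitrary illustrative choices.
tanh = math.tanh
A = [[0.4, -0.3], [0.2, 0.5]]; a = [0.1, -0.1]
B = [[0.3, 0.6], [-0.4, 0.2]]; b = [0.0, 0.2]
C = [[0.5, -0.2], [0.3, 0.4]]; c = [-0.1, 0.1]

def mlp(x):  # right-hand side of the original system
    y = [tanh(sum(A[k][j]*x[j] for j in range(2)) + a[k]) for k in range(2)]
    z = [tanh(sum(B[l][k]*y[k] for k in range(2)) + b[l]) for l in range(2)]
    return [tanh(sum(C[i][l]*z[l] for l in range(2)) + c[i]) for i in range(2)]

x1 = [0.2, -0.3]  # original system
x2 = x1[:]        # first n variables of the enlarged system
y = [tanh(sum(A[k][j]*x2[j] for j in range(2)) + a[k]) for k in range(2)]
z = [tanh(sum(B[l][k]*y[k] for k in range(2)) + b[l]) for l in range(2)]
w = [tanh(sum(C[i][l]*z[l] for l in range(2)) + c[i]) for i in range(2)]

dt = 1e-4
for _ in range(10000):  # t in [0, 1], forward Euler
    d1 = mlp(x1)
    Aw = [sum(A[i][j]*w[j] for j in range(2)) for i in range(2)]
    dy = [(1 - y[i]**2) * Aw[i] for i in range(2)]
    dz = [(1 - z[i]**2) * sum(B[i][k]*(1 - y[k]**2)*Aw[k] for k in range(2))
          for i in range(2)]
    dw = [(1 - w[i]**2) * sum(C[i][l]*(1 - z[l]**2) *
          sum(B[l][k]*(1 - y[k]**2)*Aw[k] for k in range(2)) for l in range(2))
          for i in range(2)]
    x1 = [x1[i] + dt*d1[i] for i in range(2)]
    x2 = [x2[i] + dt*w[i] for i in range(2)]  # x' = w in the enlarged system
    y = [y[i] + dt*dy[i] for i in range(2)]
    z = [z[i] + dt*dz[i] for i in range(2)]
    w = [w[i] + dt*dw[i] for i in range(2)]

err = max(abs(x1[i] - x2[i]) for i in range(2))
print(err)
```

As in the previous checks, the two $x$ trajectories agree up to the integration error.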
7 Discussion and conclusion

An important corollary of this embedding of continuous-time recurrent multilayer perceptrons into higher-dimensional Lotka-Volterra systems is that we can use Lotka-Volterra systems to approximate arbitrarily closely the dynamics of any finite-dimensional dynamical system for any finite time. The argument for this universal approximation property is exactly the same as in (Funahashi & Nakamura, 1993) and says that, for any given finite time, the trajectories of two sufficiently close dynamical systems remain close (this is true only for a finite time); this statement is based on the classical use of the Gronwall inequality for dynamical systems. It could therefore be interesting to use Lotka-Volterra systems as approximators for dynamical systems. The restriction presented in Appendix C on the choice of the activation function (which is in practice limited to the hyperbolic tangent or the logistic sigmoid) is theoretically strong: there is no guarantee that systems with different choices of activation function have qualitatively the same behavior. Indeed, Sontag and Sussman (Sontag & Sussman, 1997) have shown that controllability properties of systems with inputs can differ, even for different types of monotone increasing sigmoids. However, the limitation is fairly weak in the sense that the possible activation functions are the ones most commonly used and that we have universal approximation properties for dynamical systems. We have presented techniques to embed different types of neural networks (dynamical perceptrons, dynamical perceptrons with self-connections, and multilayer perceptrons) into predator-prey (Lotka-Volterra) systems.
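As an end-to-end illustration of the whole construction, the sketch below (with arbitrary illustrative values, not taken from the text) carries a two-neuron dynamical perceptron through the quasi-monomial form of Section 3 and the matrix $M$ of Section 4, and checks that the first two variables of the resulting 8-dimensional Lotka-Volterra system reproduce the trajectory of the z-system:

```python
# End-to-end sketch of Sections 3-4 for n = 2 (so r = 6 monomials, in the
# order z1^2, z1z2, z2^2, z1^-1 z2, z2^-1 z1, constant): build
# M = [[O, A_qm], [O, B_qm A_qm]] and compare the first two variables of the
# (2+6)-dimensional Lotka-Volterra system with the direct integration of
# z_i' = (1 - z_i^2)(A z)_i.  A and z0 are arbitrary illustrative values,
# chosen so the trajectory stays in the positive orthant.
A = [[-0.6, 0.4], [0.2, -0.9]]
B_qm = [[2, 0], [1, 1], [0, 2], [-1, 1], [1, -1], [0, 0]]
A_qm = [[-A[0][0], -A[0][1], 0.0, A[0][1], 0.0, A[0][0]],
        [0.0, -A[1][0], -A[1][1], 0.0, A[1][0], A[1][1]]]

n, r = 2, 6
M = [[0.0] * (n + r) for _ in range(n + r)]
for i in range(n):                      # top-right block: A_qm
    for j in range(r):
        M[i][n + j] = A_qm[i][j]
for j in range(r):                      # bottom-right block: B_qm A_qm
    for k in range(r):
        M[n + j][n + k] = sum(B_qm[j][l] * A_qm[l][k] for l in range(n))

z0 = [0.3, 0.2]
zLV = z0 + [z0[0]**e[0] * z0[1]**e[1] for e in B_qm]  # monomial initial values
zd = z0[:]                                            # direct integration

dt = 1e-4
for _ in range(10000):  # t in [0, 1], forward Euler
    zLV = [zLV[i] + dt * zLV[i] * sum(M[i][j] * zLV[j] for j in range(n + r))
           for i in range(n + r)]
    zd = [zd[i] + dt * (1 - zd[i]**2) * sum(A[i][j] * zd[j] for j in range(n))
          for i in range(n)]

err = max(abs(zLV[i] - zd[i]) for i in range(n))
print(err)
```

The embedding is exact at the level of the differential equations; the small residual is again only the discretization error.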
These techniques relied on looking at the dynamics of the output of every neuron in the network and using the special properties of the derivative of the hyperbolic tangent or logistic sigmoid activation function to write the system as what we called a quasi-monomial form, that is, a dynamical system where the vector field is a linear combination of products of powers of the state variables. Once we had this quasi-monomial form, we looked at the dynamics of these products of powers of the state variables to obtain directly a Lotka-Volterra system. Our expectation is that these techniques will be useful to migrate techniques for the analysis of Lotka-Volterra systems into the field of dynamical neural networks, for example, techniques to study convergence or stability properties of dynamical systems, or to discover first integrals of motion. This is our next objective.

References

Albertini, F., & Sontag, E. D. (1993). For Neural Networks, Function Determines Form. Neural Networks, 6.

Brenig, L. (1988). Complete Factorization and Analytic Solutions of Generalized Lotka-Volterra Equations. Physics Letters A, 133.
Cohen, M. A., & Grossberg, S. (1983). Absolute Stability of Global Pattern Formation and Parallel Memory Storage by Competitive Neural Networks. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5).

Funahashi, K.-I., & Nakamura, Y. (1993). Approximation of Dynamical Systems by Continuous-time Recurrent Neural Networks. Neural Networks, 6.

Grossberg, S. (1978). Decisions, Patterns, and Oscillations in Nonlinear Competitive Systems with Applications to Volterra-Lotka Systems. Journal of Theoretical Biology, 73.

Grossberg, S. (1988). Nonlinear Neural Networks: Principles, Mechanisms, and Architectures. Neural Networks, 1(1).

Hirsch, M. W. (1989). Convergent Activation Dynamics in Continuous Time Networks. Neural Networks, 2.

MacArthur, R. H. (1969). Species Packing, or what Competition Minimizes. Proceedings of the National Academy of Sciences, USA, 54.

May, R. M., & Leonard, W. J. (1975). Nonlinear Aspects of Competition between Three Species. SIAM Journal of Applied Mathematics, 29.

Moreau, Y., & Vandewalle, J. (1997). When Do Dynamical Neural Networks with and without Hidden Layer Have Identical Behavior? (Tech. Rep. ESAT-SISTA TR97-5). Katholieke Universiteit Leuven: Department of Electrical Engineering, Belgium.

Sontag, E. D. (1992). Neural Nets as System Models and Controllers. Proceedings Seventh Yale Workshop on Adaptive and Learning Systems.

Sontag, E. D., & Sussman, H. J. (1997). Complete Controllability of Continuous-time Recurrent Neural Networks. Systems and Control Letters, 30.

Appendix A: Transformation of the dynamical perceptron into a quasi-monomial form

We derive here the structure of the matrices $A$ and $B$ of the quasi-monomial form for the dynamical perceptron in a systematic fashion. Again, we write the equations as

$$\dot{z}_i = z_i\Big[\sum_{j=1}^{n} A_{ij}(z_i^{-1}z_j - z_i z_j)\Big]. \tag{17}$$

We see that the expression between brackets contains $2n$ monomials; for the $n$ differential equations this gives a total of $2n^2$ terms. There are two types
of monomials: $z_i z_j$ and $z_i^{-1}z_j$. The first type of monomial can take two different forms: $z_i^2$ (when $i = j$) or $z_i z_j$ with $j > i$ (when $i \neq j$). There are $n$ monomials of the first form and $n(n-1)/2$ monomials of the second form. The second type of monomial can also take two different forms: the constant $1$ (when $i = j$) or $z_i^{-1}z_j$ (when $i \neq j$). There is one monomial of the first form and $n(n-1)$ monomials of the second form. This gives a total of $(3n^2 - n + 2)/2$ different monomials to share among $2n^2$ terms. Carefully collecting all these terms into row vectors of exponents to form the matrix $B$ and all their corresponding coefficients into column vectors to form the matrix $A$, we can write these matrices as

$$B = \big[\,C^{(1)T}\ C^{(2)T}\ \cdots\ C^{(n)T}\ D^{(1)T}\ D^{(2)T}\ \cdots\ D^{(n)T}\ O_{1n}^{T}\,\big]^T \tag{18}$$
$$A = \big[\,P^{(1)}\ P^{(2)}\ \cdots\ P^{(n)}\ R^{(1)}\ R^{(2)}\ \cdots\ R^{(n)}\ V\,\big] \tag{19}$$

where we define the sub-matrices $C^{(i)}$, $D^{(i)}$, $O_{1n}$, and $P^{(i)}$, $R^{(i)}$, $V$ as follows. In general, the matrix $O_{n \times m}$ is a matrix of zeros of dimension $n \times m$ and the matrix $I_{n \times m}$ is a matrix of dimension $n \times m$ with ones on the diagonal. The matrix $C^{(i)}$ represents all the monomials of the form $z_i^2$ and $z_i z_j$ (with $j > i$); the dimension of the matrix thus depends on the index $i$, the matrix having $n - i + 1$ rows and $n$ columns. We can write it as

$$C^{(i)} = \begin{pmatrix} O_{(n-i+1)\times(i-1)} & \begin{smallmatrix}2\\1\\ \vdots\\ 1\end{smallmatrix} & \begin{smallmatrix}O_{1\times(n-i)}\\ I_{(n-i)\times(n-i)}\end{smallmatrix} \end{pmatrix},$$

that is, column $i$ contains $(2, 1, \ldots, 1)^T$ and columns $i+1, \ldots, n$ contain a zero first row above an identity block. Each of its rows is associated with the corresponding column of the matrix $P^{(i)}$, which collects all the coefficients of the monomials $z_i^2$ and $z_i z_j$ (with $j > i$). Such a matrix has again variable dimension, with $n$ rows and $n - i + 1$ columns. We can write it as

$$P^{(i)} = \begin{pmatrix} O_{(i-1)\times(n-i+1)} \\ -A_{ii}\ \ -A_{i(i+1)}\ \ -A_{i(i+2)}\ \cdots\ -A_{in} \\ O_{(n-i)\times 1}\ \ \operatorname{diag}\big(-A_{(i+1)i}, \ldots, -A_{ni}\big) \end{pmatrix}.$$

Turning back to the matrix of exponents, we collect all the exponents of the monomials of the form $z_i^{-1}z_j$ (with $j \neq i$) in the matrix $D^{(i)}$, which always has
$n - 1$ rows and $n$ columns:

$$D^{(i)} = \begin{pmatrix} I_{(i-1)\times(i-1)} & \begin{smallmatrix}-1\\ \vdots\\ -1\end{smallmatrix} & O_{(i-1)\times(n-i)} \\ O_{(n-i)\times(i-1)} & \begin{smallmatrix}-1\\ \vdots\\ -1\end{smallmatrix} & I_{(n-i)\times(n-i)} \end{pmatrix},$$

where the column of $-1$'s stands in the $i$-th column. We collect the coefficients of these monomials in the corresponding columns of $R^{(i)}$, which has $n$ rows and $n - 1$ columns:

$$R^{(i)} = \begin{pmatrix} O_{(i-1)\times(n-1)} \\ A_{i1}\ \cdots\ A_{i(i-1)}\ \ A_{i(i+1)}\ \cdots\ A_{in} \\ O_{(n-i)\times(n-1)} \end{pmatrix}.$$

We are now only left with the constant monomial (which we represent by the zero row vector $O_{1n}$) and its coefficients, which we collect in the column vector $V = (A_{11}, \ldots, A_{nn})^T$. Taking all this notation into consideration, we can represent the matrices $A$ and $B$ according to Equations (18) and (19).

Example. We now present the different matrices for the case of a four-dimensional system. This clarifies how we have collected the different terms in the matrices $A$ and $B$. The sub-matrices are $C^{(1)}, C^{(2)}, C^{(3)}, C^{(4)}$ and $D^{(1)}, D^{(2)}, D^{(3)}, D^{(4)}$ for $B$:

$$C^{(1)} = \begin{pmatrix} 2&0&0&0 \\ 1&1&0&0 \\ 1&0&1&0 \\ 1&0&0&1 \end{pmatrix}, \quad C^{(2)} = \begin{pmatrix} 0&2&0&0 \\ 0&1&1&0 \\ 0&1&0&1 \end{pmatrix}, \quad C^{(3)} = \begin{pmatrix} 0&0&2&0 \\ 0&0&1&1 \end{pmatrix}, \quad C^{(4)} = \begin{pmatrix} 0&0&0&2 \end{pmatrix},$$

$$D^{(1)} = \begin{pmatrix} -1&1&0&0 \\ -1&0&1&0 \\ -1&0&0&1 \end{pmatrix}, \quad D^{(2)} = \begin{pmatrix} 1&-1&0&0 \\ 0&-1&1&0 \\ 0&-1&0&1 \end{pmatrix}, \quad D^{(3)} = \begin{pmatrix} 1&0&-1&0 \\ 0&1&-1&0 \\ 0&0&-1&1 \end{pmatrix}, \quad D^{(4)} = \begin{pmatrix} 1&0&0&-1 \\ 0&1&0&-1 \\ 0&0&1&-1 \end{pmatrix}.$$
We have decomposed the matrix $A$ into the sub-matrices $P^{(1)}, P^{(2)}, P^{(3)}, P^{(4)}$, $R^{(1)}, R^{(2)}, R^{(3)}, R^{(4)}$, and $V$:

$$P^{(1)} = \begin{pmatrix} -A_{11}&-A_{12}&-A_{13}&-A_{14} \\ 0&-A_{21}&0&0 \\ 0&0&-A_{31}&0 \\ 0&0&0&-A_{41} \end{pmatrix}, \quad P^{(2)} = \begin{pmatrix} 0&0&0 \\ -A_{22}&-A_{23}&-A_{24} \\ 0&-A_{32}&0 \\ 0&0&-A_{42} \end{pmatrix},$$

$$P^{(3)} = \begin{pmatrix} 0&0 \\ 0&0 \\ -A_{33}&-A_{34} \\ 0&-A_{43} \end{pmatrix}, \quad P^{(4)} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ -A_{44} \end{pmatrix},$$

$$R^{(1)} = \begin{pmatrix} A_{12}&A_{13}&A_{14} \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \end{pmatrix}, \quad R^{(2)} = \begin{pmatrix} 0&0&0 \\ A_{21}&A_{23}&A_{24} \\ 0&0&0 \\ 0&0&0 \end{pmatrix}, \quad R^{(3)} = \begin{pmatrix} 0&0&0 \\ 0&0&0 \\ A_{31}&A_{32}&A_{34} \\ 0&0&0 \end{pmatrix}, \quad R^{(4)} = \begin{pmatrix} 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ A_{41}&A_{42}&A_{43} \end{pmatrix},$$

$$V = (A_{11}, A_{22}, A_{33}, A_{44})^T.$$

Appendix B: Transformation of the dynamical perceptron with self-connections into a quasi-monomial form

We recall that for the system (16), the quasi-monomial form is

$$\dot{x}_i = x_i(-\lambda_i + x_i^{-1}w_i)$$
$$\dot{w}_i = w_i \sum_{j=1}^{n}\big(-\lambda_j A_{ij}x_j w_i^{-1} + \lambda_j A_{ij}x_j w_i + A_{ij}w_i^{-1}w_j - A_{ij}w_i w_j\big) \tag{20}$$

To use the quasi-monomial formalism in the same way as in the previous section, we need to unify our formalism to use a single set of variables. We therefore define the following variable $u = (x, w)^T$:

$$u_i = x_i, \quad i = 1, \ldots, n$$
$$u_{n+i} = w_i, \quad i = 1, \ldots, n$$

If the matrices $A^{(1)}$ and $B^{(1)}$ contain respectively all the coefficients and all the exponents of the different monomials, we can write the dynamics of $u$ in
quasi-monomial form:

$$\dot{u}_i = u_i \sum_{j=1}^{m} A^{(1)}_{ij} \prod_{k=1}^{2n} u_k^{B^{(1)}_{jk}}.$$

To build the matrices $A^{(1)}$ and $B^{(1)}$, we have to look at all the monomials in (20). If we first focus on the terms $w_i^{-1}w_j$ and $w_i w_j$, which already appeared in Section 3, we see that we have exactly the same contribution in the matrix $B^{(1)}$ as in the case of the dynamical perceptron. We have the same cancellation of the factors of the terms in $w_i^{-1}w_j$ or $w_i w_j$ as in Section 3. They will contribute to the matrix $B^{(1)}$ with a first sub-matrix equal to $B$. Notice that the left part of the matrix $B^{(1)}$ is for powers of $x$ while the right part is for powers of $w$. The next blocks take into account, respectively, the monomials of the form $x_i^{-1}w_i$, $x_j w_i^{-1}$, and $x_j w_i$. None of these terms ever adds up or cancels out, which explains the plain diagonal and column structure of these sub-matrices. We have

$$B^{(1)} = \begin{pmatrix} O_{r \times n} & B \\ -I_{n \times n} & I_{n \times n} \\ I_{n \times n} & -K^{(1)} \\ \vdots & \vdots \\ I_{n \times n} & -K^{(n)} \\ I_{n \times n} & K^{(1)} \\ \vdots & \vdots \\ I_{n \times n} & K^{(n)} \end{pmatrix},$$

where we define $K^{(i)}$ as the $n \times n$ matrix whose $i$-th column consists of ones and whose other entries are zero:

$$K^{(i)} = \begin{pmatrix} O_{n \times (i-1)} & \begin{smallmatrix}1\\ \vdots\\ 1\end{smallmatrix} & O_{n \times (n-i)} \end{pmatrix}.$$

Similarly, we collect all the coefficients of the monomials in $A^{(1)}$. We place the contributions relating to the equations for the $x$ variables in the top part of the matrix, while we place the contributions for $w$ in the bottom part. The contributions for the monomials $w_i^{-1}w_j$ and $w_i w_j$ are exactly the same for the variables $w$ as before; the first bottom sub-matrix is therefore equal to $A$. These monomials have no influence on the variables $x$, and the first top sub-matrix is therefore equal to zero, except in the case where $w_i^{-1}w_j = 1$, corresponding to the $-\lambda_i$ terms in the equations for $x$, which generate a column of $-\lambda_i$ in the otherwise empty matrix $Q$. The other sub-matrices take correspondingly into account the coefficients of the monomials of the form $x_i^{-1}w_i$, $x_j w_i^{-1}$, and $x_j w_i$. We thus have

$$A^{(1)} = \begin{pmatrix} Q_{n \times r} & I_{n \times n} & O_{n \times n} \cdots O_{n \times n} & O_{n \times n} \cdots O_{n \times n} \\ A & O_{n \times n} & -L^{(1)} \cdots -L^{(n)} & L^{(1)} \cdots L^{(n)} \end{pmatrix},$$
where we define Q and L^{(i)} as

\[
Q = \begin{pmatrix} O_{n \times (r-1)} & \begin{matrix} -\lambda_1 \\ \vdots \\ -\lambda_n \end{matrix} \end{pmatrix}, \qquad
L^{(i)} = \begin{pmatrix} O_{(i-1) \times n} \\ \begin{matrix} \lambda_1 A_{i1} & \cdots & \lambda_n A_{in} \end{matrix} \\ O_{(n-i) \times n} \end{pmatrix}.
\]

Appendix C: Choice of the activation function

We chose the activation function as \sigma(x) = \tanh(x) for simplicity, but we are not directly limited to this activation function. The essential property that we used was the fact that \sigma'(x) = 1 - \sigma^2(x). For the logistic sigmoid \sigma(x) = 1/(1+e^{-x}), we have \sigma'(x) = \sigma(x)(1-\sigma(x)), and we can do the calculations again without difficulty. The condition that the activation function has to satisfy is that its derivative be a sum of quasi-monomials of itself. If we do not require saturation at plus and minus infinity, we can use \sigma(x) = e^x, but the system is then not globally Lipschitz and the solution might blow up in finite time, which is an annoyance. A square function \sigma(x) = x^2 suffers from the same problem. The hyperbola \sigma(x) = 1/x gives \sigma'(x) = -\sigma^2(x), but then the vector field is not even bounded, which is even worse. In general, we could consider the condition \sigma'(x) = L(\sigma(x)), where L(\cdot) is a finite Laurent series L(y) = (1/y^m) \sum_{k=0}^{n} a_k y^k, or, even more generally, a Puiseux series. For the Laurent series, the solution of the differential equation is given by separation of variables, \int d\sigma(x)/L(\sigma(x)) = \int dx. We can then expand the term \sigma^m(x) / \sum_{k=0}^{n} a_k \sigma^k(x) as a polynomial plus a sum of simple fractions, and integrate them. This integration produces a rational function plus a sum of logarithmic terms and arctangent terms, and \sigma(x) is found by inverting this relation. This inversion will not lead to an explicit solution in general. If we further require that the vector field be globally Lipschitz, to guarantee existence of the trajectories for all times, the possible solutions become scarce. Among the most frequently used activation functions, we have been able to identify only the hyperbolic tangent and the logistic sigmoid as satisfying these conditions.
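The two appendices lend themselves to a quick numerical sanity check. The sketch below is illustrative only and not part of the report: it assumes the network form \dot{x}_i = -\lambda_i x_i + \sigma(\sum_j A_{ij} x_j) with \sigma = \tanh (our reading of system (6)), uses made-up values for the weights A_{ij} and decay rates \lambda_i, verifies the closure property \sigma' = 1 - \sigma^2 by central finite differences, and confirms that the quasi-monomial right-hand side (2) coincides with the chain-rule vector field of w_i = \sigma(\sum_j A_{ij} x_j).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))   # weights A_ij (hypothetical example values)
lam = rng.uniform(0.5, 2.0, n)    # decay rates lambda_i (hypothetical example values)

# --- Appendix C: closure property sigma' = 1 - sigma^2 for sigma = tanh,
# checked by central finite differences on a grid.
t = np.linspace(-3.0, 3.0, 201)
h = 1e-6
d_tanh = (np.tanh(t + h) - np.tanh(t - h)) / (2.0 * h)
assert np.max(np.abs(d_tanh - (1.0 - np.tanh(t) ** 2))) < 1e-7

# --- Appendix B: at a generic point, the quasi-monomial right-hand side (2)
# must agree with the chain-rule vector field of w_i = tanh(sum_j A_ij x_j)
# for the network x_i' = -lambda_i x_i + w_i.
x = rng.uniform(0.1, 1.0, n)      # positive, so x_i^{-1} is well defined
w = np.tanh(A @ x)

x_dot_net = -lam * x + w
w_dot_net = (1.0 - w ** 2) * (A @ x_dot_net)

T = A @ (lam * x)                 # T_i = sum_j lambda_j A_ij x_j
S = A @ w                         # S_i = sum_j A_ij w_j
x_dot_qm = x * (-lam + w / x)
w_dot_qm = w * (-T / w + T * w + S / w - S * w)

assert np.allclose(x_dot_qm, x_dot_net)
assert np.allclose(w_dot_qm, w_dot_net)
```

Because the two expressions for \dot{w}_i are algebraically identical once w_i = \tanh(\sum_j A_{ij} x_j), the final assertions hold up to floating-point error; this is a consistency check of the signs in (2), not a simulation of the full embedding.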