
Katholieke Universiteit Leuven
Departement Elektrotechniek
ESAT-SISTA/TR 1995-47

Dynamical System Prediction: a Lie algebraic approach for a novel neural architecture¹

Yves Moreau and Joos Vandewalle²

May 1993

Accepted for publication in Annals of Mathematics and Artificial Intelligence

¹ This report is available by anonymous ftp from ftp.esat.kuleuven.ac.be in the directory pub/sista/moreau/reports/manna95 tr95-47.ps.z

² ESAT - Katholieke Universiteit Leuven, Kardinaal Mercierlaan 94, 3001 Leuven (Heverlee), Belgium, tel 32/1/32 17 09, fax 32/1/32 19 8, email: moreau@esat.kuleuven.ac.be.

Abstract

In this paper we study the problem of the prediction of autonomous continuous-time dynamical systems from discrete-time measurements of the state variables. From the fact that the system is inherently continuous in time but that the measurements are made at discrete time-intervals, we show that the predictor of such a system needs to be an invertible map from the state-space into itself. The problem then becomes one of how to approximate invertible maps. We show that standard approximation schemes do not guarantee the property of invertibility. We therefore propose a new approximation scheme which is based on the composition of invertible basis functions and which preserves invertibility. This approach can be cast in a Lie algebraic framework where the approximation is based on the use of the Baker-Campbell-Hausdorff formula. The method can be implemented in a neural-like form and we illustrate it by an example. We also present a more general implementation which we call "MLP in dynamics space".

Keywords: dynamical system prediction, Lie algebras, Baker-Campbell-Hausdorff formula

1 Introduction

We will study here the problem of the prediction of nonlinear dynamical systems from a set of sampled measurements of the state of the system. The nonlinear systems which we will consider are those nonlinear autonomous dynamical systems whose dynamics is continuous in time and therefore governed by a multidimensional ordinary differential equation. However, although the dynamics of the system is continuous in time, the samples that we will use for prediction are the state variables measured at discrete time-intervals only. So, we are led to postulate a discrete-time form for our prediction: $\hat{x}(k+1) = F(x(k))$. Using the fact that the underlying dynamics of the system is time-continuous, we will show that the function $F$ used for prediction should be invertible. The problem then becomes that of building an approximation scheme specifically designed for invertible maps.

The main drawback of standard approximation schemes is that they are based on summing basis functions to build an approximation. However, this operation does not preserve invertibility. Using the operation of composition instead of addition, we are able to preserve invertibility. We will thus build approximation schemes based on composition of invertible basis functions. Those compositions can be presented in the framework of Lie algebra theory. Starting from a method from numerical analysis we will be able to derive a new approach to the prediction of nonlinear systems. This new method can easily be presented in a neural-like form.

This work is organized as follows. In the first section we will recall the basic concepts of dynamical system theory and we will show that a predictor of a dynamical system should be invertible. In the second section we will present the elements of Lie algebra theory and how they can be applied, using the Baker-Campbell-Hausdorff formula, to build approximation schemes based on compositions.
In the third section we will present the network implementation of our model which we will illustrate by an example. We will also present a general model for approximation which we call "MLP in dynamics space". Finally, we will conclude.

1.1 Dynamical systems

A dynamical system is a rule for the evolution over time of a set of variables. This set of variables is called the state of the system. Following the general differential geometric definition of dynamical systems [2], the state will be identified here with some point on a manifold $M$. We will consider here only those systems for which the state evolves according to some ordinary differential equation $\dot{x}(t) = f(x(t))$. The function $f$ is the vector field of the system and it associates to each point of the manifold $M$ an element of the tangent space $T_x M$. A solution to the ODE is a trajectory of the system. Such a system can thus be written as:

\[
M \text{ a manifold}, \qquad x(t) \in M \ \text{for all } t, \qquad f : M \to T_x M, \qquad \dot{x}(t) = f(x(t)) \tag{1}
\]
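To make the objects in (1) concrete, the following minimal sketch takes $M = \mathbb{R}^2$, picks an illustrative vector field and computes one trajectory numerically. The choice of field (a harmonic oscillator), the function names `f`, `rk4_step` and `trajectory`, and the use of a classical fourth-order Runge-Kutta integrator are all assumptions made for illustration; none of them is part of the method of this report.

```python
import numpy as np

def f(x):
    """Illustrative vector field on R^2 (an assumption): xdot = y, ydot = -x."""
    return np.array([x[1], -x[0]])

def rk4_step(f, x, h):
    """One classical Runge-Kutta step of size h for xdot = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def trajectory(f, x0, t_end, n_steps):
    """Approximate the trajectory through x0 on [0, t_end] as an array of states."""
    h = t_end / n_steps
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        xs.append(rk4_step(f, xs[-1], h))
    return np.array(xs)

x0 = np.array([1.0, 0.0])
# One full period of the oscillator: the trajectory should close on itself.
traj = trajectory(f, x0, 2 * np.pi, 1000)
```

Each row of `traj` is a point $x(t)$ of the trajectory; for this particular field the last row comes back to the initial state up to integration error.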

1.2 The initial value problem and the existence, uniqueness and smoothness of solutions

The initial value problem (IVP) simply consists in finding the trajectory passing through a given initial state $x_0$:

\[
\begin{cases} \dot{x}(t) = f(x(t)) \\ x(0) = x_0 \end{cases} \tag{2}
\]

A sufficient condition for the existence of a unique solution to the IVP is that the vector field $f$ be everywhere locally Lipschitz [4]. The solution might however be defined only for small enough positive and negative times as the trajectory might escape from the manifold after some time. If we require the vector field to be $C^r$ ($1 \le r \le \infty$), the solution $x(t)$ will be $C^{r+1}$ with respect to time $t$ and $C^r$ with respect to a change in initial condition $x_0$. From here on we will always assume the vector field to be at least locally Lipschitz and we will assume further smoothness wherever necessary.

1.3 The flow of a dynamical system

For each initial condition $x_0$, we can solve the initial value problem. We then decide to collect all those solutions in one function $\Phi(x_0, t)$ which gives the solution of the ODE as a function of the initial condition:

\[
\begin{cases} \dot{x} = f(x(t)) \\ x(0) = x_0 \end{cases} \;\Rightarrow\; x(t) = \Phi(x_0, t) \tag{3}
\]

$\Phi$ is called the flow of the dynamical system. The flow is a remarkable object which possesses the group property. We refer the reader to [4] for a general presentation of the group theoretic approach to dynamical systems. If we assume, for simplicity, that the flow is defined for all times, the group property writes as follows:

\[
\Phi : \mathbb{R} \times M \to M, \qquad \Phi(x, t+s) = \Phi(\Phi(x,t), s) \quad \forall\, t, s \in \mathbb{R} \tag{4}
\]

As we will see, this gives to the flow the structure of a group. We first define the time-$t$ map as:

\[
\varphi_t(x_0) \triangleq \Phi(x_0, t) \tag{5}
\]

This map $\varphi_t$ associates to any initial condition its image under the flow of the differential equation after a given time $t$. So, we have on one hand a trajectory which is a way of looking at the evolution of a given initial condition for all possible times.
And we have on the other hand a time-$t$ map which is a way to look at the evolution of all possible initial conditions but only for one given time. The map $\varphi_t$ is a map from $M$ into itself. Using the group property of the flow, we can deduce the properties of $\varphi$:

\[
\begin{aligned}
\varphi_t \circ \varphi_s (x_0) &= \Phi(\Phi(x_0, s), t) = \Phi(x_0, t+s) = \varphi_{t+s}(x_0) && \text{composition} \\
\varphi_0 = I \;&\Leftarrow\; \varphi_0(x_0) = \Phi(x_0, 0) = x_0 = I(x_0) && \text{identity} \\
\varphi_t \circ \varphi_{-t}(x_0) &= \Phi(x_0, 0) = x_0 = I(x_0) \;\Rightarrow\; \varphi_{-t} = (\varphi_t)^{-1} && \text{inverse}
\end{aligned}
\tag{6}
\]
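The three properties in (6) can be checked numerically on a flow that is known in closed form. The sketch below assumes, purely for illustration, the scalar equation $\dot{x} = \tanh(x)$ on $M = \mathbb{R}$. Since $\frac{d}{dt}\sinh(x) = \cosh(x)\tanh(x) = \sinh(x)$, its time-$t$ map is $\varphi_t(x_0) = \operatorname{arcsinh}(e^{t}\sinh(x_0))$. The function name `phi` and the test values are hypothetical.

```python
import numpy as np

def phi(t, x):
    """Time-t map of xdot = tanh(x): sinh(x(t)) = exp(t) * sinh(x0)."""
    return np.arcsinh(np.exp(t) * np.sinh(x))

x0 = np.linspace(-2.0, 2.0, 9)   # a few initial conditions
t, s = 0.7, -0.3                 # arbitrary positive and negative times

# Composition: phi_t o phi_s = phi_{t+s}
composition_ok = np.allclose(phi(t, phi(s, x0)), phi(t + s, x0))
# Identity: phi_0 = I
identity_ok = np.allclose(phi(0.0, x0), x0)
# Inverse: phi_{-t} = (phi_t)^{-1}
inverse_ok = np.allclose(phi(-t, phi(t, x0)), x0)
```

All three flags evaluate to `True` for this flow, which is the one-parameter group structure stated in (6) restricted to a handful of sample points.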

Moreover, if $f$ is $C^r$ then $\varphi$ is $C^r$. Hence, $\varphi$ is a smooth invertible map with a smooth inverse, that is, $\varphi$ is a diffeomorphism. Furthermore, the properties of $\varphi$ show that $\{\varphi_t\}$ is a group. We therefore say that a flow $\Phi(x, t)$ is a one-parameter group of diffeomorphisms $\varphi_t$.

1.4 Dynamical system prediction

Even if we assume that the underlying dynamics of the system is governed by an ODE, the system is generally observed only at discrete time-intervals. So, if we choose a sampling period $t$, we can assume a functional relationship between the current state and the next sample. The system becomes of the form:

\[
x((k+1) \cdot t) = F(x(k \cdot t)), \qquad x(k) \in M \tag{7}
\]

And, from what we have shown before, we can see that:

\[
F(\cdot) = \varphi_t(\cdot) \tag{8}
\]

Hence, $F$ is a diffeomorphism.

The following figure (Fig. 1) summarizes those observations. On the left we consider a square grid of initial conditions. We then let each point of that grid follow its trajectory for a time-interval $t$. After this time, all the points of the grid have moved and the shape of the grid has changed. The diffeomorphism $\varphi_t$ is that transformation from the manifold (here, $\mathbb{R}^2$) into itself which maps the square grid on the left into the deformed grid on the right. One can immediately see that this transformation is smooth and invertible, hence a diffeomorphism.

Figure 1: Diffeomorphic transformation of a square grid after one time-step, $\varphi_t(x_0) = \Phi(x_0, t)$.

So, the question of predicting a continuous-time system observed at discrete time-intervals becomes one of how we can approximate those diffeomorphisms which arise from the sampling of an ODE. The basic idea is that the set of all $C^r$ diffeomorphisms is also a group. In a group, the natural operation is the composition of the elements of the group. While most standard approximation schemes are based on the summation of some basis functions, this operation does not preserve the property of being a diffeomorphism.
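A one-dimensional instance of this failure is easy to exhibit. In the sketch below, `g1` and `g2` are two hypothetical diffeomorphisms of $\mathbb{R}$ chosen only for illustration; their sum $x^3 - x$ sends $-1$, $0$ and $1$ to the same value, whereas their composition remains strictly monotone. Strict monotonicity on a finite grid is used here as a simple stand-in for invertibility in one dimension.

```python
import numpy as np

def g1(x):
    """x + x^3: derivative 1 + 3x^2 > 0, so g1 is a diffeomorphism of R."""
    return x + x**3

def g2(x):
    """-2x: a linear diffeomorphism of R."""
    return -2.0 * x

def is_strictly_monotone(values):
    """True if the sampled values are strictly increasing or strictly decreasing."""
    d = np.diff(values)
    return bool(np.all(d > 0) or np.all(d < 0))

x = np.linspace(-2.0, 2.0, 401)

# The sum g1 + g2 = x^3 - x folds the line: it is not injective.
sum_monotone = is_strictly_monotone(g1(x) + g2(x))
# The composition g1(g2(x)) = -2x - 8x^3 is still strictly monotone.
comp_monotone = is_strictly_monotone(g1(g2(x)))

# Three distinct points with the same image under the sum.
roots = np.array([-1.0, 0.0, 1.0])
sum_at_roots = g1(roots) + g2(roots)
```

Here `sum_monotone` is `False`, `comp_monotone` is `True`, and `sum_at_roots` is zero at all three points.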
On the other hand, composition does preserve this property, so we propose to build an approximation scheme based on composing basis functions instead of adding them. Schematically, one could caricature this by saying that the transformation in Fig. 1 can be approximated by a sequence of two successive transformations. The first transformation could for example pinch the left and right sides of the square and then be followed by a second one which would be a clockwise rotation of about 15 degrees.

2 Lie algebra theory

Lie algebra theory has a long history in physics, mostly in the areas of classical mechanics [1] and partial differential equations [8]. It is also an essential part of nonlinear system theory [5]. Our approach to system prediction can be best cast in this framework. A Lie algebra is defined as follows:

Definition 1. Let $V$ be a vector space. $V$ is a Lie algebra with Lie bracket $[\cdot,\cdot] : V \times V \to V$ iff for all $A, B, A_1, A_2, C \in V$ and every scalar $a$:

\[
\begin{aligned}
[A, B] &= -[B, A] && \text{antisymmetry} \\
[a \cdot A, B] &= a \cdot [A, B], \qquad [A_1 + A_2, B] = [A_1, B] + [A_2, B] && \text{bilinearity} \\
[[A, B], C] + [[B, C], A] + [[C, A], B] &= 0 && \text{Jacobi identity}
\end{aligned}
\tag{9}
\]

In the case where the Lie algebra is the vector space of all $C^r$ vector fields, the time-$t$ map $\varphi_t$ plays a very important role. If the vector field is $A$, we define the exponential map as follows: $\exp(t \cdot A) \triangleq \varphi_t$. This notation is an extension of the case where the vector field is linear in space and where the solution is given by exponentiation of the matrix. The exponential is thus a mapping from the manifold into itself. This map depends on the underlying vector field $A$ and changes over time. From here on, the product of exponentials will denote the composition of the maps. One can then define the Lie bracket by:

\[
[A, B] = \left. \frac{\partial^2}{\partial s\, \partial t} \right|_{t=s=0} \exp(-s \cdot B) \cdot \exp(-t \cdot A) \cdot \exp(s \cdot B) \cdot \exp(t \cdot A) \tag{10}
\]

The bracket $[A, B]$ is called the commutator of the vector fields as it measures the degree of non-commutativity of the flows of the vector fields ($[A, B] = 0 \Leftrightarrow \exp(A) \cdot \exp(B) = \exp(B) \cdot \exp(A)$). In the case where the manifold is $\mathbb{R}^n$, the bracket can be further particularized to:

\[
[A, B]_j = \sum_{i=1}^{n} \left( B_i \cdot \frac{\partial A_j}{\partial x_i} - A_i \cdot \frac{\partial B_j}{\partial x_i} \right) \tag{11}
\]
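Formula (11) is straightforward to evaluate numerically. The sketch below is one possible implementation under stated assumptions: Jacobians are approximated by central differences, and the names `jacobian`, `bracket`, `P`, `Q`, `A` and `B` are illustrative. For linear fields $A(x) = Px$ and $B(x) = Qx$, (11) reduces to $(PQ - QP)\,x$, which gives a convenient check.

```python
import numpy as np

def jacobian(F, x, h=1e-6):
    """Central-difference Jacobian with entries J[j, i] = dF_j / dx_i."""
    x = np.asarray(x, dtype=float)
    n = x.size
    J = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        J[:, i] = (F(x + e) - F(x - e)) / (2 * h)
    return J

def bracket(A, B, x):
    """[A, B]_j = sum_i (B_i dA_j/dx_i - A_i dB_j/dx_i), following (11)."""
    return jacobian(A, x) @ B(x) - jacobian(B, x) @ A(x)

# Two linear vector fields on R^2 (illustrative choices).
P = np.array([[0.0, 1.0], [0.0, 0.0]])
Q = np.array([[0.0, 0.0], [-1.0, 0.0]])
A = lambda x: P @ x
B = lambda x: Q @ x

x = np.array([1.0, 2.0])
ab = bracket(A, B, x)          # numerical bracket at x
expected = (P @ Q - Q @ P) @ x # closed form for linear fields
```

The same `bracket` function applies unchanged to nonlinear fields, with an accuracy limited by the finite-difference step `h`.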

2.1 The Baker-Campbell-Hausdorff formula

The Baker-Campbell-Hausdorff (BCH) formula gives an expansion for the product of two exponentials of elements of the Lie algebra, see [6].

\[
\exp(A) \cdot \exp(B) = \exp\Big( A + B + \tfrac{1}{2}[A, B] + \tfrac{1}{12}\big([A,[A,B]] + [B,[B,A]]\big) + \tfrac{1}{24}[A,[B,[B,A]]] + \dots \Big) \tag{12}
\]

The problem of predicting a dynamical system with vector field $X$ then becomes that of building an approximation for $\exp(t \cdot X)$ as we have that:

\[
x(k+1) = \exp(t \cdot X)[x(k)] \tag{13}
\]

This problem has recently been the focus of much attention in the field of numerical analysis for the integration of Hamiltonian differential equations [6] [7]. Suppose that the vector field $X$ is of the form $X = A + B$ where we can integrate $A$ and $B$ directly. Then one can use the BCH formula to produce a first-order approximation to the exponential map:

\[
\text{BCH}: \quad \exp(t \cdot X) = \exp(t \cdot A) \cdot \exp(t \cdot B) + O(t^2) \tag{14}
\]

This is the essence of the method as it shows that one can approximate an exponential map (that is, the mapping arising from the solution of an ODE) by composing simpler maps. By further manipulating the BCH formula, we can show that the following leapfrog scheme is second-order:

\[
\text{Leapfrog}: \quad \exp(t \cdot X) = \exp\!\big(\tfrac{t}{2} \cdot A\big) \cdot \exp(t \cdot B) \cdot \exp\!\big(\tfrac{t}{2} \cdot A\big) + O(t^3) \tag{15}
\]

Using this leapfrog scheme as a basis element for further leapfrog schemes, Yoshida [9] showed that it was in fact possible to produce an approximation to $\exp(t \cdot X)$ up to any order:

\[
\text{Order } p: \quad \exists k,\ \exists w_1, v_1, \dots, w_k, v_k : \quad \exp(t \cdot X) = \exp(w_1 t \cdot A) \cdot \exp(v_1 t \cdot B) \cdots \exp(w_k t \cdot A) \cdot \exp(v_k t \cdot B) + O(t^{p+1}) \tag{16}
\]

Forest and Ruth [3] showed that approximations could be built for more than two vector fields. Combining the two we can state that it is possible to build an approximation to the solution of a linear combination of vector fields as a product of exponential maps.
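The orders claimed in (14) and (15) can be observed on a small linear example where every exponential is available in closed form. The sketch below assumes the splitting of the harmonic oscillator $\dot{x} = y$, $\dot{y} = -x$ into two nilpotent linear fields; this test problem and all names in the code are illustrative choices. Halving $t$ should divide the error of the first-order scheme by about 4 and that of the leapfrog scheme by about 8.

```python
import numpy as np

# X = A + B with A: xdot = y and B: ydot = -x (both matrices are nilpotent).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [-1.0, 0.0]])
I2 = np.eye(2)

def exp_A(t):
    """exp(t*A) = I + t*A exactly, because A @ A = 0."""
    return I2 + t * A

def exp_B(t):
    """exp(t*B) = I + t*B exactly, because B @ B = 0."""
    return I2 + t * B

def exp_X(t):
    """exp(t*(A+B)): the exact flow of the oscillator is a rotation."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s], [-s, c]])

def first_order(t):
    """Product of exponentials as in (14)."""
    return exp_A(t) @ exp_B(t)

def leapfrog(t):
    """Symmetric product of exponentials as in (15)."""
    return exp_A(t / 2) @ exp_B(t) @ exp_A(t / 2)

def error(scheme, t):
    """Largest entry-wise deviation from the exact time-t map."""
    return np.max(np.abs(scheme(t) - exp_X(t)))

# Error reduction factors when the step is halved.
ratio_first = error(first_order, 0.1) / error(first_order, 0.05)
ratio_leap = error(leapfrog, 0.1) / error(leapfrog, 0.05)
```

A ratio near $2^2$ indicates an error of order $t^2$ and a ratio near $2^3$ an error of order $t^3$, in line with (14) and (15).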
\[
X = \sum_{i=1}^{m} a_i \cdot A_i \;\Rightarrow\; \exists w_{ij} : \quad \exp(t \cdot X) = \prod_{j=1}^{k} \prod_{i=1}^{m} \exp(w_{ij} \cdot t \cdot A_i) + O(t^{p+1}) \tag{17}
\]

3 Network implementation

Such an approximation scheme can be easily implemented as a network as can be seen in the following figure (Fig. 2). The problem of predicting the system can now simply be solved by minimizing the prediction error of the model by tuning the weights $w_{ij}$. Moreover, gradients can easily be computed, which allows for a gradient-descent, backpropagation-like implementation or any other gradient-based optimization method.
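A minimal sketch of such a composition network and of its training is given below. It makes several assumptions that are not taken from the report: the two basis flows are those of the split harmonic oscillator ($\dot{x} = y$ and $\dot{y} = -x$), the training pairs are generated from the exact rotation, gradients are approximated by central differences as a stand-in for a backpropagation-like computation, and plain gradient descent is given a simple step-halving safeguard. All function names are hypothetical.

```python
import numpy as np

def drift(x, w):
    """exp(w * A_drift) for A_drift: xdot = y, ydot = 0 (rows of x are states)."""
    return np.stack([x[:, 0] + w * x[:, 1], x[:, 1]], axis=1)

def kick(x, w):
    """exp(w * A_kick) for A_kick: xdot = 0, ydot = -x."""
    return np.stack([x[:, 0], x[:, 1] - w * x[:, 0]], axis=1)

FLOWS = (drift, kick)

def network(x, W):
    """Compose the basis flows layer by layer; W[j, i] weights flow i in layer j."""
    for layer in W:
        for flow, w in zip(FLOWS, layer):
            x = flow(x, w)
    return x

def loss(W, X, Y):
    """Mean squared one-step prediction error."""
    return float(np.mean(np.sum((network(X, W) - Y) ** 2, axis=1)))

def grad(W, X, Y, h=1e-6):
    """Central-difference gradient of the loss with respect to the weights."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W)
        E[idx] = h
        G[idx] = (loss(W + E, X, Y) - loss(W - E, X, Y)) / (2 * h)
    return G

def train(W, X, Y, lr=0.5, n_iter=300):
    """Gradient descent that halves the step whenever the loss would not decrease."""
    best = loss(W, X, Y)
    for _ in range(n_iter):
        W_new = W - lr * grad(W, X, Y)
        new = loss(W_new, X, Y)
        if new < best:
            W, best = W_new, new
        else:
            lr *= 0.5
    return W, best

rng = np.random.default_rng(0)
dt = 0.5
X = rng.uniform(-1.0, 1.0, size=(200, 2))   # sampled states x(k)
c, s = np.cos(dt), np.sin(dt)
Y = X @ np.array([[c, -s], [s, c]])         # exact next states x(k+1)

W0 = np.full((2, 2), dt / 2)                # two first-order sub-steps as a start
loss0 = loss(W0, X, Y)
W_fit, loss_fit = train(W0, X, Y)
```

Because every layer is an exact flow, the network stays invertible for any value of the weights; training only changes which invertible map is represented.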

Figure 2: Implementation of the composition network. The input $x(k)$ passes successively through the blocks $\exp(w_{11} A_1)$, $\exp(w_{12} A_2)$, $\exp(w_{21} A_1)$ and $\exp(w_{22} A_2)$, each controlled by its weight $w_{ij}$, to produce the prediction $\hat{x}(k+1)$.

3.1 Example

We will now present an example to illustrate how this method can be applied. We will do this by looking at the Van der Pol oscillator. This system is described by the following second-order nonlinear differential equation:

\[
\ddot{x} - \mu \cdot (1 - x^2) \cdot \dot{x} + x = 0 \tag{18}
\]

It can be directly written as a first-order system in state-space form:

\[
\begin{cases} \dot{x} = y \\ \dot{y} = -x + \mu \cdot (1 - x^2) \cdot y \end{cases} \tag{19}
\]

This system can be seen as a linear combination of two vector fields which can be solved analytically:

\[
\begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = \begin{pmatrix} y \\ -x \end{pmatrix} + \mu \cdot \begin{pmatrix} 0 \\ (1 - x^2) \cdot y \end{pmatrix} = A_1 + \mu \cdot A_2 \tag{20}
\]

One can then build a predictor according to the architecture just presented (Fig. 2), simply choosing an appropriate number of layers. One then finds the appropriate weights $w_{ij}$ by minimizing the error on the training set.

One interesting property of this method is that we were able to use the a priori knowledge we had about the system. Such a feature is uncommon in the field of system prediction using neural networks. This method allowed us to build a predictor for the Van der Pol oscillator in the case where the parameter $\mu$ was unknown. Using the same method we were able to build a predictor for the Lorenz attractor in the case where the parameters were also unknown.
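The Van der Pol example can be sketched in a few lines, under assumptions that go beyond what the text specifies. Here the "measurements" are generated by a fine Runge-Kutta integration with $\mu = 1$ and a sampling period of 0.1, the predictor is a single symmetric layer $\exp(w_1 A_1) \cdot \exp(w_2 A_2) \cdot \exp(w_3 A_1)$, the weights on $A_1$ are frozen at half the sampling period, and only the weight on $A_2$ is found by a grid search rather than by gradient descent. The constants `MU`, `DT` and all function names are hypothetical.

```python
import numpy as np

MU, DT = 1.0, 0.1   # illustrative parameter and sampling period (assumptions)

def vdp(z):
    """Van der Pol vector field; rows of z are states (x, y)."""
    x, y = z[:, 0], z[:, 1]
    return np.stack([y, -x + MU * (1 - x**2) * y], axis=1)

def reference_step(z, dt=DT, n_sub=20):
    """Stand-in for measurements: x(k) -> x(k+1) by fine RK4 sub-steps."""
    h = dt / n_sub
    for _ in range(n_sub):
        k1 = vdp(z)
        k2 = vdp(z + 0.5 * h * k1)
        k3 = vdp(z + 0.5 * h * k2)
        k4 = vdp(z + h * k3)
        z = z + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return z

def exp_A1(z, w):
    """Exact flow of A1 = (y, -x): a rotation by the angle w."""
    x, y = z[:, 0], z[:, 1]
    c, s = np.cos(w), np.sin(w)
    return np.stack([c * x + s * y, -s * x + c * y], axis=1)

def exp_A2(z, w):
    """Exact flow of A2 = (0, (1 - x^2) y): x is frozen, y is scaled."""
    x, y = z[:, 0], z[:, 1]
    return np.stack([x, y * np.exp(w * (1 - x**2))], axis=1)

def predictor(z, w1, w2, w3):
    """One symmetric layer: exp(w1 A1) first, then exp(w2 A2), then exp(w3 A1)."""
    return exp_A1(exp_A2(exp_A1(z, w1), w2), w3)

def mse(v, X, Y):
    """Prediction error when only the weight v on A2 is free."""
    return float(np.mean(np.sum((predictor(X, DT / 2, v, DT / 2) - Y) ** 2, axis=1)))

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(300, 2))   # training states x(k)
Y = reference_step(X)                        # corresponding x(k+1)

grid = np.linspace(0.0, 0.3, 301)
errors = np.array([mse(v, X, Y) for v in grid])
v_best = grid[np.argmin(errors)]
mu_est = v_best / DT                         # implied estimate of the parameter
```

In this setting the fitted weight on $A_2$ plays the role of $\mu$ times the sampling period, so `mu_est` gives a rough estimate of the unknown parameter; this is one way of reading how a priori knowledge of the two fields enters the predictor.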

3.2 MLP in dynamics space

If we decide to approximate the vector field of our system by a multi-layer perceptron, one can derive a composition network to implement this. The form of the vector field is:

\[
X(x) = \sum_{j=1}^{n} \vec{c}_j \cdot \sigma(\vec{b}_j \cdot x + b_{j0}) = \sum_{j=1}^{n} A_{\vec{c}_j, \vec{b}_j}(x) \tag{21}
\]

where $\sigma(x) = \tanh(x)$ and $\vec{c}_j, \vec{b}_j \in \mathbb{R}^{d+1}$, $j = 1, \dots, n$.

But the differential equation $\dot{x} = \sigma(x)$ can be solved explicitly in the one-dimensional case. We can use this to explicitly integrate the multidimensional system $\dot{x} = A_{\vec{c}, \vec{b}}(x)$ for any value of the parameters $\vec{c}_j, \vec{b}_j$. So, we can design a network of the following form:

\[
\hat{x}(k+1) = F(x(k)) = \exp(w_k \cdot A_{\vec{c}_k, \vec{b}_k}) \cdots \exp(w_1 \cdot A_{\vec{c}_1, \vec{b}_1})(x(k)) \tag{22}
\]

We call this an "MLP in dynamics space" as the MLP is implicitly used to parametrize the vector field. Unfortunately, preliminary simulations have shown so far that it is very hard to find the optimal coefficients of such a model using only brute force gradient optimization. Further insight is needed in order to find less computationally demanding methods for the training of such a network.

4 Conclusions

We have studied the problem of predicting an autonomous continuous-time nonlinear dynamical system from discrete-time measurements of its state. We showed that because the predictor must have the form of an invertible map, it is interesting to have an approximation scheme which has this property. This can be achieved by using compositions instead of additions in the approximation. The theory of Lie algebras provided a framework to develop such a method, in a fashion similar to what is done for some numerical methods of integration of ordinary differential equations. We showed that this resulted in a neural-like architecture which we illustrated by a simple example, and that it could be further extended to some form of "MLP in dynamics space". Further work will be primarily devoted to practical applications of this scheme, especially to the design of efficient training algorithms.
Acknowledgments

This research work was carried out at the ESAT laboratory of the Katholieke Universiteit Leuven, in the framework of a Concerted Action Project of the Flemish Community, entitled Model-based Information Processing Systems GOA-MIPS, and within the framework of the Belgian program on inter-university attraction poles (IUAP-17) initiated by the Belgian State, Prime Minister's Office for Science, Technology and Culture, and within the framework of the European Commission Human Capital and Mobility SIMONET (System Identification and Modeling Network). The scientific responsibility rests with its authors. Yves Moreau is a research assistant of the N.F.W.O. (Belgian National Fund for Scientific Research).

References

[1] V.I. Arnold, Mathematical methods of classical mechanics, Springer-Verlag, New York, 1989.
[2] D.R.J. Chillingworth, Differential topology with a view to applications, Pitman Publishing, London, 1977.
[3] E. Forest, R. Ruth, Fourth-order symplectic integration, Physica D, 43 (1990) 105-117.
[4] M.C. Irwin, Smooth Dynamical Systems, Academic Press, New York, 1980.
[5] A. Isidori, Nonlinear control theory, Springer-Verlag, New York, 1989.
[6] P.-V. Koseleff, Calcul formel pour les méthodes de Lie en mécanique hamiltonienne, Ph.D. thesis, Ecole Polytechnique, Paris, 1993.
[7] R.I. McLachlan, On the numerical integration of ordinary differential equations by symmetric composition methods, SIAM J. Sci. Comput., 16(1) (1995) 151-168.
[8] P.J. Olver, Applications of Lie groups to differential equations, Springer-Verlag, New York, 1985.
[9] H. Yoshida, Construction of higher order symplectic integrators, Phys. Lett. A, 150 (1990) 262-268.