
Iterative Learning Control using Optimal Feedback and Feedforward Actions

Notker Amann, David H. Owens and Eric Rogers

Report Number: 95/13, July 14, 1995

Centre for Systems and Control Engineering, University of Exeter, North Park Road, Exeter EX4 4QF, Devon, United Kingdom. For more information on Centre activities contact Professor D.H. Owens.

Research funded by the UK Engineering and Physical Sciences Research Council under contract No. GR/H/48286

Abstract

An algorithm for Iterative Learning Control is developed based on an optimization principle which has been used previously to derive gradient type algorithms. The new algorithm has numerous benefits which include realization in terms of Riccati feedback and feedforward components. This realization also has the advantage of implicitly ensuring automatic step size selection and hence guaranteeing convergence without the need for empirical choice of parameters. The algorithm is expressed as a very general norm optimization problem in a Hilbert space setting and hence, in principle, can be used for both continuous and discrete time systems. A basic relationship with almost singular optimal control is outlined. The theoretical results are illustrated by simulation studies which highlight the dependence of the speed of convergence on parameters chosen to represent the norm of the signals appearing in the optimization problem.

Contents

1 Introduction
2 Norm Optimal Iterative Learning Control
2.1 Problem Formulation
2.2 Learning Algorithm
2.3 The Proof of Convergent Learning
2.4 The Convergence of the Input Sequence
2.5 Relaxation and Almost Singular Optimal Control
3 Iterative Learning Control for Linear, Continuous State-space Systems
3.1 Problem formulation
3.2 The Iterative Learning Control Algorithm
3.3 Discussion of Design Parameters
3.4 Simulation Examples
4 Conclusions

List of Figures

1 Sequence of actual experiment and numerical simulations
2 Simulation of the plant from [4], showing dependence on the design parameters.
3 The error after twenty simulations for different values of the design parameters.
4 Graphs of the value of the performance criterion for different values of the design parameters.
5 Simulation of a pendulum.

1 Introduction

Iterative Learning Control considers systems that repetitively perform the same task with a view to sequentially improving accuracy. Examples of this idea can be found, e.g., in [4, 7, 10, 12, 15, 16, 17, 18] and include the general area of trajectory following in robotics. The specified task is regarded as the tracking of a given reference signal r(t), or output trajectory, for an operation on a specified time interval 0 <= t <= T. It is important to note that feedback control cannot, by its very nature, achieve this exactly, as a non-zero error is required to activate the feedback mechanism. The objective of Iterative Learning Control is to use the repetitive nature of the process to progressively improve the accuracy with which the operation is achieved by changing the control input iteratively from trial to trial. Improvements in performance correspond intuitively to reductions in the (point-wise, peak or average) difference between the desired reference signal and the actual output of the system in a trial. Improving performance is the objective of the control strategy, and this can only be achieved by using available data from the process in an effective manner. As the Iterative Learning Control process is, by definition, iterative, this means that signals/measurements from previous trials are the natural choice of data for use in the construction of control inputs for the present trial. The control system is said to "learn" by remembering the effectiveness of previously tried inputs and using information on their success or failure to construct new trial control input functions. The learning mechanism is iteration, and what is learned is the control input signal u_inf(t) that ensures that the system's output y(t) is exactly equal to the specified reference trajectory r(t) at all points in time t in [0, T]. In contrast to adaptive schemes, Iterative Learning Control does not attempt to explicitly identify the plant, but changes (or adapts) only the control input.
This "adaption" or updating takes place after each trial, not after each time step as in adaptive control. The technical difficulty of Iterative Learning Control lies in the two-dimensionality (in the mathematical sense) of the overall system [17] and the consequent need for changes in methods of analysis and thinking, including the ideas of causality and stability. The two dimensions are the trial index k (discrete) and the elapsed time t (continuous or discrete) during a trial. It is obviously desirable to have notions of stability with respect to both dimensions in a precisely defined sense (see [19] for some related ideas in the theory of repetitive dynamical systems). Whilst stability in the t-direction has the simple and standard interpretation, stability in the k-direction is taken to be equivalent to convergence of the Iterative Learning Control algorithm in a precisely defined sense (see below). As the different notions of causality, stability and convergence place Iterative Learning Control outside of the traditional realm of control theory, it is important to study it as a subject area in its own right. Iterative Learning Control was originally introduced in 1984 by Arimoto et al. [3, 4], who presented an algorithm that generated the new trial control input by adding a "correction" term to the control input of the previous trial. This control increment was calculated from previous trial tracking error data. They also derived convergence conditions for this algorithm in terms of the state-space matrices of the plant. Iterative Learning Control has since then been further explored using similar techniques and ideas but is still underdeveloped. Various update algorithms and corresponding convergence conditions have been proposed, considering all kinds of systems: time-invariant or time-variant, linear or nonlinear, and especially the particular problems of mechanical systems, as seen, for example, in robotic manipulators. Robotics is

a particularly important application area for Iterative Learning Control. A recent textbook about Iterative Learning Control [15] includes a literature survey. A significant distinction is whether linear or nonlinear systems are considered. Studies of nonlinear systems tend to specialise the analysis to suit specific characteristics of the systems and are often based on detailed assumptions about these, e.g. the particular characteristics of mechanical systems. On the other hand, considering linear systems in their generality allows the use of "classical" control theory for analysis and design. It can also be argued that the time-varying linearisation of a nonlinear system is a good approximation to that system on any one trial, as one of the basic principles of Iterative Learning Control is that the control input is changed by an incremental control at each trial, and the trajectory of the previous trial offers itself as a point of linearisation for the next trial. The effects of the interaction between the two different dynamics of Iterative Learning Control systems are central to the problem but are not yet fully understood. In particular, the full power of systems and control theory has not yet been used in the development and analysis of a full range of algorithms, and the effect of the system's dynamical structure on Iterative Learning Control performance is, as yet, relatively unexplored. In this paper, a new convergent Iterative Learning Control approach is developed that can be realized in terms of current trial feedback mechanisms combined with feedforward of previous trial data. The approach is based on splitting the two-dimensional dynamics into two separate one-dimensional dynamics. This is done by introducing a performance criterion as the basis of specifying the control input to be used on each trial.
The algorithm uses the criterion to evaluate the performance of the system on a given trial by "averaging over time" and hence removing the dimension of time from the analysis. The performance criterion is then used to construct and solve an optimization problem whose solution is the proposed control input for the new trial. The optimization problem is solved first at the abstract level using operator theory. These results are then converted, in an illustrative and important case of practical interest, into a well-known optimal tracking problem solvable by familiar Riccati methods. Although these optimization methods lead to what appears, in the standard mathematical sense, to be a non-causal representation of the solution, it is noted that the solution is, in fact, causal in the Iterative Learning Control context, as it can be represented by a causal Riccati feedback of current trial data plus a feedforward component obtained from previous trial records. The feedback component of the solution representation opens up the possibility of enhancing robustness of the algorithm to plant modelling errors. The detailed analysis of this topic will be the subject of future research and publications. The use of optimality criteria in Iterative Learning Control is not new to this paper. Furuta and Yamakita [10] have used a steepest-descent algorithm to minimize the L2[0,T] norm of the tracking error. Their approach also takes the reference signal into account and is (as a steepest-descent optimization method) guaranteed to converge provided that the "step length" is judiciously chosen on each trial. It differs from the approach here in that it only uses the error recorded in the previous trial to generate the new trial input. Hence, their results are of a pure feedforward type and consequently can be expected to suffer from a lack of robustness in practice.
The results presented in this paper represent an improvement on their algorithm, with the added bonus that convergence is guaranteed without the need to choose any step length parameters. In [7], an optimization problem related to the one in this paper is proposed. Because it is numerically more involved, it must be solved iteratively, leading to a different and more complicated scheme than proposed here. It also does not make use of the current error

and hence does not have a feedback form.

The outline of the next sections is as follows. In section 2, the mathematical problem formulation and the proposed learning algorithm are presented. Its main properties are derived and the relation to almost singular optimal control is discussed. In section 3, the algorithm for linear, time-varying continuous plants is presented, together with a discussion of the design parameters and illustrative simulation results.

2 Norm Optimal Iterative Learning Control

In this section, the Iterative Learning Control algorithm is formulated in general form using operator methods from functional analysis in Hilbert space. The proof of the convergence of the algorithm is presented and a number of useful properties of the method are explored.

2.1 Problem Formulation

The mathematical definition of Iterative Learning Control used in this paper has the following general form:

Definition 1 Consider a dynamic system with input u and output y. Let Y and U be the output and input function spaces respectively and let r in Y be a desired reference trajectory from the system. An Iterative Learning Control algorithm is successful if, and only if, it constructs a sequence of control inputs {u_k}_{k>=0} which, when applied to the system (under identical experimental conditions), produces an output sequence {y_k}_{k>=0} with the following properties of convergent learning:

    lim_{k->inf} y_k = r ,    lim_{k->inf} u_k = u_inf    (1)

Here convergence is interpreted in terms of the topologies assumed in Y and U respectively. Note that this general description of the problem allows a simultaneous description of linear and nonlinear dynamics, continuous or discrete plant and time-invariant or time-varying systems. Let the space of output signals Y be a real Hilbert space and let U also be a real (and possibly distinct) Hilbert space of input signals.
The respective inner products (denoted by <.,.>) and norms ||.||^2 = <.,.> are indexed in a way that reflects the space where appropriate to the discussion, e.g. ||x||_Y denotes the norm of x in Y. The Hilbert space structure induced by the inner product is essential in what follows but is not restrictive: choosing Y as the space L2[0,T] of square integrable functions permits the analysis of continuous systems, whilst the choice of Y as the space l2 of square summable data sequences enables the analysis of discrete time systems. The dynamics of the systems considered here are assumed to be linear and represented in operator form as

    y = Gu + z_0    (2)

where G : U -> Y is the system input/output operator (assumed to be bounded, and typically a convolution operator) and z_0 represents the effects of system initial conditions. If r in Y is

the reference trajectory or desired output, then the tracking error is defined as

    e = r - y = r - Gu - z_0 = (r - z_0) - Gu    (3)

Hence, without loss of generality, it is possible to replace r by r - z_0 and thence assume that z_0 = 0. It is clear that the Iterative Learning Control procedure, if convergent, solves the problem r = G u_inf for u_inf. If G is invertible, then the formal solution is just u_inf = G^{-1} r. A basic assumption of the Iterative Learning Control paradigm is that the direct inversion of G is not acceptable. Inversion of a dynamical system is regarded as an impractical solution because it requires exact knowledge of the plant and involves derivatives of the reference r. This high-frequency gain characteristic would make the approach sensitive to noise and other disturbances. Furthermore, it is argued that inversion of the whole plant G is unnecessary, as the solution only requires finding the pre-image of the specific signal r under G. The problem can easily be seen to be equivalent to finding the minimizing input u_inf for the optimization problem

    min_u { ||e||^2 : e = r - y , y = Gu }    (4)

The optimal error ||r - G u_inf||^2 is a measure of how well the Iterative Learning Control procedure has solved the inversion problem. It also represents the best the system can do in tracking the signal r. The case of interest here is when the optimal error is exactly zero, i.e. when u_inf is a solution of r = G u_inf and hence solves the Iterative Learning Control problem. The optimization problem (4) can be interpreted as a singular optimal control problem [6, 8, 21] that, by its very nature, needs an iterative solution. This iterative solution is traditionally seen as a problem in numerical analysis but, in the context of this paper, it is seen as an experimental procedure. The difference between the two viewpoints is the fact that an experimental procedure has an implicit causality structure that is not naturally there in numerical computation.
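For intuition (this sketch is not part of the paper), the operator relation y = Gu and problem (4) can be made concrete in the discrete-time case, where G becomes a lower-triangular "lifted" matrix of Markov parameters of an assumed state-space model; all names and the toy plant here are illustrative:

```python
import numpy as np

# Illustrative sketch: for a discrete-time SISO plant x+ = A x + B u, y = C x
# over N samples, the input/output operator G of y = G u (zero initial state)
# is the lower-triangular Toeplitz matrix of Markov parameters C A^k B.
def lifted_operator(A, B, C, N):
    n = A.shape[0]
    h, Ak = [], np.eye(n)
    for _ in range(N):
        h.append(float(C @ Ak @ B))   # h[k] = C A^k B
        Ak = A @ Ak
    G = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1):
            G[i, j] = h[i - j]
    return G

# toy first-order plant (assumed): x+ = 0.9 x + u, y = x
A, B, C = np.array([[0.9]]), np.array([[1.0]]), np.array([[1.0]])
G = lifted_operator(A, B, C, 4)

# here G is invertible (unit diagonal C B), so problem (4) has optimal error
# exactly zero and u_inf = G^{-1} r formally solves r = G u_inf
r = np.ones(4)
u_inf = np.linalg.solve(G, r)
assert np.allclose(G @ u_inf, r)
```

In this finite-dimensional picture the "impractical inversion" of the text is literal: solving with G^{-1} is exact but amplifies high-frequency content of r, which is why the paper iterates instead.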
Causality for Iterative Learning Control systems is defined in section 3.

2.2 Learning Algorithm

There are infinitely many potential iterative procedures for solving the optimization problem (4). The gradient approach has the simplest form and has been investigated in Iterative Learning Control elsewhere [10]. The gradient based Iterative Learning Control algorithm generates the control input to be used on the (k+1)th trial from the relation

    u_{k+1} = u_k + eps_{k+1} G* e_k    (5)

where G* : Y -> U is the adjoint operator to G and eps_{k+1} is a step length to be chosen at each iteration. The approach suffers from the need to choose a step length and from the feedforward structure of the iteration, which takes no account of current trial effects, including disturbances and plant modelling errors. The improved approach taken in this paper is to develop in detail a new algorithm with the following two important properties:

1. automatic choice of step size; and

2. potential for improved robustness through the use of causal feedback of current trial data and feedforward of data from previous trials.

More precisely, the algorithm proposed here, on completion of the kth trial, calculates the control input on the (k+1)th trial as the solution of the minimum norm optimization problem

    u_{k+1} = arg min_{u_{k+1}} { J_{k+1}(u_{k+1}) : e_{k+1} = r - y_{k+1} , y_{k+1} = G u_{k+1} }    (6)

where the "performance index" or optimality criterion used is defined to be

    J_{k+1}(u_{k+1}) = ||e_{k+1}||^2_Y + ||u_{k+1} - u_k||^2_U    (7)

The initial control u_0 in U can be arbitrary in theory but, in practice, will be a good first guess at the solution of the problem. The problem can be interpreted as the determination of the (k+1)th trial control input as an input that reduces the tracking error in an optimal way whilst not deviating too much from the control input used on the kth trial. The relative weighting of these two objectives can be absorbed into the definitions of the norms in Y and U in a manner that will become more apparent in what follows. The benefits of this approach are immediate from the simple interlacing result

    ||e_{k+1}||^2 <= J_{k+1}(u_{k+1}) <= ||e_k||^2    for all k >= 0    (8)

which follows from optimality and the fact that the (non-optimal) choice of u_{k+1} = u_k would lead to the relation J_{k+1}(u_k) = ||e_k||^2. The result states that the algorithm is a descent algorithm, as the norm of the error is monotonically non-increasing in k. Also, equality holds if, and only if, u_{k+1} = u_k, i.e. when the algorithm has converged and no more input-updating takes place. The controller on the (k+1)th trial is obtained from the stationarity condition, necessary for a minimum, by Frechet differentiation of (7) with respect to u_{k+1}, to be

    u_{k+1} = u_k + G* e_{k+1}    for all k >= 0    (9)

This equation is the formal update relation for the proposed new Iterative Learning Control algorithm. Using e = r - Gu then gives the tracking error update relation and the recursive relation for the input evolution

    e_{k+1} = (I + GG*)^{-1} e_k    for all k >= 0    (10)

    u_{k+1} = (I + G*G)^{-1} (u_k + G* r)    for all k >= 0    (11)

This last relationship is a form of Levenberg-Marquardt [14] or modified Newton iteration which is familiar in finite dimensional problems but, in this case, could equally well apply to an infinite dimensional problem. The algorithm has a number of other useful properties. For example, monotonicity immediately shows that the following limits exist

    lim_{k->inf} ||e_k||^2 = lim_{k->inf} J_k(u_k) := J_inf >= 0    (12)
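As a numerical illustration (not from the paper), the gradient update (5) and the norm-optimal update (9)-(11) can be compared on a lifted discrete-time plant; the plant matrix below is an arbitrary assumed example:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 12
# assumed lifted plant: lower triangular with unit diagonal, hence invertible
G = np.eye(N) + 0.3 * np.tril(rng.normal(size=(N, N)), k=-1)
r = np.sin(np.linspace(0.0, np.pi, N))

# gradient ILC (5): needs a step length eps chosen small enough to contract
eps = 1.0 / np.linalg.norm(G, 2) ** 2
u_g, err_g = np.zeros(N), []
for k in range(100):
    e_g = r - G @ u_g
    err_g.append(np.linalg.norm(e_g))
    u_g = u_g + eps * G.T @ e_g

# norm-optimal ILC: no step length needed, cf. (9) and (10)
M = np.linalg.inv(np.eye(N) + G @ G.T)
u_n, e = np.zeros(N), r.copy()
err_n = []
for k in range(100):
    err_n.append(np.linalg.norm(e))
    e = M @ e                      # error recursion (10)
    u_n = u_n + G.T @ e            # update law (9)

# both error sequences are monotonically non-increasing, cf. (8)
for seq in (err_g, err_n):
    assert all(seq[k + 1] <= seq[k] + 1e-12 for k in range(len(seq) - 1))
assert np.allclose(e, r - G @ u_n)   # invariant e_k = r - G u_k
```

The norm-optimal loop selects its effective step implicitly through the operator (I + GG*)^{-1}, which is exactly the "automatic choice of step size" property claimed above.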

The existence of the limits suggests that the algorithm has a form of convergence property. The details are developed below. An inductive argument and the inequality ||y|| <= ||G|| ||u|| also yield the relations

    sum_{k>=0} ||u_{k+1} - u_k||^2 <= ||e_0||^2 - J_inf < inf ,    sum_{k>=0} ||e_{k+1} - e_k||^2 <= ||G||^2 (||e_0||^2 - J_inf) < inf    (13)

and hence

    lim_{k->inf} ||u_{k+1} - u_k||^2 = 0 ,    lim_{k->inf} ||e_{k+1} - e_k||^2 = 0    (14)

These equations provide another indication of the possibility of convergence. Eqn. (14) shows that the algorithm has an implicit choice of step size, as the incremental input converges to zero. This asymptotic slow variation is a prerequisite for convergence. Furthermore, the summation of the energy costs from the first to the last trial is bounded, as indicated by (13). This implicitly contains information on convergence rates. The proof that the algorithm actually leads to convergent learning is given next.

2.3 The Proof of Convergent Learning

The main new result of this paper on convergence of learning is as follows, where the notation R(A) is used to denote the range of an operator A.

Theorem 1 (Convergence in norm to zero) If either r in R(G) or R(G) is dense in Y, then the Iterative Learning Control tracking error sequence {e_k} converges in norm to zero in Y, i.e. the Iterative Learning Control algorithm has guaranteed convergence of learning.

The proof of the convergence in norm begins with the following lemma concerning convergence in the weak topology in Y.

Lemma 2 (Weak convergence to zero) The sequence {e_k} converges weakly to zero in the range of G. If the range of G is dense in Y, then {e_k} converges weakly to zero in Y, i.e. for all z in Y, lim_{k->inf} <z, e_k> = 0.

(Note: It also follows that the convergence also occurs in the closure of R(G) in Y, as this subspace is also a Hilbert space with the same inner product as Y. This is, however, primarily a technical observation.)

Proof: Let u in U be arbitrary. Also, note from the property of asymptotically slow variation (14) that u_{k+1} - u_k -> 0 in norm as k -> inf. Hence

    0 = lim_{k->inf} <u, u_{k+1} - u_k>_U = lim_{k->inf} <u, G* e_{k+1}>_U = lim_{k->inf} <Gu, e_{k+1}>_Y .    (15)

Eqn. (15) means that <y, e_k>_Y -> 0 as k -> inf for all y in R(G), as required. To prove the second part of the lemma, this property must be extended to all elements of Y. For this, let y~ in Y and eps > 0 be arbitrary. The range of G is by assumption dense. This means

that there is a y in the range of G such that ||y~ - y|| < eps. The use of simple inequalities and the monotonicity of the error sequence then yields

    |<y~, e_k>| = |<y, e_k> + <y~ - y, e_k>| <= |<y, e_k>| + ||y~ - y|| ||e_k|| <= |<y, e_k>| + eps ||e_k|| <= |<y, e_k>| + eps ||e_0|| .    (16)

The limit then satisfies lim sup_{k->inf} |<y~, e_k>| <= eps ||e_0||. The result now follows as eps was assumed to be arbitrary. []

(Note: For finite dimensional spaces, weak convergence is equivalent to convergence in norm, so the Lemma proves the main result in this case with no extra effort. This situation includes that of sampled data systems.)

It is now possible to prove the main result, Theorem 1, as follows:

Proof: From (9) and (7) it follows that

    J_k = ||e_k||^2_Y + ||G* e_k||^2_U = <e_k, (I + GG*) e_k>_Y .    (17)

Define the self-adjoint operator H = (I + GG*). By induction from (10), e_k = H^i e_{k+i}. Applying this relation twice gives

    J_k = <e_k, e_{k-1}> = <H^{-k} e_0, H^k e_{2k-1}> = <e_0, e_{2k-1}> .    (18)

If r in R(G), then e_0 = r - G u_0 is in the range of G. By writing e_0 = Gu, u in U, the limit for J_k in (18) can be obtained from (15), i.e. J_k -> 0 as k -> inf. Alternatively, if R(G) is dense in Y, then the argument in the proof of Lemma 2 yields a relation of the form

    lim sup_{k->inf} |<e_0, e_{2k-1}>| <= eps ||e_0||    (19)

for arbitrary eps. In both cases, it follows that lim_{k->inf} J_k = 0 and from (12) it then follows that lim_{k->inf} ||e_k||^2_Y = 0, i.e. the algorithm converges in norm to a terminal error of zero. []

The guaranteed convergence together with the monotonicity of the tracking error sequence represent powerful properties of the algorithm. Note also that the abstract proof using the techniques of functional analysis enables the wide applicability of the Iterative Learning Control algorithm to both continuous and discrete, sampled data systems.
The realization of this potential will finally rely on the conversion of the abstract results into a causal Iterative Learning Control algorithm, causal in the sense that it can be realized in the form of a sequence of experiments. This is not obvious, as the relation u_{k+1} = u_k + G* e_{k+1}, although apparently of a feedback form, suggests that the relationship is not causal. For example, if G is the convolution operator in L2^m[0,T] (endowed with the inner product <w, v>_{L2^m[0,T]} = int_0^T w^T(t) v(t) dt) described by the relation

    (Gu)(t) = int_0^t K(t - s) u(s) ds ,

then

    (G* e)(t) = int_t^T K^T(s - t) e(s) ds .

This means that evaluation of G* e_{k+1} requires knowledge of future values of the tracking error. Such data is not, of course, available in practice. The special causality structure of Iterative Learning Control, however, allows the transformation of the algorithm into a causal procedure for a given practical, causal plant, as done in detail for a case of practical interest in section 3.
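In the lifted discrete-time picture (an illustration, not the paper's continuous-time setting), the same anti-causality of the adjoint is visible directly: if G is lower triangular (causal), G* = G^T is upper triangular, so each component of G^T e depends on present and future error values:

```python
import numpy as np

# Illustration with an assumed random causal plant matrix: the adjoint of a
# lower-triangular (causal) G is upper triangular, i.e. anti-causal.
N = 5
rng = np.random.default_rng(5)
G = np.tril(rng.normal(size=(N, N)))
assert np.allclose(G, np.tril(G))          # causal: y[t] uses u[0..t]
assert np.allclose(G.T, np.triu(G.T))      # adjoint: upper triangular

e = rng.normal(size=N)
v = G.T @ e
# v[t] depends only on e[t], e[t+1], ..., e[N-1] -- future error values
t = 2
assert np.isclose(v[t], G[t:, t] @ e[t:])
```

This is exactly why the update (9) cannot be run in real time as written, and why the Riccati reformulation of section 3 is needed.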

2.4 The Convergence of the Input Sequence

Some results on the convergence of the input sequence u_k are given in this subsection. Firstly note that, from a mathematical point of view, the tracking error always goes to zero, but this does not imply convergence of the input sequence in U unless this space is chosen appropriately. For example, consider the case where both Y and U are L2-type spaces (see section 3) and G has the form of a linear time invariant system described by a state space model. If the state initial condition x(0) does not generate an output that matches the value of r at t = 0, the required u_inf contains distributions such as the Dirac delta function and hence the desired u_inf is not in U. A proof that u_k -> u_inf in U is therefore impossible. As a consequence, the following convergence results in U are conditional on additional assumptions on the input sequence or on the plant. The latter is used here.

Theorem 3 (General Convergence of the Input) The sequence {u_k}_{k>=0} has the property that

    lim_{k->inf} ||G* (r - G u_{k+1})||_U = 0    (20)

If, moreover, G*G has a bounded inverse in U, the input sequence converges in norm to u_inf = (G*G)^{-1} G* r in U. If sigma^2 := 1/||(G*G)^{-1}|| > 0, then the convergence is bounded by a geometric relation of the form

    ||u_{k+1} - u_inf|| <= (1 + sigma^2)^{-1} ||u_k - u_inf||

Proof: It has been noted that the sequence {u_{k+1} - u_k} converges in norm to zero in U. The first part of the result then follows trivially from the identity u_{k+1} - u_k = G* e_{k+1} = G* (r - G u_{k+1}). The final part of the result follows easily in a similar manner by noting that, if G*G has a bounded inverse, the sequence {(G*G)^{-1} (u_{k+1} - u_k)} = {(G*G)^{-1} G* r - u_{k+1}} also converges to zero in U, as required. The proof of the existence of the geometric bound is a standard calculation, based on the inequality

    <u, G*G u> >= sigma^2 <u, u>    for all u in U ,

and is omitted for brevity. []

The result does not imply convergence of the inputs without boundedness assumptions on the plant inverse or, more precisely, an assumption that sigma^2 > 0.
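The geometric bound in Theorem 3 can be checked numerically in the lifted discrete-time setting; the plant matrix and sizes below are assumed for illustration only:

```python
import numpy as np

# Check of the geometric bound: with sigma^2 = 1/||(G*G)^{-1}||, the iterates
# of recursion (11) satisfy ||u_{k+1} - u_inf|| <= ||u_k - u_inf|| / (1 + sigma^2).
N = 6
rng = np.random.default_rng(2)
G = np.eye(N) + 0.2 * np.tril(rng.normal(size=(N, N)), k=-1)  # assumed plant
r = rng.normal(size=N)

GtG = G.T @ G
sigma2 = 1.0 / np.linalg.norm(np.linalg.inv(GtG), 2)  # smallest eigenvalue of G*G
u_inf = np.linalg.solve(GtG, G.T @ r)                 # (G*G)^{-1} G* r
step = np.linalg.inv(np.eye(N) + GtG)

u = np.zeros(N)
for k in range(20):
    u_next = step @ (u + G.T @ r)                     # recursion (11)
    lhs = np.linalg.norm(u_next - u_inf)
    rhs = np.linalg.norm(u - u_inf) / (1.0 + sigma2)
    assert lhs <= rhs + 1e-10
    u = u_next
```

The bound follows because u_{k+1} - u_inf = (I + G*G)^{-1} (u_k - u_inf) and the operator norm of (I + G*G)^{-1} is exactly 1/(1 + sigma^2) in this finite-dimensional case.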
It is, however, possible to prove the following result:

Theorem 4 (Boundedness and Weak Convergence) If the sequence {u_k}_{k>=0} is bounded in U, the desired input u_inf is in U and G*G has range dense in U, then {u_k}_{k>=0} converges to u_inf in the weak topology in U.

Proof: Write r = G u_inf and u_{k+1} - u_k = G* e_{k+1} = G* (r - G u_{k+1}) = G*G (u_inf - u_{k+1}). Let v in U be arbitrary and note that

    0 = lim_{k->inf} <v, u_{k+1} - u_k> = lim_{k->inf} <v, G*G (u_inf - u_{k+1})>    (21)

It follows that

    0 = lim_{k->inf} <G*G v, u_inf - u_{k+1}>    (22)

and the result now follows from the denseness of the range of G*G, using a similar argument to that used in Lemma 2. []

As remarked before, for finite dimensional spaces, weak convergence is equivalent to convergence in norm, so the theorem proves convergence in norm in this case with no extra effort. This situation includes that of discrete time systems.

2.5 Relaxation and Almost Singular Optimal Control

To complete this section, the algorithm is related to the concept of almost singular optimal control by the following analysis. Consider the modified Iterative Learning Control rule

    u_{k+1} = alpha u_k + G* e_{k+1}    (23)

where alpha is a relaxation parameter, as commonly used in numerical analysis techniques to improve the robustness of algorithms. It is also similar in its mathematical effect to the use of forgetting factors in self-tuning adaptive control. The choice of alpha = 1 represents the situation in the previous sections. Using the input-output relation e = r - Gu in e_{k+1} - e_k then yields the recursion relation

    e_{k+1} = (I + GG*)^{-1} (alpha e_k + (1 - alpha) r)    (24)

Theorem 1 has already proved convergence in norm of the error when alpha = 1. Using the results of Owens, as described in Rogers and Owens [19], it is easy to prove that the modified Iterative Learning Control algorithm converges robustly if, and only if, |alpha| < 1, to a (non-zero) limit error e~_inf in Y given by the formula

    e~_inf = (I + (1 - alpha)^{-1} GG*)^{-1} r    (25)

Using the plant equation, it follows that the input sequence also satisfies the recursion

    u_{k+1} = (I + G*G)^{-1} (alpha u_k + G* r)    (26)

A similar analysis, based on the observation that the norm of the recursion operator alpha (I + G*G)^{-1} is bounded by |alpha|, then shows that the input sequence converges in norm in U if, again, |alpha| < 1, to the limit

    u~_inf = ((1 - alpha) I + G*G)^{-1} G* r    (27)

This convergence is geometric with geometric constant at most |alpha|. The first important observation is that the control input converges in norm if relaxation is used. The second observation is that, for convergence to a solution close to u_inf, it is necessary for alpha to be chosen close to (but slightly less than) unity.
This is verified by a simple calculation that indicates that u~_inf and e~_inf are the solutions of the optimization problem

    min_u { J~(u) = ||e||^2 + (1 - alpha) ||u||^2 : y = Gu , e = r - y }    (28)

The analysis of the previous sections indicates that it is possible to make ||e||^2 arbitrarily small with controls u in U and hence that the minimum value of J~ goes to zero as alpha -> 1 from below. It is therefore possible to establish the following result:

Theorem 5 (Relaxation and Approximation) Under the assumptions of Theorem 1, the Iterative Learning Control algorithm with the modified update rule u_{k+1} = alpha u_k + G* e_{k+1}, with |alpha| < 1, converges in norm in U to a control input that produces a non-zero limit error whose norm can be made arbitrarily small by making alpha arbitrarily close to unity.
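A quick numerical sanity check of the limit formula (25), again in an assumed lifted discrete-time setting (illustrative only; the plant matrix and alpha are arbitrary choices):

```python
import numpy as np

# The relaxed recursion (24) converges, for |alpha| < 1, to the non-zero
# limit error e~_inf = (I + GG^T / (1 - alpha))^{-1} r of formula (25).
N = 6
rng = np.random.default_rng(3)
G = np.eye(N) + 0.2 * np.tril(rng.normal(size=(N, N)), k=-1)  # assumed plant
r = rng.normal(size=N)
alpha = 0.95

M = np.linalg.inv(np.eye(N) + G @ G.T)
e = r.copy()
for k in range(2000):
    e = M @ (alpha * e + (1.0 - alpha) * r)           # recursion (24)

e_pred = np.linalg.solve(np.eye(N) + G @ G.T / (1.0 - alpha), r)  # formula (25)
assert np.allclose(e, e_pred, atol=1e-6)
assert np.linalg.norm(e_pred) > 0.0                   # limit error is non-zero
```

Increasing alpha toward 1 shrinks e_pred toward zero, matching Theorem 5: the closer alpha is to unity, the smaller (but still non-zero) the limit error.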

(Note: If alpha < 1 is close to unity, then the control input weighting in the above optimization problem is very close to zero. This sort of problem is frequently described as an almost singular or cheap control problem in the optimal control literature [8, 9].)

3 Iterative Learning Control for Linear, Continuous State-space Systems

The general analysis has to be converted into computational procedures whose details will depend on the form of the system dynamics. The essential aspect of this conversion is that the procedure is causal in the Iterative Learning Control sense. Although referred to intuitively earlier in the paper, a formal definition is as follows:

Definition 2 An Iterative Learning Control algorithm is causal if, and only if, the value of the input u_{k+1}(t) at time t on the (k+1)th trial/experiment is computed only from data that is available from the (k+1)th trial in the time interval [0, t] and from previous trials on the whole of the time interval [0, T].

(Note: This process is not causal in the classical sense, as data from times t' > t can be used, but only from previous trials.)

In the following, these calculations are outlined for a case of practical interest, namely the choice of L2[0,T] input and output spaces. The problem then reduces to a form of the familiar linear quadratic tracking problem.

3.1 Problem formulation

Suppose that the plant has m outputs and l inputs with connecting dynamics described by a linear, possibly time-varying state space model. The input-output map G : U -> Y and, in particular, the relations y_k = G u_k and e_k = r - y_k take the form:

    dx_k(t)/dt = A(t) x_k(t) + B(t) u_k(t) ,  x_k(0) = 0 ,  0 <= t <= T ,  k >= 0
    y_k(t) = C(t) x_k(t)
    e_k(t) = r(t) - C(t) x_k(t)    (29)

The choice of input and output spaces is as follows:

    u in U = L2^l[0,T] ,    (r, r(T)) in Y = L2^m[0,T] x R^m .    (30)

The unusual choice of the output space as the Cartesian product of a familiar L2 space with R^m is required for generality but, more importantly, for the avoidance of numerical convergence problems in the final moments of the trials, as seen below. The inner products in Y and U are defined as:

    <(y1, z1), (y2, z2)>_Y = (1/2) int_0^T y1^T(t) Q y2(t) dt + (1/2) z1^T F z2    (31)

    <u1, u2>_U = (1/2) int_0^T u1^T(t) R u2(t) dt ,    (32)

where $Q$, $R$ are symmetric positive definite matrices and $F$ is a symmetric positive semi-definite matrix¹. The initial conditions are taken to be homogeneous without loss of generality because the plant response due to non-zero initial conditions can be absorbed into $r(t)$, as discussed before. The index $J_{k+1}$ with the specified norms in $Y$ and $U$ becomes a familiar linear quadratic performance criterion [5]:

$$J_{k+1} = \frac{1}{2}\int_0^T \Big[ e_{k+1}^T(t)\,Q\,e_{k+1}(t) + \big(u_{k+1}(t) - u_k(t)\big)^T R\,\big(u_{k+1}(t) - u_k(t)\big) \Big]\,dt + \frac{1}{2}\,e_{k+1}^T(T)\,F\,e_{k+1}(T). \qquad (33)$$

More precisely, it is a combination of the optimal tracking problem (tracking of $r(t)$) and the disturbance accommodation problem [20] (regarding $u_k(t)$ as a known disturbance in trial $k+1$). The optimal solution $u_{k+1}$ was found in Section 2 to be $u_{k+1} = u_k + G^* e_{k+1}$. The abstract definition of the adjoint operator $G^*$ can be transformed into a more concrete description using the definitions of the adjoint operator and of the inner products [13]. In this case, the equation $u_{k+1} - u_k = G^* e_{k+1}$ becomes the familiar costate system [5]:

$$\dot\psi_{k+1}(t) = -A^T(t)\,\psi_{k+1}(t) - C^T(t)\,Q\,e_{k+1}(t), \qquad \psi_{k+1}(T) = C^T(T)\,F\,e_{k+1}(T)$$
$$u_{k+1}(t) = u_k(t) + R^{-1} B^T(t)\,\psi_{k+1}(t), \qquad T \ge t \ge 0 \qquad (34)$$

This system has a terminal condition (at $t = T$) instead of an initial condition, marking it (as expected) as an anti-causal representation of the solution. It cannot therefore be implemented in this form. This problem is removed in the next subsection by the derivation of an alternative, but equivalent, representation that is causal in the Iterative Learning Control sense. Before doing this, however, the need for the $F$ term in the index $J_{k+1}$ can be made clearer by noting that, if $F = 0$, then the terminal boundary condition on the costate equation implies that $u_{k+1}(T) = u_k(T)$ and hence that $u_k(T) = u_0(T)$ does not change from trial to trial.
The effect of this is that the error is minimized in a least-squares sense (with respect to the $L_2$-norm) but not uniformly, i.e. not with respect to the supremum norm on the space of continuous functions on $[0,T]$. Choosing $F > 0$ should therefore lead to improved convergence properties of the learning algorithm.

3.2 The Iterative Learning Control Algorithm

The non-causal representation can be transformed into a causal algorithm by using a state-feedback representation. The transformation is shown in this section for the Iterative Learning Control algorithm (23) with relaxation factor $\alpha$. The optimal control is transformed by writing the costate as $\psi_{k+1}(t) = -K(t)\,(x_{k+1}(t) - \beta\,x_k(t)) + \xi_{k+1}(t)$, where $\beta$ is a state relaxation parameter, and hence

$$u_{k+1}(t) = u_k(t) + R^{-1} B^T(t)\,\big({-K(t)}\,(x_{k+1}(t) - \beta\,x_k(t)) + \xi_{k+1}(t)\big), \qquad (35)$$

¹ Formally, $F$ should be positive definite but, if $F$ has a full rank decomposition $F = V V^T$, a simple redefinition of $Y$ as $L_2^m[0,T] \times V^T \mathbb{R}^m$ regains the Hilbert space structure. The details are omitted for brevity.
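Before turning to the causal realisation, the claim that the costate system (34) realises the adjoint $G^*$ can be checked numerically: the defining identity $\langle G u, e \rangle_Y = \langle u, G^* e \rangle_U$ should hold for arbitrary $u$ and $e$. The sketch below verifies this for a scalar, time-invariant example with $F = 0$; the constants $a$, $b$, $c$, $q$, `rw` and the test signals are made-up values for illustration, not taken from the report:

```python
import math

# Numerical check of the adjoint identity <G u, e>_Y = <u, G* e>_U for a
# scalar, time-invariant plant, with G* realised by the costate system (34)
# and F = 0.  The constants and the test signals u, e are made-up values.
a, b, c = -0.5, 1.0, 1.0       # plant x' = a x + b u, y = c x
q, rw = 2.0, 0.5               # scalar weights Q and R
T, N = 1.0, 4000
dt = T / N
u = [math.sin(i * dt) for i in range(N + 1)]        # arbitrary input
e = [math.cos(2.0 * i * dt) for i in range(N + 1)]  # arbitrary "error"

# Forward: y = G u with x(0) = 0.
x = [0.0] * (N + 1)
for i in range(N):
    x[i + 1] = x[i] + dt * (a * x[i] + b * u[i])
y = [c * xv for xv in x]

# Backward: v = G* e via the costate equation, psi(T) = 0 since F = 0.
psi = [0.0] * (N + 1)
for i in range(N, 0, -1):
    # psi' = -a psi - c q e, integrated in reverse time
    psi[i - 1] = psi[i] + dt * (a * psi[i] + c * q * e[i])
v = [(b / rw) * p for p in psi]     # v = R^{-1} B^T psi

lhs = 0.5 * dt * sum(y[i] * q * e[i] for i in range(N + 1))    # <G u, e>_Y
rhs = 0.5 * dt * sum(u[i] * rw * v[i] for i in range(N + 1))   # <u, G* e>_U
print(lhs, rhs)   # the two values agree up to the O(dt) integration error
```

The agreement follows from integrating $\frac{d}{dt}(x^T \psi)$ over $[0,T]$ with $x(0) = 0$ and $\psi(T) = 0$, which is exactly the calculation behind the concrete form (34) of $G^*$.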

Standard techniques [2, 5] then yield the matrix gain $K(t)$ as the solution of the familiar matrix Riccati differential equation on the interval $t \in [0,T]$:

$$\dot K = -A^T K - K A + K B R^{-1} B^T K - C^T Q C, \qquad K(T) = C^T F C \qquad (36)$$

This equation is independent of the inputs, states and outputs of the system. In contrast, the predictive or feedforward term $\xi_{k+1}(t)$ is generated by

$$\dot\xi_{k+1}(t) = -\big(A - B R^{-1} B^T K\big)^T \xi_{k+1}(t) - C^T Q\,e_k(t) + (\beta - \alpha)\,K B\,u_k(t) - (1-\alpha)\,C^T Q\,r(t), \qquad (37)$$

with terminal boundary condition

$$\xi_{k+1}(T) = C^T F\,\big(e_k(T) + (1-\alpha)\,r(T)\big). \qquad (38)$$

The predictive term is hence driven by a combination of the tracking error and the input on the previous (i.e. the $k$th) trial and also by the reference signal. This is hence a causal Iterative Learning Control algorithm consisting of current-trial full state feedback combined with feedforward from previous-trial output tracking error data. This representation of the solution is causal in the Iterative Learning Control sense because (36) and (37) can be solved off-line, between trials, by reverse-time simulation using available previous-trial data. The differential matrix Riccati equation for the feedback matrix $K(t)$ in fact needs to be solved only once, before the sequence of trials begins. Fig. 1 shows the sequence of computations and experiment for this algorithm.

Figure 1: Sequence of actual experiment and numerical simulations. Starting from an initial input $u_0$, compute $K(t)$ in reverse time once; then, for each trial $k$: compute the predictor $\xi_k(t)$ in reverse time from $r(t)$ and $u_k(t)$, run one trial to obtain $y_k(t)$ and $e_k(t)$, store the updated input $u_{k+1}(t)$, and proceed to the next iteration $k \leftarrow k+1$.
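The computation sequence of Fig. 1 can be sketched numerically. The following is a minimal sketch for a scalar, time-invariant plant with $\alpha = \beta = 1$, using explicit Euler integration; all constants (the plant `a`, `b`, `c`, the weights `q`, `rw`, `f`, the reference and the grid) are illustrative assumptions, not values from the report:

```python
import math

# Minimal numerical sketch of the procedure in Fig. 1 for a scalar,
# time-invariant plant x' = a*x + b*u, y = c*x, with alpha = beta = 1.
# All constants here are illustrative assumptions, not report values.
a, b, c = -1.0, 1.0, 1.0       # plant
q, rw, f = 1.0, 0.01, 0.0      # weights Q, R and terminal weight F
T, N = 1.0, 400                # trial length and number of Euler steps
dt = T / N
r = [math.sin(2.0 * math.pi * i * dt) for i in range(N + 1)]

# Step 1: solve the Riccati equation (36) once, in reverse time.
K = [0.0] * (N + 1)
K[N] = c * f * c
for i in range(N, 0, -1):
    Kdot = -2.0 * a * K[i] + (b * K[i]) ** 2 / rw - c * q * c
    K[i - 1] = K[i] - dt * Kdot

def simulate(u):
    """Forward-simulate one trial from x(0) = 0; return state and error."""
    x = [0.0] * (N + 1)
    for i in range(N):
        x[i + 1] = x[i] + dt * (a * x[i] + b * u[i])
    e = [r[i] - c * x[i] for i in range(N + 1)]
    return x, e

def predictor(e_prev):
    """Feedforward term xi(t), computed in reverse time (cf. (37), (38))."""
    xi = [0.0] * (N + 1)
    xi[N] = c * f * e_prev[N]
    for i in range(N, 0, -1):
        xidot = -(a - b * b * K[i] / rw) * xi[i] - c * q * e_prev[i]
        xi[i - 1] = xi[i] - dt * xidot
    return xi

def ilc_trial(u_prev, x_prev, xi):
    """Run trial k+1 with the causal feedback + feedforward law (35)."""
    u = [0.0] * (N + 1)
    x = [0.0] * (N + 1)
    for i in range(N):
        u[i] = u_prev[i] + (b / rw) * (-K[i] * (x[i] - x_prev[i]) + xi[i])
        x[i + 1] = x[i] + dt * (a * x[i] + b * u[i])
    e = [r[i] - c * x[i] for i in range(N + 1)]
    return u, x, e

def l2norm(e):
    return math.sqrt(dt * sum(v * v for v in e))

u = [0.0] * (N + 1)            # trial 0: zero input
x, e = simulate(u)
norms = [l2norm(e)]
for k in range(6):
    u, x, e = ilc_trial(u, x, predictor(e))
    norms.append(l2norm(e))
print([round(n, 4) for n in norms])   # L2 error norm per trial
```

Because `rw` is small, the input weighting is cheap and the error norm contracts strongly from trial to trial; increasing `rw` slows the decrease, in line with the role of the input weighting discussed in the design-parameter section below.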

Finally, the choice of $\beta$ appears to be arbitrary and hence it may be possible to choose a value that aids the implementation or performance of the algorithm. It could be chosen to simplify the computation or, potentially (noting its algebraic similarity to the relaxation parameter $\alpha$), to enhance the robustness of the scheme to modelling errors and/or measurement noise. The simplifying effect of the choice of $\beta$ can be illustrated by noting that the choice $\beta = \alpha$ removes the term containing the input from the predictor equation. This might also have a positive effect on robustness because the inclusion of $u_k(t)$ in the predictor equation is equivalent to an implicit re-creation of previous-trial data in this differential equation. If the input does not appear there, the predictor equation corresponds more closely to a purely predictive equation.

3.3 Discussion of Design Parameters

The Iterative Learning Control algorithm can be implemented in practice if full state feedback is available. For an implementation, the free parameters $Q$, $R$ and $F$ must be chosen appropriately. With the objective of minimizing the error norm in mind, intuitive guidelines for the choice of these parameters are provided below. The convergence properties are assumed to be described by the sequence $\{J_k\}_{k \ge 0}$, which simultaneously represents the behaviour of the error sequence and the rate of change of the input signals. Changing the parameters affects the speed of decrease of $J_k$. The parameter $Q$ is related to the size of the error, the parameter $R$ to the size of the change of the input, and $F$ to the size of the error at the end of the trial. To illustrate the effect of $Q$ and $R$, consider $Q$ to be fixed and let $R = \varepsilon R_0$, where $R_0 = R_0^T > 0$ and $\varepsilon > 0$ is a variable parameter.
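The role of $\varepsilon$ can be made concrete in a lifted, discrete-time sketch. For a sampled plant with lifted input-output matrix $G$, minimizing $\|e_k - G\,\Delta u\|^2 + \varepsilon\,\|\Delta u\|^2$ gives the one-step update $\Delta u = (G^T G + \varepsilon I)^{-1} G^T e_k$. The matrices below are hypothetical, and the lifted form is an analogue of, not identical to, the continuous-time algorithm:

```python
# Lifted (discrete-time) sketch of the one-step norm-optimal update for a
# 2-sample horizon: du = (G^T G + eps*I)^{-1} G^T e0 with Q = I, R0 = I.
# G and e0 are hypothetical values used purely for illustration.
def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def mat_mat(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv2(A):
    (p, qq), (rr, s) = A
    det = p * s - qq * rr
    return [[s / det, -qq / det], [-rr / det, p / det]]

def norm(v):
    return sum(xv * xv for xv in v) ** 0.5

G = [[1.0, 0.0], [0.5, 1.0]]   # causal (lower-triangular) lifted plant
e0 = [1.0, -0.5]               # current-trial tracking error

def one_step_error(eps):
    Gt = transpose(G)
    H = mat_mat(Gt, G)
    H = [[H[i][j] + (eps if i == j else 0.0) for j in range(2)]
         for i in range(2)]
    du = mat_vec(inv2(H), mat_vec(Gt, e0))   # optimal input increment
    Gdu = mat_vec(G, du)
    return norm([e0[i] - Gdu[i] for i in range(2)])

e_small, e_large = one_step_error(0.01), one_step_error(1.0)
print(e_small, e_large)   # smaller eps gives the larger one-step reduction
```

For small $\varepsilon$ the update approaches $\Delta u = G^{-1} e_k$ and the error nearly vanishes in one step; for large $\varepsilon$ only a cautious step is taken.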
It is then expected that for small $\varepsilon$ the algorithm will change the incremental input substantially in order to achieve a small error, resulting in a fast rate of decrease of $\|e_k\|$, while for large $\varepsilon$ the converse holds and $\|e_k\|$ will decrease only slowly. The last parameter $F$ is not easily related to the overall decrease of $\|e_k\|$, and rules of thumb for its choice need more intuition. It is suggested that it is advantageous to choose $F$, which appears in the terminal condition (36) for $K(t)$, such that $K(t)$ is as close to being constant as possible. In the time-invariant case, this is achieved if $C^T F C = K_\infty$ holds, where $K_\infty$ is the solution of the algebraic Riccati equation. In this case, guaranteed phase and gain margins [11] apply and hence previously derived robustness margins are valid. If, however, the number of outputs is less than the number of states, then an exact solution for $F$ is not possible, but it was found in simulations that choosing $F$ as the best approximate (least-squares) solution of $C^T F C = K_\infty$ gives good performance of the algorithm.

3.4 Simulation Examples

To illustrate the convergence and robustness properties of the algorithm, the results of simulations for two benchmark plants are shown in this section. At first, the values $\alpha = \beta = 1$ are chosen, since the limit error is only zero if there is no "input relaxation". The first plant is included for comparative purposes and is the same linear time-varying plant used in [4]. The state-space parameters are as follows:

$$A(t) = \begin{bmatrix} 0 & 1 \\ -(2+5t) & -(3+2t) \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 1 \end{bmatrix}, \quad x_0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad (39)$$

and the reference signal is $r(t) = 12t^2(1-t)$ on the normalised interval $t \in [0,1]$. For the simulations, full-state knowledge was assumed; this can be achieved, e.g., by state observers. Fig. 2 shows two simulations where $Q = 1$, $R = \varepsilon$, $F = 0.0744$ and only the parameter $\varepsilon$ is changed.

Figure 2: Simulation of the plant from [4], showing dependence on $\varepsilon$. Top: $\varepsilon = 0.1$, bottom: $\varepsilon = 0.01$. Each row shows the errors at trials 1, 2, 3, 6 and 10 against time and the performance criterion against the trial index. Linetypes: trials 1 and 10: solid, 2: dashed, 3: dotted, 6: dot-dashed.

The graphs on the top, with $\varepsilon = 0.1$, show slow convergence of the error to zero, while the graphs on the bottom, with $\varepsilon = 0.01$, show rapid convergence, as expected. The design parameter $\varepsilon$ allows good control over the convergence rate, as is also evident from the plots of $J_k$. The proposed algorithm requires more knowledge of the plant dynamics than the one in [4] but provides, as reward, an improved convergence rate. In the next simulations, the influence of $\alpha$ on the rate of convergence and on the limit error is studied. For the same plant and parameters as above (i.e. $\varepsilon = 0.01$), Fig. 3 shows the error $e_{20}(t)$ after 20 trials, which is very similar to the theoretical limit (25), for different values of $\alpha$, together with $r(t)$ for comparison. As expected, the error is (nearly) zero for $\alpha = 1$ and increases as $\alpha$ decreases below one. Fig. 4 shows plots of the sequences $\{J_k\}_{k \ge 0}$ for different values of $\alpha$. The rate of decrease of $J_k$ during the first few trials is not easily related to $\alpha$, but during later trials it is faster the closer $\alpha$ is to unity. Also, the value of $J_{20}$ is smaller when $\alpha$ is closer to unity (because $e_\infty$ is smaller). Generalising from these observations, one can say that a value of $\alpha$ close to unity is the preferred choice, and robustness is the only reason to choose it smaller than one. The second benchmark example is a hanging pendulum featuring nonlinear dynamics and

Figure 3: The error after twenty simulations for different values of $\alpha$ ($\alpha = 0.5, 0.75, 0.9, 0.95, 0.99, 1$), plotted together with $r(t)$.

Figure 4: Graphs of the value of the performance criterion for different values of $\alpha$ ($\alpha = 0.5$ to $1$).

is included to indicate the robustness of the technique and, in particular, its robustness to nonlinear modelling errors and parameter uncertainties. Its behaviour is described by the equation

$$(ml^2 + I_{zz})\,\ddot\theta + f\,\dot\theta + mgl\,\sin\theta = \tau, \qquad \theta(0) = \dot\theta(0) = 0 \qquad (40)$$

where $\theta$ is the angle between the pendulum and the vertical axis, $m = 2$ kg is its mass, $l = 1$ m is its length, $I_{zz} = 2/3$ kg m$^2$ is its moment of inertia, $g = 10$ m/s$^2$ is the gravitational acceleration, $f = 2$ kg m$^2$/s is a viscous friction constant, and the input $\tau$ is the torque acting on the pendulum. For the Iterative Learning Control design, i.e. the calculation of $K(t)$ and $\xi_k(t)$, the following linearised model with erroneous coefficients (namely, $m = 1$ kg, $I_{zz} = 0$, $f = 0$) was used:

$$\ddot\theta + 10\,\theta = \tau. \qquad (41)$$

Full state knowledge, i.e. measurements of angular position and velocity, was assumed. Fig. 5 shows the result of the simulation, where $\varepsilon = 0.002$ was used.

Figure 5: Simulation of a pendulum: output and reference, errors, and input at trials 1, 2, 4, 9 and 20, together with the performance criterion against the trial index. Linetypes: trials 1 and 20: solid, 2: dashed, 4: dotted, 9: dot-dashed, $r(t)$: marked with +.

The reference signal was defined to be the signal $\tilde r(t) = 5(t - t^2/10)\sin(2\pi t/10)$ with $0 \le t \le 10$, together with its derivative, i.e. both the position and the velocity were tracked. This reference brings the system well away from the point of linearisation and ensures that the nonlinearity affects the system dynamics. Because the algorithm includes a feedback term, no extra pre-compensator is required. It is noted that despite the use of a linear model and despite the incorrect physical constants used in this

model, the algorithm achieves rapid convergence. This demonstrates the robustness of the proposed algorithm. It was found in a number of simulations for a range of first to fourth order plants that good convergence can be obtained. In nearly all cases, good control over the $L_2$ norm of $e_k$ is achieved with the parameter $\varepsilon$. Only non-minimum phase plants showed slower convergence. The causes of this phenomenon were studied in [1]. Overall, the new Iterative Learning Control algorithm was found to be successful in terms of Definition 1.

4 Conclusions

This paper has developed an Iterative Learning Control algorithm based on optimization principles and has provided a complete convergence analysis of the algorithm in a Hilbert space setting. This setting ensures that the results apply to a large class of systems including continuous and sampled-data systems. The abstract form of the results indicates the need to transform the representation of the solution into a causal representation for Iterative Learning Control implementation. This transformation has been provided for the case of continuous systems. It was shown that the causal representation is a combination of feedback (of current trial data) and feedforward (of previous trial data). The inclusion of feedback in the control update rule opens up the possibility of improved robustness of the algorithm when compared with previously reported results. If the state is not available, it must be estimated by an observer or, alternatively, output feedback schemes can be used. For these, the proposed state-feedback Iterative Learning Control algorithm and associated analysis serves as a benchmark method for the subject area. In particular, it indicates that sufficient process information enables convergent learning to be achieved easily for a wide range of parameter values, hence leaving design for performance as the major issue.
This should be contrasted with other approaches to Iterative Learning Control, which typically require small-gain type conditions for convergence and hence permit only a limited range of parameter values. A preliminary method of including robustness in the algorithm has been developed based on the use of relaxation techniques. These ideas have a strong connection to the notion of cheap optimal control and produce geometric convergence, even in infinite dimensional problems, to an approximate and arbitrarily accurate solution of the original Iterative Learning Control problem. A formal examination of more general robustness issues was not included, but experience in numerical analysis with the related Levenberg-Marquardt method and the results of Iterative Learning Control simulations indicate that the algorithm possesses robustness to a useful degree. This issue is a topic of present research and will be addressed in future publications. One specific advantage of the proposed algorithm and of its interpretation as a linear quadratic tracking and disturbance accommodation problem is that the rate of decrease of the error can be influenced in a natural and intuitive way by several design parameters (the weighting matrices in the quadratic costs). This benefit is not available in other optimal descent methods, e.g. the steepest descent method in [10]. This is largely due to the simultaneous selection of step direction and step length in the proposed algorithm, as compared to the steepest descent method, where these are computed sequentially. Furthermore, the classical setting of the algorithm allows the application and study of other typical problems of linear quadratic optimization. Interesting questions and critical points for the actual implementation include the extension of the algorithm

to nonlinear plants, the possibility of incorporating uncertainty about the plant, questions of robustness, and improvement of the rate of convergence by using more complicated performance criteria.

Acknowledgements

This research is supported by EPSRC grant number GR/H/48286 and forms part of a collaboration between Professor D. H. Owens of the Centre for Systems and Control Engineering at Exeter University and Dr. E. Rogers of the Department of Electronics and Computer Science at the University of Southampton.

References

[1] N. Amann and D. H. Owens. Non-minimum phase plants in iterative learning control. In Proc. 2nd Int. Conf. on Intelligent Systems Engineering, Hamburg-Harburg.
[2] B. D. O. Anderson and J. B. Moore. Optimal Control – Linear Optimal Control. Prentice Hall, Englewood Cliffs, N.J.
[3] S. Arimoto, S. Kawamura, and F. Miyazaki. Bettering operation of dynamic systems by learning: a new control theory for servomechanism or mechatronic systems. In Proc. 23rd IEEE Conf. on Decision and Control, pages 1064–1069, Las Vegas, Nevada.
[4] S. Arimoto, S. Kawamura, and F. Miyazaki. Bettering operations of robots by learning. J. Robotic Systems, 1(2):123–140.
[5] M. Athans and P. L. Falb. Optimal Control. McGraw-Hill, New York.
[6] D. J. Bell and D. H. Jacobsen. Singular Optimal Control Problems. Academic Press, New York.
[7] K. Buchheit, M. Pandit, and M. Befort. Optimal iterative learning control of an extrusion plant. In Proc. IEE Int. Conf. Control '94, pages 652–657, Coventry.
[8] D. J. Clements and B. D. O. Anderson. Singular Optimal Control: The Linear-Quadratic Problem, volume 5 of Lecture Notes in Control and Information Sciences. Springer-Verlag, Berlin.
[9] B. A. Francis. The optimal linear-quadratic time-invariant regulator with cheap control. IEEE Trans. on Automatic Control, AC-24(4):616–621.
[10] K. Furuta and M. Yamakita. The design of a learning control system for multivariable systems. In Proc. IEEE Int. Symp.
on Intelligent Control, pages 371–376, Philadelphia, Pennsylvania.
[11] T. Kailath. Linear Systems. Prentice Hall, Englewood Cliffs, N.J.
[12] J.-J. Lee and J.-W. Lee. Design of iterative learning controller with VCR servo system. IEEE Trans. on Consumer Electronics, 39(1):13–24.


More information

19.2 Mathematical description of the problem. = f(p; _p; q; _q) G(p; q) T ; (II.19.1) g(p; q) + r(t) _p _q. f(p; v. a p ; q; v q ) + G(p; q) T ; a q

19.2 Mathematical description of the problem. = f(p; _p; q; _q) G(p; q) T ; (II.19.1) g(p; q) + r(t) _p _q. f(p; v. a p ; q; v q ) + G(p; q) T ; a q II-9-9 Slider rank 9. General Information This problem was contributed by Bernd Simeon, March 998. The slider crank shows some typical properties of simulation problems in exible multibody systems, i.e.,

More information

Experimental evidence showing that stochastic subspace identication methods may fail 1

Experimental evidence showing that stochastic subspace identication methods may fail 1 Systems & Control Letters 34 (1998) 303 312 Experimental evidence showing that stochastic subspace identication methods may fail 1 Anders Dahlen, Anders Lindquist, Jorge Mari Division of Optimization and

More information

Novel Approach to Analysis of Nonlinear Recursions. 1 Department of Physics, Bar-Ilan University, Ramat-Gan, ISRAEL

Novel Approach to Analysis of Nonlinear Recursions. 1 Department of Physics, Bar-Ilan University, Ramat-Gan, ISRAEL Novel Approach to Analysis of Nonlinear Recursions G.Berkolaiko 1 2, S. Rabinovich 1,S.Havlin 1 1 Department of Physics, Bar-Ilan University, 529 Ramat-Gan, ISRAEL 2 Department of Mathematics, Voronezh

More information

Disturbance Attenuation for a Class of Nonlinear Systems by Output Feedback

Disturbance Attenuation for a Class of Nonlinear Systems by Output Feedback Disturbance Attenuation for a Class of Nonlinear Systems by Output Feedback Wei in Chunjiang Qian and Xianqing Huang Submitted to Systems & Control etters /5/ Abstract This paper studies the problem of

More information

Ole Christensen 3. October 20, Abstract. We point out some connections between the existing theories for

Ole Christensen 3. October 20, Abstract. We point out some connections between the existing theories for Frames and pseudo-inverses. Ole Christensen 3 October 20, 1994 Abstract We point out some connections between the existing theories for frames and pseudo-inverses. In particular, using the pseudo-inverse

More information

Error Empirical error. Generalization error. Time (number of iteration)

Error Empirical error. Generalization error. Time (number of iteration) Submitted to Neural Networks. Dynamics of Batch Learning in Multilayer Networks { Overrealizability and Overtraining { Kenji Fukumizu The Institute of Physical and Chemical Research (RIKEN) E-mail: fuku@brain.riken.go.jp

More information

Identication and Control of Nonlinear Systems Using. Neural Network Models: Design and Stability Analysis. Marios M. Polycarpou and Petros A.

Identication and Control of Nonlinear Systems Using. Neural Network Models: Design and Stability Analysis. Marios M. Polycarpou and Petros A. Identication and Control of Nonlinear Systems Using Neural Network Models: Design and Stability Analysis by Marios M. Polycarpou and Petros A. Ioannou Report 91-09-01 September 1991 Identication and Control

More information

Optimization: Interior-Point Methods and. January,1995 USA. and Cooperative Research Centre for Robust and Adaptive Systems.

Optimization: Interior-Point Methods and. January,1995 USA. and Cooperative Research Centre for Robust and Adaptive Systems. Innite Dimensional Quadratic Optimization: Interior-Point Methods and Control Applications January,995 Leonid Faybusovich John B. Moore y Department of Mathematics University of Notre Dame Mail Distribution

More information

Review and problem list for Applied Math I

Review and problem list for Applied Math I Review and problem list for Applied Math I (This is a first version of a serious review sheet; it may contain errors and it certainly omits a number of topic which were covered in the course. Let me know

More information

Prioritized Sweeping Converges to the Optimal Value Function

Prioritized Sweeping Converges to the Optimal Value Function Technical Report DCS-TR-631 Prioritized Sweeping Converges to the Optimal Value Function Lihong Li and Michael L. Littman {lihong,mlittman}@cs.rutgers.edu RL 3 Laboratory Department of Computer Science

More information

Stochastic Dynamic Programming. Jesus Fernandez-Villaverde University of Pennsylvania

Stochastic Dynamic Programming. Jesus Fernandez-Villaverde University of Pennsylvania Stochastic Dynamic Programming Jesus Fernande-Villaverde University of Pennsylvania 1 Introducing Uncertainty in Dynamic Programming Stochastic dynamic programming presents a very exible framework to handle

More information

Problem Description The problem we consider is stabilization of a single-input multiple-state system with simultaneous magnitude and rate saturations,

Problem Description The problem we consider is stabilization of a single-input multiple-state system with simultaneous magnitude and rate saturations, SEMI-GLOBAL RESULTS ON STABILIZATION OF LINEAR SYSTEMS WITH INPUT RATE AND MAGNITUDE SATURATIONS Trygve Lauvdal and Thor I. Fossen y Norwegian University of Science and Technology, N-7 Trondheim, NORWAY.

More information

Chapter 3 Least Squares Solution of y = A x 3.1 Introduction We turn to a problem that is dual to the overconstrained estimation problems considered s

Chapter 3 Least Squares Solution of y = A x 3.1 Introduction We turn to a problem that is dual to the overconstrained estimation problems considered s Lectures on Dynamic Systems and Control Mohammed Dahleh Munther A. Dahleh George Verghese Department of Electrical Engineering and Computer Science Massachuasetts Institute of Technology 1 1 c Chapter

More information

A Novel Integral-Based Event Triggering Control for Linear Time-Invariant Systems

A Novel Integral-Based Event Triggering Control for Linear Time-Invariant Systems 53rd IEEE Conference on Decision and Control December 15-17, 2014. Los Angeles, California, USA A Novel Integral-Based Event Triggering Control for Linear Time-Invariant Systems Seyed Hossein Mousavi 1,

More information

Lie Groups for 2D and 3D Transformations

Lie Groups for 2D and 3D Transformations Lie Groups for 2D and 3D Transformations Ethan Eade Updated May 20, 2017 * 1 Introduction This document derives useful formulae for working with the Lie groups that represent transformations in 2D and

More information

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v 250) Contents 2 Vector Spaces 1 21 Vectors in R n 1 22 The Formal Denition of a Vector Space 4 23 Subspaces 6 24 Linear Combinations and

More information

1 Introduction When the model structure does not match the system, is poorly identiable, or the available set of empirical data is not suciently infor

1 Introduction When the model structure does not match the system, is poorly identiable, or the available set of empirical data is not suciently infor On Tikhonov Regularization, Bias and Variance in Nonlinear System Identication Tor A. Johansen SINTEF Electronics and Cybernetics, Automatic Control Department, N-7034 Trondheim, Norway. Email: Tor.Arne.Johansen@ecy.sintef.no.

More information

The Solvability Conditions for the Inverse Eigenvalue Problem of Hermitian and Generalized Skew-Hamiltonian Matrices and Its Approximation

The Solvability Conditions for the Inverse Eigenvalue Problem of Hermitian and Generalized Skew-Hamiltonian Matrices and Its Approximation The Solvability Conditions for the Inverse Eigenvalue Problem of Hermitian and Generalized Skew-Hamiltonian Matrices and Its Approximation Zheng-jian Bai Abstract In this paper, we first consider the inverse

More information

A NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1

A NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1 A NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1 Jinglin Zhou Hong Wang, Donghua Zhou Department of Automation, Tsinghua University, Beijing 100084, P. R. China Control Systems Centre,

More information

H-INFINITY CONTROLLER DESIGN FOR A DC MOTOR MODEL WITH UNCERTAIN PARAMETERS

H-INFINITY CONTROLLER DESIGN FOR A DC MOTOR MODEL WITH UNCERTAIN PARAMETERS Engineering MECHANICS, Vol. 18, 211, No. 5/6, p. 271 279 271 H-INFINITY CONTROLLER DESIGN FOR A DC MOTOR MODEL WITH UNCERTAIN PARAMETERS Lukáš Březina*, Tomáš Březina** The proposed article deals with

More information

The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-Zero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In

The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-Zero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-ero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In [0, 4], circulant-type preconditioners have been proposed

More information

IE 5531: Engineering Optimization I

IE 5531: Engineering Optimization I IE 5531: Engineering Optimization I Lecture 15: Nonlinear optimization Prof. John Gunnar Carlsson November 1, 2010 Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I November 1, 2010 1 / 24

More information

Here, u is the control input with m components, y is the measured output with k componenets, and the channels w j z j from disturbance inputs to contr

Here, u is the control input with m components, y is the measured output with k componenets, and the channels w j z j from disturbance inputs to contr From Mixed to Multi-Objective ontrol arsten W. Scherer Mechanical Engineering Systems and ontrol Group Delft University of Technology Mekelweg, 8 D Delft, The Netherlands Paper ID: Reg Abstract. We revisit

More information

A Three-Level Analysis of a Simple Acceleration Maneuver, with. Uncertainties. Nancy Lynch. MIT Laboratory for Computer Science

A Three-Level Analysis of a Simple Acceleration Maneuver, with. Uncertainties. Nancy Lynch. MIT Laboratory for Computer Science A Three-Level Analysis of a Simple Acceleration Maneuver, with Uncertainties Nancy Lynch MIT Laboratory for Computer Science 545 Technology Square (NE43-365) Cambridge, MA 02139, USA E-mail: lynch@theory.lcs.mit.edu

More information

0 o 1 i B C D 0/1 0/ /1

0 o 1 i B C D 0/1 0/ /1 A Comparison of Dominance Mechanisms and Simple Mutation on Non-Stationary Problems Jonathan Lewis,? Emma Hart, Graeme Ritchie Department of Articial Intelligence, University of Edinburgh, Edinburgh EH

More information

Approximate Optimal-Value Functions. Satinder P. Singh Richard C. Yee. University of Massachusetts.

Approximate Optimal-Value Functions. Satinder P. Singh Richard C. Yee. University of Massachusetts. An Upper Bound on the oss from Approximate Optimal-Value Functions Satinder P. Singh Richard C. Yee Department of Computer Science University of Massachusetts Amherst, MA 01003 singh@cs.umass.edu, yee@cs.umass.edu

More information

Information Structures Preserved Under Nonlinear Time-Varying Feedback

Information Structures Preserved Under Nonlinear Time-Varying Feedback Information Structures Preserved Under Nonlinear Time-Varying Feedback Michael Rotkowitz Electrical Engineering Royal Institute of Technology (KTH) SE-100 44 Stockholm, Sweden Email: michael.rotkowitz@ee.kth.se

More information

Conjugate Directions for Stochastic Gradient Descent

Conjugate Directions for Stochastic Gradient Descent Conjugate Directions for Stochastic Gradient Descent Nicol N Schraudolph Thore Graepel Institute of Computational Science ETH Zürich, Switzerland {schraudo,graepel}@infethzch Abstract The method of conjugate

More information

Learning with Ensembles: How. over-tting can be useful. Anders Krogh Copenhagen, Denmark. Abstract

Learning with Ensembles: How. over-tting can be useful. Anders Krogh Copenhagen, Denmark. Abstract Published in: Advances in Neural Information Processing Systems 8, D S Touretzky, M C Mozer, and M E Hasselmo (eds.), MIT Press, Cambridge, MA, pages 190-196, 1996. Learning with Ensembles: How over-tting

More information

Projected Gradient Methods for NCP 57. Complementarity Problems via Normal Maps

Projected Gradient Methods for NCP 57. Complementarity Problems via Normal Maps Projected Gradient Methods for NCP 57 Recent Advances in Nonsmooth Optimization, pp. 57-86 Eds..-Z. u, L. Qi and R.S. Womersley c1995 World Scientic Publishers Projected Gradient Methods for Nonlinear

More information

A general theory of discrete ltering. for LES in complex geometry. By Oleg V. Vasilyev AND Thomas S. Lund

A general theory of discrete ltering. for LES in complex geometry. By Oleg V. Vasilyev AND Thomas S. Lund Center for Turbulence Research Annual Research Briefs 997 67 A general theory of discrete ltering for ES in complex geometry By Oleg V. Vasilyev AND Thomas S. und. Motivation and objectives In large eddy

More information

Seul Jung, T. C. Hsia and R. G. Bonitz y. Robotics Research Laboratory. University of California, Davis. Davis, CA 95616

Seul Jung, T. C. Hsia and R. G. Bonitz y. Robotics Research Laboratory. University of California, Davis. Davis, CA 95616 On Robust Impedance Force Control of Robot Manipulators Seul Jung, T C Hsia and R G Bonitz y Robotics Research Laboratory Department of Electrical and Computer Engineering University of California, Davis

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

The Erwin Schrodinger International Boltzmanngasse 9. Institute for Mathematical Physics A-1090 Wien, Austria

The Erwin Schrodinger International Boltzmanngasse 9. Institute for Mathematical Physics A-1090 Wien, Austria ESI The Erwin Schrodinger International Boltzmanngasse 9 Institute for Mathematical Physics A-19 Wien, Austria The Negative Discrete Spectrum of a Class of Two{Dimentional Schrodinger Operators with Magnetic

More information

Iterative Learning Control Analysis and Design I

Iterative Learning Control Analysis and Design I Iterative Learning Control Analysis and Design I Electronics and Computer Science University of Southampton Southampton, SO17 1BJ, UK etar@ecs.soton.ac.uk http://www.ecs.soton.ac.uk/ Contents Basics Representations

More information

Norm Optimal Iterative Learning Control with Application to Problems in Accelerator based Free Electron Lasers and Rehabilitation Robotics

Norm Optimal Iterative Learning Control with Application to Problems in Accelerator based Free Electron Lasers and Rehabilitation Robotics Norm Optimal Iterative Learning Control with Application to Problems in Accelerator based Free Electron Lasers and Rehabilitation Robotics E. Rogers D. H. Owens, H. Werner C. T. Freeman P. L. Lewin S.

More information

Re-sampling and exchangeable arrays University Ave. November Revised January Summary

Re-sampling and exchangeable arrays University Ave. November Revised January Summary Re-sampling and exchangeable arrays Peter McCullagh Department of Statistics University of Chicago 5734 University Ave Chicago Il 60637 November 1997 Revised January 1999 Summary The non-parametric, or

More information

REMARKS ON THE TIME-OPTIMAL CONTROL OF A CLASS OF HAMILTONIAN SYSTEMS. Eduardo D. Sontag. SYCON - Rutgers Center for Systems and Control

REMARKS ON THE TIME-OPTIMAL CONTROL OF A CLASS OF HAMILTONIAN SYSTEMS. Eduardo D. Sontag. SYCON - Rutgers Center for Systems and Control REMARKS ON THE TIME-OPTIMAL CONTROL OF A CLASS OF HAMILTONIAN SYSTEMS Eduardo D. Sontag SYCON - Rutgers Center for Systems and Control Department of Mathematics, Rutgers University, New Brunswick, NJ 08903

More information

A SIMPLE ITERATIVE SCHEME FOR LEARNING GRAVITY COMPENSATION IN ROBOT ARMS

A SIMPLE ITERATIVE SCHEME FOR LEARNING GRAVITY COMPENSATION IN ROBOT ARMS A SIMPLE ITERATIVE SCHEME FOR LEARNING GRAVITY COMPENSATION IN ROBOT ARMS A. DE LUCA, S. PANZIERI Dipartimento di Informatica e Sistemistica Università degli Studi di Roma La Sapienza ABSTRACT The set-point

More information

Control Systems I. Lecture 2: Modeling and Linearization. Suggested Readings: Åström & Murray Ch Jacopo Tani

Control Systems I. Lecture 2: Modeling and Linearization. Suggested Readings: Åström & Murray Ch Jacopo Tani Control Systems I Lecture 2: Modeling and Linearization Suggested Readings: Åström & Murray Ch. 2-3 Jacopo Tani Institute for Dynamic Systems and Control D-MAVT ETH Zürich September 28, 2018 J. Tani, E.

More information

3.1 Basic properties of real numbers - continuation Inmum and supremum of a set of real numbers

3.1 Basic properties of real numbers - continuation Inmum and supremum of a set of real numbers Chapter 3 Real numbers The notion of real number was introduced in section 1.3 where the axiomatic denition of the set of all real numbers was done and some basic properties of the set of all real numbers

More information

Chapter Robust Performance and Introduction to the Structured Singular Value Function Introduction As discussed in Lecture 0, a process is better desc

Chapter Robust Performance and Introduction to the Structured Singular Value Function Introduction As discussed in Lecture 0, a process is better desc Lectures on Dynamic Systems and Control Mohammed Dahleh Munther A Dahleh George Verghese Department of Electrical Engineering and Computer Science Massachuasetts Institute of Technology c Chapter Robust

More information

Predictive Cascade Control of DC Motor

Predictive Cascade Control of DC Motor Volume 49, Number, 008 89 Predictive Cascade Control of DC Motor Alexandru MORAR Abstract: The paper deals with the predictive cascade control of an electrical drive intended for positioning applications.

More information

Performance Comparison of Two Implementations of the Leaky. LMS Adaptive Filter. Scott C. Douglas. University of Utah. Salt Lake City, Utah 84112

Performance Comparison of Two Implementations of the Leaky. LMS Adaptive Filter. Scott C. Douglas. University of Utah. Salt Lake City, Utah 84112 Performance Comparison of Two Implementations of the Leaky LMS Adaptive Filter Scott C. Douglas Department of Electrical Engineering University of Utah Salt Lake City, Utah 8411 Abstract{ The leaky LMS

More information

Case Study: The Pelican Prototype Robot

Case Study: The Pelican Prototype Robot 5 Case Study: The Pelican Prototype Robot The purpose of this chapter is twofold: first, to present in detail the model of the experimental robot arm of the Robotics lab. from the CICESE Research Center,

More information

Linear Algebra (part 1) : Matrices and Systems of Linear Equations (by Evan Dummit, 2016, v. 2.02)

Linear Algebra (part 1) : Matrices and Systems of Linear Equations (by Evan Dummit, 2016, v. 2.02) Linear Algebra (part ) : Matrices and Systems of Linear Equations (by Evan Dummit, 206, v 202) Contents 2 Matrices and Systems of Linear Equations 2 Systems of Linear Equations 2 Elimination, Matrix Formulation

More information

Modeling nonlinear systems using multiple piecewise linear equations

Modeling nonlinear systems using multiple piecewise linear equations Nonlinear Analysis: Modelling and Control, 2010, Vol. 15, No. 4, 451 458 Modeling nonlinear systems using multiple piecewise linear equations G.K. Lowe, M.A. Zohdy Department of Electrical and Computer

More information

Linear-Quadratic Optimal Control: Full-State Feedback

Linear-Quadratic Optimal Control: Full-State Feedback Chapter 4 Linear-Quadratic Optimal Control: Full-State Feedback 1 Linear quadratic optimization is a basic method for designing controllers for linear (and often nonlinear) dynamical systems and is actually

More information

University of California. Berkeley, CA fzhangjun johans lygeros Abstract

University of California. Berkeley, CA fzhangjun johans lygeros Abstract Dynamical Systems Revisited: Hybrid Systems with Zeno Executions Jun Zhang, Karl Henrik Johansson y, John Lygeros, and Shankar Sastry Department of Electrical Engineering and Computer Sciences University

More information

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G.

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G. In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and J. Alspector, (Eds.). Morgan Kaufmann Publishers, San Fancisco, CA. 1994. Convergence of Indirect Adaptive Asynchronous

More information

PERIODIC signals are commonly experienced in industrial

PERIODIC signals are commonly experienced in industrial IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 15, NO. 2, MARCH 2007 369 Repetitive Learning Control of Nonlinear Continuous-Time Systems Using Quasi-Sliding Mode Xiao-Dong Li, Tommy W. S. Chow,

More information

BUMPLESS SWITCHING CONTROLLERS. William A. Wolovich and Alan B. Arehart 1. December 27, Abstract

BUMPLESS SWITCHING CONTROLLERS. William A. Wolovich and Alan B. Arehart 1. December 27, Abstract BUMPLESS SWITCHING CONTROLLERS William A. Wolovich and Alan B. Arehart 1 December 7, 1995 Abstract This paper outlines the design of bumpless switching controllers that can be used to stabilize MIMO plants

More information

Cover page. : On-line damage identication using model based orthonormal. functions. Author : Raymond A. de Callafon

Cover page. : On-line damage identication using model based orthonormal. functions. Author : Raymond A. de Callafon Cover page Title : On-line damage identication using model based orthonormal functions Author : Raymond A. de Callafon ABSTRACT In this paper, a new on-line damage identication method is proposed for monitoring

More information

Control Systems I. Lecture 2: Modeling. Suggested Readings: Åström & Murray Ch. 2-3, Guzzella Ch Emilio Frazzoli

Control Systems I. Lecture 2: Modeling. Suggested Readings: Åström & Murray Ch. 2-3, Guzzella Ch Emilio Frazzoli Control Systems I Lecture 2: Modeling Suggested Readings: Åström & Murray Ch. 2-3, Guzzella Ch. 2-3 Emilio Frazzoli Institute for Dynamic Systems and Control D-MAVT ETH Zürich September 29, 2017 E. Frazzoli

More information

Centro de Processamento de Dados, Universidade Federal do Rio Grande do Sul,

Centro de Processamento de Dados, Universidade Federal do Rio Grande do Sul, A COMPARISON OF ACCELERATION TECHNIQUES APPLIED TO THE METHOD RUDNEI DIAS DA CUNHA Computing Laboratory, University of Kent at Canterbury, U.K. Centro de Processamento de Dados, Universidade Federal do

More information

Indirect Model Reference Adaptive Control System Based on Dynamic Certainty Equivalence Principle and Recursive Identifier Scheme

Indirect Model Reference Adaptive Control System Based on Dynamic Certainty Equivalence Principle and Recursive Identifier Scheme Indirect Model Reference Adaptive Control System Based on Dynamic Certainty Equivalence Principle and Recursive Identifier Scheme Itamiya, K. *1, Sawada, M. 2 1 Dept. of Electrical and Electronic Eng.,

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

2 Interval-valued Probability Measures

2 Interval-valued Probability Measures Interval-Valued Probability Measures K. David Jamison 1, Weldon A. Lodwick 2 1. Watson Wyatt & Company, 950 17th Street,Suite 1400, Denver, CO 80202, U.S.A 2. Department of Mathematics, Campus Box 170,

More information

Improved Predictions from Measured Disturbances in Linear Model Predictive Control

Improved Predictions from Measured Disturbances in Linear Model Predictive Control Improved Predictions from Measured Disturbances in Linear Model Predictive Control B.J.T. Binder a,, T.A. Johansen a,b, L. Imsland a a Department of Engineering Cybernetics, Norwegian University of Science

More information

Gravitational potential energy *

Gravitational potential energy * OpenStax-CNX module: m15090 1 Gravitational potential energy * Sunil Kumar Singh This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 2.0 The concept of potential

More information

Numerical Solution of Hybrid Fuzzy Dierential Equation (IVP) by Improved Predictor-Corrector Method

Numerical Solution of Hybrid Fuzzy Dierential Equation (IVP) by Improved Predictor-Corrector Method Available online at http://ijim.srbiau.ac.ir Int. J. Industrial Mathematics Vol. 1, No. 2 (2009)147-161 Numerical Solution of Hybrid Fuzzy Dierential Equation (IVP) by Improved Predictor-Corrector Method

More information

/97/$10.00 (c) 1997 AACC

/97/$10.00 (c) 1997 AACC Optimal Random Perturbations for Stochastic Approximation using a Simultaneous Perturbation Gradient Approximation 1 PAYMAN SADEGH, and JAMES C. SPALL y y Dept. of Mathematical Modeling, Technical University

More information

A new fast algorithm for blind MA-system identication. based on higher order cumulants. K.D. Kammeyer and B. Jelonnek

A new fast algorithm for blind MA-system identication. based on higher order cumulants. K.D. Kammeyer and B. Jelonnek SPIE Advanced Signal Proc: Algorithms, Architectures & Implementations V, San Diego, -9 July 99 A new fast algorithm for blind MA-system identication based on higher order cumulants KD Kammeyer and B Jelonnek

More information