Design strategies for iterative learning control based on optimal control


Selected Topics in Signals, Systems and Control, Vol. 12, September 2001

Design strategies for iterative learning control based on optimal control

Rob Tousain, Eduard van der Meché and Okko Bosgra
Mechanical Engineering Systems and Control Group, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands
E-mail: r.l.tousain@wbmt.tudelft.nl, e.g.vandermeche@wbmt.tudelft.nl

Abstract. This paper deals with the analysis and synthesis of Iterative Learning Control (ILC) systems using a lifted representation of the plant. In this lifted representation the system dynamics are described by a static map, whereas the learning dynamics are described by a difference equation. The properties of the lifted system, in particular the role of non-minimum phase zeros and system delays, are investigated. Based on the internal model principle, a general integrating update law is suggested. Two methods are proposed for the design of the learning gain, both based on optimal control theory. In the second, a multi-objective design, the convergence speed is optimized subject to a bound on the closed-loop variance due to stochastic initial conditions, process disturbances and measurement noise. Efficient tailor-made solutions to the design problems are presented, making optimal use of the specific structure of the lifted system ILC representation. The potential of the design methods is demonstrated on a realistic example.

Keywords. Iterative Learning Control, optimal control, variance control, multi-objective control, algebraic Riccati equation.

1 Introduction

Many control systems perform the same task over and over again. The use of feedforward control for this kind of problem is common practice and will generally yield much improved performance compared to pure feedback control.
One way to achieve high-quality feedforward controls is through the application of Iterative Learning Control (ILC), an iterative update scheme which improves the quality of the feedforward signal from trial to trial. Iterative Learning Control can be particularly effective in the case of imperfect knowledge of the plant. The literature on ILC is already quite extensive. The work on ILC was pioneered by Arimoto et al. (1984). Convergence properties for linear and nonlinear ILC have been studied and many design strategies for ILC schemes have been proposed. For a detailed and recent review we refer to (Bien, 1998). (This paper will be presented at the 40th IEEE Conference on Decision and Control, December 4-7, 2001, Orlando, Florida. Copyright of this paper remains with IEEE.)

The paper builds on the lifted system representation which was, to our knowledge, first introduced in relation to Iterative Learning Control by (Phan, 1988). In this representation of the learning control system, the system dynamics are described by a static map whereas the dynamics of the learning mechanism are described by a difference equation in the repetition domain. Several authors have used the same system representation, mainly for analysis of the learning convergence, see e.g. (Moore, 1998; Amann et al., 1996). However, a systematic model-based synthesis of learning controllers on the basis of the lifted system representation seems to be lacking. This paper provides two main contributions. First, the properties of the lifted system representation will be investigated in relation to ILC design. The role of system delays and (non-minimum phase) zeros, which were not investigated in

e.g. (Phan, 1988; Moore, 1998; Amann et al., 1996), shall be discussed. Secondly, this paper aims to show that the analysis and design of learning update schemes can be done using standard classical and modern (optimal) control methods. This holds even in the case of non-repeating disturbances and measurement noise, the effect of which can be accounted for in the ILC design. Two design methods shall be presented. The optimal control design problems they entail are in general of very large dimension; efficient methods for solving them will be presented. Only linear SISO discrete-time systems are handled. Extensions to linear multivariable systems are often obvious.

The paper is organized as follows. First, the lifted system representation and the corresponding ILC design problem are explained in Section 2. Then, in Section 3 we use the internal model principle to investigate the steady-state convergence properties of the ILC and propose a general update law which yields asymptotic convergence if the repetition dynamics are stabilized. The stabilization is done by means of static output feedback, the design of which is the main subject of Sections 4 and 5. Section 4 presents a minimum variance approach to ILC design based on the well-known Kalman filter; this approach makes it possible to trade off convergence speed against the minimization of the effect of stochastic disturbances. A single tuning factor determines how both objectives are compromised; however, the trade-off between convergence speed and variance minimization is still made in an intuitive fashion. Therefore, in Section 5 we propose a multi-objective formulation of the ILC design problem where the convergence rate is optimized subject to a prespecified bound on the closed-loop variance. The design methods described in the paper are demonstrated on a representative example of a fourth-order mechanical system in Section 6. Finally, Section 7 presents some conclusions.
2 Lifted system representation

A typical feedback control system is given in Figure 1. P is the plant, K a feedback controller, y the measured output, u_K the controller output, w the disturbances, v the measurement noise and r the reference signal. Finally, u is a feedforward control. The SISO LTI plant P is given by the following state-space equations (without loss of generality, no direct feedthrough is assumed)

P : x^p_{k+1} = A x^p_k + B(u_{K,k} + u_k) + B_d w_k,   y_k = C x^p_k + v_k   (1)

where w and v are assumed Gaussian white noise with covariance matrices W and V, respectively.

Fig. 1: Feedback control system with additional feedforward.

The controller K is given as follows:

K : x^c_{k+1} = A_K x^c_k + B_K (r_k - y_k),   u_{K,k} = C_K x^c_k + D_K (r_k - y_k)   (2)

With K and P as defined above, the closed-loop system is given as

T : x_{k+1} = \mathcal{A} x_k + \mathcal{B} \bar{v}_k,   e_k = \mathcal{C} x_k + \mathcal{D} \bar{v}_k   (3)

with

\begin{pmatrix} \mathcal{A} & \mathcal{B} \\ \mathcal{C} & \mathcal{D} \end{pmatrix}
= \begin{pmatrix} \mathcal{A} & \mathcal{B}_1 & \mathcal{B}_2 & \mathcal{B}_3 & \mathcal{B}_4 \\ \mathcal{C} & \mathcal{D}_1 & \mathcal{D}_2 & \mathcal{D}_3 & \mathcal{D}_4 \end{pmatrix}
= \begin{pmatrix} A - B D_K C & B C_K & B & B_d & -B D_K & B D_K \\ -B_K C & A_K & 0 & 0 & -B_K & B_K \\ -C & 0 & 0 & 0 & -1 & 1 \end{pmatrix}   (4)

and where \bar{v}_k = [u_k, w_k, v_k, r]^T collects the external signals. We now consider the behavior of this system for a specific task, which is to follow a desired reference signal r of finite length N, starting from an initial condition x_0. We assume that this same task is repeated over a sequence of trials, the aim of Iterative Learning Control being to compute a feedforward signal u in such a way as to converge to a sufficiently small error e. The technique of lifting refers to a construction whereby one lifts a continuous-time signal to a discrete-time one, see (Bamieh et al., 1991). The idea of signal and system lifting is at the basis of sampled-data system analysis and control, and it is extremely well applicable to the analysis and synthesis of ILC schemes. For discrete-time systems, lifting boils down to a very straightforward construction, as will be shown next.
We introduce the lifted input u^l to represent the input to the system in the l-th trial: u^l \in R^N := [u^l_0, u^l_1, \ldots, u^l_{N-1}]^T. The lifted representation of the error in trial l is given by e^l \in R^N := [e^l_0, e^l_1, \ldots, e^l_{N-1}]^T. Based on the lifted representation of all the signals, the lifted system is given by

P : e^l = P_1 u^l + P_2 w^l + P_3 v^l + P_4 r + P_x x^l_0   (5)

where P_1 : R^N \to R^N, P_2 : R^{N n_w} \to R^N, P_3 : R^N \to R^N and P_4 : R^N \to R^N are Toeplitz matrices given by

P_i := \begin{pmatrix} G^i_0 & 0 & \cdots & 0 \\ G^i_1 & G^i_0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ G^i_{N-1} & G^i_{N-2} & \cdots & G^i_0 \end{pmatrix}   (6)

and P_x : R^{n_x} \to R^N is given by

P_x := ( C^T \; (CA)^T \; \ldots \; (CA^{N-1})^T )^T   (7)

Here G^i_0 = D_i and G^i_k = C A^{k-1} B_i, k = 1, 2, \ldots, are the Markov parameters of system (3). Note that the lifted representation of the plant is a static, multivariable system with N inputs and N outputs.

2.1 Properties of the lifted system

The lifted system representation given by (5) was used earlier by e.g. (Phan, 1988; Moore, 1998). However, so far not much attention has been paid to the properties of such systems in relation to ILC. From traditional ILC design it is known that special attention needs to be paid to the role of system delays and non-minimum phase zeros. Their effect on the characteristics of the lifted system is studied next. We will consider the map P_1 only, since its role is essential in the design of learning controllers. Obviously, the matrix P_1 will lose rank in case one or more of the leading Markov parameters are zero. Suppose G^1_0 = \ldots = G^1_{d-1} = 0 and G^1_d \neq 0. Then the rank of P_1 will be N - d. Further, if the system (3) has zeros outside the unit disk, then the gap between the largest and the smallest non-zero singular value of P_1 will generally be large. This can be seen as follows. Let the non-minimum phase zeros of (3) be given by a_1, a_2, \ldots, a_{n_z}. Then, by the definition of zeros, for each zero a_i there exists an initial condition x^i_0 and an input \bar{u}^i = [1, a_i, a_i^2, \ldots, a_i^{N-1}]^T such that

y = P_1 \bar{u}^i + P_x x^i_0 = 0   (8)

Using this expression, an upper bound for the smallest non-zero singular value of P_1 is given by

\underline{\sigma}(P_1) < \| P_x x^i_0 \| / \| \bar{u}^i \|   (9)

Since the norm of \bar{u}^i increases fast with increasing N (as |a_i| > 1), the upper bound on \underline{\sigma}(P_1) tends to zero for large N. The rank deficiency of P_1 and the large gap between the smallest and the largest non-zero singular values imply that there exists a subspace of input signals that is, respectively, not observable or only poorly observable in the measured output.
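The Toeplitz construction in (6) and the delay-induced rank loss are easy to observe numerically. The following is an illustrative sketch, not code from the paper; the first-order system data are assumptions. It builds the lifted map of a SISO system from its Markov parameters:

```python
import numpy as np

def lift_siso(A, B, C, D, N):
    """Lifted (lower-triangular Toeplitz) map built from the Markov
    parameters G_0 = D, G_k = C A^(k-1) B of a SISO LTI system."""
    G = [float(D)]
    AkB = B
    for k in range(1, N):
        G.append((C @ AkB).item())   # G_k = C A^(k-1) B
        AkB = A @ AkB
    P = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1):
            P[i, j] = G[i - j]       # entry (i, j) holds G_{i-j}
    return P

# assumed toy system with D = 0, i.e. one sample of delay
A = np.array([[0.9]]); B = np.array([[1.0]]); C = np.array([[0.5]]); D = 0.0
P = lift_siso(A, B, C, D, N=20)
sv = np.linalg.svd(P, compute_uv=False)
print(np.linalg.matrix_rank(P), sv[-1])   # rank N-1 = 19, smallest singular value ~ 0
```

Because the leading Markov parameter is zero, the lifted matrix is strictly lower triangular and drops rank by one, exactly as stated above for d = 1.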
This is undesirable for technical reasons which will become clear in the remainder of this paper. One way to improve the conditioning of P_1 is by removing one or more columns, corresponding to a shortening of the length of the input vector. Another way will be presented next.

Reduced system representation. Let P_1 \in R^{N \times N} have rank N - d. Also, let its singular value decomposition be given by P_1 = U \Sigma V^T, with \Sigma = diag(\sigma_1, \sigma_2, \ldots, \sigma_N). According to Mirsky (1960), the best rank-r approximation of P_1 (r \le N - d) in any unitarily invariant matrix norm is given by P_r = U diag(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0) V^T. The null space of P_r is spanned by the last N - r columns of V. This suggests a parametrization of the inputs u = U_p \hat{u}, where U_p contains the first r columns of V and \hat{u} \in R^r is the new input. The generalized reduced representation of the (feedforward part of the) lifted system is then given by \hat{y} = \hat{P} \hat{u}, where \hat{P} = P_r U_p. Using this type of model reduction, we can remove those input directions which do not contribute significantly to the system output. The reduction proposed here is done on the basis of system knowledge only; a further reduction can be achieved if knowledge of the reference signal is used as well, see e.g. (Hamamoto and Sugie, 1999). In the remainder of the paper we will stick to the notation y = P_1 u, with P_1 of dimensions N \times N_u, for the feedforward part of the lifted system; however, all results apply to any rank-r approximation of P_1.

2.2 Learning control design problem

In the literature on ILC the initial condition x_0 is often assumed zero, or at least constant for each trial. This assumption is not very realistic, and it is not necessary in our approach.
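The SVD-based input parametrization can be sketched as follows; `reduce_lifted_plant` and the randomly generated, deliberately ill-conditioned lifted map are illustrative assumptions, not data from the paper:

```python
import numpy as np

def reduce_lifted_plant(P1, r):
    """Rank-r input parametrization u = U_p @ u_hat: keep only the r
    strongest right singular directions of the lifted map P1 (Mirsky)."""
    U, s, Vt = np.linalg.svd(P1)
    Up = Vt[:r].T                        # first r columns of V
    Pr = (U[:, :r] * s[:r]) @ Vt[:r]     # best rank-r approximation of P1
    return Pr @ Up, Up                   # reduced N x r map P_hat, and U_p

# hypothetical lifted map with a very large singular value spread
rng = np.random.default_rng(0)
P1 = rng.standard_normal((30, 30)) @ np.diag(np.logspace(0, -12, 30))
Phat, Up = reduce_lifted_plant(P1, r=10)
print(np.linalg.cond(P1), np.linalg.cond(Phat))  # the reduced map is far better conditioned
```

Discarding the weakly observable input directions is what keeps the learning gain design well posed later on: the reduced map no longer asks the learning controller to invert directions that barely reach the output.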
Instead, we will assume x_0 to be a random, zero-mean variable with covariance X, where X is the stationary closed-loop variance of the closed-loop system (3):

X = \mathcal{A} X \mathcal{A}^T + \mathcal{B}_2 W \mathcal{B}_2^T + \mathcal{B}_3 V \mathcal{B}_3^T   (10)

The random variables w^l, v^l and x^l_0 are uncorrelated; hence an alternative, compact formulation of the lifted system (5) is given by

P : e^l = P_4 r + P_1 u^l + G \hat{v}^l   (11)

where \hat{v}^l is a random variable with covariance matrix I and G is defined by the Cholesky factor of the composite covariance matrix, i.e. G G^T = P_2 W P_2^T + P_3 V P_3^T + P_x X P_x^T, with W and V here denoting the lifted (block-diagonal) covariances. This compact, lifted representation of the plant is depicted in Figure 2. The ILC design problem now can be defined as finding a strictly causal feedback control law u = L(e)

Fig. 2: The lifted plant; dashed line: interconnection with learning controller.

such that the feedback interconnection of L and P is stable and such that e converges to a sufficiently small value. We emphasize: learning control is feedback control!

3 The internal model principle - asymptotic convergence

We will now discuss the structure of L. Typical update laws in ILC are of the form

L : u^{l+1} = u^l + K e^l   (12)

To explain why this is a proper structure, we use the internal model principle. In the absence of noise, we want our closed-loop system to track the fixed reference input r asymptotically. The internal model principle states that we can achieve this if we include a model of the reference dynamics in the controller; for a constant reference this is a bank of integrators. This is exactly what happens if we use an update law like (12). The internal model principle assumes that the number of inputs is at least equal to the number of outputs. Recall that P_1 will generally not be square, so that the internal model principle does not apply directly. We will next investigate the steady-state error properties of the non-square ILC scheme. We first combine the integrating dynamics of the control law with the expression for the lifted plant to derive a new extended plant as follows:

\bar{P} : u^{l+1} = u^l + \Delta u^l,   e^l = P_4 r + P_1 u^l + G \hat{v}^l   (13)

\bar{P} describes the dynamics of the learning control scheme in the repetition domain. The repeating reference signal constitutes a vector of constant reference inputs for the lifted system. For any stabilizing control \Delta u^l = K e^l and in the absence of noise, the learning iteration will converge to a fixed point u^\infty which is given by

K P_4 r + K P_1 u^\infty = 0   (14)

Since P_1 has full column rank, we can choose K such that K P_1 is non-singular. Then, the steady-state tracking error is given by

e^\infty = P_4 r - P_1 (K P_1)^{-1} K P_4 r   (15)

which is not equal to zero in general. Zero-error tracking can be achieved iff r is chosen such that P_4 r = P_1 z for some z \in R^{N_u}.
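As an illustration of the integrating update law, the learning recursion can be run directly on the lifted model. This is a noise-free sketch with an assumed FIR lifted plant and a simple scaled model-inverse gain (one possible stabilizing choice, not the design advocated in this paper):

```python
import numpy as np

N = 15
g = 0.8 ** np.arange(N)                  # assumed Markov parameters of the lifted plant
P1 = np.array([[g[i - j] if i >= j else 0.0 for j in range(N)] for i in range(N)])
d = np.ones(N)                           # assumed trial-invariant term driving the error

K = 0.5 * np.linalg.inv(P1)              # scaled model inverse: I - P1 @ K = 0.5 I
u = np.zeros(N)
for trial in range(50):
    e = d - P1 @ u                       # error observed in this trial
    u = u + K @ e                        # integrating update u^{l+1} = u^l + K e^l
print(np.linalg.norm(d - P1 @ u))        # error norm halves every trial -> ~1e-15
```

With this gain the repetition-domain error obeys e^{l+1} = (I - P1 K) e^l = 0.5 e^l, so any stabilizing K drives the error to zero regardless of the disturbance level d, which is exactly the internal-model argument made above.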
One way to achieve this is by minimizing, in a least-squares sense, the difference between r and an ideally desired reference r_0:

min_{r,z} (r - r_0)^T Q (r - r_0)   s.t.   P_4 r - P_1 z = 0   (16)

Remark. In the previous, we considered feedback laws of the form u^{l+1} = u^l + K e^l. In the literature, e.g. (de Roover, 1997), often a so-called robustness filter Q is included, yielding u^{l+1} = Q u^l + K e^l. It can easily be shown that this learning scheme will converge to zero steady-state offset only if Q = I. For this reason, we decided to omit Q. It is our proposition that zero steady-state offset need not be sacrificed for the sake of robustness and noise attenuation; in this paper, we show how we deal with noise through a proper design of K.

4 Minimum variance learning

A straightforward design strategy for the learning gain is static decoupling and pole placement, see e.g. (Amann et al., 1996). However, such a design strategy does not explicitly take into account the effect of the stochastic initial condition, process disturbances and measurement noise. In this section, we propose a design method which enables the user to trade off convergence speed against the amplification of noise and disturbances. The proposed minimum variance static output feedback design problem is given as follows:

min_K J = lim_{M \to \infty} (1/M) E( \Sigma_{l=1}^{M} u^{l\,T} u^l )
s.t.   u^{l+1} = u^l + \Delta u^l + F \bar{w}^l,   e^l = P_1 u^l + G \hat{v}^l,   \Delta u^l = K e^l   (17)

where the variable \bar{w}^l with covariance I is added for regularization of the problem. Note that omitting \bar{w}^l would yield a singular optimal control problem. If F is chosen equal to \alpha I with a scalar \alpha, then the choice of \alpha determines the trade-off between the amplification of noise and the convergence speed. The gain that solves (17) can be obtained by solving the corresponding estimation Riccati equation. Using the interpretation of K as the Kalman gain, we can develop some intuition for the choice of \alpha. Choosing

\alpha close to zero will yield a low-gain K, and hence will result in slow convergence and a low variance of the controls and the error. If we choose \alpha \to \infty, then the solution will tend to the deadbeat control solution discussed in e.g. (Phan, 1988). Hence, we can use \alpha to tune the performance of the learning controller. The solution of the estimation Riccati equation for problem (17) can be computed very efficiently via a transformation to a purely diagonal design problem. The transformation proceeds as follows. Let the generalized singular value decomposition of (P_1 F)^T and G^T be given as

(P_1 F)^T = U_P \bar{C} X^T,   G^T = U_G S X^T   (18)

where U_P and U_G are unitary matrices, X is a square matrix of dimension N, S is a diagonal matrix of dimension N, and \bar{C} = [C \; 0] where C \in R^{N_u \times N_u} is a diagonal matrix. We introduce the state transformation \bar{u} = T_1 u and the output transformation \bar{e} = T_2 e, with transformation matrices given by T_1 = U_P^T F^{-1} and T_2 = X^{-1}, respectively. Then, the problem of finding the Kalman gain K for the system in problem (17) is equivalent to the problem of finding a Kalman gain \bar{K} = T_1 K T_2^{-1} for the system \bar{u}^{l+1} = \bar{u}^l + \bar{w}^l, \bar{e}^l = \bar{C}^T \bar{u}^l + S \bar{v}^l, where the covariances of \bar{w} and \bar{v} are I. Because the last N - N_u rows of \bar{C}^T are zero and S is diagonal, it can easily be shown that the last N - N_u columns of the Kalman gain for the transformed system are zero as well. Hence, the Kalman gain is given by \bar{K} = [\hat{K} \; 0], where \hat{K} is the Kalman gain for the square, diagonal system obtained by truncating the last N - N_u outputs. For this diagonal problem, the Riccati solution and the corresponding Kalman gain can be obtained by simple element-wise computations.

5 Multi-objective design: convergence speed vs. noise attenuation

Ultimately, we would like to synthesize a learning gain in such a fashion that we can directly control the trade-off between noise attenuation and convergence speed. In this section such a design strategy is proposed. If r is chosen such that P_4 r = P_1 z, z \in R^{N_u}, then the lifted system representation can be transformed into

\tilde{u}^{l+1} = \tilde{u}^l + \Delta u^l,   e^l = P_1 \tilde{u}^l + G \hat{v}^l   (19)

where \tilde{u} = u + z, with a non-zero initial condition \tilde{u}^0 = z. We seek a static output feedback \Delta u^l = K e^l such that e converges to a small value according to some control objectives. Let us first investigate the objectives we would like to achieve.

Variance reduction. One of the goals in the multi-objective design is to limit the steady-state variance of the feedforward signal and of the error. Traditional H_2 control minimizes the trace of the covariance. However, because of the interpretation of u as the lifted input or, in case an input parametrization is used (see Section 2.1), as the coefficients in the input parametrization, it is in our case more appropriate to consider the variance of each state as an individual objective. This amounts to defining the following objectives:

J_i = lim_{M \to \infty} (1/M) E( \Sigma_{l=1}^{M} (\tilde{u}^l_i)^2 ),   i = 1, \ldots, N_u   (20)

Convergence speed. Next to limiting the effect of noise on the feedforward and error signals, we require a certain speed of convergence of the learning controller. One way of achieving this is by constraining the closed-loop eigenvalues to lie in a prescribed region within the unit circle. However, fast convergence of the learning error in, for example, a 2-norm sense does not necessarily mean that all eigenvalues need to lie within a prescribed region. Also, the enforcement of all closed-loop eigenvalues may lead to a high-gain feedback, which can result in poor variance results. Therefore, we will consider a different objective here, namely a deterministic LQR objective. In the presence of noise, this objective can be defined as follows:

J = \Sigma_{l=0}^{\infty} E(\tilde{u}^l)^T Q E(\tilde{u}^l) + E(\Delta\tilde{u}^l)^T R E(\Delta\tilde{u}^l)   (21)

where E(\tilde{u}^l) represents the deterministic component in \tilde{u}^l due to the effect of the non-zero initial condition \tilde{u}^0 = z.
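Looking back at the Kalman design of Section 4, the element-wise computation it promises can be illustrated on a single decoupled loop x_{l+1} = x_l + w_l, y_l = c x_l + s v_l with unit noise variances. The following sketch (the c and s values are assumptions) iterates the scalar estimation Riccati recursion to its fixed point:

```python
def scalar_kalman_gain(c, s, tol=1e-12):
    """Steady-state Kalman gain of one scalar loop, found by iterating
    the scalar estimation Riccati recursion until it converges."""
    p = 1.0
    while True:
        pm = p + 1.0                                       # time update (unit process noise)
        p_new = pm - (c * pm) ** 2 / (c * c * pm + s * s)  # measurement update
        if abs(p_new - p) < tol:
            p = p_new
            break
        p = p_new
    pm = p + 1.0
    return c * pm / (c * c * pm + s * s)                   # steady-state Kalman gain

# a well-observed loop gets a near-deadbeat gain, a noisy one a cautious gain
print(scalar_kalman_gain(1.0, 0.01), scalar_kalman_gain(1.0, 10.0))
```

This mirrors the trade-off discussed above: driving the effective noise ratio s/c down pushes the gain toward deadbeat behavior, while a large s/c yields a low gain and slow, low-variance learning.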
If r is chosen such that P 4 r = P z, z R Nu then the lifted system representation can be transformed into =ũ l + u l e l = P ũ l + Gv l (9) ũ l+ where ũ = u+z and with a non-zero initial condition ũ = z. We seek a static output control u l = Ke l such that e converges to a small value according to some control objectives. Let us first investigate the objectives we would like to achieve. Variance reduction One of the goals in the multi-objective design is to limit the steady state variance of the feedforward signal and of the error. Traditional H 2 control minimizes the trace of the covariance. However, because of the interpretation of u as the lifted input or, in case an input parametrization is used (see Section 2.), as the coefficients in the input parametrization, it is in our case more appropriate to consider the variance of each state as an individual objective. This amounts to defining the following objectives: J i = lim M M E ( Σ M l=ũ l2 ) i,i=,...,nu (2) Convergence speed Next to limiting the effect of noise on the feedforward and error signal we require a certain speed of convergence of the learning controller. One way of doing this is by constraining the closed loop eigenvalues to lie in a prescribed region within the unit circle. However, a fast convergence of the learning error in for example a 2-norm sense, does not necessarily mean that all eigenvalues need to lie within a prescribed region. Also, the enforcement of all closed loop eigenvalues may lead to a high gain feedback which can result in poor variance results. Therefore, we will consider a different objective here, namely a deterministic LQR objective. In the presence of noise, this objective can be defined as follows J =Σ l=e(ũ l ) T QE(ũ l )+E( ũ l ) T RE( ũ l ) (2) where E(ũ l ) represents the deterministic component in ũ l due to the effect of the non-zero initial condition ũ = z. 
Multi-objective design. The proposed multi-objective design problem consists of finding a static output feedback K such that the LQR objective is minimized while satisfying a prespecified bound on the closed-loop variance. Mathematically:

min_K J   s.t.   J_i \le \gamma,   i = 1, \ldots, N_u   (22)

This type of multi-objective problem has no straightforward solution in general. However, if we choose Q = I and R = \beta I, then the optimal solution can be found easily. To demonstrate this, we diagonalize the system by applying the input/state transformation \bar{u} = T_1 \tilde{u}, \Delta\bar{u} = T_1 \Delta u and the output transformation \bar{e} = T_2 e. The state/input transformation is given by T_1 = U_P^T and the output transformation by T_2 = [C^{-1} \; 0] X^{-1}, where U_P, X, C and S are given by the generalized singular value decomposition of P_1^T and G^T:

P_1^T = U_P [C \; 0] X^T,   G^T = U_G S X^T   (23)

This renders the following transformed system:

\bar{u}^{l+1} = \bar{u}^l + \Delta\bar{u}^l,   \bar{e}^l = \bar{u}^l + C^{-1} S \bar{v}^l   (24)

The objective functions, expressed in the transformed variables, become

J_i = lim_{M \to \infty} (1/M) E( \Sigma_{l=1}^{M} (\bar{u}^l_i)^2 ),   i = 1, \ldots, N_u   (25)

J = \Sigma_{l=0}^{\infty} E(\bar{u}^l)^T E(\bar{u}^l) + \beta E(\Delta\bar{u}^l)^T E(\Delta\bar{u}^l)   (26)

The control problem defined by system (24) and objectives (25), (26) is diagonal; hence its solution will be a diagonal feedback. Let the i-th diagonal elements of C, S and \hat{K} be denoted c_i, s_i and k_i, respectively. Then J_i and J can be expressed in terms of the solutions of the scalar, closed-loop Lyapunov equations of the individual loops,

J_i = p_i = - s_i^2 k_i^2 / ( c_i^2 (k_i^2 + 2 k_i) )   (27)

J = \Sigma_{i=1}^{N_u} \bar{p}_i = \Sigma_{i=1}^{N_u} - (\beta k_i^2 + 1) / (k_i^2 + 2 k_i)   (28)

for -2 < k_i < 0. The minimization of J boils down to the minimization of the individual contributions \bar{p}_i; hence the optimal feedback gain is found by solving a series of N_u convex (!) scalar optimization problems:

min_{k_i}  - (\beta k_i^2 + 1) / (k_i^2 + 2 k_i)
s.t.   - s_i^2 k_i^2 / ( c_i^2 (k_i^2 + 2 k_i) ) \le \gamma,   -2 < k_i < 0   (29)

By inspection, the solution is given by

k_i = max(k'_i, k''_i)   (30)

where k'_i is obtained by solving the variance constraint p_i \le \gamma to equality, and k''_i minimizes \bar{p}_i:

k'_i = - 2 \gamma c_i^2 / ( \gamma c_i^2 + s_i^2 )   (31)

k''_i = ( 1 - \sqrt{1 + 4\beta} ) / ( 2\beta )   (32)

The feedback gain for the original problem is found by back transformation: K = T_1^{-1} \hat{K} T_2.

6 Example

As an example, we consider the servo control problem for a typical fourth-order mechanical system.

Fig. 3: Singular values of P_1.

The plant and the feedback controller are respectively given by

( A B; C D ) = ( 3.95 .45 .9732 .497 .9 4 ... .5 3 .478 .365 .349 .238 ),   ( A_K B_K; C_K D_K ) = ( .948 .8564 )   (33)

with a sampling time of 0.5 ms. A servo task with a duration of 200 samples is considered. Measurement noise with a covariance of 10^{-4} is assumed. The singular values of P_1 are plotted in Figure 3. The last four singular values of P_1 are computed to be [5.79e-9, 2.26e-9, 5.23e-24, 0]. In accordance with the analysis presented in Section 2.1,
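The per-loop design of Section 5 comes down to a pair of closed-form expressions and a max. The following sketch uses the sign convention -2 < k_i < 0 for the scalar gains; `multiobjective_gain` and the numerical c, s, gamma, beta values are illustrative assumptions:

```python
import math

def multiobjective_gain(c, s, gamma, beta):
    """k = max(k_var, k_lqr): k_var meets the variance bound with equality,
    k_lqr minimizes the deterministic LQR objective. The max picks the more
    cautious (less negative) gain, so the variance bound is always respected."""
    k_var = -2.0 * gamma * c * c / (gamma * c * c + s * s)
    k_lqr = (1.0 - math.sqrt(1.0 + 4.0 * beta)) / (2.0 * beta)
    return max(k_var, k_lqr)

k = multiobjective_gain(c=1.0, s=0.1, gamma=0.1, beta=1.0)
p = -0.1**2 * k**2 / (k**2 + 2.0 * k)       # achieved steady-state variance (c = 1)
print(round(1.0 + k, 3), p <= 0.1)          # stable closed-loop pole, bound satisfied
```

Since the closed-loop pole of each loop is 1 + k_i, the interval -2 < k_i < 0 is exactly the stability region, and a less negative k_i means slower learning but a smaller noise-induced variance.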
this suggests that the leading Markov parameter of the process sensitivity system is zero, which is indeed the case. Also, the very small non-zero singular value rightly indicates the presence of a non-minimum phase zero. Further, from Figure 3 it is obvious that there exists a large input subspace which contributes negligibly to the output. This subspace corresponds to the high-frequency input signals: indeed, the process sensitivity of the closed-loop system has a high-frequency roll-off. We will base our ILC design on a 4th-order norm-optimal approximation of P_1 as in Section 2.1. The servo task is defined as moving with constant velocity from position 0 to position 1 in 25 ms, starting at time 25 ms. Based on these operating specifications we compute a reference signal via a least-squares optimization problem like (16), where we add a set of inequality constraints to account for input saturation (at -4 and +4, respectively). The resulting reference trajectory and feedforward signal are displayed in Figure 4. We now consider the design and implementation of a minimum variance learning controller as described in Section 4. In Figure 5 the convergence of the output and the feedforward are displayed for the case

Fig. 4: Reference trajectory and nominal feedforward.

Table 1: Steady-state input and error variance for the three Kalman designs.

α    | σ²(u)  | σ²(e)
0.03 | 5.76   | 2.8e-2
0.1  | 9.23   | 2.e-2
0.3  | 57.56  | 2.7e-3

Fig. 5: Output and feedforward signal for Kalman ILC design with α = 0.1; solid: trial 1, dashed: trial 2, dotted: trial 10.

of measurement noise and non-zero initial conditions for a choice of F = αI, with α equal to 0.1. The decay of the summed squared error for this design is plotted in Figure 6, together with the decay for two other choices of α, 0.03 and 0.3. The steady-state input and error variances for the three designs are given in Table 1. Note that reducing α yields a significantly decreased effect of the measurement noise on the input variance, however only at the expense of a significant reduction in convergence speed. Next, a multi-objective design is made. In Figure 7 the convergence of the output and the feedforward are displayed for γ = 0.1 and β = 1. Note that, despite the tight variance constraint, a reasonably quick convergence is established. The decay of the summed squared error for designs with respectively γ = 0.01, 0.1 and 1 is plotted in Figure 8. Note that the initial decay is fast for all three designs. Compare this to Figure 6, where convergence speed is largely sacrificed for the sake of a limited closed-loop covariance. This clearly indicates the benefits of the multi-objective design approach. Finally, the closed-loop input and error variances for the three designs are given in Table 2.

Fig. 6: Decay of the summed squared error for the first 20 iterations using the Kalman ILC design; solid: α = 0.03, dashed: α = 0.1, dotted: α = 0.3.

7 Conclusions

In this paper we presented a systematic approach to the analysis and synthesis of Iterative Learning Controllers based on a so-called lifted system representation.
In this representation the dynamics of the learning mechanism are described by a set of difference equations, while the actual plant dynamics transform to a static map in the lifted domain. It was shown that system delays and non-minimum-phase zeros lead to singularity and poor conditioning of this static map, respectively. This can be resolved by using a reduced-order representation of the lifted system. By applying the internal model principle we showed that the integrating learning update law is a good structural choice, since it provides zero steady-state tracking error in the absence of noise for any stabilizing gain. A Kalman-filter-based design method for this learning gain was presented next, which explicitly takes the presence of stochastic initial conditions, process noise, and measurement noise into account. Finally, we presented a multi-objective design approach in which a deterministic quadratic criterion is minimized, penalizing the error as well as the change in the feedforward signal, subject to a bound on the steady-state covariance of every single input. Efficient solutions, based on diagonalizing state and output transformations, were proposed for the two optimal control methods presented. The design strategies were tested on a realistic example, a fourth-order mechanical system. The results clearly demonstrate the potential of the multi-objective design.

Fig. 7: Output and feedforward signal for one of the multi-objective ILC designs with β = 1 (solid, dashed, dotted: successive trials).

Fig. 8: Decay of the summed squared error over the first 20 iterations for the multi-objective ILC design with β = 1. Solid: γ = 0.01, dashed: γ = 0.1, dotted: γ = 1.

γ      σ²(u)   σ²(e)
0.01    0.37   2.9e-2
0.1     3.29   2.4e-2
1      22.48   2.8e-2

Table 2: Steady-state input and error variance for the three multi-objective designs with β = 1.
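The design workflow of the example above — forming the lifted map from the Markov parameters, detecting the delay-induced singularity via the SVD, working with a reduced-order representation, computing a saturation-constrained feedforward by least squares, and running the integrating learning update f_{k+1} = f_k + α e_k under measurement noise — can be sketched as follows. This is a minimal illustration on a toy first-order plant with one sample of delay; the ±4 saturation bound is taken from the text, but the plant, noise level, and gains are placeholders, not the paper's fourth-order model.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)

# --- Lifted plant: lower-triangular Toeplitz matrix of Markov parameters ---
N = 50
h = 0.4 * 0.8 ** np.arange(N)   # impulse response of a toy stable plant
h[0] = 0.0                      # one sample of delay: leading Markov parameter is zero
G = toeplitz(h, np.zeros(N))    # lifted (static) input-to-output map

# The delay shows up as an exactly zero singular value of the lifted map
s = np.linalg.svd(G, compute_uv=False)
print("smallest singular value of lifted map:", s[-1])

# Reduced-order representation: drop the first output and last input sample
Gr = G[1:, :-1]                 # invertible lower-triangular map

# --- Feedforward design: least squares with input-saturation bounds ---
r = np.minimum(np.arange(1, N) / 25.0, 1.0)   # toy ramp-and-hold reference
sol = lsq_linear(Gr, r, bounds=(-4.0, 4.0))   # inequality constraints |f| <= 4
f0 = sol.x
print("feedforward residual:", np.linalg.norm(Gr @ f0 - r))

# --- Integrating ILC update f <- f + alpha * e with measurement noise ---
def run_ilc(alpha, trials=50, sigma=0.01):
    f = np.zeros(N - 1)                       # learn from scratch
    sse = []
    for _ in range(trials):
        e_true = r - Gr @ f
        sse.append(e_true @ e_true)           # true summed squared error
        f = f + alpha * (e_true + sigma * rng.standard_normal(N - 1))
    return np.array(sse)

for alpha in (0.05, 0.2, 0.8):
    sse = run_ilc(alpha)
    print(f"alpha={alpha}: SSE trial 1 = {sse[0]:.2e}, trial 50 = {sse[-1]:.2e}")
```

Running the sweep should reproduce the qualitative tradeoff of Table 1: a larger gain α converges in fewer trials but settles at a higher noise floor, while a small α suppresses the noise amplification at the cost of convergence speed.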
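To complement the conclusions, the deterministic core of the multi-objective update can be sketched as well: at each trial the new feedforward minimizes ||r − G f||² + γ||f − f_k||², which gives the update f_{k+1} = f_k + (GᵀG + γI)⁻¹Gᵀe_k. The covariance constraint and the diagonalizing transformations proposed in the paper are omitted here, and the plant and weights are illustrative:

```python
import numpy as np
from scipy.linalg import toeplitz

# Toy lifted plant without delay, so the static map G is invertible
N = 40
h = 0.6 ** np.arange(N)            # illustrative impulse response
G = toeplitz(h, np.zeros(N))
r = np.ones(N)                     # step reference

def norm_optimal_ilc(gamma, trials=20):
    """Trial-wise minimizer of ||r - G f||^2 + gamma * ||f - f_k||^2,
    i.e. f_{k+1} = f_k + (G^T G + gamma I)^{-1} G^T e_k."""
    L = np.linalg.solve(G.T @ G + gamma * np.eye(N), G.T)   # learning gain
    f = np.zeros(N)
    sse = []
    for _ in range(trials):
        e = r - G @ f
        sse.append(e @ e)          # summed squared error per trial
        f = f + L @ e
    return np.array(sse)

for gamma in (0.01, 0.1, 1.0):
    sse = norm_optimal_ilc(gamma)
    print(f"gamma={gamma}: SSE trial 1 = {sse[0]:.1f}, trial 20 = {sse[-1]:.2e}")
```

A small γ gives an aggressive, fast-converging update, while a large γ damps the trial-to-trial change in the feedforward; this mirrors the fast initial decay observed for the multi-objective designs in Figure 8.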