Simple Learning Control Made Practical by Zero-Phase Filtering: Applications to Robotics


IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: FUNDAMENTAL THEORY AND APPLICATIONS, VOL. 49, NO. 6, JUNE 2002

Haluk Elci, Richard W. Longman, Minh Q. Phan, Jer-Nan Juang, and Roberto Ugoletti

Invited Paper

Abstract—Iterative learning control (ILC) applies to control systems that perform the same finite-time tracking command repeatedly. It iteratively adjusts the command from one repetition to the next in order to reduce the tracking error. This creates a two-dimensional (2-D) system, with time step and repetition number as independent variables. The simplest form of ILC uses only one gain times one error in the previous repetition, and can be shown to converge to zero tracking error independent of the system dynamics. Hence, it appears very effective from a mathematical perspective. However, in practice, there are unacceptable learning transients. A zero-phase low-pass filter is introduced here to eliminate the bad transients. The main purpose of this paper is to supply a journal presentation of experiments on a commercial robot that demonstrate the effectiveness of this approach, improving the tracking accuracy of the robot performing a high-speed maneuver by a factor of 100 in six repetitions. Experiments using a two-gain ILC reach this error level in only three iterations. It is suggested that these two simple ILC laws are the equivalent, for learning control, of proportional and PD control in classical control system design. Thus, what was an impractical approach becomes practical, easy to apply, and effective.

Index Terms—2-D systems, iterative learning control, precision motion control, robotics, zero-phase filtering.

Manuscript received July 16, 2001; revised January 22, 2002. This work was supported in part by the National Aeronautics and Space Administration under Grant NAG 1-649. This paper was presented in part at the Conference on Information Sciences and Systems, Princeton, NJ, 1994. This paper was recommended by Co-Guest Editor M. N. S. Swamy. H. Elci is with Monitor Corporation, Turkey (e-mail: Haluk_Elci@monitor.com). R. W. Longman is with the Department of Mechanical Engineering, Columbia University, New York, NY 10027 USA (e-mail: RWL4@columbia.edu). M. Q. Phan is with Dartmouth College, Hanover, NH 03755 USA (e-mail: MinhQPhan@dartmouth.edu). J.-N. Juang is with NASA Langley Research Center, Hampton, VA 23681 USA (e-mail: jjuang@larc.nasa.gov). R. Ugoletti was with the Lockheed Engineering and Sciences Company, Hampton, VA 23681 USA. Publisher Item Identifier S 1057-7122(02)05602-7. 1057-7122/02$17.00 © 2002 IEEE.

I. INTRODUCTION

A large number of the applications of feedback control in engineering practice are to situations where the aim is to execute the same tracking command repeatedly. Except for very simple commands, this results in repeating deterministic errors in following the specified trajectory. The response of a linear discrete-time feedback control system can be written as a sum of the solution to the homogeneous equation, which includes the influence of initial conditions and is independent of the command, and a particular solution that can be written as a convolution sum over the entire command history. This sum contains the command for the present time step in only one of the terms. Generically, the solution will not be equal to the command at the current time step. Also, in applications it is common to have disturbances that repeat each time the command is given. An example of this is the gravity torque disturbance on a robot link when it follows
a desired trajectory through the workspace. For such situations, iterative learning control (ILC) develops methods to iteratively adjust the command as the task is repeated, nominally aiming to converge to zero tracking error. These iterative methods aim to accomplish this with little knowledge of the system. They examine the tracking error in the response for the last run or repetition, and then adjust the command in the next run. It is assumed that between each run the system is returned to the same initial condition. Repetitive control (RC) is a closely related field in which the command or the disturbance is periodic, and there is no resetting of the initial conditions between successive periods. Practical approaches to ILC and to RC can be very similar, but ILC forms a special type of two-dimensional (2-D) system, whereas RC does not. In ILC, one independent variable is the repetition number or run number, and this variable can be considered to run from zero to infinity, with zero being a run using feedback control alone. The second independent variable is time, and unlike many 2-D systems, this variable has finite duration: the number of time steps in the desired trajectory.

Robots are a prime example of control systems executing the same command repeatedly, and this application formed the basis for the main initial flurry of activity in ILC starting in 1984. Papers that appeared that year include [2]–[4], and also submitted that year was the RC paper [5] with the same motivation. Reference [6] serves as a precursor to this burst of activity, again motivated by robotics. Related early work on multipass processes, which can have ILC as a special case, includes [7], [8], motivated by coal mining. Some additional general references in the field include [9]–[14].

The learning process in ILC can take many forms [7]–[14]. For example: it can be based on integral control concepts from classical control theory, but applied in repetitions. It can be based on contraction mappings in either the time or frequency domains. It can be based on indirect adaptive control theory or

model reference adaptive control theory operating in time, or in repetitions, or both. It can be based on numerical methods for minimization of a function, or numerical methods for root finding. Or one can try to model the system and invert it to find the input needed for the desired output. Examples of each of these can be found in the references in [13] and [14]. Various authors have approached the ILC problem from a 2-D perspective. A discrete-time state-space approach analogous to the mathematical formulation in [15], converted to 2-D, appears in [16]. Reference [17] uses a transform domain approach, applying z-transforms both in time and in repetitions. References [18]–[21] give other examples of 2-D approaches to ILC.

This paper presents, in journal form, experimental results on a commercial robot previously only available in conference proceedings, and it gives a full treatment of the mathematical basis for achieving these results. Section II below presents a general mathematical formulation for linear ILC appropriate for a 2-D perspective, and then it is specialized in Section III to simple forms of ILC that use only one or two gains. The practical difficulties in applying the simplest learning control law are characterized in Section IV, which also presents the zero-phase filtering approach to producing good transients and to robustifying the performance. Section V discusses the application of ILC to robotics and gives experimental results on a commercial robot showing the practicality of the simple ILC designs.

II. MATHEMATICAL FORMULATION AND GENERAL ILC

This section reviews the theoretical background for linear learning controllers as treated in [15], [22]. The formulation is digital, making it natural for implementation, and is made as a state variable formulation in the repetition domain, rather than the time or frequency domain. This gives a general understanding of the range of possible linear learning control laws.

A. The State Variable Model in the Repetition Domain

Consider the general SISO or MIMO linear discrete-time system given by

x_j(k+1) = A x_j(k) + B u_j(k) + w(k),   y_j(k) = C x_j(k)   (1)

where x is the state vector, u is the input vector, and y is the output vector. Vector w(k) is a forcing function representing any deterministic disturbance that appears each time the command is executed. We assume that a feedback controller is operating, so that the matrix A is the closed-loop system matrix and u is the command to the feedback system. This assumption is natural for most applications, but is not important to the mathematical development. A p-time-step repetitive process is considered that starts from the same initial condition each repetition. The solution to (1) is

x_j(k) = A^k x_j(0) + \sum_{i=0}^{k-1} A^{k-1-i} [B u_j(i) + w(i)]   (2)

[Fig. 1. Iterative learning control as a 2-D system.]

Define the difference operator \delta_j that differences values of any variable in two successive repetitions, \delta_j x = x_j - x_{j-1} and \delta_j u = u_j - u_{j-1}. When applied to (2), the first and last terms on the right are eliminated, and the resulting equation can be written in matrix form as

\delta_j \underline{y} = P \delta_j \underline{u}   (3)

where

\underline{u}_j = [u_j(0)^T, u_j(1)^T, ..., u_j(p-1)^T]^T,   \underline{y}_j = [y_j(1)^T, y_j(2)^T, ..., y_j(p)^T]^T   (4)

and P is the lower block triangular Toeplitz matrix of Markov parameters, with CB along the block diagonal, CAB along the first block subdiagonal, and CA^{i-1}B along the ith. The underbars indicate the histories of variables during a repetition. We assume that the product CB is full rank. Note that (3) can be written in the alternate form

\underline{y}_{j+1} = \underline{y}_j + P \delta_{j+1} \underline{u}   (5)

This equation can be thought of as a state-space representation of system behavior in the repetition domain. In addition to the finite nature of the time variable when thinking of ILC as a 2-D problem, the fact that the system matrix in repetitions is the identity matrix makes (5)
a somewhat specialized situation. Note that the input in this state variable representation in repetitions is the change \delta_{j+1} \underline{u} in the learning control signal from the previous repetition. Also, the repetitive disturbances are eliminated in the differencing, making the learning control laws developed equally applicable to systems with or without such disturbances.

B. General Formulation of Linear ILC Laws

Fig. 1 gives a pictorial representation of the 2-D ILC problem, with time step and repetition number as the independent variables. A very general form for linear iterative learning control computes the learning control input at time step k in repetition j as a linear combination of all previous inputs and all previous tracking errors [22]. This includes all previous times in the current repetition, and then the complete input and error histories in all previous repetitions. This can be written in matrix form as

\underline{u}_j = \sum_{i=0}^{j} (L_{j,i} \underline{e}_i + M_{j,i} \underline{u}_i)   (6)

The error at time step k is defined as e_j(k) = y_D(k) - y_j(k), where y_D(k) is the desired trajectory, and \underline{e}_j is defined following the rules for \underline{y}_j in (4). The matrices L_{j,j} and M_{j,j} relate to the current repetition (or cycle) and are sometimes referred to as current cycle ILC. In order to obey causality, i.e., not ask for future errors and inputs, these two matrices must be lower triangular. In the absence of a direct feedthrough term from input to output in (1), it is not, in general, possible to converge to zero tracking error using these two terms alone. The matrices with i < j are related to previous cycles; hence there is no causality issue, and they can be fully populated if desired. When matrices are included for more than the previous repetition or cycle, the learning law is termed a higher order learning law. Reference [22] discusses which of these matrices are considered in various papers in the literature. It also develops relationships indicating that one can produce dynamically equivalent ILC with the current-cycle matrices eliminated. And it is shown that one can create dynamically equivalent learning processes using only the matrices for the previous cycle in (6), eliminating all earlier cycles, but in order to do this, one must allow these two matrices to be repetition varying.

III. THE SIMPLEST FORMS OF ILC: SINGLE-GAIN ILC AND TWO-GAIN ILC

The general ILC law in (6) contains a very large number of possible gains, and the number of possible gains increases with every time step and every repetition. The first purpose of this paper is to treat the simplest form of ILC that uses only one scalar gain (SISO case), developing what is needed to produce good transients of the learning process, and at the same time make the good transients robust to typical singular perturbation model errors at high frequencies. The second purpose is to establish the benefits of including one more learning gain associated with one previous time step.

In words, the simplest form of learning control, a one-gain ILC, does the following in application to the feedback controller for a robot link: if in the last repetition the robot link was 2 deg too low at time step k, then add a constant times 2 deg to the command at time step k in the next repetition. The two-gain ILC examines what happens when one includes one more gain multiplying the error at time step k. Mathematically, these one- and two-gain ILC laws can be written respectively as

u_{j+1}(k) = u_j(k) + \phi_1 e_j(k+1)
u_{j+1}(k) = u_j(k) + \phi_1 e_j(k+1) + \phi_2 e_j(k)   (7)

These can be obtained from (6) by setting L_{j+1,j} = \phi_1 I for the single-gain case; in the two-gain case L_{j+1,j} is an all-zero matrix except for \phi_1 along the diagonal and \phi_2 along the subdiagonal, and all other L and M matrices are zero. Note that the control at time step k in (1) first influences the error at time step k+1, and hence both laws include adjustment to the control action based on the error one step ahead, but in the previous repetition. This dependence for the single-gain ILC is indicated in Fig. 1 by the diagonal arrows. At time step k in repetition j+1, out of all the possible previous data points in the figure, only the data at the circled point is being used. The two-gain ILC adds one more point, the boxed point.
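To make the bookkeeping in (7) concrete, here is a minimal numpy sketch of one repetition of the update. It is not from the paper; the array layout, with e[k] storing the one-step-ahead error e_j(k+1), is our own convention.

import numpy as np

def ilc_update(u, e, phi1, phi2=0.0):
    # u    : command history u_j(0..p-1) applied in the last repetition
    # e    : error history stored so that e[k] = e_j(k+1)
    # phi1 : learning gain on the one-step-ahead error e_j(k+1)
    # phi2 : optional second gain on e_j(k) (two-gain ILC); the first
    #        command has no earlier error, so that term is skipped there
    u_next = u + phi1 * e            # single-gain part along the diagonal of L
    u_next[1:] += phi2 * e[:-1]      # two-gain part along the subdiagonal of L
    return u_next

Setting phi2 = 0 recovers the single-gain law; both cases are one vector operation per repetition.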
Another instructive interpretation of the single-gain ILC is as follows. Note that when one looks at any time step k and solves the difference equation (7), the control action at k simply becomes the learning gain times the sum over all previous repetitions of the errors observed at time step k+1. This is the discrete equivalent of a separate integral of the error for each time step, and corresponds to applying integral control concepts in the repetition domain. Hence, one can refer to this learning law as integral control based ILC.

Now, consider an interpretation of the two-gain ILC. Looking at the first of (7), we might make it analogous to proportional control, in the sense that the error we are trying to fix by choice of u_{j+1}(k) is e_j(k+1), and hence it is natural to pick a control law that is proportional to this error in the previous repetition. Starting from this point of view, it is natural to ask what a PD controller (with proportional plus derivative action) would look like in this context. Since we are in discrete time, the equivalent of derivative action becomes a finite difference between two time steps divided by the time interval. Using a backward difference for this purpose produces the two-gain ILC of (7).

A. Convergence to Zero-Tracking Error Is Independent of System Dynamics

To study convergence to zero tracking error of the learning laws in (7), note that (5) and (7) can be written as \underline{e}_{j+1} = \underline{e}_j - P \delta_{j+1} \underline{u} and \delta_{j+1} \underline{u} = L \underline{e}_j [15]. Combining these equations produces the error propagation equation

\underline{e}_{j+1} = (I - P L) \underline{e}_j   (8)

which indicates that the tracking error tends to zero as j goes to infinity, for all time steps and for all initial conditions, if and only if all eigenvalues of the coefficient matrix I - PL are less than one in magnitude. Both ILC laws in (7) have lower triangular L matrices (in the single-gain case it is also diagonal). Then, the lower triangular nature of matrix P causes I - PL to be lower triangular, and the eigenvalues become obvious as simply the diagonal elements of this matrix (SISO case), i.e.,

\lambda = 1 - \phi_1 CB   (9)

for both single- and two-gain ILC. The range of learning control gains producing convergence to zero tracking error for both ILC laws is then

0 < \phi_1 CB < 2   (10)
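As a numerical illustration of (8)–(10), the sketch below builds the lifted matrix P of Markov parameters for a hypothetical stable closed loop, forms I - \phi_1 P, and confirms that every eigenvalue equals 1 - \phi_1 CB. The model and the numbers are placeholders, not the robot model of this paper.

import numpy as np
from scipy.signal import cont2discrete

# Hypothetical stable closed loop; any (A, B, C) whose discretized CB is
# nonzero illustrates the point.
A = np.array([[0.0, 1.0], [-100.0, -8.0]])
B = np.array([[0.0], [100.0]])
C = np.array([[1.0, 0.0]])
T = 0.01                                     # sample period
Ad, Bd, Cd, Dd, _ = cont2discrete((A, B, C, np.zeros((1, 1))), T)

p = 50                                       # time steps in one repetition
markov = [(Cd @ np.linalg.matrix_power(Ad, k) @ Bd).item() for k in range(p)]
P = np.zeros((p, p))                         # lower-triangular matrix in (3)
for k in range(p):
    P[k, :k + 1] = markov[k::-1]             # row k holds CA^k B, ..., CAB, CB

phi1 = 1.0                                   # learning gain
eigs = np.linalg.eigvals(np.eye(p) - phi1 * P)
print(np.allclose(eigs, 1.0 - phi1 * markov[0]))   # True: all equal 1 - phi1*CB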

[Fig. 2. A 7-degree-of-freedom Robotics Research Corporation robot.]

Note that this condition has the somewhat amazing property that convergence to zero tracking error is independent of the system dynamics contained in matrix A. One implication is that, as far as convergence is concerned, there is no need for stability in the time dimension of the 2-D problem, and of course there is no need to know the matrix A. Note also that the matrix B is a discrete-time input matrix, and if it comes from a continuous-time system fed by a zero-order hold, then it tends to zero as the sample time goes to zero. Hence, as the sample time goes to zero, the range of learning gains producing convergence to zero tracking error goes to infinity. For the robot problems discussed below, which use a sample rate of 400 Hz, the range of gains was from 0 to 91. Many robots use a sample rate of 1000 Hz, and then the range would be roughly two and a half times larger. On the other hand, a natural choice of the learning gain, based on the response of a feedback control system to a constant input, is one over the dc gain of the feedback control system. For typical robot-link controllers the dc gain is one, so that this choice of gain produces the control law: when the link was 2 deg too low last repetition, add 2 deg to the command. Going to higher gains than this will cause overcorrection of any constant error component, and likewise for other low-frequency components. For example, at the upper limit of 91, the correction for the link being 2 deg too low is to increase the command in the next repetition by 182 deg. This indicates a very wide margin for convergence. No reasonable engineer would think of making such a gigantic adjustment for a 2-deg error, but the control law still converges under such circumstances. We comment that these same kinds of results apply when using this same learning law to achieve zero tracking error at the sample times in nonlinear systems. The main condition needed is that the nonlinear system satisfy a Lipschitz condition [23].

B. Bad Learning Transients

The single-gain learning law was applied to the robot shown in Fig. 2 and described later in more detail. The RMS tracking error in following the prescribed trajectory decreased by 35 dB, a factor of roughly 50, in about nine repetitions. When the repetitions were extended past 9, going up to 15, the RMS error was increasing, and by repetition 15 the robot was making so much noise that we were afraid to continue for fear of damaging the robot. Note that this factor of 50 is a very substantial improvement in tracking performance, and in practice one might use this law to reach this error level and then freeze the learning process [24].

[Fig. 3. Experimental and identified model frequency responses.]

However, the mathematics guarantees convergence to zero error. If only we could continue the experiments a few more repetitions, maybe we could see the convergence. It is of interest to know what would have happened. For this, we can turn to simulation [25]. The command-to-response behavior of the feedback control system for each link of the robot could be modeled reasonably well by the third-order system

G(s) = [a/(s + a)] [\omega_n^2/(s^2 + 2\zeta\omega_n s + \omega_n^2)]   (11)

according to the experimental Bode plot given in Fig. 3. Here, a = 8.8 rad/s (1.4 Hz), \zeta = 0.5, and \omega_n = 37 rad/s (5.9 Hz). Simulating the guaranteed-stable learning process for the desired trajectory of over 6 s, using the robot's 400-Hz sample rate, resulted in exponential overflow on the computer.
This exponential overflow is just a bad learning transient on the way to zero tracking error (see below and [26] for explanations of this phenomenon from several points of view). To make the transient small enough that the computer could handle it, we decreased the length of the desired trajectory to 1 s instead of 6 s, and we changed the sample rate to 100 Hz. Also, we simplified the trajectory slightly. Then, the computer could simulate the learning process, and the results are shown in Fig. 4 [13], [25].

[Fig. 4. Simulation of the learning transients of integral control based learning applied to a robot link.]

The RMS error initially decreased as shown in the inset, from 0.4330 to 0.1402 rad at repetition 7. Then, the error increased to a maximum at repetition 62, and a numerical zero is reached at repetition 132. Clearly, in spite of the guaranteed convergence of this learning law, it is not practical.
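This transient can be reproduced qualitatively in a few lines. The sketch below discretizes the third-order model (11) at the 100-Hz rate used for Fig. 4 and propagates the error through (8). The initial error history and the printed repetitions are illustrative choices, so the numbers will not match Fig. 4 exactly.

import numpy as np
from scipy.signal import tf2ss, cont2discrete

a, zeta, wn = 8.8, 0.5, 37.0                 # parameters of the fit in (11)
num = [a * wn ** 2]
den = np.convolve([1.0, a], [1.0, 2 * zeta * wn, wn ** 2])
A, B, C, D = tf2ss(num, den)

T, p = 1.0 / 100.0, 100                      # 100-Hz rate, 1-s trajectory
Ad, Bd, Cd, Dd, _ = cont2discrete((A, B, C, D), T)

markov = [(Cd @ np.linalg.matrix_power(Ad, k) @ Bd).item() for k in range(p)]
P = np.zeros((p, p))
for k in range(p):
    P[k, :k + 1] = markov[k::-1]

phi1 = 1.0
M = np.eye(p) - phi1 * P                     # error propagation matrix of (8)
e = np.sin(2 * np.pi * np.arange(p) * T)     # some illustrative initial error
for j in range(151):
    if j % 25 == 0:
        print(j, np.sqrt(np.mean(e ** 2)))   # RMS shrinks, then grows enormously
    e = M @ e

All eigenvalues of M are inside the unit circle, yet the matrix is far from normal, which is exactly why the guaranteed convergence coexists with enormous intermediate growth.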

C. Understanding the Convergence Mechanism and the Bad Learning Transients

Examine the components of (8). Because of the lower triangular nature of the matrix, the first component of the error satisfies

e_{j+1}(1) = (1 - \phi_1 CB) e_j(1)

This formula shows that the error at the first time step will converge monotonically, without any bad transients, provided that (10) is satisfied. And it will do so independent of the feedback control system dynamics in matrix A. The second component of the error satisfies

e_{j+1}(2) = (1 - \phi_1 CB) e_j(2) - \phi_1 CAB e_j(1)

for the single-gain case. One can write the corresponding equations for the two-gain case and see analogous properties. Once the error in the first time step has converged, this equation looks just like the first time step did, and from then on it will converge monotonically (and independent of matrix A). The corresponding equation for time step k will have k - 1 such extra terms. Hence, convergence occurs in a wave starting at the first time step and progressing through all time steps.

The potential for bad transients becomes clear if we note that at the beginning of the trajectory it is reasonable to add 2 deg to the command if the link was 2 deg too low. But at time step k, many steps into the new trajectory, there have been corrections in the command made for all previous steps. In the first repetition, the fact that there was a 2-deg error last repetition at step k is most likely no longer relevant after having changed the command to the system for all time steps up to step k. The error at this step last repetition only becomes clearly relevant once the wave of convergence to zero error has arrived at this time step, so that inputs at previous times are no longer changing. And until such a time is reached, the extra terms can add up to a large error.

Now, to see that not only can they add up to a large number, but in most physical systems they will do so, examine the learning process in the frequency domain. Suppose that the desired trajectory is long enough that there is a significant portion that is past the settling time of the system, so that we can think in terms of steady-state frequency response. For continuous-time feedback control systems the number of poles is normally larger than the number of zeros, and the pole excess over the number of zeros, times 90 deg, is the phase lag of the system at high frequency. If the sample rate is high enough that the system dynamics is reasonably represented in discrete time, one will see similar phases in the discrete-time system. Hence, at some frequency almost all systems will have a phase lag of 180 deg. Now, instead of considering simply a 2-deg error at a given time step, consider a sinusoidal error of amplitude 2 deg at this frequency. The single-gain learning law in (7) says that we should add the learning gain times this sinusoid to the command for the next repetition. When this sinusoid is sent through the feedback control system, the 180-deg phase change is the equivalent of having changed the sign on the correction term. Hence, the correction will add to the error instead of attenuating the error. From a steady-state frequency response point of view, the system looks unstable. It is only the convergence in a wave described above that saves the convergence: the wave of convergence means that eventually there is no part of the finite-time trajectory that can be described by the steady-state frequency response thinking above.

This thinking suggests that the boundary between improving the error and making it worse might happen around 90 deg of phase lag, halfway between 0 deg, where the correction fully subtracts from the error, and 180 deg, where it fully adds to the error. The actual dividing line is more restrictive than this, and is defined by a unit circle in a Nyquist plane described below. Clearly, what is needed is a learning process that learns in a more uniform way across the total time interval
of the desired trajectory, in spite of the fact that the convergence mechanism that defined the convergence boundary relies on convergence in a wave. And furthermore, we want this to be true in spite of the fact that the learning law under consideration, (7), only looks at one or two points for each time step: the circled point, or the circled and boxed points, in Fig. 1.

IV. FIXING THE BAD LEARNING TRANSIENTS

A. A Frequency Response Based Good Transient Condition

To view the learning process in the frequency domain, take the z-transform of (1) and (7), together with the definition of the error, to obtain the analog of (8). This produces \delta_{j+1} E(z) = -G(z) \delta_{j+1} U(z) and \delta_{j+1} U(z) = \phi_1 z E_j(z), where G(z) is the command-to-output transfer function of the feedback control system. Combining these equations gives

E_{j+1}(z) = [1 - \phi_1 z G(z)] E_j(z)   (12)

This equation is for the single-gain ILC case. For the two-gain ILC law, simply substitute \phi_1 + \phi_2 z^{-1} for the \phi_1 in these equations (and note that this moves a zero around and hence can be thought of as including a compensator). Now, note that 1 - \phi_1 z G(z) appears as a transfer function from the error at one repetition to the error at the next repetition. We ask that the system (nominally a feedback control system) be asymptotically stable, so that after transients one reaches steady-state frequency response. Then, if the transfer function amplitude is less than one for all frequencies up to Nyquist

|1 - \phi_1 e^{i\omega T} G(e^{i\omega T})| < 1   or   \bar{\sigma}[I - e^{i\omega T} G(e^{i\omega T}) \Phi] < 1   (13)

then the amplitude of every frequency component of the error will decay monotonically with repetitions, for those parts of the trajectory for which steady-state frequency response thinking applies. The first form of (13) is for the SISO case. The second form we cite to indicate how the approach generalizes to MIMO systems, with \Phi the matrix of learning gains; here \bar{\sigma} indicates the maximum singular value. The T is the sample time interval.
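Condition (13) is easy to check numerically once a frequency response is in hand. The sketch below evaluates the left-hand side of (13) for the third-order model (11) at the robot's 400-Hz rate and reports roughly where the plot first leaves the unit circle; the gain and frequency grid are illustrative.

import numpy as np
from scipy.signal import cont2discrete

a, zeta, wn = 8.8, 0.5, 37.0                 # model (11)
num = [a * wn ** 2]
den = np.convolve([1.0, a], [1.0, 2 * zeta * wn, wn ** 2])
T = 1.0 / 400.0                              # the robot's 400-Hz sample rate
numd, dend, _ = cont2discrete((num, den), T) # zero-order-hold equivalent

phi1 = 1.0
f = np.linspace(0.1, 200.0, 4000)            # up to the 200-Hz Nyquist frequency
z = np.exp(1j * 2 * np.pi * f * T)
G = np.polyval(numd[0], z) / np.polyval(dend, z)
decay = np.abs(1.0 - phi1 * z * G)           # left-hand side of (13)
bad = f[decay >= 1.0]
print("condition (13) first violated near %.1f Hz" % bad[0])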

[Fig. 5. Dividing the time interval of the desired trajectory into regions.]

Fig. 5 gives an understanding of the status of this condition. Region 1 refers to the settling time of the system, often considered as four times the longest time constant, and it is the amount of time needed before the effects of initial conditions become small and steady-state frequency response thinking could start to apply. Hence, the monotonic behavior from (13) could start to apply in Region 2. In the learning control problem there is another transient involved, the learning process that converges in a wave as described above. Region 3 represents the time interval where this wave has already produced near zero error. References [27], [14] suggest the frequency response condition (13) as a condition for producing good learning transients, as described here. These good transients are obtained in the smaller of Regions 2 and 4, where poor transients were encountered, and they apply in a uniform way across this interval as desired. These two references also show that satisfying condition (13) guarantees that the true stability condition (10) is satisfied. Thus (13) is a sufficient condition for stability, guaranteeing convergence even if the settling time is so long that Region 2 does not exist. Of course, in that case there is no statement about monotonicity of the learning process.

B. Producing Good Transients by a Frequency Cutoff

In practice, it is unlikely that there is any learning gain that will satisfy (13) for all frequencies. The reasons are the same as in the frequency response thinking at the end of Section III. The inequality boundary in (13) can be viewed as a unit circle centered at +1 in the Nyquist polar plot of \phi_1 e^{i\omega T} G(e^{i\omega T}), with \omega running from zero to Nyquist. Frequencies for which the inequality is satisfied produce monotonic decay of the associated frequency component of the error, and the decay factor is the radial distance from +1 to the point on the Nyquist plot for that frequency. For a reasonable learning gain and a reasonable feedback control system, the plot will always start out inside the unit circle at frequency zero. As the frequency increases, typical systems develop phase lag, and once the lag reaches 90 deg, the plot is definitely outside the unit circle, so amplification of error components starts by the time the lag reaches 90 deg. And the radial distance for any frequency that puts the plot outside indicates the amount by which the error at that frequency is amplified each repetition (for steady-state response parts of the trajectory).

A fix to the amplification problem, i.e., the poor transients problem, is then to simply cut off the learning for frequencies outside the unit circle. This produces good learning transients at the expense of no longer asking for zero error, i.e., not trying to fix components of the error at frequencies outside the unit circle. We will later argue that such a cutoff is also needed in practice for purposes of robustness of good transients in ILC, and for purposes of not working the hardware too hard. We will also consider the improvement in performance available by using the simple two-gain learning control as a compensator to keep the plot inside the unit circle to higher frequencies.

A close to ideal filter can be generated using, for example, a high-order Butterworth filter, and applying it in a zero-phase manner. This means that the filter is run through the data going forward in time, producing attenuation above the cutoff, and also introducing some phase lag. Then, the resulting signal is filtered again, but the filtering is done in reverse time. This produces more attenuation above the
filter cutoff, and the reversed time introduces phase lead that cancels the phase lag of the previous filtering. There are some subtleties in handling the initial conditions for the filters, as discussed in [28]. Reference [29] gives various ways of representing the zero-phase filtering, using matrices and transforms. In matrix form, using F to represent the zero-phase filter, the single-gain learning control law becomes

\underline{u}_{j+1} = F(\underline{u}_j + \phi_1 \underline{e}_j)   (14)

where \underline{e}_j collects the one-step-ahead errors e_j(k+1) as before. Using the transfer function version, the equivalents of (12) and (13) become

E_{j+1}(z) = F(z)[1 - \phi_1 z G(z)] E_j(z) + [1 - F(z)] V(z)   (15)

|F(e^{i\omega T})| |1 - \phi_1 e^{i\omega T} G(e^{i\omega T})| < 1   (16)

where V(z) collects the command and repeating disturbance terms. From (16), it is clear how one picks the cutoff to make all steady-state frequency components of the error monotonically nonincreasing in (15). Looking at (16), it appears that a causal low-pass filter instead of a zero-phase filter could produce the desired monotonic behavior. But the final error levels reached are determined by the sensitivity transfer function from command and disturbance to error [29]

E_\infty(z) = {[1 - F(z)] / [1 - F(z)(1 - \phi_1 z G(z))]} V(z)   (17)

The zero-phase filtering is needed so that 1 - F(e^{i\omega T}) looks like zero below the cutoff frequency. To whatever extent it is nonzero, there is a forcing function that produces nonzero error after convergence, even for frequencies below the cutoff. This equation can be used to predict how much error will remain as a result of cutting off the learning, once the learning process has reached steady-state behavior. The law (14) presumes that one filters the total signal being applied to the feedback controller. Another option is to filter just the learning part of the signal and keep the original command unfiltered. Which gives better final error levels depends on the original frequency distribution of the error, and can be predicted by comparing the corresponding sensitivity transfer function formulas [25].
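In an off-line digital implementation, the forward-backward filtering in (14) is a standard operation; for instance, scipy's filtfilt applies a filter in both time directions. The following sketch of one learning update is ours, not the paper's implementation, and it ignores the initial-condition subtleties of [28] that a careful implementation must handle.

import numpy as np
from scipy.signal import butter, filtfilt

def zero_phase_ilc_step(u, e, phi1, f_cut, f_samp, order=5):
    # u : command history from the last repetition
    # e : error history, stored so that e[k] = e_j(k+1)
    # Forward pass then backward pass: the phase lag of one pass is
    # cancelled by the phase lead of the other, leaving attenuation only.
    b, a = butter(order, f_cut / (f_samp / 2.0))   # digital low-pass filter
    return filtfilt(b, a, u + phi1 * e)            # filter the total command (14)

# e.g., a 10-Hz cutoff at the robot's 400-Hz sampling rate:
# u_next = zero_phase_ilc_step(u, e, 1.0, 10.0, 400.0)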

C. On the Need for the Cutoff for Robustness

We have introduced a frequency cutoff in order to produce practical learning transients. Instead, designers might be tempted to work hard at making a dynamic learning law, replacing the single gain, that satisfies condition (13) for all frequencies, based on whatever model they have. It is suggested here that a frequency cutoff will normally still be needed in practice for robustness of the good transients, i.e., to prevent possible long-term bad transients. Suppose that one is able to cancel the transfer function of the system model, i.e., the learning gain is replaced by the reciprocal of the feedback controller transfer function model. Provided the model was perfect (and has a stable inverse, which is usually not the case in discrete-time systems), this would make the left-hand side of (13) equal to zero, satisfying the inequality. For any system model, there are some dynamics that are not included. This can be an extra vibration mode, an extra pole that is hard to identify, an amplifier that is not quite a perfect gain at high frequencies, a body assumed rigid that is slightly flexible, etc. The learning control law (7) insists on getting zero tracking error, so it keeps pushing to eliminate all error. An extra parasitic pole in continuous time introduces an extra 90 deg of lag in the Nyquist plot. An extra 90 deg will send the plot outside the unit circle, violating (13). The learning process is unforgiving: if it sees some error, no matter how small, it will start working on it. And if that error, although perhaps negligible from an engineering point of view, is associated with this much extra phase lag, the steady-state frequency response part of the signal (Region 2 or 4) will have error components that grow monotonically. Our claim is that in the context of learning control, singularly perturbed systems are generic in real-world applications, and these perturbations are a fundamental issue in obtaining good transient performance.

This sensitivity is in contrast to stability of feedback control systems, where errors of the system model at high frequencies approaching Nyquist are normally irrelevant to system stability. Stability is determined at lower frequencies, where the Bode magnitude plot crosses the zero-dB line, determining the phase margin. With a positive phase margin, the phase and magnitude plots at higher frequencies can do whatever they want, provided they do not return to 0 dB, and the system is still stable.

The amount of initial error associated with these parasitic (unmodeled) poles could be very small, making it take a long time before the growth is evident. In one set of repetitive control experiments it took about 2650 repetitions before growth was evident [13]. The only thing that stops the dynamics of parasitic poles from producing unwanted growth of the error is the quantization level in digital-to-analog and analog-to-digital converters. Once the error that would normally grow is below the last digit retained, the learning process can no longer accumulate this error [30].

In addition to the above considerations, one is likely to want a cutoff in order to be considerate of the hardware. When one asks for zero tracking error at all frequencies up to Nyquist, significant components of the error far above the bandwidth of the feedback control system require a very large corrective signal. This can wear out the hardware quickly, and may use substantial energy.

D. Simplicity of Tuning

Consider the single-gain ILC (the two-gain case will be discussed later). This law involves only one gain and the choice of a cutoff frequency, i.e., there are only two parameters to adjust. We suggest that this simplicity makes the single-gain law in learning control analogous to proportional control in classical control system design: the simplest possible design, one that can be applied routinely. And if it gives the desired performance, the design
is done. If it does not, one can go to a more sophisticated law with more gains. To design an ILC, one picks whatever order Butterworth (or other low-pass) filter one wants. Then, a learning gain of one over the dc gain of the feedback control system is natural, in that it fixes constant errors in one repetition (in steady state). A higher gain overcorrects for such errors and therefore is not a logical choice in practice. A smaller gain can be desirable because, once steady-state operation is reached as the repetitions progress, it reduces the fluctuations in the error that occur as a result of random effects going through the learning law. Also, a smaller learning gain can allow a somewhat higher cutoff [13]. With these considerations it is easy to pick a reasonable learning gain. Note that there is nothing crucial or sensitive in this process.

Now, consider how one picks the cutoff. It can be done a priori in the design process, or it can easily be adjusted empirically. In most situations, a priori determination of the cutoff is best made by performing a frequency response test on the feedback control system (whoever designed the feedback control system is likely to have already made such a plot). This is often done by feeding in a white noise signal. Then, one has an experimental graphical representation of G(e^{i\omega T}), and one can simply use it in (13) to determine the upper limit on where the cutoff has to happen. Then, check that it works with the chosen zero-phase low-pass filter in (16), lowering the cutoff until it does. To adjust the cutoff empirically, start with a high cutoff and run the system; if the RMS error does not behave monotonically, reduce the cutoff until it does. One can even make a self-tuning learning controller to do this [33]. Another simple approach starts with a high cutoff, and when the error starts to grow, examines the frequency content using a discrete Fourier transform to determine what frequencies are growing, and then sets the cutoff below these frequencies; a sketch of this bookkeeping is given below. Hence, one really needs no modeling to use this learning control law. The tuning of the two parameters is extremely easy and straightforward. And the only drawback is that the amount of the error that is eliminated, the error below the cutoff, is determined by nature. As in the design of classical controllers, one starts simple and makes more sophisticated controllers only if the desired error levels are not reached. In this case, if desired, the next step is to design a compensator that keeps the plot inside the unit circle to higher frequencies, allowing one to use a higher cutoff.
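A sketch of that DFT-based bookkeeping follows; the function name, the growth test, and the safety margin are illustrative only, not a procedure prescribed by the paper.

import numpy as np

def lower_cutoff_if_growing(e_prev, e_last, f_samp, f_cut, margin=0.8):
    # If the RMS error grew between the last two repetitions, find the
    # frequency bin whose amplitude grew the most and move the cutoff
    # below it; otherwise keep the current cutoff.
    if np.sqrt(np.mean(e_last ** 2)) <= np.sqrt(np.mean(e_prev ** 2)):
        return f_cut                          # error still monotonic
    growth = np.abs(np.fft.rfft(e_last)) - np.abs(np.fft.rfft(e_prev))
    freqs = np.fft.rfftfreq(e_last.size, d=1.0 / f_samp)
    return min(f_cut, margin * freqs[np.argmax(growth)])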

E. More Comments on Good Transients, Frequency Cutoffs, and Two-Gain Tuning

Consider the use of the single-gain ILC on first-order systems. The Nyquist plot related to (13) can easily stay within the unit circle, indicating that there is no problem of bad transients for true first-order systems. And this is close to the only category of systems for which this can be said, the other option being higher order systems with a pole excess of one in continuous time. As soon as one goes to a second-order system (or a pole excess of two or more), the phase lag should be enough to go out of the unit circle using any reasonable Nyquist frequency. However, the comment on still needing to use a frequency cutoff in practice still applies: any parasitic pole that is actually present will make the system have a larger phase lag, and exhibit bad transients in practice.

Now, consider the two-gain ILC applied to first-order systems. Solving a first-order system for the control input that will produce zero error in the next time step produces a two-gain control law. Seen another way, the inverse of the system matrix P is a matrix with the structure of the two-gain learning matrix. Hence, if we use this inverse as the two-gain learning matrix, the left-hand side of (13) becomes zero (if the model is correct), and the inequality for good transients is satisfied. Moreover, the error decays to zero after one repetition according to (12). So, the two-gain law can convert the asymptotic convergence of the single-gain law into convergence in one repetition for first-order systems. The same comments on the need for a cutoff due to singular perturbations apply again.

What can the two-gain controller do for higher order systems? Many feedback control systems have a Bode plot with a real pole coming before any complex conjugate poles. In this case, we can tune the second gain to cancel this part of the dynamics in the frequency response. Closed-loop controllers for robots will almost always have this form, with the real pole being introduced by the designer to limit the bandwidth and keep from exciting vibrations. This suggests that we tune the two-gain ILC as follows: produce an overall gain related to the reciprocal of the feedback control dc gain as before, and then adjust the second gain correspondingly so that it cancels this first pole of the system.
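One plausible reading of that tuning rule, as a sketch only (the paper does not give these formulas): choose \phi_1 + \phi_2 z^{-1} proportional to 1 - p z^{-1}, which places a zero of the learning law on the discrete image p = e^{-aT} of the real pole at s = -a, and normalize so that the dc correction remains one over the dc gain.

import numpy as np

def two_gain_from_first_pole(a, T, dc_gain=1.0):
    # p is the discrete-time location of the real closed-loop pole at -a;
    # phi1 + phi2*z^{-1} ~ (1 - p*z^{-1}) cancels it, and the dc value
    # phi1 + phi2 = 1/dc_gain keeps the constant-error fix of Section IV-D.
    p = np.exp(-a * T)
    phi1 = 1.0 / (dc_gain * (1.0 - p))
    phi2 = -p * phi1
    return phi1, phi2

# e.g., for the 8.8-rad/s pole of (11) at the robot's 400-Hz rate:
print(two_gain_from_first_pole(8.8, 1.0 / 400.0))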
V. ILC AND ROBOTICS

A. ILC Literature Related to Robotics

As mentioned, the motivation for the flurry of activity in learning control starting in 1984 was robots doing repetitive operations [2]–[6]. Since that time there have been numerous experimental works applied to robot manipulators, e.g., [1], [32]–[47]. The approaches focused on continuous-time [32]–[37], discrete-time [38]–[41], or frequency domain formulations [42], as well as neural networks [43]. Of course, final implementations have all been digital, and this motivates starting with a digital approach as is done here.

The literature often takes the following approach. The standard form of the second-order nonlinear dynamic equations for multiple rigid body motion is used. Typically, a conventional fixed-gain controller such as a proportional plus derivative controller is employed for purposes of assuring stability, but not high accuracy [2], [32]–[38], [42], [43]. The learning control is used to generate a torque feedforward signal that is added to the torque signal from the feedback PD control to improve tracking accuracy. Some authors rely on a linearized model between the torque feedforward signal and the remaining tracking error to be corrected [2], [33]–[36], [39], [42]. Others use an additional nonlinear feedforward signal [32], [37], [38], [40] or a neural network to assure convergence for the nonlinear robot equations.

In contrast to the common use of a torque feedforward signal in the literature, we apply the learning concept to the input-output relation of the feedback control system, and make the learning control generate an altered position command trajectory to improve the tracking error. The emphasis is put on the input-output dynamics of the feedback control system, rather than the torque-to-position properties of the robot equations. This approach is simple to implement, since it only requires having access to the feedback data and modifying the command, which can be done in software. Injecting a torque feedforward signal into an existing controller [42] can be more difficult. Perhaps the common use of torque-to-output learning control in the literature is due to the emphasis on the full nonlinear robot dynamic equations. However, the approach used here, when augmented with a compensator [27], [44], can get quite close to the reproducibility level of the robot hardware. This means that use of simple linear thinking can get close to the theoretical limit of the learning process, and no use of more complicated robot models can possibly produce significantly better results. Unlike many of the references, no additional sensor information is needed, such as velocity and acceleration measurements, nor are the derivatives of the desired trajectory employed.

The literature talks about P-type and D-type learning control, where the former works on the position error while the latter looks at the derivative of the position error. This issue arises because in continuous time the robot equations for a single link, from torque input to position output, have the property that the product of the continuous-time C and B matrices is zero, making P-type learning fail. This issue does not appear in the approach used here, which focuses on the command-to-output response of the robot feedback controllers and uses only position feedback information in the updates.

This paper makes the journal presentation of experiments reported in [1]. Other papers presenting experiments done by the authors on the same robot include [27], [44], and [45]. References [46] and [47] are more recent works taking approaches more closely related to those of the authors.

B. The Robot Dynamics

The experiments were performed on a redundant 7-degree-of-freedom, all revolute joint Robotics Research Corporation K-series 807iHP manipulator (see Fig. 2) with a Type 2 robot controller (maximum workspace radius without tool is 0.89 m, maximum payload 20 lb). All information about the robot parameters, including inertias, damping values, etc., was available from the manufacturer, including the control loop parameters. The authors developed a full-scale nonlinear dynamic model of the system for dynamic simulations in the DADS (dynamic analysis and design system) commercial program package, combined with the control loops. The resulting model is quite complex, and any learning control law that seeks to use the full nonlinear equations, as suggested in the literature, will be quite cumbersome. If the flexibility effects of the harmonic drives are taken into consideration, the system order becomes 28, without including the rather high-order analog controllers for each joint. These controllers make extensive use of compensators, and not only have position and velocity feedback loops, but also a current loop on the amplifiers for the brushless dc motors, and a torque loop driven by

a strain-gage feedback from the output side of the harmonic drive. This allows influence over the flexibility of the harmonic drives, and is used to compensate gravity and reduce the influence of frictional effects. The maximum joint speeds range from 55 deg/s for the largest link to 180 deg/s for the smallest. The maximum sampling rate is 400 Hz, which is used in the learning control experiments discussed below.

Much of the iterative learning control literature for robotics seeks to use the full nonlinear multibody dynamics equations. Let us think about what would be involved in applying these results. First, the main results only apply to multiple rigid body models, and would not normally be applied with the flexibility of the harmonic drives included. This flexibility, characteristic of nearly all commercial robots, produces vibration frequencies, and these vibrations are in fact the main difficulty facing the control system designer, normally causing him to pick a bandwidth that cuts off control action before the first vibration mode frequency. One can still use the nonlinear ILC results with this flexibility, if one includes the inertias of the armatures of each motor as extra rigid bodies. Then, the control laws ask for angle encoders not only on the links but also on the armatures, which are not likely to be available. Further difficulty occurs because the existing controllers from the manufacturer do not have the form of multibody dynamics equations, and hence they must be replaced by whatever the learning law asks for. Most likely, current loops, torque loops, gravity compensation, and any filters and compensators that the experienced classical control system designer puts into his feedback design would be a challenge to incorporate into the nonlinear ILC design mathematics and still be able to prove convergence. And finally, the control law would require real-time computation of nonlinear dynamics of at least 28th order for our 7-degree-of-freedom robot. This is all a big task, and would only be undertaken in practice if it offered significant benefits over simpler approaches. Our robot experimental results [27] show that one can get close to the reproducibility level of the robot without this complexity.

Here we apply the single-gain and the two-gain ILC laws to the Robotics Research Corporation robot. These laws are implemented in a decentralized manner, applying a separate independent learning control to the feedback controller for each link of the robot. The cutoff frequency is picked based on frequency response tests of the individual link controllers. Experimental responses were obtained for 18 small-amplitude sinusoidal inputs to one of the joints, and the results are shown in Fig. 3. All other joints were active, but had zero position command inputs. No other information about the robot and its dynamics is needed to tune the controllers. And as mentioned above, we do not really need this information either; we could simply tune the cutoff purely empirically, cutting it down until bad transients are eliminated. There is, of course, no need to fit the Bode plot with a transfer function model, but we note that it can be fit by the third-order model of (11). Essentially the same transfer function was obtained for all joints. This is related to the fact that when a robot manufacturer tunes the controllers for each joint, it is logical to have each of them exhibit the same time constant. Since all joints must work together to execute
the robot tasks, there is no point in having some joints with significantly faster time constants than others.

It is instructive to interpret the third-order transfer function that fits the data. The first-order pole was introduced by the gains of the joint control loops and produces a bandwidth of 1.4 Hz. The control system designer has introduced this to attenuate the control actions well before the first resonance in the system, to avoid exciting vibrations. Since all joints will normally participate in the first vibration mode of the system, each joint needs the same bandwidth to limit the excitation of this mode. The second-order term is attributed to the first vibration mode of the 14-degree-of-freedom multibody system, having flexible interconnections between the bodies in the form of harmonic drives (here we must include the seven inertias of the motor armatures on one end of each drive).

Some comments on this linear systems approach to designing an ILC for robots that are governed by nonlinear equations:

1) Perhaps the majority of nonlinear systems being controlled in the world have controller designs made using linear theory. An example is the design of feedback controllers for each joint of a robot. If it works, one cannot complain.

2) We will see that a factor of 100 improvement in RMS tracking error is obtained fast and easily with these two simple linear ILC laws, and in [27] including a second-order compensator gains about another factor of 10. This is very close to the reproducibility level of the robot, so there is no way that a more complicated robot model could significantly improve performance, and, therefore, no motivation to try to implement more complicated approaches.

3) The vast majority of robots use harmonic drives, and these are likely to have gear ratios such as 80 to 1 or 160 to 1. The gear ratios appear squared in the nonlinear dynamic equations, and make the coupling terms between bodies significantly smaller by comparison to the non-coupling terms. This helps to understand why one can use separate uncoupled learning controllers for each joint (and also why the robot manufacturer uses uncoupled feedback control designs for each joint).

4) One might ask: is the Bode-plot information so independent of input amplitude that we can rely on one such plot and use linear thinking? The experiments were run with small-amplitude sinusoidal inputs. But this was almost a necessity. Once the frequency gets up a bit, to say 10 Hz, moving mass with substantial amplitude requires a lot of energy. The kinetic energy in an oscillating mass with a given amplitude goes up as the square of the frequency. Hence, above the really low frequency range, the amplitudes one encounters in operation are necessarily small, and linear thinking becomes reasonable. Also, the vibration frequencies are not seriously changed by moving the robot location in the workspace.

5) The experimental results suggest that it is the presence of vibration modes that is the most severe limiting factor in obtaining good robot tracking, and that the dynamic nonlinearities are not the most basic limitation to performance.