
Chapter 4

Feedback-error control

4.1 Introduction

This chapter explains the feedback-error (FBE) control scheme originally described by Kawato [, 87, 8]. FBE is a widely used neural-network-based controller which learns to generate the correct control inputs to drive a system's state variables along a desired trajectory. It has been successfully used in a number of control applications, including robot arm control [87], an automatic car braking system [92], and learning nonlinear thruster dynamics to control an underwater robotic vehicle [11]. FBE was originally intended by Kawato to be an abstract model of some of the low-level motor areas of the brain, although in the engineering context it has outgrown its biological relevance. The motivation for studying FBE here is that it is a precursor to the FOX controller, with some of the same operational properties (although FOX is far more flexible).

4.2 The basics

This section describes the essentials of FBE control. The arguments are based around a continuous-time system, but identical arguments apply to discrete-time systems. The typical FBE system is shown in figure 4.1. The system being controlled, F, obeys the dynamical equation:

    ẏ = F(y, x)    (4.1)

In general no restrictions are placed on the function F, but this work is mostly concerned with physical mechanical systems such as robotic manipulators. The vector y(t) is the controlled system's state variables (e.g. positions and velocities) and x(t) is the vector of control variables (e.g. forces or torques). The vector y_d(t) is the desired value of y(t), supplied by some external agent. The purpose of the controller, of course, is to make sure y(t) follows y_d(t).

[Figure 4.1: The feedback-error controller.]

The controller has two main components. First, there is a feed-forward path through A, which is a stateless trainable module (such as a CMAC) that implements

    x_a = A(y_d, ẏ_d)    (4.2)

Second, there is a feedback path through Q, which is a stateless fixed controller that implements

    x_q = Q(y_d, y)    (4.3)

To understand figure 4.1 the feed-forward and feedback paths are examined separately. The convention used is that control operation starts at time t = 0.

4.3 The feed-forward path

[Figure 4.2: The feed-forward path.]

Figure 4.2 shows the feed-forward path of the controller. In this figure

    ẏ = F(y, A(y_d, ẏ_d))    (4.4)

Now consider this: if y(0) = y_d(0) then it is possible for A to output a value of x which makes ẏ(0) = ẏ_d(0), assuming that such a value exists (because A is given the value of y that is also given to F). Extending this argument forward in time, if A is a perfect inverse model of the dynamic system it can maintain y(t) = y_d(t) indefinitely for t > 0. In other words, it can control the system perfectly by itself. For this to work the initial desired and actual states are required to be the same (y(0) = y_d(0)), because A has no way of knowing the value of y(t) other than the assumption that y(t) always equals y_d(t). If ẏ_d is not available then the feed-forward controller will not work, because y_d alone cannot supply information about both the current and desired states of the dynamic system.

A purely feed-forward controller is not practical for two reasons. First, the controller A can never be made perfect, and second, any noise or disturbance in the system will cause y and y_d to diverge.

4.4 The feedback path

Figure 4.3
shows the feedback path of the controller, which implements

    x_q = Q(y_d, y)    (4.5)
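As an illustration of what the feedback path can and cannot do by itself, the following sketch drives a unit mass along a sinusoidal reference using only a proportional-plus-derivative Q. The plant, gains, and reference here are illustrative choices for this sketch, not values from the text. The control is stable, but the tracking error settles at a nonzero level, which is the "practical but not perfect" behaviour described above.

```python
import numpy as np

# Feedback-only control: x = x_q = Q(y_d, y), no feed-forward path.
# Plant: unit mass, state y = [position, velocity], so ydot = [v, x].
# The PD gains kp, kv and the sinusoidal reference are illustrative.
dt, kp, kv = 0.01, 2.0, 2.0
y = np.array([0.0, 0.0])
errs = []
for k in range(4000):                        # simulate 40 seconds
    t = k * dt
    yd, yd_dot = np.sin(t), np.cos(t)        # desired position and velocity
    x_q = kp * (yd - y[0]) + kv * (yd_dot - y[1])
    y = y + dt * np.array([y[1], x_q])       # Euler integration step
    errs.append(abs(yd - y[0]))
steady = max(errs[2000:])                    # error after transients die out
print(steady)  # stable, but the tracking error does not go to zero
```

The persistent error is exactly what the feed-forward path is meant to remove: a fixed linear Q cannot supply the anticipatory force that tracking a moving reference requires.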

[Figure 4.3: The feedback path.]

The form of Q is arbitrary, although it is usually something like a simple proportional-plus-derivative controller. If Q is chosen well enough then y will tend towards y_d over time. With only the feedback path, theoretical perfection cannot be achieved (as it can with the feed-forward path), but practical, stable and robust control is possible.

4.5 Feed-forward and feedback together

To get the best of both types of control, the feed-forward and feedback paths can be put together as shown in figure 4.1. The feed-forward path provides instant control signals (x_a) to drive y along the trajectory y_d. The feedback path stabilizes the system by compensating for any imperfections in A or any disturbances: Q provides control signals (x_q) to restore y to the set-point y_d when the two diverge.

If the dynamic system is nonlinear then a linear Q can only be chosen optimally for one operating point. It will be assumed that F is such that a Q can be found that works for the entire state space (albeit with varying degrees of efficiency). Systems which meet this constraint include robotic manipulators. This assumption will be important when FBE training is considered.

4.6 Feedback-error training

In general it is hard to determine the function A analytically for a dynamic system. Even if F is known exactly (an unlikely prospect for all but the simplest systems), computing an inverse has many problems. Thus A is made adaptive (i.e. parameterized by some weight vector w). In practice A is usually some kind of neural network.

The innovation of feedback-error control is that the training (error) signal e given to A is just the feedback signal x_q (hence the name "feedback-error"). Here is the reasoning behind this: if x_q = 0 then y will normally equal y_d. In this case the system is perfectly controlled, so the desired error signal is also zero because no adjustment to A is necessary.
If x_q ≠ 0 then the feedback controller is trying to correct the state y to match y_d. For better control, this correction should have been applied by the feed-forward controller some time earlier. Thus x_q is given as an error signal to A so that at this point in the future A will apply a better signal x_a and Q will have less to do (i.e. x_q will be smaller). The signal x_q has the interpretation "this is what x_a should have been". This relies on the generalization properties (local or global) of A. A is trained to minimize x_q and thereby become the inverse of F. During training there is a transfer of control influence from the unskilled feedback controller loop (x_q) to the skilled feed-forward controller (x_a). Note that during training, x_a is moved in the direction of e (i.e. the error signal is not a target). Now, if

    x_a = A(y_d, ẏ_d, w)    (4.6)

[Figure 4.4: How FBE controller training works.]

where w is a vector of weights controlling A, and if it is assumed that e = dE/dx_a (where E is some hypothetical global scalar error) then the gradient descent training algorithm is

    ẇ = α [dx_a/dw]^T e    (4.7)

Here α is a learning rate constant. Note that for the controller to be effective over a certain volume of the state space, the states experienced during training must cover that volume.

4.7 Why it works

Why is the FBE training scheme successful? Figure 4.4 shows how a possible state-space trajectory for y changes during training. Assume that the controller is being repeatedly trained for the same trajectory y_d(t), and that y(0) = y_d(0). Assume also that A implements local generalization, so that the training signal e(t) affects a small region around the input point in A (this assumption is not strictly necessary but it will clarify the argument). During each training cycle the same trajectory (y_d(t), ẏ_d(t)) will be observed by A, so in effect a function of one time input is being trained.

In the early training cycles, y and y_d will start off the same and then gradually diverge (though not too much, because of the feedback path). Now, if y ≠ y_d near the start of the trajectory, a nonzero error signal x_q will be associated with y_d. When this same desired trajectory is observed in subsequent training runs, the improved x_a will be asserted by A, and the divergence of y from y_d will be reduced. The argument proceeds by induction: as each part of the trajectory y(t) converges to y_d(t), the conditions are set for the convergence of later parts of y(t). Thus the entire system will eventually converge. This argument can be extended to the general case where y_d is free-roaming and not going through identical cycles.
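The training rule (4.7) can be sketched concretely. The code below is an illustrative reconstruction, not the thesis implementation: A is linear in its weights over a bank of localized Gaussian features of time (standing in for a CMAC, using the simplification where t is A's only input), so dx_a/dw is simply the feature vector and (4.7) becomes the per-step update w ← w + α φ(t) x_q. All parameter values are assumptions for this sketch.

```python
import numpy as np

# Illustrative FBE training on a unit-mass plant. Gains, feature shapes
# and the learning rate are assumed for this sketch, not from the text.
dt, T, alpha, kp, kv = 0.01, 5.0, 0.05, 0.5, 0.5
centers = np.linspace(0.0, T, 50)

def phi(t):
    """Localized (CMAC-like) features of time, normalized to sum to 1."""
    f = np.exp(-((t - centers) / 0.1) ** 2)
    return f / f.sum()

def run_trial(w, train):
    """One pass along the reference yd(t) = sin(t), with y(0) = yd(0)."""
    y = np.array([0.0, 1.0])                 # position 0, velocity 1
    effort = 0.0                             # integrated |x_q| over the pass
    for k in range(int(T / dt)):
        t = k * dt
        yd, yd_dot = np.sin(t), np.cos(t)
        x_a = w @ phi(t)                     # feed-forward signal from A
        x_q = kp * (yd - y[0]) + kv * (yd_dot - y[1])
        if train:
            w = w + alpha * phi(t) * x_q     # rule (4.7) with e = x_q
        y = y + dt * np.array([y[1], x_a + x_q])
        effort += abs(x_q) * dt
    return w, effort

w = np.zeros_like(centers)
_, before = run_trial(w, train=False)
for _ in range(50):                          # repeated passes, same reference
    w, _ = run_trial(w, train=True)
_, after = run_trial(w, train=False)
print(before, after)  # feedback effort shrinks as A learns the inverse
```

The shrinking feedback effort is the "transfer of control influence" described above: after training, A supplies most of the force and Q has little left to do.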
4.8 Some other FBE features

4.8.1 Step changes in y_d

Step changes in the desired state y_d will not be handled well with this system, as then ẏ_d would be infinite. Instead, a trajectory with a smooth first derivative must be computed and passed to the controller.

4.8.2 Unskilled feedback

When the controller is fully trained, feedback has little effect on the control performance. This is acceptable in a perfect system, because then y always equals y_d. But in a real system, noise and other disturbances will affect y so that y ≠ y_d. This is a problem, because the feed-forward controller does

[Figure 4.5: (a) Standard feedback-error (FBE) control. (b) The inputs to A can be replaced by the time t if y_d is always the same one-dimensional trajectory.]

not get any information about the true state of y; it just assumes y = y_d. Thus any discrepancies must be dealt with by the unskilled feedback controller. So, although the feed-forward controller will cope with the system nonlinearity, noise performance and disturbance rejection are dependent on Q alone.

4.9 An example

It will now be demonstrated how FBE performs on a simple second order system, which corresponds to a unit mass at position p being driven along a straight line by a force x:

    y = [p, ṗ]^T,  ẏ = [ṗ, x]^T = F(y, x)    (4.8)

Q is a proportional-plus-derivative feedback controller that tries to get the position and velocity equal to their reference values:

    x_q = k_p (p_d − p) + k_v (ṗ_d − ṗ)    (4.9)

where k_p and k_v are gain constants. The adaptive controller A is a CMAC neural network.

First a simplification will be made to the standard FBE controller shown in figure 4.5a. If the FBE system is repeatedly trained for the same reference trajectory y_d(t), then A's input [y_d(t), ẏ_d(t)] will describe a one-dimensional path parameterized by the time t. Thus the inputs to A can be replaced by the time t to get effectively the same system, as shown in figure 4.5b. This simplification will help the following discussion.

Figure 4.6 shows the result of 8 training iterations with a continuous reference trajectory that has y_d(0) = y(0). The system parameters are given in the figure caption. Note that one iteration is a single run through the entire reference trajectory. The FBE controller is successful in this case: the

[Figure 4.6: Performance of the FBE second order example system with a continuous reference trajectory that has y_d(0) = y(0). Convergence has almost been achieved after 8 training iterations. A fixed-step Euler simulation was used with a step size of 0.1. The controller A was a single-input CMAC, trained with a learning rate of 0.5. The feedback controller had k_p = k_v = 0.5.]

position and velocity have mostly converged to their references (although the velocity still has some unconverged oscillations about its reference).

As has already been explained, FBE does not achieve convergence along the entire trajectory at once. Instead, earlier times converge before later times. This is demonstrated in figure 4.7, which shows the same system as in figure 4.6 except that the graph was generated after fewer training iterations. The position and velocity track their reference values for early times, but diverge for later times.

4.10 Arbitrary reference trajectory

FBE convergence requires two things:

1. That y_d(0) = y(0), i.e. that the reference trajectory starts at the actual system starting state. In general, any point where y is constrained to be different from y_d will be incorrectly trained, i.e. there will be divergence instead of convergence.

2. That the reference trajectory is continuous, i.e. that its derivative is bounded above and below. This is obviously necessary if the reference derivative must be computed. However, it is still a requirement in the modified FBE system where t is A's only input, because of the previous requirement: if y_d has a discontinuity then there will be a region after it where y will not be able to match the desired state.

Figures 4.8 to 4.10 show what happens when these requirements are violated. Figure 4.8 shows the result of training when the actual and reference starting states differ.
The position oscillates around the reference, and the oscillations increase with further training.
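This forced starting-state mismatch can be reproduced with the same kind of sketch used earlier (again with assumed, not original, parameters): if the reference starts at position 1 while the plant always starts at position 0, the feedback correction at the start of the trajectory is identical on every pass, so the training signal there never dies away and A's output at that point keeps being pushed.

```python
import numpy as np

# Forced mismatch at t = 0: yd(0) = 1 but y(0) = 0 on every pass.
# The setup (unit mass, PD gains, Gaussian time features for A) is an
# illustrative reconstruction, not the thesis implementation.
dt, T, alpha, kp, kv = 0.01, 5.0, 0.05, 0.5, 0.5
centers = np.linspace(0.0, T, 50)

def phi(t):
    f = np.exp(-((t - centers) / 0.1) ** 2)
    return f / f.sum()

w = np.zeros_like(centers)
xq_starts, xa_starts = [], []
for _ in range(40):                          # repeated training passes
    y = np.array([0.0, 0.0])                 # actual start, every pass
    for k in range(int(T / dt)):
        t = k * dt
        yd, yd_dot = np.cos(t), -np.sin(t)   # reference starts at position 1
        x_q = kp * (yd - y[0]) + kv * (yd_dot - y[1])
        if k == 0:
            xq_starts.append(x_q)            # correction at the mismatch point
        w = w + alpha * phi(t) * x_q         # rule (4.7) with e = x_q
        y = y + dt * np.array([y[1], w @ phi(t) + x_q])
    xa_starts.append(w @ phi(0.0))           # A's output at t = 0 after this pass
print(xq_starts[-1], xa_starts[0], xa_starts[-1])
```

Because the state is reset every pass, the error at the start point is the same on every iteration and can never be trained away; the accumulated feed-forward output at that point is the seed of the oscillations seen in the figure.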

[Figure 4.7: Partial training of the system described in figure 4.6. Convergence has not been achieved for later times.]

Figure 4.9 shows the result of training with a discontinuous reference trajectory. The position has mostly converged to its reference but the velocity is oscillating about its reference, especially near the discontinuity. Figure 4.10 shows the same system after 7 training iterations: the velocity oscillations have increased and are causing noticeable oscillations in position as well. This system continues to diverge with further training.

[Figure 4.8: The system of figure 4.6 when the actual and reference starting states differ.]

[Figure 4.9: The system of figure 4.6 when the reference trajectory has a discontinuity.]

[Figure 4.10: The system of figure 4.6 when the reference trajectory has a discontinuity (7 iterations).]

Why are the reference trajectory requirements necessary to prevent divergence? Consider the points where y is somehow forced to be different from y_d (these will be called "mismatch points"). This obviously happens when the starting reference and actual states differ. It also happens when there is a reference trajectory discontinuity, because no amount of control effort is sufficient to track such a step change. At the mismatch points the error e will always be nonzero. The error e integrates into x_a, so x_a at these points will get larger each time the trajectory passes through them. Now, x_a acts to restore the actual state to the reference, but there is no braking force applied and so the reference is overshot. The training process does not provide any limiting influences on x_a, so the overshoot gets larger over time. The effects of the overshoot propagate out from the mismatch points, resulting in oscillations. This effect will be called the "over-training problem".

To clarify the problem: when the actual state is close to the reference trajectory, the error value x_q has the meaning "this is what x_a should have been". Elsewhere x_q is just acting as a restoring force. These two different meanings of the error signal (corrective and restorative) conflict.

4.11 Output limiting

One way to solve the over-training problem is to limit the magnitude of x_a somehow. This can be done by training x_a towards zero after each time step (this is called "output limiting"). Thus there will be two competing effects at each mismatch point: the error training from e and the target training towards zero. Eventually an equilibrium will be reached, at which point training will have converged.

This is demonstrated in figure 4.11, which shows the system of figure 4.6 with a reference discontinuity and an incorrect starting state. Output limiting has been used with a learning rate of 0.2. The graph was generated after 5 iterations, at which point convergence had been achieved. The position tracks its reference reasonably well, except around the mismatch points. The rate at which the position reacquires the reference after a reference discontinuity is dependent on the output limiting learning rate.

[Figure 4.11: The system of figure 4.6 with a reference discontinuity, an incorrect starting state, and output limiting with a learning rate of 0.2 (5 iterations).]

4.12 Conclusion

The feedback-error (FBE) control scheme has been described, and it has been shown that its utility is limited by the fact that it requires (1) a known starting state and (2) a continuous reference trajectory to be generated by an external agent. These problems can be partially solved by output limiting, although this is unsatisfactory. FBE is a starting point for the development of the FOX controller: the next chapter will derive FOX from a first-principles analysis of the problem that FBE is trying to solve. It will be shown that the FOX algorithm can be regarded as a generalization of FBE. Experiments on FOX will subsequently show that FBE has some hidden problems: in particular, it is only capable of controlling systems that have a zero-order eligibility model.
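As a closing note on the output limiting of section 4.11, the equilibrium it reaches can be written down explicitly for a linear-in-weights A. This sketch assumes x_a = w^T φ (so dx_a/dw = φ, with φ fixed at the mismatch point under consideration) and introduces β for the output-limiting learning rate; neither assumption is from the text, which uses a CMAC.

```latex
% Combined update at a fixed mismatch point: error training toward x_q
% plus target training of x_a toward zero (output limiting):
\dot{w} \;=\; \alpha\,\phi\,x_q \;-\; \beta\,\phi\,x_a .
% Since phi is fixed at this input point, x_a = w^T phi evolves as
\dot{x}_a \;=\; \phi^{T}\dot{w}
          \;=\; \lVert\phi\rVert^{2}\,(\alpha\,x_q - \beta\,x_a),
% which, treating x_q as slowly varying, settles at the equilibrium
x_a \;=\; \frac{\alpha}{\beta}\,x_q .
```

At a mismatch point x_a therefore settles to a bounded multiple of the feedback signal instead of growing without limit, which is why the over-training oscillations stop once the two competing effects balance.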