LQ Control of a Two Wheeled Inverted Pendulum Process


Uppsala University, Information Technology, Dept. of Systems and Control
KN, HN, FS 2000-10. Last rev. September 12, 2017 by HR.

Reglerteknik II
Instruction to the laboratory work
LQ Control of a Two Wheeled Inverted Pendulum Process

Preparation exercises: All exercises in Section 4.
Reading instructions: Glad-Ljung, Swedish version: Chapters 5.6-5.7, 8.5, 9.1-9.4. English version: Chapters 5.6-5.7, 8.4, 9.1-9.4.

Name / Program / Year of reg. / Date
Passed prep. ex. (sign) / Passed lab. (sign) / Assistant's comments

Contents

1 Introduction
2 Short theoretical background
  2.1 LQ
    2.1.1 The standard LQ problem
    2.1.2 Integral action in LQ control
    2.1.3 Reference signals in LQ control
  2.2 Observer
  2.3 LQG
  2.4 Discretization
3 The two wheeled inverted pendulum process
  3.1 System description
  3.2 Physical modeling
  3.3 Linearization
4 Preparation exercises
5 Laboratory work
  5.1 Observer
  5.2 Tuning the LQG controller
  5.3 Integral action
6 Appendix

1 Introduction

This laboratory work is based on Computer Exercise 3. Therefore it is advisable to have Computer Exercise 3 fresh in mind before starting this laboratory work. The goal of this laboratory work is to illustrate how LQG design can be used for controller synthesis for nontrivial systems. It is emphasized that the controller design is an iterative process, and that simulations and tests on the real system are of vital importance. The system examined in this laboratory work is a two wheeled inverted pendulum (TWIP) process, which is described in Section 3. The system is multivariable, nonlinear and unstable, which makes it nontrivial to control. In Sec. 2 a brief summary of the theory employed is given.

2 Short theoretical background

In this section the LQ problem, and how to incorporate integral action and reference signal inputs, are discussed. Observers, especially the Kalman filter, are also briefly discussed. A more thorough presentation of LQG is found in Chapter 9 in Glad-Ljung. The standard model (as in Equation (9.4) in Glad-Ljung) of the system is

    ẋ = Ax + Bu + Nv1,    (1)
    z = Mx,               (2)
    y = Cx + v2,          (3)

where z is the performance signal, y is the measured output signal, and v1 and v2 are white noise with intensities

    E v1 v1ᵀ = R1,   E v2 v2ᵀ = R2,   E v1 v2ᵀ = R12.   (4)

For simplicity it is assumed that v1 and v2 are independent, so that R12 = 0.

2.1 LQ

2.1.1 The standard LQ problem

The standard LQ problem is to minimize a quadratic criterion for a linear system. That is, to find the controller that, when applied to the system, minimizes the criterion. The control error, i.e., the discrepancy between the performance signal and the reference signal, is denoted e(t) = z(t) − r(t). The aim of the control is

to keep e(t) as small as possible, without using an unreasonably large input signal. The standard formulation of LQ treats the regulation problem¹, where r ≡ 0. This means that e = z. The criterion to be minimized then is

    V = E lim_{T→∞} (1/T) ∫₀ᵀ ( z(t)ᵀ Q1 z(t) + u(t)ᵀ Q2 u(t) ) dt,   (5)

where Q1 and Q2 are symmetric matrices; Q2 is positive definite and Q1 is positive semidefinite. The behavior of the regulator can be adjusted by choosing Q1 and Q2. By choosing a large Q2 relative to Q1, a high penalty on the input signal in criterion (5) will be enforced, giving a controller that uses a small input signal, but at the price of a slower regulation of the control error, and vice versa for a large Q1 relative to Q2.

2.1.2 Integral action in LQ control

A drawback with state feedback is that no inherent integral action is obtained. If integral action is desired (and this is, according to the basic course in automatic control, often the case) it must be enforced in the problem formulation. There are several ways of doing this (see Chapter 9 in Glad-Ljung). A direct, and perhaps a bit ad hoc, approach to introduce integral action into the control loop is to add a term to the criterion (5). This term should penalize the integrated control error. Let ε(t) = ∫₀ᵗ e(τ)dτ; the added term is then ε(t)ᵀ Qε ε(t). To deal with this within the LQG framework, ε is introduced as an extra, fictitious output signal of the system. This implies that ε must also be included as additional states in the state space description:

    [ẋ]   [A 0] [x]   [B]     [N  0] [v1]
    [ε̇] = [M 0] [ε] + [0] u + [0 −I] [r ],   (6)

    [z]   [M 0] [x]
    [ε] = [0 I] [ε],                          (7)

    y = [C 0] [x; ε] + v2.                    (8)

In (6), the reference signal enters the system together with the process noise v1. This is only to emphasize that, from the controller's point of view, r may be regarded as a disturbance.
The modified criterion then becomes

    V = E lim_{T→∞} (1/T) ∫₀ᵀ ( [z(t); ε(t)]ᵀ [Q1 0; 0 Qε] [z(t); ε(t)] + u(t)ᵀ Q2 u(t) ) dt.

¹ For the regulation problem the control objective is to counteract disturbances that cause the performance signal to deviate from zero.
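The augmentation in (6)-(8) can also be sketched numerically. The helper below is a hypothetical illustration in Python with NumPy; it is not part of the lab's Matlab tools:

```python
import numpy as np

def augment_for_integral_action(A, B, M, N):
    """Build the augmented model (6)-(8): extra states eps with
    eps' = M x - r, so [v1; r] enters through [[N, 0], [0, -I]].
    Hypothetical helper for illustration only."""
    n, m = B.shape
    p = M.shape[0]
    Aa = np.block([[A, np.zeros((n, p))],
                   [M, np.zeros((p, p))]])
    Ba = np.vstack([B, np.zeros((p, m))])
    Na = np.block([[N, np.zeros((n, p))],
                   [np.zeros((p, N.shape[1])), -np.eye(p)]])
    Ma = np.block([[M, np.zeros((p, p))],
                   [np.zeros((p, n)), np.eye(p)]])  # maps [x; eps] to [z; eps]
    return Aa, Ba, Na, Ma

# Tiny stand-in example: double integrator with z = x1
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
M = np.array([[1.0, 0.0]])
Aa, Ba, Na, Ma = augment_for_integral_action(A, B, M, B)
```

The augmented matrices are then used in place of (A, B) in the LQ design.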

The LQ design is thus performed with

    Q̃1 = [Q1 0; 0 Qε]

and Q2 as the penalty matrices, using the augmented system (6)-(8).

2.1.3 Reference signals in LQ control

When treating the servo problem, where r ≠ 0, the criterion should be appropriately modified. However, here r will be confined to be piecewise constant, like set point changes. In this case LQ will give a feedback gain that is the same as for the regulation problem (see Theorem 9.2 in Glad-Ljung).

2.2 Observer

Applying state feedback requires that the full state vector is available. If the state vector is directly measurable, i.e. y(t) = x(t), the feedback can be applied as shown in Fig. 1a, where L denotes the feedback gain matrix obtained from the LQ design. However, if the state is not directly measurable, an observer G_obs(s) can be used to reconstruct the state from the measured output and input signals of the system, y(t) and u(t) respectively. Note that this requires that the system is observable. The observer takes as input the measured input and output signals, u(t) and y(t), of the system, and gives as output an estimate, x̂(t), of the state x(t). From the estimated state, feedback can be applied as shown in Fig. 1b. There are many different ways in which the observer can be designed. If the system is linear and the process and measurement noise, v1 and v2, are white Gaussian noise, the optimal observer is the Kalman filter (see Section 5.7 in Glad-Ljung), in the sense that it minimizes the expected squared estimation error

    Π_x = E[(x(t) − x̂(t))ᵀ (x(t) − x̂(t))].   (9)

However, in the first task of the laboratory work a more primitive, non model based, observer will be used. It reconstructs the state as linear combinations of the integrated and differentiated output signals of the system. Consider the transfer functions

    G_d(s) = s / (sτ + 1),    G_i(s) = 1/s.

The transfer function G_d(s) will act as a filter that differentiates and low pass filters the input signal. The low pass filtering is used to reduce the

high frequency noise in the signal that would otherwise be amplified by the differentiation, since differentiation acts as a high pass filter. Setting τ = 0 gives G_d(s) = s and corresponds to a pure differentiation. The transfer function G_i(s) will integrate the input signal. By using G_d(s) and G_i(s) the components of y(t) can be differentiated and integrated. From the integrated and differentiated signals the state x(t) can be reconstructed. An observer G_di(s) using this method to reconstruct the state for the TWIP system is given in the Appendix.

Figure 1: (a) Pure state feedback. (b) Feedback from estimated states.

2.3 LQG

The LQG problem concerns uncertain linear systems disturbed by additive white Gaussian noise, having incomplete state information (i.e. the full state vector is not available for feedback). The optimal control law has the familiar observer based state feedback structure (see Theorems 9.1 and 9.2 in Glad-Ljung)

    u(t) = −Lx̂(t) + L_r r(t),   (10)

where x̂(t) is the optimal estimate of x(t), obtained by the Kalman filter for the system (1)-(3). The gain L only depends on Q1 and Q2, while the Kalman filter only depends on R1 and R2 (and R12 if nonzero); this is known as the separation principle. Note that the LQG controller is optimal only for the situation when the system is exactly described by (1)-(3), and when v1 and v2 are white Gaussian noises. This is a very idealized situation. In almost all practical cases one or more of the conditions for the LQG framework are not fulfilled. In this case study the system is not even linear. Hence, it cannot be expected that the obtained controller will be optimal. Still, for many systems the LQG approach is useful due to its simplicity and its ability to generate controllers that behave rather well, also in non-idealized situations. In practice R1 and R2 are unknown and may, together with Q1 and Q2, be regarded as design variables.
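The two gains in (10) come from two independent Riccati equations, which is the separation principle in computational form. A minimal sketch in Python with SciPy, on a stand-in double-integrator model rather than the TWIP (all numerical values below are arbitrary assumptions):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Stand-in 2nd order model (not the TWIP): x = [position; velocity],
# z = position is the performance signal, y = noisy position.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
M = C

Q1 = np.eye(1)            # penalty on z
Q2 = 0.01 * np.eye(1)     # penalty on u
R1 = np.eye(2)            # process noise intensity
R2 = 1e-3 * np.eye(1)     # measurement noise intensity

# LQ state-feedback gain L for u = -L xhat: depends only on (Q1, Q2)
S = solve_continuous_are(A, B, M.T @ Q1 @ M, Q2)
L = np.linalg.solve(Q2, B.T @ S)

# Kalman filter gain K: depends only on (R1, R2) (dual Riccati equation)
P = solve_continuous_are(A.T, C.T, R1, R2)
K = P @ C.T @ np.linalg.inv(R2)
```

Both A − BL (controller dynamics) and A − KC (estimator dynamics) come out stable, and changing (R1, R2) leaves L untouched.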
There are some guidelines on how to choose these design matrices.

For simplicity the discussion is confined to diagonal choices of Q1 and Q2. Each element in these matrices can be interpreted as a penalty on the corresponding signal component. The rule of thumb then is that a signal component will be small in the closed loop simulation if it is penalized considerably, that is, if the corresponding diagonal entry of the matrix is chosen large. Still, it is stressed that this is only a rule of thumb, and that the design procedure should be regarded as an iterative procedure supported by closed loop simulations.

2.4 Discretization

A computer works in discrete time, and therefore the effects caused by this must be incorporated into the synthesis procedure. This problem will be disregarded here: the controller synthesis in the laboratory work is performed in continuous time, yielding continuous time controllers. For the implementation, these are approximated by sampled controllers², obtained with the Matlab function c2d. The sampling period for the TWIP process is T = 0.005 seconds.

3 The two wheeled inverted pendulum process

3.1 System description

The system is a robot balancing on two wheels, as shown in Fig. 2. It is built from pieces of the LEGO Mindstorms NXT kit. The main component is the brick, which is a small embedded computer to which sensors and actuators can be connected. The actuators of the system are two DC motors to which the wheels are attached. The motors have built in encoders, able to measure the rotated shaft angle with a resolution of 1°. A gyroscope, measuring the angular velocity with a resolution of 1°/s, is mounted at the top of the robot.

3.2 Physical modeling

To facilitate any kind of analysis of the system, a model is needed. The model should describe the behavior of the system in an acceptable way. Preferably it should be a mathematical model. Such a model can be obtained in many different ways. For instance one can get an empirical model by performing experiments on the system.
Another way is to use a priori knowledge about the system. For a mechanical system, like the TWIP process, the laws of

² See Chapter 4 in Glad-Ljung for more thorough comments on sampling and sampled systems.

Figure 2: Model of the two wheeled inverted pendulum process. (a) Model from the side; (b) model from above. In subfigure (a), ψ, ξ and θ_r denote the pitch angle, the traveled distance and the rotated shaft angle of the right wheel, respectively. In subfigure (b), φ denotes the yaw angle relative to some reference angle φ₀, and ξ̇ denotes the translational speed.

physics serve as a priori knowledge. For this system classical mechanics will be exploited to obtain a mathematical model. The derivation of this model will not be shown here, since it involves some rather tedious algebraic maneuvers. Instead the presentation is confined to the principal elements of the derivation. To begin with, the degrees of freedom of the system should be determined. Then some generalized coordinates, reflecting the degrees of freedom, should be chosen. The balancing robot has three degrees of freedom, represented by ξ, φ and ψ as shown in Fig. 2, and these are also the variables chosen as the generalized coordinates. Let q = [ξ φ ψ]ᵀ denote the generalized coordinates. To simplify the modeling, the friction is assumed to be negligible, and is thus omitted. The mathematical model is derived with the aid of Lagrange's equation. The Lagrangian for the system is L(q, q̇) = T − V, where T and V are the total kinetic and potential energies of the system, respectively, expressed in the generalized coordinates q and their time derivatives q̇. Lagrange's equation then is

    d/dt (∂L/∂q̇) − ∂L/∂q = F,   (11)

where F is the generalized external force acting on the system. Despite its very compact form in (11), Lagrange's equation is a system of second

order nonlinear differential equations, one differential equation for each generalized coordinate. For the balancing robot process there are hence three equations. Both sides of (11) are thus vector valued functions. The entries in F represent the generalized external forces acting in the directions of the generalized coordinates. For instance, the generalized force in the ψ direction is the external torque exerted around the axis going through the center of the wheels. These torques are caused by the DC motors, gravity and disturbances, such as a hit on the top of the robot. Now a mathematical model is obtained, but for the purpose here it would be more suitable to have the model as a system of first order differential equations, i.e., a state space description. A natural choice of state vector is the generalized coordinates and their time derivatives. The state vector is thus

    x = [q; q̇] = [ξ φ ψ ξ̇ φ̇ ψ̇]ᵀ.   (12)

In order to get the state space description, an expression for ẋ = dx/dt must be found, an expression that only involves x and F. The simple part is dq/dt = q̇, which is a part of the state vector. However, dq̇/dt = q̈ must be solved for in (11). This yields the state space description ẋ = f(x, F), which has the generalized external forces as input signal. It would be preferable, though, to have the voltages over the DC motors as input signals, rather than the forces they cause. This must somehow be included in the model. Fortunately, this can be interpreted as if state feedback is applied; thus, no new state variables are needed. The obtained nonlinear state space model is then

    ẋ = f(x, u),   (13)

where x is the state vector as in (12) and u = [u_l u_r]ᵀ is the input signal available for control. u_l and u_r are the voltages applied to the left and right DC motor, respectively.

3.3 Linearization

In order to be able to control the system with an LQ controller, the dynamics are linearized around the origin (an equilibrium point). The origin is defined

as x₀ = [0 0 0 0 0 0]ᵀ. That is, the robot is standing still in a vertical position. The state space model is of the form

    ẋ = Ax + Bu + Nw,
    z = Mx,
    y = Cx + v,

with the state vector x, measurement vector y, performance signal z and controllable input signal u given by

    x = [ξ φ ψ ξ̇ φ̇ ψ̇]ᵀ,   y = [θ_l θ_r ψ̇]ᵀ,   z = [φ ξ̇]ᵀ,   u = [u_l u_r]ᵀ,

where

    ξ   - traveled distance
    φ   - yaw angle
    ψ   - pitch angle
    u_l - voltage applied to left motor
    u_r - voltage applied to right motor
    θ_l - left shaft angle (measured by a tachometer in the motor assembly)
    θ_r - right shaft angle (measured by a tachometer in the motor assembly)

The process noise is modeled as a disturbance on the input signal, which means that N = B. The matrices A, B are given by A = ∂f/∂x, B = ∂f/∂u, evaluated at x = 0, u = 0. Numerically, the matrices are given by

    A = [ 0  0    0      1      0     0
          0  0    0      0      1     0
          0  0    0      0      0     1
          0  0   −1.9  −28.1    0     1.1
          0  0    0      0    −30.5   0
          0  0   88.1  295.4    0   −11.8 ]

and

    B = [  0      0
           0      0
           0      0
           1.1    1.1
          15.5  −15.5
         −11.9  −11.9 ],

    C = [ 25   2  1  0  0  0
          25  −2  1  0  0  0
           0   0  0  0  0  1 ].

As the performance signal is given by z = [φ ξ̇]ᵀ, it follows that

    M = [ 0  1  0  0  0  0
          0  0  0  1  0  0 ].

The poles of the linearized system are given by

    p = [ 0  −40.9  7.4  −6.4  0  −30.5 ].

When working with linearized models, it is very important to remember that the linear model is an approximation. It may be viewed as a first order Taylor expansion of the nonlinear model around some operating point. The further away from this point, the origin in this case, the less accurate the linear description is. Thus the linear model should be regarded as a local model.

4 Preparation exercises

The feedback gain L in LQG control only depends on the matrices Q1 and Q2 in the criterion (5). These matrices are design parameters in the LQG design. If they are chosen as Q1 = k·Q̄1 and Q2 = k·Q̄2, for some constant matrices Q̄1 and Q̄2 and some positive scalar k, the feedback gain L will be the same regardless of how k is chosen.
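This invariance can also be checked numerically; a minimal sketch in Python, where the system and penalty matrices are arbitrary stand-ins and not the TWIP model:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lq_gain(A, B, Q1, Q2):
    # Positive definite Riccati solution S gives L = Q2^{-1} B^T S
    S = solve_continuous_are(A, B, Q1, Q2)
    return np.linalg.solve(Q2, B.T @ S)

A = np.array([[0.0, 1.0], [2.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q1 = np.diag([4.0, 1.0])
Q2 = np.array([[2.0]])

L_a = lq_gain(A, B, Q1, Q2)
L_b = lq_gain(A, B, 7.0 * Q1, 7.0 * Q2)   # both penalties scaled by k = 7
```

Scaling the Riccati solution S by k scales every term of the Riccati equation by k, so S → kS and L = (kQ2)⁻¹Bᵀ(kS) = Q2⁻¹BᵀS is unchanged.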

Question 4.1: Use the criterion (5) to motivate why this is the case.

Answer:

Consider a first order SISO system

    Y(s) = 1/(s − α) · U(s)   (14)

with the pole located at α, and assume that we want to determine a state feedback control law u(t) = −L x(t) + L_r r(t), such that the output y(t) of the system follows the reference signal r(t).

Question 4.2:
(a) Write the system (14) in state space form.
(b) Write the closed loop system in state space form.

Answer:

The next task is to compute the feedback gain L using LQ design and study how the value of L and the closed loop system pole depend on the quotient ρ = Q2/Q1. To change the quotient, use the fixed value Q2 = 1 but let Q1 be variable.

Question 4.3: Note: This preparation exercise is related to Task 5.3.
(a) Determine L by finding the positive definite solution to the continuous time Riccati equation (see Thm. 9.1 in Glad-Ljung). In particular, determine lim_{ρ→∞} L in the following two cases:
  (i) Stable open loop system, α < 0
  (ii) Unstable open loop system, α ≥ 0
(b) Study how the pole location of the closed loop system varies with varying values of ρ, in the two cases (i) and (ii).
(c) In cases (i) and (ii) in task (a), different values of L will be found. Try to give an intuitive reasoning for why this is.

Answer:
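After working (a) and (b) out by hand, the limits can be checked numerically. A small sketch for the scalar system (14), using Q1 = 1 and Q2 = ρ (which, by the scaling argument above, gives the same L as Q1 = 1/ρ, Q2 = 1):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def scalar_lq(alpha, rho):
    # xdot = alpha*x + u, criterion z^2 + rho*u^2
    A = np.array([[alpha]])
    B = np.array([[1.0]])
    S = solve_continuous_are(A, B, np.eye(1), rho * np.eye(1))
    L = float(S[0, 0]) / rho
    return L, alpha - L          # feedback gain and closed-loop pole

# Expensive control (large rho): the classical result is that L tends
# to 0 for a stable pole and to 2*alpha for an unstable one, so the
# closed-loop pole tends to -|alpha| in both cases.
L_stable, p_stable = scalar_lq(-2.0, 1e6)
L_unstable, p_unstable = scalar_lq(2.0, 1e6)
```

This only verifies the hand calculation; it does not replace it.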

5 Laboratory work

In the laboratory work, LQG will be used to design controllers for the balancing robot process. The controllers will be analyzed, simulated and tested on the real system. In the separate sheet "Instructions to run the two wheeled inverted pendulum process", instructions on how to use the lab equipment are given. Read this carefully and then proceed with the tasks below.

5.1 Observer

The first laboratory task is to try a controller that uses the observer G_di(s) to obtain a state estimate x̂, i.e. the block G_obs(s) in Fig. 1b will be given by G_di(s) (see Sec. 2.2).

Task 5.1 Use lq_twip to generate a controller. Set Q1 = I, Q2 = ρI and start with ρ = 1. Try to increase/decrease the value of ρ and see how the system reacts in terms of speed of response and noise sensitivity. To see any significant change of behaviour the value might have to be changed by a factor of 10. Simulate the controller and try it on the real system, using the GUI. Finally, set ρ = 0.01 (and try it, if you have not done so already) and proceed to the next task.

Note: Due to disturbances the robot can tend to drift slightly backwards or forwards. This problem will be resolved at a later stage of the laboratory work and does not have to be considered for now.

The controller obtained by using G_di(s) gives noisy estimates that result in rather poor control performance. In the next task the observer G_di(s) is replaced by a Kalman filter to improve the quality of the estimated state vector x̂.

Task 5.2 Generate a controller using lqg_twip. This controller uses a Kalman filter to estimate the state. As the main focus of this laboratory session is on the control, the pre-specified values R1 = I and R2 = 10⁻⁵I for the Kalman filter are given. Use the values Q1 = I, Q2 = ρI, with ρ = 0.01 as in the previous task. Simulate the controller and try it on the real system.

Question 5.1: Comment on the behavior of this controller compared to the controller used in Task 5.1.

Answer:

5.2 Tuning the LQG controller

Task 5.3 In preparation exercise 4.3 it was found that the closed loop system poles approach some fixed values asymptotically as ρ → ∞. For the TWIP process, the yaw angle φ is associated with a marginally stable pole at α₁ = 0. The translational velocity ξ̇ is associated with an unstable pole at α₂ ≈ 7.4. Create and execute controllers with increasing values of ρ and see how the control of the yaw angle and the translational velocity changes. Mainly study the changes by simulation, but also try to execute some controllers on the real system to study the behaviour.

Question 5.2: Comment on the results. Compare to your findings in preparation exercise 4.3.

Answer:

The next task is to fine-tune the controller by changing individual elements of Q1. Go back to using ρ = 0.01. The diagonal elements of Q2 penalize the input voltages to the left and right motor respectively, which should naturally have the same penalty. As it is only the ratio between the elements in Q1 and Q2 that affects the controller performance (as found in preparation exercise 4), it suffices to make changes to the elements of Q1.

Task 5.4 Let

    Q1 = [ q1  0
            0  q2 ].

Try to adjust the values of q1, q2 to achieve a controller that makes the system easy to steer using the reference signals available. Recall that z = [φ ξ̇]ᵀ, which means that q1 and q2 penalize the control error for the yaw angle and the velocity, respectively. Change the values of q1 and q2 one at a time and observe the behavior of the system. Especially study the magnitude of the input signal and the speed of the response to reference signals.

Note: If the penalty q2 becomes too large the system will get a shaky behavior, because the bandwidth of the system becomes too high for the used sampling rate. This is a phenomenon that is not discussed in this course. If this occurs, try lowering the value of q2.

Question 5.3: Comment on the behavior for different values of the penalty in terms of input signal magnitude and speed of response of the system.

Answer:

Contact the instructor when a suitable controller has been found.

5.3 Integral action

Task 5.5 The controller obtained by lqg_twip is designed to, in theory, give a static gain equal to one from the reference signal to the output. Use the controller from the previous task. Make a step response in the velocity of 0.2 m/s on the real system and see if the static gain equals one for this particular reference value. Note that by tapping the up arrow once, a step of 0.1 m/s will be applied to the velocity reference signal.

Question 5.4: Is the static gain equal to one? Why is this?

Answer:

The standard approach to eliminate control errors is to introduce integral action in the control loop. This can be done in several ways. Here the approach in Section 2.1.2 will be used.

Task 5.6 The function lqi_twip gives an LQG controller with integral action. Set Qε = I and use the same Q1, Q2, R1 and R2 as obtained in Task 5.4. Try the controller on the real system and study the same step response

as in the previous task, 0.2 m/s.

Question 5.5: What is the static gain now?

Answer:

Also note that using integral action makes the robot stay in the same spot when zero reference in the velocity is applied, as opposed to the controller without integral action, which tends to slowly drift away.

Task 5.7 Let

    Qε = [ qε1  0
            0  qε2 ].

Try increasing/decreasing the values of qε1 and qε2 one by one and observe the behaviour of the system by simulation and execution on the real system.

Question 5.6: Comment on the behaviour for small and large values of Qε in terms of how fast the static error vanishes and how oscillatory the system is.

Answer:

Task 5.8 Try to fine-tune the controller such that the system is as easy as possible to control using the available reference signals. When a suitable controller has been obtained, show the final result to the lab assistant, and discuss the findings of the tasks performed during the lab.
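The mechanism behind the static error in Question 5.4 can be illustrated numerically: without integral action, unit static gain rests on a feedforward gain L_r computed from the model, so it only holds when the model matches the plant. A sketch on a stand-in second order model (not the TWIP):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # stand-in plant model
B = np.array([[0.0], [1.0]])
M = np.array([[1.0, 0.0]])                 # performance signal z = x1

S = solve_continuous_are(A, B, M.T @ M, np.eye(1))
L = B.T @ S                                # LQ gain with Q2 = I

def dc_gain(A_true):
    # closed-loop DC gain r -> z for u = -L x + Lr r, with Lr = 1
    Acl = A_true - B @ L
    return (M @ np.linalg.solve(-Acl, B)).item()

Lr = 1.0 / dc_gain(A)                      # unit static gain on the model ...
g_nominal = dc_gain(A) * Lr                # ... equals 1 by construction
A_true = A.copy(); A_true[1, 0] = -1.3     # plant differs from the model
g_actual = dc_gain(A_true) * Lr            # no longer 1: a static error remains
```

Integral action removes this sensitivity to model error, which is what the step responses in Tasks 5.5 and 5.6 show on the real process.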

6 Appendix

Here the details of how the observer G_di(s) given in Sec. 2.2 is constructed are described. First note the following relation:

    [θ_l]   [ 1       0       0      ]
    [θ_r]   [ 0       1       0      ] [θ_l]
    [ψ  ] = [ 0       0       G_i(p) ] [θ_r]
    [θ̇_l]   [ G_d(p)  0       0      ] [ψ̇ ]
    [θ̇_r]   [ 0       G_d(p)  0      ]
    [ψ̇  ]   [ 0       0       1      ]

where the 6×3 transfer function matrix is denoted G_t(p). Now, as [θ_l θ_r ψ]ᵀ = H [ξ φ ψ]ᵀ, where (see the matrix C)

    H = [ 25   2  1
          25  −2  1
           0   0  1 ],

it follows that

    [ξ φ ψ ξ̇ φ̇ ψ̇]ᵀ = [H⁻¹ 0; 0 H⁻¹] [θ_l θ_r ψ θ̇_l θ̇_r ψ̇]ᵀ
                     = [H⁻¹ 0; 0 H⁻¹] G_t(p) [θ_l θ_r ψ̇]ᵀ.

The transfer function

    G_obs(p) = G_di(p) = [H⁻¹ 0; 0 H⁻¹] G_t(p)

can thus be used as an observer that takes as input the measured signal y = [θ_l θ_r ψ̇]ᵀ and gives as output an estimate, x̂, of the state vector x.
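On the NXT the filters G_d(s) and G_i(s) run in discrete time. As a hedged sketch of how they could be realized at the lab's sampling period, here in Python with SciPy (only T = 0.005 s comes from the text; the value of τ and the test signal are assumptions):

```python
import numpy as np
from scipy import signal

dt = 0.005   # sampling period from Sec. 2.4
tau = 0.02   # low-pass time constant for G_d(s); assumed value

# G_d(s) = s/(tau*s + 1): band-limited differentiator
bd, ad, _ = signal.cont2discrete(([1.0, 0.0], [tau, 1.0]), dt, method='bilinear')
# G_i(s) = 1/s: integrator
bi, ai, _ = signal.cont2discrete(([1.0], [1.0, 0.0]), dt, method='bilinear')

t = np.arange(0.0, 1.0, dt)
y = np.sin(2 * np.pi * t)                  # one measured channel, e.g. theta_l
ydot = signal.lfilter(bd.ravel(), ad, y)   # approximates dy/dt
yint = signal.lfilter(bi.ravel(), ai, y)   # approximates the integral of y
```

The smaller τ is, the closer ydot follows the true derivative, but the more high frequency measurement noise passes through, which is exactly the trade-off discussed in Sec. 2.2.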