Linear Quadratic Optimal Control Topics
Linear Quadratic Optimal Control Topics

- Finite-time LQR problem for time-varying systems
- Open-loop solution via Lagrange multiplier
- Closed-loop solution
- Dynamic programming (DP) principle
- Cost-to-go function computed from DP
- Infinite-time LQ problem for LTI systems: convergence of $P(t, t_f)$, closed-loop stability, $\bar{P}$ as solution of the ARE via the Hamiltonian matrix
- Selection of $Q$, $R$ and $S$
- Robustness (return difference inequality)¹
- Symmetric root locus and cheap control²
- Some extensions: discrete-time LQ, pole placement within a pre-defined region, frequency shaping

¹ ² Not covered in Spring 2008.

M.E. University of Minnesota 212
Linear Quadratic (LQR) Optimal Control - Motivation

The pole-placement approach allows one to choose where to place the poles:

- SI (single input): the feedback gain is unique.
- MI (multi-input): the feedback gain is non-unique (e.g. one needs the Hautus-Heymann lemma or eigenvector placement).

Main issue: where should we place the poles? One should consider the trade-off between performance, robustness and control effort. The LQ technique makes this trade-off without specifying desired pole locations.
Finite Time Linear Quadratic Optimal Regulator Problem

$m$-input, $n$-state system with $u \in \mathbb{R}^m$, $x \in \mathbb{R}^n$:

$$\dot{x} = A(t)x + B(t)u; \quad x(t_0) = x_0. \qquad (27)$$

Find the open-loop control $u(\tau)$, $\tau \in [t_0, t_f]$, such that the following objective function is minimized:

$$J(u, x_0, t_0) = \tfrac{1}{2}x^T(t_f)Sx(t_f) + \tfrac{1}{2}\int_{t_0}^{t_f}\left[x^T(t)Q(t)x(t) + u^T(t)R(t)u(t)\right]dt \qquad (28)$$

- $Q(t) = Q^T(t)$ and $S$ are symmetric positive semidefinite $n \times n$ matrices.
- $R(t) = R^T(t)$ is a symmetric positive definite $m \times m$ matrix.

Notice that $x_0$, $t_0$ and $t_f$ are fixed, given data.
The control goal generally is to keep $x(t)$ close to $0$³, especially at the final time $t_f$, using little control effort $u$. To wit, notice in (28):

- $x^T(t)Q(t)x(t)$ penalizes the transient state deviation,
- $x^T(t_f)Sx(t_f)$ penalizes the final state deviation,
- $u^T(t)R(t)u(t)$ penalizes the control effort.

Output regulation: if $y = C(t)x$ is the output, we can define $Q(t) = C^T(t)W(t)C(t)$, where $W(t)$ is a symmetric, positive definite output weighting matrix.

³ LQ can be modified for the trajectory tracking case.
General Finite Time Optimal Control

Plant: $\dot{x} = f(x, u, t)$; $x(t_0) = x_0$ given. Time interval: $t \in [t_0, t_f]$.

Cost function to be minimized:

$$J(u(\cdot), x_0) = \phi(x(t_f)) + \int_{t_0}^{t_f} L(x(t), u(t), t)\,dt$$

The first term is the final cost and the second term is the running cost.

Problem: find $u(t)$, $t \in [t_0, t_f]$, such that $J(u(\cdot), x_0)$ is minimized, subject to $x(t)$ satisfying the plant equation with $x(t_0) = x_0$ given.

The solution converts the constrained optimal control problem into an unconstrained one using a Lagrange multiplier $\lambda(t) \in \mathbb{R}^n$:

$$\bar{J}(u, x_0) = J(u(\cdot), x_0) + \int_{t_0}^{t_f} \lambda^T(t)\left[f(x, u, t) - \dot{x}\right]dt.$$
Note that $\frac{d}{dt}\left(\lambda^T(t)x(t)\right) = \dot{\lambda}^T(t)x(t) + \lambda^T(t)\dot{x}(t)$, so

$$\int_{t_0}^{t_f} \lambda^T\dot{x}\,dt = \lambda^T(t_f)x(t_f) - \lambda^T(t_0)x(t_0) - \int_{t_0}^{t_f} \dot{\lambda}^T x\,dt.$$

Let us define the so-called Hamiltonian function

$$H(x, u, t) := L(x, u, t) + \lambda^T(t)f(x, u, t).$$

Necessary condition for optimality: the variation of the modified cost $\delta\bar{J}$ with respect to all feasible variations $\delta x(t)$, $\delta u(t)$ and $\delta\lambda(t)$ should vanish. Using the integration by parts above,

$$\bar{J} = \phi(x(t_f)) - \lambda^T(t_f)x(t_f) + \lambda^T(t_0)x(t_0) + \int_{t_0}^{t_f}\left[H(x(t), u(t), t) + \dot{\lambda}^T(t)x(t)\right]dt$$
$$\delta\bar{J} = \left[\phi_x - \lambda^T\right]\delta x(t_f) + \lambda^T(t_0)\delta x(t_0) + \int_{t_0}^{t_f}\left[(H_x + \dot{\lambda}^T)\delta x + H_u\,\delta u\right]dt + \int_{t_0}^{t_f}\delta\lambda^T\left[f(x(t), u(t), t) - \dot{x}\right]dt$$

Since $x(t_0) = x_0$ is fixed, $\delta x(t_0) = 0$. Otherwise, the variations $\delta x(t)$, $\delta u(t)$ and $\delta\lambda(t)$ are all feasible. Hence,

$$\dot{\lambda}^T = -H_x = -L_x - \lambda^T f_x \qquad (29)$$
$$\dot{x} = f(x, u, t) \qquad (30)$$
$$H_u = L_u + \lambda^T f_u = 0 \qquad (31)$$
$$\lambda^T(t_f) = \phi_x(x(t_f)) \qquad (32)$$
$$x(t_0) = x_0. \qquad (33)$$

This is a set of $2n$ differential equations (in $x$ and $\lambda$) with split boundary conditions at $t_0$ and $t_f$: $x(t_0) = x_0$ and $\lambda^T(t_f) = \phi_x(x(t_f))$.
Finite Time LQ Regulator Solution

Open-loop formulation: with

$$L(x, u, t) = \tfrac{1}{2}\left[x^T(t)Q(t)x(t) + u^T(t)R(t)u(t)\right], \quad \phi(x(t_f)) = \tfrac{1}{2}x^T(t_f)Sx(t_f), \quad f(x, u, t) = A(t)x + B(t)u,$$

let $\lambda(t) \in \mathbb{R}^n$ be the Lagrange multiplier. Using the above definitions in Eqs. (29)-(33), the optimal control is given by (see (31)):

$$u^o(t) = -R^{-1}B^T(t)\lambda(t)$$

where $\lambda(t)$ and $x(t)$ satisfy the Hamiltonian system ((29)-(30)):

$$\begin{pmatrix}\dot{x}\\ \dot{\lambda}\end{pmatrix} = \underbrace{\begin{pmatrix}A(t) & -B(t)R^{-1}B^T(t)\\ -Q(t) & -A^T(t)\end{pmatrix}}_{\text{Hamiltonian Matrix } H(t)}\begin{pmatrix}x\\ \lambda\end{pmatrix}$$

with boundary conditions given by (see (32)-(33)):

$$x(t_0) = x_0; \quad \lambda(t_f) = Sx(t_f). \qquad (34)$$
- Boundary conditions are specified at the initial time $t_0$ and the final time $t_f$ (a two-point boundary value problem). In general these are difficult to solve and require iterative methods such as the shooting method.
- The optimal control is open loop. It is computed by first computing $\lambda(t)$ for all $t \in [t_0, t_f]$ and then applying $u^o(t) = -R^{-1}B^T(t)\lambda(t)$.
- Open-loop control is not robust to disturbances or uncertainties.
Closed loop control solution

Consider $X_1(t) \in \mathbb{R}^{n\times n}$ and $X_2(t) \in \mathbb{R}^{n\times n}$ satisfying the Hamiltonian differential equation

$$\begin{pmatrix}\dot{X}_1\\ \dot{X}_2\end{pmatrix} = \underbrace{\begin{pmatrix}A(t) & -B(t)R^{-1}B^T(t)\\ -Q(t) & -A^T(t)\end{pmatrix}}_{\text{Hamiltonian Matrix } H}\begin{pmatrix}X_1\\ X_2\end{pmatrix}$$

with $X_1(t_f)$ non-singular (e.g. $X_1(t_f) = I_{n\times n}$) and $X_2(t_f) = SX_1(t_f)$. This requires solving the $2n \times n$ differential equations backward in time.

Claim: Assume that $X_1(t)$ is invertible for all $t \in [t_0, t_f]$. Then we can express the $x(t)$ and $\lambda(t)$ satisfying the Hamiltonian system as

$$\begin{pmatrix}x(t)\\ \lambda(t)\end{pmatrix} = \begin{pmatrix}X_1(t)\\ X_2(t)\end{pmatrix}v$$

for some constant $v \in \mathbb{R}^n$. Moreover, $\lambda(t) = \left[X_2(t)X_1^{-1}(t)\right]x(t)$.
This can be shown by direct substitution, with $v = X_1^{-1}(t_0)x_0$: these $x(t)$ and $\lambda(t)$ satisfy the Hamiltonian system as well as the boundary conditions.

This result implies that the optimal control can be expressed as closed-loop state feedback

$$u^o(t) = -R^{-1}B^T(t)\lambda(t) = -R^{-1}B^T(t)P(t)x(t), \quad \text{where } P(t) := X_2(t)X_1^{-1}(t) \in \mathbb{R}^{n\times n}.$$

Differentiating $P(t) := X_2(t)X_1^{-1}(t)$ and using the Hamiltonian equation (for $X_1(t)$ and $X_2(t)$), we find that $P(t)$ satisfies the continuous time Riccati differential equation (CTRDE):

$$\dot{P}(t) = -A^T(t)P(t) - P(t)A(t) + P(t)B(t)R^{-1}(t)B^T(t)P(t) - Q(t) \qquad (35)$$

with boundary condition $P(t_f) = S$.

$P(t)$ is symmetric and positive semidefinite. It is symmetric because $S$ is symmetric. To show that $P(t)$ is positive semidefinite, we first interpret $P(t)$ in
terms of the performance index. The claim is that the minimum cost is

$$J(u^o, x_0, t_0) = \tfrac{1}{2}x_0^T P(t_0)x_0,$$

where $u^o(t)$ is the optimal control. From the optimal control $u^o(t) = -R^{-1}(t)B^T(t)P(t)x(t)$ and the form of the quadratic cost function Eq. (28), we know⁴ that

$$J^o(x_0, t_0) = J(u^o, x_0, t_0) = \tfrac{1}{2}x_0^T\tilde{P}(t_0)x_0$$

for some positive semidefinite matrix $\tilde{P}(t_0)$. To show that $\tilde{P}(t_0) = P(t_0)$, we need to understand the Dynamic Programming Principle.

⁴ Note that the closed-loop system is linear, so that $x(t) = \Phi(t, t_0)x_0$.
Dynamic Programming (DP) Principle

Consider a shortest path problem in which we need to traverse a network starting from state $i_0$ and reach state 5 with minimal cost.

- The cost to traverse an arc from $i$ to $j$ is $a_{ij} > 0$; the cost to stay is $a_{ii} = 0$ for all $i$.
- Since there are only 4 non-destination states, state 5 can be reached in at most $N = 4$ steps.
- The total cost is the sum of the costs incurred, i.e. if the (non-optimal) control policy $\pi$ is $2 \to 2 \to 3 \to 4 \to 5$, then $J(\pi) = a_{22} + a_{23} + a_{34} + a_{45}$.

The goal is to find the policy that minimizes $J$.
As an optimization problem, the space of 4-step policies has a cardinality of $5^4 = 625$.

DP algorithm: We start from the end stage ($N = 4$), i.e. state 5 must be reached in one step. Suppose that you are in state $i$; the cost to reach state 5 is $\min\{a_{i5}\} = a_{i5}$. The optimal (and only) choice for the next state, if currently at state $i$, is $u^*(i, N) = 5$. The optimal cost-to-go is $J^*(i, N) = a_{i5}$.

(Table of $u^*(i, N)$ and $J^*(i, N)$ for each node.)

Consider the $(N-1)$st stage, in state $i$. A policy has the form $\pi: i \to j \to 5$. Since the minimum cost to reach state 5 from state $j$ is $J^*(j, N)$, the optimal cost is

$$\min_j\,(a_{ij} + J^*(j, N)) = \min\{a_{i1} + a_{15},\; a_{i2} + a_{25},\; \ldots,\; a_{i5} + a_{55}\}$$
For $i = 4$ (for instance), tabulate $a_{4j} + J^*(j, N)$ over $j$. The $j$ that optimizes this is $j = 4$ (stay put), so that $u^*(4, N-1) = 4$ and $J^*(4, N-1) = 3$.

Doing this for each $i$, we have at stage $N-1$:

- Optimal policy: $u^*(i, N-1) = \arg\min_j\,(a_{ij} + J^*(j, N))$
- Optimal cost-to-go: $J^*(i, N-1) = \min_j\,(a_{ij} + J^*(j, N))$

(Table of $u^*(i, N-1)$ and $J^*(i, N-1)$ for each node.)

If we are at the $(N-2)$nd stage, in state $i$:
- Optimal policy: $u^*(i, N-2) = \arg\min_j\,(a_{ij} + J^*(j, N-1))$
- Optimal cost-to-go: $J^*(i, N-2) = \min_j\,(a_{ij} + J^*(j, N-1))$

(Table of $u^*(i, N-2)$ and $J^*(i, N-2)$ for each node.)

Notice that from state 2, the 3-step policy has a lower cost (4.5) than the 2-step policy (5.5).

Repeat the propagation procedure for the optimal policy and optimal cost-to-go above until $N = 1$. Then the optimal policy is $u^*(i, 1)$ and the minimum cost is $J^*(i, 1)$.
The optimal sequence starting at $i_0$ is:

$$i_0 \to u^*(i_0, 1) \to u^*(u^*(i_0, 1), 2) \to u^*(u^*(u^*(i_0, 1), 2), 3) \to 5$$

Remarks:

- At each stage $k$, the optimal policy $u^*(i, k)$ is a state-feedback policy, i.e. it determines what to do depending on the state that you are in.
- The policy and optimal cost-to-go are computed backwards in time (stage).
- At each stage, the optimization is done over the space of intermediate states, which has cardinality 5. The large optimization problem with cardinality $5^4$ has been reduced to 4 simpler optimization problems with cardinality 5 each.
- The tail end of the optimal sequence is optimal: this is the Dynamic Programming Principle. That is, if
the optimal 4-step sequence $\pi_4^*$ starting at $i_0$ is

$$i_0 \to u^*(i_0, 1) \to u^*(u^*(i_0, 1), 2) \to u^*(u^*(u^*(i_0, 1), 2), 3) \to 5$$

then the sub-sequence $\pi_3^*$

$$u^*(i_0, 1) \to u^*(u^*(i_0, 1), 2) \to u^*(u^*(u^*(i_0, 1), 2), 3) \to 5$$

is the optimal 3-step sequence starting at $u^*(i_0, 1)$. This is so because if $\pi_3$ were another 3-step sequence starting at $u^*(i_0, 1)$ with a strictly lower cost than $\pi_3^*$, then the 4-step sequence $i_0 \to \pi_3$ would also have a lower cost than $\pi_4^* = i_0 \to \pi_3^*$, which is assumed to be optimal.
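The backward recursion above can be sketched in a few lines. The arc costs from the notes' network figure were not transcribed, so the values below are illustrative assumptions; the recursion itself follows the stage-wise minimization described above.

```python
# Sketch of the backward DP recursion for the shortest-path example.
# Arc costs a[i][j] are assumed for illustration (states 1..5,
# destination 5, a[i][i] = 0); they are not the notes' values.
INF = float("inf")
a = {
    1: {1: 0, 2: 1.0, 5: 6.0},
    2: {2: 0, 1: 1.0, 3: 2.0, 5: 5.5},
    3: {3: 0, 2: 2.0, 4: 1.5, 5: 3.0},
    4: {4: 0, 3: 1.5, 5: 3.0},
    5: {5: 0},
}
N = 4  # horizon: state 5 is reachable in at most 4 steps

# Last stage: the only choice is to jump to state 5
J = {N: {i: a[i].get(5, INF) for i in a}}
u = {N: {i: 5 for i in a}}
# Propagate optimal policy and cost-to-go backwards in stage
for k in range(N - 1, 0, -1):
    J[k], u[k] = {}, {}
    for i in a:
        # minimize immediate arc cost plus successor cost-to-go
        j_best = min(a[i], key=lambda j: a[i][j] + J[k + 1][j])
        u[k][i] = j_best
        J[k][i] = a[i][j_best] + J[k + 1][j_best]
```

With these assumed costs, `J[1][i]` is the minimum cost from state `i` with the full horizon, and it can never exceed the one-step cost `J[4][i]`, illustrating that longer horizons only help.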
Dynamic Programming (DP) Principle - Continuous Time

System: $\dot{x} = f(x(t), u(t), t)$, $x(t_0) = x_0$. Cost index:

$$J(u(\cdot), t_0) = \int_{t_0}^{t_f} L(x(t), u(t), t)\,dt + \phi(x(t_f)). \qquad (36)$$

Suppose that $u^o(t)$, $t \in [t_0, t_f]$, minimizes (36) subject to $x(t_0) = x_0$, and $x^o(t)$ is the associated state trajectory. Let the minimum cost achieved using $u^o(t)$ be

$$J^o(x_0, t_0) = \min_{u(\tau),\, \tau \in [t_0, t_f]} J(u(\cdot), t_0).$$

Then, for any $\Delta t$ such that $t_0 \le t_0 + \Delta t \le t_f$, the restriction of the control $u^o(\tau)$ to $\tau \in [t_0 + \Delta t, t_f]$ minimizes

$$J(u(\cdot), t_0 + \Delta t) = \int_{t_0 + \Delta t}^{t_f} L(x(t), u(t), t)\,dt + \phi(x(t_f))$$

subject to the initial condition $x(t_0 + \Delta t) = x^o(t_0 + \Delta t)$. I.e., $u^o(\tau)$ is optimal over the sub-interval.
Typical application of DP

Solve the optimal control problem for the sub-interval $[t_1, t_f]$ with arbitrary initial state $x(t_1) = x_1$. Let the optimal control be $u(t) = u^o(t; t_1, x_1)$ and let $J^o(x_1, t_1)$ be the optimal cost given initial state $x(t_1) = x_1$.

Now consider $t_0 < t_1$. The optimal control $u^o(t; t_0, x_0)$ for the interval $[t_0, t_f]$ with initial state $x(t_0) = x_0$ is given as follows.

For $t_0 \le t \le t_1$, $u^o(t; t_0, x_0)$ is the $u(t)$ that minimizes

$$\int_{t_0}^{t_1} L(x(t), u(t), t)\,dt + J^o(x(t_1), t_1)$$

subject to $\dot{x}(t) = f(x(t), u(t), t)$. Notice that $x(t_1)$ is unknown a priori since it depends on $u(t)$.

For $t_1 \le t \le t_f$, the optimal control is $u^o(t; t_0, x_0) = u^o(t; t_1, x(t_1))$
where $x(t_1)$ is the state reached at $t = t_1$ from the initial state $x_0$ using the optimal control $u^o(t; t_0, x_0)$ over the interval $[t_0, t_1]$.

This procedure can be repeated by taking the initial time further and further back.

Note: the optimal cost $J^o(x, t)$ is the cost-to-go function at time $t$.
Relating P(t) to cost-to-go

Let us apply DP to the LQ case (note: without the 1/2, for simplicity):

$$L(x, u, t) = x^T Q(t)x + u^T R(t)u, \quad f(x, u, t) = A(t)x + B(t)u, \quad J = \int_{t_0}^{t_f} L(x, u, t)\,dt + \phi(x(t_f)).$$

At $t = t_f$, the cost-to-go function is simply

$$J^o(x, t_f) = x^T Sx = x^T\tilde{P}(t_f)x.$$

Hence $\tilde{P}(t_f) = S$.

Let $t_1 = t_f$ and consider $t = t_1 - \Delta t$, where $\Delta t$ is infinitesimally small. The optimal control at $t$, given the state $x(t)$, minimizes

$$\min_{u(t)}\; L(x, u, t)\,\Delta t + J^o(x(t_1), t_1).$$

Now $x(t_1) = x(t) + f(x(t), u(t), t)\,\Delta t$. Thus, we
minimize w.r.t. $u(t)$:

$$\left[x^T(t)Q(t)x(t) + u^T(t)R(t)u(t)\right]\Delta t + J^o\!\left(x(t) + [A(t)x(t) + B(t)u(t)]\Delta t,\; t_1\right)$$
$$\approx \left[x^T(t)Q(t)x(t) + u^T(t)R(t)u(t)\right]\Delta t + x^T(t)\tilde{P}(t_1)x(t) + \left[x^T(t)A^T(t) + u^T(t)B^T(t)\right]\tilde{P}(t_1)x(t)\,\Delta t + x^T(t)\tilde{P}(t_1)\left[A(t)x(t) + B(t)u(t)\right]\Delta t$$

Differentiating w.r.t. $u(t)$, we get the optimal control policy:

$$u^{oT}R(t) + x^T(t)\tilde{P}(t_1)B(t) = 0 \;\Rightarrow\; u^o(t) = -R^{-1}(t)B^T(t)\tilde{P}(t_1)x(t)$$

The updated optimal cost-to-go function is

$$J^o(x(t), t) \approx \left[x^T(t)Q(t)x(t) + u^{oT}(t)R(t)u^o(t)\right]\Delta t + x^T(t)\tilde{P}(t_1)x(t) + \left[x^T(t)A^T(t) + u^{oT}(t)B^T(t)\right]\tilde{P}(t_1)x(t)\,\Delta t + x^T(t)\tilde{P}(t_1)\left[A(t)x(t) + B(t)u^o(t)\right]\Delta t$$
This shows that

$$J^o(x(t), t) \approx x^T(t)\tilde{P}(t_1)x(t) + x^T(t)\left[A^T(t)\tilde{P}(t_1) + \tilde{P}(t_1)A(t) - \tilde{P}(t_1)B(t)R^{-1}(t)B^T(t)\tilde{P}(t_1) + Q(t)\right]x(t)\,\Delta t = x^T(t)\tilde{P}(t)x(t)$$

where

$$\tilde{P}(t_1) - \tilde{P}(t) = -\left[A^T(t)\tilde{P}(t_1) + \tilde{P}(t_1)A(t) - \tilde{P}(t_1)B(t)R^{-1}(t)B^T(t)\tilde{P}(t_1) + Q(t)\right]\Delta t \qquad (37)$$

Thus, we have shown that at $t$, $J^o(x(t), t) = x^T(t)\tilde{P}(t)x(t)$. Let $t_1 \leftarrow t$, $t \leftarrow t - \Delta t$, repeat the process, and we get the update recursion in Eq. (37).
As $\Delta t \to 0$, Eq. (37) becomes

$$\dot{\tilde{P}}(t) = -A^T(t)\tilde{P}(t) - \tilde{P}(t)A(t) + \tilde{P}(t)B(t)R^{-1}(t)B^T(t)\tilde{P}(t) - Q(t),$$

which is exactly the Riccati differential equation as before. Hence $\tilde{P}(t) = P(t)$.

Note: since, along the optimal trajectory,

$$x^T(t)P(t)x(t) = \int_t^{t_f}\left[x^T(\tau)Q(\tau)x(\tau) + u^T(\tau)R(\tau)u(\tau)\right]d\tau + x^T(t_f)Sx(t_f) \ge 0$$

for any $x(t)$, $P(t)$ is positive semidefinite for any $t \le t_f$.
Finite time LQ Summary

The finite time LQ regulator problem is solved by the control

$$u^*(t) = -R^{-1}(t)B^T(t)P(t)x(t) \qquad (38)$$

where $P(t) \in \mathbb{R}^{n\times n}$ is the solution to the continuous time Riccati Differential Equation (CTRDE)

$$\dot{P}(t) = -A^T(t)P(t) - P(t)A(t) + P(t)B(t)R^{-1}(t)B^T(t)P(t) - Q(t)$$

with boundary condition $P(t_f) = S$. $P(t)$ is positive semidefinite.

The minimum cost achieved using the above control is

$$J^*(x_0, t_0) := \min_{u(\cdot)} J(u, x_0) = \tfrac{1}{2}x_0^T P(t_0)x_0$$
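The summary above can be sketched numerically: integrate the CTRDE backward from $P(t_f) = S$ and form the gain $K(t) = R^{-1}B^T P(t)$. This is an illustrative fixed-step sketch, not from the notes; the double-integrator data and horizon are assumptions. Over a long horizon, $P(t_0)$ approaches the steady-state ARE solution, which for this plant is $\begin{pmatrix}\sqrt{3} & 1\\ 1 & \sqrt{3}\end{pmatrix}$.

```python
import numpy as np

# Backward Euler march of the CTRDE for an assumed double-integrator
# plant; a minimal sketch, not a production integrator.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
S = np.zeros((2, 2))           # terminal weight, P(tf) = S

tf, dt = 10.0, 1e-3
Rinv = np.linalg.inv(R)
P = S.copy()
# Pdot = -A'P - PA + P B R^-1 B' P - Q, marched from tf down to t0
for _ in range(int(tf / dt)):
    Pdot = -A.T @ P - P @ A + P @ B @ Rinv @ B.T @ P - Q
    P = P - dt * Pdot          # step backward in time
    P = 0.5 * (P + P.T)        # keep symmetry against round-off

K = Rinv @ B.T @ P             # feedback gain at t = t0
```

In practice $P(t)$ would be stored over a time grid (Remark 3 below) so the time-varying gain $K(t)$ can be replayed forward.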
Remarks

1. The control formulation works for time-varying systems, e.g. nonlinear systems linearized about a trajectory.
2. The optimal control law is in the form of a time-varying linear state feedback with feedback gain $K(t) := R^{-1}(t)B^T(t)P(t)$, although the control problem is formulated to ask for an open-loop control. The open-loop optimal control can be obtained, if so desired, by integrating (27) with the control (38). It is, however, much better to utilize feedback than open loop.
3. $P(t)$ is solved backwards in time from $t_f$ to $t_0$ and should be stored in memory before use.
4. The matrix function $P(t)$ is associated with the so-called cost-to-go function. If at time $t$, $t_0 \le t \le t_f$, the state happens to be $x(t)$, then the control policy (38) for the remaining time period $[t, t_f]$ is also optimal for the problem (28) $J(u, x(t), t, t_f)$ (i.e. with $t_0$ substituted by $t$ and $x_0$ substituted by
$x(t)$). In this case, the minimum cost is

$$\min_u J(u, x(t), t) = \tfrac{1}{2}x^T(t)P(t)x(t)$$
Infinite Horizon LQ

$$\dot{x} = Ax + Bu; \quad x(t_0) = x_0$$
$$\min_u J(u, x_0) = \min_u \int_{t_0}^{t_f}\left[x^T Qx + u^T Ru\right]dt + x^T(t_f)Sx(t_f)$$

- Solve $P(t, t_f)$ in (35) backwards in time. Does $\lim_{t\to-\infty} P(t, t_f)$ exist (i.e. does it converge)?
- If $\lim_{t\to-\infty} P(t, t_f) = \lim_{t_f\to\infty} P(t, t_f) = \bar{P}$ does exist, there is a constant state-feedback gain given by $K = R^{-1}B^T\bar{P}$. Will the closed-loop system $\dot{x} = (A - BK)x$ be stable?
- If the limit $\bar{P}$ does exist, we know that it must satisfy $\dot{P}(t) = 0$, i.e.

$$A^T\bar{P} + \bar{P}A - \bar{P}BR^{-1}B^T\bar{P} + Q = 0, \qquad (39)$$
which is called the Algebraic Riccati Equation (ARE). In that case, which solution of the ARE does the asymptotic solution of (35) correspond to?
Boundedness of P(t, t_f)

Solve $P(t)$ backwards in time from $t = t_f$ according to the CTRDE with boundary condition $P(t_f) = S$. In this section we assume that $S = 0$. This is reasonable when $t_f \to \infty$, since the running cost over an infinite horizon should dominate the cost at the terminal time. See the textbook for the case $S \ne 0$.

Proposition: If $(A, B)$ is controllable (or just stabilizable), then for any $t < t_f$, $P(t, t_f) < M$, where $M$ is a positive definite matrix, in the sense that for all $x \in \mathbb{R}^n$, $x^T P(t, t_f)x < x^T Mx$. Moreover, as $T \to \infty$, $P(t, t+T)$ (which is the same as $P(t-T, t)$) converges to some positive definite matrix $\bar{P}$.

Proof: We give the proof for $(A, B)$ controllable. Let $\Delta t > 0$ be an arbitrary fixed time interval. For
any initial time $t < t_f - \Delta t$ and initial state $x_0$, we can design a control $u(\tau)$, $\tau \in [t, t + \Delta t]$, such that $x(t + \Delta t) = 0$, and set $u(\tau) = 0$ for $\tau > t + \Delta t$. The cost associated with this control is finite and independent of $t$. By choosing different $x_0$, we can define a positive definite matrix $M$ such that $x^T Mx$ is the cost for initial state $x$ using the control thus constructed.

Secondly, notice that for any $\Delta > 0$,

$$J(u, t, t_f + \Delta) = \int_t^{t_f}\left[x^T Qx + u^T Ru\right]d\tau + \underbrace{\int_{t_f}^{t_f+\Delta}\left[x^T Qx + u^T Ru\right]d\tau}_{\ge 0} \;\ge\; J(u, t, t_f).$$

The optimal cost for the interval $[t, t_f + \Delta]$ must be greater than or equal to the optimal cost for $J(u, t, t_f)$, i.e. the optimal cost increases as the time interval increases, for the same initial condition. Suppose not: then the $[t, t_f]$ portion of the optimal cost for $[t, t_f + \Delta]$ would be less than the supposed optimum for the interval $[t, t_f]$, which is a contradiction.
This shows that for any $\Delta$ and $x \in \mathbb{R}^n$,

$$x^T P(t, t_f)x \le x^T P(t, t_f + \Delta)x, \qquad x^T P(t, t_f)x \le x^T Mx.$$

From analysis, we know that a non-decreasing, upper-bounded function converges; thus $x^T P(t, t_f + \Delta)x$ converges as $\Delta \to \infty$. By choosing various $x$, a matrix $\bar{P}$ can be constructed such that for any $x$, $x^T P(t, t_f + \Delta)x \to x^T\bar{P}x$.
Stability

Proposition: Let $Q = C^T C$ and suppose that $(A, B)$ is stabilizable. If $(A, C)$ is observable (or detectable), then the optimal closed-loop control system

$$\dot{x} = (A - BR^{-1}B^T\bar{P})x$$

is stable. Furthermore, if $(A, C)$ is observable, then $\bar{P}$ is positive definite; otherwise, $\bar{P}$ is positive semidefinite.

Proof: Suppose that $(A, C)$ is detectable but the closed-loop system is unstable. Let $\nu$ be an unstable eigenvector of $A - BR^{-1}B^T\bar{P}$, i.e.

$$\lambda\nu = (A - BR^{-1}B^T\bar{P})\nu; \quad \mathrm{Re}(\lambda) > 0.$$

Let $x(t_0) = \nu$ be the initial state. Then $x(t) = e^{\lambda(t-t_0)}\nu$. Since $(A, B)$ is stabilizable, the optimal cost is finite:

$$\int_{t_0}^\infty x^T Qx\,dt < \infty; \qquad \int_{t_0}^\infty u^T Ru\,dt < \infty$$
We assume $\lambda$ is real below, for simplicity. If $\lambda$ is complex, we need to consider both $\lambda$ and $\lambda^*$ simultaneously. Then, since $e^{\lambda(t-t_0)} > 1$ for all $t - t_0 > 0$,

$$\int_{t_0}^\infty \nu^T Q\nu\, e^{2\lambda(t-t_0)}\,dt < \infty \;\Rightarrow\; C\nu = 0,$$
$$\int_{t_0}^\infty u^T Ru\,dt = \nu^T\left[\bar{P}BR^{-1}B^T\bar{P}\right]\nu\int_{t_0}^\infty e^{2\lambda(t-t_0)}\,dt < \infty \;\Rightarrow\; R^{-1}B^T\bar{P}\nu = 0.$$

This implies that

$$(A - BR^{-1}B^T\bar{P})\nu = A\nu = \lambda\nu.$$

This contradicts the assumption that $(A, C)$ is detectable, since

$$\begin{pmatrix}\lambda I - A\\ C\end{pmatrix}\nu = \begin{pmatrix}0\\ 0\end{pmatrix}$$

with $\mathrm{Re}(\lambda) > 0$. Hence $(A, C)$ detectable implies that the closed-loop system is stable.
To show that $\bar{P}$ is strictly positive definite when $(A, C)$ is observable, suppose that $\bar{P}$ is merely positive semidefinite, so that

$$x_0^T\bar{P}x_0 = \int_{t_0}^\infty\left[x^T C^T Cx + u^T Ru\right]dt = 0$$

for some initial state $x(t_0) = x_0 \ne 0$. This implies that for all $t$, $u^T(t)Ru(t) = 0$ (i.e. $u(t) = 0$) and $Cx(t) = 0$. So, for all $t$, $\dot{x} = Ax$ with $Cx = 0$. This is not possible if $(A, C)$ is observable.

If $(A, C)$ is merely detectable, let $\nu$ be an unobservable (hence stable) eigenvector. Then, for $x(t_0) = \nu$, $u = 0$ is the optimal control and $x(t) = e^{\lambda(t-t_0)}\nu$ is the state trajectory, since $\int_{t_0}^\infty x^T(t)C^T Cx(t)\,dt = 0$. Thus $\nu^T\bar{P}\nu = 0$.
Solving the ARE via the Hamiltonian Matrix

For the infinite time horizon LQ problem, with $(A, B)$ stabilizable and $(A, C)$ detectable, the steady-state solution $\bar{P}$ of the CTRDE must satisfy the Algebraic Riccati Equation (ARE) (i.e. set $\dot{P} = 0$):

$$A^T\bar{P} + \bar{P}A - \bar{P}BR^{-1}B^T\bar{P} + Q = 0. \qquad (40)$$

This is a nonlinear (quadratic) algebraic matrix equation; there are generally multiple solutions. Is it possible to solve this without integrating the CTRDE (35)?

Recall that the solution $P(t)$ can be obtained from the matrix Hamiltonian equation

$$\begin{pmatrix}\dot{X}_1\\ \dot{X}_2\end{pmatrix} = \underbrace{\begin{pmatrix}A & -BR^{-1}B^T\\ -Q & -A^T\end{pmatrix}}_{\text{Hamiltonian Matrix } H}\begin{pmatrix}X_1\\ X_2\end{pmatrix} \qquad (41)$$

with boundary conditions $X_1(t_f)$ invertible and $X_2(t_f) = SX_1(t_f)$, so that $P(t) = X_2(t)X_1^{-1}(t)$.

Denote the $2n$ eigenvalues and $2n$ eigenvectors of $H$ by, respectively, $\{\lambda_1, \lambda_2, \ldots, \lambda_{2n}\}$ and $\{e_1, e_2, \ldots, e_{2n}\}$.
Let us choose $n$ pairs of these:

$$\Lambda = \mathrm{diag}\{\lambda_{i_1}, \lambda_{i_2}, \ldots, \lambda_{i_n}\}, \qquad \begin{pmatrix}F\\ G\end{pmatrix} = \begin{pmatrix}e_{i_1} & e_{i_2} & \cdots & e_{i_n}\end{pmatrix}.$$

Proposition: Let $\bar{P} := GF^{-1}$, where the columns of $\binom{F}{G} \in \mathbb{R}^{2n\times n}$ are $n$ of the eigenvectors of $H$. Then $\bar{P}$ satisfies the Algebraic Riccati Equation (40).

Proof: We know that $P(t) = X_2(t)X_1^{-1}(t)$, where $X_1(t)$ and $X_2(t)$ satisfy the Hamiltonian differential equation (41). For $\bar{P} = GF^{-1}$ to satisfy Eq. (40), one needs only show that $\dot{P}(t) = 0$ when $X_1(t) = F$ and $X_2(t) = G$. This is so because

$$\begin{pmatrix}F\\ G\end{pmatrix}\Lambda = \underbrace{\begin{pmatrix}A & -BR^{-1}B^T\\ -Q & -A^T\end{pmatrix}}_{\text{Hamiltonian Matrix } H}\begin{pmatrix}F\\ G\end{pmatrix} \qquad (42)$$

so that, with $\dot{X}_1 = F\Lambda$ and $\dot{X}_2 = G\Lambda$,

$$\dot{P} = \dot{X}_2X_1^{-1} + X_2\frac{dX_1^{-1}}{dt} = G\Lambda F^{-1} - GF^{-1}(F\Lambda)F^{-1} = G\Lambda F^{-1} - G\Lambda F^{-1} = 0.$$
This proposition shows that there are $C_n^{2n} = \binom{2n}{n} = \frac{(2n)!}{n!\,n!}$ solutions $\bar{P}$, depending on which $n$ of the $2n$ eigenvectors of $H$ are picked to define $F$ and $G$.

Proposition: Suppose that $(A, B)$ is stabilizable and $(A, C)$ is detectable. Then the eigenvalues of $H$ are symmetrically located across the imaginary and real axes, with no eigenvalues on the imaginary axis.

Proof: Consider the invertible coordinate transformation

$$T = \begin{pmatrix}I & 0\\ \bar{P} & I_n\end{pmatrix}, \qquad T^{-1} = \begin{pmatrix}I & 0\\ -\bar{P} & I_n\end{pmatrix}.$$

Hence,

$$T^{-1}HT = \begin{pmatrix}\underbrace{A - BR^{-1}B^T\bar{P}}_{A_c} & -BR^{-1}B^T\\ 0 & \underbrace{-(A - BR^{-1}B^T\bar{P})^T}_{-A_c^T}\end{pmatrix}.$$

Since $T^{-1}HT$ and $H$ share the same eigenvalues, this shows that $H$ contains the eigenvalues of $A_c$ as well
as those of $-A_c^T$. Hence $n$ eigenvalues of $H$ must lie in the closed RHP and $n$ eigenvalues in the closed LHP; in other words, the eigenvalues of $H$ are symmetrically located about both the real and imaginary axes. Further, $A_c$ (and hence $H$) cannot have any eigenvalues on the imaginary axis, for otherwise the optimal cost would be infinite. Hence $H$ must have $n$ eigenvalues in the open LHP and $n$ in the open RHP.

Since we know that the closed-loop system matrix

$$A_c = A - BR^{-1}B^T\bar{P} = A - BR^{-1}B^T GF^{-1}$$

must be stable if $(A, B)$ is stabilizable and $(A, C)$ is detectable, we have the following result.

Proposition: Suppose that $(A, B)$ is stabilizable and $(A, C)$ is detectable. The steady-state solution of the CTRDE is $\bar{P} = GF^{-1}$, where $\binom{F}{G}$ is chosen to consist of the $n$ eigenvectors that correspond to the stable eigenvalues of $H$.
Proof: Since $A_c = A - BR^{-1}B^T GF^{-1}$,

$$A_cF = \begin{pmatrix}A & -BR^{-1}B^T\end{pmatrix}\begin{pmatrix}F\\ G\end{pmatrix} = F\Lambda$$

where the last equality is obtained from the top block row of Eq. (42). Hence $\mathrm{diag}(\Lambda)$ consists of the eigenvalues of $A_c$, and the columns of $F$ are the eigenvectors. Since $(A, B)$ is stabilizable and $(A, C)$ is detectable, $A_c$ is stable. Thus $\Lambda$ must have negative real parts.

Remark: Integrating the Hamiltonian system is not a good idea, either in forward time or in reverse time, since either way will be unstable. Integrating the Riccati equation backwards in time is more reliable. The Hamiltonian matrix is useful for solving for the solution to the ARE, though, via its eigenvalues and eigenvectors.
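The eigenvector construction above translates directly into code: form $H$, keep the $n$ stable eigenvectors, and set $\bar{P} = GF^{-1}$. A minimal sketch, with an assumed double-integrator plant:

```python
import numpy as np

# ARE solution via the stable eigenvectors of the Hamiltonian matrix.
# Plant data below are illustrative assumptions.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
n = A.shape[0]
Rinv = np.linalg.inv(R)

H = np.block([[A, -B @ Rinv @ B.T],
              [-Q, -A.T]])
w, V = np.linalg.eig(H)
stable = V[:, w.real < 0]           # the n eigenvectors with Re(lambda) < 0
F, G = stable[:n, :], stable[n:, :]
P = np.real(G @ np.linalg.inv(F))   # P_bar = G F^{-1}
P = 0.5 * (P + P.T)                 # symmetrize against round-off

K = Rinv @ B.T @ P
Ac = A - B @ K                      # closed loop, should be stable
```

For this plant the residual of (40) vanishes and all eigenvalues of `Ac` have negative real part, consistent with the proposition.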
Selection of Q and R

The quality of the control design depends on the choice of $Q$ and $R$ (and, for finite time, $S$ also). How should one choose these? Some suggestions, mostly taken from (Anderson and Moore, 1990):

- Generally an iterative design/simulation process is needed.
- If there is a specific output $z = Cx$ that needs to be kept small, choose $Q = C^T C$.
- Use physically meaningful state and control variables, and use physical insight to select $Q$ and $R$.
- Choose $Q$ and $R$ to be diagonal in the absence of information about coupling.
- Obtain acceptable excursions $|x_i(t)| \le x_{i,\max}$, $|u_i(t)| \le u_{i,\max}$, $|x_i(t_f)| \le x_{i,f\max}$, then choose the diagonal entries of $Q$, $R$ and $S$ to be inversely proportional to $x_{i,\max}^2$, $u_{i,\max}^2$ and $x_{i,f\max}^2$, respectively.
- Off-diagonal terms in $Q$ reflect coupling. E.g., to coordinate $x_1 = kx_2$, one can choose $C = \begin{pmatrix}1 & -k\end{pmatrix}$, so that $Q = \begin{pmatrix}1 & -k\\ -k & k^2\end{pmatrix}$. One can add other objectives to $Q$.
- For the finite-time regulator problem with time interval $T$, the running-cost and terminal-cost objectives should be scaled by $1/T$ and by the dimensions of $x \in \mathbb{R}^n$ and $u \in \mathbb{R}^m$:

$$\int_{t_0}^{t_f = t_0 + T}\frac{1}{nT}x^T Qx + \frac{1}{mT}u^T Ru\,dt + x^T(t_f)Sx(t_f),$$

where $Q$, $R$ and $S$ are selected based on separate $x(t)$, $u(t)$ and $x(t_f)$ criteria. Additional relative scalings should be determined iteratively:

- If $R = \mathrm{diag}[r_1, r_2, r_3]$ and, after simulation, $u_2$ is too large, increase $r_2$.
- If, after simulation, the state $x_3$ is too large, modify $Q$ such that $x^T Qx \to x^T Qx + \gamma x_3^2$, etc.
- If performance is related to frequency, use frequency weighting (see below).
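The inverse-square-excursion rule above (often called Bryson's rule) is a one-liner per matrix. The excursion limits below are illustrative assumptions, not values from the notes:

```python
import numpy as np

# Bryson-style starting guess: weight each channel by the inverse square
# of its acceptable excursion. Limits are assumed for illustration.
x_max = np.array([0.1, 1.0])     # acceptable |x_i(t)| excursions
u_max = np.array([2.0])          # acceptable |u_i(t)| excursions
xf_max = np.array([0.01, 0.1])   # acceptable terminal |x_i(tf)|

Q = np.diag(1.0 / x_max**2)
R = np.diag(1.0 / u_max**2)
S = np.diag(1.0 / xf_max**2)
```

These are only a starting point; as the slide says, the scalings are then refined iteratively from simulation results.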
Summary of Infinite Horizon LQ

Performance criteria:

$$\dot{x} = Ax + Bu; \quad x(t_0) = x_0, \qquad J(u, x_0, t_0, t_f) = \frac{1}{2}\int_{t_0}^{t_f}\left[x^T(t)Qx(t) + u^T(t)Ru(t)\right]dt$$

Solution:

$$\dot{P}(t) = -A^T P(t) - P(t)A + P(t)BR^{-1}B^T P(t) - Q; \qquad P(t_f) = 0.$$

If $(A, B)$ is controllable (or stabilizable), then $\bar{P} = P(t \to -\infty)$ (the same as $P(t_0)$ as $t_f \to \infty$) exists. The optimal control is

$$u^o(t) = -\underbrace{R^{-1}B^T\bar{P}}_{K}\,x(t)$$

and the cost performance is $J^o(x_0) = \frac{1}{2}x_0^T\bar{P}x_0$.
Furthermore, let $Q = C^T C$. If $(A, C)$ is observable (or detectable), then $A - BK$ is stable.

Thus, if

- $(A, B)$ is controllable, or at least stabilizable,
- $R$ is a suitable choice of positive definite matrix,
- $Q = C^T C$ with $(A, C)$ observable, or at least detectable,

then the LQ control methodology automatically generates a feedback gain $K$ such that $A - BK$ is stable.

The solution of the ARE can be generated by solving for the eigenvalues/eigenvectors of the $2n \times 2n$ Hamiltonian matrix in Eq. (41). The eigenvalues are symmetrically located w.r.t. both the real and imaginary axes. The true solution (i.e. $\bar{P} = \lim_{t_f\to\infty}P(t)$) is the one associated with the stable eigenvalues of the Hamiltonian matrix.
LQ Regulator for Discrete Time Systems

Consider the discrete time system

$$x(k+1) = A(k)x(k) + B(k)u(k); \quad x(k_0) = x_0,$$

with the performance criteria given by

$$J(u(\cdot), x_0) = \frac{1}{2}x^T(k_f)Sx(k_f) + \frac{1}{2}\sum_{k=k_0}^{k_f-1}\left[x^T(k)Qx(k) + u^T(k)Ru(k)\right].$$

The optimal control is given by

$$u^o(k) = -K(k)x(k), \qquad K(k) = \left[R(k) + B^T(k)P(k+1)B(k)\right]^{-1}B^T(k)P(k+1)A(k)$$

where $P(k)$ is the solution to the discrete time Riccati difference equation

$$P(k) = Q(k) + A^T(k)P(k+1)A(k) - A^T(k)P(k+1)B(k)\left[R(k) + B^T(k)P(k+1)B(k)\right]^{-1}B^T(k)P(k+1)A(k); \qquad P(k_f) = S.$$
The optimal cost-to-go at time $k$ is

$$J^o(x(k), k) = \tfrac{1}{2}x^T(k)P(k)x(k).$$

Notice that the positive definiteness condition on $R(k)$ implies that $R(k) + B^T(k)P(k+1)B(k)$ is invertible.

Exercise: Derive the discrete time LQ result using dynamic programming.
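The discrete-time recursion above runs forward in code but backward in stage index. A minimal sketch with an assumed discretized double-integrator plant (all numerical values are illustrative assumptions):

```python
import numpy as np

# Backward Riccati difference recursion and gain schedule for a
# finite-horizon discrete-time LQ problem; plant data are assumed.
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # e.g. a discretized double integrator
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
S = np.zeros((2, 2))
kf = 50

P = S.copy()
gains = []
for k in range(kf - 1, -1, -1):           # k = kf-1, ..., 0
    M = R + B.T @ P @ B                   # invertible since R > 0
    K = np.linalg.solve(M, B.T @ P @ A)   # K(k)
    P = Q + A.T @ P @ A - A.T @ P @ B @ K # Riccati difference update
    gains.append(K)
gains.reverse()                           # gains[k] is K(k)
```

After the loop, `P` is $P(k_0)$, so the optimal cost from $x_0$ is $\frac{1}{2}x_0^T P x_0$, and `gains[k]` gives the stage-varying feedback $u^o(k) = -K(k)x(k)$.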
Discrete time LTI LQR

For $A$, $B$, $Q$ and $R$ constant, we have $P(k \to -\infty) \to \bar{P}$, which satisfies the discrete time Algebraic Riccati Equation (DARE):

$$A^T\bar{P}A - A^T\bar{P}B\left[R + B^T\bar{P}B\right]^{-1}B^T\bar{P}A + Q = \bar{P}$$

If $(A, B)$ is controllable and $(A, C)$ (where $Q = C^T C$) is observable, then $\bar{P}$ is the unique positive definite solution. If $(A, C)$ is only detectable, then $\bar{P}$ is the positive semidefinite solution.

The feedback gain is then

$$K = \left[R + B^T\bar{P}B\right]^{-1}B^T\bar{P}A.$$

The closed-loop system is stable, meaning that all eigenvalues of $A - BK$ have magnitudes less than 1 (they lie in the unit disk centered at the origin).
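For constant matrices, iterating the Riccati difference equation to a fixed point is one simple way to approximate $\bar{P}$ and check the unit-disk property. A sketch, with assumed plant data:

```python
import numpy as np

# Fixed-point iteration of the discrete Riccati equation toward the
# DARE solution; plant data are illustrative assumptions.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P = np.zeros_like(Q)
for _ in range(2000):                     # iterate to (near) convergence
    M = R + B.T @ P @ B
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(M, B.T @ P @ A)

K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
rho = max(abs(np.linalg.eigvals(A - B @ K)))   # spectral radius of A - BK
```

At convergence the DARE residual is essentially zero and `rho < 1`, i.e. the closed-loop eigenvalues lie inside the unit disk as claimed.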
Other LQ topics - see notes for details

- The infinite-horizon LQ design methodology ensures a set of gains such that the closed-loop poles are in the open LHP for continuous-time systems, and inside the unit disk for discrete-time systems. These properties can be exploited to ensure that the closed-loop poles lie in a certain region (to the left of $-\alpha$, or within a disk).
- Frequency weighting can be used to penalize control or performance in specified frequency bands. The approach is to design weighting filters, and then convert the frequency-shaped LQ problem into a standard LQ problem.
- The LQ gains satisfy a so-called Return Difference Equality, from which robustness properties can be derived.
- Asymptotic closed-loop pole locations as $r \to \infty$ (expensive control) or $r \to 0$ (cheap control) can be derived using the Symmetric Root Locus (a consequence of the return difference equality).
- The optimal cost-to-go is $J^o(x, k) = \frac{1}{2}x^T P(k)x$.
Eigenvalue placement

LQR can be thought of as a way of generating stabilizing feedback gains. However, exactly where the closed-loop poles land in the LHP is not clear. We now propose a couple of ways in which we can exert some control over them. The idea is to transform the problem.

In this section, we assume that $(A, B)$ is controllable and $(A, C)$ is observable, where $Q = C^T C$.
Guaranteed convergence rate

To move the poles so that they are at least to the left of $-\alpha$ (i.e. if the eigenvalues of $A - BK$ are $\lambda_i$, we want $\mathrm{Re}(\lambda_i) < -\alpha$, hence more stable), we solve an alternate problem. Since LQ applied to $\dot{x} = \bar{A}x + Bu$ guarantees $\mathrm{Re}(\mathrm{eig}(\bar{A} - BK)) < 0$, we set $\bar{A} = A + \alpha I$ and solve the LQ problem for the plant

$$\dot{x} = (A + \alpha I)x + Bu.$$

This ensures that $\mathrm{Re}(\mathrm{eig}((A + \alpha I) - BK)) < 0$. Notice that $(A + \alpha I) - BK$ and $A - BK$ have the same eigenvectors. Thus the eigenvalues of $A - BK$, say $\lambda_i$, and those of $A + \alpha I - BK$, say $\sigma_i$, are related by $\lambda_i = \sigma_i - \alpha$. Since $\mathrm{Re}(\sigma_i) < 0$, $\mathrm{Re}(\lambda_i) < -\alpha$.
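The $\alpha$-shift trick is easy to verify numerically: design LQ for $(A + \alpha I, B)$ and check that the poles of $A - BK$ sit left of $-\alpha$. A sketch, solving the ARE by the Hamiltonian eigenvector method from the earlier section; the plant and $\alpha$ are illustrative assumptions.

```python
import numpy as np

def care(A, B, Q, R):
    """Stabilizing ARE solution via stable eigenvectors of the Hamiltonian."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    H = np.block([[A, -B @ Rinv @ B.T], [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    Vs = V[:, w.real < 0]                      # the n stable eigenvectors
    P = np.real(Vs[n:, :] @ np.linalg.inv(Vs[:n, :]))
    return 0.5 * (P + P.T)

# Assumed plant data and shift
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
alpha = 0.5

P = care(A + alpha * np.eye(2), B, Q, R)       # LQ for the shifted plant
K = np.linalg.inv(R) @ B.T @ P
poles = np.linalg.eigvals(A - B @ K)           # all satisfy Re < -alpha
```

Because the shift identity $\lambda_i = \sigma_i - \alpha$ is exact, the margin holds exactly, not just approximately.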
Eigenvalues to lie in a disk

A more interesting case is to ensure that the eigenvalues of the closed-loop system lie in a disk centered at $(-\alpha, 0)$ with radius $\rho < \alpha$. In addition to specifying the convergence rate to be faster than $\alpha - \rho$, this also limits the damping ratio, so that the system will not be too oscillatory.

The idea is to use the discrete time LQ solution, which ensures that the eigenvalues of $A - BK$ lie in the unit disk centered at the origin. We need to scale the disk and translate it.

Let the continuous-time plant be $\dot{x} = Ax + Bu$. If we solve the discrete time LQ problem for the plant

$$x(k+1) = \frac{1}{\rho}Ax(k) + \frac{1}{\rho}Bu(k)$$

then the eigenvalues of $\frac{1}{\rho}(A - BK)$ would lie in the unit disk, and the eigenvalues of $A - BK$ would lie in the disk with radius $\rho$, both centered at the origin.
Using the same trick as before, we now translate the eigenvalues by $\alpha$ by setting $\bar{A} = A + \alpha I$. In summary, if we use the discrete time LQ control design method for the plant

$$x(k+1) = \frac{1}{\rho}(A + \alpha I)x(k) + \frac{1}{\rho}Bu(k)$$

then the eigenvalues of $\frac{1}{\rho}((A + \alpha I) - BK)$ would lie within the unit disk centered at the origin. This implies that the eigenvalues of $(A + \alpha I) - BK$ lie in a disk of radius $\rho$ centered at the origin. Finally, this implies that the eigenvalues of $A - BK$ lie in a disk of radius $\rho$ centered at $(-\alpha, 0)$.
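The scale-and-shift recipe above can be sketched end to end: run discrete-time LQ on $(\frac{1}{\rho}(A + \alpha I), \frac{1}{\rho}B)$ and check that the poles of $A - BK$ land in the target disk. The plant, $\alpha$ and $\rho$ below are illustrative assumptions (with $\rho < \alpha$ as required).

```python
import numpy as np

# Disk placement via discrete-time LQ on the scaled, shifted plant.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
alpha, rho = 2.0, 1.0                     # target disk: center (-2, 0), radius 1

Ad = (A + alpha * np.eye(2)) / rho
Bd = B / rho
P = np.zeros_like(Q)                      # iterate the discrete Riccati equation
for _ in range(2000):
    M = R + Bd.T @ P @ Bd
    P = Q + Ad.T @ P @ Ad - Ad.T @ P @ Bd @ np.linalg.solve(M, Bd.T @ P @ Ad)
K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)

poles = np.linalg.eigvals(A - B @ K)      # should satisfy |poles + alpha| < rho
```

Since $\mathrm{eig}(A - BK) = \rho\,\mu_i - \alpha$ with $|\mu_i| < 1$ the unit-disk eigenvalues of the designed discrete loop, the continuous-time poles fall in the disk by construction.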
Frequency Shaping

The original LQ problem is specified in the time domain: the cost function consists of the $L_2$ norms of the control and of $z = Q^{1/2}x$. The frequency domain is sometimes more useful. For example:

- In a dual-stage actuator, one actuator prefers large-amplitude, low-frequency commands, the other high-frequency, small-amplitude commands.
- Disturbances may lie within a narrow bandwidth.
- Robustness requirements are easier to specify in the frequency domain (e.g. in loop-shaping concepts).

Parseval theorem: For a square-integrable function $h(t) \in \mathbb{R}^p$ with $\int_{-\infty}^{\infty} h^T(t)h(t)\,dt < \infty$,

$$\int_{-\infty}^{\infty} h^T(t)h(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} H^*(j\omega)H(j\omega)\,d\omega \qquad (43)$$

where $H(j\omega)$ is the Fourier transform, or $H(s = j\omega)$, i.e. the Laplace transform of $h(t)$ evaluated at $s = j\omega$. $H^*(j\omega)$ denotes the conjugate transpose
55 of H(jw). Hence, for H(s) with real coefficients, H*(jw) = H(−jw)^T.

Parseval's theorem states that the energy (L2 norm) of the signal can be evaluated either in the frequency or in the time domain. So, suppose that we want to optimize the criterion in the frequency domain as:

J(u) = (1/2π) ∫ [ X*(jw) Q1*(jw) Q1(jw) X(jw) + U*(jw) R1*(jw) R1(jw) U(jw) ] dw   (44)

This says that the state and control weightings are given by

Q(w^2) = Q1*(jw) Q1(jw);   R(w^2) = R1*(jw) R1(jw).

If we define X1(jw) = Q1(jw) X(jw), U1(jw) = R1(jw) U(jw), then

J(u) = (1/2π) ∫ [ X1*(jw) X1(jw) + U1*(jw) U1(jw) ] dw

M.E. University of Minnesota 266
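Parseval's relation (43) is easy to check numerically for a scalar signal. The example function is my own choice, not from the notes: h(t) = e^{−t} for t ≥ 0, whose transform is H(jw) = 1/(1 + jw), so both sides of (43) equal 1/2.

```python
# Numerical check of Parseval's relation (43): time-domain energy of
# h(t) = exp(-t), t >= 0, versus (1/2pi) * integral of |H(jw)|^2.
import numpy as np
from scipy.integrate import quad

time_side, _ = quad(lambda t: np.exp(-2.0 * t), 0.0, np.inf)        # integral of h(t)^2 dt
freq_side, _ = quad(lambda w: 1.0 / (1.0 + w**2), -np.inf, np.inf)  # integral of |H(jw)|^2 dw
freq_side /= 2.0 * np.pi

print(time_side, freq_side)   # both approximately 0.5
```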
56 Now, apply Parseval's theorem in reverse:

J(u) = ∫ [ x1^T(t) x1(t) + u1^T(t) u1(t) ] dt.   (45)

If we know the dynamics of x1, and u1 is the control input, then we can solve this using the standard LQ technique. We express Q1(s) and R1(s) as filters (e.g. low pass and high pass) with the actual state and input of the system, x(t) and u(t), as inputs, and the frequency weighted state x1(t) and input u1(t) as outputs:

Q1(s) = C_Q (sI − A_Q)^{−1} B_Q + D_Q   (46)
R1(s) = C_R (sI − A_R)^{−1} B_R + D_R   (47)

which says that in the time domain:

ż1 = A_Q z1 + B_Q x   (48)
x1 = C_Q z1 + D_Q x   (49)

and similarly,

ż2 = A_R z2 + B_R u   (50)
u1 = C_R z2 + D_R u.   (51)

M.E. University of Minnesota 267
57 Hence we can define an augmented plant:

d/dt [x; z1; z2] = [A 0 0; B_Q A_Q 0; 0 0 A_R] [x; z1; z2] + [B; 0; B_R] u(t)

or, with x̄ = [x; z1; z2], etc.,

dx̄/dt = Ā x̄ + B̄ u.

Since

u1 = (0 0 C_R) x̄ + D_R u
x1 = (D_Q C_Q 0) x̄

the cost function Eq.(45) becomes:

J(u) = ∫ ( x̄^T u^T ) [ Q_e N; N^T R_e ] ( x̄; u ) dt   (52)

where

Q_e = [ D_Q^T D_Q, D_Q^T C_Q, 0; C_Q^T D_Q, C_Q^T C_Q, 0; 0, 0, C_R^T C_R ]

M.E. University of Minnesota 268
58 N = [0; 0; C_R^T D_R];   R_e = D_R^T D_R.

Eq.(52) is still not in standard form because of the off diagonal block N. We can convert Eq.(52) into the standard form if we consider:

u(t) = −R_e^{−1} N^T x̄ + v   (53)

The integrand in Eq.(52) then becomes:

( x̄^T v^T ) [ I −N R_e^{−1}; 0 I ] [ Q_e N; N^T R_e ] [ I 0; −R_e^{−1} N^T I ] ( x̄; v )
= ( x̄^T v^T ) [ Q_e − N R_e^{−1} N^T, 0; 0, R_e ] ( x̄; v )

Then, define

Q̄ = Q_e − N R_e^{−1} N^T,   R̄ = R_e   (54)

and new state dynamics:

dx̄/dt = (Ā − B̄ R_e^{−1} N^T) x̄ + B̄ v   (55)

M.E. University of Minnesota 269
59 and cost function,

J(v) = ∫ [ x̄^T Q̄ x̄ + v^T R̄ v ] dt.   (56)

Eqs.(55)-(56) are then in the standard LQ format. The stabilizability and detectability conditions are now needed for the augmented system (what are they?).

M.E. University of Minnesota 270
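The whole frequency-shaped construction of Eqs.(46)-(56) can be sketched end to end. This is an illustrative example, not from the notes: the plant is an integrator, the weights are my own first-order choices Q1(s) = 1/(s+1) (low pass on the state) and R1(s) = s/(s+1) (high pass on the control, so D_R ≠ 0 and R_e > 0), and the resulting standard-form problem is solved with SciPy's continuous ARE solver.

```python
# Frequency-shaped LQ: augment the plant with the weight filters (48)-(51),
# form Qe, N, Re of (52), eliminate the cross term as in (53)-(55), then
# solve the standard CARE and recover the total gain on xbar = [x; z1; z2].
import numpy as np
from scipy.linalg import solve_continuous_are

# Plant xdot = u and weight-filter realizations (46)-(47)
A,  B          = np.array([[0.0]]), np.array([[1.0]])
AQ, BQ, CQ, DQ = np.array([[-1.0]]), np.array([[1.0]]), np.array([[1.0]]), np.array([[0.0]])
AR, BR, CR, DR = np.array([[-1.0]]), np.array([[1.0]]), np.array([[-1.0]]), np.array([[1.0]])

# Augmented plant
Abar = np.block([[A,                np.zeros((1, 1)), np.zeros((1, 1))],
                 [BQ,               AQ,               np.zeros((1, 1))],
                 [np.zeros((1, 1)), np.zeros((1, 1)), AR]])
Bbar = np.vstack([B, np.zeros((1, 1)), BR])

# Weights Qe, N, Re of Eq.(52)
Qe = np.block([[DQ.T @ DQ, DQ.T @ CQ, np.zeros((1, 1))],
               [CQ.T @ DQ, CQ.T @ CQ, np.zeros((1, 1))],
               [np.zeros((1, 2)),     CR.T @ CR]])
N  = np.vstack([np.zeros((2, 1)), CR.T @ DR])
Re = DR.T @ DR

# Eliminate the cross term: (53)-(55)
Qbar = Qe - N @ np.linalg.solve(Re, N.T)
Atil = Abar - Bbar @ np.linalg.solve(Re, N.T)

P = solve_continuous_are(Atil, Bbar, Qbar, Re)
K = np.linalg.solve(Re, N.T + Bbar.T @ P)   # u = -K xbar (gain in original coordinates)

cl_eigs = np.linalg.eigvals(Abar - Bbar @ K)
print(np.max(cl_eigs.real))                 # negative: augmented closed loop is stable
```

Note that the stabilizability/detectability check the notes ask about is done on (Ā − B̄ R_e^{−1} N^T, B̄) and (Ā − B̄ R_e^{−1} N^T, Q̄^{1/2}); in this example the R1 filter mode is uncontrollable but stable, so the ARE still has a stabilizing solution.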