EE363 Winter 2008-09

Lecture 4
Continuous time linear quadratic regulator

- continuous-time LQR problem
- dynamic programming solution
- Hamiltonian system and two point boundary value problem
- infinite horizon LQR
- direct solution of ARE via Hamiltonian
Continuous-time LQR problem

continuous-time system ẋ = Ax + Bu, x(0) = x_0

problem: choose u : [0, T] → R^m to minimize

J = ∫_0^T ( x(τ)^T Q x(τ) + u(τ)^T R u(τ) ) dτ + x(T)^T Q_f x(T)

- T is the time horizon
- Q = Q^T ≥ 0, Q_f = Q_f^T ≥ 0, R = R^T > 0 are the state cost, final state cost, and input cost matrices

... an infinite-dimensional problem: (trajectory u : [0, T] → R^m is the variable)
Dynamic programming solution

we'll solve LQR problem using dynamic programming

for 0 ≤ t ≤ T we define the value function V_t : R^n → R by

V_t(z) = min_u ∫_t^T ( x(τ)^T Q x(τ) + u(τ)^T R u(τ) ) dτ + x(T)^T Q_f x(T)

subject to x(t) = z, ẋ = Ax + Bu

- minimum is taken over all possible signals u : [t, T] → R^m
- V_t(z) gives the minimum LQR cost-to-go, starting from state z at time t
- V_T(z) = z^T Q_f z
fact: V_t is quadratic, i.e., V_t(z) = z^T P_t z, where P_t = P_t^T ≥ 0

- similar to discrete-time case: P_t can be found from a differential equation running backward in time from t = T
- the LQR optimal u is easily expressed in terms of P_t
we start with x(t) = z

let's take u(t) = w ∈ R^m, a constant, over the time interval [t, t + h], where h > 0 is small

cost incurred over [t, t + h] is

∫_t^{t+h} ( x(τ)^T Q x(τ) + w^T R w ) dτ ≈ h( z^T Q z + w^T R w )

and we end up at x(t + h) ≈ z + h(Az + Bw)
min-cost-to-go from where we land is approximately

V_{t+h}(z + h(Az + Bw)) = (z + h(Az + Bw))^T P_{t+h} (z + h(Az + Bw))
  ≈ (z + h(Az + Bw))^T ( P_t + h Ṗ_t )(z + h(Az + Bw))
  ≈ z^T P_t z + h( (Az + Bw)^T P_t z + z^T P_t (Az + Bw) + z^T Ṗ_t z )

(dropping h^2 and higher terms)

cost incurred plus min-cost-to-go is approximately

z^T P_t z + h( z^T Q z + w^T R w + (Az + Bw)^T P_t z + z^T P_t (Az + Bw) + z^T Ṗ_t z )

minimize over w to get (approximately) optimal w:

2hw^T R + 2hz^T P_t B = 0
so w* = -R^{-1} B^T P_t z

thus optimal u is time-varying linear state feedback:

u_lqr(t) = K_t x(t),  K_t = -R^{-1} B^T P_t
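As a quick numerical sanity check (a sketch with randomly generated problem data, not from the notes), we can verify that w* = -R^{-1} B^T P z zeroes the gradient 2Rw + 2B^T P z and minimizes the w-dependent terms of cost-plus-cost-to-go:

```python
# Spot-check the DP minimization over w; all data here is randomly generated
# (assumed for illustration), with R > 0 and P >= 0 constructed by hand.
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
B = rng.standard_normal((n, m))
M = rng.standard_normal((m, m))
R = M @ M.T + np.eye(m)                  # R = R^T > 0
S = rng.standard_normal((n, n))
P = S @ S.T                              # P = P^T >= 0, plays the role of P_t
z = rng.standard_normal(n)

def g(w):
    # the terms of z^T Q z + w^T R w + (Az+Bw)^T P z + z^T P (Az+Bw)
    # that actually depend on w
    return w @ R @ w + 2 * (B @ w) @ P @ z

w_star = -np.linalg.solve(R, B.T @ P @ z)   # w* = -R^{-1} B^T P z
grad = 2 * R @ w_star + 2 * B.T @ P @ z     # gradient at w*, should vanish
```

Since g is a convex quadratic in w (R > 0), a vanishing gradient certifies the minimum.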
HJ equation

now let's substitute w* into HJ equation:

z^T P_t z ≈ z^T P_t z + h( z^T Q z + w*^T R w* + (Az + Bw*)^T P_t z + z^T P_t (Az + Bw*) + z^T Ṗ_t z )

yields, after simplification,

-Ṗ_t = A^T P_t + P_t A - P_t B R^{-1} B^T P_t + Q

which is the Riccati differential equation for the LQR problem

we can solve it (numerically) using the final condition P_T = Q_f
Summary of cts-time LQR solution via DP

1. solve Riccati differential equation

-Ṗ_t = A^T P_t + P_t A - P_t B R^{-1} B^T P_t + Q,  P_T = Q_f

(backward in time)

2. optimal u is u_lqr(t) = K_t x(t), K_t := -R^{-1} B^T P_t

DP method readily extends to time-varying A, B, Q, R, and tracking problem
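A minimal numerical sketch of this two-step recipe, integrating the Riccati differential equation backward in time with forward Euler steps; the double-integrator data (A, B, Q, R, Q_f, T) is an assumed example, not from the notes:

```python
# Integrate -Pdot = A^T P + P A - P B R^{-1} B^T P + Q backward from P_T = Q_f,
# then form the time-varying gain K_t = -R^{-1} B^T P_t.
# The double-integrator data below is an assumed example.
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # assumed example: double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Qf = np.eye(2)
T, N = 10.0, 20000                       # horizon and number of Euler steps
h = T / N
Rinv = np.linalg.inv(R)

P = Qf.copy()                            # final condition P_T = Q_f
for _ in range(N):
    # Euler step backward in time:
    # P(t - h) ~ P(t) + h * (A^T P + P A - P B R^{-1} B^T P + Q)
    P = P + h * (A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q)

P0 = P                                   # P_t at t = 0
K0 = -Rinv @ B.T @ P0                    # u(0) = K_0 x(0)
```

For this long horizon, P_0 has essentially converged to the steady-state ARE solution, which for this example is [[√3, 1], [1, √3]].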
Steady-state regulator

usually P_t rapidly converges as t decreases below T

limit P_ss satisfies (cts-time) algebraic Riccati equation (ARE)

A^T P + PA - P B R^{-1} B^T P + Q = 0

a quadratic matrix equation

- P_ss can be found by (numerically) integrating the Riccati differential equation, or by direct methods
- for t not close to horizon T, LQR optimal input is approximately a linear, constant state feedback

u(t) = K_ss x(t),  K_ss = -R^{-1} B^T P_ss
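In practice P_ss can be computed with a library ARE solver; a sketch using SciPy's solve_continuous_are, with the same assumed double-integrator example:

```python
# Solve the continuous-time ARE  A^T P + P A - P B R^{-1} B^T P + Q = 0
# directly, and form the constant gain K_ss = -R^{-1} B^T P_ss.
# The double-integrator data below is an assumed example.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

Pss = solve_continuous_are(A, B, Q, R)
Kss = -np.linalg.solve(R, B.T @ Pss)     # K_ss = -R^{-1} B^T P_ss
```

For this example the ARE can also be solved by hand, giving P_ss = [[√3, 1], [1, √3]], which the solver should reproduce.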
Derivation via discretization

let's discretize using small step size h > 0, with Nh = T

x((k + 1)h) ≈ x(kh) + h ẋ(kh) = (I + hA) x(kh) + hB u(kh)

J ≈ h Σ_{k=0}^{N-1} ( x(kh)^T Q x(kh) + u(kh)^T R u(kh) ) + x(Nh)^T Q_f x(Nh)

this yields a discrete-time LQR problem, with parameters

Ã = I + hA,  B̃ = hB,  Q̃ = hQ,  R̃ = hR,  Q̃_f = Q_f
solution to discrete-time LQR problem is u(kh) = K̃_k x(kh), where

K̃_k = -( R̃ + B̃^T P̃_{k+1} B̃ )^{-1} B̃^T P̃_{k+1} Ã

P̃_{k-1} = Q̃ + Ã^T P̃_k Ã - Ã^T P̃_k B̃ ( R̃ + B̃^T P̃_k B̃ )^{-1} B̃^T P̃_k Ã

substituting and keeping only h^0 and h^1 terms yields

P̃_{k-1} = hQ + P̃_k + h A^T P̃_k + h P̃_k A - h P̃_k B R^{-1} B^T P̃_k

which is the same as

(1/h)( P̃_{k-1} - P̃_k ) = Q + A^T P̃_k + P̃_k A - P̃_k B R^{-1} B^T P̃_k

letting h → 0 we see that P̃_k → P_{kh}, where

-Ṗ = Q + A^T P + PA - P B R^{-1} B^T P
similarly, we have

K̃_k = -( R̃ + B̃^T P̃_{k+1} B̃ )^{-1} B̃^T P̃_{k+1} Ã
    = -( hR + h^2 B^T P̃_{k+1} B )^{-1} hB^T P̃_{k+1} (I + hA)
    → -R^{-1} B^T P_{kh}  as h → 0
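The convergence claim can be checked numerically: running the discrete-time Riccati recursion with Ã = I + hA, B̃ = hB, Q̃ = hQ, R̃ = hR gives a P that approaches the continuous-time solution as h → 0. A sketch with an assumed double-integrator example:

```python
# Discrete-time Riccati recursion with the discretized parameters, run
# backward from P_N = Q_f; as h shrinks, the resulting P at t = 0 approaches
# the continuous-time value.  Double-integrator data is an assumed example.
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Qf = np.eye(2)
T = 10.0

def P0_discretized(h):
    At, Bt, Qt, Rt = np.eye(2) + h * A, h * B, h * Q, h * R
    P = Qf.copy()
    for _ in range(int(round(T / h))):
        # P_{k-1} = Qt + At^T P At - At^T P Bt (Rt + Bt^T P Bt)^{-1} Bt^T P At
        M = np.linalg.inv(Rt + Bt.T @ P @ Bt)
        P = Qt + At.T @ P @ At - At.T @ P @ Bt @ M @ Bt.T @ P @ At
    return P

P_coarse = P0_discretized(0.01)
P_fine = P0_discretized(0.001)
```

With this long horizon, both runs should land near the analytic steady-state solution [[√3, 1], [1, √3]], with the finer step closer.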
Derivation using Lagrange multipliers

pose as constrained problem:

minimize J = (1/2) ∫_0^T ( x(τ)^T Q x(τ) + u(τ)^T R u(τ) ) dτ + (1/2) x(T)^T Q_f x(T)
subject to ẋ(t) = Ax(t) + Bu(t),  t ∈ [0, T]

- optimization variable is function u : [0, T] → R^m
- infinite number of equality constraints, one for each t ∈ [0, T]

introduce Lagrange multiplier function λ : [0, T] → R^n and form

L = J + ∫_0^T λ(τ)^T ( Ax(τ) + Bu(τ) - ẋ(τ) ) dτ
Optimality conditions

(note: you need distribution theory to really make sense of the derivatives here...)

from ∇_{u(t)} L = R u(t) + B^T λ(t) = 0 we get u(t) = -R^{-1} B^T λ(t)

to find ∇_{x(t)} L, we use

∫_0^T λ(τ)^T ẋ(τ) dτ = λ(T)^T x(T) - λ(0)^T x(0) - ∫_0^T λ̇(τ)^T x(τ) dτ

from ∇_{x(t)} L = Q x(t) + A^T λ(t) + λ̇(t) = 0 we get

λ̇(t) = -A^T λ(t) - Q x(t)

from ∇_{x(T)} L = Q_f x(T) - λ(T) = 0, we get λ(T) = Q_f x(T)
Co-state equations

optimality conditions are

ẋ = Ax + Bu,  x(0) = x_0,  λ̇ = -A^T λ - Qx,  λ(T) = Q_f x(T)

using u(t) = -R^{-1} B^T λ(t), can write as

d/dt [ x(t) ]   [  A   -BR^{-1}B^T ] [ x(t) ]
     [ λ(t) ] = [ -Q   -A^T        ] [ λ(t) ]

- 2n × 2n matrix above is called Hamiltonian for problem
- with conditions x(0) = x_0, λ(T) = Q_f x(T), called two-point boundary value problem
as in discrete-time case, we can show that λ(t) = P_t x(t), where

-Ṗ_t = A^T P_t + P_t A - P_t B R^{-1} B^T P_t + Q,  P_T = Q_f

in other words, value function P_t gives simple relation between x and λ

to show this, we show that λ = Px satisfies co-state equation λ̇ = -A^T λ - Qx:

λ̇ = (d/dt)(Px) = Ṗx + Pẋ
  = -( Q + A^T P + PA - P B R^{-1} B^T P )x + P( Ax - B R^{-1} B^T λ )
  = -Qx - A^T Px + P B R^{-1} B^T Px - P B R^{-1} B^T Px
  = -Qx - A^T λ
Solving Riccati differential equation via Hamiltonian

the (quadratic) Riccati differential equation

-Ṗ = A^T P + PA - P B R^{-1} B^T P + Q

and the (linear) Hamiltonian differential equation

d/dt [ x(t) ]   [  A   -BR^{-1}B^T ] [ x(t) ]
     [ λ(t) ] = [ -Q   -A^T        ] [ λ(t) ]

are closely related

λ(t) = P_t x(t) suggests that P should have the form P_t = λ(t) x(t)^{-1} (but this doesn't make sense unless x and λ are scalars)
consider the Hamiltonian matrix (linear) differential equation

d/dt [ X(t) ]   [  A   -BR^{-1}B^T ] [ X(t) ]
     [ Y(t) ] = [ -Q   -A^T        ] [ Y(t) ]

where X(t), Y(t) ∈ R^{n×n}

then, Z(t) = Y(t) X(t)^{-1} satisfies Riccati differential equation

-Ż = A^T Z + ZA - Z B R^{-1} B^T Z + Q

hence we can solve Riccati DE by solving (linear) matrix Hamiltonian DE, with final conditions X(T) = I, Y(T) = Q_f, and forming P(t) = Y(t) X(t)^{-1}
Ż = (d/dt)( Y X^{-1} ) = Ẏ X^{-1} - Y X^{-1} Ẋ X^{-1}
  = ( -QX - A^T Y ) X^{-1} - Y X^{-1} ( AX - BR^{-1}B^T Y ) X^{-1}
  = -Q - A^T Z - ZA + Z B R^{-1} B^T Z

where we use two identities:

(d/dt)( F(t)G(t) ) = Ḟ(t)G(t) + F(t)Ġ(t)

(d/dt)( F(t)^{-1} ) = -F(t)^{-1} Ḟ(t) F(t)^{-1}
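A sketch of this method: propagate (X, Y) from t = T back to t = 0 using the matrix exponential of the Hamiltonian (exact for the linear DE), then form P_0 = Y X^{-1}. The double-integrator data and horizon are assumed for illustration; since the horizon is long, P_0 is essentially the steady-state ARE solution:

```python
# Solve the Riccati DE through the linear Hamiltonian matrix DE:
# [X(t); Y(t)] = expm((t - T) H) [I; Q_f], then P(t) = Y(t) X(t)^{-1}.
# Double-integrator data is an assumed example.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Qf = np.eye(2)
T = 10.0

H = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
              [-Q, -A.T]])               # 2n x 2n Hamiltonian matrix

ZT = np.vstack([np.eye(2), Qf])          # final conditions [X(T); Y(T)] = [I; Q_f]
Z0 = expm(-T * H) @ ZT                   # exact propagation back to t = 0
X0, Y0 = Z0[:2, :], Z0[2:, :]
P0 = Y0 @ np.linalg.inv(X0)              # P at t = 0
```

Using expm sidesteps step-size error entirely; for short horizons the same code gives the true finite-horizon P_0 rather than the steady-state value.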
Infinite horizon LQR

we now consider the infinite horizon cost function

J = ∫_0^∞ ( x(τ)^T Q x(τ) + u(τ)^T R u(τ) ) dτ

we define the value function as

V(z) = min_u ∫_0^∞ ( x(τ)^T Q x(τ) + u(τ)^T R u(τ) ) dτ

subject to x(0) = z, ẋ = Ax + Bu

- we assume that (A, B) is controllable, so V is finite for all z
- can show that V is quadratic: V(z) = z^T P z, where P = P^T ≥ 0
optimal u is u(t) = Kx(t), where K = -R^{-1} B^T P (i.e., a constant linear state feedback)

HJ equation is ARE

Q + A^T P + PA - P B R^{-1} B^T P = 0

which together with P ≥ 0 characterizes P

can solve as limiting value of Riccati DE, or via direct method
Closed-loop system

with K LQR optimal state feedback gain, closed-loop system is

ẋ = Ax + Bu = (A + BK)x

- fact: closed-loop system is stable when (Q, A) observable and (A, B) controllable
- we denote eigenvalues of A + BK, called closed-loop eigenvalues, as λ_1, ..., λ_n
- with assumptions above, ℜλ_i < 0
Solving ARE via Hamiltonian

[  A   -BR^{-1}B^T ] [ I ]   [ A - BR^{-1}B^T P ]   [ A + BK     ]
[ -Q   -A^T        ] [ P ] = [ -Q - A^T P       ] = [ -Q - A^T P ]

and so

[  I   0 ] [  A   -BR^{-1}B^T ] [ I   0 ]   [ A + BK   -BR^{-1}B^T ]
[ -P   I ] [ -Q   -A^T        ] [ P   I ] = [ 0        -(A + BK)^T ]

where 0 in lower left corner comes from ARE

note that

[ I   0 ]^{-1}   [  I   0 ]
[ P   I ]      = [ -P   I ]
we see that:

- eigenvalues of Hamiltonian H are λ_1, ..., λ_n and -λ_1, ..., -λ_n
- hence, closed-loop eigenvalues are the eigenvalues of H with negative real part
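Both facts are easy to confirm numerically; the double-integrator data is an assumed example, and SciPy's ARE solver is used only to construct K:

```python
# Check that eig(H) = {λ_i} ∪ {-λ_i} and that the stable eigenvalues of H
# are exactly the closed-loop eigenvalues of A + BK.
# Double-integrator data is an assumed example.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
K = -np.linalg.solve(R, B.T @ P)                 # K = -R^{-1} B^T P
H = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
              [-Q, -A.T]])

eig_H = np.linalg.eigvals(H)
stable = np.sort_complex(eig_H[eig_H.real < 0])  # the n stable eigenvalues of H
closed_loop = np.sort_complex(np.linalg.eigvals(A + B @ K))
```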
let's assume A + BK is diagonalizable, i.e.,

T^{-1} (A + BK) T = Λ = diag(λ_1, ..., λ_n)

then we have T^T ( -(A + BK)^T ) T^{-T} = -Λ, so

[ T^{-1}   0   ] [ A + BK   -BR^{-1}B^T ] [ T   0      ]   [ Λ   -T^{-1} BR^{-1}B^T T^{-T} ]
[ 0        T^T ] [ 0        -(A + BK)^T ] [ 0   T^{-T} ] = [ 0   -Λ                        ]
putting it together we get

[ T^{-1}   0   ] [  I   0 ]   [ I   0 ] [ T   0      ]   [ Λ   -T^{-1} BR^{-1}B^T T^{-T} ]
[ 0        T^T ] [ -P   I ] H [ P   I ] [ 0   T^{-T} ] = [ 0   -Λ                        ]

and so

H [ T  ]   [ T  ]
  [ PT ] = [ PT ] Λ

thus, the n columns of [ T ; PT ] are the eigenvectors of H associated with the stable eigenvalues λ_1, ..., λ_n
Solving ARE via Hamiltonian

- find eigenvalues of H, and let λ_1, ..., λ_n denote the n stable ones (there are exactly n stable and n unstable ones)
- find associated eigenvectors v_1, ..., v_n, and partition as

[ v_1  ⋯  v_n ] = [ X ]
                  [ Y ]  ∈ R^{2n×n}

- P = Y X^{-1} is the unique PSD solution of the ARE

(this is very close to the method used in practice, which does not require A + BK to be diagonalizable)
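A sketch of this eigenvector method on the assumed double-integrator example, checked against the ARE residual:

```python
# Solve the ARE from the stable invariant subspace of the Hamiltonian:
# stack the eigenvectors for the n stable eigenvalues as [X; Y], then
# P = Y X^{-1}.  Double-integrator data is an assumed example.
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Rinv = np.linalg.inv(R)

H = np.block([[A, -B @ Rinv @ B.T],
              [-Q, -A.T]])

w, V = np.linalg.eig(H)
Vs = V[:, w.real < 0]                    # eigenvectors of the n stable eigenvalues
X, Y = Vs[:2, :], Vs[2:, :]
P = np.real(Y @ np.linalg.inv(X))        # Y X^{-1} is real up to roundoff

residual = A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q
```

The eigenvectors here form a complex-conjugate pair, but Y X^{-1} comes out real; production solvers use an ordered Schur decomposition of H instead of an eigendecomposition, which avoids the diagonalizability assumption.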