Linear-Quadratic-Gaussian (LQG) Controllers and Kalman Filters

Emo Todorov
Applied Mathematics and Computer Science & Engineering
University of Washington

Emo Todorov (UW), AMATH/CSE 579, Winter 2014
LQG in continuous time

Recall that for problems with dynamics and cost

    dx = (a(x) + B(x) u) dt + C(x) dω
    ℓ(x, u) = q(x) + (1/2) u^T R(x) u

the optimal control law is u* = -R^{-1} B^T v_x, and the HJB equation is

    -v_t = q + a^T v_x + (1/2) tr(C C^T v_{xx}) - (1/2) v_x^T B R^{-1} B^T v_x

We now impose further restrictions (LQG system):

    dx = (A x + B u) dt + C dω
    ℓ(x, u) = (1/2) x^T Q x + (1/2) u^T R u
    q_T(x) = (1/2) x^T Q_T x
Continuous-time Riccati equations

Substituting the LQG dynamics and cost in the HJB equation yields

    -v_t = (1/2) x^T Q x + x^T A^T v_x + (1/2) tr(C C^T v_{xx}) - (1/2) v_x^T B R^{-1} B^T v_x

We can now show that v is quadratic: v(x, t) = (1/2) x^T V(t) x + α(t).
At the final time this holds with α(T) = 0 and V(T) = Q_T. Then

    -α̇ - (1/2) x^T V̇ x = (1/2) x^T Q x + x^T A^T V x + (1/2) tr(C C^T V) - (1/2) x^T V B R^{-1} B^T V x

Using the fact that x^T A^T V x = x^T V A x and matching powers of x yields

Theorem (Riccati equation)

    -V̇ = Q + A^T V + V A - V B R^{-1} B^T V
    -α̇ = (1/2) tr(C C^T V)
Linear feedback control law

When v(x, t) = (1/2) x^T V(t) x + α(t), the optimal control u* = -R^{-1} B^T v_x is

    u*(x, t) = -L(t) x,    L(t) ≜ R^{-1} B^T V(t)

The Hessian V(t) and the matrix of feedback gains L(t) are independent of the noise amplitude C. Thus the optimal control law u*(x, t) is the same for stochastic and deterministic systems (the latter is called LQR).

Example:

    dx = u dt + 0.2 dω
    ℓ(x, u) = 0.5 u^2
    q_T(x) = 2.5 x^2
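The scalar example above can be solved numerically; a minimal sketch, assuming a horizon T = 1 (the slide does not specify one). Here A = 0, B = 1, Q = 0, R = 1, and V(T) = 5 since q_T(x) = 2.5 x^2 = (1/2)·5·x^2, so the Riccati ODE -V̇ = Q + A^T V + V A - V B R^{-1} B^T V reduces to V̇ = V^2.

```python
# Backward Euler integration of the scalar Riccati ODE V̇ = V², with V(T) = 5.
# T and dt are illustrative choices, not values from the slides.
T, dt = 1.0, 1e-5
V = 5.0
for _ in range(int(T / dt)):
    V -= dt * V * V          # step backward in time: V(t - dt) ≈ V(t) - dt·V̇(t)

# analytic solution of V̇ = V² with V(T) = 5:  V(t) = 5 / (1 + 5 (T - t))
assert abs(V - 5.0 / (1.0 + 5.0 * T)) < 1e-3
```

Note that the noise amplitude 0.2 never enters: it only affects α(t), not V(t) or the gain L(t).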
LQG in discrete time

Consider an optimal control problem with dynamics and cost

    x_{k+1} = A x_k + B u_k
    ℓ(x, u) = (1/2) x^T Q x + (1/2) u^T R u

Substituting in the Bellman equation v_k(x) = min_u { ℓ(x, u) + v_{k+1}(x') } and making the ansatz v_k(x) = (1/2) x^T V_k x yields

    (1/2) x^T V_k x = min_u { (1/2) x^T Q x + (1/2) u^T R u + (1/2) (A x + B u)^T V_{k+1} (A x + B u) }

The minimum is u_k*(x) = -L_k x where L_k ≜ (R + B^T V_{k+1} B)^{-1} B^T V_{k+1} A.

Theorem (Riccati equation)

    V_k = Q + A^T V_{k+1} (A - B L_k)
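The backward recursion is a few lines of numpy; the double-integrator system, horizon, and final cost below are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Backward Riccati recursion for finite-horizon discrete LQR:
# L_k = (R + Bᵀ V_{k+1} B)⁻¹ Bᵀ V_{k+1} A,  V_k = Q + Aᵀ V_{k+1} (A - B L_k).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # Euler-discretized double integrator (assumed example)
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])
N = 50                                  # horizon (assumed)

V = np.eye(2)                           # final cost V_N = Q_T (assumed)
L = None
for k in range(N - 1, -1, -1):
    L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)   # feedback gain L_k
    V = Q + A.T @ V @ (A - B @ L)                        # cost-to-go Hessian V_k

# V_0 is symmetric positive definite, and the horizon is long enough
# that the time-0 gain stabilizes the system
assert np.allclose(V, V.T, atol=1e-8)
assert np.min(np.linalg.eigvalsh((V + V.T) / 2)) > 0
assert np.max(np.abs(np.linalg.eigvals(A - B @ L))) < 1.0
```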
Summary of Riccati equations

Finite horizon
  Continuous time:  -V̇ = Q + A^T V + V A - V B R^{-1} B^T V
  Discrete time:    V_k = Q + A^T V_{k+1} A - A^T V_{k+1} B (R + B^T V_{k+1} B)^{-1} B^T V_{k+1} A

Average cost
  Continuous time ('care' in Matlab):  0 = Q + A^T V + V A - V B R^{-1} B^T V
  Discrete time ('dare' in Matlab):    V = Q + A^T V A - A^T V B (R + B^T V B)^{-1} B^T V A

Discounted cost is similar; first exit does not yield Riccati equations.
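One simple way to solve the average-cost ('dare') equation is to iterate the finite-horizon recursion until it stops changing; a minimal sketch with assumed example matrices (Matlab's dare, or scipy.linalg.solve_discrete_are, would solve it directly).

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed example system
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

def dare_rhs(V):
    # right-hand side of V = Q + AᵀVA - AᵀVB (R + BᵀVB)⁻¹ BᵀVA
    return Q + A.T @ V @ A - A.T @ V @ B @ np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)

V = Q.copy()
for _ in range(10000):
    V_new = dare_rhs(V)
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new
V = V_new

# at the fixed point the algebraic Riccati equation holds to numerical precision
assert np.max(np.abs(dare_rhs(V) - V)) < 1e-9
```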
Relation between continuous and discrete time

The continuous-time system

    ẋ = A x + B u
    ℓ(x, u) = (1/2) x^T Q x + (1/2) u^T R u

can be represented in discrete time with time-step Δ as

    x_{k+1} = (I + Δ A) x_k + Δ B u_k
    ℓ_Δ(x, u) = (Δ/2) x^T Q x + (Δ/2) u^T R u

In the limit Δ → 0 the discrete Riccati equation reduces to the continuous one:

    V = Δ Q + (I + Δ A)^T V (I + Δ A) - (I + Δ A)^T V Δ B (Δ R + Δ^2 B^T V B)^{-1} Δ B^T V (I + Δ A)
    V = Δ Q + V + Δ (A^T V + V A) - Δ V B (R + Δ B^T V B)^{-1} B^T V + O(Δ^2)
    0 = Q + A^T V + V A - V B (R + Δ B^T V B)^{-1} B^T V + O(Δ)
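The limit can be checked numerically: solve the discrete Riccati equation for the Euler-discretized system with a small Δ and verify that the solution approximately satisfies the continuous algebraic Riccati equation. A sketch with assumed example matrices:

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # continuous-time double integrator (assumed)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
d = 1e-3                                  # time step Δ

# Euler-discretized system and cost: (I + ΔA), ΔB, ΔQ, ΔR
Ad, Bd, Qd, Rd = np.eye(2) + d * A, d * B, d * Q, d * R

# iterate the discrete Riccati equation to its fixed point
V = Qd.copy()
for _ in range(100000):
    V_new = Qd + Ad.T @ V @ Ad - Ad.T @ V @ Bd @ np.linalg.solve(
        Rd + Bd.T @ V @ Bd, Bd.T @ V @ Ad)
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new
V = V_new

# residual of the continuous ARE: 0 = Q + AᵀV + VA - V B R⁻¹ Bᵀ V
res = Q + A.T @ V + V @ A - V @ B @ np.linalg.solve(R, B.T @ V)
assert np.max(np.abs(res)) < 0.05
```

The residual shrinks roughly linearly in Δ, consistent with the O(Δ) term above.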
Encoding targets as quadratic costs

The matrices A, B, Q, R can be time-varying, which is useful for specifying reference trajectories x*_k and for approximating non-LQG problems.

The cost ‖x_k - x*_k‖^2 can be represented in the LQG framework by augmenting the state vector as x̃ = [x; 1] and writing the state cost as

    (1/2) x̃^T Q̃_k x̃ = (1/2) x̃^T D_k^T D_k x̃,  where  D_k = [I, -x*_k]

so that D_k x̃_k = x_k - x*_k. The dynamics are augmented accordingly: Ã = [[A, 0], [0, 1]], etc.

If the target x* is stationary we can instead include it in the state, and use D = [I, -I]. This has the advantage that the resulting control law is independent of x* and therefore can be used for all targets.
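The augmentation can be verified directly: with x̃ = [x; 1] and D = [I, -x*], the quadratic form (1/2) x̃^T D^T D x̃ reproduces (1/2)‖x - x*‖^2. The particular x and x* below are arbitrary illustrative values.

```python
import numpy as np

n = 3
x = np.array([1.0, -2.0, 0.5])        # current state (arbitrary)
x_star = np.array([0.5, 0.0, 1.0])    # target (arbitrary)

x_aug = np.concatenate([x, [1.0]])               # x̃ = [x; 1]
D = np.hstack([np.eye(n), -x_star[:, None]])     # D = [I, -x*]
Q_aug = D.T @ D                                  # Q̃ = DᵀD

cost_aug = 0.5 * x_aug @ Q_aug @ x_aug
cost_direct = 0.5 * np.sum((x - x_star) ** 2)
assert np.isclose(cost_aug, cost_direct)
```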
Optimal estimation in linear-Gaussian systems

Consider the partially-observed system

    x_{k+1} = A x_k + C ω_k
    y_k = H x_k + D ε_k

with hidden state x_k, measurement y_k, and noise ω_k, ε_k ~ N(0, I). Given a Gaussian prior x_0 ~ N(x̂_0, Σ_0) and a sequence of measurements y_0, y_1, ..., y_k, we want to compute the posterior p_{k+1}(x_{k+1}).

We can show by induction that the posterior is Gaussian at all times. Let p_k(x_k) be N(x̂_k, Σ_k). This will act as a prior for estimating x_{k+1}. Now x_{k+1} and y_k are jointly Gaussian, with mean and covariance

    E[x_{k+1}; y_k] = [A x̂_k; H x̂_k]
    Cov[x_{k+1}; y_k] = [[C C^T + A Σ_k A^T,  A Σ_k H^T], [H Σ_k A^T,  D D^T + H Σ_k H^T]]
Kalman filter

Lemma
If u, v are jointly Gaussian with means û, v̂ and covariances Σ_uu, Σ_vv, Σ_uv = Σ_vu^T, then u given v is Gaussian with mean and covariance

    E[u|v] = û + Σ_uv Σ_vv^{-1} (v - v̂)
    Cov[u|v] = Σ_uu - Σ_uv Σ_vv^{-1} Σ_vu

Applying this to our problem with u = x_{k+1} and v = y_k yields

Theorem (Kalman filter)
The mean x̂ and covariance Σ of the Gaussian posterior satisfy

    x̂_{k+1} = A x̂_k + K_k (y_k - H x̂_k)
    Σ_{k+1} = C C^T + (A - K_k H) Σ_k A^T
    K_k ≜ A Σ_k H^T (D D^T + H Σ_k H^T)^{-1}
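A minimal sketch of the filter on a simulated scalar system (the parameters and random seed are illustrative choices, not from the slides). Note that the covariance recursion does not depend on the data, so Σ converges to the fixed point of the filter Riccati equation regardless of the measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
A, C, H, D = 1.0, 0.5, 1.0, 1.0          # scalar system (assumed example)
x, xhat, Sigma = 0.0, 0.0, 1.0           # true state, estimate, prior covariance

for k in range(500):
    # simulate the true system and its measurement
    x = A * x + C * rng.standard_normal()
    y = H * x + D * rng.standard_normal()
    # Kalman filter update: gain, mean, covariance
    K = A * Sigma * H / (D * D + H * Sigma * H)
    xhat = A * xhat + K * (y - H * xhat)
    Sigma = C * C + (A - K * H) * Sigma * A

# Σ has converged to the fixed point of the filter Riccati equation
K_ss = A * Sigma * H / (D * D + H * Sigma * H)
assert abs(C * C + (A - K_ss * H) * Sigma * A - Sigma) < 1e-9
```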
Duality of LQG control and Kalman filtering

LQG controller
  State dynamics:            x_{k+1} = (A - B L_k) x_k + C ε_k
  Gain matrix:               L_k = (R + B^T V_{k+1} B)^{-1} B^T V_{k+1} A
  Backward Riccati equation: V_k = Q + A^T V_{k+1} (A - B L_k)

Kalman filter
  Estimated state dynamics:  x̂_{k+1} = (A - K_k H) x̂_k + K_k y_k
  Gain matrix:               K_k = A Σ_k H^T (D D^T + H Σ_k H^T)^{-1}
  Forward Riccati equation:  Σ_{k+1} = C C^T + (A - K_k H) Σ_k A^T

This form of duality does not generalize to non-LQG systems. However there is a different duality which does generalize (see later). It involves an information filter, computing Σ^{-1} instead of Σ.
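The correspondence can be checked numerically for one step: the substitution A → A^T, B → H^T, Q → C C^T, R → D D^T turns the backward controller Riccati step into the forward filter step, and the dual feedback gain equals K_k^T. The matrices below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
A = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
H = rng.standard_normal((m, n))
D = rng.standard_normal((m, m))
M = rng.standard_normal((n, n))
Sigma = M @ M.T + np.eye(n)                 # arbitrary symmetric PD covariance Σ_k

# Kalman filter step
S = D @ D.T + H @ Sigma @ H.T
K = A @ Sigma @ H.T @ np.linalg.inv(S)
Sigma_next = C @ C.T + (A - K @ H) @ Sigma @ A.T

# dual controller step: Ad = Aᵀ, Bd = Hᵀ, Qd = CCᵀ, Rd = DDᵀ, V_{k+1} = Σ_k
Ad, Bd, Qd, Rd = A.T, H.T, C @ C.T, D @ D.T
L = np.linalg.solve(Rd + Bd.T @ Sigma @ Bd, Bd.T @ Sigma @ Ad)
V = Qd + Ad.T @ Sigma @ (Ad - Bd @ L)

assert np.allclose(V, Sigma_next)           # dual Riccati step = filter Riccati step
assert np.allclose(L, K.T)                  # dual gain = transposed Kalman gain
```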