EE363 Winter 2008-09

Lecture 10
Linear Quadratic Stochastic Control with Partial State Observation

• partially observed linear quadratic stochastic control problem

• estimation-control separation principle

• solution via dynamic programming
Linear stochastic system

• linear dynamical system, over a finite time horizon:

    x_{t+1} = A x_t + B u_t + w_t,   t = 0, ..., N-1

  with state x_t, input u_t, and process noise w_t

• linear noise-corrupted observations:

    y_t = C x_t + v_t,   t = 0, ..., N

• y_t is the output, v_t is measurement noise

• x_0 ~ N(0, X), w_t ~ N(0, W), v_t ~ N(0, V), all independent
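The dynamics and observation equations above are easy to simulate directly. Below is a minimal numpy sketch (the function name `simulate` and the policy interface are illustrative choices, not from the lecture): it draws x_0 ~ N(0, X), then alternates measuring y_t, applying u_t = φ_t(Y_t), and stepping the state.

```python
import numpy as np

def simulate(A, B, C, X, W, V, N, policy, rng):
    """Simulate x_{t+1} = A x_t + B u_t + w_t, y_t = C x_t + v_t
    for t = 0, ..., N-1, with causal input u_t = policy(t, [y_0, ..., y_t])."""
    n, m = B.shape
    p = C.shape[0]
    x = rng.multivariate_normal(np.zeros(n), X)   # x_0 ~ N(0, X)
    xs, ys, us = [x], [], []
    for t in range(N):
        y = C @ x + rng.multivariate_normal(np.zeros(p), V)   # y_t = C x_t + v_t
        ys.append(y)
        u = policy(t, ys)            # causal: u_t depends only on Y_t
        us.append(u)
        x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
        xs.append(x)
    ys.append(C @ xs[-1] + rng.multivariate_normal(np.zeros(p), V))   # y_N
    return np.array(xs), np.array(ys), np.array(us)
```

Passing zero covariance matrices recovers the deterministic system, which is a convenient sanity check.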
Causal output feedback control policies

• causal feedback policies: the input must be a function of past and present outputs (roughly speaking: the current state x_t is not known)

    u_t = φ_t(Y_t),   t = 0, ..., N-1

• Y_t = (y_0, ..., y_t) is the output history at time t

• φ_t : R^{p(t+1)} → R^m is called the control policy at time t

• the closed-loop system is

    x_{t+1} = A x_t + B φ_t(Y_t) + w_t,   y_t = C x_t + v_t

• x_0, ..., x_N, y_0, ..., y_N, u_0, ..., u_{N-1} are all random
Stochastic control with partial observations

• objective:

    J = E( Σ_{t=0}^{N-1} (x_t^T Q x_t + u_t^T R u_t) + x_N^T Q x_N )

  with Q ≥ 0, R > 0

• partially observed linear quadratic stochastic control problem (a.k.a. LQG problem): choose output feedback policies φ_0, ..., φ_{N-1} to minimize J
Solution

• the optimal policies are

    φ_t^*(Y_t) = K_t E(x_t | Y_t)

• K_t is the optimal feedback gain matrix for the associated LQR problem

• E(x_t | Y_t) is the MMSE estimate of x_t given the measurements Y_t (it can be computed using the Kalman filter)

• this is called the separation principle: the optimal policy consists of
    – estimating the state via MMSE (ignoring the control problem)
    – using the estimated state as if it were the actual state, for purposes of control
LQR control gain computation

• define P_N = Q, and for t = N, ..., 1,

    P_{t-1} = A^T P_t A + Q − A^T P_t B (R + B^T P_t B)^{-1} B^T P_t A

• set

    K_t = −(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A,   t = 0, ..., N-1

• K_t does not depend on the data C, X, W, V
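The backward Riccati recursion above translates directly into code. A short numpy sketch (the function name `lqr_gains` is an illustrative choice, not from the lecture):

```python
import numpy as np

def lqr_gains(A, B, Q, R, N):
    """Backward Riccati recursion: returns cost-to-go matrices P_0..P_N
    and feedback gains K_0..K_{N-1}."""
    P = [None] * (N + 1)
    P[N] = Q
    for t in range(N, 0, -1):
        # P_{t-1} = A'P_t A + Q - A'P_t B (R + B'P_t B)^{-1} B'P_t A
        G = np.linalg.solve(R + B.T @ P[t] @ B, B.T @ P[t] @ A)
        P[t - 1] = A.T @ P[t] @ A + Q - A.T @ P[t] @ B @ G
    # K_t = -(R + B'P_{t+1} B)^{-1} B'P_{t+1} A
    K = [-np.linalg.solve(R + B.T @ P[t + 1] @ B, B.T @ P[t + 1] @ A)
         for t in range(N)]
    return P, K
```

Note that only A, B, Q, R enter; consistent with the slide, the gains never see C, X, W, or V.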
Kalman filter

• define

    x̂_t = E(x_t | Y_t)   (current state estimate)
    Σ_t = E(x_t − x̂_t)(x_t − x̂_t)^T   (current state estimate covariance)
    Σ_{t+1|t} = A Σ_t A^T + W   (next state estimate covariance)

• start with Σ_{0|-1} = X; for t = 0, ..., N,

    Σ_t = Σ_{t|t-1} − Σ_{t|t-1} C^T (C Σ_{t|t-1} C^T + V)^{-1} C Σ_{t|t-1},
    Σ_{t+1|t} = A Σ_t A^T + W

• define

    L_t = Σ_{t|t-1} C^T (C Σ_{t|t-1} C^T + V)^{-1},   t = 0, ..., N
• set x̂_0 = L_0 y_0; for t = 0, ..., N-1,

    x̂_{t+1} = A x̂_t + B u_t + L_{t+1} e_{t+1},
    e_{t+1} = y_{t+1} − C(A x̂_t + B u_t)

• e_{t+1} is the next output prediction error

• e_{t+1} ~ N(0, C Σ_{t+1|t} C^T + V), independent of Y_t

• the Kalman filter gains L_t do not depend on the data B, Q, R
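The covariance and gain recursions above run forward in time and depend only on A, C, X, W, V, so they can be computed offline. A numpy sketch (the function name `kf_covariances` is an illustrative choice):

```python
import numpy as np

def kf_covariances(A, C, X, W, V, N):
    """Forward Kalman filter recursion: returns predicted covariances
    Sp[t] = Σ_{t|t-1}, filtered covariances S[t] = Σ_t, and gains L[t],
    for t = 0, ..., N."""
    Sp = [None] * (N + 1)   # Σ_{t|t-1}
    S = [None] * (N + 1)    # Σ_t
    L = [None] * (N + 1)
    Sp[0] = X               # Σ_{0|-1} = X
    for t in range(N + 1):
        E = C @ Sp[t] @ C.T + V                 # innovation covariance
        L[t] = Sp[t] @ C.T @ np.linalg.inv(E)   # L_t
        S[t] = Sp[t] - L[t] @ C @ Sp[t]         # measurement update
        if t < N:
            Sp[t + 1] = A @ S[t] @ A.T + W      # time update
    return Sp, S, L
```

Consistent with the slide, B, Q, and R never appear.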
Solution via dynamic programming

• let V_t(Y_t) be the optimal value of the LQG problem, from t on, conditioned on the output history Y_t:

    V_t(Y_t) = min_{φ_t, ..., φ_{N-1}} E( Σ_{τ=t}^{N-1} (x_τ^T Q x_τ + u_τ^T R u_τ) + x_N^T Q x_N | Y_t )

• we'll show that V_t is a quadratic function plus a constant; in fact,

    V_t(Y_t) = x̂_t^T P_t x̂_t + q_t,   t = 0, ..., N

  where P_t is the LQR cost-to-go matrix (x̂_t is a linear function of Y_t)
• we have

    V_N(Y_N) = E(x_N^T Q x_N | Y_N) = x̂_N^T Q x̂_N + Tr(Q Σ_N)

  (using x_N | Y_N ~ N(x̂_N, Σ_N)), so P_N = Q, q_N = Tr(Q Σ_N)

• the dynamic programming (DP) equation is

    V_t(Y_t) = min_{u_t} E( x_t^T Q x_t + u_t^T R u_t + V_{t+1}(Y_{t+1}) | Y_t )

  (and the argmin, which is a function of Y_t, is the optimal input)

• with V_{t+1}(Y_{t+1}) = x̂_{t+1}^T P_{t+1} x̂_{t+1} + q_{t+1}, the DP equation becomes

    V_t(Y_t) = min_{u_t} E( x_t^T Q x_t + u_t^T R u_t + x̂_{t+1}^T P_{t+1} x̂_{t+1} + q_{t+1} | Y_t )
             = E(x_t^T Q x_t | Y_t) + q_{t+1} + min_{u_t} ( u_t^T R u_t + E(x̂_{t+1}^T P_{t+1} x̂_{t+1} | Y_t) )
• using x_t | Y_t ~ N(x̂_t, Σ_t), the first term is

    E(x_t^T Q x_t | Y_t) = x̂_t^T Q x̂_t + Tr(Q Σ_t)

• using x̂_{t+1} = A x̂_t + B u_t + L_{t+1} e_{t+1}, with e_{t+1} ~ N(0, C Σ_{t+1|t} C^T + V), independent of Y_t, we get

    E(x̂_{t+1}^T P_{t+1} x̂_{t+1} | Y_t) = x̂_t^T A^T P_{t+1} A x̂_t + u_t^T B^T P_{t+1} B u_t + 2 x̂_t^T A^T P_{t+1} B u_t
                                          + Tr( (L_{t+1}^T P_{t+1} L_{t+1}) (C Σ_{t+1|t} C^T + V) )

• using L_{t+1} = Σ_{t+1|t} C^T (C Σ_{t+1|t} C^T + V)^{-1}, the last term becomes

    Tr( P_{t+1} Σ_{t+1|t} C^T (C Σ_{t+1|t} C^T + V)^{-1} C Σ_{t+1|t} ) = Tr P_{t+1} (Σ_{t+1|t} − Σ_{t+1})
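The Gaussian quadratic expectation identity used twice above, E(z^T Q z) = μ^T Q μ + Tr(Q Σ) for z ~ N(μ, Σ), can be sanity-checked numerically. A Monte Carlo sketch (not part of the lecture; the specific μ, Σ, Q below are arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Q = np.array([[1.0, 0.0], [0.0, 3.0]])

# exact value from the identity: E z'Qz = mu'Q mu + Tr(Q Sigma)
exact = mu @ Q @ mu + np.trace(Q @ Sigma)

# Monte Carlo estimate of E z'Qz over samples z ~ N(mu, Sigma)
z = rng.multivariate_normal(mu, Sigma, size=200_000)
mc = np.mean(np.einsum('ij,jk,ik->i', z, Q, z))
```

With these values, exact = 1·1 + 3·4 + Tr(QΣ) = 13 + 5 = 18, and the sample mean agrees to Monte Carlo accuracy.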
• combining all terms we get

    V_t(Y_t) = x̂_t^T (Q + A^T P_{t+1} A) x̂_t + q_{t+1} + Tr(Q Σ_t) + Tr P_{t+1} (Σ_{t+1|t} − Σ_{t+1})
               + min_{u_t} ( u_t^T (R + B^T P_{t+1} B) u_t + 2 x̂_t^T A^T P_{t+1} B u_t )

• the minimization is the same as in the deterministic LQR problem

• thus the optimal policy is φ_t^*(Y_t) = K_t x̂_t, with

    K_t = −(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A

• plugging in the optimal u_t we get V_t(Y_t) = x̂_t^T P_t x̂_t + q_t, where

    P_t = A^T P_{t+1} A + Q − A^T P_{t+1} B (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A
    q_t = q_{t+1} + Tr(Q Σ_t) + Tr P_{t+1} (Σ_{t+1|t} − Σ_{t+1})

• the recursion for P_t is exactly the same as for deterministic LQR
Optimal objective

• the optimal LQG cost is

    J^* = E V_0(y_0) = q_0 + E x̂_0^T P_0 x̂_0 = q_0 + Tr P_0 (X − Σ_0)

  using x̂_0 ~ N(0, X − Σ_0)

• using q_N = Tr Q Σ_N and q_t = q_{t+1} + Tr(Q Σ_t) + Tr P_{t+1} (Σ_{t+1|t} − Σ_{t+1}), we get

    J^* = Σ_{t=0}^N Tr(Q Σ_t) + Σ_{t=0}^N Tr P_t (Σ_{t|t-1} − Σ_t)

  using Σ_{0|-1} = X
• we can write this as

    J^* = Σ_{t=0}^N Tr(Q Σ_t) + Σ_{t=1}^N Tr P_t (A Σ_{t-1} A^T + W − Σ_t) + Tr(P_0 (X − Σ_0))

  which simplifies to J^* = J_lqr + J_est, where

    J_lqr = Tr(P_0 X) + Σ_{t=1}^N Tr(P_t W)

    J_est = Tr((Q − P_0) Σ_0) + Σ_{t=1}^N ( Tr((Q − P_t) Σ_t) + Tr(P_t A Σ_{t-1} A^T) )

• J_lqr is the stochastic LQR cost, i.e., the optimal objective if you knew the state

• J_est is the cost of not knowing (i.e., estimating) the state
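Since J^* depends only on the Riccati matrices P_t and the filter covariances Σ_{t|t-1}, Σ_t, it can be evaluated without simulation. A numpy sketch combining the two recursions (the function name `lqg_cost` is an illustrative choice):

```python
import numpy as np

def lqg_cost(A, B, C, Q, R, X, W, V, N):
    """Optimal finite-horizon LQG cost via
    J* = sum_t Tr(Q Σ_t) + sum_t Tr P_t (Σ_{t|t-1} - Σ_t)."""
    # backward Riccati recursion for P_0, ..., P_N
    P = [None] * (N + 1)
    P[N] = Q
    for t in range(N, 0, -1):
        G = np.linalg.solve(R + B.T @ P[t] @ B, B.T @ P[t] @ A)
        P[t - 1] = A.T @ P[t] @ A + Q - A.T @ P[t] @ B @ G
    # forward covariance recursion for Σ_{t|t-1} (Sp) and Σ_t (S)
    Sp, S = [None] * (N + 1), [None] * (N + 1)
    Sp[0] = X
    for t in range(N + 1):
        L = Sp[t] @ C.T @ np.linalg.inv(C @ Sp[t] @ C.T + V)
        S[t] = Sp[t] - L @ C @ Sp[t]
        if t < N:
            Sp[t + 1] = A @ S[t] @ A.T + W
    # J* = sum_t Tr(Q Σ_t) + sum_t Tr(P_t (Σ_{t|t-1} - Σ_t))
    return (sum(np.trace(Q @ S[t]) for t in range(N + 1))
            + sum(np.trace(P[t] @ (Sp[t] - S[t])) for t in range(N + 1)))
```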
• when state measurements are exact (C = I, V = 0), we have Σ_t = 0, so we get

    J^* = J_lqr = Tr(P_0 X) + Σ_{t=1}^N Tr(P_t W)
Infinite horizon LQG

• choose policies to minimize the infinite horizon average stage cost

    J = lim_{N→∞} (1/N) E Σ_{t=0}^{N-1} (x_t^T Q x_t + u_t^T R u_t)

• the optimal average stage cost is

    J^* = Tr(Q Σ) + Tr(P (Σ̂ − Σ))

  where P and Σ̂ are the PSD solutions of the AREs

    P = Q + A^T P A − A^T P B (R + B^T P B)^{-1} B^T P A
    Σ̂ = A Σ̂ A^T + W − A Σ̂ C^T (C Σ̂ C^T + V)^{-1} C Σ̂ A^T

  and

    Σ = Σ̂ − Σ̂ C^T (C Σ̂ C^T + V)^{-1} C Σ̂
• the optimal average stage cost doesn't depend on X

• (an) optimal policy is

    u_t = K x̂_t,   x̂_{t+1} = A x̂_t + B u_t + L(y_{t+1} − C(A x̂_t + B u_t))

  where

    K = −(R + B^T P B)^{-1} B^T P A,   L = Σ̂ C^T (C Σ̂ C^T + V)^{-1}

• K is the steady-state LQR feedback gain

• L is the steady-state Kalman filter gain
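One simple way to obtain the ARE solutions is to iterate the two Riccati recursions to (approximate) fixed points; one could also call scipy.linalg.solve_discrete_are. A numpy sketch, assuming the iterations converge (the function name `lqg_steady_state` and the iteration count are illustrative choices):

```python
import numpy as np

def lqg_steady_state(A, B, C, Q, R, W, V, iters=500):
    """Steady-state LQG quantities via fixed-point iteration of the
    control and estimation Riccati recursions (assumes convergence)."""
    P = Q.copy()    # control ARE iterate
    Sh = W.copy()   # estimation ARE iterate (plays the role of Σ̂)
    for _ in range(iters):
        P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
            R + B.T @ P @ B, B.T @ P @ A)
        Sh = A @ Sh @ A.T + W - A @ Sh @ C.T @ np.linalg.solve(
            C @ Sh @ C.T + V, C @ Sh @ A.T)
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # steady-state LQR gain
    L = Sh @ C.T @ np.linalg.inv(C @ Sh @ C.T + V)       # steady-state KF gain
    Sigma = Sh - L @ C @ Sh                              # Σ = Σ̂ - Σ̂C'(CΣ̂C'+V)^{-1}CΣ̂
    J = np.trace(Q @ Sigma) + np.trace(P @ (Sh - Sigma)) # optimal average stage cost
    return P, Sh, Sigma, K, L, J
```

For a scalar system with A = 0.5 and all other data equal to 1, both AREs reduce to the quadratic p² − 0.25p − 1 = 0, with positive root p ≈ 1.13278, which gives a quick check.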
Example

• system with n = 5 states, m = 2 inputs, p = 3 outputs; infinite horizon

• A, B, C chosen randomly; A scaled so max_i |λ_i(A)| = 1

• Q = I, R = I, X = I, W = 0.5 I, V = 0.5 I

• we compare LQG with the case where the state is known (stochastic LQR)
Sample trajectories

• sample traces of (x_t)_1 and (u_t)_1 in steady state, for 0 ≤ t ≤ 50

[figure: two stacked plots of (x_t)_1 and (u_t)_1 versus t; blue: LQG, red: stochastic LQR]
Cost histogram

• histogram of stage costs for 5000 steps in steady state

[figure: two stacked histograms of stage cost over 0 to 30; top: LQG (J^*), bottom: stochastic LQR (J_lqr)]