EE363 Winter 2008-09

Lecture 5: Linear Quadratic Stochastic Control

- linear-quadratic stochastic control problem
- solution via dynamic programming
Linear stochastic system

linear dynamical system, over a finite time horizon:

    x_{t+1} = A x_t + B u_t + w_t,   t = 0, ..., N-1

- w_t is the process noise or disturbance at time t
- w_t are IID with E w_t = 0, E w_t w_t^T = W
- x_0 is independent of w_t, with E x_0 = 0, E x_0 x_0^T = X
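To make the setup concrete, here is a minimal NumPy simulation sketch; the function name `simulate`, the Gaussian sampling of x_0 and w_t, and the other details are illustrative assumptions (the model itself only specifies means and covariances):

```python
import numpy as np

def simulate(A, B, W, X, policies, N, rng=None):
    """Simulate x_{t+1} = A x_t + B u_t + w_t under u_t = policies[t](x_t).

    W, X are the covariances of w_t and x_0; both have zero mean.
    Returns the state and input trajectories.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, m = B.shape
    x = rng.multivariate_normal(np.zeros(n), X)      # x_0 ~ (0, X), Gaussian assumed
    xs, us = [x], []
    for t in range(N):
        u = policies[t](x)                           # u_t = phi_t(x_t)
        w = rng.multivariate_normal(np.zeros(n), W)  # process noise w_t
        x = A @ x + B @ u + w
        xs.append(x)
        us.append(u)
    return np.array(xs), np.array(us)
```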
Control policies

- state-feedback control: u_t = φ_t(x_t), t = 0, ..., N-1
- φ_t : R^n → R^m is called the control policy at time t
- roughly speaking: we choose the input after knowing the current state, but before knowing the disturbance
- the closed-loop system is

    x_{t+1} = A x_t + B φ_t(x_t) + w_t,   t = 0, ..., N-1

- x_0, ..., x_N, u_0, ..., u_{N-1} are random
Stochastic control problem

- objective:

    J = E( sum_{t=0}^{N-1} (x_t^T Q x_t + u_t^T R u_t) + x_N^T Q_f x_N )

  with Q, Q_f ≥ 0, R > 0
- J depends (in a complex way) on the control policies φ_0, ..., φ_{N-1}
- linear-quadratic stochastic control problem: choose the control policies φ_0, ..., φ_{N-1} to minimize J
  ('linear' refers to the state dynamics; 'quadratic' to the objective)
- an infinite-dimensional problem: the variables are the functions φ_0, ..., φ_{N-1}
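For one realized trajectory the quantity inside the expectation is easy to evaluate; a small sketch (the helper name `trajectory_cost` is assumed, not from the notes; averaging it over many simulated runs gives a Monte Carlo estimate of J):

```python
import numpy as np

def trajectory_cost(xs, us, Q, R, Qf):
    """Quadratic cost of one realized trajectory:
    sum_t (x_t' Q x_t + u_t' R u_t) + x_N' Qf x_N."""
    stage = sum(x @ Q @ x + u @ R @ u for x, u in zip(xs[:-1], us))
    return stage + xs[-1] @ Qf @ xs[-1]
```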
Solution via dynamic programming

- let V_t(z) be the optimal value of the objective, from t on, starting at x_t = z:

    V_t(z) = min_{φ_t, ..., φ_{N-1}} E( sum_{τ=t}^{N-1} (x_τ^T Q x_τ + u_τ^T R u_τ) + x_N^T Q_f x_N )

  subject to x_{τ+1} = A x_τ + B u_τ + w_τ, u_τ = φ_τ(x_τ)
- we have V_N(z) = z^T Q_f z
- J* = E V_0(x_0) (expectation over x_0)
- V_t can be found by backward recursion: for t = N-1, ..., 0,

    V_t(z) = z^T Q z + min_v { v^T R v + E V_{t+1}(Az + Bv + w_t) }

- the expectation is over w_t: we do not know where we will land when we take u_t = v
- the optimal policies have the form

    φ_t*(x_t) = argmin_v { v^T R v + E V_{t+1}(A x_t + B v + w_t) }
Explicit form

- let's show (via recursion) that the value functions are quadratic, with the form

    V_t(x_t) = x_t^T P_t x_t + q_t,   t = 0, ..., N,

  with P_t ≥ 0
- P_N = Q_f, q_N = 0
- now assume that V_{t+1}(z) = z^T P_{t+1} z + q_{t+1}
- the Bellman recursion is

    V_t(z) = z^T Q z + min_v { v^T R v + E( (Az + Bv + w_t)^T P_{t+1} (Az + Bv + w_t) + q_{t+1} ) }
           = z^T Q z + Tr(W P_{t+1}) + q_{t+1} + min_v { v^T R v + (Az + Bv)^T P_{t+1} (Az + Bv) }

  where we use E w_t = 0 and E(w_t^T P_{t+1} w_t) = Tr(W P_{t+1})
- this is the same recursion as in deterministic LQR, with an added constant
- the optimal policy is linear state feedback: φ_t*(x_t) = K_t x_t, with

    K_t = -(B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A

  (same form as in deterministic LQR)
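Filling in the minimization step (implicit on the slide; the algebra below is the standard LQR completion): setting the gradient with respect to v to zero,

    ∇_v ( v^T R v + (Az + Bv)^T P_{t+1} (Az + Bv) ) = 2 R v + 2 B^T P_{t+1} (Az + Bv) = 0,

which gives

    v* = -(B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A z = K_t z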
- plugging in the optimal v gives V_t(z) = z^T P_t z + q_t, with

    P_t = A^T P_{t+1} A - A^T P_{t+1} B (B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A + Q
    q_t = q_{t+1} + Tr(W P_{t+1})

- the first recursion is the same as for deterministic LQR
- the second is just a running sum
- we conclude that P_t, K_t are the same as in deterministic LQR
- strangely, the optimal policy is the same as for LQR, and is independent of X, W
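A minimal NumPy sketch of this backward recursion (function and variable names are illustrative, not from the notes; it returns P_t, q_t and the gains K_t):

```python
import numpy as np

def lq_stochastic_backward(A, B, Q, R, Qf, W, N):
    """Backward recursion for P_t, q_t and the feedback gains K_t.

    Returns lists P[0..N], q[0..N], K[0..N-1], with u_t = K[t] @ x_t
    the optimal policy.
    """
    P = [None] * (N + 1)
    q = [0.0] * (N + 1)
    K = [None] * N
    P[N] = Qf
    for t in range(N - 1, -1, -1):
        S = B.T @ P[t + 1] @ B + R
        K[t] = -np.linalg.solve(S, B.T @ P[t + 1] @ A)          # K_t
        P[t] = Q + A.T @ P[t + 1] @ A + A.T @ P[t + 1] @ B @ K[t]  # Riccati step
        q[t] = q[t + 1] + np.trace(W @ P[t + 1])                 # running sum
    return P, q, K
```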
- the optimal cost is

    J* = E V_0(x_0) = Tr(X P_0) + q_0 = Tr(X P_0) + sum_{t=1}^{N} Tr(W P_t)

- interpretation:
  - x_0^T P_0 x_0 is the optimal cost of deterministic LQR, with w_0 = ... = w_{N-1} = 0
  - Tr(X P_0) is the average optimal LQR cost, with w_0 = ... = w_{N-1} = 0
  - Tr(W P_t) is the average optimal LQR cost, for E x_t = 0, E x_t x_t^T = W, w_t = ... = w_{N-1} = 0
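Using the P and q lists from the backward-recursion sketch above, the two expressions for the optimal cost can be checked numerically (again a hypothetical helper, not part of the notes):

```python
import numpy as np

def optimal_cost(P, q, W, X):
    """J* = Tr(X P_0) + q_0, which equals Tr(X P_0) + sum_{t=1}^{N} Tr(W P_t)."""
    J1 = np.trace(X @ P[0]) + q[0]
    J2 = np.trace(X @ P[0]) + sum(np.trace(W @ Pt) for Pt in P[1:])
    assert np.isclose(J1, J2)
    return J1
```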
Infinite horizon

- choose the policies to minimize the average stage cost

    J = lim_{N→∞} (1/N) E sum_{t=0}^{N-1} (x_t^T Q x_t + u_t^T R u_t)

- the optimal average stage cost is J* = Tr(W P_ss), where P_ss satisfies the ARE

    P_ss = Q + A^T P_ss A - A^T P_ss B (R + B^T P_ss B)^{-1} B^T P_ss A

- the optimal average stage cost doesn't depend on X
- (an) optimal policy is constant linear state feedback u_t = K_ss x_t, where

    K_ss = -(R + B^T P_ss B)^{-1} B^T P_ss A

- K_ss is the steady-state LQR feedback gain
- it doesn't depend on X, W
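One simple way to compute P_ss (not prescribed in the notes; `scipy.linalg.solve_discrete_are` could be used instead) is to iterate the Riccati recursion until it converges. A sketch, assuming the iteration does converge:

```python
import numpy as np

def steady_state_lqr(A, B, Q, R, W, tol=1e-9, max_iter=10000):
    """Iterate the Riccati recursion to (approximate) steady state.

    Returns P_ss, K_ss, and the optimal average stage cost Tr(W P_ss).
    """
    P = Q.copy()
    for _ in range(max_iter):
        S = R + B.T @ P @ B
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    K_ss = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K_ss, np.trace(W @ P)
```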
Example

- system with n = 5 states, m = 2 inputs, horizon N = 30
- A, B chosen randomly; A scaled so max_i |λ_i(A)| = 1
- Q = I, Q_f = 10I, R = I
- x_0 ~ N(0, X), X = 10I
- w_t ~ N(0, W), W = 0.5I
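A sketch of how such an instance might be generated (the seed and the use of standard normal entries are assumptions; the notes only say the data are chosen randomly):

```python
import numpy as np

rng = np.random.default_rng(0)   # seed is arbitrary; the notes don't give one
n, m, N = 5, 2, 30
A = rng.standard_normal((n, n))
A /= np.max(np.abs(np.linalg.eigvals(A)))   # scale so max_i |lambda_i(A)| = 1
B = rng.standard_normal((n, m))
Q, Qf, R = np.eye(n), 10 * np.eye(n), np.eye(m)
X, W = 10 * np.eye(n), 0.5 * np.eye(n)
```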
Sample trajectories

[figure: sample traces of (x_t)_1 (top) and (u_t)_1 (bottom) versus t, for t = 0, ..., 30; blue: optimal stochastic control, red: no control (u_0 = ... = u_{N-1} = 0)]
Cost histogram

[figure: cost histograms over 1000 simulations, for optimal stochastic control (J), prescient control (J_pre), open-loop control (J_ol), and no control (J_nc); horizontal axis: cost, from 0 to 700]
Comparisons

we compared optimal stochastic control (J = 224.2) with:

- prescient control
  - decide the input sequence with full knowledge of the future disturbances
  - u_0, ..., u_{N-1} computed assuming all w_t are known
  - J_pre = 137.6
- open-loop control
  - u_0, ..., u_{N-1} depend only on x_0
  - u_0, ..., u_{N-1} computed assuming w_0 = ... = w_{N-1} = 0
  - J_ol = 423.7
- no control
  - u_0 = ... = u_{N-1} = 0
  - J_nc = 442.0
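A rough Monte Carlo check of two of these comparisons (optimal stochastic control versus no control) can be run with the gains K_t from the backward-recursion sketch; prescient and open-loop control require extra machinery and are not sketched here. The helper below and its defaults are assumptions, and the numbers it produces depend on the random instance, so they will not reproduce 224.2 / 442.0 exactly:

```python
import numpy as np

def mc_compare(A, B, Q, R, Qf, W, X, K, n_sim=1000, seed=0):
    """Monte Carlo estimate of the cost under the feedback gains K[t]
    and under no control (u_t = 0), using the same noise for both."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    N = len(K)
    costs_fb, costs_nc = [], []
    for _ in range(n_sim):
        x0 = rng.multivariate_normal(np.zeros(n), X)
        w = rng.multivariate_normal(np.zeros(n), W, size=N)
        for use_fb, costs in ((True, costs_fb), (False, costs_nc)):
            x, J = x0.copy(), 0.0
            for t in range(N):
                u = K[t] @ x if use_fb else np.zeros(m)
                J += x @ Q @ x + u @ R @ u
                x = A @ x + B @ u + w[t]
            costs.append(J + x @ Qf @ x)
    return np.mean(costs_fb), np.mean(costs_nc)
```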