2. LINEAR QUADRATIC DETERMINISTIC PROBLEM

Notations. For a vector $Z$, $\|Z\| = \sqrt{\langle Z, Z\rangle}$ is the Euclidean norm; here $\langle Z, Z\rangle = \sum_i Z_i^2$ is the inner product. For a vector $Z$ and a nonnegative definite matrix $Q$, $\|Z\|_Q = \sqrt{\langle Z, QZ\rangle}$ is the $L^2$-norm with the kernel $Q$: $\langle Z, QZ\rangle = \sum_{i,j} Z_i Q_{ij} Z_j$. The prime $'$ is the transposition symbol; $\mathrm{grad}$ is the symbol of the gradient (a row vector).

The controlled process $X_t$ is defined by a linear vector differential equation with control action $U_t$:
$$\dot X_t = a(t)X_t + c(t)U_t \tag{2.1}$$
subject to the fixed initial condition $X_0 = x$. Here $X_t$ and $U_t$ are vectors of sizes $k$ and $r$, and $a(t)$ ($k\times k$), $c(t)$ ($k\times r$) are matrix-valued functions of the argument $t$. The control $U_t$ has to be chosen so as to track a smooth vector-valued function $\varphi(t)$ of size $k$, in the sense of the minimization of the cost functional
$$J(x,\varphi;U) = \|X_T - \varphi(T)\|^2_h + \int_0^T \big(\|X_t - \varphi(t)\|^2_{H(t)} + \|U_t\|^2_{R(t)}\big)\,dt, \tag{2.2}$$
where $h$ ($k\times k$), $H(t)$ ($k\times k$) and $R(t)$ ($r\times r$) are nonnegative definite matrices; $R(t)$ is uniformly in $t$ nonsingular.

2.1. Preliminaries. Set $Y_t = X_t - \varphi(t)$. Since $\varphi(t)$ is assumed to be differentiable, by (2.1) we find the differential equation for $Y_t$:
$$\dot Y_t = a(t)Y_t + \big[a(t)\varphi(t) - \dot\varphi(t)\big] + c(t)U_t \tag{2.3}$$
subject to $Y_0 = x - \varphi(0) =: y$. Notice also that the cost functional $J(x,\varphi;U)$ is transformed to
$$J(y,\varphi;U) = \|Y_T\|^2_h + \int_0^T \big(\|Y_t\|^2_{H(t)} + \|U_t\|^2_{R(t)}\big)\,dt. \tag{2.4}$$
As in Lecture 1, we introduce the Bellman function
$$V(t,y) = \min_{U_s:\, t\le s\le T}\Big[\|Y_T\|^2_h + \int_t^T \big(\|Y_s\|^2_{H(s)} + \|U_s\|^2_{R(s)}\big)\,ds\Big]$$
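Before turning to the optimization, the dynamics (2.1) and the cost (2.2) can be explored numerically. The following is a minimal sketch, not part of the lecture: a scalar Euler simulation of (2.1) with an illustrative choice of data ($a = -1$, $c = 1$, $H = 1$, $R = 1$, $h = 0$, $\varphi \equiv 0$, $x = 1$) and an arbitrary control policy `u`.

```python
import numpy as np

# Minimal sketch: simulate the scalar case of (2.1) by the Euler scheme and
# evaluate the cost (2.2).  All concrete values here are illustrative
# assumptions, not taken from the text.
def cost(a, c, H, R, h, phi, u, x, T, n=100_000):
    dt = T / n
    X, J = x, 0.0
    for i in range(n):
        t = i * dt
        J += ((X - phi(t)) ** 2 * H + u(t, X) ** 2 * R) * dt   # running cost
        X += (a * X + c * u(t, X)) * dt                        # Euler step for (2.1)
    return J + (X - phi(T)) ** 2 * h                           # terminal cost

# With U = 0 and a = -1: X_t = e^{-t}, so J = (1 - e^{-2T}) / 2.
J0 = cost(a=-1.0, c=1.0, H=1.0, R=1.0, h=0.0,
          phi=lambda t: 0.0, u=lambda t, X: 0.0, x=1.0, T=5.0)
```

For the zero control the integral can be computed in closed form, which gives a convenient accuracy check of the discretization.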
and apply Bellman's principle of optimality:
$$V(t,y) = \min_{U_s:\, t\le s\le t+\delta}\Big[\int_t^{t+\delta}\big(\|Y_s\|^2_{H(s)} + \|U_s\|^2_{R(s)}\big)\,ds + V(t+\delta, Y_{t+\delta})\Big], \tag{2.5}$$
where $Y_t = y$ and, for $s > t$,
$$\dot Y_s = a(s)Y_s + \big[a(s)\varphi(s) - \dot\varphi(s)\big] + c(s)U_s.$$
With the help of (2.5) we derive heuristically the Bellman equation
$$-\frac{\partial V(t,y)}{\partial t} = \min_U\Big[\|y\|^2_{H(t)} + \|U\|^2_{R(t)} + \mathrm{grad}_y V(t,y)\big(a(t)y + [a(t)\varphi(t) - \dot\varphi(t)] + c(t)U\big)\Big] \tag{2.6}$$
subject to the boundary condition $V(T,y) = \|y\|^2_h$.

2.2. The optimal control. We find now
$$U^*(t,y) = \operatorname*{argmin}_U\Big[\|y\|^2_{H(t)} + \|U\|^2_{R(t)} + \mathrm{grad}_y V(t,y)\big(a(t)y + [a(t)\varphi(t) - \dot\varphi(t)] + c(t)U\big)\Big].$$
Obviously, this procedure reduces to a minimization in $U$ of the quadratic form
$$\mathcal{Q}(U) := \|U\|^2_{R(t)} + \mathrm{grad}_y V(t,y)\,c(t)U.$$
Since $\mathcal{Q}(U)$ is a quadratic form, $U^*$ solves the equation $\mathrm{grad}_U\,\mathcal{Q}(U) = 0$. Notice that $\mathrm{grad}_U\,\|U\|^2_{R(t)} = 2U'R(t)$, so that
$$\mathrm{grad}_U\,\mathcal{Q}(U) = 2U'R(t) + \mathrm{grad}_y V(t,y)\,c(t).$$
Hence,
$$U^*(t,y) = -\frac12 R^{-1}(t)c'(t)\,\mathrm{grad}'_y V(t,y). \tag{2.7}$$
Then the Bellman equation is transformed to
$$-\frac{\partial V(t,y)}{\partial t} = \|y\|^2_{H(t)} + \frac14\,\mathrm{grad}_y V(t,y)\,c(t)R^{-1}(t)R(t)R^{-1}(t)c'(t)\,\mathrm{grad}'_y V(t,y)$$
$$\qquad + \mathrm{grad}_y V(t,y)\big\{a(t)y + \big[a(t)\varphi(t) - \dot\varphi(t)\big]\big\} - \frac12\,\mathrm{grad}_y V(t,y)\,c(t)R^{-1}(t)c'(t)\,\mathrm{grad}'_y V(t,y).$$
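Formula (2.7) can be sanity-checked numerically. A sketch with randomly generated matrices (the particular $R$, $c$ and the row vector playing the role of $\mathrm{grad}_y V$ are arbitrary, not from the lecture):

```python
import numpy as np

# Sketch: check that U* = -(1/2) R^{-1} c' (grad_y V)' from (2.7) minimizes
# the quadratic form Q(U) = ||U||_R^2 + grad_y V . c U.
rng = np.random.default_rng(0)
k, r = 3, 2
M = rng.standard_normal((r, r))
R = M @ M.T + np.eye(r)            # positive definite control weight
c = rng.standard_normal((k, r))
g = rng.standard_normal((1, k))    # plays the role of the row vector grad_y V(t,y)

def Q(U):
    return float(U.T @ R @ U + g @ c @ U)

U_star = -0.5 * np.linalg.solve(R, c.T @ g.T)   # formula (2.7)

# Since Q is strictly convex, Q must increase under any perturbation of U*.
worst = min(Q(U_star + 0.1 * rng.standard_normal((r, 1))) - Q(U_star)
            for _ in range(100))
```

Because $Q(U^* + d) - Q(U^*) = d'Rd > 0$ for every $d \ne 0$, `worst` must be strictly positive.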
Owing to
$$\frac14\,\mathrm{grad}_y V(t,y)\,c(t)R^{-1}(t)R(t)R^{-1}(t)c'(t)\,\mathrm{grad}'_y V(t,y) = \frac14\,\mathrm{grad}_y V(t,y)\,c(t)R^{-1}(t)c'(t)\,\mathrm{grad}'_y V(t,y),$$
we find that
$$-\frac{\partial V(t,y)}{\partial t} = \|y\|^2_{H(t)} - \frac14\,\mathrm{grad}_y V(t,y)\,c(t)R^{-1}(t)c'(t)\,\mathrm{grad}'_y V(t,y) + \mathrm{grad}_y V(t,y)\big\{a(t)y + \big[a(t)\varphi(t) - \dot\varphi(t)\big]\big\}. \tag{2.8}$$
As in Lecture 1, we shall find a solution of (2.8) as a quadratic form in $y$ with a nonnegative definite matrix $\Gamma(t)$:
$$V(t,y) = y'\Gamma(t)y + y'B(t) + Q(t);$$
in particular, then $\mathrm{grad}_y V(t,y) = 2y'\Gamma(t) + B'(t)$. Substituting this $V(t,y)$ and $\mathrm{grad}_y V(t,y)$ in (2.8), we arrive at the identity
$$-\big[y'\dot\Gamma(t)y + y'\dot B(t) + \dot Q(t)\big] = \|y\|^2_{H(t)} - \frac14\big(2y'\Gamma(t) + B'(t)\big)c(t)R^{-1}(t)c'(t)\big(2\Gamma(t)y + B(t)\big) + \big(2y'\Gamma(t) + B'(t)\big)\big(a(t)y + \big[a(t)\varphi(t) - \dot\varphi(t)\big]\big),$$
which provides, with $2y'\Gamma(t)a(t)y = y'\Gamma(t)a(t)y + y'a'(t)\Gamma(t)y$, the differential equations
$$\begin{aligned}
-\dot\Gamma(t) &= H(t) - \Gamma(t)c(t)R^{-1}(t)c'(t)\Gamma(t) + \Gamma(t)a(t) + a'(t)\Gamma(t)\\
-\dot B(t) &= \big[a'(t) - \Gamma(t)c(t)R^{-1}(t)c'(t)\big]B(t) + 2\Gamma(t)\big[a(t)\varphi(t) - \dot\varphi(t)\big]\\
-\dot Q(t) &= -\frac14 B'(t)c(t)R^{-1}(t)c'(t)B(t) + B'(t)\big[a(t)\varphi(t) - \dot\varphi(t)\big]
\end{aligned} \tag{2.9}$$
subject to the boundary conditions $\Gamma(T) = h$, $B(T) = 0$ and $Q(T) = 0$. Hence, by (2.7),
$$U^*(t,y) = -R^{-1}(t)c'(t)\Big[\Gamma(t)y + \frac12 B(t)\Big]. \tag{2.10}$$
Thus, the optimal control for the original problem is defined as follows:
$$U^*(t,X_t) = -R^{-1}(t)c'(t)\Big[\Gamma(t)\big(X_t - \varphi(t)\big) + \frac12 B(t)\Big],$$
where $\dot X_t = a(t)X_t + c(t)U^*(t,X_t)$.

Remark. If $\dot\varphi(t) \equiv a(t)\varphi(t)$, then $B(t) \equiv 0$ and $Q(t) \equiv 0$.
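The backward equations (2.9) are readily integrated numerically. A minimal sketch in the scalar case, with illustrative data ($a = -0.5$, $c = 1$, $H = 1$, $R = 1$, $h = 2$, $T = 1$, $\varphi(t) = \sin t$; none of these values are from the text):

```python
import numpy as np

# Sketch: integrate (2.9) backward from t = T in the scalar case and form
# the tracking feedback (2.10).
a, c, H, R, h, T = -0.5, 1.0, 1.0, 1.0, 2.0, 1.0
phi  = np.sin
dphi = np.cos                               # derivative of phi

n  = 10_000
dt = T / n
ts = np.linspace(0.0, T, n + 1)
Gam = np.empty(n + 1); B = np.empty(n + 1)
Gam[n], B[n] = h, 0.0                       # boundary conditions at t = T
for i in range(n, 0, -1):
    t = ts[i]
    m = a * phi(t) - dphi(t)                # the drift term a(t)phi(t) - phi'(t)
    dGam = H - Gam[i] * c / R * c * Gam[i] + 2 * a * Gam[i]   # -Gamma'
    dB   = (a - Gam[i] * c * c / R) * B[i] + 2 * Gam[i] * m   # -B'
    Gam[i - 1] = Gam[i] + dt * dGam         # one backward Euler step
    B[i - 1]   = B[i] + dt * dB

def u_star(i, y):
    # feedback (2.10): U* = -R^{-1} c' (Gamma(t) y + B(t)/2)
    return -(c / R) * (Gam[i] * y + 0.5 * B[i])
```

In this example $\Gamma(t)$ decreases backward from $h = 2$ toward the stationary point of the scalar Riccati equation and stays positive, as the theory predicts.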
2.3. Infinite horizon. From the applications point of view, it makes sense to analyze the case of very large time $T$, that is $T = \infty$. To simplify the analysis, we assume that all matrices are time-independent and $\varphi(t) \equiv 0$, i.e.
$$\dot X_t = aX_t + cU_t, \tag{2.11}$$
$$J(x,U) = \int_0^\infty \big(\|X_t\|^2_H + \|U_t\|^2_R\big)\,dt. \tag{2.12}$$
The main problem for this setting is the requirement that there exist a control action $U_t$ for which $J(x,U) < \infty$. We give conditions guaranteeing the latter, expressed in terms of the matrices $a$, $c$ and $H$. Recall that the matrix $R$ is nonsingular.

With $T_k \uparrow \infty$, we consider a family of cost functionals $\big(J_{T_k}(x,U)\big)_{k\ge 1}$. For every $k$, we have
$$J_{T_k}(x,U) = \int_0^{T_k}\big(\|X_t\|^2_H + \|U_t\|^2_R\big)\,dt.$$
For fixed $k$, denote the optimal control by $U^{*k}_t$. From the result obtained above we know that (by the Remark, $B(t) \equiv 0$ and $Q(t) \equiv 0$)
$$U^{*k}_t = -R^{-1}c'\Gamma^k(t)X_t,$$
where $\Gamma^k(t)$ solves the Riccati equation
$$-\dot\Gamma^k(t) = a'\Gamma^k(t) + \Gamma^k(t)a + H - \Gamma^k(t)cR^{-1}c'\Gamma^k(t)$$
subject to the boundary condition $\Gamma^k(T_k) = 0$. Moreover,
$$\min_{U_s:\,0\le s\le T_k} J_{T_k}(x,U) = \|x\|^2_{\Gamma^k(0)}.$$
We show that $\big(\|x\|^2_{\Gamma^k(0)}\big)_{k\ge 1}$ is an increasing sequence. Denote by $X^k_t$ the controlled process associated with the optimal control $U^{*k}_t$ on $[0,T_k]$. Then
$$\|x\|^2_{\Gamma^k(0)} = \int_0^{T_k}\big(\|X^k_t\|^2_H + \|U^{*k}_t\|^2_R\big)\,dt$$
and so
$$\begin{aligned}
\|x\|^2_{\Gamma^{k+1}(0)} &= \int_0^{T_{k+1}}\big(\|X^{k+1}_t\|^2_H + \|U^{*(k+1)}_t\|^2_R\big)\,dt\\
&\ge \int_0^{T_k}\big(\|X^{k+1}_t\|^2_H + \|U^{*(k+1)}_t\|^2_R\big)\,dt \qquad (\text{since } T_{k+1} > T_k)\\
&\ge \int_0^{T_k}\big(\|X^{k}_t\|^2_H + \|U^{*k}_t\|^2_R\big)\,dt \qquad\quad (\text{since } U^{*k} \text{ is optimal on } [0,T_k])\\
&= \|x\|^2_{\Gamma^k(0)}.
\end{aligned}$$
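The monotonicity in $T_k$ can be observed numerically. A sketch in an illustrative scalar case ($a = 0$, $c = 1$, $H = 1$, $R = 1$, not from the text), where $-\dot\Gamma = 1 - \Gamma^2$ with $\Gamma^k(T_k) = 0$ has the closed-form solution $\Gamma^k(t) = \tanh(T_k - t)$:

```python
import numpy as np

# Sketch: the minimal costs ||x||^2_{Gamma^k(0)} increase with the horizon T_k
# and stay bounded.  Scalar illustrative data a = 0, c = 1, H = 1, R = 1,
# for which Gamma^k(0) = tanh(T_k).
def gamma0(T, n=20_000):
    dt = T / n
    g = 0.0                      # boundary condition Gamma^k(T_k) = 0
    for _ in range(n):
        g += dt * (1.0 - g * g)  # backward Euler for -Gamma' = H - Gamma c R^{-1} c Gamma
    return g

vals = [gamma0(Tk) for Tk in (1.0, 2.0, 4.0, 8.0)]
```

The sequence `vals` increases toward the solution $\Gamma = 1$ of the algebraic Riccati equation $H - \Gamma^2 = 0$, matching $\tanh(T_k) \uparrow 1$.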
Consequently, $\lim_k \|x\|^2_{\Gamma^k(0)}$ exists, but we cannot be sure that this limit is finite. To prove $\lim_k \|x\|^2_{\Gamma^k(0)} < \infty$, notice that it suffices to choose some control action $\widetilde U_t$ such that $J(x,\widetilde U) < \infty$. Then, whereas $\|x\|^2_{\Gamma^k(0)} \le J_{T_k}(x,\widetilde U) \le J(x,\widetilde U)$ for any $k$, we get the desired property.

Assume the matrix
$$A = \int_0^1 e^{-as}cc'e^{-a's}\,ds$$
is nonsingular. Then, taking
$$\widetilde U_t = \begin{cases} -c'e^{-a't}\Big(\displaystyle\int_0^1 e^{-as}cc'e^{-a's}\,ds\Big)^{-1}x, & t \le 1,\\ 0, & t > 1,\end{cases}$$
where $x = \widetilde X_0$ and $\dot{\widetilde X}_t = a\widetilde X_t + c\widetilde U_t$, we find
$$\widetilde X_1 = e^{a}x + \int_0^1 e^{a(1-t)}c\widetilde U_t\,dt = e^{a}\Big[I - \int_0^1 e^{-at}cc'e^{-a't}\,dt\Big(\int_0^1 e^{-as}cc'e^{-a's}\,ds\Big)^{-1}\Big]x = 0.$$
The latter allows us to conclude that $\widetilde X_t \equiv 0$, $t \ge 1$, and at the same time
$$J(x,\widetilde U) = J_1(x,\widetilde U) < \infty.$$
So, it remains to prove that $A$ is a nonsingular matrix.

Theorem. The matrix $A$ is nonsingular if the block matrix $G(a,c)$ of size $k \times kr$,
$$G(a,c) = \big(c \;\; ac \;\; \ldots \;\; a^{k-1}c\big), \tag{2.14}$$
is of the full rank:
$$\operatorname{rank} G(a,c) = k. \tag{2.13}$$

Proof. Assume $A$ is singular. Since $A$ is nonnegative definite, there is a vector $Z \ne 0$ such that
$$0 = Z'AZ = \int_0^1 Z'e^{-at}cc'e^{-a't}Z\,dt.$$
On the other hand, $Z'e^{-at}cc'e^{-a't}Z = \|c'e^{-a't}Z\|^2$, being nonnegative and continuous in $t$, is equal to zero for all $t \in [0,1]$. Hence $Z'e^{-at}c \equiv 0$. Differentiating this identity in $t$ and letting
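The steering construction above can be tried out numerically. A sketch for an illustrative controllable pair (the double integrator $a$, $c$ below and the initial state are assumptions, not from the text): compute the Gramian $A$ by the trapezoidal rule, apply $\widetilde U_t = -c'e^{-a't}A^{-1}x$, and check that the state indeed reaches $0$ at time $1$.

```python
import numpy as np
from scipy.linalg import expm

# Sketch: the steering control built from the Gramian A drives X to 0 at t = 1.
a = np.array([[0.0, 1.0],
              [0.0, 0.0]])
c = np.array([[0.0],
              [1.0]])
x = np.array([1.0, -2.0])

# A = int_0^1 e^{-as} c c' e^{-a's} ds, via the trapezoidal rule
ss = np.linspace(0.0, 1.0, 2001)
ds = ss[1] - ss[0]
mats = np.array([expm(-a * s) @ c @ c.T @ expm(-a.T * s) for s in ss])
A = ds * (mats[1:-1].sum(axis=0) + 0.5 * (mats[0] + mats[-1]))
Ainv_x = np.linalg.solve(A, x)

def u(t):
    # U~_t = -c' e^{-a't} A^{-1} x for t <= 1, and 0 afterwards
    return -(c.T @ expm(-a.T * t) @ Ainv_x) if t <= 1.0 else np.zeros(1)

def f(t, X):                     # right-hand side of X' = aX + cU~
    return a @ X + (c @ u(t)).ravel()

# classical RK4 on [0, 1]
X, n = x.copy(), 400
h = 1.0 / n
for i in range(n):
    t = i * h
    k1 = f(t, X); k2 = f(t + h / 2, X + h / 2 * k1)
    k3 = f(t + h / 2, X + h / 2 * k2); k4 = f(t + h, X + h * k3)
    X = X + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```

For this pair $A = \begin{pmatrix} 1/3 & -1/2\\ -1/2 & 1\end{pmatrix}$ with $\det A = 1/12 \ne 0$, and the integrated state at $t = 1$ vanishes up to discretization error.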
$t = 0$, we get
$$\begin{aligned}
Z'e^{-at}c\big|_{t=0} &= Z'c = 0,\\
\frac{d}{dt}\,Z'e^{-at}c\Big|_{t=0} &= -Z'ac = 0,\\
&\;\;\vdots\\
\frac{d^j}{dt^j}\,Z'e^{-at}c\Big|_{t=0} &= (-1)^j Z'a^jc = 0,\\
&\;\;\vdots
\end{aligned}$$
Consequently,
$$Z'G(a,c)G'(a,c)Z = \sum_{j=0}^{k-1} Z'a^jc\,c'(a^j)'Z = 0,$$
so that the matrix $G(a,c)G'(a,c)$ is singular, which contradicts (2.13), since full rank of $G(a,c)$ implies nonsingularity of $G(a,c)G'(a,c)$. The contradiction obtained finishes the proof. $\square$

Definition. The pair of matrices $(a,c)$ is said to be controllable if the matrix $G(a,c)$ is of rank $k$.

Thus, if $(a,c)$ is controllable, $\lim_k \|x\|^2_{\Gamma^k(0)}$ exists and is finite for any $x$. Then, obviously, $\lim_k \Gamma^k(0)$ exists as well; denote this limit by $\Gamma$. Since
$$-\dot\Gamma^k(t) = a'\Gamma^k(t) + \Gamma^k(t)a + H - \Gamma^k(t)cR^{-1}c'\Gamma^k(t)$$
subject to $\Gamma^k(T_k) = 0$, the limit $\Gamma$ solves the algebraic Riccati equation
$$a'\Gamma + \Gamma a + H - \Gamma cR^{-1}c'\Gamma = 0. \tag{2.15}$$
The next fact, which we use without proof, is that (2.15) possesses a unique positive definite solution provided that the matrix
$$g(H,a) = \begin{pmatrix} H\\ Ha\\ \vdots\\ Ha^{k-1}\end{pmatrix}$$
is of the full rank $k$. In particular, this holds if $H$ is nonsingular.

Definition. The pair of matrices $(H,a)$ is said to be observable if the matrix $g(H,a)$ is of rank $k$.

2.3.1. The optimal control for the infinite horizon. Since $\|x\|^2_{\Gamma^k(0)} \le J_{T_k}(x,U) \le J(x,U)$ for any $k$, and $\lim_k \|x\|^2_{\Gamma^k(0)} = \|x\|^2_\Gamma$, we find the lower bound
$$J(x,U) \ge \|x\|^2_\Gamma. \tag{2.16}$$
The optimal control with respect to $[0,T_k]$ is
$$U^*(t,X_t) = -R^{-1}c'\Gamma^k(t)X_t.$$
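Both the rank condition (2.13)–(2.14) and the algebraic Riccati equation (2.15) are easy to check numerically. A sketch with the same illustrative pair as before (the matrices are assumptions, not from the text); note that `scipy.linalg.solve_continuous_are(a, b, q, r)` solves exactly the equation $a'\Gamma + \Gamma a - \Gamma b r^{-1} b'\Gamma + q = 0$, i.e. (2.15) with $b = c$, $q = H$, $r = R$:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Sketch: controllability via the rank of G(a,c) from (2.14), and a numerical
# solution of the algebraic Riccati equation (2.15).
a = np.array([[0.0, 1.0],
              [0.0, 0.0]])
c = np.array([[0.0],
              [1.0]])
H = np.eye(2)
R = np.array([[1.0]])
k = a.shape[0]

# G(a, c) = (c, ac, ..., a^{k-1}c)
G = np.hstack([np.linalg.matrix_power(a, j) @ c for j in range(k)])
controllable = (np.linalg.matrix_rank(G) == k)

Gamma = solve_continuous_are(a, c, H, R)
residual = a.T @ Gamma + Gamma @ a + H - Gamma @ c @ np.linalg.solve(R, c.T) @ Gamma
```

Here $H = I$ is nonsingular, so the observability condition holds automatically and $\Gamma$ is the unique positive definite solution of (2.15).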
Hence, the control
$$U^*(t,X_t) = -R^{-1}c'\Gamma X_t$$
is a candidate to be the optimal control for the infinite horizon. To check this supposition, let us consider the quadratic form $\|X_t\|^2_\Gamma$ along $\dot X_t = aX_t + cU^*(t,X_t)$. Write
$$\begin{aligned}
\frac{d}{dt}\|X_t\|^2_\Gamma &= \dot X'_t\Gamma X_t + X'_t\Gamma\dot X_t\\
&= X'_t\big(a - cR^{-1}c'\Gamma\big)'\Gamma X_t + X'_t\Gamma\big(a - cR^{-1}c'\Gamma\big)X_t\\
&= X'_t\big(a'\Gamma + \Gamma a - 2\Gamma cR^{-1}c'\Gamma\big)X_t\\
&= -X'_t\big(H + \Gamma cR^{-1}c'\Gamma\big)X_t \qquad (\text{by } (2.15))\\
&= -X'_tHX_t - X'_t\Gamma cR^{-1}RR^{-1}c'\Gamma X_t\\
&= -X'_tHX_t - U^*(t,X_t)'RU^*(t,X_t).
\end{aligned}$$
Hence, we find that for any $T > 0$
$$J_T(x,U^*) = \|x\|^2_\Gamma - \|X_T\|^2_\Gamma \le \|x\|^2_\Gamma.$$
Thereby, $J(x,U^*) = \lim_{T\to\infty} J_T(x,U^*) \le \|x\|^2_\Gamma$. So, by (2.16) we have $J(x,U^*) = \|x\|^2_\Gamma$, that is, $U^*(t,X_t)$ is the optimal control.

1. Home work: Vector case. Feedback control. Bellman equation

Let $X_t$, $t = 0,1,\ldots,T$, be the controlled vector sequence defined by the linear vector-matrix recursion
$$X_{t+1} = aX_t + cU_t,$$
where $U_t$ is the control; $X_t$ and $U_t$ are vectors of sizes $k$ and $r$ respectively; $a$ ($k\times k$) and $c$ ($k\times r$) are known matrices; $X_0 = x$ and $x$ is known. It is required to choose the optimal control $U_t$ minimizing the cost functional ($h$ ($k\times k$), $H$ ($k\times k$) are nonnegative definite matrices, $R$ ($r\times r$) is a positive definite matrix)
$$J(x,U) = \|X_T\|^2_h + \sum_{t=0}^{T-1}\big(\|X_t\|^2_H + \|U_t\|^2_R\big). \tag{2.17}$$
1. Derive Bellman's equation for the Bellman function
$$V(x,t) = \min_{U_s:\, t\le s\le T-1}\Big[\|X_T\|^2_h + \sum_{s=t}^{T-1}\big(\|X_s\|^2_H + \|U_s\|^2_R\big)\Big].$$
2. Find the optimal control.
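Returning to Section 2.3.1, the identity $J(x,U^*) = \|x\|^2_\Gamma$ for the continuous-time infinite-horizon problem can be verified numerically. A sketch in an illustrative scalar case ($a = 0$, $c = 1$, $H = 1$, $R = 1$, $x = 3$; assumed values, not from the text), where the scalar form of (2.15), $2a\Gamma + H - \Gamma^2 c^2/R = 0$, gives $\Gamma = 1$:

```python
import numpy as np

# Sketch: simulate the closed loop X' = aX + cU* with U* = -R^{-1} c' Gamma X
# and integrate the cost (2.12), which should equal ||x||^2_Gamma.
a, c, H, R = 0.0, 1.0, 1.0, 1.0
Gamma = 1.0                           # solution of 2*a*Gamma + H - Gamma^2*c^2/R = 0
x = 3.0

T, n = 20.0, 200_000                  # large T approximates the infinite horizon
dt = T / n
X, J = x, 0.0
for _ in range(n):
    u = -(c / R) * Gamma * X          # the candidate optimal feedback
    J += (H * X * X + R * u * u) * dt # running cost of (2.12)
    X += (a * X + c * u) * dt         # Euler step of the closed loop
```

Here the closed loop is $\dot X = -X$, so $J = \int_0^\infty 2x^2e^{-2t}\,dt = x^2 = \|x\|^2_\Gamma$, and the numerical cost agrees up to discretization error.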