Stochastic Nonlinear Stabilization Part II: Inverse Optimality Hua Deng and Miroslav Krstic Department of Mechanical Engineering denghua@eng.umd.edu http://www.engr.umd.edu/~denghua/ University of Maryland College Park, MD 7 fax: ( {977 krstic@eng.umd.edu ( {7 ( {6 http://www.engr.umd.edu/~krstic/ Submitted to Systems and Control Letters January, 997 Abstract After considering the stabilization of a specic class of stochastic nonlinear systems in a companion paper, in this second part, we address the classical question of when is a stabilizing (in probability controller optimal and show that for every system with a stochastic control Lyapunov function it is possible to construct a controller which is optimal with respect to a meaningful cost functional. hen we return to the problem from Part I and design an optimal backstepping controller whose cost functional includes penalty on control eort and which has an innite gain margin. Keywords: stochastic nonlinear systems, control Lyapunov functions, backstepping. his work was supported in part by the National Science Foundation under Grant ECS-9-86, in part by the Air Force Oce of Scientic Research under Grant F9696.
Introduction After considering the stabilization of strict-feedback stochastic systems in a companion paper [], in this paper we present the following new results: For general stochastic systems ane in the control and noise inputs, we design stabilizing control laws which are also optimal with respect to meaningful cost functionals. his result is a stochastic counterpart of the inverse optimality result of Freeman and Kokotovic [] for systems with deterministic uncertainties. Furthermore, we show how our backstepping design from [], which achieves stability, can be redesigned to also achieve inverse optimality. In contrast to the design of Pan and Basar [9], our cost functional includes penalty on the control eort and has an innite gain margin [](namely, the property that it remains stabilizing when multiplied by any constant no smaller than one, which is one of the main advantages of inverse optimality. Stochastic Control Lyapunov Functions Consider the system which, in addition to the noise input w, has a control input u: dx = f(xdt g (xdw g (xudt; (. where f( = ; g ( = and u IR m. We say that the system is globally asymptotically stabilizable in probability if there exists a control law u = (x continuous away from the origin, with ( =, such that the equilibrium x = of the closed-loop system is globally asymptotically stable in probability. Denition. A smooth positive denite radially unbounded function V : IR n IR is called a stochastic control Lyapunov function (sclf for system (. if it satises ( inf L uir m f V (g r @ V @x g L g V u < ; 8x 6= : (. he existence of an sclf guarantees global asymptotic stabilizability in probability, as shown in the following theorem whose proof we incorporate into the proof of heorem. for completeness. heorem. (Florchinger [] he system (. is globally asymptotically stabilizable in probability if there exists an sclf. Inverse Optimal Stabilization in Probability Denition. he problem of inverse optimal stabilization in probability for system (. is solvable if there exist a class K function whose derivative is also a class K function, a matrix-valued function R (x such that R (x = R (x > for all x, a positive denite radially unbounded function l(x, and a feedback control law u = (x continuous away from the origin with ( =, which guarantees global asymptotic stability in probability of the equilibrium x = and minimizes the cost functional J(u = E Z h l(x R (x = u i d : (.
In the language of stochastic risk-sensitive optimal control [], the cost functional (. would correspond to the risk-neutral case. In the next theorem we extensively use the Legendre-Fenchel transform given as follows: For a class K function, whose derivative is also a class K function, ` denotes `(r = heorem. Consider the control law Z r (? (sds: (. Lg V R?= u = (x =?R? (L g V ` ; (. L g V R?= where V (x is a Lyapunov function candidate, is a class K function whose derivative is also a class K function, and R (x is a matrix-valued function such that R (x = R (x >. If the control law (. achieves global asymptotic stability in probability for the system (. with respect to V (x, then the control law u = (x =? R? (L g V (? Lg V R?= L g V R?= ; (. solves the problem of inverse optimal stabilization in probability for the system (. by minimizing the cost functional (Z J(u = E l(x # R = u d ; (. where l(x = ` Lg V R?= (? ` Lg V R?=? L f V? # (g r @ V @x g : (.6 Proof. Since the control law (. globally asymptotically stabilizes the system in probability, then there exists a continuous positive denite function W : IR n IR such that LV j (: = L f V (g r @ V @x g L g V (x L f V (g r =?` Lg @ V R?= V @x g?w (x: (.7 hen we have l(x W (x (? ` Lg V R?= : (.8
Since W (x is positive denite,, and ` is a class K function (Lemma A., l(x is positive denite. herefore J(u is a meaningful cost functional. Before we engage into proving that the control law (. minimizes (., we rst show that it is stabilizing. With Lemma A. we get LV j (: = L f V (g r @ V @x g = L f V r (g ( @ V @x g? L g V R?=?? Lg V R?= i h` Lg V R?= (? Lg V R?= LV j (: < ; 8x 6= ; (.9 which proves that (. achieves global asymptotic stability in probability and, in particular, that x(t with probability. Now we prove optimality. Recalling that the It^o dierential of V is dv = LV (xdt @V @x g (xdw; (. according to the property of It^o's integral [8, heorem.9], we get Z t E V (? V (t LV (x(d = : (. hen substituting l(x into J(u, we have (Z J(u = E l(x Now we note that which yields = E fv (x(g E? lim E fv (x(tg t = E fv (x(g E (Z (Z # u d R = LV j (: l(x R = u # R = u d ` Lg V R?= L g V u] dg? lim t E fv (x(tg : (. J(u = E fv (x(g E? R= R = u (Z = Lg V R?= R= u R= u R= u R = u u 7 d 9 >= >; ; (. ` R= u? lim E fv (x(tg : (. t
With the general Young's inequality (Lemma A., we obtain J(u E fv (x(g E? R= u (Z? ` R= R= u # u d ` R= u? lim t E fv (x(tg = E fv (x(g? lim t E fv (x(tg ; (. where the equality holds if and only if R= u R= u R= u = R= u u R= R= u ; (.6 that is, u = u. Since u = u is stabilizing in probability, lim t E fv (x(tg =, and thus arg min u J(u = u (.7 min u J(u = E fv (x(g : (.8 o satisfy the requirements of Denition., it only remains to prove that (x is continuous and ( =. Because g ; R and @V =@x are continuous functions, and (? is a class K function, (x is continuous away from L g V R?= =. If Lg V R?=, continuity can be inferred from the fact that lim L g V R?= j (xj = lim L g V R?= = lim L g V R?= 8 < : ( R?= R?= Lg V R?= (? Lg V R?= 9 = L g V R?= ; (? Lg V R?= = : (.9 Since @V @x ( =, L g V ( = and we have ( =. Remark. Even though not explicit in the proof of heorem., V (x solves the following family of Hamilton-Jacobi-Bellman equations parameterized by [; : L f V r (g @ V @x g? ` Lg V R?= l(x = : (. In the next corollary we design controllers which are inverse optimal in the sense of Denition.. Corollary. If the system (. has an sclf, then the problem of inverse optimal stabilization in probability is solvable.
Proof. Let (r = r, =, and R (x = I 8 >< >:? L g V (L g V r L g V (L g V ; L g V 6= any positive number; L g V = : (. hen, with (? (r = r, the optimal control candidate (. is given by Sontag's formula []: where u = s (x = 8 >< >:? r L g V (L g V L g V (L g V (L g V ; L g V 6= ; L g V = = L f V r (g (. @ V @x g : (. According to [], this formula is smooth away from the origin because (. implies that, when x 6=, L g V = < : (. We note that the control law s (x will also be continuous at the origin if and only if the sclf V (x satises the small control property [] (the property that there exist some control law continuous at the origin which is stabilizing with respect to V (x. o prove that s (x is inverse optimal, we prove that the control law (. is stabilizing. Since ` (r = r, (. becomes u = s(x. hen the innitesimal generator of the system (. with the control law u = s(x is? LV =? r L g V (L g V # ; (. which is negative denite because of (.. Inverse Optimal Stabilization via Backstepping he controller given in able in [] is stabilizing but is not inverse optimal because it is not of the form (.. In this section we will redesign the stabilizing functions i (x to get an inverse optimal control law. n o design a control law in the form (., we rst note from V = z i that for system dx i = x i dt ' i (x i dw; i = ; ; n? (. dx n = udt ' n (x n dw; (. 6 i=
L g V = z n and, since u is a scalar input, R (x is sought as a scalar positive function. he following lemma is instrumental in motivating the inverse optimal design. Lemma. If there exists a continuous positive function M(x such that the control law u = (x =?M(xz n (. globally asymptotically stabilizes system (. in probability with respect to the Lyapunov n function V = z i, then the control law i= u = (x = (x; (. solves the problem of inverse optimal stabilization in probability. Proof. Let (r = r (. R = M?= : (.6 hen the control laws (. and (. are given, respectively, as u = (x =? R?= z n ; (.7 u = (x =? R?= z n : (.8 By dividing the last two expressions, we get (., and the lemma follows by heorem.. From the lemma it follows that we should seek a stabilizing control law in the form of (.. If we consider carefully the parenthesis including u in (. in [], every term except the second, the third, and the sixth has z n as a factor. We now deal with the three terms one at a time. he second term yields?z n n? l= =?z n @ n? x l n? l= @ =?z n? n z n n? @ n? z l @x n?? @ l l= k= n? l= l k= @ z n? n z k lk z l? n? @ n? A @x n? l= z n lk @ n? lk n? l= @ z n? n l k= @ n? z n l z lk k 7 z k lk l z l
= z n n? = z n @ l l= k= lk @ n? @ n? A @x n? l= z k n? n? z k k= l=k lk n? z l l= l? @ n? A @x n? n? l= l? n? l= n? @ n? l l l= k= n? @ n? l where the inequality is obtained by applying Young's inequality, l l= k= @ n? lk lk @ n? lk lk z l ; (.9 @?z n? n z l @ n? z n l z l l (. @ z n? n z k lk @ n? z n lk lk lk z k (.? @ n? @x n? @ n? @x n? ; (. and the last equation comes from changing the summation order in the second term. As to the third and the sixth terms, in the rst parenthesis in (: in [], we have? z n =? z n =? 8 z n = 8 z n n? p;q= n? @ n? ' p ' q n? p;q= p @ n? q p;q= k= l= n? p q p;q= k= l= n? p p;q= k= l= n? p p q z k pk k= @ z n? n pqkl z n q q p;q= k= l= l= pk ql z k z l @ n? pqkl @ n? pqkl @ n? z l ql pk ql z k z pqkl l pk ql z k n? n? pk ql z k p q z l p;q= k= l= pqkl l= z l n? n? p p= q=l k= pqkl (. and z n nn n? k= z k nk 8
= n? = 9 z n z nnn nk zk k= n? k nn nk z n # z k k= k n? k nn nk n? k= k= k Substituting (.9, (. and (. into (. in [], we have LV z n 8 < : u z n 8 z n 9 z n z n? @ p q p;q= k= l= n? k= @ n? n? A @x n? l= pqkl @ n? k nn nk z n @ z n? l= l z r z n? i= i?? l= ii z i r z i n? @ i z i l=i @ i? x l? i? k= z k ik z i n k? k=i l= kil n r n? @ n? l z k: (. l l= k= pk ql z k z n n? z nnn nn n? n? nkj nlj j= k= l= nkl z z k? k= l= If li ; pqil ; and i are chosen as li i? p;q= kl n? z i i? z i r i? n? p p= q= k= n? n? p= q=i k= 9 = ; pqk p @ i? ' p ' q i z i i? ikj ilj j= k= l= ikl z pqki i? z i i z i z i ii ii @ n? lk lk A : (. n? l=i li n? l= l? i?? n? n? p p= q= k= pqk n? n? p= q=i k= pqki where c and c i are those in [, (. and (.], and u =?M(xz n 9 p?? i = c (.6 = c i ; (.7 (.8
M(x = c n n? nn nn @ 8 n? p p;q= k= l= @ n? A @x n? q pqkl @ n? r n? n? nkj nlj j= k= l= nkl n? l= n? @ n? l 9 pk ql z k n? k= l l= k= k nn nk @ n? lk lk where c i > ; i = ; ; n; and M(x is a positive function, with (.6{(.8, we get LV? n i= (.9 c i z i : (. hus, according to Lemma., we achieve not only global asymptotic stability in probability, but also inverse optimality. heorem. he control law u =? M(xz n; (. guarantees that the equilibrium at the origin of the system (. is globally asymptotically stable in probability and also minimizes the cost functional (Z J(u = E ; (. l(x 7 6 M(x? u # d for some positive denite radially unbounded function l(x parameterized by. Remark. We point out that the quartic form of the penalty on u in (. is due to the quartic nature of the sclf. Appendix Lemma A. (Krstic and Li [7] If and its derivative are class K functions, then the Legendre-Fenchel transform satises the following properties: (a `(r = r (? (r? (? (r = Z r (? (sds (A. (b `` = (A. (c ` is a class K function (A. (d ` ( (r = r (r? (r: (A. Lemma A. (Young's inequality [, heorem 6] For any two vectors x and y, the following holds x y (jxj `(jyj; and the equality is achieved if and only if y = (jxj x jxj ; that is, for x = (? (jyj y jyj : (A. (A.6
References []. Basar and P. Bernhard, H -Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach, Boston, MA: Birkhauser, nd ed., 99. [] H. Deng and M. Krstic, \Stochastic nonlinear stabilization Part I: A backstepping design, submitted to Systems & Control Letters, 997. [] P. Florchinger, \A universal formula for the stabilization of control stochastic dierential equations, Stochastic Analysis and Applications, vol., pp. {6, 99. [] R. A. Freeman and P. V. Kokotovic, Robust Nonlinear Control Design, Boston, MA: Birkhauser, 996. [] G. Hardy, J. E. Littlewood, and G. Polya, Inequalities, nd Edition, Cambridge University Press, 989. [6] M. Krstic, I. Kanellakopoulos, and P. V. Kokotovic, Nonlinear and Adaptive Control Design, New York: Wiley, 99. [7] M. Krstic and Z. H. Li, \Inverse optimal design of input-to-state stabilizing nonlinear controllers, submitted to IEEE ransactions on Automatic Control, 996. [8] B. ksendal, Stochastic Dierential Equations{An Introduction with Applications, New York, Springer-Verlag, 99. [9] Z. Pan and. Basar, \Backstepping controller design for nonlinear stochastic systems under a risk-sensitive cost criterion, submitted to SIAM Journal of Control and Optimization, 996. [] R. Sepulchre, M. Jankovic and P. V. Kokotovic, Constructive Nonlinear Control, New York: Springer-Verlag, 997. [] E. D. Sontag, \A `universal' construction of Artstein's theorem on nonlinear stabilization, Systems & Control Letters, vol., pp. 7{, 989.