Linearly Solvable Stochastic Control Lyapunov Functions


Yoke Peng Leong, Student Member, IEEE, Matanya B. Horowitz, Student Member, IEEE, and Joel W. Burdick, Member, IEEE
arXiv preprint, v3 [math.OC], September 2015

Abstract: This paper presents a new method for synthesizing stochastic control Lyapunov functions for a class of nonlinear stochastic control systems. The technique relies on a transformation of the classical nonlinear Hamilton-Jacobi-Bellman partial differential equation to a linear partial differential equation for a class of problems with a particular constraint on the stochastic forcing. This linear partial differential equation can then be relaxed to a linear differential inclusion, allowing for relaxed solutions to be generated using sum of squares programming. The resulting relaxed solutions are in fact viscosity super/subsolutions, and by the maximum principle are pointwise upper and lower bounds to the underlying value function, even for coarse polynomial approximations. Furthermore, the pointwise upper bound is a stochastic control Lyapunov function, yielding a method for generating nonlinear controllers whose cost has a pointwise bounded distance from the optimal cost incurred by the optimal controller. These approximate solutions may be computed with non-increasing error via a hierarchy of semidefinite optimization problems. Finally, this paper develops a-priori bounds on trajectory suboptimality when using these approximate value functions, and demonstrates that these methods, and bounds, can be applied to a more general class of nonlinear systems that do not obey the constraint on stochastic forcing. Simulated examples illustrate the methodology.

I. INTRODUCTION

The study of system stability is a central theme of control engineering. A primary tool for such studies is Lyapunov theory, wherein an energy-like function is used to show that some measure of distance from a stability point decays over time.
The construction of Lyapunov functions that certify system stability advanced considerably with the introduction of Sums of Squares (SOS) programming, which has allowed Lyapunov functions to be synthesized for both polynomial systems [1] and more general vector fields [2]. To address the more challenging problem of stabilization, rather than the analysis of an existing closed loop system, it is possible to generalize Lyapunov functions to incorporate control inputs. The existence of a Control Lyapunov Function (CLF) (see [3]-[5]) is sufficient for the construction of a stabilizing controller. However, the synthesis of a CLF for general systems remains an open question. Unfortunately, SOS-based methods cannot be naively extended to the generation of CLF solutions, due to the bilinearity between the Lyapunov function and the control input.

However, for several large and important classes of systems, CLFs are in fact known and may be used for stabilization, with a review of the theory available in [5]. The drawback is that these CLFs are hand-constructed and may be shown to be arbitrarily suboptimal. One way to circumvent this issue is through the use of Receding Horizon Control (RHC), which allows for the incorporation of optimality criteria. Euler-Lagrange equations are used to construct a locally optimal trajectory [6], and stabilization is guaranteed by constraining the terminal cost in the RHC problem to be a CLF. Suboptimal CLFs have found extensive use, with applications in legged locomotion [7] and distributed control [8]. Adding stochasticity to the governing dynamics compounds these difficulties [9], [10].

Y. P. Leong and M. B. Horowitz are with Control and Dynamical Systems, California Institute of Technology, Pasadena, CA 91125, USA (ypleong@caltech.edu, mhorowitz@caltech.edu). J. W. Burdick is with Mechanical Engineering, California Institute of Technology, Pasadena, CA 91125, USA (jwb@robotics.caltech.edu).
A complementary area of control engineering is the study of the Hamilton-Jacobi-Bellman (HJB) equation that governs the optimal control of a system. Methods to calculate the solution to the HJB equation via semidefinite programming have been proposed previously by Lasserre et al. [11]. In their work, the solution and the optimality conditions are integrated against monomial test functions, producing an infinite set of moment constraints. By truncating to any finite list of monomials, the optimal control problem is reduced to a semidefinite optimization problem. The method is quite general, applicable to any system with polynomial nonlinearities.

In this work, we propose an alternative line of study based on the linear structure of a particular form of the HJB equation. Since the late 1970s, Fleming [12], Holland [13], and other researchers thereafter [14], [15] have made connections between stochastic optimal control and reaction-diffusion equations through a logarithmic transformation. Recently, when studying stochastic control using the HJB equation, Kappen [16] and Todorov [17] discovered that particular assumptions on the structure of a dynamical system, given the name linearly solvable systems, allow a logarithmic transformation of the optimal control equation into a linear partial differential equation. The linearity of this class of problems has given rise to a growing body of research, with an overview available in [18]. Kappen's work focused on calculating solutions via path integral techniques. Todorov began with the analysis of particular Markov decision processes, and showed the connection between the two paradigms. This work was built upon by Theodorou et al. [19] into the Path Integral framework used with Dynamic Motion Primitives. These results have been developed in many compelling directions [18], [20]-[22].
This paper combines these previously disparate fields of linearly solvable optimal control and Lyapunov theory, and provides a systematic way to construct stabilizing controllers with guaranteed performance. The result is a hierarchy of SOS programs that generates stochastic CLFs (SCLFs) for arbitrary linearly solvable systems. Such an approach has many benefits. First and foremost, this approach generates stabilizing controllers for an important class of nonlinear, stochastic systems even when the optimal controller is not found. We prove that the approximate solutions generated by the SOS programs are pointwise upper and lower bounds to the true solutions. In fact, the upper bound solutions are SCLFs which can be used to construct stabilizing controllers, and they bound the performance of the system when used to construct suboptimal controllers. Existing methods for the generation of SCLFs do not have such performance guarantees. Additionally, we demonstrate that, although the technique is based on linear solvability, it may be readily extended to more general systems, including deterministic systems, while inheriting the same performance guarantees.

A preliminary version of this work appeared in [23] and [24], where the use of sum of squares programming for solving the HJB was first considered. This paper builds on that recent body of research, studying the stabilization and optimality properties of the resulting solutions. These previous works focused on path planning, rather than stabilization, and did not include the stability analysis or suboptimality guarantees presented in this paper. A short version of this work appeared in [25], which included fewer details and did not include the extensions in Section V.

The rest of this paper is organized as follows. Section II reviews linearly solvable HJB equations, SCLFs, and SOS programming. Section III introduces a relaxed formulation of the HJB solutions which is efficiently computable using the SOS methodology. Section IV analyzes the properties of the relaxed solutions, such as approximation errors relative to the exact solutions.
Section IV also shows that the relaxed solutions are SCLFs, and that the resulting controller is stabilizing. The upper bound solution is also shown to bound the performance of the suboptimal controller. Section V summarizes some extensions of the method that handle issues such as approximation of optimal control problems which are not linearly solvable, robust controller synthesis, and nonpolynomial systems. Two examples are presented in Section VI to illustrate the optimization technique and its performance. Section VII summarizes the findings of this work and discusses future research directions.

II. BACKGROUND

This section briefly describes the paper's notation and reviews the necessary background on the linear HJB equation, SCLFs, and SOS programming.

A. Notation

Table I summarizes the notation for the sets appearing in the paper. A compact domain in R^n is denoted by Ω, where Ω ⊂ R^n, and its boundary is denoted by ∂Ω. A domain Ω is a basic closed semialgebraic set if there exist g_i(x) ∈ R[x] for i = 1, 2, ..., m such that Ω = {x | g_i(x) ≥ 0, i = 1, 2, ..., m}.

TABLE I
SET NOTATION

Z+ : all positive integers
R : all real numbers
R+ : all nonnegative real numbers
R^n : all n-dimensional real vectors
R[x] : all real polynomial functions in x
R^{n×m} : all n×m real matrices
R^{n×m}[x] : all M ∈ R^{n×m} such that M_{i,j} ∈ R[x] for all i, j
K : all continuous nondecreasing functions µ : R+ → R+ such that µ(0) = 0, µ(r) > 0 if r > 0, and µ(r) ≥ µ(r') if r > r'
C^{k,k'} : all functions f such that f is k-times differentiable with respect to the first argument and k'-times differentiable with respect to the second argument

A point on a trajectory, x(t) ∈ R^n, at time t is denoted by x_t, while the segment of this trajectory over the interval [t, T] is denoted by x_{t:T}. Given a polynomial p(x): p(x) is positive on the domain Ω if p(x) > 0 for all x ∈ Ω; p(x) is nonnegative on Ω if p(x) ≥ 0 for all x ∈ Ω; and p(x) is positive definite on Ω, where 0 ∈ Ω, if p(0) = 0 and p(x) > 0 for all x ∈ Ω \ {0}. If it exists, the infinity norm of a function is defined as ‖f‖_∞ = sup_{x ∈ Ω} |f(x)|.
To improve readability, a function f(x_1, ..., x_n) is abbreviated as f when the arguments of the function are clear from the context.

B. Linear Hamilton-Jacobi-Bellman (HJB) Equation

Consider the following affine nonlinear dynamical system,

dx_t = (f(x_t) + G(x_t)u_t) dt + B(x_t) dω_t    (1)

where x_t ∈ Ω is the state at time t in a compact domain Ω ⊂ R^n, u_t ∈ R^m is the control input, f(x) ∈ R^n[x], G(x) ∈ R^{n×m}[x], and B(x) ∈ R^{n×l}[x] are real polynomial functions of the state variables, and ω_t ∈ R^l is a vector of Brownian motions with covariance Σ_ε, i.e., ω_t has independent increments with ω_t − ω_s ∼ N(0, Σ_ε(t − s)), where N(µ, σ) denotes a normal distribution. The domain Ω is assumed to be a basic closed semialgebraic set defined as Ω = {x | g_i(x) ∈ R[x], g_i(x) ≥ 0, i = 1, 2, ..., m}. Extensions to nonpolynomial functions are discussed in Section V-D. Without loss of generality, let 0 ∈ Ω and x = 0 be the equilibrium point, whereby f(0) = 0, G(0) = 0, and B(0) = 0. The goal is to minimize the cost functional

J(x, u) = E_{ω_t}[ φ(x_T) + ∫_0^T ( q(x_t) + (1/2) u_t^T R u_t ) dt ]    (2)

subject to (1), where φ ∈ R[x], φ : Ω → R+ represents a state-dependent terminal cost, q ∈ R[x], q : Ω → R+ is a state-dependent running cost, and R ∈ R^{m×m} is a positive definite matrix. The final time T, unknown a priori, is the time at which the system reaches the domain boundary ∂Ω or the origin. This problem is generally called the first exit problem. The expectation E_{ω_t} is taken over all realizations of the noise ω_t. For stability of the resultant controller to the origin, q(·) and φ(·) are also required to be positive definite functions.
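The first exit problem above can be made concrete with a small simulation. The following is a minimal sketch, not taken from the paper: an Euler-Maruyama discretization of the dynamics (1) for a hypothetical scalar system with f(x) = x − x³, G = B = 1, domain Ω = [−1, 1], and an arbitrary linear feedback u = −2x; all of these choices are ours, purely for illustration.

```python
import math
import random

def simulate(f, G, B, u, x0, dt=1e-3, T=1.0, sigma=1.0, seed=0):
    """Euler-Maruyama discretization of dx = (f(x) + G(x) u(x)) dt + B(x) dw
    for scalar state, with dw ~ N(0, sigma^2 dt).

    Integration stops at the first exit from the (hard-coded) domain
    Omega = [-1, 1] or at the time horizon T, mirroring the first exit
    problem described above.
    """
    rng = random.Random(seed)
    x, t = x0, 0.0
    traj = [x0]
    while t < T and abs(x) < 1.0:   # still inside Omega = [-1, 1]
        dw = rng.gauss(0.0, sigma * math.sqrt(dt))
        x += (f(x) + G(x) * u(x)) * dt + B(x) * dw
        t += dt
        traj.append(x)
    return traj

# Hypothetical example system: f(x) = x - x^3, G = B = 1, feedback u = -2x.
traj = simulate(lambda x: x - x**3, lambda x: 1.0, lambda x: 1.0,
                lambda x: -2.0 * x, x0=0.5)
```

Averaging the accrued cost φ(x_T) + ∫ (q + (1/2) u^T R u) dt over many such sample paths is the Monte Carlo estimate of J(x, u) in (2).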

The solution to this minimization problem is known as the value function, V : Ω → R+, where, beginning from an initial point x_t at time t,

V(x_t) = min_{u_{t:T}} J(x_{t:T}, u_{t:T}).    (3)

Based on dynamic programming arguments [26, Ch. III.7], the HJB equation associated with this problem is

0 = min_u ( q + (1/2) u^T R u + (∇V)^T (f + Gu) + (1/2) Tr((∇²V) B Σ_ε B^T) )    (4)

whereby the optimal control effort, u*, can be found analytically, and takes the form

u* = −R^{−1} G^T (∇V).    (5)

Substituting the optimal control u* into (4) yields the following nonlinear, second order partial differential equation (PDE):

0 = q + (∇V)^T f − (1/2) (∇V)^T G R^{−1} G^T (∇V) + (1/2) Tr((∇²V) B Σ_ε B^T)    (6)

with boundary condition V(x) = φ(x). For the stabilization problem on a compact domain, it is appropriate to set the boundary condition to be φ(x) = 0 for x = 0, indicating zero cost accrued for achieving the origin, and φ(x) > 0 for x ∈ ∂Ω \ {0}. In practice, φ(x) at the exterior boundary is usually chosen to be a large number, depending on the application, to impose a large penalty for exiting the predefined domain. In general, (6) is difficult to solve due to its nonlinearity. However, with the assumption that there exist a λ > 0 and a control penalty cost R in (2) satisfying

λ G(x_t) R^{−1} G(x_t)^T = B(x_t) Σ_ε B(x_t)^T ≜ Σ(x_t) ≜ Σ_t    (7)

and using the logarithmic transformation

V = −λ log Ψ,    (8)

it is possible, after substitution and simplification, to obtain the following linear PDE from (6):

0 = −(1/λ) qΨ + f^T (∇Ψ) + (1/2) Tr((∇²Ψ) Σ_t), x ∈ Ω
Ψ(x) = e^{−φ(x)/λ}, x ∈ ∂Ω.    (9)

This transformation of the value function has been deemed the desirability function [17, Table 1]. For brevity, define the expression

L(Ψ) ≜ f^T (∇Ψ) + (1/2) Tr((∇²Ψ) Σ_t)

and the function ψ(x) at the boundary as ψ(x) ≜ e^{−φ(x)/λ} for x ∈ ∂Ω. Condition (7) is trivially met for systems of the form dx_t = f(x_t) dt + G(x_t)(u_t dt + dω_t), a pervasive assumption in the adaptive control literature.
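The passage from (6) to (9) under condition (7) is a short computation; the following derivation sketch, our reconstruction in the notation above, shows exactly where the constraint enters.

```latex
% Substituting V = -\lambda \log \Psi into (6), using
\nabla V = -\lambda\,\frac{\nabla\Psi}{\Psi},
\qquad
\nabla^{2} V = -\lambda\,\frac{\nabla^{2}\Psi}{\Psi}
  + \lambda\,\frac{(\nabla\Psi)(\nabla\Psi)^{T}}{\Psi^{2}},
% the terms quadratic in \nabla\Psi become
-\frac{\lambda^{2}}{2\Psi^{2}}(\nabla\Psi)^{T} G R^{-1} G^{T} (\nabla\Psi)
  + \frac{\lambda}{2\Psi^{2}}(\nabla\Psi)^{T}\,\Sigma_{t}\,(\nabla\Psi),
% which cancel exactly when \lambda G R^{-1} G^{T} = \Sigma_{t},
% i.e., condition (7). The remaining terms read
0 = q - \frac{\lambda}{\Psi}\, f^{T}(\nabla\Psi)
      - \frac{\lambda}{2\Psi}\,\mathrm{Tr}\!\big((\nabla^{2}\Psi)\,\Sigma_{t}\big),
% and multiplying through by -\Psi/\lambda yields the linear PDE (9).
```

Without (7), the quadratic terms survive and the transformed equation remains nonlinear, which is why the constraint on the stochastic forcing is essential to linear solvability.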
This constraint restricts the design of the control penalty R, such that control effort is highly penalized in subspaces with little noise, and lightly penalized in those with high noise. Additional discussion is given in [17, SI].

C. Stochastic Control Lyapunov Functions (SCLF)

Before the stochastic control Lyapunov function (SCLF) is introduced, the definitions of two forms of stability are provided, following the definitions in [30, Ch. 5].

Definition 1. Given (1), the equilibrium point at x = 0 is stable in probability for t ≥ 0 if for any s ≥ 0 and ε > 0,

lim_{x→0} P{ sup_{t>s} |X^{x,s}(t)| > ε } = 0

where X^{x,s} is the trajectory of (1) starting from x at time s.

Intuitively, Definition 1 is similar to the notion of stability for deterministic systems. The following stronger definition is similar to the notion of asymptotic stability for deterministic systems.

Definition 2. Given (1), the equilibrium point at x = 0 is asymptotically stable in probability if it is stable in probability and

lim_{x→0} P{ lim_{t→∞} X^{x,s}(t) = 0 } = 1

where X^{x,s} is the trajectory of (1) starting from x at time s.

These notions of stability can be ensured through the construction of SCLFs, as follows.

Definition 3. A stochastic control Lyapunov function (SCLF) for system (1) is a positive definite function V ∈ C^{2,1} on a compact domain O = (Ω \ {0}) × {t > 0} such that

V(0, t) = 0,
V(x, t) ≥ µ(|x|) for all t > 0,
there exists u(x, t) such that L(V(x, t)) ≤ 0 for all (x, t) ∈ O \ {(0, t)},

where µ ∈ K and

L(V) = ∂_t V + (∇V)^T (f + Gu) + (1/2) Tr((∇²V) B Σ_ε B^T).    (10)

Theorem 4. [30, Thm. 5.3] For system (1), assume that there exist a SCLF V and a u satisfying Definition 3. Then the equilibrium point x = 0 is stable in probability, and u is a stabilizing controller.

To achieve the stronger condition of asymptotic stability in probability, we have the following result.

Theorem 5. [30, Thm. 5.5 and Cor.] For system (1), suppose that in addition to the existence of a SCLF V and a u satisfying Definition 3, u is time-invariant,

V(x, t) ≤ µ'(|x|) for all t > 0,
L(V(x, t)) < 0 for all (x, t) ∈ O \ {(0, t)},

where µ' ∈ K. Then the equilibrium point x = 0 is asymptotically stable in probability, and u is an asymptotically stabilizing controller.

D. Sum of Squares (SOS) Programming

Sum of Squares (SOS) programming is the primary tool by which approximate solutions to the HJB equation are generated in this paper. In particular, we will show how the PDE that governs the HJB may be relaxed to a set of nonnegativity constraints. SOS methods then allow for the construction of an optimization problem in which these nonnegativity constraints may be enforced. A complete introduction to SOS programming is available in []. Here, we review the basic definition of SOS that is used throughout the paper.

Definition 6. A multivariate polynomial f(x) is a SOS polynomial if there exist polynomials f_1(x), ..., f_m(x) such that

f(x) = Σ_{i=1}^{m} f_i(x)².

The set of SOS polynomials in x is denoted by S[x].

Accordingly, a sufficient condition for nonnegativity of a polynomial f(x) is that f(x) ∈ S[x]. Membership in the set S[x] may be tested as a convex problem [].

Theorem 7. [, Thm. 3.3] The existence of a SOS decomposition of a polynomial in n variables of degree 2d can be decided by solving a semidefinite programming (SDP) feasibility problem. If the polynomial is dense (no sparsity), the dimension of the matrix inequality in the SDP is C(n+d, d) × C(n+d, d), where C(·, ·) denotes the binomial coefficient.

Hence, by replacing nonnegativity constraints with SOS constraints, testing nonnegativity of a polynomial becomes a tractable SDP. The converse question, whether a nonnegative polynomial is necessarily a SOS, unfortunately has a negative answer, indicating that this test is conservative []. Nonetheless, SOS feasibility is sufficiently powerful for our purposes. Theorem 7 guarantees a tractable procedure to determine whether a particular polynomial, possibly parameterized, is a SOS polynomial. Our method combines multiple polynomial constraints into one optimization formulation. To do so, we need to define the following polynomial sets.

Definition 8. The preordering of the polynomials g_i(x) ∈ R[x] for i = 1, 2, ..., m is the set

P(g_1, ..., g_m) = { Σ_{ν ∈ {0,1}^m} s_ν(x) g_1(x)^{ν_1} ··· g_m(x)^{ν_m} | s_ν ∈ S[x] }.    (11)

The quadratic module of the polynomials g_i(x) ∈ R[x] for i = 1, 2, ..., m is the set

M(g_1, ..., g_m) = { s_0(x) + Σ_{i=1}^{m} s_i(x) g_i(x) | s_i ∈ S[x] }.    (12)

The following proposition is trivial, but it is useful for incorporating the domain Ω into our optimization formulation later.

Proposition 9. Given f(x) ∈ R[x] and the domain Ω = {x | g_i(x) ∈ R[x], g_i(x) ≥ 0, i ∈ {1, 2, ..., m}}, if f(x) ∈ P(g_1, ..., g_m) or f(x) ∈ M(g_1, ..., g_m), then f(x) is nonnegative on Ω. If there exists another polynomial f'(x) such that f'(x) ≥ f(x) on Ω, then f'(x) is also nonnegative on Ω.

Proof. Because the g_i(x) and s_i(x) are nonnegative on Ω, all functions in M(·) and P(·) are nonnegative on Ω. The second statement is trivially true if the first statement is true.

To illustrate how this proposition applies, consider a polynomial f(x) defined on the domain [−1, 1]. The bounded domain can be equivalently defined by the polynomials g_1(x) = x + 1 and g_2(x) = 1 − x. To certify that f(x) ≥ 0 on the specified domain, construct a function

h(x) = s_1(x)(x + 1) + s_2(x)(1 − x) + s_3(x)(x + 1)(1 − x)

where s_i ∈ S[x], and certify that f(x) ≥ h(x). Notice that h(x) ∈ P(x + 1, 1 − x), so h(x) ≥ 0 on [−1, 1]. If f(x) − h(x) ≥ 0, then f(x) ≥ h(x) ≥ 0, and Proposition 9 applies. Finding the correct s_i(x) is not trivial in general. Nonetheless, as mentioned earlier, if we further impose that f(x) − h(x) ∈ S[x], then checking whether there exist s_i(x) such that f(x) − h(x) ∈ S[x] becomes a SDP, as given by Theorem 7. More concretely, the procedure may begin with a limited polynomial degree for the s_i(x), increasing the degree until a certificate is found (if one exists) or the computational resources are exhausted. To simplify notation in the remainder of this text, given a domain Ω = {x | g_i(x) ∈ R[x], g_i(x) ≥ 0, i ∈ {1, 2, ..., m}}, we write P(Ω) = P(g_1, ..., g_m) and M(Ω) = M(g_1, ..., g_m).

Remark 10. The choice between M(Ω) and P(Ω) depends on the computational resources available. Although M(Ω) ⊆ P(Ω), and therefore the chance of finding a certificate is larger using P(Ω), the resulting SDP is also larger. In addition, using other subsets of P(Ω) apart from M(Ω) does not change the results. These polynomial sets often appear in discussions of Schmüdgen's or Putinar's Positivstellensatz.
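The [−1, 1] illustration above can be checked numerically. The sketch below, with a polynomial f chosen by us purely for illustration, evaluates a hand-found preordering certificate on a grid; in practice the multipliers s_i would be produced by the SDP of Theorem 7 rather than by hand.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial given coefficients [c0, c1, ...] (c0 + c1*x + ...)."""
    return sum(c * x**k for k, c in enumerate(coeffs))

# Illustrative target: f(x) = 2 + x - x^2, nonnegative on [-1, 1] (root at x = -1).
f = [2.0, 1.0, -1.0]

# Preordering certificate h = s1*(x+1) + s2*(1-x) + s3*(x+1)*(1-x) with the
# (trivially SOS) constant multipliers s1 = 1, s2 = 0, s3 = 1, found by hand:
# (x + 1) + (x + 1)(1 - x) = 2 + x - x^2, so f - h = 0, which is vacuously SOS.
def h(x):
    s1, s2, s3 = 1.0, 0.0, 1.0
    return s1 * (x + 1) + s2 * (1 - x) + s3 * (x + 1) * (1 - x)

# Spot-check Proposition 9 on a grid: h >= 0 on the domain, and f - h >= 0,
# so f >= h >= 0 on [-1, 1].
grid = [-1 + 0.01 * k for k in range(201)]
assert all(h(x) >= -1e-12 for x in grid)
assert all(poly_eval(f, x) - h(x) >= -1e-12 for x in grid)
```

A grid check of course proves nothing by itself; the point of the SOS machinery is that the algebraic identity f − h ∈ S[x] certifies nonnegativity everywhere on the domain at once.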
Loosely speaking, Schmüdgen s Positivstellensatz states that if f() is positive on a compact domain Ω, then f() P (Ω) [], []. III. SUM-OF-SQUARES RELAXATION OF THE HJB PDE Sum of squares programming has found many uses in combinatorial optimization, control theory, and other applications. This section now adds solving the linear HJB to this list. We would like to emphasize the following standing assumption, necessary in moment and SOS-based methods [], []. Assumption. Assume that system () evolves on a compact domain Ω R n, and Ω is a basic closed semialgebraic set such that Ω = { g i () R[], g i (), i {,..., k}} for some k. Then, the boundary Ω is polynomial representable. We use the notation Ω = { h i () R[], m i= h i() = } for some m to describe it. The following definitions formalize several operators that will prove useful in the sequel. Definition. Given a basic closed semialgebraic set Ω = { g i () R[], g i (), i {,..., k}} and a set of SOS polynomials, S = {s ν () s ν () S[], ν {, } k },

5 define the operator D as D(Ω, S) = s ν ()g () ν g k () νk ν {,} k where D(Ω, S) P (Ω). Definition 3. Given a polynomial inequality, p(), the boundary of a compact set Ω = { h i () R[], m i= h i() = } and a set of polynomials, T = {t i () t i () R[], i {,..., m}}, define the operator B as B(p(), Ω, T ) = {p() t i ()h i () i {,..., m}} where B returns a set of polynomials that is nonnegative on Ω. A. Relaation of the HJB equation If the linear HJB (9) is not uniformly parabolic [3], a classical solution may not eist. The notion of viscosity solutions is developed to generalize the classical solution. We refer readers to [3] for a general discussion on viscosity solutions and [6] for a discussion on viscosity solutions related to Markov diffusion processes. Definition 4. [3, Def..] Given Ω R N and a partial differential equation F (, u, u, u) = (3) where F : R N R R N S(N) R, S(N) is the set of real symmetric N N matrices, and F satisfies F (, r, p, X) F (, s, p, Y ) whenever r s and Y X, then a viscosity subsolution of (3) on Ω is a function u USC(Ω) such that F (, u, u, u) Ω, (p, X) J,+ Ω u() Similarly, a viscosity supersolution of (3) on Ω is a function u LSC(Ω) such that F (, u, u, u) Ω, (p, X) J, Ω u() Finally, u is a viscosity solution of (3) on Ω if it is both a viscosity subsolution and a viscosity supersolution in Ω. The notations USC(Ω) and LSC(Ω) represent the sets of upper and lower semicontinuous functions on domain Ω respectively, and J,+, Ω u() and JΩ u() represents the second order superjets and subjets of u at respectively, a completely unrestrictive domain in our setting. For further details, readers may refer to [3]. For the remainder of this paper, we assume a unique nontrivial viscosity solution to (6) and (9) eists (see [6], Chapter V) and denote the unique solutions as Ψ and V respectively. The equality constraints of (9) may be relaed as follows qψ L(Ψ) Ψ() ψ() Ω. 
(4) Such a relaation provides a point-wise bound to the solution Ψ, and this relaation may be enforced via SOS programming. In particular, a solution to (4), denoted as Ψ l, is a lower bound on the solution Ψ over the entire problem domain. Theorem 5. Given a smooth function Ψ l that satisfies (4), then Ψ l is a viscosity subsolution and Ψ l Ψ for all Ω. Proof. By Definition 4, the solution Ψ l is a viscosity subsolution. Note that Ψ is both a viscosity subsolution and a viscosity supersolution, and Ψ l Ψ on the boundary Ω. Hence, by the maimum principle for viscosity solutions [3, Thm 3.3], Ψ l Ψ for all Ω. Similarly, the analogous relaation qψ L(Ψ) Ψ() ψ() () Ω (5) gives an over-approimation of the desirability function, and its solution, denoted as Ψ u, is an upper bound of Ψ over domain Ω. Thus, we also have Theorem 6. Given a smooth function Ψ u that satisfies (5), then Ψ u is a viscosity supersolution and Ψ u Ψ for all Ω. Proof. The proof is identical to the proof for Theorem 5. Because the logarithmic transform (8) is monotonic, one can relate these bounds on the desirability function to bounds on the value function as follows Proposition 7. If the solution to (6) is V, given solutions V u = log Ψ l and V l = log Ψ u from (4) and (5) respectively, then V u V and V l V. Proof. Recall that V = log Ψ. Applying Theorem 5 and 6, V u V and V l V. Although the solutions to (4) and (5) do not satisfy (9) eactly, they provide point-wise bounds to the solution Ψ. B. SOS Program Given that relaation (4) and (5) results in a pointwise upper and lower bound to the eact solution of (9), we construct the following optimization problem that provides a suboptimal controller with bounded residual error: min ɛ (6) Ψ l,ψ u s.t. qψ l L(Ψ l ) Ω qψ u L(Ψ u ) Ω Ψ u Ψ l ɛ Ω Ψ l ψ Ψ u Ω iψ l i iψ l i Ψ l () =

6 where i is the i-th component of Ω. As mentioned in Section III-A, the first two constraints result from the relaations of the HJB equation, and the fourth constraint arises from the relaation of the boundary conditions. The third constraint ensures that the difference between the upper bound and lower bound solution is bounded, and the last three constraints ensure that the solution yields a stabilizing controller, as will be made clear in Section IV. Note that in the optimization problem, Ψ u and Ψ l are polynomials whereby the coefficients and the degree for both are optimization variables. The term ɛ is related to the error of the approimation. As discussed in the review of SOS techniques, a general optimization problem involving parameterized nonnegative polynomials is not necessarily tractable. In order to solve (6) using a polynomial-time algorithm, we restrict the polynomial inequalities such that they are SOS polynomials instead of nonnegative polynomials. We therefore apply Proposition 9 to rela optimization problem (6) into min Ψ l,ψ u,s,t s.t. ɛ (7) qψ l + L(Ψ l ) D(Ω, S ) S[] qψ u L(Ψ u ) D(Ω, S ) S[] ɛ (Ψ u Ψ l ) D(Ω, S 3 ) S[] B(Ψ l, Ω, T ) S[] B(ψ Ψ l, Ω, T ) S[] B(Ψ u ψ, Ω, T 3 ) S[] iψ l D(Ω { i }, S 4 ) S[] iψ l D(Ω { i }, S 5 ) S[] Ψ l () = where S = (S,..., S 4, S 5 ), S i S[] is defined as in Definition, T = (T, T, T 3 ), and T j R[] is defined as in Definition 3. With a slight abuse of notation, B( ) S[] implies that each polynomial in B( ) is a SOS polynomial. If the polynomial degrees are fied, optimization problem (7) is conve and solvable using a semidefinite program via Theorem 7. The net section will discuss the systematic approach we used to solve the optimization problem. Henceforth, denote the solution to (7) as (Ψ u, Ψ l, S, T, ɛ). Remark 8. By Definition 4, the viscosity solution is a continuous function. Consequently, the solution Ψ is a continuous function defined on a bounded domain. 
Hence, Ψ_u and Ψ_l can be made arbitrarily close to Ψ in (16) by the Stone-Weierstrass Theorem [3]. However, this guarantee is lost when Ψ_u and Ψ_l are restricted to be SOS polynomials. The feasible set of the optimization problem (17) is therefore not necessarily non-empty for a given polynomial degree. One would not expect feasibility for all instances of (17), as this would imply that there exists a linear stabilizing controller for any given system.

C. Controller Synthesis

Let d be the maximum degree of Ψ_l, Ψ_u, and the polynomials in S and T, and denote by (Ψ_u^d, Ψ_l^d, S^d, T^d, ε^d) a solution to (17) when the maximum polynomial degree is fixed at d. The hierarchy of SOS programs with increasing polynomial degree produces a sequence of (possibly empty) solutions (Ψ_u^d, Ψ_l^d, S^d, T^d, ε^d), d ∈ I, where I ⊂ Z_+. This sequence will be shown in the next section to improve under the metric of the objective in (17). In other words, if solutions exist for degrees d_1 and d_2 with d_2 > d_1, then ε^{d_2} ≤ ε^{d_1}. Therefore, one can keep increasing the degree of the polynomials in order to achieve tighter bounds on Ψ and, invariably, V. The use of such hierarchies has become commonplace in polynomial optimization [33]. If at a certain degree ε^d = 0, the exact solution Ψ has been found.

Once a satisfactory error is achieved or computational resources are exhausted, the lower bound Ψ_l can be used to compute a suboptimal controller. Recall that u* = −R⁻¹Gᵀ∇V and V = −λ log Ψ. The suboptimal controller u_ε for a given error ε is computed as u_ε = −R⁻¹Gᵀ∇V_u, where V_u = −λ log Ψ_l. Even when ε is larger than a desired value, the solution Ψ_l still satisfies the conditions in Definition 3 and thus yields a stabilizing suboptimal controller. The next section analyzes properties of the solutions and of the suboptimal controller.

IV. ANALYSIS

This section establishes several properties of the solutions to the optimization problem (17) that are useful for feedback control.
First, we show that the solutions in the SOS program hierarchy are uniformly bounded relative to the exact solutions. We next prove that the relaxed solutions to the stochastic HJB equation are SCLFs, and that the approximated solution leads to a stabilizing controller. Finally, we show that the costs of using the approximate solutions as controllers are bounded above by the approximated value functions.

A. Properties of the Approximated Desirability Function

First, the approximation error of Ψ_l or Ψ_u obtained from (17) is computed relative to the true desirability function Ψ.

Proposition 19. Given a solution (Ψ_u, Ψ_l, S, T, ε) to (17) for a given degree d, the approximation error of the desirability function is bounded as |Ψ̂ − Ψ| ≤ ε, where Ψ̂ is either Ψ_u or Ψ_l.

Proof. By Theorems 15 and 16, Ψ_l is a lower bound of Ψ, and Ψ_u is an upper bound of Ψ. So, ε ≥ Ψ_u − Ψ_l and Ψ_u ≥ Ψ ≥ Ψ_l. Combining both inequalities, one has Ψ_u − Ψ ≤ ε and Ψ − Ψ_l ≤ ε. Therefore, |Ψ̂ − Ψ| ≤ ε, where Ψ̂ is either Ψ_u or Ψ_l.

Proposition 20. The hierarchy of SOS programs consisting of solutions to (17) with increasing polynomial degree produces a sequence of solutions (Ψ_u^d, Ψ_l^d, S^d, T^d, ε^d) such that ε^{d+1} ≤ ε^d for all d.

Proof. Polynomials of degree d form a subset of polynomials of degree d + 1. Thus, at the higher polynomial degree d + 1, a previous solution at the lower polynomial degree d is still a feasible solution when the coefficients of the monomials with total degree d + 1 are set to 0. Consequently, the optimal value ε^{d+1} cannot be larger than ε^d for any d.
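The bounding idea and the monotone improvement with degree can be illustrated in a simplified, discretized setting. The sketch below is an assumption-laden toy, not the SOS program itself: it enforces the lower/upper bounds only at sample points via a linear program, and uses a stand-in function in place of the true desirability. Even so, it exhibits the same non-increasing gap ε as the polynomial degree grows.

```python
import numpy as np
from scipy.optimize import linprog

def bound_gap(f, xs, deg):
    """Sampled analogue of the bounding program: fit polynomial lower and
    upper bounds p_l <= f <= p_u at sample points xs while minimizing the
    worst-case gap eps = max(p_u - p_l). Returns the optimal eps."""
    V = np.vander(xs, deg + 1)            # row i: [x_i^deg, ..., x_i, 1]
    n = deg + 1
    c = np.zeros(2 * n + 1)
    c[-1] = 1.0                           # minimize eps (last variable)
    A, b = [], []
    for i, x in enumerate(xs):
        r = np.zeros(2 * n + 1); r[:n] = V[i]            # p_l(x) <= f(x)
        A.append(r); b.append(f(x))
        r = np.zeros(2 * n + 1); r[n:2 * n] = -V[i]      # p_u(x) >= f(x)
        A.append(r); b.append(-f(x))
        r = np.zeros(2 * n + 1)
        r[:n] = -V[i]; r[n:2 * n] = V[i]; r[-1] = -1.0   # p_u - p_l <= eps
        A.append(r); b.append(0.0)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(None, None)] * (2 * n + 1))
    return res.x[-1]

f = lambda x: float(np.exp(-x ** 2))      # stand-in for a desirability function
xs = np.linspace(-1.0, 1.0, 41)
eps = [bound_gap(f, xs, d) for d in (2, 4, 6)]
# degree-d polynomials are feasible at degree d+2, so the gap cannot grow
assert eps[0] + 1e-7 >= eps[1] and eps[1] + 1e-7 >= eps[2]
```

The monotonicity argument is exactly the one in the proof above: a feasible pair of degree-d polynomials remains feasible at a higher degree with the extra coefficients set to zero.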

Thus, as the polynomial degree of the optimization problem is increased, the pointwise error ε is non-increasing. Therefore, one can keep increasing the degree of the polynomials in order to achieve tighter bounds on Ψ and, invariably, V. However, ε is only non-increasing as the polynomial degree is increased; convergence of the bound ε to zero is not guaranteed.

Although the bound on the pointwise error is non-increasing, the actual difference between Ψ̂ and Ψ may increase between iterations. We bound this variation as follows.

Corollary 21. Suppose |Ψ̂^d − Ψ| ≤ ε^d and |Ψ̂^{d+1} − Ψ| = γ^{d+1}. Then, γ^{d+1} ≤ ε^d.

Proof. By Proposition 20, ε^{d+1} ≤ ε^d. Because γ^{d+1} ≤ ε^{d+1}, γ^{d+1} ≤ ε^d.

In other words, in each step of the hierarchy of SOS programs, the approximation error of the desirability function for an SOS program of polynomial degree d + 1 cannot grow larger than ε^d.

B. Properties of the Approximated Value Function

Up to this point, the analysis has focused on properties of the desirability solution. We now investigate the implications of these results for the value function. Recall that the value function is related to the desirability via the logarithmic transform (8). Henceforth, denote the solution to (6) as V(x_t) = min_{u_{[t:T]}} E_{ω_t}[J(x_t)] = −λ log Ψ(x_t), the solution to (17) for a fixed degree d as (Ψ_u, Ψ_l, S, T, ε), and the suboptimal value function computed from the solution of (17) as V_u = −λ log Ψ_l. Only Ψ_l and V_u are considered henceforth because Ψ_l, but not Ψ_u, gives an approximate value function that satisfies the properties of an SCLF in Definition 3, a fact shown in the next section.

Theorem 22. V_u is an upper bound on the optimal cost V such that

    V_u − V ≤ −λ log( 1 − min{ 1, ε/η } )    (18)

where η = min_{x∈Ω} e^{−V(x)/λ}.

Proof. By Proposition 17, V_u ≥ V, and hence V_u − V ≥ 0. To prove the other inequality, by Proposition 19,

    V_u − V = −λ log( Ψ_l / Ψ ) ≤ −λ log( (Ψ − ε)/Ψ ) = −λ log( 1 − ε/Ψ ) ≤ −λ log( 1 − ε/η ).

The last inequality holds because Ψ = e^{−V/λ} ≥ η by the definition of the transform (8).
Since Ψ_l is a lower bound of Ψ, the right-hand side of the first equality is always nonnegative. Therefore, V_u is a point-wise upper bound of V.

Corollary 23. Let V_u^d = −λ log Ψ_l^d and V_u^{d+1} = −λ log Ψ_l^{d+1}. If |Ψ_l^d − Ψ| ≤ ε^d and V_u^{d+1} − V = γ^{d+1}, then γ^{d+1} ≤ −λ log( 1 − min{ 1, ε^d/η } ).

Proof. This result follows from Corollary 21 and Theorem 22.

At this point, we have shown that the lower bound of the desirability function gives an upper bound on the suboptimal cost. More importantly, this upper bound on the suboptimal cost is non-increasing as the polynomial degree increases.

C. The Approximate HJB Solutions are SCLFs

This section shows that the approximate value function derived from the desirability approximation Ψ_l is an SCLF.

Theorem 24. V_u is a stochastic control Lyapunov function according to Definition 3.

Proof. The constraint Ψ_l(0) = 1 ensures that V_u(0) = −λ log Ψ_l(0) = 0. Notice that all terms of the cost functional J(x, u) are positive definite, so that V is a positive definite function. In addition, by Proposition 17, V_u ≥ V. Hence, V_u is also a positive definite function. The second- and third-to-last constraints in (17) ensure that Ψ_l is non-increasing away from the origin. Hence, V_u is non-decreasing away from the origin, satisfying μ_1(‖x‖) ≤ V_u(x) ≤ μ_2(‖x‖) for some μ_1, μ_2 ∈ K.

Next, we show that there exists a u such that L(V_u) ≤ 0. Following (5), let

    u_ε = −R⁻¹Gᵀ∇V_u,    (19)

the control law corresponding to V_u. Notice that from the definition of V_u,

    ∇V_u = −(λ/Ψ_l) ∇Ψ_l  and  ∇²V_u = (λ/Ψ_l²)(∇Ψ_l)(∇Ψ_l)ᵀ − (λ/Ψ_l) ∇²Ψ_l.

So, u_ε = (λ/Ψ_l) R⁻¹Gᵀ∇Ψ_l. Then, from the definition of the differential generator,

    L(V_u) = −(λ/Ψ_l)(∇Ψ_l)ᵀ( f + (λ/Ψ_l) GR⁻¹Gᵀ∇Ψ_l ) + (1/2) Tr( ( (λ/Ψ_l²)(∇Ψ_l)(∇Ψ_l)ᵀ − (λ/Ψ_l)∇²Ψ_l ) BΣ_ε Bᵀ ),

where ∂_t V_u = 0 because V_u is not a function of time. Applying the assumption in (7) and simplifying,

    L(V_u) = −(λ/Ψ_l)(∇Ψ_l)ᵀ f − (λ/(2Ψ_l²))(∇Ψ_l)ᵀ Σ_t ∇Ψ_l − (λ/(2Ψ_l)) Tr( (∇²Ψ_l) Σ_t ).

From the first constraint in (17),

    (1/λ) qΨ_l ≤ fᵀ∇Ψ_l + (1/2) Tr( (∇²Ψ_l) Σ_t ),  i.e.,  −(∇Ψ_l)ᵀ f ≤ −(1/λ) qΨ_l + (1/2) Tr( (∇²Ψ_l) Σ_t ).
Substituting this inequality into L(V_u) and simplifying yields

    L(V_u) ≤ −q − (λ/(2Ψ_l²))(∇Ψ_l)ᵀ Σ_t ∇Ψ_l ≤ 0    (20)

because q ≥ 0, λ > 0, and Σ_t is positive semidefinite by definition. Since V_u satisfies Definition 3, V_u is an SCLF.

Corollary 25. The suboptimal controller u_ε = −R⁻¹Gᵀ∇V_u is stabilizing in probability within the domain Ω.

Proof. This corollary is a direct consequence of the constructive proof of Theorem 24 and Theorem 4.
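The algebraic identities used in this proof are easy to sanity-check symbolically. The sketch below is a scalar illustration with assumed problem data; Ψ_l here is a hypothetical polynomial satisfying Ψ_l(0) = 1, not an SOS-program output.

```python
import sympy as sp

# Scalar symbolic check: with V_u = -lam*log(Psi_l),
#   dV_u/dx = -(lam/Psi_l) * dPsi_l/dx,
# and the feedback u_eps = -R^{-1} G dV_u/dx = (lam/Psi_l) R^{-1} G dPsi_l/dx.
x, lam, G, R = sp.symbols('x lam G R', positive=True)
Psi_l = 1 - x ** 2 / 2            # hypothetical lower-bound polynomial, Psi_l(0) = 1
V_u = -lam * sp.log(Psi_l)

grad_Vu = sp.diff(V_u, x)
# gradient identity: grad V_u = -(lam/Psi_l) grad Psi_l
assert sp.simplify(grad_Vu + (lam / Psi_l) * sp.diff(Psi_l, x)) == 0

u_eps = -G / R * grad_Vu          # u_eps = -R^{-1} G^T grad V_u (scalar case)
# feedback identity: u_eps = (lam/Psi_l) R^{-1} G grad Psi_l
assert sp.simplify(u_eps - (lam / Psi_l) * G / R * sp.diff(Psi_l, x)) == 0
```

The same two identities are what turn the lower bound Ψ_l into an explicit rational feedback law in the proof above.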

Corollary 26. If Σ_t is a positive definite matrix, the suboptimal controller u_ε = −R⁻¹Gᵀ∇V_u is asymptotically stabilizing in probability within the domain Ω.

Proof. This corollary is a direct consequence of the constructive proof of Theorem 24 and Theorem 5. In (20), L(V_u) < 0 for x ∈ Ω\{0} if Σ_t is positive definite. Recall that q is positive definite in the problem formulation.

D. Bound on the Total Trajectory Cost

We conclude this section by showing that the expected total trajectory cost incurred by the system while operating under the suboptimal controller (19) can be bounded as follows.

Theorem 27. Given the control law u_ε = −R⁻¹Gᵀ∇V_u,

    J^{u_ε} ≤ V_u ≤ V − λ log( 1 − min{ 1, ε/η } )    (21)

where J^{u_ε} = E_{ω_t}[ φ(x_T) + ∫_0^T r(x_t, u_ε(x_t)) dt ] is the expected cost of the system when using the control law u_ε.

Proof. By Itô's formula,

    dV_u(x_t) = L(V_u)(x_t) dt + ∇V_u(x_t)ᵀ B(x_t) dω_t,

where L(·) is the differential generator defined in Section II. Then,

    V_u(x_t) = V_u(x_0) + ∫_0^t L(V_u)(x_s) ds + ∫_0^t ∇V_u(x_s)ᵀ B(x_s) dω_s.    (22)

Given that V_u is derived from the polynomial function Ψ_l, the integrals are well defined, and we can take the expectation of (22) to get

    E[V_u(x_t)] = V_u(x_0) + E[ ∫_0^t L(V_u)(x_s) ds ],

whereby the last term of (22) drops out because the noise is assumed to have zero mean. The expectations of the other terms return the same terms because they are deterministic. From (20),

    L(V_u) ≤ −q − (λ/(2Ψ_l²))(∇Ψ_l)ᵀ Σ_t ∇Ψ_l = −q − (1/2)(∇V_u)ᵀ GR⁻¹Gᵀ (∇V_u) = −q − (1/2)(u_ε)ᵀ R u_ε,

where the first equality is given by the logarithmic transformation and the second equality by the control law u_ε = −R⁻¹Gᵀ∇V_u. Therefore,

    E_{ω_t}[V_u(x_T)] = V_u(x_0) + E_{ω_t}[ ∫_0^T L(V_u)(x_s) ds ]
    ≤ V_u(x_0) − E_{ω_t}[ ∫_0^T q(x_s) + (1/2)(u_ε(x_s))ᵀ R u_ε(x_s) ds ]
    = V_u(x_0) − J(x_0, u_ε) + E_{ω_t}[ φ(x_T) ],

where the last equality is given by the definition of the cost functional. Therefore,

    V_u(x_0) − J(x_0, u_ε) ≥ E_{ω_t}[ V_u(x_T) − φ(x_T) ].

By definition, V_u(x_T) ≥ φ(x_T) for all x_T ∈ ∂Ω. Thus, E_{ω_t}[V_u(x_T) − φ(x_T)] ≥ 0. Consequently, V_u(x_0) − J(x_0, u_ε) ≥ 0, and V_u(x_0) ≥ J(x_0, u_ε). Theorem 22 gives the second inequality in the theorem.

V.
EXTENSIONS

This section briefly summarizes some extensions of the basic framework to a few related problems.

A. Linearly Solvable Approximations

The approach presented in this paper may appear, up to this point, to be limited to systems that are linearly solvable, i.e., those that satisfy condition (7). However, the proposed methods may be extended to a system that does not satisfy these conditions by approximating the system with one that is linearly solvable. One example is to introduce stochastic forcing into an otherwise deterministic system.

We first construct a comparison theorem between HJB solutions of systems that share the same general dynamics but differ in noise covariance. This comparison allows the approximated value function of one system to bound the value function of another, providing pointwise bounds, and indeed SCLFs, for systems that do not satisfy (7).

Proposition 28. Suppose V^a is the solution to the HJB equation (6) with noise covariance Σ_a, and V^b is a supersolution to (6) with identical parameters except the noise covariance Σ_b, where Σ_b ⪰ Σ_a. Then V^b ≥ V^a for all x ∈ Ω.

Proof. From the definition of viscosity supersolution in [3], V is a viscosity supersolution to the HJB equation (6) with noise covariance Σ if it satisfies

    q + (∇V)ᵀ f − (1/2)(∇V)ᵀ GR⁻¹Gᵀ (∇V) + (1/2) Tr( (∇²V) BΣBᵀ ) ≤ 0.    (23)

Since Σ_b ⪰ Σ_a, the following trace inequality holds:

    Tr( (∇²V^b) BΣ_a Bᵀ ) ≤ Tr( (∇²V^b) BΣ_b Bᵀ ).

Therefore, we have the inequality

    q + (∇V^b)ᵀ f − (1/2)(∇V^b)ᵀ GR⁻¹Gᵀ (∇V^b) + (1/2) Tr( (∇²V^b) BΣ_a Bᵀ )
    ≤ q + (∇V^b)ᵀ f − (1/2)(∇V^b)ᵀ GR⁻¹Gᵀ (∇V^b) + (1/2) Tr( (∇²V^b) BΣ_b Bᵀ ) ≤ 0,

which implies that V^b is in fact a viscosity supersolution for the system with noise covariance Σ_a (i.e., V^b satisfies (23) for Σ_a). As V^b is a supersolution for the system with parameter Σ_a, V^b ≥ V^a.

A particular class of such approximations arises from a deterministic HJB solution, which is not linearly solvable, but is approximated by one that is linearly solvable.
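The comparison principle can be sanity-checked on a case with closed-form structure. The sketch below uses an illustrative scalar LQG instance with assumed parameters (a = q = R = 1, horizon T = 1), not the general viscosity-solution argument: for dx = (ax + u)dt + σ dω with running cost qx² + (1/2)Ru² and zero terminal cost, the value function is V(x, t) = P(t)x² + c(t), where −dP/dt = q + 2aP − 2P²/R and −dc/dt = σ²P. Since c scales with σ², the noisier system's value function dominates pointwise.

```python
import numpy as np

def value(sigma, a=1.0, q=1.0, R=1.0, T=1.0, n=2000):
    """Backward-Euler integration of the scalar Riccati pair
    -dP/dt = q + 2aP - 2P^2/R and -dc/dt = sigma^2 * P with
    P(T) = c(T) = 0. Returns V(., 0) as a callable."""
    dt = T / n
    P, c = 0.0, 0.0
    for _ in range(n):                       # march backward from t = T
        dP = q + 2.0 * a * P - 2.0 * P ** 2 / R
        dc = sigma ** 2 * P
        P += dP * dt
        c += dc * dt
    return lambda x: P * x ** 2 + c          # V(x, 0)

Va = value(sigma=0.5)                        # less noisy system
Vb = value(sigma=1.0)                        # noisier system
xs = np.linspace(-1.0, 1.0, 21)
# pointwise domination, as in the comparison result above
assert all(Vb(x) >= Va(x) for x in xs)
```

P(t) does not depend on σ here, so the gap V^b − V^a is exactly the difference of the constant terms c(t), which is what the trace term contributes in this special case.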
Consider a deterministic system of the form

    dx_t = ( f(x_t) + G(x_t) u_t ) dt    (24)

with cost function

    J(x, u) = φ(x_T) + ∫_0^T q(x_t) + (1/2) u_tᵀ R u_t dt    (25)

where φ, q, R, f, G, and the state and input domains are defined as in the stochastic problem in Section II-B. Then, the HJB equation is given by

    0 = q + (∇V)ᵀ f − (1/2)(∇V)ᵀ GR⁻¹Gᵀ (∇V)    (26)

and the optimal control is given by u* = −R⁻¹Gᵀ∇V.

Corollary 29. Let V be the value function that solves (26), and V_u be the upper bound solution obtained from (17), where all parameters are the same as in (26) and Σ_t is nonzero. Then, V_u is an upper bound for V over the domain (i.e., V ≤ V_u).

Proof. A simple application of Proposition 28, where Σ_a takes the form of a zero matrix, gives V ≤ V_u.

Interestingly, using the solution from (17) and the transformation V_u = −λ log Ψ_l, the suboptimal controller u_ε = −R⁻¹Gᵀ∇V_u is a stabilizing controller for the deterministic system (24) if a simple condition is satisfied. This fact is shown using the Lyapunov theorem for deterministic systems introduced next [5].

Definition 30. Given the system (24) and cost function (25), a control Lyapunov function (CLF) is a proper positive definite function V ∈ C¹ on a compact domain Ω containing the origin such that

    V(0) = 0,
    V(x) ≥ μ(‖x‖),  x ∈ Ω\{0},
    ∃ u(x) s.t. (∇V)ᵀ( f + Gu ) ≤ 0,  x ∈ Ω\{0},    (27)

where μ ∈ K.

Theorem 31 ([5, Thm 2.5]). Given a system (24) and cost function (25), if there exists a CLF V and a u satisfying Definition 30, then the controlled system is stable, and u is a stabilizing controller. Furthermore, if (∇V)ᵀ(f + Gu) < 0 for all x ∈ Ω\{0}, the controlled system is asymptotically stable, and u is an asymptotically stabilizing controller.

That the controller u_ε = −R⁻¹Gᵀ∇V_u is in fact stabilizing and that V_u is a CLF may be seen as follows.

Corollary 32. Given the controller u_ε = −R⁻¹Gᵀ∇V_u, if

    Tr( (∇²V_u) BΣ_ε Bᵀ ) ≥ 0,  x ∈ Ω\{0},

then u_ε is a stabilizing controller for (24). If

    Tr( (∇²V_u) BΣ_ε Bᵀ ) > 0,  x ∈ Ω\{0},

then u_ε is an asymptotically stabilizing controller for (24).

Proof. Recall from the proof of Theorem 24 that all conditions in Definition 30 are satisfied by V_u except (27).
To show that V_u satisfies (27), rearrange (26) to yield the following:

    (∇V_u)ᵀ( f + G u_ε ) = (∇V_u)ᵀ f − (∇V_u)ᵀ GR⁻¹Gᵀ (∇V_u)
    ≤ −q − (1/2)(∇V_u)ᵀ GR⁻¹Gᵀ (∇V_u) − (1/2) Tr( (∇²V_u) BΣ_ε Bᵀ ).

Recall that q and R are positive definite. If Tr((∇²V_u)BΣ_εBᵀ) ≥ 0 for all x ∈ Ω\{0}, then (∇V_u)ᵀ(f + Gu_ε) ≤ 0, implying that V_u is a CLF and u_ε is a stabilizing controller by Theorem 31. Furthermore, if Tr((∇²V_u)BΣ_εBᵀ) > 0 for all x ∈ Ω\{0}, then u_ε is an asymptotically stabilizing controller.

The trace condition in Corollary 32 is easily enforced in (17) by adding one extra constraint to the optimization problem. Thus, the optimization problem (17) can also produce a CLF for the corresponding deterministic system, with the analytical results of Section IV, including a priori trajectory suboptimality bounds (Theorem 27), inherited as well.

B. Robust Controller Synthesis

The proposed technique may be extended to incorporate uncertainty in the problem data. Assume there exist unknown coefficients a ∈ H in f(x), G(x), B(x), where H ⊂ R^k, and H = {a | g_i(a) ≥ 0, g_i(a) ∈ R[a], i ∈ {1, 2, ..., d}} is a basic closed semialgebraic set describing the uncertainty set of a. The problem data are then defined by the expressions f(x, a), G(x, a), B(x, a) for x ∈ Ω and a ∈ H. In this case, the uncertain parameters may be considered as additional domain variables, defined over their own compact space. Uncertainty of this form may be incorporated naturally into the optimization problem (17). Define the monomial set X = {a^α x^β}_{α=0,...,n; β=0,...,m}. The optimization variables corresponding to the polynomials in S and T are then constructed out of X as

    p(x, a) = Σ_{α=0}^{n} Σ_{β=0}^{m} c_{α,β} a^α x^β.

Note that Ψ_u and Ψ_l are not themselves functions of a. The uncertainty set H is incorporated by defining a compact domain M = Ω × H that takes the product of the original problem domain and the uncertainty set. The resulting optimization problem is therefore

    min_{Ψ_l, Ψ_u, S, T}  ε    (28)
    s.t.
qψ l + L(Ψ l, a) D(M, S ) S[, a] qψ u L(Ψ u, a) D(M, S ) S[, a] ɛ (Ψ u Ψ l ) D(M, S 3 ) S[, a] B(Ψ l D(H, S 4 ), Ω, T ) S[, a] B(ψ Ψ l D(H, S 5 ), Ω, T ) S[, a] B(Ψ u ψ D(H, S 6 ), Ω, T 3 ) S[, a] iψ l D(Ω { i }, S 7 ) S[, a] iψ l D(Ω { i }, S 8 ) S[, a] Ψ l () D(H, S 9 ) S[a] Ψ l () + D(H, S ) S[a] where S = (S,..., S ), and T = (T, T, T 3 ). The operator L now depends on the variable a. The resulting solutions to the optimizations (8), and the upper bound suboptimal value functions V u = log Ψ l, are found for all a H, with

Approimation error, Polynomial Degree Cost * Polynomial Degree the control law u ɛ = R G T V u now stabilizing for the entirety of the uncertainty set H. Similar techniques have been studied previously for Lyapunov analysis, e.g., [34, Ch. 4.3.3]. C. Path Planning Although our study up to this point has emphasized the use of the approimate solutions for stabilization, their use is more general. As studied in [3], the methods of this paper may also be used to construct controllers for path planning problems. In a path planning problem, given a dynamical system of the form () with cost function (), the goal is to move from a particular state to a goal state while minimizing the cost function (). This problem is almost the same as the stabilization problem ecept the last three inequalities in (7) that ensure stability to the origin are omitted. Indeed, for general path planning problems the value function isn t epected to have the Lyapunov function s conve-like geometry. Unfortunately, without the aforementioned constraints, which provides strong guarantees upon trajectory behavior, the above results do not hold, such as Theorem 7 and other results guaranteeing trajectory convergence or trajectory suboptimality. D. Non-Polynomial Systems The development of this work has been limited to nonlinear systems governed by polynomial functions. A number of avenues eist for incorporating non-polynomial nonlinearities. The most straightforward approach is to simply project the non-polynomial functions to a polynomial basis. As polynomials are universal approimators in L by the Stone-Weierstrass Theorem [3], this approimation can be made to arbitrary accuracy if the functions are continuous. A limited basis may introduce modeling error, but this may be dealt with via the robust optimization techniques addressed previously. Alternatively, non-polynomial constraints may be incorporated using additional equality constraints, as is done in []. VI. 
NUMERIC EXAMPLES

This section studies the computational characteristics of this method using two examples: a scalar system and a two-dimensional system. In the following problems, the optimization parser YALMIP [35] was used in conjunction with the semidefinite optimization package MOSEK [36]. In both examples, the continuous system is integrated numerically using Euler integration with a step size of 0.05 s during simulations.

Fig. 1. The desirability function of system (29) for varying polynomial degree. The true solution is the black curve.

A. Scalar Unstable System

Consider the following scalar unstable nonlinear system

    dx = ( x³ + 5x² + x + u ) dt + dω    (29)

on the domain Ω = {x | −1 ≤ x ≤ 1}. The noise model considered is Gaussian white noise with zero mean and variance Σ_ε = 1. The goal is to stabilize the system at the origin. We choose the boundary conditions at the two ends of the domain to be Ψ(−1) = e⁻¹ and Ψ(1) = e⁻¹. At the origin, the boundary condition is set as Ψ(0) = 1. We set q = 1 and R = 1. In this one-dimensional case, the origin, which is a boundary, divides the domain into two partitions, x ≤ 0 and x ≥ 0. Because of this natural division of the domain, the solutions on the two partitions can each be represented by a smooth polynomial and solved independently. The simulation is terminated when the trajectories enter the interval [−0.05, 0.05] centered on the origin.

Fig. 2. Computational results for system (29). (a) Convergence of the objective function of (17) as the polynomial degree increases. The approximation error for x ≤ 0 is denoted ε_l and the approximation error for x ≥ 0 is denoted ε_r. (b) Sample trajectories using the controller computed from optimization problem (17) with different polynomial degrees, starting from six randomly chosen initial points. (c) Comparison between J^{u_ε} and V_u for different polynomial degrees, where J^{u_ε} is the expected cost and V_u is the value function computed from optimization problem (17). The initial condition is fixed at x₀ = −0.5.
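The Monte Carlo cost estimate reported in Fig. 2(c) can be mimicked in a few lines. The sketch below makes several assumptions: it uses the reconstructed drift x³ + 5x² + x, a hand-made feedback-linearizing controller u = −x³ − 5x² − 2x in place of the SOS-derived u_ε, and running cost q + (1/2)u² with q = R = 1. It runs an Euler-Maruyama estimate of the expected trajectory cost from x₀ = −0.5 with the same 0.05 s step and the same stopping interval.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, t_max, trials = 0.05, 20.0, 200
drift = lambda x, u: x ** 3 + 5 * x ** 2 + x + u   # reconstructed dynamics (29)
u_fb = lambda x: -x ** 3 - 5 * x ** 2 - 2 * x      # assumed feedback: closed loop dx = -x dt + dw

costs = []
for _ in range(trials):
    x, J, t = -0.5, 0.0, 0.0
    # accumulate running cost q + (1/2)u^2 until the stopping interval is hit
    while abs(x) > 0.05 and t < t_max:
        u = u_fb(x)
        J += (1.0 + 0.5 * u ** 2) * dt
        x += drift(x, u) * dt + np.sqrt(dt) * rng.standard_normal()
        t += dt
    costs.append(J)

J_hat = float(np.mean(costs))
print(f"estimated cost from x0 = -0.5: {J_hat:.2f}")
```

An SOS-derived u_ε would be substituted for `u_fb` in practice; by Theorem 27, the resulting estimate should sit below V_u(−0.5), which is the comparison plotted in Fig. 2(c).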

The desirability functions that result from solving (17) for varying polynomial degrees are shown in Figure 1. The true solution is computed by solving the HJB equation directly in Mathematica [37]. The kink at the origin is expected because the HJB PDE solution is not necessarily smooth at the boundary, and in this instance the origin is a zero-cost boundary. The approximation error for both partitions is shown in Figure 2(a) for increasing polynomial degree. As seen in the plots, the approximation improves as the polynomial degree increases. Polynomial degrees below 4 are infeasible, hence this data is absent from the plots. The suboptimal solution converges faster for x > 0 than for x < 0 as the polynomial degree increases because the true solution for x > 0 has a simple quadratic-like shape that can easily be represented by a low-degree SOS function. Figure 2(b) shows sample trajectories using the controller computed from optimization problem (17) for different polynomial degrees. The controllers are stabilizing for six randomly chosen initial points. Unsurprisingly, the suboptimal solutions with low pointwise error result in the system converging towards the origin faster.

To compare J^{u_ε} and V_u, a Monte Carlo experiment is illustrated in Figure 2(c). For each feasible polynomial degree, the controller obtained from Ψ_l in optimization problem (17) is implemented in 3 simulations of the system subject to random samples of Gaussian white noise with Σ_ε = 1. The initial condition is fixed at x₀ = −0.5. In the figure, V_u ≥ J^{u_ε} as expected, and the difference between the two decreases with increasing d.

B. Two Dimensional System

In the following example, we demonstrate the power of this technique on a two-dimensional system. Consider a nonlinear two-dimensional problem with dynamics of the form

    d[x, y]ᵀ = ( f(x, y) + G(x, y) u ) dt + B(x, y) dω,    (30)

where f, G, and B are polynomial in x and y. The goal is to reach the origin at the boundary of the domain Ω = {(x, y) | −1 ≤ x ≤ 1, −1 ≤ y ≤ 1}. The control penalty is R = I, and the state cost is q(x, y) = x² + y². The boundary conditions on the sides x = −1, x = 1, y = −1, and y = 1 are set to φ(x, y) = 5, while at the origin the boundary cost is φ(0, 0) = 0. The noise model considered is Gaussian white noise with zero mean and an identity covariance matrix.

The approximated desirability functions and their corresponding value functions are shown in Figure 3 for different polynomial degrees. The solutions are shown over half of the domain in order to give a view of the gaps between the upper and lower bound solutions. At the higher polynomial degree shown, the upper and lower bound solutions are numerically identical in many regions. Figure 4(a) shows the convergence of the objective function of optimization problem (17) as the polynomial degree increases. There is no data at the lowest degrees because the optimization problem is infeasible in these cases. As shown in Figure 4(b), sample trajectories starting from six different initial points show that the resulting controllers are stabilizing.

Fig. 3. Approximated desirability functions and value functions for (30) at two different polynomial degrees. In (a) and (b), the blue sheets are the upper bound solutions Ψ_u and the red sheets are the lower bound solutions Ψ_l. The corresponding value functions are shown in (c) and (d), respectively.

Fig. 4. Computational results for system (30). (a) Convergence of the variables in the objective function of (17). (b) Sample trajectories using the controller from optimization problem (17) with different polynomial degrees, starting from six randomly chosen initial points. (c) Comparison between J^{u_ε}, the expected cost, and V_u, the value function, for different polynomial degrees from optimization problem (17). The initial condition is fixed at (x₀, y₀) = (0.5, 0.5).