arXiv: v1 [cs.RO] 24 May 2017


A Near-Optimal Separation Principle for Nonlinear Stochastic Systems Arising in Robotic Path Planning and Control

Mohammadhussein Rafieisakhaei 1, Suman Chakravorty 2 and P. R. Kumar 1

Abstract

We consider nonlinear stochastic systems that arise in path planning and control of mobile robots. As is typical of almost all nonlinear stochastic systems, solving the problem optimally is intractable. We provide a design approach which yields a tractable design that is quantifiably near-optimal. We exhibit a separation principle under a small noise assumption consisting of the optimal open-loop design of a nominal trajectory followed by an optimal feedback law to track this trajectory, which is different from the usual effort of separating estimation from control. As a corollary, we obtain a trajectory-optimized linear quadratic regulator design for stochastic nonlinear systems with Gaussian noise.

I. INTRODUCTION

Practical systems are often subject to inaccuracies that we model as noise. Planning for a stochastic system requires attention to the noise structure, available models and noise levels. Many robotic systems, in particular mobile aerial and ground robots, are equipped with noisy actuators that require feedback compensation or planning ahead for a policy that accounts for the random perturbations. Simply ignoring the noise and planning for the unperturbed equivalent of the stochastic system can yield crucial errors, leading to failure to reach the end goal or causing the system to fall into unsafe states.

In a stochastic setting, the general problem of sequential decision-making is formulated as a Markov Decision Problem (MDP) [1], [2]. The optimal solution of the stochastic control problem can be obtained iteratively by value or policy iteration methods to solve the Hamilton-Jacobi-Bellman equations [2].
Except in special cases, such as in a linear Gaussian environment, this involves discretization of the underlying spaces [3], an approach whose scalability faces the curse of dimensionality [4]. As a result, these methods require a computation time that is provably exponential in the state dimension, in a real-number-based model of complexity, without any assumption that P ≠ NP [5].

Many approaches have been proposed based on their tractability. Some rely on a separate design of the deterministic trajectory from the feedback policy. Model Predictive Control (MPC)-based methods [6], [7], robust formulations [8], [9], and other designs that relate to Pontryagin's Maximum Principle [10] are some of the methods that have been successfully used as surrogate design approaches. Another popular approach utilizes Differential Dynamic Programming (DDP) [11] and DDP-based variations, such as Stochastic DDP [12], iLQR and iLQG [13]. These methods rely on local approximations of the cost function and the dynamics up to second order and propose iterative methods that attempt to find locally-optimal solutions in a tube around a nominal trajectory [13].

*This material is based upon work partially supported by NSF under Contract Nos. CNS and Science & Technology Center Grant CCF, the U.S. Army Research Office under Contract No. W911NF, and NPRP grant NPRP from the Qatar National Research Fund, a member of Qatar Foundation.
1 M. Rafieisakhaei and P. R. Kumar are with the Department of Electrical and Computer Engineering, and 2 S. Chakravorty is with the Department of Aerospace Engineering, Texas A&M University, College Station, Texas, USA. {mrafieis, schakrav, prk@tamu.edu}

In this paper, we address the nonlinear stochastic control problem and propose an architecture under which the separate design of an optimal open-loop control sequence and a feedback policy is near-optimal.
In particular, we show that under a small noise assumption, the separation into a globally-optimal trajectory design and a globally-optimal feedback control law holds for a fully-observed nonlinear stochastic system. This result also sheds light on the conditions under which popular design approaches based on the Maximum Principle may be globally $\epsilon$-optimal. We quantify the first order stochastic error for small-noise levels based on Wentzell-Freidlin large-deviations theory. We thereby arrive at a Trajectory-optimized Linear Quadratic Regulator (T-LQR) design for fully-observed nonlinear stochastic systems under Gaussian small-noise perturbations. In short, the design can be broken into two parts: i) an open-loop optimal control problem that designs the nominal trajectory of the LQR controller, which respects the nonlinearities as well as state and control constraints; ii) the design of an LQR policy around the optimized nominal trajectory. The quality of the design is rigorously quantified by the main results of the paper.

The organization of the paper is as follows. Section II provides a brief background on Wentzell-Freidlin theory [14] and investigates its implications regarding the linearization of a stochastic system coupled with the usage of the Taylor theorem. Section III defines a general stochastic control problem for a fully-observed system. Section IV provides the main results by first analyzing the effect of feedback compensation on the linearization error, and then providing the state and control error propagations along with probabilistic bounds based on the theory developed in Section II. Section IV also provides the first-order expected error of the stochastic cost function along with the separation result. Section V introduces the T-LQR design approach. Finally, Section VI provides a design based on T-LQR for a non-holonomic car-like robot and provides numerical results on the proposed design approach.

II. SMALL RANDOM PERTURBATIONS OF A NON-LINEAR SYSTEM

In this section, we discuss the theoretical background regarding the small noise perturbations of general dynamical systems. In particular, we discuss Wentzell-Freidlin theory on the small noise asymptotics of a perturbed system represented by a general Stochastic Differential Equation (SDE). We consider a time-varying system, since that is required for our design. A general discussion regarding large deviations of the trajectories of a perturbed system from those of its unperturbed counterpart, and related theories, can be found in [14]–[22].

Probability space: We consider a probability space $(\Omega, \mathcal{F}, P)$ with the random variables on a measurable space $(\mathbb{X}, \mathcal{B})$, where $\mathbb{X}$ is a Euclidean space with dimension $n_x$ or $n_w$, or a smooth manifold in these spaces, and $\mathcal{B}$ is the corresponding $\sigma$-algebra of Borel sets.

Diffusion process: Let us consider a dynamical system with the following equation:
$$dX^\epsilon_t = b(t, X^\epsilon_t)\,dt + \epsilon\,dw_t, \quad X^\epsilon_0 = x_0, \tag{1}$$
where $b : \mathbb{R} \times \mathbb{R}^{n_x} \to \mathbb{R}^{n_x}$ is a uniformly Lipschitz continuous function, such that:
$$\|b(t_1, x_1) - b(t_2, x_2)\| \le K_1 \|x_1 - x_2\|, \tag{2}$$
where $x_1, x_2 \in \mathbb{R}^{n_x}$, $t_1, t_2 \in [0, K]$, $\epsilon > 0$, $K_1 > 0$, and $\{w_t, t \ge 0\}$ is a Wiener process on $\mathbb{R}^{n_w}$.

Nominal unperturbed trajectory: Such a system can result from small random perturbations of the following time-varying ODE:
$$\dot{x}^p_t = b(t, x^p_t), \tag{3}$$
with initial condition $x^p_0 = x_0 \in \mathbb{R}^{n_x}$.

First order Taylor expansion: Using Taylor's theorem to obtain the first order linearization of the right hand side of the above system around the trajectory $\{x^p_t\}_{t=0}^{K}$ results in the following:
$$dX^\epsilon_t = b(t, x^p_t)\,dt + A_t (X^\epsilon_t - x^p_t)\,dt + \epsilon\,dw_t + o(\|X^\epsilon_t - x^p_t\|), \tag{4}$$
where $A_t = \nabla_x b(t, x)|_{t, x^p_t}$ is the Jacobian matrix.

Accuracy of linearization: Equation (4) states that if $\|X^\epsilon_t - x^p_t\| \le \delta$ for all $0 \le t \le K$, then
$$dX^\epsilon_t = b(t, x^p_t)\,dt + A_t (X^\epsilon_t - x^p_t)\,dt + \epsilon\,dw_t + o(\delta). \tag{5}$$
We will use the Wentzell-Freidlin theorem to calculate the probability that the aforesaid condition holds.
In order to do that, we define the action functional for the family of processes defined in equation (1).

Action functional [14]: For $[T_1, T_2] \subseteq [0, K]$, the action functional is defined as:
$$S_{T_1,T_2}(\varphi) := \frac{1}{2\epsilon^2} \int_{T_1}^{T_2} \|\dot{\varphi}_t - b(t, \varphi_t)\|^2 \, dt, \tag{6}$$
for absolutely continuous $\varphi$, and is set to be equal to $+\infty$ for other $\varphi \in C_{0K}(\mathbb{R}^{n_x})$. Note that this defines the action functional for the ($\epsilon$-dependent) family of processes given by the SDE (1), uniformly on the whole space as $\epsilon \to 0$.

Theorem 1. Exponential Rate of Convergence: Let $D$ be a domain in $\mathbb{R}^{n_x}$, and denote its closure by $\mathrm{cl}(D)$; let $\partial D$ denote the boundary of $D$; and let $H_D(t, x_0) = \{\varphi \in C_{0K}(\mathbb{R}^{n_x}) : \varphi_0 = x_0, \varphi_t \in D \cup \partial D\}$. Assume $\partial D = \partial\,\mathrm{cl}(D)$. Then, we have the following:
$$\lim_{\epsilon \to 0} \epsilon^2 \ln P_{x_0}\{X^\epsilon_t \in D\} = -\inf_{\varphi \in H_D(t, x_0)} S_{0t}(\varphi). \tag{7}$$

Theorem 2. Asymptotics of the Diffusion Process: Let $D_t = \mathrm{cl}(B^c_\delta(x^p_t))$, the closure of the complement of a ball with radius $\delta > 0$ around the point $x^p_t$, and let $\tau^\epsilon = \min\{t : X^\epsilon_t \in D_t\}$. Then,
$$\lim_{\epsilon \to 0} \epsilon^2 \ln P_{x_0}\{\tau^\epsilon \le t\} = -\inf_{\{\varphi : \varphi_0 = x_0,\ \|\varphi_t - x^p_t\| > \delta\}} S_{0t}(\varphi). \tag{8}$$

Proofs of these results can be found in [14], [15]. Thus, according to Theorem 1, for a given $t$, the probability as $\epsilon \to 0$ of $\|X^\epsilon_t - x^p_t\| \ge \delta$ can be calculated as in equation (7). Note that this probability tends to zero exponentially for any fixed $\delta > 0$ as $\epsilon \to 0$. Moreover, from Theorem 2, the probability that the trajectory of $X^\epsilon$ ever exits the tube of radius $\delta$ around the nominal trajectory in the time interval $[0, t]$ also goes to zero exponentially at the same rate. (This also asserts that the likely paths to ever exit in $[0, t]$ are those exiting at time $t$.) This provides the validity region of the linearized equation (4) and concludes our discussion in this section.

III. THE FULLY OBSERVED SYSTEM

The general stochastic control problem of interest for a fully observed system can be formulated as an optimization problem in the space of feedback policies. In this section, we define the system equations and pose the general problem.
Without loss of generality, we consider the discrete-time version of the systems considered in the previous section and continue our analysis on that basis.

Process model: We denote the state and control by $x \in \mathbb{X} \subset \mathbb{R}^{n_x}$ and $u \in \mathbb{U} \subset \mathbb{R}^{n_u}$, respectively. The process model with $f : \mathbb{X} \times \mathbb{U} \to \mathbb{X}$ is defined as:
$$x_{t+1} = f(x_t, u_t) + \omega_t, \quad \omega_t \sim \mathcal{N}(0, \Sigma_{\omega_t}), \tag{9}$$
where $\{\omega_t\}$ is independent, identically distributed (i.i.d.).

Now, we pose the general stochastic control problem [1], [23].

Problem 1. Stochastic Control Problem for Fully Observed System: Given an initial state $x_0$, we wish to determine an optimal or near-optimal policy for
$$\min_{\pi \in \Pi} \mathbb{E}\Big[\sum_{t=0}^{K-1} c^\pi_t(x_t, u_t) + c^\pi_K(x_K)\Big] \quad \text{s.t. } x_{t+1} = f(x_t, u_t) + \omega_t, \tag{10}$$
where the optimization is over Markov, i.e., time-varying state-feedback, policies $\pi \in \Pi$, with $\pi := \{\pi_0, \cdots, \pi_t\}$, $\pi_t : \mathbb{X} \to \mathbb{U}$; $u_t = \pi_t(x_t)$ specifies the action taken given the state; $c^\pi_t(\cdot, \cdot) : \mathbb{X} \times \mathbb{U} \to \mathbb{R}$ is the one-step cost function; $c^\pi_K(\cdot) : \mathbb{X} \to \mathbb{R}$ denotes the terminal cost; and $K$ is the time horizon.
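As an illustration of the objective in (10), the expected cost of a fixed state-feedback policy can be estimated by averaging noisy rollouts of (9). The following is a minimal sketch and not part of the original design: the scalar model `f`, the policy, the cost functions, the horizon and the noise levels are all hypothetical choices made only for this example.

```python
import math
import random

K = 10  # time horizon (illustrative value, not from the paper)

def f(x, u):
    """Hypothetical scalar process model, nonlinear in the state."""
    return x + 0.1 * math.sin(x) + u

def policy(t, x):
    """A state-feedback policy u_t = pi_t(x_t); this one happens to ignore t."""
    return -0.5 * x

def step_cost(x, u):
    """Hypothetical one-step cost c_t(x, u)."""
    return x * x + u * u

def terminal_cost(x):
    """Hypothetical terminal cost c_K(x)."""
    return 10.0 * x * x

def rollout_cost(x0, sigma, rng):
    """Realized cost of one rollout of x_{t+1} = f(x_t, u_t) + w_t."""
    x, total = x0, 0.0
    for t in range(K):
        u = policy(t, x)
        total += step_cost(x, u)
        x = f(x, u) + rng.gauss(0.0, sigma)  # additive Gaussian process noise
    return total + terminal_cost(x)

def expected_cost(sigma, n_samples, x0=1.0, seed=0):
    """Monte Carlo estimate of the expectation in (10) for the fixed policy."""
    rng = random.Random(seed)
    return sum(rollout_cost(x0, sigma, rng) for _ in range(n_samples)) / n_samples

if __name__ == "__main__":
    for sigma in (0.0, 0.1, 0.5):
        print(sigma, expected_cost(sigma, 2000))
```

With `sigma = 0` the estimate reduces to the deterministic (nominal) cost of the policy, which is the quantity the later sections compare the stochastic cost against.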

IV. SEPARATION OF OPEN LOOP AND CLOSED LOOP DESIGNS: FULLY OBSERVED SYSTEMS

In this section, we provide the theoretical basis for our design. The analysis employs the Taylor series expansion of the process model and large deviations theory.

A. Preliminary Analysis

We start by providing the nominal trajectory around which to linearize the process model. Then, we discuss the feedback law and compensate the process model with the feedback in order to use large deviations theory.

Nominal Trajectory: We use the process model with zero noise to propagate the initial state, $x_0$, with a set of unknown controls $\{u^p_t\}_{t=0}^{K-1}$, in order to obtain a parametrization of the feasible nominal trajectories as:
$$x^p_{t+1} = f(x^p_t, u^p_t), \quad 0 \le t \le K-1, \tag{11}$$
where $x^p_0 = x_0$.

Linearization of the process model: We linearize the process model of equation (9) around the nominal trajectory:
$$\tilde{x}_{t+1} = A_t \tilde{x}_t + B_t \tilde{u}_t + \omega_t + o(e^{x,u}_t), \tag{12}$$
where we have:
$A_t(x^p_t, u^p_t) = \nabla_x f(x, u)|_{x^p_t, u^p_t}$, denoted by $A_t$;
$B_t(x^p_t, u^p_t) = \nabla_u f(x, u)|_{x^p_t, u^p_t}$, denoted by $B_t$;
$\tilde{x}_t := x_t - x^p_t$, the state error with respect to the nominal trajectory;
$\tilde{u}_t := u_t - u^p_t$, the control error; and
$e^{x,u}_t := \|\tilde{x}_t\| + \|\tilde{u}_t\|$, the error.

As the control inputs change, the underlying nominal trajectory also changes, and therefore the Jacobian matrices $A_t$ and $B_t$ change as well. The Taylor series expansion of equation (12) is valid as $e^{x,u}_t \to 0$, i.e., as long as the state and control remain close to the linearization region. In this equation, the only factor that can drive the system away from the linearization region is the noise process $\omega_t$. Therefore, we establish probabilistic bounds on the validity of this equation using the small noise theory of Section II.

Optimization over policy space: A feedback law with Linear Time-Varying (LTV) gain is sufficient to control a linearized model around a nominal trajectory. Therefore, we restrict the search to feedback policies with LTV feedback gain, $\Pi_L$.
In the next section, we design a Linear Quadratic Regulator (LQR) policy as a special case for our design.

Feedback controller: Assuming the controllability of the deterministic model of the system, we suppose the existence of a feedback control law with LTV feedback gain to track and stabilize the trajectory of states around the nominal designed trajectory. Later, we explain in detail how to design such a law. Thus, the control action error can be expressed as:
$$\tilde{u}_t = u_t - u^p_t = -L_t (x_t - x^p_t), \tag{13}$$
where $L_t$ is the linear feedback gain. It is important to note that although we are working with the linearized system, the original system is a nonlinear system, and the design is tailored to work for the original system.

Linearized system equation compensated with feedback: Replacing the feedback law in equation (12), we obtain:
$$\begin{aligned} \tilde{x}_{t+1} &= A_t \tilde{x}_t + B_t \tilde{u}_t + \omega_t + o(e^{x,u}_t) \\ &= (A_t - B_t L_t)\tilde{x}_t + \omega_t + o(e^{x}_t) \\ &= D_t \tilde{x}_t + \omega_t + o(e^{x}_t), \end{aligned} \tag{14}$$
where $D_t := A_t - B_t L_t$, $t \ge 1$, and $e^{x}_t := \|\tilde{x}_t\|$ denotes the linearization-based error.

Compensating the original system with feedback: Let us substitute for the control action in (9) using the feedback law of (13) as follows:
$$x_{t+1} = f(x_t, u_t) + \omega_t = f(x_t, u^p_t - L_t (x_t - x^p_t)) + \omega_t.$$
Using the last equation we define $g : \mathbb{R} \times \mathbb{X} \to \mathbb{X}$, where
$$g(t, x) := f(x, u^p_t - L_t (x - x^p_t)). \tag{15}$$
Note that the time-dependency of $g$ stems from the time-dependency of the feedback law. Moreover, the nominal trajectory, $\{x^p_t\}_{t=0}^{K}$, satisfies the same equation as (11):
$$x^p_{t+1} = g(t, x^p_t) = f(x^p_t, u^p_t - L_t (x^p_t - x^p_t)) = f(x^p_t, u^p_t).$$
Note that linearizing $g$ around the nominal trajectory yields (14), which itself is equivalent to equation (12):
$$\begin{aligned} \nabla_x g(t, x)|_{t, x^p_t} (x_t - x^p_t) &= \nabla_x f(x, u^p_t - L_t (x - x^p_t))|_{x^p_t} (x_t - x^p_t) \\ &= \nabla_x f(x, u)|_{x^p_t,\, u^p_t - L_t (x - x^p_t)} (x_t - x^p_t) + \nabla_u f(x, u)|_{x^p_t,\, u^p_t - L_t (x - x^p_t)} \, \nabla_x (u^p_t - L_t (x - x^p_t))|_{x^p_t} (x_t - x^p_t) \\ &= \nabla_x f(x, u)|_{x^p_t, u^p_t} (x_t - x^p_t) + \nabla_u f(x, u)|_{x^p_t, u^p_t} (-L_t)(x_t - x^p_t) \\ &= A_t (x_t - x^p_t) + B_t (-L_t)(x_t - x^p_t) = D_t (x_t - x^p_t). \end{aligned}$$
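The chain-rule identity above, namely that the Jacobian of the feedback-compensated map $g$ at the nominal point equals $A_t - B_t L_t$, can be checked numerically. The sketch below does this for a scalar system using central finite differences; the model `f`, the nominal point and the gain are hypothetical values chosen only for illustration, and the scalar case is merely a special case of the vector-valued derivation.

```python
import math

def f(x, u):
    """Hypothetical scalar process model, nonlinear in both state and control."""
    return x + 0.1 * math.sin(x) * u + 0.1 * u * u

def closed_loop_map(x, x_nom, u_nom, gain):
    """g(t, x) = f(x, u_nom - gain * (x - x_nom)), as in (15)."""
    return f(x, u_nom - gain * (x - x_nom))

def central_diff(fun, z, h=1e-6):
    """Central finite-difference approximation of d fun / d z."""
    return (fun(z + h) - fun(z - h)) / (2.0 * h)

# Arbitrary illustrative nominal point and feedback gain (assumptions).
x_nom, u_nom, gain = 0.7, 0.3, 0.8

A = central_diff(lambda x: f(x, u_nom), x_nom)   # df/dx at the nominal point
B = central_diff(lambda u: f(x_nom, u), u_nom)   # df/du at the nominal point
D_numeric = central_diff(
    lambda x: closed_loop_map(x, x_nom, u_nom, gain), x_nom)  # dg/dx
D_formula = A - B * gain                          # D_t = A_t - B_t L_t

if __name__ == "__main__":
    print(D_numeric, D_formula)
```

The two printed values should agree up to finite-difference error, and `closed_loop_map` evaluated at the nominal state returns `f(x_nom, u_nom)`, mirroring the observation that the nominal trajectory satisfies both (11) and the compensated dynamics.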
Therefore,
$$x_{t+1} - x^p_{t+1} = g(t, x_t) + \omega_t - g(t, x^p_t) = D_t (x_t - x^p_t) + \omega_t + o(e^{x}_t), \quad \text{as } e^{x}_t \to 0. \tag{16}$$

Validity of the linearization: Let us analyze the validity of (12) using the Wentzell-Freidlin theory discussed in Section II. Let us assume that the noise process is $\omega_t = \epsilon w_t$, where $w_t$ is a Wiener process as described in Section II, and $\epsilon > 0$. Now, for a time-varying system, the probability that the error $\|\tilde{x}_t\|$ is less than a given $\delta > 0$ can be calculated using large deviations theory. In particular, the discussion in Section II holds for process $g$. However, we require the function $g$ to satisfy a uniform Lipschitz continuity condition, for which uniform Lipschitz continuity of the process model $f$ is sufficient. This is because, if $\|f(x_1, u_1) - f(x_2, u_2)\| \le K_f (\|x_1 - x_2\| + \|u_1 - u_2\|)$, where $x_1, x_2 \in \mathbb{R}^{n_x}$ and $u_1, u_2 \in \mathbb{R}^{n_u}$, then, together with smoothness of the nominal trajectory (which is calculated as in (11)) on the interval $[0, K]$, we have the Lipschitz continuity of $g$ as well.

Effect of feedback on the linearization error: Note that before applying the feedback law, equation (9) depends on both $u$ and $\omega$. The influence of $\omega$ can be analyzed using large deviations theory; however, it is the feedback law that limits the error of linearization caused by the control actions and converts the control action error into the state error. Moreover, the feedback effectively changes the drift term of the diffusion process and affects the probability of the validity region through the action functional.

B. Main Results

In this section, we quantify the overall performance obtained from the separated design. The proofs are provided in the appendix.

Lemma 1. State Error Propagation: Let $\omega_t = \epsilon w_t$, where $w_t$ is a Gaussian process as described in Section II, and $\epsilon > 0$. Let the state error be $\tilde{x}_t = x_t - x^p_t$ for $t \ge 0$. Then, for $t \ge 0$ the non-recursive state error propagation, $\tilde{x}_{t+1}$, in terms of the independent variables, including process noise at each time step, can be written as follows:
$$\tilde{x}_{t+1} = \sum_{s=0}^{t} \tilde{D}^{\omega}_{s,t}\, \omega_s + o(\delta), \quad \text{as } \epsilon \to 0, \tag{17}$$
where we have:
$D_0 := A_0$;
$\tilde{D}_{t_1:t_2} = \prod_{t=t_1}^{t_2} D_t$ for $t_2 \ge t_1 \ge 0$ (with later time indices multiplying from the left, i.e., $D_{t_2} \cdots D_{t_1}$); otherwise, it is the identity matrix;
$\tilde{D}^{\omega}_{s,t} := \tilde{D}_{s+1:t}$, $0 \le s \le t-1$, $t \ge 1$; and
$\tilde{D}^{\omega}_{t,t} := \tilde{D}_{t+1:t} = I$, $t \ge 0$.

The following lemma follows directly by taking into account the feedback law in the result of Lemma 1.

Lemma 2. Control Error Propagation: Let $\omega_t = \epsilon w_t$, where $w_t$ is a Gaussian process as described in Section II, and $\epsilon > 0$. Let the control error be $\tilde{u}_t = u_t - u^p_t$ for $t \ge 0$. Then, for $t \ge 0$ the non-recursive control error propagation, $\tilde{u}_{t+1}$, in terms of the independent variables, including process noise at each time step, can be written as follows:
$$\tilde{u}_{t+1} = \sum_{s=0}^{t} L^{\omega}_{s,t+1}\, \omega_s + o(\delta), \quad \text{as } \epsilon \to 0,$$
where $L^{\omega}_{s,t+1} := -L_{t+1} \tilde{D}^{\omega}_{s,t}$, $t \ge 0$, $t \ge s \ge 0$. Moreover, the validity region of the above equation is the same as for (17) in Lemma 1.

Next, we linearize the cost function and provide the separation result for a fully observed system.

Linearization of the cost function: Using the Taylor approximation around the nominal trajectories of state and control actions yields
$$J = \underbrace{J^p + \sum_{t=0}^{K-1} (C^x_t \tilde{x}_t + C^u_t \tilde{u}_t) + C^x_K \tilde{x}_K}_{J_1} + o(e^{x,u}_J), \tag{18}$$
where we assume that the cost function is continuously differentiable.
Moreover:
$J^p := \sum_{t=0}^{K-1} c_t(x^p_t, u^p_t) + c_K(x^p_K)$ denotes the nominal cost;
$J_1 := J^p + \sum_{t=0}^{K-1} (C^x_t \tilde{x}_t + C^u_t \tilde{u}_t) + C^x_K \tilde{x}_K$ is the first order approximation of the cost function;
$\tilde{J}_1 := \sum_{t=0}^{K-1} (C^x_t \tilde{x}_t + C^u_t \tilde{u}_t) + C^x_K \tilde{x}_K$ is the first order error in the cost by our approximation scheme, so that $\tilde{J}_1 = J_1 - J^p$;
$C^x_t = \nabla_x c_t(x, u)|_{x^p_t, u^p_t}$;
$C^u_t = \nabla_u c_t(x, u)|_{x^p_t, u^p_t}$;
$C^x_K = \nabla_x c_K(x)|_{x^p_K}$; and
$e^{x,u}_J := \sum_{t=1}^{K-1} (\|\tilde{x}_t\| + \|\tilde{u}_t\|) + \|\tilde{x}_K\|$ is the linearization error.

Note that since the error term is in terms of state and control at all time steps, the probability of this equation holding true is equivalent to the probability of the latest time-step term still being in the vicinity of the nominal trajectory at that step. Therefore, the probability that this last equation is valid can be calculated as the probability that $\|\tilde{x}_K\| \le \delta$ for $\delta > 0$, which is given by equation (7) for process $g$ defined in equation (15) and using $D_K = \mathrm{cl}(B^c_\delta(x^p_K))$ in Theorem 1. As a result, all the previous steps will remain within the same tube around the nominal trajectory and the total error will still be of the order of $\delta$. Therefore, given this probability, we have:
$$J = J^p + \sum_{t=0}^{K-1} (C^x_t \tilde{x}_t + C^u_t \tilde{u}_t) + C^x_K \tilde{x}_K + o(\delta), \quad \text{as } \epsilon \to 0. \tag{19}$$
Hence, $J - J_1 = o(\delta)$ as $\epsilon \to 0$ with the probability given in equation (7) for $t = K$.

Next, we provide the main result regarding the expected first order error of the cost function.

Theorem 3. First Order Cost Function Error: Let us denote the first order cost function error by $\tilde{J}_1$. Given that the process noises are zero mean i.i.d., under a first-order approximation for the small noise paradigm, the stochastic cost function is dominated by the nominal part of the cost function. Moreover, the expected first-order error is zero. That is,
$$\mathbb{E}[\tilde{J}_1] = 0.$$
Moreover, if the process noise at each time step is distributed according to a zero mean Gaussian distribution, then $\tilde{J}_1$ also has a zero mean Gaussian distribution.
The above result says that the random perturbation in the stochastic running cost from the nominal is zero mean if the linearization holds. From Wentzell-Freidlin theory, we have already established that the linearization holds with a probability exponentially close to 1 as $\epsilon \to 0$. Hence, this implies that the expected stochastic cost is equal to the nominal cost with a very high probability as $\epsilon \to 0$. Therefore, it follows that the open loop nominal design can be done separately from the closed loop design, as summarized below.

Corollary 1. Separation of the Closed Loop and Open Loop Designs Under Small Noise: Based on Theorem 3, under the small noise paradigm, as $\epsilon \to 0$, the design of the feedback law can be done separately from the design of the open loop optimized trajectory. Furthermore, this result holds with a probability that exponentially tends to one as $\epsilon \to 0$.
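The zero-mean claim of Theorem 3 can be illustrated by sampling the first order cost error under the linearized, feedback-compensated error dynamics. The sketch below is a scalar Monte Carlo check under stated assumptions: the sequences `A`, `B`, `L`, the cost gradients `Cx`, `Cu`, `CxK`, the horizon and the noise level are all hypothetical numbers, and the error is propagated with the linear recursion of (14) rather than with a nonlinear model.

```python
import math
import random
import statistics

def first_order_cost_errors(D, L, Cx, Cu, CxK, eps, n_samples, seed=0):
    """Sample J1_tilde = sum_t (Cx[t]*x_err + Cu[t]*u_err) + CxK*x_err_K
    under x_err_{t+1} = D[t]*x_err_t + eps*w_t, u_err_t = -L[t]*x_err_t,
    with x_err_0 = 0 and w_t standard Gaussian (scalar case)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        x_err, total = 0.0, 0.0
        for t in range(len(D)):
            u_err = -L[t] * x_err                     # feedback law (13)
            total += Cx[t] * x_err + Cu[t] * u_err    # running first-order terms
            x_err = D[t] * x_err + eps * rng.gauss(0.0, 1.0)
        samples.append(total + CxK * x_err)           # terminal first-order term
    return samples

# Hypothetical linearization data along a nominal trajectory (K = 5).
A = [1.0, 0.9, 1.1, 1.0, 0.95]
B = [0.5, 0.5, 0.5, 0.5, 0.5]
L = [0.0, 0.6, 0.6, 0.6, 0.6]        # L[0] is irrelevant because x_err_0 = 0
D = [a - b * l for a, b, l in zip(A, B, L)]
Cx = [0.3, -0.2, 0.5, 0.1, 0.4]
Cu = [0.1, 0.2, -0.1, 0.3, 0.2]
CxK = 2.0

N = 20000
samples = first_order_cost_errors(D, L, Cx, Cu, CxK, eps=0.1, n_samples=N)
mean = statistics.fmean(samples)
std = statistics.stdev(samples)

if __name__ == "__main__":
    print("sample mean:", mean, "standard error:", std / math.sqrt(N))
```

The sample mean should be statistically indistinguishable from zero, i.e., small compared with the standard error of the estimate, which is what linearity of expectation predicts for zero-mean noise.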

Remark: This result means that under a small noise assumption and assuming the existence of a feedback law (with LTV gain, which is designed separately), the open loop nominal trajectory of the system can be designed by replacing the stochastic equations with their nominal counterparts. This design tends to the optimal design with probability one (for the general class of Gaussian processes that are considered) as the intensity of noise tends to zero.

Remark: It should be mentioned that while our general problem definition has only the process model as dynamics, other constraints on state or control can be considered as long as they share the same smoothness properties as the cost function.

Remark: It is worth mentioning that although we have considered diffusion processes with additive white Gaussian noise, the theory in fact holds for a larger class of problems. One can appeal to more general results in [15] for time-inhomogeneous diffusion processes with non-additive white noise. In such cases, the action functional is usually calculated through the Legendre transform.

Remark: As mentioned before, although we proved the results of this section for discrete time systems, one can prove the continuous-time versions of our results. This can be done, for instance, by letting the sampling time tend to zero, while utilizing results such as Fubini's theorem along with the analogous conditional expectation theorem for Itô stochastic integrals to exchange the integrations with the expectation. It should be mentioned that there also exists a discrete-time counterpart of the Wentzell-Freidlin theory, as provided in [15].

Remark: Higher order designs and analysis of the cost function (or even the dynamics) are possible using an approach similar to the one provided in this paper.

Remark: In Ref. [17], for a special case of nonlinear systems where the process model is linear in the control variable, i.e., $f(x_t, u_t) = f_1(x_t) + f_2(x_t) u_t$, three results are proven.
The first result concerns the $\epsilon$-optimality of the optimal deterministic law under convexity of $J$ in the control (i.e., $v^T (\nabla^2_{u,u} J) v \ge 0$, $\forall v$), and additional smoothness and regularity conditions. The second result concerns the $\epsilon^2$-optimality of the optimal deterministic law under a stronger convexity condition of $J$ in the control (i.e., $v^T (\nabla^2_{u,u} J) v \ge c(\|u\|) \|v\|^2$, $\forall v$, where $c(\cdot) : \mathbb{R} \to \mathbb{R}$ is a monotonically nonincreasing positive function), and some smoothness and regularity conditions. The third result concerns the $\epsilon$-optimality of the optimal deterministic sequence under the latter condition. Our result, on the other hand, provides the $\epsilon$-optimality of the proposed design approach for a broader class of processes $f(x_t, u_t)$ with nonlinear dependence on the control variable and more general cost functions (most importantly, it does not assume linear dependence on the control sequence). In fact, our simulations are performed for a car-like robot with nonlinear dependence on the control variables.

V. T-LQR: TRAJECTORY-OPTIMIZED LQR

In this section, we provide a design scheme based on the theory provided in the previous sections. This approach aims at designing an LQR controller with an optimal nominal underlying trajectory based on the separation result of Corollary 1 and Theorem 3. Accordingly, we term this method the Trajectory-optimized LQR (T-LQR).

Problem 2. Trajectory Planning Problem: Solve for the optimal trajectory:
$$\min_{u^p_{0:K-1}} \sum_{t=0}^{K-1} c_t(x^p_t, u^p_t) + c_K(x^p_K) \tag{20a}$$
$$\text{s.t. } x^p_{t+1} = f(x^p_t, u^p_t), \quad 0 \le t \le K-1, \quad x^p_0 = x_0. \tag{20b}$$

Optimized nominal trajectory: Problem 2 is a deterministic problem aiming for the best nominal performance. This problem utilizes the first order approximation of the cost function and optimizes the underlying nominal trajectory used in the design of the feedback law. We will denote the resulting optimized nominal trajectory of Problem 2 by $\{x^o_t\}_{t=0}^{K}$, $\{u^o_t\}_{t=0}^{K-1}$.
Feedback control: The resulting trajectory from the optimization problem is optimized in terms of control effort and other constraints, such as a terminal constraint. Now, using the separation result, an LQR controller is designed to track the optimized nominal trajectory. Therefore, the LQR cost is designed for the tracking error $x_t - x^o_t$. The resulting control policy is a feedback policy with LTV gain, and the evolution of $x_t$ is obtained from the original equation of the process model during the execution. Although we utilize an LQR controller, it is important to note that the separation result only assumes a linear form of feedback, and other types of designs [24] can be used as well.

Linearization of system equations: For simplicity, we denote the Jacobian matrices and every other variable associated with the optimized nominal trajectory with a superscript $o$. The Jacobians are $A^o_t = \nabla_x f(x, u)|_{x^o_t, u^o_t}$ and $B^o_t = \nabla_u f(x, u)|_{x^o_t, u^o_t}$.

Problem 3. LQR Problem: Given the optimized nominal trajectory as $\{x^o_t\}_{t=0}^{K}$ and $\{u^o_t\}_{t=0}^{K-1}$, and a planning horizon of $K > 0$, solve the following LQR problem to track the nominal trajectory:
$$\min_{u_{0:K-1}} \sum_{t=1}^{K} \big[(x_t - x^o_t)^T W^x_t (x_t - x^o_t) + (\tilde{u}^o_{t-1})^T W^u_t \tilde{u}^o_{t-1}\big]$$
$$\text{s.t. } x_{t+1} - x^o_{t+1} = A^o_t (x_t - x^o_t) + B^o_t \tilde{u}^o_t, \quad 0 \le t \le K-1, \tag{21}$$
where $\tilde{u}^o_t = u_t - u^o_t$ and $W^u_t, W^x_t \succ 0$ are positive-definite matrices.

Control policy: The resulting control policy of Problem 3 is a feedback policy as follows [1]:
$$\tilde{u}^o_t = -L^o_t (x_t - x^o_t),$$
where the linear feedback gain $L^o_t$ is:
$$L^o_t = (W^u_t + (B^o_t)^T P^f_{t+1} B^o_t)^{-1} (B^o_t)^T P^f_{t+1} A^o_t,$$
and the matrix $P^f_t$ is the result of backward iteration of the dynamic Riccati equation
$$P^f_{t-1} = (A^o_t)^T P^f_t A^o_t - (A^o_t)^T P^f_t B^o_t (W^u_t + (B^o_t)^T P^f_t B^o_t)^{-1} (B^o_t)^T P^f_t A^o_t + W^x_t,$$
which is solvable with the terminal condition $P^f_K = W^x_K$.

Remark: The computations involved in Problem 2 are of the order of $O(K n_x^2)$ per iteration for typical smooth dynamics. Let us assume $O(l)$ is the order of the number of iterations in the optimizer until convergence. The LQR policy calculation is of the order of $O(K n_x^3)$. Therefore, overall, the design approach based on the separation principle of Corollary 1 is $O(l K n_x^2 + K n_x^3)$ for a typical process model (such as our example in the next section). The low computational complexity of this approach results in fast replanning in case of deviations during execution. This makes the scheme well suited to online applications.

Remark: For the specific class of problems considered in [17] (see the last remark in Section IV), the design approach of [17] requires calculation of the optimal control law through intractable dynamic programming. In contrast, the proposed design approach in this paper utilizes the tractable solution of a Maximum Principle problem followed by an LQR design. Even implementing the result of [17] through a model predictive approach would require more computation by at least a factor of the planning horizon (from $O(K)$ to $O(K^2)$). In such an implementation, the online computations of the approach of [17] require $O(l K n_x^2)$ calculations compared to only $O(n_x^2)$ calculations in our algorithm.

Fig. 1. Optimized vs. a typical execution trajectory for a car-like robot. (a) Optimized trajectory of Problem 2. (b) A typical ground truth trajectory with noise standard deviation equal to 10% of the maximum control signal.

VI. EXAMPLE

Let us consider a car-like four-wheel robot with process model [25]:
$$\dot{x} = v \cos(\theta), \quad \dot{y} = v \sin(\theta), \quad \dot{\theta} = \frac{v}{L} \tan(\phi), \tag{22}$$
where $(x, y, \theta)$ is the state, and $(v, \phi)$ is the control input.
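To make the second stage of the T-LQR pipeline concrete for the model (22), the sketch below discretizes the car-like dynamics with a forward-Euler step, linearizes them along a nominal trajectory by finite differences, and runs a finite-horizon Riccati backward pass to obtain time-varying gains. Several things here are assumptions rather than the authors' implementation: the wheelbase value, the start state, the hand-picked (not optimized) nominal control sequence standing in for the solution of Problem 2, the constant identity weights, and the standard textbook indexing of the recursion, which may differ from the index convention in Section V.

```python
import math

DT = 0.7          # time discretization period from the example
WHEELBASE = 1.0   # assumed value of L in (22); not specified here
K = 20            # planning horizon from the example

def car_step(x, u):
    """One forward-Euler step of the car-like model (22)."""
    px, py, theta = x
    v, phi = u
    return [px + DT * v * math.cos(theta),
            py + DT * v * math.sin(theta),
            theta + DT * v / WHEELBASE * math.tan(phi)]

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y, sign=1.0):
    return [[a + sign * b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def jacobians(x, u, h=1e-6):
    """Central-difference Jacobians A = df/dx (3x3) and B = df/du (3x2)."""
    def column(args, i, wrt_state):
        hi, lo = list(args), list(args)
        hi[i] += h
        lo[i] -= h
        fp = car_step(hi, u) if wrt_state else car_step(x, hi)
        fm = car_step(lo, u) if wrt_state else car_step(x, lo)
        return [(p - m) / (2 * h) for p, m in zip(fp, fm)]
    A_cols = [column(x, i, True) for i in range(3)]
    B_cols = [column(u, i, False) for i in range(2)]
    return transpose(A_cols), transpose(B_cols)

def lqr_gains(A_list, B_list, Wx, Wu):
    """Backward Riccati pass; returns the LTV gains L_0, ..., L_{K-1}."""
    P = Wx                                    # terminal condition P_K = W^x
    gains = [None] * len(A_list)
    for t in reversed(range(len(A_list))):
        A, B = A_list[t], B_list[t]
        BtP = matmul(transpose(B), P)
        L = matmul(inv2(add(Wu, matmul(BtP, B))), matmul(BtP, A))
        AtP = matmul(transpose(A), P)
        P = add(add(matmul(AtP, A), matmul(matmul(AtP, B), L), -1.0), Wx)
        gains[t] = L
    return gains

# Placeholder nominal trajectory: hand-picked controls, NOT the optimizer output.
x_nom = [[0.0, 0.0, 0.0]]
u_nom = [[0.5, 0.1] for _ in range(K)]
for u in u_nom:
    x_nom.append(car_step(x_nom[-1], u))

jac = [jacobians(x, u) for x, u in zip(x_nom, u_nom)]
Wx = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Wu = [[1.0, 0.0], [0.0, 1.0]]
gains = lqr_gains([a for a, _ in jac], [b for _, b in jac], Wx, Wu)
```

During execution, the control at step `t` would then be `u_nom[t]` minus `gains[t]` applied to the tracking error, with the state propagated by the nonlinear model plus noise. Replacing the placeholder controls with the output of a trajectory optimizer is what would turn this into the full two-stage design.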
We suppose that $|\phi| < \phi_{\max} = \pi/2$, $v \le v_{\max} = 0.6$, x 0 = ( 1.5, 0.5, 0), $K = 20$, and the time discretization period is 0.7. We incorporate the control constraints and the terminal goal, x g = ( 0.5, 1, 0), in the cost function. Last, the initial control sequence used for the optimization is just a sequence of zero inputs. The process noise is additive zero-mean Gaussian noise with a standard deviation equal to $\epsilon \max_t \{\|u_t\|_2\}$. Figure 1a shows the result of the optimization Problem 2, whereas Fig. 1b shows a typical ground truth trajectory with $\epsilon = 0.1$. We have used MATLAB 2016b and its fmincon solver for simulations.

In the next experiment, we increase $\epsilon$ from to , in step sizes of . For each value of $\epsilon$, we execute the resulting policy 100 times and compute the average Normalized Mean Squared Error (NMSE) as:
$$\text{Average NMSE (\%)} = \frac{1}{100} \sum_{j=1}^{100} \frac{\|x^p - x^j\|_2^2}{\|x^p\|_2^2} \times 100, \tag{23}$$
where $x^p$ indicates the planned trajectory and $x^j$ indicates the ground truth trajectory at the $j$th experiment. The results of this experiment are shown in Fig. 2a, where the evolution of the average NMSE is depicted for various values of the noise level $\epsilon$. As indicated in this figure, as $\epsilon \to 0$, the average NMSE tends to zero at an exponential rate, which is consistent with the theory developed in Section II. Moreover, this figure indicates that through the feedback compensation, moderate noise levels can be tolerated, rather than just small levels. Last, Fig. 2b depicts the evolution of the average NMSE for an experiment with the same setting as in Fig. 2a, except that only the open-loop planned control sequence is applied during execution. As predicted by the theory, the error still decreases exponentially as the noise level decreases. However, the rate of convergence is about one-fifth of the previous rate. The results of Fig. 2 show that our design can be used for relatively moderate levels of noise, using the power of feedback.

Fig. 2. Evolution of average NMSE as $\epsilon \to 0$ for a feedback compensated and open loop system with the same nominal trajectories. (a) Feedback-compensated system. (b) Open-loop system.

Remark: In practice, if at any point in the execution the calculated error exceeds a threshold, replanning can be triggered quickly due to the low computational burden of the optimization problem.

VII. CONCLUSION

We have presented a design approach that separates the design of the open-loop nominal trajectory and the closed-loop feedback policy for fully-observed nonlinear stochastic systems with Gaussian distributions. We have shown that under a small-noise assumption, the stochastic cost function is dominated by the nominal part of the cost function and the expected first order error is zero. This results in a reliable rapid planning method that is provably near-optimal. It can be used in robotic path planning and control, and potentially in other applications.

REFERENCES

[1] P. R. Kumar and P. P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control. Englewood Cliffs, NJ: Prentice-Hall.
[2] D. P. Bertsekas, Dynamic programming and optimal control. Athena Scientific Belmont, MA, 1995, vol. 1, no. 2.
[3] H. Kushner and P. G. Dupuis, Numerical methods for stochastic control problems in continuous time. Springer Science & Business Media, 2013, vol. 24.
[4] R. Bellman, Dynamic Programming, 1st ed. Princeton, NJ, USA: Princeton University Press.
[5] C.-S. Chow and J. N. Tsitsiklis, The complexity of dynamic programming, Journal of Complexity, vol. 5, no. 4.
[6] D. Mayne, Robust and stochastic MPC: Are we going in the right direction? IFAC-PapersOnLine, vol. 48, no. 23, pp. 1–8.
[7] D. Q. Mayne, Model predictive control: Recent developments and future promise, Automatica, vol. 50, no. 12.
[8] J. N. Tsitsiklis, Computational complexity in Markov decision theory, HERMIS-An International Journal of Computer Mathematics and its Applications, vol. 9.
[9] Y. Le Tallec, Robust, risk-sensitive, and data-driven control of Markov decision processes, Ph.D. dissertation, Massachusetts Institute of Technology.
[10] R. E. Kopp, Pontryagin maximum principle, Mathematics in Science and Engineering, vol. 5.
[11] D. H. Jacobson and D. Q. Mayne, Differential dynamic programming.
[12] E. Theodorou, Y. Tassa, and E. Todorov, Stochastic differential dynamic programming, in American Control Conference (ACC), IEEE, 2010.
[13] E. Todorov and W. Li, A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, in American Control Conference, Proceedings of the IEEE, 2005.
[14] M. I. Freidlin and A. D. Wentzell, Random Perturbations. New York, NY: Springer US, 1984.
[15] A. D. Wentzell, Limit theorems on large deviations for Markov stochastic processes. Springer Science & Business Media, 2012, vol. 38.
[16] A. Dembo and O. Zeitouni, Large deviations techniques and applications. Springer Science & Business Media, 2009, vol. 38.
[17] W. H. Fleming, Stochastic control for small noise intensities, SIAM Journal on Control, vol. 9, no. 3.
[18] H. Cruz-Suárez and R. Ilhuicatzi-Roldán, Stochastic optimal control for small noise intensities: The discrete-time case, WSEAS Trans. Math., vol. 9, no. 2, Feb.
[19] J. D. Perkins and R. W. H. Sargent, Nonlinear optimal stochastic control - some approximations when the noise is small. Berlin, Heidelberg: Springer Berlin Heidelberg, 1976.
[20] J. Perkins and R. Sargent, Nonlinear optimal stochastic control - some approximations when the noise is small, in IFIP Technical Conference on Optimization Techniques. Springer, 1975.
[21] C. J. Holland, An approximation technique for small noise open-loop control problems, Optimal Control Applications and Methods, vol. 2, no. 1.
[22] S. S. Varadhan, Large deviations and applications. SIAM, 1984, vol. 46.
[23] D. Bertsekas, Dynamic Programming and Optimal Control: 3rd Ed. Athena Scientific.
[24] P. Kumar et al., Control: a perspective, Automatica, vol. 50, no. 1, pp. 3–43.
[25] S. Lavalle, Planning algorithms. Cambridge University Press.

APPENDIX

Proof. Lemma 1: State Error Propagation. Ignoring the validity region,
$$\tilde{x}_{t+1} = A_t \tilde{x}_t + B_t \tilde{u}_t + \omega_t = (A_t - B_t L_t)\tilde{x}_t + \omega_t =: D_t \tilde{x}_t + \omega_t = \tilde{D}_{0:t}\, \tilde{x}_0 + \sum_{r=0}^{t} \tilde{D}_{r+1:t}\, \omega_r =: \sum_{s=0}^{t} \tilde{D}^{\omega}_{s,t}\, \omega_s.$$
Note that using the definition of $\tilde{x}_t$, the initial state error is $\tilde{x}_0 = x_0 - x^p_0 = x_0 - x_0 = 0$. Likewise, the state error at time-step 1 is $\tilde{x}_1 = A_0 \tilde{x}_0 + \omega_0 = \omega_0$. Moreover, these errors are consistent with the lemma using the definitions provided and the indicator function notation. Now, since this equation utilizes the linearizations at all steps, its error is within $o(\delta)$ if $\|\tilde{x}_s\| \le \delta$ for all $s \le t$.
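Moreover, the probability that equation (17) is valid (i.e., the linearizations are valid with $o(\delta)$ error for the entire trajectory up to time $t$) is the same as the probability that the linearization is valid on the last step (i.e., step $t$). This is due to Wentzell-Freidlin theory. Now, the probability that $\|\tilde{x}_t\| \ge \delta$ is given by (7) for the process $g$ defined in (15), and $\mathcal{D}_t = \mathrm{cl}(B_\delta(x^p_t))$ for Theorem 1. Therefore, as $\epsilon \to 0$, the probability of $\|x_t - x^p_t\| \ge \delta$ is calculated as in equation (7), which tends exponentially to zero. Last, note that through Wentzell-Freidlin theory, the validity of the linearization only depends on the aggregated effect of the random perturbations at steps prior to $t$, and there is no need to individually bound the noise at each step.

Proof. Lemma 2: Control Error Propagation

Replacing the state error in the control law: using the result of Lemma 1, we can rewrite $\tilde{u}_{t+1}$ for $t \ge 1$ as follows:

$$\tilde{u}_{t+1} = -L_{t+1}\tilde{x}_{t+1} = -L_{t+1}\sum_{s=0}^{t} D^{\omega}_{s,t}\,\omega_s =: -\sum_{s=0}^{t} L^{\omega}_{s,t+1}\,\omega_s.$$

Note that $\tilde{u}_0 = 0$, and the last formula is consistent with this error using the definitions provided in the lemma.

Proof. Theorem 3: Cost Function Error

Using the linearization process described previously, we can write the cost function error as

$$\mathbb{E}[\tilde{J}_1] = \mathbb{E}\Big[\sum_{t=0}^{K-1}\big(C^x_t \tilde{x}_t + C^u_t \tilde{u}_t\big) + C^x_K \tilde{x}_K\Big].$$

By the assumption that the process noise is zero-mean i.i.d., $\mathbb{E}[\omega_t] = 0$ for all $t$. Moreover, $\tilde{x}_0 = 0$, which follows from the fact that $x^p_0 = x_0$. Therefore, using the linearity of the expectation operator and Lemmas 1 and 2, we can rewrite $\mathbb{E}[\tilde{J}_1]$ as follows:

$$\begin{aligned}
\mathbb{E}[\tilde{J}_1] &= \sum_{t=0}^{K-1}\big(C^x_t\,\mathbb{E}[\tilde{x}_t] + C^u_t\,\mathbb{E}[\tilde{u}_t]\big) + C^x_K\,\mathbb{E}[\tilde{x}_K] \\
&= \sum_{t=0}^{K-1} C^x_t\,\mathbb{E}\Big[\sum_{s=0}^{t-1} D^{\omega}_{s,t-1}\,\omega_s\Big] - \sum_{t=0}^{K-1} C^u_t\,\mathbb{E}\Big[\sum_{s=0}^{t-1} L^{\omega}_{s,t}\,\omega_s\Big] + C^x_K\,\mathbb{E}\Big[\sum_{s=0}^{K-1} D^{\omega}_{s,K-1}\,\omega_s\Big] \\
&= \sum_{t=0}^{K-1}\sum_{s=0}^{t-1} \mathbb{E}\big[(C^x_t D^{\omega}_{s,t-1} - C^u_t L^{\omega}_{s,t})\,\omega_s\big] + \sum_{s=0}^{K-1} \mathbb{E}\big[C^x_K D^{\omega}_{s,K-1}\,\omega_s\big] \\
&=: \sum_{t=0}^{K}\sum_{s=0}^{t-1} \mathbb{E}\big[(w_{s,t})^T \omega_s\big] = \sum_{t=0}^{K}\sum_{s=0}^{t-1}\sum_{j=1}^{n_u} w^j_{s,t}\,\mathbb{E}[\omega^j_s] = 0,
\end{aligned}$$

where

$$w_{s,t} := (C^x_t D^{\omega}_{s,t-1} - C^u_t L^{\omega}_{s,t})^T, \quad t-1 \ge s \ge 0, \; K-1 \ge t \ge 0,$$

$$w_{s,K} := (C^x_K D^{\omega}_{s,K-1})^T, \quad K-1 \ge s \ge 0.$$

Moreover, $w_{s,t} := (w^1_{s,t}, \ldots, w^{n_u}_{s,t})^T$ is a vector of the same size as $\omega_s = (\omega^1_s, \ldots, \omega^{n_u}_s)^T$.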
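The three proofs above can be checked numerically on a small example. The following is a minimal sketch, not the authors' implementation: the horizon `K`, the matrices `A`, `B`, `L`, the cost gradients `Cx`, `Cu`, and all helper names are hypothetical values chosen only for illustration. It simulates the linearized closed-loop error recursion of Lemma 1 with the feedback of Lemma 2, compares the result with the closed-form sum over past noise terms, and confirms that the first-order cost error of Theorem 3 is linear (hence odd) in the noise, so that it averages to zero over any zero-mean noise distribution.

```python
import random

# Hypothetical problem data (assumptions, not taken from the paper).
K = 5                                                   # horizon
A = [[[1.0, 0.1], [0.0, 1.0 - 0.02 * t]] for t in range(K)]
B = [[[0.0], [0.1]] for _ in range(K)]
L = [[[0.5 + 0.1 * t, 1.0]] for t in range(K)]          # feedback gains L_t
Cx = [[1.0, 0.5 * t] for t in range(K + 1)]             # cost gradients w.r.t. state
Cu = [[0.2] for _ in range(K)]                          # cost gradients w.r.t. control


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_mul(P, Q):
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Closed-loop matrices D_t = A_t - B_t L_t.
D = [[[a - b for a, b in zip(ra, rb)]
      for ra, rb in zip(A[t], mat_mul(B[t], L[t]))] for t in range(K)]


def d_omega(s, t):
    """Product D_t D_{t-1} ... D_{s+1}; the identity when s == t."""
    P = [[1.0, 0.0], [0.0, 1.0]]
    for r in range(s + 1, t + 1):
        P = mat_mul(D[r], P)
    return P


def simulate(noise):
    """Run the error recursion with u_t = -L_t x_t, starting from x_0 = 0."""
    x = [0.0, 0.0]
    xs, us = [x], []
    for t in range(K):
        u = [-v for v in mat_vec(L[t], x)]
        x = vec_add(vec_add(mat_vec(A[t], x), mat_vec(B[t], u)), noise[t])
        xs.append(x)
        us.append(u)
    return xs, us


def closed_form_state(noise, t):
    """State error at step t+1 as the sum over s <= t of d_omega(s, t) w_s."""
    x = [0.0, 0.0]
    for s in range(t + 1):
        x = vec_add(x, mat_vec(d_omega(s, t), noise[s]))
    return x


def first_order_cost(xs, us):
    """First-order cost error: running terms plus the terminal term."""
    running = sum(dot(Cx[t], xs[t]) + dot(Cu[t], us[t]) for t in range(K))
    return running + dot(Cx[K], xs[K])


rng = random.Random(0)
noise = [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(K)]
xs, us = simulate(noise)

# Lemma 1: recursion and closed form agree at every step.
max_gap = max(abs(a - b) for t in range(K)
              for a, b in zip(xs[t + 1], closed_form_state(noise, t)))

# Theorem 3: the cost error is odd in the noise, so antithetic pairs cancel.
J_plus = first_order_cost(xs, us)
J_minus = first_order_cost(*simulate([[-w for w in row] for row in noise]))

print("max closed-form gap:", max_gap)
print("J(w) + J(-w):", J_plus + J_minus)
```

The antithetic check is used instead of a Monte Carlo average because it is deterministic: if the cost error is linear in the noise, then its values at a noise sample and at the negated sample cancel, which is the mechanism behind the zero expectation.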
