arxiv: v1 [cs.sy] 3 Sep 2015


Model Based Reinforcement Learning with Final Time Horizon Optimization

Wei Sun, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 333-, USA. wsun4@gatech.edu
Evangelos Theodorou, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 333-, USA. evangelos.theodorou@ae.gatech.edu
Panagiotis Tsiotras, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 333-, USA. tsiotras@gatech.edu

Abstract

We present one of the first algorithms for model based reinforcement learning and trajectory optimization with a free final time horizon. Grounded in optimal control theory and dynamic programming, we derive a set of backward differential equations that propagate the value function and provide the optimal control policy and the optimal time horizon. The resulting policy generalizes previous results in model based trajectory optimization. Our analysis shows that the proposed algorithm recovers the theoretical optimal solution on a low-dimensional linear problem. Finally, we provide application results on nonlinear systems.

1 Introduction

Trajectory optimization is one of the most active areas of research in machine learning and control theory, with a plethora of applications in robotics, autonomous systems and computational neuroscience. Among the different methodologies, Differential Dynamic Programming (DDP) is a model based reinforcement learning algorithm that relies on linear approximations of the dynamics and quadratic approximations of the cost function along nominal trajectories. Although more than four decades have passed since the fundamental work by Jacobson and Mayne on Differential Dynamic Programming [1], DDP remains a main ingredient of current research on trajectory optimization and model based reinforcement learning.
In the NIPS community, recently published state-of-the-art methods on trajectory optimization use DDP to perform guided policy search [2] and data-efficient probabilistic trajectory optimization [3]. Earlier work on DDP includes minimax [4], control-limited [5], receding horizon [6, 7], and stochastic optimal control formulations [8, 9]. Despite all of this research on trajectory optimization using model based reinforcement learning methods such as DDP, there has not been any effort towards the development of model based trajectory optimization algorithms in which the time horizon is not specified a priori. The time horizon is one of the important free tuning parameters in trajectory optimization algorithms and in most cases is manually tuned based on the experience of the engineer.

In this paper we present a new algorithm for model based reinforcement learning in which optimization is performed with respect to both the control and the time horizon. While free time horizon DDP was initially derived by Jacobson and Mayne in [1], the resulting algorithm is not implementable

and it relies on the assumption that the initialization of the algorithm starts close to the optimal control solution. This will become clearer in the next section, where we present our analysis of free final time model based trajectory optimization.

2 Problem Formulation and Analysis

We consider model based reinforcement learning problems in which optimization occurs with respect to the control and the time horizon. In mathematical terms these problems are formulated as follows:

$$V(x(t_0),t_0;\nu,t_f)=\min_{u(\cdot)}J(x(\cdot),u(\cdot))=\min_{u(\cdot)}\left[\Phi(x(t_f),\nu,t_f)+\int_{t_0}^{t_f}L(x(t),u(t),t)\,dt\right],\tag{1}$$

where the term $\Phi(x(t_f),\nu,t_f)$ is defined as $\Phi(x(t_f),\nu,t_f)=\phi(x(t_f),t_f)+\nu^T\psi(x(t_f),t_f)$, $\phi(x(t_f),t_f)$ is the terminal cost, $\psi(x(t_f),t_f)$ is the terminal constraint and $\nu$ is the corresponding Lagrange multiplier. $L(x(t),u(t),t)$ is the running cost accumulated along the time horizon $t_f$, which is not specified a priori. The cost function $J(x(\cdot),u(\cdot))$ in (1) is minimized subject to the dynamics

$$\frac{dx(t)}{dt}=F(x(t),u(t),t),\qquad x_0=x(t_0),\tag{2}$$

where $x\in\mathbb{R}^n$ is the state and $u\in\mathbb{R}^m$ is the control. Note that the value function $V(x(t_0),t_0;\nu,t_f)$ is now a function of the Lagrange multiplier $\nu$ and the terminal time $t_f$. This is important for the derivation of the free time horizon algorithm, since expansions of the value function are computed not only with respect to the nominal control and state trajectories but also with respect to the nominal $\nu$ and $t_f$.

2.1 Derivation of Differential Dynamic Programming (DDP) with Free Final Time

Our analysis and derivation of free time horizon model based reinforcement learning is in continuous time. As shown below, we derive a set of backward ordinary differential equations that back-propagate the value function along the nominal trajectory.
In particular, given a nominal trajectory $(\bar{x}(\cdot),\bar{u}(\cdot))$ with nominal Lagrange multiplier $\bar{\nu}$ and terminal time $\bar{t}_f$, we start our analysis with the linearization of the dynamics. Writing

$$\frac{dx(t)}{dt}=F(\bar{x}(t)+\delta x(t),\bar{u}(t)+\delta u(t),t),$$

we obtain, to first order,

$$\frac{d\,\delta x(t)}{dt}=F_x(\bar{x}(t),\bar{u}(t),t)\,\delta x(t)+F_u(\bar{x}(t),\bar{u}(t),t)\,\delta u(t).\tag{3}$$

All the quantities in the derivation that follows are evaluated at $(\bar{x}(t),\bar{u}(t),\bar{\nu},\bar{t}_f)$ unless otherwise specified. Since our derivation is in continuous time, we consider the corresponding Hamilton-Jacobi-Bellman (HJB) equation:

$$-\frac{\partial V(x(t),t;\nu,t_f)}{\partial t}=\min_{u(t)}\big[H(x(t),t;\nu,t_f)\big],\tag{4}$$

under the terminal condition $V(x(t_f),t_f;\nu,t_f)=\Phi(x(t_f),\nu,t_f)$, and with the Hamiltonian function $H(x(t),t;\nu,t_f)$ defined as follows:

$$H(x(t),t;\nu,t_f)=L(x(t),u(t),t)+V_x(x(t),t;\nu,t_f)^TF(x(t),u(t),t).\tag{5}$$

We take expansions of the terms on both sides of (4) around $(\bar{x},\bar{u},\bar{\nu},\bar{t}_f)$. Notice that this is in contrast with the derivation of free final time DDP in [1], in which the expansion takes place around $(x,u^*)$. Hence, the key assumption in [1] is that $\bar{u}$ is close to the optimal control $u^*$, which makes the algorithm hard to implement, especially when the optimal $t_f$ is not known a priori. Moreover, expansion of the Hamiltonian $H$ around $u^*$ yields $H_u|_{u=u^*}=0$, which results in dropping terms from the derivation.
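The linearization in (3) needs the Jacobians $F_x$ and $F_u$ along the nominal trajectory. The paper does not say how they are obtained; the following is a minimal sketch that approximates them with central differences. The helper names (`jacobian`, `linearize`) and the damped-pendulum dynamics are hypothetical and chosen only for illustration.

```python
import math

def jacobian(fun, z, eps=1e-6):
    """Central-difference Jacobian of fun at the point z (a list of floats)."""
    n = len(fun(z))
    J = [[0.0] * len(z) for _ in range(n)]
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += eps
        zm[j] -= eps
        fp, fm = fun(zp), fun(zm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return J

def pendulum(x, u):
    """Hypothetical example dynamics F(x, u): damped pendulum with torque input."""
    theta, omega = x
    return [omega, -9.8 * math.sin(theta) - 0.1 * omega + u[0]]

def linearize(F, x_bar, u_bar):
    """Return (F_x, F_u) evaluated at the nominal point (x_bar, u_bar)."""
    F_x = jacobian(lambda x: F(x, u_bar), x_bar)
    F_u = jacobian(lambda u: F(x_bar, u), u_bar)
    return F_x, F_u

# Linearize about theta = pi, omega = 0, u = 0.
F_x, F_u = linearize(pendulum, [math.pi, 0.0], [0.0])
```

In a DDP implementation this evaluation would be repeated at every time sample of the nominal trajectory; analytic or automatically differentiated Jacobians could be substituted without changing the rest of the derivation.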

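The left-hand side of (4) can be expanded as

$$-\frac{\partial}{\partial t}V(\bar{x}(t)+\delta x(t),t;\bar{\nu}+\delta\nu,\bar{t}_f+\delta t_f)\approx-\frac{\partial}{\partial t}\left(V(\bar{x}(t),t;\bar{\nu},\bar{t}_f)+V_x^T\delta x(t)+V_\nu^T\delta\nu+V_{t_f}\delta t_f+\frac{1}{2}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}^T\begin{bmatrix}V_{xx}&V_{x\nu}&V_{xt_f}\\ V_{\nu x}&V_{\nu\nu}&V_{\nu t_f}\\ V_{t_fx}&V_{t_f\nu}&V_{t_ft_f}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}\right).\tag{6}$$

Next we make use of the fact that

$$\frac{d(\cdot)}{dt}=\frac{\partial(\cdot)}{\partial t}+\frac{\partial(\cdot)}{\partial x}^TF(\bar{x}(t),\bar{u}(t),t)\quad\Longrightarrow\quad-\frac{\partial(\cdot)}{\partial t}=-\frac{d(\cdot)}{dt}+\frac{\partial(\cdot)}{\partial x}^TF(\bar{x}(t),\bar{u}(t),t).\tag{7}$$

Based on the equation above we have that

$$\begin{aligned}-\frac{\partial}{\partial t}V(\bar{x}(t)+\delta x(t),t;\bar{\nu}+\delta\nu,\bar{t}_f+\delta t_f)={}&-\frac{d}{dt}\left(V(\bar{x}(t),t;\bar{\nu},\bar{t}_f)+V_x^T\delta x(t)+V_\nu^T\delta\nu+V_{t_f}\delta t_f+\frac{1}{2}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}^T\begin{bmatrix}V_{xx}&V_{x\nu}&V_{xt_f}\\ V_{\nu x}&V_{\nu\nu}&V_{\nu t_f}\\ V_{t_fx}&V_{t_f\nu}&V_{t_ft_f}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}\right)\\ &+V_x^TF+\delta x(t)^TV_{xx}F+\delta\nu^TV_{\nu x}F+\delta t_fV_{t_fx}F\\ &+\frac{1}{2}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}^T\begin{bmatrix}V_{xxx}F&V_{x\nu x}F&V_{xt_fx}F\\ V_{\nu xx}F&V_{\nu\nu x}F&V_{\nu t_fx}F\\ V_{t_fxx}F&V_{t_f\nu x}F&V_{t_ft_fx}F\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}.\end{aligned}\tag{8}$$

The next step is to work with the expansion of the right-hand side of the HJB equation in (4). In particular, we have that

$$V_x(\bar{x}(t)+\delta x(t),t;\bar{\nu}+\delta\nu,\bar{t}_f+\delta t_f)\approx V_x(\bar{x}(t),t;\bar{\nu},\bar{t}_f)+V_{xx}\delta x(t)+V_{x\nu}\delta\nu+V_{xt_f}\delta t_f+\frac{1}{2}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}^T\begin{bmatrix}V_{xxx}&V_{xx\nu}&V_{xxt_f}\\ V_{x\nu x}&V_{x\nu\nu}&V_{x\nu t_f}\\ V_{xt_fx}&V_{xt_f\nu}&V_{xt_ft_f}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}.\tag{9}$$

In addition, the running cost and the dynamics are expanded as follows:

$$L(x(t),u(t),t)=L(\bar{x}(t)+\delta x(t),\bar{u}(t)+\delta u(t),t)\approx L(\bar{x}(t),\bar{u}(t),t)+L_x^T\delta x(t)+L_u^T\delta u(t)+\frac{1}{2}\begin{bmatrix}\delta x(t)\\ \delta u(t)\end{bmatrix}^T\begin{bmatrix}L_{xx}&L_{xu}\\ L_{ux}&L_{uu}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta u(t)\end{bmatrix},\tag{10}$$

$$F(x(t),u(t),t)=F(\bar{x}(t)+\delta x(t),\bar{u}(t)+\delta u(t),t)\approx F(\bar{x}(t),\bar{u}(t),t)+F_x\delta x(t)+F_u\delta u(t).\tag{11}$$

Therefore, the right-hand side of (4) can be expressed as

$$\begin{aligned}\min_{\delta u}\Bigg\{&L(\bar{x}(t),\bar{u}(t),t)+L_x^T\delta x(t)+L_u^T\delta u(t)+\frac{1}{2}\begin{bmatrix}\delta x(t)\\ \delta u(t)\end{bmatrix}^T\begin{bmatrix}L_{xx}&L_{xu}\\ L_{ux}&L_{uu}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta u(t)\end{bmatrix}\\ &+V_x^TF+V_x^TF_x\delta x(t)+V_x^TF_u\delta u(t)+\delta x(t)^TV_{xx}F+\delta x(t)^TV_{xx}F_x\delta x(t)+\delta x(t)^TV_{xx}F_u\delta u(t)\\ &+\delta\nu^TV_{\nu x}F+\delta\nu^TV_{\nu x}F_x\delta x(t)+\delta\nu^TV_{\nu x}F_u\delta u(t)+\delta t_fV_{t_fx}F+\delta t_fV_{t_fx}F_x\delta x(t)+\delta t_fV_{t_fx}F_u\delta u(t)\\ &+\frac{1}{2}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}^T\begin{bmatrix}V_{xxx}F&V_{x\nu x}F&V_{xt_fx}F\\ V_{\nu xx}F&V_{\nu\nu x}F&V_{\nu t_fx}F\\ V_{t_fxx}F&V_{t_f\nu x}F&V_{t_ft_fx}F\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}\Bigg\}+\text{H.O.T.}\end{aligned}\tag{12}$$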

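Equating (8) with (12) and cancelling like terms, we get

$$\begin{aligned}-\frac{d}{dt}\Bigg(&V+V_x^T\delta x(t)+V_\nu^T\delta\nu+V_{t_f}\delta t_f+\frac{1}{2}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}^T\begin{bmatrix}V_{xx}&V_{x\nu}&V_{xt_f}\\ V_{\nu x}&V_{\nu\nu}&V_{\nu t_f}\\ V_{t_fx}&V_{t_f\nu}&V_{t_ft_f}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}\Bigg)\\ =\min_{\delta u}\Bigg\{&L+L_x^T\delta x(t)+L_u^T\delta u(t)+\frac{1}{2}\begin{bmatrix}\delta x(t)\\ \delta u(t)\end{bmatrix}^T\begin{bmatrix}L_{xx}&L_{xu}\\ L_{ux}&L_{uu}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta u(t)\end{bmatrix}+V_x^TF_x\delta x(t)+V_x^TF_u\delta u(t)\\ &+\delta x(t)^TV_{xx}F_x\delta x(t)+\delta x(t)^TV_{xx}F_u\delta u(t)+\delta\nu^TV_{\nu x}F_x\delta x(t)+\delta\nu^TV_{\nu x}F_u\delta u(t)\\ &+\delta t_fV_{t_fx}F_x\delta x(t)+\delta t_fV_{t_fx}F_u\delta u(t)\Bigg\}.\end{aligned}\tag{13}$$

To find the $\delta u(t)$ that minimizes the right-hand side of (13), we take its derivative with respect to $\delta u(t)$ and set it to zero:

$$0=L_u+L_{uu}\delta u(t)+\Big(\tfrac{1}{2}L_{ux}+\tfrac{1}{2}L_{xu}^T+F_u^TV_{xx}\Big)\delta x(t)+F_u^TV_x+F_u^TV_{x\nu}\delta\nu+F_u^TV_{xt_f}\delta t_f.\tag{14}$$

The update law for the control is thus given by

$$\delta u(t)=l(t)+K_x(t)\delta x(t)+K_\nu(t)\delta\nu+K_{t_f}(t)\delta t_f,\tag{15}$$

where the terms $l(t)$, $K_x(t)$, $K_\nu(t)$ and $K_{t_f}(t)$ are defined as follows:

$$l(t)=-L_{uu}^{-1}(L_u+F_u^TV_x),\quad K_x(t)=-L_{uu}^{-1}\Big(\tfrac{1}{2}L_{ux}+\tfrac{1}{2}L_{xu}^T+F_u^TV_{xx}\Big),\quad K_\nu(t)=-L_{uu}^{-1}F_u^TV_{x\nu},\quad K_{t_f}(t)=-L_{uu}^{-1}F_u^TV_{xt_f}.\tag{16}$$

Note that $L_{uu}$ is guaranteed to be invertible if the running cost has the form $L=g(x)+u^TRu$ with $R>0$. This type of cost is common for mechanical systems, where we would like to minimize the energy spent by the control. Substituting the optimal policy variation $\delta u$ back into the HJB equation results in a set of backward ordinary differential equations that propagate the expansion of the value function, which consists of the terms $V$, $V_x$, $V_\nu$, $V_{t_f}$, $V_{xx}$, $V_{\nu\nu}$, $V_{t_ft_f}$, $V_{x\nu}$, $V_{xt_f}$ and $V_{\nu t_f}$. These backward differential equations are given as follows:

$$\begin{aligned}-\frac{dV}{dt}&=L-\tfrac{1}{2}l^TL_{uu}l,\\ -\frac{dV_x}{dt}&=L_x-K_x^TL_{uu}l+F_x^TV_x,\\ -\frac{dV_\nu}{dt}&=K_\nu^TL_u,\\ -\frac{dV_{t_f}}{dt}&=K_{t_f}^TL_u,\\ -\frac{dV_{xx}}{dt}&=L_{xx}-K_x^TL_{uu}K_x+2F_x^TV_{xx},\\ -\frac{dV_{\nu\nu}}{dt}&=-K_\nu^TL_{uu}K_\nu,\\ -\frac{dV_{t_ft_f}}{dt}&=-K_{t_f}^TL_{uu}K_{t_f},\\ -\frac{dV_{x\nu}}{dt}&=L_{xu}K_\nu+F_x^TV_{x\nu}+V_{xx}F_uK_\nu,\\ -\frac{dV_{xt_f}}{dt}&=L_{xu}K_{t_f}+F_x^TV_{xt_f}+V_{xx}F_uK_{t_f},\\ -\frac{dV_{\nu t_f}}{dt}&=K_\nu^TF_u^TV_{xt_f},\end{aligned}\tag{17}$$

where all the quantities are evaluated at $(\bar{x}(t),\bar{u}(t),\bar{\nu},\bar{t}_f)$. To solve the equations in (17) numerically, one has to compute the terminal conditions.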
In the next section we present the derivation of the terminal conditions and provide an overview of the algorithm.
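The control update in (15) and (16) can be sketched for the simplest case of a scalar control and a scalar multiplier $\nu$, where $L_{uu}$ is a positive number and its inverse is a division. The minus signs below come from setting the derivative in (14) to zero. The function names and the numerical values are hypothetical; the sketch also assumes symmetric mixed partials, so that $\tfrac{1}{2}L_{ux}+\tfrac{1}{2}L_{xu}^T=L_{ux}$.

```python
def ddp_gains(L_u, L_uu, L_ux, F_u, V_x, V_xx, V_xnu, V_xtf):
    """Feedforward term and gains of (16) for scalar u and scalar nu.

    L_u, L_uu: floats. L_ux, F_u, V_x, V_xnu, V_xtf: lists of length n.
    V_xx: n-by-n nested list. Assumes L_uu > 0.
    """
    n = len(F_u)
    inv = 1.0 / L_uu

    def FuT(v):
        # F_u^T v for a single-input system
        return sum(F_u[i] * v[i] for i in range(n))

    l = -inv * (L_u + FuT(V_x))
    K_x = [-inv * (L_ux[j] + sum(F_u[i] * V_xx[i][j] for i in range(n)))
           for j in range(n)]
    K_nu = -inv * FuT(V_xnu)
    K_tf = -inv * FuT(V_xtf)
    return l, K_x, K_nu, K_tf

def control_update(gains, dx, dnu, dtf):
    """delta u = l + K_x dx + K_nu dnu + K_tf dtf, as in (15)."""
    l, K_x, K_nu, K_tf = gains
    return l + sum(k * d for k, d in zip(K_x, dx)) + K_nu * dnu + K_tf * dtf

# Hypothetical numbers: L = 1 + 0.5*R*u^2 with R = 2 and u_bar = 0.5, so L_u = 1.
gains = ddp_gains(L_u=1.0, L_uu=2.0, L_ux=[0.0, 0.0], F_u=[0.0, 1.0],
                  V_x=[1.0, 3.0], V_xx=[[1.0, 0.5], [0.5, 4.0]],
                  V_xnu=[1.0, 2.0], V_xtf=[0.0, -1.0])
du = control_update(gains, dx=[0.1, 0.0], dnu=0.2, dtf=-0.1)
```

For a vector-valued control the division by `L_uu` would be replaced by a linear solve against the matrix $L_{uu}$.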

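2.2 Terminal Conditions

The terminal conditions can be determined by the following procedure. From

$$V(x(t_0),t_0;\nu,t_f)=\min_{u(\cdot)}J(x(\cdot),u(\cdot))=\min_{u(\cdot)}\left\{\int_{t_0}^{t_f}L(x(t),u(t))\,dt+\Phi(x(t_f),\nu,t_f)\right\},\tag{18}$$

we have that for any $t\in[t_0,t_f]$,

$$V(x(t),t;\nu,t_f)=\min_{u(\cdot)}J(x(\cdot),u(\cdot))=\min_{u(\cdot)}\left\{\int_{t}^{t_f}L(x(s),u(s))\,ds+\Phi(x(t_f),\nu,t_f)\right\}.\tag{19}$$

Therefore,

$$V(\bar{x}(\bar{t}_f)+\delta x(\bar{t}_f),\bar{t}_f;\bar{\nu}+\delta\nu,\bar{t}_f+\delta t_f)=\min_{u(\cdot)}\left\{\int_{\bar{t}_f}^{\bar{t}_f+\delta t_f}L(x(t),u(t),t)\,dt+\Phi(x(\bar{t}_f+\delta t_f),\bar{\nu}+\delta\nu,\bar{t}_f+\delta t_f)\right\}$$

$$\approx L(\bar{x}(\bar{t}_f),\bar{u}(\bar{t}_f),\bar{t}_f)\,\delta t_f+\Phi\big(\bar{x}(\bar{t}_f)+\delta x(\bar{t}_f)+\dot{\bar{x}}(\bar{t}_f)\,\delta t_f,\;\bar{\nu}+\delta\nu,\;\bar{t}_f+\delta t_f\big)\tag{20}$$

$$\approx L\,\delta t_f+\Phi(\bar{x}(\bar{t}_f),\bar{\nu},\bar{t}_f)+\Phi_x^T\big(\delta x(\bar{t}_f)+F\,\delta t_f\big)+\Phi_\nu^T\delta\nu+\Phi_{t_f}\delta t_f+\frac{1}{2}\begin{bmatrix}\delta x+F\delta t_f\\ \delta\nu\\ \delta t_f\end{bmatrix}^T\begin{bmatrix}\Phi_{xx}&\Phi_{x\nu}&\Phi_{xt_f}\\ \Phi_{\nu x}&\Phi_{\nu\nu}&\Phi_{\nu t_f}\\ \Phi_{t_fx}&\Phi_{t_f\nu}&\Phi_{t_ft_f}\end{bmatrix}\begin{bmatrix}\delta x+F\delta t_f\\ \delta\nu\\ \delta t_f\end{bmatrix},\tag{21}$$

so that

$$V(\bar{x}(\bar{t}_f)+\delta x(\bar{t}_f),\bar{t}_f;\bar{\nu}+\delta\nu,\bar{t}_f+\delta t_f)=\Phi+\Phi_x^T\delta x+\Phi_\nu^T\delta\nu+(L+\Phi_x^TF+\Phi_{t_f})\,\delta t_f+\frac{1}{2}\begin{bmatrix}\delta x\\ \delta\nu\\ \delta t_f\end{bmatrix}^T\begin{bmatrix}\Phi_{xx}&\Phi_{x\nu}&\Phi_{xt_f}+\Phi_{xx}F\\ \Phi_{\nu x}&\Phi_{\nu\nu}&\Phi_{\nu t_f}+\Phi_{\nu x}F\\ \Phi_{t_fx}+F^T\Phi_{xx}&\Phi_{t_f\nu}+F^T\Phi_{x\nu}&\Phi_{t_ft_f}+2\Phi_{t_fx}F+F^T\Phi_{xx}F\end{bmatrix}\begin{bmatrix}\delta x\\ \delta\nu\\ \delta t_f\end{bmatrix},\tag{22}$$

where $x(\bar{t}_f+\delta t_f)$ is approximated by $\bar{x}(\bar{t}_f)+\delta x(\bar{t}_f)+\dot{\bar{x}}(\bar{t}_f)\,\delta t_f$ and $\dot{\bar{x}}(\bar{t}_f)=F(\bar{x}(\bar{t}_f),\bar{u}(\bar{t}_f),\bar{t}_f)$. The arguments of the functions in (21) and (22) are the same as those in the preceding equations and are thus omitted.

The minimization with respect to $u$ is dropped in (20) because $\int_{\bar{t}_f}^{\bar{t}_f+\delta t_f}L(x(t),u(t))\,dt$ is evaluated as $L(\bar{x}(\bar{t}_f),\bar{u}(\bar{t}_f))\,\delta t_f$, and the latter is only a function of the nominal control. Note that this approximation is relatively rough, since we approximate the integral by $L(\bar{x}(\bar{t}_f),\bar{u}(\bar{t}_f))\,\delta t_f$ instead of $L(x(\bar{t}_f),u(\bar{t}_f))\,\delta t_f$ followed by an expansion in $x(\bar{t}_f)$ and $u(\bar{t}_f)$. The simulation results nevertheless suggest that this level of approximation is good enough.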
Hence, at $t=\bar{t}_f$, the terminal conditions are

$$\begin{aligned}V&=\Phi,\\ V_x&=\Phi_x,\\ V_\nu&=\Phi_\nu,\\ V_{t_f}&=L+\Phi_x^TF+\Phi_{t_f},\\ V_{xx}&=\Phi_{xx},\\ V_{\nu\nu}&=\Phi_{\nu\nu},\\ V_{t_ft_f}&=\Phi_{t_ft_f}+2\Phi_{t_fx}F+F^T\Phi_{xx}F,\\ V_{x\nu}&=\Phi_{x\nu},\\ V_{xt_f}&=\Phi_{xt_f}+\Phi_{xx}F,\\ V_{\nu t_f}&=\Phi_{\nu t_f}+\Phi_{\nu x}F,\end{aligned}\tag{23}$$

where $V$ and its derivatives are evaluated at $(\bar{x}(\bar{t}_f),\bar{t}_f;\bar{\nu},\bar{t}_f)$, $\Phi$ and its derivatives at $(\bar{x}(\bar{t}_f),\bar{\nu},\bar{t}_f)$, and $L$ and $F$ at $(\bar{x}(\bar{t}_f),\bar{u}(\bar{t}_f))$.
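As an illustration of how the terminal conditions initialize the backward pass, the sketch below evaluates them for the special case of a scalar multiplier and a terminal function $\Phi$ with no explicit dependence on $t_f$, so that every partial of $\Phi$ involving $t_f$ vanishes and $\Phi_{\nu\nu}=0$ because $\Phi$ is linear in $\nu$. The function name, the dictionary layout and the numerical values are hypothetical; the example uses a double integrator with $\Phi=\nu(x_1-1)$ and $L=1+\tfrac{1}{2}Ru^2$, which is an assumed instance rather than a setup taken verbatim from the text.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def terminal_conditions(L, F, Phi, Phi_x, Phi_nu, Phi_xx, Phi_xnu):
    """Terminal values of V and its derivatives when Phi has no explicit t_f."""
    n = len(F)
    Phi_xx_F = [sum(Phi_xx[i][j] * F[j] for j in range(n)) for i in range(n)]
    return {
        "V": Phi,
        "V_x": list(Phi_x),
        "V_nu": Phi_nu,
        "V_tf": L + dot(Phi_x, F),          # L + Phi_x^T F
        "V_xx": [row[:] for row in Phi_xx],
        "V_nunu": 0.0,                      # Phi is linear in nu
        "V_tftf": dot(F, Phi_xx_F),         # F^T Phi_xx F
        "V_xnu": list(Phi_xnu),
        "V_xtf": Phi_xx_F,                  # Phi_xx F
        "V_nutf": dot(Phi_xnu, F),          # Phi_nu_x F
    }

# Assumed nominal terminal point for a double integrator (x1' = x2, x2' = u).
R, nu_bar = 1.0, -1.5
x_tf, u_tf = [0.8, 0.3], -0.2
F = [x_tf[1], u_tf]
tc = terminal_conditions(
    L=1.0 + 0.5 * R * u_tf ** 2,
    F=F,
    Phi=nu_bar * (x_tf[0] - 1.0),
    Phi_x=[nu_bar, 0.0],
    Phi_nu=x_tf[0] - 1.0,
    Phi_xx=[[0.0, 0.0], [0.0, 0.0]],
    Phi_xnu=[1.0, 0.0],
)
```

The returned values are the boundary data from which the backward differential equations are integrated from the terminal time back to the initial time.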

Given the boundary conditions of the value function and its derivatives, we can back-propagate the differential equations derived earlier to find their values. Our next step is to update the control through (15), and in order to do so we need the update law for $\delta\nu$ and $\delta t_f$. We follow the derivation in [1] and set

$$\begin{bmatrix}\delta\nu\\ \delta t_f\end{bmatrix}=-\zeta\begin{bmatrix}V_{\nu\nu}(\bar{x}(t_0),t_0;\bar{\nu},\bar{t}_f)&V_{\nu t_f}(\bar{x}(t_0),t_0;\bar{\nu},\bar{t}_f)\\ V_{t_f\nu}(\bar{x}(t_0),t_0;\bar{\nu},\bar{t}_f)&V_{t_ft_f}(\bar{x}(t_0),t_0;\bar{\nu},\bar{t}_f)\end{bmatrix}^{-1}\begin{bmatrix}V_\nu(\bar{x}(\bar{t}_f),\bar{t}_f;\bar{\nu},\bar{t}_f)\\ V_{t_f}(\bar{x}(\bar{t}_f),\bar{t}_f;\bar{\nu},\bar{t}_f)\end{bmatrix},\tag{24}$$

where $\zeta\in[0,1]$ is introduced to ensure that the updates $\delta\nu$ and $\delta t_f$ are not too large.

3 Simulation Results

3.1 Double Integrator

We first apply the algorithm to a simple system, namely the double integrator, and compare our numerical result with the analytical solution to verify the algorithm. The dynamics are given by $\dot{x}_1=x_2$ and $\dot{x}_2=u$. The initial condition is $x(0)=[x_1(0);x_2(0)]=[0;0]$. The cost is $J=\int_0^{t_f}(1+\tfrac{1}{2}Ru^2)\,dt$. The terminal constraint is $x_1(t_f)=1$. Introducing the Lagrange multiplier $\nu$, the cost can be reformulated as $J=\nu(x_1(t_f)-1)+\int_0^{t_f}(1+\tfrac{1}{2}Ru^2)\,dt$. Different values of $R$ give different optimal costs and terminal times. In particular, for $R=0.1,1,10$ the corresponding terminal times are $0.819$, $1.46$ and $2.59$, respectively. The optimal control and $t_f$ per iteration for $R=0.1,1,10$ are shown in Figure 1.

Now we solve this problem analytically to verify the simulation results. Denoting the co-states by $\lambda=[\lambda_1;\lambda_2]$, the Hamiltonian is given by

$$H=1+\tfrac{1}{2}Ru^2+\lambda_1x_2+\lambda_2u.\tag{25}$$

The co-states satisfy the adjoint equations

$$\dot{\lambda}_1=-\frac{\partial H}{\partial x_1}=0,\tag{26}$$

$$\dot{\lambda}_2=-\frac{\partial H}{\partial x_2}=-\lambda_1.\tag{27}$$

Using Pontryagin's minimum principle, the optimal control $u^*$ can be calculated from $0=\partial H/\partial u=Ru+\lambda_2$. Hence, $u^*=-\lambda_2/R$. The transversality conditions are $\lambda_1(t_f)=\nu$, $\lambda_2(t_f)=0$ and $H(t_f)=0$. Given this information, we are ready to solve the problem. From $\dot{\lambda}_1=0$ and $\lambda_1(t_f)=\nu$, we get $\lambda_1(t)\equiv\nu$ for $t\in[0,t_f]$. Then from $\dot{\lambda}_2=-\lambda_1$ and $\lambda_2(t_f)=0$, we have $\lambda_2(t)=\nu(t_f-t)$. Therefore,

$$u^*=-\frac{\lambda_2}{R}=\frac{\nu}{R}(t-t_f).\tag{28}$$

Note that the optimal control is a linear function of $t$.
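The analytical solution can be cross-checked numerically. The sketch below is an independent check, not the free final time DDP algorithm itself: it integrates the double integrator under the linear control law $u(t)=(\nu/R)(t-t_f)$ with a Runge-Kutta scheme, picks $\nu$ so that $x_1(t_f)=1$, and bisects on $t_f$ until the transversality condition $H(t_f)=0$ holds. It assumes $x(0)=(0,0)$, the terminal constraint $x_1(t_f)=1$ and the running cost $1+\tfrac{1}{2}Ru^2$; the function names and the bracketing interval are hypothetical.

```python
def simulate(t_f, nu, R, steps=200):
    """RK4 integration of x1' = x2, x2' = u with u(t) = (nu / R) * (t - t_f)."""
    h = t_f / steps
    x1, x2 = 0.0, 0.0

    def u(t):
        return (nu / R) * (t - t_f)

    for k in range(steps):
        t = k * h
        k1 = (x2, u(t))
        k2 = (x2 + 0.5 * h * k1[1], u(t + 0.5 * h))
        k3 = (x2 + 0.5 * h * k2[1], u(t + 0.5 * h))
        k4 = (x2 + h * k3[1], u(t + h))
        x1 += h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        x2 += h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return x1, x2

def hamiltonian_at_tf(t_f, R):
    """Return (H(t_f), nu) with nu chosen so that x1(t_f) = 1."""
    x1_unit, _ = simulate(t_f, 1.0, R)   # x1(t_f) is linear in nu
    nu = 1.0 / x1_unit
    _, x2 = simulate(t_f, nu, R)
    # At t_f: u = 0 and lambda_2 = 0, so H = 1 + lambda_1 * x2 = 1 + nu * x2.
    return 1.0 + nu * x2, nu

def solve_free_final_time(R, lo=0.1, hi=10.0, iters=60):
    """Bisection on t_f for H(t_f) = 0 (H is negative for small t_f)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if hamiltonian_at_tf(mid, R)[0] < 0.0:
            lo = mid
        else:
            hi = mid
    t_f = 0.5 * (lo + hi)
    return t_f, hamiltonian_at_tf(t_f, R)[1]

results = {R: solve_free_final_time(R) for R in (0.1, 1.0, 10.0)}
```

Under these assumptions the computed terminal times agree with the values 0.819, 1.46 and 2.59 quoted for $R=0.1,1,10$, and the multiplier satisfies $\nu=-\tfrac{2}{3}t_f$.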
Furthermore, the boundary conditions yield $t_f=(9R/2)^{1/4}$ and $\nu=-\tfrac{2}{3}t_f=-\tfrac{2}{3}(9R/2)^{1/4}$. When $R=0.1,1,10$, we obtain $t_f=0.819,1.46,2.59$, respectively, which is consistent with the numerical simulation results presented in Figure 1.

Figure 1: Panels (a), (c), (e) show the numerical optimal control for $R=0.1,1,10$, respectively. Panels (b), (d), (f) show $t_f$ per iteration for $R=0.1,1,10$, respectively. Dashed red lines represent the corresponding analytical optimal $t_f$.

3.2 Cart Pole

In this subsection, we apply our algorithm to the inverted pendulum on a cart, also known as the cart pole problem, with M = the mass of the cart, m = and l =. the mass and length of the pendulum, g = 9.8 the gravitational acceleration and $u$ the force applied to the cart. The state is $x=[x,\dot{x},\theta,\dot{\theta}]$. The goal is to bring the state from $x(0)=[0,0,\pi,0]$ to $p=[0,0,0,0]$, which represents the case where the pendulum is pointing straight up. The cost function is given by

$$J=\int_0^{t_f}\big[c_t+(x-p)^TQ(x-p)+u^TR_uu\big]\,dt+\lambda^T\big([x_3(t_f);x_4(t_f)]-[p_3;p_4]\big),$$

where Q = diag{,,,} and R_u =.. Initial values are given as t_f =, λ = [,]^T. The multipliers are γ =. and ε =.. We run the algorithm for 3 iterations and the convergence is achieved at around th iteration. Figure 3a presents the optimal control $u$. The corresponding optimal trajectories of the states are depicted in Figure 2. The cost and $t_f$ per iteration are shown in Figures 3b and 3c, respectively.

Figure 2: Optimal trajectories of the states ($x$, $\dot{x}$, $\theta$, $\dot{\theta}$) in blue. Red lines represent the desired terminal states.

Figure 3: (a) Optimal control of the cart pole system. (b) Cost per iteration. (c) $t_f$ per iteration.

3.3 Quadrotor

The dynamic model of the quadrotor includes 16 states: 3 for the position ($r=(x,y,z)^T$), 3 for the Euler angles ($\Phi=(\phi,\theta,\psi)^T$), 3 for the velocity ($\dot{r}=(\dot{x},\dot{y},\dot{z})^T$), 3 for the body angular rates ($\dot{\Phi}=(p,q,r)^T$) and 4 for the motor speeds ($\Omega=(\omega_1,\omega_2,\omega_3,\omega_4)^T$). The corresponding dynamics of the quadrotor are given as follows:

$$\frac{dx}{dt}=f(x)+Gu,\tag{29}$$

where $x=[r,\Phi,\dot{r},\dot{\Phi},\Omega]^T\in\mathbb{R}^{16}$ and $u=(u_1,u_2,u_3,u_4)^T\in\mathbb{R}^4$ is the control vector, in which $u_1$ represents the thrust force and $u_2,u_3,u_4$ represent the pitching, rolling and yawing moments, respectively. The corresponding cost function is defined as

$$J=\int_0^{t_f}\big[c_t+(x-p)^TQ(x-p)+u^TR_uu\big]\,dt+(x(t_f)-p)^TQ_f(x(t_f)-p)+\lambda^T\big([x_1(t_f);\ldots;x_{16}(t_f)]-[p_1;\ldots;p_{16}]\big),$$

where $p=[p_1;\ldots;p_{16}]\in\mathbb{R}^{16}$ denotes the desired terminal states. In the simulation, we set

{ 7,i =,,3;,i = 3; 6,i = 4,...,9; p(i) = and Q f (i,i) =, otherwise, (3),i =,,;, otherwise,

and all the off-diagonal terms are set to 0. Q =.Q_f. R_u =.I. γ =. and ε =.. The desired terminal state $p$ is chosen for the quadrotor to execute the take-off maneuver. iterations are included to ensure convergence, and the cost per iteration is presented in Figure 5b. The corresponding optimal state trajectories are shown in Figure 4. The optimal control $u$ is illustrated in Figure 5a, and $t_f$ per iteration is presented in Figure 5c.

Figure 4: Optimal trajectories of the states of the quadrotor in blue. Dashed red lines represent the desired terminal states.

Figure 5: (a) Optimal controls. (b) Cost per iteration. (c) $t_f$ per iteration.

Acknowledgments

References

[1] D. H. Jacobson and D. Q. Mayne. Differential Dynamic Programming. Elsevier Sci. Publ., 1970.
[2] S. Levine and P. Abbeel. Learning neural network policies with guided policy search under unknown dynamics. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27. Curran Associates, Inc., 2014.
[3] Y. Pan and E. Theodorou. Probabilistic differential dynamic programming. In Advances in Neural Information Processing Systems (NIPS), pages 1907-1915, 2014.
[4] J. Morimoto, G. Zeglin, and C. G. Atkeson. Minimax differential dynamic programming: Application to a biped walking robot. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003). IEEE, 2003.
[5] Y. Tassa, N. Mansard, and E. Todorov. Control-limited differential dynamic programming. In 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014.
[6] Y. Tassa, T. Erez, and W. D. Smart. Receding horizon differential dynamic programming. In NIPS, 2007.
[7] P. Abbeel, A. Coates, M. Quigley, and A. Y. Ng. An application of reinforcement learning to aerobatic helicopter flight. Advances in Neural Information Processing Systems (NIPS), 19:1, 2007.
[8] E. Todorov and W. Li. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In American Control Conference, 2005. IEEE, 2005.
[9] E. Theodorou, Y. Tassa, and E. Todorov. Stochastic differential dynamic programming. In American Control Conference (ACC), 2010, pages 1125-1132. IEEE, 2010.


More information

Optimal Control. Quadratic Functions. Single variable quadratic function: Multi-variable quadratic function:

Optimal Control. Quadratic Functions. Single variable quadratic function: Multi-variable quadratic function: Optimal Control Control design based on pole-placement has non unique solutions Best locations for eigenvalues are sometimes difficult to determine Linear Quadratic LQ) Optimal control minimizes a quadratic

More information

ELEC4631 s Lecture 2: Dynamic Control Systems 7 March Overview of dynamic control systems

ELEC4631 s Lecture 2: Dynamic Control Systems 7 March Overview of dynamic control systems ELEC4631 s Lecture 2: Dynamic Control Systems 7 March 2011 Overview of dynamic control systems Goals of Controller design Autonomous dynamic systems Linear Multi-input multi-output (MIMO) systems Bat flight

More information

Two-Point Boundary Value Problem and Optimal Feedback Control based on Differential Algebra

Two-Point Boundary Value Problem and Optimal Feedback Control based on Differential Algebra Two-Point Boundary Value Problem and Optimal Feedback Control based on Differential Algebra Politecnico di Milano Department of Aerospace Engineering Milan, Italy Taylor Methods and Computer Assisted Proofs

More information

Hamilton-Jacobi-Bellman Equation Feb 25, 2008

Hamilton-Jacobi-Bellman Equation Feb 25, 2008 Hamilton-Jacobi-Bellman Equation Feb 25, 2008 What is it? The Hamilton-Jacobi-Bellman (HJB) equation is the continuous-time analog to the discrete deterministic dynamic programming algorithm Discrete VS

More information

Variance Reduction for Policy Gradient Methods. March 13, 2017

Variance Reduction for Policy Gradient Methods. March 13, 2017 Variance Reduction for Policy Gradient Methods March 13, 2017 Reward Shaping Reward Shaping Reward Shaping Reward shaping: r(s, a, s ) = r(s, a, s ) + γφ(s ) Φ(s) for arbitrary potential Φ Theorem: r admits

More information

Chapter 2 Optimal Control Problem

Chapter 2 Optimal Control Problem Chapter 2 Optimal Control Problem Optimal control of any process can be achieved either in open or closed loop. In the following two chapters we concentrate mainly on the first class. The first chapter

More information

EVALUATING MODEL FREE POLICY OPTIMIZATION STRATEGIES FOR NON LINEAR SYSTEMS

EVALUATING MODEL FREE POLICY OPTIMIZATION STRATEGIES FOR NON LINEAR SYSTEMS EVALUATING MODEL FREE POLICY OPTIMIZATION STRATEGIES FOR NON LINEAR SYSTEMS BY ADITYA H CHUKKA A thesis submitted to the Graduate School New Brunswick Rutgers, The State University of New Jersey in partial

More information

moments of inertia at the center of gravity (C.G.) of the first and second pendulums are I 1 and I 2, respectively. Fig. 1. Double inverted pendulum T

moments of inertia at the center of gravity (C.G.) of the first and second pendulums are I 1 and I 2, respectively. Fig. 1. Double inverted pendulum T Real-Time Swing-up of Double Inverted Pendulum by Nonlinear Model Predictive Control Pathompong Jaiwat 1 and Toshiyuki Ohtsuka 2 Abstract In this study, the swing-up of a double inverted pendulum is controlled

More information

Runga-Kutta Schemes. Exact evolution over a small time step: Expand both sides in a small time increment: d(δt) F (x(t + δt),t+ δt) Ft + FF x ( t)

Runga-Kutta Schemes. Exact evolution over a small time step: Expand both sides in a small time increment: d(δt) F (x(t + δt),t+ δt) Ft + FF x ( t) Runga-Kutta Schemes Exact evolution over a small time step: x(t + t) =x(t)+ t 0 d(δt) F (x(t + δt),t+ δt) Expand both sides in a small time increment: x(t + t) =x(t)+x (t) t + 1 2 x (t)( t) 2 + 1 6 x (t)+

More information

Robotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning

Robotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning Robotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning Stefan Schaal Max-Planck-Institute for Intelligent Systems Tübingen, Germany & Computer Science, Neuroscience, &

More information

Homework Solution # 3

Homework Solution # 3 ECSE 644 Optimal Control Feb, 4 Due: Feb 17, 4 (Tuesday) Homework Solution # 3 1 (5%) Consider the discrete nonlinear control system in Homework # For the optimal control and trajectory that you have found

More information

Minimization with Equality Constraints

Minimization with Equality Constraints Path Constraints and Numerical Optimization Robert Stengel Optimal Control and Estimation, MAE 546, Princeton University, 2015 State and Control Equality Constraints Pseudoinverse State and Control Inequality

More information

OPTIMAL CONTROL. Sadegh Bolouki. Lecture slides for ECE 515. University of Illinois, Urbana-Champaign. Fall S. Bolouki (UIUC) 1 / 28

OPTIMAL CONTROL. Sadegh Bolouki. Lecture slides for ECE 515. University of Illinois, Urbana-Champaign. Fall S. Bolouki (UIUC) 1 / 28 OPTIMAL CONTROL Sadegh Bolouki Lecture slides for ECE 515 University of Illinois, Urbana-Champaign Fall 2016 S. Bolouki (UIUC) 1 / 28 (Example from Optimal Control Theory, Kirk) Objective: To get from

More information

SWEEP METHOD IN ANALYSIS OPTIMAL CONTROL FOR RENDEZ-VOUS PROBLEMS

SWEEP METHOD IN ANALYSIS OPTIMAL CONTROL FOR RENDEZ-VOUS PROBLEMS J. Appl. Math. & Computing Vol. 23(2007), No. 1-2, pp. 243-256 Website: http://jamc.net SWEEP METHOD IN ANALYSIS OPTIMAL CONTROL FOR RENDEZ-VOUS PROBLEMS MIHAI POPESCU Abstract. This paper deals with determining

More information

Robotics. Control Theory. Marc Toussaint U Stuttgart

Robotics. Control Theory. Marc Toussaint U Stuttgart Robotics Control Theory Topics in control theory, optimal control, HJB equation, infinite horizon case, Linear-Quadratic optimal control, Riccati equations (differential, algebraic, discrete-time), controllability,

More information

Autonomous Helicopter Flight via Reinforcement Learning

Autonomous Helicopter Flight via Reinforcement Learning Autonomous Helicopter Flight via Reinforcement Learning Authors: Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, Shankar Sastry Presenters: Shiv Ballianda, Jerrolyn Hebert, Shuiwang Ji, Kenley Malveaux, Huy

More information

Learning the optimal state-feedback using deep networks

Learning the optimal state-feedback using deep networks Learning the optimal state-feedback using deep networks Carlos Sánchez-Sánchez and Dario Izzo Advanced Concepts Team European Space Agency Noordwijk, The Netherlands Email: carlos.sanchez@esa.int, dario.izzo@esa.int

More information

Stochastic optimal control theory

Stochastic optimal control theory Stochastic optimal control theory Bert Kappen SNN Radboud University Nijmegen the Netherlands July 5, 2008 Bert Kappen Introduction Optimal control theory: Optimize sum of a path cost and end cost. Result

More information

Proper Orthogonal Decomposition. POD for PDE Constrained Optimization. Stefan Volkwein

Proper Orthogonal Decomposition. POD for PDE Constrained Optimization. Stefan Volkwein Proper Orthogonal Decomposition for PDE Constrained Optimization Institute of Mathematics and Statistics, University of Constance Joined work with F. Diwoky, M. Hinze, D. Hömberg, M. Kahlbacher, E. Kammann,

More information

Adaptive Nonlinear Control Allocation of. Non-minimum Phase Uncertain Systems

Adaptive Nonlinear Control Allocation of. Non-minimum Phase Uncertain Systems 2009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 10-12, 2009 ThA18.3 Adaptive Nonlinear Control Allocation of Non-minimum Phase Uncertain Systems Fang Liao, Kai-Yew Lum,

More information

Dynamics and Control of Rotorcraft

Dynamics and Control of Rotorcraft Dynamics and Control of Rotorcraft Helicopter Aerodynamics and Dynamics Abhishek Department of Aerospace Engineering Indian Institute of Technology, Kanpur February 3, 2018 Overview Flight Dynamics Model

More information

Robot Control Basics CS 685

Robot Control Basics CS 685 Robot Control Basics CS 685 Control basics Use some concepts from control theory to understand and learn how to control robots Control Theory general field studies control and understanding of behavior

More information

Minimum Angular Acceleration Control of Articulated Body Dynamics

Minimum Angular Acceleration Control of Articulated Body Dynamics Minimum Angular Acceleration Control of Articulated Body Dynamics Javier R. Movellan UCSD Abstract As robots find applications in daily life conditions it becomes important to develop controllers that

More information

High-Order Local Dynamic Programming

High-Order Local Dynamic Programming High-Order Local Dynamic Programming Yuval Tassa Interdisciplinary Center for Neural Computation Hebrew University, Jerusalem, Israel tassa@alice.nc.hui.ac.il Emanuel Todorov Applied Mathematics and Computer

More information

2 Lyapunov Stability. x(0) x 0 < δ x(t) x 0 < ɛ

2 Lyapunov Stability. x(0) x 0 < δ x(t) x 0 < ɛ 1 2 Lyapunov Stability Whereas I/O stability is concerned with the effect of inputs on outputs, Lyapunov stability deals with unforced systems: ẋ = f(x, t) (1) where x R n, t R +, and f : R n R + R n.

More information

OPTIMAL SPACECRAF1 ROTATIONAL MANEUVERS

OPTIMAL SPACECRAF1 ROTATIONAL MANEUVERS STUDIES IN ASTRONAUTICS 3 OPTIMAL SPACECRAF1 ROTATIONAL MANEUVERS JOHNL.JUNKINS Texas A&M University, College Station, Texas, U.S.A. and JAMES D.TURNER Cambridge Research, Division of PRA, Inc., Cambridge,

More information

Bayesian Decision Theory in Sensorimotor Control

Bayesian Decision Theory in Sensorimotor Control Bayesian Decision Theory in Sensorimotor Control Matthias Freiberger, Martin Öttl Signal Processing and Speech Communication Laboratory Advanced Signal Processing Matthias Freiberger, Martin Öttl Advanced

More information

On the stability of receding horizon control with a general terminal cost

On the stability of receding horizon control with a general terminal cost On the stability of receding horizon control with a general terminal cost Ali Jadbabaie and John Hauser Abstract We study the stability and region of attraction properties of a family of receding horizon

More information

Stability Analysis of Optimal Adaptive Control under Value Iteration using a Stabilizing Initial Policy

Stability Analysis of Optimal Adaptive Control under Value Iteration using a Stabilizing Initial Policy Stability Analysis of Optimal Adaptive Control under Value Iteration using a Stabilizing Initial Policy Ali Heydari, Member, IEEE Abstract Adaptive optimal control using value iteration initiated from

More information

Optimization-Based Control

Optimization-Based Control Optimization-Based Control Richard M. Murray Control and Dynamical Systems California Institute of Technology DRAFT v2.1a, February 15, 21 c California Institute of Technology All rights reserved. This

More information

Theory and Implementation of Biomimetic Motor Controllers

Theory and Implementation of Biomimetic Motor Controllers Theory and Implementation of Biomimetic Motor Controllers Thesis submitted for the degree of Doctor of Philosophy by Yuval Tassa Submitted to the Senate of the Hebrew University of Jerusalem February 2011

More information

An Introduction to Reinforcement Learning

An Introduction to Reinforcement Learning An Introduction to Reinforcement Learning Shivaram Kalyanakrishnan shivaram@csa.iisc.ernet.in Department of Computer Science and Automation Indian Institute of Science August 2014 What is Reinforcement

More information

Optimal control of nonlinear systems with input constraints using linear time varying approximations

Optimal control of nonlinear systems with input constraints using linear time varying approximations ISSN 1392-5113 Nonlinear Analysis: Modelling and Control, 216, Vol. 21, No. 3, 4 412 http://dx.doi.org/1.15388/na.216.3.7 Optimal control of nonlinear systems with input constraints using linear time varying

More information

Leveraging Dynamical System Theory to Incorporate Design Constraints in Multidisciplinary Design

Leveraging Dynamical System Theory to Incorporate Design Constraints in Multidisciplinary Design Leveraging Dynamical System Theory to Incorporate Design Constraints in Multidisciplinary Design Bradley A. Steinfeldt and Robert D. Braun Georgia Institute of Technology, Atlanta, GA 3332-15 This work

More information

Parametric Integrated Perturbation Analysis - Sequential Quadratic Programming Approach for Minimum-time Model Predictive Control

Parametric Integrated Perturbation Analysis - Sequential Quadratic Programming Approach for Minimum-time Model Predictive Control Preprints of the 9th World Congress The International Federation of Automatic Control Cape Town, South Africa. August 4-9, 4 Parametric Integrated Perturbation Analysis - Sequential Quadratic Programming

More information

HJB equations. Seminar in Stochastic Modelling in Economics and Finance January 10, 2011

HJB equations. Seminar in Stochastic Modelling in Economics and Finance January 10, 2011 Department of Probability and Mathematical Statistics Faculty of Mathematics and Physics, Charles University in Prague petrasek@karlin.mff.cuni.cz Seminar in Stochastic Modelling in Economics and Finance

More information

THE HAMILTON-JACOBI THEORY FOR SOLVING OPTIMAL FEEDBACK CONTROL PROBLEMS WITH GENERAL BOUNDARY CONDITIONS

THE HAMILTON-JACOBI THEORY FOR SOLVING OPTIMAL FEEDBACK CONTROL PROBLEMS WITH GENERAL BOUNDARY CONDITIONS THE HAMILTON-JACOBI THEORY FOR SOLVING OPTIMAL FEEDBACK CONTROL PROBLEMS WITH GENERAL BOUNDARY CONDITIONS by Chandeok Park A dissertation submitted in partial fulfillment of the requirements for the degree

More information

with Application to Autonomous Vehicles

with Application to Autonomous Vehicles Nonlinear with Application to Autonomous Vehicles (Ph.D. Candidate) C. Silvestre (Supervisor) P. Oliveira (Co-supervisor) Institute for s and Robotics Instituto Superior Técnico Portugal January 2010 Presentation

More information

EE C128 / ME C134 Feedback Control Systems

EE C128 / ME C134 Feedback Control Systems EE C128 / ME C134 Feedback Control Systems Lecture Additional Material Introduction to Model Predictive Control Maximilian Balandat Department of Electrical Engineering & Computer Science University of

More information

CS491/691: Introduction to Aerial Robotics

CS491/691: Introduction to Aerial Robotics CS491/691: Introduction to Aerial Robotics Topic: Midterm Preparation Dr. Kostas Alexis (CSE) Areas of Focus Coordinate system transformations (CST) MAV Dynamics (MAVD) Navigation Sensors (NS) State Estimation

More information

Optimal Control - Homework Exercise 3

Optimal Control - Homework Exercise 3 Optimal Control - Homework Exercise 3 December 17, 2010 In this exercise two different problems will be considered, first the so called Zermelo problem the problem is to steer a boat in streaming water,

More information

Physics 106a, Caltech 13 November, Lecture 13: Action, Hamilton-Jacobi Theory. Action-Angle Variables

Physics 106a, Caltech 13 November, Lecture 13: Action, Hamilton-Jacobi Theory. Action-Angle Variables Physics 06a, Caltech 3 November, 08 Lecture 3: Action, Hamilton-Jacobi Theory Starred sections are advanced topics for interest and future reference. The unstarred material will not be tested on the final

More information

Nonlinear and Neural Network-based Control of a Small Four-Rotor Aerial Robot

Nonlinear and Neural Network-based Control of a Small Four-Rotor Aerial Robot Nonlinear and Neural Network-based Control of a Small Four-Rotor Aerial Robot Holger Voos Abstract Small four-rotor aerial robots, so called quadrotor UAVs, have an enormous potential for all kind of neararea

More information

A DYNAMIC GAME APPROACH TO TRAINING ROBUST

A DYNAMIC GAME APPROACH TO TRAINING ROBUST A DYNAMIC GAME APPROACH TO TRAINING ROBUST DEEP POLICIES. Anonymous authors Paper under double-blind review ABSTRACT We present a method for evaluating the sensitivity of deep reinforcement learning (RL)

More information

Poincaré-Map-Based Reinforcement Learning For Biped Walking

Poincaré-Map-Based Reinforcement Learning For Biped Walking Poincaré-Map-Based Reinforcement Learning For Biped Walking Jun Morimoto,, Jun Nakanishi,, Gen Endo,3, and Gordon Cheng, Computational Brain Project, ICORP, JST ATR Computational Neuroscience Laboratories

More information

Computing Optimized Nonlinear Sliding Surfaces

Computing Optimized Nonlinear Sliding Surfaces Computing Optimized Nonlinear Sliding Surfaces Azad Ghaffari and Mohammad Javad Yazdanpanah Abstract In this paper, we have concentrated on real systems consisting of structural uncertainties and affected

More information

Physics 351 Wednesday, April 22, 2015

Physics 351 Wednesday, April 22, 2015 Physics 351 Wednesday, April 22, 2015 HW13 due Friday. The last one! You read Taylor s Chapter 16 this week (waves, stress, strain, fluids), most of which is Phys 230 review. Next weekend, you ll read

More information

Calculus of Variations

Calculus of Variations 16.323 Lecture 5 Calculus of Variations Calculus of Variations Most books cover this material well, but Kirk Chapter 4 oes a particularly nice job. x(t) x* x*+ αδx (1) x*- αδx (1) αδx (1) αδx (1) t f t

More information

Lecture 11 Optimal Control

Lecture 11 Optimal Control Lecture 11 Optimal Control The Maximum Principle Revisited Examples Numerical methods/optimica Examples, Lab 3 Material Lecture slides, including material by J. Åkesson, Automatic Control LTH Glad & Ljung,

More information

Reinforcement Learning and Control

Reinforcement Learning and Control CS9 Lecture notes Andrew Ng Part XIII Reinforcement Learning and Control We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make

More information

Linearly-Solvable Stochastic Optimal Control Problems

Linearly-Solvable Stochastic Optimal Control Problems Linearly-Solvable Stochastic Optimal Control Problems Emo Todorov Applied Mathematics and Computer Science & Engineering University of Washington Winter 2014 Emo Todorov (UW) AMATH/CSE 579, Winter 2014

More information

1. Type your solutions. This homework is mainly a programming assignment.

1. Type your solutions. This homework is mainly a programming assignment. THE UNIVERSITY OF TEXAS AT SAN ANTONIO EE 5243 INTRODUCTION TO CYBER-PHYSICAL SYSTEMS H O M E W O R K S # 6 + 7 Ahmad F. Taha October 22, 2015 READ Homework Instructions: 1. Type your solutions. This homework

More information

HIGH-ORDER STATE FEEDBACK GAIN SENSITIVITY CALCULATIONS USING COMPUTATIONAL DIFFERENTIATION

HIGH-ORDER STATE FEEDBACK GAIN SENSITIVITY CALCULATIONS USING COMPUTATIONAL DIFFERENTIATION (Preprint) AAS 12-637 HIGH-ORDER STATE FEEDBACK GAIN SENSITIVITY CALCULATIONS USING COMPUTATIONAL DIFFERENTIATION Ahmad Bani Younes, James Turner, Manoranjan Majji, and John Junkins INTRODUCTION A nonlinear

More information

Calculus of Variations Summer Term 2015

Calculus of Variations Summer Term 2015 Calculus of Variations Summer Term 2015 Lecture 14 Universität des Saarlandes 24. Juni 2015 c Daria Apushkinskaya (UdS) Calculus of variations lecture 14 24. Juni 2015 1 / 20 Purpose of Lesson Purpose

More information

in Bounded Domains Ariane Trescases CMLA, ENS Cachan

in Bounded Domains Ariane Trescases CMLA, ENS Cachan CMLA, ENS Cachan Joint work with Yan GUO, Chanwoo KIM and Daniela TONON International Conference on Nonlinear Analysis: Boundary Phenomena for Evolutionnary PDE Academia Sinica December 21, 214 Outline

More information

Theoretical Justification for LQ problems: Sufficiency condition: LQ problem is the second order expansion of nonlinear optimal control problems.

Theoretical Justification for LQ problems: Sufficiency condition: LQ problem is the second order expansion of nonlinear optimal control problems. ES22 Lecture Notes #11 Theoretical Justification for LQ problems: Sufficiency condition: LQ problem is the second order expansion of nonlinear optimal control problems. J = φ(x( ) + L(x,u,t)dt ; x= f(x,u,t)

More information

Robot Dynamics - Rotary Wing UAS: Control of a Quadrotor

Robot Dynamics - Rotary Wing UAS: Control of a Quadrotor Robot Dynamics Rotary Wing AS: Control of a Quadrotor 5-85- V Marco Hutter, Roland Siegwart and Thomas Stastny Robot Dynamics - Rotary Wing AS: Control of a Quadrotor 7..6 Contents Rotary Wing AS. Introduction

More information

Deterministic Optimal Control

Deterministic Optimal Control Online Appendix A Deterministic Optimal Control As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality. Albert Einstein (1879

More information

arxiv: v1 [q-fin.mf] 5 Jul 2016

arxiv: v1 [q-fin.mf] 5 Jul 2016 arxiv:607.037v [q-fin.mf] 5 Jul 206 Dynamic optimization and its relation to classical and quantum constrained systems. Mauricio Contreras, Rely Pellicer and Marcelo Villena. July 6, 206 We study the structure

More information