arxiv: v1 [cs.sy] 3 Sep 2015
Model Based Reinforcement Learning with Final Time Horizon Optimization

Wei Sun, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0150, USA. wsun4@gatech.edu
Evangelos Theodorou, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0150, USA. evangelos.theodorou@ae.gatech.edu
Panagiotis Tsiotras, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0150, USA. tsiotras@gatech.edu

Abstract

We present one of the first algorithms for model based reinforcement learning and trajectory optimization with a free final time horizon. Grounded in optimal control theory and Dynamic Programming, we derive a set of backward differential equations that propagate the value function and provide the optimal control policy and the optimal time horizon. The resulting policy generalizes previous results in model based trajectory optimization. Our analysis shows that the proposed algorithm recovers the theoretical optimal solution on a linear, low-dimensional problem. Finally, we provide application results on nonlinear systems.

1 Introduction

Trajectory optimization is one of the most active areas of research in machine learning and control theory, with a plethora of applications in robotics, autonomous systems and computational neuroscience. Among the different methodologies, Differential Dynamic Programming (DDP) is a model based reinforcement learning algorithm that relies on linear approximations of the dynamics and quadratic approximations of the cost functions along nominal trajectories. Although almost 45 years have passed since the fundamental work by Jacobson and Mayne on Differential Dynamic Programming [1], research on trajectory optimization and model based reinforcement learning still uses DDP as a main ingredient.
In the NIPS community, recently published state-of-the-art methods on trajectory optimization use DDP to perform guided policy search [2] and data-efficient probabilistic trajectory optimization [3]. Earlier work on DDP includes minimax [4], control-limited [5], receding horizon [6, 7], and stochastic optimal control formulations [8, 9]. Despite all of this research on trajectory optimization using model based reinforcement learning methods such as DDP, there has not been any effort towards the development of model based trajectory optimization algorithms in which the time horizon is not specified a priori. The time horizon is one of the important free tuning parameters in trajectory optimization algorithms, and in most cases it is tuned manually based on the experience of the engineer. In this paper we present a new algorithm for model based reinforcement learning in which optimization is performed with respect to both the control and the time horizon. While free time horizon DDP was initially derived by Jacobson and Mayne in [1], the resulting algorithm is not implementable
and it relies on the assumption that the algorithm is initialized close to the optimal control solution. This will become clearer in the next section, where we present our analysis of free final time model based trajectory optimization.

2 Problem Formulation and Analysis

We consider model based reinforcement learning problems in which optimization occurs with respect to the control and the time horizon. In mathematical terms these problems are formulated as follows:

$$V(x(t_0),t_0;\nu,t_f) = \min_{u(\cdot)} J(x(\cdot),u(\cdot)) = \min_{u(\cdot)} \left[ \Phi(x(t_f),\nu,t_f) + \int_{t_0}^{t_f} L(x(t),u(t),t)\,dt \right], \tag{1}$$

where the term $\Phi(x(t_f),\nu,t_f)$ is defined as $\Phi(x(t_f),\nu,t_f) = \phi(x(t_f),t_f) + \nu^{T}\psi(x(t_f),t_f)$, $\phi(x(t_f),t_f)$ is the terminal cost, $\psi(x(t_f),t_f)$ is the terminal constraint and $\nu$ is the corresponding Lagrange multiplier. $L(x(t),u(t),t)$ is the running cost accumulated along the time horizon $t_f$, which is not specified a priori. The cost function $J(x(\cdot),u(\cdot))$ in (1) is minimized subject to the dynamics

$$\frac{dx(t)}{dt} = F(x(t),u(t),t), \qquad x_0 = x(t_0), \tag{2}$$

where $x \in \mathbb{R}^{n}$ is the state and $u \in \mathbb{R}^{m}$ is the control of the dynamics. Note that the value function $V(x(t_0),t_0;\nu,t_f)$ is now a function of the Lagrange multiplier $\nu$ and the terminal time $t_f$. This is important for the derivation of the free time horizon algorithm, since expansions of the value function are computed not only with respect to the nominal control and state trajectories but also with respect to the nominal $\nu$ and $t_f$.

2.1 Derivation of Differential Dynamic Programming (DDP) with Free Final Time

Our analysis and derivation of free time horizon model based reinforcement learning is in continuous time. As shown below, we derive a set of backward ordinary differential equations that back-propagates the value function along the nominal trajectory.
In particular, given a nominal trajectory $(\bar{x}(\cdot),\bar{u}(\cdot))$ with nominal Lagrange multiplier $\bar{\nu}$ and terminal time $\bar{t}_f$, we start our analysis with the linearization of the dynamics:

$$\frac{dx(t)}{dt} = F(\bar{x}(t)+\delta x(t),\bar{u}(t)+\delta u(t),t) \;\Longrightarrow\; \frac{d\,\delta x(t)}{dt} = F_x(\bar{x}(t),\bar{u}(t),t)\,\delta x(t) + F_u(\bar{x}(t),\bar{u}(t),t)\,\delta u(t). \tag{3}$$

All quantities in the derivation that follows are evaluated at $(\bar{x}(t),\bar{u}(t),\bar{\nu},\bar{t}_f)$ unless otherwise specified. Since our derivation is in continuous time, we consider the corresponding Hamilton-Jacobi-Bellman (HJB) equation:

$$-\frac{\partial V(x(t),t;\nu,t_f)}{\partial t} = \min_{u(t)} \Big[ H(x(t),t;\nu,t_f) \Big], \tag{4}$$

under the terminal condition $V(x(t_f),t_f;\nu,t_f) = \Phi(x(t_f),\nu,t_f)$, and with the Hamiltonian function $H(x(t),t;\nu,t_f)$ defined as follows:

$$H(x(t),t;\nu,t_f) = L(x(t),u(t),t) + V_x(x(t),t;\nu,t_f)^{T} F(x(t),u(t),t). \tag{5}$$

We take expansions of the terms on both sides of (4) around $(\bar{x},\bar{u},\bar{\nu},\bar{t}_f)$. Notice that this is in contrast with the derivation of free final time DDP in [1], in which the expansion takes place around $(x^{*},u^{*})$. Hence, the key assumption in [1] is that $\bar{u}$ is close to the optimal control $u^{*}$, which makes the algorithm hard to implement, especially when the optimal $t_f$ is not known a priori. Moreover, expanding the Hamiltonian $H$ around $u^{*}$ yields $H_u|_{u=u^{*}} = 0$, which results in dropping terms from the derivation.
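The linearization in (3) needs the Jacobians $F_x$ and $F_u$ along the nominal trajectory. When analytic derivatives are inconvenient, they can be approximated numerically. The following is a minimal sketch using central differences; the helper name `jacobians`, the step size `eps`, and the pendulum-like test dynamics are illustrative assumptions and do not come from the paper.

```python
import math


def jacobians(F, x, u, t, eps=1e-6):
    """Central-difference estimates of F_x (n x n) and F_u (n x m) at (x, u, t)."""
    n, m = len(x), len(u)

    def column(perturb_state, j):
        # Perturb one coordinate of x (or u) by +/- eps and difference the outputs.
        xp, xm, up, um = list(x), list(x), list(u), list(u)
        if perturb_state:
            xp[j] += eps
            xm[j] -= eps
        else:
            up[j] += eps
            um[j] -= eps
        fp, fm = F(xp, up, t), F(xm, um, t)
        return [(a - b) / (2.0 * eps) for a, b in zip(fp, fm)]

    cols_x = [column(True, j) for j in range(n)]
    cols_u = [column(False, j) for j in range(m)]
    # Each entry of cols_* is one Jacobian column; transpose into row-major lists.
    F_x = [[cols_x[j][i] for j in range(n)] for i in range(n)]
    F_u = [[cols_u[j][i] for j in range(m)] for i in range(n)]
    return F_x, F_u


def pendulum(x, u, t):
    # Hypothetical test dynamics (not from the paper):
    # theta' = omega, omega' = -sin(theta) + u.
    return [x[1], -math.sin(x[0]) + u[0]]


F_x, F_u = jacobians(pendulum, [0.3, 0.0], [0.0], 0.0)
print(F_x)  # close to [[0, 1], [-cos(0.3), 0]]
print(F_u)  # close to [[0], [1]]
```

In a DDP sweep these Jacobians would be evaluated at every sample of the nominal trajectory; analytic or automatic derivatives are a drop-in replacement when they are available.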
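The left-hand side of (4) can be expanded as

$$-\frac{\partial}{\partial t} V(\bar{x}(t)+\delta x(t),t;\bar{\nu}+\delta\nu,\bar{t}_f+\delta t_f) \approx -\frac{\partial}{\partial t}\left( V(\bar{x}(t),t;\bar{\nu},\bar{t}_f) + V_x^{T}\delta x(t) + V_\nu^{T}\delta\nu + V_{t_f}\delta t_f + \frac{1}{2}\begin{bmatrix}\delta x(t)^{T} & \delta\nu^{T} & \delta t_f\end{bmatrix}\begin{bmatrix} V_{xx} & V_{x\nu} & V_{xt_f}\\ V_{\nu x} & V_{\nu\nu} & V_{\nu t_f}\\ V_{t_f x} & V_{t_f\nu} & V_{t_f t_f}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}\right). \tag{6}$$

Next we make use of the fact that

$$\frac{d}{dt}(\cdot) = \frac{\partial}{\partial t}(\cdot) + \frac{\partial}{\partial x}(\cdot)^{T} F(\bar{x}(t),\bar{u}(t),t) \;\Longrightarrow\; -\frac{\partial}{\partial t}(\cdot) = -\frac{d}{dt}(\cdot) + \frac{\partial}{\partial x}(\cdot)^{T} F(\bar{x}(t),\bar{u}(t),t). \tag{7}$$

Based on the equation above we have that

$$\begin{aligned}
-\frac{\partial}{\partial t} V(\bar{x}(t)+\delta x(t),t;\bar{\nu}+\delta\nu,\bar{t}_f+\delta t_f) ={}& -\frac{d}{dt}\left( V(\bar{x}(t),t;\bar{\nu},\bar{t}_f) + V_x^{T}\delta x(t) + V_\nu^{T}\delta\nu + V_{t_f}\delta t_f + \frac{1}{2}\begin{bmatrix}\delta x(t)^{T} & \delta\nu^{T} & \delta t_f\end{bmatrix}\begin{bmatrix} V_{xx} & V_{x\nu} & V_{xt_f}\\ V_{\nu x} & V_{\nu\nu} & V_{\nu t_f}\\ V_{t_f x} & V_{t_f\nu} & V_{t_f t_f}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}\right)\\
&+ V_x^{T}F + \delta x(t)^{T}V_{xx}F + \delta\nu^{T}V_{\nu x}F + \delta t_f V_{t_f x}F\\
&+ \frac{1}{2}\begin{bmatrix}\delta x(t)^{T} & \delta\nu^{T} & \delta t_f\end{bmatrix}\begin{bmatrix} V_{xxx}F & V_{x\nu x}F & V_{xt_f x}F\\ V_{\nu xx}F & V_{\nu\nu x}F & V_{\nu t_f x}F\\ V_{t_f xx}F & V_{t_f\nu x}F & V_{t_f t_f x}F\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}.
\end{aligned} \tag{8}$$

The next step is to work with the expansion of the right-hand side of the HJB equation in (4). In particular, we have that

$$V_x(\bar{x}(t)+\delta x(t),t;\bar{\nu}+\delta\nu,\bar{t}_f+\delta t_f) \approx V_x(\bar{x}(t),t;\bar{\nu},\bar{t}_f) + V_{xx}\delta x(t) + V_{x\nu}\delta\nu + V_{xt_f}\delta t_f + \frac{1}{2}\begin{bmatrix}\delta x(t)^{T} & \delta\nu^{T} & \delta t_f\end{bmatrix}\begin{bmatrix} V_{xxx} & V_{xx\nu} & V_{xxt_f}\\ V_{x\nu x} & V_{x\nu\nu} & V_{x\nu t_f}\\ V_{xt_f x} & V_{xt_f\nu} & V_{xt_f t_f}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}. \tag{9}$$

In addition, the running cost and the dynamics are expanded as follows:

$$L(x(t),u(t),t) = L(\bar{x}(t)+\delta x(t),\bar{u}(t)+\delta u(t),t) \approx L(\bar{x}(t),\bar{u}(t),t) + L_x^{T}\delta x(t) + L_u^{T}\delta u(t) + \frac{1}{2}\begin{bmatrix}\delta x(t)^{T} & \delta u(t)^{T}\end{bmatrix}\begin{bmatrix} L_{xx} & L_{xu}\\ L_{ux} & L_{uu}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta u(t)\end{bmatrix}, \tag{10}$$

$$F(x(t),u(t),t) = F(\bar{x}(t)+\delta x(t),\bar{u}(t)+\delta u(t),t) \approx F(\bar{x}(t),\bar{u}(t),t) + F_x\delta x(t) + F_u\delta u(t). \tag{11}$$

Therefore, the right-hand side of (4) can be expressed as

$$\begin{aligned}
\min_{\delta u(t)} \Bigg\{ & L(\bar{x}(t),\bar{u}(t),t) + L_x^{T}\delta x(t) + L_u^{T}\delta u(t) + \frac{1}{2}\begin{bmatrix}\delta x(t)^{T} & \delta u(t)^{T}\end{bmatrix}\begin{bmatrix} L_{xx} & L_{xu}\\ L_{ux} & L_{uu}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta u(t)\end{bmatrix}\\
&+ V_x^{T}F + V_x^{T}F_x\delta x(t) + V_x^{T}F_u\delta u(t)\\
&+ \delta x(t)^{T}V_{xx}F + \delta x(t)^{T}V_{xx}F_x\delta x(t) + \delta x(t)^{T}V_{xx}F_u\delta u(t)\\
&+ \delta\nu^{T}V_{\nu x}F + \delta\nu^{T}V_{\nu x}F_x\delta x(t) + \delta\nu^{T}V_{\nu x}F_u\delta u(t)\\
&+ \delta t_f V_{t_f x}F + \delta t_f V_{t_f x}F_x\delta x(t) + \delta t_f V_{t_f x}F_u\delta u(t)\\
&+ \frac{1}{2}\begin{bmatrix}\delta x(t)^{T} & \delta\nu^{T} & \delta t_f\end{bmatrix}\begin{bmatrix} V_{xxx}F & V_{x\nu x}F & V_{xt_f x}F\\ V_{\nu xx}F & V_{\nu\nu x}F & V_{\nu t_f x}F\\ V_{t_f xx}F & V_{t_f\nu x}F & V_{t_f t_f x}F\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix} + \text{H.O.T.} \Bigg\}.
\end{aligned} \tag{12}$$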
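Equating (8) with (12) and cancelling like terms, we get

$$\begin{aligned}
-\frac{d}{dt}\Bigg( & V + V_x^{T}\delta x(t) + V_\nu^{T}\delta\nu + V_{t_f}\delta t_f + \frac{1}{2}\begin{bmatrix}\delta x(t)^{T} & \delta\nu^{T} & \delta t_f\end{bmatrix}\begin{bmatrix} V_{xx} & V_{x\nu} & V_{xt_f}\\ V_{\nu x} & V_{\nu\nu} & V_{\nu t_f}\\ V_{t_f x} & V_{t_f\nu} & V_{t_f t_f}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix}\Bigg)\\
= \min_{\delta u(t)} \Bigg\{ & L + L_x^{T}\delta x(t) + L_u^{T}\delta u(t) + \frac{1}{2}\begin{bmatrix}\delta x(t)^{T} & \delta u(t)^{T}\end{bmatrix}\begin{bmatrix} L_{xx} & L_{xu}\\ L_{ux} & L_{uu}\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta u(t)\end{bmatrix} + V_x^{T}F_x\delta x(t) + V_x^{T}F_u\delta u(t)\\
&+ \delta x(t)^{T}V_{xx}F_x\delta x(t) + \delta x(t)^{T}V_{xx}F_u\delta u(t) + \delta\nu^{T}V_{\nu x}F_x\delta x(t) + \delta\nu^{T}V_{\nu x}F_u\delta u(t)\\
&+ \delta t_f V_{t_f x}F_x\delta x(t) + \delta t_f V_{t_f x}F_u\delta u(t) \Bigg\}.
\end{aligned} \tag{13}$$

To find the $\delta u(t)$ that minimizes the right-hand side of (13), we take its derivative with respect to $\delta u(t)$ and set it to zero:

$$0 = L_u + L_{uu}\delta u(t) + \left(\tfrac{1}{2}L_{ux} + \tfrac{1}{2}L_{xu}^{T} + F_u^{T}V_{xx}\right)\delta x(t) + F_u^{T}V_x + F_u^{T}V_{x\nu}\delta\nu + F_u^{T}V_{xt_f}\delta t_f. \tag{14}$$

The update law for the control is thus given by

$$\delta u(t) = l(t) + K_x(t)\delta x(t) + K_\nu(t)\delta\nu + K_{t_f}(t)\delta t_f, \tag{15}$$

where the terms $l(t)$, $K_x(t)$, $K_\nu(t)$ and $K_{t_f}(t)$ are defined as follows:

$$\begin{aligned}
l(t) &= -L_{uu}^{-1}\left(L_u + F_u^{T}V_x\right), & K_x(t) &= -L_{uu}^{-1}\left(\tfrac{1}{2}L_{ux} + \tfrac{1}{2}L_{xu}^{T} + F_u^{T}V_{xx}\right),\\
K_\nu(t) &= -L_{uu}^{-1}F_u^{T}V_{x\nu}, & K_{t_f}(t) &= -L_{uu}^{-1}F_u^{T}V_{xt_f}.
\end{aligned} \tag{16}$$

Note that $L_{uu}$ is guaranteed to be invertible if the running cost is $L = g(x) + u^{T}Ru$ with $R > 0$. This type of cost is common for mechanical systems in which we would like to minimize the energy cost of the control. Substituting the optimal policy variation $\delta u$ back into the HJB equation results in a set of backward ordinary differential equations that propagate the expansion of the value function, which consists of the terms $V$, $V_x$, $V_\nu$, $V_{t_f}$, $V_{xx}$, $V_{\nu\nu}$, $V_{t_f t_f}$, $V_{x\nu}$, $V_{xt_f}$ and $V_{\nu t_f}$. These backward differential equations are given as follows:

$$\begin{aligned}
-\frac{d}{dt}V &= L - \tfrac{1}{2}\,l^{T}L_{uu}\,l, & -\frac{d}{dt}V_x &= L_x - K_x^{T}L_{uu}\,l + F_x^{T}V_x,\\
-\frac{d}{dt}V_\nu &= K_\nu^{T}L_u, & -\frac{d}{dt}V_{t_f} &= K_{t_f}^{T}L_u,\\
-\frac{d}{dt}V_{xx} &= L_{xx} - K_x^{T}L_{uu}K_x + 2F_x^{T}V_{xx}, & -\frac{d}{dt}V_{\nu\nu} &= -K_\nu^{T}L_{uu}K_\nu,\\
-\frac{d}{dt}V_{t_f t_f} &= -K_{t_f}^{T}L_{uu}K_{t_f}, & -\frac{d}{dt}V_{x\nu} &= L_{xu}K_\nu + F_x^{T}V_{x\nu} + V_{xx}F_uK_\nu,\\
-\frac{d}{dt}V_{xt_f} &= L_{xu}K_{t_f} + F_x^{T}V_{xt_f} + V_{xx}F_uK_{t_f}, & -\frac{d}{dt}V_{\nu t_f} &= K_\nu^{T}F_u^{T}V_{xt_f},
\end{aligned} \tag{17}$$

where all the quantities are evaluated at $(\bar{x}(t),\bar{u}(t),\bar{\nu},\bar{t}_f)$. To solve the equations in (17) numerically, one has to compute the terminal conditions.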
In the next section we present the derivation for the terminal condition and provide an overview of the algorithm.
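The feedforward term and feedback gains in (16), together with the control update (15), can be sketched in a few lines. This is a minimal illustration restricted to a scalar control and a scalar multiplier $\nu$, so that $L_{uu}^{-1}$ is a reciprocal; the function names and all numerical values are assumptions chosen for illustration, not values from the paper.

```python
def control_gains(L_u, L_uu, L_ux, F_u, V_x, V_xx, V_xnu, V_xtf):
    """Evaluate (16) for a scalar control and a scalar multiplier nu.

    L_u, L_uu: floats. L_ux, F_u, V_x, V_xnu, V_xtf: length-n lists.
    V_xx: n x n list of lists.
    """
    n = len(F_u)
    inv = 1.0 / L_uu  # scalar control, so L_uu^{-1} is a reciprocal

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    # F_u^T V_xx is a row vector of length n.
    FuT_Vxx = [sum(F_u[i] * V_xx[i][j] for i in range(n)) for j in range(n)]
    l = -inv * (L_u + dot(F_u, V_x))
    # For a smooth cost L_xu^T = L_ux, so 1/2 L_ux + 1/2 L_xu^T reduces to L_ux.
    K_x = [-inv * (L_ux[j] + FuT_Vxx[j]) for j in range(n)]
    K_nu = -inv * dot(F_u, V_xnu)
    K_tf = -inv * dot(F_u, V_xtf)
    return l, K_x, K_nu, K_tf


def control_update(l, K_x, K_nu, K_tf, dx, dnu, dtf):
    """Control variation from (15)."""
    return l + sum(k * d for k, d in zip(K_x, dx)) + K_nu * dnu + K_tf * dtf


# Illustrative numbers only: two states, control entering the second state,
# quadratic control cost with L_u = 1.0 and L_uu = 2.0.
l, K_x, K_nu, K_tf = control_gains(
    L_u=1.0, L_uu=2.0, L_ux=[0.0, 0.0], F_u=[0.0, 1.0],
    V_x=[1.0, 3.0], V_xx=[[2.0, 1.0], [1.0, 4.0]],
    V_xnu=[1.0, 0.5], V_xtf=[0.0, 2.0],
)
du = control_update(l, K_x, K_nu, K_tf, dx=[0.1, -0.2], dnu=0.4, dtf=0.05)
print(l, K_x, K_nu, K_tf, du)
```

For a vector control the reciprocal would be replaced by a linear solve against $L_{uu}$, which is well posed under the condition $R > 0$ noted above.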
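2.2 Terminal Conditions

The terminal conditions can be determined by the following procedure. From

$$V(x(t_0),t_0;\nu,t_f) = \min_{u(\cdot)} J(x(\cdot),u(\cdot)) = \min_{u(\cdot)} \left\{ \int_{t_0}^{t_f} L(x(t),u(t))\,dt + \Phi(x(t_f),\nu,t_f) \right\}, \tag{18}$$

we have that for any $t \in [t_0,t_f]$,

$$V(x(t),t;\nu,t_f) = \min_{u(\cdot)} J(x(\cdot),u(\cdot)) = \min_{u(\cdot)} \left\{ \int_{t}^{t_f} L(x(s),u(s))\,ds + \Phi(x(t_f),\nu,t_f) \right\}. \tag{19}$$

Therefore,

$$\begin{aligned}
V(\bar{x}(\bar{t}_f)+\delta x(\bar{t}_f),\bar{t}_f;\bar{\nu}+\delta\nu,\bar{t}_f+\delta t_f) ={}& \min_{u(\cdot)} \left\{ \int_{\bar{t}_f}^{\bar{t}_f+\delta t_f} L(x(t),u(t),t)\,dt + \Phi(x(\bar{t}_f+\delta t_f),\bar{\nu}+\delta\nu,\bar{t}_f+\delta t_f) \right\}\\
\approx{}& L(\bar{x}(\bar{t}_f),\bar{u}(\bar{t}_f),\bar{t}_f)\,\delta t_f + \Phi(\bar{x}(\bar{t}_f)+\delta x(\bar{t}_f)+\dot{\bar{x}}(\bar{t}_f)\delta t_f,\bar{\nu}+\delta\nu,\bar{t}_f+\delta t_f)\\
\approx{}& L(\bar{x}(\bar{t}_f),\bar{u}(\bar{t}_f),\bar{t}_f)\,\delta t_f + \Phi(\bar{x}(\bar{t}_f),\bar{\nu},\bar{t}_f) + \Phi_x^{T}\left(\delta x(\bar{t}_f) + F(\bar{x}(\bar{t}_f),\bar{u}(\bar{t}_f),\bar{t}_f)\,\delta t_f\right)
\end{aligned} \tag{20}$$

$$\qquad + \Phi_\nu(\bar{x}(\bar{t}_f),\bar{\nu},\bar{t}_f)^{T}\delta\nu + \Phi_{t_f}(\bar{x}(\bar{t}_f),\bar{\nu},\bar{t}_f)\,\delta t_f + \frac{1}{2}\begin{bmatrix}\delta x+F\delta t_f\\ \delta\nu\\ \delta t_f\end{bmatrix}^{T}\begin{bmatrix}\Phi_{xx} & \Phi_{x\nu} & \Phi_{xt_f}\\ \Phi_{\nu x} & \Phi_{\nu\nu} & \Phi_{\nu t_f}\\ \Phi_{t_f x} & \Phi_{t_f\nu} & \Phi_{t_f t_f}\end{bmatrix}\begin{bmatrix}\delta x+F\delta t_f\\ \delta\nu\\ \delta t_f\end{bmatrix}, \tag{21}$$

$$\begin{aligned}
V(\bar{x}(\bar{t}_f)+\delta x(\bar{t}_f),\bar{t}_f;\bar{\nu}+\delta\nu,\bar{t}_f+\delta t_f) ={}& \Phi + \Phi_x^{T}\delta x + \Phi_\nu^{T}\delta\nu + \left(L + \Phi_x^{T}F + \Phi_{t_f}\right)\delta t_f\\
&+ \frac{1}{2}\begin{bmatrix}\delta x(t)^{T} & \delta\nu^{T} & \delta t_f\end{bmatrix}\begin{bmatrix}\Phi_{xx} & \Phi_{x\nu} & \Phi_{xt_f}+\Phi_{xx}F\\ \Phi_{\nu x} & \Phi_{\nu\nu} & \Phi_{\nu t_f}+\Phi_{\nu x}F\\ \Phi_{t_f x}+F^{T}\Phi_{xx} & \Phi_{t_f\nu}+F^{T}\Phi_{x\nu} & \Phi_{t_f t_f}+2\Phi_{t_f x}F+F^{T}\Phi_{xx}F\end{bmatrix}\begin{bmatrix}\delta x(t)\\ \delta\nu\\ \delta t_f\end{bmatrix},
\end{aligned} \tag{22}$$

where $x(\bar{t}_f+\delta t_f)$ is evaluated by $\bar{x}(\bar{t}_f)+\delta x(\bar{t}_f)+\dot{\bar{x}}(\bar{t}_f)\delta t_f$ and $\dot{\bar{x}}(\bar{t}_f) = F(\bar{x}(\bar{t}_f),\bar{u}(\bar{t}_f),\bar{t}_f)$. The arguments of the functions in the last line of equations are the same as those in the previous equations and are thus omitted. The minimization with respect to $u$ is dropped on the third line of equations because $\int_{\bar{t}_f}^{\bar{t}_f+\delta t_f} L(x(t),u(t))\,dt$ is evaluated by $L(\bar{x}(\bar{t}_f),\bar{u}(\bar{t}_f))\,\delta t_f$, and the latter is only a function of the nominal control. Note that this approximation is relatively rough, since we approximate $\int_{\bar{t}_f}^{\bar{t}_f+\delta t_f} L(x(t),u(t))\,dt$ by $L(\bar{x}(\bar{t}_f),\bar{u}(\bar{t}_f))\,\delta t_f$ instead of by $L(x(\bar{t}_f),u(\bar{t}_f))\,\delta t_f$ followed by an expansion in $x(\bar{t}_f)$ and $u(\bar{t}_f)$. The simulation results nevertheless suggest that this level of approximation is good enough.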
Hence, att = t f, the terminal conditions are V( x( t f ), t f ; ν, t f ) = Φ( x( t f ), ν, t f ), V x ( x( t f ), t f ; ν, t f ) = Φ x ( x( t f ), ν, t f ), V ν ( x( t f ), t f ; ν, t f ) = Φ ν ( x( t f ), ν, t f ), V tf ( x( t f ), t f ; ν, t f ) = L( x( t f ),ū( t f ))+Φ x ( x( t f ), ν, t f ) T F( x( t f ),ū( t f ))+Φ tf ( x( t f ), ν, t f ), V xx ( x( t f ), t f ; ν, t f ) = Φ xx ( x( t f ), ν, t f ), V νν ( x( t f ), t f ; ν, t f ) = Φ νν ( x( t f ), ν, t f ), V tf t f ( x( t f ), t f ; ν, t f ) = Φ tf t f ( x( t f ), ν, t f )+Φ tf x( x( t f ), ν, t f )F( x( t f ),ū( t f )) +F( x( t f ),ū( t f )) T Φ xx ( x( t f ), ν, t f )F( x( t f ),ū( t f )), V xν ( x( t f ), t f ; ν, t f ) = Φ xν ( x( t f ), ν, t f ), V xtf ( x( t f ), t f ; ν, t f ) = Φ xtf ( x( t f ), ν, t f )+Φ xx ( x( t f ), ν, t f )F( x( t f ),ū( t f )), V νtf ( x( t f ), t f ; ν, t f ) = Φ νtf ( x( t f ), ν, t f )+Φ νx ( x( t f ), ν, t f )F( x( t f ),ū( t f )). (3)
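As a rough illustration of how the terminal conditions above might be assembled in practice, the sketch below evaluates them for a vector state and a single scalar multiplier $\nu$. The function name, argument layout and the example terminal function $\Phi = \nu(x_1 - 1)$ are assumptions made for this sketch, and the factor 2 on the $\Phi_{t_f x}F$ term is the one obtained by expanding the quadratic form in (21).

```python
def terminal_conditions(L, F, Phi, Phi_x, Phi_nu, Phi_tf,
                        Phi_xx, Phi_xnu, Phi_xtf,
                        Phi_nunu, Phi_nutf, Phi_tftf):
    """Terminal values of V and its derivatives for a scalar multiplier nu.

    L, Phi, Phi_nu, Phi_tf, Phi_nunu, Phi_nutf, Phi_tftf: floats.
    F, Phi_x, Phi_xnu, Phi_xtf: length-n lists. Phi_xx: n x n list of lists.
    """
    n = len(F)

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    Phi_xx_F = [dot(Phi_xx[i], F) for i in range(n)]  # the vector Phi_xx F
    return {
        "V": Phi,
        "V_x": list(Phi_x),
        "V_nu": Phi_nu,
        "V_tf": L + dot(Phi_x, F) + Phi_tf,
        "V_xx": [row[:] for row in Phi_xx],
        "V_nunu": Phi_nunu,
        # Phi_{tf x} is the transpose of Phi_{x tf}, so Phi_{tf x} F is a dot product.
        "V_tftf": Phi_tftf + 2.0 * dot(Phi_xtf, F) + dot(F, Phi_xx_F),
        "V_xnu": list(Phi_xnu),
        "V_xtf": [a + b for a, b in zip(Phi_xtf, Phi_xx_F)],
        # For scalar nu, Phi_{nu x} F equals the dot product of Phi_{x nu} with F.
        "V_nutf": Phi_nutf + dot(Phi_xnu, F),
    }


# Hypothetical example: two states with F = [x2, u], running cost
# L = 1 + 0.5 * R * u**2 and terminal function Phi = nu * (x1 - 1).
R, nu, u = 1.0, -1.5, 0.2
x = [0.9, 0.4]
tc = terminal_conditions(
    L=1.0 + 0.5 * R * u * u, F=[x[1], u],
    Phi=nu * (x[0] - 1.0), Phi_x=[nu, 0.0], Phi_nu=x[0] - 1.0, Phi_tf=0.0,
    Phi_xx=[[0.0, 0.0], [0.0, 0.0]], Phi_xnu=[1.0, 0.0], Phi_xtf=[0.0, 0.0],
    Phi_nunu=0.0, Phi_nutf=0.0, Phi_tftf=0.0,
)
print(tc["V_tf"], tc["V_nutf"])
```

These values would seed a backward integration of (17) from $t = \bar{t}_f$ down to $t_0$.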
Given the boundary conditions of the value function and its derivatives, we can back-propagate the differential equations derived earlier to find their values. Our next step is to update the control through (15), and in order to do so we need the update law for $\delta\nu$ and $\delta t_f$. We follow the derivation in [1] and set

$$\begin{bmatrix}\delta\nu\\ \delta t_f\end{bmatrix} = -\zeta \begin{bmatrix} V_{\nu\nu}(\bar{x}(t_0),t_0;\bar{\nu},\bar{t}_f) & V_{\nu t_f}(\bar{x}(t_0),t_0;\bar{\nu},\bar{t}_f)\\ V_{t_f\nu}(\bar{x}(t_0),t_0;\bar{\nu},\bar{t}_f) & V_{t_f t_f}(\bar{x}(t_0),t_0;\bar{\nu},\bar{t}_f)\end{bmatrix}^{-1} \begin{bmatrix} V_\nu(\bar{x}(\bar{t}_f),\bar{t}_f;\bar{\nu},\bar{t}_f)\\ V_{t_f}(\bar{x}(\bar{t}_f),\bar{t}_f;\bar{\nu},\bar{t}_f)\end{bmatrix}, \tag{24}$$

where $\zeta \in [0,1]$ is introduced to ensure that the updates of $\delta\nu$ and $\delta t_f$ are not too large.

3 Simulation Results

3.1 Double Integrator

We first apply the algorithm to a simple system, namely the double integrator, and compare our numerical result with the analytical solution to verify the algorithm. The dynamics are given by $\dot{x}_1 = x_2$ and $\dot{x}_2 = u$. The initial condition is $x(0) = [x_1(0); x_2(0)] = [0; 0]$. The cost is $J = \int_0^{t_f} \left(1 + \tfrac{1}{2}Ru^{2}\right)dt$, and the terminal constraint is $x_1(t_f) = 1$. Introducing the Lagrange multiplier $\nu$, the cost can be reformulated as $J = \nu(x_1(t_f) - 1) + \int_0^{t_f} \left(1 + \tfrac{1}{2}Ru^{2}\right)dt$. Different values of $R$ lead to different optimal costs and terminal times. In particular, for $R = 0.1, 1, 10$ the corresponding terminal times are $0.819$, $1.46$ and $2.59$, respectively. The optimal control and $t_f$ per iteration for $R = 0.1, 1, 10$ are shown in Figure 1.

We now solve this problem analytically to verify the simulation results. Denoting the co-states by $\lambda = [\lambda_1; \lambda_2]$, the Hamiltonian is given by

$$H = 1 + \tfrac{1}{2}Ru^{2} + \lambda_1 x_2 + \lambda_2 u. \tag{25}$$

The co-states satisfy the adjoint equations

$$\dot{\lambda}_1 = -\frac{\partial H}{\partial x_1} = 0, \tag{26}$$

$$\dot{\lambda}_2 = -\frac{\partial H}{\partial x_2} = -\lambda_1. \tag{27}$$

Using Pontryagin's minimum principle, the optimal control $u^{*}$ can be calculated from $0 = \partial H/\partial u = Ru^{*} + \lambda_2$. Hence, $u^{*} = -\lambda_2/R$. The transversality conditions are $\lambda_1(t_f) = \nu$, $\lambda_2(t_f) = 0$ and $H(t_f) = 0$. Given this information, we are ready to solve the problem. From $\dot{\lambda}_1 = 0$ and $\lambda_1(t_f) = \nu$, we get $\lambda_1(t) \equiv \nu$ for $t \in [0,t_f]$. Then from $\dot{\lambda}_2 = -\lambda_1$ and $\lambda_2(t_f) = 0$, we have $\lambda_2(t) = \nu(t_f - t)$. Therefore,

$$u^{*} = -\frac{\lambda_2}{R} = \frac{\nu}{R}(t - t_f). \tag{28}$$

Note that the optimal control is a linear function of $t$.
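The two remaining unknowns, $\nu$ and $t_f$, are fixed by the terminal constraint and by $H(t_f) = 0$. The following sketch checks this numerically under stated assumptions: it takes $x(0) = [0, 0]$, a target $x_1(t_f) = 1$ and a unit time weight in the cost, applies the linear control law (28), and finds $t_f$ by bisection. The function names, step count and bisection bracket are choices made for this sketch and are not part of the paper's algorithm.

```python
def simulate(nu, tf, R, steps=200):
    """RK4 rollout of x1' = x2, x2' = u with u(t) = (nu / R) * (t - tf), x(0) = [0, 0]."""
    def f(t, x):
        return [x[1], (nu / R) * (t - tf)]

    h = tf / steps
    t, x = 0.0, [0.0, 0.0]
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, [x[i] + h / 2 * k1[i] for i in range(2)])
        k3 = f(t + h / 2, [x[i] + h / 2 * k2[i] for i in range(2)])
        k4 = f(t + h, [x[i] + h * k3[i] for i in range(2)])
        x = [x[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)]
        t += h
    return x


def hamiltonian_at_tf(tf, R, target=1.0):
    """Return (H(tf), nu) with nu chosen so that x1(tf) equals the target."""
    # x1(tf) is linear in nu, so one rollout with nu = 1 determines nu.
    x1_unit, _ = simulate(1.0, tf, R)
    nu = target / x1_unit
    _, x2 = simulate(nu, tf, R)
    # At t = tf both u and lambda_2 vanish, so H = 1 + lambda_1 * x2 = 1 + nu * x2.
    return 1.0 + nu * x2, nu


def solve_tf(R, lo=0.1, hi=10.0, iters=80):
    """Bisection on H(tf) = 0; H is negative for short horizons, positive for long ones."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if hamiltonian_at_tf(mid, R)[0] < 0.0:
            lo = mid
        else:
            hi = mid
    tf = 0.5 * (lo + hi)
    return tf, hamiltonian_at_tf(tf, R)[1]


for R in (0.1, 1.0, 10.0):
    tf, nu = solve_tf(R)
    print(R, tf, nu)
```

Because the state trajectories are cubic polynomials in $t$, the RK4 rollout is exact up to rounding, so the bisection accuracy is what limits the result.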
Furthermore, the boundary conditions yield $t_f = \left(\tfrac{9}{2}R\right)^{1/4}$ and $\nu = -\tfrac{2}{3}t_f = -\tfrac{2}{3}\left(\tfrac{9}{2}R\right)^{1/4}$. When $R = 0.1, 1, 10$, we obtain $t_f = 0.819, 1.46, 2.59$, respectively, which is consistent with the numerical simulation results presented in the control plots in Figure 1.

Figure 1: Panels (a), (c) and (e) show the numerical optimal control for $R = 0.1, 1, 10$, respectively. Panels (b), (d) and (f) present $t_f$ per iteration for $R = 0.1, 1, 10$, respectively. Dashed red lines represent the corresponding analytical optimal $t_f$.

3.2 Cart Pole

In this subsection, we apply our algorithm to the inverted pendulum on a cart, also known as the cart pole problem, with M = the mass of the cart, m = and l =. the mass and length of the pendulum, $g = 9.8$ the gravitational acceleration and $u$ the force applied to the cart. The state is $x = [x, \dot{x}, \theta, \dot{\theta}]$. The goal is to bring the state from $x(0) = [0, 0, \pi, 0]$ to $p = [0, 0, 0, 0]$, which represents the case where the pendulum is pointing straight up. The cost function is given by

$$J = \int_0^{t_f} \left[ c_t + (x-p)^{T}Q(x-p) + u^{T}R_u u \right] dt + \lambda^{T}\left([x_3(t_f); x_4(t_f)] - [p_3; p_4]\right),$$

where Q = diag{,,,} and R_u =.. Initial values are given as t_f =, λ = [,]^T. The multipliers γ =. and ε =.. We run the algorithm for 3 iterations and the convergence is achieved at around th iteration. Figure 3a presents the optimal control $u$. The corresponding optimal trajectories of the states are depicted in Figure 2. Cost and $t_f$ per iteration are shown in Figures 3b and 3c, respectively.

Figure 2: Optimal trajectories of the states ($x$, $\dot{x}$, $\theta$, $\dot{\theta}$) in blue. Red lines represent the desired terminal states.

Figure 3: Cost and $t_f$ per iteration for the cart pole system. (a) Optimal control of the cart pole system. (b) Cost per iteration. (c) $t_f$ per iteration.
3.3 Quadrotor

The dynamic model of the quadrotor includes 16 states: 3 for the position ($r = (x,y,z)^{T}$), 3 for the Euler angles ($\Phi = (\phi,\theta,\psi)^{T}$), 3 for the velocity ($\dot{r} = (\dot{x},\dot{y},\dot{z})^{T}$), 3 for the body angular rates ($\dot{\Phi} = (p,q,r)^{T}$) and 4 for the motor speeds ($\Omega = (\omega_1,\omega_2,\omega_3,\omega_4)^{T}$). The corresponding dynamics of the quadrotor are given as follows:

$$\frac{dx}{dt} = f(x) + Gu, \tag{29}$$

where $x = [r, \Phi, \dot{r}, \dot{\Phi}, \Omega]^{T} \in \mathbb{R}^{16}$ and $u = (u_1,u_2,u_3,u_4)^{T} \in \mathbb{R}^{4}$ is the control vector, in which $u_1$ represents the thrust force and $u_2$, $u_3$, $u_4$ represent the pitching, rolling and yawing moments, respectively. The corresponding cost function is defined as

$$J = \int_0^{t_f} \left[ c_t + (x-p)^{T}Q(x-p) + u^{T}R_u u \right] dt + (x(t_f)-p)^{T}Q_f(x(t_f)-p) + \lambda^{T}\left([x_1(t_f);\ldots;x_{16}(t_f)] - [p_1;\ldots;p_{16}]\right),$$

where $p = [p_1;\ldots;p_{16}] \in \mathbb{R}^{16}$ denotes the desired terminal states. In the simulation, we set

{ 7,i =,,3;,i = 3; 6,i = 4,...,9; p(i) = and Q_f(i,i) =, otherwise, (30),i =,,;, otherwise,

and all the off-diagonal terms are assigned to. Q =.Q_f. R_u =.I. γ =. and ε =.. The desired terminal state $p$ is chosen for the quadrotor to execute the take-off maneuver. iterations are included to ensure convergence, and the cost per iteration is presented in Figure 5b. The corresponding optimal state trajectories are shown in Figure 4. The optimal control $u$ is illustrated in Figure 5a, and $t_f$ per iteration is presented in Figure 5c.

Figure 4: Optimal trajectories of the states of the quadrotor ($x_1$ through $x_{16}$) in blue. Dashed red lines represent the desired terminal states.

Figure 5: Cost and $t_f$ per iteration for the quadrotor. (a) Optimal controls (Control 1 through Control 4). (b) Cost per iteration. (c) $t_f$ per iteration.

Acknowledgments

References

[1] D. H. Jacobson and D. Q. Mayne. Differential dynamic programming. Elsevier Sci. Publ., 1970.
[2] Sergey Levine and Pieter Abbeel. Learning neural network policies with guided policy search under unknown dynamics. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27. Curran Associates, Inc., 2014.
[3] Y. Pan and E. Theodorou. Probabilistic differential dynamic programming. In Advances in Neural Information Processing Systems (NIPS), pages 1907-1915, 2014.
[4] J. Morimoto, G. Zeglin, and C. G. Atkeson. Minimax differential dynamic programming: Application to a biped walking robot. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), volume 2. IEEE, 2003.
[5] Y. Tassa, N. Mansard, and E. Todorov. Control-limited differential dynamic programming. In 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014.
[6] Y. Tassa, T. Erez, and W. D. Smart. Receding horizon differential dynamic programming. In NIPS, 2007.
[7] P. Abbeel, A. Coates, M. Quigley, and A. Y. Ng. An application of reinforcement learning to aerobatic helicopter flight. Advances in Neural Information Processing Systems (NIPS), 19:1, 2007.
[8] E. Todorov and W. Li. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In American Control Conference, 2005. IEEE, 2005.
[9] E. Theodorou, Y. Tassa, and E. Todorov. Stochastic differential dynamic programming. In American Control Conference (ACC), 2010, pages 1125-1132. IEEE, 2010.
More informationRobotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning
Robotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning Stefan Schaal Max-Planck-Institute for Intelligent Systems Tübingen, Germany & Computer Science, Neuroscience, &
More informationHomework Solution # 3
ECSE 644 Optimal Control Feb, 4 Due: Feb 17, 4 (Tuesday) Homework Solution # 3 1 (5%) Consider the discrete nonlinear control system in Homework # For the optimal control and trajectory that you have found
More informationMinimization with Equality Constraints
Path Constraints and Numerical Optimization Robert Stengel Optimal Control and Estimation, MAE 546, Princeton University, 2015 State and Control Equality Constraints Pseudoinverse State and Control Inequality
More informationOPTIMAL CONTROL. Sadegh Bolouki. Lecture slides for ECE 515. University of Illinois, Urbana-Champaign. Fall S. Bolouki (UIUC) 1 / 28
OPTIMAL CONTROL Sadegh Bolouki Lecture slides for ECE 515 University of Illinois, Urbana-Champaign Fall 2016 S. Bolouki (UIUC) 1 / 28 (Example from Optimal Control Theory, Kirk) Objective: To get from
More informationSWEEP METHOD IN ANALYSIS OPTIMAL CONTROL FOR RENDEZ-VOUS PROBLEMS
J. Appl. Math. & Computing Vol. 23(2007), No. 1-2, pp. 243-256 Website: http://jamc.net SWEEP METHOD IN ANALYSIS OPTIMAL CONTROL FOR RENDEZ-VOUS PROBLEMS MIHAI POPESCU Abstract. This paper deals with determining
More informationRobotics. Control Theory. Marc Toussaint U Stuttgart
Robotics Control Theory Topics in control theory, optimal control, HJB equation, infinite horizon case, Linear-Quadratic optimal control, Riccati equations (differential, algebraic, discrete-time), controllability,
More informationAutonomous Helicopter Flight via Reinforcement Learning
Autonomous Helicopter Flight via Reinforcement Learning Authors: Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, Shankar Sastry Presenters: Shiv Ballianda, Jerrolyn Hebert, Shuiwang Ji, Kenley Malveaux, Huy
More informationLearning the optimal state-feedback using deep networks
Learning the optimal state-feedback using deep networks Carlos Sánchez-Sánchez and Dario Izzo Advanced Concepts Team European Space Agency Noordwijk, The Netherlands Email: carlos.sanchez@esa.int, dario.izzo@esa.int
More informationStochastic optimal control theory
Stochastic optimal control theory Bert Kappen SNN Radboud University Nijmegen the Netherlands July 5, 2008 Bert Kappen Introduction Optimal control theory: Optimize sum of a path cost and end cost. Result
More informationProper Orthogonal Decomposition. POD for PDE Constrained Optimization. Stefan Volkwein
Proper Orthogonal Decomposition for PDE Constrained Optimization Institute of Mathematics and Statistics, University of Constance Joined work with F. Diwoky, M. Hinze, D. Hömberg, M. Kahlbacher, E. Kammann,
More informationAdaptive Nonlinear Control Allocation of. Non-minimum Phase Uncertain Systems
2009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 10-12, 2009 ThA18.3 Adaptive Nonlinear Control Allocation of Non-minimum Phase Uncertain Systems Fang Liao, Kai-Yew Lum,
More informationDynamics and Control of Rotorcraft
Dynamics and Control of Rotorcraft Helicopter Aerodynamics and Dynamics Abhishek Department of Aerospace Engineering Indian Institute of Technology, Kanpur February 3, 2018 Overview Flight Dynamics Model
More informationRobot Control Basics CS 685
Robot Control Basics CS 685 Control basics Use some concepts from control theory to understand and learn how to control robots Control Theory general field studies control and understanding of behavior
More informationMinimum Angular Acceleration Control of Articulated Body Dynamics
Minimum Angular Acceleration Control of Articulated Body Dynamics Javier R. Movellan UCSD Abstract As robots find applications in daily life conditions it becomes important to develop controllers that
More informationHigh-Order Local Dynamic Programming
High-Order Local Dynamic Programming Yuval Tassa Interdisciplinary Center for Neural Computation Hebrew University, Jerusalem, Israel tassa@alice.nc.hui.ac.il Emanuel Todorov Applied Mathematics and Computer
More information2 Lyapunov Stability. x(0) x 0 < δ x(t) x 0 < ɛ
1 2 Lyapunov Stability Whereas I/O stability is concerned with the effect of inputs on outputs, Lyapunov stability deals with unforced systems: ẋ = f(x, t) (1) where x R n, t R +, and f : R n R + R n.
More informationOPTIMAL SPACECRAF1 ROTATIONAL MANEUVERS
STUDIES IN ASTRONAUTICS 3 OPTIMAL SPACECRAF1 ROTATIONAL MANEUVERS JOHNL.JUNKINS Texas A&M University, College Station, Texas, U.S.A. and JAMES D.TURNER Cambridge Research, Division of PRA, Inc., Cambridge,
More informationBayesian Decision Theory in Sensorimotor Control
Bayesian Decision Theory in Sensorimotor Control Matthias Freiberger, Martin Öttl Signal Processing and Speech Communication Laboratory Advanced Signal Processing Matthias Freiberger, Martin Öttl Advanced
More informationOn the stability of receding horizon control with a general terminal cost
On the stability of receding horizon control with a general terminal cost Ali Jadbabaie and John Hauser Abstract We study the stability and region of attraction properties of a family of receding horizon
More informationStability Analysis of Optimal Adaptive Control under Value Iteration using a Stabilizing Initial Policy
Stability Analysis of Optimal Adaptive Control under Value Iteration using a Stabilizing Initial Policy Ali Heydari, Member, IEEE Abstract Adaptive optimal control using value iteration initiated from
More informationOptimization-Based Control
Optimization-Based Control Richard M. Murray Control and Dynamical Systems California Institute of Technology DRAFT v2.1a, February 15, 21 c California Institute of Technology All rights reserved. This
More informationTheory and Implementation of Biomimetic Motor Controllers
Theory and Implementation of Biomimetic Motor Controllers Thesis submitted for the degree of Doctor of Philosophy by Yuval Tassa Submitted to the Senate of the Hebrew University of Jerusalem February 2011
More informationAn Introduction to Reinforcement Learning
An Introduction to Reinforcement Learning Shivaram Kalyanakrishnan shivaram@csa.iisc.ernet.in Department of Computer Science and Automation Indian Institute of Science August 2014 What is Reinforcement
More informationOptimal control of nonlinear systems with input constraints using linear time varying approximations
ISSN 1392-5113 Nonlinear Analysis: Modelling and Control, 216, Vol. 21, No. 3, 4 412 http://dx.doi.org/1.15388/na.216.3.7 Optimal control of nonlinear systems with input constraints using linear time varying
More informationLeveraging Dynamical System Theory to Incorporate Design Constraints in Multidisciplinary Design
Leveraging Dynamical System Theory to Incorporate Design Constraints in Multidisciplinary Design Bradley A. Steinfeldt and Robert D. Braun Georgia Institute of Technology, Atlanta, GA 3332-15 This work
More informationParametric Integrated Perturbation Analysis - Sequential Quadratic Programming Approach for Minimum-time Model Predictive Control
Preprints of the 9th World Congress The International Federation of Automatic Control Cape Town, South Africa. August 4-9, 4 Parametric Integrated Perturbation Analysis - Sequential Quadratic Programming
More informationHJB equations. Seminar in Stochastic Modelling in Economics and Finance January 10, 2011
Department of Probability and Mathematical Statistics Faculty of Mathematics and Physics, Charles University in Prague petrasek@karlin.mff.cuni.cz Seminar in Stochastic Modelling in Economics and Finance
More informationTHE HAMILTON-JACOBI THEORY FOR SOLVING OPTIMAL FEEDBACK CONTROL PROBLEMS WITH GENERAL BOUNDARY CONDITIONS
THE HAMILTON-JACOBI THEORY FOR SOLVING OPTIMAL FEEDBACK CONTROL PROBLEMS WITH GENERAL BOUNDARY CONDITIONS by Chandeok Park A dissertation submitted in partial fulfillment of the requirements for the degree
More informationwith Application to Autonomous Vehicles
Nonlinear with Application to Autonomous Vehicles (Ph.D. Candidate) C. Silvestre (Supervisor) P. Oliveira (Co-supervisor) Institute for s and Robotics Instituto Superior Técnico Portugal January 2010 Presentation
More informationEE C128 / ME C134 Feedback Control Systems
EE C128 / ME C134 Feedback Control Systems Lecture Additional Material Introduction to Model Predictive Control Maximilian Balandat Department of Electrical Engineering & Computer Science University of
More informationCS491/691: Introduction to Aerial Robotics
CS491/691: Introduction to Aerial Robotics Topic: Midterm Preparation Dr. Kostas Alexis (CSE) Areas of Focus Coordinate system transformations (CST) MAV Dynamics (MAVD) Navigation Sensors (NS) State Estimation
More informationOptimal Control - Homework Exercise 3
Optimal Control - Homework Exercise 3 December 17, 2010 In this exercise two different problems will be considered, first the so called Zermelo problem the problem is to steer a boat in streaming water,
More informationPhysics 106a, Caltech 13 November, Lecture 13: Action, Hamilton-Jacobi Theory. Action-Angle Variables
Physics 06a, Caltech 3 November, 08 Lecture 3: Action, Hamilton-Jacobi Theory Starred sections are advanced topics for interest and future reference. The unstarred material will not be tested on the final
More informationNonlinear and Neural Network-based Control of a Small Four-Rotor Aerial Robot
Nonlinear and Neural Network-based Control of a Small Four-Rotor Aerial Robot Holger Voos Abstract Small four-rotor aerial robots, so called quadrotor UAVs, have an enormous potential for all kind of neararea
More informationA DYNAMIC GAME APPROACH TO TRAINING ROBUST
A DYNAMIC GAME APPROACH TO TRAINING ROBUST DEEP POLICIES. Anonymous authors Paper under double-blind review ABSTRACT We present a method for evaluating the sensitivity of deep reinforcement learning (RL)
More informationPoincaré-Map-Based Reinforcement Learning For Biped Walking
Poincaré-Map-Based Reinforcement Learning For Biped Walking Jun Morimoto,, Jun Nakanishi,, Gen Endo,3, and Gordon Cheng, Computational Brain Project, ICORP, JST ATR Computational Neuroscience Laboratories
More informationComputing Optimized Nonlinear Sliding Surfaces
Computing Optimized Nonlinear Sliding Surfaces Azad Ghaffari and Mohammad Javad Yazdanpanah Abstract In this paper, we have concentrated on real systems consisting of structural uncertainties and affected
More informationPhysics 351 Wednesday, April 22, 2015
Physics 351 Wednesday, April 22, 2015 HW13 due Friday. The last one! You read Taylor s Chapter 16 this week (waves, stress, strain, fluids), most of which is Phys 230 review. Next weekend, you ll read
More informationCalculus of Variations
16.323 Lecture 5 Calculus of Variations Calculus of Variations Most books cover this material well, but Kirk Chapter 4 oes a particularly nice job. x(t) x* x*+ αδx (1) x*- αδx (1) αδx (1) αδx (1) t f t
More informationLecture 11 Optimal Control
Lecture 11 Optimal Control The Maximum Principle Revisited Examples Numerical methods/optimica Examples, Lab 3 Material Lecture slides, including material by J. Åkesson, Automatic Control LTH Glad & Ljung,
More informationReinforcement Learning and Control
CS9 Lecture notes Andrew Ng Part XIII Reinforcement Learning and Control We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make
More informationLinearly-Solvable Stochastic Optimal Control Problems
Linearly-Solvable Stochastic Optimal Control Problems Emo Todorov Applied Mathematics and Computer Science & Engineering University of Washington Winter 2014 Emo Todorov (UW) AMATH/CSE 579, Winter 2014
More information1. Type your solutions. This homework is mainly a programming assignment.
THE UNIVERSITY OF TEXAS AT SAN ANTONIO EE 5243 INTRODUCTION TO CYBER-PHYSICAL SYSTEMS H O M E W O R K S # 6 + 7 Ahmad F. Taha October 22, 2015 READ Homework Instructions: 1. Type your solutions. This homework
More informationHIGH-ORDER STATE FEEDBACK GAIN SENSITIVITY CALCULATIONS USING COMPUTATIONAL DIFFERENTIATION
(Preprint) AAS 12-637 HIGH-ORDER STATE FEEDBACK GAIN SENSITIVITY CALCULATIONS USING COMPUTATIONAL DIFFERENTIATION Ahmad Bani Younes, James Turner, Manoranjan Majji, and John Junkins INTRODUCTION A nonlinear
More informationCalculus of Variations Summer Term 2015
Calculus of Variations Summer Term 2015 Lecture 14 Universität des Saarlandes 24. Juni 2015 c Daria Apushkinskaya (UdS) Calculus of variations lecture 14 24. Juni 2015 1 / 20 Purpose of Lesson Purpose
More informationin Bounded Domains Ariane Trescases CMLA, ENS Cachan
CMLA, ENS Cachan Joint work with Yan GUO, Chanwoo KIM and Daniela TONON International Conference on Nonlinear Analysis: Boundary Phenomena for Evolutionnary PDE Academia Sinica December 21, 214 Outline
More informationTheoretical Justification for LQ problems: Sufficiency condition: LQ problem is the second order expansion of nonlinear optimal control problems.
ES22 Lecture Notes #11 Theoretical Justification for LQ problems: Sufficiency condition: LQ problem is the second order expansion of nonlinear optimal control problems. J = φ(x( ) + L(x,u,t)dt ; x= f(x,u,t)
More informationRobot Dynamics - Rotary Wing UAS: Control of a Quadrotor
Robot Dynamics Rotary Wing AS: Control of a Quadrotor 5-85- V Marco Hutter, Roland Siegwart and Thomas Stastny Robot Dynamics - Rotary Wing AS: Control of a Quadrotor 7..6 Contents Rotary Wing AS. Introduction
More informationDeterministic Optimal Control
Online Appendix A Deterministic Optimal Control As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality. Albert Einstein (1879
More informationarxiv: v1 [q-fin.mf] 5 Jul 2016
arxiv:607.037v [q-fin.mf] 5 Jul 206 Dynamic optimization and its relation to classical and quantum constrained systems. Mauricio Contreras, Rely Pellicer and Marcelo Villena. July 6, 206 We study the structure
More information