Stochastic Differential Dynamic Programming


Evangelos Theodorou, Yuval Tassa & Emo Todorov

Abstract: Although there has been a significant amount of work in the area of stochastic optimal control theory towards the development of new algorithms, the problem of how to control a stochastic nonlinear system remains an open research topic. Recent iterative linear quadratic optimal control methods (iLQG) [1], [2] handle control- and state-multiplicative noise, but they are derived based on a first-order approximation of the dynamics. On the other hand, methods such as Differential Dynamic Programming expand the dynamics up to the second order, but so far they can handle only nonlinear systems with additive noise. In this work we present a generalization of the classic Differential Dynamic Programming algorithm. We assume the existence of state- and control-multiplicative process noise and proceed to derive the second-order expansion of the cost-to-go, finding the correction terms that arise from the stochastic assumption. Despite having quartic and cubic terms in the initial expression, we show that these vanish, leaving us with the same quadratic structure as standard DDP.

I. INTRODUCTION

Optimal Control describes the choice of actions that minimize future costs under the constraint of state space dynamics. In the continuous nonlinear case, local methods are one of the very few classes of algorithms which successfully solve general, high-dimensional Optimal Control problems. These methods are based on the observation that optimal solutions form extremal trajectories, i.e. are solutions to a calculus-of-variations problem. Differential Dynamic Programming, or DDP, is a powerful local dynamic programming algorithm which generates both open- and closed-loop control policies along a trajectory. The DDP algorithm, introduced in [3], computes a quadratic approximation of the cost-to-go and, correspondingly, a local linear-feedback controller. The state space dynamics are also quadratically approximated around a trajectory. As in the Linear-Quadratic-Gaussian case, fixed additive noise has no effect on the controllers generated by DDP, and when described in the literature the dynamics are usually assumed deterministic.

Past work on nonlinear optimal control allows the use of DDP in problems with state and control constraints [4], [5]. In that work, state and control constraints are expanded up to the first order and the KKT conditions are formulated, resulting in an unconstrained quadratic optimization problem.

E. Theodorou is with the Computational Learning and Motor Control Lab, Departments of Computer Science and Neuroscience, University of Southern California, etheodor@usc.edu
Y. Tassa is with the Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem, Israel, tassa@alice.nc.huji.ac.il
E. Todorov is with the Department of Computer Science and Engineering and the Department of Applied Mathematics, University of Washington, Seattle, WA, todorov@cs.washington.edu

The application of DDP to real, high-dimensional robotic control tasks created the need for extensions that relax the requirement of accurate models. In this vein, DDP is extended in [6] to a min-max formulation, whose goal is to make DDP robust against model uncertainties and hybrid dynamics in a biped robotic locomotion task. In [7], implementation improvements regarding the evaluation of the value function allow the use of DDP in a receding-horizon mode.
In all past work related to DDP, the optimal control problem is considered deterministic. While the impartiality to noise can be considered a feature if the noise is indeed fixed, in many cases a varying noise covariance is an important feature of the problem, as with control-multiplicative noise, which is common in biological systems [1]. This latter case was addressed within the iterative-LQG framework [1], in which the optimal controller is derived based on a first-order approximation of the dynamics and a second-order approximation of the cost-to-go. However, for iterative nonlinear optimal control algorithms such as DDP, in which a second-order expansion of the dynamics is considered, the more general case of state- and/or control-multiplicative noise appears to have never been tackled.

In this paper we derive the DDP algorithm for state- and control-multiplicative process noise. We find that despite the potential cubic and quartic terms, these cancel out, allowing us to maintain the quadratic form of the approximation. Moreover, we show how the new generalized formulation of Stochastic Differential Dynamic Programming (SDDP) recovers the standard deterministic DDP solution, as well as the special cases in which only state-multiplicative or only control-multiplicative noise is considered.

The remainder of this work is organized as follows: in the next section we define the SDDP problem. In Section III the second-order expansion of the cost-to-go is presented, and in Section IV the optimal controls are derived and the overall SDDP algorithm is presented. In addition, in Section IV we show how SDDP recovers the deterministic solution as well as the cases of only control-multiplicative, only state-multiplicative, and only additive noise. Finally, in Sections V and VI simulation results and conclusions are discussed.

II. STOCHASTIC DIFFERENTIAL DYNAMIC PROGRAMMING

We consider the class of nonlinear stochastic optimal control problems with cost

$$v^{\pi}(x,t) = E\left[ h(x_T) + \int_{t_0}^{T} \ell\big(\tau, x(\tau), \pi(\tau, x(\tau))\big)\, d\tau \right] \qquad (1)$$

subject to the stochastic dynamics of the form

$$dx = f(x,u)\,dt + F(x,u)\,d\omega \qquad (2)$$

where $x \in \mathbb{R}^n$ is the state, $u \in \mathbb{R}^p$ is the control and $d\omega \in \mathbb{R}^m$ is Brownian noise. The term $h(x_T)$ in the cost function is the terminal cost, while $\ell(\tau, x(\tau), \pi(\tau, x(\tau)))$ is the instantaneous cost rate, a function of the state $x$ and the control policy $\pi(\tau, x(\tau))$. The cost-to-go $v^{\pi}(x,t)$ is defined as the expected cost accumulated over the time horizon $(t_0, T)$, starting from the initial state $x_t$ and ending at the final state $x_T$.
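Before proceeding with the derivation, it may help to fix the problem ingredients computationally. The sketch below is our own illustration, not part of the paper: a minimal container for the cost (1) and dynamics (2), together with an Euler-Maruyama rollout. All names (SOCProblem, rollout, sigma) are hypothetical placeholders.

```python
import numpy as np

# A minimal container for the stochastic optimal control problem of Eqs. (1)-(2):
#   dx = f(x, u) dt + F(x, u) dw,  cost = E[ h(x_T) + integral of ell(x, u) ].
# Illustrative sketch only; the paper does not prescribe an API.
class SOCProblem:
    def __init__(self, f, F, ell, h, dt, horizon):
        self.f = f            # drift:      f(x, u) -> (n,)
        self.F = F            # diffusion:  F(x, u) -> (n, m)
        self.ell = ell        # running cost ell(x, u) -> scalar
        self.h = h            # terminal cost h(x) -> scalar
        self.dt = dt          # discretization interval
        self.N = horizon      # number of time steps

    def rollout(self, x0, controls, sigma=0.0, rng=None):
        """Euler-Maruyama rollout of the dynamics under a control sequence."""
        rng = rng or np.random.default_rng()
        x, xs = np.asarray(x0, float), []
        for u in controls:
            xs.append(x)
            xi = sigma * rng.standard_normal(self.F(x, u).shape[1])
            x = x + self.f(x, u) * self.dt + self.F(x, u) @ xi * np.sqrt(self.dt)
        xs.append(x)
        return np.array(xs)
```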

To enhance the readability of our derivations we write the dynamics as a function $\Phi \in \mathbb{R}^n$ of the state, the control and the instantiation of the noise:

$$\Phi(x,u,d\omega) \equiv f(x,u)\,dt + F(x,u)\,d\omega \qquad (3)$$

It will sometimes be convenient to write the matrix $F(x,u) \in \mathbb{R}^{n \times m}$ in terms of its rows or columns:

$$F(x,u) = \begin{bmatrix} F_r^1(x,u) \\ \vdots \\ F_r^n(x,u) \end{bmatrix} = \Big[ F_c^1(x,u), \dots, F_c^m(x,u) \Big]$$

Every element of the vector $\Phi(x,u,d\omega) \in \mathbb{R}^n$ can now be expressed as:

$$\Phi^j(x,u,d\omega) = f^j(x,u)\,\delta t + F_r^j(x,u)\,d\omega$$

Given a nominal trajectory of states and controls $(\bar{x}, \bar{u})$, we expand the dynamics around this trajectory to second order:

$$\Phi(\bar{x}+\delta x, \bar{u}+\delta u, d\omega) = \Phi(\bar{x},\bar{u},d\omega) + \nabla_x \Phi\,\delta x + \nabla_u \Phi\,\delta u + \mathcal{O}(\delta x, \delta u, d\omega)$$

where $\mathcal{O}(\delta x, \delta u, d\omega) \in \mathbb{R}^n$ (not to be confused with big-O) contains all the second-order terms in the deviations of states, controls and noise. Writing this term element-wise,

$$\mathcal{O}(\delta x, \delta u, d\omega) = \begin{bmatrix} \mathcal{O}^1(\delta x, \delta u, d\omega) \\ \vdots \\ \mathcal{O}^n(\delta x, \delta u, d\omega) \end{bmatrix},$$

we can express the elements $\mathcal{O}^j(\delta x, \delta u, d\omega) \in \mathbb{R}$ as:

$$\mathcal{O}^j(\delta x, \delta u, d\omega) = \frac{1}{2} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix}^T \begin{bmatrix} \nabla_{xx}\Phi^j & \nabla_{xu}\Phi^j \\ \nabla_{ux}\Phi^j & \nabla_{uu}\Phi^j \end{bmatrix} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix}.$$

We would now like to express the derivatives of $\Phi$ in terms of the given quantities. Beginning with the first-order terms, we find that:

$$\nabla_x \Phi = \nabla_x f(x,u)\,\delta t + \sum_{i=1}^{m} \nabla_x F_c^i\, d\omega_t^i, \qquad \nabla_u \Phi = \nabla_u f(x,u)\,\delta t + \sum_{i=1}^{m} \nabla_u F_c^i\, d\omega_t^i$$

Next we find the second-order derivatives:

$$\nabla_{xx}\Phi^j = \nabla_{xx} f^j(x,u)\,\delta t + \nabla_{xx} F_r^j(x,u)\,d\omega_t$$
$$\nabla_{uu}\Phi^j = \nabla_{uu} f^j(x,u)\,\delta t + \nabla_{uu} F_r^j(x,u)\,d\omega_t$$
$$\nabla_{ux}\Phi^j = \nabla_{ux} f^j(x,u)\,\delta t + \nabla_{ux} F_r^j(x,u)\,d\omega_t$$
$$\nabla_{xu}\Phi^j = \big(\nabla_{ux}\Phi^j\big)^T$$

After expanding the dynamics up to the second order we can transition from continuous to discrete time. More precisely, the discrete-time dynamics are formulated as:

$$\delta x_{t+\delta t} = \Big( I_{n \times n} + \nabla_x f(x,u)\,\delta t + \sum_{i=1}^{m} \nabla_x F_c^i\, \xi_t^i \sqrt{\delta t} \Big)\,\delta x_t + \Big( \nabla_u f(x,u)\,\delta t + \sum_{i=1}^{m} \nabla_u F_c^i\, \xi_t^i \sqrt{\delta t} \Big)\,\delta u_t + F(x,u)\sqrt{\delta t}\,\xi_t + \mathcal{O}_d(\delta x, \delta u, \xi, \delta t)$$

with $\delta t = t_{k+1} - t_k$ corresponding to a small discretization interval. Note that the term $\mathcal{O}_d$ is the equivalent of $\mathcal{O}$ but in discrete time, and it is therefore now a function of $\delta t$. In fact, since $\mathcal{O}_d$ contains all the second-order expansion terms of the dynamics, it contains second-order derivatives with respect to state and control expressed as follows:

$$\nabla_{xx}\Phi^j = \nabla_{xx} f^j(x,u)\,\delta t + \nabla_{xx} F_r^j(x,u)\,\xi_t \sqrt{\delta t}$$
$$\nabla_{uu}\Phi^j = \nabla_{uu} f^j(x,u)\,\delta t + \nabla_{uu} F_r^j(x,u)\,\xi_t \sqrt{\delta t}$$
$$\nabla_{ux}\Phi^j = \nabla_{ux} f^j(x,u)\,\delta t + \nabla_{ux} F_r^j(x,u)\,\xi_t \sqrt{\delta t}$$
$$\nabla_{xu}\Phi^j = \big(\nabla_{ux}\Phi^j\big)^T$$

The random variable $\xi \in \mathbb{R}^m$ is zero-mean and Gaussian distributed with covariance $\Sigma = \sigma_{d\omega}^2 I_{m \times m}$. The discretized dynamics can be written in a more compact form by grouping the state-, control- and noise-dependent terms, and leaving the second-order term separate:

$$\delta x_{t+\delta t} = A_t\,\delta x_t + B_t\,\delta u_t + \Gamma_t\,\xi_t + \mathcal{O}_d(\delta x, \delta u, \xi, \delta t) \qquad (4)$$

where the matrices $A_t \in \mathbb{R}^{n \times n}$, $B_t \in \mathbb{R}^{n \times p}$ and $\Gamma_t \in \mathbb{R}^{n \times m}$ are defined as

$$A_t = I_{n \times n} + \nabla_x f(x,u)\,\delta t, \qquad B_t = \nabla_u f(x,u)\,\delta t, \qquad \Gamma_t = \big[\Gamma^1\ \Gamma^2\ \cdots\ \Gamma^m\big]\sqrt{\delta t}$$

with each column $\Gamma^i \in \mathbb{R}^n$ defined as $\Gamma^i = \nabla_u F_c^i\,\delta u_t + \nabla_x F_c^i\,\delta x_t + F_c^i$. For the derivation of the optimal control it is useful to express $\Gamma_t$ as the sum of terms that depend on the variations in states and controls and terms that are independent of such variations. More precisely, we will have that:

$$\Gamma_t = \Delta\Gamma_t(\delta x, \delta u) + F(\bar{x},\bar{u})\sqrt{\delta t} \qquad (5)$$

where each column vector of $\Delta\Gamma_t$ is defined as $\Delta\Gamma^i(\delta x, \delta u) = \big(\nabla_u F_c^i\,\delta u_t + \nabla_x F_c^i\,\delta x_t\big)\sqrt{\delta t}$.
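As a concrete reading of the discretization above, the following sketch (our own illustration, with finite-difference derivatives standing in for analytic ones) assembles the quantities $A_t$, $B_t$ and $\Gamma_t$ of Eqs. (4)-(5); the helper names jacobian and discrete_terms are ours, not the paper's.

```python
import numpy as np

def jacobian(fun, z, eps=1e-6):
    """Central-difference Jacobian of fun: R^k -> R^n at the point z."""
    z = np.asarray(z, float)
    cols = []
    for i in range(z.size):
        dz = np.zeros_like(z); dz[i] = eps
        cols.append((fun(z + dz) - fun(z - dz)) / (2 * eps))
    return np.stack(cols, axis=-1)

def discrete_terms(f, F, x, u, dx, du, dt):
    """A_t, B_t and Gamma_t = [Gamma^1 ... Gamma^m] sqrt(dt) from Eqs. (4)-(5)."""
    n, m = F(x, u).shape
    A = np.eye(n) + jacobian(lambda z: f(z, u), x) * dt   # I + grad_x f dt
    B = jacobian(lambda z: f(x, z), u) * dt               # grad_u f dt
    cols = []
    for i in range(m):
        Fci = lambda zx, zu: F(zx, zu)[:, i]              # i-th diffusion column
        dFdx = jacobian(lambda z: Fci(z, u), x)           # grad_x F_c^i
        dFdu = jacobian(lambda z: Fci(x, z), u)           # grad_u F_c^i
        cols.append(dFdu @ du + dFdx @ dx + Fci(x, u))    # Gamma^i
    Gamma = np.stack(cols, axis=1) * np.sqrt(dt)
    return A, B, Gamma
```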

III. VALUE FUNCTION SECOND ORDER APPROXIMATION

As in classical DDP, the derivation of stochastic DDP requires the second-order expansion of the cost-to-go function around a nominal trajectory $\bar{x}$:

$$V(\bar{x}+\delta x) = V(\bar{x}) + V_x^T\,\delta x + \frac{1}{2}\,\delta x^T V_{xx}\,\delta x \qquad (6)$$

Substitution of the discretized dynamics (4) into the second-order value function expansion (6) results in:

$$V(\bar{x}_{t+\delta t} + \delta x_{t+\delta t}) = V(\bar{x}_{t+\delta t}) + V_x^T\big(A_t \delta x_t + B_t \delta u_t + \Gamma_t \xi + \mathcal{O}_d\big) + \frac{1}{2}\big(A_t \delta x_t + B_t \delta u_t + \Gamma_t \xi + \mathcal{O}_d\big)^T V_{xx} \big(A_t \delta x_t + B_t \delta u_t + \Gamma_t \xi + \mathcal{O}_d\big) \qquad (7)$$

Next we will compute $E\big[V(\bar{x}_{t+\delta t} + \delta x_{t+\delta t})\big]$, which requires the calculation of the expectation of all the terms that appear in the equation above; this is what the rest of the analysis is dedicated to. More precisely, in the next two sections we will calculate the expectations of the terms

$$E\big[V_x^T\,\delta x_{t+\delta t}\big] \qquad (8)$$

and

$$E\big[\delta x_{t+\delta t}^T V_{xx}\,\delta x_{t+\delta t}\big] \qquad (9)$$

where the state deviation $\delta x_{t+\delta t}$ at time instant $t+\delta t$ is given by the linearized dynamics:

$$\delta x_{t+\delta t} = A_t\,\delta x_t + B_t\,\delta u_t + \Gamma_t\,\xi + \mathcal{O}_d \qquad (10)$$

The analysis in Section III-A consists of the computation of the expectations of the four terms which result from the substitution of the linearized dynamics (10) into (8). In Section III-B we compute the expectations of the 16 terms that result from the substitution of (10) into (9).

A. Expectation of the first-order term of the value function expansion, $V_x^T\,\delta x_{t+\delta t}$

The expectation of the first-order term results in:

$$E\big[V_x^T\big(A_t \delta x_t + B_t \delta u_t + \Gamma_t \xi_t + \mathcal{O}_d\big)\big] = V_x^T\big(A_t \delta x_t + B_t \delta u_t + E[\mathcal{O}_d]\big) \qquad (11)$$

In order to find the expectation of $\mathcal{O}_d \in \mathbb{R}^n$ we need the expectation of each of the elements of this column vector. Thus we will have that:

$$E\big[\mathcal{O}^j(\delta x, \delta u, \xi_t, \delta t)\big] = \frac{1}{2}\begin{bmatrix} \delta x \\ \delta u \end{bmatrix}^T E\begin{bmatrix} \nabla_{xx}\Phi^j & \nabla_{xu}\Phi^j \\ \nabla_{ux}\Phi^j & \nabla_{uu}\Phi^j \end{bmatrix}\begin{bmatrix} \delta x \\ \delta u \end{bmatrix} = \frac{\delta t}{2}\begin{bmatrix} \delta x \\ \delta u \end{bmatrix}^T \begin{bmatrix} \nabla_{xx} f^j & \nabla_{xu} f^j \\ \nabla_{ux} f^j & \nabla_{uu} f^j \end{bmatrix}\begin{bmatrix} \delta x \\ \delta u \end{bmatrix} \qquad (12)$$

Therefore we will have that:

$$E\big[V_x^T\,\delta x_{t+\delta t}\big] = V_x^T\big(A_t \delta x_t + B_t \delta u_t + \tilde{\mathcal{O}}_d\big) \qquad (13)$$

where the term $\tilde{\mathcal{O}}_d$ is defined as:

$$\tilde{\mathcal{O}}_d(\delta x, \delta u, \delta t) = \begin{bmatrix} \tilde{\mathcal{O}}^1(\delta x, \delta u, \delta t) \\ \vdots \\ \tilde{\mathcal{O}}^n(\delta x, \delta u, \delta t) \end{bmatrix}, \qquad \tilde{\mathcal{O}}^j = E\big[\mathcal{O}^j\big] \qquad (14)$$

The term $V_x^T \tilde{\mathcal{O}}_d$ is quadratic in the variations of the states and controls $(\delta x, \delta u)$, and thus there are matrices $\tilde{F} \in \mathbb{R}^{n \times n}$, $\tilde{Z} \in \mathbb{R}^{p \times p}$ and $\tilde{L} \in \mathbb{R}^{p \times n}$ such that:

$$V_x^T \tilde{\mathcal{O}}_d = \frac{1}{2}\,\delta x^T \tilde{F}\,\delta x + \frac{1}{2}\,\delta u^T \tilde{Z}\,\delta u + \delta u^T \tilde{L}\,\delta x \qquad (15)$$

with

$$\tilde{F} = \delta t \sum_{j=1}^{n} \nabla_{xx} f^j\, V_{x_j} \qquad (16)$$

$$\tilde{Z} = \delta t \sum_{j=1}^{n} \nabla_{uu} f^j\, V_{x_j} \qquad (17)$$

$$\tilde{L} = \delta t \sum_{j=1}^{n} \nabla_{ux} f^j\, V_{x_j} \qquad (18)$$

From the analysis above we can see that the expectation $E[V_x^T\,\delta x_{t+\delta t}]$ is a quadratic function with respect to the variations in states and controls $(\delta x, \delta u)$. As we will prove in the next section, the expectation $E[\delta x_{t+\delta t}^T V_{xx}\,\delta x_{t+\delta t}]$ is also a quadratic function of $(\delta x, \delta u)$.
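The correction terms (16)-(18) are simple contractions of the drift Hessians with the value gradient. A minimal sketch, assuming those Hessians are available as dense arrays (names and layout are ours):

```python
import numpy as np

def drift_corrections(fxx, fuu, fux, Vx, dt):
    """F_tilde, Z_tilde, L_tilde of Eqs. (16)-(18).

    fxx: (n, n, n) array with fxx[j] = grad_xx f^j; similarly fuu: (n, p, p)
    and fux: (n, p, n). Vx: (n,) gradient of the cost-to-go at the next state.
    """
    F_t = dt * np.einsum('jab,j->ab', fxx, Vx)   # sum_j grad_xx f^j * V_{x_j}
    Z_t = dt * np.einsum('jab,j->ab', fuu, Vx)
    L_t = dt * np.einsum('jab,j->ab', fux, Vx)
    return F_t, Z_t, L_t
```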

4 E δx t+δtv xxδx t+δt = E + E + E 3 + E 4 + E 5 9 where the terms E, E, E 3, E 4 and E 5 are defined as follows: E = E δx t A t V xx A t δx t + E t Bt V xx B t t +E δx t A t V xx B t t + E t Bt V xx A t δx t 0 E = E ξ t Γ t V xx A t δx + E ξ t Γ t V xx B t + E δx A t V xx Γ t ξ t + E Bt V xx Γ t ξ t + E ξ t Γ t V xx Γ t ξ t E 3 = E O d V xx Γ t ξ t + E ξ t Γ t V xx O d E 4 = E δx t A t V xx O d + E t B t V xx O d + E O d V xx B t t + E O d V xx A t δx t E 5 = E O d V xx O d 3 4 In the first category we have all these terms that depend neither on ξ t and nor on O d δx,, ξ t, δt. hese are the terms that define E. he second category E includes terms that depend on ξ t but not on O d δx,, ξ t, δt. In the third class E 3, there are terms that depends both on O d δx,, ξ t, δt and ξ t. In the fourth class E 4, we have terms that depend on O d δx,, ξ t, δt. Finally in the fifth class E 5, we have all these terms that depend on O d δx,, ξ t, δt quadratically. he expectation operator will cancel all the terms that include noise up the first order. Moreover, the mean operator for terms that depend on the noise quadratically will result in covariance. We compute the expectations of all the terms in the E class. More precisely we will have that: E δx t A t V xx A t δx t = δx t A t V xx A t δx t E t B t V xx B t t = t B t V xx B t t E δx t A t V xx B t t = δx t A t V xx B t t E t B t V xx A t δx t = t B t V xx A t δx t 5 We continue our analysis by calculating all the terms in the class E. More presicely we will have: E ξ t Γ t V xx A t δx = 0 E ξ t Γ t V xx B t = 0 E ξ t Γ t V xx A t δx = 0 E ξ t Γ t V xx B t = 0 6 he terms above are equal to zero since the brownian noise is zero mean. he expectation of the term that does not depend on O d δx,, ξ t, δt and it is quadratic with respect to the noise is given as follows: E ξ t Γ t V xx Γ t ξ t = trace Γ t V xx Γ t Σ ω 7 Since matrix Γ depends on variations in states and controls δx, we can further massage the expressions above so that it can be expressed as quadratic functions in δx,. trace Γ t V xx Γ t Σ ω = σ dω δt 8 Γ trace V xx Γ Γ m Γ m 9 = σdωδt Γ i V xx Γ i 30 he last equation is written in the form: trace Γ t V xx Γ t Σ ω = δx Fδx + δx L + Z + Ũ + δx S + γ 3 Where the terms F R n m, L R n p, Z R p p, Ũ R p, S R n and γ R are defined as follows: F = σ δt x F c i V xx x F c i 3 L = σ δt Z = σ δt Ũ = σ δt S = σ δt γ = σ δt x F c i V xx u F c i 33 u F c i V xx u F c i 34 u F c i V xx F c i 35 x F c i V xx F c i 36 F c i V xx F c i 37 For those terms that depend both on O d δx,, ξ t, δt and on the noise class E 3 we will have: E O d V xx Γ t ξ t = E trace Vxx Γ t ξ t O d = trace V xx Γ t E ξ t O d 38 By writing the term O d δx,, ξ t, δt in a matrix form and putting the noise vector insight the this matrix we have: E O d V xx Γ t ξ t = 39 trace V xx Γ t E [ ξ t O ξ t O n ]

For the terms that depend both on $\mathcal{O}_d$ and on the noise (class $E_3$) we will have:

$$E\big[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t\big] = E\big[\mathrm{trace}\big(V_{xx}\,\Gamma_t\,\xi_t\,\mathcal{O}_d^T\big)\big] = \mathrm{trace}\big(V_{xx}\,\Gamma_t\,E\big[\xi_t\,\mathcal{O}_d^T\big]\big) \qquad (38)$$

By writing the term $\mathcal{O}_d$ in matrix form and putting the noise vector inside this matrix, we have:

$$E\big[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t\big] = \mathrm{trace}\Big(V_{xx}\,\Gamma_t\,E\big[\xi_t\,\mathcal{O}^1\ \cdots\ \xi_t\,\mathcal{O}^n\big]\Big) \qquad (39)$$

Calculation of the expectation above requires finding the terms $E\big[\sqrt{\delta t}\,\xi_t\,\mathcal{O}^j\big]$. More precisely, we will have:

$$E\big[\sqrt{\delta t}\,\xi_t\,\mathcal{O}^j\big] = \frac{1}{2} E\big[\sqrt{\delta t}\,\xi_t\,\delta x^T \nabla_{xx}\Phi^j\,\delta x\big] + \frac{1}{2} E\big[\sqrt{\delta t}\,\xi_t\,\delta u^T \nabla_{uu}\Phi^j\,\delta u\big] + E\big[\sqrt{\delta t}\,\xi_t\,\delta u^T \nabla_{ux}\Phi^j\,\delta x\big] \qquad (40)$$

We first calculate the term:

$$E\big[\sqrt{\delta t}\,\xi_t\,\delta x^T \nabla_{xx}\Phi^j\,\delta x\big] = E\big[\sqrt{\delta t}\,\xi_t\,\delta x^T \big(\nabla_{xx} f^j\,\delta t + \nabla_{xx} F_r^j\,\xi_t\,\sqrt{\delta t}\big)\,\delta x\big] \qquad (41)$$

The term $E\big[\sqrt{\delta t}\,\xi_t\,\delta x^T \nabla_{xx} f^j\,\delta t\,\delta x\big] = 0$, since it depends linearly on the noise and $E[\xi_t] = 0$. The second term depends quadratically on the noise, and thus the expectation operator yields the variance of the noise. We follow the analysis:

$$E\big[\sqrt{\delta t}\,\xi_t\,\delta x^T \nabla_{xx}\Phi^j\,\delta x\big] = E\big[\sqrt{\delta t}\,\xi_t\,\delta x^T \nabla_{xx} F_r^j\,\xi_t\,\sqrt{\delta t}\,\delta x\big] \qquad (42)$$

Since $\xi_t = (\xi^1, \dots, \xi^m)^T$ and $F_r^i = (F^{i1}, \dots, F^{im})$, we will have that:

$$E\big[\sqrt{\delta t}\,\xi_t\,\delta x^T \nabla_{xx}\Phi^i\,\delta x\big] = \delta t\,E\Big[\xi_t \sum_{j=1}^{m} \xi^j\,\delta x^T \nabla_{xx} F^{ij}\,\delta x\Big] \qquad (43)-(44)$$

The term $\sum_{j=1}^{m} \xi^j\,\delta x^T \nabla_{xx} F^{ij}\,\delta x$ is scalar, and it multiplies each one of the elements of the noise vector:

$$\delta t\,E \begin{bmatrix} \xi^1 \sum_{j=1}^{m} \xi^j\,\delta x^T \nabla_{xx} F^{ij}\,\delta x \\ \vdots \\ \xi^m \sum_{j=1}^{m} \xi^j\,\delta x^T \nabla_{xx} F^{ij}\,\delta x \end{bmatrix} \qquad (45)$$

Since $E[\xi^i \xi^i] = \sigma_{d\omega}^2$ and $E[\xi^i \xi^j] = 0$ for $i \neq j$, we can show that:

$$E\big[\sqrt{\delta t}\,\xi_t\,\delta x^T \nabla_{xx}\Phi^i\,\delta x\big] = \sigma_{d\omega}^2\,\delta t \begin{bmatrix} \delta x^T \nabla_{xx} F^{i1}\,\delta x \\ \vdots \\ \delta x^T \nabla_{xx} F^{im}\,\delta x \end{bmatrix} \qquad (46)$$

In a similar way we can show that:

$$E\big[\sqrt{\delta t}\,\xi_t\,\delta u^T \nabla_{uu}\Phi^i\,\delta u\big] = \sigma_{d\omega}^2\,\delta t \begin{bmatrix} \delta u^T \nabla_{uu} F^{i1}\,\delta u \\ \vdots \\ \delta u^T \nabla_{uu} F^{im}\,\delta u \end{bmatrix} \qquad (47)$$

and

$$E\big[\sqrt{\delta t}\,\xi_t\,\delta u^T \nabla_{ux}\Phi^i\,\delta x\big] = \sigma_{d\omega}^2\,\delta t \begin{bmatrix} \delta u^T \nabla_{ux} F^{i1}\,\delta x \\ \vdots \\ \delta u^T \nabla_{ux} F^{im}\,\delta x \end{bmatrix} \qquad (48)$$
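Identity (46) can also be checked numerically. The following Monte-Carlo sketch is our own sanity check with arbitrary test values, comparing the sample average on the left-hand side of (46) against its analytic right-hand side:

```python
import numpy as np

# Monte-Carlo sanity check of Eq. (46) -- our own illustration, not from the
# paper.  We draw many noise samples and compare the empirical mean of
# sqrt(dt) xi * (dx' grad_xx F_r^i xi sqrt(dt) dx) with the analytic result.
rng = np.random.default_rng(0)
n, m, dt, sigma = 3, 2, 0.01, 0.5             # arbitrary test dimensions
dx = rng.standard_normal(n)
Fxx = rng.standard_normal((m, n, n))          # Fxx[j] stands in for grad_xx F^{ij}
Fxx = (Fxx + Fxx.transpose(0, 2, 1)) / 2      # symmetrize the Hessians

quad = np.einsum('a,jab,b->j', dx, Fxx, dx)   # dx' grad_xx F^{ij} dx, j = 1..m
xi = sigma * rng.standard_normal((200_000, m))
s = xi @ quad                                 # sum_j xi^j dx' grad_xx F^{ij} dx
mc = np.mean(dt * xi * s[:, None], axis=0)    # E[sqrt(dt) xi * sqrt(dt) s]
exact = sigma**2 * dt * quad                  # right-hand side of Eq. (46)
print(mc, exact)                              # the two vectors agree closely
```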

Since we have calculated all the terms of expression (40), we can proceed with the computation of (38). According to the analysis above, the term $E[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t]$ can be written as follows:

$$E\big[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t\big] = \mathrm{trace}\big(V_{xx}\,\Gamma_t\,(M + N + G)\big) \qquad (49)$$

where the matrices $M \in \mathbb{R}^{m \times n}$, $N \in \mathbb{R}^{m \times n}$ and $G \in \mathbb{R}^{m \times n}$ are defined as follows:

$$M = \frac{\sigma_{d\omega}^2\,\delta t}{2} \begin{bmatrix} \delta x^T \nabla_{xx} F^{11}\,\delta x & \cdots & \delta x^T \nabla_{xx} F^{1n}\,\delta x \\ \vdots & & \vdots \\ \delta x^T \nabla_{xx} F^{m1}\,\delta x & \cdots & \delta x^T \nabla_{xx} F^{mn}\,\delta x \end{bmatrix} \qquad (50)$$

Similarly,

$$N = \sigma_{d\omega}^2\,\delta t \begin{bmatrix} \delta x^T \nabla_{xu} F^{11}\,\delta u & \cdots & \delta x^T \nabla_{xu} F^{1n}\,\delta u \\ \vdots & & \vdots \\ \delta x^T \nabla_{xu} F^{m1}\,\delta u & \cdots & \delta x^T \nabla_{xu} F^{mn}\,\delta u \end{bmatrix} \qquad (51)$$

and

$$G = \frac{\sigma_{d\omega}^2\,\delta t}{2} \begin{bmatrix} \delta u^T \nabla_{uu} F^{11}\,\delta u & \cdots & \delta u^T \nabla_{uu} F^{1n}\,\delta u \\ \vdots & & \vdots \\ \delta u^T \nabla_{uu} F^{m1}\,\delta u & \cdots & \delta u^T \nabla_{uu} F^{mn}\,\delta u \end{bmatrix} \qquad (52)$$

Based on (5), the term $\Gamma_t$ depends on $\Delta\Gamma_t$, which is a function of the variations in states and controls up to the 1st order. In addition, the matrices $M$, $N$ and $G$ are functions of the deviations in states and controls up to the 2nd order. The product of $\Delta\Gamma_t$ with each one of the matrices $M$, $N$ and $G$ therefore results in 3rd-order terms that can be neglected. By neglecting these terms we can show that:

$$E\big[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t\big] = \mathrm{trace}\big(V_{xx}\,(\Delta\Gamma_t + F)\,(M + N + G)\big) = \mathrm{trace}\big(V_{xx}\,F\,(M + N + G)\big) \qquad (53)$$

Each element $(i,j)$ of the product $C = V_{xx} F$ can be expressed as $C_{i,j} = \sum_{r=1}^{n} (V_{xx})_{i,r} F_{r,j}$, where $C \in \mathbb{R}^{n \times m}$. Furthermore, the element $(\mu,\nu)$ of the product $H = C M$ is formulated as $H_{\mu,\nu} = \sum_{k=1}^{m} C_{\mu,k} M_{k,\nu}$, with $H \in \mathbb{R}^{n \times n}$. Thus, the term $\mathrm{trace}(V_{xx} F M)$ can now be expressed as:

$$\mathrm{trace}\big(V_{xx} F M\big) = \sum_{l=1}^{n} H_{l,l} = \sum_{l=1}^{n} \sum_{k=1}^{m} C_{l,k}\,M_{k,l} = \sum_{l=1}^{n} \sum_{k=1}^{m} \sum_{r=1}^{n} (V_{xx})_{l,r}\,F_{r,k}\,M_{k,l} \qquad (54)$$

Since $M_{k,l} = \frac{\sigma_{d\omega}^2\,\delta t}{2}\,\delta x^T \nabla_{xx} F^{kl}\,\delta x$, the vectors $\delta x^T$ and $\delta x$ do not depend on $k, l, r$ and can be taken outside the sum. Thus we can show that:

$$\mathrm{trace}\big(V_{xx} F M\big) = \delta x^T \left( \frac{\sigma_{d\omega}^2\,\delta t}{2} \sum_{l=1}^{n} \sum_{k=1}^{m} \sum_{r=1}^{n} (V_{xx})_{l,r}\,F_{r,k}\,\nabla_{xx} F^{kl} \right) \delta x = \delta x^T \tilde{M}\,\delta x \qquad (55)$$

where $\tilde{M} \in \mathbb{R}^{n \times n}$ is defined as:

$$\tilde{M} = \frac{\sigma_{d\omega}^2\,\delta t}{2} \sum_{l=1}^{n} \sum_{k=1}^{m} \sum_{r=1}^{n} (V_{xx})_{l,r}\,F_{r,k}\,\nabla_{xx} F^{kl} \qquad (56)$$

By following the same algebraic steps it can be shown that:

$$\mathrm{trace}\big(V_{xx} F N\big) = \delta x^T \tilde{N}\,\delta u \qquad (57)$$

with the matrix $\tilde{N} \in \mathbb{R}^{n \times p}$ defined as:

$$\tilde{N} = \sigma_{d\omega}^2\,\delta t \sum_{l=1}^{n} \sum_{k=1}^{m} \sum_{r=1}^{n} (V_{xx})_{l,r}\,F_{r,k}\,\nabla_{xu} F^{kl} \qquad (58)$$

and

$$\mathrm{trace}\big(V_{xx} F G\big) = \delta u^T \tilde{G}\,\delta u \qquad (59)$$

with the matrix $\tilde{G} \in \mathbb{R}^{p \times p}$ defined as:

$$\tilde{G} = \frac{\sigma_{d\omega}^2\,\delta t}{2} \sum_{l=1}^{n} \sum_{k=1}^{m} \sum_{r=1}^{n} (V_{xx})_{l,r}\,F_{r,k}\,\nabla_{uu} F^{kl} \qquad (60)$$

Thus the term $E[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t]$ is formulated as:

$$E\big[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t\big] = \delta x^T \tilde{M}\,\delta x + \delta u^T \tilde{G}\,\delta u + \delta x^T \tilde{N}\,\delta u \qquad (61)$$

Similarly, we can show that:

$$E\big[\xi_t^T \Gamma_t^T V_{xx}\,\mathcal{O}_d\big] = \delta x^T \tilde{M}\,\delta x + \delta u^T \tilde{G}\,\delta u + \delta x^T \tilde{N}\,\delta u \qquad (62)$$
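Equations (56), (58) and (60) all reduce to contracting $C = V_{xx}F$ against the stacked Hessians of the elements of $F$. A sketch under that assumption (array layout and names are ours):

```python
import numpy as np

def second_order_noise_terms(Fxx, Fuu, Fxu, F, Vxx, sigma2, dt):
    """M_tilde (56), N_tilde (58), G_tilde (60) -- a sketch in our notation.

    Fxx[k, l]: (n, n) Hessian grad_xx of the F element indexed (k, l) as in
    Eq. (50); similarly Fuu[k, l]: (p, p) and Fxu[k, l]: (n, p).
    F: (n, m), Vxx: (n, n).
    """
    n, m = F.shape
    C = Vxx @ F                                  # C_{l,k} = sum_r Vxx_{l,r} F_{r,k}
    M = sum(C[l, k] * Fxx[k][l] for k in range(m) for l in range(n))
    N = sum(C[l, k] * Fxu[k][l] for k in range(m) for l in range(n))
    G = sum(C[l, k] * Fuu[k][l] for k in range(m) for l in range(n))
    return (0.5 * sigma2 * dt * M,               # M_tilde, with the 1/2 of (50)
            sigma2 * dt * N,                     # N_tilde
            0.5 * sigma2 * dt * G)               # G_tilde, with the 1/2 of (52)
```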

Next we find the expectations of all the terms that depend on $\mathcal{O}_d$ but not on the noise (class $E_4$). Consequently, we will have that:

$$E\big[\delta x_t^T A_t^T V_{xx}\,\mathcal{O}_d\big] = \delta x_t^T A_t^T V_{xx}\,\tilde{\mathcal{O}}_d = 0, \qquad E\big[\delta u_t^T B_t^T V_{xx}\,\mathcal{O}_d\big] = \delta u_t^T B_t^T V_{xx}\,\tilde{\mathcal{O}}_d = 0,$$
$$E\big[\mathcal{O}_d^T V_{xx}\,A_t \delta x_t\big] = \tilde{\mathcal{O}}_d^T V_{xx}\,A_t \delta x_t = 0, \qquad E\big[\mathcal{O}_d^T V_{xx}\,B_t \delta u_t\big] = \tilde{\mathcal{O}}_d^T V_{xx}\,B_t \delta u_t = 0 \qquad (63)$$

where the quantity $\tilde{\mathcal{O}}_d$ has been defined in (14). All four terms above are equal to zero since they contain variations in state and control of order higher than 2, and can therefore be neglected. Finally, we compute the term of the 5th class:

$$E_5 = E\big[\mathcal{O}_d^T V_{xx}\,\mathcal{O}_d\big] = E\big[\mathrm{trace}\big(V_{xx}\,\mathcal{O}_d \mathcal{O}_d^T\big)\big] = \mathrm{trace}\big(V_{xx}\,E\big[\mathcal{O}_d \mathcal{O}_d^T\big]\big) \qquad (64)$$

Each product $\mathcal{O}^i \mathcal{O}^j$ is a function of variations in state and control of order 4, since each term $\mathcal{O}^i$ is of order 2. Consequently, the term $E_5 = E[\mathcal{O}_d^T V_{xx}\,\mathcal{O}_d]$ is equal to zero. With the computation of the expectation of the term that is quadratic with respect to $\mathcal{O}_d$, we have calculated all the terms of the second-order expansion of the cost-to-go function. In the next section we derive the optimal controls and present the SDDP algorithm; furthermore, we show how SDDP recovers the deterministic solution as well as the cases of only control-multiplicative, only state-multiplicative and only additive noise.

IV. OPTIMAL CONTROLS

In this section we provide the form of the optimal controls and show how previous results are special cases of our generalized stochastic DDP formulation. Having computed all the terms of the expansion of the cost-to-go function $V(x_t)$, we can show that its form remains quadratic with respect to the variations in state $\delta x_t$ under the constraint of the nonlinear stochastic dynamics (2). More precisely, we have that:

$$E\big[V(\bar{x}_{t+\delta t} + \delta x_{t+\delta t})\big] = V(\bar{x}_{t+\delta t}) + V_x^T A_t\,\delta x_t + V_x^T B_t\,\delta u_t + \frac{1}{2}\,\delta x^T \tilde{F}\,\delta x + \frac{1}{2}\,\delta u^T \tilde{Z}\,\delta u + \delta u^T \tilde{L}\,\delta x$$
$$+ \frac{1}{2}\Big( \delta x_t^T A_t^T V_{xx} A_t\,\delta x_t + \delta u_t^T B_t^T V_{xx} B_t\,\delta u_t + \delta x_t^T A_t^T V_{xx} B_t\,\delta u_t + \delta u_t^T B_t^T V_{xx} A_t\,\delta x_t \Big)$$
$$+ \frac{1}{2}\Big( \delta x^T \mathcal{F}\,\delta x + 2\,\delta x^T \mathcal{L}\,\delta u + \delta u^T \mathcal{Z}\,\delta u + 2\,\delta u^T \mathcal{U} + 2\,\delta x^T \mathcal{S} + \gamma \Big) + \delta x^T \tilde{M}\,\delta x + \delta u^T \tilde{G}\,\delta u + \delta x^T \tilde{N}\,\delta u \qquad (65)$$

The unoptimized state-action value function is defined as follows:

$$Q(x_k, u_k) = \ell(x_k, u_k) + V(x_{k+1}) \qquad (66)$$

Given a trajectory in states and controls $(\bar{x}, \bar{u})$, we can approximate the state-action value function as follows:

$$Q(\bar{x}+\delta x, \bar{u}+\delta u) = Q_0 + Q_u^T\,\delta u + Q_x^T\,\delta x + \frac{1}{2} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix}^T \begin{bmatrix} Q_{xx} & Q_{xu} \\ Q_{ux} & Q_{uu} \end{bmatrix} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix} \qquad (67)$$

By equating the coefficients of similar powers between the state-action value function $Q(x_k, u_k)$ and the sum of the immediate cost and the cost-to-go, $\ell(x_k, u_k) + V(x_{k+1})$, we can show that:

$$Q_x = \ell_x + A_t^T V_x + \mathcal{S}$$
$$Q_u = \ell_u + B_t^T V_x + \mathcal{U}$$
$$Q_{xx} = \ell_{xx} + A_t^T V_{xx} A_t + \tilde{F} + \mathcal{F} + 2\tilde{M}$$
$$Q_{xu} = \ell_{xu} + A_t^T V_{xx} B_t + \tilde{L}^T + \mathcal{L} + \tilde{N}$$
$$Q_{uu} = \ell_{uu} + B_t^T V_{xx} B_t + \tilde{Z} + \mathcal{Z} + 2\tilde{G} \qquad (68)$$

where we have assumed a local quadratic approximation of the immediate cost $\ell(x_k, u_k)$ according to the equation:

$$\ell(\bar{x}+\delta x, \bar{u}+\delta u) = \ell_0 + \ell_u^T\,\delta u + \ell_x^T\,\delta x + \frac{1}{2} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix}^T \begin{bmatrix} \ell_{xx} & \ell_{xu} \\ \ell_{ux} & \ell_{uu} \end{bmatrix} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix} \qquad (69)$$

with $\ell_x = \partial\ell/\partial x$, $\ell_u = \partial\ell/\partial u$, $\ell_{xx} = \partial^2\ell/\partial x^2$, $\ell_{uu} = \partial^2\ell/\partial u^2$ and $\ell_{ux} = \partial^2\ell/\partial u\,\partial x$. The local variations in control that minimize the state-action value function are expressed by the equation that follows:

$$\delta u^* = \arg\min_{\delta u}\; Q(\bar{x}+\delta x, \bar{u}+\delta u) = -Q_{uu}^{-1}\big(Q_u + Q_{ux}\,\delta x\big) \qquad (70)$$

The optimal control variations have the form $\delta u^* = l + L\,\delta x$, where $l = -Q_{uu}^{-1} Q_u$ is the open-loop gain and $L = -Q_{uu}^{-1} Q_{ux}$ is the closed-loop (feedback) gain. The SDDP algorithm is provided in pseudocode form in Table I.

TABLE I: PSEUDOCODE OF THE SDDP ALGORITHM

Given:
- An immediate cost function $\ell(x,u)$
- A terminal cost term $\phi_{t_N}$
- The stochastic dynamics $dx = f(x,u)\,dt + F(x,u)\,d\omega$

Repeat until convergence:
1) Given a trajectory in states and controls $(\bar{x}, \bar{u})$, find the quadratic approximations of the stochastic dynamics $A_t, B_t, \Gamma_t, \mathcal{O}_d$ and the quadratic approximation of the immediate cost function $\ell_0, \ell_x, \ell_{xx}, \ell_{uu}, \ell_{ux}$ around this trajectory.
2) Compute all the terms $Q_x, Q_u, Q_{xu}$ and $Q_{uu}$ according to equation (68).
3) Back-propagate the quadratic approximation of the value function based on the equations:
$$V_0^{k} = V_0^{k+1} - \frac{1}{2} Q_u^T Q_{uu}^{-1} Q_u, \qquad V_x^{k} = Q_x - Q_{xu} Q_{uu}^{-1} Q_u, \qquad V_{xx}^{k} = Q_{xx} - Q_{xu} Q_{uu}^{-1} Q_{ux}$$
4) Compute $\delta u = -Q_{uu}^{-1}(Q_u + Q_{ux}\,\delta x)$ and update the controls $u^* = \bar{u} + \gamma\,\delta u$.
5) Obtain the new nominal trajectory $\bar{x}^*$ by propagating the nonlinear dynamics $dx = f(x,u^*)\,dt + F(x,u^*)\,d\omega$.
6) Set $\bar{x} = \bar{x}^*$ and $\bar{u} = u^*$ and repeat.
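Table I compresses the whole method into a few lines. For concreteness, here is a hedged sketch of a single backward-pass step implementing (68) and (70), with the stochastic corrections bundled into one argument so that zeros recover classic DDP; the variable names and the regularization constant are ours, not the paper's.

```python
import numpy as np

def backward_step(lx, lu, lxx, luu, lux, A, B, Vx, Vxx, corr):
    """One step of the SDDP backward pass, Eqs. (68) and (70), Table I.

    corr bundles the stochastic corrections: (S, U, Fq, Lq, Zq) enter Q_x,
    Q_u, Q_xx, Q_xu, Q_uu respectively; passing zeros recovers classic DDP.
    """
    S, U, Fq, Lq, Zq = corr
    Qx = lx + A.T @ Vx + S
    Qu = lu + B.T @ Vx + U
    Qxx = lxx + A.T @ Vxx @ A + Fq
    Qxu = lux.T + A.T @ Vxx @ B + Lq           # lux.T is ell_xu
    Quu = luu + B.T @ Vxx @ B + Zq
    Quu_inv = np.linalg.inv(Quu + 1e-9 * np.eye(Quu.shape[0]))  # regularized
    l_gain = -Quu_inv @ Qu                     # open-loop gain
    L_gain = -Quu_inv @ Qxu.T                  # feedback gain, -Quu^-1 Qux
    Vx_new = Qx + Qxu @ l_gain                 # Q_x - Q_xu Quu^-1 Q_u
    Vxx_new = Qxx + Qxu @ L_gain               # Q_xx - Q_xu Quu^-1 Q_ux
    return l_gain, L_gain, Vx_new, Vxx_new
```

Running this function backward in time from the terminal cost, then applying the gains in a forward rollout, reproduces the loop of Table I.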
For the special case where the stochastic dynamics have only additive noise, $F(x,u) = F$, the terms $\tilde{M}, \tilde{N}, \tilde{G}, \mathcal{F}, \mathcal{L}, \mathcal{Z}, \mathcal{U}, \mathcal{S}$ are all zero, since they are functions of the derivatives $\nabla_x F$, $\nabla_u F$, $\nabla_{xx} F$, $\nabla_{xu} F$ and $\nabla_{uu} F$, all of which vanish for constant $F$. For this class of systems the control does not depend on the statistical characteristics of the noise. For deterministic systems, $\tilde{M}, \tilde{N}, \tilde{G}, \mathcal{F}, \mathcal{L}, \mathcal{Z}, \mathcal{U}, \mathcal{S}$ are again zero, because all of these terms scale with the variance of the noise and $\sigma_{d\omega_i}^2 = 0$ for $i = 1, \dots, m$. Finally, if the noise depends only on the control, then $\tilde{M}, \tilde{N}, \mathcal{L}, \mathcal{F}, \mathcal{S}$ are zero since $\nabla_{xx} F(u) = 0$, $\nabla_{xu} F(u) = 0$ and $\nabla_x F_c^i(u) = 0$, while if it depends only on the state, then $\tilde{N}, \tilde{G}, \mathcal{Z}, \mathcal{L}, \mathcal{U}$ are zero since $\nabla_{xu} F(x) = 0$, $\nabla_{uu} F(x) = 0$ and $\nabla_u F_c^i(x) = 0$.
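These special cases can be verified numerically: whichever derivatives of $F$ vanish, the corresponding correction terms vanish with them. A small self-contained check (ours, illustrative) for the term $\mathcal{S}$ of Eq. (36) under additive versus state-dependent noise:

```python
import numpy as np

# Numeric check of the special cases: for additive noise the term S of
# Eq. (36) vanishes; for state-dependent noise it does not.  Illustrative
# only; dimensions and test values are arbitrary.
n, m = 2, 2
Vxx = np.eye(n)
x, u, eps = np.array([0.3, -0.7]), np.array([0.5]), 1e-6

F_additive = lambda x_, u_: np.array([[0.1, 0.0], [0.0, 0.2]])   # F(x,u) = F
F_state    = lambda x_, u_: np.diag(x_ ** 2)                     # F(x)

for F in (F_additive, F_state):
    # grad_x F_c^i by central differences, stacked as an (m, n, n) array.
    Fx = np.stack([np.stack([(F(x + eps * np.eye(n)[k], u)[:, i]
                              - F(x - eps * np.eye(n)[k], u)[:, i]) / (2 * eps)
                             for k in range(n)], axis=1) for i in range(m)])
    S = sum(Fx[i].T @ Vxx @ F(x, u)[:, i] for i in range(m))     # Eq. (36) core
    print(np.allclose(S, 0.0))   # True for additive noise, False otherwise
```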

Fig. 1. The left plot illustrates the state trajectory for the system $dx = \cos(x)\,dt + u\,dt + x^2\,d\omega$; the right plot corresponds to the optimal control.

Fig. 2. The left plot illustrates the state-space trajectories of the system $dx = \alpha x^2\,dt + u\,dt + x^2\,d\omega$ for different values of the noise; the right plot shows the stereotypical convergence behavior of SDDP.

V. SIMULATION RESULTS

In this section we test SDDP on two different one-dimensional systems. More precisely, we first consider the one-dimensional stochastic nonlinear system of the form:

$$dx = \cos(x)\,dt + u\,dt + x^2\,d\omega \qquad (71)$$

The quadratic approximation is based on the terms $A_t = 1 - \sin(x)\,\delta t$, $B_t = \delta t$ and the second-order term $\nabla_{xx}\Phi = -\cos(x)\,\delta t$. The system has only state-dependent noise, and therefore $\tilde{M} = \sigma_{d\omega}^2\,\delta t\, V_{xx}\, x^2$, $\mathcal{F} = 4\,\sigma_{d\omega}^2\,\delta t\, V_{xx}\, x^2$ and $\mathcal{S} = 2\,\sigma_{d\omega}^2\,\delta t\, V_{xx}\, x^3$, while the terms $\tilde{N} = \tilde{G} = \mathcal{L} = \mathcal{Z} = \mathcal{U} = 0$.

We apply the SDDP algorithm to the task of bringing the state from the initial $x_0 = 0$ to the terminal $x_T = p = 4$. The cost function is expressed as $v^{\pi}(x,t) = E\big[h(x_T) + \int_{t_0}^{T} r\,u^2\,d\tau\big]$ with $h(x) = w_p\,(x - p)^2$, where $w_p = 1$ and $r = 10^{-6}$. In this example there is only a terminal cost for the state; during the time horizon the state-dependent cost is zero, and thus there is only a control cost. In Figure 1, the left plot illustrates the state trajectory as it approaches the target state $p = 4$, while the right plot illustrates the optimal control. Since there is only a terminal cost, the control over most of the time horizon is almost zero; at the very end of the horizon the control is activated and the state reaches the target state.

The second system is given by the equation that follows:

$$dx = \alpha x^2\,dt + u\,dt + x^2\,d\omega \qquad (72)$$

The quadratic approximation is based on the terms $A_t = 1 + 2\alpha x(t)\,\delta t$, $B_t = \delta t$ and the second-order term $\nabla_{xx}\Phi = 2\alpha\,\delta t$. The parameter $\alpha$ controls the degree of instability of the system. The task is to bring the state $x(t)$ from the initial state $x_0 = 0$ to the target state $x_T = p$. The cost function has the same form as in the first example, but with tuning parameters $w_p = 1$ and $r = 10^{-3}$. Furthermore, this system again has only state-dependent noise, and therefore $\tilde{M} = \sigma_{d\omega}^2\,\delta t\, V_{xx}\, x^2$, $\mathcal{F} = 4\,\sigma_{d\omega}^2\,\delta t\, V_{xx}\, x^2$ and $\mathcal{S} = 2\,\sigma_{d\omega}^2\,\delta t\, V_{xx}\, x^3$, while the terms $\tilde{N} = \tilde{G} = \mathcal{L} = \mathcal{Z} = \mathcal{U} = 0$.

In Figure 2, the left plot illustrates state-space trajectories for different values of the variance of the noise, while the right plot illustrates the convergence behavior of SDDP. An important observation is that the curved trajectories correspond to the cases with high-variance noise; as the variance of the noise decreases, the optimal trajectories become straighter. In both examples, the noise is used only during the iterative optimization process of SDDP: when testing the optimal control, instead of running the controlled system many times for different realizations of the noise and then calculating the mean, we simply zero the noise and run the system once.

VI. DISCUSSION

In this paper we explicitly derived the equations describing the second-order expansion of the cost-to-go, given state- and control-dependent noise. Our main result is that the expressions remain quadratic with respect to $\delta x$ and $\delta u$, so the basic structure of the algorithm, a quadratic cost-to-go approximation with a linear policy, remains unchanged. In addition, we have shown how the cases of deterministic DDP and stochastic DDP with additive noise are sub-cases of our generalized formulation of Stochastic DDP.
Current and future research includes further testing and evaluation of our generalized Stochastic DDP algorithm on multidimensional stochastic systems which are highly nonlinear and have noise that is control- and state-dependent. Biomechanical models belong to this class due to the highly nonlinear and noisy nature of muscle dynamics. Moreover, we aim to incorporate a second-order Extended Kalman Filter that can handle observation as well as process noise. The resulting optimal controller-estimator scheme will be a version of the iterative Quadratic Gaussian regulator that can handle stochastic dynamics expanded up to the second order with state- and control-dependent noise.

REFERENCES

[1] E. Todorov and W. Li. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In Proceedings of the American Control Conference, 2005.
[2] W. Li and E. Todorov. Iterative optimal control and estimation design for stochastic nonlinear systems. In Proceedings of the Conference on Decision and Control, 2006.
[3] D. H. Jacobson and D. Q. Mayne. Differential Dynamic Programming. Elsevier Publishing Company, New York, 1970.
[4] G. Lantoine and R. P. Russell. A hybrid differential dynamic programming algorithm for robust low-thrust optimization. In AAS/AIAA Astrodynamics Specialist Conference and Exhibit, 2008.
[5] S. Yakowitz. The stagewise Kuhn-Tucker condition and differential dynamic programming. IEEE Transactions on Automatic Control, 31(1):25-30, 1986.
[6] J. Morimoto and C. Atkeson. Minimax differential dynamic programming: An application to robust biped walking. In Advances in Neural Information Processing Systems 15. MIT Press, Cambridge, MA, 2002.
[7] Y. Tassa, T. Erez, and W. Smart. Receding horizon differential dynamic programming. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA, 2008.


More information

Pontryagin s maximum principle

Pontryagin s maximum principle Pontryagin s maximum principle Emo Todorov Applied Mathematics and Computer Science & Engineering University of Washington Winter 2012 Emo Todorov (UW) AMATH/CSE 579, Winter 2012 Lecture 5 1 / 9 Pontryagin

More information

Here represents the impulse (or delta) function. is an diagonal matrix of intensities, and is an diagonal matrix of intensities.

Here represents the impulse (or delta) function. is an diagonal matrix of intensities, and is an diagonal matrix of intensities. 19 KALMAN FILTER 19.1 Introduction In the previous section, we derived the linear quadratic regulator as an optimal solution for the fullstate feedback control problem. The inherent assumption was that

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2

More information

Numerical Solutions of ODEs by Gaussian (Kalman) Filtering

Numerical Solutions of ODEs by Gaussian (Kalman) Filtering Numerical Solutions of ODEs by Gaussian (Kalman) Filtering Hans Kersting joint work with Michael Schober, Philipp Hennig, Tim Sullivan and Han C. Lie SIAM CSE, Atlanta March 1, 2017 Emmy Noether Group

More information

Bayesian Decision Theory in Sensorimotor Control

Bayesian Decision Theory in Sensorimotor Control Bayesian Decision Theory in Sensorimotor Control Matthias Freiberger, Martin Öttl Signal Processing and Speech Communication Laboratory Advanced Signal Processing Matthias Freiberger, Martin Öttl Advanced

More information

Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS

Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS Many slides adapted from Jur van den Berg Outline POMDPs Separation Principle / Certainty Equivalence Locally Optimal

More information

Nonlinear Model Predictive Control Tools (NMPC Tools)

Nonlinear Model Predictive Control Tools (NMPC Tools) Nonlinear Model Predictive Control Tools (NMPC Tools) Rishi Amrit, James B. Rawlings April 5, 2008 1 Formulation We consider a control system composed of three parts([2]). Estimator Target calculator Regulator

More information

SUPPORT VECTOR MACHINE FOR THE SIMULTANEOUS APPROXIMATION OF A FUNCTION AND ITS DERIVATIVE

SUPPORT VECTOR MACHINE FOR THE SIMULTANEOUS APPROXIMATION OF A FUNCTION AND ITS DERIVATIVE SUPPORT VECTOR MACHINE FOR THE SIMULTANEOUS APPROXIMATION OF A FUNCTION AND ITS DERIVATIVE M. Lázaro 1, I. Santamaría 2, F. Pérez-Cruz 1, A. Artés-Rodríguez 1 1 Departamento de Teoría de la Señal y Comunicaciones

More information