Stochastic Differential Dynamic Programming
Evangelos Theodorou, Yuval Tassa & Emo Todorov

Abstract: Although there has been a significant amount of work in the area of stochastic optimal control theory towards the development of new algorithms, the problem of how to control a stochastic nonlinear system remains an open research topic. Recent iterative linear quadratic optimal control methods (iLQG) [1], [2] handle control- and state-multiplicative noise, but they are derived based on a first-order approximation of the dynamics. On the other hand, methods such as Differential Dynamic Programming expand the dynamics up to second order, but so far they can handle only nonlinear systems with additive noise. In this work we present a generalization of the classic Differential Dynamic Programming algorithm. We assume the existence of state- and control-multiplicative process noise, and proceed to derive the second-order expansion of the cost-to-go. We find the correction terms that arise from the stochastic assumption. Despite having quartic and cubic terms in the initial expression, we show that these vanish, leaving us with the same quadratic structure as standard DDP.

I. INTRODUCTION

Optimal control describes the choice of actions that minimize future costs under the constraint of state space dynamics. In the continuous nonlinear case, local methods are one of the very few classes of algorithms which successfully solve general, high-dimensional optimal control problems. These methods are based on the observation that optimal solutions form extremal trajectories, i.e. are solutions to a calculus-of-variations problem. Differential Dynamic Programming, or DDP, is a powerful local dynamic programming algorithm which generates both open- and closed-loop control policies along a trajectory. The DDP algorithm, introduced in [3], computes a quadratic approximation of the cost-to-go and, correspondingly, a local linear-feedback controller. The state space dynamics are also quadratically approximated around a trajectory. As in the Linear-Quadratic-Gaussian case, fixed additive noise has no effect on the controllers generated by DDP, and when described in the literature, the dynamics are usually assumed deterministic.

Past work on nonlinear optimal control allows the use of DDP in problems with state and control constraints [4], [5]. In that work, state and control constraints are expanded up to first order and the KKT conditions are formulated, resulting in an unconstrained quadratic optimization problem.

(E. Theodorou is with the Computational Learning and Motor Control Lab, Departments of Computer Science and Neuroscience, University of Southern California, etheodor@usc.edu. Y. Tassa is with the Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem, Israel, tassa@alice.nc.huji.ac.il. E. Todorov is with the Department of Computer Science and Engineering and the Department of Applied Mathematics, University of Washington, Seattle, WA, todorov@cs.washington.edu.)

The application of DDP in real robotic high-dimensional control tasks created the need for extensions that relax the requirement for accurate models. In this vein, [6] extends DDP to a min-max formulation. The goal of this formulation is to make DDP robust against model uncertainties and hybrid dynamics in a biped robotic locomotion task. In [7], implementation improvements regarding the evaluation of the value function allow the use of DDP in a receding-horizon mode.
In all past work related to DDP, the optimal control problem is considered to be deterministic. While the indifference to noise can be considered a feature if the noise is indeed fixed, in many cases a varying noise covariance is an important feature of the problem, as with control-multiplicative noise, which is common in biological systems [1]. This latter case was addressed within the iterative-LQG framework [1], in which the optimal controller is derived based on a first-order approximation of the dynamics and a second-order approximation of the cost-to-go. However, for iterative nonlinear optimal control algorithms such as DDP, in which a second-order expansion of the dynamics is considered, the more general case of state- and/or control-multiplicative noise appears to have never been tackled.

In this paper, we derive the DDP algorithm for state- and control-multiplicative process noise. We find that despite the potential of cubic and quartic terms, these cancel out, allowing us to maintain the quadratic form of the approximation. Moreover, we show how the new generalized formulation of Stochastic Differential Dynamic Programming (SDDP) recovers the standard deterministic DDP solution, as well as the special cases in which only state-multiplicative or only control-multiplicative noise is considered.

The remainder of this work is organized as follows: in the next section we provide the definition of SDDP. In section III the second-order expansion of the cost-to-go is presented, and in section IV the optimal controls are derived and the overall SDDP algorithm is presented. In addition, in section IV we show how SDDP recovers the deterministic solution as well as the cases of only control-multiplicative, only state-multiplicative and only additive noise. Finally, in sections V and VI simulation results and conclusions are discussed.

II. STOCHASTIC DIFFERENTIAL DYNAMIC PROGRAMMING

We consider the class of nonlinear stochastic optimal control problems with cost

$$v^{\pi}(x,t) = E\left[ h(x_T) + \int_{t_0}^{T} \ell\big(\tau, x(\tau), \pi(\tau, x(\tau))\big)\, d\tau \right] \quad (1)$$
subject to stochastic dynamics of the form

$$dx = f(x,u)\,dt + F(x,u)\,d\omega \quad (2)$$

where $x \in \mathbb{R}^n$ is the state, $u \in \mathbb{R}^p$ is the control and $d\omega \in \mathbb{R}^m$ is Brownian noise. The term $h(x_T)$ in the cost function is the terminal cost, while $\ell(\tau, x(\tau), \pi(\tau, x(\tau)))$ is the instantaneous cost rate, which is a function of the state $x$ and the control policy $\pi(\tau, x(\tau))$. The cost-to-go $v^{\pi}(x,t)$ is defined as the expected cost accumulated over the time horizon $(t_0, T)$, starting from the initial state $x_t$ to the final state $x_T$.

To enhance the readability of our derivations we write the dynamics as a function $\Phi \in \mathbb{R}^n$ of the state, control and instantiation of the noise:

$$\Phi(x,u,d\omega) \equiv f(x,u)\,dt + F(x,u)\,d\omega \quad (3)$$

It will sometimes be convenient to write the matrix $F(x,u) \in \mathbb{R}^{n \times m}$ in terms of its rows or columns:

$$F(x,u) = \begin{bmatrix} F_r^1(x,u) \\ \vdots \\ F_r^n(x,u) \end{bmatrix} = \big[ F_c^1(x,u), \dots, F_c^m(x,u) \big]$$

Every element of the vector $\Phi(x,u,d\omega) \in \mathbb{R}^n$ can now be expressed as

$$\Phi^j(x,u,d\omega) = f^j(x,u)\,dt + F_r^j(x,u)\,d\omega.$$

Given a nominal trajectory of states and controls $(\bar{x}, \bar{u})$ we expand the dynamics around this trajectory to second order:

$$\Phi(\bar{x}+\delta x,\, \bar{u}+\delta u,\, d\omega) = \Phi(\bar{x}, \bar{u}, d\omega) + \nabla_x \Phi\, \delta x + \nabla_u \Phi\, \delta u + \mathcal{O}(\delta x, \delta u, d\omega)$$

where $\mathcal{O}(\delta x, \delta u, d\omega) \in \mathbb{R}^n$ contains all the second-order terms in the deviations of states, controls and noise (not to be confused with big-O). Writing this term element-wise, $\mathcal{O} = [\mathcal{O}^1, \dots, \mathcal{O}^n]^T$, we can express each element $\mathcal{O}^j(\delta x, \delta u, d\omega) \in \mathbb{R}$ as

$$\mathcal{O}^j = \frac{1}{2}\begin{bmatrix} \delta x \\ \delta u \end{bmatrix}^T \begin{bmatrix} \nabla_{xx}\Phi^j & \nabla_{xu}\Phi^j \\ \nabla_{ux}\Phi^j & \nabla_{uu}\Phi^j \end{bmatrix} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix}.$$

We would now like to express the derivatives of $\Phi$ in terms of the given quantities. Beginning with the first-order terms, we find that

$$\nabla_x \Phi = \nabla_x f(x,u)\,dt + \sum_{i=1}^{m} \nabla_x F_c^i\, d\omega^i(t), \qquad \nabla_u \Phi = \nabla_u f(x,u)\,dt + \sum_{i=1}^{m} \nabla_u F_c^i\, d\omega^i(t).$$

Next we find the second-order derivatives:

$$\nabla_{xx}\Phi^j = \nabla_{xx} f^j(x,u)\,dt + \nabla_{xx}\big(F_r^j(x,u)\,d\omega_t\big), \qquad \nabla_{uu}\Phi^j = \nabla_{uu} f^j(x,u)\,dt + \nabla_{uu}\big(F_r^j(x,u)\,d\omega_t\big),$$
$$\nabla_{ux}\Phi^j = \nabla_{ux} f^j(x,u)\,dt + \nabla_{ux}\big(F_r^j(x,u)\,d\omega_t\big), \qquad \nabla_{xu}\Phi^j = \big(\nabla_{ux}\Phi^j\big)^T.$$

After expanding the dynamics up to second order we can transition from continuous to discrete time. More precisely, the discrete-time dynamics are formulated as

$$\delta x_{t+\delta t} = \Big(I_{n\times n} + \nabla_x f(x,u)\,\delta t + \sum_{i=1}^{m} \nabla_x F_c^i\, \xi_t^i \sqrt{\delta t}\Big)\,\delta x_t + \Big(\nabla_u f(x,u)\,\delta t + \sum_{i=1}^{m} \nabla_u F_c^i\, \xi_t^i \sqrt{\delta t}\Big)\,\delta u_t + F(x,u)\sqrt{\delta t}\,\xi_t + \mathcal{O}_d(\delta x, \delta u, \xi, \delta t)$$

with $\delta t = t_{k+1} - t_k$ corresponding to a small discretization interval. Note that the term $\mathcal{O}_d$ is the discrete-time equivalent of $\mathcal{O}$ and is therefore a function of $\delta t$. In fact, since $\mathcal{O}_d$ contains all the second-order expansion terms of the dynamics, it contains second-order derivatives with respect to state and control, expressed as follows:

$$\nabla_{xx}\Phi^j = \nabla_{xx} f^j\,\delta t + \nabla_{xx} F_r^j\, \xi_t \sqrt{\delta t}, \qquad \nabla_{uu}\Phi^j = \nabla_{uu} f^j\,\delta t + \nabla_{uu} F_r^j\, \xi_t \sqrt{\delta t},$$
$$\nabla_{ux}\Phi^j = \nabla_{ux} f^j\,\delta t + \nabla_{ux} F_r^j\, \xi_t \sqrt{\delta t}, \qquad \nabla_{xu}\Phi^j = \big(\nabla_{ux}\Phi^j\big)^T.$$

The random variable $\xi \in \mathbb{R}^m$ is zero mean and Gaussian distributed with covariance $\Sigma_\omega = \sigma_{d\omega}^2\, I_{m\times m}$.

The discretized dynamics can be written in more compact form by grouping the state-, control- and noise-dependent terms, and leaving the second-order term separate:

$$\delta x_{t+\delta t} = A_t\, \delta x_t + B_t\, \delta u_t + \Gamma_t\, \xi_t \sqrt{\delta t} + \mathcal{O}_d(\delta x, \delta u, \xi, \delta t) \quad (4)$$

where the matrices $A_t \in \mathbb{R}^{n\times n}$, $B_t \in \mathbb{R}^{n\times p}$ and $\Gamma_t \in \mathbb{R}^{n\times m}$ are defined as

$$A_t = I_{n\times n} + \nabla_x f(x,u)\,\delta t, \qquad B_t = \nabla_u f(x,u)\,\delta t, \qquad \Gamma_t = \big[\Gamma^1\; \Gamma^2\; \dots\; \Gamma^m\big]$$

with each column $\Gamma^i \in \mathbb{R}^n$ defined as $\Gamma^i = \nabla_u F_c^i\, \delta u_t + \nabla_x F_c^i\, \delta x_t + F_c^i$.
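To make the discretization concrete, the following is a minimal NumPy sketch (our own illustration, not from the paper) of how the matrices $A_t$, $B_t$ and $\Gamma_t$ of equation (4) could be assembled around a nominal point. The function names and the forward-difference scheme are assumptions of ours, and the second-order tensors that populate $\mathcal{O}_d$ are omitted for brevity.

```python
import numpy as np

def jacobian(fun, z, eps=1e-6):
    """Forward-difference Jacobian of fun: R^k -> R^n at the point z."""
    f0 = np.atleast_1d(fun(z))
    J = np.zeros((f0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (np.atleast_1d(fun(z + dz)) - f0) / eps
    return J

def discretize(f, F, x, u, dx, du, dt):
    """Assemble A_t, B_t and Gamma_t of eq. (4) around the nominal (x, u).
    f(x, u): drift, returns (n,); F(x, u): diffusion, returns (n, m)."""
    fx = jacobian(lambda z: f(z, u), x)      # \nabla_x f
    fu = jacobian(lambda z: f(x, z), u)      # \nabla_u f
    A = np.eye(x.size) + fx * dt             # A_t = I + \nabla_x f * dt
    B = fu * dt                              # B_t = \nabla_u f * dt
    m = F(x, u).shape[1]
    Gamma = np.zeros((x.size, m))
    for i in range(m):
        Fci = lambda zx, zu: F(zx, zu)[:, i]            # i-th column F_c^i
        Fx_i = jacobian(lambda z: Fci(z, u), x)         # \nabla_x F_c^i
        Fu_i = jacobian(lambda z: Fci(x, z), u)         # \nabla_u F_c^i
        # Gamma^i = \nabla_u F_c^i du + \nabla_x F_c^i dx + F_c^i
        Gamma[:, i] = Fu_i @ du + Fx_i @ dx + Fci(x, u)
    return A, B, Gamma
```

For the scalar examples of section V, $f(x,u) = \cos(x) + u$ and $F(x,u) = x^2$, and the loop above reduces to the closed-form expressions given there.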
For the derivation of the optimal control it is useful to express $\Gamma_t$ as the sum of terms that depend on the variations in states and controls and terms that are independent of such variations. More precisely we will have

$$\Gamma_t = \Delta_t(\delta x, \delta u) + F(\bar{x}, \bar{u}) \quad (5)$$

where each column vector of $\Delta_t$ is defined as $\Delta_t^i(\delta x, \delta u) = \nabla_u F_c^i\, \delta u_t + \nabla_x F_c^i\, \delta x_t$.

III. VALUE FUNCTION SECOND ORDER APPROXIMATION

As in classical DDP, the derivation of stochastic DDP requires the second-order expansion of the cost-to-go function around a nominal trajectory $\bar{x}$:

$$V(\bar{x} + \delta x) = V(\bar{x}) + V_x^T\, \delta x + \frac{1}{2}\,\delta x^T V_{xx}\, \delta x \quad (6)$$

Substitution of the discretized dynamics (4) into the second-order value function expansion (6) results in

$$V(\bar{x}_{t+\delta t} + \delta x_{t+\delta t}) = V(\bar{x}_{t+\delta t}) + V_x^T\big(A_t \delta x_t + B_t \delta u_t + \Gamma_t \xi \sqrt{\delta t} + \mathcal{O}_d\big) + \frac{1}{2}\big(A_t \delta x_t + B_t \delta u_t + \Gamma_t \xi \sqrt{\delta t} + \mathcal{O}_d\big)^T V_{xx} \big(A_t \delta x_t + B_t \delta u_t + \Gamma_t \xi \sqrt{\delta t} + \mathcal{O}_d\big) \quad (7)$$

Next we will compute $E\big[V(\bar{x}_{t+\delta t} + \delta x_{t+\delta t})\big]$, which requires the calculation of the expectation of all the terms that appear in the equation above. This is what the rest of the analysis is dedicated to. More precisely, in the next two sections we will calculate the expectations of the terms

$$E\big[V_x^T\, \delta x_{t+\delta t}\big] \quad (8) \qquad \text{and} \qquad E\big[\delta x_{t+\delta t}^T\, V_{xx}\, \delta x_{t+\delta t}\big] \quad (9)$$

where the state deviation $\delta x_{t+\delta t}$ at time instant $t+\delta t$ is given by the linearized dynamics

$$\delta x_{t+\delta t} = A_t\, \delta x_t + B_t\, \delta u_t + \Gamma_t\, \xi \sqrt{\delta t} + \mathcal{O}_d \quad (10)$$

The analysis in section III-A consists of the computation of the expectation of the four terms which result from the substitution of the linearized dynamics (10) into (8). In section III-B we compute the expectation of the 16 terms that result from the substitution of (10) into (9).

A. Expectation of the first-order term of the value function expansion $V_x^T\, \delta x_{t+\delta t}$

The expectation of the first-order term results in

$$E\big[V_x^T\big(A_t \delta x_t + B_t \delta u_t + \Gamma_t \xi_t \sqrt{\delta t} + \mathcal{O}_d\big)\big] = V_x^T\big(A_t \delta x_t + B_t \delta u_t + E[\mathcal{O}_d]\big) \quad (11)$$

In order to find the expectation of $\mathcal{O}_d \in \mathbb{R}^n$ we need to find the expectation of each of the elements of this column vector. Thus we will have

$$E\big[\mathcal{O}^j(\delta x, \delta u, \xi_t, \delta t)\big] = E\left[\frac{1}{2}\begin{bmatrix} \delta x \\ \delta u \end{bmatrix}^T \begin{bmatrix} \nabla_{xx}\Phi^j & \nabla_{xu}\Phi^j \\ \nabla_{ux}\Phi^j & \nabla_{uu}\Phi^j \end{bmatrix} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix}\right] = \frac{\delta t}{2}\begin{bmatrix} \delta x \\ \delta u \end{bmatrix}^T \begin{bmatrix} \nabla_{xx}f^j & \nabla_{xu}f^j \\ \nabla_{ux}f^j & \nabla_{uu}f^j \end{bmatrix} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix} \quad (12)$$

Therefore we will have

$$E\big[V_x^T\, \delta x_{t+\delta t}\big] = V_x^T\big(A_t \delta x_t + B_t \delta u_t + \tilde{\mathcal{O}}_d\big) \quad (13)$$

where the term $\tilde{\mathcal{O}}_d$ is defined as $\tilde{\mathcal{O}}_d(\delta x, \delta u, \delta t) = \big[\tilde{\mathcal{O}}^1, \dots, \tilde{\mathcal{O}}^n\big]^T$ with $\tilde{\mathcal{O}}^j = E[\mathcal{O}^j]$. (14)

The term $V_x^T\, \tilde{\mathcal{O}}_d$ is quadratic in the variations in states and controls $(\delta x, \delta u)$, and thus there exist symmetric matrices $F \in \mathbb{R}^{n\times n}$, $Z \in \mathbb{R}^{p\times p}$ and a matrix $L \in \mathbb{R}^{p\times n}$ such that

$$V_x^T\, \tilde{\mathcal{O}}_d = \frac{1}{2}\,\delta x^T F\, \delta x + \frac{1}{2}\,\delta u^T Z\, \delta u + \delta u^T L\, \delta x \quad (15)$$

with

$$F = \delta t \sum_{j=1}^{n} \nabla_{xx} f^j\, V_{x_j} \quad (16) \qquad Z = \delta t \sum_{j=1}^{n} \nabla_{uu} f^j\, V_{x_j} \quad (17) \qquad L = \delta t \sum_{j=1}^{n} \nabla_{ux} f^j\, V_{x_j} \quad (18)$$

From the analysis above we can see that the expectation $E[V_x^T\, \delta x_{t+\delta t}]$ is a quadratic function of the variations in states and controls $(\delta x, \delta u)$. As we will prove in the next section, the expectation of $\delta x_{t+\delta t}^T\, V_{xx}\, \delta x_{t+\delta t}$ is also a quadratic function of $(\delta x, \delta u)$.
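In code, the contractions in (16)-(18) are weighted sums of the dynamics Hessians, with the components of $V_x$ as weights. A minimal sketch, assuming the Hessians are supplied as lists of per-coordinate NumPy arrays (our own conventions, not the paper's):

```python
import numpy as np

def deterministic_hessian_terms(fxx, fuu, fux, V_x, dt):
    """Terms F (16), Z (17), L (18): dynamics Hessians contracted with the
    value gradient. fxx[j] is \nabla_xx f^j (n x n), fuu[j] is \nabla_uu f^j
    (p x p), fux[j] is \nabla_ux f^j (p x n); V_x is the gradient (n,)."""
    F = dt * sum(Vj * Hj for Vj, Hj in zip(V_x, fxx))
    Z = dt * sum(Vj * Hj for Vj, Hj in zip(V_x, fuu))
    L = dt * sum(Vj * Hj for Vj, Hj in zip(V_x, fux))
    return F, Z, L
```

For linear dynamics all three Hessians are zero and the terms drop out, recovering the LQG case.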
4 E δx t+δtv xxδx t+δt = E + E + E 3 + E 4 + E 5 9 where the terms E, E, E 3, E 4 and E 5 are defined as follows: E = E δx t A t V xx A t δx t + E t Bt V xx B t t +E δx t A t V xx B t t + E t Bt V xx A t δx t 0 E = E ξ t Γ t V xx A t δx + E ξ t Γ t V xx B t + E δx A t V xx Γ t ξ t + E Bt V xx Γ t ξ t + E ξ t Γ t V xx Γ t ξ t E 3 = E O d V xx Γ t ξ t + E ξ t Γ t V xx O d E 4 = E δx t A t V xx O d + E t B t V xx O d + E O d V xx B t t + E O d V xx A t δx t E 5 = E O d V xx O d 3 4 In the first category we have all these terms that depend neither on ξ t and nor on O d δx,, ξ t, δt. hese are the terms that define E. he second category E includes terms that depend on ξ t but not on O d δx,, ξ t, δt. In the third class E 3, there are terms that depends both on O d δx,, ξ t, δt and ξ t. In the fourth class E 4, we have terms that depend on O d δx,, ξ t, δt. Finally in the fifth class E 5, we have all these terms that depend on O d δx,, ξ t, δt quadratically. he expectation operator will cancel all the terms that include noise up the first order. Moreover, the mean operator for terms that depend on the noise quadratically will result in covariance. We compute the expectations of all the terms in the E class. More precisely we will have that: E δx t A t V xx A t δx t = δx t A t V xx A t δx t E t B t V xx B t t = t B t V xx B t t E δx t A t V xx B t t = δx t A t V xx B t t E t B t V xx A t δx t = t B t V xx A t δx t 5 We continue our analysis by calculating all the terms in the class E. More presicely we will have: E ξ t Γ t V xx A t δx = 0 E ξ t Γ t V xx B t = 0 E ξ t Γ t V xx A t δx = 0 E ξ t Γ t V xx B t = 0 6 he terms above are equal to zero since the brownian noise is zero mean. he expectation of the term that does not depend on O d δx,, ξ t, δt and it is quadratic with respect to the noise is given as follows: E ξ t Γ t V xx Γ t ξ t = trace Γ t V xx Γ t Σ ω 7 Since matrix Γ depends on variations in states and controls δx, we can further massage the expressions above so that it can be expressed as quadratic functions in δx,. trace Γ t V xx Γ t Σ ω = σ dω δt 8 Γ trace V xx Γ Γ m Γ m 9 = σdωδt Γ i V xx Γ i 30 he last equation is written in the form: trace Γ t V xx Γ t Σ ω = δx Fδx + δx L + Z + Ũ + δx S + γ 3 Where the terms F R n m, L R n p, Z R p p, Ũ R p, S R n and γ R are defined as follows: F = σ δt x F c i V xx x F c i 3 L = σ δt Z = σ δt Ũ = σ δt S = σ δt γ = σ δt x F c i V xx u F c i 33 u F c i V xx u F c i 34 u F c i V xx F c i 35 x F c i V xx F c i 36 F c i V xx F c i 37 For those terms that depend both on O d δx,, ξ t, δt and on the noise class E 3 we will have: E O d V xx Γ t ξ t = E trace Vxx Γ t ξ t O d = trace V xx Γ t E ξ t O d 38 By writing the term O d δx,, ξ t, δt in a matrix form and putting the noise vector insight the this matrix we have: E O d V xx Γ t ξ t = 39 trace V xx Γ t E [ ξ t O ξ t O n ]
For the terms that depend both on $\mathcal{O}_d(\delta x, \delta u, \xi_t, \delta t)$ and on the noise (class $E_3$) we will have:

$$E\big[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t \sqrt{\delta t}\big] = E\big[\mathrm{trace}\big(V_{xx}\, \Gamma_t\, \xi_t \sqrt{\delta t}\; \mathcal{O}_d^T\big)\big] = \mathrm{trace}\big(V_{xx}\, \Gamma_t\, E\big[\sqrt{\delta t}\,\xi_t\, \mathcal{O}_d^T\big]\big) \quad (38)$$

By writing the term $\mathcal{O}_d$ in matrix form and putting the noise vector inside this matrix we have:

$$E\big[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t \sqrt{\delta t}\big] = \mathrm{trace}\Big(V_{xx}\, \Gamma_t\, E\big[\sqrt{\delta t}\,\xi_t\, [\mathcal{O}^1, \dots, \mathcal{O}^n]\big]\Big) \quad (39)$$

Calculation of the expectation above requires finding the terms $E[\sqrt{\delta t}\,\xi_t\, \mathcal{O}^j]$. More precisely we will have:

$$E\big[\sqrt{\delta t}\,\xi_t\, \mathcal{O}^j\big] = \frac{1}{2} E\big[\sqrt{\delta t}\,\xi_t\, \delta x^T \nabla_{xx}\Phi^j\, \delta x\big] + \frac{1}{2} E\big[\sqrt{\delta t}\,\xi_t\, \delta u^T \nabla_{uu}\Phi^j\, \delta u\big] + E\big[\sqrt{\delta t}\,\xi_t\, \delta u^T \nabla_{ux}\Phi^j\, \delta x\big] \quad (40)$$

We first calculate the term:

$$E\big[\sqrt{\delta t}\,\xi_t\, \delta x^T \nabla_{xx}\Phi^j\, \delta x\big] = E\big[\sqrt{\delta t}\,\xi_t\, \delta x^T \big(\nabla_{xx} f^j\,\delta t + \nabla_{xx} F_r^j\, \xi_t \sqrt{\delta t}\big)\, \delta x\big] = E\big[\sqrt{\delta t}\,\xi_t\, \delta x^T \nabla_{xx} f^j\,\delta t\, \delta x\big] + E\big[\sqrt{\delta t}\,\xi_t\, \delta x^T \nabla_{xx} F_r^j\, \xi_t \sqrt{\delta t}\, \delta x\big] \quad (41)$$

The term $E[\sqrt{\delta t}\,\xi_t\, \delta x^T \nabla_{xx} f^j\,\delta t\, \delta x] = 0$ since it depends linearly on the noise and $E[\xi_t] = 0$. The second term depends quadratically on the noise, and thus the expectation operator results in the variance of the noise. We follow the analysis:

$$E\big[\sqrt{\delta t}\,\xi_t\, \delta x^T \nabla_{xx}\Phi^j\, \delta x\big] = E\big[\sqrt{\delta t}\,\xi_t\, \delta x^T \nabla_{xx} F_r^j\, \xi_t \sqrt{\delta t}\, \delta x\big] \quad (42)$$

Since $\xi_t = (\xi^1, \dots, \xi^m)^T$ and $F_r^j = (F^{j1}, \dots, F^{jm})$ we will have:

$$E\big[\sqrt{\delta t}\,\xi_t\, \delta x^T \nabla_{xx}\Phi^j\, \delta x\big] = E\Big[\delta t\,\xi_t\; \delta x^T \sum_{l=1}^{m} \xi^l\, \nabla_{xx} F^{jl}\, \delta x\Big] \quad (43)$$

By writing $\xi_t$ in vector form we have:

$$E\big[\sqrt{\delta t}\,\xi_t\, \delta x^T \nabla_{xx}\Phi^j\, \delta x\big] = \delta t\, E\left[\begin{pmatrix} \xi^1 \\ \vdots \\ \xi^m \end{pmatrix} \delta x^T \sum_{l=1}^{m} \xi^l\, \nabla_{xx} F^{jl}\, \delta x\right] \quad (44)$$

The term $\delta x^T \sum_{l=1}^{m} \xi^l\, \nabla_{xx} F^{jl}\, \delta x$ is a scalar, and it can multiply each of the elements of the noise vector:

$$\begin{pmatrix} \delta t\, E\big[\xi^1\, \delta x^T \sum_{l=1}^{m} \xi^l \nabla_{xx} F^{jl}\, \delta x\big] \\ \vdots \\ \delta t\, E\big[\xi^m\, \delta x^T \sum_{l=1}^{m} \xi^l \nabla_{xx} F^{jl}\, \delta x\big] \end{pmatrix} \quad (45)$$

Since $E[\xi^i \xi^i] = \sigma_{d\omega}^2$ and $E[\xi^i \xi^l] = 0$ for $i \neq l$, we can show that:

$$E\big[\sqrt{\delta t}\,\xi_t\, \delta x^T \nabla_{xx}\Phi^j\, \delta x\big] = \sigma_{d\omega}^2\,\delta t \begin{pmatrix} \delta x^T \nabla_{xx} F^{j1}\, \delta x \\ \vdots \\ \delta x^T \nabla_{xx} F^{jm}\, \delta x \end{pmatrix} \quad (46)$$

In a similar way we can show that

$$E\big[\sqrt{\delta t}\,\xi_t\, \delta u^T \nabla_{uu}\Phi^j\, \delta u\big] = \sigma_{d\omega}^2\,\delta t \begin{pmatrix} \delta u^T \nabla_{uu} F^{j1}\, \delta u \\ \vdots \\ \delta u^T \nabla_{uu} F^{jm}\, \delta u \end{pmatrix} \quad (47)$$

and

$$E\big[\sqrt{\delta t}\,\xi_t\, \delta u^T \nabla_{ux}\Phi^j\, \delta x\big] = \sigma_{d\omega}^2\,\delta t \begin{pmatrix} \delta u^T \nabla_{ux} F^{j1}\, \delta x \\ \vdots \\ \delta u^T \nabla_{ux} F^{jm}\, \delta x \end{pmatrix} \quad (48)$$

Since we have calculated all the terms of expression (40), we can proceed with the computation of (38). According to the analysis above, the term $E[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t \sqrt{\delta t}]$ can be written as follows:

$$E\big[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t \sqrt{\delta t}\big] = \mathrm{trace}\big(V_{xx}\, \Gamma_t\, (M + N + G)\big) \quad (49)$$

where the matrices $M \in \mathbb{R}^{m\times n}$, $N \in \mathbb{R}^{m\times n}$ and $G \in \mathbb{R}^{m\times n}$ are defined element-wise, for noise index $l = 1, \dots, m$ and state index $j = 1, \dots, n$, as follows:

$$M_{lj} = \frac{\sigma_{d\omega}^2\,\delta t}{2}\; \delta x^T \nabla_{xx} F^{jl}\, \delta x \quad (50)$$

Similarly,

$$N_{lj} = \sigma_{d\omega}^2\,\delta t\; \delta x^T \nabla_{xu} F^{jl}\, \delta u \quad (51)$$
and

$$G_{lj} = \frac{\sigma_{d\omega}^2\,\delta t}{2}\; \delta u^T \nabla_{uu} F^{jl}\, \delta u \quad (52)$$

Based on (5), the term $\Gamma_t$ depends on $\Delta_t$, which is a function of the variations in states and controls up to first order. In addition, the matrices $M$, $N$ and $G$ are functions of the deviations in states and controls up to second order. The product of $\Delta_t$ with each of the matrices $M$, $N$ and $G$ therefore results in third-order terms that can be neglected. By neglecting these terms we can show that:

$$E\big[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t \sqrt{\delta t}\big] = \mathrm{trace}\big(V_{xx}\,(\Delta_t + F)\,(M + N + G)\big) = \mathrm{trace}\big(V_{xx}\, F\, (M + N + G)\big) \quad (53)$$

Each element $(i,j)$ of the product $C = V_{xx} F$ can be expressed as $C_{ij} = \sum_{r=1}^{n} (V_{xx})_{ir} F_{rj}$, where $C \in \mathbb{R}^{n\times m}$. Furthermore, the element $(\mu,\nu)$ of the product $H = CM$ is formulated as $H_{\mu\nu} = \sum_{k=1}^{m} C_{\mu k} M_{k\nu}$, with $H \in \mathbb{R}^{n\times n}$. Thus the term $\mathrm{trace}(V_{xx} F M)$ can now be expressed as:

$$\mathrm{trace}(V_{xx} F M) = \sum_{l=1}^{n} H_{ll} = \sum_{l=1}^{n}\sum_{k=1}^{m} C_{lk}\, M_{kl} = \sum_{l=1}^{n}\sum_{k=1}^{m}\sum_{r=1}^{n} (V_{xx})_{lr}\, F_{rk}\, M_{kl} \quad (54)$$

Since $M_{kl} = \frac{\sigma_{d\omega}^2 \delta t}{2}\, \delta x^T \nabla_{xx} F^{lk}\, \delta x$, the vectors $\delta x^T$ and $\delta x$ do not depend on $k$, $l$, $r$ and can be taken outside the sums. Thus we can show that:

$$\mathrm{trace}(V_{xx} F M) = \delta x^T \left( \frac{\sigma_{d\omega}^2\,\delta t}{2} \sum_{l=1}^{n}\sum_{k=1}^{m}\sum_{r=1}^{n} (V_{xx})_{lr}\, F_{rk}\, \nabla_{xx} F^{lk} \right) \delta x = \delta x^T \tilde{M}\, \delta x \quad (55)$$

where $\tilde{M}$ is a matrix of dimensionality $\tilde{M} \in \mathbb{R}^{n\times n}$, defined as

$$\tilde{M} = \frac{\sigma_{d\omega}^2\,\delta t}{2} \sum_{l=1}^{n}\sum_{k=1}^{m}\sum_{r=1}^{n} (V_{xx})_{lr}\, F_{rk}\, \nabla_{xx} F^{lk} \quad (56)$$

By following the same algebraic steps it can be shown that

$$\mathrm{trace}(V_{xx} F N) = \delta x^T \tilde{N}\, \delta u \quad (57)$$

with the matrix $\tilde{N} \in \mathbb{R}^{n\times p}$ defined as

$$\tilde{N} = \sigma_{d\omega}^2\,\delta t \sum_{l=1}^{n}\sum_{k=1}^{m}\sum_{r=1}^{n} (V_{xx})_{lr}\, F_{rk}\, \nabla_{xu} F^{lk} \quad (58)$$

and

$$\mathrm{trace}(V_{xx} F G) = \delta u^T \tilde{G}\, \delta u \quad (59)$$

with the matrix $\tilde{G} \in \mathbb{R}^{p\times p}$ defined as

$$\tilde{G} = \frac{\sigma_{d\omega}^2\,\delta t}{2} \sum_{l=1}^{n}\sum_{k=1}^{m}\sum_{r=1}^{n} (V_{xx})_{lr}\, F_{rk}\, \nabla_{uu} F^{lk} \quad (60)$$

Thus the term $E[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t \sqrt{\delta t}]$ is formulated as

$$E\big[\mathcal{O}_d^T V_{xx} \Gamma_t \xi_t \sqrt{\delta t}\big] = \delta x^T \tilde{M}\, \delta x + \delta u^T \tilde{G}\, \delta u + \delta x^T \tilde{N}\, \delta u \quad (61)$$

Similarly we can show that

$$E\big[\sqrt{\delta t}\,\xi_t^T \Gamma_t^T V_{xx}\, \mathcal{O}_d\big] = \delta x^T \tilde{M}\, \delta x + \delta u^T \tilde{G}\, \delta u + \delta x^T \tilde{N}\, \delta u \quad (62)$$
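The triple sums in (56), (58) and (60) collapse once one notices that $\sum_{r} (V_{xx})_{lr} F_{rk}$ is just the element $(l,k)$ of $C = V_{xx} F$. A minimal sketch built on that observation (our own conventions for supplying the element-wise second derivatives of $F$; the $1/2$ factors follow our reconstruction of (50) and (52)):

```python
import numpy as np

def second_order_noise_terms(F_nom, Fxx, Fxu, Fuu, V_xx, sigma2, dt):
    """Terms \tilde M (56), \tilde N (58), \tilde G (60).
    F_nom: F on the nominal trajectory (n x m); Fxx[l][k] = \nabla_xx F^{lk}
    (n x n), Fxu[l][k] = \nabla_xu F^{lk} (n x p), Fuu[l][k] = \nabla_uu F^{lk}
    (p x p), with l the state index and k the noise index."""
    n, m = F_nom.shape
    C = V_xx @ F_nom   # C[l, k] = sum_r (V_xx)_{lr} F_{rk}, as in (54)
    Mt = sum(C[l, k] * Fxx[l][k] for l in range(n) for k in range(m))
    Nt = sum(C[l, k] * Fxu[l][k] for l in range(n) for k in range(m))
    Gt = sum(C[l, k] * Fuu[l][k] for l in range(n) for k in range(m))
    return 0.5 * sigma2 * dt * Mt, sigma2 * dt * Nt, 0.5 * sigma2 * dt * Gt
```

For noise that is linear in the state or control the element Hessians vanish, so these terms contribute only when the diffusion matrix itself is curved.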
Next we find the expectation of all the terms that depend on $\mathcal{O}_d(\delta x, \delta u, \xi, \delta t)$ but not on the noise (class $E_4$). Consequently, we will have:

$$E\big[\delta x_t^T A_t^T V_{xx}\, \mathcal{O}_d\big] = \delta x_t^T A_t^T V_{xx}\, \tilde{\mathcal{O}}_d \approx 0, \qquad E\big[\delta u_t^T B_t^T V_{xx}\, \mathcal{O}_d\big] = \delta u_t^T B_t^T V_{xx}\, \tilde{\mathcal{O}}_d \approx 0,$$
$$E\big[\mathcal{O}_d^T V_{xx} A_t \delta x_t\big] = \tilde{\mathcal{O}}_d^T V_{xx} A_t \delta x_t \approx 0, \qquad E\big[\mathcal{O}_d^T V_{xx} B_t \delta u_t\big] = \tilde{\mathcal{O}}_d^T V_{xx} B_t \delta u_t \approx 0 \quad (63)$$

where the quantity $\tilde{\mathcal{O}}_d$ has been defined in (14). All four terms above vanish since they contain variations in state and control of order higher than two, and can therefore be neglected. Finally we compute the term of the fifth class:

$$E_5 = E\big[\mathcal{O}_d^T V_{xx}\, \mathcal{O}_d\big] = E\big[\mathrm{trace}\big(V_{xx}\, \mathcal{O}_d\, \mathcal{O}_d^T\big)\big] = \mathrm{trace}\big(V_{xx}\, E\big[\mathcal{O}_d\, \mathcal{O}_d^T\big]\big) \quad (64)$$

The product $\mathcal{O}^i \mathcal{O}^j$ is a function of variations in state and control of order 4, since each term $\mathcal{O}^i$ is a function of variations in state and control of order 2. Consequently, the term $E_5 = E[\mathcal{O}_d^T V_{xx}\, \mathcal{O}_d]$ is equal to zero. With the computation of the expectation of the term that is quadratic with respect to $\mathcal{O}_d$, we have calculated all the terms of the second-order expansion of the cost-to-go function. In the next section we derive the optimal controls and present the SDDP algorithm. Furthermore, we show how SDDP recovers the deterministic solution as well as the cases of only control-multiplicative, only state-multiplicative and only additive noise.

IV. OPTIMAL CONTROLS

In this section we provide the form of the optimal controls, and we show how previous results are special cases of our generalized stochastic DDP formulation. Furthermore, having computed all the terms of the expansion of the cost-to-go function $V(x_t)$ at state $x_t$, we show that its form remains quadratic with respect to the variations in state $\delta x_t$ under the constraint of the nonlinear stochastic dynamics (2). More precisely we have:

$$E\big[V(\bar{x}_{t+\delta t} + \delta x_{t+\delta t})\big] = V(\bar{x}_{t+\delta t}) + V_x^T A_t \delta x_t + V_x^T B_t \delta u_t + \frac{1}{2}\delta x^T F\, \delta x + \frac{1}{2}\delta u^T Z\, \delta u + \delta u^T L\, \delta x$$
$$+ \frac{1}{2}\Big( \delta x_t^T A_t^T V_{xx} A_t \delta x_t + \delta u_t^T B_t^T V_{xx} B_t \delta u_t + \delta x_t^T A_t^T V_{xx} B_t \delta u_t + \delta u_t^T B_t^T V_{xx} A_t \delta x_t \Big)$$
$$+ \frac{1}{2}\Big( \delta x^T \tilde{F}\, \delta x + 2\,\delta x^T \tilde{L}\, \delta u + \delta u^T \tilde{Z}\, \delta u + 2\,\delta u^T \tilde{U} + 2\,\delta x^T \tilde{S} + \tilde{\gamma} \Big) + \delta x^T \tilde{M}\, \delta x + \delta u^T \tilde{G}\, \delta u + \delta x^T \tilde{N}\, \delta u \quad (65)$$

The unmaximized state-action value function is defined as

$$Q(x_k, u_k) = l(x_k, u_k) + V(x_{k+1}) \quad (66)$$

Given a trajectory in states and controls $(\bar{x}, \bar{u})$ we can approximate the state-action value function as

$$Q(\bar{x} + \delta x, \bar{u} + \delta u) = Q_0 + Q_u^T \delta u + Q_x^T \delta x + \frac{1}{2}\begin{bmatrix} \delta x \\ \delta u \end{bmatrix}^T \begin{bmatrix} Q_{xx} & Q_{xu} \\ Q_{ux} & Q_{uu} \end{bmatrix}\begin{bmatrix} \delta x \\ \delta u \end{bmatrix} \quad (67)$$

By equating the coefficients of similar powers between the state-action value function $Q(x_k, u_k)$ on one side, and the immediate cost $l(x_k, u_k)$ and cost-to-go $V(x_{k+1})$ on the other, we can show that:

$$Q_x = l_x + A_t^T V_x + \tilde{S}$$
$$Q_u = l_u + B_t^T V_x + \tilde{U}$$
$$Q_{xx} = l_{xx} + A_t^T V_{xx} A_t + F + \tilde{F} + \tilde{M}$$
$$Q_{xu} = l_{xu} + A_t^T V_{xx} B_t + L^T + \tilde{L} + \tilde{N}$$
$$Q_{uu} = l_{uu} + B_t^T V_{xx} B_t + Z + \tilde{Z} + \tilde{G} \quad (68)$$

where we have assumed a local quadratic approximation of the immediate cost $l(x_k, u_k)$ according to the equation

$$l(\bar{x} + \delta x, \bar{u} + \delta u) = l_0 + l_u^T \delta u + l_x^T \delta x + \frac{1}{2}\begin{bmatrix} \delta x \\ \delta u \end{bmatrix}^T \begin{bmatrix} l_{xx} & l_{xu} \\ l_{ux} & l_{uu} \end{bmatrix}\begin{bmatrix} \delta x \\ \delta u \end{bmatrix} \quad (69)$$

with $l_x = \nabla_x l$, $l_u = \nabla_u l$, $l_{xx} = \nabla_{xx} l$, $l_{uu} = \nabla_{uu} l$ and $l_{ux} = \nabla_{ux} l$. The local variations in control that maximize the state-action value function are expressed by the equation that follows:

$$\delta u^* = \arg\max_{\delta u}\, Q(\bar{x} + \delta x, \bar{u} + \delta u) = -Q_{uu}^{-1}\big(Q_u + Q_{ux}\,\delta x\big) \quad (70)$$

The optimal control variations have the form $\delta u = \mathbf{l} + \mathbf{L}\,\delta x$, where $\mathbf{l} = -Q_{uu}^{-1} Q_u$ is the open-loop gain and $\mathbf{L} = -Q_{uu}^{-1} Q_{ux}$ is the closed-loop feedback gain. The SDDP algorithm is provided in pseudocode form in Table I.

TABLE I: PSEUDOCODE OF THE SDDP ALGORITHM

- Given:
  - An immediate cost function $l(x,u)$
  - A terminal cost term $\phi_{t_N}$
  - The stochastic dynamics $dx = f(x,u)\,dt + F(x,u)\,d\omega$
- Repeat until convergence:
  - Given a trajectory in states and controls $(\bar{x}, \bar{u})$, find the quadratic approximations of the stochastic dynamics $A_t, B_t, \Gamma_t, \mathcal{O}_d$ and the quadratic approximation of the immediate cost function $l_0, l_x, l_{xx}, l_{uu}, l_{ux}$ around these trajectories.
  - Compute all the terms $Q_x$, $Q_u$, $Q_{xu}$ and $Q_{uu}$ according to equation (68).
  - Back-propagate the quadratic approximation of the value function based on the equations:
    $V_0^{(k)} = V_0^{(k+1)} - \frac{1}{2} Q_u^T Q_{uu}^{-1} Q_u$
    $V_x^{(k)} = Q_x - Q_{xu} Q_{uu}^{-1} Q_u$
    $V_{xx}^{(k)} = Q_{xx} - Q_{xu} Q_{uu}^{-1} Q_{ux}$
  - Compute $\delta u = -Q_{uu}^{-1}\big(Q_u + Q_{ux}\,\delta x\big)$
  - Update the controls: $u^* = \bar{u} + \gamma\,\delta u$
  - Get the new optimal trajectory $x^*$ by propagating the nonlinear dynamics $dx = f(x, u^*)\,dt + F(x, u^*)\,d\omega$.
  - Set $\bar{x} = x^*$ and $\bar{u} = u^*$ and repeat.
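The backward sweep of Table I can be written compactly. Below is a minimal sketch of one backward pass (our own illustration: the helper `q_terms` is hypothetical and stands in for the assembly of equation (68), and the regularization of $Q_{uu}$ is a standard DDP safeguard rather than something specified in Table I):

```python
import numpy as np

def backward_pass(q_terms, N, n, p, Vx_T, Vxx_T, reg=1e-6):
    """One SDDP backward sweep over N time steps. q_terms(t, Vx, Vxx) must
    return (Qx, Qu, Qxx, Qxu, Quu) assembled per eq. (68), including the
    stochastic corrections \tilde S, \tilde U, \tilde F, ..., \tilde G."""
    Vx, Vxx = Vx_T, Vxx_T              # boundary condition from terminal cost
    k_open = np.zeros((N, p))          # open-loop gains  l = -Quu^{-1} Qu
    K_fb = np.zeros((N, p, n))         # feedback gains   L = -Quu^{-1} Qux
    for t in reversed(range(N)):
        Qx, Qu, Qxx, Qxu, Quu = q_terms(t, Vx, Vxx)
        Quu_inv = np.linalg.inv(Quu + reg * np.eye(p))  # keep Quu invertible
        k_open[t] = -Quu_inv @ Qu
        K_fb[t] = -Quu_inv @ Qxu.T                      # Qux = Qxu^T
        # value recursion from Table I
        Vx = Qx - Qxu @ Quu_inv @ Qu
        Vxx = Qxx - Qxu @ Quu_inv @ Qxu.T
    return k_open, K_fb
```

The forward rollout then applies $u^* = \bar{u} + \gamma(\mathbf{l} + \mathbf{L}\,\delta x)$ at each step, with the step size $\gamma$ chosen by line search.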
For the special case where the stochastic dynamics have only additive noise, $F(x,u) = F$ constant, the terms $\tilde{M}, \tilde{N}, \tilde{G}, \tilde{F}, \tilde{L}, \tilde{Z}, \tilde{U}, \tilde{S}$ will all be zero, since they are functions of $\nabla_{xx}F$, $\nabla_{xu}F$, $\nabla_{uu}F$, $\nabla_x F$ and $\nabla_u F$, all of which vanish for constant $F$. For this type of system the control does not depend on the statistical characteristics of the noise. In the case of deterministic systems, $\tilde{M}, \tilde{N}, \tilde{G}, \tilde{F}, \tilde{L}, \tilde{Z}, \tilde{U}, \tilde{S}$ will again be zero, because these terms are scaled by the variance of the noise and $\sigma_{d\omega_i}^2 = 0$ for $i = 1, \dots, m$. Finally, if the noise is only control dependent, then $\tilde{M}, \tilde{N}, \tilde{L}, \tilde{F}, \tilde{S}$ will be zero since $\nabla_{xx}F(u) = 0$, $\nabla_{xu}F(u) = 0$ and $\nabla_x F_c^i(u) = 0$, while if it is only state dependent, then $\tilde{N}, \tilde{G}, \tilde{Z}, \tilde{L}, \tilde{U}$ will be zero since $\nabla_{xu}F(x) = 0$, $\nabla_{uu}F(x) = 0$ and $\nabla_u F_c^i(x) = 0$.
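The additive-noise claim is easy to check numerically; the following is a minimal self-contained sketch (our own illustration, with our own variable names) computing $\tilde{F}$, $\tilde{S}$ and $\tilde{\gamma}$ of (32), (36) and (37) for a constant diffusion matrix:

```python
import numpy as np

# Special-case check: with additive noise F(x, u) = const, the column
# Jacobians \nabla_x F_c^i and \nabla_u F_c^i are zero, so the correction
# terms (32)-(36) vanish and only \tilde gamma (37) survives.
n, m = 3, 2
sigma2, dt = 0.1, 0.01
V_xx = np.eye(n)
Fc = [np.ones(n) for _ in range(m)]          # constant diffusion columns
Fx = [np.zeros((n, n)) for _ in range(m)]    # \nabla_x F_c^i = 0

Ft = sigma2 * dt * sum(X.T @ V_xx @ X for X in Fx)              # -> 0
St = sigma2 * dt * sum(X.T @ V_xx @ f for X, f in zip(Fx, Fc))  # -> 0
gt = sigma2 * dt * sum(f @ V_xx @ f for f in Fc)                # > 0

assert not Ft.any() and not St.any() and gt > 0
# \tilde gamma is constant in (dx, du): it shifts the cost-to-go but does
# not enter the controls of eq. (70), matching the claim above.
```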
[Fig. 1: The left plot illustrates the state trajectory for the system $dx = \cos(x)\,dt + u\,dt + x^2\,d\omega$; the right plot corresponds to the optimal control.]

[Fig. 2: The left plot illustrates the state space trajectories of the system $dx = \alpha x^2\,dt + u\,dt + x^2\,d\omega$ for different values of noise; the right plot shows the stereotypical convergence behavior of SDDP.]

V. SIMULATION RESULTS

In this section we test SDDP on two different one-dimensional systems. More precisely, we first consider the one-dimensional stochastic nonlinear system of the form

$$dx = \cos(x)\,dt + u\,dt + x^2\,d\omega \quad (71)$$

The quadratic approximation is based on the terms $A_t = 1 - \sin(x)\,\delta t$, $B_t = \delta t$ and the second-order term $\nabla_{xx} f\,\delta t = -\cos(x)\,\delta t$. The system has only state-dependent noise, and therefore $\tilde{M} = \sigma^2 \delta t\, V_{xx}\, x^2$, $\tilde{F} = 4\sigma^2 \delta t\, V_{xx}\, x^2$ and $\tilde{S} = 2 x^3 V_{xx}\, \sigma^2 \delta t$, while the terms $\tilde{N} = \tilde{G} = \tilde{L} = \tilde{Z} = \tilde{U} = 0$.

We apply the SDDP algorithm to the task of bringing the state from the initial $x_{t_0} = 0$ to the terminal $x_{t_N} = p = 4$. The cost function is expressed as $v^{\pi}(x,t) = E\big[h(x_{t_N}) + \int_{t_0}^{t_N} r u^2\, d\tau\big]$ with $h(x) = w_p (x - p)^2$, where $w_p = 1$ and $r = 10^{-6}$. In this example there is only a terminal cost on the state; the state-dependent cost over the time horizon is zero, and thus only the control cost accumulates. In Figure 1, the left plot illustrates the state trajectory as it approaches the target state $p = 4$, while the right plot illustrates the optimal control. Since there is only a terminal cost, the control over the time horizon is almost zero, while at the very end of the time horizon the control is activated and the state reaches the target.

The second system is given by the equation

$$dx = \alpha x^2\,dt + u\,dt + x^2\,d\omega \quad (72)$$

The quadratic approximation is based on the terms $A_t = 1 + 2\alpha x(t)\,\delta t$, $B_t = \delta t$ and the second-order term $\nabla_{xx} f\,\delta t = 2\alpha\,\delta t$. The parameter $\alpha$ controls the degree of instability of the system; for our simulations we used $\alpha = 1$. The task is to bring the state $x(t)$ from the initial state $x_{t_0} = 0$ to the target state $x_{t_N} = p = 1$. The cost function has the same form as in the first example, but with tuning parameters $w_p = 1$ and $r = 10^{-3}$. Furthermore, this system also has only state-dependent noise, so the correction terms take the same form as in the first example.

In Figure 2, the left plot illustrates state space trajectories for different values of the variance of the noise, while the right plot illustrates the convergence behavior of SDDP. (In both examples, the noise is used only during the iterative optimization process of SDDP. When testing the optimal control, instead of running the controlled system many times for different realizations of the noise and then calculating the mean, we simply zero the noise and run the system once.) An important observation is that the curved trajectories correspond to cases with high-variance noise; as the variance of the noise decreases, the optimal trajectories become more and more straight.

VI. DISCUSSION

In this paper we explicitly derived the equations describing the second-order expansion of the cost-to-go, given state- and control-dependent noise. Our main result is that the expressions remain quadratic with respect to $\delta x$ and $\delta u$, so the basic structure of the algorithm, a quadratic cost-to-go approximation with a linear policy, remains unchanged. In addition, we have shown how the cases of deterministic DDP and stochastic DDP with additive noise are sub-cases of our generalized formulation of Stochastic DDP.
Current and future research includes further testing and evaluation of our generalized Stochastic DDP algorithm on multidimensional stochastic systems which are highly nonlinear and have noise that is control and state dependent. Biomechanical models belong to this class, due to the highly nonlinear and noisy nature of muscle dynamics. Moreover, we aim to incorporate a second-order Extended Kalman Filter that can handle observation as well as process noise. The resulting optimal controller-estimator scheme will be a version of the iterative Quadratic Gaussian regulator that can handle stochastic dynamics expanded up to second order with state- and control-dependent noise.

REFERENCES

[1] E. Todorov and W. Li. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In Proceedings of the American Control Conference, 2005.
[2] W. Li and E. Todorov. Iterative optimal control and estimation design for stochastic nonlinear systems. In Proceedings of the Conference on Decision and Control, 2006.
[3] D. H. Jacobson and D. Q. Mayne. Differential Dynamic Programming. Elsevier Publishing Company, New York, 1970.
[4] Gregory Lantoine and Ryan P. Russell. A hybrid differential dynamic programming algorithm for robust low-thrust optimization. In AAS/AIAA Astrodynamics Specialist Conference and Exhibit, 2008.
[5] S. Yakowitz. The stagewise Kuhn-Tucker condition and differential dynamic programming. IEEE Transactions on Automatic Control, 31(1):25-30, 1986.
[6] Jun Morimoto and Chris Atkeson. Minimax differential dynamic programming: An application to robust biped walking. In Advances in Neural Information Processing Systems 15. MIT Press, Cambridge, MA, 2002.
[7] Yuval Tassa, Tom Erez, and William Smart. Receding horizon differential dynamic programming. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA, 2008.