Stochastic optimal control theory


1 Stochastic optimal control theory

Bert Kappen
SNN, Radboud University Nijmegen, the Netherlands

July 5, 2008

2 Introduction

Optimal control theory: optimize the sum of a path cost and an end cost. The result is an optimal control sequence and the corresponding optimal trajectory.
- Input: cost function.
- Output: optimal trajectory and controls.

Classical control theory: what control signal should I give to move a plant to a desired state?
- Input: desired state trajectory.
- Output: optimal control trajectory.

3 Types of optimal control problems: finite horizon (fixed horizon time)

[Figure: controlled trajectory x(t) through an environment, from t up to the fixed horizon time t_f.]

Dynamics and environment may depend explicitly on time. The optimal control depends explicitly on time.

4 Types of optimal control problems: finite horizon (moving horizon)

[Figure: horizon window from t to t_f.]

Dynamics and environment are static. The optimal control is time independent.

5 Types of optimal control problems: finite horizon (moving horizon)

[Figure: the horizon window (t, t_f) shifts forward with t.]

Dynamics and environment are static. The optimal control is time independent. This setting is similar to RL.

6 Types of optimal control problems

Other types of control problems:
- minimum time;
- infinite horizon, average reward;
- infinite horizon, absorbing states.

In addition one should distinguish:
- discrete vs. continuous state;
- discrete vs. continuous time;
- observable vs. partially observable.

7 Overview

Deterministic optimal control (Kappen, 30 min.)
- Introduction of the delayed reward problem in discrete time
- Dynamic programming solution and the deterministic Bellman equations
- Solution in continuous time and states
- Example: mass on a spring
- Pontryagin maximum principle; the notion of an optimal (particle) trajectory
- Again: mass on a spring

8 Overview

Stochastic optimal control, discrete case (Toussaint, 40 min.)
- Stochastic Bellman equation (discrete state and time) and dynamic programming
- Reinforcement learning (exact solution, value iteration, policy improvement); actor-critic networks
- Markov decision problems and probabilistic inference
- Example: robotic motion control and planning

9 Overview

Stochastic optimal control, continuous case (Kappen, 40 min.)
- Stochastic differential equations
- Hamilton-Jacobi-Bellman equation (continuous state and time)
- LQ control, Riccati equation
- Example of LQ control
- Learning; partial observability: inference and control
- Certainty equivalence
- Path integral control; the role of noise and symmetry breaking; efficient approximate computation (MC, MF, BP, ...)
- Examples: double slit, delayed choice, n-joint arm

10 Overview

Research issues (Toussaint, 30 min.)
- Learning
- Efficient methods to compute value functions / cost-to-go
- Control under partial observability (POMDPs)

11 Discrete time control

Consider the control of a discrete time deterministic dynamical system:

x_{t+1} = x_t + f(t, x_t, u_t), \qquad t = 0, 1, \ldots, T-1

x_t describes the state and u_t specifies the control or action at time t. Given x_{t=0} = x_0 and u_{0:T-1} = u_0, u_1, \ldots, u_{T-1}, we can compute x_{1:T}. Define a cost for each sequence of controls:

C(x_0, u_{0:T-1}) = \phi(x_T) + \sum_{t=0}^{T-1} R(t, x_t, u_t)

The problem of optimal control is to find the sequence u_{0:T-1} that minimizes C(x_0, u_{0:T-1}).
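The cost of a given control sequence can be evaluated by simply rolling the dynamics forward. A minimal sketch (the scalar dynamics f and the costs R, phi below are hypothetical stand-ins):

```python
import numpy as np

def rollout_cost(x0, u_seq, f, R, phi):
    """Simulate x_{t+1} = x_t + f(t, x_t, u_t) and accumulate
    C = phi(x_T) + sum_t R(t, x_t, u_t)."""
    x, c = x0, 0.0
    for t, u in enumerate(u_seq):
        c += R(t, x, u)
        x = x + f(t, x, u)
    return c + phi(x)

# Hypothetical example: scalar system, quadratic costs.
cost = rollout_cost(
    x0=1.0, u_seq=np.zeros(10),
    f=lambda t, x, u: -0.1 * x + u,
    R=lambda t, x, u: 0.5 * u**2,
    phi=lambda x: 0.5 * x**2)
print(cost)
```

The optimization problem is the search over all sequences u_{0:T-1} of this rollout cost, which dynamic programming solves without enumerating sequences.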

12 Dynamic programming

Find the minimal cost path from A to J.

C(J) = 0, \quad C(H) = 3, \quad C(I) = 4
C(F) = \min(6 + C(H), 3 + C(I)) = \min(9, 7) = 7
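The recursion can be written directly as memoized code. A small sketch using only the edge costs visible on the slide (the remaining edges of the A-J graph are omitted):

```python
# Backward recursion: edge costs taken from the slide
# (H->J: 3, I->J: 4, F->H: 6, F->I: 3).
edges = {'H': [('J', 3)], 'I': [('J', 4)], 'F': [('H', 6), ('I', 3)]}
cost_to_go = {'J': 0.0}

def C(node):
    """Minimal cost from node to J, memoized."""
    if node not in cost_to_go:
        cost_to_go[node] = min(w + C(nxt) for nxt, w in edges[node])
    return cost_to_go[node]

print(C('F'))   # min(6 + 3, 3 + 4) = 7
```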

13 Discrete time control

The optimal control problem can be solved by dynamic programming. Introduce the optimal cost-to-go:

J(t, x_t) = \min_{u_{t:T-1}} \left( \phi(x_T) + \sum_{s=t}^{T-1} R(s, x_s, u_s) \right)

which solves the optimal control problem from an intermediate time t until the fixed end time T, for all intermediate states x_t. Then,

J(T, x) = \phi(x)
J(0, x) = \min_{u_{0:T-1}} C(x, u_{0:T-1})

14 Discrete time control

One can recursively compute J(t, x) from J(t+1, x) for all x in the following way:

J(t, x_t) = \min_{u_{t:T-1}} \left( \phi(x_T) + \sum_{s=t}^{T-1} R(s, x_s, u_s) \right)
 = \min_{u_t} \left( R(t, x_t, u_t) + \min_{u_{t+1:T-1}} \left[ \phi(x_T) + \sum_{s=t+1}^{T-1} R(s, x_s, u_s) \right] \right)
 = \min_{u_t} \left( R(t, x_t, u_t) + J(t+1, x_{t+1}) \right)
 = \min_{u_t} \left( R(t, x_t, u_t) + J(t+1, x_t + f(t, x_t, u_t)) \right)

This is called the Bellman equation. It computes u as a function of x and t, for all intermediate t and all x.

15 Discrete time control

The algorithm to compute the optimal control u_{0:T-1}, the optimal trajectory x_{1:T} and the optimal cost is:

1. Initialization: J(T, x) = \phi(x)
2. Backwards: for t = T-1, \ldots, 0 and for all x compute
   u_t(x) = \arg\min_u \{ R(t, x, u) + J(t+1, x + f(t, x, u)) \}
   J(t, x) = R(t, x, u_t) + J(t+1, x + f(t, x, u_t))
3. Forwards: for t = 0, \ldots, T-1 compute
   x_{t+1} = x_t + f(t, x_t, u_t(x_t))

NB: the backward computation requires u_t(x) for all x.
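A minimal sketch of this backward/forward algorithm on a discretized scalar state space; the dynamics, costs, grids and horizon are hypothetical stand-ins, and J(t+1, .) is evaluated off-grid by linear interpolation:

```python
import numpy as np

T, xs, us = 20, np.linspace(-2, 2, 101), np.linspace(-1, 1, 21)
f   = lambda t, x, u: 0.1 * u                 # x_{t+1} = x_t + f(t, x, u)
R   = lambda t, x, u: 0.05 * u**2
phi = lambda x: (x - 1.0)**2

J, pol = [None] * (T + 1), [None] * T
J[T] = phi(xs)
for t in range(T - 1, -1, -1):                # backward pass, for all x
    xnext = xs[:, None] + f(t, xs[:, None], us[None, :])
    Q = R(t, xs[:, None], us[None, :]) + np.interp(xnext, xs, J[t + 1])
    pol[t] = us[np.argmin(Q, axis=1)]         # u_t(x) on the grid
    J[t] = Q.min(axis=1)

x = 0.0                                       # forward pass
for t in range(T):
    x = x + f(t, x, np.interp(x, xs, pol[t]))
print(x)                                      # end point under the policy
```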

16 Continuous limit

Replace t + 1 by t + dt with dt \to 0. Assume J(t, x) is smooth.

x_{t+dt} = x_t + f(x_t, u_t, t)dt
C(x_0, u(0 \to T)) = \phi(x_T) + \int_0^T d\tau\, R(\tau, x(\tau), u(\tau))

J(t, x) = \min_u \left( R(t, x, u)dt + J(t + dt, x + f(x, u, t)dt) \right)
 = \min_u \left( R(t, x, u)dt + J(t, x) + \partial_t J(t, x)dt + \partial_x J(t, x) f(x, u, t)dt \right)

-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t) \partial_x J(x, t) \right)

with boundary condition J(x, T) = \phi(x).

17 Continuous limit

-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t) \partial_x J(x, t) \right)

with boundary condition J(x, T) = \phi(x). This is called the Hamilton-Jacobi-Bellman equation. It computes the anticipated potential J(t, x) from the future potential \phi(x).

18 Example: Mass on a spring

The spring force is F_z = -z towards the rest position, and the control force is F_u = u. Newton's law with m = 1:

F = -z + u = m\ddot{z}

Control problem: given initial position and velocity z(0) = \dot{z}(0) = 0 at time t = 0, find the control path -1 \le u(0 \to T) \le 1 such that z(T) is maximal.

19 Example: Mass on a spring

Introduce x_1 = z, x_2 = \dot{z}; then

\dot{x}_1 = x_2
\dot{x}_2 = -x_1 + u

The end cost is \phi(x) = -x_1; the path cost is R(x, u, t) = 0. The HJB equation takes the form:

-\partial_t J = \min_u \left( x_2 \frac{\partial J}{\partial x_1} + (-x_1 + u) \frac{\partial J}{\partial x_2} \right)
 = x_2 \frac{\partial J}{\partial x_1} - x_1 \frac{\partial J}{\partial x_2} - \left| \frac{\partial J}{\partial x_2} \right|, \qquad u = -\mathrm{sign}\left( \frac{\partial J}{\partial x_2} \right)

20 Example: Mass on a spring

The solution is

J(t, x_1, x_2) = -\cos(t - T)\, x_1 + \sin(t - T)\, x_2 + \alpha(t)
u(t, x_1, x_2) = -\mathrm{sign}(\sin(t - T))

As an example consider T = 2\pi. Then the optimal control is

u = -1 \ \text{for}\ 0 < t < \pi, \qquad u = +1 \ \text{for}\ \pi < t < 2\pi

[Figure: optimal trajectories x_1(t) and x_2(t).]
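A quick numerical check of this solution (a sketch, using forward Euler and the control law as reconstructed above):

```python
import numpy as np

# Integrate x1' = x2, x2' = -x1 + u with u = -sign(sin(t - T)), T = 2*pi.
T, dt = 2 * np.pi, 1e-4
x1, x2 = 0.0, 0.0
for t in np.arange(0.0, T, dt):
    u = -np.sign(np.sin(t - T))
    x1, x2 = x1 + dt * x2, x2 + dt * (-x1 + u)
print(x1)   # ~4.0: the maximal reachable z(T) under |u| <= 1
```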

21 Pontryagin minimum principle

The HJB equation is a PDE with a boundary condition at the future time. The PDE is solved by discretizing space and time. The solution is the optimal cost-to-go for all x and t; from it we compute the optimal trajectory and the optimal control.

An alternative is a variational approach that directly finds the optimal trajectory and optimal control.

22 Pontryagin minimum principle

We can write the optimal control problem as a constrained optimization problem with independent variables u(0 \to T) and x(0 \to T):

\min_{u(0 \to T),\, x(0 \to T)} \phi(x(T)) + \int_0^T dt\, R(x(t), u(t), t)

subject to the constraint \dot{x} = f(x, u, t) and boundary condition x(0) = x_0. Introduce the Lagrange multiplier function \lambda(t):

C = \phi(x(T)) + \int_0^T dt \left[ R(t, x(t), u(t)) - \lambda(t)\left(\dot{x}(t) - f(t, x(t), u(t))\right) \right]
 = \phi(x(T)) + \int_0^T dt \left[ H(t, x(t), u(t), \lambda(t)) - \lambda(t)\dot{x}(t) \right]

with the Hamiltonian H(t, x, u, \lambda) = R(t, x, u) + \lambda f(t, x, u).

23 Derivation PMP

The solution is found by extremizing C, which gives a necessary but not sufficient condition for a solution. Varying the action with respect to the trajectory x, the control u and the Lagrange multiplier \lambda gives:

\delta C = \phi_x(x(T))\delta x(T) + \int_0^T dt \left[ H_x \delta x(t) + H_u \delta u(t) + (H_\lambda - \dot{x}(t))\delta\lambda(t) - \lambda(t)\delta\dot{x}(t) \right]
 = \left(\phi_x(x(T)) - \lambda(T)\right)\delta x(T) + \int_0^T dt \left[ (H_x + \dot{\lambda}(t))\delta x(t) + H_u \delta u(t) + (H_\lambda - \dot{x}(t))\delta\lambda(t) \right]

where the \lambda\delta\dot{x} term has been integrated by parts, using \delta x(0) = 0. Here H_x = \partial H(t, x(t), u(t), \lambda(t)) / \partial x(t), and similarly for H_u and H_\lambda.

24 We can solve H_u(t, x, u, \lambda) = 0 for u and denote the solution as u^*(t, x, \lambda). This assumes H is convex in u. The remaining equations are

\dot{x} = H_\lambda(t, x, u^*(t, x, \lambda), \lambda)
\dot{\lambda} = -H_x(t, x, u^*(t, x, \lambda), \lambda)

with boundary conditions

x(0) = x_0, \qquad \lambda(T) = \phi_x(x(T))

This is a mixed boundary value problem.

25 Again mass on a spring

Problem:

\dot{x}_1 = x_2, \quad \dot{x}_2 = -x_1 + u
R(x, u, t) = 0, \quad \phi(x) = -x_1

Hamiltonian:

H(t, x, u, \lambda) = R(t, x, u) + \lambda^T f(t, x, u) = \lambda_1 x_2 + \lambda_2(-x_1 + u)
u^* = -\mathrm{sign}(\lambda_2), \qquad H^*(t, x, \lambda) = \lambda_1 x_2 - \lambda_2 x_1 - |\lambda_2|

The Hamilton equations:

\dot{x} = H^*_\lambda: \quad \dot{x}_1 = x_2, \quad \dot{x}_2 = -x_1 - \mathrm{sign}(\lambda_2)
\dot{\lambda} = -H^*_x: \quad \dot{\lambda}_1 = \lambda_2, \quad \dot{\lambda}_2 = -\lambda_1

with x(t = 0) = x_0 and \lambda(t = T) = \phi_x(x(T)) = (-1, 0).
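The mixed boundary value problem can be solved by shooting: guess lambda(0), integrate the Hamilton equations forward, and adjust the guess until lambda(T) = phi_x(x(T)). A sketch for this example (for this problem the adjoint equations are linear and decoupled from x, so the root finding is easy):

```python
import numpy as np
from scipy.optimize import fsolve

T, N = 2 * np.pi, 5000
dt = T / N

def integrate(lam0):
    x, lam = np.zeros(2), np.array(lam0, dtype=float)
    for _ in range(N):
        u = -np.sign(lam[1])
        x = x + dt * np.array([x[1], -x[0] + u])
        lam = lam + dt * np.array([lam[1], -lam[0]])
    return x, lam

# Shooting condition: lambda(T) must equal phi_x(x(T)) = (-1, 0).
residual = lambda lam0: integrate(lam0)[1] - np.array([-1.0, 0.0])
lam0 = fsolve(residual, x0=[-1.0, 0.1])
xT, _ = integrate(lam0)
print(lam0, xT[0])   # lam0 ~ (-1, 0); x1(T) ~ 4, matching the HJB result
```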

26 Comments

The HJB method gives a sufficient (and often necessary) condition for optimality. The solution of the PDE is expensive.

The PMP method provides a necessary condition for optimal control; it yields candidate solutions for optimality. The PMP method is computationally less complicated than the HJB method because it does not require discretization of the state space.

Optimal control in continuous space and time involves many complications related to the existence, uniqueness and smoothness of the solution, in particular in the absence of noise. In the presence of noise many of these intricacies disappear.

HJB generalizes to the stochastic case; PMP does not (at least not easily).

27 Stochastic differential equations

Consider the random walk on the line:

x_{t+1} = x_t + \xi_t, \qquad \xi_t = \pm 1

with x_0 = 0. We can compute x_t = \sum_{i=1}^t \xi_i. Since x_t is a sum of independent random variables, x_t becomes Gaussian distributed with

\langle x_t \rangle = \sum_{i=1}^t \langle \xi_i \rangle = 0
\langle x_t^2 \rangle = \sum_{i,j=1}^t \langle \xi_i \xi_j \rangle = \sum_{i=1}^t \langle \xi_i^2 \rangle + \sum_{i \ne j} \langle \xi_i \rangle \langle \xi_j \rangle = t

Note that the fluctuations grow as \sqrt{t}.
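A Monte Carlo check of these statistics:

```python
import numpy as np

# Verify <x_t> = 0 and <x_t^2> = t for the +/-1 random walk.
rng = np.random.default_rng(0)
t, n_walkers = 1000, 100000
x_t = rng.choice([-1, 1], size=(n_walkers, t)).sum(axis=1)
print(x_t.mean(), (x_t**2).mean())   # ~0 and ~1000
```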

28 Stochastic differential equations

In the continuous time limit we define

dx_t = x_{t+dt} - x_t = d\xi

with d\xi an infinitesimal mean-zero Gaussian variable with \langle d\xi^2 \rangle = \nu dt. Then

\frac{d}{dt}\langle x \rangle = 0, \quad \langle x(t) \rangle = 0
\frac{d}{dt}\langle x^2 \rangle = \nu, \quad \langle x^2(t) \rangle = \nu t

\rho(x, t \mid x_0, 0) = \frac{1}{\sqrt{2\pi\nu t}} \exp\left( -\frac{(x - x_0)^2}{2\nu t} \right)

29 Stochastic optimal control

Consider a stochastic dynamical system

dx = f(t, x, u)dt + d\xi

with d\xi Gaussian noise, \langle d\xi_i d\xi_j \rangle = \nu_{ij}(t, x, u)\, dt. The cost becomes an expectation:

C(t, x, u(t \to T)) = \left\langle \phi(x(T)) + \int_t^T d\tau\, R(\tau, x(\tau), u(\tau)) \right\rangle

over all stochastic trajectories starting at x with control path u(t \to T). Note that only u(t), the first element of u(t \to T), is actually used at time t; we then move to x + dx and repeat the optimization.

30 Stochastic optimal control

We obtain the Bellman recursion

J(t, x_t) = \min_{u_t} \left\langle R(t, x_t, u_t) + J(t + dt, x_{t+dt}) \right\rangle

\langle J(t + dt, x_{t+dt}) \rangle = \int dx_{t+dt}\, N(x_{t+dt} \mid x_t, \nu dt)\, J(t + dt, x_{t+dt})
 = J(t, x_t) + dt\, \partial_t J(t, x_t) + \langle dx \rangle\, \partial_x J(t, x_t) + \tfrac{1}{2}\langle dx^2 \rangle\, \partial_x^2 J(t, x_t)

\langle dx \rangle = f(x, u, t)dt, \qquad \langle dx^2 \rangle = \nu(t, x, u)dt

Thus

-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t)\, \partial_x J(x, t) + \tfrac{1}{2}\nu(t, x, u)\, \partial_x^2 J(x, t) \right)

with boundary condition J(x, T) = \phi(x).

31 Linear Quadratic control

The dynamics is linear:

dx = [A(t)x + B(t)u + b(t)]dt + \sum_{j=1}^m \left( C_j(t)x + D_j(t)u + \sigma_j(t) \right) d\xi_j, \qquad \langle d\xi_j d\xi_{j'} \rangle = \delta_{jj'} dt

The cost function is quadratic:

\phi(x) = \tfrac{1}{2}x^T G x
R(x, u, t) = \tfrac{1}{2}x^T Q(t)x + u^T S(t)x + \tfrac{1}{2}u^T R(t)u

In this case the optimal cost-to-go is quadratic in x:

J(t, x) = \tfrac{1}{2}x^T P(t)x + \alpha^T(t)x + \beta(t)
u(t) = -\Psi(t)x(t) - \psi(t)

32 Substitution in the HJB equation yields ODEs for P, \alpha, \beta:

-\dot{P} = PA + A^T P + \sum_{j=1}^m C_j^T P C_j + Q - \hat{S}^T \hat{R}^{-1} \hat{S}
-\dot{\alpha} = [A - B\hat{R}^{-1}\hat{S}]^T \alpha + \sum_{j=1}^m [C_j - D_j \hat{R}^{-1}\hat{S}]^T P \sigma_j + P b
-\dot{\beta} = -\tfrac{1}{2}\psi^T \hat{R} \psi + \alpha^T b + \tfrac{1}{2}\sum_{j=1}^m \sigma_j^T P \sigma_j

with

\hat{R} = R + \sum_{j=1}^m D_j^T P D_j, \qquad \hat{S} = B^T P + S + \sum_{j=1}^m D_j^T P C_j
\Psi = \hat{R}^{-1}\hat{S}, \qquad \psi = \hat{R}^{-1}\left( B^T \alpha + \sum_{j=1}^m D_j^T P \sigma_j \right)

and P(t_f) = G, \alpha(t_f) = \beta(t_f) = 0.

33 Example

Find the optimal control for the dynamics

dx = (x + u)dt + d\xi, \qquad \langle d\xi^2 \rangle = \nu dt

with end cost \phi(x) = 0 and path cost R(x, u) = \tfrac{1}{2}(x^2 + u^2). The Riccati equations reduce to

-\dot{P} = 2P + 1 - P^2
\alpha = 0
-\dot{\beta} = \tfrac{1}{2}\nu P

with P(T) = \alpha(T) = \beta(T) = 0, and u(x, t) = -P(t)x.

[Figure: P(t) as a function of t.]
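A sketch that integrates this Riccati equation backward in time and then simulates the controlled dynamics under u = -P(t)x (horizon, noise level and initial state are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, nu = 5.0, 5000, 0.1
dt = T / N
P = np.zeros(N + 1)                   # P(T) = 0
for k in range(N, 0, -1):             # backward: -dP/dt = 2P + 1 - P^2
    P[k - 1] = P[k] + dt * (2 * P[k] + 1 - P[k] ** 2)

x, cost = 1.0, 0.0
for k in range(N):                    # forward: dx = (x + u)dt + dxi
    u = -P[k] * x
    cost += 0.5 * (x**2 + u**2) * dt
    x += (x + u) * dt + np.sqrt(nu * dt) * rng.standard_normal()
print(P[0], cost)   # far from T, P approaches the fixed point 1 + sqrt(2)
```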

34 Comments

Note that in the last example the optimal control is independent of \nu, i.e. optimal stochastic control equals optimal deterministic control. This is true in general for non-multiplicative noise (C_j = D_j = 0).

35 Learning

What happens if (part of) the state is not observed? For instance:
- as a result of measurement error we do not know x_t, but only p(x_t \mid y_{0:t});
- we do not know the parameters of the dynamics;
- we do not know the costs/rewards (the RL case).

36 Learning in RL or receding horizon

Imagine eternity, and you are ordered to cook a very good meal for tomorrow. You decide to spend all of today learning the recipes.

37 Learning in RL or receding horizon

Imagine eternity, and you are ordered to cook a very good meal for tomorrow. You decide to spend all of today learning the recipes. When the next day arrives, you are ordered to cook a very good meal for tomorrow. You decide to spend all of today learning the recipes...

38 Learning in RL or receding horizon

Imagine eternity, and you are ordered to cook a very good meal for tomorrow. You decide to spend all of today learning the recipes. When the next day arrives, you are ordered to cook a very good meal for tomorrow. You decide to spend all of today learning the recipes...

[Figure: value V versus time t.]

- The learning phase takes forever.
- Mix of exploration and optimizing (RL: actor-critic, E^3, ...).
- Learning is not part of the control problem.

40 Finite horizon learning

Imagine instead life as we know it. It is finite and we have only one life. The aim is to maximize the accumulated reward. This requires planning your learning!

- At t = 0, action is useless.
- At t = T, learning is useless.

[Figure: learning effort and action effort as functions of t.]

This is a problem of inference and control.

41 Inference and control

As an example, consider the problem

dx = \alpha u\, dt + d\xi

with \alpha unobserved and x observed. Path cost R(x, u, t), end cost \phi(x) and noise variance \langle d\xi^2 \rangle = \nu dt are given. The problem is that the future information that we receive about \alpha depends on u. Each time step we observe dx and u, and thus learn about \alpha:

p_{t+dt}(\alpha \mid dx, u) \propto p(dx \mid \alpha, u)\, p_t(\alpha)

The solution is to augment the state space with parameters \theta_t (sufficient statistics) that describe p_t(\alpha) = p(\alpha \mid \theta_t), with \theta_0 known. Then, with \alpha = \pm 1 and p_t(\alpha = 1) = \sigma(\theta_t):

d\theta = \frac{u}{\nu} dx = \frac{u}{\nu}(\alpha u\, dt + d\xi)

NB: in the forward pass d\theta = F(dx), thus \theta is also observed.
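A small simulation of this belief dynamics under a fixed probing control (the constant u and all parameter values are hypothetical choices; the update d\theta = (u/\nu)dx is taken from the slide):

```python
import numpy as np

rng = np.random.default_rng(2)
nu, dt, u, alpha = 0.5, 0.01, 1.0, 1.0    # true alpha = +1, unobserved
theta = 0.0                                # p_t(alpha = 1) = sigmoid(theta)
for _ in range(1000):
    dx = alpha * u * dt + np.sqrt(nu * dt) * rng.standard_normal()
    theta += (u / nu) * dx                 # belief update
print(theta, 1.0 / (1.0 + np.exp(-theta)))  # p(alpha = 1) -> 1: identified
```

Note that the drift of \theta is proportional to u^2: without control there is no learning, which is exactly why learning and action must be traded off.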

42 With z_t = (x_t, \theta_t) we obtain a standard HJB equation:

-\partial_t J(t, z)dt = \min_u \left( R(t, z, u)dt + \langle dz \rangle_z\, \partial_z J(z, t) + \tfrac{1}{2}\langle dz^2 \rangle_z\, \partial_z^2 J(z, t) \right)

with boundary condition J(z, T) = \phi(x). The expectation values are conditioned on (x_t, \theta_t) and are averages over p(\alpha \mid \theta_t) and the Gaussian noise, cf.

\langle dx \rangle_{x,\theta} = \langle \alpha u\, dt + d\xi \rangle_{x,\theta} = \bar\alpha u\, dt, \qquad \bar\alpha = \int d\alpha\, p(\alpha \mid \theta)\, \alpha

[Figure: expansion of J(z, t + dt) around (t, z) on the interval (t, t_f), using \langle dz \rangle_z and \langle dz^2 \rangle_z.]

43 Certainty equivalence

An important special case of a partially observable control problem is the Kalman filter (y observed, x not observed):

dx = (x + u)dt + d\xi
y = x + \eta

The cost is quadratic in x and u, for instance

C(x_t, t, u_{t:T}) = \left\langle \sum_{\tau=t}^T \tfrac{1}{2}(x_\tau^2 + u_\tau^2) \right\rangle

When x is observed, the optimal control is u(x, t). When x_t is not observed, we can compute p(x_t \mid y_{0:t}) using Kalman filtering, and the optimal control minimizes

C_{KF}(y_{0:t}, t, u_{t:T}) = \int dx_t\, p(x_t \mid y_{0:t})\, C(x_t, t, u_{t:T})

44 Since p(x_t \mid y_{0:t}) = N(x_t \mid \mu_t, \sigma_t^2) is Gaussian,

C_{KF}(y_{0:t}, t, u_{t:T}) = \int dx_t\, C(x_t, t, u_{t:T})\, N(x_t \mid \mu_t, \sigma_t^2)
 = \sum_{\tau=t}^T \tfrac{1}{2}u_\tau^2 + \sum_{\tau=t}^T \tfrac{1}{2}\langle x_\tau^2 \rangle_{\mu_t, \sigma_t}
 = C(\mu_t, t, u_{t:T}) + \tfrac{1}{2}(T - t)\sigma_t^2

The first term is identical to the observed case with x_t \to \mu_t. The second term does not depend on u and thus does not affect the optimal control. Hence the optimal control for the Kalman filter is identical to the observed case with x_t replaced by \mu_t:

u_{KF}(y_{0:t}, t) = u(\mu_t, t)
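A certainty-equivalence sketch in discrete time: a Kalman filter tracks \mu_t from noisy observations, and the feedback law is applied to \mu_t instead of x_t. The gain reuses the stationary Riccati solution P = 1 + sqrt(2) of the earlier example (assumed valid far from the horizon); all other parameters are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
dt, nu, r = 0.01, 0.1, 0.05           # process noise nu, observation noise r
k = 1 + np.sqrt(2)                    # stationary feedback gain, u = -k*mu
x, mu, sigma2 = 1.0, 0.0, 1.0         # true state and filter belief
for _ in range(500):
    u = -k * mu                       # control uses the estimate, not x
    x += (x + u) * dt + np.sqrt(nu * dt) * rng.standard_normal()
    mu += (mu + u) * dt               # Kalman filter: predict...
    sigma2 += (2 * sigma2 + nu) * dt
    y = x + np.sqrt(r) * rng.standard_normal()
    K = sigma2 / (sigma2 + r)         # ...and correct on observing y
    mu, sigma2 = mu + K * (y - mu), (1 - K) * sigma2
print(x, mu)                          # mu tracks x; both regulated to ~0
```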

45 Summary

For infinite time problems, learning is a meta problem with a time scale unrelated to the horizon time.

A finite time partially observable (or adaptive) control problem is in general equivalent to an observable non-adaptive control problem in an extended state space. The partially observable case is generally more complex than the observable case; for instance, an LQ problem with unknown parameters is not LQ.

For a Kalman filter with unobserved states and known parameters, the partial observability does not affect the optimal control law (certainty equivalence).

Learning can be done either in the maximum likelihood sense or in a full Bayesian way.

46 Path integral control

Solving the PDE is hard. We consider a special case that can be solved:

dx = (f(x, t) + u)dt + d\xi
R(x, u, t) = V(x, t) + \tfrac{1}{2}u^T R u

and \phi(x) arbitrary. The stochastic HJB equation becomes:

-\partial_t J = \min_u \left( \tfrac{1}{2}u^T R u + V + (f + u)^T \partial_x J + \tfrac{1}{2}\mathrm{Tr}(\nu \partial_x^2 J) \right)
 = -\tfrac{1}{2}(\partial_x J)^T R^{-1} (\partial_x J) + V + f^T \partial_x J + \tfrac{1}{2}\mathrm{Tr}(\nu \partial_x^2 J)

u = -R^{-1} \partial_x J(x, t)

This must be solved backward in time with boundary condition J(T, x) = \phi(x).

47 Closed form solution

If we further assume that \nu = \lambda R^{-1} for some scalar \lambda, then we can solve for J in closed form. The solution contains the following steps.

Substitute J = -\lambda \log \Psi in the HJB equation. Because of the relation \nu = \lambda R^{-1}, the terms quadratic in \Psi cancel and only linear terms remain:

\partial_t \Psi = H\Psi, \qquad H = \frac{V}{\lambda} - f^T \partial_x - \tfrac{1}{2}\mathrm{Tr}(\nu \partial_x^2)

This equation must be solved backward in time with boundary condition \Psi(x, T) = \exp(-\phi(x)/\lambda). The linearity allows us to reverse the direction of computation, replacing it by a diffusion process, in the following way.

48 Closed form solution

Let \rho(y, \tau \mid x, t) describe a diffusion process defined by the Fokker-Planck equation

\partial_\tau \rho = -\frac{V}{\lambda}\rho - \partial_y(f\rho) + \tfrac{1}{2}\nu\, \partial_y^2 \rho = -H^\dagger \rho \qquad (1)

with \rho(y, t \mid x, t) = \delta(y - x). Define A(x, t) = \int dy\, \rho(y, \tau \mid x, t)\Psi(y, \tau). Using the equations of motion for \Psi and \rho, it is easy to see that A(x, t) is independent of \tau. Evaluating A(x, t) at \tau = t yields A(x, t) = \Psi(x, t); evaluating it at \tau = T yields A(x, t) = \int dy\, \rho(y, T \mid x, t)\Psi(y, T). Thus

\Psi(x, t) = \int dy\, \rho(y, T \mid x, t) \exp(-\phi(y)/\lambda)

49 The diffusion equation

The diffusion process

\partial_\tau \rho = -\frac{V}{\lambda}\rho - \partial_y(f\rho) + \tfrac{1}{2}\nu\, \partial_y^2 \rho \qquad (2)

can be sampled as

dx = f(x, t)dt + d\xi

where each trajectory x_i survives the time step, x_i = x + dx, with probability 1 - V(x, t)dt/\lambda, and is annihilated with probability V(x, t)dt/\lambda.

50 Forward sampling of the diffusion process

We can estimate

\Psi(x, t) = \int dy\, \rho(y, T \mid x, t)\exp(-\phi(y)/\lambda) \approx \frac{1}{N}\sum_{i \in \text{alive}} \exp(-\phi(x_i(T))/\lambda)

by computing N trajectories x_i(t \to T), i = 1, \ldots, N. 'Alive' denotes the subset of trajectories that are not annihilated along the way.
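A sketch of this estimator (the drift f, potential V, end cost phi and all parameters are hypothetical stand-ins; the scalar case with R = 1 is assumed, so \lambda = \nu):

```python
import numpy as np

rng = np.random.default_rng(4)
nu = 0.5
lam = nu                                  # lambda = nu for R = 1
f   = lambda x, t: -x                     # uncontrolled drift
V   = lambda x, t: 0.5 * x**2             # path cost potential
phi = lambda x: 0.5 * x**2                # end cost

def estimate_psi(x0, t0=0.0, T=1.0, N=10000, dt=0.001):
    x = np.full(N, x0)
    alive = np.ones(N, dtype=bool)
    for t in np.arange(t0, T, dt):
        dxi = np.sqrt(nu * dt) * rng.standard_normal(N)
        x[alive] += f(x[alive], t) * dt + dxi[alive]
        # annihilate each live trajectory with prob. V(x, t) dt / lambda
        alive &= rng.random(N) >= V(x, t) * dt / lam
    return np.exp(-phi(x[alive]) / lam).sum() / N

psi = estimate_psi(1.0)
print(psi, -lam * np.log(psi))            # Psi and J = -lambda log Psi
```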

51 The diffusion process

The diffusion process can be written as a path integral (here and below we take R = 1, so \lambda = \nu):

\rho(y, T \mid x, t) = \int [dx]_x^y \exp\left( -\frac{1}{\nu} S_{\text{path}}(x(t \to T)) \right)

S_{\text{path}}(x(t \to T)) = \int_t^T d\tau \left[ \tfrac{1}{2}\left(\dot{x}(\tau) - f(x(\tau), \tau)\right)^2 + V(x(\tau), \tau) \right]

[Figure: sample paths from x at time t to y at time t_f.]

52 The path integral formulation

\Psi(x, t) = \int dy\, \rho(y, T \mid x, t) \exp(-\phi(y)/\nu)
 = \int [dx]_x \exp\left( -\frac{1}{\nu} S(x(t \to T)) \right)

S(x(t \to T)) = S_{\text{path}}(x(t \to T)) + \phi(x(T))

\Psi is a partition sum, and J = -\nu \log \Psi can therefore be interpreted as a free energy; S is the energy of a path and \nu the temperature. The corresponding probability distribution over paths is

p(x(t \to T) \mid x, t) = \frac{1}{\Psi(x, t)} \exp\left( -\frac{1}{\nu} S(x(t \to T)) \right)

53 An example: double slit

dx = u\, dt + d\xi
C = \left\langle \tfrac{1}{2}x(T)^2 + \int_0^T d\tau\, \tfrac{1}{2}u(\tau)^2 + V(x, \tau) \right\rangle

V(x, t = 1) implements a slit at the intermediate time t = 1: V is zero inside the slit openings and infinite elsewhere.

\Psi(x, t) = \int dy\, \rho(y, T \mid x, t)\Psi(y, T)

can be solved in closed form.

[Figure: J(x, t) for t = 0, 0.99, 1.01, 2.]

54 MC sampling on double slit

J(x, t) \approx -\nu \log\left( \frac{1}{N}\sum_{i=1}^N \exp(-\phi(x_i(T))/\nu) \right)

[Figure: (a) naive sample paths; (b) J(x, t = 0) by naive sampling, compared to the exact result; (c) J(x, t = 0) by Laplace approximation and importance sampling (n = 100).]

55 The delayed choice

If we go further back in time we encounter a new phenomenon: a delayed choice of when to decide. When the slit size goes to zero, J is given by

J(x, T) = -\nu \log \int dy\, \rho(y \mid x)\, e^{-\phi(y)/\nu} = \frac{1}{T}\left( \tfrac{1}{2}x^2 - \nu T \log 2\cosh\left(\frac{x}{\nu T}\right) \right)

where T is the time to reach the slits. The expression between brackets is a typical free energy with temperature \nu T.

[Figure: J(x, t) for T = 2, 1, 0.5.]

Symmetry breaking at \nu T = 1 separates two qualitatively different behaviors.
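A numeric illustration of the symmetry breaking (slits at +/-1, as the cosh formula above implies): locate the minima of J as \nu T crosses 1:

```python
import numpy as np

def J(x, nu, T):
    return (0.5 * x**2 - nu * T * np.log(2 * np.cosh(x / (nu * T)))) / T

x = np.linspace(-3, 3, 2001)
for nuT in [0.5, 1.0, 2.0]:
    j = J(x, nuT, 1.0)
    is_min = np.r_[False, np.diff(np.sign(np.diff(j))) > 0, False]
    print(nuT, x[is_min])
# nuT = 0.5: two minima near +/-0.96 (commit to one slit);
# nuT >= 1 : a single minimum at x = 0 (delay the choice).
```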

56 The delayed choice

[Figure: sample trajectories under stochastic (left) and deterministic (right) optimal control.]

The timing of the decision, that is, when the automaton decides to go left or right, is the consequence of spontaneous symmetry breaking.

57 N joint arm in a plane

d\theta_i = u_i\, dt + d\xi_i, \qquad i = 1, \ldots, n
C = \left\langle \phi(x_n(\vec\theta)) + \int d\tau\, \tfrac{1}{2}u(\tau)^2 \right\rangle

The path integral control is

u_i = \frac{\langle \theta_i \rangle - \theta_i}{T - t}, \qquad i = 1, \ldots, n

with \langle \theta \rangle computed under \rho(\vec\theta, T \mid \vec\theta_0, t) \propto \exp(-\phi(x_n(\vec\theta))) \times (\text{uncontrolled diffusion}), i.e. \vec\theta is drawn from the uncontrolled, end-cost-penalized diffusion. This expectation is computed with a variational approximation.

58 Other path integral control issues

- Coordination of multiple agents
- Relation to reinforcement learning and to learning
- Realistic robotics applications
- Non-differentiable cost functions (obstacles)

59 Summary

Deterministic control can be solved by:
- HJB, a PDE;
- PMP, two coupled ODEs with mixed boundary conditions.

Stochastic control can be solved by:
- HJB in general;
- the Riccati equations for the sufficient statistics in the LQ case;
- path integral control: LQ control cost, but with arbitrary state cost and dynamics;
- RL is a special case.

Learning (partially observed states or parameter values):
- decoupled from control in the RL case;
- joint inference and control is typically harder than control only;
- for Kalman filters the partial observability is irrelevant (certainty equivalence).

60 Summary

The path integral control problems provide a novel link to machine learning:
- statistical physics;
- symmetry breaking, phase transitions;
- efficient computation (MCMC, BP, MF, EP).

61 Further reading

Check out the ICML tutorial paper and the references therein, as well as other general references at the tutorial web page.
