Stochastic optimal control theory
1 Stochastic optimal control theory. Bert Kappen, SNN, Radboud University Nijmegen, the Netherlands. July 5, 2008
2 Introduction. Optimal control theory: optimize the sum of a path cost and an end cost. The result is an optimal control sequence and an optimal trajectory. Input: cost function. Output: optimal trajectory and controls. Classical control theory: what control signal should I give to move a plant to a desired state? Input: desired state trajectory. Output: optimal control trajectory. Bert Kappen, ICML, July
3 Types of optimal control problems: finite horizon (fixed horizon time). [Figure: controlled trajectory through an environment from t to t_f.] Dynamics and environment may depend explicitly on time. The optimal control depends explicitly on time.
4 Types of optimal control problems: finite horizon (moving horizon). [Figure: trajectory over a moving window from t to t_f.] Dynamics and environment are static; the optimal control is time independent. Similar to RL.
6 Types of optimal control problems. Other types of control problems: - minimum time - infinite horizon, average reward - infinite horizon, absorbing states. In addition one should distinguish: - discrete vs. continuous state - discrete vs. continuous time - observable vs. partially observable.
7 Overview. Deterministic optimal control (Kappen, 30 min.): - Introduction of the delayed reward problem in discrete time - Dynamic programming solution and deterministic Bellman equations - Solution in continuous time and states - Example: mass on a spring - Pontryagin maximum principle; notion of an optimal (particle) trajectory - Again mass on a spring
8 Overview. Stochastic optimal control, discrete case (Toussaint, 40 min.): - Stochastic Bellman equation (discrete state and time) and dynamic programming - Reinforcement learning (exact solution, value iteration, policy improvement); actor-critic networks - Markov decision problems and probabilistic inference - Example: robotic motion control and planning
9 Overview. Stochastic optimal control, continuous case (Kappen, 40 min.): - Stochastic differential equations - Hamilton-Jacobi-Bellman equation (continuous state and time) - LQ control, Riccati equation - Example of LQ control - Learning; partial observability: inference and control - Certainty equivalence - Path integral control; the role of noise and symmetry breaking; efficient approximate computation (MC, MF, BP, ...) - Examples: double slit, delayed choice, n joint arm
10 Overview. Research issues (Toussaint, 30 min.): - Learning - Efficient methods to compute value functions / cost-to-go - Control under partial observability (POMDPs)
11 Discrete time control. Consider the control of a discrete time deterministic dynamical system:
x_{t+1} = x_t + f(t, x_t, u_t), t = 0, 1, ..., T-1
where x_t describes the state and u_t specifies the control or action at time t. Given x_{t=0} = x_0 and a control sequence u_{0:T-1} = u_0, u_1, ..., u_{T-1}, we can compute x_{1:T}. Define a cost for each sequence of controls:
C(x_0, u_{0:T-1}) = φ(x_T) + Σ_{t=0}^{T-1} R(t, x_t, u_t)
The problem of optimal control is to find the sequence u_{0:T-1} that minimizes C(x_0, u_{0:T-1}).
12 Dynamic programming. Find the minimal cost path from A to J. [Figure: stage graph with edge costs.]
C(J) = 0, C(H) = 3, C(I) = 4
C(F) = min(6 + C(H), 3 + C(I)) = 7
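The backward recursion on the slide can be run directly as code. Only the edge costs actually stated on the slide are used here; the rest of the A-to-J graph is in the (missing) figure and is not reproduced.

```python
# Backward dynamic programming on the (partially shown) stage graph.
# Only the edges stated on the slide are included.
costs = {
    "H": {"J": 3},
    "I": {"J": 4},
    "F": {"H": 6, "I": 3},
}

C = {"J": 0}                      # cost-to-go of the terminal node
for node in ["H", "I", "F"]:      # process nodes in reverse stage order
    C[node] = min(w + C[succ] for succ, w in costs[node].items())

print(C)  # {'J': 0, 'H': 3, 'I': 4, 'F': 7}
```

Each node's cost-to-go is computed once, after all its successors, which is exactly the dynamic programming principle used on the following slides.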
13 Discrete time control. The optimal control problem can be solved by dynamic programming. Introduce the optimal cost-to-go:
J(t, x_t) = min_{u_{t:T-1}} ( φ(x_T) + Σ_{s=t}^{T-1} R(s, x_s, u_s) )
which solves the optimal control problem from an intermediate time t until the fixed end time T, for all intermediate states x_t. Then
J(T, x) = φ(x), J(0, x) = min_{u_{0:T-1}} C(x, u_{0:T-1})
14 Discrete time control. One can recursively compute J(t, x) from J(t+1, x) for all x in the following way:
J(t, x_t) = min_{u_{t:T-1}} ( φ(x_T) + Σ_{s=t}^{T-1} R(s, x_s, u_s) )
= min_{u_t} ( R(t, x_t, u_t) + min_{u_{t+1:T-1}} [ φ(x_T) + Σ_{s=t+1}^{T-1} R(s, x_s, u_s) ] )
= min_{u_t} ( R(t, x_t, u_t) + J(t+1, x_{t+1}) )
= min_{u_t} ( R(t, x_t, u_t) + J(t+1, x_t + f(t, x_t, u_t)) )
This is called the Bellman Equation. It computes u as a function of x and t, for all intermediate t and all x.
15 Discrete time control. The algorithm to compute the optimal control u_{0:T-1}, the optimal trajectory x_{1:T} and the optimal cost is given by:
1. Initialization: J(T, x) = φ(x)
2. Backwards: for t = T-1, ..., 0 and for all x compute
u_t(x) = arg min_u { R(t, x, u) + J(t+1, x + f(t, x, u)) }
J(t, x) = R(t, x, u_t) + J(t+1, x + f(t, x, u_t))
3. Forwards: for t = 0, ..., T-1 compute x_{t+1} = x_t + f(t, x_t, u_t(x_t))
NB: the backward computation requires u_t(x) for all x.
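The three steps above can be sketched for a toy problem on a discretized state grid. The dynamics x_{t+1} = x_t + u, the path cost R = u^2 and the end cost φ(x) = x^2 below are illustrative choices, not taken from the slides.

```python
import numpy as np

# A minimal sketch of the backward/forward algorithm on a state grid.
T = 5
xs = np.arange(-10, 11)            # state grid (integers, so x + u stays on grid)
us = np.array([-1, 0, 1])          # admissible controls

J = {T: xs.astype(float) ** 2}     # 1. init: J(T, x) = phi(x) = x^2
policy = {}

for t in range(T - 1, -1, -1):     # 2. backward pass
    Jt = np.empty_like(xs, dtype=float)
    ut = np.empty_like(xs)
    for i, x in enumerate(xs):
        nxt = np.clip(x + us, xs[0], xs[-1]) - xs[0]   # indices of successor states
        q = us ** 2 + J[t + 1][nxt]                    # R(u) + J(t+1, x + u)
        ut[i] = us[np.argmin(q)]
        Jt[i] = q.min()
    J[t], policy[t] = Jt, ut

# 3. forward pass from x_0 = 3
x = 3
traj = [x]
for t in range(T):
    x = x + policy[t][x - xs[0]]
    traj.append(x)
print(traj)   # steers toward 0: [3, 2, 1, 0, 0, 0]
```

Note how the backward pass stores u_t(x) for every grid point x, as the slide's NB requires, and the forward pass only reads it along the realized trajectory.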
16 Continuous limit. Replace t+1 by t+dt with dt → 0, and assume J(t, x) is smooth:
x_{t+dt} = x_t + f(x_t, u_t, t)dt
C(x_0, u_{0→T}) = φ(x_T) + ∫_0^T dτ R(τ, x(τ), u(τ))
J(t, x) = min_u ( R(t, x, u)dt + J(t+dt, x + f(x, u, t)dt) )
= min_u ( R(t, x, u)dt + J(t, x) + ∂_t J(t, x)dt + ∂_x J(t, x) f(x, u, t)dt )
-∂_t J(t, x) = min_u ( R(t, x, u) + f(x, u, t) ∂_x J(t, x) )
with boundary condition J(x, T) = φ(x).
17 Continuous limit.
-∂_t J(t, x) = min_u ( R(t, x, u) + f(x, u, t) ∂_x J(t, x) )
with boundary condition J(x, T) = φ(x). This is called the Hamilton-Jacobi-Bellman Equation. It computes the anticipated potential J(t, x) from the future potential φ(x).
18 Example: Mass on a spring. The spring exerts a force F_z = -z towards the rest position, and we add a control force F_u = u. Newton's law with m = 1:
z̈ = -z + u
Control problem: given initial position and velocity z(0) = ż(0) = 0 at time t = 0, find the control path -1 ≤ u(0→T) ≤ 1 such that z(T) is maximal.
19 Example: Mass on a spring. Introduce x_1 = z, x_2 = ż; then
ẋ_1 = x_2, ẋ_2 = -x_1 + u
The end cost is φ(x) = -x_1 (minimizing -x_1 maximizes the final position); the path cost is R(x, u, t) = 0. The HJB equation takes the form
-∂_t J = min_u ( (∂J/∂x_1) x_2 + (∂J/∂x_2)(-x_1 + u) ) = (∂J/∂x_1) x_2 - (∂J/∂x_2) x_1 - |∂J/∂x_2|
with u = -sign(∂J/∂x_2).
20 Example: Mass on a spring. The solution is
J(t, x_1, x_2) = -cos(t-T) x_1 + sin(t-T) x_2 + α(t)
u(t, x_1, x_2) = -sign(sin(t-T))
As an example consider T = 2π. Then the optimal control is
u = -1 for 0 < t < π, u = +1 for π < t < 2π
[Figure: x_1(t) and x_2(t) under this bang-bang control.]
21 Pontryagin minimum principle. The HJB equation is a PDE with a boundary condition at the final time. The PDE is solved by discretizing space and time. The solution is the optimal cost-to-go for all x and t; from this we compute the optimal trajectory and optimal control. An alternative approach is a variational approach that directly finds the optimal trajectory and optimal control.
22 Pontryagin minimum principle. We can write the optimal control problem as a constrained optimization problem with independent variables u(0→T) and x(0→T):
min_{u(0→T), x(0→T)} φ(x(T)) + ∫_0^T dt R(x(t), u(t), t)
subject to the constraint ẋ = f(x, u, t) and boundary condition x(0) = x_0. Introduce the Lagrange multiplier function λ(t):
C = φ(x(T)) + ∫_0^T dt [ R(t, x(t), u(t)) - λ(t)(f(t, x(t), u(t)) - ẋ(t)) ]
= φ(x(T)) + ∫_0^T dt [ -H(t, x(t), u(t), λ(t)) + λ(t)ẋ(t) ]
with H(t, x, u, λ) = -R(t, x, u) + λ f(t, x, u).
23 Derivation PMP. The solution is found by extremizing C; this gives a necessary but not sufficient condition for a solution. Varying the action with respect to the trajectory x, the control u and the Lagrange multiplier λ gives:
δC = φ_x(x(T)) δx(T) + ∫_0^T dt [ -H_x δx(t) - H_u δu(t) + (-H_λ + ẋ(t)) δλ(t) + λ(t) δẋ(t) ]
= (φ_x(x(T)) + λ(T)) δx(T) + ∫_0^T dt [ (-H_x - λ̇(t)) δx(t) - H_u δu(t) + (-H_λ + ẋ(t)) δλ(t) ]
where the second line follows by partial integration of the λ δẋ term, and for instance H_x = ∂H(t, x(t), u(t), λ(t))/∂x(t).
24 We can solve H_u(t, x, u, λ) = 0 for u and denote the solution as u*(t, x, λ); this assumes H convex in u. The remaining equations are
ẋ = H_λ(t, x, u*(t, x, λ), λ)
λ̇ = -H_x(t, x, u*(t, x, λ), λ)
with boundary conditions
x(0) = x_0, λ(T) = -φ_x(x(T))
This is a mixed boundary value problem.
25 Again mass on a spring. Problem:
ẋ_1 = x_2, ẋ_2 = -x_1 + u, R(x, u, t) = 0, φ(x) = -x_1
Hamiltonian:
H(t, x, u, λ) = -R(t, x, u) + λ^T f(t, x, u) = λ_1 x_2 + λ_2(-x_1 + u)
Optimizing over -1 ≤ u ≤ 1 gives u = sign(λ_2), so
H*(t, x, λ) = λ_1 x_2 - λ_2 x_1 + |λ_2|
The Hamilton equations ẋ = H_λ, λ̇ = -H_x become
ẋ_1 = x_2, ẋ_2 = -x_1 + sign(λ_2)
λ̇_1 = λ_2, λ̇_2 = -λ_1
with x(t = 0) = x_0 and λ(t = T) = -φ_x = (1, 0).
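The costate equations above are linear and decoupled from x, so the mixed boundary value problem can be solved without shooting: λ is available in closed form, λ_2(t) = sin(T-t), and x can then be integrated forward. A sketch (step size is an illustrative choice):

```python
import numpy as np

# Solve the mass-on-spring PMP equations numerically.
# lam1' = lam2, lam2' = -lam1 with lam(T) = (1, 0) gives lam2(t) = sin(T - t);
# integrate x forward under the resulting bang-bang control u = sign(lam2).
T = 2 * np.pi
N = 200_000
dt = T / N

x1, x2 = 0.0, 0.0               # z(0) = zdot(0) = 0
for k in range(N):
    t = k * dt
    lam2 = np.sin(T - t)
    u = np.sign(lam2) if lam2 != 0 else 1.0
    x1, x2 = x1 + dt * x2, x2 + dt * (-x1 + u)   # Euler step of the dynamics

print(x1)   # final position z(T); the bang-bang control reaches ~4.0
```

The value 4 can be checked analytically: with u = -1 on (0, π) the mass swings to z = -2, and with u = +1 on (π, 2π) it swings up to z = 4, the maximal reachable amplitude.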
26 Comments. The HJB method gives a sufficient (and often necessary) condition for optimality. The solution of the PDE is expensive. The PMP method provides a necessary condition for optimal control; it provides candidate solutions for optimality. The PMP method is computationally less complicated than the HJB method because it does not require discretization of the state space. Optimal control in continuous space and time contains many complications related to the existence, uniqueness and smoothness of the solution, particularly in the absence of noise. In the presence of noise many of these intricacies disappear. HJB generalizes to the stochastic case; PMP does not (at least not easily).
27 Stochastic differential equations. Consider the random walk on the line:
x_{t+1} = x_t + ξ_t, ξ_t = ±1
with x_0 = 0, so x_t = Σ_{i=1}^t ξ_i. Since x_t is a sum of independent random variables, x_t becomes Gaussian distributed with
⟨x_t⟩ = Σ_{i=1}^t ⟨ξ_i⟩ = 0
⟨x_t²⟩ = Σ_{i,j=1}^t ⟨ξ_i ξ_j⟩ = Σ_{i=1}^t ⟨ξ_i²⟩ + Σ_{i≠j} ⟨ξ_i⟩⟨ξ_j⟩ = t
Note that the fluctuations grow as √t.
28 Stochastic differential equations. In the continuous time limit we define
dx_t = x_{t+dt} - x_t = dξ
with dξ an infinitesimal mean-zero Gaussian variable with ⟨dξ²⟩ = ν dt. Then
d⟨x⟩/dt = 0, ⟨x⟩(t) = 0
d⟨x²⟩/dt = ν, ⟨x²⟩(t) = νt
ρ(x, t | x_0, 0) = (1/√(2πνt)) exp( -(x - x_0)²/(2νt) )
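The scaling ⟨x²⟩(t) = νt is easy to verify by sampling the discretized process dx = dξ. The parameter values below are illustrative:

```python
import numpy as np

# Sample dx = dxi with <dxi^2> = nu*dt and check that the variance
# grows linearly as <x^2>(t) = nu*t.
rng = np.random.default_rng(0)
nu, dt, T, N = 0.5, 0.01, 1.0, 100_000   # noise level, step, horizon, #paths
steps = int(T / dt)

x = np.zeros(N)
for _ in range(steps):
    x += rng.normal(0.0, np.sqrt(nu * dt), size=N)

print(x.mean(), x.var())   # approximately 0 and nu*T = 0.5
```

Each increment is Gaussian with variance ν dt, so after T/dt independent steps the variance is exactly νT regardless of the step size, which is what makes the continuum limit well defined.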
29 Stochastic optimal control. Consider a stochastic dynamical system
dx = f(t, x, u)dt + dξ
with dξ Gaussian noise, ⟨dξ_i dξ_j⟩ = ν_ij(t, x, u) dt. The cost becomes an expectation:
C(t, x, u(t→T)) = ⟨ φ(x(T)) + ∫_t^T dτ R(τ, x(τ), u(τ)) ⟩
over all stochastic trajectories starting at x with control path u(t→T). Note that only u(t), as the first part of u(t→T), is used at time t. We then move to x + dx and repeat the optimization.
30 Stochastic optimal control. We obtain the Bellman recursion
J(t, x_t) = min_{u_t} ( R(t, x_t, u_t)dt + ⟨J(t+dt, x_{t+dt})⟩ )
⟨J(t+dt, x_{t+dt})⟩ = ∫ dx_{t+dt} N(x_{t+dt} | x_t, νdt) J(t+dt, x_{t+dt})
= J(t, x_t) + dt ∂_t J(t, x_t) + ⟨dx⟩ ∂_x J(t, x_t) + (1/2) ⟨dx²⟩ ∂_x² J(t, x_t)
with ⟨dx⟩ = f(x, u, t)dt and ⟨dx²⟩ = ν(t, x, u)dt. Thus
-∂_t J(t, x) = min_u ( R(t, x, u) + f(x, u, t) ∂_x J(x, t) + (1/2) ν(t, x, u) ∂_x² J(x, t) )
with boundary condition J(x, T) = φ(x).
31 Linear quadratic control. The dynamics is linear:
dx = [A(t)x + B(t)u + b(t)]dt + Σ_{j=1}^m (C_j(t)x + D_j(t)u + σ_j(t)) dξ_j, ⟨dξ_j dξ_{j'}⟩ = δ_{jj'} dt
The cost function is quadratic:
φ(x) = (1/2) x^T G x
R(x, u, t) = (1/2) x^T Q(t) x + u^T S(t) x + (1/2) u^T R(t) u
In this case the optimal cost-to-go is quadratic in x:
J(t, x) = (1/2) x^T P(t) x + α^T(t) x + β(t)
u(t) = -Ψ(t) x(t) - ψ(t)
32 Substitution in the HJB equation yields ODEs for P, α, β:
-Ṗ = P A + A^T P + Σ_{j=1}^m C_j^T P C_j + Q - Ŝ^T R̂^{-1} Ŝ
-α̇ = [A - B R̂^{-1} Ŝ]^T α + Σ_{j=1}^m [C_j - D_j R̂^{-1} Ŝ]^T P σ_j + P b
-β̇ = -(1/2) ψ^T R̂ ψ + α^T b + (1/2) Σ_{j=1}^m σ_j^T P σ_j
with
R̂ = R + Σ_{j=1}^m D_j^T P D_j, Ŝ = B^T P + S + Σ_{j=1}^m D_j^T P C_j
Ψ = R̂^{-1} Ŝ, ψ = R̂^{-1} (B^T α + Σ_{j=1}^m D_j^T P σ_j)
and end conditions P(t_f) = G, α(t_f) = β(t_f) = 0.
33 Example. Find the optimal control for the dynamics
dx = (x + u)dt + dξ, ⟨dξ²⟩ = νdt
with end cost φ(x) = 0 and path cost R(x, u) = (1/2)(x² + u²). The Riccati equations reduce to
-Ṗ = 2P + 1 - P², α = 0, -β̇ = (1/2) νP
with P(T) = α(T) = β(T) = 0, and u(x, t) = -P(t)x. [Figure: P(t) as a function of t.]
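The scalar Riccati equation of this example is easy to integrate numerically backward from the end condition P(T) = 0; the horizon T = 1 and the step size below are illustrative choices.

```python
# Integrate the Riccati equation -dP/dt = 2P + 1 - P^2 backward from P(T) = 0.
T, dt = 1.0, 1e-4
P = 0.0                            # end condition P(T) = 0
for _ in range(int(T / dt)):
    P += dt * (2 * P + 1 - P**2)   # one Euler step backward in t

print(P)   # P(0); approaches the fixed point 1 + sqrt(2) for long horizons
# the optimal feedback at t = 0 is then u = -P * x
```

As the slide notes, P(t) (and hence the control law u = -P(t)x) is independent of the noise level ν; ν enters only through β.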
34 Comments. Note that in the last example the optimal control is independent of ν, i.e. optimal stochastic control equals optimal deterministic control. This is true in general for non-multiplicative noise (C_j = D_j = 0).
35 Learning. What happens if (part of) the state is not observed? For instance: - As a result of measurement error we do not know x_t but only p(x_t | y_{0:t}) - We do not know the parameters of the dynamics - We do not know the costs/rewards (the RL case)
36 Learning in RL or receding horizon. Imagine eternity, and you are ordered to cook a very good meal for tomorrow. You decide to spend all of today learning the recipes. When the next day arrives, you are again ordered to cook a very good meal for tomorrow, and again you spend all of today learning the recipes... [Figure: value V versus time t.] - The learning phase takes forever. - Mix of exploration and optimizing (RL: actor-critic, E^3, ...). - Learning is not part of the control problem.
40 Finite horizon learning. Imagine instead life as we know it: it is finite and we have only one life. The aim is to maximize accumulated reward. This requires you to plan your learning: - At t = 0, action is useless. - At t = T, learning is useless. [Figure: learning versus action over the horizon.] This is a problem of inference and control.
41 Inference and control. As an example, consider the problem
dx = αu dt + dξ
with α unobserved and x observed. The path cost R(x, u, t), end cost φ(x) and noise variance ⟨dξ²⟩ = νdt are given. The problem is that the future information that we receive about α depends on u. Each time step we observe dx and u, and thus learn about α:
p_{t+dt}(α | dx, u) ∝ p(dx | α, u) p_t(α)
The solution is to augment the state space with parameters θ_t (sufficient statistics) that describe p_t(α) = p(α | θ_t), with θ_0 known. Then, with α = ±1 and p_t(α = 1) = σ(θ_t),
dθ = (u/ν) dx = (u/ν)(αu dt + dξ)
NB: in the forward pass dθ is a function of dx, so θ is also observed.
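The update dθ = (u/ν)dx is a log-odds recursion: each observed increment dx shifts the belief about α by its Bayes factor (the overall scale of θ depends on the chosen parametrization of σ). A sketch with the standard logistic function, where the exact Bayes factor for the two hypotheses α = ±1 is 2u·dx/ν; all parameter values are illustrative:

```python
import numpy as np

# Learning the unobserved gain alpha = ±1 from observations of
# dx = alpha*u*dt + dxi, tracking the log-odds theta of alpha = +1.
rng = np.random.default_rng(1)
nu, dt, u, alpha = 0.1, 0.01, 1.0, 1.0     # true alpha = +1; u is a probing control

theta = 0.0                                 # log-odds, p(alpha=1) = 1/(1+exp(-theta))
for _ in range(1000):
    dx = alpha * u * dt + rng.normal(0.0, np.sqrt(nu * dt))
    theta += 2 * u * dx / nu                # exact Bayes factor for alpha = ±1

p_alpha_1 = 1.0 / (1.0 + np.exp(-theta))
print(p_alpha_1)   # close to 1: alpha has been identified
```

Note that the information gained per step scales with u², which is exactly why the control must trade off probing (learning about α) against exploiting, as the slides discuss.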
42 With z t = (x t, θ t ) we obtain a standard HJB: t J(t, z)dt = min u with boundary condition J(z, T ) = φ(x). ( R(t, z, u)dt + dz z z J(z, t) + 1 dz 2 ) 2 z 2 zj(z, t) The expectation values are conditioned on (x t, θ t ) and are averages over p(α θ t ) and the Gaussian noise, cf. dx x,θ = αudt + dξ x,θ = ᾱudt ᾱ = dαp(α θ)α <dz> z <dz 2> z J(z,t+dt) t t+dt t f Bert Kappen ICML, July
43 Certainty equivalence. An important special case of a partially observable control problem is the Kalman filter (y observed, x not observed):
dx = (x + u)dt + dξ
y = x + η
The cost is quadratic in x and u, for instance
C(x_t, t, u_{t:T}) = Σ_{τ=t}^T (1/2)(x_τ² + u_τ²)
and the optimal control for observed x is u(x, t). When x_t is not observed, we can compute p(x_t | y_{0:t}) using Kalman filtering, and the optimal control minimizes
C_KF(y_{0:t}, t, u_{t:T}) = ∫ dx_t p(x_t | y_{0:t}) C(x_t, t, u_{t:T})
44 Since p(x_t | y_{0:t}) = N(x_t | μ_t, σ_t²) is Gaussian,
C_KF(y_{0:t}, t, u_{t:T}) = ∫ dx_t C(x_t, t, u_{t:T}) N(x_t | μ_t, σ_t²)
= Σ_{τ=t}^T (1/2) u_τ² + Σ_{τ=t}^T (1/2) ⟨x_τ²⟩_{μ_t, σ_t}
= C(μ_t, t, u_{t:T}) + (1/2)(T - t) σ_t²
The first term is identical to the observed case with x_t → μ_t. The second term does not depend on u and thus does not affect the optimal control. The optimal control for the Kalman filter is therefore identical to the observed case with x_t replaced by μ_t:
u_KF(y_{0:t}, t) = u(μ_t, t)
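Certainty equivalence can be sketched in simulation: a scalar Kalman filter tracks p(x_t | y_{0:t}) = N(μ_t, s_t), and the controller simply feeds μ_t into the fully observed feedback law u = -P·x. The noise levels, the fixed gain P, and the discrete-time filter below are illustrative choices, not the slide's exact LQG solution.

```python
import numpy as np

# Certainty-equivalent control of dx = (x+u)dt + dxi with observations y = x + eta.
rng = np.random.default_rng(2)
dt, nu, eta = 0.01, 0.1, 0.05   # step, process noise, observation noise variance
P_gain = 1.7                     # illustrative feedback gain (cf. the Riccati example)

x, mu, s = 1.0, 0.0, 1.0         # true state, filter mean and variance
for _ in range(2000):
    u = -P_gain * mu                                   # control uses mu_t only
    x += (x + u) * dt + rng.normal(0.0, np.sqrt(nu * dt))
    y = x + rng.normal(0.0, np.sqrt(eta))
    # discrete-time Kalman filter: predict, then correct
    mu += (mu + u) * dt
    s += (2 * s + nu) * dt
    K = s / (s + eta)                                  # Kalman gain
    mu += K * (y - mu)
    s = (1 - K) * s

print(s)   # posterior variance settles near its fixed point ~0.007
```

The posterior variance s evolves deterministically, independently of u and of the realized noise: this is precisely why the unobservability adds only a u-independent term to the cost and leaves the control law unchanged.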
45 Summary. For infinite time problems, learning is a meta problem with a time scale unrelated to the horizon time. A finite time partially observable (or adaptive) control problem is in general equivalent to an observable non-adaptive control problem in an extended state space. The partially observable case is generally more complex than the observable case; for instance an LQ problem with unknown parameters is not LQ. For a Kalman filter with unobserved states and known parameters, the partial observability does not affect the optimal control law (certainty equivalence). Learning can be done either in the maximum likelihood sense or in a full Bayesian way.
46 Path integral control. Solving the PDE is hard. We consider a special case that can be solved:
dx = (f(x, t) + u)dt + dξ
R(x, u, t) = V(x, t) + (1/2) u^T R u
with φ(x) arbitrary. The stochastic HJB equation becomes
-∂_t J = min_u ( (1/2) u^T R u + V + (f + u)^T ∂_x J + (1/2) Tr(ν ∂_x² J) )
= -(1/2)(∂_x J)^T R^{-1} (∂_x J) + V + f^T ∂_x J + (1/2) Tr(ν ∂_x² J)
with u = -R^{-1} ∂_x J(x, t). It must be solved backward in time with boundary condition J(T, x) = φ(x).
47 Closed form solution. If we further assume that R^{-1} = λν for some scalar λ, then we can solve for J in closed form. The solution contains the following steps. Substitute J = -λ log Ψ in the HJB equation. Because of the relation R^{-1} = λν, the terms quadratic in Ψ cancel and only linear terms remain:
∂_t Ψ = HΨ, H = V/λ - f ∂_x - (1/2) ν ∂_x²
This equation must be solved backward in time with boundary condition Ψ(x, T) = exp(-φ(x)/λ). The linearity allows us to reverse the direction of computation, replacing it by a diffusion process, in the following way.
48 Closed form solution. Let ρ(y, τ | x, t) describe a diffusion process defined by the Fokker-Planck equation
∂_τ ρ = -(V/λ) ρ - ∂_y(fρ) + (1/2) ν ∂_y² ρ = -H† ρ   (1)
with ρ(y, t | x, t) = δ(y - x). Define A(x, t) = ∫ dy ρ(y, τ | x, t) Ψ(y, τ). Using the equations of motion for Ψ and ρ, it is easy to see that A(x, t) is independent of τ. Evaluating A(x, t) at τ = t yields A(x, t) = Ψ(x, t); evaluating it at τ = T yields A(x, t) = ∫ dy ρ(y, T | x, t) Ψ(y, T). Thus
Ψ(x, t) = ∫ dy ρ(y, T | x, t) exp(-φ(y)/λ)
49 The diffusion equation. Forward sampling of the diffusion process
∂_τ ρ = -(V/λ) ρ - ∂_y(fρ) + (1/2) ν ∂_y² ρ   (2)
can be done as
dx = f(x, t)dt + dξ
x → x + dx, with probability 1 - V(x, t)dt/λ
x_i → killed, with probability V(x, t)dt/λ
50 Forward sampling of the diffusion process. We can estimate
Ψ(x, t) = ∫ dy ρ(y, T | x, t) exp(-φ(y)/λ) ≈ (1/N) Σ_{i ∈ alive} exp(-φ(x_i(T))/λ)
by computing N trajectories x_i(t→T), i = 1, ..., N. "Alive" denotes the subset of trajectories that do not get killed along the way by the annihilation operation.
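This estimator can be checked in a case where Ψ is a Gaussian integral. The choices below (f = 0, V = 0 so no paths are killed, φ(y) = y²/2, ν = λ = 1) are illustrative; the exact answer is Ψ(x_0, 0) = √(λ/(λ+νT)) · exp(-x_0²/(2(λ+νT))).

```python
import numpy as np

# Monte Carlo estimate of Psi(x, t) = <exp(-phi(x(T))/lambda)> over
# uncontrolled paths dx = dxi, compared against the exact Gaussian integral.
rng = np.random.default_rng(3)
nu, lam, T, x0, N, steps = 1.0, 1.0, 1.0, 0.5, 200_000, 100
dt = T / steps

x = np.full(N, x0)
for _ in range(steps):                    # uncontrolled diffusion, no killing (V = 0)
    x += rng.normal(0.0, np.sqrt(nu * dt), size=N)

psi_mc = np.mean(np.exp(-0.5 * x**2 / lam))
psi_exact = np.sqrt(lam / (lam + nu * T)) * np.exp(-0.5 * x0**2 / (lam + nu * T))
print(psi_mc, psi_exact)                  # agree to ~1e-3
# cost-to-go estimate: J(x0, 0) = -lam * log(psi_mc)
```

With a nonzero potential V, the only change is that each path is removed with probability V(x, t)dt/λ per step and the sum runs over the surviving paths, as the slide describes.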
51 The diffusion process. The diffusion process can be written as a path integral:
ρ(y, T | x, t) = ∫ [dx]_x^y exp( -(1/ν) S_path(x(t→T)) )
S_path(x(t→T)) = ∫_t^T dτ [ (1/2)(ẋ(τ) - f(x(τ), τ))² + V(x(τ), τ) ]
[Figure: sample paths from x at time t to y at the end time t_f.]
52 The path integral formulation.
Ψ(x, t) = ∫ dy ρ(y, T | x, t) exp(-φ(y)/ν) = ∫ [dx]_x exp( -(1/ν) S(x(t→T)) )
S(x(t→T)) = S_path(x(t→T)) + φ(x(T))
Ψ is a partition sum, and J = -ν log Ψ can therefore be interpreted as a free energy: S is the energy of a path and ν the temperature. The corresponding probability distribution over paths is
p(x(t→T) | x, t) = (1/Ψ(x, t)) exp( -(1/ν) S(x(t→T)) )
53 An example: double slit.
dx = u dt + dξ
C = ⟨ (1/2) x(T)² + ∫_0^T dτ [ (1/2) u(τ)² + V(x, τ) ] ⟩
where V(x, t = 1) implements a slit at an intermediate time t = 1. Then
Ψ(x, t) = ∫ dy ρ(y, T | x, t) Ψ(y, T)
can be solved in closed form. [Figure: J(x, t) at t = 0, 0.99, 1.01 and 2.]
54 MC sampling on double slit.
J(x, t) ≈ -ν log( (1/N) Σ_{i=1}^N exp(-φ(x_T^i)/ν) )
[Figure: (a) naive sample paths; (b) J(x, t = 0) by naive sampling; (c) J(x, t = 0) by Laplace approximation and importance sampling (n = 100).]
55 The delayed choice. If we go further back in time we encounter a new phenomenon: a delayed choice of when to decide. When the slit size goes to zero, J is given by
J(x, T) = -ν log ∫ dy ρ(y | x) e^{-φ(y)/ν} = (1/T) ( (1/2) x² - νT log[2 cosh(x/(νT))] )
where T is the time to reach the slits. The expression between brackets is a typical free energy with temperature νT. [Figure: J(x, t) for T = 2, 1, 0.5.] Symmetry breaking at νT = 1 separates two qualitatively different behaviors.
56 The delayed choice. [Figure: stochastic versus deterministic trajectories.] The timing of the decision, that is, when the automaton decides to go left or right, is the consequence of spontaneous symmetry breaking.
57 N joint arm in a plane.
dθ_i = u_i dt + dξ_i, i = 1, ..., n
C = ⟨ φ(x_n(θ(T))) + ∫ dτ (1/2) u(τ)² ⟩
u_i = (⟨θ_i⟩ - θ_i)/(T - t), i = 1, ..., n
with ⟨θ_i⟩ the expectation of θ_i(T) under the uncontrolled penalized diffusion ρ(θ, T | θ_0, t) ∝ exp(-φ(x_n(θ))). Computed with a variational approximation.
58 Other path integral control issues: - Multiple agents coordination - Relation to reinforcement learning and learning - Realistic robotics applications - Non-differentiable cost functions (obstacles)
59 Summary. Deterministic control can be solved by - HJB, a PDE - PMP, two coupled ODEs with mixed boundary conditions. Stochastic control can be solved by - HJB in general - the Riccati equations for the sufficient statistics in the LQ case - path integral control, an LQ-like solvable case with arbitrary cost and dynamics - RL is a special case. Learning (partially observed states or parameter values): - decoupled from control in the RL case - joint inference and control typically harder than control only - for Kalman filters the partial observability is irrelevant (certainty equivalence).
60 Summary. The path integral control problems provide a novel link to machine learning: - statistical physics - symmetry breaking, phase transitions - efficient computation (MCMC, BP, MF, EP).
61 Further reading. Check out the ICML tutorial paper and the references there, and other general references at the tutorial web page.
More informationPROBABILISTIC REASONING OVER TIME
PROBABILISTIC REASONING OVER TIME In which we try to interpret the present, understand the past, and perhaps predict the future, even when very little is crystal clear. Outline Time and uncertainty Inference:
More informationGaussian processes for inference in stochastic differential equations
Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017
More informationSwitching Regime Estimation
Switching Regime Estimation Series de Tiempo BIrkbeck March 2013 Martin Sola (FE) Markov Switching models 01/13 1 / 52 The economy (the time series) often behaves very different in periods such as booms
More informationHJB equations. Seminar in Stochastic Modelling in Economics and Finance January 10, 2011
Department of Probability and Mathematical Statistics Faculty of Mathematics and Physics, Charles University in Prague petrasek@karlin.mff.cuni.cz Seminar in Stochastic Modelling in Economics and Finance
More informationComputational Issues in Nonlinear Dynamics and Control
Computational Issues in Nonlinear Dynamics and Control Arthur J. Krener ajkrener@ucdavis.edu Supported by AFOSR and NSF Typical Problems Numerical Computation of Invariant Manifolds Typical Problems Numerical
More informationPartially Observable Markov Decision Processes (POMDPs)
Partially Observable Markov Decision Processes (POMDPs) Sachin Patil Guest Lecture: CS287 Advanced Robotics Slides adapted from Pieter Abbeel, Alex Lee Outline Introduction to POMDPs Locally Optimal Solutions
More informationComplexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning
Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning Christos Dimitrakakis Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
More informationOptimal Control Theory
Optimal Control Theory The theory Optimal control theory is a mature mathematical discipline which provides algorithms to solve various control problems The elaborate mathematical machinery behind optimal
More informationChap. 1. Some Differential Geometric Tools
Chap. 1. Some Differential Geometric Tools 1. Manifold, Diffeomorphism 1.1. The Implicit Function Theorem ϕ : U R n R n p (0 p < n), of class C k (k 1) x 0 U such that ϕ(x 0 ) = 0 rank Dϕ(x) = n p x U
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationControl Theory in Physics and other Fields of Science
Michael Schulz Control Theory in Physics and other Fields of Science Concepts, Tools, and Applications With 46 Figures Sprin ger 1 Introduction 1 1.1 The Aim of Control Theory 1 1.2 Dynamic State of Classical
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing
More informationDynamic Programming with Applications. Class Notes. René Caldentey Stern School of Business, New York University
Dynamic Programming with Applications Class Notes René Caldentey Stern School of Business, New York University Spring 2011 Prof. R. Caldentey Preface These lecture notes are based on the material that
More informationOPTIMAL CONTROL. Sadegh Bolouki. Lecture slides for ECE 515. University of Illinois, Urbana-Champaign. Fall S. Bolouki (UIUC) 1 / 28
OPTIMAL CONTROL Sadegh Bolouki Lecture slides for ECE 515 University of Illinois, Urbana-Champaign Fall 2016 S. Bolouki (UIUC) 1 / 28 (Example from Optimal Control Theory, Kirk) Objective: To get from
More informationSequential Monte Carlo Methods for Bayesian Computation
Sequential Monte Carlo Methods for Bayesian Computation A. Doucet Kyoto Sept. 2012 A. Doucet (MLSS Sept. 2012) Sept. 2012 1 / 136 Motivating Example 1: Generic Bayesian Model Let X be a vector parameter
More informationTheory and Applications of Constrained Optimal Control Proble
Theory and Applications of Constrained Optimal Control Problems with Delays PART 1 : Mixed Control State Constraints Helmut Maurer 1, Laurenz Göllmann 2 1 Institut für Numerische und Angewandte Mathematik,
More informationInformation geometry for bivariate distribution control
Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic
More informationOverview of the Seminar Topic
Overview of the Seminar Topic Simo Särkkä Laboratory of Computational Engineering Helsinki University of Technology September 17, 2007 Contents 1 What is Control Theory? 2 History
More informationLecture 4: Numerical Solution of SDEs, Itô Taylor Series, Gaussian Process Approximations
Lecture 4: Numerical Solution of SDEs, Itô Taylor Series, Gaussian Process Approximations Simo Särkkä Aalto University Tampere University of Technology Lappeenranta University of Technology Finland November
More informationREINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning
REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning Ronen Tamari The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (#67679) February 28, 2016 Ronen Tamari
More informationLecture Note 13:Continuous Time Switched Optimal Control: Embedding Principle and Numerical Algorithms
ECE785: Hybrid Systems:Theory and Applications Lecture Note 13:Continuous Time Switched Optimal Control: Embedding Principle and Numerical Algorithms Wei Zhang Assistant Professor Department of Electrical
More informationToday s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes
Today s s Lecture Lecture 20: Learning -4 Review of Neural Networks Markov-Decision Processes Victor Lesser CMPSCI 683 Fall 2004 Reinforcement learning 2 Back-propagation Applicability of Neural Networks
More informationProblem 1 Cost of an Infinite Horizon LQR
THE UNIVERSITY OF TEXAS AT SAN ANTONIO EE 5243 INTRODUCTION TO CYBER-PHYSICAL SYSTEMS H O M E W O R K # 5 Ahmad F. Taha October 12, 215 Homework Instructions: 1. Type your solutions in the LATEX homework
More informationReinforcement Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina
Reinforcement Learning Introduction Introduction Unsupervised learning has no outcome (no feedback). Supervised learning has outcome so we know what to predict. Reinforcement learning is in between it
More informationConnections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. Revised submission to IEEE TNN
Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables Revised submission to IEEE TNN Aapo Hyvärinen Dept of Computer Science and HIIT University
More informationClosed-Loop Impulse Control of Oscillating Systems
Closed-Loop Impulse Control of Oscillating Systems A. N. Daryin and A. B. Kurzhanski Moscow State (Lomonosov) University Faculty of Computational Mathematics and Cybernetics Periodic Control Systems, 2007
More informationStatic and Dynamic Optimization (42111)
Static and Dynamic Optimization (42111) Niels Kjølstad Poulsen Build. 303b, room 016 Section for Dynamical Systems Dept. of Applied Mathematics and Computer Science The Technical University of Denmark
More informationInternet Monetization
Internet Monetization March May, 2013 Discrete time Finite A decision process (MDP) is reward process with decisions. It models an environment in which all states are and time is divided into stages. Definition
More informationReinforcement Learning
Reinforcement Learning March May, 2013 Schedule Update Introduction 03/13/2015 (10:15-12:15) Sala conferenze MDPs 03/18/2015 (10:15-12:15) Sala conferenze Solving MDPs 03/20/2015 (10:15-12:15) Aula Alpha
More informationSequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them
HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated
More informationReal Time Stochastic Control and Decision Making: From theory to algorithms and applications
Real Time Stochastic Control and Decision Making: From theory to algorithms and applications Evangelos A. Theodorou Autonomous Control and Decision Systems Lab Challenges in control Uncertainty Stochastic
More informationTonelli Full-Regularity in the Calculus of Variations and Optimal Control
Tonelli Full-Regularity in the Calculus of Variations and Optimal Control Delfim F. M. Torres delfim@mat.ua.pt Department of Mathematics University of Aveiro 3810 193 Aveiro, Portugal http://www.mat.ua.pt/delfim
More informationJoint work with Nguyen Hoang (Univ. Concepción, Chile) Padova, Italy, May 2018
EXTENDED EULER-LAGRANGE AND HAMILTONIAN CONDITIONS IN OPTIMAL CONTROL OF SWEEPING PROCESSES WITH CONTROLLED MOVING SETS BORIS MORDUKHOVICH Wayne State University Talk given at the conference Optimization,
More informationSubject: Optimal Control Assignment-1 (Related to Lecture notes 1-10)
Subject: Optimal Control Assignment- (Related to Lecture notes -). Design a oil mug, shown in fig., to hold as much oil possible. The height and radius of the mug should not be more than 6cm. The mug must
More informationEE291E/ME 290Q Lecture Notes 8. Optimal Control and Dynamic Games
EE291E/ME 290Q Lecture Notes 8. Optimal Control and Dynamic Games S. S. Sastry REVISED March 29th There exist two main approaches to optimal control and dynamic games: 1. via the Calculus of Variations
More informationNew Physical Principle for Monte-Carlo simulations
EJTP 6, No. 21 (2009) 9 20 Electronic Journal of Theoretical Physics New Physical Principle for Monte-Carlo simulations Michail Zak Jet Propulsion Laboratory California Institute of Technology, Advance
More informationMATH 220: MIDTERM OCTOBER 29, 2015
MATH 22: MIDTERM OCTOBER 29, 25 This is a closed book, closed notes, no electronic devices exam. There are 5 problems. Solve Problems -3 and one of Problems 4 and 5. Write your solutions to problems and
More informationPolicy Gradient Methods. February 13, 2017
Policy Gradient Methods February 13, 2017 Policy Optimization Problems maximize E π [expression] π Fixed-horizon episodic: T 1 Average-cost: lim T 1 T r t T 1 r t Infinite-horizon discounted: γt r t Variable-length
More informationMarkov Chain Monte Carlo Methods for Stochastic Optimization
Markov Chain Monte Carlo Methods for Stochastic Optimization John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U of Toronto, MIE,
More informationControlled sequential Monte Carlo
Controlled sequential Monte Carlo Jeremy Heng, Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS, CSIRO), George Deligiannidis & Arnaud Doucet (Oxford) Bayesian Computation
More informationRL 14: POMDPs continued
RL 14: POMDPs continued Michael Herrmann University of Edinburgh, School of Informatics 06/03/2015 POMDPs: Points to remember Belief states are probability distributions over states Even if computationally
More informationParticle Filtering Approaches for Dynamic Stochastic Optimization
Particle Filtering Approaches for Dynamic Stochastic Optimization John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge I-Sim Workshop,
More informationNonlinear representation, backward SDEs, and application to the Principal-Agent problem
Nonlinear representation, backward SDEs, and application to the Principal-Agent problem Ecole Polytechnique, France April 4, 218 Outline The Principal-Agent problem Formulation 1 The Principal-Agent problem
More information17 : Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationLecture 5 Linear Quadratic Stochastic Control
EE363 Winter 2008-09 Lecture 5 Linear Quadratic Stochastic Control linear-quadratic stochastic control problem solution via dynamic programming 5 1 Linear stochastic system linear dynamical system, over
More informationPROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS
PROBABILITY: LIMIT THEOREMS II, SPRING 218. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please
More informationMachine Learning 4771
Machine Learning 4771 Instructor: ony Jebara Kalman Filtering Linear Dynamical Systems and Kalman Filtering Structure from Motion Linear Dynamical Systems Audio: x=pitch y=acoustic waveform Vision: x=object
More informationLMI Methods in Optimal and Robust Control
LMI Methods in Optimal and Robust Control Matthew M. Peet Arizona State University Lecture 15: Nonlinear Systems and Lyapunov Functions Overview Our next goal is to extend LMI s and optimization to nonlinear
More informationOptimal Control. Macroeconomics II SMU. Ömer Özak (SMU) Economic Growth Macroeconomics II 1 / 112
Optimal Control Ömer Özak SMU Macroeconomics II Ömer Özak (SMU) Economic Growth Macroeconomics II 1 / 112 Review of the Theory of Optimal Control Section 1 Review of the Theory of Optimal Control Ömer
More informationLangevin Methods. Burkhard Dünweg Max Planck Institute for Polymer Research Ackermannweg 10 D Mainz Germany
Langevin Methods Burkhard Dünweg Max Planck Institute for Polymer Research Ackermannweg 1 D 55128 Mainz Germany Motivation Original idea: Fast and slow degrees of freedom Example: Brownian motion Replace
More informationLecture 4: Introduction to stochastic processes and stochastic calculus
Lecture 4: Introduction to stochastic processes and stochastic calculus Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London
More informationState Estimation using Moving Horizon Estimation and Particle Filtering
State Estimation using Moving Horizon Estimation and Particle Filtering James B. Rawlings Department of Chemical and Biological Engineering UW Math Probability Seminar Spring 2009 Rawlings MHE & PF 1 /
More informationMATH 220: Problem Set 3 Solutions
MATH 220: Problem Set 3 Solutions Problem 1. Let ψ C() be given by: 0, x < 1, 1 + x, 1 < x < 0, ψ(x) = 1 x, 0 < x < 1, 0, x > 1, so that it verifies ψ 0, ψ(x) = 0 if x 1 and ψ(x)dx = 1. Consider (ψ j )
More informationStochastic Gradient Descent in Continuous Time
Stochastic Gradient Descent in Continuous Time Justin Sirignano University of Illinois at Urbana Champaign with Konstantinos Spiliopoulos (Boston University) 1 / 27 We consider a diffusion X t X = R m
More informationLecture Notes: (Stochastic) Optimal Control
Lecture Notes: (Stochastic) Optimal ontrol Marc Toussaint Machine Learning & Robotics group, TU erlin Franklinstr. 28/29, FR 6-9, 587 erlin, Germany July, 2 Disclaimer: These notes are not meant to be
More informationLecture 4 Continuous time linear quadratic regulator
EE363 Winter 2008-09 Lecture 4 Continuous time linear quadratic regulator continuous-time LQR problem dynamic programming solution Hamiltonian system and two point boundary value problem infinite horizon
More informationIntroduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak
Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,
More information