Controlled Diffusions and Hamilton-Jacobi-Bellman Equations

Controlled Diffusions and Hamilton-Jacobi-Bellman Equations

Emo Todorov
Applied Mathematics and Computer Science & Engineering
University of Washington

AMATH/CSE 579, Winter 2014

Continuous-time formulation

Notation and terminology:

- $x(t) \in \mathbb{R}^n$: state vector
- $u(t) \in \mathbb{R}^m$: control vector
- $\omega(t) \in \mathbb{R}^k$: Brownian motion (integral of white noise)
- $dx = f(x, u)\, dt + G(x, u)\, d\omega$: continuous-time dynamics
- $\Sigma(x, u) = G(x, u)\, G(x, u)^T$: noise covariance
- $\ell(x, u) \ge 0$: cost for choosing control $u$ in state $x$
- $q_T(x) \ge 0$: (optional) scalar cost at terminal states $x \in \mathcal{T}$
- $\pi(x) \in \mathbb{R}^m$: control law
- $v^\pi(x) \ge 0$: value/cost-to-go function
- $\pi^*(x), v^*(x)$: optimal control law and its value function

Stochastic differential equations and integrals

Ito diffusion / stochastic differential equation (SDE):
$$dx = f(x)\, dt + g(x)\, d\omega$$
This cannot be written as $\dot{x} = f(x) + g(x)\, \dot{\omega}$ because $\dot{\omega}$ does not exist. The SDE means that the time-integrals of the two sides are equal:
$$x(T) - x(0) = \int_0^T f(x(t))\, dt + \int_0^T g(x(t))\, d\omega(t)$$
The last term is an Ito integral. For an Ito process $y(t)$ adapted to $\omega(t)$, i.e. depending on the sample path only up to time $t$, this integral is

Definition (Ito integral)
$$\int_0^T y(t)\, d\omega(t) \triangleq \lim_{N \to \infty} \sum_{i=0}^{N-1} y(t_i)\, \left( \omega(t_{i+1}) - \omega(t_i) \right), \qquad 0 = t_0 < t_1 < \cdots < t_N = T$$

Replacing $y(t_i)$ with $y((t_{i+1} + t_i)/2)$ yields the Stratonovich integral.
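
Below is a minimal numerical sketch of the Ito-integral definition above, comparing the left-endpoint (Ito) sum with a midpoint-style (Stratonovich) sum for the integrand $y(t) = \omega(t)$; the sample count and seed are arbitrary choices. For this integrand the two limits differ by $T/2$, matching the known closed forms.

```python
# Sketch: Ito vs. Stratonovich sums for the integral of omega d(omega).
# (Illustrative parameters; not from the lecture.)
import numpy as np

rng = np.random.default_rng(0)
T, N = 1.0, 100_000
w = np.concatenate([[0.0],
                    np.cumsum(np.sqrt(T / N) * rng.standard_normal(N))])
dw = np.diff(w)

ito = np.sum(w[:-1] * dw)                    # y(t_i) (w(t_{i+1}) - w(t_i))
strat = np.sum(0.5 * (w[:-1] + w[1:]) * dw)  # midpoint rule (endpoint average)

print(ito, (w[-1]**2 - T) / 2)   # Ito limit:          w(T)^2/2 - T/2
print(strat, w[-1]**2 / 2)       # Stratonovich limit: w(T)^2/2
```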

Stochastic chain rule and integration by parts

A twice-differentiable function $a(x)$ of an Ito diffusion $dx = f(x)\, dt + g(x)\, d\omega$ is an Ito process (not necessarily a diffusion) which satisfies:

Lemma (Ito)
$$da(x(t)) = a'(x(t))\, dx(t) + \tfrac{1}{2}\, a''(x(t))\, g(x(t))^2\, dt$$

This is the stochastic version of the chain rule. There is also a stochastic version of integration by parts:
$$x(T)\, y(T) - x(0)\, y(0) = \int_0^T x(t)\, dy(t) + \int_0^T y(t)\, dx(t) + [x, y]_T$$
The last term (which would be 0 if $x(t)$ or $y(t)$ were differentiable) is

Definition (quadratic covariation)
$$[x, y]_T \triangleq \lim_{N \to \infty} \sum_{i=0}^{N-1} \left( x(t_{i+1}) - x(t_i) \right) \left( y(t_{i+1}) - y(t_i) \right), \qquad 0 = t_0 < t_1 < \cdots < t_N = T$$

For a diffusion with constant noise amplitude we have $[x, x]_T = g^2 T$.
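
The quadratic-variation claim $[x, x]_T = g^2 T$ is easy to check numerically. Here is a small sketch using an Euler-Maruyama simulation; the drift $\sin(x)$, the noise amplitude $g = 0.7$, and the horizon are arbitrary choices made for illustration.

```python
# Sketch: sum of squared increments of a simulated diffusion with constant
# noise amplitude g approximates [x, x]_T = g^2 T. (Illustrative parameters.)
import numpy as np

rng = np.random.default_rng(1)
T, N, g = 2.0, 200_000, 0.7
dt = T / N

x = np.empty(N + 1)
x[0] = 1.0
for i in range(N):
    x[i + 1] = x[i] + np.sin(x[i]) * dt + g * np.sqrt(dt) * rng.standard_normal()

qv = np.sum(np.diff(x)**2)  # discrete quadratic variation
print(qv, g**2 * T)         # should agree as N grows
```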

Forward and backward equations, generator

Let $p(y, s \mid x, t)$, $s \ge t$, denote the transition probability density under the Ito diffusion $dx = f(x)\, dt + g(x)\, d\omega$. Then $p$ satisfies the following PDEs:

Theorem (Kolmogorov equations)
forward (Fokker-Planck) equation:
$$\frac{\partial}{\partial s} p = -\frac{\partial}{\partial y} \left( f\, p \right) + \frac{1}{2} \frac{\partial^2}{\partial y^2} \left( g^2\, p \right)$$
backward equation:
$$-\frac{\partial}{\partial t} p = f\, \frac{\partial}{\partial x} p + \frac{1}{2}\, g^2\, \frac{\partial^2}{\partial x^2} p = \mathcal{L}\left[ p(y, s \mid \cdot, t) \right]$$

The operator $\mathcal{L}$, which computes expected directional derivatives, is called the generator of the stochastic process. It satisfies (in the vector case):

Theorem (generator)
$$\mathcal{L}[v(\cdot)](x) \triangleq \lim_{\Delta \downarrow 0} \frac{E_{x(0)=x}\left[ v(x(\Delta)) \right] - v(x)}{\Delta} = f(x)^T v_x(x) + \frac{1}{2} \operatorname{tr}\left( \Sigma(x)\, v_{xx}(x) \right)$$
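
The generator formula can be sanity-checked by Monte Carlo: simulate one small Euler step and compare the finite-difference expectation against $f\, v' + \tfrac{1}{2} g^2 v''$. The diffusion and test function below are arbitrary choices, and agreement is only up to the $O(\Delta)$ discretization bias and Monte Carlo error.

```python
# Sketch: Monte Carlo check of the generator L[v](x) at a single point.
# (Illustrative drift, noise, and test function; not from the lecture.)
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: -x            # drift
g = 0.5                     # constant noise amplitude
v = lambda x: np.cos(x)     # smooth test function, v' = -sin, v'' = -cos

x0, Delta, M = 0.3, 1e-3, 4_000_000
x_next = x0 + f(x0) * Delta + g * np.sqrt(Delta) * rng.standard_normal(M)

lhs = (np.mean(v(x_next)) - v(x0)) / Delta   # (E[v(x(Delta))] - v(x)) / Delta
rhs = f(x0) * (-np.sin(x0)) + 0.5 * g**2 * (-np.cos(x0))
print(lhs, rhs)  # close, up to O(Delta) bias and Monte Carlo noise
```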

Discretizing the time axis

Consider the explicit Euler discretization with time step $\Delta$:
$$x(t + \Delta) = x(t) + \Delta\, f(x(t), u(t)) + \sqrt{\Delta}\, G(x(t), u(t))\, \varepsilon(t)$$
where $\varepsilon(t) \sim \mathcal{N}(0, I)$. The term $\sqrt{\Delta}$ appears because the variance grows linearly with time. Thus the transition probability $p(x' \mid x, u)$ is Gaussian, with mean $x + \Delta\, f(x, u)$ and covariance matrix $\Delta\, \Sigma(x, u)$. The one-step cost is $\Delta\, \ell(x, u)$.

Now we can apply the Bellman equation (in the finite-horizon setting):
$$v(x, t) = \min_u \left\{ \Delta\, \ell(x, u) + E_{x' \sim p(\cdot \mid x, u)}\left[ v(x', t + \Delta) \right] \right\} = \min_u \left\{ \Delta\, \ell(x, u) + E_{d \sim \mathcal{N}(\Delta f(x, u),\, \Delta \Sigma(x, u))}\left[ v(x + d, t + \Delta) \right] \right\}$$

Next we use the Taylor-series expansion of $v$...

Hamilton-Jacobi-Bellman (HJB) equation

Taylor-expanding the cost-to-go,
$$v(x + d, t + \Delta) = v(x, t) + \Delta\, v_t(x, t) + d^T v_x(x, t) + \tfrac{1}{2}\, d^T v_{xx}(x, t)\, d + O(\Delta^2) + O(\|d\|^3)$$
Using the fact that $E\left[ d^T M d \right] = \operatorname{tr}\left( \operatorname{cov}[d]\, M \right) + O(\Delta^2)$, the expectation is
$$E_d\left[ v(x + d, t + \Delta) \right] = v(x, t) + \Delta\, v_t(x, t) + \Delta\, f(x, u)^T v_x(x, t) + \tfrac{\Delta}{2} \operatorname{tr}\left( \Sigma(x, u)\, v_{xx}(x, t) \right) + O(\Delta^2)$$
Substituting in the Bellman equation,
$$v(x, t) = \min_u \left\{ \Delta\, \ell(x, u) + v(x, t) + \Delta\, v_t(x, t) + \Delta\, f(x, u)^T v_x(x, t) + \tfrac{\Delta}{2} \operatorname{tr}\left( \Sigma(x, u)\, v_{xx}(x, t) \right) + O(\Delta^2) \right\}$$
Simplifying, dividing by $\Delta$ and taking $\Delta \to 0$ yields the HJB equation:
$$-v_t(x, t) = \min_u \left\{ \ell(x, u) + f(x, u)^T v_x(x, t) + \tfrac{1}{2} \operatorname{tr}\left( \Sigma(x, u)\, v_{xx}(x, t) \right) \right\}$$
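
For intuition, the discrete-time Bellman backup from the previous slide (of which the HJB equation is the $\Delta \to 0$ limit) can be implemented directly. The sketch below does one backup on a scalar problem with dynamics, cost, and next-step value function of my own choosing; the Gaussian expectation is handled with a 3-point Gauss-Hermite rule.

```python
# Sketch: one discrete-time Bellman backup under the Gaussian Euler
# transition, for an illustrative scalar problem (not from the lecture).
import numpy as np

dt = 0.01
xs = np.linspace(-2.0, 2.0, 201)   # state grid
us = np.linspace(-3.0, 3.0, 61)    # control grid
v_next = xs**2                     # assumed cost-to-go at time t + dt

def bellman_backup(v_next):
    v = np.empty_like(xs)
    # 3-point Gauss-Hermite rule for N(mean, var): nodes mean + std*sqrt(3)*{-1,0,1}
    nodes = np.sqrt(3.0) * np.array([-1.0, 0.0, 1.0])
    wts = np.array([1/6, 2/3, 1/6])
    for i, x in enumerate(xs):
        best = np.inf
        for u in us:
            mean = x + dt * (-x + u)   # dynamics dx = (-x + u) dt + 0.3 dw
            std = 0.3 * np.sqrt(dt)
            ev = wts @ np.interp(mean + std * nodes, xs, v_next)
            best = min(best, dt * (x**2 + u**2) + ev)
        v[i] = best
    return v

v_t = bellman_backup(v_next)  # cost-to-go at time t on the grid
```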

HJB equations for different problem formulations

Definition (Hamiltonian)
$$H[x, u, v(\cdot)] \triangleq \ell(x, u) + f(x, u)^T v_x(x) + \tfrac{1}{2} \operatorname{tr}\left( \Sigma(x, u)\, v_{xx}(x) \right) = \ell + \mathcal{L}[v]$$

The HJB equations for the optimal cost-to-go $v^*$ are:

Theorem (HJB equations)
- first exit: $0 = \min_u H[x, u, v^*(\cdot)]$, with $v^*(x) = q_T(x)$ for $x \in \mathcal{T}$
- finite horizon: $-v_t^*(x, t) = \min_u H[x, u, v^*(\cdot, t)]$, with $v^*(x, T) = q_T(x)$
- discounted: $\frac{1}{\tau} v^*(x) = \min_u H[x, u, v^*(\cdot)]$
- average: $c = \min_u H[x, u, \tilde{v}^*(\cdot)]$

Discounted cost-to-go: $v^\pi(x) = E\left[ \int_0^\infty \exp(-t/\tau)\, \ell(x(t), u(t))\, dt \right]$.

Existence and uniqueness of solutions

- The HJB equation has at most one classic solution (i.e. a function which satisfies the PDE everywhere). If a classic solution exists, it is the optimal cost-to-go function.
- The HJB equation may not have a classic solution; in that case the optimal cost-to-go function is non-smooth (e.g. bang-bang control).
- The HJB equation always has a unique viscosity solution, which is the optimal cost-to-go function.
- Approximation schemes based on MDP discretization (see below) are guaranteed to converge to the unique viscosity solution / optimal cost-to-go function. Most continuous function approximation schemes (which scale better) are unable to represent non-smooth solutions.
- All examples of non-smoothness seem to be deterministic; noise tends to smooth the optimal cost-to-go function.

Example of noise smoothing

[Figure omitted in transcription.]

More tractable problems

Consider a restricted family of problems with dynamics and cost
$$dx = \left( a(x) + B(x)\, u \right) dt + C(x)\, d\omega$$
$$\ell(x, u) = q(x) + \tfrac{1}{2}\, u^T R(x)\, u$$
For such problems the Hamiltonian can be minimized analytically w.r.t. $u$. Suppressing the dependence on $x$ for clarity, we have
$$\min_u H = \min_u \left\{ q + \tfrac{1}{2} u^T R u + (a + B u)^T v_x + \tfrac{1}{2} \operatorname{tr}\left( C C^T v_{xx} \right) \right\}$$
The minimum is achieved at $u^* = -R^{-1} B^T v_x$, and the result is
$$\min_u H = q + a^T v_x + \tfrac{1}{2} \operatorname{tr}\left( C C^T v_{xx} \right) - \tfrac{1}{2}\, v_x^T B R^{-1} B^T v_x$$
Thus the HJB equations become 2nd-order quadratic PDEs, no longer involving the min operator.
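
As a quick numeric sanity check that $u^* = -R^{-1} B^T v_x$ minimizes the Hamiltonian above, we can perturb $u^*$ and verify that the $u$-dependent part of $H$ never decreases. Dimensions and random values below are arbitrary.

```python
# Sketch: verify the analytic minimizer of the quadratic-in-u Hamiltonian.
# (Illustrative random problem data; not from the lecture.)
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 2
B = rng.standard_normal((n, m))
M = rng.standard_normal((m, m))
R = M @ M.T + m * np.eye(m)          # symmetric positive definite R
vx = rng.standard_normal(n)

def H_u(u):  # u-dependent part: (1/2) u'Ru + (Bu)'vx
    return 0.5 * u @ R @ u + (B @ u) @ vx

u_star = -np.linalg.solve(R, B.T @ vx)
for _ in range(100):
    u = u_star + rng.standard_normal(m)
    assert H_u(u) >= H_u(u_star) - 1e-12  # no perturbation does better
```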

More tractable problems (generalizations)

Allowing control-multiplicative noise:
$$\Sigma(x, u) = C_0(x)\, C_0(x)^T + \sum_{k=1}^K C_k(x)\, u u^T C_k(x)^T$$
The optimal control law becomes:
$$u^* = -\left( R + \sum_{k=1}^K C_k^T v_{xx} C_k \right)^{-1} B^T v_x$$

Allowing more general control costs:
$$\ell(x, u) = q(x) + \sum_i r(u_i), \qquad r \text{ convex}$$
The optimal control law becomes (elementwise):
$$u^* = \arg\min_u \left\{ \sum_i r(u_i) + u^T B^T v_x \right\} = -\left( r' \right)^{-1} \left( B^T v_x \right)$$
For example,
$$r(u) = s \int_0^{|u|} \operatorname{atanh}\left( \frac{w}{u_{\max}} \right) dw \implies u^* = -u_{\max} \tanh\left( s^{-1} B^T v_x \right)$$
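
The saturating control law can likewise be checked numerically: with the stated $r$, minimizing $r(u) + u\, b$ over $u \in (-u_{\max}, u_{\max})$ should return $-u_{\max} \tanh(b/s)$, where $b$ stands for a component of $B^T v_x$. The closed form for $r$ used below follows by integrating $\operatorname{atanh}$; the specific constants are arbitrary.

```python
# Sketch: the minimizer of r(u) + u*b matches -u_max * tanh(b / s).
# (Illustrative constants; b plays the role of a component of B'v_x.)
import numpy as np
from scipy.optimize import minimize_scalar

s, u_max, b = 1.0, 2.0, 0.7

def r(u):
    # closed form of s * integral_0^|u| atanh(w / u_max) dw
    a = abs(u)
    return s * (a * np.arctanh(a / u_max)
                + 0.5 * u_max * np.log(1.0 - (a / u_max)**2))

res = minimize_scalar(lambda u: r(u) + u * b,
                      bounds=(-u_max + 1e-9, u_max - 1e-9), method="bounded")
print(res.x, -u_max * np.tanh(b / s))  # should agree closely
```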

Pendulum example

Dynamics: $\ddot{\theta} = k \sin(\theta) + u$

First-order form:
$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \theta \\ \dot{\theta} \end{pmatrix}, \qquad a(x) = \begin{pmatrix} x_2 \\ k \sin(x_1) \end{pmatrix}, \qquad B = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

Stochastic dynamics: $dx = a(x)\, dt + B\, (u\, dt + \sigma\, d\omega)$

Cost and optimal control:
$$\ell(x, u) = q(x) + \frac{r}{2}\, u^2, \qquad u^*(x) = -r^{-1} v_{x_2}(x)$$

HJB equation (discounted):
$$\frac{1}{\tau} v = q + x_2\, v_{x_1} + k \sin(x_1)\, v_{x_2} + \frac{\sigma^2}{2}\, v_{x_2 x_2} - \frac{1}{2r}\, v_{x_2}^2$$

Pendulum example continued

Parameters: $k = \sigma = r = 1$, $\tau = 0.3$, $q = 1 - \exp(-2\theta^2)$, $\beta = 0.99$.

Discretize the state space, approximate derivatives via finite differences, and iterate:
$$v^{(n+1)} = \beta\, v^{(n)} + (1 - \beta)\, \tau \min_u H^{(n)}$$
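
A minimal implementation sketch of this scheme is given below, using the stated parameters; the grid extents, resolution, iteration count, and boundary handling (periodic in $\theta$, zeroed second derivative at the $\dot{\theta}$ edges) are my own assumptions rather than the lecture's.

```python
# Sketch: finite-difference fixed-point iteration for the discounted pendulum
# HJB equation, v <- beta*v + (1-beta)*tau*min_u H. (Grid choices assumed.)
import numpy as np

k, sigma, r, tau, beta = 1.0, 1.0, 1.0, 0.3, 0.99

n1, n2 = 101, 101
th = np.linspace(-np.pi, np.pi, n1)   # theta (treated as periodic)
om = np.linspace(-4.0, 4.0, n2)       # theta_dot (truncated)
d1, d2 = th[1] - th[0], om[1] - om[0]
TH, OM = np.meshgrid(th, om, indexing="ij")

q = 1.0 - np.exp(-2.0 * TH**2)        # state cost from the slide
v = np.zeros((n1, n2))

for _ in range(5000):
    v_th = (np.roll(v, -1, axis=0) - np.roll(v, 1, axis=0)) / (2 * d1)  # periodic
    v_om = np.gradient(v, d2, axis=1)
    v_omom = (np.roll(v, -1, axis=1) - 2 * v + np.roll(v, 1, axis=1)) / d2**2
    v_omom[:, [0, -1]] = 0.0          # crude boundary assumption

    # min_u H, using the analytic minimizer u* = -v_om / r
    H = (q + OM * v_th + k * np.sin(TH) * v_om
         + 0.5 * sigma**2 * v_omom - v_om**2 / (2 * r))
    v_new = beta * v + (1 - beta) * tau * H
    if np.max(np.abs(v_new - v)) < 1e-9:
        break
    v = v_new

u_star = -np.gradient(v, d2, axis=1) / r  # optimal feedback on the grid
```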

MDP discretization

Define discrete state and control spaces $X(h) \subset \mathbb{R}^n$, $U(h) \subset \mathbb{R}^m$ and a discrete time step $\Delta(h)$, where $h$ is a "coarseness" parameter and $h \to 0$ corresponds to infinitely dense discretization. Construct transition probabilities $p^{(h)}(x'(h) \mid x(h), u(h))$ s.t.

Definition (local consistency)
With $d \triangleq x'(h) - x(h)$,
$$E[d] = \Delta(h)\, f(x(h), u(h)) + o(\Delta(h))$$
$$\operatorname{cov}[d] = \Delta(h)\, \Sigma(x(h), u(h)) + o(\Delta(h))$$

In the limit $h \to 0$ the MDP solution $v^{(h)}$ converges to the solution $v^*$ of the continuous problem, even when $v^*$ is non-smooth (Kushner and Dupuis).

Constructing the MDP

For each $x(h), u(h)$ choose vectors $\{v_i\}_{i=1 \ldots k}$ such that all possible next states are $x'(h) = x(h) + h v_i$. Then compute $p_i^{(h)} = p^{(h)}(x(h) + h v_i \mid x(h), u(h))$ in one of three ways:

1. Find $w_i, y_i$ s.t.
$$\sum_i w_i v_i = f, \qquad \sum_i w_i = 1, \qquad w_i \ge 0$$
$$\sum_i y_i v_i v_i^T = \Sigma, \qquad \sum_i y_i v_i = 0, \qquad \sum_i y_i = 1, \qquad y_i \ge 0$$
and set
$$p_i^{(h)} = \frac{h\, w_i + y_i}{h + 1}, \qquad \Delta(h) = \frac{h^2}{h + 1}$$

2. Set $\Delta(h) = h$ and minimize
$$\left\| \frac{\Sigma}{h} - \sum_i p_i^{(h)} (v_i - f)(v_i - f)^T \right\|^2 \quad \text{s.t.} \quad \sum_i p_i^{(h)} v_i = f, \qquad \sum_i p_i^{(h)} = 1, \qquad p_i^{(h)} \ge 0$$

3. Set $\Delta(h) = h$ and
$$p_i^{(h)} \propto \mathcal{N}\left( x(h) + h v_i;\ x(h) + h f,\ h \Sigma \right)$$
i.e. evaluate the Gaussian density of the Euler step at the candidate next states and normalize.
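
To make the first construction concrete, here is a 1D sketch with next-state vectors $v_i \in \{-1, 0, +1\}$, valid when $|f| \le 1$ and $\Sigma \le 1$ (which can always be arranged by scaling); the printed quantities confirm local consistency up to $o(\Delta(h))$. The specific $f$, $\Sigma$, $h$ values are arbitrary.

```python
# Sketch: construction 1 on a 1D problem with v_i in {-1, 0, +1}.
# (Assumes |f| <= 1 and Sigma <= 1; illustrative values below.)
import numpy as np

def local_mdp_1d(f, Sigma, h):
    """Locally consistent transition probabilities to x + h * v_i."""
    v = np.array([-1.0, 0.0, 1.0])
    # w matches the drift: sum w_i v_i = f, sum w_i = 1, w_i >= 0
    w = np.array([max(-f, 0.0), 1.0 - abs(f), max(f, 0.0)])
    # y matches the noise: sum y_i v_i^2 = Sigma, sum y_i v_i = 0
    y = np.array([0.5 * Sigma, 1.0 - Sigma, 0.5 * Sigma])
    p = (h * w + y) / (h + 1.0)
    dt = h**2 / (h + 1.0)
    return v, p, dt

f, Sigma, h = 0.5, 0.8, 0.01
v, p, dt = local_mdp_1d(f, Sigma, h)
d = h * v                                  # possible displacements
print(p @ d, dt * f)                       # E[d]   = Delta(h) f exactly
print(p @ d**2 - (p @ d)**2, dt * Sigma)   # cov[d] = Delta(h) Sigma + o(Delta)
```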