Dynamic Consistency for Stochastic Optimal Control Problems


Dynamic Consistency for Stochastic Optimal Control Problems. Cadarache Summer School CEA/EDF/INRIA 2012. Pierre Carpentier, Jean-Philippe Chancelier, Michel De Lara (SOWG), June 2012.

Lecture outline: Introduction; Distributed formulation; SOC with constraints; Reduction to finite-dimensional problem.

Informal definition of dynamically consistent problems. For a sequence of dynamic optimization problems, we aim at discussing a notion of consistency over time. At the very first time step t_0, formulate an optimization problem that yields optimal decision rules for all the forthcoming time steps t_0, t_1, ..., T. At the next time step t_1, formulate a new optimization problem starting at time t_1 that yields a new sequence of optimal decision rules. This process can be continued until the final time T is reached. A family of optimization problems is said to be dynamically consistent if the optimal strategies obtained when solving the original problem remain optimal for all subsequent problems.

Informal definition of dynamically consistent problems: decision at time t_0. [Figure: state-versus-time diagram showing that the decision u_{0,0} depends on decisions u^1_{0,k}, u^2_{0,k}, u^3_{0,k} computed at later times t_k for possible future states.]

Informal definition of dynamically consistent problems: time-consistent decision at time t_k. [Figure: state-versus-time diagram with decisions u_{0,0}, u_{0,k}, u_{k,k}.] The decision u_{k,k}, computed using a problem formulated at time t_k knowing where I am at time t_k, and the decision u_{0,k}, computed at time t_0 and to be applied at time t_k for the same position, should be the same.

Informal definition of dynamically consistent problems: an example, the deterministic shortest path. If A-B-C-D-E is optimal, then C-D-E is optimal: arriving at C, if I behave rationally, I should not deviate from the route originally chosen starting at A. [Figure: shortest-path graph with nodes A (t=0), B (t=1), C, D (t=2), E.]

Dynamic Programming and dynamically consistent problems (1). Dynamic Programming gives dynamically consistent controls. It gives a family of problems indexed by time: the solution of the problem at time t uses the solution of the problem at time t+1:

V_{t_k}(x) = min_u E[ L_{t_k}(x, u, W_{t_{k+1}}) + V_{t_{k+1}}( f_{t_k}(x, u, W_{t_{k+1}}) ) ].

Choose the control at time t_k by optimizing the sum of a current cost and of a cost-to-go. The cost-to-go has an interpretation as the optimal cost function of a problem at time t_{k+1}.

Dynamic Programming and dynamically consistent problems (2). The problem at time t_{k+1} is the following:

V_{t_{k+1}}(x) = min_{X,U} E[ Σ_{t=t_{k+1}}^{T-1} L_t(X_t, U_t, W_{t+1}) + K(X_T) | X_{t_{k+1}} = x ],
s.t. X_{t+1} = f_t(X_t, U_t, W_{t+1}), t = t_{k+1}, ..., T-1,
     U_t measurable w.r.t. X_t, t = t_{k+1}, ..., T-1.
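To make the backward recursion concrete, here is a minimal Python sketch for a finite-state, finite-control, finite-noise instance; the horizon T, the sizes nx, nu, nw and the arrays f, L, K are hypothetical data invented for the illustration, not objects from the lecture.

import numpy as np

rng = np.random.default_rng(0)
T = 5                                           # horizon (illustrative)
nx, nu, nw = 4, 2, 3                            # hypothetical numbers of states, controls, noise values
p_w = np.array([0.2, 0.5, 0.3])                 # law of W_{t+1}, independent over time
f = rng.integers(0, nx, size=(T, nx, nu, nw))   # f[t, x, u, w]: next state (toy dynamics)
L = rng.random(size=(T, nx, nu, nw))            # L[t, x, u, w]: instantaneous cost
K = rng.random(nx)                              # final cost K(x)

V = np.zeros((T + 1, nx))
policy = np.zeros((T, nx), dtype=int)
V[T] = K
for t in reversed(range(T)):
    # Q[x, u] = E[ L_t(x, u, W) + V_{t+1}(f_t(x, u, W)) ]
    Q = np.einsum('xuw,w->xu', L[t] + V[t + 1][f[t]], p_w)
    policy[t] = Q.argmin(axis=1)
    V[t] = Q.min(axis=1)
# V[k] is also the value function of the subproblem starting at time k, which is why
# the feedback policy computed once at t_0 remains optimal for all subsequent problems.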

Deterministic case. [Figure: state-versus-time diagram with decisions u_{0,0}, u_{0,k}, u_{k,k}.] Exact state match at time t_k.

Deterministic case. [Figure: state-versus-time diagram.] Independence of the initial condition.

Deterministic case. [Figure: state-versus-time diagram.] Taking care of all possible initial states at time t_k through state feedback.

A first example in the deterministic case (1). Problem (D_{t_0}):

min_{u_{t_0},...,u_{T-1}, x_{t_0},...,x_T} Σ_{t=t_0}^{T-1} L_t(x_t, u_t) + K(x_T),
subject to x_{t+1} = f_t(x_t, u_t), x_{t_0} given.

Suppose a solution to this problem exists:
u*_{t_0}, ..., u*_{T-1}: controls indexed by the time t only, not by the state variable;
x*_{t_0}, x*_{t_1}, ..., x*_T: optimal path for these controls.
No need for more information since the model is deterministic. But these controls depend on the hidden parameter x_{t_0}, and they are usually not optimal for x'_{t_0} ≠ x_{t_0}.

A first example in the deterministic case (2). Consider the natural subsequent problems (D_{t_i}) for every t_i ≥ t_0:

min_{u_{t_i},...,u_{T-1}, x_{t_i},...,x_T} Σ_{t=t_i}^{T-1} L_t(x_t, u_t) + K(x_T),
subject to x_{t+1} = f_t(x_t, u_t), x_{t_i} given.

One makes the two following observations.
1. Independence of the initial condition. In the very particular case where the solution to Problem (D_{t_i}) does not depend on x_{t_i}, the Problems {D_{t_i}}_{t_i} are dynamically consistent.
2. True deterministic world. Suppose that the initial condition for Problem (D_{t_i}) is given by x_{t_i} = f_{t_{i-1}}(x*_{t_{i-1}}, u*_{t_{i-1}}) (exact model); then the Problems {D_{t_i}}_{t_i} are dynamically consistent. Otherwise, adding disturbances to the problem brings inconsistency.

A first example in the deterministic case (3). Solve now Problem (D_{t_0}) using Dynamic Programming (DP): φ*_{t_0}, ..., φ*_{T-1} are feedback controls depending on both the time t and the state x. The following result is a direct application of the DP principle.
3. Right amount of information. Suppose that one is looking for strategies as feedback functions φ_t depending on the state x. Then the Problems {D_{t_i}}_{t_i} are dynamically consistent.
As a first conclusion, time consistency is recovered provided we let the decision rules depend upon a sufficiently rich information.

Independence of the initial condition (1). Consider, for every t = t_0, ..., T-1, functions l_t : U_t → R and g_t : U_t → R, and assume that x_t is scalar. Let K be a scalar constant and consider the following deterministic optimal control problem:

min_{x,u} Σ_{t=t_0}^{T-1} l_t(u_t) x_t + K x_T,
s.t. x_{t_0} given, x_{t+1} = g_t(u_t) x_t, t = t_0, ..., T-1.

Independence of the initial condition (2). The variables x_t can be recursively replaced using the dynamics g_t, leading to:

min_u Σ_{t=t_0}^{T-1} l_t(u_t) g_{t-1}(u_{t-1}) ... g_{t_0}(u_{t_0}) x_{t_0} + K g_{T-1}(u_{T-1}) ... g_{t_0}(u_{t_0}) x_{t_0}.

The optimal cost is linear with respect to x_{t_0}. Suppose that x_{t_0} only takes positive values; then x_{t_0} has no influence on the minimizer. This remains true at subsequent time steps provided the dynamics are such that x_t remains positive at every time step, so the dynamic consistency property holds true. This example may look very special, but we will see later on that it is analogous to familiar SOC problems.
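The independence claim can be checked numerically. Below is a small brute-force sketch in Python; the functions l and g, the control grid U and the constant K are illustrative choices (assumptions), and the printout shows that the minimizing control sequence is the same for two different positive initial conditions.

import itertools

T = 3
U = [0.0, 0.5, 1.0]                        # finite control grid (illustrative)
l = lambda t, u: 1.0 + t - u               # hypothetical l_t(u)
g = lambda t, u: 1.0 + 0.1 * t + 0.5 * u   # hypothetical g_t(u) > 0, keeps x_t positive
K = 2.0

def cost(u_seq, x0):
    x, total = x0, 0.0
    for t, u in enumerate(u_seq):
        total += l(t, u) * x
        x = g(t, u) * x
    return total + K * x

for x0 in (1.0, 5.0):
    best = min(itertools.product(U, repeat=T), key=lambda seq: cost(seq, x0))
    print(x0, best)   # the minimizing sequence is the same for both initial conditions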

The guiding principle of the lecture: to obtain dynamic consistency, let the decision rules depend upon a sufficiently rich information set.

Outline
  Introduction: Informal definition of dynamically consistent problems; Dynamic Programming and dynamically consistent problems; Deterministic case
  Distributed formulation: A classical SOC problem; Equivalent distributed formulation
  SOC with constraints: Constraints typology; A toy example with finite probabilities; Back to time consistency
  Reduction to finite-dimensional problem: An equivalent problem with added state and control; Dynamic programming

A classical SOC problem. Control variables U = (U_t)_{t=t_0,...,T-1}; noise variables W = (W_t)_{t=t_1,...,T}. Markovian setting: the random variables X_{t_0}, W_{t_1}, ..., W_T are independent. The problem (S_{t_0}) starting at t_0 writes:

min_{X,U} E[ Σ_{t=t_0}^{T-1} L_t(X_t, U_t, W_{t+1}) + K(X_T) ],   (1)
s.t. X_{t_0} given,
     X_{t+1} = f_t(X_t, U_t, W_{t+1}), t = t_0, ..., T-1,
     U_t measurable w.r.t. (X_{t_0}, W_{t_1}, ..., W_t), t = t_0, ..., T-1.

We know that there is no loss of optimality in looking for the optimal strategy as a state feedback control U_t = φ_t(X_t).

Stochastic optimal control: the classical case (2). Problem (S_{t_0}) can be solved using Dynamic Programming:

V_T(x) = K(x),
V_t(x) = min_{u ∈ U} E[ L_t(x, u, W_{t+1}) + V_{t+1}( f_t(x, u, W_{t+1}) ) ].

It is clear from inspecting the DP equation that the optimal strategies {φ*_t}_{t ≥ t_0} remain optimal for the subsequent optimization problems (S_{t_i}):

min_{U_{t_i},...,U_{T-1}, X_{t_i},...,X_T} E[ Σ_{t=t_i}^{T-1} L_t(X_t, U_t, W_{t+1}) + K(X_T) ],
subject to: X_{t_i} given,
            X_{t+1} = f_t(X_t, U_t, W_{t+1}),
            U_t measurable w.r.t. (X_{t_i}, W_{t_i+1}, ..., W_t).

An equivalent distributed formulation. Let Ξ_t be a linear space of R-valued measurable functions on X_t, and Υ_t a space of signed measures on X_t. We consider a dual pairing between Ξ_t and Υ_t through the bilinear form

⟨ξ, µ⟩ = ∫_{X_t} ξ(x) dµ(x), for ξ ∈ Ξ_t and µ ∈ Υ_t.

When X_t is a random variable taking values in X_t and distributed according to µ_t, then ⟨ξ_t, µ_t⟩ = E[ ξ_t(X_t) ].

Fokker-Planck equation: dynamics of the state probability law. For a given feedback law φ_t : X_t → U_t, we define T_t^{φ_t} : Ξ_{t+1} → Ξ_t by

(T_t^{φ_t} ξ_{t+1})(x) := E[ ξ_{t+1}(X_{t+1}) | X_t = x ], for all x ∈ X_t.

Using the state dynamics and the Markovian setting, we obtain

(T_t^{φ_t} ξ_{t+1})(x) = E[ ξ_{t+1}( f_t(x, φ_t(x), W_{t+1}) ) ], for all x ∈ X_t.

Denoting by (T_t^{φ_t})* the adjoint operator of T_t^{φ_t}, that is, ⟨T_t^{φ_t} ξ, µ⟩ = ⟨ξ, (T_t^{φ_t})* µ⟩, the state probability law driven by the chosen feedback follows:

µ_{t+1} = (T_t^{φ_t})* µ_t, t = t_0, ..., T-1, µ_{t_0} given.
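On a finite state space these operators are just matrices, which the following Python sketch illustrates; the 3-state transition matrix P, the function xi and the law mu are made-up values for the example.

import numpy as np

P = np.array([[0.7, 0.3, 0.0],       # row i: law of X_{t+1} given X_t = i under a fixed feedback phi
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
xi = np.array([1.0, 4.0, 9.0])       # a function xi_{t+1} on the state space
mu = np.array([0.5, 0.3, 0.2])       # law mu_t of X_t

T_xi = P @ xi                        # (T^phi xi)(x) = E[ xi(X_{t+1}) | X_t = x ]
mu_next = P.T @ mu                   # Fokker-Planck step: mu_{t+1} = (T^phi)* mu_t
assert np.isclose(T_xi @ mu, xi @ mu_next)   # duality <T^phi xi, mu> = <xi, (T^phi)* mu>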

Revisiting the expected cost. Next we introduce the function Λ_t^{φ_t} : X_t → R,

Λ_t^{φ_t}(x) := E[ L_t(x, φ_t(x), W_{t+1}) ],

meant to be the expected cost at time t for each possible state value when the feedback function φ_t is applied. Using Λ_t^{φ_t} we have:

E[ L_t(X_t, φ_t(X_t), W_{t+1}) ] = E[ Λ_t^{φ_t}(X_t) ] = ⟨Λ_t^{φ_t}, µ_t⟩.

Equivalent deterministic infinite-dimensional problem. We obtain an equivalent deterministic infinite-dimensional optimal control problem (D_{t_0}):

min_{Φ,µ} Σ_{t=t_0}^{T-1} ⟨Λ_t^{Φ_t}, µ_t⟩ + ⟨K, µ_T⟩,
s.t. µ_{t_0} given,
     µ_{t+1} = (T_t^{Φ_t})* µ_t, t = t_0, ..., T-1.

This is very similar to the "Independence of the initial condition" example.

Solving (D_{t_0}) using Dynamic Programming (1).

V_T(µ) = ⟨K, µ⟩,
V_{T-1}(µ) = min_{φ_{T-1}} ⟨Λ_{T-1}^{φ_{T-1}}, µ⟩ + V_T( (T_{T-1}^{φ_{T-1}})* µ ).

The optimal feedback Γ_{T-1} : µ ↦ φ^µ a priori depends on both x and µ. Using the adjoint relation ⟨K, (T_{T-1}^{φ})* µ⟩ = ⟨T_{T-1}^{φ} K, µ⟩, we get

V_{T-1}(µ) = min_{φ} ⟨Λ_{T-1}^{φ} + T_{T-1}^{φ} K, µ⟩ = min_{φ} ∫_X ( Λ_{T-1}^{φ} + T_{T-1}^{φ} K )(x) µ(dx).

Solving (D_{t_0}) using Dynamic Programming (2). Interchanging the minimization and integration operators leads to:
the optimal Γ*_{T-1} does not depend on µ: Γ*_{T-1} ≡ φ*_{T-1};
V_{T-1} again depends on µ in a multiplicative (linear) manner.
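The following sketch illustrates, on made-up finite data (the cost table Lambda, the transition matrices P and the final cost K are assumptions), that minimizing ⟨Λ^φ + T^φ K, µ⟩ over feedbacks reduces to a pointwise minimization in x, so the optimal feedback is the same for every law µ with positive weights.

import itertools
import numpy as np

nx, nu = 3, 2
Lambda = np.array([[1.0, 2.0],       # Lambda[x, u] = E[ L_{T-1}(x, u, W_T) ]
                   [3.0, 0.5],
                   [2.0, 2.5]])
P = np.array([[[0.6, 0.4, 0.0],      # P[u][i, j] = P(X_T = j | X_{T-1} = i, u)
               [0.1, 0.8, 0.1],
               [0.0, 0.3, 0.7]],
              [[0.9, 0.1, 0.0],
               [0.5, 0.5, 0.0],
               [0.2, 0.3, 0.5]]])
K = np.array([0.0, 1.0, 5.0])

cost_xu = Lambda + np.stack([P[u] @ K for u in range(nu)], axis=1)  # (Lambda^u + T^u K)(x)
pointwise_feedback = cost_xu.argmin(axis=1)

for mu in (np.array([0.8, 0.1, 0.1]), np.array([0.2, 0.3, 0.5])):
    best = min(itertools.product(range(nu), repeat=nx),
               key=lambda phi: mu @ cost_xu[np.arange(nx), list(phi)])
    assert np.array_equal(np.array(best), pointwise_feedback)  # same feedback for every mu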

Outline
  Introduction: Informal definition of dynamically consistent problems; Dynamic Programming and dynamically consistent problems; Deterministic case
  Distributed formulation: A classical SOC problem; Equivalent distributed formulation
  SOC with constraints: Constraints typology; A toy example with finite probabilities; Back to time consistency
  Reduction to finite-dimensional problem: An equivalent problem with added state and control; Dynamic programming

Constraints in stochastic optimal control. Different kinds of constraints arise in stochastic optimization:
almost-sure constraint: g(X_T) ≤ a, P-a.s.;
chance constraint: P( g(X_T) ≤ a ) ≥ p;
expectation constraint: E[ g(X_T) ] ≤ a; ...
A chance constraint can be modelled as an expectation constraint:

P( g(X_T) ≤ a ) = E[ 1_{X^{ad}}(X_T) ], with X^{ad} = { x ∈ X : g(x) ≤ a }.

Chance constraints bring both theoretical and numerical difficulties, especially regarding convexity [Prékopa, 1995]. However, the difficulty we are interested in is common to chance and expectation constraints. In the sequel, we concentrate on adding an expectation constraint.

Probability constraint on state trajectories: P{ X_t ≥ x_t, t ∈ [0,T] } ≥ a. [Figure: water level trajectories over time against a minimal water level.]

A joint probability constraint on the trajectory transformed into an expectation constraint on the final state, E[ g(X_T) ] ≥ a. Motivated by a joint probability constraint:

P{ γ_t(X_t) ≥ b_t, t = t_0, ..., T } ≥ a.

Introducing a new binary state variable Y_t ∈ {0,1}:

Y_{t_0} = 1, Y_{t+1} = Y_t · 1{ γ_{t+1}(X_{t+1}) ≥ b_{t+1} },

we obtain E[ Y_T ] = P{ γ_t(X_t) ≥ b_t, t = t_0, ..., T }, so the joint probability constraint rewrites as E[ Y_T ] ≥ a.
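A quick Monte Carlo sketch of the binary-state construction; the random-walk state, the identity choice for γ_t and the zero thresholds b_t are hypothetical, chosen only for illustration.

import numpy as np

rng = np.random.default_rng(2)
T, n = 5, 200_000
b = np.zeros(T + 1)                                          # illustrative thresholds b_t
X = np.cumsum(rng.normal(size=(n, T + 1)), axis=1) + 1.0     # hypothetical state trajectories
Y = np.ones(n)
for t in range(1, T + 1):
    Y *= (X[:, t] >= b[t])                                   # Y_{t+1} = Y_t * 1{ X_{t+1} >= b_{t+1} }
joint = np.all(X[:, 1:] >= b[1:], axis=1)
print(Y.mean(), joint.mean())                                # E[Y_T] and P(all constraints hold) coincide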

Problem setting with an expectation constraint. Control variables U = (U_t)_{t=t_0,...,T-1}; noise variables W = (W_t)_{t=t_1,...,T}. Markovian setting: the random variables X_{t_0}, W_{t_1}, ..., W_T are independent. The problem starting at t_0 writes:

min_{X,U} E[ Σ_{t=t_0}^{T-1} L_t(X_t, U_t, W_{t+1}) + K(X_T) ],
s.t. X_{t_0} given,
     X_{t+1} = f_t(X_t, U_t, W_{t+1}),
     U_t measurable w.r.t. (X_{t_0}, W_{t_1}, ..., W_t), t = t_0, ..., T-1,
     E[ g(X_T) ] ≤ a, a ∈ R.

Problem at time t_i:

min_{X,U} E[ Σ_{t=t_i}^{T-1} L_t(X_t, U_t, W_{t+1}) + K(X_T) ],
s.t. X_{t+1} = f_t(X_t, U_t, W_{t+1}), X_{t_i} given,
     U_t measurable w.r.t. (X_{t_i}, W_{t_i+1}, ..., W_t), t = t_i, ..., T-1,
     E[ g(X_T) ] ≤ a.

This family of problems is not dynamically consistent with the usual state variable; with a new appropriate state variable, one regains dynamic consistency. The optimal strategy at time t is still a feedback Φ_{t_0,t} : X → U, since the information structure is not modified by the last constraint.

Equivalent deterministic infinite-dimensional problem (with constraint). We obtain an equivalent deterministic infinite-dimensional optimal control problem:

min_{Φ,µ} Σ_{t=t_0}^{T-1} ⟨Λ_t^{Φ_t}, µ_t⟩ + ⟨K, µ_T⟩ + χ_{ {⟨g,·⟩ ≤ a} }(µ_T),
s.t. µ_{t_0} given,
     µ_{t+1} = (T_t^{Φ_t})* µ_t, t = t_0, ..., T-1,

where the characteristic function χ encodes the final constraint ⟨g, µ_T⟩ ≤ a. This last constraint implies that the optimal feedback laws depend on the initial condition µ_{t_0}, and we do not have dynamic consistency.

A toy example with finite probabilities. Outline
  Introduction: Informal definition of dynamically consistent problems; Dynamic Programming and dynamically consistent problems; Deterministic case
  Distributed formulation: A classical SOC problem; Equivalent distributed formulation
  SOC with constraints: Constraints typology; A toy example with finite probabilities; Back to time consistency
  Reduction to finite-dimensional problem: An equivalent problem with added state and control; Dynamic programming

Simple case study: a discrete controlled Markov chain (1). The state space is X = {1,2,3}, the discrete possible values of a reservoir level; x_min = 1 (resp. x_max = 3) is the lower (resp. upper) level of the reservoir. The control action u takes values in U = {0,1}, the value 1 corresponding to using some given water release to produce some electricity. The noise variable (the stochastic inflow) takes its values in W = {-1, 0, +1}, the discrete probability law of each noise variable W_t being characterized by the weights {π_-, π_0, π_+}. Finally, the reservoir dynamics is:

X_{t+1} = min( x_max, max( x_min, X_t - U_t + W_{t+1} ) ).

Simple case study: a discrete controlled Markov chain (2). Let µ_t be the discrete probability law associated with the state random variable X_t, the initial probability law µ_{t_0} being given. In such a discrete case, it is easy to compute the transition matrix P^u giving the Markov chain transitions for each possible value u of the control, with the following interpretation:

P^u_{ij} = P( X_{t+1} = j | X_t = i, U_t = u ) = P{ min( x_max, max( x_min, i - u + W_{t+1} ) ) = j }.

Simple case study: a discrete controlled Markov chain (3). We obtain for the reservoir problem:

P^0 = [ π_- + π_0   π_+    0
        π_-         π_0    π_+
        0           π_-    π_0 + π_+ ],

P^1 = [ 1           0      0
        π_- + π_0   π_+    0
        π_-         π_0    π_+ ],

and we denote by P^u_i the i-th row of the matrix P^u.

Simple case study: a discrete controlled Markov chain (4). Let φ_t : X → U be a feedback law at time t, and denote by Φ_t the set of admissible feedbacks at time t. In our discrete case, card Φ_t = (card U)^{card X} = 2^3 = 8 for all t. The transition matrix P^{φ_t} associated with such a feedback is obtained by properly selecting rows of the transition matrices P^u, namely:

P^{φ_t} = [ P^{φ_t(1)}_1
            P^{φ_t(2)}_2
            P^{φ_t(3)}_3 ].

Then the dynamics given by the Fokker-Planck equation writes

µ_{t+1} = (P^{φ_t})^T µ_t,

where the state µ_t involved in this dynamic equation is the three-dimensional column vector µ_t = (µ_{1,t}, µ_{2,t}, µ_{3,t})^T.
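The row selection and the Fokker-Planck step are easy to reproduce; in the Python sketch below the weights π_-, π_0, π_+, the feedback φ and the law µ_t are illustrative values, and the dynamics is the clipped reservoir recursion above.

import numpy as np

pi = {-1: 0.3, 0: 0.5, +1: 0.2}                # illustrative weights pi_-, pi_0, pi_+
states, controls = [1, 2, 3], [0, 1]

def transition(u):
    P = np.zeros((3, 3))
    for i in states:
        for w, p in pi.items():
            j = min(3, max(1, i - u + w))      # X_{t+1} = min(x_max, max(x_min, X_t - U_t + W_{t+1}))
            P[i - 1, j - 1] += p
    return P

P = {u: transition(u) for u in controls}
phi = {1: 1, 2: 1, 3: 0}                       # one of the 2^3 = 8 admissible feedbacks
P_phi = np.array([P[phi[i]][i - 1] for i in states])   # row selection
mu = np.array([0.2, 0.5, 0.3])                 # law mu_t of X_t
mu_next = P_phi.T @ mu                         # Fokker-Planck step: mu_{t+1} = (P^phi)^T mu_t
print(P[0], P[1], mu_next, sep="\n")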

A toy example with finite probabilities: the reservoir control problem. The cost at time t is supposed to be linear, equal to p_t U_t, p_t being a negative deterministic price; there is no final cost (K ≡ 0). The reservoir level at the final time T must be equal to x_max with a probability at least equal to π. The distributed formulation associated with the reservoir control problem is:

min_{ {φ_t ∈ Φ_t}_{t=t_0,...,T-1} } Σ_{t=t_0}^{T-1} ⟨p_t φ_t, µ_t⟩,   (6a)
subject to: µ_{t+1} = (P^{φ_t})^T µ_t, µ_{t_0} given,   (6b)
            ⟨1_{x_max}, µ_T⟩ ≥ π,   (6c)

with 1_{x_max}(x) = 1 if x = x_max, and 0 otherwise.

A toy example with finite probabilities: curse of dimensionality. The previous problem can be solved using DP on the law µ_t: the state is effectively two-dimensional, since µ_{1,t} + µ_{2,t} + µ_{3,t} = 1. Suppose now that the number of discrete reservoir levels is n rather than 3, with n big enough: the resolution of the distributed formulation suffers from the curse of dimensionality ((n-1)-dimensional state). The ultimate curse is to consider that the level takes values in the interval [x_min, x_max], so that the µ_t's are continuous probability laws.

Back to dynamical consistency (1). We can use Dynamic Programming: let V_t(µ_t) be the optimal cost of the problem starting at time t with initial condition µ_t. Then

V_T(µ) = ⟨K, µ⟩ if ⟨g, µ⟩ ≤ a, and +∞ otherwise,

and, for every t = t_0, ..., T-1 and every probability law µ on X:

V_t(µ) = min_{Φ_t} ⟨Λ_t^{Φ_t}, µ⟩ + V_{t+1}( (T_t^{Φ_t})* µ ).

The optimal feedback functions Φ*_t depend on µ_t. Consistency is recovered for strategies depending on X_t and µ_t.

Back to dynamical consistency (2). The optimal feedback functions Φ*_t depend on µ_t:

Φ*_t(µ) ∈ arg min_{Φ_t} ⟨Λ_t^{Φ_t}, µ⟩ + V_{t+1}( (T_t^{Φ_t})* µ ).

But the decision also depends on X_t because of the distributed formulation U_t = φ_t(X_t)! Consistency is recovered for strategies depending on X_t and µ_t, but the new state variable to consider, (X_t, µ_t), is an infinite-dimensional object!

Outline
  Introduction: Informal definition of dynamically consistent problems; Dynamic Programming and dynamically consistent problems; Deterministic case
  Distributed formulation: A classical SOC problem; Equivalent distributed formulation
  SOC with constraints: Constraints typology; A toy example with finite probabilities; Back to time consistency
  Reduction to finite-dimensional problem: An equivalent problem with added state and control; Dynamic programming

An equivalent problem with added state process Z and control V:

min_{U,V,X,Z} E[ Σ_{t=t_0}^{T-1} L_t(X_t, U_t, W_{t+1}) + K(X_T) ],
X_{t_0} = x_{t_0}, X_{t+1} = f_t(X_t, U_t, W_{t+1}),
Z_{t_0} = 0, Z_{t+1} = Z_t + V_t,
almost-sure final constraint: g(X_T) - Z_T ≤ a,
measurability constraints: U_t measurable w.r.t. F_t, V_t measurable w.r.t. F_{t+1},
additional constraints on the controls: E[ V_t | F_t ] = 0.

Equivalent problem (2). Let U_{t_0}, ..., U_{T-1} and X_{t_0}, ..., X_T satisfy the constraints of the initial problem. We define V_{t_0}, ..., V_{T-1} and Z_{t_0}, ..., Z_T as follows:

V_t = E[ g(X_T) | F_{t+1} ] - E[ g(X_T) | F_t ], Z_{t+1} = Z_t + V_t.

Noting that the σ-field F_{t_0} is the minimal (trivial) σ-field and that the random variable X_T is F_T-measurable, we have:

Z_T = g(X_T) - E[ g(X_T) ].

Hence, U, X, V and Z satisfy the set of constraints of the new problem.
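The construction of V and Z can be verified exactly on a tiny two-period model; in the sketch below the noise law, the function g and the map from the noises to X_T are hypothetical choices made only to check the identities Z_T = g(X_T) - E[g(X_T)] and E[V_{t_0} | F_{t_0}] = 0.

import numpy as np

W_vals, p = [0.0, 1.0], [0.5, 0.5]        # two-period binary noise (illustrative)
g = lambda x: (x - 1.0) ** 2              # hypothetical final constraint function
X_T = lambda w1, w2: w1 + w2              # toy final state as a function of the noises

Eg = sum(p[i] * p[j] * g(X_T(W_vals[i], W_vals[j])) for i in range(2) for j in range(2))
for i, w1 in enumerate(W_vals):
    Eg_given_F1 = sum(p[j] * g(X_T(w1, W_vals[j])) for j in range(2))
    V0 = Eg_given_F1 - Eg                 # F_1-measurable increment, E[V_0 | F_0] = 0
    for j, w2 in enumerate(W_vals):
        V1 = g(X_T(w1, w2)) - Eg_given_F1
        Z_T = V0 + V1                     # Z_{t_0} = 0, Z_{t+1} = Z_t + V_t
        assert np.isclose(Z_T, g(X_T(w1, w2)) - Eg)
assert np.isclose(sum(p[i] * (sum(p[j] * g(X_T(W_vals[i], W_vals[j])) for j in range(2)) - Eg)
                      for i in range(2)), 0.0)   # E[V_0] = 0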

Equivalent problem (3). Conversely, let U_{t_0}, ..., U_{T-1}, X_{t_0}, ..., X_T, V_{t_0}, ..., V_{T-1} and Z_{t_0}, ..., Z_T satisfying the constraints of the (V,Z)-problem be given. Then Z_T = Σ_{t=t_0}^{T-1} V_t, and thus we have:

E[ Z_T ] = Σ_{t=t_0}^{T-1} E[ E[ V_t | F_t ] ] = 0.

Taking expectations in the almost-sure constraint g(X_T) - Z_T ≤ a, it follows that E[ g(X_T) ] ≤ a, and therefore the processes U_{t_0}, ..., U_{T-1} and X_{t_0}, ..., X_T satisfy the constraints of the initial problem. To conclude, since the two problems share the same admissible sets and the same criterion to optimize, they are equivalent.

Equivalent problem (4).
1. Dynamic Programming will give feedbacks as functions of (X_t, Z_t).
2. The new problem is more intricate: added state and added control; V_t depends on W_{t+1} (hazard-decision); some new constraints on the controls have to be taken into account.
3. E[ g(X_T) | F_t ]: perception of the risk constraint at time t.
4. V_t = E[ g(X_T) | F_{t+1} ] - E[ g(X_T) | F_t ]: variation of this perception between time t and time t+1.
5. Z_t: cumulative variation over time of this perception of the risk constraint.

Dynamic programming. In order to use Dynamic Programming on problem (7), we first recall, on a simplified model, how Dynamic Programming works on a classical stochastic optimization problem:

min_{X,U} E[ Σ_{t=t_0}^{T-1} L_t(X_t, U_t, W_{t+1}) + K(X_T) ],
s.t. X_{t_0} given, X_{t+1} = f_t(X_t, U_t, W_{t+1}), U_t measurable w.r.t. F_t, t = t_0, ..., T-1.

Then we show the modifications induced by the hazard-decision control and by the constraint E[ V_t | F_t ] = 0.

Dynamic programming steps in the classical framework.
1. Rewrite the criterion using nested conditional expectations (here with t_0 = 0 and T = 2):

E[ E[ L_0(X_0, U_0) + E[ L_1(X_1, U_1) + K(X_2) | F_1 ] | F_0 ] ].

2. The minimization with respect to U_t and X_t enters up to the conditional expectation with respect to F_t.
3. Recursive computation from the innermost problem to the outermost one (t = t_0). The innermost one (t = T-1) is:

min_{U_{T-1} measurable w.r.t. F_{T-1}} E[ h(X_{T-1}, U_{T-1}, W_T) | F_{T-1} ],
with h(x, u, w) = L_{T-1}(x, u) + K( f_{T-1}(x, u, w) ).

Hazard-decision framework. Hazard-decision: U_t measurable w.r.t. F_{t+1}. The interchange of expectation and minimization can then be done one step further:

E[ min_{U_{T-1} measurable w.r.t. F_T} h(X_{T-1}, U_{T-1}, W_T) | F_{T-1} ],
with h(x, u, w) = L_{T-1}(x, u, w) + K( f_{T-1}(x, u, w) ).

The optimal feedback U*_{T-1} can be chosen as a (X_{T-1}, W_T)-measurable function.
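A short numerical illustration of this interchange, with a hypothetical cost table h[u, w] and noise law p_w (both invented for the example): letting the control see the noise can only decrease the optimal value.

import numpy as np

p_w = np.array([0.25, 0.5, 0.25])
h = np.array([[1.0, 4.0, 0.0],            # h[u, w] for a fixed state x (illustrative)
              [2.0, 1.0, 3.0]])
decision_hazard = (h @ p_w).min()         # choose u before seeing W: min_u E[h(u, W)]
hazard_decision = h.min(axis=0) @ p_w     # choose u after seeing W:  E[min_u h(u, W)]
print(decision_hazard, hazard_decision)
assert hazard_decision <= decision_hazard + 1e-12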

Hazard-decision framework with constraint. Control constraint: E[ U_t | F_t ] = 0. We are led to the following problem at time t = T-1:

min_{U_{T-1} measurable w.r.t. F_T} E[ h(X_{T-1}, U_{T-1}, W_T) | F_{T-1} ],
subject to: E[ U_{T-1} | F_{T-1} ] = 0.

Can the optimal feedback U*_{T-1} still be chosen as a (X_{T-1}, W_T)-measurable function? Yes, and the proof is actually available when F_{T-1} is generated by finitely valued random variables.
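Here is a small sketch of that constrained inner problem for a single state value, two equiprobable noise values and a scalar control; the cost h is a hypothetical choice, and the zero-mean constraint is enforced by eliminating one control value.

import numpy as np

p = np.array([0.5, 0.5])
h = lambda u, w: (u - w) ** 2 + 0.1 * u   # hypothetical cost for a fixed (x, z)
w_vals = np.array([-1.0, 1.0])

best = None
for u0 in np.linspace(-2.0, 2.0, 401):
    u1 = -p[0] * u0 / p[1]                # enforce E[U(W)] = p0*u0 + p1*u1 = 0
    val = p[0] * h(u0, w_vals[0]) + p[1] * h(u1, w_vals[1])
    if best is None or val < best[0]:
        best = (val, u0, u1)
print(best)   # optimal control is a W-measurable function with zero conditional mean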

Back to the equivalent problem. The problem mixes the hazard-decision control V and the decision-hazard control U. Initialization of the Bellman function at time T:

V_T(x, z) = K(x) + χ_{G^{ad}}(x, z), with χ_{G^{ad}} the characteristic function of G^{ad} = { (x, z) : g(x) - z ≤ a }.

Bellman function at time t:

V_t(x, z) = min_{u ∈ U} min_{V measurable w.r.t. W_{t+1}} E[ h(x, u, z, V, W_{t+1}) ],
subject to: E[ V ] = 0,
with h(x, u, z, v, w) := L_t(x, u, w) + V_{t+1}( f_t(x, u, w), z + v ).

The resulting optimal controls are feedbacks: U_t measurable w.r.t. (X_t, Z_t) and V_t measurable w.r.t. (X_t, Z_t, W_{t+1}).