Risk-Averse Control of Partially Observable Markov Systems. Andrzej Ruszczyński. Workshop on Dynamic Multivariate Programming, Vienna, March 2018


Partially Observable Discrete-Time Models

Markov process: $\{X_t, Y_t\}_{t=1,\dots,T}$ on the Borel state space $\mathcal{X} \times \mathcal{Y}$. The process $\{X_t\}$ is observable, while $\{Y_t\}$ is not observable.
Control sets: $U_t : \mathcal{X} \rightrightarrows \mathcal{U}$, $t = 1, \dots, T$.
Transition kernel: $P\big[(X_{t+1}, Y_{t+1}) \in C \mid x_t, y_t, u_t\big] = Q_t(x_t, y_t, u_t)(C)$.
Costs: $Z_t = c_t(X_t, Y_t, U_t)$, $t = 1, \dots, T$.
Two relevant filtrations: $\mathcal{F}_t^{X,Y}$, defined by the full state process $\{X_t, Y_t\}$, and $\mathcal{F}_t^X$, defined by the observed process $\{X_t\}$.
Space of costs: $\mathcal{Z}_t = \big\{ Z : \Omega \to \mathbb{R} \;\big|\; Z \text{ is } \mathcal{F}_t^{X,Y}\text{-measurable and bounded} \big\}$.
Classical Problem:
$$\min_\pi \; \mathbb{E}\big[ c_1(X_1, Y_1, U_1) + c_2(X_2, Y_2, U_2) + \cdots + c_T(X_T, Y_T, U_T) \big]$$

Partially Observable Discrete-Time Models (continued)

In the same setting as above, the Risk-Averse Problem replaces the expectation by a dynamic risk measure:
$$\min_\pi \; \rho_{1,T}\big( c_1(X_1, Y_1, U_1),\; c_2(X_2, Y_2, U_2),\; \dots,\; c_T(X_T, Y_T, U_T) \big)$$

Dynamic Risk Measures

Probability space $(\Omega, \mathcal{F}, P)$ with filtration $\mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_T \subseteq \mathcal{F}$.
Adapted sequence of random variables (costs) $Z_1, Z_2, \dots, Z_T$.
Spaces: $\mathcal{Z}_t$ of $\mathcal{F}_t$-measurable functions, and $\mathcal{Z}_{t,T} = \mathcal{Z}_t \times \cdots \times \mathcal{Z}_T$.
Dynamic Risk Measure: a sequence of conditional risk measures $\rho_{t,T} : \mathcal{Z}_{t,T} \to \mathcal{Z}_t$, $t = 1, \dots, T$, satisfying the
Monotonicity condition: $\rho_{t,T}(Z) \le \rho_{t,T}(W)$ for all $Z, W \in \mathcal{Z}_{t,T}$ such that $Z \le W$;
Local property: for all $A \in \mathcal{F}_t$, $\rho_{t,T}(\mathbf{1}_A Z) = \mathbf{1}_A\, \rho_{t,T}(Z)$.

Time Consistency and Nested Representation

A dynamic risk measure $\{\rho_{t,T}\}_{t=1}^T$ is time-consistent if, for all $1 \le t < T$,
$$Z_t = W_t \quad\text{and}\quad \rho_{t+1,T}(Z_{t+1}, \dots, Z_T) \le \rho_{t+1,T}(W_{t+1}, \dots, W_T)$$
imply that $\rho_{t,T}(Z_t, \dots, Z_T) \le \rho_{t,T}(W_t, \dots, W_T)$.
Define one-step mappings: $\rho_t(Z_{t+1}) = \rho_{t,T}(0, Z_{t+1}, 0, \dots, 0)$.
Nested Decomposition Theorem: Suppose a dynamic risk measure $\{\rho_{t,T}\}_{t=1}^T$ is time-consistent, and
$$\rho_{t,T}(0, \dots, 0) = 0, \qquad \rho_{t,T}(Z_t, Z_{t+1}, \dots, Z_T) = Z_t + \rho_{t,T}(0, Z_{t+1}, \dots, Z_T).$$
Then for all $t$ we have the representation
$$\rho_{t,T}(Z_t, \dots, Z_T) = Z_t + \rho_t\Big( Z_{t+1} + \rho_{t+1}\big( Z_{t+2} + \cdots + \rho_{T-1}(Z_T) \big) \Big).$$
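
To make the nested representation concrete, here is a minimal numerical sketch (not from the talk; the binary-tree layout and all data are hypothetical) that evaluates $\rho_{1,T}$ by the backward recursion above, using one-step conditional Average Value at Risk as each $\rho_t$:

```python
import numpy as np

def avar(values, probs, alpha):
    """Average Value at Risk of a discrete cost distribution:
    the expectation over the worst alpha-tail, divided by alpha."""
    order = np.argsort(values)[::-1]          # largest costs first
    v, p = np.asarray(values, float)[order], np.asarray(probs, float)[order]
    tail, total = 0.0, 0.0
    for vi, pi in zip(v, p):
        take = min(pi, alpha - total)
        tail += take * vi
        total += take
        if total >= alpha - 1e-12:
            break
    return tail / alpha

def nested_evaluation(Z, alpha=0.3):
    """Backward recursion Z_1 + rho_1(Z_2 + rho_2(... + rho_{T-1}(Z_T))).
    Z[t] is a vector of stage-(t+1) costs on a binary tree with 2**t nodes;
    the children of node i at the next stage are nodes 2*i and 2*i + 1."""
    v = Z[-1].copy()                           # terminal values v_T = Z_T
    for t in range(len(Z) - 2, -1, -1):
        v = np.array([Z[t][i] + avar(v[2*i:2*i+2], [0.5, 0.5], alpha)
                      for i in range(len(Z[t]))])
    return v[0]                                # rho_{1,T}(Z_1, ..., Z_T)

# Three-stage example: Z[0] has 1 node, Z[1] has 2 nodes, Z[2] has 4.
Z = [np.array([1.0]), np.array([2.0, 0.5]), np.array([3.0, 1.0, 4.0, 0.0])]
print(nested_evaluation(Z))                    # risk-averse nested value
```

Replacing `avar` by a plain expectation recovers the risk-neutral value, which is a useful sanity check on the recursion.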

Issues with the General Theory in the Markov Setting

The probability measure $P_\pi$ and the processes $X_t$ and $Z_t$ depend on the policy $\pi$, so we have to deal with a family of risk measures $\rho_{t,T}^\pi(\cdot)$.
The values of the risk measures may depend on history, and Markov policies cannot be expected to be optimal.
The cost may not be observable.
Motivating Example: $\mathcal{X} = \{0, 1\}$, $T = 2$, and $Z_t = Z_t(x_t)$ (the cost depends on the state). Consider the risk measure
$$\rho_{2,2}(Z_2)(x_1, x_2) = Z_2(x_2), \qquad \rho_{1,2}(Z_1, Z_2)(x_1) = Z_1(x_1) + Z_2(x_1)$$
(which assumes that $x_1$ will not change). It is time-consistent and has the normalization, translation, and local properties. But since $\rho_{1,2}$ does not depend on the distribution of $X_2$ given $x_1$, it is useless for controlling Markov models; in fact, it is much worse than the expectation.

Conditional Risk Evaluators

Space of observable random variables:
$$\mathcal{S}_t = \big\{ S : \Omega \to \mathbb{R} \;\big|\; S \text{ is } \mathcal{F}_t^X\text{-measurable and bounded} \big\}, \quad t = 1, \dots, T.$$
A mapping $\rho_{t,T} : \mathcal{Z}_t \times \cdots \times \mathcal{Z}_T \to \mathcal{S}_t$ is a conditional risk evaluator.
(i) It is monotonic if $Z_s \le W_s$ for all $s = t, \dots, T$ implies that $\rho_{t,T}(Z_t, \dots, Z_T) \le \rho_{t,T}(W_t, \dots, W_T)$.
(ii) It is normalized if $\rho_{t,T}(0, \dots, 0) = 0$.
(iii) It is translation equivariant if, for all $(Z_t, \dots, Z_T) \in \mathcal{S}_t \times \mathcal{Z}_{t+1} \times \cdots \times \mathcal{Z}_T$,
$$\rho_{t,T}(Z_t, \dots, Z_T) = Z_t + \rho_{t,T}(0, Z_{t+1}, \dots, Z_T).$$
(iv) It is decomposable if a mapping $\sigma_t : \mathcal{Z}_t \to \mathcal{S}_t$ exists such that $\sigma_t(Z_t) = Z_t$ for all $Z_t \in \mathcal{S}_t$, and
$$\rho_{t,T}(Z_t, \dots, Z_T) = \sigma_t(Z_t) + \rho_{t,T}(0, Z_{t+1}, \dots, Z_T), \quad \forall\, Z \in \mathcal{Z}_{t,T}.$$

Risk Filters and their Time Consistency

A risk filter $\{\rho_{t,T}^\pi\}_{t=1,\dots,T}$ is a sequence of conditional risk evaluators $\rho_{t,T}^\pi : \mathcal{Z}_{t,T} \to \mathcal{S}_t$. We have to index risk filters by the policy $\pi$, because $\pi$ affects the measure $P_\pi$.
History: $H_t = (X_1, X_2, \dots, X_t)$, $h_t = (x_1, x_2, \dots, x_t)$.
A family of risk filters $\{\rho_{t,T}^\pi\}_{\pi \in \Pi,\, t=1,\dots,T}$ is stochastically conditionally time consistent if for any $\pi, \pi' \in \Pi$, any $1 \le t < T$, all $h_t \in \mathcal{X}^t$, all $(Z_t, \dots, Z_T) \in \mathcal{Z}_{t,T}$, and all $(W_t, \dots, W_T) \in \mathcal{Z}_{t,T}$, the conditions
$$Z_t = W_t \quad\text{and}\quad \big( \rho_{t+1,T}^{\pi}(Z_{t+1}, \dots, Z_T) \mid H_t^{\pi} = h_t \big) \preceq_{\mathrm{st}} \big( \rho_{t+1,T}^{\pi'}(W_{t+1}, \dots, W_T) \mid H_t^{\pi'} = h_t \big)$$
imply
$$\rho_{t,T}^{\pi}(Z_t, Z_{t+1}, \dots, Z_T)(h_t) \le \rho_{t,T}^{\pi'}(W_t, W_{t+1}, \dots, W_T)(h_t).$$
The relation $\preceq_{\mathrm{st}}$ is the conditional stochastic order.

Bayes Operator

Belief state: the conditional distribution of $Y_t$ given the initial distribution $\xi_1$ and the history $g_t = (\xi_1, x_1, u_1, x_2, \dots, u_{t-1}, x_t)$:
$$\Xi_t(g_t)(A) = P[Y_t \in A \mid g_t], \quad \forall\, A \in \mathcal{B}(\mathcal{Y}), \quad t = 1, \dots, T.$$
Conditional distribution of the observable part:
$$P[X_{t+1} \in B \mid g_t, u_t] = \int_{\mathcal{Y}} Q_t^X(x_t, y, u_t)(B)\; \Xi_t(g_t)(dy),$$
where $Q_t^X(x_t, y_t, u_t)$ is the marginal of $Q_t(x_t, y_t, u_t)$ on the space $\mathcal{X}$.
Transition of the belief state (the Bayes operator):
$$\Xi_{t+1}(g_{t+1}) = \Phi_t\big(x_t, \Xi_t(g_t), u_t, x_{t+1}\big).$$
Example: $\mathcal{Y} = \{y_1, \dots, y_n\}$ and $Q_t(x, y, u)$ has density $q_t(x', y' \mid x, y, u)$:
$$\Phi_t(x, \xi, u, x')(\{y_k\}) = \frac{\sum_{i=1}^n q_t(x', y_k \mid x, y_i, u)\, \xi_i}{\sum_{\ell=1}^n \sum_{i=1}^n q_t(x', y_\ell \mid x, y_i, u)\, \xi_i}.$$
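
For the finite-$\mathcal{Y}$ example above, the Bayes operator is a normalized matrix-vector product. A minimal sketch (hypothetical densities; `q[i, k]` stands for $q_t(x', y_k \mid x, y_i, u)$ with $x$, $x'$, and $u$ fixed):

```python
import numpy as np

def bayes_operator(q, xi):
    """One step of the belief update on a finite hidden space Y.
    Returns the posterior belief, i.e. the slide's Phi_t(x, xi, u, x')."""
    joint = xi @ q                 # sum_i q_t(x', y_k | x, y_i, u) * xi_i
    return joint / joint.sum()     # normalize over the hidden states y_k

# Hypothetical two-state example: the observation favors hidden state y_2.
q = np.array([[0.8, 0.2],          # densities given y = y_1
              [0.1, 0.9]])         # densities given y = y_2
xi = np.array([0.5, 0.5])
print(bayes_operator(q, xi))       # posterior belief after observing x'
```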

Markov Risk Filters

Extended state history (including belief states): $h_t = (x_1, \xi_1, x_2, \xi_2, \dots, x_t, \xi_t) \in H_t$.
Policies $\pi = (\pi_1, \dots, \pi_T)$ with decision rules $\pi_t(h_t) \in U_t(x_t)$.
Markov Policy: for all $h_t, h'_t \in H_t$, if $x_t = x'_t$ and $\xi_t = \xi'_t$, then $\pi_t(h_t) = \pi_t(h'_t) = \pi_t(x_t, \xi_t)$.
Policy value function:
$$v_t^\pi(h_t) = \rho_{t,T}^\pi\big( c_t(x_t, Y_t, \pi_t(h_t)), \dots, c_T(X_T, Y_T, \pi_T(H_T)) \big)(h_t).$$
A family of risk filters $\{\rho_{t,T}^\pi\}_{\pi \in \Pi,\, t=1,\dots,T}$ is Markov if for all Markov policies $\pi \in \Pi$, and for all $h_t = (x_1, \dots, x_t)$ and $h'_t = (x'_1, \dots, x'_t)$ in $\mathcal{X}^t$ such that $x_t = x'_t$ and $\xi_t = \xi'_t$, we have
$$v_t^\pi(h_t) = v_t^\pi(h'_t) = v_t^\pi(x_t, \xi_t).$$

Structure of Markov Risk Filters (with Jingnan Fan)

Theorem. A family of risk filters $\{\rho_{t,T}^\pi\}_{\pi \in \Pi,\, t=1,\dots,T}$ is normalized, translation-invariant, stochastically conditionally time consistent, decomposable, and Markov if and only if transition risk mappings
$$\sigma_t : \big\{ (x_t, \xi_t, Q_t^X(h_t^\pi)) : \pi \in \Pi,\; h_t \in \mathcal{X}^t \big\} \times \mathcal{V} \to \mathbb{R}, \quad t = 1, \dots, T-1,$$
exist such that:
(i) $\sigma_t(x, \xi, m, \cdot)$ is normalized and strongly monotonic with respect to stochastic dominance;
(ii) for all $\pi \in \Pi$, all $t = 1, \dots, T-1$, and all $h_t \in \mathcal{X}^t$,
$$v_t^\pi(h_t) = r_t\big(x_t, \xi_t, \pi_t(h_t)\big) + \sigma_t\big( x_t, \xi_t, Q_t^X(h_t^\pi),\; v_{t+1}^\pi(h_t, \cdot) \big),$$
where $r_t(x_t, \xi_t, u) = \int_{\mathcal{Y}} c_t(x_t, y, u)\, \xi_t(dy)$ is the expected cost under the belief state.
Evaluation of a Markov policy $\pi$:
$$v_t^\pi(x_t, \xi_t) = r_t\big(x_t, \xi_t, \pi_t(x_t, \xi_t)\big) + \sigma_t\Big( x_t, \xi_t, Q_t^X\big(x_t, \xi_t, \pi_t(x_t, \xi_t)\big),\; x' \mapsto v_{t+1}^\pi\big( x', \Phi_t(x_t, \xi_t, \pi_t(x_t, \xi_t), x') \big) \Big).$$

Examples of Transition Risk Mappings

Average Value at Risk:
$$\sigma(x, \xi, m, v) = \min_{\eta \in \mathbb{R}} \Big\{ \eta + \frac{1}{\alpha(x, \xi)} \int_{\mathcal{X}} \big( v(x') - \eta \big)_+ \, m(dx') \Big\},$$
where $\alpha(x, \xi) \in [\alpha_{\min}, \alpha_{\max}] \subset (0, 1)$.
Mean Semideviation of Order $p$:
$$\sigma(x, \xi, m, v) = \int_{\mathcal{X}} v(x')\, m(dx') + \varkappa(x, \xi) \bigg( \int_{\mathcal{X}} \Big( v(x') - \mathbb{E}_m[v] \Big)_+^p \, m(dx') \bigg)^{1/p},$$
where $\varkappa(x, \xi) \in [0, 1]$.
Entropic Mapping:
$$\sigma(x, \xi, m, v) = \frac{1}{\gamma(x, \xi)} \ln \mathbb{E}_m\Big[ e^{\gamma(x, \xi)\, v(x')} \Big], \quad \gamma(x, \xi) > 0.$$
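
All three mappings are straightforward to implement for a discrete transition law $m$. A minimal sketch (hypothetical data; the risk-aversion constants $\alpha$, $\varkappa$, $\gamma$ are passed directly instead of as functions of $(x, \xi)$):

```python
import numpy as np

def avar_mapping(v, m, alpha):
    """min_eta { eta + E_m[(v - eta)_+] / alpha }: the minimum of this
    convex piecewise-linear function is attained at one of the values v."""
    return min(eta + np.maximum(v - eta, 0) @ m / alpha for eta in np.sort(v))

def mean_semideviation(v, m, kappa, p=2):
    """E_m[v] + kappa * ( E_m[ (v - E_m[v])_+^p ] )^(1/p)."""
    mean = v @ m
    return mean + kappa * (np.maximum(v - mean, 0) ** p @ m) ** (1 / p)

def entropic(v, m, gamma):
    """(1/gamma) * ln E_m[ exp(gamma * v) ]."""
    return np.log(np.exp(gamma * v) @ m) / gamma

# Hypothetical next-state values v(x') and transition distribution m(dx').
v = np.array([0.0, 1.0, 4.0])
m = np.array([0.5, 0.3, 0.2])
print(avar_mapping(v, m, alpha=0.25),
      mean_semideviation(v, m, kappa=0.5),
      entropic(v, m, gamma=1.0))
```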

Dynamic Programming

Risk-averse optimal control problem:
$$\min_\pi \; \rho_{1,T}\big( c_1(X_1, Y_1, U_1),\; c_2(X_2, Y_2, U_2),\; \dots,\; c_T(X_T, Y_T, U_T) \big)$$
Theorem. If the risk measure is Markovian (plus general conditions), then the optimal solution is given by the dynamic programming equations:
$$v_T(x, \xi) = \min_{u \in U_T(x)} r_T(x, \xi, u), \quad x \in \mathcal{X},\; \xi \in \mathcal{P}(\mathcal{Y}),$$
$$v_t(x, \xi) = \min_{u \in U_t(x)} \bigg\{ r_t(x, \xi, u) + \sigma_t\Big( x, \xi, \int_{\mathcal{Y}} Q_t^X(x, y, u)\, \xi(dy),\; x' \mapsto v_{t+1}\big( x', \Phi_t(x, \xi, u, x') \big) \Big) \bigg\},$$
for $x \in \mathcal{X}$, $\xi \in \mathcal{P}(\mathcal{Y})$, $t = T-1, \dots, 1$.
Optimal Markov policy $\hat\pi = \{\hat\pi_1, \dots, \hat\pi_T\}$: the minimizers above.
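
The sketch below (hypothetical stationary data, not from the talk) runs these backward equations on a tiny model with $\mathcal{X} = \mathcal{Y} = \mathcal{U} = \{0, 1\}$, AVaR as the transition risk mapping, and the belief discretized on a grid with linear interpolation:

```python
import numpy as np

T = 5
rng = np.random.default_rng(0)
# q[x, y, u, x', y'] plays the role of q_t(x', y' | x, y, u)
q = rng.dirichlet(np.ones(4), size=(2, 2, 2)).reshape(2, 2, 2, 2, 2)
c = rng.uniform(0, 1, size=(2, 2, 2))          # stage cost c(x, y, u)
grid = np.linspace(0.0, 1.0, 101)              # grid for xi = P[Y = 0]

def avar(values, probs, alpha=0.3):            # transition risk mapping
    return min(eta + np.maximum(values - eta, 0) @ probs / alpha
               for eta in values)

def step(x, xi, u):
    """Distribution of x' and the posterior xi' = Phi(x, xi, u, x')."""
    bel = np.array([xi, 1.0 - xi])
    joint = np.einsum('y,yab->ab', bel, q[x, :, u])   # P[x', y' | x, xi, u]
    m = joint.sum(axis=1)                             # marginal law of x'
    post = joint[:, 0] / np.maximum(m, 1e-12)         # P[y' = 0 | x', ...]
    return m, post

v = np.zeros((2, grid.size))                   # v_{T+1} = 0, so v_T = min r_T
for t in range(T, 0, -1):
    v_new = np.empty_like(v)
    for x in range(2):
        for k, xi in enumerate(grid):
            best = np.inf
            for u in range(2):
                r = xi * c[x, 0, u] + (1 - xi) * c[x, 1, u]   # r_t(x, xi, u)
                m, post = step(x, xi, u)
                nxt = np.array([np.interp(post[x2], grid, v[x2])
                                for x2 in range(2)])
                best = min(best, r + avar(nxt, m))
            v_new[x, k] = best
    v = v_new
print(v[:, ::25])        # optimal risk-averse values on a coarse belief grid
```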

Risk-Averse Clinical Trials (Darinka Dentcheva and Curtis McGinity)

In stages $t = 1, \dots, T$, successive patients are given drugs (cytotoxic agents), to which a severe toxic response (even death) is possible.
The probability of a toxic response ($x_{t+1} = 1$) depends on the unknown optimal dose $\gamma$ and the administered dose (control) $u_t$:
$$F(u_t, \gamma) = 1 - e^{-\varphi(u_t, \gamma)}.$$
The belief state $\xi_t$, the conditional probability distribution of the unknown optimal dose, is the current state of the system. The state evolves according to the Bayes operator, depending on the response of the patient: for $\gamma \in \mathcal{Y}$ (the range of doses),
$$\xi_{t+1}(\gamma) \propto \begin{cases} F(u_t, \gamma)\, \xi_t(\gamma) & \text{if toxic } (x_{t+1} = 1), \\ \big(1 - F(u_t, \gamma)\big)\, \xi_t(\gamma) & \text{if not toxic } (x_{t+1} = 0). \end{cases}$$
Cost per stage: $c_t(\gamma, u_t) = \kappa_t |u_t - \gamma|$ (other forms are possible).
Medical ethics naturally motivates risk-averse control.
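
The belief update above is easy to simulate on a discretized dose range. A minimal sketch, with a hypothetical response function $\varphi(u, \gamma) = (u - \gamma)_+$ chosen purely for illustration:

```python
import numpy as np

doses = np.linspace(0.1, 2.0, 200)             # discretized range of gamma
xi = np.ones_like(doses) / doses.size          # prior belief on the optimal dose

def F(u, gamma):                               # probability of a toxic response
    return 1.0 - np.exp(-np.maximum(u - gamma, 0.0))   # hypothetical phi

def update(xi, u, toxic):
    """Bayes operator: reweight the belief by the response likelihood."""
    lik = F(u, doses) if toxic else 1.0 - F(u, doses)
    post = lik * xi
    return post / post.sum()

xi = update(xi, u=0.8, toxic=False)            # patient tolerated dose 0.8
xi = update(xi, u=1.2, toxic=True)             # toxic response at dose 1.2
print(doses[np.argmax(xi)])                    # posterior mode of the optimal dose
```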

Total Cost Models

Find the best policy $\pi = (\pi_1, \dots, \pi_{T+1})$ to determine the doses $u_t = \pi_t(\xi_t)$.
Expected Value Model:
$$\min_{\pi \in \Pi} \; \mathbb{E}\bigg[ \sum_{t=1}^{T+1} \kappa_t\, |u_t - \gamma| \bigg],$$
where $\kappa_{T+1}$ is the weight of the final recommendation $u_{T+1}$.
Risk-Averse Model:
$$\min_{\pi \in \Pi} \; \rho_{1,T+1}\Big( \big\{ \kappa_t\, |u_t - \gamma| \big\}_{t=1,\dots,T+1} \Big).$$
Two sources of risk: the unknown state (only the belief state $\xi_t$ is available at time $t$), and the unknown evolution of $\{\xi_t\}$ due to the random responses of patients.

Dynamic Programming Equations

All memory is carried by the belief state $\xi_t$. For each $t$ and $u_t$, only two next states are possible, corresponding to $x_{t+1} = 0$ or $1$.
Simplified equation:
$$v_t(\xi) = \min_u \bigg\{ r_t(\xi, u) + \sigma_t\Big( \xi,\; \int_{\mathcal{Y}} P[x' = 1 \mid y, u]\, \xi(dy),\; v_{t+1}\big( \Phi_t(\xi, u, \cdot) \big) \Big) \bigg\}.$$
Examples: $r_t(\xi, u) = \mathbb{E}_\xi\big[ \kappa_t |u - \gamma| \big]$, possibly with a semideviation term of order $p$; for $\sigma_t$, anything from $\mathbb{E}[\varphi]$ to $\max_{x' \in \{0,1\}} \varphi(x')$.
Any law-invariant risk measure on the space of functions on $\mathcal{Y}$ (for $r_t$) or on $\mathcal{Y} \times \{0, 1\}$ (in the case of $\sigma_t$) can be used here.

Limited Lookahead Policies

At each time $t$, assume that this is the last test before the final recommendation, and solve the resulting two-stage problem.
Risk-Neutral:
$$\min_{u_t} \; \mathbb{E}_{\xi_t}\Big[ \kappa_t |u_t - \gamma| \Big] + \bar\kappa_{t+1} \operatorname*{\mathbb{E}}_{\text{response } x_{t+1}} \Big[ \min_{u_{t+1}} \mathbb{E}_{\xi_{t+1}}\big[ |u_{t+1} - \gamma| \big] \Big]$$
Risk-Averse:
$$\min_{u_t} \; \mathbb{E}_{\xi_t}\Big[ \kappa_t |u_t - \gamma| \Big] + \bar\kappa_{t+1} \max_{\text{response } x_{t+1}} \Big[ \min_{u_{t+1}} \mathbb{E}_{\xi_{t+1}}\big[ |u_{t+1} - \gamma| \big] \Big]$$
where $\bar\kappa_{t+1} = \sum_{\tau = t+1}^{T+1} \kappa_\tau$ (the weight of the future).
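
Continuing the clinical-trial sketch above (same hypothetical `doses`, `F`, `update`, and belief `xi`), the risk-averse lookahead replaces the expectation over the patient's response by the worst case; weighting the two responses by their posterior probabilities instead would give the risk-neutral variant:

```python
def lookahead_cost(u, xi, kappa_t, kappa_future):
    stage = kappa_t * (np.abs(u - doses) @ xi)       # E_xi[ kappa_t |u - gamma| ]
    worst = -np.inf
    for toxic in (True, False):                      # worst case over the response
        post = update(xi, u, toxic)
        best_next = min(np.abs(u2 - doses) @ post for u2 in doses)
        worst = max(worst, best_next)
    return stage + kappa_future * worst

u_star = min(doses, key=lambda u: lookahead_cost(u, xi, 1.0, 3.0))
print(u_star)                                        # risk-averse lookahead dose
```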

Simulation Results for Expected Value and Risk-Averse Policies

[Figure: "Comparison of Policies: Risk-Averse vs. Expected Value Lookahead": the distribution of dosage under the risk-averse lookahead and the expected value lookahead policies, with $T = 20$, $\eta = 2.6$, and the optimal dose $\eta^*_{T+1}$ marked on the dose axis. From McGinity, Dentcheva, and Ruszczyński, Risk-Averse Optimal Learning for Clinical Trial Dose Escalation.]

Machine Deterioration (with Jingnan Fan)

We consider the problem of minimizing the costs of operating a machine over $T$ periods.
Unobserved state: $y_t \in \{1, 2\}$, with 1 being the good and 2 the bad state.
Observed state: $x_t$, the cost incurred in the previous period.
Control: $u_t \in \{0, 1\}$, with 0 meaning "continue" and 1 meaning "replace".
The dynamics of $Y$ is Markovian, with the transition matrices $K^{[u]}$:
$$K^{[0]} = \begin{pmatrix} 1-p & p \\ 0 & 1 \end{pmatrix}, \qquad K^{[1]} = \begin{pmatrix} 1-p & p \\ 1-p & p \end{pmatrix}.$$
Distribution of the costs:
$$P\big[ x_{t+1} \ge C \mid y_t = i,\; u_t = 0 \big] = \int_C^{+\infty} f_i(x)\, dx, \quad i = 1, 2,$$
$$P\big[ x_{t+1} \ge C \mid y_t = i,\; u_t = 1 \big] = \int_C^{+\infty} f_1(x)\, dx, \quad i = 1, 2.$$
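
A minimal sketch of the belief update in this model (hypothetical uniform densities $f_1$, $f_2$; note that the observed cost $x_{t+1}$ is generated under the previous hidden state $y_t$, so the correction step precedes the prediction step):

```python
import numpy as np

p = 0.2
K = {0: np.array([[1 - p, p], [0.0, 1.0]]),     # continue
     1: np.array([[1 - p, p], [1 - p, p]])}     # replace: both rows restart as new

def f(i, x):
    """Uniform densities: good-state costs on [0, 1], bad-state on [0.5, 1.5],
    so the likelihood ratio f1/f2 is non-increasing, as in the next slide."""
    lo, hi = (0.0, 1.0) if i == 1 else (0.5, 1.5)
    return 1.0 if lo <= x <= hi else 0.0

def belief_update(xi, u, x_next):
    """xi is the conditional probability of the good state."""
    # Correction: x_{t+1} was generated under y_t (and under f1 after a replacement).
    lik = np.array([f(1, x_next), f(1, x_next) if u == 1 else f(2, x_next)])
    post = lik * np.array([xi, 1 - xi])
    post /= post.sum()
    # Prediction: the hidden state then moves according to K[u].
    return (post @ K[u])[0]

print(belief_update(0.9, u=0, x_next=1.2))   # a cost above 1 reveals the bad state
```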

Value and Policy Monotonicity

Belief state: $\xi_t \in [0, 1]$, the conditional probability of the good state.
The optimal value functions have the form $v_t^*(x, \xi) = x + w_t^*(\xi)$, $t = 1, \dots, T+1$.
Dynamic programming equations:
$$w_t^*(\xi) = \min\Big\{ r + \sigma\big( f_1,\; x' \mapsto x' + w_{t+1}^*(1 - p) \big),\;\; \sigma\big( \xi f_1 + (1 - \xi) f_2,\; x' \mapsto x' + w_{t+1}^*(\Phi(\xi, x')) \big) \Big\},$$
with the final stage value $w_{T+1}^*(\xi) = 0$ (the first term corresponds to replacement at cost $r$, the second to continuation).
If $f_1/f_2$ is non-increasing, then the functions $w_t^*(\cdot)$ are non-increasing, and thresholds $\xi_t^* \in [0, 1]$, $t = 1, \dots, T$, exist such that the policy
$$u_t^* = \begin{cases} 0 & \text{if } \xi_t > \xi_t^*, \\ 1 & \text{if } \xi_t \le \xi_t^*, \end{cases}$$
is optimal.

Numerical Illustration

Cost distributions $f_1$ and $f_2$: uniform, with $\int_0^C f_1(x)\, dx \ge \int_0^C f_2(x)\, dx$ for all $C$ (costs in the good state are stochastically smaller).
Transition risk mapping: mean semideviation.
[Figure: empirical distribution of the total cost for the risk-neutral model (blue) and the risk-averse model (orange).]

Partially Observable Jump Process

The unobserved process $\{Y_t\}_{0 \le t \le T}$: a finite-state Markov jump process on the space $\mathcal{Y} = \{1, \dots, n\}$ with the generator $\Lambda(t)$:
$$\lambda_{ij}(t) = \begin{cases} \lim_{\varepsilon \downarrow 0} \dfrac{1}{\varepsilon}\, P\big[ Y_{t+\varepsilon} = j \mid Y_t = i \big] & \text{if } j \ne i, \\[4pt] -\sum_{k \ne i} \lambda_{ik}(t) & \text{if } j = i. \end{cases}$$
The observed process $\{X_t\}_{0 \le t \le T}$: a diffusion following the SDE
$$dX_t = A(Y_t, t)\, dt + B(t)\, dW_t, \quad 0 \le t \le T,$$
with the initial value $x_0$ independent of $Y_0$; $\{W_t\}$ is a Wiener process.
Random final cost: $\Phi(Y_T)$.

The Belief State Equation (Wonham, 1964)

Filtration of observable events: $\{\mathcal{F}_t^X\}_{0 \le t \le T}$.
The belief state: $\pi_i(t) = P\big[ Y_t = i \mid \mathcal{F}_t^X \big]$, $i = 1, \dots, n$.
Belief State Equation:
$$d\pi_i(s) = (\Lambda^\top \pi)_i(s)\, ds + \pi_i(s)\, \frac{A(i, s) - \bar A(s)}{B(s)}\, d\bar W_s, \qquad \pi_i(0) = p_i,$$
where
$$(\Lambda^\top \pi)_i(s) = \sum_{j=1}^n \lambda_{ji}(s)\, \pi_j(s), \qquad \bar A(s) = \sum_{j=1}^n A(j, s)\, \pi_j(s),$$
and $\{\bar W_t\}_{0 \le t \le T}$ is a Wiener process (the innovation process) given by the formula
$$\bar W_t = \int_0^t \frac{dX_s - \bar A(s)\, ds}{B(s)}.$$
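
A minimal Euler-Maruyama sketch (hypothetical two-state data) that simulates the hidden jump process, the observation SDE, and the filter equation above, with a projection step to keep the discretized belief on the simplex:

```python
import numpy as np

rng = np.random.default_rng(1)
T, dt = 1.0, 1e-3
steps = int(T / dt)
Lam = np.array([[-1.0, 1.0], [0.5, -0.5]])     # generator Lambda (rows sum to 0)
A = np.array([0.0, 2.0])                        # drift A(i) per hidden state
B = 0.5                                         # observation volatility
pi = np.array([0.5, 0.5])                       # initial belief p
y, x = 0, 0.0                                   # hidden state Y_0 and X_0

for _ in range(steps):
    # Hidden Markov jump process: leave state y with rate -Lam[y, y].
    if rng.random() < -Lam[y, y] * dt:
        y = 1 - y
    # Observation: dX = A(Y) dt + B dW.
    dx = A[y] * dt + B * np.sqrt(dt) * rng.normal()
    x += dx
    # Wonham filter driven by the innovation dW_bar = (dX - A_bar dt) / B.
    Abar = A @ pi
    dWbar = (dx - Abar * dt) / B
    pi = pi + (Lam.T @ pi) * dt + pi * (A - Abar) / B * dWbar
    pi = np.clip(pi, 1e-9, None)                # guard against discretization
    pi /= pi.sum()                              # drifting off the simplex

print(y, pi)    # true terminal state and the filtered probabilities
```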

Risk Filtering by BSDE (with Ruofan Yan)

Suppose $\pi(t) = p$ and we use a Markov risk filter $\{\varrho_{t,T}\}_{0 \le t \le T}$.
The value function for the final cost case:
$$V(t, p) = \varrho_{t,T}\big[ \Phi(Y_T^{t,p}) \big].$$
Structure of $\varrho_{t,T}(\cdot)$ [using Coquet, Hu, Mémin, Peng (2002)]: if the filter is monotonic, normalized, time consistent, and has the local property (plus minor growth conditions), then a driver $g : [0, T] \times \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}$ exists such that $\varrho_{t,T}\big[ \Phi(Y_T^{t,p}) \big] = V_t$, where $(V, Z)$ solves the backward stochastic differential equation
$$dV_s = -g(s, V_s, Z_s)\, ds + Z_s\, d\bar W_s, \quad s \in [t, T], \qquad V_T = \varrho_{T,T}\big[ \Phi(Y_T^{t,p}) \big].$$

Equivalent Forward-Backward System

Under the additional condition of law invariance of $\varrho_{T,T}[\cdot]$, we obtain the following system.
Forward SDE for the belief state: for $i = 1, \dots, n$ and $0 \le t \le s \le T$,
$$d\pi_i(s) = (\Lambda^\top \pi)_i(s)\, ds + \pi_i(s)\, \frac{A(i, s) - \bar A(s)}{B(s)}\, d\bar W_s, \qquad \pi_i(t) = p_i.$$
Backward SDE for the risk measure: for $0 \le t \le s \le T$,
$$dV_s = -g(s, V_s, Z_s)\, ds + Z_s\, d\bar W_s, \quad s \in [t, T], \qquad V_T = r_T\big( \Phi, \pi(T) \big).$$
The functional $r_T(\cdot, \cdot)$ is a law-invariant risk measure.
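
For a finite state space, the terminal condition is simply a law-invariant risk measure of the vector of final costs under the terminal belief. A minimal sketch with AVaR playing the role of $r_T$ (hypothetical data):

```python
import numpy as np

def r_T(phi, pi_T, alpha=0.2):
    """AVaR of the final cost Phi(Y_T) under the terminal belief pi(T)."""
    return min(eta + np.maximum(phi - eta, 0) @ pi_T / alpha for eta in phi)

phi = np.array([0.0, 10.0])              # hypothetical final costs of the states
print(r_T(phi, np.array([0.7, 0.3])))    # V_T fed into the backward equation
```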

The Running Cost Case

Functional with running cost:
$$Z_T = \int_0^T c(t)\, dt + \Phi(Y_T).$$
The value function: $V(t, p) = \varrho_{t,T}[Z_T]$.
Forward SDE for the belief state: for $i = 1, \dots, n$ and $0 \le t \le s \le T$,
$$d\pi_i(s) = (\Lambda^\top \pi)_i(s)\, ds + \pi_i(s)\, \frac{A(i, s) - \bar A(s)}{B(s)}\, d\bar W_s, \qquad \pi_i(t) = p_i.$$
Backward SDE for the risk measure: for $0 \le t \le s \le T$,
$$dV_s = -\big[ c(s) + g(s, V_s, Z_s) \big]\, ds + Z_s\, d\bar W_s, \quad s \in [t, T], \qquad V_T = r_T\big( \Phi, \pi(T) \big).$$

Controlled Process

Controlled transition rates: $\lambda_{ij}(t, \nu)$, $\nu \in U$, where $U$ is a bounded set; the rates are uniformly bounded.
Piecewise-constant controls: for $0 = t_0 < t_1 < t_2 < \cdots < t_N = T$, we define
$$\mathcal{U}_i^N = \big\{ u \in \mathcal{U} \;\big|\; u(t) = u(t_j) \;\; \forall\, t \in [t_j, t_{j+1}),\;\; \forall\, j = i, \dots, N-1 \big\},$$
where $\mathcal{U}$ is the set of $U$-valued processes adapted to $\{\mathcal{F}_t\}_{0 \le t \le T}$.
Value function for a fixed control:
$$V^u(t_j, p) = \varrho_{t_j, t_{j+1}}\bigg[ \int_{t_j}^{t_{j+1}} c\big( \pi^{t_j, p; u}(s), u_s \big)\, ds + V^u\big( t_{j+1}, \pi^{t_j, p; u}(t_{j+1}) \big) \bigg],$$
where $\pi^{t_j, p; u}(\cdot)$ is the belief process restarted at $t_j$ from the value $p$, while the system is controlled by $u(\cdot) = u(t_j)$ in the interval $[t_j, t_{j+1})$.

Dynamic Programming

Optimal value function: $\hat V(t_j, p) = \inf_{u \in \mathcal{U}_j^N} V^u(t_j, p)$, so that
$$\hat V(t_j, p) = \inf_{\nu \in U} \; \varrho_{t_j, t_{j+1}}\bigg[ \int_{t_j}^{t_{j+1}} c\big( \pi^{t_j, p; \nu}(r), \nu \big)\, dr + \hat V\big( t_{j+1}, \pi^{t_j, p; \nu}(t_{j+1}) \big) \bigg].$$
Each $\varrho_{t_j, t_{j+1}}[\cdot]$ is given by a controlled FBSDE system on $[t_j, t_{j+1}]$:
$$d\pi_i^{t_j, p; \nu}(s) = \big( \Lambda(\nu)^\top \pi \big)_i(s)\, ds + \pi_i^{t_j, p; \nu}(s)\, \frac{A(i, s) - \bar A(s)}{B(s)}\, d\bar W_s, \qquad \pi_i^{t_j, p; \nu}(t_j) = p_i,$$
$$dV_s = -\big[ c\big( \pi^{t_j, p; \nu}(s), \nu \big) + g(s, V_s, Z_s) \big]\, ds + Z_s\, d\bar W_s, \qquad V_{t_{j+1}} = \hat V\big( t_{j+1}, \pi^{t_j, p; \nu}(t_{j+1}) \big).$$
Further research: numerical methods for this FBSDE system.

References

A. Ruszczyński, Risk-averse dynamic programming for Markov decision processes, Mathematical Programming, Series B 125 (2010) 235–261.

Ö. Çavuş and A. Ruszczyński, Computational methods for risk-averse undiscounted transient Markov models, Operations Research 62 (2) (2014) 401–417.

Ö. Çavuş and A. Ruszczyński, Risk-averse control of undiscounted transient Markov models, SIAM Journal on Control and Optimization 52 (6) (2014) 3935–3966.

J. Fan and A. Ruszczyński, Risk measurement and risk-averse control of partially observable discrete-time Markov systems, Mathematical Methods of Operations Research (2018) 1–24.

C. McGinity, D. Dentcheva and A. Ruszczyński, Risk-averse approximate dynamic programming for partially observable systems with application to clinical trials, in preparation.

R. Yan, J. Yao and A. Ruszczyński, Risk filtering and risk-averse control of partially observable Markov jump processes, submitted for publication.

D. Dentcheva and A. Ruszczyński, Risk measures for continuous-time Markov chains, submitted for publication.