A weak dynamic programming principle for zero-sum stochastic differential games


A weak dynamic programming principle for zero-sum stochastic differential games

Pedro Miguel Almeida Serra Costa Vitória

Dissertation for the degree of Master in Mathematics and Applications

Jury
President: Prof. Ana Bela Cruzeiro
Supervisor: Prof. Diogo Aguiar Gomes
Member: Prof. António Atalaia Serra
Member: Dr. Gabriele Terrone

June 2010


"I was born not knowing and have had only a little time to change that here and there."
Richard Feynman


Acknowledgments

First I would like to thank Prof. Nizar Touzi, who initially proposed the problem that gave rise to this thesis and supervised this work in its early stage. I am also grateful to my adviser, Prof. Diogo Gomes, who took this project under his supervision. For his guidance and numerous insightful comments I deeply express my gratitude.

This project started at École Polytechnique, where I had the pleasure of staying for a whole semester. I am grateful for their hospitality.

My parents deserve a special acknowledgment, for all their love and unconditional support.

Last but not least, I would like to thank the numerous people (friends, colleagues and mentors) who, through their friendship, influence and encouragement, contributed to these last five fruitful and joyful years of my personal and academic development.


Resumo

In this work we extend a weak version of the dynamic programming principle, first proved in [1] for optimal control problems, to zero-sum stochastic differential game problems. In this way we are able to derive the Hamilton-Jacobi-Bellman-Isaacs equation when one of the players may use strategies taking values in an unbounded set.

Keywords: stochastic differential games, value function, dynamic programming principle, viscosity solutions.


Abstract

We extend a weak version of the dynamic programming principle, first proven in [1] for stochastic control problems, to the context of zero-sum stochastic differential games. By doing so we are able to derive the Hamilton-Jacobi-Bellman-Isaacs equation when one of the players is allowed to use strategies taking values in an unbounded set.

Keywords: stochastic differential games, value function, dynamic programming principle, viscosity solutions.


Contents

Acknowledgments
Resumo
Abstract
Preface

1 Deterministic optimal control
   1.1 Introduction
   1.2 The controlled dynamical system
   1.3 Value function
   1.4 Dynamic programming principle
   1.5 Hamilton-Jacobi-Bellman equation

2 Stochastic optimal control
   2.1 Introduction
   2.2 The controlled Markov diffusion
   2.3 Value function
   2.4 Dynamic programming principle
   2.5 Hamilton-Jacobi-Bellman equation
   2.6 Verification Theorem
   2.7 Merton's optimal portfolio

3 Deterministic differential games
   3.1 Introduction
   3.2 The controlled dynamical system
   3.3 Lower and upper values
   3.4 Dynamic programming principle
   3.5 Hamilton-Jacobi-Bellman-Isaacs equation

4 Stochastic differential games
   Introduction
   Preliminaries
   The Markovian scenario: state dynamics, admissible controls, terminal reward, strategies, lower and upper values
   Properties of strategies
   Properties of the value function: non-randomness, growth rate, continuity in the space variable
   Weak dynamic programming principle
   Optimal stochastic control as a particular case; continuity of the reward function
   Hamilton-Jacobi-Bellman-Isaacs equation; uniqueness
   Merton's optimal portfolio: worst-case approach

Conclusions and further research

Notation

A Viscosity solutions
   A.1 Notion of viscosity solution
   A.2 Uniqueness for the Dirichlet problem
   A.3 Discontinuous viscosity solutions
   A.4 Parabolic equations

B Stochastic calculus
   B.1 Preliminaries
      B.1.1 Brownian motion and filtration
      B.1.2 Stopping times and progressive measurability
      B.1.3 Martingales and local martingales
   B.2 Stochastic integral and Itô's formula
      B.2.1 Itô processes
      B.2.2 Itô's formula
   B.3 Martingale representation
   B.4 Girsanov's Theorem
   B.5 Stochastic differential equations
   B.6 Controlled diffusions

C A measure result

Preface

In [1], Bouchard and Touzi propose a weak version of the dynamic programming principle in the context of stochastic optimal control. Their objective was to avoid the technical difficulties related to the measurable selection argument. By doing this they were able to derive the dynamic programming equation without requiring the value function to be measurable. One question that arises naturally is how to extend this approach to stochastic differential games.

Zero-sum stochastic differential games were studied rigorously for the first time by Fleming and Souganidis in [2]. These problems are usually studied in a setting where strong assumptions imply that the value function is continuous, hence measurable. Considering a weak version of the dynamic programming principle gives us the opportunity of studying these problems in a more general setting, where the value function does not a priori have much regularity.

This thesis tackles the problem of extending the weak dynamic programming principle of [1] to the context of stochastic differential games. This is done in Chapter 4. For the convenience of the reader, the first chapters contain short introductions to the theories of deterministic optimal control, stochastic optimal control and deterministic two-person zero-sum differential games. In addition, two appendices at the end briefly recall and develop some basic notions and results on second-order viscosity solutions and stochastic calculus.


Chapter 1
Deterministic optimal control

In this chapter we consider the deterministic optimal control problem in the finite horizon setting. Our objective is to establish the dynamic programming principle and derive from it the Hamilton-Jacobi-Bellman equation. We aim to stress the main ideas and arguments. Thus, we consider the Mayer problem (no running cost) with compact-valued controls and bounded value function. For a detailed exposition we refer the reader to [3].

1.1 Introduction

We start by outlining in a brief and informal way the main ideas of this chapter. Consider a state variable, $X$, driven by a control $\nu$ through the following nonlinear system:
$$dX(s) = \mu\big(s, X(s); \nu(s)\big)\, ds,$$
with initial condition $X(t) = x$. Since this variable $X$ depends on $(t, x, \nu)$, we use the notation $X := X^\nu_{t,x}$. We are interested in the terminal value of this variable, $X^\nu_{t,x}(T)$. More precisely, we are interested in the quantity
$$J(t, x; \nu) := f\big(X^\nu_{t,x}(T)\big),$$
which we want to maximize over all admissible controls. Thus we want to determine the value function
$$v(t, x) := \sup_\nu J(t, x; \nu).$$
We will use a dynamic programming approach to this optimization problem. To establish heuristically the dynamic programming principle we assume that there is an optimal control $\nu^*$, i.e., there exists $\nu^*$ such that, for all $(t, x)$, $v(t, x) = J(t, x; \nu^*)$. Then, given $\tau \geq t$, we have, by the flow property for ordinary differential equations, that
$$v(t, x) = f\big(X^{\nu^*}_{t,x}(T)\big) = f\big(X^{\nu^*}_{\tau,\, X^{\nu^*}_{t,x}(\tau)}(T)\big) = J\big(\tau, X^{\nu^*}_{t,x}(\tau); \nu^*\big) = v\big(\tau, X^{\nu^*}_{t,x}(\tau)\big).$$

If the existence of an optimal control is not assumed, then we should expect the previous equality to be replaced by the so-called dynamic programming principle:
$$v(t,x) = \sup_\nu v\big(\tau, X^\nu_{t,x}(\tau)\big).$$
The dynamic programming principle gives us important information on the local behavior of the value function. Letting $\tau \downarrow t$ we are able to derive an infinitesimal version of it, the Hamilton-Jacobi-Bellman equation. Indeed, if we assume that $v$ is $C^1$, then we have, by the chain rule, that
$$0 = \sup_\nu \lim_{\tau\downarrow t}\frac{v\big(\tau, X^\nu_{t,x}(\tau)\big) - v(t,x)}{\tau - t} = \sup_{u\in U}\Big[\partial_t v(t,x) + \big\langle \mu(t,x;u), Dv(t,x)\big\rangle\Big].$$
Thus, we should expect $v$ to be a solution of the following Hamilton-Jacobi-Bellman (HJB) equation:
$$-\partial_t v + \inf_{u}\big\langle -\mu(\cdot\,;u), Dv\big\rangle(t,x) = 0.$$
Since in many applications the value function lacks the regularity to be a classical solution of the previous equation, we need a weak notion of solution for such an equation. As it turns out, the theory of viscosity solutions provides the adequate framework for this derivation. In fact, as we will see, the value function is a viscosity solution of the HJB equation.

1.2 The controlled dynamical system

We consider a nonlinear system in $\mathbb{R}^d$, written in differential form as
$$\begin{cases} dX^\nu_{t,x}(s) = \mu\big(s, X^\nu_{t,x}(s); \nu(s)\big)\, ds \\ X^\nu_{t,x}(t) = x, \end{cases} \tag{1.1}$$
where $\nu$ is a control function in a set $\mathcal{U}$ to be specified later, $(t, x)$ are the initial conditions and $\mu: [0,T]\times\mathbb{R}^d\times U \to \mathbb{R}^d$ is a continuous function which is Lipschitz continuous in the space variable, that is,
$$|\mu(t,x;u) - \mu(t,y;u)| \leq K|x-y|,$$
for some constant $K$. We call $[0,T]\times\mathbb{R}^d$ the state space and we denote it by $S$. The space of admissible controls, $\mathcal{U}$, is defined as
$$\mathcal{U} := \{\psi: [0,T]\to U : \psi \text{ measurable}\},$$
where $U\subset\mathbb{R}^M$ is a compact set. Under the assumptions made on $\mu$, the existence and uniqueness of a solution $X^\nu_{t,x}$ of (1.1) follows from the standard theory of ordinary differential equations, for each $\nu$. Furthermore, there is continuity with respect to the initial conditions $(t,x)$, uniformly in $t$ and $\nu$. More precisely, for each $x$, there is a constant $C_x$, depending only on $K, T, x$, such that for all $\nu\in\mathcal{U}$, $t\in[0,T]$, $t'\geq t$, $s\geq t'$, $x'\in B_1(x)$,
$$\big|X^\nu_{t',x'}(s) - X^\nu_{t,x}(s)\big| \leq C_x\big(|x'-x| + |t'-t|^{1/2}\big). \tag{1.2}$$
This estimate is a particular case of Lemma 148, when there is no diffusion.

1.3 Value function

In this section we characterize our problem. We consider a terminal reward which we want to maximize,
$$J(t,x;\nu) := f\big(X^\nu_{t,x}(T)\big).$$

The payoff function $f$ is assumed to be bounded and continuous. The problem we consider is to determine the value function given by
$$v(t,x) := \sup_{\nu\in\mathcal{U}} J(t,x;\nu).$$
The following regularity result for $v$ is easy to obtain.

Proposition 1. $v$ is bounded and continuous in $[0,T]\times\mathbb{R}^d$.

Proof. The boundedness of $v$ follows directly from the boundedness of $f$. To prove continuity we consider $\nu^\varepsilon$ such that
$$v(t,x) \leq J(t,x;\nu^\varepsilon) + \varepsilon.$$
Then
$$v(t,x) - v(t',x') \leq J(t,x;\nu^\varepsilon) - J(t',x';\nu^\varepsilon) + \varepsilon = f\big(X^{\nu^\varepsilon}_{t,x}(T)\big) - f\big(X^{\nu^\varepsilon}_{t',x'}(T)\big) + \varepsilon \longrightarrow \varepsilon,$$
where the convergence, as $(t',x')\to(t,x)$, follows from the continuity of $f$ and from (1.2). Thus, for $(t',x')$ sufficiently close to $(t,x)$, we have
$$v(t,x) - v(t',x') \leq 2\varepsilon.$$
The inequality $v(t',x') - v(t,x) \leq 2\varepsilon$ is obtained in an analogous way.

1.4 Dynamic programming principle

In this section we establish the dynamic programming principle for the value function $v$. The essential ingredient is the following property:
$$J(t,x;\nu_1\oplus_\tau\nu_2) = J\big(\tau, X^{\nu_1}_{t,x}(\tau); \nu_2\big), \tag{1.3}$$
where
$$(\nu_1\oplus_\tau\nu_2)(s) := \nu_1(s)\mathbf{1}_{[t,\tau]}(s) + \nu_2(s)\mathbf{1}_{(\tau,T]}(s)$$
denotes the concatenation of the controls $\nu_1, \nu_2$ at time $\tau\geq t$. This property follows directly from the flow property for dynamical systems and a simple computation:
$$J(t,x;\nu_1\oplus_\tau\nu_2) = f\big(X^{\nu_1\oplus_\tau\nu_2}_{t,x}(T)\big) = f\big(X^{\nu_2}_{\tau,\,X^{\nu_1}_{t,x}(\tau)}(T)\big) = J\big(\tau, X^{\nu_1}_{t,x}(\tau); \nu_2\big).$$

Theorem 2 (Dynamic programming principle). For all $x\in\mathbb{R}^d$, $t\in[0,T]$ and $\tau\in[t,T]$ the following holds:
$$v(t,x) = \sup_{\nu\in\mathcal{U}} v\big(\tau, X^\nu_{t,x}(\tau)\big).$$

Proof. We prove the two inequalities separately.

Step 1: $v(t,x) \leq \sup_{\nu\in\mathcal{U}} v\big(\tau, X^\nu_{t,x}(\tau)\big)$.

By (1.3) we have
$$J(t,x;\nu) = J\big(\tau, X^\nu_{t,x}(\tau); \nu\big) \leq \sup_{\nu_2\in\mathcal{U}} J\big(\tau, X^\nu_{t,x}(\tau); \nu_2\big) = v\big(\tau, X^\nu_{t,x}(\tau)\big).$$
Taking the supremum over $\nu$ on both sides of the inequality yields
$$v(t,x) \leq \sup_{\nu\in\mathcal{U}} v\big(\tau, X^\nu_{t,x}(\tau)\big).$$

Step 2: $v(t,x) \geq \sup_{\nu\in\mathcal{U}} v\big(\tau, X^\nu_{t,x}(\tau)\big)$.

Fix $\varepsilon>0$ and consider $\nu_1, \nu_2\in\mathcal{U}$ such that
$$\sup_{\nu\in\mathcal{U}} v\big(\tau, X^\nu_{t,x}(\tau)\big) \leq v\big(\tau, X^{\nu_1}_{t,x}(\tau)\big) + \varepsilon, \qquad v\big(\tau, X^{\nu_1}_{t,x}(\tau)\big) \leq J\big(\tau, X^{\nu_1}_{t,x}(\tau); \nu_2\big) + \varepsilon.$$
Then
$$\sup_{\nu\in\mathcal{U}} v\big(\tau, X^\nu_{t,x}(\tau)\big) \leq v\big(\tau, X^{\nu_1}_{t,x}(\tau)\big) + \varepsilon \leq J\big(\tau, X^{\nu_1}_{t,x}(\tau); \nu_2\big) + 2\varepsilon = J(t,x;\nu_1\oplus_\tau\nu_2) + 2\varepsilon \leq v(t,x) + 2\varepsilon.$$
Since $\varepsilon$ is arbitrary we conclude that
$$\sup_{\nu\in\mathcal{U}} v\big(\tau, X^\nu_{t,x}(\tau)\big) \leq v(t,x).$$
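The sketch below, which is not part of the thesis, checks Theorem 2 numerically on a toy one-dimensional Mayer problem with dynamics $dX = \nu\,ds$, $\nu(s)\in[-1,1]$, and payoff $f(x) = -x^2$. For these dynamics constant controls already attain the supremum, and the closed-form value $v(t,x) = -\big(\max(|x|-(T-t),0)\big)^2$ used for comparison is specific to this example; all grid sizes are arbitrary choices.

```python
import numpy as np

# Numerical check of the dynamic programming principle (Theorem 2) on a toy
# 1-d Mayer problem: dX = nu ds with nu(s) in [-1, 1] and payoff f(x) = -x^2.
# The closed-form value below and all grid sizes are illustrative choices.

T = 1.0

def value(t, x):
    # v(t, x): steer X as close to 0 as the remaining time T - t allows
    return -np.maximum(np.abs(x) - (T - t), 0.0) ** 2

t, tau = 0.2, 0.6
xs = np.linspace(-2.0, 2.0, 201)            # initial states x
controls = np.linspace(-1.0, 1.0, 401)      # constant controls nu(s) = u

# For these dynamics, constant controls already attain the supremum, so the
# right-hand side sup_nu v(tau, X^nu_{t,x}(tau)) can be computed on a grid.
reachable = xs[:, None] + controls[None, :] * (tau - t)   # X^u_{t,x}(tau)
dpp_rhs = value(tau, reachable).max(axis=1)

lhs = value(t, xs)                          # v(t, x)
print("max |v(t,x) - sup_nu v(tau, X(tau))| =", np.max(np.abs(lhs - dpp_rhs)))
```

Up to the resolution of the control grid, the printed discrepancy is essentially zero.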

1.5 Hamilton-Jacobi-Bellman equation

In this section we prove that the dynamic programming principle implies that $v$ is a viscosity solution of the Hamilton-Jacobi-Bellman equation. We introduce the Hamiltonian
$$H(t,x,p) := \inf_{u\in U} H^u(t,x,p), \qquad\text{where}\qquad H^u(t,x,p) := -\big\langle p, \mu(t,x;u)\big\rangle.$$
Notice that $H^u(t,x,p)$ is continuous in $(t,x,p,u)$. Thus, due to the compactness of $U$, $H$ is also continuous.

Theorem 3. The value function $v$ is a viscosity solution of
$$-\partial_t v + H(\cdot, Dv)(t,x) = 0, \qquad (t,x)\in\,]0,T[\,\times\mathbb{R}^d. \tag{1.4}$$

Proof. We begin by proving that $v$ is a viscosity supersolution of (1.4). Consider $\phi\in C^1(S)$ and $(\hat t,\hat x)\in\mathrm{argmin}(v-\phi)$. By Remark 101, in the Appendix, we can suppose that $(\hat t,\hat x)$ is a strict global minimizer of $v-\phi$. Thus, for all $(t,x)\in S$,
$$(v-\phi)(t,x) \geq (v-\phi)(\hat t,\hat x),$$
that is,
$$\phi(\hat t,\hat x) - \phi(t,x) \geq v(\hat t,\hat x) - v(t,x).$$
Fix $u\in U$ and consider the constant control $\nu(s) := u$. Then, for $\tau\geq\hat t$,
$$\phi(\hat t,\hat x) - \phi\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big) \geq v(\hat t,\hat x) - v\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big).$$
By the dynamic programming principle we know that $v(\hat t,\hat x) \geq v\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big)$, hence
$$\phi(\hat t,\hat x) - \phi\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big) \geq 0. \tag{1.5}$$
Dividing both sides of (1.5) by $\tau-\hat t$ and letting $\tau\downarrow\hat t$ we conclude that
$$\big(-\partial_t\phi - \langle D\phi, \mu(\cdot, u)\rangle\big)(\hat t,\hat x) = \big(-\partial_t\phi + H^u(\cdot, D\phi)\big)(\hat t,\hat x) \geq 0,$$
where we have used the differentiability of $\phi$. Since $u\in U$ is arbitrary we conclude that
$$\big(-\partial_t\phi + H(\cdot, D\phi)\big)(\hat t,\hat x) \geq 0.$$
Thus $v$ is a viscosity supersolution of (1.4).

To check that $v$ is a subsolution we consider $\phi\in C^1(S)$ and $(\hat t,\hat x)\in\mathrm{argmax}(v-\phi)$. By Remark 101, in the Appendix, we can suppose that $(\hat t,\hat x)$ is a strict global maximizer of $v-\phi$. Thus, for all $(t,x)\in S$,
$$(v-\phi)(t,x) \leq (v-\phi)(\hat t,\hat x),$$
that is,
$$\phi(\hat t,\hat x) - \phi(t,x) \leq v(\hat t,\hat x) - v(t,x). \tag{1.6}$$
We suppose by contradiction that
$$\big(-\partial_t\phi + H(\cdot, D\phi)\big)(\hat t,\hat x) \geq 2\delta,$$
for some $\delta>0$. Then, for all $u\in U$,
$$\big(-\partial_t\phi + H^u(\cdot, D\phi)\big)(\hat t,\hat x) \geq 2\delta.$$
Since $H^u(t,x,p)$ is continuous with respect to $(u,t,x,p)$ and $U$ is compact, we deduce that there exists $R>0$ such that, for all $u\in U$,
$$\big(-\partial_t\phi + H^u(\cdot, D\phi)\big)(t,x) \geq \delta, \qquad\text{for all }(t,x)\in B_R(\hat t,\hat x).$$
By (1.2), for $\tau$ sufficiently close to $\hat t$, we have
$$\big(s, X^\nu_{\hat t,\hat x}(s)\big)\in B_R(\hat t,\hat x),$$

for all $\nu\in\mathcal{U}$ and all $s\in[\hat t,\tau]$. Since $\phi\in C^1(S)$, we have
$$\phi\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big) - \phi(\hat t,\hat x) = \int_{\hat t}^{\tau}\big(\partial_t\phi - H^{\nu(s)}(\cdot, D\phi)\big)\big(s, X^\nu_{\hat t,\hat x}(s)\big)\,ds \leq -(\tau-\hat t)\,\delta.$$
By (1.6) we then conclude that
$$v(\hat t,\hat x) - v\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big) \geq \phi(\hat t,\hat x) - \phi\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big) \geq (\tau-\hat t)\,\delta.$$
Taking the supremum in $\nu$ we conclude that
$$v(\hat t,\hat x) \geq \sup_{\nu\in\mathcal{U}} v\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big) + (\tau-\hat t)\,\delta,$$
which contradicts the dynamic programming principle.

We remark that $v$ also satisfies the terminal condition $v(T,x) = f(x)$. This provides a characterization of the value function as a solution of a partial differential equation. For the characterization to be complete, uniqueness of solution for the HJB equation must be established. It is, indeed, possible to prove uniqueness by the standard techniques of first-order viscosity solutions. We will not do this study here. Instead we refer the reader to [3].
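To complement the viscosity-solution characterization, the following sketch (our own illustration, not from the thesis) solves the HJB equation of the toy problem used after Theorem 2, namely $\partial_t v + |\partial_x v| = 0$ with $v(T,x) = -x^2$, by a first-order monotone (Godunov) finite-difference scheme and compares the result at $t=0$ with the explicit value function of that example. Domain, grids and the CFL choice are arbitrary.

```python
import numpy as np

# Monotone (Godunov) finite-difference scheme for the first-order HJB equation
# of the toy problem: d_t v + |d_x v| = 0 on ]0,T[ x R, v(T, x) = -x^2.
# Domain, grids and CFL choice are illustrative; the comparison away from the
# artificial boundary uses the explicit value function of that example.

T, L = 1.0, 3.0
nx, nt = 601, 1200
xs = np.linspace(-L, L, nx)
dx = xs[1] - xs[0]
dt = T / nt
assert dt <= dx                      # CFL condition, since |H'(p)| <= 1

v = -xs ** 2                         # terminal condition v(T, x) = f(x)
for _ in range(nt):                  # march backward from T to 0
    dminus = np.diff(v, prepend=v[0]) / dx             # backward differences
    dplus = np.diff(v, append=v[-1]) / dx              # forward differences
    ham = np.maximum(np.maximum(dminus, -dplus), 0.0)  # Godunov flux for |p|
    v = v + dt * ham                 # v(t - dt) = v(t) + dt * |Dv|

v_exact = -np.maximum(np.abs(xs) - T, 0.0) ** 2
interior = np.abs(xs) <= L - T - 0.1            # ignore boundary contamination
print("max interior error at t = 0:", np.max(np.abs(v - v_exact)[interior]))
```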

Chapter 2
Stochastic optimal control

In this chapter we turn our attention to stochastic optimal control problems. The results are analogous to those of the previous chapter. A modern reference on the subject is the well-known book by Fleming and Soner, [4].

2.1 Introduction

Stochastic optimal control is, in many points, similar to its deterministic counterpart. Thus the procedure we will use to study this problem will be analogous to the previous one. In this introduction we outline the main differences. In the stochastic scenario we consider a state variable, $X^\nu_{t,x}$, which is a stochastic process satisfying a stochastic differential equation,
$$dX^\nu_{t,x}(s) = \mu\big(s, X^\nu_{t,x}(s); \nu(s)\big)\,ds + \sigma\big(s, X^\nu_{t,x}(s); \nu(s)\big)\,dW(s),$$
with initial condition $X^\nu_{t,x}(t) = x$. This state variable is driven by a control $\nu$, which must be a progressively measurable stochastic process. As in the deterministic case, we have a terminal payoff, $f\big(X^\nu_{t,x}(T)\big)$, which we want to maximize. One difference is that, in the stochastic setting, when the choice of control is to be made, at time $t$, we cannot predict the terminal payoff. Thus we maximize instead its conditional expectation,
$$J(t,x;\nu) := \mathbb{E}\big[f\big(X^\nu_{t,x}(T)\big)\,\big|\,\mathcal{F}_t\big].$$
We call this random variable the terminal reward. The stochastic optimal control problem is then to determine the value function:
$$V(t,x) := \operatorname*{ess\,sup}_\nu J(t,x;\nu).$$
Even though, a priori, $V(t,x)$ is a random variable, we will see that it is, in fact, a constant random variable. Thus we can think of $V$ as a function $V: [0,T]\times\mathbb{R}^d\to\mathbb{R}$. As in the deterministic scenario, it is easy to derive heuristically the dynamic programming principle for this problem. It takes the following form:
$$V(t,x) = \operatorname*{ess\,sup}_\nu \mathbb{E}\big[V\big(\tau, X^\nu_{t,x}(\tau)\big)\,\big|\,\mathcal{F}_t\big].$$
Though the heuristic derivation is simple and the analogies with the deterministic case are many, the stochastic version of the dynamic programming principle encloses subtle technical difficulties related to measurability issues. Indeed, as we can see, for the statement of the principle to make sense, we

must first prove that $V$ is measurable. At this point, either we work under a restricted setting where strong assumptions easily imply that $V$ is measurable (typically, that it is even continuous), or we have to use deep measurable selection arguments to prove it. In this chapter we will take the first route. For the reader interested in following the second, we refer to [5], where that approach is used in the discrete-time scenario. Recently, Bouchard and Touzi proposed in [1] an alternative approach, where a weak version of the dynamic programming principle is used. Following this approach we avoid the measurability issues and we are still able to derive the Hamilton-Jacobi-Bellman equation, which is the utmost objective of the whole dynamic programming approach. In Chapter 4 we will use this approach in the context of stochastic differential games.

With the dynamic programming principle we are able to derive the Hamilton-Jacobi-Bellman (HJB) equation. Again we must take a limit $\tau\downarrow t$, in the context of the dominated convergence Theorem, and use the stochastic counterpart of the chain rule, Itô's formula. Indeed, under the assumption that $V$ is $C^{1,2}$, we have by Itô's formula that
$$\operatorname*{ess\,sup}_\nu \mathbb{E}\big[V\big(\tau, X^\nu_{t,x}(\tau)\big) - V(t,x)\,\big|\,\mathcal{F}_t\big] = \operatorname*{ess\,sup}_\nu\bigg\{\mathbb{E}\Big[\int_t^\tau\Big(\partial_t V + \big\langle\mu(\cdot;\nu(s)), DV\big\rangle + \tfrac12\operatorname{Tr}\big[\sigma\sigma^T(\cdot;\nu(s))\,D^2V\big]\Big)\big(s, X^\nu_{t,x}(s)\big)\,ds\,\Big|\,\mathcal{F}_t\Big] + \mathbb{E}\Big[\int_t^\tau\big(DV\,\sigma(\cdot;\nu(s))\big)\big(s, X^\nu_{t,x}(s)\big)\,dW(s)\,\Big|\,\mathcal{F}_t\Big]\bigg\}. \tag{2.1}$$
If $X^\nu_{t,x}(s)$ remains bounded for $s\in[t,\tau]$, we then have, by the martingale properties of the stochastic integral, that
$$\mathbb{E}\Big[\int_t^\tau\big(DV\,\sigma(\cdot;\nu(s))\big)\big(s, X^\nu_{t,x}(s)\big)\,dW(s)\,\Big|\,\mathcal{F}_t\Big] = 0.$$
In that case, since the left-hand side of (2.1) vanishes by the dynamic programming principle, we can divide both sides of (2.1) by $\tau-t$ and take the limit as $\tau\downarrow t$ to conclude that
$$\sup_{u}\Big[\partial_t V + \big\langle\mu(\cdot;u), DV\big\rangle + \tfrac12\operatorname{Tr}\big[\sigma\sigma^T(\cdot;u)\,D^2V\big]\Big](t,x) = 0.$$
Thus we should expect $V$ to be a solution of the HJB equation:
$$-\partial_t V + \inf_u\Big[\big\langle -\mu(\cdot;u), DV\big\rangle - \tfrac12\operatorname{Tr}\big[\sigma\sigma^T(\cdot;u)\,D^2V\big]\Big](t,x) = 0.$$
Notice that this is a parabolic second-order partial differential equation. Like in the first-order case, the theory of viscosity solutions provides the adequate framework for the study of such equations. The HJB equation also provides us with a procedure to find optimal strategies. This procedure is explored in Section 2.6 and applied in Section 2.7.

2.2 The controlled Markov diffusion

We fix a time horizon, $T$, and consider the classical Wiener space, $(\Omega, \mathcal{F}, \mathbb{P})$, endowed with the standard $N$-dimensional Brownian motion, $W$, and the natural filtration induced by $W$, $\mathbb{F} = \{\mathcal{F}_s: 0\leq s\leq T\}$. For further details see Section 4.2. As in the previous chapter we denote by $S := [0,T]\times\mathbb{R}^d$ the state space. We consider a control space, $\mathcal{U}$, consisting of the progressively measurable processes with values in $U$, where $U\subset\mathbb{R}^M$ is a compact set. Let $\mu: S\times U\to\mathbb{R}^d$ and $\sigma: S\times U\to\mathbb{R}^{d\times N}$ be continuous functions such that
$$|\mu(t,x;u) - \mu(t,y;u)| + |\sigma(t,x;u) - \sigma(t,y;u)| \leq K|x-y|,$$
$$|\mu(t,x;u)| + |\sigma(t,x;u)| \leq K(1+|x|).$$

The dynamics of the state variable is given by a stochastic differential equation,
$$\begin{cases} dX^\nu_{t,x}(s) = \mu\big(s, X^\nu_{t,x}(s); \nu(s)\big)\,ds + \sigma\big(s, X^\nu_{t,x}(s); \nu(s)\big)\,dW(s) \\ X^\nu_{t,x}(t) = x, \end{cases} \tag{2.2}$$
where $(t,x)$ are the initial conditions and $\nu\in\mathcal{U}$. Under the assumptions made on $\mu, \sigma$, we know by Theorem 147 that, for each $\nu$, there exists a unique strong solution, $X^\nu_{t,x}$, of (2.2). Furthermore, by Lemma 148, there is continuity with respect to the initial conditions, uniformly in $t$ and $\nu$, in the sense that, for each $x$ and $p\geq 2$, there is a constant, $C_x$, depending only on $K, T, x, p$, such that for all $\nu\in\mathcal{U}$, $t\in[0,T]$, $t'\geq t$, $s\geq t'$, $x'\in B_1(x)$,
$$\mathbb{E}\big[\,\big|X^\nu_{t',x'}(s) - X^\nu_{t,x}(s)\big|^p\,\big|\,\mathcal{F}_t\big] \leq C_x\big(|x'-x|^p + |t'-t|^{p/2}\big). \tag{2.3}$$

2.3 Value function

We consider a terminal reward which we want to maximize,
$$J(t,x;\nu) := \mathbb{E}\big[f\big(X^\nu_{t,x}(T)\big)\,\big|\,\mathcal{F}_t\big],$$
where the payoff function $f$ is assumed to be bounded and globally Lipschitz. Notice that $J(t,x;\nu)$ is a random variable and may depend on the past. The problem we consider is to determine the value function given by
$$V(t,x) := \operatorname*{ess\,sup}_{\nu\in\mathcal{U}} J(t,x;\nu).$$
For the notion of essential supremum, see Definition 39. Even though, a priori, $V(t,x)$ is a random variable, it turns out that it is, in fact, a constant random variable. For a proof of this result we refer the reader to Section 4.4.1, where an analogous result is proved in the context of stochastic differential games. Thus, we may think of $V(t,x)$ as a function $V: S\to\mathbb{R}$. It is possible to find controls which are uniformly $\varepsilon$-optimal. More precisely we have:

Lemma 4. Fix $\varepsilon>0$. Then there exists $\nu^\varepsilon\in\mathcal{U}$ such that
$$V(t,x) \leq J(t,x;\nu^\varepsilon) + \varepsilon.$$

Proof. By Theorem 40 there exists a countable collection $\{\nu_i\}\subset\mathcal{U}$ such that
$$V(t,x) = \sup_i J(t,x;\nu_i).$$
Consider $\Lambda_i := \{V(t,x) \leq J(t,x;\nu_i) + \varepsilon\}\in\mathcal{F}_t$. We define $\tilde\Lambda_1 := \Lambda_1$, $\tilde\Lambda_{i+1} := \Lambda_{i+1}\setminus\bigcup_{k=1}^{i}\tilde\Lambda_k$. Then $\{\tilde\Lambda_i\}\subset\mathcal{F}_t$ forms a countable partition of $\Omega$, modulo null sets. We now define
$$\nu^\varepsilon := \sum_i \nu_i\,\mathbf{1}_{\tilde\Lambda_i}.$$
Then, since $U$ is compact, $\nu^\varepsilon\in\mathcal{U}$ and
$$J(t,x;\nu^\varepsilon) = \sum_i J(t,x;\nu_i)\,\mathbf{1}_{\tilde\Lambda_i} \geq V(t,x) - \varepsilon,$$
where we used in the first equality a property of $J$, to be proved in Lemma 35, that implies
$$J\Big(t,x;\sum_i\mathbf{1}_{\tilde\Lambda_i}\nu_i\Big) = \sum_i\mathbf{1}_{\tilde\Lambda_i} J(t,x;\nu_i).$$
We call this property independence of irrelevant alternatives.

Remark 5. Even though, in the proof of the previous Lemma, we use the compactness of $U$, the same result still holds if $U$ is not compact. This will be a consequence of Proposition 68, and it will be explored in Chapter 4.

By the previous result and by the non-randomness of $V$ we get the following alternative definition for the value function:

Proposition 6. The following holds:
$$V(t,x) = \sup_{\nu\in\mathcal{U}} \mathbb{E}\big[J(t,x;\nu)\big].$$

Proof. Since $V$ is a constant random variable we have $V(t,x) = \mathbb{E}[V(t,x)]$. Thus, on one hand,
$$V(t,x) = \mathbb{E}\Big[\operatorname*{ess\,sup}_{\nu\in\mathcal{U}} J(t,x;\nu)\Big] \geq \sup_{\nu\in\mathcal{U}} \mathbb{E}\big[J(t,x;\nu)\big].$$
On the other hand, by the previous Lemma,
$$V(t,x) = \mathbb{E}[V(t,x)] \leq \mathbb{E}\big[J(t,x;\nu^\varepsilon)\big] + \varepsilon \leq \sup_{\nu\in\mathcal{U}} \mathbb{E}\big[J(t,x;\nu)\big] + \varepsilon,$$
and since $\varepsilon$ is arbitrary we deduce that
$$V(t,x) \leq \sup_{\nu\in\mathcal{U}} \mathbb{E}\big[J(t,x;\nu)\big].$$

Using Proposition 6, the next regularity result for $V$ is easy to obtain.

Proposition 7. The function $V(t,x)$ is bounded and continuous. Furthermore, $V(t,\cdot)$ is Lipschitz continuous and $V(\cdot,x)$ is $\tfrac12$-Hölder continuous.

Proof. The boundedness of $V$ follows directly from the boundedness of $f$. By Proposition 6, we have, for all $x'\in B_1(x)$,
$$V(t',x') - V(t,x) = \sup_\nu\mathbb{E}\big[J(t',x';\nu)\big] - \sup_\nu\mathbb{E}\big[J(t,x;\nu)\big] \leq \sup_\nu\mathbb{E}\big[\big|f\big(X^\nu_{t',x'}(T)\big) - f\big(X^\nu_{t,x}(T)\big)\big|\big] \leq \sup_\nu K\,\mathbb{E}\big[\big|X^\nu_{t',x'}(T) - X^\nu_{t,x}(T)\big|\big] \leq \sup_\nu K\,\mathbb{E}\big[\big|X^\nu_{t',x'}(T) - X^\nu_{t,x}(T)\big|^2\big]^{1/2} \leq C_x K\big(|x'-x|^2 + |t'-t|\big)^{1/2},$$
where the last inequality follows from (2.3). Changing the roles of $(t,x)$ and $(t',x')$ we get the desired continuity result on $V$.

Typically, the $\tfrac12$-Hölder continuity of $V$ in the time variable is proved using the dynamic programming principle; here it was obtained by a slightly different argument. Because we state the dynamic programming principle with stopping times, we need some a priori regularity of $V$ in the time variable. This is the reason why we proved this regularity result at this point. The fact that $V(t,x)$ is continuous is essential to simplify the subsequent exposition. Indeed, due to this fact, we conclude that $V\big(\tau, X^\nu_{t,x}(\tau)\big)$ is measurable, for any stopping time $\tau$. As we will see in the next section, this is crucial for the statement of the dynamic programming principle. In [1], Bouchard and Touzi establish a weak version of the dynamic programming principle where such regularity of the value function is not required.

2.4 Dynamic programming principle

In this section we establish the dynamic programming principle for the value function $V$. The essential ingredient is a result analogous to (1.3):
$$J(t,x;\nu_1\oplus_\tau\nu_2) = \mathbb{E}\big[J\big(\tau, X^{\nu_1}_{t,x}(\tau); \nu_2\big)\,\big|\,\mathcal{F}_t\big], \tag{2.4}$$
where
$$(\nu_1\oplus_\tau\nu_2)(s) := \nu_1(s)\mathbf{1}_{[t,\tau]}(s) + \nu_2(s)\mathbf{1}_{(\tau,T]}(s)$$
denotes the concatenation of the controls $\nu_1, \nu_2$ at the stopping time $\tau\geq t$. This property follows directly from the flow property for solutions of stochastic differential equations and the tower property of conditional expectations:
$$J(t,x;\nu_1\oplus_\tau\nu_2) = \mathbb{E}\big[f\big(X^{\nu_1\oplus_\tau\nu_2}_{t,x}(T)\big)\,\big|\,\mathcal{F}_t\big] = \mathbb{E}\Big[\mathbb{E}\big[f\big(X^{\nu_2}_{\tau,\,X^{\nu_1}_{t,x}(\tau)}(T)\big)\,\big|\,\mathcal{F}_\tau\big]\,\Big|\,\mathcal{F}_t\Big] = \mathbb{E}\big[J\big(\tau, X^{\nu_1}_{t,x}(\tau); \nu_2\big)\,\big|\,\mathcal{F}_t\big].$$

Theorem 8 (Dynamic programming principle). For all $x\in\mathbb{R}^d$, $t\in[0,T]$ and stopping times $\tau\in[t,T]$ the following holds:
$$V(t,x) = \operatorname*{ess\,sup}_{\nu\in\mathcal{U}} \mathbb{E}\big[V\big(\tau, X^\nu_{t,x}(\tau)\big)\,\big|\,\mathcal{F}_t\big].$$

Proof. We prove the two inequalities separately.

Step 1: $V(t,x) \leq \operatorname*{ess\,sup}_{\nu\in\mathcal{U}} \mathbb{E}\big[V\big(\tau, X^\nu_{t,x}(\tau)\big)\,\big|\,\mathcal{F}_t\big]$.

By (2.4) we have
$$J(t,x;\nu) = \mathbb{E}\big[J\big(\tau, X^\nu_{t,x}(\tau); \nu\big)\,\big|\,\mathcal{F}_t\big] \leq \mathbb{E}\Big[\operatorname*{ess\,sup}_{\nu_2\in\mathcal{U}} J\big(\tau, X^\nu_{t,x}(\tau); \nu_2\big)\,\Big|\,\mathcal{F}_t\Big] = \mathbb{E}\big[V\big(\tau, X^\nu_{t,x}(\tau)\big)\,\big|\,\mathcal{F}_t\big].$$
Taking the essential supremum over $\nu$ on both sides of the inequality yields
$$V(t,x) \leq \operatorname*{ess\,sup}_{\nu\in\mathcal{U}} \mathbb{E}\big[V\big(\tau, X^\nu_{t,x}(\tau)\big)\,\big|\,\mathcal{F}_t\big].$$

Step 2: $V(t,x) \geq \operatorname*{ess\,sup}_{\nu\in\mathcal{U}} \mathbb{E}\big[V\big(\tau, X^\nu_{t,x}(\tau)\big)\,\big|\,\mathcal{F}_t\big]$.

This inequality requires more work. Fix $\varepsilon>0$ and consider $\nu^\varepsilon\in\mathcal{U}$ such that
$$\operatorname*{ess\,sup}_{\nu\in\mathcal{U}} \mathbb{E}\big[V\big(\tau, X^\nu_{t,x}(\tau)\big)\,\big|\,\mathcal{F}_t\big] \leq \mathbb{E}\big[V\big(\tau, X^{\nu^\varepsilon}_{t,x}(\tau)\big)\,\big|\,\mathcal{F}_t\big] + \varepsilon.$$
For each $(s,y)\in S$ consider $\nu^{s,y}\in\mathcal{U}$ such that
$$V(s,y) \leq J(s,y;\nu^{s,y}) + \varepsilon.$$
We know, by continuity of $V$ and by (2.3), that, for each $(s,y)$, there is also $r_{s,y}$ such that
$$\big|V(s',y') - V(s,y)\big| \leq \varepsilon, \qquad \big|J(s',y';\nu) - \mathbb{E}\big[J(s,y;\nu)\,\big|\,\mathcal{F}_{s'}\big]\big| \leq \varepsilon,$$
for all $\nu\in\mathcal{U}$ and for all $(s',y')\in B(s,y;r_{s,y})$, where $B(s,y;r) := [s-r,s]\times B_r(y)$. Since $\{B(s,y;r): (s,y)\in S,\ 0<r\leq r_{s,y}\}$ forms a Vitali covering of $S$, we can find a countable sequence $(s_i,y_i,r_i)$ such that $\{B(s_i,y_i;r_i)\}_i$ forms a partition of $S$, modulo null sets, and $0<r_i\leq r_{s_i,y_i}$. For the notion of Vitali covering we refer the reader to [6, p. 158]. Define
$$\Lambda_i := \big\{\big(\tau, X^{\nu^\varepsilon}_{t,x}(\tau)\big)\in B(s_i,y_i;r_i)\big\}\in\mathcal{F}_\tau\subset\mathcal{F}_{s_i}, \qquad \nu_i := \nu^\varepsilon\oplus_{s_i}\nu^{s_i,y_i}, \qquad \bar\nu := \sum_i\mathbf{1}_{\Lambda_i}\nu_i = \sum_i\mathbf{1}_{\Lambda_i}\big(\nu^\varepsilon\oplus_{s_i}\nu^{s_i,y_i}\big).$$
Then $\bar\nu\in\mathcal{U}$ and
$$V\big(\tau, X^{\nu^\varepsilon}_{t,x}(\tau)\big)\mathbf{1}_{\Lambda_i} \leq \big(V(s_i,y_i) + \varepsilon\big)\mathbf{1}_{\Lambda_i} = \big(\mathbb{E}\big[V(s_i,y_i)\,\big|\,\mathcal{F}_\tau\big] + \varepsilon\big)\mathbf{1}_{\Lambda_i} \leq \big(\mathbb{E}\big[J(s_i,y_i;\nu^{s_i,y_i})\,\big|\,\mathcal{F}_\tau\big] + 2\varepsilon\big)\mathbf{1}_{\Lambda_i} = \big(\mathbb{E}\big[J(s_i,y_i;\nu_i)\,\big|\,\mathcal{F}_\tau\big] + 2\varepsilon\big)\mathbf{1}_{\Lambda_i} \leq \big(J\big(\tau, X^{\nu^\varepsilon}_{t,x}(\tau);\nu_i\big) + 3\varepsilon\big)\mathbf{1}_{\Lambda_i} = \big(J\big(\tau, X^{\bar\nu}_{t,x}(\tau);\nu_i\big) + 3\varepsilon\big)\mathbf{1}_{\Lambda_i}.$$
Thus,
$$\operatorname*{ess\,sup}_{\nu\in\mathcal{U}} \mathbb{E}\big[V\big(\tau, X^\nu_{t,x}(\tau)\big)\,\big|\,\mathcal{F}_t\big] \leq \mathbb{E}\big[V\big(\tau, X^{\nu^\varepsilon}_{t,x}(\tau)\big)\,\big|\,\mathcal{F}_t\big] + \varepsilon = \mathbb{E}\Big[\sum_i V\big(\tau, X^{\nu^\varepsilon}_{t,x}(\tau)\big)\mathbf{1}_{\Lambda_i}\,\Big|\,\mathcal{F}_t\Big] + \varepsilon \leq \sum_i\mathbb{E}\big[J\big(\tau, X^{\bar\nu}_{t,x}(\tau);\nu_i\big)\mathbf{1}_{\Lambda_i}\,\big|\,\mathcal{F}_t\big] + 4\varepsilon = \mathbb{E}\big[J\big(\tau, X^{\bar\nu}_{t,x}(\tau);\bar\nu\big)\,\big|\,\mathcal{F}_t\big] + 4\varepsilon = J(t,x;\bar\nu) + 4\varepsilon \leq V(t,x) + 4\varepsilon.$$
Since $\varepsilon$ is arbitrary, we conclude that
$$\operatorname*{ess\,sup}_{\nu\in\mathcal{U}} \mathbb{E}\big[V\big(\tau, X^\nu_{t,x}(\tau)\big)\,\big|\,\mathcal{F}_t\big] \leq V(t,x).$$
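As an illustration of Theorem 8 (not taken from the thesis), the sketch below evaluates both sides of the stochastic dynamic programming principle for a toy controlled diffusion $dX = \nu\,dt + dW$, $\nu\in[0,1]$, with payoff $f(x)=\tanh(x)$. Because $f$ is increasing and the drift is increasing in $\nu$, the constant control $\nu\equiv 1$ attains the essential supremum, which makes both sides computable; the model, the payoff and all numerical parameters are our own choices.

```python
import numpy as np

# Illustration of the dynamic programming principle (Theorem 8) for the toy
# controlled diffusion dX = nu dt + dW, nu in [0, 1], payoff f(x) = tanh(x).
# The constant control nu = 1 is optimal here, so V can be evaluated directly.

rng = np.random.default_rng(0)
t, tau, T, x = 0.0, 0.5, 1.0, 0.3

nodes, weights = np.polynomial.hermite.hermgauss(64)   # Gauss-Hermite rule

def value(s, y):
    # V(s, y) = E[ tanh(y + (T - s) + sqrt(T - s) Z) ], Z ~ N(0, 1), under nu = 1
    y = np.atleast_1d(y).astype(float)
    z = np.sqrt(2.0) * nodes
    vals = np.tanh(y[:, None] + (T - s) + np.sqrt(T - s) * z[None, :])
    return (vals @ weights) / np.sqrt(np.pi)

lhs = value(t, x)[0]                                    # V(t, x)
x_tau = x + (tau - t) + np.sqrt(tau - t) * rng.standard_normal(100_000)
rhs = value(tau, x_tau).mean()                          # E[ V(tau, X(tau)) | F_t ]

print("V(t,x)            ~", round(lhs, 4))
print("E[V(tau, X(tau))] ~", round(rhs, 4))
```

The two printed numbers agree up to Monte Carlo error, as the dynamic programming principle predicts.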

Remark 9. In our setting the value function is continuous, thus implying that $V\big(\tau, X^\nu_{t,x}(\tau)\big)$ is measurable. If this is not the case, then more complicated arguments must be used. Examples of this can be seen in [5], where a deep measurable selection argument is used, and [7], where a compactification of the space of controls is carried through. As an alternative to this, in [1], a weak formulation of the dynamic programming principle is proposed.

2.5 Hamilton-Jacobi-Bellman equation

Now that we have established the dynamic programming principle and proved that $V$ is a continuous function, we can prove that it is a viscosity solution of the Hamilton-Jacobi-Bellman equation. In the stochastic setting the Hamilton-Jacobi-Bellman equation is a parabolic partial differential equation of second order. The Hamiltonian we consider in this case is
$$H(t,x,p,X) := \inf_{u\in U} H^u(t,x,p,X), \qquad\text{where}\qquad H^u(t,x,p,X) := -\big\langle p, \mu(t,x;u)\big\rangle - \tfrac12\operatorname{Tr}\big[\sigma\sigma^T(t,x;u)\,X\big].$$
Notice that $H^u(t,x,p,X)$ is continuous in $(t,x,p,u,X)$. Thus, due to the compactness of $U$, $H$ is also continuous.

Theorem 10. The value function $V$ is a viscosity solution of
$$-\partial_t V + H(\cdot, DV, D^2V)(t,x) = 0, \qquad (t,x)\in\,]0,T[\,\times\mathbb{R}^d. \tag{2.5}$$

Proof. We begin by proving that $V$ is a viscosity supersolution of (2.5). Consider $\phi\in C^{1,2}(S)$ and $(\hat t,\hat x)\in\mathrm{argmin}(V-\phi)$. By Remark 101, in the Appendix, we can suppose that $(\hat t,\hat x)$ is a strict global minimizer of $V-\phi$. Thus, for all $(t,x)\in S$,
$$(V-\phi)(t,x) \geq (V-\phi)(\hat t,\hat x),$$
that is,
$$\phi(\hat t,\hat x) - \phi(t,x) \geq V(\hat t,\hat x) - V(t,x).$$
Fix $u\in U$ and consider the constant control $\nu(s) := u$. For each $r>0$, consider the stopping time
$$\tau_r := \inf\big\{s\geq\hat t : X^\nu_{\hat t,\hat x}(s)\notin B_r(\hat x)\big\}.$$
Then,
$$\phi(\hat t,\hat x) - \phi\big(\tau_r, X^\nu_{\hat t,\hat x}(\tau_r)\big) \geq V(\hat t,\hat x) - V\big(\tau_r, X^\nu_{\hat t,\hat x}(\tau_r)\big).$$
By the dynamic programming principle we know that
$$V(\hat t,\hat x) \geq \mathbb{E}\big[V\big(\tau_r, X^\nu_{\hat t,\hat x}(\tau_r)\big)\,\big|\,\mathcal{F}_{\hat t}\big],$$
hence
$$\phi(\hat t,\hat x) \geq \mathbb{E}\big[\phi\big(\tau_r, X^\nu_{\hat t,\hat x}(\tau_r)\big)\,\big|\,\mathcal{F}_{\hat t}\big]. \tag{2.6}$$
By Itô's formula we have
$$\phi\big(\tau_r, X^\nu_{\hat t,\hat x}(\tau_r)\big) - \phi(\hat t,\hat x) = \int_{\hat t}^{\tau_r}\big(\partial_t\phi - H^{\nu(s)}(\cdot; D\phi, D^2\phi)\big)\big(s, X^\nu_{\hat t,\hat x}(s)\big)\,ds + \int_{\hat t}^{\tau_r}\big(D\phi\,\sigma(\cdot;\nu(s))\big)\big(s, X^\nu_{\hat t,\hat x}(s)\big)\,dW(s). \tag{2.7}$$

Since $\tau_r$ is an exit time and $\sigma, D\phi$ are continuous, $(D\phi\,\sigma(\cdot;\nu(s)))\big(s, X^\nu_{\hat t,\hat x}(s)\big)$ remains bounded for $s\in[\hat t,\tau_r]$, hence
$$\mathbb{E}\Big[\int_{\hat t}^{\tau_r}\big(D\phi\,\sigma(\cdot;\nu(s))\big)\big(s, X^\nu_{\hat t,\hat x}(s)\big)\,dW(s)\,\Big|\,\mathcal{F}_{\hat t}\Big] = 0.$$
Taking expectations on both sides of (2.7) and using (2.6) we get
$$\mathbb{E}\Big[\int_{\hat t}^{\tau_r}\big(\partial_t\phi - H^{\nu(s)}(\cdot; D\phi, D^2\phi)\big)\big(s, X^\nu_{\hat t,\hat x}(s)\big)\,ds\,\Big|\,\mathcal{F}_{\hat t}\Big] \leq 0.$$
Dividing the previous inequality by $\tau_r - \hat t$ and letting $r\downarrow 0$ we have, by dominated convergence,
$$\big(-\partial_t\phi + H^u(\cdot; D\phi, D^2\phi)\big)(\hat t,\hat x) \geq 0.$$
Since $u\in U$ is arbitrary, the same holds with $H$ in place of $H^u$. Thus $V$ is a supersolution of (2.5).

To check that $V$ is a subsolution we consider $\phi\in C^{1,2}(S)$ and $(\hat t,\hat x)\in\mathrm{argmax}(V-\phi)$. By Remark 101 we can suppose that $(\hat t,\hat x)$ is a strict global maximizer of $V-\phi$. Thus, for all $(t,x)\in S$,
$$(V-\phi)(t,x) \leq (V-\phi)(\hat t,\hat x),$$
that is,
$$\phi(\hat t,\hat x) - \phi(t,x) \leq V(\hat t,\hat x) - V(t,x). \tag{2.8}$$
We suppose by contradiction that
$$\big(-\partial_t\phi + H(\cdot, D\phi, D^2\phi)\big)(\hat t,\hat x) \geq 2\delta,$$
for some $\delta>0$. Consider
$$\varphi(t,x) := \phi(t,x) + |t-\hat t|^2 + |x-\hat x|^4.$$
Then, for all $u\in U$,
$$\big(-\partial_t\varphi + H^u(\cdot, D\varphi, D^2\varphi)\big)(\hat t,\hat x) \geq 2\delta.$$
Since $H^u(t,x,p,X)$ is continuous with respect to $(u,t,x,p,X)$ and $U$ is compact, we deduce that there exists $R>0$ such that, for all $u\in U$,
$$\big(-\partial_t\varphi + H^u(\cdot, D\varphi, D^2\varphi)\big)(t,x) \geq \delta, \qquad\text{for all }(t,x)\in B_R(\hat t,\hat x).$$
We define
$$\eta := \min_{\partial B_R(\hat t,\hat x)}(\varphi - \phi) > 0.$$
Let $\nu$ be such that
$$V(\hat t,\hat x) \leq J(\hat t,\hat x;\nu) + \frac{\eta}{2}, \tag{2.9}$$
and consider the stopping time $\tau := \inf\big\{s\geq\hat t : \big(s, X^\nu_{\hat t,\hat x}(s)\big)\notin B_R(\hat t,\hat x)\big\}$. Since $\varphi\in C^{1,2}(S)$, we have by Itô's formula that
$$\varphi\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big) - \varphi(\hat t,\hat x) = \int_{\hat t}^{\tau}\big(\partial_t\varphi - H^{\nu(s)}(\cdot, D\varphi, D^2\varphi)\big)\big(s, X^\nu_{\hat t,\hat x}(s)\big)\,ds + \int_{\hat t}^{\tau}\big(D\varphi\,\sigma(\cdot;\nu(s))\big)\big(s, X^\nu_{\hat t,\hat x}(s)\big)\,dW(s) \leq -(\tau-\hat t)\,\delta + \int_{\hat t}^{\tau}\big(D\varphi\,\sigma(\cdot;\nu(s))\big)\big(s, X^\nu_{\hat t,\hat x}(s)\big)\,dW(s). \tag{2.10}$$

Since $\tau$ is an exit time and $\sigma, D\varphi$ are continuous, $(D\varphi\,\sigma(\cdot;\nu(s)))\big(s, X^\nu_{\hat t,\hat x}(s)\big)$ remains bounded for $s\in[\hat t,\tau]$, hence
$$\mathbb{E}\Big[\int_{\hat t}^{\tau}\big(D\varphi\,\sigma(\cdot;\nu(s))\big)\big(s, X^\nu_{\hat t,\hat x}(s)\big)\,dW(s)\,\Big|\,\mathcal{F}_{\hat t}\Big] = 0.$$
Thus, taking expectations on (2.10) we get
$$\mathbb{E}\big[\varphi\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big)\,\big|\,\mathcal{F}_{\hat t}\big] - \varphi(\hat t,\hat x) \leq 0.$$
By definition of $\eta$ and since $\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big)\in\partial B_R(\hat t,\hat x)$, we have
$$\varphi\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big) \geq \phi\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big) + \eta,$$
and hence
$$\mathbb{E}\big[\phi\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big)\,\big|\,\mathcal{F}_{\hat t}\big] \leq \phi(\hat t,\hat x) - \eta.$$
By (2.8) we then conclude that
$$V(\hat t,\hat x) - \mathbb{E}\big[V\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big)\,\big|\,\mathcal{F}_{\hat t}\big] \geq \phi(\hat t,\hat x) - \mathbb{E}\big[\phi\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big)\,\big|\,\mathcal{F}_{\hat t}\big] \geq \eta.$$
Thus
$$V(\hat t,\hat x) \geq \mathbb{E}\big[V\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big)\,\big|\,\mathcal{F}_{\hat t}\big] + \eta.$$
On the other hand, we have by (2.9) and (2.4) that
$$V(\hat t,\hat x) \leq J(\hat t,\hat x;\nu) + \frac{\eta}{2} = \mathbb{E}\big[J\big(\tau, X^\nu_{\hat t,\hat x}(\tau);\nu\big)\,\big|\,\mathcal{F}_{\hat t}\big] + \frac{\eta}{2} \leq \mathbb{E}\big[V\big(\tau, X^\nu_{\hat t,\hat x}(\tau)\big)\,\big|\,\mathcal{F}_{\hat t}\big] + \frac{\eta}{2}.$$
We have thus reached a contradiction.

Like in the deterministic case, the value function $V$ also satisfies the terminal condition $V(T,x) = f(x)$. Regarding uniqueness of solution, we refer the reader to Chapter 4, where we discuss uniqueness of solution for the Hamilton-Jacobi-Bellman-Isaacs (HJBI) equation, which is a second-order partial differential equation similar to the HJB equation. The same arguments can easily be adapted to prove uniqueness of solution for the HJB equation in this case. In fact, it is much easier, since in this chapter we only consider the case of a bounded value function.

2.6 Verification Theorem

In this section we establish a verification result useful for the synthesis of optimal controls.

Theorem 11 (Verification Theorem). Let $v\in C^{1,2}([0,T[\,\times\mathbb{R}^d)\cap C([0,T]\times\mathbb{R}^d)$ be a classical solution of (2.5) such that $v(T,x) = f(x)$ and $v$ has polynomial growth. Suppose that there exists a measurable function $u^*: S\to U$ such that
$$H(\cdot; Dv, D^2v)(t,x) = H^{u^*(t,x)}(\cdot; Dv, D^2v)(t,x).$$
Then
$$V(t,x) = v(t,x),$$
and $\nu^*(s) := u^*\big(s, X^*_{t,x}(s)\big)$ is an optimal control.

Proof. Since $u^*$ is measurable, $\nu^*(s) := u^*\big(s, X^*_{t,x}(s)\big)\in\mathcal{U}$. Let $X^* := X^{\nu^*}_{t,x}$ be the unique strong solution of (2.2) and consider the stopping time
$$\tau_n := T\wedge\inf\{s\geq t : X^*(s)\notin[-n,n]\}.$$
Applying Itô's formula we have
$$v\big(\tau_n, X^*(\tau_n)\big) - v(t,x) = \int_t^{\tau_n}\big(\partial_t v - H^{u^*(\cdot)}(\cdot; Dv, D^2v)\big)\big(s, X^*(s)\big)\,ds + \int_t^{\tau_n}\big(Dv\,\sigma(\cdot; u^*(\cdot))\big)\big(s, X^*(s)\big)\,dW(s) = \int_t^{\tau_n}\big(Dv\,\sigma(\cdot; u^*(\cdot))\big)\big(s, X^*(s)\big)\,dW(s).$$
Since $\tau_n$ is an exit time, we have
$$\mathbb{E}\Big[\int_t^{\tau_n}\big(Dv\,\sigma(\cdot; u^*(\cdot))\big)\big(s, X^*(s)\big)\,dW(s)\,\Big|\,\mathcal{F}_t\Big] = 0,$$
hence
$$v(t,x) = \mathbb{E}\big[v\big(\tau_n, X^*(\tau_n)\big)\,\big|\,\mathcal{F}_t\big].$$
Since $v$ has polynomial growth, we can let $n\to\infty$ to conclude, by the dominated convergence Theorem, that
$$v(t,x) = \mathbb{E}\big[v\big(T, X^*(T)\big)\,\big|\,\mathcal{F}_t\big] = \mathbb{E}\big[f\big(X^*(T)\big)\,\big|\,\mathcal{F}_t\big] = J(t,x;\nu^*).$$
On the other hand, if we consider an arbitrary control $\nu\in\mathcal{U}$ and $X^\nu$, the unique strong solution of (2.2), then we remark that
$$H^{u^*(\cdot)}(\cdot; Dv, D^2v)\big(s, X^\nu(s)\big) \leq H^{\nu(s)}(\cdot; Dv, D^2v)\big(s, X^\nu(s)\big),$$
hence, by an argument analogous to the one used before, we deduce that
$$\mathbb{E}\big[v\big(\tau_n, X^\nu(\tau_n)\big)\,\big|\,\mathcal{F}_t\big] - v(t,x) \leq \mathbb{E}\Big[\int_t^{\tau_n}\big(Dv\,\sigma(\cdot;\nu(s))\big)\big(s, X^\nu(s)\big)\,dW(s)\,\Big|\,\mathcal{F}_t\Big] = 0.$$
Letting $n\to\infty$ we deduce that
$$v(t,x) \geq \mathbb{E}\big[v\big(T, X^\nu(T)\big)\,\big|\,\mathcal{F}_t\big] = \mathbb{E}\big[f\big(X^\nu(T)\big)\,\big|\,\mathcal{F}_t\big] = J(t,x;\nu).$$
Since $\nu$ is arbitrary and $v(t,x) = J(t,x;\nu^*)$ we conclude that
$$V(t,x) \leq v(t,x) = J(t,x;\nu^*) \leq V(t,x).$$
Thus $\nu^*$ is an optimal control.
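The Verification Theorem reduces the control problem to exhibiting a classical solution of (2.5) together with a measurable minimizer of the Hamiltonian. As a sketch of this kind of computation (our own illustration, using the Merton candidate from the next section and assuming the unconstrained optimizer lies in $[\pi_0,\pi_1]$), the following symbolic check verifies that the candidate value function satisfies the HJB equation.

```python
import sympy as sp

# Symbolic sanity check of the computation the Verification Theorem requires:
# we verify that the Merton candidate v(t, x) = x^p exp(p (T - t) g) of the
# next section is a classical solution of its HJB equation, assuming the
# unconstrained optimizer pi* is admissible.  Illustration only.

t, x, T, mu, r, sigma, p = sp.symbols("t x T mu r sigma p", positive=True)

pi_star = (mu - r) / (sigma**2 * (1 - p))                       # interior optimum
g = r + pi_star * (mu - r) - sp.Rational(1, 2) * (1 - p) * pi_star**2 * sigma**2
v = x**p * sp.exp(p * (T - t) * g)

# HJB with the optimal constant control substituted:
# dv/dt + x (r + pi*(mu - r)) v_x + (1/2) x^2 pi*^2 sigma^2 v_xx = 0
hjb = (sp.diff(v, t)
       + x * (r + pi_star * (mu - r)) * sp.diff(v, x)
       + sp.Rational(1, 2) * x**2 * pi_star**2 * sigma**2 * sp.diff(v, x, 2))

print(sp.simplify(hjb))   # expected output: 0
```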

2.7 Merton's optimal portfolio

We now give an example of application of the previous results to the well-known Merton optimal portfolio problem. The setting for Merton's problem is that of a market with a risky asset, $S_t$, and a non-risky asset, $S^0_t$, which follow the stochastic differential equations
$$dS_t = S_t\big(\mu_t\,dt + \sigma_t\,dW_t\big), \qquad dS^0_t = S^0_t\,r_t\,dt.$$
In this setting we consider a self-financed portfolio whose value, $X_t$, we want to maximize. The value of the portfolio is to be maximized over all admissible strategies. An admissible strategy, $\pi\in\mathcal{U}$, indicates the ratio, $\pi_t\in[\pi_0,\pi_1]$, of the value of the portfolio to invest at time $t$ in the risky asset. The self-financing condition is then
$$dX_t = \frac{X_t\pi_t}{S_t}\,dS_t + \frac{X_t(1-\pi_t)}{S^0_t}\,dS^0_t = X_t\big(\pi_t\mu_t + (1-\pi_t)r_t\big)\,dt + X_t\pi_t\sigma_t\,dW_t.$$
By choosing only self-financed strategies we are not allowing money to be added to or taken from the portfolio. Furthermore, it is easy to see that the value of the portfolio never becomes negative. Indeed, if at some point the value reaches $0$, then we have $dX_t = 0$, so that it remains $0$. Given a terminal time $T$, Merton's optimal portfolio problem consists in optimizing, over all admissible strategies $\pi\in\mathcal{U}$, the utility of the terminal value of the portfolio, that is, determining
$$V(t,x) = \sup_{\pi\in\mathcal{U}}\mathbb{E}\big[f\big(X^\pi_{t,x}(T)\big)\,\big|\,\mathcal{F}_t\big],$$
where $\mathcal{U}$ is the set of progressively measurable processes taking values in $[\pi_0,\pi_1]$, $f$ is a utility function, and $X^\pi_{t,x}(s)$ is the value of the portfolio at time $s$, with strategy $\pi$ and initial data $(t,x)$. We consider in the following a power utility function, $f(x) = x^p$, for $p\in(0,1)$, and deterministic parameters, $\mu_t = \mu$, $\sigma_t = \sigma$, $r_t = r$. In this case we have $f(\mathbb{R}^+_0) = \mathbb{R}^+_0$, thus $V(t,x)\geq 0$. Because the dynamics is homogeneous, that is, $X^\pi_{t,\lambda x} = \lambda X^\pi_{t,x}$, the power utility function is particularly easy to deal with. This can be seen in the following Lemma:

Lemma 12. If $f$ is multiplicative, i.e. $f(xy) = f(x)f(y)$, then
$$V(t,x) = f(x)\,V(t,1).$$

Proof. This is a consequence of the following computation:
$$J(t,x;\pi) = \mathbb{E}\big[f\big(X^\pi_{t,x}(T)\big)\,\big|\,\mathcal{F}_t\big] = \mathbb{E}\Big[f\Big(x\,\frac{X^\pi_{t,x}(T)}{x}\Big)\,\Big|\,\mathcal{F}_t\Big] = f(x)\,\mathbb{E}\big[f\big(X^\pi_{t,1}(T)\big)\,\big|\,\mathcal{F}_t\big] = f(x)\,J(t,1;\pi).$$

Thus we conclude that $V$ is differentiable in the space variable and
$$DV(t,x) = f'(x)V(t,1) = p\,x^{p-1}V(t,1) > 0, \qquad D^2V(t,x) = f''(x)V(t,1) = -p(1-p)\,x^{p-2}V(t,1) < 0,$$

for $x>0$. Taking these facts into account and denoting $h(t) := V(t,1)$, we now need to determine $\pi^*$ such that
$$H\big(t,x;\,p\,x^{p-1}h(t),\,-p(1-p)x^{p-2}h(t)\big) = H^{\pi^*(t,x)}\big(t,x;\,p\,x^{p-1}h(t),\,-p(1-p)x^{p-2}h(t)\big).$$
Thus
$$\pi^*(t,x) = \operatorname*{argmin}_{\pi\in[\pi_0,\pi_1]}\Big[-\big(\pi(\mu-r)+r\big)p\,x^p h(t) + \frac{1}{2}\sigma^2\pi^2(1-p)p\,x^p h(t)\Big] = \operatorname*{argmin}_{\pi\in[\pi_0,\pi_1]}\Big[-\pi(\mu-r) - r + \frac{\sigma^2\pi^2(1-p)}{2}\Big] = \Big(\frac{\mu-r}{\sigma^2(1-p)}\wedge\pi_1\Big)\vee\pi_0.$$
It is then easy to deduce that
$$v(t,x) = x^p\,\exp\Big(p(T-t)\Big(r + \pi^*(\mu-r) - \frac{1-p}{2}(\pi^*)^2\sigma^2\Big)\Big)$$
is a $C^{1,2}$ solution of (2.5) such that
$$H(\cdot; Dv, D^2v)(t,x) = H^{\pi^*(\cdot)}(\cdot; Dv, D^2v)(t,x).$$
Thus we conclude, by the Verification Theorem, that $V(t,x) = v(t,x)$ and $\pi^*$ is an optimal control. Later, in Section 4.7, we will perform a worst-case approach to this problem, when the parameters $\mu, \sigma$ are considered to be stochastic.
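As a numerical companion to this section (our own sketch, with illustrative parameter values), the code below computes the clamped Merton fraction $\pi^*$ and the value $v(t,x)$ given above, and compares the latter with a Monte Carlo estimate of $\mathbb{E}\big[f\big(X^{\pi^*}_{t,x}(T)\big)\big]$, which, by the Verification Theorem, should coincide with $V(t,x)$.

```python
import numpy as np

# Merton's problem with power utility f(x) = x^p and constant parameters:
# compute the optimal constant fraction pi* and the candidate value v(t, x),
# then check the latter against a Monte Carlo estimate of E[f(X^{pi*}(T))].
# Parameter values below are illustrative only.

mu, r, sigma, p = 0.08, 0.02, 0.2, 0.5
pi0, pi1 = 0.0, 1.0
t, T, x = 0.0, 1.0, 1.0

# unconstrained Merton fraction, clamped to the admissible interval [pi0, pi1]
pi_star = float(np.clip((mu - r) / (sigma ** 2 * (1.0 - p)), pi0, pi1))

g = r + pi_star * (mu - r) - 0.5 * (1.0 - p) * pi_star ** 2 * sigma ** 2
v = x ** p * np.exp(p * (T - t) * g)          # candidate value v(t, x)

# Monte Carlo check: simulate the wealth under the constant control pi*
rng = np.random.default_rng(1)
z = rng.standard_normal(500_000)
drift = (r + pi_star * (mu - r) - 0.5 * pi_star ** 2 * sigma ** 2) * (T - t)
X_T = x * np.exp(drift + pi_star * sigma * np.sqrt(T - t) * z)

print("pi*               =", pi_star)
print("v(t,x)            =", v)
print("E[f(X^{pi*}(T))]  ~", np.mean(X_T ** p))
```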

Chapter 3
Deterministic differential games

In this chapter we consider two-person zero-sum differential games. Analogously to Chapter 1, our objective is to establish the dynamic programming principle and use it to characterize the value functions as viscosity solutions of the associated Hamilton-Jacobi-Bellman-Isaacs (HJBI) equations. The exposition is similar to the one in [3]. For more classical approaches, without viscosity solutions, we refer the reader to [8, 9].

3.1 Introduction

The scenario for a two-person differential game is similar to that of optimal control. One main difference is that now the state variable is controlled by two controls, $a\in\mathcal{A}$ and $b\in\mathcal{B}$:
$$dX^{a,b}_{t,x}(s) = \mu\big(s, X^{a,b}_{t,x}(s); a(s), b(s)\big)\,ds.$$
The controls are adjusted by two players, who have antagonistic objectives. We assume that player one wants to maximize a terminal reward, while the other wants to minimize it. The terminal reward is the quantity
$$J(t,x;a,b) := f\big(X^{a,b}_{t,x}(T)\big).$$
In this scenario we will give an advantage to one of the players by allowing him to use strategies. A strategy for player 1, $\alpha: b\mapsto\alpha[b]$, is a mapping $\alpha: \mathcal{B}\to\mathcal{A}$. Analogously we define strategies for player 2. We then consider the optimization problems of determining the value functions
$$v(t,x) := \inf_\beta\sup_a J\big(t,x;a,\beta[a]\big), \qquad u(t,x) := \sup_\alpha\inf_b J\big(t,x;\alpha[b],b\big).$$
These value functions, $v$ and $u$, are called, respectively, the lower and upper values of the game. The names are justified by the inequality $v\leq u$, which is a consequence of the fact that in $v$ player two is given the advantage, while the opposite happens in $u$. In fact, in order not to give too much advantage to one of the players, and to consider a problem that can be tackled via a dynamic programming approach, we need to consider only strategies that do not foresee the future of the opponent's control. This is made precise by introducing the notion of non-anticipativity, which is a property of interest in many applications. As the reader may predict, each of the value functions verifies a dynamic programming principle that allows us to prove that they are solutions to a partial differential equation. This is, indeed,

true. The dynamic programming principle now takes the form:
$$v(t,x) = \inf_\beta\sup_a v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big), \qquad u(t,x) = \sup_\alpha\inf_b u\big(\tau, X^{\alpha[b],b}_{t,x}(\tau)\big).$$
The equations that the value functions satisfy are the so-called Hamilton-Jacobi-Bellman-Isaacs equations:
$$-\partial_t v + \inf_a\sup_b\big\langle -\mu(\cdot;a,b), Dv\big\rangle(t,x) = 0, \qquad -\partial_t u + \sup_b\inf_a\big\langle -\mu(\cdot;a,b), Du\big\rangle(t,x) = 0.$$
After establishing uniqueness for the above equations, we can deduce that, if Isaacs' condition holds, that is, if, for all $t, x$ and $p$,
$$\inf_a\sup_b\big\langle -\mu(t,x;a,b), p\big\rangle = \sup_b\inf_a\big\langle -\mu(t,x;a,b), p\big\rangle$$
holds, then the game has a value, i.e., $v = u$.

3.2 The controlled dynamical system

In two-person differential games we must consider two control spaces, $\mathcal{A}$ and $\mathcal{B}$, consisting of the measurable functions taking values in compact sets $A\subset\mathbb{R}^{d_a}$ and $B\subset\mathbb{R}^{d_b}$, respectively. We say that $\mathcal{A}, \mathcal{B}$ are the control spaces of players one and two, respectively. The nonlinear system in $\mathbb{R}^d$ that characterizes the state variable is similar to the one considered in optimal control,
$$\begin{cases} dX^{a,b}_{t,x}(s) = \mu\big(s, X^{a,b}_{t,x}(s); a(s), b(s)\big)\,ds \\ X^{a,b}_{t,x}(t) = x, \end{cases} \tag{3.1}$$
where $a\in\mathcal{A}$, $b\in\mathcal{B}$, $(t,x)$ are the initial conditions and $\mu: [0,T]\times\mathbb{R}^d\times A\times B\to\mathbb{R}^d$ is a continuous function which is Lipschitz continuous in the space variable, that is,
$$|\mu(t,x;a,b) - \mu(t,y;a,b)| \leq K|x-y|,$$
for some constant $K$. The space $S := [0,T]\times\mathbb{R}^d$ is again called the state space. Under the assumptions made on $\mu$, the existence and uniqueness of a solution $X^{a,b}_{t,x}$ of (3.1) follows from the standard theory of ordinary differential equations, for each $(a,b)$. Furthermore, there is continuity with respect to the initial conditions $(t,x)$, uniformly in $t$ and $(a,b)$. More precisely, for each $x$, there is a constant $C_x$, depending only on $K, T, x$, such that for all $a\in\mathcal{A}$, $b\in\mathcal{B}$, $t\in[0,T]$, $t'\geq t$, $s\geq t'$, $x'\in B_1(x)$,
$$\big|X^{a,b}_{t',x'}(s) - X^{a,b}_{t,x}(s)\big| \leq C_x\big(|x'-x| + |t'-t|^{1/2}\big).$$
This estimate is a particular case of Lemma 148, when there is no diffusion.

3.3 Lower and upper values

We consider a terminal reward, which player one wants to maximize and player two wants to minimize,
$$J(t,x;a,b) := f\big(X^{a,b}_{t,x}(T)\big).$$

The payoff function $f$ is assumed to be bounded and continuous. We can think of this payoff as something that player two will have to pay to player one at the terminal time $T$. This explains their antagonistic interests. In this scenario it is natural to define the two following value functions:
$$v_s(t,x) := \sup_{a\in\mathcal{A}}\inf_{b\in\mathcal{B}} J(t,x;a,b), \qquad u_s(t,x) := \inf_{b\in\mathcal{B}}\sup_{a\in\mathcal{A}} J(t,x;a,b).$$
We call $v_s$ the lower static value function and $u_s$ the upper static value function. Notice that, in the lower value function, the advantage is given to player 2, who is allowed to make his choice with the information of the whole future response of the first player. Similarly, in the upper value, the advantage is given to the first player. In a dynamic formulation of the game we want to give an advantage to one of the players but without letting him foresee the future. This is achieved by introducing the notion of non-anticipative strategies:

Definition 13. A strategy for player 2 is a mapping $\beta: \mathcal{A}\to\mathcal{B}$. A strategy $\beta$ is said to verify the non-anticipativity property if for every $a_1, a_2\in\mathcal{A}$ and $\tau\in[t,T]$ we have:
$$a_1(s) = a_2(s)\ \ \forall s\in[t,\tau] \quad\Longrightarrow\quad \beta[a_1](s) = \beta[a_2](s)\ \ \forall s\in[t,\tau].$$
The space of non-anticipative strategies for player 2 is denoted by $\Delta$. The definition of a strategy for player 1 is analogous. The space of non-anticipative strategies for player 1 is denoted by $\Gamma$.

We are now able to define the value functions which will be studied in the subsequent sections.

Definition 14. The lower value of a differential game is defined as
$$v(t,x) := \inf_{\beta\in\Delta}\sup_{a\in\mathcal{A}} J\big(t,x;a,\beta[a]\big).$$
Similarly, the upper value of a differential game is
$$u(t,x) := \sup_{\alpha\in\Gamma}\inf_{b\in\mathcal{B}} J\big(t,x;\alpha[b],b\big).$$
The names upper and lower value are justified by the following inequality:
$$v \leq u. \tag{3.3}$$
This intuitive inequality needs to be proved. We will do that, in an indirect way, using the HJBI equations associated with the game. Using the same arguments as in the deterministic optimal control problem, we deduce the following result on the regularity of $v, u$:

Proposition 15. The lower and upper value functions, $v, u$, are bounded and continuous in $S$.

3.4 Dynamic programming principle

In this section we establish the dynamic programming principle for the value functions $v, u$. As in the case of optimal control, property (1.3) will be essential.

Theorem 16 (Dynamic programming principle). For all $x\in\mathbb{R}^d$, $t\in[0,T]$ and $\tau\in[t,T]$ the following holds:
$$v(t,x) = \inf_{\beta\in\Delta}\sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big), \qquad u(t,x) = \sup_{\alpha\in\Gamma}\inf_{b\in\mathcal{B}} u\big(\tau, X^{\alpha[b],b}_{t,x}(\tau)\big).$$

Proof. We give only the proof for the lower value function, since the other is similar. We prove the two inequalities separately.

Step 1: $v(t,x) \leq \inf_{\beta\in\Delta}\sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big)$.

Let $\beta^\varepsilon_1\in\Delta$ be such that
$$\inf_{\beta\in\Delta}\sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big) \geq \sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta^\varepsilon_1[a]}_{t,x}(\tau)\big) - \varepsilon.$$
For each $y$, let $\beta^\varepsilon_y\in\Delta$ be such that
$$v(\tau,y) \geq \sup_{a\in\mathcal{A}} J\big(\tau, y; a, \beta^\varepsilon_y[a]\big) - \varepsilon.$$
We now define $\beta^\varepsilon$ by
$$\beta^\varepsilon[a] := \beta^\varepsilon_1[a]\,\oplus_\tau\,\beta^\varepsilon_{X^{a,\beta^\varepsilon_1[a]}_{t,x}(\tau)}[a].$$
It is straightforward to check that $\beta^\varepsilon\in\Delta$. Furthermore, for any $a\in\mathcal{A}$,
$$J\big(t,x;a,\beta^\varepsilon[a]\big) = J\big(\tau, X^{a,\beta^\varepsilon[a]}_{t,x}(\tau); a, \beta^\varepsilon[a]\big) = J\big(\tau, X^{a,\beta^\varepsilon_1[a]}_{t,x}(\tau); a, \beta^\varepsilon_{X^{a,\beta^\varepsilon_1[a]}_{t,x}(\tau)}[a]\big) \leq v\big(\tau, X^{a,\beta^\varepsilon_1[a]}_{t,x}(\tau)\big) + \varepsilon \leq \sup_{a'\in\mathcal{A}} v\big(\tau, X^{a',\beta^\varepsilon_1[a']}_{t,x}(\tau)\big) + \varepsilon \leq \inf_{\beta\in\Delta}\sup_{a'\in\mathcal{A}} v\big(\tau, X^{a',\beta[a']}_{t,x}(\tau)\big) + 2\varepsilon.$$
Since $a\in\mathcal{A}$ is arbitrary, we deduce that
$$\sup_{a\in\mathcal{A}} J\big(t,x;a,\beta^\varepsilon[a]\big) \leq \inf_{\beta\in\Delta}\sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big) + 2\varepsilon.$$
Thus,
$$v(t,x) \leq \inf_{\beta\in\Delta}\sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big) + 2\varepsilon.$$

Step 2: $v(t,x) \geq \inf_{\beta\in\Delta}\sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big)$.

Let $\beta^\varepsilon\in\Delta$ be such that
$$v(t,x) \geq \sup_{a\in\mathcal{A}} J\big(t,x;a,\beta^\varepsilon[a]\big) - \varepsilon,$$
and $a^\varepsilon_1\in\mathcal{A}$ be such that
$$\sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta^\varepsilon[a]}_{t,x}(\tau)\big) \leq v\big(\tau, X^{a^\varepsilon_1,\beta^\varepsilon[a^\varepsilon_1]}_{t,x}(\tau)\big) + \varepsilon.$$
Finally, define $\tilde\beta^\varepsilon$ by
$$\tilde\beta^\varepsilon[a] := \beta^\varepsilon\big[a^\varepsilon_1\oplus_\tau a\big],$$

and let $a^\varepsilon_2$ be such that
$$\sup_{a\in\mathcal{A}} J\big(\tau, X^{a^\varepsilon_1,\beta^\varepsilon[a^\varepsilon_1]}_{t,x}(\tau); a, \tilde\beta^\varepsilon[a]\big) \leq J\big(\tau, X^{a^\varepsilon_1,\beta^\varepsilon[a^\varepsilon_1]}_{t,x}(\tau); a^\varepsilon_2, \tilde\beta^\varepsilon[a^\varepsilon_2]\big) + \varepsilon.$$
Let $a^\varepsilon := a^\varepsilon_1\oplus_\tau a^\varepsilon_2$. Then, by the non-anticipativity property of $\beta^\varepsilon$ and by the definition of $\tilde\beta^\varepsilon$, we have
$$\beta^\varepsilon[a^\varepsilon] = \beta^\varepsilon[a^\varepsilon_1]\,\oplus_\tau\,\tilde\beta^\varepsilon[a^\varepsilon_2].$$
Thus,
$$v(t,x) \geq \sup_{a\in\mathcal{A}} J\big(t,x;a,\beta^\varepsilon[a]\big) - \varepsilon \geq J\big(t,x;a^\varepsilon,\beta^\varepsilon[a^\varepsilon]\big) - \varepsilon = J\big(\tau, X^{a^\varepsilon,\beta^\varepsilon[a^\varepsilon]}_{t,x}(\tau); a^\varepsilon, \beta^\varepsilon[a^\varepsilon]\big) - \varepsilon = J\big(\tau, X^{a^\varepsilon_1,\beta^\varepsilon[a^\varepsilon_1]}_{t,x}(\tau); a^\varepsilon_2, \tilde\beta^\varepsilon[a^\varepsilon_2]\big) - \varepsilon \geq \sup_{a\in\mathcal{A}} J\big(\tau, X^{a^\varepsilon_1,\beta^\varepsilon[a^\varepsilon_1]}_{t,x}(\tau); a, \tilde\beta^\varepsilon[a]\big) - 2\varepsilon \geq v\big(\tau, X^{a^\varepsilon_1,\beta^\varepsilon[a^\varepsilon_1]}_{t,x}(\tau)\big) - 2\varepsilon \geq \sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta^\varepsilon[a]}_{t,x}(\tau)\big) - 3\varepsilon \geq \inf_{\beta\in\Delta}\sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big) - 3\varepsilon.$$

3.5 Hamilton-Jacobi-Bellman-Isaacs equation

In this section we use the dynamic programming principle to prove that the value functions are viscosity solutions of the associated Hamilton-Jacobi-Bellman-Isaacs equations. Consider the Hamiltonians
$$H^-(t,x,p) := \inf_{a\in A}\sup_{b\in B} H^{a,b}(t,x,p), \qquad H^+(t,x,p) := \sup_{b\in B}\inf_{a\in A} H^{a,b}(t,x,p),$$
where
$$H^{a,b}(t,x,p) := -\big\langle p, \mu(t,x;a,b)\big\rangle.$$
Notice that $H^{a,b}(t,x,p)$ is continuous in $(t,x,p,a,b)$. Thus, due to the compactness of $A$ and $B$, $H^-$ and $H^+$ are also continuous.

Theorem 17. The lower value function $v$ is a viscosity solution of
$$-\partial_t v + H^-(\cdot, Dv)(t,x) = 0, \qquad (t,x)\in\,]0,T[\,\times\mathbb{R}^d. \tag{3.4}$$
The upper value function $u$ is a viscosity solution of
$$-\partial_t u + H^+(\cdot, Du)(t,x) = 0, \qquad (t,x)\in\,]0,T[\,\times\mathbb{R}^d. \tag{3.5}$$

Proof. We only give the proof for the lower value, since the other is analogous. We start with the supersolution property, arguing by contradiction. With that purpose in mind, consider $\phi\in C^1(S)$, $(\hat t,\hat x)\in\mathrm{argmin}(v-\phi)$ and suppose that
$$\big(-\partial_t\phi + H^-(\cdot, D\phi)\big)(\hat t,\hat x) \leq -3\delta, \tag{3.6}$$

where $\delta>0$. By Remark 101, in the Appendix, we can suppose that $(\hat t,\hat x)$ is a strict global minimizer. Thus, for all $(t,x)\in S$,
$$(v-\phi)(t,x) \geq (v-\phi)(\hat t,\hat x),$$
that is,
$$\phi(\hat t,\hat x) - \phi(t,x) \geq v(\hat t,\hat x) - v(t,x). \tag{3.7}$$
By (3.6), there is $a^*\in A$ such that, for all $b\in B$,
$$\big(-\partial_t\phi + H^{a^*,b}(\cdot, D\phi)\big)(\hat t,\hat x) \leq -2\delta.$$
Because $B$ is compact, $H^{a^*,b}(t,x)$ is continuous in $(t,x)$ uniformly with respect to $b$. Thus there is $R$ such that, for all $b\in B$,
$$\big(-\partial_t\phi + H^{a^*,b}(\cdot, D\phi)\big)(t,x) \leq -\delta, \qquad\text{for all }(t,x)\in B_R(\hat t,\hat x).$$
Let $\tau$ be sufficiently close to $\hat t$ so that, for all $a\in\mathcal{A}$, $b\in\mathcal{B}$,
$$\big(s, X^{a,b}_{\hat t,\hat x}(s)\big)\in B_R(\hat t,\hat x), \qquad\text{for all }s\in[\hat t,\tau].$$
Consider $\beta\in\Delta$ arbitrary. Then, because $\phi$ is $C^1$, we have
$$\phi\big(\tau, X^{a^*,\beta[a^*]}_{\hat t,\hat x}(\tau)\big) - \phi(\hat t,\hat x) = \int_{\hat t}^{\tau}\big(\partial_t\phi - H^{a^*,\beta[a^*](s)}(\cdot, D\phi)\big)\big(s, X^{a^*,\beta[a^*]}_{\hat t,\hat x}(s)\big)\,ds \geq (\tau-\hat t)\,\delta.$$
Thus, by (3.7),
$$v\big(\tau, X^{a^*,\beta[a^*]}_{\hat t,\hat x}(\tau)\big) - v(\hat t,\hat x) \geq \phi\big(\tau, X^{a^*,\beta[a^*]}_{\hat t,\hat x}(\tau)\big) - \phi(\hat t,\hat x) \geq (\tau-\hat t)\,\delta.$$
Hence
$$\sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{\hat t,\hat x}(\tau)\big) \geq v(\hat t,\hat x) + (\tau-\hat t)\,\delta,$$
and, since $\beta$ is arbitrary,
$$\inf_{\beta\in\Delta}\sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{\hat t,\hat x}(\tau)\big) \geq v(\hat t,\hat x) + (\tau-\hat t)\,\delta,$$
which contradicts the dynamic programming principle.

To prove that $v$ is a subsolution of (3.4) we also proceed by contradiction. Consider $\phi\in C^1(S)$, $(\hat t,\hat x)\in\mathrm{argmax}(v-\phi)$ and suppose that
$$\big(-\partial_t\phi + H^-(\cdot, D\phi)\big)(\hat t,\hat x) \geq 4\delta. \tag{3.8}$$
By Remark 101, in the Appendix, we can suppose that $(\hat t,\hat x)$ is a strict global maximizer. Thus, for all $(t,x)\in S$,
$$(v-\phi)(t,x) \leq (v-\phi)(\hat t,\hat x),$$
that is,
$$\phi(\hat t,\hat x) - \phi(t,x) \leq v(\hat t,\hat x) - v(t,x). \tag{3.9}$$

By (3.8), for each $a\in A$ there is $b_a\in B$ such that
$$\big(-\partial_t\phi + H^{a,b_a}(\cdot, D\phi)\big)(\hat t,\hat x) \geq 3\delta.$$
Furthermore, by continuity of $H^{a,b}$ with respect to $a$, for each $a$ there is $r_a$ such that, for all $\tilde a\in B_{r_a}(a)$,
$$\big(-\partial_t\phi + H^{\tilde a,b_a}(\cdot, D\phi)\big)(\hat t,\hat x) \geq 2\delta.$$
Since $A$ is compact and $\{B_{r_a}(a): a\in A\}$ is an open covering of $A$, there is a finite collection $\{(a_i,r_i)\}$ such that $\{B_{r_i}(a_i)\}$ covers $A$. We then define $\Lambda_1 := B_{r_1}(a_1)$, $\Lambda_{i+1} := B_{r_{i+1}}(a_{i+1})\setminus\bigcup_{k=1}^{i}\Lambda_k$. Now define $\psi: A\to B$ by
$$\psi(a) := \sum_i\mathbf{1}_{\Lambda_i}(a)\,b_{a_i},$$
and $\beta$ by $\beta[a](s) := \psi\big(a(s)\big)$. Then, for all $a\in A$,
$$\big(-\partial_t\phi + H^{a,\psi(a)}(\cdot, D\phi)\big)(\hat t,\hat x) \geq 2\delta.$$
Because $A, B$ are compact, $H^{a,b}(t,x)$ is continuous in $(t,x)$ uniformly with respect to $(a,b)$. Thus, there is $R$ such that, for all $a\in A$,
$$\big(-\partial_t\phi + H^{a,\psi(a)}(\cdot, D\phi)\big)(t,x) \geq \delta, \qquad\text{for all }(t,x)\in B_R(\hat t,\hat x).$$
Let $\tau$ be sufficiently close to $\hat t$ so that, for all $a\in\mathcal{A}$, $b\in\mathcal{B}$,
$$\big(s, X^{a,b}_{\hat t,\hat x}(s)\big)\in B_R(\hat t,\hat x), \qquad\text{for all }s\in[\hat t,\tau].$$
Then, because $\phi$ is $C^1$, we have, for an arbitrary $a\in\mathcal{A}$,
$$\phi\big(\tau, X^{a,\beta[a]}_{\hat t,\hat x}(\tau)\big) - \phi(\hat t,\hat x) = \int_{\hat t}^{\tau}\big(\partial_t\phi - H^{a(s),\psi(a(s))}(\cdot, D\phi)\big)\big(s, X^{a,\beta[a]}_{\hat t,\hat x}(s)\big)\,ds \leq -(\tau-\hat t)\,\delta.$$
Thus, by (3.9),
$$v\big(\tau, X^{a,\beta[a]}_{\hat t,\hat x}(\tau)\big) - v(\hat t,\hat x) \leq \phi\big(\tau, X^{a,\beta[a]}_{\hat t,\hat x}(\tau)\big) - \phi(\hat t,\hat x) \leq -(\tau-\hat t)\,\delta.$$
Taking the supremum in $a$ we get
$$\sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{\hat t,\hat x}(\tau)\big) \leq v(\hat t,\hat x) - (\tau-\hat t)\,\delta.$$
Thus
$$\inf_{\beta\in\Delta}\sup_{a\in\mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{\hat t,\hat x}(\tau)\big) \leq v(\hat t,\hat x) - (\tau-\hat t)\,\delta,$$
which contradicts the dynamic programming principle.

It remains to establish uniqueness of solution for the HJBI equation.

Theorem 18. The lower value $v$ is the minimal supersolution and the maximal subsolution of (3.4) in the class of bounded functions. In particular, it is the unique viscosity solution of (3.4) in that class. An analogous result holds for the upper value, $u$.

For the proof of the previous Theorem we refer the reader to [3, p. 442]. We will later establish uniqueness of solution for the analogous second-order HJBI equation appearing in the stochastic scenario; see Chapter 4. We can finally prove (3.3).

Corollary 19. Let $v, u$ be the lower and upper values of the game, respectively. Then $v\leq u$.

Proof. Notice that $H^-\geq H^+$. This implies that $v$ is a subsolution of (3.5). Since, by the previous Theorem, $u$ is the maximal subsolution of (3.5), we conclude that $v\leq u$.

Using the same arguments we can also establish a criterion for the game to have a value:

Corollary 20 (Isaacs' condition). If $H^- = H^+$ then the game has a value, that is, $v = u$.

Proof. This follows directly from the fact that in this case equations (3.4) and (3.5) are the same, hence, by uniqueness of solution, we have $v = u$.
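Isaacs' condition can fail even for smooth data. The sketch below (our own example, not from the thesis) evaluates $H^-$ and $H^+$ on control grids for two illustrative one-dimensional drifts with $A = B = [-1,1]$: for $\mu(a,b) = a + b$ the two Hamiltonians coincide, while for $\mu(a,b) = (a-b)^2$ they differ, so the game need not have a value.

```python
import numpy as np

# Numerical check of Isaacs' condition (Corollary 20) for two illustrative
# one-dimensional drifts; the drifts, the control sets A = B = [-1, 1] and
# the grids are our own examples, not taken from the thesis.
# H^{a,b}(p) = -p * mu(a, b);  H^- = inf_a sup_b,  H^+ = sup_b inf_a.

A = np.linspace(-1.0, 1.0, 201)    # player 1 controls (rows)
B = np.linspace(-1.0, 1.0, 201)    # player 2 controls (columns)
p = 1.0                            # a fixed value of Dv

def lower_upper(mu):
    H = -p * mu(A[:, None], B[None, :])          # H[i, j] = H^{a_i, b_j}(p)
    H_minus = H.max(axis=1).min()                # inf_a sup_b
    H_plus = H.min(axis=0).max()                 # sup_b inf_a
    return H_minus, H_plus

print("mu(a,b) = a + b      ->", lower_upper(lambda a, b: a + b))         # equal
print("mu(a,b) = (a - b)^2  ->", lower_upper(lambda a, b: (a - b) ** 2))  # differ
```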


More information

Continuous Time Finance

Continuous Time Finance Continuous Time Finance Lisbon 2013 Tomas Björk Stockholm School of Economics Tomas Björk, 2013 Contents Stochastic Calculus (Ch 4-5). Black-Scholes (Ch 6-7. Completeness and hedging (Ch 8-9. The martingale

More information

BSDEs and PDEs with discontinuous coecients Applications to homogenization K. Bahlali, A. Elouain, E. Pardoux. Jena, March 2009

BSDEs and PDEs with discontinuous coecients Applications to homogenization K. Bahlali, A. Elouain, E. Pardoux. Jena, March 2009 BSDEs and PDEs with discontinuous coecients Applications to homogenization K. Bahlali, A. Elouain, E. Pardoux. Jena, 16-20 March 2009 1 1) L p viscosity solution to 2nd order semilinear parabolic PDEs

More information

Optimal stopping for non-linear expectations Part I

Optimal stopping for non-linear expectations Part I Stochastic Processes and their Applications 121 (2011) 185 211 www.elsevier.com/locate/spa Optimal stopping for non-linear expectations Part I Erhan Bayraktar, Song Yao Department of Mathematics, University

More information

n E(X t T n = lim X s Tn = X s

n E(X t T n = lim X s Tn = X s Stochastic Calculus Example sheet - Lent 15 Michael Tehranchi Problem 1. Let X be a local martingale. Prove that X is a uniformly integrable martingale if and only X is of class D. Solution 1. If If direction:

More information

Optimal Stopping Problems and American Options

Optimal Stopping Problems and American Options Optimal Stopping Problems and American Options Nadia Uys A dissertation submitted to the Faculty of Science, University of the Witwatersrand, in fulfilment of the requirements for the degree of Master

More information

Robust control and applications in economic theory

Robust control and applications in economic theory Robust control and applications in economic theory In honour of Professor Emeritus Grigoris Kalogeropoulos on the occasion of his retirement A. N. Yannacopoulos Department of Statistics AUEB 24 May 2013

More information

Weak convergence and large deviation theory

Weak convergence and large deviation theory First Prev Next Go To Go Back Full Screen Close Quit 1 Weak convergence and large deviation theory Large deviation principle Convergence in distribution The Bryc-Varadhan theorem Tightness and Prohorov

More information

On the martingales obtained by an extension due to Saisho, Tanemura and Yor of Pitman s theorem

On the martingales obtained by an extension due to Saisho, Tanemura and Yor of Pitman s theorem On the martingales obtained by an extension due to Saisho, Tanemura and Yor of Pitman s theorem Koichiro TAKAOKA Dept of Applied Physics, Tokyo Institute of Technology Abstract M Yor constructed a family

More information

Viscosity Solutions for Dummies (including Economists)

Viscosity Solutions for Dummies (including Economists) Viscosity Solutions for Dummies (including Economists) Online Appendix to Income and Wealth Distribution in Macroeconomics: A Continuous-Time Approach written by Benjamin Moll August 13, 2017 1 Viscosity

More information

On the Bellman equation for control problems with exit times and unbounded cost functionals 1

On the Bellman equation for control problems with exit times and unbounded cost functionals 1 On the Bellman equation for control problems with exit times and unbounded cost functionals 1 Michael Malisoff Department of Mathematics, Hill Center-Busch Campus Rutgers University, 11 Frelinghuysen Road

More information

ON THE PATHWISE UNIQUENESS OF SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS

ON THE PATHWISE UNIQUENESS OF SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS PORTUGALIAE MATHEMATICA Vol. 55 Fasc. 4 1998 ON THE PATHWISE UNIQUENESS OF SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS C. Sonoc Abstract: A sufficient condition for uniqueness of solutions of ordinary

More information

Point Process Control

Point Process Control Point Process Control The following note is based on Chapters I, II and VII in Brémaud s book Point Processes and Queues (1981). 1 Basic Definitions Consider some probability space (Ω, F, P). A real-valued

More information

Some SDEs with distributional drift Part I : General calculus. Flandoli, Franco; Russo, Francesco; Wolf, Jochen

Some SDEs with distributional drift Part I : General calculus. Flandoli, Franco; Russo, Francesco; Wolf, Jochen Title Author(s) Some SDEs with distributional drift Part I : General calculus Flandoli, Franco; Russo, Francesco; Wolf, Jochen Citation Osaka Journal of Mathematics. 4() P.493-P.54 Issue Date 3-6 Text

More information

Singular Perturbations of Stochastic Control Problems with Unbounded Fast Variables

Singular Perturbations of Stochastic Control Problems with Unbounded Fast Variables Singular Perturbations of Stochastic Control Problems with Unbounded Fast Variables Joao Meireles joint work with Martino Bardi and Guy Barles University of Padua, Italy Workshop "New Perspectives in Optimal

More information

Landesman-Lazer type results for second order Hamilton-Jacobi-Bellman equations

Landesman-Lazer type results for second order Hamilton-Jacobi-Bellman equations Author manuscript, published in "Journal of Functional Analysis 258, 12 (2010) 4154-4182" Landesman-Lazer type results for second order Hamilton-Jacobi-Bellman equations Patricio FELMER, Alexander QUAAS,

More information

Proving the Regularity of the Minimal Probability of Ruin via a Game of Stopping and Control

Proving the Regularity of the Minimal Probability of Ruin via a Game of Stopping and Control Proving the Regularity of the Minimal Probability of Ruin via a Game of Stopping and Control Erhan Bayraktar University of Michigan joint work with Virginia R. Young, University of Michigan K αρλoβασi,

More information

of space-time diffusions

of space-time diffusions Optimal investment for all time horizons and Martin boundary of space-time diffusions Sergey Nadtochiy and Michael Tehranchi October 5, 2012 Abstract This paper is concerned with the axiomatic foundation

More information

Some aspects of the mean-field stochastic target problem

Some aspects of the mean-field stochastic target problem Some aspects of the mean-field stochastic target problem Quenched Mass ransport of Particles owards a arget Boualem Djehiche KH Royal Institute of echnology, Stockholm (Joint work with B. Bouchard and

More information

Partial Differential Equations with Applications to Finance Seminar 1: Proving and applying Dynkin s formula

Partial Differential Equations with Applications to Finance Seminar 1: Proving and applying Dynkin s formula Partial Differential Equations with Applications to Finance Seminar 1: Proving and applying Dynkin s formula Group 4: Bertan Yilmaz, Richard Oti-Aboagye and Di Liu May, 15 Chapter 1 Proving Dynkin s formula

More information

Noncooperative continuous-time Markov games

Noncooperative continuous-time Markov games Morfismos, Vol. 9, No. 1, 2005, pp. 39 54 Noncooperative continuous-time Markov games Héctor Jasso-Fuentes Abstract This work concerns noncooperative continuous-time Markov games with Polish state and

More information

Weak solutions of mean-field stochastic differential equations

Weak solutions of mean-field stochastic differential equations Weak solutions of mean-field stochastic differential equations Juan Li School of Mathematics and Statistics, Shandong University (Weihai), Weihai 26429, China. Email: juanli@sdu.edu.cn Based on joint works

More information

Stochastic Volatility and Correction to the Heat Equation

Stochastic Volatility and Correction to the Heat Equation Stochastic Volatility and Correction to the Heat Equation Jean-Pierre Fouque, George Papanicolaou and Ronnie Sircar Abstract. From a probabilist s point of view the Twentieth Century has been a century

More information

Predicting the Time of the Ultimate Maximum for Brownian Motion with Drift

Predicting the Time of the Ultimate Maximum for Brownian Motion with Drift Proc. Math. Control Theory Finance Lisbon 27, Springer, 28, 95-112 Research Report No. 4, 27, Probab. Statist. Group Manchester 16 pp Predicting the Time of the Ultimate Maximum for Brownian Motion with

More information

Exam February h

Exam February h Master 2 Mathématiques et Applications PUF Ho Chi Minh Ville 2009/10 Viscosity solutions, HJ Equations and Control O.Ley (INSA de Rennes) Exam February 2010 3h Written-by-hands documents are allowed. Printed

More information

Stochastic Differential Equations.

Stochastic Differential Equations. Chapter 3 Stochastic Differential Equations. 3.1 Existence and Uniqueness. One of the ways of constructing a Diffusion process is to solve the stochastic differential equation dx(t) = σ(t, x(t)) dβ(t)

More information

HJ equations. Reachability analysis. Optimal control problems

HJ equations. Reachability analysis. Optimal control problems HJ equations. Reachability analysis. Optimal control problems Hasnaa Zidani 1 1 ENSTA Paris-Tech & INRIA-Saclay Graz, 8-11 September 2014 H. Zidani (ENSTA & Inria) HJ equations. Reachability analysis -

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 218. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

Optimal Control under Stochastic Target Constraints

Optimal Control under Stochastic Target Constraints Optimal Control under Stochastic Target Constraints Bruno Bouchard CEREMADE, Université Paris Dauphine and CREST-ENSAE bouchard@ceremade.dauphine.fr Romuald Elie CEREMADE, Université Paris Dauphine and

More information

Uniformly Uniformly-ergodic Markov chains and BSDEs

Uniformly Uniformly-ergodic Markov chains and BSDEs Uniformly Uniformly-ergodic Markov chains and BSDEs Samuel N. Cohen Mathematical Institute, University of Oxford (Based on joint work with Ying Hu, Robert Elliott, Lukas Szpruch) Centre Henri Lebesgue,

More information

Optimal Trade Execution with Instantaneous Price Impact and Stochastic Resilience

Optimal Trade Execution with Instantaneous Price Impact and Stochastic Resilience Optimal Trade Execution with Instantaneous Price Impact and Stochastic Resilience Ulrich Horst 1 Humboldt-Universität zu Berlin Department of Mathematics and School of Business and Economics Vienna, Nov.

More information

Skorokhod Embeddings In Bounded Time

Skorokhod Embeddings In Bounded Time Skorokhod Embeddings In Bounded Time Stefan Ankirchner Institute for Applied Mathematics University of Bonn Endenicher Allee 6 D-535 Bonn ankirchner@hcm.uni-bonn.de Philipp Strack Bonn Graduate School

More information

Bernardo D Auria Stochastic Processes /12. Notes. March 29 th, 2012

Bernardo D Auria Stochastic Processes /12. Notes. March 29 th, 2012 1 Stochastic Calculus Notes March 9 th, 1 In 19, Bachelier proposed for the Paris stock exchange a model for the fluctuations affecting the price X(t) of an asset that was given by the Brownian motion.

More information

Optimal Control. Macroeconomics II SMU. Ömer Özak (SMU) Economic Growth Macroeconomics II 1 / 112

Optimal Control. Macroeconomics II SMU. Ömer Özak (SMU) Economic Growth Macroeconomics II 1 / 112 Optimal Control Ömer Özak SMU Macroeconomics II Ömer Özak (SMU) Economic Growth Macroeconomics II 1 / 112 Review of the Theory of Optimal Control Section 1 Review of the Theory of Optimal Control Ömer

More information

A NOTE ON STOCHASTIC INTEGRALS AS L 2 -CURVES

A NOTE ON STOCHASTIC INTEGRALS AS L 2 -CURVES A NOTE ON STOCHASTIC INTEGRALS AS L 2 -CURVES STEFAN TAPPE Abstract. In a work of van Gaans (25a) stochastic integrals are regarded as L 2 -curves. In Filipović and Tappe (28) we have shown the connection

More information

Snell envelope and the pricing of American-style contingent claims

Snell envelope and the pricing of American-style contingent claims Snell envelope and the pricing of -style contingent claims 80-646-08 Stochastic calculus I Geneviève Gauthier HEC Montréal Notation The The riskless security Market model The following text draws heavily

More information

Utility Maximization in Hidden Regime-Switching Markets with Default Risk

Utility Maximization in Hidden Regime-Switching Markets with Default Risk Utility Maximization in Hidden Regime-Switching Markets with Default Risk José E. Figueroa-López Department of Mathematics and Statistics Washington University in St. Louis figueroa-lopez@wustl.edu pages.wustl.edu/figueroa

More information

Lecture 4: Ito s Stochastic Calculus and SDE. Seung Yeal Ha Dept of Mathematical Sciences Seoul National University

Lecture 4: Ito s Stochastic Calculus and SDE. Seung Yeal Ha Dept of Mathematical Sciences Seoul National University Lecture 4: Ito s Stochastic Calculus and SDE Seung Yeal Ha Dept of Mathematical Sciences Seoul National University 1 Preliminaries What is Calculus? Integral, Differentiation. Differentiation 2 Integral

More information

Controlled Markov Processes and Viscosity Solutions

Controlled Markov Processes and Viscosity Solutions Controlled Markov Processes and Viscosity Solutions Wendell H. Fleming, H. Mete Soner Controlled Markov Processes and Viscosity Solutions Second Edition Wendell H. Fleming H.M. Soner Div. Applied Mathematics

More information

ON WEAKLY NONLINEAR BACKWARD PARABOLIC PROBLEM

ON WEAKLY NONLINEAR BACKWARD PARABOLIC PROBLEM ON WEAKLY NONLINEAR BACKWARD PARABOLIC PROBLEM OLEG ZUBELEVICH DEPARTMENT OF MATHEMATICS THE BUDGET AND TREASURY ACADEMY OF THE MINISTRY OF FINANCE OF THE RUSSIAN FEDERATION 7, ZLATOUSTINSKY MALIY PER.,

More information

Minimization of ruin probabilities by investment under transaction costs

Minimization of ruin probabilities by investment under transaction costs Minimization of ruin probabilities by investment under transaction costs Stefan Thonhauser DSA, HEC, Université de Lausanne 13 th Scientific Day, Bonn, 3.4.214 Outline Introduction Risk models and controls

More information

A probabilistic approach to second order variational inequalities with bilateral constraints

A probabilistic approach to second order variational inequalities with bilateral constraints Proc. Indian Acad. Sci. (Math. Sci.) Vol. 113, No. 4, November 23, pp. 431 442. Printed in India A probabilistic approach to second order variational inequalities with bilateral constraints MRINAL K GHOSH,

More information

March 16, Abstract. We study the problem of portfolio optimization under the \drawdown constraint" that the

March 16, Abstract. We study the problem of portfolio optimization under the \drawdown constraint that the ON PORTFOLIO OPTIMIZATION UNDER \DRAWDOWN" CONSTRAINTS JAKSA CVITANIC IOANNIS KARATZAS y March 6, 994 Abstract We study the problem of portfolio optimization under the \drawdown constraint" that the wealth

More information

Controlled Diffusions and Hamilton-Jacobi Bellman Equations

Controlled Diffusions and Hamilton-Jacobi Bellman Equations Controlled Diffusions and Hamilton-Jacobi Bellman Equations Emo Todorov Applied Mathematics and Computer Science & Engineering University of Washington Winter 2014 Emo Todorov (UW) AMATH/CSE 579, Winter

More information

Numerical Approximations of Stochastic Optimal Stopping and Control Problems

Numerical Approximations of Stochastic Optimal Stopping and Control Problems Numerical Approximations of Stochastic Optimal Stopping and Control Problems David Šiška Doctor of Philosophy University of Edinburgh 9th November 27 Abstract We study numerical approximations for the

More information

Solution for Problem 7.1. We argue by contradiction. If the limit were not infinite, then since τ M (ω) is nondecreasing we would have

Solution for Problem 7.1. We argue by contradiction. If the limit were not infinite, then since τ M (ω) is nondecreasing we would have 362 Problem Hints and Solutions sup g n (ω, t) g(ω, t) sup g(ω, s) g(ω, t) µ n (ω). t T s,t: s t 1/n By the uniform continuity of t g(ω, t) on [, T], one has for each ω that µ n (ω) as n. Two applications

More information

Information and Credit Risk

Information and Credit Risk Information and Credit Risk M. L. Bedini Université de Bretagne Occidentale, Brest - Friedrich Schiller Universität, Jena Jena, March 2011 M. L. Bedini (Université de Bretagne Occidentale, Brest Information

More information

Impulse Control and Optimal Stopping. Yann-Shin Aaron Chen

Impulse Control and Optimal Stopping. Yann-Shin Aaron Chen Impulse Control and Optimal Stopping by Yann-Shin Aaron Chen A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Mathematics in the Graduate Division

More information

Régularité des équations de Hamilton-Jacobi du premier ordre et applications aux jeux à champ moyen

Régularité des équations de Hamilton-Jacobi du premier ordre et applications aux jeux à champ moyen Régularité des équations de Hamilton-Jacobi du premier ordre et applications aux jeux à champ moyen Daniela Tonon en collaboration avec P. Cardaliaguet et A. Porretta CEREMADE, Université Paris-Dauphine,

More information

Quasi-sure Stochastic Analysis through Aggregation

Quasi-sure Stochastic Analysis through Aggregation Quasi-sure Stochastic Analysis through Aggregation H. Mete Soner Nizar Touzi Jianfeng Zhang March 7, 21 Abstract This paper is on developing stochastic analysis simultaneously under a general family of

More information

The Pedestrian s Guide to Local Time

The Pedestrian s Guide to Local Time The Pedestrian s Guide to Local Time Tomas Björk, Department of Finance, Stockholm School of Economics, Box 651, SE-113 83 Stockholm, SWEDEN tomas.bjork@hhs.se November 19, 213 Preliminary version Comments

More information

Optimal Execution Tracking a Benchmark

Optimal Execution Tracking a Benchmark Optimal Execution Tracking a Benchmark René Carmona Bendheim Center for Finance Department of Operations Research & Financial Engineering Princeton University Princeton, June 20, 2013 Optimal Execution

More information

On the bang-bang property of time optimal controls for infinite dimensional linear systems

On the bang-bang property of time optimal controls for infinite dimensional linear systems On the bang-bang property of time optimal controls for infinite dimensional linear systems Marius Tucsnak Université de Lorraine Paris, 6 janvier 2012 Notation and problem statement (I) Notation: X (the

More information

1. Stochastic Processes and filtrations

1. Stochastic Processes and filtrations 1. Stochastic Processes and 1. Stoch. pr., A stochastic process (X t ) t T is a collection of random variables on (Ω, F) with values in a measurable space (S, S), i.e., for all t, In our case X t : Ω S

More information

Dynamic Risk Measures and Nonlinear Expectations with Markov Chain noise

Dynamic Risk Measures and Nonlinear Expectations with Markov Chain noise Dynamic Risk Measures and Nonlinear Expectations with Markov Chain noise Robert J. Elliott 1 Samuel N. Cohen 2 1 Department of Commerce, University of South Australia 2 Mathematical Insitute, University

More information

Optimal investment with high-watermark fee in a multi-dimensional jump diffusion model

Optimal investment with high-watermark fee in a multi-dimensional jump diffusion model Optimal investment with high-watermark fee in a multi-dimensional jump diffusion model Karel Janeček Zheng Li Mihai Sîrbu August 2, 218 Abstract This paper studies the problem of optimal investment and

More information

The speculator: a case study of the dynamic programming equation of an infinite horizon stochastic problem

The speculator: a case study of the dynamic programming equation of an infinite horizon stochastic problem The speculator: a case study of the dynamic programming equation of an infinite horizon stochastic problem Pavol Brunovský Faculty of Mathematics, Physics and Informatics Department of Applied Mathematics

More information

Numerical Approximations of Stochastic Optimal Stopping and Control Problems

Numerical Approximations of Stochastic Optimal Stopping and Control Problems Numerical Approximations of Stochastic Optimal Stopping and Control Problems David Šiška Doctor of Philosophy University of Edinburgh 9th November 27 Abstract We study numerical approximations for the

More information

Stability of Stochastic Differential Equations

Stability of Stochastic Differential Equations Lyapunov stability theory for ODEs s Stability of Stochastic Differential Equations Part 1: Introduction Department of Mathematics and Statistics University of Strathclyde Glasgow, G1 1XH December 2010

More information

On the submartingale / supermartingale property of diffusions in natural scale

On the submartingale / supermartingale property of diffusions in natural scale On the submartingale / supermartingale property of diffusions in natural scale Alexander Gushchin Mikhail Urusov Mihail Zervos November 13, 214 Abstract Kotani 5 has characterised the martingale property

More information

2 Lebesgue integration

2 Lebesgue integration 2 Lebesgue integration 1. Let (, A, µ) be a measure space. We will always assume that µ is complete, otherwise we first take its completion. The example to have in mind is the Lebesgue measure on R n,

More information

Optimal portfolio strategies under partial information with expert opinions

Optimal portfolio strategies under partial information with expert opinions 1 / 35 Optimal portfolio strategies under partial information with expert opinions Ralf Wunderlich Brandenburg University of Technology Cottbus, Germany Joint work with Rüdiger Frey Research Seminar WU

More information

I forgot to mention last time: in the Ito formula for two standard processes, putting

I forgot to mention last time: in the Ito formula for two standard processes, putting I forgot to mention last time: in the Ito formula for two standard processes, putting dx t = a t dt + b t db t dy t = α t dt + β t db t, and taking f(x, y = xy, one has f x = y, f y = x, and f xx = f yy

More information

Barrier Option Hedging under Constraints: A Viscosity Approach

Barrier Option Hedging under Constraints: A Viscosity Approach SFB 649 Discussion Paper 2006-022 Barrier Option Hedging under Constraints: A Viscosity Approach Imen Bentahar* Bruno Bouchard** * Technische Universität Berlin, Germany ** Université Paris VI, LPMA, and

More information

ROOT S BARRIER, VISCOSITY SOLUTIONS OF OBSTACLE PROBLEMS AND REFLECTED FBSDES

ROOT S BARRIER, VISCOSITY SOLUTIONS OF OBSTACLE PROBLEMS AND REFLECTED FBSDES OOT S BAIE, VISCOSITY SOLUTIONS OF OBSTACLE POBLEMS AND EFLECTED FBSDES HAALD OBEHAUSE AND GONÇALO DOS EIS Abstract. Following work of Dupire 005, Carr Lee 010 and Cox Wang 011 on connections between oot

More information

Mean field games and related models

Mean field games and related models Mean field games and related models Fabio Camilli SBAI-Dipartimento di Scienze di Base e Applicate per l Ingegneria Facoltà di Ingegneria Civile ed Industriale Email: Camilli@dmmm.uniroma1.it Web page:

More information

Mean-field SDE driven by a fractional BM. A related stochastic control problem

Mean-field SDE driven by a fractional BM. A related stochastic control problem Mean-field SDE driven by a fractional BM. A related stochastic control problem Rainer Buckdahn, Université de Bretagne Occidentale, Brest Durham Symposium on Stochastic Analysis, July 1th to July 2th,

More information