A SURVEY OF PROPERTIES OF FINITE HORIZON DIFFERENTIAL GAMES UNDER ISAACS CONDITION


BOTAO WU

Date: August 30.

Abstract. In this paper, we attempt to answer the following questions about differential games: 1) When does a two-player, zero-sum differential game have a value, and is it unique? 2) How can we characterize the value of the game and relate it to some Hamilton-Jacobi equation? 3) When, in general, does a Nash equilibrium of the differential game exist? To answer these questions, we use control theory and viscosity solution approaches, appealing to the Isaacs condition: a relation between the upper and lower Hamiltonians.

Contents
1. Introduction to differential games and optimal control
2. Strategies
3. Properties of the value function
4. Viscosity solutions and another verification theorem
5. Characterization and existence of Nash equilibrium
Acknowledgments
References

1. Introduction to differential games and optimal control

We consider the following ODE: for $t_0 \le t < T$ and $x \in \mathbb{R}^m$,
$$\begin{cases} \dot{x}(t) = f(t, x(t), u_1(t), u_2(t), \dots, u_N(t)), \\ x(t_0) = x_0. \end{cases} \tag{1.1}$$
We can interpret the dynamics of this equation from a game theory perspective. Each player $i = 1, 2, \dots, N$ chooses a control $u_i(t) : [t_0, T] \to U_i \subset \mathbb{R}^m$. We denote by $\Phi_i(t_0)$ the set of all such controls $u_i$ that are feasible for player $i$ at time $t_0$. The state variable $x(t)$ is the solution of the above system and a response to the controls $u_1, u_2, \dots, u_N$. In order for (1.1) to have a unique solution, we assume that $f : [t_0, T] \times \mathbb{R}^m \times U_1 \times \dots \times U_N \to \mathbb{R}^m$ is bounded and Lipschitz continuous in $x$:
$$|f(t, x, u_1, u_2, \dots, u_N) - f(t, y, u_1, u_2, \dots, u_N)| \le C_1 |x - y| \tag{1.2}$$
for some $C_1 \in \mathbb{R}$ and all $t \in [t_0, T]$.
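The role of assumption (1.2) can be seen numerically. The following is a minimal sketch, not part of the original development: it integrates (1.1) for two players with forward Euler and checks that two trajectories started from nearby states stay within the distance $|x_0 - y_0| e^{C_1 (T - t_0)}$ that a Lipschitz constant $C_1$ allows. The dynamics `f`, the two open-loop controls, the step count, and the initial states are all hypothetical choices made for illustration.

```python
import math

def f(t, x, u, v):
    """Hypothetical bounded dynamics, Lipschitz in x with constant C1 = 1."""
    return math.tanh(x) + u - v

def euler_trajectory(x0, u, v, t0=0.0, T=1.0, steps=1000):
    """Forward-Euler approximation of x(T) for x' = f(t, x, u(t), v(t)), x(t0) = x0."""
    dt = (T - t0) / steps
    t, x = t0, x0
    for _ in range(steps):
        x += dt * f(t, x, u(t), v(t))
        t += dt
    return x

# Open-loop controls (functions of time only); both are illustrative choices.
u = lambda t: math.sin(t)
v = lambda t: 0.5 * math.cos(3.0 * t)

C1, t0, T = 1.0, 0.0, 1.0
x0, y0 = 0.30, 0.31

# Distance between the two terminal states versus the Lipschitz-based bound.
gap = abs(euler_trajectory(x0, u, v, t0, T) - euler_trajectory(y0, u, v, t0, T))
bound = abs(x0 - y0) * math.exp(C1 * (T - t0))
print(gap, bound, gap <= bound)
```

Both players use the same controls in the two runs, so the only difference between the trajectories is the initial state; this is the setting in which the Lipschitz bound applies.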

We define
$$P_i(t_0, x_0, u_1, u_2, \dots, u_N) = \int_{t_0}^{T} h_i(t, x(t), u_1(t), u_2(t), \dots, u_N(t))\,dt + g_i(x(T))$$
to be the pay-off functional of player $i$. We can interpret $g_i : \mathbb{R}^m \to \mathbb{R}$ as the terminal pay-off, which is bounded and Lipschitz continuous in $x$:
$$|g_i(x) - g_i(y)| \le C_2 |x - y|. \tag{1.3}$$
Similarly, $h_i : [t_0, T] \times \mathbb{R}^m \times U_1 \times \dots \times U_N \to \mathbb{R}$ can be understood as the running cost. We also assume that $h_i$ is bounded and Lipschitz continuous in $x$:
$$|h_i(t, x, u_1, u_2, \dots, u_N) - h_i(t, y, u_1, u_2, \dots, u_N)| \le C_3 |x - y| \tag{1.4}$$
for some $C_2, C_3 \in \mathbb{R}$ and for all $t \in [t_0, T]$.

We consider two types of controls. The first is the open-loop control: a player using it makes no observation of the current state of the system. The second is the feedback control, which enables the player to observe the current state of the system. Formally:

Definition 1.5. $u_i(t)$ is an open-loop control if $u_i(t) = \phi(t)$, where $\phi : [t_0, T] \to U_i$, and $u_i(t)$ is a feedback control if $u_i(t) = \phi(t, x(t))$, where $\phi : [t_0, T] \times \mathbb{R}^m \to U_i$.

Let us now consider a two-player, zero-sum game. This game is determined by a single pay-off, $P = P_1 = -P_2$, since, by definition, one player's gain is exactly balanced by the other player's loss. We assume that the first player tries to maximize the pay-off by choosing a feedback control $\phi_1 \in \Phi_1$, while the second player minimizes the pay-off by choosing a feedback control $\phi_2 \in \Phi_2$, where $\Phi_1$ and $\Phi_2$ are player 1's and player 2's feasible sets of controls respectively. We write the pay-off as
$$P(t_0, x_0, \phi_1, \phi_2) = \int_{t_0}^{T} h(t, x(t), \phi_1(t, x(t)), \phi_2(t, x(t)))\,dt + g(x(T)),$$
where $x$ is the solution to the following system:
$$\begin{cases} \dot{x}(t) = f(x(t), \phi_1(t, x(t)), \phi_2(t, x(t))), \\ x(t_0) = x_0. \end{cases} \tag{1.6}$$
We assume that the functions $g$ and $h$ satisfy assumptions (1.3) and (1.4).

Definition 1.7. The upper value function of a two-player, zero-sum differential game is given by
$$V^+(t_0, x_0) = \inf_{\phi_2 \in \Phi_2} \sup_{\phi_1 \in \Phi_1} P(t_0, x_0, \phi_1, \phi_2).$$
The lower value function is given by
$$V^-(t_0, x_0) = \sup_{\phi_1 \in \Phi_1} \inf_{\phi_2 \in \Phi_2} P(t_0, x_0, \phi_1, \phi_2).$$
One can show that $V^- \le V^+$. We say that the game has a value if $V^+ = V^-$, in which case the value is $V := V^+ = V^-$.

Definition 1.8. We define the upper and lower Hamiltonian functions:
$$H^+(t, x, p) = \inf_{\phi_2 \in \Phi_2} \sup_{\phi_1 \in \Phi_1} \{ h(t, x, \phi_1, \phi_2) + p \cdot f(x, \phi_1, \phi_2) \},$$
$$H^-(t, x, p) = \sup_{\phi_1 \in \Phi_1} \inf_{\phi_2 \in \Phi_2} \{ h(t, x, \phi_1, \phi_2) + p \cdot f(x, \phi_1, \phi_2) \}.$$
We assume that there exist
$$\phi_1^*(t, x, p) \in \arg\sup_{\phi_1 \in \Phi_1} \inf_{\phi_2 \in \Phi_2} \{ h(t, x, \phi_1, \phi_2) + p \cdot f(x, \phi_1, \phi_2) \}$$
and
$$\phi_2^*(t, x, p) \in \arg\inf_{\phi_2 \in \Phi_2} \sup_{\phi_1 \in \Phi_1} \{ h(t, x, \phi_1, \phi_2) + p \cdot f(x, \phi_1, \phi_2) \}$$
such that the Isaacs condition, the equality of the two Hamiltonian functions, holds:
$$H(t, x, p) := H^+(t, x, p) = H^-(t, x, p). \tag{1.9}$$

The above definitions allow us to prove the Verification Theorem. The theorem states that, in order to check that a given function is indeed the value function of the game, it suffices to check that this function is a solution to a partial differential equation (PDE) called a Hamilton-Jacobi equation.

Theorem 1.10. Let $V(t, x)$ be a $C^1$ function on $[t_0, T] \times \mathbb{R}^m$ satisfying
$$\begin{cases} V_t(t, x) + H(t, x, D_x V(t, x)) = 0, \\ V(T, x) = g(x). \end{cases} \tag{1.11}$$
If $\phi_1^* \in \Phi_1$ and $\phi_2^* \in \Phi_2$, where $\Phi_1$ and $\Phi_2$ are the sets of feasible controls of the two players, then $V$ is the value of the game. Moreover, $\phi_1^*(t, x, D_x V(t, x))$ and $\phi_2^*(t, x, D_x V(t, x))$ are optimal controls of player I and player II, i.e.,
$$V(t_0, x_0) = \sup_{\phi_1 \in \Phi_1} P(t_0, x_0, \phi_1, \phi_2^*) = \inf_{\phi_2 \in \Phi_2} P(t_0, x_0, \phi_1^*, \phi_2) = P(t_0, x_0, \phi_1^*, \phi_2^*).$$

Proof. For any $(t_0, x_0) \in [t_0, T] \times \mathbb{R}^m$, let $\phi_1 \in \Phi_1$, and let $x(\cdot)$ solve (1.6) with controls $\phi_1(\cdot)$ and $\phi_2^*(\cdot)$. We compute:
$$\begin{aligned} \frac{d}{dt}\Big[ V(t, x(t)) &+ \int_{t_0}^{t} h(s, x(s), \phi_1(s, x(s)), \phi_2^*(s, x(s)))\,ds \Big] \\ &= V_t(t, x(t)) + D_x V(t, x(t)) \cdot f(x(t), \phi_1(t, x(t)), \phi_2^*(t, x(t))) + h(t, x(t), \phi_1(t, x(t)), \phi_2^*(t, x(t))) \\ &\le V_t(t, x(t)) + \sup_{\phi_1 \in \Phi_1} \{ D_x V(t, x(t)) \cdot f(x(t), \phi_1, \phi_2^*(t, x(t))) + h(t, x(t), \phi_1, \phi_2^*(t, x(t))) \} \\ &= V_t(t, x(t)) + H(t, x(t), D_x V(t, x(t))) = 0 \end{aligned}$$
by (1.9) and (1.11). Integrating over $[t_0, T]$, we get
$$V(t_0, x_0) \ge V(T, x(T)) + \int_{t_0}^{T} h(t, x(t), \phi_1(t, x(t)), \phi_2^*(t, x(t)))\,dt.$$
Since $V(T, x(T)) = g(x(T))$ by assumption, and $\phi_1 \in \Phi_1$ was arbitrary, we get
$$V(t_0, x_0) \ge \sup_{\phi_1 \in \Phi_1} \Big\{ \int_{t_0}^{T} h(t, x(t), \phi_1, \phi_2^*)\,dt + g(x(T)) \Big\} = \sup_{\phi_1 \in \Phi_1} P(t_0, x_0, \phi_1, \phi_2^*) \ge V^+(t_0, x_0).$$

Similarly, we can show that
$$V(t_0, x_0) \le \inf_{\phi_2 \in \Phi_2} P(t_0, x_0, \phi_1^*, \phi_2) \le V^-(t_0, x_0).$$
Since $V^- \le V^+$, we conclude that $V(t_0, x_0) = P(t_0, x_0, \phi_1^*, \phi_2^*)$ is the value of the game.

Remark: In the infinite horizon problem, which has a pay-off function of the form
$$P(x_0, \phi_1, \phi_2) = \int_0^\infty e^{-\lambda t} h(x(t), \phi_1, \phi_2)\,dt$$
for some $\lambda > 0$, one can show in a similar way that the associated Hamilton-Jacobi equation is
$$-\lambda V(x) + H(x, DV(x)) = 0.$$

It should be noticed that neither the open-loop controls nor the feedback controls give additional information about the decisions made by the other player. This is not consistent with the traditional game theory literature, where a player's action usually depends on the others' actions and is often interpreted as the best response to the other player's actions. Therefore, we introduce the concept of strategy in differential games.

2. Strategies

For convenience, we write $\Phi_{-i}(t_0) = \Phi_1(t_0) \times \dots \times \Phi_{i-1}(t_0) \times \Phi_{i+1}(t_0) \times \dots \times \Phi_N(t_0)$. We also write an $N$-tuple of controls as $(u_i) = (u_1, \dots, u_N)$ and an $(N-1)$-tuple of controls as $(u_{-i}) = (u_1, \dots, u_{i-1}, u_{i+1}, \dots, u_N)$, where $u_i \in \Phi_i$.

Definition 2.1. A map $\alpha_i : \Phi_{-i}(t_0) \to \Phi_i(t_0)$ is a non-anticipative strategy if, for all $t > t_0$ and $(u_{-i}), (v_{-i}) \in \Phi_{-i}(t_0)$, the condition $(u_{-i})(\cdot) = (v_{-i})(\cdot)$ on $[t_0, t]$ implies that $\alpha_i[(u_{-i})](\cdot) = \alpha_i[(v_{-i})](\cdot)$ on $[t_0, t]$.
A map $\alpha_i : \Phi_{-i}(t_0) \to \Phi_i(t_0)$ is a non-anticipative strategy with delay $\tau$ if, for all $t > t_0$ and $(u_{-i}), (v_{-i}) \in \Phi_{-i}(t_0)$, the condition $(u_{-i})(\cdot) = (v_{-i})(\cdot)$ on $[t_0, t]$ implies that $\alpha_i[(u_{-i})](\cdot) = \alpha_i[(v_{-i})](\cdot)$ on $[t_0, t + \tau]$.

We denote by $A_d^i(t_0)$ the set of delay strategies of player $i$. Intuitively, by using a strategy $\alpha_i \in A_d^i(t_0)$, a player is able to answer each control $(u_{-i}) \in \Phi_{-i}(t_0)$ of the other players at time $t_0$ with a control $u_i \in \Phi_i(t_0)$ from his own feasible set such that $u_i = \alpha_i[(u_{-i})]$. Formally:

Theorem 2.2. If $(\alpha_1, \dots, \alpha_N) \in A_d^1(t_0) \times \dots \times A_d^N(t_0)$, then there exists a unique $N$-tuple of controls $(u_i) \in \Phi_1(t_0) \times \dots \times \Phi_N(t_0)$ such that $\alpha_i[(u_{-i})] = u_i$ on $[t_0, T]$ for every $i$.

Proof. It suffices to consider a two-player game, since each player can think of all the others as one single player with controls from $\Phi_{-i}(t_0)$.
Let $\alpha \in A_d(t_0)$ and $\beta \in B_d(t_0)$ be player 1's and player 2's strategies. We claim that for any $k \ge 1$, there exists a unique pair of controls $(u_k, v_k)$, restricted to $[t_0, t_0 + k\tau]$, such that $\alpha[v_k] = u_k$ and $\beta[u_k] = v_k$ on $[t_0, t_0 + k\tau]$. We prove this by induction. For $k = 1$, pick any $v \in \Phi_2(t_0)$, and set $u_1 = \alpha[v]$ and $v_1 = \beta[u_1]$. Since $\alpha$ is a delay strategy and $v_1, v \in \Phi_2(t_0)$ agree on the measure-zero set $[t_0, t_0]$, we have $\alpha[v_1] = \alpha[v] = u_1$ on $[t_0, t_0 + \tau]$. Suppose that for some $k \ge 1$ there exists a unique $(u_k, v_k) \in \Phi_1(t_0) \times \Phi_2(t_0)$ such that $\alpha[v_k] = u_k$ and $\beta[u_k] = v_k$ on $[t_0, t_0 + k\tau]$. We set $u_{k+1} = \alpha[v_k]$ and $v_{k+1} = \beta[u_{k+1}]$
on $[t_0, T]$. From our construction, $u_{k+1} = \alpha[v_k] = u_k$ on $[t_0, t_0 + k\tau]$. Since $\beta$ is a non-anticipative strategy, we have $v_k = \beta[u_k] = \beta[u_{k+1}] = v_{k+1}$ on $[t_0, t_0 + k\tau]$. Since $\alpha$ is a non-anticipative strategy with delay, we have $\alpha[v_k] = \alpha[v_{k+1}] = u_{k+1}$ on $[t_0, t_0 + k\tau + \tau]$. By our construction, the pair $(u_k, v_k)$ is unique. So if we set $(u, v) = (u_k, v_k)$ on $[t_0 + (k-1)\tau, t_0 + k\tau]$ for each $k \ge 1$, the pair $(u, v)$ satisfies the desired property.

We now introduce non-anticipative strategies to our two-player, zero-sum differential game from Section 1. Consider the dynamical system
$$\begin{cases} \dot{x}(t) = f(t, x(t), u(t), v(t)), \\ x(t_0) = x_0. \end{cases} \tag{2.3}$$
Let $\alpha \in A_d(t_0)$ and $\beta \in B_d(t_0)$ be player 1's and player 2's strategies respectively. We modify the upper and lower value functions:
$$V^+(t_0, x_0) = \sup_{\alpha \in A_d(t_0)} \inf_{v \in \Phi_2} P(t_0, x_0, \alpha[v], v), \qquad V^-(t_0, x_0) = \inf_{\beta \in B_d(t_0)} \sup_{u \in \Phi_1} P(t_0, x_0, u, \beta[u]),$$
where the pay-off is
$$P(t_0, x_0, u, v) = \int_{t_0}^{T} h(t, x(t), u(t), v(t))\,dt + g(x(T)).$$

3. Properties of the value function

We now turn to the basic properties of the value function. The first is the dynamic programming property.

Theorem 3.1 (Dynamic Programming Property). For $t_0 \le t_0 + \sigma \le T$ and $x_0 \in \mathbb{R}^m$,
$$V^-(t_0, x_0) = \inf_{\beta \in B_d(t_0)} \sup_{u \in \Phi_1(t_0)} \Big\{ \int_{t_0}^{t_0 + \sigma} h(t, x(t), u(t), \beta[u](t))\,dt + V^-(t_0 + \sigma, x(t_0 + \sigma)) \Big\},$$
$$V^+(t_0, x_0) = \sup_{\alpha \in A_d(t_0)} \inf_{v \in \Phi_2(t_0)} \Big\{ \int_{t_0}^{t_0 + \sigma} h(t, x(t), \alpha[v](t), v(t))\,dt + V^+(t_0 + \sigma, x(t_0 + \sigma)) \Big\},$$
where $x(\cdot)$ solves the ODE system (2.3) with $v(\cdot) = \beta[u](\cdot)$ in the first formula and $u(\cdot) = \alpha[v](\cdot)$ in the second.

In other words, we can split the computation of the value functions on the interval $[t_0, T]$ into two separate problems.
1) As a first step, we solve the problem on the subinterval $[t_0 + \sigma, T]$ with running cost $h$ and terminal pay-off $g$. In this way, we determine the value functions $V^+(t_0 + \sigma, \cdot)$ and $V^-(t_0 + \sigma, \cdot)$.
2) As a second step, we solve the problem on the subinterval $[t_0, t_0 + \sigma]$ with running cost $h$ and terminal pay-offs $V^+(t_0 + \sigma, \cdot)$ and $V^-(t_0 + \sigma, \cdot)$, determined by the first step.

We now present the proof of the dynamic programming property.

Proof. We prove the case of the lower value function. The proof follows ideas from [1].
Set
$$W(t_0, x_0) = \inf_{\beta \in B_d(t_0)} \sup_{u \in \Phi_1(t_0)} \Big\{ \int_{t_0}^{t_0 + \sigma} h(t, x(t), u(t), \beta[u](t))\,dt + V^-(t_0 + \sigma, x(t_0 + \sigma)) \Big\}.$$

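Then for any $\epsilon > 0$, there exists $\delta \in B_d(t_0)$ such that
$$W(t_0, x_0) + \epsilon \ge \sup_{u \in \Phi_1(t_0)} \Big\{ \int_{t_0}^{t_0 + \sigma} h(t, x(t), u(t), \delta[u](t))\,dt + V^-(t_0 + \sigma, x(t_0 + \sigma)) \Big\}. \tag{3.2}$$
By definition,
$$V^-(t_0 + \sigma, x(t_0 + \sigma)) = \inf_{\beta \in B_d(t_0 + \sigma)} \sup_{u \in \Phi_1(t_0 + \sigma)} \Big\{ \int_{t_0 + \sigma}^{T} h(t, x(t), u(t), \beta[u](t))\,dt + g(x(T)) \Big\},$$
where $x(\cdot)$ solves the ODE (2.3) on $[t_0 + \sigma, T]$. Thus, there exists a strategy $\delta_{x(t_0 + \sigma)} \in B_d(t_0 + \sigma)$ such that
$$V^-(t_0 + \sigma, x(t_0 + \sigma)) + \epsilon \ge \sup_{u \in \Phi_1(t_0 + \sigma)} \Big\{ \int_{t_0 + \sigma}^{T} h(t, x(t), u(t), \delta_{x(t_0 + \sigma)}[u](t))\,dt + g(x(T)) \Big\}. \tag{3.3}$$
So we define the strategy $\beta \in B_d(t_0)$: for any $u \in \Phi_1(t_0)$, set
$$\beta[u](t) = \begin{cases} \delta[u](t), & t_0 \le t \le t_0 + \sigma, \\ \delta_{x(t_0 + \sigma)}[u](t), & t_0 + \sigma < t \le T. \end{cases}$$
So for any $u \in \Phi_1(t_0)$, by (3.2) and (3.3), we have:
$$\begin{aligned} W(t_0, x_0) &\ge \sup_{u \in \Phi_1(t_0)} \int_{t_0}^{t_0 + \sigma} h(t, x(t), u(t), \delta[u](t))\,dt + \sup_{u \in \Phi_1(t_0 + \sigma)} \Big\{ \int_{t_0 + \sigma}^{T} h(t, x(t), u(t), \delta_{x(t_0 + \sigma)}[u](t))\,dt + g(x(T)) \Big\} - 2\epsilon \\ &\ge \sup_{u \in \Phi_1(t_0)} \Big\{ \int_{t_0}^{T} h(t, x(t), u(t), \beta[u](t))\,dt + g(x(T)) \Big\} - 2\epsilon. \end{aligned}$$
Therefore, we have $V^-(t_0, x_0) \le W(t_0, x_0) + 2\epsilon$.

On the other hand, by the definition of $V^-(t_0, x_0)$, for any $\epsilon > 0$ there exists a strategy $\beta \in B_d(t_0)$ such that
$$V^-(t_0, x_0) + \epsilon \ge \sup_{u \in \Phi_1(t_0)} \Big\{ \int_{t_0}^{T} h(t, x(t), u(t), \beta[u](t))\,dt + g(x(T)) \Big\}. \tag{3.4}$$
Also, we have
$$W(t_0, x_0) \le \sup_{u \in \Phi_1(t_0)} \Big\{ \int_{t_0}^{t_0 + \sigma} h(t, x(t), u(t), \beta[u](t))\,dt + V^-(t_0 + \sigma, x(t_0 + \sigma)) \Big\}.$$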

So there exists $u_1 \in \Phi_1(t_0)$ such that
$$W(t_0, x_0) \le \int_{t_0}^{t_0 + \sigma} h(t, x(t), u_1(t), \beta[u_1](t))\,dt + V^-(t_0 + \sigma, x(t_0 + \sigma)) + \epsilon. \tag{3.5}$$
Therefore, for any $u \in \Phi_1(t_0 + \sigma)$, we define
$$\tilde{u}(t) = \begin{cases} u_1(t), & t_0 \le t \le t_0 + \sigma, \\ u(t), & t_0 + \sigma < t \le T, \end{cases}$$
and a strategy $\tilde{\beta} \in B_d(t_0 + \sigma)$ by $\tilde{\beta}[u](t) = \beta[\tilde{u}](t)$ for $t_0 + \sigma \le t \le T$. By definition,
$$V^-(t_0 + \sigma, x(t_0 + \sigma)) \le \sup_{u \in \Phi_1(t_0 + \sigma)} \Big\{ \int_{t_0 + \sigma}^{T} h(t, x(t), u(t), \tilde{\beta}[u](t))\,dt + g(x(T)) \Big\}.$$
So there exists $u_2 \in \Phi_1(t_0 + \sigma)$ such that
$$V^-(t_0 + \sigma, x(t_0 + \sigma)) \le \int_{t_0 + \sigma}^{T} h(t, x(t), u_2(t), \tilde{\beta}[u_2](t))\,dt + g(x(T)) + \epsilon. \tag{3.6}$$
We now finally define $u \in \Phi_1(t_0)$ by
$$u(t) = \begin{cases} u_1(t), & t_0 \le t \le t_0 + \sigma, \\ u_2(t), & t_0 + \sigma < t \le T. \end{cases}$$
By (3.4), (3.5) and (3.6) above, we can write
$$W(t_0, x_0) \le \int_{t_0}^{T} h(t, x(t), u(t), \beta[u](t))\,dt + g(x(T)) + 2\epsilon \le V^-(t_0, x_0) + 3\epsilon.$$
Taking $\epsilon$ arbitrarily small, we obtain $W(t_0, x_0) = V^-(t_0, x_0)$. The proof for the upper value function is identical.

From the dynamic programming property, we can deduce that the value function is Lipschitz continuous.

Lemma 3.7. There exists a constant $C \in \mathbb{R}$ such that for any $t_0 \in [0, T]$ and for any $x_0, y_0 \in \mathbb{R}^m$, we have
$$|V^-(t_0, x_0) - V^-(t_0, y_0)| \le C |x_0 - y_0| \quad \text{and} \quad |V^+(t_0, x_0) - V^+(t_0, y_0)| \le C |x_0 - y_0|.$$

Proof. We will prove the lemma for the lower value function. For arbitrary controls $u$ and $v$, let $x(t)$ and $y(t)$ be solutions of the ODE (2.3) with the initial conditions $x(t_0) = x_0$ and $y(t_0) = y_0$. Since the function $f$ is Lipschitz continuous in $x$ by (1.2), there exists $C_1 \in \mathbb{R}$ such that
$$|\dot{x}(t) - \dot{y}(t)| = |f(t, x(t), u(t), v(t)) - f(t, y(t), u(t), v(t))| \le C_1 |x(t) - y(t)|.$$
Thus, by Gronwall's inequality, $|x(t) - y(t)| \le |x_0 - y_0| e^{C_1 (t - t_0)}$. Using the Lipschitz continuity of the functions $g$ and $h$ from (1.3) and (1.4), we write:

$$\begin{aligned} |P(t_0, x_0, u, v) - P(t_0, y_0, u, v)| &\le \int_{t_0}^{T} |h(t, x(t), u(t), v(t)) - h(t, y(t), u(t), v(t))|\,dt + |g(x(T)) - g(y(T))| \\ &\le C_3 \int_{t_0}^{T} |x(t) - y(t)|\,dt + C_2 |x(T) - y(T)| \\ &\le C_3 e^{C_1 (T - t_0)} (T - t_0) |x_0 - y_0| + C_2 e^{C_1 (T - t_0)} |x_0 - y_0| \\ &\le C |x_0 - y_0|. \end{aligned}$$
By Theorem 2.2, there exists a unique strategy $\beta \in B_d$ such that $\beta[u] = v$. Hence,
$$|P(t_0, x_0, u, \beta[u]) - P(t_0, y_0, u, \beta[u])| \le C |x_0 - y_0|.$$
Taking the supremum over $u$ and the infimum over $\beta$, we obtain $|V^-(t_0, x_0) - V^-(t_0, y_0)| \le C |x_0 - y_0|$.

Theorem 3.8. $V^+$ and $V^-$ are bounded and Lipschitz continuous in all variables.

Proof. We will prove the theorem for the lower value function $V^-$. Since
$$V^-(t_0, x_0) = \inf_{\beta \in B_d(t_0)} \sup_{u \in \Phi_1} \Big\{ \int_{t_0}^{T} h(t, x(t), u(t), \beta[u](t))\,dt + g(x(T)) \Big\},$$
and both functions $h$ and $g$ are bounded, $V^-$ is bounded as well. We have shown in Lemma 3.7 that $V^-$ is Lipschitz continuous in the state variable. We now prove that $V^-$ is also Lipschitz in the time variable. Let $x_0 \in \mathbb{R}^m$, and let $t_0 \le t_1 \le T$. By the dynamic programming property, we can write
$$V^-(t_0, x_0) = \inf_{\beta \in B_d(t_0)} \sup_{u \in \Phi_1(t_0)} \Big\{ \int_{t_0}^{t_1} h(t, x(t), u(t), \beta[u](t))\,dt + V^-(t_1, x(t_1)) \Big\},$$
where $x(\cdot)$ solves the ODE (2.3) on $[t_0, t_1]$ with controls $u(\cdot)$ and $\beta[u](\cdot)$. Since $h$ is bounded by some constant $C_3 \in \mathbb{R}$, as we have assumed in (1.4), for any $u \in \Phi_1(t_0)$ and $\beta \in B_d(t_0)$ we have
$$\Big| \int_{t_0}^{t_1} h(t, x(t), u, \beta[u])\,dt \Big| \le C_3 |t_1 - t_0|.$$
Because $\dot{x}(t) = f(t, x(t), u, \beta[u])$, and $f$ is bounded by some constant $C_1 \in \mathbb{R}$, we can write
$$|x(t_1) - x_0| \le \int_{t_0}^{t_1} |f(t, x(t), u, \beta[u])|\,dt \le C_1 |t_1 - t_0|.$$
Lemma 3.7 implies that
$$|V^-(t_1, x(t_1)) - V^-(t_1, x_0)| \le C |x(t_1) - x_0| \le C C_1 |t_1 - t_0| = C_4 |t_1 - t_0|.$$
Thus, we have:
$$V^-(t_1, x_0) - (C_3 + C_4)(t_1 - t_0) \le V^-(t_1, x(t_1)) + \int_{t_0}^{t_1} h(t, x(t), u(t), \beta[u](t))\,dt \le V^-(t_1, x_0) + (C_3 + C_4)(t_1 - t_0).$$

Denote $M := C_3 + C_4$ and take the supremum over $u$ and the infimum over $\beta$. By the dynamic programming property, we have
$$V^-(t_1, x_0) - M(t_1 - t_0) \le V^-(t_0, x_0) \le V^-(t_1, x_0) + M(t_1 - t_0),$$
and thus $|V^-(t_1, x_0) - V^-(t_0, x_0)| \le M |t_1 - t_0|$.

4. Viscosity solutions and another verification theorem

We showed in Section 1 that, when the Isaacs condition holds, a $C^1$ solution of the associated Hamilton-Jacobi equation is the value function of the game. However, the value function may not possess such regularity, and it may have points of non-differentiability. In this section, we weaken our assumption about the value functions; we only assume them to be bounded and Lipschitz continuous. To accommodate this weakened regularity assumption, we introduce the idea of viscosity solutions.

Consider the following Hamilton-Jacobi equation:
$$\begin{cases} u_t + H(t, x, D_x u) = 0, & (t, x) \in [t_0, T] \times \mathbb{R}^m, \\ u(T, x) = g(x), & x \in \mathbb{R}^m. \end{cases} \tag{4.1}$$

Definition 4.2. A bounded and Lipschitz continuous function $u : [t_0, T] \times \mathbb{R}^m \to \mathbb{R}$ is called a viscosity subsolution to the Hamilton-Jacobi equation (4.1) provided that for any test function $\phi \in C^1$, if $u - \phi$ assumes a local maximum at $(t_0, x_0) \in [t_0, T] \times \mathbb{R}^m$, then
$$\phi_t(t_0, x_0) + H(t_0, x_0, D\phi(t_0, x_0)) \ge 0.$$
Similarly, a bounded and Lipschitz continuous function $u : [t_0, T] \times \mathbb{R}^m \to \mathbb{R}$ is called a viscosity supersolution to the Hamilton-Jacobi equation (4.1) provided that for any test function $\phi \in C^1$, if $u - \phi$ assumes a local minimum at $(t_0, x_0) \in [t_0, T] \times \mathbb{R}^m$, then
$$\phi_t(t_0, x_0) + H(t_0, x_0, D\phi(t_0, x_0)) \le 0.$$
We call $u$ a viscosity solution if it is both a viscosity subsolution and a viscosity supersolution.

In Section 1, we saw that under the Isaacs condition a $C^1$ solution of the Hamilton-Jacobi equation is the value function. We shall now show that the upper and lower value functions $V^+$ and $V^-$, as defined in the previous section and shown there to be Lipschitz continuous, satisfy the above Hamilton-Jacobi equation in the viscosity sense.

Theorem 4.3. a) $V^+$ is a viscosity solution of the Hamilton-Jacobi equation
$$\begin{cases} V^+_t + H^+(t, x, D_x V^+) = 0, & (t, x) \in [t_0, T] \times \mathbb{R}^m, \\ V^+(T, x) = g(x), & x \in \mathbb{R}^m, \end{cases} \tag{4.4}$$
where the Hamiltonian satisfies
$$H^+(t, x, p) = \inf_{v} \sup_{u} \{ f(t, x, u, v) \cdot p + h(t, x, u, v) \}.$$
b) $V^-$ is a viscosity solution of the Hamilton-Jacobi equation
$$\begin{cases} V^-_t + H^-(t, x, D_x V^-) = 0, & (t, x) \in [t_0, T] \times \mathbb{R}^m, \\ V^-(T, x) = g(x), & x \in \mathbb{R}^m, \end{cases} \tag{4.5}$$
where the Hamiltonian satisfies
$$H^-(t, x, p) = \sup_{u} \inf_{v} \{ f(t, x, u, v) \cdot p + h(t, x, u, v) \}.$$

The proof depends on the following estimates:

Lemma 4.6. Assume that $\phi \in C^1$.
a) If $\phi$ satisfies $\phi_t(t_0, x_0) + H^+(t_0, x_0, D_x \phi(t_0, x_0)) \le -\theta < 0$, then for each sufficiently small $\sigma > 0$ there exists some $v \in \Phi_2(t_0)$ such that
$$\int_{t_0}^{t_0 + \sigma} \big[ \phi_t(t, x(t)) + f(t, x(t), \alpha[v](t), v(t)) \cdot D_x \phi(t, x(t)) + h(t, x(t), \alpha[v](t), v(t)) \big]\,dt \le -\frac{\theta \sigma}{2} \tag{4.7}$$
for all $\alpha \in A_d(t_0)$.
b) If $\phi$ satisfies $\phi_t(t_0, x_0) + H^-(t_0, x_0, D_x \phi(t_0, x_0)) \ge \theta > 0$, then for each sufficiently small $\sigma > 0$ there exists some $u \in \Phi_1(t_0)$ such that
$$\int_{t_0}^{t_0 + \sigma} \big[ \phi_t(t, x(t)) + f(t, x(t), u(t), \beta[u](t)) \cdot D_x \phi(t, x(t)) + h(t, x(t), u(t), \beta[u](t)) \big]\,dt \ge \frac{\theta \sigma}{2} \tag{4.8}$$
for all $\beta \in B_d(t_0)$.

Proof. Let $J(t, x, u, v) = h(t, x, u, v) + f(t, x, u, v) \cdot D_x \phi(t, x) + \phi_t(t, x)$. By our assumption,
$$\inf_{v \in \Phi_2(t_0)} \sup_{u \in \Phi_1(t_0)} J(t_0, x_0, u, v) \le -\theta < 0.$$
Then there exists $v^* \in \Phi_2(t_0)$ such that
$$\sup_{u \in \Phi_1(t_0)} J(t_0, x_0, u, v^*) \le -\frac{3\theta}{4} < 0.$$
By continuity, we have
$$\sup_{u \in \Phi_1(t_0)} J(t, x(t), u, v^*) \le -\frac{\theta}{2} < 0 \quad \text{for } t_0 \le t \le t_0 + \sigma,$$
where $\sigma$ is sufficiently small and $x(\cdot)$ solves the ODE (2.3) for any $u(\cdot)$ and $v(\cdot)$ with $x(t_0) = x_0$. In particular, for $v(\cdot) = v^*$ and any $\alpha \in A_d(t_0)$ with $\alpha[v^*] = u$, we have, on $t_0 \le t \le t_0 + \sigma$:
$$\phi_t(t, x(t)) + f(t, x(t), \alpha[v](t), v(t)) \cdot D_x \phi(t, x(t)) + h(t, x(t), \alpha[v](t), v(t)) \le -\frac{\theta}{2}.$$
Integrating over $[t_0, t_0 + \sigma]$, we obtain (4.7). The proof of b) is similar.

We now prove Theorem 4.3. The proof follows [1].

Proof. Suppose that $V^+ - \phi$ attains a local maximum at $(t_0, x_0) \in [t_0, T] \times \mathbb{R}^m$. We want to show that
$$\phi_t(t_0, x_0) + H^+(t_0, x_0, D_x \phi(t_0, x_0)) \ge 0.$$
Suppose not. Then there exists $\theta > 0$ such that
$$\phi_t(t_0, x_0) + H^+(t_0, x_0, D_x \phi(t_0, x_0)) \le -\theta < 0.$$
Then by Lemma 4.6, there exists $v \in \Phi_2(t_0)$ such that, for all $\alpha \in A_d(t_0)$,
$$\int_{t_0}^{t_0 + \sigma} \big[ \phi_t(t, x(t)) + f(t, x(t), \alpha[v](t), v(t)) \cdot D_x \phi(t, x(t)) + h(t, x(t), \alpha[v](t), v(t)) \big]\,dt \le -\frac{\theta \sigma}{2}.$$
Thus, we have:
$$\sup_{\alpha \in A_d(t_0)} \inf_{v \in \Phi_2(t_0)} \int_{t_0}^{t_0 + \sigma} \big[ \phi_t(t, x(t)) + f(t, x(t), \alpha[v](t), v(t)) \cdot D_x \phi(t, x(t)) + h(t, x(t), \alpha[v](t), v(t)) \big]\,dt \le -\frac{\theta \sigma}{2} < 0.$$
On the other hand, by the dynamic programming principle, we have:
$$V^+(t_0, x_0) = \sup_{\alpha \in A_d(t_0)} \inf_{v \in \Phi_2(t_0)} \Big\{ \int_{t_0}^{t_0 + \sigma} h(t, x(t), \alpha[v](t), v(t))\,dt + V^+(t_0 + \sigma, x(t_0 + \sigma)) \Big\}.$$
Since $V^+ - \phi$ has a local maximum at $(t_0, x_0)$, for $\sigma$ small we have
$$V^+(t_0, x_0) - \phi(t_0, x_0) \ge V^+(t_0 + \sigma, x(t_0 + \sigma)) - \phi(t_0 + \sigma, x(t_0 + \sigma)),$$
where $x(\cdot)$ solves the ODE on $[t_0, t_0 + \sigma]$ for any $u(\cdot)$ and $v(\cdot)$ with $x(t_0) = x_0$. Thus,
$$\sup_{\alpha \in A_d(t_0)} \inf_{v \in \Phi_2(t_0)} \Big\{ \int_{t_0}^{t_0 + \sigma} h(t, x(t), \alpha[v](t), v(t))\,dt + \phi(t_0 + \sigma, x(t_0 + \sigma)) - \phi(t_0, x_0) \Big\} \ge 0.$$
Notice that
$$\phi(t_0 + \sigma, x(t_0 + \sigma)) - \phi(t_0, x_0) = \int_{t_0}^{t_0 + \sigma} \big[ \phi_t(t, x(t)) + f(t, x(t), \alpha[v](t), v(t)) \cdot D_x \phi(t, x(t)) \big]\,dt.$$
Therefore, we finally get
$$\sup_{\alpha \in A_d(t_0)} \inf_{v \in \Phi_2(t_0)} \int_{t_0}^{t_0 + \sigma} \big[ \phi_t(t, x(t)) + f(t, x(t), \alpha[v](t), v(t)) \cdot D_x \phi(t, x(t)) + h(t, x(t), \alpha[v](t), v(t)) \big]\,dt \ge 0,$$
which contradicts (4.7). Therefore, we must have $\phi_t(t_0, x_0) + H^+(t_0, x_0, D_x \phi(t_0, x_0)) \ge 0$.

We can prove similarly that if $V^+ - \phi$ achieves a local minimum at $(t_0, x_0)$, then $\phi_t(t_0, x_0) + H^+(t_0, x_0, D_x \phi(t_0, x_0)) \le 0$. Therefore, $V^+$ is a viscosity solution of $V^+_t + H^+(t, x, D_x V^+) = 0$ with $V^+(T, x) = g(x)$. Similarly, we can prove that $V^-$ is a viscosity solution of the Hamilton-Jacobi equation for the lower Hamiltonian.

We now attempt to show that the game has a value under the Isaacs condition in the viscosity sense, and that this value is unique. To begin, we study the comparison principle for solutions of Hamilton-Jacobi equations.
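Since the rest of the argument hinges on the Isaacs condition $H^+ = H^-$, it may help to see one case where it holds and one where it fails. The sketch below is an illustration under assumptions that are not in the text: both control sets are taken to be $[-1, 1]$ sampled on a finite grid, the running cost is zero, $p = 1$, and the two dynamics (`u + v` and `(u - v) ** 2`) are hypothetical examples. The helper name `hamiltonians` is likewise made up for this sketch.

```python
def hamiltonians(f, h, p, U, V):
    """Return (H_plus, H_minus) for J(u, v) = h(u, v) + p * f(u, v) on finite grids.

    H_plus  = min over v of max over u  (discrete stand-in for inf_v sup_u)
    H_minus = max over u of min over v  (discrete stand-in for sup_u inf_v)
    """
    J = lambda u, v: h(u, v) + p * f(u, v)
    H_plus = min(max(J(u, v) for u in U) for v in V)
    H_minus = max(min(J(u, v) for v in V) for u in U)
    return H_plus, H_minus

# Control sets [-1, 1] sampled on a grid that contains -1, 0 and 1 exactly.
grid = [k / 50.0 for k in range(-50, 51)]
zero_cost = lambda u, v: 0.0

# Separable dynamics f = u + v: the inf and the sup decouple, so H+ = H-.
sep = hamiltonians(lambda u, v: u + v, zero_cost, 1.0, grid, grid)

# Coupled dynamics f = (u - v)^2: the order of inf and sup matters at p = 1.
cpl = hamiltonians(lambda u, v: (u - v) ** 2, zero_cost, 1.0, grid, grid)

print("separable:", sep)
print("coupled:  ", cpl)
```

In the separable case the two Hamiltonians coincide, so the theory of this section applies; in the coupled case the minimizer can always match the maximizer's control when moving second, which opens a gap between $H^-$ and $H^+$.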

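We assume that the Hamiltonian function $H : [t_0, T] \times \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ satisfies the Lipschitz conditions
$$|H(t, x, p) - H(s, y, p)| \le C (|t - s| + |x - y|)(1 + |p|) \tag{4.9}$$
and
$$|H(t, x, p) - H(t, x, q)| \le C |p - q| \tag{4.10}$$
for some constant $C \in \mathbb{R}$.

Theorem 4.11 (Comparison Principle). Let $u : [t_0, T] \times \mathbb{R}^m \to \mathbb{R}$ be a viscosity subsolution of the Hamilton-Jacobi equation
$$u_t(t, x) + H(t, x, D_x u(t, x)) = 0. \tag{4.12}$$
Let $v : [t_0, T] \times \mathbb{R}^m \to \mathbb{R}$ be a viscosity supersolution of the Hamilton-Jacobi equation
$$v_t(t, x) + H(t, x, D_x v(t, x)) = 0. \tag{4.13}$$
If $u(T, x) \le v(T, x)$ for any $x \in \mathbb{R}^m$, then $u(t, x) \le v(t, x)$ for any $(t, x) \in [t_0, T] \times \mathbb{R}^m$.

Proof. We only sketch the proof, since it is long and standard in any reference on viscosity solutions of Hamilton-Jacobi equations. It is enough to show that for any $\alpha > 0$, we have $u(t, x) - v(t, x) - \alpha(T - t) \le 0$ for any $(t, x) \in [t_0, T] \times \mathbb{R}^m$. If not, then there exists $\alpha > 0$ such that
$$M := \sup_{(t, x) \in [t_0, T] \times \mathbb{R}^m} \{ u(t, x) - v(t, x) - \alpha(T - t) \} > 0.$$
Consider the function $\Phi_\epsilon : [t_0, T] \times [t_0, T] \times \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ given by
$$\Phi_\epsilon(t, s, x, y) = u(t, x) - v(s, y) - \alpha(T - t) - \frac{\epsilon}{2}(|x|^2 + |y|^2) - \frac{1}{2\epsilon^2}(|t - s|^2 + |x - y|^2).$$
Thanks to the penalization terms, we can show that the above function assumes a maximum value at some $(t_\epsilon, s_\epsilon, x_\epsilon, y_\epsilon) \in ([t_0, T] \times \mathbb{R}^m)^2$. We can show that $|t_\epsilon - s_\epsilon|^2 \to 0$ and $|x_\epsilon - y_\epsilon|^2 \to 0$ as $\epsilon \to 0$, and moreover that
$$\lim_{\epsilon \to 0} \frac{1}{2\epsilon^2} \big( |t_\epsilon - s_\epsilon|^2 + |x_\epsilon - y_\epsilon|^2 \big) = 0.$$
We denote by $(t^*, x^*)$ the limit point. If $t^* = T$, then $0 < M \le u(T, x^*) - v(T, x^*) \le 0$, which is a contradiction. Thus, $t^* \in [t_0, T)$.

Now consider the function
$$(t, x) \mapsto u(t, x) - \Big\{ v(s_\epsilon, y_\epsilon) + \alpha(T - t) + \frac{\epsilon}{2}(|x|^2 + |y_\epsilon|^2) + \frac{1}{2\epsilon^2}(|t - s_\epsilon|^2 + |x - y_\epsilon|^2) \Big\}.$$
The function assumes a local maximum at $(t_\epsilon, x_\epsilon)$. Since $u$ is a viscosity subsolution of (4.12), we have:
$$-\alpha + \frac{1}{\epsilon^2}(t_\epsilon - s_\epsilon) + H\Big(t_\epsilon, x_\epsilon, \frac{1}{\epsilon^2}(x_\epsilon - y_\epsilon) + \epsilon x_\epsilon\Big) \ge 0.$$
Similarly, consider the function
$$(s, y) \mapsto v(s, y) - \Big\{ u(t_\epsilon, x_\epsilon) - \alpha(T - t_\epsilon) - \frac{\epsilon}{2}(|x_\epsilon|^2 + |y|^2) - \frac{1}{2\epsilon^2}(|t_\epsilon - s|^2 + |x_\epsilon - y|^2) \Big\}.$$
The function assumes a local minimum at $(s_\epsilon, y_\epsilon)$. Since $v$ is a viscosity supersolution of (4.13), we have:
$$\frac{1}{\epsilon^2}(t_\epsilon - s_\epsilon) + H\Big(s_\epsilon, y_\epsilon, \frac{1}{\epsilon^2}(x_\epsilon - y_\epsilon) - \epsilon y_\epsilon\Big) \le 0.$$
Therefore, subtracting the second inequality from the first and writing $p_\epsilon := \frac{1}{\epsilon^2}(x_\epsilon - y_\epsilon)$, we can write:
$$\begin{aligned} 0 < \alpha &\le H(t_\epsilon, x_\epsilon, p_\epsilon + \epsilon x_\epsilon) - H(s_\epsilon, y_\epsilon, p_\epsilon - \epsilon y_\epsilon) \\ &= \big[ H(t_\epsilon, x_\epsilon, p_\epsilon + \epsilon x_\epsilon) - H(s_\epsilon, y_\epsilon, p_\epsilon + \epsilon x_\epsilon) \big] + \big[ H(s_\epsilon, y_\epsilon, p_\epsilon + \epsilon x_\epsilon) - H(s_\epsilon, y_\epsilon, p_\epsilon - \epsilon y_\epsilon) \big] \\ &\le C (|t_\epsilon - s_\epsilon| + |x_\epsilon - y_\epsilon|)(1 + |p_\epsilon + \epsilon x_\epsilon|) + C \epsilon (|x_\epsilon| + |y_\epsilon|) \to 0 \quad \text{as } \epsilon \to 0 \end{aligned}$$
by (4.9) and (4.10), which is a contradiction. Therefore, we must have $u(t, x) \le v(t, x)$ for any $(t, x) \in [t_0, T] \times \mathbb{R}^m$.

With the comparison principle, we are able to show that the game has a value satisfying the Hamilton-Jacobi equation (4.1) if the Isaacs condition holds. This is analogous to Theorem 1.10, where we assumed that the value function is smooth.

Theorem 4.14. Assume that the Isaacs condition holds: $H^+(t, x, p) = H^-(t, x, p)$ for any $(t, x, p) \in [t_0, T] \times \mathbb{R}^m \times \mathbb{R}^m$. Then the game has a value $V(t, x) = V^+(t, x) = V^-(t, x)$ for any $(t, x) \in [t_0, T] \times \mathbb{R}^m$.

Proof. Since $H^+(t, x, p) = H^-(t, x, p) =: H(t, x, p)$, by Theorem 4.3, $V^+$ is a viscosity solution of $V^+_t(t, x) + H(t, x, D_x V^+(t, x)) = 0$. In particular, it is a viscosity subsolution. Similarly, $V^-$ is a viscosity solution of $V^-_t(t, x) + H(t, x, D_x V^-(t, x)) = 0$. In particular, it is a viscosity supersolution. Moreover, $g(x) = V^+(T, x) \le V^-(T, x) = g(x)$. Therefore, by the comparison principle, $V^+(t, x) \le V^-(t, x)$ for any $(t, x) \in [t_0, T] \times \mathbb{R}^m$. Reversing the roles of $V^+$ and $V^-$, we also get $V^+(t, x) \ge V^-(t, x)$ for any $(t, x) \in [t_0, T] \times \mathbb{R}^m$. Therefore, the game has a value $V(t, x) := V^+(t, x) = V^-(t, x)$, which satisfies $V_t(t, x) + H(t, x, D_x V(t, x)) = 0$.

We briefly mention here the infinite horizon problem, since it shares similar properties with the finite time horizon problem.
Consider the following ODE: for $0 \le t < \infty$ and $x \in \mathbb{R}^m$,
$$\begin{cases} \dot{x}(t) = f(x(t), u(t), v(t)), \\ x(0) = x_0. \end{cases} \tag{4.15}$$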

The pay-off functional is
$$P(x_0, u, v) = \int_0^\infty e^{-\lambda t} h(x(t), u(t), v(t))\,dt,$$
where $x(\cdot)$ solves the ODE (4.15) for any $u(\cdot) \in \Phi_1$ and $v(\cdot) \in \Phi_2$. The upper and lower value functions are defined as before. We have the following properties of the value functions:

Theorem 4.16. The value functions $V^+$ and $V^-$ are bounded and Lipschitz continuous.

Theorem 4.17. The value functions satisfy the dynamic programming property. For the lower value function $V^-$, the dynamic programming property is:
$$V^-(x_0) = \inf_{\beta \in B_d} \sup_{u \in \Phi_1} \Big\{ \int_0^t e^{-\lambda s} h(x(s), u(s), \beta[u](s))\,ds + e^{-\lambda t} V^-(x(t)) \Big\}.$$

As Section 1 points out, the associated Hamilton-Jacobi equation is $-\lambda V(x) + H(x, DV(x)) = 0$. The proofs of the above theorems are similar to the proofs for the finite time problem.

5. Characterization and existence of Nash equilibrium

In Section 1, we showed in the verification theorem that
$$V(t_0, x_0) = \sup_{u \in \Phi_1} P(t_0, x_0, u, \tilde{v}) = \inf_{v \in \Phi_2} P(t_0, x_0, \tilde{u}, v) = P(t_0, x_0, \tilde{u}, \tilde{v}).$$
Notice that $\tilde{u}$ and $\tilde{v}$ closely resemble the Nash equilibrium strategies of the game theory literature, where each control is the best response to the control of the other player. In this section, we attempt to show rigorously the existence of Nash equilibrium pay-offs from the Isaacs condition in a differential game. We consider an $N$-player differential game.

Definition 5.1. Let $(t_0, x_0) \in [0, T] \times \mathbb{R}^N$. We say that $(e_1, e_2, \dots, e_N)$ is a Nash equilibrium pay-off at $(t_0, x_0)$ if for any $\epsilon > 0$, there exists $(\bar{\alpha}_i) \in A_d^1 \times \dots \times A_d^N$ of delay strategies such that for any $i = 1, 2, \dots, N$, we have
$$|e_i - P_i(t_0, x_0, (\bar{\alpha}_i))| \le \epsilon \quad \text{and} \quad P_i(t_0, x_0, \bar{\alpha}_i, \bar{\alpha}_{-i}) \ge P_i(t_0, x_0, \bar{\alpha}_{-i}, u_i) - \epsilon$$
for any $u_i \in \Phi_i(t_0)$.

[3] suggests that it is sufficient to maximize just the terminal pay-off, as opposed to maximizing both the integral and the terminal pay-offs. We shall, therefore, restrict to $P_i(t_0, x_0, (\alpha_i)) = g_i(x(T))$, where $x(\cdot)$ solves the ODE with $x(t_0) = x_0$ and $u_i(\cdot) = \alpha_i[(u_{-i})](\cdot)$. We see that Definition 5.1 is the traditional Nash equilibrium definition of the game theory literature, where it is emphasized that no unilateral deviation in strategy by any single player is profitable.
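The deviation inequality in Definition 5.1 is easiest to read in a finite, static setting. The following sketch is only an analogue of the definition, not the differential game itself: each of two players picks one of finitely many actions, pay-offs are given by hypothetical tables `P0` and `P1`, and the function name `is_eps_nash` is invented for this illustration. It checks that no single player can gain more than $\epsilon$ by deviating alone.

```python
def is_eps_nash(payoffs, profile, eps):
    """Check the deviation inequality of Definition 5.1 in a finite static game.

    payoffs[i][a0][a1] is player i's pay-off when player 0 plays a0 and
    player 1 plays a1; `profile` is the candidate pair (a0, a1).
    """
    a0, a1 = profile
    n0, n1 = len(payoffs[0]), len(payoffs[0][0])
    # Player 0 deviates unilaterally while player 1 keeps a1.
    ok0 = all(payoffs[0][a0][a1] >= payoffs[0][d][a1] - eps for d in range(n0))
    # Player 1 deviates unilaterally while player 0 keeps a0.
    ok1 = all(payoffs[1][a0][a1] >= payoffs[1][a0][d] - eps for d in range(n1))
    return ok0 and ok1

# Illustrative 2x2 pay-off tables (a prisoner's-dilemma pattern).
P0 = [[3.0, 0.0], [4.0, 1.0]]
P1 = [[3.0, 4.0], [0.0, 1.0]]

exact = is_eps_nash([P0, P1], (1, 1), eps=0.0)    # no profitable deviation at all
tight = is_eps_nash([P0, P1], (0, 0), eps=0.5)    # a deviation gains 1 > 0.5
loose = is_eps_nash([P0, P1], (0, 0), eps=1.0)    # a deviation gains exactly 1
print(exact, tight, loose)
```

The third call shows why the definition is stated with a tolerance: a profile that is not an exact equilibrium can still satisfy the inequality once $\epsilon$ is at least the largest gain from a unilateral deviation.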
For convenience, we will primarily work with an equivalent definition given in [2].

Definition 5.2 (Equivalent definition of Nash equilibrium). $(e_i)$ is a Nash equilibrium pay-off if and only if, for any $\epsilon > 0$, there is an $N$-tuple of controls $(u_i)$ for which it is
(Reachable) For any $i = 1, 2, \dots, N$, $|e_i - g_i(x(T))| \le \epsilon$.
(Consistent) For any $t \in [t_0, T]$ and any $i = 1, 2, \dots, N$, $e_i \ge V_i(t, x(t)) - \epsilon$,
where $x(\cdot)$ solves the ODE with $x(t_0) = x_0$ and the $N$-tuple of controls $(u_i)$, and $V_i$ is the lower value of the game with
$$V_i(t_0, x_0) := \sup_{\alpha_i \in A_d^i(t_0)} \inf_{(u_{-i}) \in \Phi_{-i}(t_0)} P_i(t_0, x_0, \alpha_i[(u_{-i})], u_{-i}). \tag{5.3}$$
The Isaacs condition in this case states that
$$\inf_{(u_{-i}) \in \Phi_{-i}} \sup_{u_i \in \Phi_i} \{ f(x, u_i, (u_{-i})) \cdot p \} = \sup_{u_i \in \Phi_i} \inf_{(u_{-i}) \in \Phi_{-i}} \{ f(x, u_i, (u_{-i})) \cdot p \}$$
for any $(x, p) \in \mathbb{R}^N \times \mathbb{R}^N$. We have shown that under this condition, $V_i$ is indeed the value of the game and satisfies the following equality:
$$V_i(t_0, x_0) = \inf_{(\alpha_{-i}) \in A_d^{-i}(t_0)} \sup_{u_i \in \Phi_i(t_0)} P_i(t_0, x_0, (\alpha_{-i})[u_i], u_i).$$

Our goal is to prove the following theorem:

Theorem 5.4. If the functions $f$ and $g_i$ are bounded and Lipschitz continuous, and the Isaacs condition holds, then for any $(t_0, x_0) \in [t_0, T] \times \mathbb{R}^N$, there exists at least one Nash equilibrium pay-off at $(t_0, x_0)$.

The proof of this theorem follows ideas from [2] and depends on the following two lemmas.

Lemma 5.5. For any $(t_0, x_0) \in [t_0, T] \times \mathbb{R}^N$ and for any $\epsilon > 0$, there exists $(\bar{u}_i) \in \Phi_1(t_0) \times \dots \times \Phi_N(t_0)$ such that for all $i = 1, 2, \dots, N$ and for all $t \in [t_0, T]$, we have
$$V_i(t, x(t)) \ge V_i(t_0, x_0) - \epsilon,$$
where $x(t)$ satisfies the ODE (1.1).

Proof. For any $\epsilon > 0$, by the definition of the lower value functions, there exists a delay strategy $\bar{\alpha}_i \in A_d^i(t_0)$ such that
$$V_i(t_0, x_0) - \epsilon \le \inf_{(u_{-i}) \in \Phi_{-i}(t_0)} P_i(t_0, x_0, \bar{\alpha}_i, (u_{-i})).$$
By Theorem 2.2, there exists a unique $N$-tuple of controls $(\bar{u}_i)$ such that $\bar{\alpha}_i[(\bar{u}_{-i})] = \bar{u}_i$ for any $i = 1, 2, \dots, N$. Let $x(t)$ be the trajectory of the ODE with $x(t_0) = x_0$ and controls $(\bar{u}_i)$. Then for any $t_1 \in [t_0, T]$, we define a new strategy $\alpha_i \in A_d^i(t_1)$ by $\alpha_i[(u_{-i})] = \bar{\alpha}_i[(\tilde{u}_{-i})]|_{[t_1, T]}$, where
$$(\tilde{u}_{-i})(t) = \begin{cases} (\bar{u}_{-i})(t), & t_0 \le t \le t_1, \\ (u_{-i})(t), & t_1 < t \le T, \end{cases}$$
for any $(u_{-i}) \in \Phi_{-i}(t_1)$. Thus:
$$\begin{aligned} V_i(t_1, x(t_1)) &= \sup_{\alpha_i \in A_d^i(t_1)} \inf_{(u_{-i}) \in \Phi_{-i}(t_1)} P_i(t_1, x(t_1), \alpha_i, (u_{-i})) \\ &\ge \inf_{(u_{-i}) \in \Phi_{-i}(t_1)} P_i(t_1, x(t_1), \alpha_i, (u_{-i})) \\ &\ge \inf_{(v_{-i}) \in \Phi_{-i}(t_0)} P_i(t_0, x(t_0), \bar{\alpha}_i, (v_{-i})), \end{aligned}$$
because if $(v_{-i}) = (\bar{u}_{-i})$ on $[t_0, t_1]$, then the trajectories satisfy
$$x(T; x(t_0) = x_0, \bar{\alpha}_i, (v_{-i})) = x(T; x(t_1), \alpha_i, (v_{-i})|_{[t_1, T]}).$$
Thus, we have:
$$V_i(t_1, x(t_1)) \ge \inf_{(v_{-i}) \in \Phi_{-i}(t_0)} P_i(t_0, x_0, \bar{\alpha}_i, (v_{-i})) \ge V_i(t_0, x_0) - \epsilon.$$

Lemma 5.6. For any $(t_0, x_0) \in [t_0, T] \times \mathbb{R}^N$ and for any $\epsilon > 0$, there exists an $N$-tuple of controls $(u_i) \in \Phi_1(t_0) \times \dots \times \Phi_N(t_0)$ such that, for all $i$,
$$V_i(s, x(s)) \le V_i(t, x(t)) + \epsilon \quad \text{for any } t_0 \le s \le t \le T,$$
where $x(t)$ solves the ODE (1.1) with $x(t_0) = x_0$ and controls $(u_i)$.

Proof. For $n > 1$, let $t_k = t_0 + \frac{(T - t_0) k}{n}$. By Lemma 5.5, we can construct an $N$-tuple of controls $(u_i^n)$, piece by piece on the intervals $[t_k, t_{k+1})$, such that
$$V_i(t, x_n(t)) \ge V_i(t_k, x_n(t_k)) - \frac{1}{n^2} \quad \text{for any } t \in [t_k, t_{k+1}),$$
where $x_n(\cdot)$ is the trajectory of the ODE (1.1) with $x(t_0) = x_0$ and controls $(u_i^n)$. Now for any $t_0 \le s \le t \le T$, we can find $t_{k_1}$ and $t_{k_2}$ such that $t_{k_1} \le s \le t_{k_1 + 1}$ and $t_{k_2} \le t \le t_{k_2 + 1}$. Since we proved that the value function $V_i$ is Lipschitz continuous in all variables and the function $f$ is bounded, we can write:
$$\begin{aligned} |V_i(t, x_n(t)) - V_i(t_{k_2}, x_n(t_{k_2}))| &\le |V_i(t, x_n(t)) - V_i(t_{k_2}, x_n(t))| + |V_i(t_{k_2}, x_n(t)) - V_i(t_{k_2}, x_n(t_{k_2}))| \\ &\le M |t_{k_2 + 1} - t_{k_2}| + C |x_n(t) - x_n(t_{k_2})| \\ &\le M |t_{k_2 + 1} - t_{k_2}| + C \int_{t_{k_2}}^{t} |f(s, x_n(s), (u_i^n))|\,ds \\ &\le M |t_{k_2 + 1} - t_{k_2}| + C C_1 |t_{k_2 + 1} - t_{k_2}| \\ &\le \frac{T - t_0}{n}(M + C C_1) =: \frac{K}{n}, \end{aligned}$$
where $M$ and $C$ are the Lipschitz constants of $V_i$, $C_1$ is the bound of the function $f$, and the last step follows from our construction of the $t_k$. We can prove similarly that
$$|V_i(s, x_n(s)) - V_i(t_{k_1 + 1}, x_n(t_{k_1 + 1}))| \le \frac{K}{n}.$$
Therefore,
$$\begin{aligned} V_i(t, x_n(t)) - V_i(s, x_n(s)) &\ge V_i(t_{k_2}, x_n(t_{k_2})) - V_i(t_{k_1 + 1}, x_n(t_{k_1 + 1})) - \frac{2K}{n} \\ &= \sum_{j = k_1 + 1}^{k_2 - 1} \big( V_i(t_{j+1}, x_n(t_{j+1})) - V_i(t_j, x_n(t_j)) \big) - \frac{2K}{n} \\ &\ge -\frac{1}{n} - \frac{2K}{n} \ge -\epsilon \end{aligned}$$
for a suitable choice of $n$.

We now prove Theorem 5.4 using Definition 5.2, the equivalent definition of Nash equilibrium.

Proof. Let the $N$-tuple of controls $(u_i^n)$ be constructed as in Lemma 5.6 with $\epsilon = 1/n$. Let $x_n(t)$ be the trajectory of the ODE (1.1) with $x_n(t_0) = x_0$ and controls $(u_i^n)$. The sequence $x_n$ is uniformly bounded and Lipschitz continuous on $[t_0, T]$, and thus, by the Arzelà-Ascoli theorem,
we can find a subsequence $x_{n_j}$ which converges uniformly on $[t_0, T]$ to some continuous trajectory $x$. From the continuity of the value function $V_i$ and the construction of the controls $(u_i^n)$, by Lemma 5.6 we have that $t \mapsto V_i(t, x(t))$ is non-decreasing:
$$V_i(s, x(s)) \le V_i(t, x(t)) \quad \text{for any } t_0 \le s \le t \le T \text{ and for any } i = 1, 2, \dots, N.$$
Set $(e_i) = (g_i(x(T)))$. Since $t \mapsto V_i(t, x(t))$ is non-decreasing, we have
$$V_i(t, x(t)) \le V_i(T, x(T)) = e_i \quad \text{for any } t_0 \le t \le T \text{ and for any } i = 1, 2, \dots, N.$$
Again, since $x_n$ converges uniformly to $x$ and $V_i$ is continuous, for any $\epsilon > 0$ we can find some $n$ such that
$$|e_i - g_i(x_n(T))| \le \epsilon \quad \text{and} \quad V_i(t, x_n(t)) \le e_i + \epsilon \quad \text{for any } t \in [t_0, T].$$
By Definition 5.2, this means that $(e_i)$ is a Nash equilibrium pay-off.

Acknowledgments. It is a pleasure to thank my mentor, Yan Zhang, for his help and advice on this paper. I would also like to thank Prof. Peter May for organizing the REU program each year.

References
[1] L. C. Evans and P. E. Souganidis. Differential Games and Representation Formulas for Solutions of Hamilton-Jacobi-Isaacs Equations.
[2] Pierre Cardaliaguet. Introduction to Differential Games.
[3] Rufus Isaacs. Differential Games.
