STOCHASTIC DIFFERENTIAL GAMES: THE LINEAR QUADRATIC ZERO SUM CASE

Sankhyā: The Indian Journal of Statistics
1995, Volume 57, Series A, Pt. 1, pp. 161-165

By R. ARDANUY
Universidad de Salamanca

SUMMARY. This paper considers the problem of finding optimal strategies, with two players, based on the linear stochastic differential system $d\xi = (A\xi + B_1 u_1 + B_2 u_2)\,dt + \sigma\,dw$. Using a minimax principle we find that the feedback optimal strategies are linear in the state of the system and may be calculated by solving a Riccati matrix differential equation. We also formulate a separation principle that relates the solution in the deterministic case to the solution in the stochastic case, as in the linear stochastic control problem.

Paper received. May 1993.
AMS (1985) classification. 93E05, 90D05, 90D25.
Key words and phrases. Stochastic differential games, stochastic optimization, separation principle.

1. Introduction

Let us consider the linear stochastic differential system:

$$ d\xi = (A\xi + B_1 u_1 + B_2 u_2)\,dt + \sigma\,dw \tag{1} $$

where $\xi$ is $n$-dimensional, $w$ is an $m$-dimensional Wiener process, and $u_i$ is the strategy of player $i$, an $r_i$-dimensional column vector ($i = 1, 2$). One assumes that there are no restrictions on the values of $u_i$; $A$, $B_1$, $B_2$ and $\sigma$ are matrices of appropriate dimensions and, in general, functions of time. Player 1 attempts to maximize $J$ and player 2 to minimize it, where:

$$ J(x, s, u_1, u_2) = E_{x,s}\left\{ \int_s^{\tau} (\xi' \;\; u_1' \;\; u_2') \begin{pmatrix} W_{00} & W_{01} & W_{02} \\ W_{01}' & W_{11} & W_{12} \\ W_{02}' & W_{12}' & W_{22} \end{pmatrix} \begin{pmatrix} \xi \\ u_1 \\ u_2 \end{pmatrix} dt + \xi(\tau)' F\, \xi(\tau) \right\} \tag{2} $$

where $\tau$ is the final moment of the game, which one assumes to be fixed, and $0 \le s \le \tau$; moreover:

$W_{00} = W_{00}(t)$ is a symmetric square matrix of order $n$.
$W_{11} = W_{11}(t)$ is a negative definite symmetric square matrix of order $r_1$.
$W_{22} = W_{22}(t)$ is a positive definite symmetric square matrix of order $r_2$.
$F$ is a symmetric square matrix.
The prime ($'$) indicates transposition.

With a view to simplifying the notation, let us set:

$$ f(x, t, u_1, u_2) = Ax + B_1 u_1 + B_2 u_2 \tag{3} $$

$$ a = \sigma\sigma' \tag{4} $$

$$ L(x, t, u_1, u_2) = (x' \;\; u_1' \;\; u_2') \begin{pmatrix} W_{00} & W_{01} & W_{02} \\ W_{01}' & W_{11} & W_{12} \\ W_{02}' & W_{12}' & W_{22} \end{pmatrix} \begin{pmatrix} x \\ u_1 \\ u_2 \end{pmatrix} = x'W_{00}x + u_1'W_{11}u_1 + u_2'W_{22}u_2 + 2x'W_{01}u_1 + 2x'W_{02}u_2 + 2u_1'W_{12}u_2 \tag{5} $$

$$ H(x, t, u_1, u_2, p) = p\, f(x, t, u_1, u_2) + L(x, t, u_1, u_2) \tag{6} $$

Therefore, taking into account that there are no restrictions on $u_1(x, t)$, $u_2(x, t)$, except for the analytical conditions necessary for (1) to have a unique solution once an initial condition has been fixed, and by virtue of the Minimax Principle (see Ardanuy and Alcalá (1991, 1992)), one must search for a function $V(x, s)$ of class $C^{2,1}(\mathbb{R}^n \times (0, \tau))$, such that

$$ \begin{cases} 0 = V_s + \tfrac{1}{2}\operatorname{tr}[V_{xx}a] + H(x, s, u_1^*, u_2^*, V_x); & 0 \le s \le \tau \\ V(x, \tau) = x'Fx \end{cases} \tag{7} $$

where the pair $u_1^*, u_2^*$ is a saddle point for the Hamiltonian $H$:

$$ \max_{u_1}\min_{u_2} H(x, s, u_1, u_2, V_x) = \min_{u_2}\max_{u_1} H(x, s, u_1, u_2, V_x) = H(x, s, u_1^*, u_2^*, V_x) \tag{8} $$

2. Determination of the saddle point

To determine the saddle point, it is necessary to solve the following system of equations:

$$ \begin{cases} H_{u_1}(x, s, u_1^*, u_2^*, V_x) = 0 \\ H_{u_2}(x, s, u_1^*, u_2^*, V_x) = 0 \end{cases} \tag{9} $$

After transposing terms, system (9) may be expressed in matrix terms as:

$$ \begin{bmatrix} W_{11} & W_{12} \\ W_{12}' & W_{22} \end{bmatrix} \begin{bmatrix} u_1^* \\ u_2^* \end{bmatrix} = -\begin{bmatrix} \tfrac{1}{2}B_1'V_x' + W_{01}'x \\ \tfrac{1}{2}B_2'V_x' + W_{02}'x \end{bmatrix} \tag{10} $$
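As a numerical illustration of system (10), the following minimal sketch solves the linear system for one state $x$ and one gradient $p = V_x$, then checks the saddle property (8) by perturbing each player's strategy. All matrices and vectors below are made-up example values, and the helper names `saddle_point` and `H` are hypothetical; the only requirement taken from the text is that `W11` is negative definite and `W22` positive definite.

```python
import numpy as np

def saddle_point(W11, W12, W22, W01, W02, B1, B2, x, p):
    """Solve the linear system (10); p holds the gradient V_x as a 1-D array."""
    r1 = W11.shape[0]
    W = np.block([[W11, W12], [W12.T, W22]])
    rhs = -np.concatenate([0.5 * B1.T @ p + W01.T @ x,
                           0.5 * B2.T @ p + W02.T @ x])
    u = np.linalg.solve(W, rhs)
    return u[:r1], u[r1:]

# Hypothetical example data: n = 2, r1 = r2 = 1.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[0.0], [0.5]])
W00 = np.eye(2)
W01 = np.array([[0.1], [0.0]])
W02 = np.array([[0.0], [0.2]])
W11 = np.array([[-2.0]])   # negative definite (player 1 maximizes)
W12 = np.array([[0.3]])
W22 = np.array([[1.5]])    # positive definite (player 2 minimizes)
x = np.array([1.0, -1.0])
p = np.array([0.4, 0.2])

def H(u1, u2):
    """Hamiltonian (6) evaluated at the fixed example x and p."""
    f = A @ x + B1 @ u1 + B2 @ u2
    L = (x @ W00 @ x + u1 @ W11 @ u1 + u2 @ W22 @ u2
         + 2 * x @ W01 @ u1 + 2 * x @ W02 @ u2 + 2 * u1 @ W12 @ u2)
    return p @ f + L

u1s, u2s = saddle_point(W11, W12, W22, W01, W02, B1, B2, x, p)
H_star = H(u1s, u2s)
# Moving u1 away from u1* lowers H; moving u2 away from u2* raises it.
print(H(u1s + 0.1, u2s) < H_star < H(u1s, u2s + 0.1))
```

Because $H$ is quadratic, concave in $u_1$ and convex in $u_2$, a vanishing gradient is enough for the saddle property, which is what the final comparison checks on this example.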

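Since $W_{11}$ is negative definite and $W_{22}$ positive definite, there exists the inverse of the matrix:

$$ W = \begin{bmatrix} W_{11} & W_{12} \\ W_{12}' & W_{22} \end{bmatrix} \tag{11} $$

where:

$$ W^{-1} = \Gamma = \begin{bmatrix} \Gamma_{11} & \Gamma_{12} \\ \Gamma_{12}' & \Gamma_{22} \end{bmatrix} \tag{12} $$

with:

$$ \begin{aligned} \Gamma_{11} &= \left(W_{11} - W_{12}W_{22}^{-1}W_{12}'\right)^{-1} \\ \Gamma_{22} &= \left(W_{22} - W_{12}'W_{11}^{-1}W_{12}\right)^{-1} \\ \Gamma_{12} &= -W_{11}^{-1}W_{12}\Gamma_{22} = -\Gamma_{11}W_{12}W_{22}^{-1} \end{aligned} \tag{13} $$

and hence:

$$ \begin{bmatrix} u_1^* \\ u_2^* \end{bmatrix} = -\Gamma\left\{ \tfrac{1}{2}\begin{bmatrix} B_1' \\ B_2' \end{bmatrix} V_x' + \begin{bmatrix} W_{01}' \\ W_{02}' \end{bmatrix} x \right\} \tag{14} $$

that is:

$$ \begin{cases} u_1^* = -\Gamma_{11}\left[\tfrac{1}{2}B_1'V_x' + W_{01}'x\right] - \Gamma_{12}\left[\tfrac{1}{2}B_2'V_x' + W_{02}'x\right] \\ u_2^* = -\Gamma_{22}\left[\tfrac{1}{2}B_2'V_x' + W_{02}'x\right] - \Gamma_{12}'\left[\tfrac{1}{2}B_1'V_x' + W_{01}'x\right] \end{cases} \tag{15} $$

In view of the structure of $H$ and since $W_{11}$ is negative definite and $W_{22}$ positive definite, it is easy to check that $u_1^*, u_2^*$ is a saddle point for $H$.

3. Determination of the value of the game

$V(x, s)$ must satisfy the partial differential equation (7), which may be rewritten in symmetrical form:

$$ \begin{aligned} 0 = {} & V_s + \tfrac{1}{2}\operatorname{tr}[V_{xx}a] + \tfrac{1}{2}V_x\left\{Ax + [B_1 : B_2]\begin{bmatrix} u_1^* \\ u_2^* \end{bmatrix}\right\} + \tfrac{1}{2}\left\{x'A' + [u_1^{*\prime} : u_2^{*\prime}]\begin{bmatrix} B_1' \\ B_2' \end{bmatrix}\right\}V_x' + x'W_{00}x \\ & + [u_1^{*\prime} : u_2^{*\prime}]\,W\begin{bmatrix} u_1^* \\ u_2^* \end{bmatrix} + x'[W_{01} : W_{02}]\begin{bmatrix} u_1^* \\ u_2^* \end{bmatrix} + [u_1^{*\prime} : u_2^{*\prime}]\begin{bmatrix} W_{01}' \\ W_{02}' \end{bmatrix}x \end{aligned} \tag{16} $$

with the boundary condition $V(x, \tau) = x'Fx$, and where $u_1^*, u_2^*$ are as given in (14) or (15). By analogy with the problem of linear-quadratic control, we postulate the following expression for $V(x, s)$:

$$ V(x, s) = x'P(s)x + g(s) \tag{17} $$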

where $P$ is symmetric and satisfies the boundary condition $P(\tau) = F$, while $g(s)$ is a real function with $g(\tau) = 0$. From (17), one has:

$$ V_s = x'\dot{P}x + \dot{g}; \qquad V_x = 2x'P \qquad \text{and} \qquad V_{xx} = 2P \tag{18} $$

where the dot indicates the derivative with respect to time. By substituting in (14), one obtains the optimal strategies:

$$ \begin{bmatrix} u_1^* \\ u_2^* \end{bmatrix} = -\Gamma\left\{ \begin{bmatrix} B_1' \\ B_2' \end{bmatrix} P + \begin{bmatrix} W_{01}' \\ W_{02}' \end{bmatrix} \right\} x \tag{19} $$

which are linear in the state of the system. To determine $P$ and $g$ we substitute (18) and (19) in (16), and taking into account expression (12), one obtains:

$$ 0 = \dot{g} + \operatorname{tr}[Pa] + x'\left\{\dot{P} + PG_1P + PG_2 + G_2'P + G_3\right\}x \tag{20} $$

where:

$$ \begin{aligned} G_1 &= -[B_1 \;\; B_2]\,\Gamma\,[B_1 \;\; B_2]' \\ G_2 &= A - [B_1 \;\; B_2]\,\Gamma\,[W_{01} \;\; W_{02}]' \\ G_3 &= W_{00} - [W_{01} \;\; W_{02}]\,\Gamma\,[W_{01} \;\; W_{02}]' \end{aligned} \tag{21} $$

However, for $P$ and $g$ to fulfill (20) with the respective boundary conditions, it is only necessary to solve the ordinary matrix differential equation:

$$ \dot{P} + PG_1P + PG_2 + G_2'P + G_3 = 0 \tag{22} $$

with the boundary condition $P(\tau) = F$, and take:

$$ g(s) = \int_s^{\tau} \operatorname{tr}[P(t)a(t)]\,dt \tag{23} $$

In summary, the calculation method consists in solving, by integrating backwards in time, the Riccati matrix equation (22); having determined $P(s)$, $0 \le s \le \tau$, one turns to (23) and calculates $g(s)$, $0 \le s \le \tau$. With these two functions, one thus determines the value of the game (17) at each moment and for each state of the system, together with the optimal strategies given in (19).

Upon studying these equations one sees that in the case where $a(t) = 0$, in which the game is deterministic, the optimal feedback strategies remain the same and only the value of the game varies; i.e.:

$$ V(x, s)_{\text{deterministic}} = x'P(s)x \tag{24} $$

since by (23), in the deterministic case $g(s)$ is equal to zero. This leads to the separation principle stated as Theorem 1.
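Before the formal statement, the calculation method summarised above can be sketched numerically. The code below is a minimal illustration, not the author's implementation: it assumes time-invariant coefficients, uses an explicit Euler step backwards from $s = \tau$ for (22), a rectangle rule for (23), and symmetrizes $P$ after each step. The function name `solve_game` and the scalar test problem are hypothetical. The sketch does not detect a finite escape time of the Riccati solution, which can occur in games because $G_1$ need not have a definite sign.

```python
import numpy as np

def solve_game(A, B1, B2, sigma, W00, W01, W02, W11, W12, W22, F,
               tau, steps=4000):
    """Integrate (22) backwards from P(tau) = F and accumulate g from (23).

    Assumes constant coefficient matrices. Returns P(0), g(0) and the
    gain K such that the strategies (19) at s = 0 are [u1*; u2*] = -K x.
    """
    B = np.hstack([B1, B2])                      # [B1 B2]
    S = np.hstack([W01, W02])                    # [W01 W02]
    Gamma = np.linalg.inv(np.block([[W11, W12], [W12.T, W22]]))
    G1 = -B @ Gamma @ B.T                        # coefficients (21)
    G2 = A - B @ Gamma @ S.T
    G3 = W00 - S @ Gamma @ S.T
    a = sigma @ sigma.T                          # diffusion matrix (4)

    h = tau / steps
    P = np.array(F, dtype=float)
    g = 0.0
    for _ in range(steps):
        g += h * np.trace(P @ a)                 # rectangle rule for (23)
        # From (22): Pdot = -(P G1 P + P G2 + G2' P + G3); step s -> s - h.
        P = P + h * (P @ G1 @ P + P @ G2 + G2.T @ P + G3)
        P = 0.5 * (P + P.T)                      # restore symmetry
    K = Gamma @ (B.T @ P + S.T)
    return P, g, K

# Hypothetical scalar check: only player 2 acts, dxi = u2 dt + dw,
# running cost xi^2 - u1^2 + u2^2, F = 0. Then (22) reads Pdot = P^2 - 1,
# whose solution is P(s) = tanh(tau - s), and g(0) = log(cosh(tau)).
one, zero = np.array([[1.0]]), np.array([[0.0]])
common = dict(A=zero, B1=zero, B2=one, W00=one, W01=zero, W02=zero,
              W11=-one, W12=zero, W22=one, F=zero, tau=1.0)
P_sto, g_sto, K_sto = solve_game(sigma=one, **common)
P_det, g_det, K_det = solve_game(sigma=zero, **common)

print(P_sto[0, 0], np.tanh(1.0))          # Euler result vs closed form
print(g_sto, np.log(np.cosh(1.0)))        # stochastic correction term
print(np.allclose(K_sto, K_det), g_det)   # same feedback gain, g = 0
```

In this example the feedback gain is identical with and without noise, and the two values of the game differ only by the accumulated trace term, which is the behaviour the separation principle describes.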

Theorem 1: To solve the linear-quadratic stochastic game, it suffices to solve the corresponding linear-quadratic deterministic game ($\sigma = 0$); when this is done, the optimal feedback strategies are linear and coincide in both cases, whereas the value of the game is given by:

$$ V(x, s)_{\text{stochastic}} = V(x, s)_{\text{deterministic}} + \int_s^{\tau} \operatorname{tr}[P(t)a(t)]\,dt $$

Regarding the numerical techniques to solve the Riccati equation, among others the following are of particular interest:

i) Method of direct integration: the simplest of these is the Euler method. However, owing to numerical errors, the symmetry of the computed $P(t)$ is destroyed, so that it is appropriate to symmetrize at each point; i.e. to replace the calculated $P$ by $(P + P')/2$. Alternatively, it is possible to take advantage of the symmetry of $P$ to reduce the matrix equation (22) to a system of $n(n+1)/2$ first-order differential equations, which considerably reduces computation time. A lengthy discussion of this method of direct integration can be found in Bucy and Joseph (1968).

ii) The Kalman-Englar method: this is usually applied to time-invariant Riccati equations (see, for example, Kalman and Englar (1966), Vaughan (1969), etc.).

References

Ardanuy, R. and Alcalá, A. (1991). Minimax Principle for Stochastic Differential Games, Extracta Mathematicae, 6, No. 2-3, pp. 184-186.
Ardanuy, R. and Alcalá, A. (1992). Weak Infinitesimal Operators and Stochastic Differential Games, Stochastica, 13, No. 1, pp. 5-.
Bucy, R. S. and Joseph, P. D. (1968). Filtering for Stochastic Processes with Applications to Guidance, J. Wiley, New York.
Kalman, R. E. and Englar, T. S. (1966). A User Manual for the Automatic Synthesis Program. NASA, Report CR-475.
Vaughan, D. R. (1969). A Negative Exponential Solution for the Matrix Riccati Equation. IEEE Trans. Autom. Control, 14, No. 1, pp. 72-75.

Depto. de Matemática Pura y Aplicada
Plaza de la Merced, 1
E37008 - Salamanca, Spain.