Signaling equilibria for dynamic LQG games with asymmetric information

Deepanshu Vasal and Achilleas Anastasopoulos

Abstract

We consider a finite horizon dynamic game with two players who observe their types privately and take actions, which are publicly observed. Players' types evolve as independent, controlled linear Gaussian processes and players incur quadratic instantaneous costs. This forms a dynamic linear quadratic Gaussian (LQG) game with asymmetric information. We show that under certain conditions, players' strategies that are linear in their private types, together with Gaussian beliefs, form a perfect Bayesian equilibrium (PBE) of the game. Furthermore, it is shown that this is a signaling equilibrium due to the fact that future beliefs on players' types are affected by the equilibrium strategies. We provide a backward-forward algorithm to find the PBE. Each step of the backward algorithm reduces to solving an algebraic matrix equation for every possible realization of the state estimate covariance matrix. The forward algorithm consists of Kalman filter recursions, where state estimate covariance matrices depend on equilibrium strategies.

I. INTRODUCTION

Linear quadratic Gaussian (LQG) team problems have been studied extensively under the framework of classical stochastic control with single controller and perfect recall [1, Ch. 7]. In such a system, the state evolves linearly and the controller makes a noisy observation of the state which is also linear in the state and noise. The controller incurs a quadratic instantaneous cost. With all basic random variables being independent and Gaussian, the problem is modeled as a partially observed Markov decision process (POMDP). The belief state process under any control law happens to be Gaussian and thus can be sufficiently described by the corresponding mean and covariance processes, which can be updated by the Kalman filter equations. Moreover, the covariance can be computed offline and thus the mean (state estimate) is a sufficient statistic for control.
Finally, due to the quadratic nature of the costs, the optimal control strategy is linear in the state. Thus, unlike most POMDP problems, the LQG stochastic control problem can be solved analytically and admits an easy-to-implement optimal strategy.

LQG team problems have also been studied under non-classical information structure, such as in multi-agent decentralized team problems where two controllers with different information sets minimize the same objective. Such systems with asymmetric information structure are of special interest today because of the emergence of large scale networks such as social or power networks, where there are multiple decision makers with local or partial information about the system. It is well known that for decentralized LQG team problems, linear control policies are not optimal in general [2]. However, there exist special information structures, such as partially nested [3] and stochastically nested [4], where linear control is shown to be optimal. Furthermore, due to their strong appeal for ease of implementation, linear strategies have been studied on their own for decentralized teams, even at the possibility of being suboptimal (see [5] and references therein).

When controllers (or players) are strategic, the problem is classified as a dynamic game and an appropriate solution concept is some notion of equilibrium. When players have different information sets, such games are called games with asymmetric information. There are several notions of equilibrium for such games, including perfect Bayesian equilibrium (PBE), sequential equilibrium, and trembling hand equilibrium [6], [7]. Each of these notions of equilibrium consists of a strategy and a belief profile of all players, where the equilibrium strategies are optimal given the beliefs and the beliefs are derived from the equilibrium strategy profile using Bayes' rule (whenever possible), with some equilibrium concepts requiring further refinements. Due to this circular argument of beliefs being consistent with strategies, which are in turn optimal given the beliefs, finding such equilibria is a difficult task.

The authors are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA, {dvasal, anastas} at umich.edu
To date, there is no known sequential decomposition methodology to find such equilibria for general dynamic games with asymmetric information. Authors in [8] studied a discrete-time dynamic LQG game with one-step delayed sharing of observations. Authors in [9] studied a class of dynamic games with asymmetric information under the assumption that players' posterior beliefs about the system state conditioned on their common information are independent of the strategies used by the players in the past. Due to this independence of beliefs and past strategies, the authors of [9] were able to provide a backward recursive algorithm, similar to dynamic programming, to find Markov perfect equilibria [11] of a transformed game, which are equivalently a class of Nash equilibria of the original game. The same authors specialized their results in [12] to find non-signaling equilibria of dynamic LQG games with asymmetric information.

Recently, we considered a general class of dynamic games with asymmetric information and independent private types in [13] and provided a sequential decomposition methodology to find a class of PBE of the game considered. In our model, beliefs depend on the players' strategies, so our methodology allows the possibility of finding signaling equilibria. In this paper, we build on this methodology to find signaling equilibria for two-player dynamic LQG games with asymmetric information. We show that players' strategies that are linear in their private types, in conjunction with consistent Gaussian beliefs, form a PBE of the game. Our contributions are:

(a) Under strategies that are linear in players' private types, we show that the belief updates are Gaussian and the corresponding mean and covariance are updated through Kalman filtering equations which depend on the players' strategies, unlike the case in classical stochastic control and the model considered in [12]. Thus there is signaling [14], [15].

(b) We sequentially decompose the problem by specializing the forward-backward algorithm presented in [13] for the dynamic LQG model. The backward algorithm requires, at each step, solving a fixed point equation in partial strategies of the players for all possible beliefs. We show that in this setting, solving this fixed point equation reduces to solving a matrix algebraic equation for each realization of the state estimate covariance matrices.

(c) The cost-to-go value functions are shown to be quadratic in the private type and state estimates, which together with quadratic instantaneous costs and mean updates being linear in the control action, implies that at every time $t$ player $i$ faces an optimization problem which is quadratic in her control. Thus linear control strategies are shown to satisfy the optimality conditions in [13].
(d) For the special case of scalar actions, we provide sufficient algorithmic conditions for existence of a solution of the algebraic matrix equation. Finally, we present numerical results on the steady state solution for specific parameters of the problem.

The paper is structured as follows. In Section II, we define the model. In Section III, we introduce the solution concept and summarize the general methodology in [13]. In Section IV, we present our main results, where we construct equilibrium strategies and beliefs through a forward-backward recursion. In Section V we discuss existence issues and present numerical steady state solutions. We conclude in Section VI.

A. Notation

We use uppercase letters for random variables and lowercase for their realizations. We use bold upper case letters for matrices. For any variable, subscripts represent time indices and superscripts represent player identities. We use notation $-i$ to represent the player other than player $i$. We use notation $a_{t:t'}$ to represent the vector $(a_t, a_{t+1}, \ldots, a_{t'})$ when $t' \geq t$, or an empty vector if $t' < t$. We remove superscripts or subscripts if we want to represent the whole vector; for example, $a_t$ represents $(a_t^1, a_t^2)$. We use $\delta(\cdot)$ for the Dirac delta function. We use the notation $X \sim F$ to denote that the random variable $X$ has distribution $F$. For any Euclidean set $\mathcal{S}$, $\mathcal{P}(\mathcal{S})$ represents the space of probability measures on $\mathcal{S}$ with respect to the Borel sigma algebra. We denote by $\mathbb{P}^g$ (or $\mathbb{E}^g$) the probability measure generated by (or expectation with respect to) strategy profile $g$. We denote the set of real numbers by $\mathbb{R}$. For any random vector $X$ and event $A$, we use the notation $sm(\cdot|\cdot)$ to denote the conditional second moment, $sm(X|A) := \mathbb{E}[XX^\dagger | A]$. For any matrices $\mathbf{A}$ and $\mathbf{B}$, we will also use the notation $quad(\cdot\,;\cdot)$ to denote the quadratic function, $quad(\mathbf{A}; \mathbf{B}) := \mathbf{B}^\dagger \mathbf{A} \mathbf{B}$. We denote the trace of a matrix $\mathbf{A}$ by $tr(\mathbf{A})$. $N(\hat{x}, \mathbf{\Sigma})$ represents the vector Gaussian distribution with mean vector $\hat{x}$ and covariance matrix $\mathbf{\Sigma}$. All equalities and inequalities involving random variables are to be interpreted in the a.s. sense, and inequalities in matrices are to be interpreted in the sense of positive definiteness. All matrix inverses are interpreted as pseudoinverses.

II. MODEL

We consider a discrete-time dynamical system with 2 strategic players over a finite time horizon $\mathcal{T} := \{1, 2, \ldots, T\}$ and with perfect recall. There is a dynamic state of the system $x_t := (x_t^1, x_t^2)$, where $x_t^i \in \mathcal{X}^i := \mathbb{R}^{n_i}$ is the private type of player $i$ at time $t$, which is perfectly observed by her. Player $i$ at time $t$ takes action $u_t^i \in \mathcal{U}^i := \mathbb{R}^{m_i}$ after observing $u_{1:t-1}$, which is common information between the players, and $x_{1:t}^i$, which she observes privately. Thus at any time $t \in \mathcal{T}$, player $i$'s information is $(u_{1:t-1}, x_{1:t}^i)$. Players' types evolve linearly as

$$x_{t+1}^i = \mathbf{A}_t^i x_t^i + \mathbf{B}_t^i u_t + w_t^i, \tag{1}$$

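where $\mathbf{A}_t^i$, $\mathbf{B}_t^i$ are known matrices. $(X_1^1, X_1^2, (W_t^i)_{t \in \mathcal{T}})$ are basic random variables of the system, which are assumed to be independent and Gaussian such that $X_1^i \sim N(0, \mathbf{\Sigma}_1^i)$ and $W_t^i \sim N(0, \mathbf{Q}^i)$. As a consequence, types evolve as conditionally independent, controlled Markov processes,

$$\mathbb{P}(x_{t+1} | u_{1:t}, x_{1:t}) = \mathbb{P}(x_{t+1} | u_t, x_t) = \prod_{i=1}^{2} Q^i(x_{t+1}^i | u_t, x_t^i), \tag{2}$$

where $Q^i(x_{t+1}^i | u_t, x_t^i) := \mathbb{P}(w_t^i = x_{t+1}^i - \mathbf{A}_t^i x_t^i - \mathbf{B}_t^i u_t)$. At the end of interval $t$, player $i$ incurs an instantaneous cost $R^i(x_t, u_t)$,

$$R^i(x_t, u_t) = u_t^\dagger \mathbf{T}^i u_t + x_t^\dagger \mathbf{P}^i x_t + 2 u_t^\dagger \mathbf{S}^i x_t = \begin{bmatrix} u_t^\dagger & x_t^\dagger \end{bmatrix} \begin{bmatrix} \mathbf{T}^i & \mathbf{S}^i \\ \mathbf{S}^{i\dagger} & \mathbf{P}^i \end{bmatrix} \begin{bmatrix} u_t \\ x_t \end{bmatrix}, \tag{3}$$

where $\mathbf{T}^i, \mathbf{P}^i, \mathbf{S}^i$ are real matrices of appropriate dimensions and $\mathbf{T}^i, \mathbf{P}^i$ are symmetric. We define the instantaneous cost matrix $\mathbf{R}^i$ as

$$\mathbf{R}^i := \begin{bmatrix} \mathbf{T}^i & \mathbf{S}^i \\ \mathbf{S}^{i\dagger} & \mathbf{P}^i \end{bmatrix}.$$

Let $g^i = (g_t^i)_{t \in \mathcal{T}}$ be a probabilistic strategy of player $i$, where $g_t^i : (\mathcal{U}^i)^{t-1} \times (\mathcal{X}^i)^t \to \mathcal{P}(\mathcal{U}^i)$, such that player $i$ plays action $u_t^i$ according to the distribution $g_t^i(\cdot | u_{1:t-1}, x_{1:t}^i)$. Let $g := (g^i)_{i=1,2}$ be a strategy profile of both players. The distribution of the basic random variables and their independence structure, together with the system evolution in (1) and players' strategy profile $g$, define a joint distribution on all random variables involved in the dynamical process. The objective of player $i$ is to minimize her total expected cost

$$J^{i,g} := \mathbb{E}^g \left\{ \sum_{t=1}^{T} R^i(X_t, U_t) \right\}. \tag{4}$$

With both players being strategic, this problem is modeled as a dynamic LQG game with asymmetric information and with simultaneous moves.

III. PRELIMINARIES

In this section we introduce the equilibrium concept for dynamic games with asymmetric information and summarize the general methodology developed in [13] to find a class of such equilibria.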

A. Solution concept

Any history of this game at which players take action is of the form $h_t = (u_{1:t-1}, x_{1:t})$. Let $\mathcal{H}_t$ be the set of such histories at time $t$ and $\mathcal{H} := \cup_{t=0}^{T} \mathcal{H}_t$ be the set of all possible such histories. At any time $t$, player $i$ observes $h_t^i = (u_{1:t-1}, x_{1:t}^i)$ and both players together have $h_t^c = u_{1:t-1}$ as common history. Let $\mathcal{H}_t^i$ be the set of observed histories of player $i$ at time $t$ and $\mathcal{H}_t^c$ be the set of common histories at time $t$. An appropriate concept of equilibrium for such games is the PBE [7], which consists of a pair $(\beta^*, \mu^*)$ of a strategy profile $\beta^* = (\beta_t^{*,i})_{t \in \mathcal{T}, i=1,2}$, where $\beta_t^{*,i} : \mathcal{H}_t^i \to \mathcal{P}(\mathcal{U}^i)$, and a belief profile $\mu^* = (\mu_t^{*,i})_{t \in \mathcal{T}, i=1,2}$, where $\mu_t^{*,i} : \mathcal{H}_t^i \to \mathcal{P}(\mathcal{H}_t)$, that satisfy sequential rationality, so that for all $i = 1, 2$, $t \in \mathcal{T}$, $h_t^i \in \mathcal{H}_t^i$, and $\beta^i$,

$$\mathbb{E}^{(\beta^{*,i} \beta^{*,-i},\, \mu^*)} \left\{ \sum_{n=t}^{T} R^i(X_n, U_n) \,\Big|\, h_t^i \right\} \leq \mathbb{E}^{(\beta^{i} \beta^{*,-i},\, \mu^*)} \left\{ \sum_{n=t}^{T} R^i(X_n, U_n) \,\Big|\, h_t^i \right\}, \tag{5}$$

and the beliefs satisfy consistency conditions as described in [7, p. 331].

B. Structured perfect Bayesian equilibria

A general class of dynamic games with asymmetric information was considered in [13] by the authors, where players' types evolve as conditionally independent controlled Markov processes. A backward-forward algorithm was provided to find a class of PBE of the game called structured perfect Bayesian equilibria (SPBE). In these equilibria, player $i$'s strategy is of the form $U_t^i \sim m_t^i(\cdot | \pi_t^1, \pi_t^2, x_t^i)$, where $m_t^i : \mathcal{P}(\mathcal{X}^1) \times \mathcal{P}(\mathcal{X}^2) \times \mathcal{X}^i \to \mathcal{P}(\mathcal{U}^i)$. Specifically, player $i$'s action at time $t$ depends on her private history $x_{1:t}^i$ only through $x_t^i$. Furthermore, it depends on the common information $u_{1:t-1}$ through a common belief vector $\pi_t := (\pi_t^1, \pi_t^2)$, where $\pi_t^i \in \mathcal{P}(\mathcal{X}^i)$ is the belief on player $i$'s current type $x_t^i$ conditioned on common information $u_{1:t-1}$, i.e. $\pi_t^i(x_t^i) := \mathbb{P}^g(X_t^i = x_t^i | u_{1:t-1})$. The common information $u_{1:t-1}$ was summarized into the belief vector $(\pi_t^1, \pi_t^2)$ following the common agent approach used for dynamic decentralized team problems [16].
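Using this approach, player $i$'s strategy can be equivalently described as follows: player $i$ at time $t$ observes $u_{1:t-1}$ and takes action $\gamma_t^i$, where $\gamma_t^i : \mathcal{X}^i \to \mathcal{P}(\mathcal{U}^i)$ is a partial (stochastic) function from her private information $x_t^i$ to $u_t^i$, of the form $U_t^i \sim \gamma_t^i(\cdot | x_t^i)$. These actions are generated through some policy $\psi^i = (\psi_t^i)_{t \in \mathcal{T}}$, $\psi_t^i : (\mathcal{U}^i)^{t-1} \to \{\mathcal{X}^i \to \mathcal{P}(\mathcal{U}^i)\}$, that operates on the common information $u_{1:t-1}$, so that $\gamma_t^i = \psi_t^i[u_{1:t-1}]$. Then any policy of player $i$ of the form $U_t^i \sim g_t^i(\cdot | u_{1:t-1}, x_t^i)$ is equivalent to $U_t^i \sim \psi_t^i[u_{1:t-1}](\cdot | x_t^i)$ [16].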

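The common belief $\pi_t^i$ is shown in Lemma 2 of [13] to be updated as

$$\pi_{t+1}^i(x_{t+1}^i) = \frac{\int_{x_t^i} \pi_t^i(x_t^i)\, \gamma_t^i(u_t^i | x_t^i)\, Q^i(x_{t+1}^i | x_t^i, u_t)\, dx_t^i}{\int_{\tilde{x}_t^i} \pi_t^i(\tilde{x}_t^i)\, \gamma_t^i(u_t^i | \tilde{x}_t^i)\, d\tilde{x}_t^i}, \tag{6a}$$

if the denominator is not 0, and as

$$\pi_{t+1}^i(x_{t+1}^i) = \int_{x_t^i} \pi_t^i(x_t^i)\, Q^i(x_{t+1}^i | x_t^i, u_t)\, dx_t^i, \tag{6b}$$

if the denominator is 0. The belief update can be summarized as

$$\pi_{t+1}^i = \bar{F}(\pi_t^i, \gamma_t^i, u_t), \tag{7}$$

where $\bar{F}$ is independent of players' strategy profile $g$. The SPBE of the game can be found through a two-step backward-forward algorithm. In the backward recursive part, an equilibrium generating function $\theta$ is defined, based on which a strategy and belief profile $(\beta^*, \mu^*)$ are defined through a forward recursion. In the following we summarize the algorithm and results of [13].

1) Backward Recursion: An equilibrium generating function $\theta = (\theta_t^i)_{i=1,2,\, t \in \mathcal{T}}$ and a sequence of value functions $(V_t^i)_{i=1,2,\, t \in \{1, 2, \ldots, T+1\}}$ are defined through backward recursion, where $\theta_t^i : \mathcal{P}(\mathcal{X}^1) \times \mathcal{P}(\mathcal{X}^2) \to \{\mathcal{X}^i \to \mathcal{P}(\mathcal{U}^i)\}$ and $V_t^i : \mathcal{P}(\mathcal{X}^1) \times \mathcal{P}(\mathcal{X}^2) \times \mathcal{X}^i \to \mathbb{R}$, as follows.

(a) Initialize $\forall \pi_{T+1} \in \mathcal{P}(\mathcal{X}^1) \times \mathcal{P}(\mathcal{X}^2)$, $x_{T+1}^i \in \mathcal{X}^i$,

$$V_{T+1}^i(\pi_{T+1}, x_{T+1}^i) := 0. \tag{8}$$

(b) For $t = T, T-1, \ldots, 1$, $\forall \pi_t \in \mathcal{P}(\mathcal{X}^1) \times \mathcal{P}(\mathcal{X}^2)$, let $\theta_t[\pi_t]$ be generated as follows. Set $\tilde{\gamma}_t = \theta_t[\pi_t]$, where $\tilde{\gamma}_t$ is the solution of the following fixed point equation, $\forall i \in \{1, 2\}$, $x_t^i \in \mathcal{X}^i$,

$$\tilde{\gamma}_t^i(\cdot | x_t^i) \in \arg\min_{\gamma_t^i(\cdot | x_t^i)} \mathbb{E}^{\gamma_t^i(\cdot | x_t^i)\, \tilde{\gamma}_t^{-i}} \left\{ R^i(X_t, U_t) + V_{t+1}^i(F(\pi_t, \tilde{\gamma}_t, U_t), X_{t+1}^i) \,\Big|\, \pi_t, x_t^i \right\}, \tag{9}$$

where the expectation in (9) is with respect to the random variables $(x_t^{-i}, u_t, x_{t+1}^i)$ through the measure $\pi_t^{-i}(x_t^{-i})\, \gamma_t^i(u_t^i | x_t^i)\, \tilde{\gamma}_t^{-i}(u_t^{-i} | x_t^{-i})\, Q^i(x_{t+1}^i | x_t^i, u_t)$ and $F(\pi_t, \gamma_t, u_t) := (\bar{F}(\pi_t^1, \gamma_t^1, u_t), \bar{F}(\pi_t^2, \gamma_t^2, u_t))$.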

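Also define

$$V_t^i(\pi_t, x_t^i) := \mathbb{E}^{\tilde{\gamma}_t^i(\cdot | x_t^i)\, \tilde{\gamma}_t^{-i}} \left\{ R^i(X_t, U_t) + V_{t+1}^i(F(\pi_t, \tilde{\gamma}_t, U_t), X_{t+1}^i) \,\Big|\, \pi_t, x_t^i \right\}. \tag{10}$$

From the equilibrium generating function $\theta$ defined through this backward recursion, the equilibrium strategy and belief profile $(\beta^*, \mu^*)$ are defined as follows.

2) Forward Recursion:

(a) Initialize at time $t = 1$,

$$\mu_1^*[\phi](x_1) := \prod_{i=1}^{2} Q^i(x_1^i). \tag{11}$$

(b) For $t = 1, 2, \ldots, T$, $i = 1, 2$, $\forall u_{1:t} \in \mathcal{H}_{t+1}^c$, $x_{1:t}^i \in (\mathcal{X}^i)^t$,

$$\beta_t^{*,i}(u_t^i | u_{1:t-1} x_{1:t}^i) := \theta_t^i[\mu_t^*[u_{1:t-1}]](u_t^i | x_t^i) \tag{12}$$

and

$$\mu_{t+1}^{*,i}[u_{1:t}] := \bar{F}(\mu_t^{*,i}[u_{1:t-1}], \theta_t^i[\mu_t^*[u_{1:t-1}]], u_t) \tag{13a}$$

$$\mu_{t+1}^*[u_{1:t}](x_t^1, x_t^2) := \prod_{i=1}^{2} \mu_{t+1}^{*,i}[u_{1:t}](x_t^i). \tag{13b}$$

The strategy and belief profile $(\beta^*, \mu^*)$ thus constructed form an SPBE of the game [13, Theorem 1].

IV. SPBE OF THE DYNAMIC LQG GAME

In this section, we apply the general methodology for finding SPBE described in the previous section to the specific dynamic LQG game model described in Section II. We show that players' strategies that are linear in their private types, in conjunction with Gaussian beliefs, form an SPBE of the game. We prove this result by constructing an equilibrium generating function $\theta$ using backward recursion such that for all Gaussian belief vectors $\pi_t$, $\tilde{\gamma}_t = \theta_t[\pi_t]$, $\tilde{\gamma}_t^i$ is of the form $\tilde{\gamma}_t^i(u_t^i | x_t^i) = \delta(u_t^i - \tilde{\mathbf{L}}_t^i x_t^i - \tilde{m}_t^i)$ and satisfies (9). Based on $\theta$, we construct an equilibrium belief and strategy profile.

The following lemma shows that common beliefs remain Gaussian under linear deterministic $\gamma_t$ of the form $\gamma_t^i(u_t^i | x_t^i) = \delta(u_t^i - \mathbf{L}_t^i x_t^i - m_t^i)$.

Lemma 1: If $\pi_t^i$ is a Gaussian distribution with mean $\hat{x}_t^i$ and covariance $\mathbf{\Sigma}_t^i$, and $\gamma_t^i(u_t^i | x_t^i) = \delta(u_t^i - \mathbf{L}_t^i x_t^i - m_t^i)$, then $\pi_{t+1}^i$, given by (6), is also a Gaussian distribution with mean $\hat{x}_{t+1}^i$ and covariance $\mathbf{\Sigma}_{t+1}^i$, where

$$\hat{x}_{t+1}^i = \mathbf{A}_t^i \hat{x}_t^i + \mathbf{B}_t^i u_t + \mathbf{A}_t^i \mathbf{G}_t^i (u_t^i - \mathbf{L}_t^i \hat{x}_t^i - m_t^i) \tag{14a}$$

$$\mathbf{\Sigma}_{t+1}^i = \mathbf{A}_t^i (\mathbf{I} - \mathbf{G}_t^i \mathbf{L}_t^i)\, \mathbf{\Sigma}_t^i\, (\mathbf{I} - \mathbf{G}_t^i \mathbf{L}_t^i)^\dagger \mathbf{A}_t^{i\dagger} + \mathbf{Q}^i, \tag{14b}$$

where

$$\mathbf{G}_t^i = \mathbf{\Sigma}_t^i \mathbf{L}_t^{i\dagger} (\mathbf{L}_t^i \mathbf{\Sigma}_t^i \mathbf{L}_t^{i\dagger})^{-1}. \tag{15}$$

Proof: See Appendix I.

Based on the previous lemma, we define $\phi_x^i$, $\phi_s^i$ as the update functions of the mean and covariance matrix, respectively, as defined in (14), such that

$$\hat{x}_{t+1}^i = \phi_x^i(\hat{x}_t^i, \mathbf{\Sigma}_t^i, \mathbf{L}_t^i, m_t^i, u_t) \tag{16a}$$

$$\mathbf{\Sigma}_{t+1}^i = \phi_s^i(\mathbf{\Sigma}_t^i, \mathbf{L}_t^i). \tag{16b}$$

We also say

$$\hat{x}_{t+1} = \phi_x(\hat{x}_t, \mathbf{\Sigma}_t, \mathbf{L}_t, m_t, u_t) \tag{17}$$

$$\mathbf{\Sigma}_{t+1} = \phi_s(\mathbf{\Sigma}_t, \mathbf{L}_t). \tag{18}$$

The previous lemma shows that with linear deterministic $\gamma_t^i$, the next update of the mean of the common belief, $\hat{x}_{t+1}^i$, is linear in $\hat{x}_t^i$ and the control action $u_t^i$. Furthermore, these updates are given by appropriate Kalman filter equations. It should be noted, however, that the covariance update in (14b) depends on the strategy through $\gamma_t^i$, and specifically through the matrix $\mathbf{L}_t^i$. This shows how belief updates depend on the strategies of the players, which leads to signaling, unlike the case in classical stochastic control and the model considered in [12].

Now we will construct an equilibrium generating function $\theta$ using the backward recursion in (8)-(10). The $\theta$ function generates linear deterministic partial functions $\gamma_t$, which, from Lemma 1 and the fact that initial beliefs (or priors) are Gaussian, generate only Gaussian belief vectors $(\pi_t^1, \pi_t^2)_{t \in \mathcal{T}}$ for the whole time horizon. These beliefs can be sufficiently described by their mean and covariance processes $(\hat{x}_t^1, \mathbf{\Sigma}_t^1)_{t \in \mathcal{T}}$ and $(\hat{x}_t^2, \mathbf{\Sigma}_t^2)_{t \in \mathcal{T}}$, which are updated using (14).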
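The belief update of Lemma 1 can be sketched numerically as follows. This is a minimal illustration in Python with NumPy, not the authors' code; the function name `belief_update`, its argument layout, and the numerical values are assumptions made only for this example. The gain uses a pseudoinverse, in line with the convention stated in the Notation subsection.

```python
import numpy as np

def belief_update(x_hat, Sigma, L, m, A, B, Q, u, u_i):
    """One step of (14)-(15) for the common belief on player i's type.

    x_hat, Sigma : mean and covariance of the current Gaussian belief
    L, m         : linear strategy, u_t^i = L x_t^i + m
    A, B, Q      : type dynamics matrices and process-noise covariance
    u            : stacked actions (u_t^1, u_t^2); u_i is player i's own action
    """
    # Gain (15); pinv mirrors the pseudoinverse convention of the paper.
    G = Sigma @ L.T @ np.linalg.pinv(L @ Sigma @ L.T)
    # Mean update (14a).
    x_hat_next = A @ x_hat + B @ u + A @ G @ (u_i - L @ x_hat - m)
    # Covariance update (14b); note that it depends on the strategy through L.
    I_GL = np.eye(Sigma.shape[0]) - G @ L
    Sigma_next = A @ I_GL @ Sigma @ I_GL.T @ A.T + Q
    return x_hat_next, Sigma_next

# Illustrative values (assumptions): n_i = 2, m_i = 2, uncontrolled types (B = 0).
A = 0.9 * np.eye(2)
B = np.zeros((2, 4))
Q = np.eye(2)
x_hat, Sigma = np.zeros(2), 2.0 * np.eye(2)
u = np.array([0.3, -0.1, 0.5, 0.2])    # (u^1, u^2), each of dimension 2
L_full, m = np.eye(2), np.zeros(2)     # a full-rank linear strategy for player 1
x_next, Sigma_next = belief_update(x_hat, Sigma, L_full, m, A, B, Q, u, u[:2])
```

With a full-rank square $\mathbf{L}$, the action reveals the type exactly, so the posterior uncertainty collapses and the next covariance equals $\mathbf{Q}$; a rank-deficient $\mathbf{L}$ leaves residual uncertainty in the unrevealed directions.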

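For $t = T+1, T, \ldots, 1$, we define the vectors

$$e_t^i := \begin{bmatrix} x_t^i \\ \hat{x}_t^1 \\ \hat{x}_t^2 \end{bmatrix}, \qquad z_t^i := \begin{bmatrix} u_t^i \\ x_t^i \\ \hat{x}_t^1 \\ \hat{x}_t^2 \end{bmatrix}, \qquad y_t^i := \begin{bmatrix} u_t^1 \\ u_t^2 \\ x_t^1 \\ x_t^2 \\ x_{t+1}^i \\ \hat{x}_{t+1}^1 \\ \hat{x}_{t+1}^2 \end{bmatrix}. \tag{19}$$

Theorem 1: The backward recursion (8)-(10) admits (under certain conditions, stated in the proof) a solution of the form $\theta_t[\pi_t] = \theta_t[\hat{x}_t, \mathbf{\Sigma}_t] = \tilde{\gamma}_t$, where $\tilde{\gamma}_t^i(u_t^i | x_t^i) = \delta(u_t^i - \tilde{\mathbf{L}}_t^i x_t^i - \tilde{m}_t^i)$ and $\tilde{\mathbf{L}}_t^i$, $\tilde{m}_t^i$ are appropriately defined matrices and vectors, respectively. Furthermore, the value function reduces to

$$V_t^i(\pi_t, x_t^i) = V_t^i(\hat{x}_t, \mathbf{\Sigma}_t, x_t^i) \tag{20a}$$

$$= quad(\mathbf{V}_t^i(\mathbf{\Sigma}_t); e_t^i) + \rho_t^i(\mathbf{\Sigma}_t), \tag{20b}$$

with $\mathbf{V}_t^i(\mathbf{\Sigma}_t)$ and $\rho_t^i(\mathbf{\Sigma}_t)$ as appropriately defined matrix and scalar quantities, respectively.

Proof: We construct such a $\theta$ function through the backward recursive construction and prove the properties of the corresponding value functions inductively.

(a) For $i = 1, 2$, $\forall \mathbf{\Sigma}_{T+1}$, let $\mathbf{V}_{T+1}^i(\mathbf{\Sigma}_{T+1}) := \mathbf{0}$, $\rho_{T+1}^i(\mathbf{\Sigma}_{T+1}) := 0$. Then $\forall \hat{x}_{T+1}^1, \hat{x}_{T+1}^2, \mathbf{\Sigma}_{T+1}^1, \mathbf{\Sigma}_{T+1}^2, x_{T+1}^i$, and for $\pi_t = (\pi_t^1, \pi_t^2)$, where $\pi_t^i$ is $N(\hat{x}_t^i, \mathbf{\Sigma}_t^i)$,

$$V_{T+1}^i(\pi_{T+1}, x_{T+1}^i) := 0 \tag{21a}$$

$$= V_{T+1}^i(\hat{x}_{T+1}, \mathbf{\Sigma}_{T+1}, x_{T+1}^i) \tag{21b}$$

$$= quad(\mathbf{V}_{T+1}^i(\mathbf{\Sigma}_{T+1}); e_{T+1}^i) + \rho_{T+1}^i(\mathbf{\Sigma}_{T+1}). \tag{21c}$$

(b) For all $t \in \{T, T-1, \ldots, 1\}$, $i = 1, 2$, suppose $V_{t+1}^i(\pi_{t+1}, x_{t+1}^i) = quad(\mathbf{V}_{t+1}^i(\mathbf{\Sigma}_{t+1}); e_{t+1}^i) + \rho_{t+1}^i(\mathbf{\Sigma}_{t+1})$ (from the induction hypothesis), where $\mathbf{V}_{t+1}^i$ is a symmetric matrix defined recursively. Define $\bar{\mathbf{V}}_t^i$ as

$$\bar{\mathbf{V}}_t^i(\mathbf{\Sigma}_t, \mathbf{L}_t) := \begin{bmatrix} \mathbf{T}^i & \mathbf{S}^i & \mathbf{0} \\ \mathbf{S}^{i\dagger} & \mathbf{P}^i & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{V}_{t+1}^i(\phi_s(\mathbf{\Sigma}_t, \mathbf{L}_t)) \end{bmatrix}. \tag{22}$$

Since $\mathbf{T}^i$, $\mathbf{P}^i$ are symmetric by assumption, $\bar{\mathbf{V}}_t^i$ is also symmetric.

For ease of exposition, we will assume $i = 1$; for player 2, a similar argument holds. At time $t$, the quantity that is minimized for player $i = 1$ in (9) can be written as

$$\mathbb{E}^{\gamma_t^1(\cdot | x_t^1)} \left[ \mathbb{E}^{\tilde{\gamma}_t^2} \left[ R^1(X_t, U_t) + V_{t+1}^1(F(\pi_t, \tilde{\gamma}_t, U_t), X_{t+1}^1) \,\Big|\, \pi_t, x_t^1, u_t^1 \right] \Big|\, \pi_t, x_t^1 \right]. \tag{23}$$

The inner expectation can be written as follows, where $\tilde{\gamma}_t^2(u_t^2 | x_t^2) = \delta(u_t^2 - \tilde{\mathbf{L}}_t^2 x_t^2 - \tilde{m}_t^2)$,

$$\mathbb{E}^{\tilde{\gamma}_t^2} \left[ quad\left( \begin{bmatrix} \mathbf{T}^1 & \mathbf{S}^1 \\ \mathbf{S}^{1\dagger} & \mathbf{P}^1 \end{bmatrix}; \begin{bmatrix} u_t \\ x_t \end{bmatrix} \right) + quad\left( \mathbf{V}_{t+1}^1(\phi_s(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t)); e_{t+1}^1 \right) + \rho_{t+1}^1(\phi_s(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t)) \,\Big|\, \pi_t, x_t^1, u_t^1 \right] \tag{24a}$$

$$= \mathbb{E}^{\tilde{\gamma}_t^2} \left[ quad\left( \bar{\mathbf{V}}_t^1(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t); y_t^1 \right) + \rho_{t+1}^1(\phi_s(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t)) \,\Big|\, \pi_t, x_t^1, u_t^1 \right] \tag{24b}$$

$$= quad\left( \bar{\mathbf{V}}_t^1(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t); \tilde{\mathbf{D}}_t^1 z_t^1 + \tilde{\mathbf{C}}_t^1 \begin{bmatrix} \tilde{m}_t^1 \\ \tilde{m}_t^2 \end{bmatrix} \right) + \rho_t^1(\mathbf{\Sigma}_t), \tag{24c}$$

where $\bar{\mathbf{V}}_t^i$ is defined in (22) and the function $\phi_s$ is defined in (18); $y_t^i$, $z_t^i$ are defined in (19); $\rho_t^i$ is given by

$$\rho_t^i(\mathbf{\Sigma}_t) = tr\left( \mathbf{\Sigma}_t^{-i}\, quad\left( \bar{\mathbf{V}}_t^i(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t); \tilde{\mathbf{J}}_t^i \right) \right) + tr\left( \mathbf{Q}^i \mathbf{V}_{11,t+1}^i(\phi_s(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t)) \right) + \rho_{t+1}^i(\phi_s(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t)), \tag{25}$$

where $\mathbf{V}_{11,t+1}^i$ is the matrix corresponding to $x_{t+1}^i$ in $\mathbf{V}_{t+1}^i$, i.e. in the first row and first column of the matrix $\mathbf{V}_{t+1}^i$; and the matrices $\mathbf{D}_t^i$, $\mathbf{C}_t^i$, $\mathbf{J}_t^i$ are as follows,

$$\mathbf{D}_t^1 := \begin{bmatrix} \mathbf{I} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{L}_t^2 \\ \mathbf{0} & \mathbf{I} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{I} \\ \mathbf{B}_{1,t}^1 & \mathbf{A}_t^1 & \mathbf{0} & \mathbf{B}_{2,t}^1 \mathbf{L}_t^2 \\ \mathbf{A}_t^1 \mathbf{G}_t^1 + \mathbf{B}_{1,t}^1 & \mathbf{0} & \mathbf{A}_t^1 (\mathbf{I} - \mathbf{G}_t^1 \mathbf{L}_t^1) & \mathbf{B}_{2,t}^1 \mathbf{L}_t^2 \\ \mathbf{B}_{1,t}^2 & \mathbf{0} & \mathbf{0} & \mathbf{A}_t^2 + \mathbf{B}_{2,t}^2 \mathbf{L}_t^2 \end{bmatrix} \tag{26a}$$

$$\mathbf{D}_t^2 := \begin{bmatrix} \mathbf{0} & \mathbf{0} & \mathbf{L}_t^1 & \mathbf{0} \\ \mathbf{I} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} & \mathbf{0} & \mathbf{0} \\ \mathbf{B}_{2,t}^2 & \mathbf{A}_t^2 & \mathbf{B}_{1,t}^2 \mathbf{L}_t^1 & \mathbf{0} \\ \mathbf{B}_{2,t}^1 & \mathbf{0} & \mathbf{A}_t^1 + \mathbf{B}_{1,t}^1 \mathbf{L}_t^1 & \mathbf{0} \\ \mathbf{A}_t^2 \mathbf{G}_t^2 + \mathbf{B}_{2,t}^2 & \mathbf{0} & \mathbf{B}_{1,t}^2 \mathbf{L}_t^1 & \mathbf{A}_t^2 (\mathbf{I} - \mathbf{G}_t^2 \mathbf{L}_t^2) \end{bmatrix} \tag{26b}$$

$$\mathbf{C}_t^1 := \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \\ \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{B}_{2,t}^1 \\ -\mathbf{A}_t^1 \mathbf{G}_t^1 & \mathbf{B}_{2,t}^1 \\ \mathbf{0} & \mathbf{B}_{2,t}^2 \end{bmatrix}, \qquad \mathbf{C}_t^2 := \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \\ \mathbf{B}_{1,t}^2 & \mathbf{0} \\ \mathbf{B}_{1,t}^1 & \mathbf{0} \\ \mathbf{B}_{1,t}^2 & -\mathbf{A}_t^2 \mathbf{G}_t^2 \end{bmatrix} \tag{27}$$

$$\mathbf{J}_t^1 := \begin{bmatrix} \mathbf{0} \\ \mathbf{L}_t^2 \\ \mathbf{0} \\ \mathbf{I} \\ \mathbf{B}_{2,t}^1 \mathbf{L}_t^2 \\ \mathbf{B}_{2,t}^1 \mathbf{L}_t^2 \\ (\mathbf{B}_{2,t}^2 + \mathbf{A}_t^2 \mathbf{G}_t^2) \mathbf{L}_t^2 \end{bmatrix}, \qquad \mathbf{J}_t^2 := \begin{bmatrix} \mathbf{L}_t^1 \\ \mathbf{0} \\ \mathbf{I} \\ \mathbf{0} \\ \mathbf{B}_{1,t}^2 \mathbf{L}_t^1 \\ (\mathbf{B}_{1,t}^1 + \mathbf{A}_t^1 \mathbf{G}_t^1) \mathbf{L}_t^1 \\ \mathbf{B}_{1,t}^2 \mathbf{L}_t^1 \end{bmatrix} \tag{28}$$

where $\mathbf{B}_t^i =: [\mathbf{B}_{1,t}^i \ \ \mathbf{B}_{2,t}^i]$, and $\mathbf{B}_{1,t}^i$, $\mathbf{B}_{2,t}^i$ are the parts of the matrix $\mathbf{B}_t^i$ that correspond to $u_t^1$, $u_t^2$, respectively. Matrices $\tilde{\mathbf{D}}_t^i$, $\tilde{\mathbf{C}}_t^i$, $\tilde{\mathbf{J}}_t^i$ are obtained by substituting $\tilde{\mathbf{L}}_t^i$, $\tilde{\mathbf{G}}_t^i$ in place of $\mathbf{L}_t^i$, $\mathbf{G}_t^i$ in the definitions (26)-(28), and $\tilde{\mathbf{G}}_t^i$ is the matrix obtained by substituting $\tilde{\mathbf{L}}_t^i$ in place of $\mathbf{L}_t^i$ in (15). Let $\tilde{\mathbf{D}}_t^1 =: [\tilde{\mathbf{D}}_t^{u1} \ \ \tilde{\mathbf{D}}_t^{e1}]$, where $\tilde{\mathbf{D}}_t^{u1}$ is the first column matrix of $\tilde{\mathbf{D}}_t^1$, corresponding to $u_t^1$, and $\tilde{\mathbf{D}}_t^{e1}$ is the matrix composed of the remaining three column matrices of $\tilde{\mathbf{D}}_t^1$, corresponding to $e_t^1$.

The expression in (24c) is averaged with respect to $u_t^1$ using the measure $\gamma_t^1(\cdot | x_t^1)$ and minimized in (9) over $\gamma_t^1(\cdot | x_t^1)$. This minimization can be performed component-wise, leading to a deterministic policy $\tilde{\gamma}_t^1(u_t^1 | x_t^1) = \delta(u_t^1 - \tilde{\mathbf{L}}_t^1 x_t^1 - \tilde{m}_t^1) = \delta(u_t^1 - u_t^{1*})$, assuming that the matrix $\tilde{\mathbf{D}}_t^{u1\dagger} \bar{\mathbf{V}}_t^1 \tilde{\mathbf{D}}_t^{u1}$ is positive definite. This condition is true if the instantaneous cost matrix $\mathbf{R}^i = \begin{bmatrix} \mathbf{T}^i & \mathbf{S}^i \\ \mathbf{S}^{i\dagger} & \mathbf{P}^i \end{bmatrix}$ is positive definite, and can be proved inductively in the proof by showing that $\bar{\mathbf{V}}_t^i$ and $\mathbf{V}_t^i$ are positive definite. In that case, the unique minimizer $u_t^{1*} = \tilde{\mathbf{L}}_t^1 x_t^1 + \tilde{m}_t^1$ can be found by differentiating (24c) with respect to $u_t^1$ and equating it to 0, resulting in the equations

$$0 = 2 \begin{bmatrix} \mathbf{I} & \mathbf{0} & \mathbf{0} & \mathbf{0} \end{bmatrix} \tilde{\mathbf{D}}_t^{1\dagger}\, \bar{\mathbf{V}}_t^1(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t) \left( \tilde{\mathbf{D}}_t^1 z_t^1 + \tilde{\mathbf{C}}_t^1 \tilde{m}_t \right) \tag{29a}$$

$$0 = \tilde{\mathbf{D}}_t^{u1\dagger}\, \bar{\mathbf{V}}_t^1(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t) \left( \tilde{\mathbf{D}}_t^{u1} u_t^{1*} + \tilde{\mathbf{D}}_t^{e1} e_t^1 + \tilde{\mathbf{C}}_t^1 \tilde{m}_t \right) \tag{29b}$$

$$0 = \tilde{\mathbf{D}}_t^{u1\dagger}\, \bar{\mathbf{V}}_t^1(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t) \left( \tilde{\mathbf{D}}_t^{u1} (\tilde{\mathbf{L}}_t^1 x_t^1 + \tilde{m}_t^1) + [\tilde{\mathbf{D}}_t^{e1}]_1 x_t^1 + [\tilde{\mathbf{D}}_t^{e1}]_{23} \hat{x}_t + \tilde{\mathbf{C}}_t^1 \tilde{m}_t \right), \tag{29c}$$

where $[\tilde{\mathbf{D}}_t^{ei}]_1$ is the first matrix column of $\tilde{\mathbf{D}}_t^{ei}$ and $[\tilde{\mathbf{D}}_t^{ei}]_{23}$ is the matrix composed of the second and third column matrices of $\tilde{\mathbf{D}}_t^{ei}$. Thus (29c) is equivalent to (9), and with a similar analysis for player 2, it implies that $\tilde{\mathbf{L}}_t^i$ is a solution of the following algebraic fixed point equation,

$$\left( \tilde{\mathbf{D}}_t^{ui\dagger}\, \bar{\mathbf{V}}_t^i(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t)\, \tilde{\mathbf{D}}_t^{ui} \right) \tilde{\mathbf{L}}_t^i = -\tilde{\mathbf{D}}_t^{ui\dagger}\, \bar{\mathbf{V}}_t^i(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t)\, [\tilde{\mathbf{D}}_t^{ei}]_1. \tag{30a}$$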

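For player 1, it reduces to

$$\left( \mathbf{T}_{11}^1 + \begin{bmatrix} \mathbf{B}_{1,t}^1 \\ \mathbf{A}_t^1 \tilde{\mathbf{G}}_t^1 + \mathbf{B}_{1,t}^1 \\ \mathbf{B}_{1,t}^2 \end{bmatrix}^\dagger \mathbf{V}_{t+1}^1(\phi_s(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t)) \begin{bmatrix} \mathbf{B}_{1,t}^1 \\ \mathbf{A}_t^1 \tilde{\mathbf{G}}_t^1 + \mathbf{B}_{1,t}^1 \\ \mathbf{B}_{1,t}^2 \end{bmatrix} \right) \tilde{\mathbf{L}}_t^1 = -\left( \mathbf{S}_{11}^1 + \begin{bmatrix} \mathbf{B}_{1,t}^1 \\ \mathbf{A}_t^1 \tilde{\mathbf{G}}_t^1 + \mathbf{B}_{1,t}^1 \\ \mathbf{B}_{1,t}^2 \end{bmatrix}^\dagger \mathbf{V}_{t+1}^1(\phi_s(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t)) \begin{bmatrix} \mathbf{A}_t^1 \\ \mathbf{0} \\ \mathbf{0} \end{bmatrix} \right), \tag{30b}$$

and a similar expression holds for player 2.

In addition, $\tilde{m}_t$ can be found from (29c) as

$$\begin{bmatrix} \tilde{\mathbf{D}}_t^{u1\dagger} \bar{\mathbf{V}}_t^1 \tilde{\mathbf{D}}_t^{u1} & \mathbf{0} \\ \mathbf{0} & \tilde{\mathbf{D}}_t^{u2\dagger} \bar{\mathbf{V}}_t^2 \tilde{\mathbf{D}}_t^{u2} \end{bmatrix} \tilde{m}_t = -\begin{bmatrix} \tilde{\mathbf{D}}_t^{u1\dagger} \bar{\mathbf{V}}_t^1 [\tilde{\mathbf{D}}_t^{e1}]_{23} \\ \tilde{\mathbf{D}}_t^{u2\dagger} \bar{\mathbf{V}}_t^2 [\tilde{\mathbf{D}}_t^{e2}]_{23} \end{bmatrix} \hat{x}_t - \begin{bmatrix} \tilde{\mathbf{D}}_t^{u1\dagger} \bar{\mathbf{V}}_t^1 \tilde{\mathbf{C}}_t^1 \\ \tilde{\mathbf{D}}_t^{u2\dagger} \bar{\mathbf{V}}_t^2 \tilde{\mathbf{C}}_t^2 \end{bmatrix} \tilde{m}_t \tag{31a}$$

$$\tilde{m}_t = -\left( \begin{bmatrix} \tilde{\mathbf{D}}_t^{u1\dagger} \bar{\mathbf{V}}_t^1 \tilde{\mathbf{D}}_t^{u1} & \mathbf{0} \\ \mathbf{0} & \tilde{\mathbf{D}}_t^{u2\dagger} \bar{\mathbf{V}}_t^2 \tilde{\mathbf{D}}_t^{u2} \end{bmatrix} + \begin{bmatrix} \tilde{\mathbf{D}}_t^{u1\dagger} \bar{\mathbf{V}}_t^1 \tilde{\mathbf{C}}_t^1 \\ \tilde{\mathbf{D}}_t^{u2\dagger} \bar{\mathbf{V}}_t^2 \tilde{\mathbf{C}}_t^2 \end{bmatrix} \right)^{-1} \begin{bmatrix} \tilde{\mathbf{D}}_t^{u1\dagger} \bar{\mathbf{V}}_t^1 [\tilde{\mathbf{D}}_t^{e1}]_{23} \\ \tilde{\mathbf{D}}_t^{u2\dagger} \bar{\mathbf{V}}_t^2 [\tilde{\mathbf{D}}_t^{e2}]_{23} \end{bmatrix} \hat{x}_t \tag{31b}$$

$$=: \tilde{\mathbf{M}}_t \hat{x}_t =: \begin{bmatrix} \tilde{\mathbf{M}}_t^1 \\ \tilde{\mathbf{M}}_t^2 \end{bmatrix} \hat{x}_t. \tag{31c}$$

Finally, the resulting cost for player $i$ is

$$V_t^i(\pi_t, x_t^i) = V_t^i(\hat{x}_t, \mathbf{\Sigma}_t, x_t^i) \tag{32a}$$

$$:= quad\left( \bar{\mathbf{V}}_t^i(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t); \begin{bmatrix} \tilde{\mathbf{D}}_t^{ui} & \tilde{\mathbf{D}}_t^{ei} \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{L}}_t^i x_t^i + \tilde{\mathbf{M}}_t^i \hat{x}_t \\ e_t^i \end{bmatrix} + \tilde{\mathbf{C}}_t^i \tilde{\mathbf{M}}_t \hat{x}_t \right) + \rho_t^i(\mathbf{\Sigma}_t) \tag{32b}$$

$$= quad\left( \bar{\mathbf{V}}_t^i(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t); \tilde{\mathbf{D}}_t^{ui} (\tilde{\mathbf{L}}_t^i x_t^i + \tilde{\mathbf{M}}_t^i \hat{x}_t) + \tilde{\mathbf{D}}_t^{ei} e_t^i + \tilde{\mathbf{C}}_t^i \tilde{\mathbf{M}}_t \hat{x}_t \right) + \rho_t^i(\mathbf{\Sigma}_t) \tag{32c}$$

$$= quad\left( \bar{\mathbf{V}}_t^i(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t); \left( \begin{bmatrix} \tilde{\mathbf{D}}_t^{ui} \tilde{\mathbf{L}}_t^i & \tilde{\mathbf{D}}_t^{ui} \tilde{\mathbf{M}}_t^i + \tilde{\mathbf{C}}_t^i \tilde{\mathbf{M}}_t \end{bmatrix} + \tilde{\mathbf{D}}_t^{ei} \right) e_t^i \right) + \rho_t^i(\mathbf{\Sigma}_t) \tag{32d}$$

$$= quad\left( \bar{\mathbf{V}}_t^i(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t); \tilde{\mathbf{F}}_t^i e_t^i \right) + \rho_t^i(\mathbf{\Sigma}_t) \tag{32e}$$

$$= quad\left( \tilde{\mathbf{F}}_t^{i\dagger}\, \bar{\mathbf{V}}_t^i(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t)\, \tilde{\mathbf{F}}_t^i; e_t^i \right) + \rho_t^i(\mathbf{\Sigma}_t) \tag{32f}$$

$$= quad\left( \mathbf{V}_t^i(\mathbf{\Sigma}_t); e_t^i \right) + \rho_t^i(\mathbf{\Sigma}_t), \tag{32g}$$

where

$$\tilde{\mathbf{F}}_t^i := \begin{bmatrix} \tilde{\mathbf{D}}_t^{ui} \tilde{\mathbf{L}}_t^i & \tilde{\mathbf{D}}_t^{ui} \tilde{\mathbf{M}}_t^i + \tilde{\mathbf{C}}_t^i \tilde{\mathbf{M}}_t \end{bmatrix} + \tilde{\mathbf{D}}_t^{ei} \tag{33a}$$

$$\mathbf{V}_t^i(\mathbf{\Sigma}_t) := \tilde{\mathbf{F}}_t^{i\dagger}\, \bar{\mathbf{V}}_t^i(\mathbf{\Sigma}_t, \tilde{\mathbf{L}}_t)\, \tilde{\mathbf{F}}_t^i. \tag{33b}$$

Since $\bar{\mathbf{V}}_t^i$ is symmetric, so is $\mathbf{V}_t^i$. Thus the induction step is completed.

Taking motivation from the previous theorem and with slight abuse of notation, we define

$$\tilde{\gamma}_t = \theta_t[\pi_t] = \theta_t[\hat{x}_t, \mathbf{\Sigma}_t], \tag{34}$$

and since $\tilde{\gamma}_t^i(u_t^i | x_t^i) = \delta(u_t^i - \tilde{\mathbf{L}}_t^i x_t^i - \tilde{m}_t^i)$, we define a reduced mapping $(\theta^L, \theta^m)$ as

$$\theta_t^{Li}[\hat{x}_t, \mathbf{\Sigma}_t] = \theta_t^{Li}[\mathbf{\Sigma}_t] := \tilde{\mathbf{L}}_t^i \quad \text{and} \quad \theta_t^{mi}[\hat{x}_t, \mathbf{\Sigma}_t] := \tilde{m}_t^i, \tag{35}$$

where $\tilde{\mathbf{L}}_t^i$ does not depend on $\hat{x}_t$, and $\tilde{m}_t^i$ is linear in $\hat{x}_t$ and is of the form $\tilde{m}_t^i = \tilde{\mathbf{M}}_t^i \hat{x}_t$.

Now we construct the equilibrium strategy and belief profile $(\beta^*, \mu^*)$ through the forward recursion in (11)-(13b), using the equilibrium generating function $\theta \equiv (\theta^L, \theta^m)$.

(a) Let

$$\mu_1^{*,i}[\phi](x_1^i) = N(0, \mathbf{\Sigma}_1^i). \tag{36}$$

(b) For $t = 1, 2, \ldots, T-1$, $\forall u_{1:t} \in \mathcal{H}_{t+1}^c$, $x_{1:t}^i \in (\mathcal{X}^i)^t$, if $\mu_t^{*,i}[u_{1:t-1}] = N(\hat{x}_t^i, \mathbf{\Sigma}_t^i)$, let $\tilde{\mathbf{L}}_t^i = \theta_t^{Li}[\mathbf{\Sigma}_t]$, $\tilde{m}_t^i = \theta_t^{mi}[\hat{x}_t, \mathbf{\Sigma}_t] = \tilde{\mathbf{M}}_t^i \hat{x}_t$. Then

$$\beta_t^{*,i}(u_t^i | u_{1:t-1} x_{1:t}^i) := \delta(u_t^i - \tilde{\mathbf{L}}_t^i x_t^i - \tilde{\mathbf{M}}_t^i \hat{x}_t) \tag{37a}$$

$$\mu_{t+1}^{*,i}[u_{1:t}] := N(\hat{x}_{t+1}^i, \mathbf{\Sigma}_{t+1}^i) \tag{37b}$$

$$\mu_{t+1}^*[u_{1:t}](x_t^1, x_t^2) := \prod_{i=1}^{2} \mu_{t+1}^{*,i}[u_{1:t}](x_t^i), \tag{37c}$$

where $\hat{x}_{t+1}^i = \phi_x^i(\hat{x}_t^i, \mathbf{\Sigma}_t^i, \tilde{\mathbf{L}}_t^i, \tilde{m}_t^i, u_t)$ and $\mathbf{\Sigma}_{t+1}^i = \phi_s^i(\mathbf{\Sigma}_t^i, \tilde{\mathbf{L}}_t^i)$.

Theorem 2: $(\beta^*, \mu^*)$ constructed above is a PBE of the dynamic LQG game.

Proof: The strategy and belief profile $(\beta^*, \mu^*)$ is constructed using the forward recursion steps (11)-(13b) on the equilibrium generating function $\theta$, which is defined through the backward recursion steps (8)-(10) implemented in the proof of Theorem 1. Thus the result is directly implied by Theorem 1 in [13].

V. DISCUSSION

A. Existence

In the proof of Theorem 1, $\tilde{\mathbf{D}}_t^{u1\dagger} \bar{\mathbf{V}}_t^1 \tilde{\mathbf{D}}_t^{u1}$ is assumed to be positive definite. This can be achieved if $\mathbf{R}^i$ is positive definite, through which it can be shown inductively in the proof of Theorem 1 that the matrices $\bar{\mathbf{V}}_t^1$, $\mathbf{V}_t^1$ are also positive definite.

Constructing the equilibrium generating function $\theta$ involves solving the algebraic fixed point equation in (30) for $\tilde{\mathbf{L}}_t$ for all $\mathbf{\Sigma}_t$. In general, existence is not guaranteed, as is the case for existence of $\tilde{\gamma}_t$ in (9) for general dynamic games with asymmetric information. At this point, we do not have a general proof for existence. However, in the following lemma, we provide sufficient conditions on the matrices $\mathbf{A}_t^i, \mathbf{B}_t^i, \mathbf{T}^i, \mathbf{S}^i, \mathbf{P}^i, \mathbf{V}_{t+1}^i$, and for the case $m_i = 1$, for a solution to exist.

Lemma 2: For $m_1 = m_2 = 1$, there exists a solution to (30) if and only if for $i = 1, 2$, $\exists\, l^i \in \mathbb{R}^{n_i}$ such that $l^{i\dagger} \Delta^i(l^1, l^2)\, l^i \geq 0$, or sufficiently $\Delta^i(l^1, l^2) + \Delta^{i\dagger}(l^1, l^2)$ is positive definite, where $\Delta^i$, $i = 1, 2$, are defined in Appendix II.

Proof: See Appendix II.

B. Steady state

In Section III, we presented the backward/forward methodology to find SPBE for finite time-horizon dynamic games, and specialized that methodology in this paper, in Section IV, to find SPBE for dynamic LQG games with asymmetric information, where equilibrium strategies are linear in players' types. It requires further investigation to find the conditions under which the backward-forward methodology could be extended to infinite time-horizon dynamic games, with either expected discounted or time-average cost criteria. Such a methodology for the infinite time horizon could be useful to characterize steady state behavior of the games.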
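As a small illustration of the covariance recursion (14b) on which any such steady state analysis would rest, the sketch below iterates $\mathbf{\Sigma} \leftarrow \phi_s(\mathbf{\Sigma}, \mathbf{L})$ for one player until it stops changing. It is only a sketch under strong assumptions: the strategy matrix $\mathbf{L}$ is held fixed by hand (in the game it would itself have to solve the fixed point equation (30)), the helper names `phi_s` and `steady_state_covariance` are hypothetical, and the parameter values are chosen for illustration.

```python
import numpy as np

def phi_s(Sigma, L, A, Q):
    """Covariance update (14b) with the gain (15), using a pseudoinverse."""
    G = Sigma @ L.T @ np.linalg.pinv(L @ Sigma @ L.T)
    I_GL = np.eye(Sigma.shape[0]) - G @ L
    return A @ I_GL @ Sigma @ I_GL.T @ A.T + Q

def steady_state_covariance(L, A, Q, Sigma0, iters=500, tol=1e-12):
    """Iterate Sigma <- phi_s(Sigma, L) for a FIXED L (an assumption)."""
    Sigma = Sigma0
    for _ in range(iters):
        Sigma_next = phi_s(Sigma, L, A, Q)
        if np.max(np.abs(Sigma_next - Sigma)) < tol:
            return Sigma_next
        Sigma = Sigma_next
    return Sigma

# Illustrative parameters (assumptions): two-dimensional type, stable dynamics.
A = 0.9 * np.eye(2)
Q = np.eye(2)
L_rank1 = np.array([[1.0, 0.0]])   # scalar action revealing only the first component
L_full = np.eye(2)                 # action revealing the whole type

Sigma_rank1 = steady_state_covariance(L_rank1, A, Q, np.eye(2))
Sigma_full = steady_state_covariance(L_full, A, Q, np.eye(2))
```

In this example the rank-one strategy leaves the second component of the type unrevealed, so its variance accumulates to $1/(1 - 0.81)$, while the revealed component settles at the process-noise variance; the full-rank strategy yields $\mathbf{\Sigma} = \mathbf{Q}$.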
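Specifically, for time homogeneous dynamic LQG games with asymmetric information (where the matrices $\mathbf{A}^i$, $\mathbf{B}^i$ are time independent), under the required technical conditions for which such a methodology is applicable, the steady state behavior can be characterized by the fixed point equation in the matrices $(\mathbf{L}^i, \mathbf{\Sigma}^i, \mathbf{V}^i)_{i=1,2}$ through (18), (30b) and (33), where the time index is dropped in these equations, i.e. for $i = 1, 2$,

$$1.\quad \mathbf{\Sigma} = \phi_s(\mathbf{\Sigma}, \mathbf{L}) \tag{38}$$

$$2.\quad \left( \mathbf{D}^{ui\dagger}\, \bar{\mathbf{V}}^i\, \mathbf{D}^{ui} \right) \mathbf{L}^i = -\mathbf{D}^{ui\dagger}\, \bar{\mathbf{V}}^i\, [\mathbf{D}^{ei}]_1 \tag{39}$$

$$3.\quad \mathbf{V}^i = \mathbf{F}^{i\dagger}\, \bar{\mathbf{V}}^i\, \mathbf{F}^i, \tag{40}$$

where

$$\bar{\mathbf{V}}^i = \begin{bmatrix} \mathbf{T}^i & \mathbf{S}^i & \mathbf{0} \\ \mathbf{S}^{i\dagger} & \mathbf{P}^i & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{V}^i \end{bmatrix}.$$

Observe that in the above equations the matrices $\bar{\mathbf{V}}^i$ and $\mathbf{V}^i$ do not appear as functions of $\mathbf{\Sigma}$, as in the finite horizon case described in (22), (33b), in the proof of Theorem 1. The reason for that is as follows. The steady state behavior for a general dynamic game with asymmetric information and independent types, if it exists, would involve a fixed point equation in the value functions $(V^i(\cdot))_i$. However, for the LQG case, it reduces to a fixed point equation in $(\mathbf{V}^i(\mathbf{\Sigma}))_i$, i.e. value functions evaluated at a specific value of $\mathbf{\Sigma}$. This is so because the functions $\mathbf{V}^i$ are evaluated at $\mathbf{\Sigma}$ and $\phi_s(\mathbf{\Sigma}, \mathbf{L})$, which at steady state are exactly the same (see (38)). As a result, the fixed point equation reduces to the three algebraic equations shown above, with the matrices $\mathbf{\Sigma}$, $\mathbf{L}$, $\bar{\mathbf{V}}$ and $\mathbf{V}$ as variables, which represents an enormous reduction in complexity.

1) Numerical examples: In this section, we present numerically found solutions for steady state, assuming that our methodology extends to the infinite horizon problem for the model considered. We assume $\mathbf{B}^i = \mathbf{0}$, which implies that the state process $(X_t^i)_{t \in \mathcal{T}}$ is uncontrolled.

1. For $i = 1, 2$, $m_i = 1$, $n_i = 2$, $\mathbf{A}^i = 0.9\mathbf{I}$, $\mathbf{B}^i = \mathbf{0}$, $\mathbf{Q}^i = \mathbf{I}$, and cost matrices $\mathbf{T}^1, \mathbf{T}^2, \mathbf{P}^1, \mathbf{P}^2, \mathbf{S}^1, \mathbf{S}^2$ as specified in (41), there exists a symmetric solution $(\mathbf{L}^i, \mathbf{\Sigma}^i)$, $i = 1, 2$, given in (42). [Equations (41) and (42): the matrix entries are illegible in the source.]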

2. For $i = 1, 2$, $m_i = 2$, $n_i = 2$, $\mathbf{A}^2 = 0.9\mathbf{I}$, and $\mathbf{B}^i, \mathbf{T}^i, \mathbf{P}^i, \mathbf{S}^i$ used as before with appropriate dimensions, there exists a solution $(\mathbf{L}^1, \mathbf{L}^2, \mathbf{\Sigma}^1, \mathbf{\Sigma}^2)$ with $\mathbf{\Sigma}^1 = \mathbf{I}$, given in (43). [Equation (43) and the value of $\mathbf{A}^1$: the entries of $\mathbf{A}^1$, $\mathbf{L}^1$, $\mathbf{L}^2$ and $\mathbf{\Sigma}^2$ are illegible in the source.]

It is interesting to note that for player 1, where $\mathbf{A}^1$ does not weigh the two components equally, the corresponding $\mathbf{L}^1$ is full rank, and thus reveals her complete private information. Whereas for player 2, where $\mathbf{A}^2$ has equal weight components, the corresponding $\mathbf{L}^2$ is rank deficient, which implies that at equilibrium player 2 does not completely reveal her private information. Also, it is easy to check from (14b) that with full rank $\mathbf{L}^i$ matrices, the steady state covariance is $\mathbf{\Sigma}^i = \mathbf{Q}^i$.

VI. CONCLUSION

In this paper, we study a two-player dynamic LQG game with asymmetric information and perfect recall, where players' private types evolve as independent controlled Markov processes. We show that under certain conditions, there exist strategies that are linear in players' private types which, together with Gaussian beliefs, form a PBE of the game. We show this by specializing the general methodology developed in [13] to our model. Specifically, we prove that (a) the common beliefs remain Gaussian under strategies that are linear in players' types, where we find update equations for the corresponding mean and covariance processes; (b) using the backward recursive approach of [13], we compute an equilibrium generating function $\theta$ by solving a fixed point equation in linear deterministic partial strategies $\gamma_t$ for all possible common beliefs and all time epochs. Solving this fixed point equation reduces to solving a matrix algebraic equation for each realization of the state estimate covariance matrices. Also, the cost-to-go value functions are shown to be quadratic in private type and state estimates. This result is one of the very few results available on finding signaling perfect Bayesian equilibria of a truly dynamic game with asymmetric information.

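APPENDIX I

This lemma could be interpreted as Theorem 2.30 in [1, Ch. 7] with appropriate matrix substitution, where specifically, their observation matrix $\mathbf{C}_k$ should be substituted by our $\mathbf{L}_k$. We provide an alternate proof here for convenience.

$\pi_{t+1}^i$ is updated from $\pi_t^i$ through (6). Since $\pi_t^i$ is Gaussian, $\gamma_t^i(u_t^i | x_t^i) = \delta(u_t^i - \mathbf{L}_t^i x_t^i - m_t^i)$ is a linear deterministic constraint, and the kernel $Q^i$ is Gaussian, $\pi_{t+1}^i$ is also Gaussian. We find its mean and covariance as follows. We know that $x_{t+1}^i = \mathbf{A}_t^i x_t^i + \mathbf{B}_t^i u_t + w_t^i$. Then,

$$\mathbb{E}[X_{t+1}^i | \pi_t^i, \gamma_t^i, u_t] = \mathbb{E}[\mathbf{A}_t^i X_t^i + \mathbf{B}_t^i U_t + W_t^i | \pi_t^i, \gamma_t^i, u_t] \tag{44a}$$

$$= \mathbf{A}_t^i\, \mathbb{E}[X_t^i | \pi_t^i, \gamma_t^i, u_t] + \mathbf{B}_t^i u_t \tag{44b}$$

$$= \mathbf{A}_t^i\, \mathbb{E}[X_t^i | \mathbf{L}_t^i X_t^i = u_t^i - m_t^i] + \mathbf{B}_t^i u_t, \tag{44c}$$

where (44b) follows because $W_t^i$ has mean zero. Suppose there exists a matrix $\mathbf{G}_t^i$ such that $X_t^i - \mathbf{G}_t^i \mathbf{L}_t^i X_t^i$ and $\mathbf{L}_t^i X_t^i$ are independent. Then

$$\mathbb{E}[X_t^i | \mathbf{L}_t^i X_t^i = u_t^i - m_t^i] \tag{45a}$$

$$= \mathbb{E}[X_t^i - \mathbf{G}_t^i \mathbf{L}_t^i X_t^i + \mathbf{G}_t^i \mathbf{L}_t^i X_t^i | \mathbf{L}_t^i X_t^i = u_t^i - m_t^i] \tag{45b}$$

$$= \mathbb{E}[X_t^i - \mathbf{G}_t^i \mathbf{L}_t^i X_t^i] + \mathbf{G}_t^i (u_t^i - m_t^i) \tag{45c}$$

$$= \hat{x}_t^i + \mathbf{G}_t^i (u_t^i - \mathbf{L}_t^i \hat{x}_t^i - m_t^i), \tag{45d}$$

where $\mathbf{G}_t^i$ satisfies

$$\mathbb{E}[(X_t^i - \mathbf{G}_t^i \mathbf{L}_t^i X_t^i)(\mathbf{L}_t^i X_t^i)^\dagger] = \mathbb{E}[X_t^i - \mathbf{G}_t^i \mathbf{L}_t^i X_t^i]\, \mathbb{E}[(\mathbf{L}_t^i X_t^i)^\dagger] \tag{46a}$$

$$(\mathbf{I} - \mathbf{G}_t^i \mathbf{L}_t^i)\, \mathbb{E}[X_t^i X_t^{i\dagger}]\, \mathbf{L}_t^{i\dagger} = (\mathbf{I} - \mathbf{G}_t^i \mathbf{L}_t^i)\, \mathbb{E}[X_t^i]\, \mathbb{E}[X_t^{i\dagger}]\, \mathbf{L}_t^{i\dagger} \tag{46b}$$

$$(\mathbf{I} - \mathbf{G}_t^i \mathbf{L}_t^i)(\mathbf{\Sigma}_t^i + \hat{x}_t^i \hat{x}_t^{i\dagger})\, \mathbf{L}_t^{i\dagger} = (\mathbf{I} - \mathbf{G}_t^i \mathbf{L}_t^i)\, \hat{x}_t^i \hat{x}_t^{i\dagger}\, \mathbf{L}_t^{i\dagger} \tag{46c}$$

$$\mathbf{G}_t^i = \mathbf{\Sigma}_t^i \mathbf{L}_t^{i\dagger} (\mathbf{L}_t^i \mathbf{\Sigma}_t^i \mathbf{L}_t^{i\dagger})^{-1}. \tag{46d}$$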
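The defining property of the gain in (46) can be checked numerically. The sketch below is an illustration only, with randomly generated $\mathbf{\Sigma}$ and $\mathbf{L}$ (the dimensions, seed, and variable names are assumptions): it verifies that, with $\mathbf{G}$ chosen as in (46d), the cross-covariance between $X - \mathbf{G}\mathbf{L}X$ and $\mathbf{L}X$, which equals $(\mathbf{I} - \mathbf{G}\mathbf{L})\mathbf{\Sigma}\mathbf{L}^\dagger$, vanishes. For jointly Gaussian vectors, zero cross-covariance is what gives the independence used in (45).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2                            # type and action dimensions (assumed)

M = rng.standard_normal((n, n))
Sigma = M @ M.T + np.eye(n)            # a positive definite covariance
L = rng.standard_normal((m, n))        # a generic linear strategy matrix

# Gain (46d); the pseudoinverse follows the paper's notation convention.
G = Sigma @ L.T @ np.linalg.pinv(L @ Sigma @ L.T)

# Cross-covariance of (X - G L X) and (L X): (I - G L) Sigma L^T.
cross_cov = (np.eye(n) - G @ L) @ Sigma @ L.T

# Posterior covariance of X given L X, as in (48e).
post_cov = (np.eye(n) - G @ L) @ Sigma @ (np.eye(n) - G @ L).T
```

Since $\mathbf{L}(\mathbf{I} - \mathbf{G}\mathbf{L}) = \mathbf{0}$ for this gain, the posterior covariance has no mass in the directions observed through $\mathbf{L}$, which is the sense in which the action reveals $\mathbf{L}X$.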

20 Σ i +1 = sm ( ) A i X i E[A i X L i i X i = u i m i ] L i X i = u i m i + Q i (47a) Now sm ( ) X i E[X L i i X i = u i m i ] L i X i = u i m i = sm ( (X i G i L i X i ) (E[X i G i L i X i L i X i = u i m i ]) L i X i = u i m i ) (48a) (48b) = sm ( (X i G i L i X i ) (E[X i G i L i X i ]) ) (48c) = sm ( (I G i L i )(X i E[X i ]) ) (48d) = (I G i L i )Σ i (I G i L i ) (48e) APPENDIX II We prove he lemma for player 1 and he resul follows for player 2 by similar argumens. For he scope of his appendix, we define B 1 = B 1 1, B 1 1, B 2 1, and for any marix V, we define V i, V i as he i h column and he i h row of V, respecively. Then he fixed poin equaion (30) can be wrien as, 0 = [ T (A 1 G 1 ) V 1 22,+1(A 1 G 1 )+ B 1 V 1 2,+1A 1 G 1 + (A 1 G 1 ) V 1 2,+1 B 1 + B 1 V 1 +1 B 1 + [ S (A 1 G 1 ) V 1 21,+1A 1 + B 1 V 1 1,+1A 1 ] L 1 ]. (49) I should be noed ha V i +1 is a funcion of Σ +1, which is updaed hrough Σ and L as Σ +1 = φ s (Σ, L ) (we drop his dependence here for ease of exposiion). Subsiuing G 1 = Σ 1 L 1 (L 1 Σ 1 L 1 ) 1 and muliplying (49) by (L 1 Σ 1 L 1 ) from lef and (Σ 1 L 1 ) from righ, we ge

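\begin{align}
0 = L_t^1 \Sigma_t^1 \big[ & L_t^{1\dagger} (\bar{T}^1 + \bar{B}_t^{1\dagger} V_{t+1}^1 \bar{B}_t^1) L_t^1 + A_t^{1\dagger} V_{22,t+1}^1 A_t^1 + L_t^{1\dagger} ( \bar{B}_t^{1\dagger} V_{*2,t+1}^1 A_t^1 + S_{11}^1 + \bar{B}_t^{1\dagger} V_{*1,t+1}^1 A_t^1 ) \nonumber \\
& + ( A_t^{1\dagger} V_{2*,t+1}^1 \bar{B}_t^1 + A_t^{1\dagger} V_{21,t+1}^1 A_t^1 ) L_t^1 \big] \Sigma_t^1 L_t^{1\dagger}. \tag{50}
\end{align}

Let $\bar{L}_t^i = L_t^i (\Sigma_t^i)^{1/2}$, $\bar{A}_t^i = A_t^i (\Sigma_t^i)^{1/2}$, and

\begin{align}
\Lambda_a^1(l) &:= \bar{T}^1 + \bar{B}_t^{1\dagger} V_{t+1}^1 \bar{B}_t^1 \tag{51a} \\
\Lambda_b^1(l) &:= \bar{A}_t^{1\dagger} V_{22,t+1}^1 \bar{A}_t^1 \tag{51b} \\
\Lambda_c^1(l) &:= \bar{B}_t^{1\dagger} V_{*2,t+1}^1 \bar{A}_t^1 + S_{11}^1 (\Sigma_t^1)^{1/2} + \bar{B}_t^{1\dagger} V_{*1,t+1}^1 \bar{A}_t^1 \tag{51c} \\
\Lambda_d^1(l) &:= \bar{A}_t^{1\dagger} V_{2*,t+1}^1 \bar{B}_t^1 + \bar{A}_t^{1\dagger} V_{21,t+1}^1 \bar{A}_t^1. \tag{51d}
\end{align}

Then,

\begin{align}
0 = \bar{L}_t^1 \bar{L}_t^{1\dagger} \Lambda_a^1(l) \bar{L}_t^1 \bar{L}_t^{1\dagger} + \bar{L}_t^1 \Lambda_b^1(l) \bar{L}_t^{1\dagger} + \bar{L}_t^1 \bar{L}_t^{1\dagger} \Lambda_c^1(l) \bar{L}_t^{1\dagger} + \bar{L}_t^1 \Lambda_d^1(l) \bar{L}_t^1 \bar{L}_t^{1\dagger}. \tag{52}
\end{align}

Since $m = 1$, $\Lambda_a^1$ is a scalar. Let $\bar{L}_t^i = \lambda^i l^{i\dagger}$, where $\lambda^i = \|\bar{L}_t^i\|_2$ and $l^i$ is a normalized vector, and $\bar{T}^1 = T_{11}^1$. Moreover, since the update of $\Sigma_t$ in (14b) is scaling invariant, $V_{t+1}^1$ depends only on the directions $l = (l^1, l^2)$. Then (52) reduces to the following quadratic equation in $\lambda^1$:

\begin{align}
(\lambda^1)^2 \Lambda_a^1(l) + \lambda^1 \big( \Lambda_c^1(l)\, l^1 + l^{1\dagger} \Lambda_d^1(l) \big) + l^{1\dagger} \Lambda_b^1(l)\, l^1 = 0. \tag{53}
\end{align}

There exists a real-valued solution of this quadratic equation in $\lambda^1$ (note that a negative sign of $\lambda^1$ can be absorbed in $l^1$) if and only if

\begin{align}
\big( \Lambda_c^1(l)\, l^1 + l^{1\dagger} \Lambda_d^1(l) \big)^2 &\ge 4 \Lambda_a^1(l)\, l^{1\dagger} \Lambda_b^1(l)\, l^1 \tag{54a} \\
\iff \quad l^{1\dagger} \big( \Lambda_c^{1\dagger}(l) \Lambda_c^1(l) + \Lambda_d^1(l) \Lambda_d^{1\dagger}(l) + 2 \Lambda_d^1(l) \Lambda_c^1(l) - 4 \Lambda_a^1(l) \Lambda_b^1(l) \big) l^1 &\ge 0. \tag{54b}
\end{align}

Let

\begin{align}
\Delta^1(l) := \Lambda_c^{1\dagger}(l) \Lambda_c^1(l) + \Lambda_d^1(l) \Lambda_d^{1\dagger}(l) + 2 \Lambda_d^1(l) \Lambda_c^1(l) - 4 \Lambda_a^1(l) \Lambda_b^1(l). \tag{55}
\end{align}

There exists a solution to the fixed point equation (30) if and only if $\exists\, l^1, l^2 \in \mathbb{R}^n$ such that $l^{1\dagger} \Delta^1(l)\, l^1 \ge 0$,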

or, sufficiently, $\Delta^1(l) + \Delta^{1\dagger}(l)$ is positive definite.

REFERENCES

[1] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control. Englewood Cliffs, NJ: Prentice-Hall.
[2] H. Witsenhausen, "A counterexample in stochastic optimum control," SIAM Journal on Control, vol. 6, no. 1.
[3] Y. C. Ho and K.-H. Chu, "Team decision theory and information structures in optimal control problems, Part I," Automatic Control, IEEE Transactions on, vol. 17, no. 1.
[4] S. Yüksel, "Stochastic nestedness and the belief sharing information pattern," Automatic Control, IEEE Transactions on, vol. 54, no. 12.
[5] A. Mahajan and A. Nayyar, "Sufficient statistics for linear control strategies in decentralized systems with partial history sharing," IEEE Transactions on Automatic Control, vol. 60, no. 8, Aug.
[6] M. J. Osborne and A. Rubinstein, A Course in Game Theory, ser. MIT Press Books. The MIT Press, 1994, vol. 1.
[7] D. Fudenberg and J. Tirole, Game Theory. Cambridge, MA: MIT Press.
[8] T. Başar, "Two-criteria LQG decision problems with one-step delay observation sharing pattern," Information and Control, vol. 38, no. 1.
[9] A. Nayyar, A. Gupta, C. Langbort, and T. Başar, "Common information based Markov perfect equilibria for stochastic games with asymmetric information: Finite games," IEEE Trans. Automatic Control, vol. 59, no. 3, March.
[10] H. S. Witsenhausen, "Separation of estimation and control for discrete time systems," Proceedings of the IEEE, vol. 59, no. 11.
[11] E. Maskin and J. Tirole, "Markov perfect equilibrium: I. Observable actions," Journal of Economic Theory, vol. 100, no. 2.
[12] A. Gupta, A. Nayyar, C. Langbort, and T. Başar, "Common information based Markov perfect equilibria for linear-Gaussian games with asymmetric information," SIAM Journal on Control and Optimization, vol. 52, no. 5.
[13] D. Vasal and A. Anastasopoulos, "A systematic process for evaluating structured perfect Bayesian equilibria in dynamic games with asymmetric information," in American Control Conference, Boston, US, 2016 (accepted for publication). Available on arXiv.
[14] Y.-C. Ho, "Team decision theory and information structures," Proceedings of the IEEE, vol. 68, no. 6.
[15] D. M. Kreps and J. Sobel, "Chapter 25: Signalling," ser. Handbook of Game Theory with Economic Applications. Elsevier, 1994, vol. 2.
[16] A. Nayyar, A. Mahajan, and D. Teneketzis, "Decentralized stochastic control with partial history sharing: A common information approach," Automatic Control, IEEE Transactions on, vol. 58, no. 7, 2013.
