Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach


Ashutosh Nayyar, Aditya Mahajan and Demosthenis Teneketzis

arXiv: v1 [cs.SY] 8 Sep 2012

Abstract

A general model of decentralized stochastic control called partial history sharing information structure is presented. In this model, at each step the controllers share part of their observation and control history with each other. This general model subsumes several existing models of information sharing as special cases. Based on the information commonly known to all the controllers, the decentralized problem is reformulated as an equivalent centralized problem from the perspective of a coordinator. The coordinator knows the common information and selects prescriptions that map each controller's local information to its control actions. The optimal control problem at the coordinator is shown to be a partially observable Markov decision process (POMDP) which is solved using techniques from Markov decision theory. This approach provides (a) structural results for optimal strategies, and (b) a dynamic program for obtaining optimal strategies for all controllers in the original decentralized problem. Thus, this approach unifies the various ad-hoc approaches taken in the literature. In addition, the structural results on optimal control strategies obtained by the proposed approach cannot be obtained by the existing generic approach (the person-by-person approach) for obtaining structural results in decentralized problems; and the dynamic program obtained by the proposed approach is simpler than that obtained by the existing generic approach (the designer's approach) for obtaining dynamic programs in decentralized problems.

Index Terms: Decentralized Control, Stochastic Control, Information Structures, Markov Decision Theory, Team Theory

I. INTRODUCTION

Stochastic control theory provides analytic and computational techniques for centralized decision making in stochastic systems with noisy observations.
For specific models such as Markov decision processes and linear quadratic and Gaussian systems, stochastic control gives results that are intuitively appealing and computationally tractable. However, these results are derived under the assumption that all decisions are made by a centralized decision maker who sees all observations and perfectly recalls past observations and actions. This assumption of a centralized decision maker is not true in a number of modern control applications such as networked control systems, communication and queuing networks, sensor networks, and smart grids. In such applications, decisions are made by multiple decision makers who have access to different information. In this paper, we investigate such problems of decentralized stochastic control.

(Footnote: A preliminary version of this paper appeared in the proceedings of the 46th Allerton conference on communication, control, and computation, 2008 (see [1]).)

The techniques from centralized stochastic control cannot be directly applied to decentralized control problems. Nonetheless, two general solution approaches that indirectly use techniques from centralized stochastic control have been used in the literature: (i) the person-by-person approach, which takes the viewpoint of an individual decision maker (DM); and (ii) the designer's approach, which takes the viewpoint of the collective team of DMs.

The person-by-person approach investigates the decentralized control problem from the viewpoint of one DM, say DM i, and proceeds as follows: (i) arbitrarily fix the strategy of all DMs except DM i; and (ii) use centralized stochastic control to derive structural properties for the optimal best-response strategy of DM i. If such a structural property does not depend on the choice of the strategy of other DMs, then it also holds for a globally optimal strategy of DM i. By cyclically using this approach for all DMs, we can identify the structure of globally optimal strategies for all DMs.

A variation of this approach may be used to identify person-by-person optimal strategies. The variation proceeds iteratively as follows. Start with an initial guess for the strategies of all DMs. At each iteration, select one DM (say DM i), and change its strategy to the best response strategy given the strategy of all other DMs.
Repeat the process until a fixed point is reached, i.e., when no DM can improve performance by unilaterally changing its strategy. The resulting strategies are person-by-person optimal [2], and in general, not globally optimal. In summary, the person-by-person approach identifies structural properties of globally optimal strategies and provides an iterative method to obtain person-by-person optimal strategies. This method has been successfully used to identify structural properties of globally optimal strategies for various applications including real-time communication [3]-[7], decentralized hypothesis testing and quickest change detection [8]-[16], and networked control systems [17]-[19]. Under certain conditions, the person-by-person optimal strategies found by this approach are globally optimal [2], [20], [21].

The designer's approach, which is developed in [22], [23], investigates the decentralized control problem from the viewpoint of the collective team of DMs or, equivalently, from the viewpoint of a system designer who knows the system model and probability distribution of the primitive random variables and chooses control strategies for all DMs. Effectively, the designer is solving a centralized planning problem. The designer's approach proceeds by: (i) modeling this centralized planning problem as a multi-stage, open-loop stochastic control problem in which the designer's decision at each time is the control law for that time for all DMs; and (ii) using centralized stochastic control to obtain a dynamic programming decomposition. Each step of the resulting dynamic program is a functional optimization problem (in contrast to centralized dynamic programming, where each step is a parameter optimization problem).

The designer's approach is often used in tandem with the person-by-person approach as follows. First, the person-by-person approach is used to identify structural properties of globally optimal strategies. Then, restricting attention to strategies with the identified structural property, the designer's approach is used to obtain a dynamic programming decomposition for selecting the globally optimal strategy. Such a tandem approach has been used in various applications including real-time communication [4], [24], [25], decentralized hypothesis testing [13], and networked control systems [17], [18].

In addition to the above general approaches, other specialized approaches have been developed to address specific problems in decentralized systems. Decentralized problems with partially nested information structure were defined and studied in [26]. Decentralized linear quadratic Gaussian (LQG) control problems with two controllers and partially nested information structure were studied in [27], [28].
Partially nested decentralized LQG problems with controllers connected via a graph were studied in [29], [30]. A generalization of partial nestedness called stochastic nestedness was defined and studied in [31]. An important property of LQG control problems with partially nested information structure is that there exists an affine control strategy which is globally optimal. In general, the problem of finding the best affine control strategies may not be a convex optimization problem. Conditions under which the problem of determining optimal control strategies within the class of affine control strategies becomes a convex optimization problem were identified in [32], [33].

Decentralized stochastic control problems with specific models of information sharing among controllers have also been studied in the literature. Examples include systems with delayed sharing information structures [34]-[36], systems with periodic sharing information structure [37], control sharing information structure [38], [39], systems with broadcast information structure [19], and systems with common and private observations [1].

In this paper, we present a new general model of decentralized stochastic control called partial history sharing information structure. In this model, we assume that: (i) controllers sequentially share part of their past data (past observations and control) with each other by means of a shared memory; and (ii) all controllers have perfect recall of commonly available data (common information). This model subsumes a large class of decentralized control models in which information is shared among the controllers. For this model, we present a general solution methodology that reformulates the original decentralized problem into an equivalent centralized problem from the perspective of a coordinator. The coordinator knows the common information and selects prescriptions that map each controller's local information to its control actions. The optimal control problem at the coordinator is shown to be a partially observable Markov decision process (POMDP) which is solved using techniques from Markov decision theory. This approach provides (a) structural results for optimal strategies, and (b) a dynamic program for obtaining optimal strategies for all controllers in the original decentralized problem. Thus, this approach unifies the various ad-hoc approaches taken in the literature.

A similar solution approach is used in [36] for a model that is a special case of the model presented in this paper. We present an information state (Eq. (51)) for the model of [36] that is simpler than that presented in [36, Theorem 2].
A preliminary version of the general solution approach presented here was presented in [1] for a model that had features (e.g., direct but noisy communication links between controllers) that are not necessary for partial history sharing. However, it can be shown that by suitable redefinition of variables, the model in [1] can be recast as an instance of the model in this paper and vice versa (see Appendix C). The information state for partial history sharing that is presented in this paper (see Theorem 4) is simpler than that presented in [1, Eq. (39)].

A. Common Information Approach for a Static Team Problem

We first illustrate how common information can be used in a static team problem with two controllers. Let $X$ denote the state of nature and $Y, Y^1, Y^2$ be three correlated random variables that depend on $X$. Assume that the joint distribution of $(X, Y, Y^1, Y^2)$ is given. Controller $i$, $i = 1, 2$, observes $(Y, Y^i)$ and chooses a control action $U^i = g^i(Y, Y^i)$. The system incurs a cost $\ell(X, U^1, U^2)$. The control objective is to choose $(g^1, g^2)$ to minimize

$$J(g^1, g^2) := \mathbb{E}^{(g^1, g^2)}[\ell(X, U^1, U^2)].$$

If all the system variables are finite valued, we can solve the above optimization problem by a brute force search over all control strategies $(g^1, g^2)$. For example, if all variables are binary valued, we need to compute the performance of $2^4 \times 2^4 = 256$ control strategies and choose the one with the best performance.

In this example, both controllers have a common observation $Y$. One of the main ideas of this paper is to use such common information among the controllers to simplify the search process as follows. Instead of specifying the control strategies $(g^1, g^2)$ directly, we consider a coordinated system in which a coordinator observes the common information $Y$ and chooses prescriptions $(\Gamma^1, \Gamma^2)$, where $\Gamma^i$ is a mapping from $Y^i$ to $U^i$, $i = 1, 2$. Hence, $(\Gamma^1, \Gamma^2) = d(Y)$, where $d$ is called the coordination strategy. The coordinator then communicates these prescriptions to the controllers, who simply use them to choose $U^i = \Gamma^i(Y^i)$, $i = 1, 2$. It is easy to verify (see Proposition 3 for a formal proof) that choosing the control strategies $(g^1, g^2)$ in the original system is equivalent to choosing a coordination strategy $d$ in the coordinated system. The problem of choosing the best coordination strategy, however, is a centralized problem in which the coordinator is the only decision-maker.

For example, consider the case when all system variables are binary valued. For any coordination strategy $d$, let $(\gamma_0^1, \gamma_0^2) = d(0)$ and $(\gamma_1^1, \gamma_1^2) = d(1)$.
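The strategy counts in this binary example, and the fact that searching one prescription pair per value of the common observation reaches the same optimum as the brute-force search, can be checked numerically. The following is a minimal sketch and not part of the original paper: the joint distribution `pmf` and the cost function `cost` are hypothetical placeholders chosen only so that the example runs, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import product

BIN = (0, 1)

def all_maps(domain, codomain):
    """Enumerate every function domain -> codomain as a lookup table."""
    domain = list(domain)
    return [dict(zip(domain, values))
            for values in product(codomain, repeat=len(domain))]

# Hypothetical joint pmf of (x, y, y1, y2); any valid pmf would do.
weights = {(x, y, y1, y2): 1 + x + 2 * y * y1 + x * y2
           for x, y, y1, y2 in product(BIN, repeat=4)}
total = sum(weights.values())
pmf = {key: Fraction(w, total) for key, w in weights.items()}

def cost(x, u1, u2):
    """Hypothetical cost: 0 when u1 XOR u2 matches x, else 1."""
    return int((u1 ^ u2) != x)

# Original problem: g^i maps (y, y^i) to u^i, so 2**4 laws per controller.
laws = all_maps(product(BIN, BIN), BIN)
strategy_pairs = list(product(laws, laws))

def J(g1, g2):
    """Expected cost of the strategy pair (g1, g2)."""
    return sum(p * cost(x, g1[(y, y1)], g2[(y, y2)])
               for (x, y, y1, y2), p in pmf.items())

brute_force_min = min(J(g1, g2) for g1, g2 in strategy_pairs)

# Coordinated problem: for each y, gamma^i maps y^i to u^i (2**2 each).
prescriptions = all_maps(BIN, BIN)
evaluated = 0
coordinator_min = Fraction(0)
for y in BIN:
    best = None
    for gam1, gam2 in product(prescriptions, prescriptions):
        evaluated += 1
        # Contribution of the event {Y = y} to the expected cost.
        term = sum(p * cost(x, gam1[y1], gam2[y2])
                   for (x, yy, y1, y2), p in pmf.items() if yy == y)
        if best is None or term < best:
            best = term
    coordinator_min += best
```

Under these assumptions, `strategy_pairs` has 256 elements, the coordinator's search evaluates 32 prescription pairs, and both searches return the same minimum cost.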
Then, the cost associated with this coordination strategy is given as:

$$J(d) := \mathbb{E}^{(d)}[\ell(X, U^1, U^2)] = \mathbb{P}(Y = 0)\,\mathbb{E}[\ell(X, \gamma_0^1(Y^1), \gamma_0^2(Y^2)) \mid Y = 0] + \mathbb{P}(Y = 1)\,\mathbb{E}[\ell(X, \gamma_1^1(Y^1), \gamma_1^2(Y^2)) \mid Y = 1].$$

To minimize the above cost, we can minimize the two terms separately. Therefore, to find the best coordination strategy $d$, we can search for optimal prescriptions for the cases $Y = 0$ and $Y = 1$ separately. Searching for the best prescriptions for each of these cases involves computing the performance of $2^2 \times 2^2 = 16$ prescription pairs and choosing the one with the best performance. Thus, to find the best coordination strategy, we need to evaluate the performance of $2 \times 16 = 32$ prescription pairs. Contrast this with the 256 strategies whose costs we need to evaluate to solve the original problem by brute force.

The above example described a static system and illustrates that common information can be exploited to convert the decentralized optimization problem into a centralized optimization problem involving a coordinator. In this paper, we build upon this basic idea and present a solution approach based on common information that works for dynamical decentralized systems as well. Our approach converts the decentralized problem into a centralized stochastic control problem (in particular, a partially observable Markov decision process), identifies the structure of optimal control strategies, and provides a dynamic-program-like decomposition for the decentralized problem.

B. Contributions of the Paper

We introduce a general model of decentralized stochastic control problems in which multiple controllers share part of their information with each other. We call this model the partial history sharing information structure. This model subsumes several existing models of information sharing in decentralized stochastic control as special cases (see Section II-B). We establish two results for our model. Firstly, we establish a structural property of optimal control strategies. Secondly, we provide a dynamic programming decomposition of the problem of finding optimal control strategies. As in [1], [36], our results are derived using a common information based approach (see Section III). This approach differs from the person-by-person approach and the designer's approach mentioned earlier. In particular, the structural properties found in this paper cannot be found by the person-by-person approach described earlier.
Moreover, the dynamic programming decomposition found in this paper is distinct from and simpler than the dynamic programming decomposition based on the designer's approach. For a general framework for using common information in sequential decision making problems, see [40].

C. Notation

Random variables are denoted by upper case letters; their realization by the corresponding lower case letter. For integers $a \le b$ and $c \le d$, $X_{a:b}$ is a short hand for the vector $(X_a, X_{a+1}, \dots, X_b)$ while $X^{c:d}$ is a short hand for the vector $(X^c, X^{c+1}, \dots, X^d)$. When $a > b$, $X_{a:b}$ equals the empty set. The combined notation $X^{c:d}_{a:b}$ is a short hand for the vector $(X_i^j : i = a, a+1, \dots, b,\ j = c, c+1, \dots, d)$. In general, subscripts are used as time index while superscripts are used to index controllers. Bold letters $\mathbf{X}$ are used as a short hand for the vector $(X^{1:n})$.

$\mathbb{P}(\cdot)$ is the probability of an event, $\mathbb{E}(\cdot)$ is the expectation of a random variable. For a collection of functions $g$, we use $\mathbb{P}^g(\cdot)$ and $\mathbb{E}^g(\cdot)$ to denote that the probability measure/expectation depends on the choice of functions in $g$. $\mathbb{1}_A(\cdot)$ is the indicator function of a set $A$. For singleton sets $\{a\}$, we also denote $\mathbb{1}_{\{a\}}(\cdot)$ by $\mathbb{1}_a(\cdot)$.

For a singleton $a$ and a set $B$, $\{a, B\}$ denotes the set $\{a\} \cup B$. For two sets $A$ and $B$, $\{A, B\}$ denotes the set $A \cup B$. For two finite sets $A, B$, $F(A, B)$ is the set of all functions from $A$ to $B$. Also, if $A = \emptyset$, $F(A, B) := B$. For a finite set $A$, $\Delta(A)$ is the set of all probability mass functions over $A$. For the ease of exposition, we assume that all state, observation and control variables take values in finite sets.

For two random variables (or random vectors) $X$ and $Y$ taking values in $\mathcal{X}$ and $\mathcal{Y}$, $\mathbb{P}(X = x \mid Y)$ denotes the conditional probability of the event $\{X = x\}$ given $Y$, and $\mathbb{P}(X \mid Y)$ denotes the conditional PMF (probability mass function) of $X$ given $Y$, that is, it denotes the collection of conditional probabilities $\mathbb{P}(X = x \mid Y)$, $x \in \mathcal{X}$. Finally, all equalities involving random variables are to be interpreted as almost sure equalities (that is, they hold with probability one).

D. Organization

The rest of this paper is organized as follows. We present our model of a decentralized stochastic control problem in Section II. We also present several special cases of our model in this section. We prove our main results in Section III. We apply our result to some special cases in Section III-B. We present a simplification of our result and a generalization of our model in Section IV. We consider the infinite time-horizon discounted cost analogue of our problem in Section V. Finally, we conclude in Section VI.

II. PROBLEM FORMULATION

A. Basic model: Partial History Sharing Information Structure

1) The Dynamic System: Consider a dynamic system with $n$ controllers. The system operates in discrete time for a horizon $T$. Let $X_t \in \mathcal{X}_t$ denote the state of the system at time $t$, $U_t^i \in \mathcal{U}_t^i$ denote the control action of controller $i$, $i = 1, \dots, n$, at time $t$, and $\mathbf{U}_t$ denote the vector $(U_t^1, \dots, U_t^n)$. The initial state $X_1$ has a probability distribution $Q_1$ and evolves according to

$$X_{t+1} = f_t(X_t, \mathbf{U}_t, W_t^0), \tag{1}$$

where $\{W_t^0\}_{t=1}^T$ is a sequence of i.i.d. random variables with probability distribution $Q_W^0$.

2) Data available at the controller: At any time $t$, each controller has access to three types of data: current observation, local memory, and shared memory.

(i) Current local observation: Each controller makes a local observation $Y_t^i \in \mathcal{Y}_t^i$ on the state of the system at time $t$,

$$Y_t^i = h_t^i(X_t, W_t^i), \tag{2}$$

where $\{W_t^i\}_{t=1}^T$ is a sequence of i.i.d. random variables with probability distribution $Q_W^i$. We assume that the random variables in the collection $\{X_1, W_t^j,\ t = 1, \dots, T,\ j = 0, 1, \dots, n\}$, called primitive random variables, are mutually independent.

(ii) Local memory: Each controller stores a subset $M_t^i$ of its past local observations and its past actions in a local memory:

$$M_t^i \subseteq \{Y^i_{1:t-1}, U^i_{1:t-1}\}. \tag{3}$$

At $t = 1$, the local memory is empty, $M_1^i = \emptyset$.

(iii) Shared memory: In addition to its local memory, each controller has access to a shared memory. The contents $C_t$ of the shared memory at time $t$ are a subset of the past local observations and control actions of all controllers:

$$C_t \subseteq \{\mathbf{Y}_{1:t-1}, \mathbf{U}_{1:t-1}\}, \tag{4}$$

where $\mathbf{Y}_t$ and $\mathbf{U}_t$ denote the vectors $(Y_t^1, \dots, Y_t^n)$ and $(U_t^1, \dots, U_t^n)$ respectively. At $t = 1$, the shared memory is empty, $C_1 = \emptyset$.

Controller $i$ chooses action $U_t^i$ as a function of the total data $(Y_t^i, M_t^i, C_t)$ available to it. Specifically, for every controller $i$, $i = 1, \dots, n$,

$$U_t^i = g_t^i(Y_t^i, M_t^i, C_t), \tag{5}$$

where $g_t^i$ is called the control law of controller $i$. The collection $g^i = (g_1^i, \dots, g_T^i)$ is called the control strategy of controller $i$. The collection $g^{1:n} = (g^1, \dots, g^n)$ is called the control strategy of the system.
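The order of events within one time step implied by equations (1), (2) and (5) can be sketched in code. This is an illustrative sketch only: the function `simulate_step`, the toy binary functions `f_toy`, `h_toy`, `g_toy`, and the use of uniform random numbers as the noise variables are all assumptions made for the example, not constructions from the paper.

```python
import random

def simulate_step(x, local_mem, shared_mem, f, h, g, rng):
    """Advance the system by one time step."""
    n = len(h)
    # (2): each controller's observation, driven by its own noise W_t^i.
    y = [h[i](x, rng.random()) for i in range(n)]
    # (5): each action depends only on (Y_t^i, M_t^i, C_t).
    u = [g[i](y[i], local_mem[i], shared_mem) for i in range(n)]
    # (1): state transition driven by the joint action and the noise W_t^0.
    x_next = f(x, u, rng.random())
    return y, u, x_next

# Toy binary instance (hypothetical): noisy sensing, "repeat what you see" laws.
h_toy = [lambda x, w: x ^ int(w < 0.1)] * 2
g_toy = [lambda y, m, c: y] * 2
f_toy = lambda x, u, w: (x ^ u[0] ^ u[1]) ^ int(w < 0.2)

y, u, x_next = simulate_step(1, [(), ()], (), f_toy, h_toy, g_toy,
                             random.Random(0))
```

Note that each control law receives only its own observation, its own local memory and the shared memory, never the state or the other controllers' observations; this restriction is what makes the problem decentralized.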

3) Update of local and shared memories:

(i) Shared memory update: After taking the control action at time $t$, the local information at controller $i$ consists of the contents $M_t^i$ of its local memory, its local observation $Y_t^i$ and its control action $U_t^i$. Controller $i$ sends a subset $Z_t^i$ of this local information $\{M_t^i, Y_t^i, U_t^i\}$ to the shared memory. The subset $Z_t^i$ is chosen according to a pre-specified protocol. The contents of shared memory are nested in time, that is, the contents $C_{t+1}$ of the shared memory at time $t + 1$ are the contents $C_t$ at time $t$ augmented with the new data $\mathbf{Z}_t = (Z_t^1, Z_t^2, \dots, Z_t^n)$ sent by all the controllers at time $t$:

$$C_{t+1} = \{C_t, \mathbf{Z}_t\}. \tag{6}$$

(ii) Local memory update: After taking the control action and sending data to the shared memory at time $t$, controller $i$ updates its local memory according to a pre-specified protocol. The content $M^i_{t+1}$ of the local memory can at most equal the total local information $\{M_t^i, Y_t^i, U_t^i\}$ at the controller. However, to ensure that the local and shared memories at time $t + 1$ don't overlap, we assume that

$$M^i_{t+1} \subseteq \{M_t^i, Y_t^i, U_t^i\} \setminus Z_t^i. \tag{7}$$

Figure 1 shows the time order of observations, actions and memory updates. We refer to the above model as the partial history sharing information structure.

[Fig. 1. Time ordering of observations, actions and memory updates.]

4) The optimization problem: At time $t$, the system incurs a cost $\ell(X_t, \mathbf{U}_t)$. The performance of the control strategy of the system is measured by the expected total cost

$$J(g^{1:n}) := \mathbb{E}^{g^{1:n}}\Big[\sum_{t=1}^{T} \ell(X_t, \mathbf{U}_t)\Big], \tag{8}$$

where the expectation is with respect to the joint probability measure on $(X_{1:T}, \mathbf{U}_{1:T})$ induced by the choice of $g^{1:n}$. We are interested in the following optimization problem.

Problem 1: For the model described above, given the state evolution functions $f_t$, the observation functions $h_t^i$, the protocols for updating local and shared memory, the cost function $\ell$, the distributions $Q_1$, $Q_W^i$, $i = 0, 1, \dots, n$, and the horizon $T$, find a control strategy $g^{1:n}$ for the system that minimizes the expected total cost given by (8).

B. Special Cases: The Models

In the above model, although we have not specified the exact protocols by which controllers update the local and shared memories, we assume that pre-specified protocols are being used. Different choices of this protocol result in different information structures for the system. In this section, we describe several models of decentralized control systems that can be viewed as special cases of our model by assuming a particular choice of protocol for local and shared memory updates.

1) Delayed Sharing Information Structure: Consider the following special case of the model of Section II-A.

(i) The shared memory at the beginning of time $t$ is $C_t = \{\mathbf{Y}_{1:t-s}, \mathbf{U}_{1:t-s}\}$, where $s \ge 1$ is a fixed number. The local memory at the beginning of time $t$ is $M_t^i = \{Y^i_{t-s+1:t-1}, U^i_{t-s+1:t-1}\}$.

(ii) At each time $t$, after taking the action $U_t^i$, controller $i$ sends $Z_t^i = \{Y^i_{t-s+1}, U^i_{t-s+1}\}$ to the shared memory and the shared memory at $t + 1$ becomes $C_{t+1} = \{\mathbf{Y}_{1:t-s+1}, \mathbf{U}_{1:t-s+1}\}$.

(iii) After sending $Z_t^i = \{Y^i_{t-s+1}, U^i_{t-s+1}\}$ to the shared memory, controller $i$ updates the local memory to $M^i_{t+1} = \{Y^i_{t-s+2:t}, U^i_{t-s+2:t}\}$.

In this special case, the observations and control actions of each controller are shared with every other controller after a delay of $s$ time steps. Hence, the above special case corresponds to the delayed sharing information structure considered in [34], [36], [41].

2) Delayed State Sharing Information Structure: A special case of the delayed sharing information structure (which itself is a special case of our basic model) is the delayed state sharing information structure [35].
This information structure can be obtained from the delayed sharing information structure by making the following assumptions:

(i) The state of the system at time $t$ is an $n$-dimensional vector $X_t = (X_t^1, X_t^2, \dots, X_t^n)$.

(ii) At each time $t$, the current local observation of controller $i$ is $Y_t^i = X_t^i$, for $i = 1, 2, \dots, n$.

In this special case, the complete state vector $X_t$ is available to all controllers after a delay of $s$ time steps.

3) Periodic Sharing Information Structure: Consider the following special case of the model of Section II-A where controllers update the shared memory periodically with period $s \ge 1$:

(i) For time $ks < t \le (k + 1)s$, where $k = 0, 1, 2, \dots$, the shared memory at the beginning of time $t$ is $C_t = \{\mathbf{Y}_{1:ks}, \mathbf{U}_{1:ks}\}$. The local memory at the beginning of time $t$ is $M_t^i = \{Y^i_{ks+1:t-1}, U^i_{ks+1:t-1}\}$.

(ii) At each time $t = (k + 1)s$, $k = 1, 2, \dots$, after taking the action $U_t^i$, controller $i$ sends $Z_t^i = \{Y^i_{ks+1:(k+1)s}, U^i_{ks+1:(k+1)s}\}$ to the shared memory. At other times, each controller does not send anything (thus $Z_t^i = \emptyset$).

(iii) After sending $Z_t^i$ to the shared memory, controller $i$ updates the local memory to $M^i_{t+1} = \{M_t^i, Y_t^i, U_t^i\} \setminus Z_t^i$.

In this special case, the entire history of observations and control actions is shared periodically between controllers with period $s$. Hence, the above special case corresponds to the periodic sharing information structure considered in [37].

4) Control Sharing Information Structure: Consider the following special case of the model of Section II-A.

(i) The shared memory at the beginning of time $t$ is $C_t = \{\mathbf{U}_{1:t-1}\}$. The local memory at the beginning of time $t$ is $M_t^i = \{Y^i_{1:t-1}\}$.

(ii) At each time $t$, after taking the action $U_t^i$, controller $i$ sends $Z_t^i = \{U_t^i\}$ to the shared memory.

(iii) After sending $Z_t^i = U_t^i$ to the shared memory, controller $i$ updates the local memory to $M^i_{t+1} = Y^i_{1:t}$.

In this special case, the control actions of each controller are shared with every other controller after a delay of 1 time step. Hence, the above special case corresponds to the control sharing information structure considered in [38].

5) No Shared Memory with or without finite local memory: Consider the following special case of the model of Section II-A.

(i) The shared memory at each time is empty, $C_t = \emptyset$, and the local memory at the beginning of time $t$ is $M_t^i = \{Y^i_{t-s:t-1}, U^i_{t-s:t-1}\}$, where $s \ge 1$ is a fixed number.

(ii) Controllers do not send any data to shared memory, $Z_t^i = \emptyset$.

(iii) At the end of time $t$, controllers update their local memories to $M^i_{t+1} = \{Y^i_{t-s+1:t}, U^i_{t-s+1:t}\}$.

In this special case, the controllers don't share any data. The above model is related to the finite-memory controller model of [42]. A related special case is the situation where the local memory at each controller consists of all of its past local observations and its past actions, that is, $M_t^i = \{Y^i_{1:t-1}, U^i_{1:t-1}\}$.

Remark 1: All the special cases considered above are examples of symmetric sharing. That is, different controllers update their local memories according to identical protocols and the data sent by a controller to the shared memory is selected according to identical protocols. However, this symmetry is not required for our model. Consider for example, the delayed sharing information structure where at the end of time $t$, controller $i$ sends $Y^i_{t-s^i}, U^i_{t-s^i}$ to the shared memory, with $s^i$, $i = 1, 2, \dots, n$, being fixed, but not necessarily identical, numbers. This kind of asymmetric sharing is also a special case of our model.

III. MAIN RESULTS

For centralized systems, stochastic control theory provides two important analytical results. Firstly, it provides a structural result. This result states that there is an optimal control strategy which selects control actions as a function only of the controller's posterior belief on the state of the system conditioned on all its observations and actions till the current time. The controller's posterior belief is called its information state. Secondly, stochastic control theory provides a dynamic programming decomposition of the problem of finding optimal control strategies in centralized systems. This dynamic programming decomposition allows one to evaluate the optimal action for each realization of the controller's information state in a backward inductive manner.
In this section, we provide a structural result and a dynamic programming decomposition for the decentralized stochastic control problem with partial information sharing formulated above (Problem 1). The main idea of the proof is to formulate an equivalent centralized stochastic control problem; solve the equivalent problem using classical stochastic-control techniques; and translate the results back to the basic model. For that matter, we proceed as follows:

1) Formulate a centralized coordinated system from the point of view of a coordinator that observes only the common information among the controllers in the basic model, i.e., the coordinator observes the shared memory $C_t$ but not the local memories ($M_t^i$, $i = 1, \dots, n$) or local observations ($Y_t^i$, $i = 1, \dots, n$). The coordinator chooses prescriptions $\mathbf{\Gamma}_t = (\Gamma_t^1, \dots, \Gamma_t^n)$, where $\Gamma_t^i$ is a mapping from $(Y_t^i, M_t^i)$ to $U_t^i$, $i = 1, \dots, n$.

2) Show that the coordinated system is a POMDP (partially observable Markov decision process).

3) For the coordinated system, determine the structure of an optimal coordination strategy and a dynamic program to find an optimal coordination strategy.

4) Show that any strategy of the coordinated system is implementable in the basic model with the same value of the total expected cost. Conversely, any strategy of the basic model is implementable in the coordinated system with the same value of the total expected cost. Hence, the two systems are equivalent.

5) Translate the structural results and dynamic programming decomposition of the coordinated system (obtained in stage 3) to the basic model.

Stage 1: The coordinated system

Consider a coordinated system that consists of a coordinator and $n$ passive controllers. The coordinator knows the shared memory $C_t$ at time $t$, but not the local memories ($M_t^i$, $i = 1, \dots, n$) or local observations ($Y_t^i$, $i = 1, \dots, n$). At each time $t$, the coordinator chooses mappings $\Gamma_t^i : \mathcal{Y}_t^i \times \mathcal{M}_t^i \to \mathcal{U}_t^i$, $i = 1, 2, \dots, n$, according to

$$\mathbf{\Gamma}_t = d_t(C_t, \mathbf{\Gamma}_{1:t-1}), \tag{9}$$

where $\mathbf{\Gamma}_t = (\Gamma_t^1, \Gamma_t^2, \dots, \Gamma_t^n)$. The function $d_t$ is called the coordination rule at time $t$ and the collection of functions $d := (d_1, \dots, d_T)$ is called the coordination strategy. The selected $\Gamma_t^i$ is communicated to controller $i$ at time $t$.

The function $\Gamma_t^i$ tells controller $i$ how to process its current local observation and its local memory at time $t$; for that reason, we call $\Gamma_t^i$ the coordinator's prescription to controller $i$. Controller $i$ generates an action using its prescription as follows:

$$U_t^i = \Gamma_t^i(Y_t^i, M_t^i). \tag{10}$$
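Because all spaces are finite, a prescription is simply a lookup table, and the division of labour between the coordinator in (9) and the passive controllers in (10) can be sketched as follows. This is a hedged illustration: the helper names `all_prescriptions`, `controller_action` and `toy_rule`, as well as the tiny observation, memory and action spaces, are hypothetical and chosen only for the example.

```python
from itertools import product

def all_prescriptions(obs_space, mem_space, action_space):
    """Enumerate F(Y x M, U): every lookup table from (y, m) pairs to actions."""
    domain = list(product(obs_space, mem_space))
    return [dict(zip(domain, values))
            for values in product(action_space, repeat=len(domain))]

def controller_action(prescription, y, m):
    """Equation (10): a passive controller only evaluates its prescription."""
    return prescription[(y, m)]

# Toy spaces (hypothetical): binary observation, one memory value, binary action.
tables = all_prescriptions((0, 1), ("m0",), (0, 1))

def toy_rule(shared_memory, past_prescriptions):
    """A hypothetical coordination rule in the form of (9), for n = 2.

    It sees only the shared memory and the past prescriptions, never the
    local observations or local memories.
    """
    return (tables[len(shared_memory) % len(tables)], tables[0])

gamma = toy_rule(("z1",), ())
actions = [controller_action(gamma[i], 1, "m0") for i in range(2)]
```

The number of prescriptions for controller $i$ is $|\mathcal{U}_t^i|^{|\mathcal{Y}_t^i|\,|\mathcal{M}_t^i|}$, which in this toy case gives four tables.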

For this coordinated system, the system dynamics, the observation model and the cost are the same as the basic model of Section II-A: the system dynamics are given by (1), each controller's current observation is given by (2) and the instantaneous cost at time $t$ is $\ell(X_t, \mathbf{U}_t)$. As before, the performance of a coordination strategy is measured by the expected total cost

$$\hat{J}(d) = \mathbb{E}\Big[\sum_{t=1}^{T} \ell(X_t, \mathbf{U}_t)\Big], \tag{11}$$

where the expectation is with respect to a joint measure on $(X_{1:T}, \mathbf{U}_{1:T})$ induced by the choice of $d$. In this coordinated system, we are interested in the following optimization problem:

Problem 2: For the model of the coordinated system described above, find a coordination strategy $d$ that minimizes the total expected cost given by (11).

Stage 2: The coordinated system as a POMDP

We will now show that the coordinated system is a partially observed Markov decision process. For that matter, we first describe the model of POMDPs [43].

POMDP Model: A partially observable Markov decision process consists of a state process $S_t \in \mathcal{S}$, an observation process $O_t \in \mathcal{O}$, an action process $A_t \in \mathcal{A}$, $t = 1, 2, \dots, T$, and a single decision-maker where

1) The action at time $t$ is chosen by the decision-maker as a function of observation and action history, that is,

$$A_t = d_t(O_{1:t}, A_{1:t-1}), \tag{12}$$

where $d_t$ is the decision rule at time $t$.

2) After the action at time $t$ is taken, the new state and new observation are generated according to the transition probability rule

$$\mathbb{P}(S_{t+1}, O_{t+1} \mid S_{1:t}, O_{1:t}, A_{1:t}) = \mathbb{P}(S_{t+1}, O_{t+1} \mid S_t, A_t). \tag{13}$$

3) At each time, an instantaneous cost $\tilde{\ell}(S_t, A_t)$ is incurred.

4) The optimization problem for the decision-maker is to choose a decision strategy $d := (d_1, \dots, d_T)$ to minimize a total cost given as

$$\mathbb{E}\Big[\sum_{t=1}^{T} \tilde{\ell}(S_t, A_t)\Big]. \tag{14}$$

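The following well-known result provides the structure of optimal strategies and a dynamic program for POMDPs. For details, see [43].

Theorem 1 (POMDP Result): Let $\Theta_t$ be the conditional probability distribution of the state $S_t$ at time $t$ given the observations $O_{1:t}$ and actions $A_{1:t-1}$,

$$\Theta_t(s) = \mathbb{P}(S_t = s \mid O_{1:t}, A_{1:t-1}), \quad s \in \mathcal{S}.$$

Then,

1) $\Theta_{t+1} = \eta_t(\Theta_t, A_t, O_{t+1})$, where $\eta_t$ is the standard non-linear filter: if $\theta_t, a_t, o_{t+1}$ are the realizations of $\Theta_t$, $A_t$ and $O_{t+1}$, then the realization of the $s$-th element of the vector $\Theta_{t+1}$ is

$$\theta_{t+1}(s) = \frac{\sum_{s'} \theta_t(s')\,\mathbb{P}(S_{t+1} = s, O_{t+1} = o_{t+1} \mid S_t = s', A_t = a_t)}{\sum_{\hat{s}, \tilde{s}} \theta_t(\hat{s})\,\mathbb{P}(S_{t+1} = \tilde{s}, O_{t+1} = o_{t+1} \mid S_t = \hat{s}, A_t = a_t)} =: \eta_t^s(\theta_t, a_t, o_{t+1}) \tag{15}$$

and $\eta_t(\theta_t, a_t, o_{t+1})$ is the vector $(\eta_t^s(\theta_t, a_t, o_{t+1}))_{s \in \mathcal{S}}$.

2) There exists an optimal decision strategy of the form $A_t = \hat{d}_t(\Theta_t)$. Further, such a strategy can be found by the following dynamic program:

$$V_T(\theta) = \inf_a \mathbb{E}\{\tilde{\ell}(S_T, a) \mid \Theta_T = \theta\}, \tag{16}$$

and for $1 \le t \le T - 1$,

$$V_t(\theta) = \inf_a \mathbb{E}\{\tilde{\ell}(S_t, a) + V_{t+1}(\eta_t(\theta, a, O_{t+1})) \mid \Theta_t = \theta, A_t = a\}. \tag{17}$$

We will now show that the coordinated system can be viewed as an instance of the above POMDP model by defining the state process as $S_t := \{X_t, \mathbf{Y}_t, \mathbf{M}_t\}$, the observation process as $O_t := \mathbf{Z}_{t-1}$, and the action process as $A_t := \mathbf{\Gamma}_t$.

Lemma 1: For the coordinated system of Problem 2,

1) There exist functions $\tilde{f}_t$ and $\tilde{h}_t$, $t = 1, \dots, T$, such that

$$S_{t+1} = \tilde{f}_t(S_t, \mathbf{\Gamma}_t, W_t^0, \mathbf{W}_{t+1}), \tag{18}$$

and

$$\mathbf{Z}_t = \tilde{h}_t(S_t, \mathbf{\Gamma}_t). \tag{19}$$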

In particular, we have that

$$\mathbb{P}(S_{t+1}, \mathbf{Z}_t \mid S_{1:t}, \mathbf{Z}_{1:t-1}, \mathbf{\Gamma}_{1:t}) = \mathbb{P}(S_{t+1}, \mathbf{Z}_t \mid S_t, \mathbf{\Gamma}_t). \tag{20}$$

2) Furthermore, there exists a function $\tilde{\ell}_t$ such that

$$\ell(X_t, \mathbf{U}_t) = \tilde{\ell}_t(S_t, \mathbf{\Gamma}_t). \tag{21}$$

Thus, the objective of minimizing (11) is the same as minimizing

$$\hat{J}(d) = \mathbb{E}\Big[\sum_{t=1}^{T} \tilde{\ell}_t(S_t, \mathbf{\Gamma}_t)\Big]. \tag{22}$$

Proof: The existence of $\tilde{f}_t$ follows from (1), (2), (10), (7) and the definition of $S_t$. The existence of $\tilde{h}_t$ follows from the fact that $Z_t^i$ is a fixed subset of $\{M_t^i, Y_t^i, U_t^i\}$, equation (10) and the definition of $S_t$. Equation (20) follows from (18) and the independence of $W_t^0, \mathbf{W}_{t+1}$ from all random variables in the conditioning in the left hand side of (20). The existence of $\tilde{\ell}_t$ follows from the definition of $S_t$ and (10).

Recall that the coordinator is choosing its actions according to a coordination strategy of the form

$$\mathbf{\Gamma}_t = d_t(C_t, \mathbf{\Gamma}_{1:t-1}) = d_t(\mathbf{Z}_{1:t-1}, \mathbf{\Gamma}_{1:t-1}). \tag{23}$$

Equation (23) and Lemma 1 imply that the coordinated system is an instance of the POMDP model described above.

Stage 3: Structural result and dynamic program for the coordinated system

Since the coordinated system is a POMDP, Theorem 1 gives the structure of the optimal coordination strategies. For that matter, define the coordinator's information state

$$\Pi_t := \mathbb{P}(S_t \mid \mathbf{Z}_{1:t-1}, \mathbf{\Gamma}_{1:t-1}) = \mathbb{P}(S_t \mid C_t, \mathbf{\Gamma}_{1:t-1}). \tag{24}$$

Then, we have the following:

Proposition 1: For Problem 2, there is no loss of optimality in restricting attention to coordination rules of the form

$$\mathbf{\Gamma}_t = \hat{d}_t(\Pi_t). \tag{25}$$
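The information state in (24) is a belief over a finite set, and it is propagated by the standard non-linear filter (15) of Theorem 1. A minimal sketch of that Bayes update for finite spaces is given below. The function name `belief_update`, the representation of a belief as a dictionary, the assumption that the state space does not change with time, and the two-state `toy_kernel` are all illustrative assumptions rather than constructions from the paper.

```python
from fractions import Fraction

def belief_update(theta, action, next_obs, kernel):
    """One step of the filter (15).

    theta    : dict mapping each state s to P(S_t = s | history)
    kernel   : kernel(s_next, o_next, s, a) returns
               P(S_{t+1} = s_next, O_{t+1} = o_next | S_t = s, A_t = a)
    Returns the posterior over S_{t+1} as a dict.
    """
    states = list(theta)
    # Numerator of (15) for every candidate next state.
    unnormalized = {
        s_next: sum(theta[s] * kernel(s_next, next_obs, s, action)
                    for s in states)
        for s_next in states
    }
    # Denominator of (15): probability of the observation just received.
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("observation has zero probability under this belief")
    return {s: value / evidence for s, value in unnormalized.items()}

def toy_kernel(s_next, o_next, s, a):
    """Hypothetical two-state chain: stay w.p. 3/4, correct reading w.p. 4/5."""
    transition = Fraction(3, 4) if s_next == s else Fraction(1, 4)
    reading = Fraction(4, 5) if o_next == s_next else Fraction(1, 5)
    return transition * reading

prior = {0: Fraction(1, 2), 1: Fraction(1, 2)}
posterior = belief_update(prior, None, 1, toy_kernel)
```

In the coordinated system the same update would be applied with the prescription vector in the role of the action and the newly shared data in the role of the observation; for the toy kernel above, observing 1 from a uniform prior yields a posterior of 4/5 on state 1.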

Furthermore, an optimal coordination strategy of the above form can be found using a dynamic program. To that end, observe that we can write

$$\Pi_{t+1} = \eta_t(\Pi_t, Z_t, \Gamma_t) \tag{26}$$

where $\eta_t$ is the standard non-linear filtering update function (see Appendix A). We denote by $\mathcal{B}_t$ the space of possible realizations of $\Pi_t$. Thus,

$$\mathcal{B}_t := \Delta(\mathcal{X}_t \times \mathcal{Y}^1_t \times \mathcal{M}^1_t \times \dots \times \mathcal{Y}^n_t \times \mathcal{M}^n_t). \tag{27}$$

Recall that $\mathcal{F}(\mathcal{Y}^i_t \times \mathcal{M}^i_t, \mathcal{U}^i_t)$ is the set of all functions from $\mathcal{Y}^i_t \times \mathcal{M}^i_t$ to $\mathcal{U}^i_t$ (see Section I-C). Then, we have the following result.

Proposition 2 For all $\pi$ in $\mathcal{B}_t$, define

$$V_T(\pi) = \inf_{\{\gamma^i_T \in \mathcal{F}(\mathcal{Y}^i_T \times \mathcal{M}^i_T, \mathcal{U}^i_T),\ 1 \le i \le n\}} \mathbb{E}[\tilde l(S_T, \Gamma_T) \mid \Pi_T = \pi, \Gamma_T = (\gamma^1_T, \dots, \gamma^n_T)], \tag{28}$$

and for $1 \le t \le T-1$,

$$V_t(\pi) = \inf_{\{\gamma^i_t \in \mathcal{F}(\mathcal{Y}^i_t \times \mathcal{M}^i_t, \mathcal{U}^i_t),\ 1 \le i \le n\}} \mathbb{E}[\tilde l(S_t, \Gamma_t) + V_{t+1}(\eta_t(\Pi_t, \Gamma_t, Z_t)) \mid \Pi_t = \pi, \Gamma_t = (\gamma^1_t, \dots, \gamma^n_t)]. \tag{29}$$

Then the arg inf at each time step gives the coordinator's optimal prescriptions for the controllers when the coordinator's information state is $\pi$.

Proposition 2 gives a dynamic program for the coordinator's problem (Problem 2). Since the coordinated system is a POMDP, computational algorithms for POMDPs can be used to solve the dynamic program for the coordinator's problem as well. We refer the reader to [44] and references therein for a review of algorithms to solve POMDPs.

Stage 4: Equivalence between the two models

We first observe that since $C_s \subseteq C_t$ for all $s < t$, under any given coordination strategy $d$, we can use $C_t$ to evaluate the past prescriptions by recursive substitution. For example, for $t = 2, 3$, the past prescriptions can be evaluated as functions of $C_2$, $C_3$ as follows:

$$\Gamma_1 = d_1(C_1) =: \tilde d_1(C_2),$$

$$\Gamma_2 = d_2(C_2, \Gamma_1) = d_2(C_2, \tilde d_1(C_2)) =: \tilde d_2(C_3).$$
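The recursive substitution above is mechanical, and a short sketch may make it concrete. The code below assumes, as in (23), that the common information is the sequence of shared data, so that $C_k$ is a prefix of $C_t$; the function name `past_prescriptions`, the tuple representation of $C_t$, and the integer-valued toy rule standing in for prescriptions are hypothetical choices made only for illustration.

```python
def past_prescriptions(d, c_t):
    """Recover Gamma_1, ..., Gamma_{t-1} from C_t alone.

    d   : list of coordination rules; d[k-1](C_k, past) returns Gamma_k
    c_t : tuple (z_1, ..., z_{t-1}) representing C_t = Z_{1:t-1}
    """
    t = len(c_t) + 1
    gammas = []
    for k in range(1, t):
        c_k = c_t[:k - 1]  # C_k is contained in C_t, here as a prefix
        gammas.append(d[k - 1](c_k, tuple(gammas)))
    return gammas


# Hypothetical rule: the "prescription" is just an integer label.
def toy_rule(c, past):
    return sum(c) + len(past)


d = [toy_rule, toy_rule]
recovered = past_prescriptions(d, (1, 2))  # Gamma_1 and Gamma_2 as functions of C_3
```

Because every controller knows $C_t$ and the strategy $d$, each of them can run this computation locally, which is what allows a coordination strategy to be implemented without an actual coordinator.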

We can now state the following result.

Proposition 3 The basic model of Section II-A and the coordinated system are equivalent. More precisely:

(a) Given any control strategy $g^{1:n}$ for the basic model, choose a coordination strategy $d$ for the coordinated system of Stage 1 as

$$d_t(C_t) = \big(g^1_t(\cdot, \cdot, C_t), \dots, g^n_t(\cdot, \cdot, C_t)\big).$$

Then $\hat J(d) = J(g^{1:n})$.

(b) Conversely, for any coordination strategy $d$ for the coordinated system, choose a control strategy $g^{1:n}$ for the basic model as

$$g^i_1(\cdot, \cdot, C_1) = d^i_1(C_1),$$

and

$$g^i_t(\cdot, \cdot, C_t) = d^i_t(C_t, \Gamma_{1:t-1}),$$

where $\Gamma_k = d_k(C_k, \Gamma_{1:k-1})$, $k = 1, 2, \dots, t-1$, and $d^i_t(\cdot)$ is the $i$-th component of $d_t(\cdot)$ (that is, $d^i_t(\cdot)$ gives the coordinator's prescription for the $i$-th controller). Then, $J(g^{1:n}) = \hat J(d)$.

Proof: See Appendix B.

Stage 5: Structural result and dynamic program for the basic model

Combining Proposition 1 with Proposition 3, we get the following structural result for Problem 1.

Theorem 2 (Structural Result for Optimal Control Strategies) In Problem 1, there exist optimal control strategies of the form

$$U^i_t = \hat g^i_t(Y^i_t, M^i_t, \Pi_t), \quad i = 1, 2, \dots, n, \tag{30}$$

where $\Pi_t$ is the conditional distribution on $X_t, Y_t, M_t$ given $C_t$, defined as

$$\Pi_t(x, y, m) := \mathbb{P}^{\hat g^{1:n}_{1:t-1}}(X_t = x, Y_t = y, M_t = m \mid C_t), \tag{31}$$

for all possible realizations $(x, y, m)$ of $(X_t, Y_t, M_t)$.

We call $\Pi_t$ the common information state. Recall that $\Pi_t$ takes values in the set $\mathcal{B}_t$ defined in (27).
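For a fixed realization $\pi$ of $\Pi_t$, the control law in (30) reduces to a map from the local information $(Y^i_t, M^i_t)$ to an action, that is, to an element of $\mathcal{F}(\mathcal{Y}^i_t \times \mathcal{M}^i_t, \mathcal{U}^i_t)$. When these spaces are finite, that set can be enumerated explicitly. The sketch below is illustrative only: the helper names `prescriptions` and `control_law_from_partial`, the dictionary representation of a prescription, and the use of a tuple as a hashable stand-in for $\pi$ are assumptions, not constructs from the paper.

```python
from itertools import product


def prescriptions(local_infos, actions):
    """All maps from local information (y, m) to an action, each as a dict."""
    local_infos = list(local_infos)
    return [dict(zip(local_infos, choice))
            for choice in product(actions, repeat=len(local_infos))]


def control_law_from_partial(partial_laws):
    """Assemble g_hat(y, m, pi) from a table pi -> prescription."""
    def g_hat(y, m, pi):
        # Look up the prescription selected for pi, then apply it to (y, m).
        return partial_laws[pi][(y, m)]
    return g_hat


# Two local observations, one memory value, two actions: 2**2 = 4 prescriptions.
all_gammas = prescriptions([(0, "a"), (1, "a")], [0, 1])
g_hat = control_law_from_partial({(0.5, 0.5): all_gammas[1]})
action = g_hat(1, "a", (0.5, 0.5))
```

The number of prescriptions is $|\mathcal{U}^i_t|^{|\mathcal{Y}^i_t| |\mathcal{M}^i_t|}$, which grows quickly with the size of the local information but does not depend on the length of the common history.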

Consider a control strategy $\hat g^i$ for controller $i$ of the form specified in Theorem 2. The control law $\hat g^i_t$ at time $t$ is a function from the space $\mathcal{Y}^i_t \times \mathcal{M}^i_t \times \mathcal{B}_t$ to the space of decisions $\mathcal{U}^i_t$. Equivalently, the control law $\hat g^i_t$ can be represented as a collection of functions $\{\hat g^i_t(\cdot, \cdot, \pi)\}_{\pi \in \mathcal{B}_t}$, where each element of this collection is a function from $\mathcal{Y}^i_t \times \mathcal{M}^i_t$ to $\mathcal{U}^i_t$. An element $\hat g^i_t(\cdot, \cdot, \pi)$ of this collection specifies a control action for each possible realization of $Y^i_t$, $M^i_t$ and a fixed realization $\pi$ of $\Pi_t$. We call $\hat g^i_t(\cdot, \cdot, \pi)$ the partial control law of controller $i$ at time $t$ for the given realization $\pi$ of the common information state $\Pi_t$.

We now use Proposition 2 to describe a dynamic programming decomposition of the problem of finding optimal control strategies. This decomposition allows us to evaluate optimal partial control laws for each realization $\pi$ of the common information state in a backward inductive manner. Recall that $\mathcal{B}_t$ is the space of all possible realizations of $\Pi_t$ (see (27)) and $\mathcal{F}(\mathcal{Y}^i_t \times \mathcal{M}^i_t, \mathcal{U}^i_t)$ is the set of all functions from $\mathcal{Y}^i_t \times \mathcal{M}^i_t$ to $\mathcal{U}^i_t$ (see Section I-C).

Theorem 3 (Dynamic Programming Decomposition) Define the functions $V_t : \mathcal{B}_t \to \mathbb{R}$, for $t = 1, \dots, T$, as follows:

$$V_T(\pi) = \inf_{\{\gamma^i_T \in \mathcal{F}(\mathcal{Y}^i_T \times \mathcal{M}^i_T, \mathcal{U}^i_T),\ 1 \le i \le n\}} \mathbb{E}\{l(X_T, \gamma^1_T(Y^1_T, M^1_T), \dots, \gamma^n_T(Y^n_T, M^n_T)) \mid \Pi_T = \pi\}, \tag{32}$$

and for $1 \le t \le T-1$,

$$V_t(\pi) = \inf_{\{\gamma^i_t \in \mathcal{F}(\mathcal{Y}^i_t \times \mathcal{M}^i_t, \mathcal{U}^i_t),\ 1 \le i \le n\}} \mathbb{E}\{l(X_t, \gamma^1_t(Y^1_t, M^1_t), \dots, \gamma^n_t(Y^n_t, M^n_t)) + V_{t+1}(\eta_t(\pi, \gamma^1_t, \dots, \gamma^n_t, Z_t)) \mid \Pi_t = \pi\}, \tag{33}$$

where $\eta_t$ is a $\mathcal{B}_{t+1}$-valued function defined in (26) and Appendix A.

For $t = 1, \dots, T$ and for each $\pi \in \mathcal{B}_t$, an optimal partial control law for controller $i$ is the minimizing choice of $\gamma^i$ in the definition of $V_t(\pi)$. Let $\Psi_t(\pi)$ denote the arg inf of the right hand side of $V_t(\pi)$, and $\Psi^i_t$ denote its $i$-th component. Then, an optimal control strategy is given by:

$$\hat g^i_t(\cdot, \cdot, \pi) = \Psi^i_t(\pi). \tag{34}$$

A. Comparison with Person by Person and Designer Approaches

The common information based approach adopted above differs from the person-by-person approach and the designer's approach mentioned in the introduction. In particular, the structural

result of Theorem 2 cannot be found by the person-by-person approach. If we fix the strategies of all but the $i$-th controller to an arbitrary choice, then it is not necessarily optimal for controller $i$ to use a strategy of the form in Theorem 2. This is because if controller $j$'s strategy uses the entire common information $C_t$, then controller $i$, in general, would need to consider the entire common information to better predict controller $j$'s actions, and hence controller $i$'s optimal choice of action may also depend on the entire common information. The use of the common information based approach allowed us to prove that all controllers can jointly use strategies of the form in Theorem 2 without loss of optimality.

The dynamic programming decomposition of Theorem 3 is simpler than any dynamic programming decomposition obtained using the designer's approach. As described earlier, the designer's approach models the decentralized control problem as an open-loop centralized planning problem in which a designer at each stage chooses control laws $g^i_t$ that map $(Y^i_t, M^i_t, C_t)$ to $U^i_t$, $i = 1, \dots, n$. On the other hand, the common-information approach developed in this paper models the decentralized control problem as a closed-loop centralized planning problem in which a coordinator at each stage chooses the partial control laws $\gamma^i_t$ that map $(Y^i_t, M^i_t)$ to $U^i_t$, $i = 1, \dots, n$. The space of partial control laws is never larger than the space of full control laws; if the common information is non-empty, then it is strictly smaller. Thus, the dynamic programming decomposition of Theorem 3 is simpler than that obtained by the designer's approach. This simplification is best illustrated by the example of Section IV-C1, where all controllers receive a common observation $Y^{\mathrm{com}}_t$. For this example, we show that our information state (and hence our dynamic program) reduces to $\mathbb{P}(X_t \mid Y^{\mathrm{com}}_{1:t})$, which is identical to the information state of centralized stochastic control. In contrast, the information state $\mathbb{P}(X_t, Y^{\mathrm{com}}_{1:t})$ obtained by the designer's approach is much more complicated.

B. Special Cases: The Results

In Section II-B, we described several models of decentralized control problems that are special cases of the model described in Section II-A. In this section, we state the results of Theorems 2 and 3 for these models.

1) Delayed Sharing Information Structure:

Corollary 1 In the delayed sharing information structure of Section II-B1, there exist optimal

control strategies of the form

$$U^i_t = \hat g^i_t(Y^i_{t-s+1:t}, U^i_{t-s+1:t-1}, \Pi_t), \quad i = 1, 2, \dots, n, \tag{35}$$

where

$$\Pi_t := \mathbb{P}^{\hat g^{1:n}_{1:t-1}}(X_t, Y_{t-s+1:t}, U_{t-s+1:t-1} \mid C_t). \tag{36}$$

Moreover, optimal control strategies can be obtained by a dynamic program similar to that of Theorem 3.

The above result is analogous to the result in [36].

2) Delayed State Sharing Information Structure:

Corollary 2 In the delayed state sharing information structure of Section II-B2, there exist optimal control strategies of the form

$$U^i_t = \hat g^i_t(X^i_{t-s+1:t}, U^i_{t-s+1:t-1}, \Pi_t), \quad i = 1, 2, \dots, n, \tag{37}$$

where

$$\Pi_t := \mathbb{P}^{\hat g^{1:n}_{1:t-1}}(X_{t-s+1:t}, U_{t-s+1:t-1} \mid C_t). \tag{38}$$

Moreover, optimal control strategies can be obtained by a dynamic program similar to that of Theorem 3.

The above result is analogous to the result in [36].

3) Periodic Sharing Information Structure:

Corollary 3 In the periodic sharing information structure of Section II-B3, there exist optimal control strategies of the form

$$U^i_t = \hat g^i_t(Y^i_{ks+1:t}, U^i_{ks+1:t-1}, \Pi_t), \quad i = 1, 2, \dots, n, \quad ks < t \le (k+1)s, \tag{39}$$

where

$$\Pi_t := \mathbb{P}^{\hat g^{1:n}_{1:t-1}}(X_t, Y_{ks+1:t}, U_{ks+1:t-1} \mid C_t), \quad ks < t \le (k+1)s. \tag{40}$$

Moreover, optimal control strategies can be obtained by a dynamic program similar to that of Theorem 3.

The above result gives a finer dynamic programming decomposition than that of [37]. In [37], the dynamic programming decomposition is only carried out at the times of information sharing, $t = ks$, $k = 1, 2, \dots$; and at each step the partial control laws until the next sharing instant are chosen. In contrast, in the above dynamic program, the partial control laws of each step are chosen sequentially.

4) Control Sharing Information Structure:

Corollary 4 In the control sharing information structure of Section II-B4, there exist optimal control strategies of the form

$$U^i_t = \hat g^i_t(Y^i_{1:t}, \Pi_t), \quad i = 1, 2, \dots, n, \tag{41}$$

where

$$\Pi_t := \mathbb{P}^{\hat g^{1:n}_{1:t-1}}(X_t, Y_{1:t} \mid C_t). \tag{42}$$

Moreover, optimal control strategies can be obtained by a dynamic program similar to that of Theorem 3.

5) No Shared Memory with or without finite local memory:

Corollary 5 In the information structure of Section II-B5, there exist optimal control strategies of the form

$$U^i_t = \hat g^i_t(Y^i_t, M^i_t, \Pi_t) \tag{43}$$

where

$$\Pi_t = \mathbb{P}^{\hat g^{1:n}_{1:t-1}}(X_t, Y_t, M_t). \tag{44}$$

Moreover, optimal control strategies can be obtained by a dynamic program similar to that of Theorem 3.

Note that, since the common information is empty, the common information state $\Pi_t$ is now an unconditional probability. In particular, $\Pi_t$ is a constant random variable and takes a fixed value that depends only on the choice of past control laws. Therefore, we can define an appropriate control law $\tilde g^i_t$ such that $\hat g^i_t(Y^i_t, M^i_t, \Pi_t) = \tilde g^i_t(Y^i_t, M^i_t)$ with probability 1. Hence, the structural result of (43) may be simplified to

$$U^i_t = \hat g^i_t(Y^i_t, M^i_t, \Pi_t) = \tilde g^i_t(Y^i_t, M^i_t).$$

This result is redundant since all control laws are of the above form. Nonetheless, Corollary 5 gives a procedure for finding such control laws using the dynamic program of Theorem 3. The above result is similar to the results in [42] for the case of one controller with finite memory and to those in [23] for the case of two controllers with finite memories.

IV. SIMPLIFICATIONS AND GENERALIZATIONS

A. Simplification of the Common Information State

Theorems 2 and 3 identify the conditional probability distribution on $(X_t, Y_t, M_t)$ given $C_t$ as the common information state for our problem. In the following lemma, we make the simple observation that in our model the conditional distribution on $(X_t, Y_t, M_t)$ given $C_t$ is completely determined by the conditional distribution on $(X_t, M_t)$ given $C_t$.

Lemma 2 For any choice of control laws $\hat g^{1:n}_{1:t-1}$, define the conditional distribution on $X_t, M_t$ given $C_t$ as

$$\Pi^{\mathrm{new}}_t(x, m) := \mathbb{P}^{\hat g^{1:n}_{1:t-1}}(X_t = x, M_t = m \mid C_t),$$

for all possible realizations $(x, m)$ of $(X_t, M_t)$. Also define $\mathcal{B}^{\mathrm{new}}_t := \Delta(\mathcal{X}_t \times \mathcal{M}^1_t \times \dots \times \mathcal{M}^n_t)$. Then,

$$\Pi^{\mathrm{new}}_t(x, m) = \sum_y \Pi_t(x, y, m). \tag{45}$$

Therefore, $\Pi^{\mathrm{new}}_t = \chi_t(\Pi_t)$, where each component of the $\mathcal{B}^{\mathrm{new}}_t$-valued function $\chi_t$ is determined by the right hand side of (45). Also,

$$\Pi_t(x, y, m) = \Pi^{\mathrm{new}}_t(x, m)\, \mathbb{P}(Y_t = y \mid X_t = x), \tag{46}$$

where the second term on the right hand side of (46) is determined by the fixed distribution of the observation noises. Therefore, $\Pi_t = \zeta_t(\Pi^{\mathrm{new}}_t)$, where each component of the $\mathcal{B}_t$-valued function $\zeta_t$ is determined by the right hand side of (46).

Lemma 2 implies that the results of Theorems 2 and 3 can be written in terms of $\Pi^{\mathrm{new}}_t$.

Theorem 4 (Alternative Common Information State) In Problem 1, there exist optimal control strategies of the form

$$U^i_t = \hat g^i_t(Y^i_t, M^i_t, \Pi^{\mathrm{new}}_t), \quad i = 1, 2, \dots, n, \tag{47}$$

where

$$\Pi^{\mathrm{new}}_t := \mathbb{P}^{\hat g^{1:n}_{1:t-1}}(X_t, M_t \mid C_t). \tag{48}$$

Further, define the functions $V^{\mathrm{new}}_t : \mathcal{B}^{\mathrm{new}}_t \to \mathbb{R}$, for $t = 1, \dots, T$, as follows:

$$V^{\mathrm{new}}_T(\pi^{\mathrm{new}}) = \inf_{\{\gamma^i_T \in \mathcal{F}(\mathcal{Y}^i_T \times \mathcal{M}^i_T, \mathcal{U}^i_T),\ 1 \le i \le n\}} \mathbb{E}\{l(X_T, \gamma^1_T(Y^1_T, M^1_T), \dots, \gamma^n_T(Y^n_T, M^n_T)) \mid \Pi_T = \zeta_T(\pi^{\mathrm{new}})\}, \tag{49}$$

and for $1 \le t \le T-1$,

$$V^{\mathrm{new}}_t(\pi^{\mathrm{new}}) = \inf_{\{\gamma^i_t \in \mathcal{F}(\mathcal{Y}^i_t \times \mathcal{M}^i_t, \mathcal{U}^i_t),\ 1 \le i \le n\}} \mathbb{E}\{l(X_t, \gamma^1_t(Y^1_t, M^1_t), \dots, \gamma^n_t(Y^n_t, M^n_t)) + V^{\mathrm{new}}_{t+1}(\chi_{t+1}(\eta_t(\Pi_t, \gamma^1_t, \dots, \gamma^n_t, Z_t))) \mid \Pi_t = \zeta_t(\pi^{\mathrm{new}})\}, \tag{50}$$

where $\zeta_t$, $\chi_t$ are defined in Lemma 2, and $\eta_t$ is defined in (26) and Appendix A. For $1 \le t \le T$ and for each $\pi^{\mathrm{new}}$, an optimal partial control law for controller $i$ is the minimizing choice of $\gamma^i$ in the definition of $V^{\mathrm{new}}_t(\pi^{\mathrm{new}})$.

Proof: For any $\pi^{\mathrm{new}} \in \mathcal{B}^{\mathrm{new}}_t$ and any $\pi \in \mathcal{B}_t$, it is straightforward to establish using a backward induction argument that $V^{\mathrm{new}}_t(\pi^{\mathrm{new}}) = V_t(\zeta_t(\pi^{\mathrm{new}}))$ and $V_t(\pi) = V^{\mathrm{new}}_t(\chi_t(\pi))$, where $V_t(\cdot)$ is the value function from the dynamic program in Theorem 3. The optimality of the new dynamic program then follows from the optimality of the dynamic program in Theorem 3.

The result of Theorem 4 is conceptually the same as the results in Theorems 2 and 3. Theorem 4 implies that the corollaries of Section III-B can be restated in terms of new information states by simply removing $Y_t$ from the definition of the original information states. For example, the result of Corollary 1 for the delayed sharing information structure is also true when $\Pi_t$ is replaced by

$$\Pi^{\mathrm{new}}_t := \mathbb{P}^{\hat g^{1:n}_{1:t-1}}(X_t, Y_{t-s+1:t-1}, U_{t-s+1:t-1} \mid C_t). \tag{51}$$

This result is simpler than that of [36, Theorem 2].

B. Generalization of the Model

The methodology described in Section III relies on the fact that the shared memory is common information among all controllers. Since the coordinator in the coordinated system knows only the common information, any coordination strategy can be mapped to an equivalent control strategy in the basic model (see Stage 4 of Section III). In some cases, in addition to the shared

memory, the current observation (or, if the current observation is a vector, some components of it) may also be commonly available to all controllers. The general methodology of Section III can be easily modified to include such cases as well.

Consider the model of Section II-A with the following modifications:

1) In addition to their current local observation, all controllers have a common observation at time $t$,

$$Y^{\mathrm{com}}_t = h^{\mathrm{com}}_t(X_t, V_t) \tag{52}$$

where $\{V_t, t = 1, \dots, T\}$ is a sequence of i.i.d. random variables with probability distribution $Q_V$ which is independent of all other primitive random variables.

2) The shared memory $C_t$ at time $t$ is a subset of $\{Y^{\mathrm{com}}_{1:t-1}, Y_{1:t-1}, U_{1:t-1}\}$.

3) Each controller selects its action using a control law of the form

$$U^i_t = g^i_t(Y^i_t, M^i_t, C_t, Y^{\mathrm{com}}_t). \tag{53}$$

4) After taking the control action at time $t$, controller $i$ sends a subset $Z^i_t$ of $\{M^i_t, Y^i_t, U^i_t, Y^{\mathrm{com}}_t\}$ that necessarily includes $Y^{\mathrm{com}}_t$. That is, $Y^{\mathrm{com}}_t \in Z^i_t \subseteq \{M^i_t, Y^i_t, U^i_t, Y^{\mathrm{com}}_t\}$. This implies that the history of common observations is necessarily a part of the shared memory, that is, $Y^{\mathrm{com}}_{1:t-1} \subseteq C_t$.

The rest of the model is the same as in Section II-A. In particular, the local memory update satisfies (7), so the local memory and shared memory at time $t+1$ do not overlap. The instantaneous cost is given by $l(X_t, U_t)$ and the objective is to minimize an expected total cost given by (8).

The arguments of Section III are also valid for this model. The observation process in Lemma 1 is now defined as $O_{t+1} = \{Z_t, Y^{\mathrm{com}}_{t+1}\}$. The analysis of Section III leads to structural results and dynamic programming decompositions analogous to Theorems 2 and 3 with $\Pi_t$ now defined as

$$\Pi_t := \mathbb{P}^{g^{1:n}_{1:t-1}}(X_t, Y_t, M_t \mid C_t, Y^{\mathrm{com}}_t). \tag{54}$$

Using an argument similar to Lemma 2, we can show that the result of Theorem 4 is true for the above model with $\Pi^{\mathrm{new}}_t$ defined as

$$\Pi^{\mathrm{new}}_t := \mathbb{P}^{\hat g^{1:n}_{1:t-1}}(X_t, M_t \mid C_t, Y^{\mathrm{com}}_t). \tag{55}$$
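The two maps of Lemma 2, on which the argument above relies, are elementary when all spaces are finite: $\chi_t$ in (45) sums out the observation coordinate, and $\zeta_t$ in (46) reattaches it through the fixed observation kernel. The following sketch shows both under that finiteness assumption; the function names `chi` and `zeta`, the nested-list layout `pi[x][y][m]`, and the numbers in the example are illustrative only, with `y` and `m` standing for the joint observation and joint local memory of all controllers.

```python
def chi(pi):
    """(45): sum pi[x][y][m] over y to obtain pi_new[x][m]."""
    return [[sum(pi[x][y][m] for y in range(len(pi[x])))
             for m in range(len(pi[x][0]))]
            for x in range(len(pi))]


def zeta(pi_new, obs_kernel):
    """(46): pi[x][y][m] = pi_new[x][m] * P(Y = y | X = x).

    obs_kernel[x][y] is the fixed conditional distribution of Y given X = x.
    """
    return [[[pi_new[x][m] * obs_kernel[x][y] for m in range(len(pi_new[x]))]
             for y in range(len(obs_kernel[x]))]
            for x in range(len(pi_new))]


# Hypothetical numbers: 2 states, 3 observation values, 2 memory values.
pi_new = [[0.1, 0.2], [0.3, 0.4]]
obs_kernel = [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]]

pi_full = zeta(pi_new, obs_kernel)  # lift to a distribution on (x, y, m)
round_trip = chi(pi_full)           # marginalizing again recovers pi_new
```

Since each row of `obs_kernel` sums to one, `chi(zeta(pi_new, obs_kernel))` returns `pi_new` up to floating-point rounding, which is the finite-space counterpart of the statement that the two information states determine each other.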

C. Examples of the Generalized Model

1) Controllers with Identical Information: Consider the following special case of the above generalized model.

1) All controllers only make the common observation $Y^{\mathrm{com}}_t$; controllers have no local observation or local memory.

2) The shared memory at time $t$ is $C_t = Y^{\mathrm{com}}_{1:t-1}$. Thus, at time $t$, all controllers have identical information given as $\{C_t, Y^{\mathrm{com}}_t\} = Y^{\mathrm{com}}_{1:t}$.

3) After taking the action at time $t$, each controller sends $Z^i_t = Y^{\mathrm{com}}_t$ to the shared memory.

Recall that the coordinator's prescriptions $\Gamma^i_t$ in Section III are chosen from the set of functions from $\mathcal{Y}^i_t \times \mathcal{M}^i_t$ to $\mathcal{U}^i_t$. Since in this case $\mathcal{Y}^i_t = \mathcal{M}^i_t = \emptyset$, we interpret the coordinator's prescriptions as prescribed actions. That is, $\Gamma^i_t \in \mathcal{U}^i_t$. With this interpretation, the common information state becomes

$$\Pi_t := \mathbb{P}^{g^{1:n}_{1:t-1}}(X_t \mid Y^{\mathrm{com}}_{1:t}) \tag{56}$$

and the dynamic program of Theorem 3 becomes

$$V_T(\pi) = \inf_{\{u^i_T \in \mathcal{U}^i_T,\ 1 \le i \le n\}} \mathbb{E}\{l(X_T, u^1_T, \dots, u^n_T) \mid \Pi_T = \pi\}, \tag{57}$$

and for $1 \le t \le T-1$,

$$V_t(\pi) = \inf_{\{u^i_t \in \mathcal{U}^i_t,\ 1 \le i \le n\}} \mathbb{E}\{l(X_t, u^1_t, \dots, u^n_t) + V_{t+1}(\eta_t(\pi, u^1_t, \dots, u^n_t, Y^{\mathrm{com}}_{t+1})) \mid \Pi_t = \pi\}. \tag{58}$$

Since all the controllers have identical information, the above results correspond to the centralized dynamic program of Theorem 1 with a single controller choosing all the actions.

2) Coupled subsystems with control sharing information structure: Consider the following special case of the above generalized model.

1) The state of the system at time $t$ is an $(n+1)$-dimensional vector $X_t = (X^1_t, X^2_t, \dots, X^n_t, X^*_t)$, where $X^i_t$, $i = 1, \dots, n$, corresponds to the local state of subsystem $i$, and $X^*_t$ is a global state of the system.

2) The state update function is such that the global state evolves according to

$$X^*_{t+1} = f^*_t(X^*_t, U_t, N^0_t),$$

while the local state of subsystem $i$ evolves according to

$$X^i_{t+1} = f^i_t(X^i_t, X^*_t, U_t, N^i_t),$$

where $\{N^0_t, t = 1, \dots, T\}, \dots, \{N^n_t, t = 1, \dots, T\}$ are mutually independent i.i.d. noise processes that are independent of the initial state $X_1 = (X^1_1, X^2_1, \dots, X^n_1, X^*_1)$.

3) At time $t$, the common observation of all controllers is given by $Y^{\mathrm{com}}_t = X^*_t$.

4) At time $t$, the local observation of controller $i$ is given by $Y^i_t = X^i_t$, $i = 1, \dots, n$.

5) The shared memory at time $t$ is $C_t = \{X^*_{1:t-1}, U_{1:t-1}\}$. At each time $t$, after taking the action $U^i_t$, controller $i$ sends $Z^i_t = \{X^*_t, U^i_t\}$ to the shared memory.

The above special case corresponds to the model of coupled subsystems with control sharing considered in [39], where several applications of this model are also presented. It is shown in [39] that there is no loss of optimality in restricting attention to controllers with no local memory, i.e., $M^i_t = \emptyset$. With this additional restriction, the results of Theorems 1 and 2 apply for this model with $\Pi_t$ defined as

$$\Pi_t := \mathbb{P}^{g^{1:n}_{1:t-1}}(X^*_t, X^1_t, \dots, X^n_t \mid X^*_{1:t}, U_{1:t-1}).$$

Note that $\Pi_t$ can be evaluated from $X^*_t$ and $\mathbb{P}^{g^{1:n}_{1:t-1}}(X^1_t, \dots, X^n_t \mid X^*_{1:t}, U_{1:t-1})$. It is shown in [39] that $X^1_t, X^2_t, \dots, X^n_t$ are conditionally independent given $X^*_{1:t}, U_{1:t-1}$; hence the joint distribution $\mathbb{P}^{g^{1:n}_{1:t-1}}(X^1_t, \dots, X^n_t \mid X^*_{1:t}, U_{1:t-1})$ is a product of its marginal distributions.

3) Broadcast information structure: Consider the following special case of the above generalized model.

1) The state of the system at time $t$ is an $n$-dimensional vector $X_t = (X^1_t, X^2_t, \dots, X^n_t)$, where $X^i_t$, $i = 1, \dots, n$, corresponds to the local state of subsystem $i$. The first component $i = 1$ is special and called the central node. Other components, $i = 2, \dots, n$, are called peripheral nodes.

2) The state update function is such that the state of the central node evolves according to

$$X^1_{t+1} = f^1_t(X^1_t, U^1_t, N^1_t)$$

while the state of the peripheral nodes evolves according to

$$X^i_{t+1} = f^i_t(X^i_t, X^1_t, U^i_t, U^1_t, N^i_t)$$

where $\{N^i_t, i = 1, 2, \dots, n;\ t = 1, \dots\}$ are noise processes that are independent across time and independent of each other.

3) At time $t$, the common observation of all controllers is given by $Y^{\mathrm{com}}_t = X^1_t$.
