Dynamic Potential Games with Constraints: Fundamentals and Applications in Communications

Santiago Zazo, Member, IEEE, Sergio Valcarcel Macua, Student Member, IEEE, Matilde Sánchez-Fernández, Senior Member, IEEE, Javier Zazo

arXiv: v2 [cs.SY] 28 Dec 2015

Abstract: In a noncooperative dynamic game, multiple agents operating in a changing environment aim to optimize their utilities over an infinite time horizon. Time-varying environments allow us to model more realistic scenarios (e.g., mobile devices equipped with batteries, wireless communications over a fading channel, etc.). However, solving a dynamic game is a difficult task that requires dealing with multiple coupled optimal control problems. We focus our analysis on a class of problems, named dynamic potential games, whose solution can be found through a single multivariate optimal control problem. Our analysis generalizes previous studies by considering that the set of environment's states and the set of players' actions are constrained, as is required by most of the applications. We also show that the theoretical results are the natural extension of the analysis for static potential games. We apply the analysis and provide numerical methods to solve four key example problems, with different features each: i) energy demand control in a smart-grid network, ii) network flow optimization in which the relays have bounded link capacity and limited battery life, iii) uplink multiple access communication with users that have to optimize the use of their batteries, and iv) two optimal scheduling games with nonstationary channels.

Index Terms: Dynamic games, dynamic programming, game theory, multiple access, network flow, optimal control, resource allocation, scheduling, smart grid.

I. INTRODUCTION

GAME theory is a field of mathematics that studies conflict and cooperation between intelligent decision makers [1]. It has become a useful tool for modeling communication and networking problems, such as power control and resource sharing (see, e.g., [2]). In these problems, the strategies followed by the users (i.e., players) influence each other, and the actions have to be taken in a decentralized manner. However, one main assumption of classic game theory is that the users operate in a static environment, which is not influenced by the players' actions.
This assumption is unrealistic in many communication and networking problems. For instance, wireless devices have to maximize throughput while facing time-varying fading channels, and mobile devices may have to control their transmitter power while saving their battery level. These time-varying scenarios can be better modeled by dynamic games.

This work has been partly funded by the Spanish Ministry of Economy and Competitiveness under the grant TEC C3-1-R, by the Spanish Ministry of Science and Innovation with the project ELISA TEC C3-R3 and by an FPU doctoral grant to the fourth author. S. Zazo, S. Valcarcel Macua and J. Zazo are with the Signals, Systems & Radiocommunications Dept., Universidad Politécnica de Madrid. E-mail: {santiago,sergio}@gaps.ssr.upm.es, javier.zazo.ruiz@upm.es. M. Sánchez-Fernández is with the Signal Theory & Communications Department, Universidad Carlos III de Madrid. E-mail: mati@tsc.uc3m.es.

In a noncooperative dynamic game, the players compete in a time-varying environment. We assume this environment can be characterized by a deterministic discrete-time dynamical system equipped with a set of states and a Markovian state-transition equation. Each player has its own utility function, which depends on the current state of the system and the players' current actions. Both the state and action sets are subject to constraints. The state transitions induce a notion of time evolution in the game. Hence, we consider the general case wherein utilities, state-transition function and constraints can be nonstationary.

A dynamic game starts at an initial state. Then, the players take some action, based on the current state of the game, and receive some utility values. Then, the game moves to another state. This sequence of state transitions is repeated at every time step over a (possibly) infinite time horizon. We consider the case in which the aim of each player is to find the sequence of actions that maximizes its long-term cumulative utility, given the other players' sequences of actions. Thus, a game can be represented as a set of coupled optimal-control-problems (OCP), which are difficult to solve in general. Fortunately, there is a class of dynamic games, named dynamic potential games (DPG), that can be solved through a single multivariate-optimal-control-problem (MOCP).
The benefit of DPG is that solving a single MOCP is generally simpler than solving a set of coupled OCP (see [3] for a recent survey on DPG).

The pioneering work in the field of DPG is that of [4], later extended by [5] and [6]. There have been two main approaches to study DPG: the Euler-Lagrange equations and Pontryagin's maximum (or minimum) principle.

Recent analyses by [3] and [7] used the Euler-Lagrange equations with DPG in its reduced form. A game is in reduced form when it is possible to isolate the action from the state-transition equation, so that the action is expressed as a function of the current and future (i.e., after transition) states. Consider, for example, that the future state is linear in the current action. Then it is easy to invert the state-transition function and rewrite the problem in reduced form, with the action expressed as a function of the current and future states. However, in many cases it is not possible to find such a reduced form of the game. That is, we cannot isolate the action because the state-transition function is not invertible (e.g., when the state-transition function is quadratic in the action variable).

The more general case of DPG in nonreduced form was studied with the Pontryagin's maximum principle approach by [5] and [8], for discrete and continuous time models, respectively. However, in all these studies [3]–[8],
the games have been analyzed without explicitly considering constraints for the state and action sets. Other works that consider potential games with state dynamics include [9]–[11]. However, these references study the myopic problem in which the agents aim to maximize their immediate reward. This is different from DPG, where the agents aim to maximize their long-term utility by solving a control problem.

Dynamic games offer two kinds of possible analysis, based on the type of control that players use. These cases are normally referred to as open loop (OL) and closed loop (CL) game analysis.
- In the open-loop approach, in order to find the optimal action sequence, the players have to take into account the other players' action sequences.
- In a closed-loop approach, players find a strategy that is a function of the state, i.e., a mapping from states to actions. Thus, in order to find their optimal policies, they need to know the form of the other players' policy functions.

The OL analysis is, in general, more tractable than the CL analysis. Indeed, there are only a few known CL solutions, and only for simple games, such as the fish war example presented in [12], oligopolistic Cournot games [13], or quadratic games [14].

The main theoretical contribution of this work is to analyze DPG with constrained action and state sets, as is required by most applications. For example, in a network flow problem, the aggregated throughput of multiple users is bounded by the maximum link capacity. In cognitive radio, the aggregated power of all secondary users is bounded by the maximum interference allowed by the primary users. To do so, we apply the Euler-Lagrange equation to the Lagrangian (as is customary in the MOCP literature [15]), rather than to the utility function as done by earlier works [3] and [7]. Using the Lagrangian, we can formulate the optimality condition in the general nonreduced form (i.e., it is not necessary to isolate the action in the transition equation). In addition, we establish the existence of a suitable conservative vector field as an easily verifiable condition for a dynamic game to be of the potential type. To the best of our knowledge, this is a novel extension of the conditions established for static games by [16] and [17].
The second main contribution of this work is to show that the proposed framework can be applied to several communication and networking problems in a unified manner. We present four examples of increasing complexity.

First, we model the energy demand control in a smart grid network as a linear-quadratic-dynamic-game (LQDG). This scenario is illustrative because the analytical solution of an LQDG is known.

The second example is an optimal network flow problem, in which there are two levels of relay nodes equipped with finite batteries. The users aim to maximize their flow while optimizing the use of the nodes' batteries. This problem illustrates that, when the utilities have some separable form, it is straightforward to establish that the problem is a DPG. However, the analytical solution for this problem is unknown and we have to solve it numerically. Since all batteries will eventually deplete, the game will get stuck in this depletion state. Hence, we can approximate the infinite-horizon MOCP by an effective finite-horizon problem, which simplifies the numerical computation.

The third example is an uplink multiple access channel wherein the users' devices are also equipped with batteries (this example was introduced in the preliminary paper [18]). Again, the simple but more realistic extension of battery-usage optimization makes the game dynamic. In this example, instead of rewriting the utilities in a separable form, we perform a very general analysis to establish that the problem is a DPG.

The fourth example studies two decentralized scheduling problems, proportional fair and equal rate scheduling, where multiple users share a time-varying channel (see the preliminary paper [19]). This example shows how to use the proposed framework in its most general form:
- The problems are nonconcave and the utilities have a nonobvious separable form.
- The problem is nonstationary, with the state-transition equation changing with time.
- There is no reason that justifies a finite horizon approximation of the problem, so we have to solve it numerically with optimal control methods (e.g., dynamic programming).

Outline: Sec. II introduces the problem setting, its solution and the assumptions on which we base our analysis. In Sec.
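III, we review static potential games together with the instrumental notion of conservative vector field. In Sec. IV, we provide sufficient conditions for a dynamic game with constrained state and action sets to be a DPG, and show that a DPG can be solved through an equivalent MOCP. Sections V–VIII deal with application examples, the methods for solving them, and some illustrative simulations. We provide some conclusions in Sec. IX.

II. PROBLEM SETTING

Let $\mathcal{Q}\triangleq\{1,\dots,Q\}$ denote the set of players and let $\mathbb{X}\subseteq\mathbb{R}^S$ denote the set of states of the game. The dimensionality of the state set can differ from the number of players (i.e., possibly $S\neq Q$). At every time step $t$, the state vector of the game is represented by $x_t\triangleq(x_t^k)_{k=1}^S\in\mathbb{X}$. Every player $i\in\mathcal{Q}$ can be influenced only by a subset of states $\mathbb{X}_i\subseteq\mathbb{X}$.

The state space $\mathbb{X}$ is partitioned among players in the component domain. We define $\mathcal{X}(i)\subseteq\{1,\dots,S\}$ as the subset of indexes of state-vector components that influence player $i$. Then $x_{i,t}\triangleq(x_t^m)_{m\in\mathcal{X}(i)}$ indicates the value of the state vector for player $i$ at time $t$. This generality allows for games in which multiple players are affected by common components of the state vector (e.g., when they share a common resource). It also includes the particular case wherein they share no components. We also define $x_{-i,t}\triangleq(x_t^l)_{l\notin\mathcal{X}(i)}\in\mathbb{X}_{-i}$ as the vector of components that do not influence player $i$, for some subset $\mathbb{X}_{-i}\subseteq\mathbb{X}$.

Let $\mathbb{U}\subseteq\mathbb{R}^Q$ denote the set of actions of all players, and let $\mathbb{U}_i\subseteq\mathbb{R}$ stand for the subset of actions of player $i$, such that $\mathbb{U}\triangleq\prod_{i\in\mathcal{Q}}\mathbb{U}_i$. The extension to higher dimensional action sets (i.e., when $\mathbb{U}_i\subseteq\mathbb{R}^A$) is straightforward. We restrict to scalar actions only to simplify notation, and introduce the general case when some of the application examples need it.

We write $u_{i,t}\in\mathbb{U}_i$ for the action variable of player $i$ at time $t$, such that the vector $u_t\triangleq(u_{1,t},\dots,u_{Q,t})\in\mathbb{U}$ contains the actions of all players. We also define $u_{-i,t}\triangleq(u_{1,t},\dots,u_{i-1,t},u_{i+1,t},\dots,u_{Q,t})\in\mathbb{U}_{-i}\triangleq\prod_{j\in\mathcal{Q}:j\neq i}\mathbb{U}_j$ as the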
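vector of all players' actions except that of player $i$. Hence, by slightly abusing notation, we can rewrite $u_t=(u_{i,t},u_{-i,t})$.

The state transitions are determined by $f:\mathbb{X}\times\mathbb{U}\times\mathbb{N}\to\mathbb{X}$. The nonstationary Markovian dynamic equation of the game is $x_{t+1}=f(x_t,u_t,t)$, which can be split among components: $x_{t+1}^k=f^k(x_t,u_t,t)$, for $k=1,\dots,S$, such that $f\triangleq(f^k)_{k=1}^S$. The dynamics are Markovian because the transition to $x_{t+1}$ depends on the current state-action pair $(x_t,u_t)$, rather than on the whole history of state-action pairs $\{(x_0,u_0),\dots,(x_t,u_t)\}$. We remark that $f$ corresponds to a nonreduced form, i.e., in general there is no function $\phi$ such that $u_t=\phi(x_t,x_{t+1},t)$.

We include a vector of $C$ nonstationary constraints $g\triangleq(g^c)_{c=1}^C$, as is required by most applications, and define the sets
$$\mathbb{C}_t\triangleq\{\mathbb{X}\times\mathbb{U}\}\cap\{(x_t,u_t):g(x_t,u_t,t)\le 0\}\cap\{(x_t,u_t):x_{t+1}=f(x_t,u_t,t)\}.$$

Each player $i$ has its nonstationary utility function $\pi_i:\mathbb{X}_i\times\mathbb{U}\times\mathbb{N}\to\mathbb{R}$. At every time $t$, each player receives a utility value equal to $\pi_i(x_{i,t},u_{i,t},u_{-i,t},t)$. The aim of player $i$ is to find the sequence of actions $\{u_{i,0},\dots,u_{i,t},\dots\}$ that maximizes its long-term cumulative utility, given the other players' sequences of actions $\{u_{-i,0},\dots,u_{-i,t},\dots\}$. Thus, a discrete-time infinite-horizon noncooperative nonstationary Markovian dynamic game can be represented as a set of $Q$ coupled optimal control problems:
$$\begin{aligned}\mathcal{G}_1:\ \forall i\in\mathcal{Q}\quad\max_{\{u_{i,t}\}_{t=0}^{\infty}\in\prod_{t=0}^{\infty}\mathbb{U}_i}\ &\sum_{t=0}^{\infty}\beta^t\,\pi_i(x_{i,t},u_{i,t},u_{-i,t},t)\\ \text{s.t.}\ &x_{t+1}=f(x_t,u_t,t),\quad x_0\ \text{given},\quad g(x_t,u_t,t)\le 0\end{aligned}\tag{1}$$
where $0<\beta<1$ is the discount factor that bounds the cumulative utility (for simplicity, we define the same $\beta$ for every player). Since the players can share state-vector components, the constraints may affect every player's feasible region. Problem (1) is infinite-horizon because the reward is accumulated over infinitely many time steps. The solution concept for problem (1) in which we are interested is the Nash Equilibrium (NE) of the game, which is defined as follows.

Definition 1. A solution of problem (1), known as a Nash Equilibrium (NE), is a feasible sequence of actions $\{u_t^\star\}_{t=0}^{\infty}$ that satisfies the following condition for every player $i\in\mathcal{Q}$:
$$\sum_{t=0}^{\infty}\beta^t\pi_i(x_{i,t},u_{i,t}^\star,u_{-i,t}^\star,t)\ \ge\ \sum_{t=0}^{\infty}\beta^t\pi_i(x_{i,t},u_{i,t},u_{-i,t}^\star,t),\quad\forall\,\big(x_t,(u_{i,t},u_{-i,t}^\star)\big)\in\mathbb{C}_t\tag{2}$$

We consider the following assumptions:

Assumption 1. The utilities $\pi_i$ are twice continuously differentiable in $\mathbb{X}\times\mathbb{U}$.

Assumption 2. The state and action spaces, $\mathbb{X}$ and $\mathbb{U}$, are open and convex subsets of a real vector space.

Assumption 3.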
The state-transition function $f$ and the constraints $g$ are continuously differentiable in $\mathbb{X}\times\mathbb{U}$ and satisfy some regularity conditions.

In general, finding an NE of problem (1) is a difficult task because the utilities, dynamic equation and constraints of the individual optimal control problems (OCP) are coupled among players. However, when problem (1) is a DPG, we can solve it through an equivalent MOCP (as opposed to a set of coupled univariate OCP).

We use Assumptions 1–2 to obtain a verifiable condition for problem (1) to be a DPG. Assumption 3 is required to introduce the conditions that guarantee equivalence between the solution of the MOCP and an NE of the original DPG. In particular, we derive the KKT optimality conditions for both problems (namely the DPG and the MOCP). Hence, some regularity conditions are required to ensure that the KKT conditions hold at the optimal points and that feasible dual variables exist (see, e.g., [20, Sec. ], [21]). Examples of such conditions are Slater's, the linear independence of gradients, or the Mangasarian-Fromovitz constraint qualifications. Finally, in Sec. IV we introduce one further assumption to ensure existence of a solution to the MOCP and, hence, existence of an NE of the DPG. This equivalence between DPG and MOCP generalizes the well-studied (but simpler) case of static potential games [16], [17], which is reviewed in the following section.

III. OVERVIEW OF STATIC POTENTIAL GAMES

Static games are a simplified version of dynamic games, in the sense that there are neither states nor system dynamics. The aim of each player $i$, given the other players' actions $u_{-i}$, is to choose an action $u_i\in\mathbb{U}_i$ that maximizes its utility function:
$$\mathcal{G}_2:\ \forall i\in\mathcal{Q}\quad\max_{u_i\in\mathbb{U}_i}\ \pi_i(u_i,u_{-i})\quad\text{s.t.}\quad g(u)\le 0\tag{3}$$
The notation is that of dynamic games with the time-dependence subscript removed. Thus, $u_i\in\mathbb{U}_i$ refers to the action of player $i$, and $u_{-i}\triangleq(u_j)_{j\in\mathcal{Q}:j\neq i}$ is the set of actions of the rest of the agents, so that $u\triangleq(u_i,u_{-i})\in\mathbb{U}$ denotes the set of all players' actions. We assume $\mathbb{U}\subseteq\mathbb{R}^Q$ to be open and convex.

In general, finding (or even characterizing) the set of equilibrium points of problem (3) is difficult, e.g., in terms of existence or uniqueness. Fortunately, there are particular cases of this problem for which the analysis is greatly simplified. Potential games are one of these cases.

Definition 2. Let Assumptions 1–2 hold.
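Then, problem (3) is called a static potential game if there is a function $\Pi:\mathbb{U}\to\mathbb{R}$, named the potential, that satisfies the following condition for every player [17]:
$$\pi_i(u_i,u_{-i})-\pi_i(v_i,u_{-i})=\Pi(u_i,u_{-i})-\Pi(v_i,u_{-i}),\quad\forall u_i,v_i\in\mathbb{U}_i,\ \forall i\in\mathcal{Q}\tag{4}$$
Under Assumptions 1–2, it can be shown (see, e.g., [17, Lemma 4.4]) that a necessary and sufficient condition for a static game to be potential is the following:
$$\frac{\partial\pi_i(u)}{\partial u_i}=\frac{\partial\Pi(u)}{\partial u_i},\quad\forall i\in\mathcal{Q}\tag{5}$$
We can gain insight on potential games by relating (5) to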
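the concept of conservative vector field. The following lemma will be useful to this end.

Lemma 1. Let $F(u)=(F_1(u),\dots,F_Q(u))$ be a vector field with continuous derivatives defined over an open convex set $\mathbb{U}\subseteq\mathbb{R}^Q$. The following conditions on $F$ are equivalent:
1) There exists a scalar potential function $\Pi(u)$ such that $F(u)=\nabla\Pi(u)$, where $\nabla$ is the gradient.
2) The partial derivatives satisfy
$$\frac{\partial F_i(u)}{\partial u_j}=\frac{\partial F_j(u)}{\partial u_i},\quad\forall u\in\mathbb{U},\ \forall i,j=1,\dots,Q\tag{6}$$
3) Let $a$ be a fixed point of $\mathbb{U}$. For any piecewise smooth path $\xi$ joining $a$ with $u$, we have $\Pi(u)=\int_a^u F(\xi)\cdot d\xi$.
A vector field satisfying these conditions is called conservative.

Proof: See, e.g., [22, Theorems 1.4, 1.5 and 1.9].

Let us define a vector field whose components are the partial derivatives of the players' utilities:
$$F(u)\triangleq\left(\frac{\partial\pi_1(u)}{\partial u_1},\dots,\frac{\partial\pi_Q(u)}{\partial u_Q}\right)\tag{7}$$
Suppose that (7) can be written compactly as $F(u)=\nabla\Pi(u)$, so that Lemma 1.1 holds. Then $\partial\pi_i(u)/\partial u_i=\partial\Pi(u)/\partial u_i$, $\forall i\in\mathcal{Q}$, which is exactly condition (5). Since Lemma 1.1 is equivalent to Lemma 1.2, a necessary, sufficient and also easily verifiable condition for problem (3) to be a static potential game is:
$$\frac{\partial^2\pi_i(u)}{\partial u_i\,\partial u_j}=\frac{\partial^2\pi_j(u)}{\partial u_j\,\partial u_i},\quad\forall i,j\in\mathcal{Q}\tag{8}$$
Finally, Lemma 1.3 is useful because we can find the potential function $\Pi$, up to the additive constant $\Pi(a)$, by solving the line integral of the field:
$$\Pi(u)=\int_0^1\sum_{i\in\mathcal{Q}}\frac{\partial\pi_i(\xi(\lambda))}{\partial u_i}\,\frac{d\xi_i(\lambda)}{d\lambda}\,d\lambda\tag{9}$$
where $\xi(\lambda)\triangleq(\xi_i(\lambda))_{i\in\mathcal{Q}}$ is a piecewise smooth path in $\mathbb{U}$ that connects the initial and final conditions: $\xi(0)=a$, $\xi(1)=u$.

Once we have found $\Pi$, it can be seen [16] that necessary conditions for $u$ to be an equilibrium of the game (3) are also necessary conditions for the following optimization problem:
$$\mathcal{P}_1:\ \max_{u\in\mathbb{U}}\ \Pi(u)\quad\text{s.t.}\quad g(u)\le 0\tag{10}$$
Hence, optimization theorems concerning existence and convergence can be applied to game (3). In particular, reference [16] showed that the local maxima of the potential function are a subset of the NE of the game. Furthermore, in the case that all players' utilities are quasi-concave, the maximum is unique and coincides with the stable equilibrium of the game. This same approach can be extended to dynamic games. However, instead of an analogous optimization problem, a DPG will yield an analogous MOCP.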
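Conditions (8) and (9) lend themselves to a quick numerical sanity check. The sketch below is a minimal illustration under stated assumptions, not part of the original analysis. It uses central finite differences to test the symmetry condition (8) at a single point; a single-point check can refute the potential property but cannot prove it. It then approximates the line integral (9) along the straight path from $a$ to $u$ with the midpoint rule. The Cournot-type utilities $\pi_i(u)=u_i(\theta-\sum_j u_j)-c\,u_i^2$ and all function names are hypothetical examples chosen for illustration. For this game one potential is $\Pi(u)=\theta\sum_i u_i-\sum_i u_i^2-\sum_{i<j}u_iu_j-c\sum_i u_i^2$.

```python
from typing import Callable, List, Optional, Sequence

Utility = Callable[[Sequence[float]], float]


def partial(fn: Utility, u: Sequence[float], k: int, h: float = 1e-5) -> float:
    """Central finite-difference approximation of d fn / d u_k at u."""
    up, um = list(u), list(u)
    up[k] += h
    um[k] -= h
    return (fn(up) - fn(um)) / (2.0 * h)


def own_derivative(pi: Utility, k: int, h: float) -> Utility:
    """Return the map u -> d pi / d u_k, i.e., component k of the field (7)."""
    return lambda v: partial(pi, v, k, h)


def field(utilities: List[Utility], u: Sequence[float]) -> List[float]:
    """F(u) = (d pi_1/d u_1, ..., d pi_Q/d u_Q), eq. (7)."""
    return [partial(pi, u, i) for i, pi in enumerate(utilities)]


def satisfies_symmetry(utilities: List[Utility], u: Sequence[float],
                       tol: float = 1e-4) -> bool:
    """Check condition (8) at the single point u (evidence only, not a proof)."""
    q = len(utilities)
    for i in range(q):
        for j in range(i + 1, q):
            f_i = own_derivative(utilities[i], i, 1e-4)
            f_j = own_derivative(utilities[j], j, 1e-4)
            if abs(partial(f_i, u, j, 1e-3) - partial(f_j, u, i, 1e-3)) > tol:
                return False
    return True


def potential_line_integral(utilities: List[Utility], u: Sequence[float],
                            a: Optional[Sequence[float]] = None,
                            steps: int = 200) -> float:
    """Approximate (9), i.e. Pi(u) - Pi(a), along xi(lam) = a + lam * (u - a)."""
    a = [0.0] * len(u) if a is None else list(a)
    d = [ui - ai for ui, ai in zip(u, a)]  # d xi / d lam is constant on a straight path
    total = 0.0
    for s in range(steps):
        lam = (s + 0.5) / steps  # midpoint rule
        xi = [ai + lam * di for ai, di in zip(a, d)]
        total += sum(fk * dk for fk, dk in zip(field(utilities, xi), d)) / steps
    return total


def cournot_utility(i: int, theta: float = 10.0, c: float = 0.5) -> Utility:
    """Hypothetical example game: pi_i(u) = u_i (theta - sum_j u_j) - c u_i^2."""
    def pi(u: Sequence[float]) -> float:
        return u[i] * (theta - sum(u)) - c * u[i] ** 2
    return pi


if __name__ == "__main__":
    game = [cournot_utility(0), cournot_utility(1)]
    print(satisfies_symmetry(game, [1.0, 2.0]))                 # True
    print(round(potential_line_integral(game, [1.0, 2.0]), 6))  # 20.5
```

For this quadratic example, the central differences are exact up to round-off. The midpoint rule is also exact, because $F\cdot d\xi/d\lambda$ is linear in $\lambda$. The printed value therefore matches $\Pi(1,2)-\Pi(0,0)=20.5$. For non-polynomial utilities, the step sizes and the number of integration steps would need tuning.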
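IV. DYNAMIC POTENTIAL GAMES WITH CONSTRAINTS

This section introduces the main theoretical contribution of the paper. We establish conditions under which we can find an NE of problem (1) by solving an alternative MOCP, instead of having to solve the set of coupled infinite-horizon OCP with coupled constraints. First, we introduce the definition of a DPG and show conditions for problem (1) to belong to this class. Then, we introduce the alternative MOCP and prove that its solution is an NE of the game.

Definition 3. Problem (1) is called a DPG if there is a function $\Pi:\mathbb{X}\times\mathbb{U}\times\mathbb{N}\to\mathbb{R}$, named the potential, that satisfies the following condition for every player $i\in\mathcal{Q}$:
$$\sum_{t=0}^{\infty}\beta^t\big(\pi_i(x_{i,t},u_{i,t},u_{-i,t},t)-\pi_i(x_{i,t},v_{i,t},u_{-i,t},t)\big)=\sum_{t=0}^{\infty}\beta^t\big(\Pi(x_t,u_{i,t},u_{-i,t},t)-\Pi(x_t,v_{i,t},u_{-i,t},t)\big),\quad\forall x_t\in\mathbb{X},\ \forall u_{i,t},v_{i,t}\in\mathbb{U}_i\tag{11}$$

The potential function $\Pi$ is defined over the larger set $\mathbb{X}\times\mathbb{U}\times\mathbb{N}$. The local objective $\pi_i$, however, is only defined over its local subset $\mathbb{X}_i\times\mathbb{U}\times\mathbb{N}$. Therefore, we only have to check whether condition (11) is satisfied in each player's subset.

The following three lemmas give conditions under which problem (1) is a DPG (i.e., satisfies Definition 3).

Lemma 2. Problem (1) is a DPG if there exists some function $\Pi(x_t,u_t,t)$ that satisfies
$$\frac{\partial\pi_i(x_{i,t},u_t,t)}{\partial x_t^m}=\frac{\partial\Pi(x_t,u_t,t)}{\partial x_t^m},\qquad\frac{\partial\pi_i(x_{i,t},u_t,t)}{\partial u_{i,t}}=\frac{\partial\Pi(x_t,u_t,t)}{\partial u_{i,t}},\qquad\forall m\in\mathcal{X}(i),\ \forall i\in\mathcal{Q},\ t=0,\dots,\infty\tag{12}$$

Proof: We simply extend to dynamic games the argument for static games due to [16, Prop. 1]. From (12) and Assumption 1 we have:
$$\frac{\partial}{\partial x_t^m}\big(\Pi(x_t,u_t,t)-\pi_i(x_{i,t},u_t,t)\big)=0,\quad\forall m\in\mathcal{X}(i)\tag{13}$$
$$\frac{\partial}{\partial u_{i,t}}\big(\Pi(x_t,u_t,t)-\pi_i(x_{i,t},u_t,t)\big)=0\tag{14}$$
This means that the difference between the potential and each player's utility depends neither on $x_t^m$, $m\in\mathcal{X}(i)$, nor on $u_{i,t}$. Thus, we can express this difference as
$$\Pi(x_t,u_{i,t},u_{-i,t},t)-\pi_i(x_{i,t},u_{i,t},u_{-i,t},t)=\Theta_i(x_{-i,t},u_{-i,t},t),\quad\forall u_{i,t}\in\mathbb{U}_i\tag{15}$$
for some function $\Theta_i:\mathbb{X}_{-i}\times\mathbb{U}_{-i}\times\mathbb{N}\to\mathbb{R}$. Since (15) is satisfied for every $u_{i,t}\in\mathbb{U}_i$, we can subtract two versions of (15), with actions $u_{i,t}$ and $v_{i,t}$ in $\mathbb{U}_i$. Then, by rearranging terms and summing over all $t$, we obtain (11).

Condition (12) is usually difficult to check in practice because we do not know $\Pi$ beforehand. Fortunately, in some cases the players' utilities have a separable structure that lets us easily deduce that the game is of the potential type, as is explained in the following lemma.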

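Lemma 3. Problem (1) is a DPG if the utility function of every player $i\in\mathcal{Q}$ can be expressed as the sum of two terms: a term that is common to all players, plus another term that depends neither on its own action nor on its own state components:
$$\pi_i(x_{i,t},u_{i,t},u_{-i,t},t)=\Pi(x_t,u_{i,t},u_{-i,t},t)+\Theta_i(x_{-i,t},u_{-i,t},t)\tag{16}$$

Proof: By taking the partial derivatives of (16) we obtain (12). Therefore, we can apply Lemma 2 (see also [16, Prop. 1]).

However, expressing the utility in the separable structure (16) may be difficult. We need a more general framework that allows us to check whether problem (1) is a DPG when the players' utilities have a nonobvious separable structure. This framework is formally introduced in the following lemma.

Lemma 4. Problem (1) is a DPG if all players' utilities satisfy the following conditions, $\forall i,j\in\mathcal{Q}$, $\forall m\in\mathcal{X}(i)$, $\forall n\in\mathcal{X}(j)$:
$$\frac{\partial^2\pi_i(x_{i,t},u_t,t)}{\partial x_t^m\,\partial u_{j,t}}=\frac{\partial^2\pi_j(x_{j,t},u_t,t)}{\partial x_t^n\,\partial u_{i,t}}\tag{17}$$
$$\frac{\partial^2\pi_i(x_{i,t},u_t,t)}{\partial x_t^m\,\partial x_t^n}=\frac{\partial^2\pi_j(x_{j,t},u_t,t)}{\partial x_t^n\,\partial x_t^m}\tag{18}$$
$$\frac{\partial^2\pi_i(x_{i,t},u_t,t)}{\partial u_{i,t}\,\partial u_{j,t}}=\frac{\partial^2\pi_j(x_{j,t},u_t,t)}{\partial u_{j,t}\,\partial u_{i,t}}\tag{19}$$

Proof: Under Assumption 1, we can introduce the following vector field:
$$F\triangleq\left(\nabla_{x_1}\pi_1(x_{1,t},u_t,t),\dots,\nabla_{x_Q}\pi_Q(x_{Q,t},u_t,t),\ \frac{\partial\pi_1(x_{1,t},u_t,t)}{\partial u_{1,t}},\dots,\frac{\partial\pi_Q(x_{Q,t},u_t,t)}{\partial u_{Q,t}}\right)\tag{20}$$
where $\nabla_{x_i}\pi_i\triangleq\big(\partial\pi_i(x_{i,t},u_t,t)/\partial x_t^m\big)_{m\in\mathcal{X}(i)}$. From Lemma 2, we can express (20) as
$$F=\nabla\Pi(x_t,u_t,t)\tag{21}$$
From Assumption 2 and Lemma 1.1, we know that $F$ is conservative. Hence, Lemma 1.2 establishes that the second partial derivatives must satisfy (17)–(19).

Introduce the following MOCP:
$$\begin{aligned}\mathcal{P}_2:\ \max_{\{u_t\}_{t=0}^{\infty}\in\prod_{t=0}^{\infty}\mathbb{U}}\ &\sum_{t=0}^{\infty}\beta^t\,\Pi(x_t,u_t,t)\\ \text{s.t.}\ &x_{t+1}=f(x_t,u_t,t),\quad x_0\ \text{given},\quad g(x_t,u_t,t)\le 0\end{aligned}\tag{22}$$

Let us consider the following assumption, which is needed to establish equivalence between a DPG and the MOCP (22).

Assumption 4. The MOCP (22) has a nonempty solution set.

The following lemma, a standard result in optimal control theory, gives sufficient and easily verifiable conditions for Assumption 4.

Lemma 5. Let $\Pi:\mathbb{X}\times\mathbb{U}\times\mathbb{N}\to[-\infty,\infty)$ be a proper continuous function, and let any one of the following conditions hold for $t=1,\dots,\infty$:
1) The constraint sets $\mathbb{C}_t$ are bounded.
2) $\Pi(x_t,u_t,t)\to-\infty$ as $\|(x_t,u_t)\|\to\infty$ (coercive).
3) There exists a scalar $M$ such that the level sets, defined by $\{(x_t,u_t,t)\mid\Pi(x_t,u_t,t)\ge M\}_{t=1}^{\infty}$, are nonempty and bounded.
Then, $\forall x_0\in\mathbb{X}$, there exists an optimal sequence of actions $\{u_t^\star\}_{t=0}^{\infty}$ that is a solution to the MOCP (22).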
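Moreover, there exists an optimal policy $\varphi:\mathbb{X}\times\mathbb{N}\to\mathbb{U}$, which is a mapping from states to optimal actions. When applied over the state trajectory $\{x_t\}_{t=0}^{\infty}$, it provides an optimal sequence of actions $\{u_t^\star=\varphi(x_t,t)\}_{t=0}^{\infty}$.

Proof: Since $\Pi$ is proper, it has some nonempty level set. Since $\Pi$ is continuous, its bounded level sets are compact. Hence, we can use [23, Prop. ] (see also [23, Sections 1.2 and 3.6]) to establish existence of an optimal policy.

The main theoretical result of this work is that we can find an NE of a DPG by solving the MOCP (22). This is proved in the following theorem.

Theorem 1. If problem (1) is a DPG, under Assumptions 1–4, the solution of the MOCP (22) is an NE of (1) when the objective function of the MOCP is given by
$$\Pi(x_t,u_t,t)=\int_0^1\sum_{i\in\mathcal{Q}}\left(\sum_{m\in\mathcal{X}(i)}\frac{\partial\pi_i(\eta_i(\lambda),\xi(\lambda),t)}{\partial x^m}\,\frac{d\eta_m(\lambda)}{d\lambda}+\frac{\partial\pi_i(\eta_i(\lambda),\xi(\lambda),t)}{\partial u_i}\,\frac{d\xi_i(\lambda)}{d\lambda}\right)d\lambda\tag{23}$$
where:
- $\eta(\lambda)\triangleq(\eta_k(\lambda))_{k=1}^S$ and $\xi(\lambda)\triangleq(\xi_i(\lambda))_{i=1}^Q$ are piecewise smooth paths in $\mathbb{X}$ and $\mathbb{U}$;
- $\eta_i(\lambda)\triangleq(\eta_m(\lambda))_{m\in\mathcal{X}(i)}$;
- $(\eta(0),\xi(0))$ and $(\eta(1),\xi(1))$ correspond to the initial and final state-action conditions, respectively.

The usefulness of Theorem 1 is that, in order to find an NE of (1), we do not need to solve several coupled control problems. Instead, we can check whether (1) is a DPG (i.e., whether any one of Lemmas 2–4 holds). If so, we can find an NE by computing the potential function (23) and then solving the equivalent MOCP (22).

Proof: The proof is structured in five steps:
1) We compute the Euler equation of the Lagrangian of the dynamic game and derive the KKT optimality conditions. Assumption 3 is required to ensure that the KKT conditions hold at the optimal point and that there exist feasible dual variables [20, Prop. ].
2) We study when the necessary optimality conditions of the game become equal to those of the MOCP.
3) We show that having the same necessary optimality conditions is a sufficient condition for the dynamic game to be potential.
4) Having established that the dynamic game is a DPG, we show that the solution to the MOCP (whose existence is guaranteed by Assumption 4) is also an NE of the DPG.
5) We derive the per-stage utility of the MOCP as the potential function of a suitable vector field.
We proceed to explain the details.

First, for problem (1), introduce each player's Lagrangian, $\forall i\in\mathcal{Q}$:
$$\mathcal{L}_i(x,u,\lambda_i,\mu_i)=\sum_{t=0}^{\infty}\beta^t\Big(\pi_i(x_{i,t},u_t,t)+\lambda_{i,t}^{\top}\big(f(x_t,u_t,t)-x_{t+1}\big)+\mu_{i,t}^{\top}g(x_t,u_t,t)\Big)=\sum_{t=0}^{\infty}\beta^t\,\Phi_i(x_t,u_t,t,\lambda_{i,t},\mu_{i,t})\tag{24}$$
where $\lambda_{i,t}\triangleq(\lambda_{i,t}^k)_{k=1}^S$ and $\mu_{i,t}\triangleq(\mu_{i,t}^c)_{c=1}^C$ are the corresponding vectors of multipliers, and we introduced the shorthand:
$$\Phi_i(x_t,u_t,t,\lambda_{i,t},\mu_{i,t})\triangleq\pi_i(x_{i,t},u_t,t)+\lambda_{i,t}^{\top}\big(f(x_t,u_t,t)-x_{t+1}\big)+\mu_{i,t}^{\top}g(x_t,u_t,t)\tag{25}$$

The discrete-time Euler-Lagrange equations [15, Sec. 6.1] applied to each player's Lagrangian are given by:
$$\frac{\partial\Phi_i(x_{t-1},u_{t-1},t-1,\lambda_{i,t-1},\mu_{i,t-1})}{\partial x_t^m}+\beta\,\frac{\partial\Phi_i(x_t,u_t,t,\lambda_{i,t},\mu_{i,t})}{\partial x_t^m}=0,\quad\forall m\in\mathcal{X}(i)\tag{26}$$
$$\frac{\partial\Phi_i(x_{t-1},u_{t-1},t-1,\lambda_{i,t-1},\mu_{i,t-1})}{\partial u_{i,t}}+\beta\,\frac{\partial\Phi_i(x_t,u_t,t,\lambda_{i,t},\mu_{i,t})}{\partial u_{i,t}}=0\tag{27}$$
Note that (26)–(27) are the Euler-Lagrange equations in a more general form than the standard reduced form. As mentioned in Sec. II (see also, e.g., [15, Sec. 6.1], [3]), in the standard reduced form the current action can be expressed as a function of the current and future states, $u_t=\phi(x_t,x_{t+1},t)$, for some function $\phi:\mathbb{X}\times\mathbb{X}\times\mathbb{N}\to\mathbb{U}$. We introduced this general form of the Euler-Lagrange equations because such a function $\phi$ may not exist for an arbitrary state-transition function $f$.

By substituting (25) into (26)–(27), and adding the corresponding constraints, we obtain the KKT conditions of the game. They hold for every player $i\in\mathcal{Q}$, every state component $m\in\mathcal{X}(i)$, and all extra constraints:
$$\beta\left(\frac{\partial\pi_i(x_{i,t},u_t,t)}{\partial x_t^m}+\sum_{k=1}^S\lambda_{i,t}^k\frac{\partial f^k(x_t,u_t,t)}{\partial x_t^m}+\sum_{c=1}^C\mu_{i,t}^c\frac{\partial g^c(x_t,u_t,t)}{\partial x_t^m}\right)-\lambda_{i,t-1}^m=0\tag{28}$$
$$\frac{\partial\pi_i(x_{i,t},u_t,t)}{\partial u_{i,t}}+\sum_{k=1}^S\lambda_{i,t}^k\frac{\partial f^k(x_t,u_t,t)}{\partial u_{i,t}}+\sum_{c=1}^C\mu_{i,t}^c\frac{\partial g^c(x_t,u_t,t)}{\partial u_{i,t}}=0\tag{29}$$
$$x_{t+1}=f(x_t,u_t,t),\quad g(x_t,u_t,t)\le 0\tag{30}$$
$$\mu_{i,t}\ge 0,\quad\mu_{i,t}^{\top}g(x_t,u_t,t)=0\tag{31}$$

Second, we find the KKT conditions of the MOCP. To do so, we obtain the Lagrangian of (22):
$$\mathcal{L}_\Pi(x,u,\gamma,\delta)=\sum_{t=0}^{\infty}\beta^t\Big(\Pi(x_t,u_t,t)+\gamma_t^{\top}\big(f(x_t,u_t,t)-x_{t+1}\big)+\delta_t^{\top}g(x_t,u_t,t)\Big)\tag{32}$$
where $\gamma_t\triangleq(\gamma_t^k)_{k=1}^S$ and $\delta_t\triangleq(\delta_t^c)_{c=1}^C$ are the corresponding multipliers.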
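Again, from (32) we derive the Euler-Lagrange equations. Together with the corresponding constraints, they yield the KKT system of optimality conditions for all state components, $m=1,\dots,S$, and all actions, $i=1,\dots,Q$:
$$\beta\left(\frac{\partial\Pi(x_t,u_t,t)}{\partial x_t^m}+\sum_{k=1}^S\gamma_t^k\frac{\partial f^k(x_t,u_t,t)}{\partial x_t^m}+\sum_{c=1}^C\delta_t^c\frac{\partial g^c(x_t,u_t,t)}{\partial x_t^m}\right)-\gamma_{t-1}^m=0\tag{33}$$
$$\frac{\partial\Pi(x_t,u_t,t)}{\partial u_{i,t}}+\sum_{k=1}^S\gamma_t^k\frac{\partial f^k(x_t,u_t,t)}{\partial u_{i,t}}+\sum_{c=1}^C\delta_t^c\frac{\partial g^c(x_t,u_t,t)}{\partial u_{i,t}}=0\tag{34}$$
$$x_{t+1}=f(x_t,u_t,t),\quad g(x_t,u_t,t)\le 0\tag{35}$$
$$\delta_t\ge 0,\quad\delta_t^{\top}g(x_t,u_t,t)=0\tag{36}$$
Comparing (28)–(31) with (33)–(36), the MOCP (22) has the same optimality conditions as the game (1) only if the following conditions are satisfied $\forall i\in\mathcal{Q}$:
$$\frac{\partial\pi_i(x_{i,t},u_t,t)}{\partial x_t^m}=\frac{\partial\Pi(x_t,u_t,t)}{\partial x_t^m},\quad\forall m\in\mathcal{X}(i)\tag{37}$$
$$\frac{\partial\pi_i(x_{i,t},u_t,t)}{\partial u_{i,t}}=\frac{\partial\Pi(x_t,u_t,t)}{\partial u_{i,t}}\tag{38}$$
$$\lambda_{i,t}=\gamma_t,\quad\mu_{i,t}=\delta_t\tag{39}$$

Third, when conditions (37)–(38) are satisfied, Lemma 2 states that problem (1) is a DPG.

Fourth, note that condition (39) represents a feasible point of the game. The reason is that, if there exists an optimal primal variable, suitable regularity conditions guarantee the existence of dual variables in the MOCP. The existence of optimal primal variables of the MOCP is ensured by Assumption 4. The regularity conditions established by Assumption 3 then guarantee that there exist some $\gamma_t$ and $\delta_t$ that satisfy the KKT conditions of the MOCP. Substitute these dual variables of the MOCP in place of the individual $\lambda_{i,t}$ and $\mu_{i,t}$ in (28)–(31), for every $i\in\mathcal{Q}$. The result is a system of equations where the only unknowns are the user strategies. This system has exactly the same structure as the one already presented for the MOCP in the primal variables. Therefore, the MOCP primal solution also satisfies the KKT conditions of the DPG.

Indeed, it is straightforward to see that an optimal solution of the MOCP is also an NE of the game. Let $\{u_t^\star\}_{t=0}^{\infty}$ denote the MOCP solution, so that it satisfies the following inequality $\forall u_{i,t}\in\mathbb{U}_i$:
$$\sum_{t=0}^{\infty}\beta^t\,\Pi(x_t,u_{i,t}^\star,u_{-i,t}^\star,t)\ \ge\ \sum_{t=0}^{\infty}\beta^t\,\Pi(x_t,u_{i,t},u_{-i,t}^\star,t)\tag{40}$$
From Definition 3, we conclude that the MOCP optimal solution is also an NE of game (1). The opposite may not be true in general. Indeed, this solution, in which dual variables are shared between players, is only a subclass of the possible NE of the game. Nevertheless, other NE that do not share this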
property have been referred to as unstable by [16] for static games.

Fifth, although we have shown that we can find an NE of the DPG by solving a MOCP, we still need to find the objective of the MOCP. In order to find $\Pi$, we deduce from (37), (38), (20) and (21) that the vector field (20) can be expressed as
$$F=\nabla\Pi(x_t,u_t,t)\tag{41}$$
Lemma 1 establishes that $F$ is conservative. Thus, the objective of the MOCP is the potential of the field, which can be computed through the line integral (23).

In the next sections, we show how to apply this methodology of solving DPG through an equivalent MOCP to different practical problems.

V. ENERGY DEMAND IN THE SMART GRID AS A LINEAR QUADRATIC DYNAMIC GAME

Our first example consists of a linear-quadratic-dynamic-game (LQDG) that solves a smart grid resource allocation problem. LQDG are convenient because they are amenable to analytical and closed-form solutions [24, Ch. 6]. Our analysis is novel, though: to the best of our knowledge, LQDG have not been studied under the easier DPG framework before.

A. Energy demand control DPG and equivalent MOCP

Consider a community of $Q$ users (i.e., players) that use the smart grid resources in different activities, like communications, heating, lighting, home appliances or production needs. Suppose that the electrical grid has $S$ types of energy resources, such as rechargeable batteries, coal, fuel, hydroelectric power or biomass. The state of the game $x_t\in\mathbb{R}^S$ is the total amount of overall resources in the smart grid at time $t$. All players share all components of the state vector (i.e., $\mathbb{X}_i=\mathbb{X}$ and $\mathcal{X}(i)=\{1,\dots,S\}$, $\forall i\in\mathcal{Q}$). The amount of resources consumed or contributed by player $i$ at time $t$ is denoted by the action vector $u_{i,t}\in\mathbb{R}^A$, where $A$ is the number of activities. The expenditure and contribution of each player is weighted by a matrix $B_i\in\mathbb{R}^{S\times A}$. Also, resources can be autonomously recharged or depleted, which is modeled by a shared matrix $C\in\mathbb{R}^{S\times S}$. Thus, the state transition of the system is $f(x_t,u_t)=Cx_t+\sum_{i\in\mathcal{Q}}B_iu_{i,t}$.

We consider two cost terms: unsatisfied demand and unbalanced resources. Given the available resources $x_t$, every player has a target demand $D_ix_t$ that it wants to satisfy, for some demand matrix $D_i\in\mathbb{R}^{A\times S}$.
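The disutility from an unsatisfied demand is modeled by the quadratic form $(D_ix_t-u_{i,t})^{\top}Q_i(D_ix_t-u_{i,t})$, with demand cost matrix $Q_i\in\mathbb{R}^{A\times A}$. In addition, the available resources should be just enough to satisfy the demand. There is a cost for having too few resources (e.g., a productivity decrease) or too many (e.g., storage costs). This cost can be modeled as another quadratic form, $(x_t-x_{t-1})^{\top}R\,(x_t-x_{t-1})$, with unbalanced resources cost matrix $R\in\mathbb{R}^{S\times S}$. To pose the game as a maximization problem, we assume $\{Q_i\}_{i\in\mathcal{Q}}$ to be negative definite matrices, and $R$ a negative semidefinite matrix (represented by $Q_i\prec 0$ and $R\preceq 0$). The dynamic energy demand control game is given by the following coupled optimal control problems:
$$\begin{aligned}\mathcal{G}_3:\ \forall i\in\mathcal{Q}\quad\max_{\{u_{i,t}\}_{t=0}^{\infty}\in\prod_{t=0}^{\infty}\mathbb{U}_i}\ &\sum_{t=0}^{\infty}\beta^t\Big((x_t-x_{t-1})^{\top}R\,(x_t-x_{t-1})+(D_ix_t-u_{i,t})^{\top}Q_i(D_ix_t-u_{i,t})\Big)\\ \text{s.t.}\ &x_{t+1}=Cx_t+\sum_{i\in\mathcal{Q}}B_iu_{i,t},\quad x_0\ \text{given}\end{aligned}\tag{42}$$
By defining augmented state and action vectors,
$$\tilde x_t\triangleq\begin{bmatrix}x_t\\ x_{t-1}\end{bmatrix},\qquad\tilde u_{i,t}\triangleq D_ix_t-u_{i,t}\tag{43}$$
we can rewrite (42) in the standard linear-quadratic form:
$$\begin{aligned}\max_{\{\tilde u_{i,t}\}_{t=0}^{\infty}}\ &\sum_{t=0}^{\infty}\beta^t\big(\tilde x_t^{\top}\tilde R\,\tilde x_t+\tilde u_{i,t}^{\top}Q_i\tilde u_{i,t}\big)\\ \text{s.t.}\ &\tilde x_{t+1}=\tilde A\tilde x_t-\sum_{i\in\mathcal{Q}}\tilde B_i\tilde u_{i,t},\quad\tilde x_0\ \text{given}\end{aligned}\tag{44}$$
where
$$\tilde A\triangleq\begin{bmatrix}C+\sum_{i\in\mathcal{Q}}B_iD_i & 0_{S\times S}\\ I_S & 0_{S\times S}\end{bmatrix},\qquad\tilde R\triangleq\begin{bmatrix}R & -R\\ -R & R\end{bmatrix}\tag{45}$$
$$\tilde B_i\triangleq\begin{bmatrix}B_i\\ 0_{S\times A}\end{bmatrix}\tag{46}$$
and where $I_S$ and $0_{S\times S}$ denote the identity and null matrices of size $S\times S$, respectively ($0_{S\times A}$ is the null matrix of size $S\times A$).

LQDG in the form (44) have been presented in [24, Ch. 6]. There, an NE is found in three steps: solve the system of coupled finite-horizon OCP, find the limit of this solution as the horizon tends to infinity, and then verify that this limiting solution provides an NE of the infinite-horizon game. Here we follow a different and simpler approach. First, we show that problem (44) can be expressed in the separable form (16):
$$\pi_i(\tilde x_t,\tilde u_t)=\tilde x_t^{\top}\tilde R\,\tilde x_t+\tilde u_{i,t}^{\top}Q_i\tilde u_{i,t}=\tilde x_t^{\top}\tilde R\,\tilde x_t+\sum_{p\in\mathcal{Q}}\tilde u_{p,t}^{\top}Q_p\tilde u_{p,t}-\sum_{j\in\mathcal{Q}:j\neq i}\tilde u_{j,t}^{\top}Q_j\tilde u_{j,t}\tag{47}$$
We identify the potential and separable functions in (47):
$$\Pi(\tilde x_t,\tilde u_t,t)=\tilde x_t^{\top}\tilde R\,\tilde x_t+\sum_{p\in\mathcal{Q}}\tilde u_{p,t}^{\top}Q_p\tilde u_{p,t}\tag{48}$$
$$\Theta_i(\tilde u_{-i,t},t)=-\sum_{j\in\mathcal{Q}:j\neq i}\tilde u_{j,t}^{\top}Q_j\tilde u_{j,t}\tag{49}$$
From Lemma 3, we conclude that problem (42) is a DPG. Note also that Assumptions 1–2 hold. Moreover, the objective in (44) is concave and the state dynamics (the only equality constraint) are linear. Therefore, Slater's constraint qualification is satisfied and Assumption 3 holds. In addition, the matrices $Q_i\prec 0$, $\forall i\in\mathcal{Q}$, and $R\preceq 0$ make the potential (48) coercive. Hence, Lemma 5 states that Assumption 4 is satisfied. Since Assumptions 1–4 hold, Theorem 1 establishes that we can find an NE of (42) by solving an equivalent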
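MOCP:
$$\begin{aligned}\mathcal{P}_3:\ V(\tilde x_0)=\max_{\{\tilde u_t\}_{t=0}^{\infty}}\ &\sum_{t=0}^{\infty}\beta^t\Big(\tilde x_t^{\top}\tilde R\,\tilde x_t+\sum_{p\in\mathcal{Q}}\tilde u_{p,t}^{\top}Q_p\tilde u_{p,t}\Big)\\ \text{s.t.}\ &\tilde x_{t+1}=\tilde A\tilde x_t-\sum_{i\in\mathcal{Q}}\tilde B_i\tilde u_{i,t},\quad\tilde x_0\ \text{given}\end{aligned}\tag{50}$$
where the cumulative objective function $V$ is known as the value function in the optimal control literature (see, e.g., [23]). Let $\tilde u_t\triangleq(\tilde u_{i,t})_{i\in\mathcal{Q}}$ be the vector of all players' augmented actions. Aggregate all players' demand matrices in a block diagonal matrix $Q\triangleq\mathrm{diag}(Q_1,\dots,Q_Q)$ of size $QA\times QA$. Aggregate all players' expenditure weighting matrices in a $2S\times QA$ thick matrix $\tilde B\triangleq[\tilde B_1,\dots,\tilde B_Q]$. Then, we can rewrite the value and transition functions as follows:
$$V(\tilde x_0)=\max_{\{\tilde u_t\}_{t=0}^{\infty}}\sum_{t=0}^{\infty}\beta^t\big(\tilde u_t^{\top}Q\,\tilde u_t+\tilde x_t^{\top}\tilde R\,\tilde x_t\big)\tag{51}$$
$$\tilde x_{t+1}=\tilde A\tilde x_t-\tilde B\tilde u_t\tag{52}$$

B. Analytical solution to the MOCP and simulation results

It is well known that the value function satisfies a recursive relationship, known as the Bellman equation (see, e.g., [23]):
$$V(\tilde x_t)=\max_{\tilde u_t}\Big(\tilde u_t^{\top}Q\,\tilde u_t+\tilde x_t^{\top}\tilde R\,\tilde x_t+\beta\,V(\tilde x_{t+1})\Big)\tag{53}$$
Moreover, for an LQ control problem, it is known [24, Ch. 6] that the optimal value function can be expressed as a quadratic form of the state:
$$V(\tilde x_t)=\tilde x_t^{\top}P\,\tilde x_t\tag{54}$$
for some negative semidefinite matrix $P$. We can use (54) to find a closed-form expression for the sequence of optimal actions as follows. Substituting (52) and (54) into (53) gives:
$$V(\tilde x_t)=\max_{\tilde u_t}\Big(\tilde u_t^{\top}Q\,\tilde u_t+\tilde x_t^{\top}\tilde R\,\tilde x_t+\beta\big(\tilde A\tilde x_t-\tilde B\tilde u_t\big)^{\top}P\big(\tilde A\tilde x_t-\tilde B\tilde u_t\big)\Big)\tag{55}$$
Now, we just have to maximize (55) over $\tilde u_t$. Since $Q$ and $P$ are negative definite and negative semidefinite matrices, respectively, the objective of (55) is strictly concave in $\tilde u_t$. Hence, a necessary and sufficient condition for the maximum is
$$\nabla_{\tilde u_t}\Big(\tilde u_t^{\top}Q\,\tilde u_t+\beta\big(\tilde A\tilde x_t-\tilde B\tilde u_t\big)^{\top}P\big(\tilde A\tilde x_t-\tilde B\tilde u_t\big)\Big)=2Q\tilde u_t-2\beta\tilde B^{\top}P\big(\tilde A\tilde x_t-\tilde B\tilde u_t\big)=0\tag{56}$$
From (56), we obtain an analytical expression for the optimal action at any time step:
$$\tilde u_t^\star=\beta\big(Q+\beta\tilde B^{\top}P\tilde B\big)^{-1}\tilde B^{\top}P\tilde A\,\tilde x_t\tag{57}$$
If we are also interested in finding the optimal value, we can substitute (57) into (55) and isolate $P$:
$$P=\tilde R+\beta\tilde A^{\top}P\tilde A-\beta^2\tilde A^{\top}P\tilde B\big(Q+\beta\tilde B^{\top}P\tilde B\big)^{-1}\tilde B^{\top}P\tilde A\tag{58}$$
Note that (58) is a discrete algebraic Riccati equation. It is known to be a contraction mapping if $Q\prec 0$, $\tilde R\preceq 0$ and the spectral radius of $\tilde A$ is smaller than one [25, Ch. 5] (the analysis can be performed under weaker conditions, though [23], [26]).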
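The change of variables (43)–(46) and the stacked matrix $\tilde B$ of (52) are easy to get wrong by a sign or a block. A minimal numerical check can confirm that they reproduce the original model (42). The sketch below is illustrative only and rests on these assumptions:
- it uses NumPy;
- the dimensions are small and hypothetical, and the matrices are drawn at random (this is not the simulation setup described next);
- `augment` is a hypothetical helper name.

It verifies that one step of (52) returns $[x_{t+1};x_t]$, and that $\tilde x_t^{\top}\tilde R\,\tilde x_t$ equals the unbalanced-resources cost $(x_t-x_{t-1})^{\top}R\,(x_t-x_{t-1})$.

```python
import numpy as np


def augment(C, B, D, R):
    """Build A~ and R~ of (45) and the stacked B~ = [B~_1, ..., B~_Q] used in (52)."""
    S = C.shape[0]
    A_aug = np.zeros((2 * S, 2 * S))
    A_aug[:S, :S] = C + sum(B_i @ D_i for B_i, D_i in zip(B, D))
    A_aug[S:, :S] = np.eye(S)  # lower block copies x_t into the slot of x_{t-1}
    R_aug = np.block([[R, -R], [-R, R]])
    # Each B~_i = [B_i; 0_{S x A}] as in (46); stacking them side by side gives 2S x QA.
    B_aug = np.hstack([np.vstack([B_i, np.zeros_like(B_i)]) for B_i in B])
    return A_aug, R_aug, B_aug


rng = np.random.default_rng(0)
S, A, Q = 4, 6, 3                 # hypothetical small sizes, chosen only for this check
C = rng.standard_normal((S, S))
B = [rng.standard_normal((S, A)) for _ in range(Q)]
D = [rng.standard_normal((A, S)) for _ in range(Q)]
R_n = rng.uniform(0.0, 5.0, (S, S))
R = -R_n.T @ R_n                  # negative semidefinite

x_prev, x = rng.standard_normal(S), rng.standard_normal(S)
u = [rng.standard_normal(A) for _ in range(Q)]

# Original model (42): x_{t+1} = C x_t + sum_i B_i u_{i,t}
x_next = C @ x + sum(B_i @ u_i for B_i, u_i in zip(B, u))

# Augmented model (52): x~_{t+1} = A~ x~_t - B~ u~_t, with u~_{i,t} = D_i x_t - u_{i,t}
A_aug, R_aug, B_aug = augment(C, B, D, R)
x_aug = np.concatenate([x, x_prev])
u_aug = np.concatenate([D_i @ x - u_i for D_i, u_i in zip(D, u)])
x_aug_next = A_aug @ x_aug - B_aug @ u_aug

print(np.allclose(x_aug_next[:S], x_next), np.allclose(x_aug_next[S:], x))
print(np.isclose(x_aug @ R_aug @ x_aug, (x - x_prev) @ R @ (x - x_prev)))
```

The minus sign in (44) and (52) comes from substituting $u_{i,t}=D_ix_t-\tilde u_{i,t}$ into the original transition. This substitution is also why $\sum_i B_iD_i$ appears in the upper-left block of $\tilde A$.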
When (58) is a contraction, it has a unique solution $P^\star$. This solution can be approximated by iterating the following fixed-point equation, such that $\lim_{n\to\infty}P_n=P^\star$:
$$P_{n+1}=\tilde R+\beta\tilde A^{\top}P_n\tilde A-\beta^2\tilde A^{\top}P_n\tilde B\big(Q+\beta\tilde B^{\top}P_n\tilde B\big)^{-1}\tilde B^{\top}P_n\tilde A\tag{59}$$

We have simulated the smart grid model with the following setup:
- $Q=8$ players, $S=4$ resources, and $A=6$ activities for every player.
- Random negative definite matrices $Q_i$, $\forall i\in\mathcal{Q}$, and a random negative semidefinite matrix $R$. To build these matrices, we first build an intermediate matrix (e.g., $R_n$) by drawing random numbers from a uniform distribution, with support $[0,1]$ for $Q_i$ and $[0,5]$ for $R$, and then compute $R=-R_n^{\top}R_n$.
- Matrices $C$, $B_i$ and $D_i$ are also random, with elements drawn from the spherical normal distribution.
- The initial state is a vector of ones, and the discount factor is $\beta=0.9$.

Fig. 1 (top) shows the instant utilities per player over time. Recall that the utilities have been defined as negative costs. Therefore, each player's utility starts at a negative value and converges to zero with time. This behavior illustrates that all players attain an NE in which they are able to satisfy their demand while holding just enough available resources. Fig. 1 (bottom) shows the evolution of the part of the cost corresponding to the individual coefficients $\tilde u_{i,t}=D_ix_t-u_{i,t}$. These coefficients represent the mismatch between the target demand, $D_ix_t$, and the actual player activities, $u_{i,t}$. We can see that the agents adjust their actions $u_{i,t}$ to satisfy the target demand. The equilibrium between target demand and players' activities is an expected consequence of the stability of the LQ game in infinite horizon [24, Ch. 6].

[Figure: two panels versus time for Users 1–8; top panel: utilities; bottom panel: decision coefficients.]
Fig. 1. Dynamic smart grid scenario with Q = 8 players. (Top) Instant utility values of players. (Bottom) Players' decision coefficients evolution in time.
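The fixed-point iteration (59) and the closed-form action (57) take only a few lines of code. The sketch below assumes NumPy. The function names and the small two-dimensional test system are hypothetical choices for illustration; they are not the simulation setup of Fig. 1.

The sketch works in three steps:
1) Iterate (59) from $P_0=0$ until successive iterates differ by less than a tolerance.
2) Apply (57) in closed loop.
3) Check that the resulting discounted return matches $\tilde x_0^{\top}P\,\tilde x_0$, as (54) predicts.

The helper `random_negative_definite` mirrors the $-M^{\top}M$ construction described above.

```python
import numpy as np


def riccati_fixed_point(A, B, Q, R, beta, tol=1e-10, max_iter=10_000):
    """Iterate (59) from P_0 = 0 until successive iterates differ by less than tol."""
    P = np.zeros_like(R)
    for _ in range(max_iter):
        K = Q + beta * B.T @ P @ B  # negative definite when Q < 0 and P <= 0
        P_next = (R + beta * A.T @ P @ A
                  - beta**2 * A.T @ P @ B @ np.linalg.solve(K, B.T @ P @ A))
        P_next = 0.5 * (P_next + P_next.T)  # remove round-off asymmetry
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("Riccati iteration (59) did not converge")


def optimal_action(A, B, Q, P, beta, x):
    """Closed-form maximizer (57): u* = beta (Q + beta B^T P B)^{-1} B^T P A x."""
    return beta * np.linalg.solve(Q + beta * B.T @ P @ B, B.T @ P @ A @ x)


def random_negative_definite(n, high, rng):
    """-M^T M with M uniform on [0, high), mirroring the construction in the text."""
    M = rng.uniform(0.0, high, size=(n, n))
    return -M.T @ M


# Hypothetical 1-resource, 1-player system already in the augmented form (45)-(46),
# with C + B D = 0.7, so the spectral radius of A~ is 0.7 < 1.
A = np.array([[0.7, 0.0], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
R = np.array([[-1.0, 1.0], [1.0, -1.0]])  # [[R, -R], [-R, R]] with R = -1
Q = np.array([[-1.0]])
beta = 0.9

P = riccati_fixed_point(A, B, Q, R, beta)

# Discounted return of the closed-loop policy (57), truncated at 400 steps.
x0 = np.array([1.0, 0.0])
x, ret = x0.copy(), 0.0
for t in range(400):
    u = optimal_action(A, B, Q, P, beta, x)
    ret += beta**t * (u @ Q @ u + x @ R @ x)
    x = A @ x - B @ u

print(np.isclose(ret, x0 @ P @ x0))
```

Solving a linear system with `np.linalg.solve` avoids forming the inverse in (57) and (59) explicitly. This is the usual choice for numerical stability. The truncation at 400 steps is harmless here because $0.9^{400}$ is negligible.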

VI. NETWORK FLOW CONTROL: INFINITE HORIZON APPROXIMATED BY A FINITE HORIZON DYNAMIC GAME

Several works (see, e.g., [27]–[30]) have considered network flow control as an optimization problem wherein each source is characterized by a utility function that depends on its transmission rate, and the goal is to maximize the aggregated utility. We generalize the standard model by considering that the nodes are equipped with batteries that are depleted proportionally to the outgoing flow. In addition, we consider several layers of relay nodes, each one with multiple links, so there are several paths between source and destination. When the batteries are completely depleted, no more transmissions are allowed and the game is over. Hence, although we formulate the problem as an infinite horizon dynamic game, the effective time horizon before the batteries are depleted is finite. This problem has no known analytical solution, but the utilities are concave. Therefore, the finite horizon approximation is convenient because we can solve an equivalent concave optimization problem, significantly reducing the computational load with respect to other optimal control algorithms (e.g., dynamic programming).

A. Network flow control dynamic game and equivalent MOCP

Let $u_{q,t}^a$ denote the flow along path $a$ for user $q$ at time $t$. Suppose there are $A_q$ possible paths for each player $q\in\mathcal{Q}$, so that $u_{q,t}\triangleq(u_{q,t}^a)_{a=1}^{A_q}$ denotes the $q$-th player's action vector. Let $A=\sum_{q\in\mathcal{Q}}A_q$ denote the total number of available paths. Suppose there are $S$ relay nodes, and let $x_t^k$ denote the battery level of relay node $k$. The state of the game is given by $x_t\triangleq(x_t^k)_{k=1}^S$, and all players share all components of the state vector, i.e., $x_{q,t}=x_t$ and $\mathcal{X}_q=\{1,\ldots,S\}$, $\forall q\in\mathcal{Q}$. The battery level evolves according to the following state-transition equation for all components $k=1,\ldots,S$:

$$x_{t+1}^k=x_t^k-\delta\sum_{u_{q,t}^a\in\mathcal{F}_k}u_{q,t}^a,\qquad x_0^k=B_{\max}^k\qquad(60)$$

where $\mathcal{F}_k$ denotes the subset of flows through node $k$, $B_{\max}^k$ is a positive scalar that stands for the maximum battery level of node $k$, and $\delta$ is a proportionality factor. Similar to the standard static flow control problem, each player intends to maximize a concave function $\Gamma_q:\mathbb{U}_q\to\mathbb{R}$ of the sum of rates across all its available paths.
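As a minimal illustration of the battery dynamics (60), the following Python sketch applies one transition step. The encoding of $\mathcal{F}_k$ as a set of hypothetical `(player, path)` labels is our assumption; the sketch does not enforce $x_t^k\ge0$, which in (61) is imposed through the players' actions.

```python
def battery_step(x, flows, flows_through, delta):
    """One step of (60): x^k_{t+1} = x^k_t - delta * (sum of the flows in F_k).

    x             -- battery levels, one entry per relay node k
    flows         -- dict mapping a hypothetical (player, path) label to u^a_{q,t}
    flows_through -- flows_through[k] is the set F_k of labels routed through node k
    delta         -- proportionality (depletion) factor
    """
    return [x_k - delta * sum(flows[label] for label in F_k)
            for x_k, F_k in zip(x, flows_through)]


# Two relay nodes; flows (1, 1) and (2, 1) cross node 1, flow (1, 2) crosses node 2.
x1 = battery_step([1.0, 1.0],
                  {(1, 1): 0.2, (1, 2): 0.1, (2, 1): 0.3},
                  [{(1, 1), (2, 1)}, {(1, 2)}],
                  delta=0.5)
# x1 is approximately [0.75, 0.95]
```

Iterating `battery_step` under a fixed flow profile until some entry reaches zero gives the effective horizon that motivates the finite-horizon approximation discussed below.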
This function $\Gamma_q$ can take different forms depending on the scenario under study, like the square root [31] or a capacity form. In addition to the transmission rate, we include the relay nodes' battery levels in each player's utility, weighted by some positive parameter $\alpha$. The combination of these two objectives can be understood as the player aiming to maximize its total transmission rate while saving the batteries of the relays. There is also a capacity constraint on the maximum aggregated rate at every relay and destination node. Let $c_{\max}\in\mathbb{R}^L$ denote the vector of maximum capacities, where $L$ is the number of relays plus destination nodes. Let $M=[m_{la}]$ denote the $L\times A$ matrix that defines the aggregated flows for each relay and destination node, such that element $m_{la}=1$ if flow $a$ is aggregated in node $l$, and $m_{la}=0$ otherwise. The dynamic network flow control game is given by the following set of coupled OCPs:

$$\mathcal{G}_4:\ \forall q\in\mathcal{Q}:\ \underset{\{u_{q,t}\}_{t=0}^{\infty}\in\mathbb{U}_q}{\text{maximize}}\ \sum_{t=0}^{\infty}\beta^t\Big(\Gamma_q\Big(\sum_{a=1}^{A_q}u_{q,t}^a\Big)+\alpha\sum_{k=1}^{S}x_t^k\Big)$$
$$\text{s.t.}\quad x_{t+1}^k=x_t^k-\delta\sum_{u_{q,t}^a\in\mathcal{F}_k}u_{q,t}^a,\quad x_0^k=B_{\max}^k,\quad 0\le x_t^k\le B_{\max}^k,\quad Mu_t\le c_{\max},\quad 0\le u_{q,t}^a,\quad k=1,\ldots,S,\ a=1,\ldots,A_q\qquad(61)$$

Note that each player's utility can be expressed in separable form:

$$\pi_q(x_t,u_t,t)\triangleq\Gamma_q\Big(\sum_{a=1}^{A_q}u_{q,t}^a\Big)+\alpha\sum_{k=1}^{S}x_t^k=\sum_{p\in\mathcal{Q}}\Gamma_p\Big(\sum_{a=1}^{A_p}u_{p,t}^a\Big)+\alpha\sum_{k=1}^{S}x_t^k-\sum_{j\in\mathcal{Q}:j\neq q}\Gamma_j\Big(\sum_{a=1}^{A_j}u_{j,t}^a\Big)\qquad(62)$$

Therefore, Lemma 3 establishes that problem (61) is a DPG, with potential function given by

$$\Pi(x_t,u_t,t)=\sum_{q\in\mathcal{Q}}\Gamma_q\Big(\sum_{a=1}^{A_q}u_{q,t}^a\Big)+\alpha\sum_{k=1}^{S}x_t^k\qquad(63)$$

Before applying Theorem 1, we have to check whether Assumptions 1–4 are satisfied. We follow [31] and choose $\Gamma_q(\cdot)\triangleq\sqrt{\epsilon+\cdot}$, where $\epsilon>0$ is only added to avoid differentiability issues when $u_{q,t}^a=0$. Let $\mathbb{X}$ and $\mathbb{U}$ be open convex sets containing the Cartesian products of the intervals $[0,B_{\max}^k]$ and $[0,\infty)$, respectively. It follows that Assumptions 1–2 hold. Moreover, since $\Gamma_q$ is concave and problem (61) has linear equality constraints and concave inequality constraints, Slater's condition holds, i.e., Assumption 3 is satisfied. Finally, since the constraint set in (61) is compact, Lemma 5.1 states that Assumption 4 holds. Hence, Theorem 1 establishes that we can find an NE of (61) by solving the following MOCP:

$$\mathcal{P}_4:\ \underset{\{u_t\}_{t=0}^{\infty}\in\mathbb{U}}{\text{maximize}}\ \sum_{t=0}^{\infty}\beta^t\Big(\sum_{q\in\mathcal{Q}}\Gamma_q\Big(\sum_{a=1}^{A_q}u_{q,t}^a\Big)+\alpha\sum_{k=1}^{S}x_t^k\Big)$$
$$\text{s.t.}\quad x_{t+1}^k=x_t^k-\delta\sum_{u_{q,t}^a\in\mathcal{F}_k}u_{q,t}^a,\quad x_0^k=B_{\max}^k,\quad 0\le x_t^k\le B_{\max}^k,\quad Mu_t\le c_{\max},\quad 0\le u_t,\quad k=1,\ldots,S\qquad(64)$$

B. Finite horizon approximation and simulation results

As opposed to the LQ smart-grid problem, there is no known closed-form solution for problem (64). Thus, we have to rely on numerical methods to solve the MOCP. Suppose that we set the weight parameter $\alpha$ in $\Pi$ low enough to incentivize some positive transmission. Eventually, the nodes' batteries will be depleted, so the system will get stuck in an equilibrium state, with no further state transitions. Thus, we can approximate the infinite-horizon problem (64) as a finite-horizon problem, with horizon bounded by the time step at which all batteries have been depleted. Moreover, in our setting, we have assumed $\Gamma_q$ to be concave. Therefore, we can effectively solve (64) with convex optimization solvers (we use the software described in [32]). The benefit of using a convex optimization solver is that standard optimal control algorithms are computationally demanding when the state and action spaces are subsets of vector spaces.

For our numerical experiment, we consider $Q=2$ players that share a network of $S=4$ relay nodes, organized in two layers (see Figure 2). In this particular setting, each player is allowed to use four paths, i.e., $A_1=A_2=4$. The connectivity matrix $M$ can be obtained from Figure 2. The battery is initialized to $B_{\max}^k=1$ for the four relay nodes, and we set the depletion factor $\delta=0.5$, the discount factor $\beta=0.9$, the weight $\alpha=1$, $\epsilon=0.1$, and the vector of maximum capacities $c_{\max}=[0.5,0.15,0.5,0.15,0.4,0.4]$.

[Figure 2: network diagram with sources S1 and S2, relay nodes N1–N4 in two layers, destinations D1 and D2, and path flows u11–u14 and u21–u24.]
Fig. 2. Network scenario for two users and two levels of relaying nodes. Player S1 aims to transmit to destination D1, while S2 aims to transmit to destination D2. They have to share relay nodes N1, ..., N4. We denote the L = 6 aggregated flows as L1, ..., L6.

Figure 3 shows the evolution of the L = 6 aggregated flows, the A = 8 individual flows, and the battery of each of the S = 4 relay nodes. Since we have included the battery level of the relay nodes in the users' utilities (i.e., α > 0), the users have an extra incentive to limit their flow rate. Thus, there are two effective reasons to limit the flow rate: to satisfy the problem constraints and to save battery. We can see that the aggregated flows with higher maximum capacity are not saturated (L1 < 0.5, L3 < 0.5, L5 < 0.4, and L6 < 0.4). The reason is that the users have limited their individual flow rates in order to save the relays' batteries. On the other hand, the aggregated flows with lower maximum capacity are saturated (L2 = L4 = 0.15) because the capacity constraint is more restrictive than the self-limitation incentive.
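The saturation argument above rests on comparing the aggregated flows $Mu_t$ with $c_{\max}$. A small Python sketch of that check follows; the $3\times4$ matrix and the flow values are hypothetical (the matrix of Fig. 2 is not reproduced in the text), chosen only so that one link is saturated.

```python
def aggregated_flows(M, u):
    # (M u)_l = sum_a m_la * u_a: total flow aggregated at relay/destination node l.
    return [sum(m_la * u_a for m_la, u_a in zip(row, u)) for row in M]


def capacity_status(M, u, c_max, tol=1e-9):
    """Return (feasible, saturated): whether M u <= c_max holds componentwise,
    and the 1-based indices l whose aggregated flow equals c_max[l]."""
    agg = aggregated_flows(M, u)
    feasible = all(a <= c + tol for a, c in zip(agg, c_max))
    saturated = [l + 1 for l, (a, c) in enumerate(zip(agg, c_max)) if abs(a - c) <= tol]
    return feasible, saturated


# Hypothetical toy topology: 3 aggregation nodes, 4 flows.
M = [[1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 1, 0]]
c_max = [0.5, 0.5, 0.5]
print(capacity_status(M, [0.25, 0.25, 0.125, 0.125], c_max))  # (True, [1])
```

In the terms of the discussion above, a link reported as saturated is one where the capacity constraint, rather than the battery-saving incentive, is what limits the flow.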
When the batteries of the nodes with higher maximum capacity (N1, N3) are depleted around t = 7, the flows through these nodes stop. This allows the other flows (u14, u24) to transmit at a higher rate. At this time, the capacity constraint in L2, L4 is more restrictive than the self-limitation incentive for saving the batteries, so the users transmit at the maximum rate allowed by the capacity constraints (note that L2 = L4 = 0.15 remains constant). When the battery of every node is depleted, none of the users is allowed to transmit anymore and the system enters an equilibrium state. We remark that the solution obtained is an NE based on an open-loop (OL) game analysis. Finally, the results shown in Figure 3 have been obtained with a centralized convex optimization algorithm, meaning that it should be run offline by the system designer before deploying the real system. Alternatively, we could have used the distributed algorithms proposed in [33], enabling the players to solve the finite horizon approximation of problem (64) in a decentralized manner, even with the coupled capacity constraints.

[Figure 3: three panels versus time: aggregated flow rates L1–L6 (top), individual flow rates u11–u24 (middle), and battery levels of Nodes 1–4 (bottom).]
Fig. 3. Network flow control with Q = 2 players, S = 4 relay nodes and A1 = A2 = 4 available paths per player. (Top) Aggregated flow rates at L1, ..., L6. (Middle) Flow for each of the A = 8 available paths. (Bottom) Battery level in each of the S = 4 relay nodes.

VII. DYNAMIC MULTIPLE ACCESS CHANNEL: NONSEPARABLE UTILITIES

In this section, we consider an uplink scenario in which every user $q\in\mathcal{Q}$ independently chooses its transmitter power, $u_{q,t}$, aiming to achieve the maximum rate allowed by the channel [18]. If multiple users transmit at the same time, they will interfere with each other, which will decrease their rates, so they have to find an equilibrium. Let $R_q$ denote the rate achieved by user $q$ with normalized noise at time $t$:

$$R_q(u_t)\triangleq\log\Big(1+\frac{|h_q|^2u_{q,t}}{1+\sum_{j\in\mathcal{Q}:j\neq q}|h_j|^2u_{j,t}}\Big)\qquad(65)$$

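where $h_q$ denotes the fading channel coefficient of user $q$.

A. Multiple access channel DPG and equivalent MOCP

Let $x_t^q\in[0,B_{\max}^q]$ denote the battery level of each player $q\in\mathcal{Q}$, which is discharged proportionally to the transmitted power $u_{q,t}$. The state of the system is given by the vector of all individual battery levels: $x_t=(x_t^q)_{q\in\mathcal{Q}}\in\mathbb{X}$. Each player is only affected by its own battery, such that $S=Q$, $\mathcal{X}_q=\{q\}$ and $x_{q,t}=x_t^q$. Suppose the agents aim to maximize their transmission rate while also saving their battery. This scenario yields the following dynamic game:

$$\mathcal{G}_5:\ \forall q\in\mathcal{Q}:\ \underset{\{u_{q,t}\}_{t=0}^{\infty}\in\mathbb{U}_q}{\text{maximize}}\ \sum_{t=0}^{\infty}\beta^t\big(R_q(u_t)+\alpha x_t^q\big)\quad\text{s.t.}\quad x_{t+1}^q=x_t^q-\delta u_{q,t},\ \ x_0^q=B_{\max}^q,\ \ 0\le u_{q,t}\le P_{\max}^q,\ \ 0\le x_t^q\le B_{\max}^q\qquad(66)$$

where $\alpha$ is the weight given to saving the battery, $\delta$ is the discharging factor, and $P_{\max}^q$ and $B_{\max}^q$ denote the maximum transmitter power and maximum available battery level for node $q$, respectively. Problem (66) is a dynamic infinite-horizon extension of the static problem proposed in [34]. Instead of looking for a separable structure in the players' utilities, we show that Lemma 4 holds and, hence, that problem (66) is a DPG:

$$\frac{\partial^2\pi_q(x_t,u_t,t)}{\partial x_t^q\,\partial u_{j,t}}=\frac{\partial^2\pi_j(x_t,u_t,t)}{\partial x_t^j\,\partial u_{q,t}}=0\qquad(67)$$
$$\frac{\partial^2\pi_q(x_t,u_t,t)}{\partial x_t^q\,\partial x_t^j}=\frac{\partial^2\pi_j(x_t,u_t,t)}{\partial x_t^j\,\partial x_t^q}=0\qquad(68)$$
$$\frac{\partial^2\pi_q(x_t,u_t,t)}{\partial u_{q,t}\,\partial u_{j,t}}=\frac{\partial^2\pi_j(x_t,u_t,t)}{\partial u_{j,t}\,\partial u_{q,t}}=\frac{-|h_q|^2|h_j|^2}{\big(1+\sum_{p\in\mathcal{Q}}|h_p|^2u_{p,t}\big)^2}\qquad(69)$$

In order to find an equivalent MOCP, let us define $\mathbb{X}$ and $\mathbb{U}$ as open convex sets containing the closed intervals $[0,B_{\max}^q]$ and $[0,P_{\max}^q]$, respectively, so that Assumptions 1–2 hold. Derive the potential function from (23):

$$\Pi(x_t,u_t,t)=\log\Big(1+\sum_{q\in\mathcal{Q}}|h_q|^2u_{q,t}\Big)+\alpha\sum_{q\in\mathcal{Q}}x_t^q\qquad(70)$$

Since (70) is concave and all equality and inequality constraints in (66) are linear, Assumption 3 is satisfied through Slater's condition. Moreover, since the constraint set is compact and the potential is continuous, Lemma 5.1 establishes that Assumption 4 holds. Therefore, Theorem 1 states that we can find an NE of (66) by solving the following MOCP:

$$\mathcal{P}_5:\ \underset{\{u_t\}_{t=0}^{\infty}\in\mathbb{U}}{\text{maximize}}\ \sum_{t=0}^{\infty}\beta^t\Big(\log\Big(1+\sum_{q\in\mathcal{Q}}|h_q|^2u_{q,t}\Big)+\alpha\sum_{q\in\mathcal{Q}}x_t^q\Big)\quad\text{s.t.}\quad x_{t+1}^q=x_t^q-\delta u_{q,t},\ \ x_0^q=B_{\max}^q,\ \ 0\le u_{q,t}\le P_{\max}^q,\ \ 0\le x_t^q\le B_{\max}^q,\ \ \forall q\in\mathcal{Q}\qquad(71)$$

B. Simulation results

Similar to Sec. VI-B, the system reaches an equilibrium state when the batteries have been depleted. Thus, the solution can be approximated by solving a finite horizon problem. Moreover, since the problem is concave, we can use convex optimization software, like [32].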
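The finite-horizon approximation can also be illustrated without a modeling package. The sketch below swaps in a different technique from the solver of [32]: plain projected gradient ascent on a truncated version of $\mathcal{P}_5$ with horizon $T$, where the battery constraint is rewritten as the budget $\sum_{t<T}u_{q,t}\le B_{\max}^q/\delta$ (because $u_{q,t}\ge0$, this implies $x_t^q\ge0$ for every $t\le T$). All numerical values are hypothetical, and the natural logarithm is assumed. The step size 0.1 is below $1/\sum_q|h_q|^4$ for these gains, which keeps the iterates monotone in the objective.

```python
import math


def project_capped(v, p_max, budget, iters=60):
    # Euclidean projection of v onto {w : 0 <= w_t <= p_max, sum_t w_t <= budget},
    # by bisection on the multiplier lam of the budget constraint.
    clipped = [min(max(vi, 0.0), p_max) for vi in v]
    if sum(clipped) <= budget:
        return clipped
    lo, hi = 0.0, max(v)  # at lam = max(v) every entry is clipped to 0
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if sum(min(max(vi - lam, 0.0), p_max) for vi in v) > budget:
            lo = lam
        else:
            hi = lam
    return [min(max(vi - hi, 0.0), p_max) for vi in v]


def objective(u, gains, alpha, beta, delta, b_max):
    # Truncated P5: sum_{t<T} beta^t [log(1 + sum_q g_q u_{q,t}) + alpha sum_q x_t^q],
    # with x_{t+1}^q = x_t^q - delta u_{q,t} and x_0^q = b_max[q].
    total, x = 0.0, list(b_max)
    for t in range(len(u[0])):
        s = sum(g * u_q[t] for g, u_q in zip(gains, u))
        total += beta**t * (math.log1p(s) + alpha * sum(x))
        x = [x_q - delta * u_q[t] for x_q, u_q in zip(x, u)]
    return total


def solve_p5_truncated(gains, alpha, beta, delta, b_max, p_max, T, step=0.1, iters=500):
    Q = len(gains)
    # tail[t] = sum_{tau=t+1}^{T-1} beta^tau: every later battery level in the
    # truncated objective drops by delta per unit of power spent at time t.
    tail = [sum(beta**tau for tau in range(t + 1, T)) for t in range(T)]
    u = [[0.0] * T for _ in range(Q)]
    for _ in range(iters):
        s = [sum(g * u_q[t] for g, u_q in zip(gains, u)) for t in range(T)]
        u = [project_capped(
                 [u[q][t] + step * (beta**t * gains[q] / (1.0 + s[t])
                                    - alpha * delta * tail[t]) for t in range(T)],
                 p_max[q], b_max[q] / delta)  # x_T^q >= 0  <=>  sum_t u_{q,t} <= B^q / delta
             for q in range(Q)]
    return u


gains = [1.44, 0.5, 0.25]  # hypothetical |h_q|^2
b_max, p_max = [6.0, 6.0, 6.0], [2.0, 2.0, 2.0]
u = solve_p5_truncated(gains, alpha=0.001, beta=0.95, delta=1.0,
                       b_max=b_max, p_max=p_max, T=12)
```

Because the truncated objective is concave and the feasible set is convex, a fixed point of this iteration is a global maximizer of the truncated problem; whether it reproduces the schedule of Fig. 4 depends on the parameters and is not claimed here.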
Alternatively, we could solve the KKT system with an efficient ad hoc distributed algorithm, as in [18]. We simulated a scenario with $Q=4$ users. We set the maximum battery level $B_{\max}^q=33$ and the maximum power $P_{\max}^q=5$ for all users, the battery utility weight $\alpha=0.1$, the battery depletion factor $\delta=1$, and the discount factor $\beta=0.95$. The channel gains are $h_1=h_2=1.2$, $h_3=0.514$, and $h_4=0.38$. Figure 4 shows an appealing result: the solution of the MOCP (which is an NE of the game) is actually a schedule. In other words, instead of creating interference among each other, the users wait until the users with higher channel gain have depleted their batteries.

[Figure 4: power (top) and rate (bottom) versus time for Users 1–4.]
Fig. 4. Dynamic multiple access scenario with Q = 4 users. (Top) Sequence of transmitter powers chosen by every user. (Bottom) Evolution of the transmission rates.

VIII. OPTIMAL SCHEDULING: NONSTATIONARY PROBLEM WITH DYNAMIC PROGRAMMING SOLUTION

In this section, we present the most general form of the proposed framework and show its applicability to two scheduling problems. First, one of the games has nonseparable utilities, so we have to verify second-order conditions. Second, the equivalent MOCP can neither be approximated by a finite horizon problem, nor are the utilities concave. Thus, we cannot rely upon convex optimization software, and we have to use optimal control methods, like dynamic programming [23]. Finally, we consider a nonstationary scenario, in which the channel coefficients evolve with time. This makes the state-transition equations and the utility for the equal rate problem depend not only on the current state, but also on time. This problem was introduced in the preliminary paper [19].

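A. Proportional fair and equal rate scheduling games and their equivalent MOCPs

Let us redefine the rate achieved by user $q$ at time $t$, so that we consider nonstationary channel coefficients:

$$R_q(u_t,t)\triangleq\log\Big(1+\frac{|h_{q,t}|^2u_{q,t}}{1+\sum_{j\in\mathcal{Q}:j\neq q}|h_{j,t}|^2u_{j,t}}\Big)\qquad(72)$$

where $u_{q,t}$ is the transmitter power of player $q$, and $h_{q,t}$ is its time-varying channel coefficient. We propose two different scheduling games, namely, proportional fair and equal rate scheduling.

1) Proportional fair scheduling: Proportional fair is a compromise-based scheduling algorithm. It aims to maintain a balance between two competing interests: trying to maximize the total throughput while, at the same time, guaranteeing a minimal level of service for all users [35]–[37]. In order to achieve this tradeoff, we propose the following game:

$$\mathcal{G}_6:\ \forall q\in\mathcal{Q}:\ \underset{\{u_{q,t}\}_{t=0}^{\infty}\in\mathbb{U}_q}{\text{maximize}}\ \sum_{t=0}^{\infty}\beta^t x_t^q\quad\text{s.t.}\quad x_{t+1}^q=\Big(1-\frac{1}{t}\Big)x_t^q+\frac{1}{t}R_q(u_t,t),\ \ x_0^q=0,\ \ 0\le u_{q,t}\le P_{\max}^q\qquad(73)$$

where the state of the system is the vector of all players' average rates $x_t=(x_t^q)_{q\in\mathcal{Q}}$. Since each player aims to maximize its own average rate, the state components are unshared among players: $S=Q$ and $\mathcal{X}_q=\{q\}$. In order to show that problem (73) is a DPG, we evaluate Lemma 3 with a positive result, and obtain $\Pi$ from (16):

$$\Pi(x_t,u_t,t)=\sum_{q\in\mathcal{Q}}x_t^q\qquad(74)$$

Now, we show that we can derive an equivalent MOCP. It is clear that Assumptions 1–2 hold. By taking the gradients of the constraints of (73) and building a matrix with them (i.e., the gradient of each constraint is a column of this matrix), it is straightforward to show that this matrix is full rank. Hence, the linear independence constraint qualification holds (see, e.g., [20], [21]), meaning that Assumption 3 is satisfied. Finally, since $R_q(u_t,t)\ge0$ and $x_0^q=0$, we conclude that there exists some scalar $M$ such that the level set $\{x\in\mathbb{X}\mid\sum_{q\in\mathcal{Q}}x^q\ge M\}$ is nonempty and bounded, so that Lemma 5.3 establishes that Assumption 4 is satisfied. Thus, from Theorem 1, we can find an NE of DPG (73) by solving the following MOCP:

$$\mathcal{P}_6:\ \underset{\{u_t\}_{t=0}^{\infty}\in\mathbb{U}}{\text{maximize}}\ \sum_{t=0}^{\infty}\beta^t\sum_{q\in\mathcal{Q}}x_t^q\quad\text{s.t.}\quad x_{t+1}^q=\Big(1-\frac{1}{t}\Big)x_t^q+\frac{1}{t}R_q(u_t,t),\ \ x_0^q=0,\ \ 0\le u_{q,t}\le P_{\max}^q,\ \ \forall q\in\mathcal{Q}\qquad(75)$$

2) Equal rate scheduling: In this problem, the aim of each user is to maximize its rate while keeping the users' cumulative rates as close as possible. Let $x_t^q$ denote the cumulative rate of user $q$. The state of the system is the vector of all users' cumulative rates $x_t=(x_t^q)_{q\in\mathcal{Q}}$. Again, $S=Q$ and $\mathcal{X}_q=\{q\}$.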
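To make the nonstationary rate (72) concrete, the following Python sketch evaluates all users' rates for given channel coefficients and powers, and accumulates them into the cumulative rates of the equal rate problem, $x_{t+1}^q=x_t^q+R_q(u_t,t)$ with $x_0^q=0$. The sinusoidal channel model (amplitudes, periods and the offset 1.0) is a hypothetical stand-in for the channels of Fig. 5, and the natural logarithm is assumed.

```python
import math


def rates(h, u):
    # Rate (72) of every user with unit noise power:
    # R_q = log(1 + |h_q|^2 u_q / (1 + sum_{j != q} |h_j|^2 u_j)).
    rx = [abs(h_q) ** 2 * u_q for h_q, u_q in zip(h, u)]
    total = sum(rx)
    return [math.log1p(r / (1.0 + total - r)) for r in rx]


def channel(t, amplitudes, periods):
    # Hypothetical nonstationary channel: h_{q,t} = 1 + a_q sin(2 pi t / period_q).
    return [1.0 + a * math.sin(2 * math.pi * t / p) for a, p in zip(amplitudes, periods)]


def cumulative_rates(powers, amplitudes, periods):
    # Cumulative rates x_t^q starting from x_0^q = 0; powers[t] is the vector u_t.
    x = [0.0] * len(amplitudes)
    for t, u_t in enumerate(powers):
        r_t = rates(channel(t, amplitudes, periods), u_t)
        x = [x_q + r_q for x_q, r_q in zip(x, r_t)]
    return x


x_T = cumulative_rates([[5.0, 5.0]] * 20, amplitudes=[0.5, 0.3], periods=[20, 10])
```

A useful identity to keep in mind: $R_q+\log(1+\sum_{j\neq q}|h_{j,t}|^2u_{j,t})=\log(1+\sum_{p}|h_{p,t}|^2u_{p,t})$, which is why the sum-rate term appears in the potentials of this section.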
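The equal rate scheduling problem is modeled by the following game:

$$\mathcal{G}_7:\ \forall q\in\mathcal{Q}:\ \underset{\{u_{q,t}\}_{t=0}^{\infty}\in\mathbb{U}_q}{\text{maximize}}\ \sum_{t=0}^{\infty}\beta^t\Big((1-\alpha)R_q(u_t,t)-\alpha\sum_{j\in\mathcal{Q}:j\neq q}\big(x_t^q-x_t^j\big)^2\Big)\quad\text{s.t.}\quad x_{t+1}^q=x_t^q+R_q(u_t,t),\ \ x_0^q=0,\ \ 0\le u_{q,t}\le P_{\max}^q\qquad(76)$$

where the parameter $\alpha$ weighs the contribution of both terms. It is easy to verify that the conditions of Lemma 4 are satisfied. Hence, problem (76) is a DPG. In order to obtain an equivalent MOCP, let us define $\mathbb{X}$ and $\mathbb{U}$ as open convex sets that contain the intervals $[0,\infty)$ and $[0,P_{\max}^q]$, respectively. It follows that Assumptions 1–2 hold. Similar to the proportional fair scheduling problem (73), Assumption 3 holds through the linear independence constraint qualification. Finally, let us check Assumption 4 as follows. Derive the potential $\Pi$ by integrating (23):

$$\Pi(x_t,u_t,t)=(1-\alpha)\log\Big(1+\sum_{q\in\mathcal{Q}}|h_{q,t}|^2u_{q,t}\Big)-\alpha\sum_{q=1}^{Q-1}\sum_{j=q+1}^{Q}\big(x_t^q-x_t^j\big)^2\qquad(77)$$

We distinguish two extreme cases: all players have exactly the same rate (i.e., $x^q=x^j$, $\forall q,j=1,\ldots,Q$), and each player's rate is different from any other player's rate (i.e., $x^q\neq x^j$, $\forall q\neq j$). When all players have exactly the same rate, the terms $(x^q-x^j)^2$ vanish for all $(q,j)$ pairs, and (77) only depends on the actions (the state becomes irrelevant). Since the action constraint set is compact, existence of a solution is guaranteed by Lemma 5.1. When each player's rate is different from any other player's rate, the term $-(x^q-x^j)^2$ is coercive, so (77) becomes coercive too (since the action constraint set is compact, the term depending on $u_t$ is bounded). Thus, existence of an optimal solution is guaranteed by Lemma 5.2. Finally, the case where some players' rates are equal and some are different is a combination of the two cases already mentioned: the equal terms vanish and the different terms make (77) coercive. Hence, Theorem 1 states that we can find an NE of DPG (76) by solving the following MOCP:

$$\mathcal{P}_7:\ \underset{\{u_t\}_{t=0}^{\infty}\in\mathbb{U}}{\text{maximize}}\ \sum_{t=0}^{\infty}\beta^t\Big((1-\alpha)\log\Big(1+\sum_{q\in\mathcal{Q}}|h_{q,t}|^2u_{q,t}\Big)-\alpha\sum_{q=1}^{Q-1}\sum_{j=q+1}^{Q}\big(x_t^q-x_t^j\big)^2\Big)\quad\text{s.t.}\quad x_{t+1}^q=x_t^q+R_q(u_t,t),\ \ x_0^q=0,\ \ 0\le u_{q,t}\le P_{\max}^q,\ \ \forall q\in\mathcal{Q}\qquad(78)$$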
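The potential property of (77) can be sanity-checked numerically: for each player $q$, the partial derivatives of $\pi_q$ in (76) with respect to its own action $u_{q,t}$ and its own state component $x_t^q$ should coincide with those of $\Pi$. The Python sketch below does this with central finite differences at a single time step (so the gains $g_q=|h_{q,t}|^2$ are fixed numbers); the test point and the tolerance are arbitrary choices of ours.

```python
import math


def pi_q(q, x, u, g, alpha):
    # Utility of player q in (76) at a fixed time step, with g_j = |h_{j,t}|^2.
    interference = 1.0 + sum(g[j] * u[j] for j in range(len(u)) if j != q)
    rate = math.log1p(g[q] * u[q] / interference)
    penalty = sum((x[q] - x[j]) ** 2 for j in range(len(x)) if j != q)
    return (1 - alpha) * rate - alpha * penalty


def potential(x, u, g, alpha):
    # Potential (77): sum-rate term minus the penalty over unordered pairs q < j.
    n = len(x)
    pair_sum = sum((x[q] - x[j]) ** 2 for q in range(n - 1) for j in range(q + 1, n))
    return (1 - alpha) * math.log1p(sum(gq * uq for gq, uq in zip(g, u))) - alpha * pair_sum


def partial(fun, v, i, h=1e-6):
    # Central finite difference of fun with respect to coordinate i of v.
    vp, vm = list(v), list(v)
    vp[i] += h
    vm[i] -= h
    return (fun(vp) - fun(vm)) / (2 * h)


def check_potential(x, u, g, alpha, tol=1e-6):
    # True if d(pi_q)/d(u_q) = d(Pi)/d(u_q) and d(pi_q)/d(x_q) = d(Pi)/d(x_q) for all q.
    for q in range(len(x)):
        du_pi = partial(lambda v: pi_q(q, x, v, g, alpha), u, q)
        du_Pi = partial(lambda v: potential(x, v, g, alpha), u, q)
        dx_pi = partial(lambda v: pi_q(q, v, u, g, alpha), x, q)
        dx_Pi = partial(lambda v: potential(v, u, g, alpha), x, q)
        if abs(du_pi - du_Pi) > tol or abs(dx_pi - dx_Pi) > tol:
            return False
    return True


print(check_potential(x=[0.3, 1.1, 2.0], u=[1.0, 0.5, 2.0], g=[1.44, 0.8, 0.3], alpha=0.9))  # True
```

The pairwise sum over $q<j$ in (77) is what makes the state derivatives match: each pair $(q,j)$ appears once in $\Pi$ but contributes to both $\pi_q$ and $\pi_j$.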

B. Solving the MOCP with dynamic programming and simulation results

Although Lemma 5 establishes the existence of an optimal solution to these MOCPs, these problems are nonconcave and cannot be approximated by finite horizon problems. Thus, we cannot rely on efficient convex optimization software. In order to numerically solve these problems, we can use dynamic programming methods [23].

Standard dynamic programming methods assume that the MOCP is stationary. One standard method to cope with a nonstationary MOCP is to augment the state space so that it includes time as an extra dimension, for some time length $T$. Let the augmented state vector at time $t$ be denoted by $\bar x_t\triangleq(x_t,t)\in\bar{\mathbb{X}}\triangleq\mathbb{X}\times\{0,\ldots,T\}$. The state-transition equation in the augmented state space becomes $\bar f:\bar{\mathbb{X}}\times\mathbb{U}\to\bar{\mathbb{X}}$. Since we are tackling an infinite horizon problem, when augmenting the state space with the time dimension, it is convenient to impose a periodic time variation:

$$\bar f(\bar x_t,u_t)\triangleq\begin{cases}\big(f(x_t,u_t,t),\,t+1\big)&\text{if } t<T\\ \big(f(x_t,u_t,t),\,0\big)&\text{if } t=T\end{cases}\qquad(79)$$

Otherwise, it could be difficult to apply computational dynamic programming methods.

One further difficulty in solving MOCPs with continuous state and action spaces is that dynamic programming methods are mainly derived for discrete state-action spaces. Two common approaches to overcome this limitation are to use a parametric approximation of the value function (e.g., a neural network whose inputs are the continuous state-action variables and that is trained by minimizing the error in the Bellman equation), or to discretize the continuous spaces, so that the value function is approximated on a set of points. For simplicity, we follow the discretization approach here. We remark, though, that it may be problematic to finely discretize the state-action spaces in high-dimensional problems, since the number of grid points, and hence the computational load, increases exponentially with the dimension of the state-action space. These and other approximation techniques, usually known as approximate dynamic programming, are still an active area of research (see, e.g., [23, Ch. 6], [38]).

Introduce the optimal value function for the augmented set:

$$V(\bar x_0)\triangleq\max_{\{u_t\}_{t=0}^{\infty}\in\mathbb{U}}\sum_{t=0}^{\infty}\beta^t\Pi(\bar x_t,u_t)=\sum_{t=0}^{\infty}\beta^t\Pi\big(\bar x_t,\phi(\bar x_t)\big)=\sum_{t=0}^{\infty}\beta^t\Pi(\bar x_t,u_t^\star)\qquad(80)$$

where $\phi:\bar{\mathbb{X}}\to\mathbb{U}$ is the optimal policy that provides the sequence of actions $\{u_t^\star=\phi(\bar x_t)\}_{t=0}^{\infty}$ that solves the MOCP, as explained by Lemma 5.
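A minimal Python sketch of the periodic augmented transition (79) follows; the underlying transition `f` is passed in as a function, and the trivial `f` used in `rollout_times` is a hypothetical placeholder. Note that with the time component in $\{0,\ldots,T\}$, the time index repeats every $T+1$ steps.

```python
def augmented_step(f, x, t, u, T):
    """Periodic augmented transition (79) on (x, t) with t in {0, ..., T}:
    the state moves with f(x, u, t) and time advances, wrapping from T to 0."""
    return f(x, u, t), (t + 1 if t < T else 0)


def rollout_times(T, steps):
    # Time component visited by the augmented chain starting at t = 0
    # (placeholder dynamics that leave x unchanged).
    times, t = [], 0
    for _ in range(steps):
        times.append(t)
        _, t = augmented_step(lambda x, u, s: x, None, t, None, T)
    return times


print(rollout_times(T=3, steps=6))  # [0, 1, 2, 3, 0, 1]
```

Keeping time as an explicit, bounded coordinate is what lets a stationary dynamic programming routine handle channels that repeat with period $T+1$.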
Then, the Bellman optimality equation is given by

$$V(\bar x_t)=\Pi(\bar x_t,u_t^\star)+\beta V\big(\bar f(\bar x_t,u_t^\star)\big)\qquad(81)$$

Among the available dynamic programming methods, we choose value iteration (VI) for its reduced complexity per iteration with respect to policy iteration (PI), which is especially relevant when the state grid has fine resolution (i.e., a large number of states). VI is obtained by turning the Bellman optimality equation (81) into an update rule, so that it generates a sequence of value functions $V_k$ that converges to the optimal value (i.e., $\lim_{k\to\infty}V_k=V$), where $V_0$ is arbitrary. In particular, at every iteration $k$, we obtain the policy $\phi$ that is greedy with respect to $V_k$ (policy improvement). Then, we update the value function $V_{k+1}$ for the latest policy (policy evaluation). VI is summarized in Algorithm 1, where the operator $\lfloor x\rceil$ denotes the closest point to $x$ in the discrete grid.

Algorithm 1: Value iteration for the nonstationary MOCP
  Inputs: number of states S, threshold ε
  Discretize the augmented space X̄ into a grid of S states
  Initialize Δ = ∞, k = 0 and V_0(x̄^s) = 0 for s = 1, ..., S
  while Δ > ε do
    for every state s = 1 to S do
      x̄^s ← the s-th point on the grid
      φ(x̄^s) = argmax_u Π(x̄^s, u) + β V_k(⌊f̄(x̄^s, u)⌉)
      V_{k+1}(x̄^s) = Π(x̄^s, φ(x̄^s)) + β V_k(⌊f̄(x̄^s, φ(x̄^s))⌉)
    end for
    Δ = max_s |V_{k+1}(x̄^s) − V_k(x̄^s)|
    k = k + 1
  end while
  Return: φ(x̄^s) and V_k(x̄^s) for s = 1, ..., S

Note that the output of the value iteration algorithm is a policy (i.e., a function) rather than a sequence of actions. This allows the optimal actions of every player to be computed from the current state at every time step of the game. When there is no reason to propose a finite-horizon approximation of the game, a policy is a more practical representation of the solution than an infinite sequence of actions.

We simulate a simple scenario with $Q=2$ users. The channel coefficients are sinusoids with a different frequency and a different amplitude for each user (see Fig. 5). The maximum transmitter power is $P_{\max}^1=P_{\max}^2=5$, with 20 possible power levels per user, which amounts to 400 possible actions. We discretize the state space (i.e., the users' rates) into a grid of 30 points per user. The nonstationarity of the environment is handled by augmenting the state space with $T=20$ time steps. Hence, the augmented state space has a total of $30^2\times20=18{,}000$ states. For the equal rate problem, the utility function uses $\alpha=0.9$.
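The following Python sketch implements the loop of Algorithm 1 on a deliberately tiny, hypothetical problem: a scalar state on a 21-point grid, three power levels, a periodic channel, and an invented potential and transition. It is meant only to show the mechanics (snapping to the nearest grid point, the periodic time component of (79), and the greedy maximization in the update); it is not the scheduling model of Section VIII-A, and the grid index and time index are kept as separate coordinates for simplicity.

```python
import math


def value_iteration(grid, actions, T, Pi, f, beta, eps=1e-8, max_iter=10_000):
    """Value iteration (Algorithm 1) on the augmented grid {grid points} x {0, ..., T}.

    Pi(x, u, t) is the potential and f(x, u, t) the state transition; both are
    supplied by the caller. Returns (policy, V) as dicts keyed by (grid index, t).
    """
    def nearest(x):
        # Index of the grid point closest to x (the rounding operator in the text).
        return min(range(len(grid)), key=lambda i: abs(grid[i] - x))

    states = [(i, t) for t in range(T + 1) for i in range(len(grid))]
    # Precompute, for each augmented state and action, the reward and the
    # snapped next augmented state (periodic time, as in (79)).
    model = {}
    for i, t in states:
        t_next = t + 1 if t < T else 0
        model[(i, t)] = [(Pi(grid[i], u, t), (nearest(f(grid[i], u, t)), t_next))
                         for u in actions]
    V = {s: 0.0 for s in states}  # V_0 = 0
    for _ in range(max_iter):
        V_next, policy = {}, {}
        for s in states:
            q_values = [r + beta * V[s_next] for r, s_next in model[s]]
            best = max(range(len(actions)), key=q_values.__getitem__)
            policy[s] = actions[best]   # greedy w.r.t. V_k (policy improvement)
            V_next[s] = q_values[best]  # V_{k+1} for that policy
        gap = max(abs(V_next[s] - V[s]) for s in states)
        V = V_next
        if gap <= eps:
            return policy, V
    raise RuntimeError("value iteration did not converge")


# Hypothetical toy problem (not the model of Section VIII-A).
T, beta = 4, 0.9
grid = [i / 20 for i in range(21)]
actions = [0.0, 0.5, 1.0]


def h(t):
    # Hypothetical periodic channel gain.
    return 1.0 + 0.5 * math.sin(2 * math.pi * t / (T + 1))


def f(x, u, t):
    # Hypothetical exponentially averaged rate, kept in [0, 1].
    return min(1.0, 0.8 * x + 0.2 * math.log1p(h(t) * u))


def Pi(x, u, t):
    # Hypothetical potential: reward the averaged rate, charge transmit power.
    return x - 0.1 * u


policy, V = value_iteration(grid, actions, T, Pi, f, beta)
```

Precomputing the snapped transitions once trades memory for speed; with the 18,000 states and 400 actions quoted above, the same table would hold 7.2 million entries, which illustrates why the text warns about fine discretizations.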
The solution of the proportional fair game leads to an efficient scheduler (see Figure 6), in which both users try to minimize interference so that they approach their respective maximum rates. For the equal rate problem, we observe that the agents achieve much lower rates, but very similar to each other (see Figure 7). The trend is that the user with the lower-gain channel (User 2, red dashed line) tries to achieve its maximum rate, while the user with the higher-gain channel (User 1, blue solid line) reduces its transmitter power to match the rate of the other user. In other words, the user with the poorest channel sets a bottleneck for the other user.
