arXiv: v1 [cs.GT] 15 Jan 2019


Model and algorithm for time-consistent risk-aware Markov games

Wenjie Huang, Pham Viet Hai and William B. Haskell

January 16, 2019

Abstract

In this paper, we propose a model for non-cooperative Markov games with time-consistent risk-aware players. In particular, our model characterizes the risk arising from both the stochastic state transitions and the randomized strategies of the other players. We give an appropriate equilibrium concept for our risk-aware Markov game model and we demonstrate the existence of such equilibria in stationary strategies. We then propose and analyze a simulation-based Q-learning type algorithm for equilibrium computation, and work through the details for some specific risk measures. Our numerical experiments on a two player queuing game demonstrate the worth and applicability of our model and corresponding Q-learning algorithm.

Keywords: Markov games; time-consistent risk preferences; fixed point theorem; Q-learning

1 Introduction

Markov games generalize Markov decision processes (MDPs) to the multi-player setting. In the classical case, each player seeks to minimize his expected costs. In the corresponding equilibrium, no player can decrease his expected costs by changing his strategy. We often want to compute equilibria to predict the outcome of the game and understand the behavior of the players.

In the present paper, we directly account for the risk preferences of the players in a Markov game in a general way. Players may be risk-averse and thus give more attention to low probability but high cost events than a risk-neutral player would. Models for the risk preferences of a single agent are well established (see e.g. [2, 53] for the single period setting and [52, 55] for the dynamic case). In this paper, we extend these ideas to general sum Markov games and extend the framework of Markov risk measures (see [52, 55]) to the multi-agent setting. There are two major components to our present paper in this regard. First, we identify an appropriate risk-aware equilibrium concept and then we argue that such equilibria exist in stationary strategies. Second, we provide a practical equilibrium computation scheme (which is a simulation-based Q-learning type algorithm).
Such a Q-learning type algorithm is model-free: it does not require any knowledge of the true model or the system transitions, and thus can search for equilibria purely from observations.

1.1 Literature review

Risk preferences. Expected utility theory ([59, 18, 58]) is a highly developed framework for modeling preferences. Yet, some experiments (e.g. [42]) show that real human behavior may violate the independence axiom of expected utility theory. Risk measures (as developed in [2, 53]) do not require the independence axiom and have favorable properties for optimization. Alternatively, [15, 14] define preferences in terms of satisfying a continuum of targets (i.e. satisficing and aspirational preferences).

Author note: Wenjie Huang (wenjie_huang@u.nus.edu) is a Research Engineer and Ph.D. Candidate in the Department of Industrial Systems Engineering and Management at the National University of Singapore. Pham Viet Hai (isepvh@nus.edu.sg) is a Postdoctoral Research Fellow in the Department of Industrial Systems Engineering and Management at the National University of Singapore. William B. Haskell (isehwb@nus.edu.sg) is an Assistant Professor in the Department of Industrial Systems Engineering and Management at the National University of Singapore.

In the dynamic setting, [52, 55] develop the class of Markov (a.k.a. nested/iterated) risk measures and establish their connection to time-consistency. This class of dynamic risk measures is notable for its recursive formulation, which leads to dynamic programming equations. Several computational schemes for optimizing Markov risk measures have been proposed. For instance, in [32, 30, 31], approximate dynamic programming and Q-learning type algorithms for MDPs with Markov risk measures are developed. In [62], a simulation-based fitted value iteration algorithm is developed for large-scale implementation of this class of MDPs.

In contrast to optimizing a Markov risk measure, one may optimize a single risk measure of the final outcome (see [49]). For example, the conditional value-at-risk of the (finite and infinite horizon) total cost is optimized in [5]. In [38, 39, 6], the expected utility of the total cost is optimized. Further, in [6] it is shown how to solve this problem for general utility functions by doing dynamic programming on an augmented state space. In [26], a general law invariant risk measure of the total cost is optimized using the convex analytic method.

Risk-aware and robust games. Risk-sensitive games have already been considered in [37, 24, 3, 7]. Here, risk-sensitivity refers to the specific utility function $(1/\theta) \ln\left(E[\exp(\theta X)]\right)$ where $\theta > 0$ is the risk sensitivity parameter. In [24, 3], the focus is on the zero sum continuous time setting. In [33], the authors model preferences as expected utility. In robust games, there is ambiguity about the costs or state transition probabilities of the game. In [1], the authors give a robust equilibrium concept where each player optimizes against the worst-case expected cost over the range of model ambiguity. This paradigm is extended to Markov games in [35], and the existence of robust Markov perfect equilibria is demonstrated. In both [1, 35], a multilinear system formulation is used to compute the corresponding robust equilibrium.

Applications of risk-aware games. Risk-aware games are not artificial; rather, they emerge organically from many real problems. Traffic equilibrium problems with risk-averse agents are analyzed in [8] with non-cooperative game theory.
The preferences of risk-aware adversaries are modeled in Stackelberg security games in [51], and a computational scheme for robust defender strategies is presented. In [20], the authors study commodity trading where the players optimize time-consistent nested risk measures.

1.2 Contributions

We make the following contributions in this paper:

1. First, we develop a model for risk-aware Markov games where agents have time-consistent preferences. This model specifically addresses both sources of risk in a Markov game: (i) the risk from the stochastic state transitions and (ii) the risk from the randomized strategies of the other players.

2. Second, we propose a notion of risk-aware Markov perfect equilibria for this game. We show that there exist risk-aware equilibria in stationary strategies.

3. We create a practical simulation-based Q-learning type algorithm for computing risk-aware Markov perfect equilibria, and we show that it converges almost surely. Such a Q-learning type algorithm is model-free: it does not require any knowledge of the true model or the system transitions, and thus can search for equilibria purely from observations. Moreover, the traditional multilinear formulation approach (see [35, 1]) for computing equilibria fails in our model. Because our model addresses both sources of risk, a bilinear term enters the multilinear formulation, which leads to computational intractability. Thus, it is necessary to use an alternative, such as a Q-learning type algorithm, to compute equilibria.

This paper is organized as follows. Section 2 reviews preliminaries on classical Markov games. Then, Section 3 introduces our model for risk-aware Markov games. In Section 4, we certify the existence of risk-aware Markov perfect equilibria. As for computing these equilibria, Section 5 develops our Q-learning type algorithm. We report numerical experiments for a queuing game in Section 6 and we conclude the paper in Section 7. The detailed proofs of all our main results may be found in the Appendices.

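1.3 Notation

We make use of the following standard notation:

• $\|\cdot\|_\infty$ is the supremum norm on $\mathbb{R}^d$.
• $\langle x, y \rangle := \sum_{i=1}^{d} x_i y_i$ is the Euclidean inner product in $\mathbb{R}^d$.
• $X =_D Y$ is equality in distribution.
• For a finite set $\mathcal{D}$: $\mathcal{P}(\mathcal{D})$ is the set of all probability distributions on $\mathcal{D}$; $e_{\mathcal{D}}$ denotes the constant function equal to one on $\mathcal{D}$; $e_\delta$ is the $\delta$th unit vector in $\mathbb{R}^{|\mathcal{D}|}$ for $\delta \in \mathcal{D}$.
• $A \otimes B$ is the matrix Kronecker product for $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times q}$:
$$A \otimes B = \begin{pmatrix} a_{11} B & \cdots & a_{1n} B \\ \vdots & \ddots & \vdots \\ a_{m1} B & \cdots & a_{mn} B \end{pmatrix} \in \mathbb{R}^{mp \times nq}.$$
• $A \circ B$ is the matrix Hadamard product where $[A \circ B]_{i,j} = A_{i,j} B_{i,j}$ for all $i = 1, \ldots, m$ and $j = 1, \ldots, n$.
• $d_H(A, B)$ is the Hausdorff distance between nonempty subsets $A$ and $B$ of $\mathbb{R}^d$ with respect to the Euclidean norm $\|\cdot\|_2$, explicitly,
$$d_H(A, B) := \max\left\{ \sup_{a \in A} \inf_{b \in B} \|a - b\|_2, \; \sup_{b \in B} \inf_{a \in A} \|a - b\|_2 \right\}.$$

2 Preliminaries

This section briefly reviews the setup for classical Markov games (see e.g. [22, 35, 21]). Our game, denoted $\{\mathcal{I}, \mathcal{S}, \mathcal{A}, P, c\}$, consists of the following ingredients:

• Finite set of players $\mathcal{I}$.
• Finite set of states $\mathcal{S}$.
• Finite set of actions $\mathcal{A}^i$ for each player $i \in \mathcal{I}$; multi-actions $\mathcal{A} := \times_{i \in \mathcal{I}} \mathcal{A}^i$; state-action pairs $\mathcal{K} := \mathcal{S} \times \mathcal{A}$.
• Transition probabilities $P(s, a) \in \mathcal{P}(\mathcal{S})$ for all $(s, a) \in \mathcal{K}$.
• Cost functions $c^i : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ for all players $i \in \mathcal{I}$.

Each round $t \geq 0$ of the game follows four steps: (i) first, all players observe the current state $s_t \in \mathcal{S}$; (ii) second, each player $i \in \mathcal{I}$ chooses $a_t^i \in \mathcal{A}^i$ (all moves are simultaneous and independent, and the corresponding multi-action is $a_t = (a_t^i)_{i \in \mathcal{I}}$); (iii) third, each player $i \in \mathcal{I}$ realizes cost $c^i(s_t, a_t)$; and (iv) finally, the state transitions to $s_{t+1}$ according to the distribution $P(s_t, a_t)$.

We next characterize the players' strategies. Let $h_t = (s_0, a_0, s_1, a_1, \ldots, a_{t-1}, s_t)$ be the history up to time $t \geq 0$ (which includes the state $s_t$ at the beginning of time $t$), and let $H_t := (\mathcal{S} \times \mathcal{A})^t \times \mathcal{S}$ be the set of all possible histories up to time $t \geq 0$.

Definition 2.1. (i) A decision rule for player $i$ at time $t$ is a function $d_t^i : H_t \to \mathcal{P}(\mathcal{A}^i)$, where $[d_t^i(h_t)](a^i)$ is the probability that player $i$ will choose $a^i \in \mathcal{A}^i$, conditioned on $h_t$.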

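(ii) A behavioral strategy $\pi^i$ for player $i \in \mathcal{I}$ is a sequence of decision rules $\pi^i := (d_t^i)_{t \geq 0}$, and $\Pi^i$ is the set of all behavioral strategies of player $i$.
(iii) A behavioral strategy $\pi^i$ for player $i \in \mathcal{I}$ is Markov if $[d_t^i(h_t)](a^i) = [d_t^i(s_t)](a^i)$, $\forall h_t \in H_t$, $a^i \in \mathcal{A}^i$, $t \geq 0$.
(iv) A behavioral strategy $\pi^i$ for player $i \in \mathcal{I}$ is stationary if $[d_t^i(h_t)](a^i) = [d^i(s_t)](a^i)$, $\forall h_t \in H_t$, $a^i \in \mathcal{A}^i$, $t \geq 0$.

We write $\pi^{-i} := (\pi^j)_{j \neq i}$ to denote the complementary behavioral strategy to player $i$, so that multi-strategies can be written as $\pi = (\pi^i, \pi^{-i})$.

Let $\Omega$ denote the set of trajectories $\omega = (s_t, a_t)_{t \geq 0}$ generated by the histories $H_t$, $t \geq 0$. For each $t \geq 0$, we let $\mathcal{F}_t = \sigma(s_0, a_0, \ldots, s_t)$ be the natural filtration on $H_t$ so that $\mathcal{F}_0 = \sigma(s_0)$, $\mathcal{F}_t \subset \mathcal{F}_{t+1}$, and $\mathcal{F}_t \subset \mathcal{F}$, where $\mathcal{F}$ is generated by the collection $(\mathcal{F}_t)_{t \geq 0}$. It follows that $(\Omega, \mathcal{F})$ is a measurable space. Given any multi-strategy $\pi = (\pi^i, \pi^{-i})$ and initial state $s_0$, we obtain a probability distribution $P_{s_0}^{\pi}$ on $(\Omega, \mathcal{F})$ and we let $E_{s_0}^{\pi}$ denote expectation with respect to $P_{s_0}^{\pi}$. In the classical setting, for a discount factor $\gamma \in (0, 1)$, each player's objective is to choose $\pi^i$ to minimize his expected infinite horizon discounted cost
$$J_{s_0}^i(\pi^i, \pi^{-i}) := E_{s_0}^{(\pi^i, \pi^{-i})}\left[ \sum_{t=0}^{\infty} \gamma^t c^i(s_t, a_t) \right], \tag{2.1}$$
given the complementary strategy $\pi^{-i}$, where the subscript $s_0$ denotes the dependence on the initial state (via $P_{s_0}^{\pi}$).

We now review the equilibrium points studied in the literature under the risk-neutral setting, which are known as Markov perfect equilibria.

Definition 2.2. ([22], [35, Definition 1]) (Markov perfect equilibrium) A multi-strategy $\pi^* = (\pi^{*,i})_{i \in \mathcal{I}}$ is a Markov perfect equilibrium for $\{\mathcal{I}, \mathcal{S}, \mathcal{A}, P, c\}$ if
$$J_{s_0}^i(\pi^{*,i}, \pi^{*,-i}) \leq J_{s_0}^i(\pi^i, \pi^{*,-i}), \quad \forall \pi^i \in \Pi^i, \; i \in \mathcal{I}.$$

This definition states that $\pi^*$ is an equilibrium point if and only if no player can improve his expected infinite horizon discounted cost by unilaterally changing his strategy. In other words, each player's strategy is a best response to the other players' strategies.

3 Risk-aware Markov games

In this section we develop our risk-aware Markov game model. Each player $i$ faces a stream of costs $X_t^i = c^i(s_t, a_t)$ for all $t \geq 0$. There are two sources of stochasticity in this cost sequence: (i) stochastic state transitions characterized by the transition kernel $P(s, a)$; and (ii) the randomized mixed strategies of other players characterized by $\pi^{-i}$.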
The key question is: how should player $i$ account for both sources of stochasticity and evaluate the risk of the tail subsequence $X_t, X_{t+1}, \ldots$ from the perspective of time $t$? We begin by formalizing some details about the risk of finite sequences $X_{t,T} := (X_t, X_{t+1}, \ldots, X_T)$ before we consider the risk of the infinite cost sequence $X_0, X_1, \ldots$ actually faced by the players. For a reference distribution $P$ on $(\Omega, \mathcal{F})$, let $\mathcal{L}_t := \mathcal{L}_\infty(\Omega, \mathcal{F}_t, P)$ and $\mathcal{L}_{t,T} := \mathcal{L}_t \times \mathcal{L}_{t+1} \times \cdots \times \mathcal{L}_T$ for all $0 \leq t \leq T < \infty$.

Definition 3.1. (i) A mapping $\rho_{t,T} : \mathcal{L}_{t,T} \to \mathcal{L}_t$ is called a conditional risk measure if $\rho_{t,T}(Z_{t,T}) \leq \rho_{t,T}(X_{t,T})$ for all $Z_{t,T}, X_{t,T} \in \mathcal{L}_{t,T}$ such that $Z_{t,T} \leq X_{t,T}$.
(ii) A dynamic risk measure is a sequence of conditional risk measures $\{\rho_{t,T}\}_{t=0}^{T}$.

Given a dynamic risk measure $\{\rho_{t,T}\}_{t=0}^{T}$, we may define a larger family of risk measures $\rho_{t,\tau}$ for $0 \leq t \leq \tau \leq T$ via the convention $\rho_{t,\tau}(X_t, \ldots, X_\tau) = \rho_{t,T}(X_t, \ldots, X_\tau, 0, \ldots, 0)$. We now make our key assumptions about players' preferences.

Assumption 3.2. Suppose the dynamic risk measure $\{\rho_{t,T}\}_{t=0}^{T}$ satisfies the following conditions:
(i) (Normalization) $\rho_{t,T}(0, 0, \ldots, 0) = 0$.
(ii) (Conditional translation invariance) For any $X_{t,T} \in \mathcal{L}_{t,T}$, $\rho_{t,T}(X_t, X_{t+1}, \ldots, X_T) = X_t + \rho_{t,T}(0, X_{t+1}, \ldots, X_T)$.
(iii) (Convexity) For any $X_{t,T}, Y_{t,T} \in \mathcal{L}_{t,T}$ and $0 \leq \lambda \leq 1$, $\rho_{t,T}(\lambda X_{t,T} + (1 - \lambda) Y_{t,T}) \leq \lambda \rho_{t,T}(X_{t,T}) + (1 - \lambda) \rho_{t,T}(Y_{t,T})$.
(iv) (Positive homogeneity) For any $X_{t,T} \in \mathcal{L}_{t,T}$ and $\alpha \geq 0$, $\rho_{t,T}(\alpha X_{t,T}) = \alpha \rho_{t,T}(X_{t,T})$.
(v) (Time-consistency) For any $X_{t,T}, Y_{t,T} \in \mathcal{L}_{t,T}$ and $0 \leq \tau \leq \theta \leq T$, the conditions $X_k = Y_k$ for $k = \tau, \ldots, \theta - 1$ and $\rho_{\theta,T}(X_\theta, \ldots, X_T) \leq \rho_{\theta,T}(Y_\theta, \ldots, Y_T)$ imply $\rho_{\tau,T}(X_\tau, \ldots, X_T) \leq \rho_{\tau,T}(Y_\tau, \ldots, Y_T)$.

Many of these properties (monotonicity, convexity, positive homogeneity, and translation invariance) were originally introduced for static risk measures in the pioneering paper [2]. They have since been heavily justified in other work including [54, 10, 46].

The next theorem gives a recursive formulation for dynamic risk measures satisfying Assumption 3.2. This representation is the foundation of [52] and subsequent work on time-consistent risk measures. For this result, we define a mapping $\rho_t : \mathcal{L}_{t+1} \to \mathcal{L}_t$, where $t \geq 0$, to be a one-step (conditional) risk measure if $\rho_t(X_{t+1}) = \rho_{t,t+1}(0, X_{t+1})$.

Theorem 3.3. [52, Theorem 1] Suppose Assumption 3.2 holds, then
$$\rho_{t,T}(X_t, X_{t+1}, \ldots, X_T) = X_t + \rho_t\left(X_{t+1} + \rho_{t+1}\left(X_{t+2} + \cdots + \rho_{T-1}(X_T) \cdots \right)\right), \tag{3.1}$$
for all $0 \leq t \leq T$, where $\rho_t, \ldots, \rho_{T-1}$ are one-step risk measures.

Now we may consider the risk of an infinite cost sequence. Based on [52], the discounted measure of risk $\rho_{t,T}^{\gamma} : \mathcal{L}_{t,T} \to \mathbb{R}$ is defined via
$$\rho_{t,T}^{\gamma}(X_t, X_{t+1}, \ldots, X_T) := \rho_{t,T}(\gamma^t X_t, \gamma^{t+1} X_{t+1}, \ldots, \gamma^T X_T).$$
Define $\mathcal{L}_{t,\infty} := \mathcal{L}_t \times \mathcal{L}_{t+1} \times \cdots$ for $t \geq 0$ and $\rho^{\gamma} : \mathcal{L}_{0,\infty} \to \mathbb{R}$ via
$$\rho^{\gamma}(X_0, X_1, \ldots) := \lim_{T \to \infty} \rho_{0,T}^{\gamma}(X_0, X_1, \ldots, X_T).$$
To provide our final representation result, we introduce the additional assumption that risk preferences are stationary (they only depend on the sequence of costs ahead, and are independent of the current time).

Assumption 3.4. (Stationary preferences) For all $T \geq 1$ and $s \geq 0$,
$$\rho_{0,T}^{\gamma}(X_0, X_1, \ldots, X_T) = \rho_{s,T+s}^{\gamma}(X_0, X_1, \ldots, X_T).$$
When Assumptions 3.2 and 3.4 are satisfied, the corresponding dynamic risk measure is given by the recursion:
$$\rho^{\gamma}(X_0, X_1, \ldots, X_T, \ldots) = X_0 + \rho_1\left(\gamma X_1 + \rho_2\left(\gamma^2 X_2 + \cdots + \rho_T\left(\gamma^T X_T + \cdots \right)\right)\right), \tag{3.2}$$
where $\rho_1, \rho_2, \ldots$ are all one-step risk measures. Based on representation (3.2), we may define the risk-aware objective for player $i$ to be:
$$J_{s_0}^i(\pi^i, \pi^{-i}) := \rho^i\left(c^i(s_0, a_0) + \gamma \rho^i\left(c^i(s_1, a_1) + \gamma \rho^i\left(c^i(s_2, a_2) + \cdots \right)\right)\right). \tag{3.3}$$
Here we use the same notation $J_{s_0}^i(\pi^i, \pi^{-i})$ for the risk-aware objective as for the risk-neutral one in (2.1). The corresponding best response problem for player $i$ is then:
$$\min_{\pi^i \in \Pi^i} J_{s_0}^i(\pi^i, \pi^{-i}). \tag{3.4}$$
We let $\{\mathcal{I}, \mathcal{S}, \mathcal{A}, P, c, \rho\}$ denote our corresponding risk-aware game with preferences given by the mappings $\{J_s^i(\pi^i, \pi^{-i})\}_{i \in \mathcal{I}}$. This formulation leads to a natural notion of risk-aware equilibrium. If we replace all the $\rho^i$ with expectations $E$ in formulation (3.3), we obtain formulation (2.1), and Problem (3.4) becomes the best response problem of a risk-neutral Markov game. Thus our formulation recovers the risk-neutral game as a special case.
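To make the nested objective (3.3) concrete, the following minimal sketch evaluates it for a fixed stationary multi-strategy by iterating $v(s) \leftarrow \rho(c + \gamma v(S))$ over the states. It is an illustration under stated assumptions, not the paper's algorithm: the two-state data in `outcomes`, the helper names, and the choice of CVaR as the one-step risk measure are all hypothetical. Passing `expectation` as the one-step measure gives the risk-neutral special case (2.1).

```python
# Hypothetical illustration of the nested objective (3.3) for a FIXED
# stationary multi-strategy. All names and numbers are assumptions.

def expectation(values, probs):
    """Risk-neutral one-step measure: plain expected value."""
    return sum(p * x for p, x in zip(probs, values))

def cvar(values, probs, alpha):
    """CVaR at level alpha in [0, 1) of a finite distribution.

    Uses min over eta of eta + E[max(X - eta, 0)] / (1 - alpha); for a
    finite distribution the minimum is attained at one of the atoms.
    """
    def objective(eta):
        tail = sum(p * max(x - eta, 0.0) for p, x in zip(probs, values))
        return eta + tail / (1.0 - alpha)
    return min(objective(eta) for eta in values)

def nested_risk_values(outcomes, gamma, rho, sweeps=2000):
    """Iterate v(s) <- rho(c + gamma * v(next state)) for every state s.

    outcomes[s] lists (probability, stage cost, next state) triples, i.e.
    the law of (cost, next state) induced by the fixed strategies.
    """
    v = [0.0] * len(outcomes)
    for _ in range(sweeps):
        v = [
            rho([cost + gamma * v[nxt] for (_p, cost, nxt) in outs],
                [p for (p, _cost, _nxt) in outs])
            for outs in outcomes
        ]
    return v

# Two states; in state 0 a rare, expensive event leads to state 1.
outcomes = [
    [(0.9, 1.0, 0), (0.1, 10.0, 1)],
    [(0.5, 2.0, 0), (0.5, 2.0, 1)],
]
v_neutral = nested_risk_values(outcomes, 0.9, expectation)
v_cvar = nested_risk_values(outcomes, 0.9, lambda x, p: cvar(x, p, 0.9))
print(v_neutral, v_cvar)
```

Because CVaR dominates the expectation and both updates are monotone, the risk-aware values are never below the risk-neutral ones in this sketch.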

Definition 3.5. (Risk-aware equilibrium in behavioral strategies) A multi-strategy $\pi^* = (\pi^{*,i})_{i \in \mathcal{I}}$ is a risk-aware equilibrium for $\{\mathcal{I}, \mathcal{S}, \mathcal{A}, P, c, \rho\}$ if
$$J_{s_0}^i(\pi^{*,i}, \pi^{*,-i}) \leq J_{s_0}^i(\pi^i, \pi^{*,-i}), \quad \forall \pi^i \in \Pi^i, \; i \in \mathcal{I}.$$

The interpretation of Definition 3.5 is analogous to Definition 2.2 in the risk-neutral case. In a risk-aware equilibrium $\pi^*$, player $i$ cannot reduce his risk (as measured by $J_{s_0}^i(\pi^{*,i}, \pi^{*,-i})$) by deviating from his strategy $\pi^{*,i}$.

4 Risk-aware Markov perfect equilibria

In this section we consider equilibria of the risk-aware game $\{\mathcal{I}, \mathcal{S}, \mathcal{A}, P, c, \rho\}$ in stationary strategies. First we introduce new notation to characterize these equilibria, and then we demonstrate the existence of risk-aware equilibria in stationary strategies. Stationary strategies prescribe a player the same probabilities for his choices each time the player visits a certain state, no matter what route he follows to reach that state. In contrast, a general behavioral strategy may condition its choice of mixed action, at any given stage, on the entire history, so its implementation is often a huge task. Since only as many decision rules as states need be remembered, the memoryless property of stationary strategies conforms to real human behavior (see [60]). In addition, stationary strategies are prevalent in the study of stochastic games due to their mathematical tractability (see [60, 22]).

4.1 Characterization of stationary equilibria

For this discussion, we suppose that each player $i$ has a stationary policy $\pi^i \in \Pi^i$ where $\pi^i = (d^i, d^i, \ldots)$ for a decision rule $d^i$. In every stage of the game, each player $i$ must evaluate the risk of the random variable $c^i(s, A(\pi)) + \gamma v^i(S(\pi))$, where $A(\pi)$ is the random multi-action chosen from $\mathcal{A}$ (which depends on the multi-strategy $\pi$), $v^i$ is some measure of the future risk for player $i$ (to be determined shortly based on recursion (3.2)), and $S(\pi)$ is the random next state visited (which first depends on $\pi$ through the random choice of multi-action $a$, and then depends on the transition kernel $P(s, a)$ after $a \in \mathcal{A}$ is realized). This random variable is defined on the sample space $\mathcal{A} \times \mathcal{S}$, where we write $\mathcal{A}$ before $\mathcal{S}$ to emphasize that the current state $s$ is fixed, then the multi-action $a$ is chosen according to $\pi$, and finally the game transitions to the next state according to $P(s, a)$. The iterated risk measure Eq.
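(3.3) states that we are interested in the stage-wise risk of random variables on $\mathcal{A} \times \mathcal{S}$. We may explicitly determine the distribution of $(A(\pi), S(\pi))$ in terms of $\pi$ and $P$.

To continue, we introduce some simplifying notation to characterize stationary strategies $\pi$ which correspond to decision rules $(d^i)_{i \in \mathcal{I}}$. For each player $i \in \mathcal{I}$ and state $s \in \mathcal{S}$, $x_s^i \in \mathcal{P}(\mathcal{A}^i)$ is the mixed strategy over actions where $x_s^i(a^i) = [d^i(s)](a^i)$ for all $a^i \in \mathcal{A}^i$. We define the strategy $x^i := (x_s^i)_{s \in \mathcal{S}} \in \mathcal{X}^i := \times_{s \in \mathcal{S}} \mathcal{P}(\mathcal{A}^i)$ of player $i$, the multi-strategy $x := (x^i)_{i \in \mathcal{I}} \in \mathcal{X} := \times_{i \in \mathcal{I}} \mathcal{X}^i$ of all players, the complementary strategy $x^{-i} := (x^j)_{j \neq i} \in \mathcal{X}^{-i} := \times_{j \neq i} \mathcal{X}^j$, and the multi-strategy $x_s = (x_s^i)_{i \in \mathcal{I}} \in \mathcal{X}_s := \times_{i \in \mathcal{I}} \mathcal{P}(\mathcal{A}^i)$ for all players in state $s \in \mathcal{S}$. We sometimes write a multi-strategy as $x = (u^i, x^{-i})$ to emphasize player $i$'s strategy. We also introduce the following succinct notation for various probabilities:

• In state $s \in \mathcal{S}$, the probability that an action tuple $a = (a^i)_{i \in \mathcal{I}} \in \mathcal{A}$ is chosen and then the system transitions to state $k \in \mathcal{S}$ is $\left(\prod_{i \in \mathcal{I}} x_s^i(a^i)\right) P(k \mid s, a)$.
• The distribution of $(A(\pi), S(\pi))$ on $\mathcal{A} \times \mathcal{S}$ for every $s \in \mathcal{S}$ is given by the matrix
$$P_s(u, x^{-i}) := \left[ u(a^i) \left( \prod_{j \neq i} x_s^j(a^j) \right) P(k \mid s, a) \right]_{(a, k) \in \mathcal{A} \times \mathcal{S}}, \tag{4.1}$$
where we explicitly denote the dependence on the multi-strategy $x_s = (u, x_s^{-i})$ in state $s$.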
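The joint distribution in (4.1) is a product of the players' mixed actions and the transition kernel, which can be sketched directly. The snippet below is a hypothetical two-player, two-state illustration; the function name `joint_distribution`, the dictionary layout of `kernel`, and all numbers are assumptions made for this example only.

```python
from itertools import product

def joint_distribution(mixed, kernel, s):
    """Distribution of (multi-action, next state) in state s, as in (4.1).

    mixed[j][a_j] is player j's probability of action a_j in state s.
    kernel[(s, a)][k] is the probability of moving to state k after the
    multi-action a is played in state s.
    """
    action_sets = [range(len(x)) for x in mixed]
    dist = {}
    for a in product(*action_sets):
        # Probability that the independent mixed actions produce tuple a.
        weight = 1.0
        for j, a_j in enumerate(a):
            weight *= mixed[j][a_j]
        for k, p_next in enumerate(kernel[(s, a)]):
            dist[(a, k)] = weight * p_next
    return dist

# Two players with two actions each, two states; only state 0 is specified.
kernel = {
    (0, (0, 0)): [0.9, 0.1],
    (0, (0, 1)): [0.2, 0.8],
    (0, (1, 0)): [0.5, 0.5],
    (0, (1, 1)): [0.0, 1.0],
}
mixed = [[0.5, 0.5], [0.25, 0.75]]
dist = joint_distribution(mixed, kernel, 0)
print(dist[((0, 1), 1)])  # u(a^1) * x^2(a^2) * P(k | s, a)
```

To model a deviation $u$ by player $i$, one would replace `mixed[i]` by `u` and keep the other entries fixed, which mirrors the notation $P_s(u, x^{-i})$.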

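For stationary strategies $\pi$, we adopt the convention $J_s^i(u^i, x^{-i}) = J_s^i(\pi^i, \pi^{-i})$ using the above notation. In line with the classical definition of Markov perfect equilibrium in [22], we now define a risk-aware Markov perfect equilibrium.

Definition 4.1. (Risk-aware Markov perfect equilibrium) A multi-strategy $x^* \in \mathcal{X}$ is a risk-aware Markov perfect equilibrium for $\{\mathcal{I}, \mathcal{S}, \mathcal{A}, P, c, \rho\}$ if
$$J_s^i(x^{*,i}, x^{*,-i}) \leq J_s^i(u^i, x^{*,-i}), \quad \forall s \in \mathcal{S}, \; u^i \in \mathcal{X}^i, \; i \in \mathcal{I}. \tag{4.2}$$

In Definition 4.1, each player $i \in \mathcal{I}$ implements a (risk-aware) stationary best response given the stationary complementary strategy $x^{*,-i}$.

4.2 Existence of stationary equilibria

To assess the stage-wise risk on $\mathcal{A} \times \mathcal{S}$, each player needs an estimate of the future risk starting from the next state. This estimate is the value function, which forms part of the description of stationary equilibrium:

• For each player $i$, the value of the stationary strategy $x \in \mathcal{X}$ in state $s \in \mathcal{S}$ is defined to be $v^i(s) := J_s^i(x)$, and $v^i := (v^i(s))_{s \in \mathcal{S}}$ is the entire value function for player $i$ for all states.
• The space of value functions for all players is $\mathcal{V} := \times_{i \in \mathcal{I}} \mathbb{R}^{|\mathcal{S}|}$, and $\mathcal{V}$ is equipped with the supremum norm $\|v\|_\infty := \max_{s \in \mathcal{S}, \, i \in \mathcal{I}} |v^i(s)|$.

Given the value function $v^i$, player $i$ faces the random variable $c^i(s, A) + \gamma v^i(S)$ where $A$ is the random action chosen according to $x_s$ and $S$ is the random next state. Player $i$'s best response in state $s \in \mathcal{S}$ given the complementary strategy $x^{-i}$ may then be expressed as
$$\inf_{u \in \mathcal{P}(\mathcal{A}^i)} \left\{ \rho^i\left(c^i(s, A) + \gamma v^i(S)\right) : (A, S) \sim P_s(u, x^{-i}) \right\}.$$
We see that the mapping $c^i(s, A) + \gamma v^i(S)$ on $\mathcal{A} \times \mathcal{S}$ is fixed, while the players control the distribution $P_s \in \mathcal{P}(\mathcal{A} \times \mathcal{S})$ through their mixed strategies.

To continue, we will be more specific about the form of the risk functions $\{\rho^i\}_{i \in \mathcal{I}}$. Let $\mathcal{L}$ be the set of all random variables on $\mathcal{A} \times \mathcal{S}$ (which are mappings from $\mathcal{A} \times \mathcal{S}$ to $\mathbb{R}$; such random variables are automatically bounded since $\mathcal{A} \times \mathcal{S}$ is finite). Also let $P_s$ denote a general probability distribution on $\mathcal{A} \times \mathcal{S}$ corresponding to current state $s \in \mathcal{S}$ (e.g. we will usually take $P_s$ to be $P_s(u, x^{-i})$). By the Fenchel-Moreau theorem (see e.g.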
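[23, 53, 25]), convex risk measures on $\mathcal{L}$ with respect to the underlying probability distribution $P_s$ have the form:
$$\rho^i(X) = \sup_{\mu \in \mathcal{M}_s^i(P_s)} \left\{ \sum_{(a, s') \in \mathcal{A} \times \mathcal{S}} \mu(a, s') X(a, s') - \alpha_s^i(\mu) \right\}, \quad \forall X \in \mathcal{L},$$
where $\mathcal{M}_s^i(P_s) \subset \mathcal{P}(\mathcal{A} \times \mathcal{S})$ is a set of probability distributions that depends on $P_s$, $\mathcal{M}_s^i(P_s)$ is closed and convex, and $\alpha_s^i : \mathcal{P}(\mathcal{A} \times \mathcal{S}) \to \mathbb{R}$ is a convex function. In our case, evaluation of the risk-to-go depends on computing worst-case expectations of
$$E_{(A, S) \sim \mu}\left[ c^i(s, A) + \gamma v^i(S) \right],$$
as $\mu$ ranges over $\mathcal{M}_s^i(P_s)$. To simplify this expression, we define the shorthand
$$C_s^i(v^i) := \left( c^i(s, a) + \gamma v^i(s') \right)_{(a, s') \in \mathcal{A} \times \mathcal{S}}, \tag{4.3}$$
for the risk-to-go, which depends on $v^i$. Expectation with respect to $\mu \in \mathcal{M}_s^i(P_s)$ may then be written compactly as
$$\langle \mu, C_s^i(v^i) \rangle := \sum_{(a, s') \in \mathcal{A} \times \mathcal{S}} \mu(a, s') \left[ c^i(s, a) + \gamma v^i(s') \right].$$
We next introduce the following assumptions, used throughout this paper, on the risk functions $\rho^i$, the sets of probability distributions $\{\mathcal{M}_s^i(P_s)\}_{s \in \mathcal{S}, \, i \in \mathcal{I}}$ and the functions $\{\alpha_s^i\}_{s \in \mathcal{S}, \, i \in \mathcal{I}}$, that will lead to the existence of stationary equilibrium.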

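Assumption 4.2. (i) All $\rho^i$ are law invariant, i.e. $\rho^i(X) = \rho^i(Y)$ for all $X =_D Y$, where $=_D$ denotes equality in distribution.
(ii) $\{\mathcal{M}_s^i(P_s)\}_{s \in \mathcal{S}, \, i \in \mathcal{I}} \subset \mathcal{P}(\mathcal{A} \times \mathcal{S})$ is a collection of set-valued mappings where the $\mathcal{M}_s^i(P_s)$ are closed and polyhedral convex. Explicitly, we consider
$$\mathcal{M}_s^i(P_s) := \left\{ \mu : A_{s,m}^i \mu + f_m^i(P_s) \geq h_{s,m}^i, \; m = 1, \ldots, M, \; e^T \mu = 1, \; \mu \geq 0 \right\}, \tag{4.4}$$
where $A_{s,m}^i$, $m = 1, \ldots, M$ are matrices, $f_m^i$, $m = 1, \ldots, M$ are linear functions in $P_s$, and $h_{s,m}^i$, $m = 1, \ldots, M$ are constants.
(iii) $\{\alpha_s^i\}_{s \in \mathcal{S}, \, i \in \mathcal{I}} : \mathcal{P}(\mathcal{A} \times \mathcal{S}) \to \mathbb{R}$ is a collection of functions. All $\alpha_s^i$ are convex and Lipschitz continuous.
(iv) For the multi-strategy $x_s = (u, x_s^{-i})$, $(A, S) \sim P_s(u, x^{-i})$, and
$$\rho^i\left(c^i(s, A) + \gamma v^i(S)\right) = \max_{\mu \in \mathcal{M}_s^i(P_s)} \left\{ \langle \mu, C_s^i(v^i) \rangle - \alpha_s^i(\mu) \right\}.$$

The formulation (4.4) shows the dependence of $\mathcal{M}_s^i(P_s)$ on $P_s$. In addition, if $f_m^i$ depends linearly on $P_s$, then $f_m^i$ also depends linearly on $u$ and $x_s^{-i}$ by the definition of $P_s$ in (4.1). In computational terms, Assumption 4.2(iv) is close to [35], which assumes polyhedral uncertainty sets for the transition probabilities in its robust Markov game model. This assumption also corresponds to the one in [20] about the representation of agents' risk preferences.

Under Assumption 4.2(iv), we may write player $i$'s risk function as
$$\psi_s^i(u, x^{-i}, v^i) := \sup_{\mu \in \mathcal{M}_s^i(P_s)} \left\{ \langle \mu, C_s^i(v^i) \rangle - \alpha_s^i(\mu) \right\}, \tag{4.5}$$
which describes the risk-to-go for player $i$ from state $s \in \mathcal{S}$ under stationary strategy $(u, x^{-i})$ with value function $v^i$. A value function corresponds to a risk-aware Markov perfect equilibrium when
$$v^i(s) = \min_{u^i \in \mathcal{X}^i} J_s^i(u^i, x^{-i}), \quad \forall s \in \mathcal{S}, \; i \in \mathcal{I}, \tag{4.6}$$
$$x^i \in \arg\min_{u^i \in \mathcal{X}^i} J_s^i(u^i, x^{-i}), \quad \forall s \in \mathcal{S}, \; i \in \mathcal{I}. \tag{4.7}$$
Eqs. (4.6)-(4.7) together simply restate Eq. (4.2). However, Eqs. (4.6)-(4.7) give a computational recipe that we can encode into an operator on multi-strategies. In particular, we define the operator
$$\Phi(x) := \left\{ \tilde{u} \in \mathcal{X} : \tilde{u}_s^i \in \arg\min_{u \in \mathcal{P}(\mathcal{A}^i)} \psi_s^i(u, x^{-i}, v^i), \; v^i(s) = \min_{u \in \mathcal{P}(\mathcal{A}^i)} \psi_s^i(u, x^{-i}, v^i), \; \forall s \in \mathcal{S}, \; i \in \mathcal{I} \right\}, \tag{4.8}$$
which returns the set of strategies for every player that are best responses to all other players' strategies. In the definition of $\Phi(x)$, the condition $v^i(s) = \min_{u \in \mathcal{P}(\mathcal{A}^i)} \psi_s^i(u, x^{-i}, v^i)$ is understood to hold for every $s \in \mathcal{S}$ and $i \in \mathcal{I}$. We wish to establish existence of a fixed point of $\Phi$, which corresponds to a risk-aware Markov perfect equilibrium. Our proof of existence of such a risk-aware Markov perfect equilibrium draws from [22, 36].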
The main idea is to apply Kakutani's fixed point theorem to show that this correspondence has a fixed point, which coincides with an equilibrium in stationary strategies.

Theorem 4.3. Suppose Assumption 4.2 holds, then $\{\mathcal{I}, \mathcal{S}, \mathcal{A}, P, c, \rho\}$ has an equilibrium in stationary strategies.

Proof. (Proof sketch) In Appendix C, we show that $\Phi$ satisfies the three conditions needed to apply Kakutani's theorem (Theorem C.1):
(i) First, we show that the set $\Phi(x)$ is nonempty and a subset of $\mathcal{X}$. In Lemma C.7, we show that $\psi_s^i(u, x^{-i}, v^i)$ is continuous in all its arguments. In Theorems C.3 and C.4, we then show that $\Phi(x)$ is nonempty and a subset of $\mathcal{X}$.

(ii) Second, we show that the set $\Phi(x)$ is closed and convex. This statement is verified in Lemma C.13.
(iii) Finally, we show that the correspondence $\Phi$ is upper semicontinuous. For each multi-strategy, we have already established an operator $T_x$ (see (C.1)) on the space of value functions, and shown that the operator is a contraction. Thus, there is a unique value function for each player. Let us define a mapping from stationary strategies to value functions via
$$\tau^i(x^{-i}) := \left\{ v^i = (v^i(s))_{s \in \mathcal{S}} : v^i(s) = \min_{u \in \mathcal{P}(\mathcal{A}^i)} \psi_s^i(u, x^{-i}, v^i), \; \forall s \in \mathcal{S} \right\}, \quad i \in \mathcal{I}.$$
Each $\tau^i(x^{-i})$ returns the value function for player $i$ corresponding to a best response to $x^{-i}$. Denote the $s$th element of $\tau^i(x^{-i})$ by $\tau_s^i(x^{-i})$, let $(x^{-i})_n$ be a sequence of mixed strategies of all players satisfying $\lim_{n \to \infty} (x^{-i})_n = x^{-i}$, and let the corresponding value functions for player $i$ be $\tau^i((x^{-i})_n)$. It is shown in Lemma C.10 that if $(x^{-i})_n \to x^{-i}$ and $\tau_s^i((x^{-i})_n) \to v^i(s)$ as $n \to \infty$, then $\tau_s^i(x^{-i}) = v^i(s)$ for any $s \in \mathcal{S}$. Upper semicontinuity of the correspondence $\Phi$ then follows from Lemma C.14 by using the equalities
$$v^i(s) = \psi_s^i(y, x^{-i}, v^i) = \tau_s^i((x^{-i})_n) = \min_{u \in \mathcal{P}(\mathcal{A}^i)} \psi_s^i(u, x^{-i}, v^i),$$
which are derived from the triangle inequality and Lemma C.10.

We note that in a general-sum Markov game (including our risk-aware variant), multiple Markov perfect equilibria may exist. Each distinct equilibrium leads to a different unique risk-aware value function based on Theorems C.3 and C.4.

5 A Q-Learning Algorithm

In this section, we propose a Q-learning type algorithm for computing equilibria of the risk-aware game $\{\mathcal{I}, \mathcal{S}, \mathcal{A}, P, c, \rho\}$: Risk-aware Nash Q-learning (RaNashQL). RaNashQL is simulation-based so it does not require a model for the cost functions $\{c^i\}_{i \in \mathcal{I}}$ or the transition kernel $P$. Our algorithm differs from risk-neutral Q-learning in two respects: (i) it estimates risk with a stochastic approximation type iteration and (ii) the Q-value updates are based on the Nash equilibria of stage games. Here the stage games are collections of Q-value arrays for each player for all multi-actions, which are generated in each iteration of our algorithm. To deal with item (i), we draw from [32, 30, 31] where multiple stochastic approximation instances for both risk estimation and Q-value updates are pasted together.
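To deal with item (ii), we show that the Nash equilibrium mapping for stage games is a non-expansive mapping. Then, we can apply stochastic approximation type analysis to prove the convergence of the algorithm. For this section, we assume that our risk measures $\{\rho^i\}_{i \in \mathcal{I}}$ have a special form as a saddle-point problem to facilitate computation.

Assumption 5.1. (Stochastic saddle-point problem) For all $i \in \mathcal{I}$,
$$\rho^i(X) = \min_{y \in \mathcal{Y}^i} \max_{z \in \mathcal{Z}^i} E_P\left[ G^i(X, y, z) \right], \tag{5.1}$$
where:
(i) $\mathcal{Y}^i \subset \mathbb{R}^{d_1}$ and $\mathcal{Z}^i \subset \mathbb{R}^{d_2}$ are compact, convex with diameters $D_{\mathcal{Y}}$ and $D_{\mathcal{Z}}$, respectively.
(ii) $G^i$ is Lipschitz continuous on $\mathcal{L} \times \mathcal{Y}^i \times \mathcal{Z}^i$ with constant $K_G > 1$.
(iii) $G^i$ is convex in $y \in \mathcal{Y}^i$ and concave in $z \in \mathcal{Z}^i$.
(iv) The subgradients of $G^i$ on $y$ and $z$ are Borel measurable and uniformly bounded for all $X \in \mathcal{L}$.

For our Q-learning algorithm, we specifically focus on risk measures that can be estimated by solving a stochastic saddle-point problem such as Problem (5.1). The following result, based on [31, Theorem 3.2], gives special conditions on $G^i$ for the corresponding risk function $\rho^i$ in (5.1) to have additional structure.

Theorem 5.2. Suppose there is a collection of functions $\{h_z\}_{z \in \mathcal{Z}^i}$ such that: (i) $h_z$ is $P$-square summable for every $y \in \mathcal{Y}^i$, $z \in \mathcal{Z}^i$; (ii) $y \mapsto h_z(X - y)$ is convex; (iii) $z \mapsto h_z(X - y)$ is concave; and (iv) $G^i : \mathcal{L} \times \mathcal{Y}^i \times \mathcal{Z}^i \to \mathbb{R}$ is given by $G^i(X, y, z) = y + h_z(X - y)$, then the minimax risk measure (5.1) is a convex risk measure.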

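We now give some examples of functions $\{h_z\}_{z \in \mathcal{Z}^i}$ satisfying the conditions of Theorem 5.2 such that the corresponding risk-aware Markov perfect equilibrium exists.

Example 5.3. The distance between any probability distribution and a reference distribution may be measured by a $\phi$-divergence function; several examples of $\phi$-divergence functions are shown in Table 3 in Appendix A. We can, in principle, approximate convex $\phi$-divergence functions with piecewise linear convex functions of the form $\hat{\phi}(\mu) = \max_{j \in J} \{ \langle d_j, \mu \rangle + g_j \}$. Using the above form of $\hat{\phi}$, we may then define a corresponding set of probability distributions:
$$\mathcal{M}_s^i(P_s) = \left\{ \mu : \mu = P_s \circ \xi, \; B \mu = e, \; \mu \geq 0, \; B\left(P_s \circ (d_j \xi + g_j)\right) \leq \alpha^i e, \; j \in J \right\}, \tag{5.2}$$
for constants $\alpha^i \in (0, 1)$ for all $i \in \mathcal{I}$. Based on [50, Lemma 1], the risk measure corresponding to (5.2) has the form
$$\rho^i(X) = \inf_{b \geq 0, \, \eta \in \mathbb{R}} \left\{ \eta + b \alpha^i + b E_{P_s}\left[ \phi^*\left( \frac{X - \eta}{b} \right) \right] \right\}, \tag{5.3}$$
where $\phi^*$ is the convex conjugate of $\hat{\phi}$.

(i) Let $\phi_z$ denote a family of $\phi$-divergence functions parameterized by $z \in \mathcal{Z}^i$ that is concave in $z$, and let $\hat{\phi}_z$ and $\phi_z^*$ denote the corresponding piecewise linear approximation and its convex conjugate, respectively. Then, we may define
$$\mathcal{M}_z := \left\{ \mu : \mu = P_s \circ \xi, \; B \mu = e, \; \mu \geq 0, \; B\left(P_s \circ \hat{\phi}_z(\xi)\right) \leq \alpha^i e \right\}, \tag{5.4}$$
and the risk measure corresponding to $\bigcup_{z \in \mathcal{Z}^i} \mathcal{M}_z$ is
$$\rho^i(X) = \inf_{b \geq 0, \, \eta \in \mathbb{R}} \max_{z \in \mathcal{Z}^i} \left\{ \eta + b \alpha^i + b E_{P_s}\left[ \phi_z^*\left( \frac{X - \eta}{b} \right) \right] \right\}. \tag{5.5}$$
Suppose we choose $h_z$ (from Theorem 5.2) to be
$$\frac{h_z(X - \eta)}{b} = \phi_z^*\left( \frac{X - \eta}{b} \right),$$
for any $\eta \in \mathbb{R}$ and $b > 0$. Assume $X$ has bounded support $[\eta_{\min}, \eta_{\max}]$, then (5.5) becomes
$$\rho^i(X) = \min_{\eta \in [\eta_{\min}, \eta_{\max}]} \max_{z \in \mathcal{Z}^i} \left\{ \eta + E_{P_s}\left[ h_z(X - \eta) \right] \right\},$$
which conforms to the minimax structure in Eq. (5.1).

(ii) To recover CVaR, we let $\alpha^i \in (0, 1)$ for all $i \in \mathcal{I}$ and choose the $\phi$-divergence function
$$\phi(x) = \begin{cases} 0, & 0 \leq x \leq \frac{1}{1 - \alpha^i}, \\ \infty, & \text{otherwise}, \end{cases}$$
and we take
$$\mathcal{M}_s^i(P_s) = \left\{ \mu : \mu = P_s \circ \xi, \; B \mu = e, \; \mu \geq 0, \; 0 \leq \mu \leq \frac{P_s}{1 - \alpha^i} \right\}. \tag{5.6}$$
If we take the convex conjugate of this $\phi$-divergence function and substitute it into Eq. (5.3), we obtain
$$\rho^i(X) = \min_{\eta \in [\eta_{\min}, \eta_{\max}]} \left\{ \eta + (1 - \alpha^i)^{-1} E_{P_s}\left[ \max\{X - \eta, 0\} \right] \right\},$$
corresponding to $h_z(X - \eta) = (1 - \alpha^i)^{-1} \max\{X - \eta, 0\}$ for all $z \in \mathcal{Z}^i$.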
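The CVaR case can be checked numerically on a small finite distribution. The sketch below compares the minimization over $\eta$ with a worst-case expectation over reweighted distributions bounded by $P_s / (1 - \alpha^i)$, in the spirit of (5.6). It is an illustration only: the function names and the three-point distribution are assumptions, and the greedy solution of the dual problem is a standard shortcut for this box-constrained linear program rather than anything prescribed by the paper.

```python
def cvar_primal(values, probs, alpha):
    """min over eta of eta + E[max(X - eta, 0)] / (1 - alpha).

    For a finite distribution the objective is piecewise linear and convex
    in eta, so it is enough to search over the atoms of X.
    """
    def objective(eta):
        tail = sum(p * max(x - eta, 0.0) for p, x in zip(probs, values))
        return eta + tail / (1.0 - alpha)
    return min(objective(eta) for eta in values)

def cvar_dual(values, probs, alpha):
    """max of sum(mu * X) over 0 <= mu <= P / (1 - alpha), sum(mu) = 1.

    Solved greedily: put as much mass as the cap allows on the largest
    outcomes first, until the total mass reaches one.
    """
    remaining = 1.0
    total = 0.0
    for x, p in sorted(zip(values, probs), reverse=True):
        mass = min(p / (1.0 - alpha), remaining)
        total += mass * x
        remaining -= mass
        if remaining <= 0.0:
            break
    return total

values = [1.0, 2.0, 10.0]
probs = [0.5, 0.4, 0.1]
print(cvar_primal(values, probs, 0.8), cvar_dual(values, probs, 0.8))
```

Both functions return (up to rounding) the average of the worst 20% of outcomes for this example, which illustrates why the two descriptions of CVaR are interchangeable.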

5.1 Risk-aware Nash Q-learning algorithm

The RaNashQL updates are based on future equilibrium costs (which depend on all players). In contrast, single-agent Q-learning updates are only based on the player's own costs. Thus, to predict equilibrium losses, every player must maintain and update a model for all other players' costs and risks. Let $x^*$ be a stationary equilibrium, and let $v^* = (v^{*,i})_{i \in \mathcal{I}}$ be the value functions corresponding to $x^*$ defined in (4.6). Then, we let
$$Q^{*,i}(s, a) := \min_{y \in \mathcal{Y}^i} \max_{z \in \mathcal{Z}^i} E_{P(s, a)}\left\{ G^i\left( c^i(s, a) + \gamma v^{*,i}(S), y, z \right) \right\}, \quad \forall (s, a) \in \mathcal{S} \times \mathcal{A}, \; i \in \mathcal{I}, \tag{5.7}$$
denote the Q-values corresponding to $x^*$ and its value function $v^*$. In the case of multiple equilibria, different Nash strategy profiles may have different equilibrium Q-values.

A stage game is, informally, a one shot game. In a multi-agent Q-learning algorithm, the agents are essentially playing a sequence of stage games where the payoffs are the current Q-values. In the following definition, we abuse notation and momentarily drop the dependence on the state $s \in \mathcal{S}$.

Definition 5.4. (i) A stage game is a collection $(C^i)_{i \in \mathcal{I}}$ of players' costs $C^i := \{c^i(a) : a \in \mathcal{A}\}$ for cost functions $c^i : \mathcal{A} \to \mathbb{R}$.
(ii) For $x = (x^i)_{i \in \mathcal{I}}$ where $x^i \in \mathcal{P}(\mathcal{A}^i)$ for all $i \in \mathcal{I}$, $C^i(x^i, x^{-i}) := \sum_{a \in \mathcal{A}} \left( \prod_{j \in \mathcal{I}} x^j(a^j) \right) c^i(a)$ is the expected cost to player $i$.
(iii) A multi-strategy $x^* = (x^{*,i})_{i \in \mathcal{I}}$ constitutes a Nash equilibrium for the stage game $(C^i)_{i \in \mathcal{I}}$ if $C^i(x^{*,i}, x^{*,-i}) \leq C^i(u, x^{*,-i})$, $\forall u \in \mathcal{P}(\mathcal{A}^i)$, $i \in \mathcal{I}$, i.e. if no player can reduce his expected cost by deviating from $x^*$.
(iv) Let $x^* = (x^{*,i})_{i \in \mathcal{I}}$ denote a Nash equilibrium of the stage game $(C^i)_{i \in \mathcal{I}}$. Then for all $i \in \mathcal{I}$,
$$\text{Nash}^i\left( (C^j)_{j \in \mathcal{I}} \right) := C^i(x^{*,i}, x^{*,-i})$$
is player $i$'s expected cost in this equilibrium.

Remark 5.5. There are several methods for computing Nash equilibria of stage games. The Lemke-Howson algorithm for two player (bimatrix) games is proposed in [41]. This algorithm is efficient in practice, yet, in the worst case the number of pivot operations may be exponential in the number of the game's pure strategies. Recently, [44] gives an algorithm for two player games that achieves polynomial-time complexity. Polynomial-time approximation methods, such as [16, 27, 45], have been proposed for general sum games with more than two players.

Now, we describe the specifics of the stage games in each round for our risk-aware setting.
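In each state $s \in \mathcal{S}$, the corresponding stage game is the collection $(Q^i(s))_{i \in \mathcal{I}}$, where $Q^i(s) := \{Q^i(s, a) : a \in \mathcal{A}\}$ is the array of Q-values for player $i$ for all multi-actions. Let $x_s^*$ be an equilibrium of the stage game $(Q^i(s))_{i \in \mathcal{I}}$, then
$$\text{Nash}^i\left( (Q^j(s))_{j \in \mathcal{I}} \right) := \sum_{a \in \mathcal{A}} \left( \prod_{j \in \mathcal{I}} x_s^{*,j}(a^j) \right) Q^i(s, a), \quad i \in \mathcal{I},$$
gives each player's corresponding expected cost in state $s$ (with respect to the Q-values).

To open the discussion of our algorithm, we first summarize the main steps of Q-learning for risk-neutral Markov games as developed in [29]. Let $\{\theta_t\}_{t \geq 0}$ denote the step-sizes for updating the Q-values. For every iteration $t \geq 0$ and player $i \in \mathcal{I}$:

1. Player $i$ observes the current state $s_t$ and then chooses $a_t^i \in \mathcal{A}^i$.
2. Player $i$ observes its own cost $c^i(s_t, a_t)$, the actions taken by all other players $(a_t^j)_{j \neq i}$, the other players' costs $\{c^j(s_t, a_t)\}_{j \neq i}$, and lastly the next state $s_{t+1}$ after the transition.
3. Player $i$ computes $\text{Nash}^i\left( (Q_t^j(s_{t+1}))_{j \in \mathcal{I}} \right)$.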

4. Player $i$ updates its Q-value according to
$$Q_{t+1}^i(s_t, a_t) = (1 - \theta_t) Q_t^i(s_t, a_t) + \theta_t \left[ c^i(s_t, a_t) + \gamma \, \text{Nash}^i\left( (Q_t^j(s_{t+1}))_{j \in \mathcal{I}} \right) \right].$$

RaNashQL builds upon the algorithm in [29] for the risk-aware case. It is an asynchronous algorithm based on two loops: an outer loop for updating the Q-values (as in the risk-neutral case) and a new inner loop for estimating risk (which is unique to our setting). The details of RaNashQL are given in Algorithm 1.

Algorithm 1 Risk-aware Nash Q-learning
  Step 0) Initialize: Let $n = 1$ and $t = 1$, get the initial state $s_1$. Let the learning agents be indexed by $i$. For all $s \in \mathcal{S}$, $a \in \mathcal{A}$ and $j \in \mathcal{I}$, let $Q_{n,t}^j(s, a) = 0$.
  For $n = 1, \ldots, N$ do
    Step 1) Choose $a_n^i$ based on the exploration policy $\pi$. Observe the actions and costs for all players, then observe a new state;
    For $t = 1, \ldots, T$ do
      Step 2) Compute the Nash Q-values; compute the risk-aware cost-to-go for all players;
      Step 3) Update each $Q_{n,t}^i$, $i \in \mathcal{I}$ using stochastic approximation;
      Step 4) Stochastic approximation of risk measures by SASP;
    end for
  end for
  Return approximated Q-values $Q_{N,T}^i$, $i \in \mathcal{I}$.

In Algorithm 1:

• $N$ is the number of iterations in the outer loop.
• $T$ is the number of iterations in the inner loop.
• $(n, t)$ is an epoch, corresponding to iteration $n$ of the outer loop and iteration $t$ of the inner loop.
• $Q_{n,t}^i(s, a)$ is the Q-value estimate for player $i$ for state-action pair $(s, a) \in \mathcal{K}$ in epoch $n \leq N$ and $t \leq T$.
• $\beta \in (1/2, 1]$ is the learning rate; $\beta = 1$ is a linear learning rate and $\beta \in (1/2, 1)$ is a polynomial learning rate.
• $\{\theta_\beta^n\}_{n \geq 1}$ are the step-sizes for the Q-learning update in the outer loop, for learning rate $\beta$.
• $\{\lambda_{t,\alpha}\}_{t \geq 1}$ are the step-sizes for the risk estimation in the inner loop; we take $\lambda_{t,\alpha} = C t^{-\alpha}$ for $C > 0$ and $\alpha \in (0, 1]$.
• $\tau : [1, n] \to [1, n]$ is a function satisfying $\tau(n) \in [1, n]$.
• $H_{\mathcal{Y}}$ and $H_{\mathcal{Z}}$ are positive parameters.
• $\left( y_{n,t}^i(s, a), z_{n,t}^i(s, a) \right) \in \mathcal{Y}^i \times \mathcal{Z}^i$ is the estimate of the saddle point of the minimax problem (5.1) which represents player $i$'s risk, corresponding to state-action pair $(s, a) \in \mathcal{K}$, in epoch $(n, t)$.
• $\pi$ is an exploration policy where, in each state $s \in \mathcal{S}$, all multi-actions $a \in \mathcal{A}$ have positive probability of being sampled.

We give further details on each step of Algorithm 1 as follows. We will shortly require
$$\Pi_{\mathcal{Y}^i \times \mathcal{Z}^i}[(y, z)] := \arg\min_{(y', z') \in \mathcal{Y}^i \times \mathcal{Z}^i} \|(y, z) - (y', z')\|_2,$$
i.e., the Euclidean projection onto $\mathcal{Y}^i \times \mathcal{Z}^i$.

Step 0: Initialization:

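Step 0a: Initialize all Q-values $Q^i_{1,1}(s, a)$ for all $(s, a) \in K$ and $i \in I$;
Step 0b: Initialize $(y^i_{0,t}(s, a), z^i_{0,t}(s, a))$ for all $t \le T$, $(s, a) \in K$, and $i \in I$.
Step 1: For all $(s, a) \in K$ and $i \in I$, set $(y^i_{n,1}(s, a), z^i_{n,1}(s, a)) = (y^i_{n-1,T}(s, a), z^i_{n-1,T}(s, a))$ and $Q^i_{n,1}(s, a) = Q^i_{n-1,T}(s, a)$.
Step 2: All agents observe the current state $s^n_t$:
Step 2a: Generate an action $a^n$ from policy $\pi$ (which gives some positive probability to all actions);
Step 2b: Observe actions $a^n = (a^n_i)_{i \in I}$, costs $\{c^i(s^n_t, a^n)\}_{i \in I}$, and next state $s^n_{t+1} \sim P(\cdot \mid s^n_t, a^n)$.
Step 3: Compute Nash Q-values $v^i_{n-1}(s^n_{t+1}) = \mathrm{Nash}^i\,(Q^j_{n-1,T}(s^n_{t+1}))_{j \in I}$ for all $i \in I$:
Step 3a: Compute
\[ \hat{q}^i_{n,t}(s^n_t, a^n) = G\big( c^i(s^n_t, a^n) + \gamma\, v^i_{n-1}(s^n_{t+1}),\ \bar{y}^i_{n,t}(s^n_t, a^n),\ \bar{z}^i_{n,t}(s^n_t, a^n) \big), \tag{5.8} \]
and
\[ \big( \bar{y}^i_{n,t}(s^n_t, a^n),\ \bar{z}^i_{n,t}(s^n_t, a^n) \big) = \frac{1}{t - \tau(t) + 1} \sum_{\tau = \tau(t)}^{t} \big( y^i_{n,\tau}(s^n_t, a^n),\ z^i_{n,\tau}(s^n_t, a^n) \big), \tag{5.9} \]
for all $i \in I$. This step observes a new state and computes the estimated Q-value $\hat{q}^i_{n,t}$;
Step 3b: Use the $Q^i_{n-1,T}$ from the last stage for estimation, rather than recording all the estimated Q-values for each $t \le T$ at stage $n - 1$.
Step 4: For all $(s, a) \in K$ and $i \in I$, compute
\[ Q^i_{n,t}(s, a) = \big( 1 - \theta^n_\beta(s, a) \big)\, Q^i_{n-1,T}(s, a) + \theta^n_\beta(s, a)\, \hat{q}^i_{n,t}(s^n_t, a^n). \tag{5.10} \]
This update is the same as in standard Q-learning w.r.t. the outer loop.
Step 5: Update
\[ \big( y^i_{n,t+1}(s^n_t, a^n),\ z^i_{n,t+1}(s^n_t, a^n) \big) = \Pi_{Y \times Z} \Big\{ \big( y^i_{n,t}(s^n_t, a^n),\ z^i_{n,t}(s^n_t, a^n) \big) - \lambda_{t,\alpha}\, \psi\big( c^i(s^n_t, a^n) + \gamma\, v^i_{n-1}(s^n_{t+1}),\ y^i_{n,t}(s^n_t, a^n),\ z^i_{n,t}(s^n_t, a^n) \big) \Big\}, \tag{5.11} \]
for all $i \in I$, where
\[ \psi\big( c^i(s^n_t, a^n) + \gamma\, v^i_{n-1}(s^n_{t+1}),\ y^i_{n,t}(s^n_t, a^n),\ z^i_{n,t}(s^n_t, a^n) \big) = \begin{pmatrix} H_Y\, G_y\big( c^i(s^n_t, a^n) + \gamma\, v^i_{n-1}(s^n_{t+1}),\ y^i_{n,t}(s^n_t, a^n),\ z^i_{n,t}(s^n_t, a^n) \big) \\ -H_Z\, G_z\big( c^i(s^n_t, a^n) + \gamma\, v^i_{n-1}(s^n_{t+1}),\ y^i_{n,t}(s^n_t, a^n),\ z^i_{n,t}(s^n_t, a^n) \big) \end{pmatrix}. \tag{5.12} \]
This is the risk estimation step; it updates the current iterate of the risk corresponding to each selected state-action pair.

In Steps (5.9), (5.11) and (5.12), we use the stochastic approximation for saddle-point problems (SASP) algorithm [47, Algorithm 2.1]. Classical stochastic approximation may result in extremely slow convergence for degenerate objectives (i.e. when the objective has a singular Hessian).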
However, the SASP algorithm with a properly chosen parameter $\alpha \in (0, 1]$ preserves a reasonable (close to $O(n^{-1/2})$) convergence rate, even when the objective is non-smooth and/or degenerate. Thus, SASP is a robust choice for solving our saddle-point problem (5.1).
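The projected update (5.11) with the step-sizes $\lambda_{t,\alpha} = C t^{-\alpha}$ and the windowed average (5.9) can be illustrated on a toy problem. The following is only a minimal sketch under stated assumptions, not the SASP algorithm of [47] itself: the objective $f(y, z) = y^2/2 + yz - z^2/2$ (saddle point at the origin), the Gaussian gradient noise, the box constraint sets, and the window rule `tau` are all hypothetical choices made for illustration.

```python
import random


def project(value, bound):
    """Euclidean projection of a scalar onto the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


def tau(t):
    """Hypothetical window start: average over roughly the last half of the iterates."""
    return t // 2 + 1


def sasp_toy(num_steps=20000, C=0.5, alpha=0.75, H_Y=1.0, H_Z=1.0,
             bound=5.0, noise=0.5, seed=0):
    """Projected stochastic descent in y / ascent in z with iterate averaging.

    Toy objective (an assumption): f(y, z) = y^2/2 + y*z - z^2/2,
    minimized over y and maximized over z, observed through noisy gradients.
    """
    rng = random.Random(seed)
    y, z = 3.0, -2.0
    iterates = []
    for t in range(1, num_steps + 1):
        iterates.append((y, z))
        grad_y = y + z + rng.gauss(0.0, noise)   # noisy partial derivative in y
        grad_z = y - z + rng.gauss(0.0, noise)   # noisy partial derivative in z
        step = C * t ** (-alpha)                 # lambda_{t,alpha} = C * t^(-alpha)
        # psi = (H_Y * grad_y, -H_Z * grad_z); the iterate moves along -step * psi.
        y = project(y - step * H_Y * grad_y, bound)
        z = project(z + step * H_Z * grad_z, bound)
    window = iterates[tau(num_steps) - 1:]       # iterates tau(T), ..., T
    y_bar = sum(p[0] for p in window) / len(window)
    z_bar = sum(p[1] for p in window) / len(window)
    return y_bar, z_bar


if __name__ == "__main__":
    print(sasp_toy())
```

Averaging over a trailing window, rather than returning the last iterate, is what lets the slowly decaying step-sizes be used without the noise dominating the estimate.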

Game 1    Left     Right
Up        0, 1     10, 7
Down      7, 10    11, 8

Game 2    Left     Right
Up        5, 5     10, 4
Down      4, 10    8, 8

Game 3    Left     Right
Up        0, 1     10, 9
Down      7, 10    8, 8

Table 1: Examples of Stage Games

5.2 Almost sure convergence analysis

We now state the main convergence result for RaNashQL. We would like to show convergence of $Q^i_{n,t}$ to the risk-aware equilibrium Q-values $Q^i_*$ for all players. We are interested in the following special types of Nash equilibria, which play a major role in our analysis as in [29].

Definition 5.6.
(i) [29, Definition 12] A multi-strategy $x \in X$ is a global optimal point of $(C^i)_{i \in I}$ if every player minimizes his expected cost at $x$:
\[ C^i(x) \le C^i(x'), \quad \forall x' \in X,\ i \in I. \]
(ii) [29, Definition 13] A multi-strategy $x \in X$ is a saddle point of $(C^i)_{i \in I}$ if 1) it is a Nash equilibrium and 2) each player would receive a lower expected cost if at least one of the other players deviates:
\[ C^i(x^i, x^{-i}) \ge C^i(x^i, u^{-i}), \quad \forall u^{-i} \in X^{-i},\ i \in I. \]
(iii) A multi-strategy $x \in X$ is an $I'$-mixed point of $(C^i)_{i \in I}$ if 1) it is a Nash equilibrium and 2) there exists an index set of players $I' \subseteq I$ such that:
\[ C^i(x) \le C^i(x'), \quad \forall x' \in X,\ i \in I', \]
and
\[ C^i(x^i, x^{-i}) \ge C^i(x^i, u^{-i}), \quad \forall u^{-i} \in X^{-i},\ i \in I \setminus I'. \]

A global optimal point is always a Nash equilibrium, and it can be shown that all global optima have equal values. Additionally, [29, Lemma 14] shows that all saddle points of stage games have equal values. Our definition of $I'$-mixed points is new; it combines the preceding two ideas. In Definition 5.6(iii), a subset of players $I' \subseteq I$ minimize their expected costs at $x$. The rest of the players $I \setminus I'$ each would receive a lower expected cost when at least one of the other players deviates.

Example 5.7. We give an example of $I'$-mixed points in Table 1. Player 1 has choices Up and Down, and Player 2 has choices Left and Right. Player 1's loss is the first entry in each cell, and Player 2's is the second. The first game has a unique Nash equilibrium (0, 1), which is a global optimal point. The second game also has a unique Nash equilibrium (8, 8), which is a saddle point. The third game has two Nash equilibria: a global optimum (0, 1), and a mixed point (8, 8). In equilibrium (8, 8), Player 1 receives a lower cost if Player 2 deviates, while Player 2 receives a higher cost if Player 1 deviates.

We introduce the following assumption for our analysis of RaNashQL.

Assumption 5.8.
One of the following holds for all stage games $(Q^i_{n,t}(s))_{i \in I}$ for all $n$ and $s \in S$ in Algorithm 1.
Condition A. Every $(Q^i_{n,t}(s))_{i \in I}$ for all $n$ and $s \in S$ has a global optimal point.
Condition B. Every $(Q^i_{n,t}(s))_{i \in I}$ for all $n$ and $s \in S$ has a saddle point.
Condition C. For any two stage games $Q_1, Q_2 \in \{(Q^i_{n,t}(s))_{i \in I}\}$ for all $n$ and $s \in S$, we suppose $Q_1$ has an $I_1$-mixed point x and $Q_2$ has an $I_2$-mixed point x. Then: (i) For I 1 I\I 2 ), then Q x) Q x); (ii) For I 2 I\I 1 ), then Q x) Q x).

Condition C simply states that there are $I'$-mixed points for any two stage games. Compared with [29, Assumption 3], Condition C in Assumption 5.8 enables wider application of RaNashQL, even if the index sets $I_1$ and $I_2$ of the stage games $(Q^i_{n,t}(s))_{i \in I}$ differ.
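The classification of the stage games in Table 1 (Example 5.7) can be checked by brute-force enumeration. The sketch below is illustrative only: it looks at pure strategies alone (mixed equilibria are ignored), encodes Up/Down as rows 0/1 and Left/Right as columns 0/1, and the helper names `pure_nash`, `is_global_optimal` and `is_saddle` are hypothetical. The inequalities follow Definition 5.6(i) and (ii) with costs being minimized.

```python
# Cost bimatrices from Table 1: cell (row, col) -> (cost of Player 1, cost of Player 2).
GAMES = {
    "Game 1": {(0, 0): (0, 1), (0, 1): (10, 7), (1, 0): (7, 10), (1, 1): (11, 8)},
    "Game 2": {(0, 0): (5, 5), (0, 1): (10, 4), (1, 0): (4, 10), (1, 1): (8, 8)},
    "Game 3": {(0, 0): (0, 1), (0, 1): (10, 9), (1, 0): (7, 10), (1, 1): (8, 8)},
}


def pure_nash(costs):
    """Cells where no player can lower his own cost by deviating unilaterally."""
    equilibria = []
    for (r, c) in sorted(costs):
        row_ok = all(costs[(r, c)][0] <= costs[(r2, c)][0] for r2 in (0, 1))
        col_ok = all(costs[(r, c)][1] <= costs[(r, c2)][1] for c2 in (0, 1))
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


def is_global_optimal(costs, cell):
    """Every player attains his minimum cost over all cells at `cell`."""
    return all(costs[cell][i] <= costs[other][i] for other in costs for i in (0, 1))


def is_saddle(costs, cell):
    """Nash equilibrium where each player's cost can only drop if the other deviates."""
    r, c = cell
    p1_ok = all(costs[(r, c)][0] >= costs[(r, c2)][0] for c2 in (0, 1))
    p2_ok = all(costs[(r, c)][1] >= costs[(r2, c)][1] for r2 in (0, 1))
    return cell in pure_nash(costs) and p1_ok and p2_ok


if __name__ == "__main__":
    for name, costs in GAMES.items():
        for cell in pure_nash(costs):
            print(name, cell, costs[cell],
                  "global" if is_global_optimal(costs, cell) else "",
                  "saddle" if is_saddle(costs, cell) else "")
```

In Game 3 the equilibrium with costs (8, 8) is neither a global optimal point nor a saddle point under these two checks, which is the situation the mixed-point notion is meant to cover.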

Remark 5.9. (i) Implementation of RaNashQL is complicated by the fact that there might be multiple Nash equilibria for a stage game. In RaNashQL, we choose a unique Nash equilibrium either based on its expected loss, or based on the order in which it is ranked in a list of solutions. Such an order is determined by the action sequence, which has little to do with the equilibrium conditions. (ii) For a two-player game, we calculate Nash equilibria using the Lemke-Howson method (see [41]), which can generate equilibria in a certain order.

We now list the necessary definitions and assumptions for our algorithm, most of which are standard in the stochastic approximation literature. We first define a probability space $(\Omega, \mathcal{G}, P)$ where $\mathcal{G} = \sigma\{(s^n_t, a^n),\ n \le N,\ t \le T\}$, and the filtration
\[ \mathcal{G}^n_t = \sigma\big\{ \{(s^i_\tau, a^i_\tau),\ i < n,\ \tau \le T\} \cup \{(s^n_\tau, a^n_\tau),\ \tau \le t\} \big\}, \]
for $t \le T$ and $n \le N$, with $\mathcal{G}^0_t = \{\emptyset, \Omega\}$ for all $t \le T$. This filtration is nested: $\mathcal{G}^n_t \subseteq \mathcal{G}^n_{t+1}$ for $1 \le t \le T - 1$ and $\mathcal{G}^n_T \subseteq \mathcal{G}^{n+1}_0$. The following assumption reflects our exploration requirements.

Assumption 5.10 ($\varepsilon$-greedy policy). There is an $\varepsilon > 0$ such that the policy $\pi$ satisfies, for any $n \le N$, $t \le T$, and all $(s, a) \in K$,
\[ P\big( (s^n_t, a^n) = (s, a) \mid \mathcal{G}^n_{t-1} \big) \ge \varepsilon \quad \text{and} \quad P\big( (s^n_1, a^n) = (s, a) \mid \mathcal{G}^{n-1}_T \big) \ge \varepsilon. \]
In particular, let $x \in X$ be a Nash equilibrium of the stage game $(Q^i(s))_{i \in I}$. Then, with probability $\varepsilon \in (0, 1)$, the action $a$ is chosen randomly from $A$, and with probability $1 - \varepsilon$, the action $a$ is drawn from $A$ according to $x$.

Assumption 5.10 guarantees, by the Extended Borel-Cantelli Lemma in [13], that we will visit every state-action pair infinitely often with probability one. This assumption balances exploration and exploitation in RaNashQL and in Q-learning more generally. The next assumption contains our requirements on the step-sizes for the Q-value update.

Assumption 5.11. For all $(s, a) \in K$ and for all $n \le N$, $t \le T$, the step-sizes for the Q-value update satisfy: $\sum_{n=1}^{\infty} \theta^n_\beta(s, a) = \infty$ and $\sum_{n=1}^{\infty} \theta^n_\beta(s, a)^2 < \infty$ for all $t \le T$ and $(s, a) \in K$ a.s. Let $\#(s, a, n)$ denote one plus the number of times, until the beginning of iteration $n$, that the state-action pair $(s, a)$ has been visited, and let $N^{s,a}$ denote the set of outer iterations where action $a$ was performed in state $s$. The step-sizes $\theta^n_\beta(s, a)$ satisfy $\theta^n_\beta(s, a) = 1 / [\#(s, a, n)]^\beta$ if $n \in N^{s,a}$ and $\theta^n_\beta(s, a) = 0$ otherwise.
Assumption 5.11 reflects the asynchronous nature of the Q-learning algorithm as stated in [19]: only a single state-action pair is updated when it is observed in each iteration. Our main convergence result for Algorithm 1 is next.

Theorem. Suppose Assumptions 5.10, 5.11, and 5.8 hold. For any $T \ge 1$, there exists $0 < \gamma \le 1/K_G$ such that Algorithm 1 generates a sequence $\{Q^i_{n,T}\}_{n \ge 1}$ with $Q^i_{n,T} \to Q^i_*$ almost surely as $n \to \infty$ for all $i \in I$.

Proof. (Proof sketch) The complete steps of the proof are presented in Appendix D.
(i) In Lemma D.2, we show that all mixed points of a stage game have equal value. Consequently, in Lemma D.3, we show that the mapping from Q-values to Nash equilibrium (of the stage game) is non-expansive.
(ii) We show in Lemma D.1 that the Hausdorff distance between the two subdifferentials (corresponding to the representation of $\rho$ in (5.1))
\[ S_{n,t} := \{ G_y(c + \gamma v_{n-1}, y_{n,t}, z_{n,t}),\ G_z(c + \gamma v_{n-1}, y_{n,t}, z_{n,t}) \} \]
and
\[ S^*_{n,t} := \{ G_y(c + \gamma v_*, y_{n,t}, z_{n,t}),\ G_z(c + \gamma v_*, y_{n,t}, z_{n,t}) \} \]
is bounded by a function of $\| Q_{n-1,T} - Q_* \|_2$.
(iii) Based on parts (i) and (ii), we show that the gaps of all the saddle point estimation problems, $\| (y_{n,t}, z_{n,t}) - (y_{n,*}, z_{n,*}) \|_2^2$ and $\| (y_*, z_*) - (y_{n,*}, z_{n,*}) \|_2^2$, are bounded by functions of $\| Q_{n-1,T} - Q_* \|_2^2$ and $\| Q_* \|_2^2$, respectively, in Lemmas D.4 and D.5.
(iv) Finally, based on parts (i)-(iii), we establish that the Q-values in RaNashQL are a well-behaved stochastic approximation sequence (see e.g. [19, Definition 7]). Almost sure convergence of $Q^i_{n,T} \to Q^i_*$ as $n \to \infty$ for all $i \in I$ then follows.
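As a small illustration of the exploration and step-size requirements in Assumptions 5.10 and 5.11, the sketch below keeps a visit counter per state-action pair and returns $1/[\#(s, a, n)]^\beta$ on each visit, and samples an action uniformly with probability $\varepsilon$ and from a given equilibrium distribution otherwise. The class and function names, and the dictionary representation of the equilibrium strategy, are hypothetical and chosen only for this example.

```python
import random
from collections import defaultdict


class PolynomialStepSize:
    """Asynchronous step-sizes theta = 1 / #(s, a, n)^beta with beta in (1/2, 1]."""

    def __init__(self, beta):
        if not 0.5 < beta <= 1.0:
            raise ValueError("beta must lie in (1/2, 1]")
        self.beta = beta
        self.visits = defaultdict(int)   # number of earlier visits per (state, action)

    def next_step(self, state, action):
        """Step-size for the pair being updated now; unvisited pairs are not touched."""
        count = self.visits[(state, action)] + 1   # one plus the number of earlier visits
        self.visits[(state, action)] += 1
        return 1.0 / count ** self.beta


def epsilon_greedy(equilibrium_probs, epsilon, rng):
    """With probability epsilon pick uniformly; otherwise sample from the equilibrium."""
    actions = list(equilibrium_probs)
    if rng.random() < epsilon:
        return rng.choice(actions)
    weights = [equilibrium_probs[a] for a in actions]
    return rng.choices(actions, weights=weights)[0]


if __name__ == "__main__":
    steps = PolynomialStepSize(beta=0.75)
    rng = random.Random(0)
    probs = {("up", "left"): 0.7, ("down", "right"): 0.3}
    for _ in range(3):
        a = epsilon_greedy(probs, epsilon=0.1, rng=rng)
        print(a, steps.next_step(0, a))
```

With $\beta \in (1/2, 1]$ the sequence $1/k^\beta$ is not summable but its squares are, which is exactly the pair of conditions stated in Assumption 5.11.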

Once we have the Q-values $(Q^i_*)_{i \in I}$, we can compute the corresponding risk-aware Markov perfect equilibrium and value functions.

Remark. We briefly discuss the computational complexity of RaNashQL. RaNashQL needs to maintain $|I|$ Q-values and $|I||S|$ risk estimates (in terms of computing solutions of the corresponding saddle-point problems). In each iteration, RaNashQL updates $Q^i(s, a)$ for all $(s, a) \in S \times A$ and $i \in I$. Additionally, it updates $(y^i(s, a), z^i(s, a))$ for all $i \in I$ through SASP. The total number of entries in each array $Q^i$ is $|S||A|$. Since RaNashQL has to maintain the Q-values for every player, the total space requirement is $|I||S||A|$. The storage requirement for the risk estimation is similar. Therefore, the storage requirement of RaNashQL is linear in the number of states, polynomial in the number of actions, and exponential in the number of players. The algorithm's running time is dominated by the computation of Nash equilibria for the Q-function updates. In general, the complexity of equilibrium computation in matrix games is unknown. As mentioned in the previous section, some commonly used algorithms for two-player games have exponential worst-case bounds, and approximation methods are typically used for n-player games (see [45]).

6 A Queuing Control Application

We apply our techniques to the single server exponential queuing system from [35]. In this packet switched network, packets (blocks of data) are routed between servers over links shared with other traffic. The service rate of each server can be set to different levels and is controlled by a service provider (Player 1). Packets are routed by a programmable physical device, called a router (Player 2). A router dynamically controls the flow of arriving packets into a finite buffer at each server. The rates chosen by the service provider and router depend on the number of packets in the system. In fact, it is to the benefit of a service provider to increase the amount of packets processed in the system. However, such an increase may result in an increase in packet waiting times in the buffer (called latency), and routers are used to reduce packet waiting times.
Thus, the game-theoretic nature of the problem arises because the service provider and the router have such competing objectives.

The state space is $S = \{0, 1, \ldots, \bar{S}\}$, where $\bar{S} < \infty$ is the maximum number of packets allowed in the system. Only one packet can be in service at each time, while the remaining packets wait for service in the buffer. The router admits one packet into the system at each time. Every time a state is visited, the service provider and the router simultaneously choose a service rate $\mu > 0$ and an admission rate $\lambda > 0$. Suppose there are $s$ packets in the system and the players choose the action tuple $(\mu, \lambda)$; then the router incurs a holding cost $h(s)$ and a cost $\theta(\mu, \lambda)$ associated with having packets served at rate $\mu$ when it admits packets at rate $\lambda$. If there are no packets in the system, $\theta(\mu, \lambda)$ can be interpreted as the setup cost of the server. These payoffs are modeled as being paid to the service provider, since the players' objectives are in conflict. The service provider, in turn, pays the router $\beta(\mu, \lambda)$, which represents the reward to the router for choosing the rate $\lambda$. It can also be interpreted as the setup cost of the router. The cost functions for multi-action $a = (\mu, \lambda)$ are then:
\[ c^1(s, a) := \beta(a) - \theta(a) \quad \text{and} \quad c^2(s, a) := h(s) + \theta(a) - \beta(a). \]
We assume that the time until the admission of a new packet and the time until the next service completion are both exponentially distributed with means $1/\lambda$ and $1/\mu$, respectively. We can, therefore, model the number of packets in the system as a birth and death process with state transition probabilities:
\[ P(k \mid s, a) := \begin{cases} \mu/(\lambda + \mu), & 1 < s < \bar{S},\ k = s - 1, \\ \lambda/(\lambda + \mu), & 0 < s < \bar{S} - 1,\ k = s + 1, \\ 1, & s = 0,\ k = 1, \\ 1, & s = \bar{S},\ k = \bar{S} - 1. \end{cases} \]
We choose the following parameters for our example: $\bar{S} = 30$. Each player has the same two available actions in every state:

- router: the first action (denoted $\bar{\lambda}$) is to admit one packet into the system every 10; the second action (denoted $\underline{\lambda}$) is to admit one packet every 25.
- service provider: the first action (denoted $\bar{\mu}$) is to serve one packet every 11; the second action (denoted $\underline{\mu}$) is to serve a packet every 20.

Holding costs are exponential, $h(s) = a\, b^{\alpha s}$ for $s \ge 1$ with $a = 1.2$, $b = e$, and $\alpha = 0.2$, and $h(0) = 0$. We set costs: $\theta(\bar{\mu}, \bar{\lambda}) = \theta(\bar{\mu}, \underline{\lambda}) = 110$, $\theta(\underline{\mu}, \bar{\lambda}) = \theta(\underline{\mu}, \underline{\lambda}) = 90$, $\beta(\bar{\mu}, \bar{\lambda}) = 60$, $\beta(\bar{\mu}, \underline{\lambda}) = 30$, $\beta(\underline{\mu}, \bar{\lambda}) = 20$, and $\beta(\underline{\mu}, \underline{\lambda}) = 70$. In this setting, the router pays the service provider more when the service rate is higher. Also, the router receives a higher reward when both players choose higher rates or both choose lower rates. The router receives a lower reward when the admission and service rates do not match.

Table 2: Risk Tolerance Levels α. Columns: Service Provider ($\alpha_1$), Router ($\alpha_2$); rows: four scenarios.

We conduct three experiments, where all risk-aware players use CVaR. The CVaR for player $i$ is
\[ \mathrm{CVaR}_{\alpha_i}(X) := \min_{\eta \in \mathbb{R}} \Big\{ \eta + \frac{1}{1 - \alpha_i}\, E[\max\{X - \eta, 0\}] \Big\}, \]
where $\alpha_i$ is the risk tolerance for player $i$. When implementing RaNashQL, we use the Lemke-Howson method to compute the Nash equilibria of stage games, and we use the first-Nash learning agent to update the Q-values based on the first Nash equilibrium generated. We run our experiments in Matlab R2015a on a computer with an Intel Core processor and 8 GB RAM, running the 64-bit Windows 8 operating system.

Experiment I (RaNashQL vs. Nash Q-learning): We compare RaNashQL for risk-aware Markov games with Nash Q-learning in [29] for risk-neutral Markov games, in terms of their convergence rates. Given any precision $\epsilon > 0$, we record the iteration count $n$ until the convergence criterion $\| Q^i_{n,T} - Q^i_* \|_2 \le \epsilon$ is satisfied (where $Q^i_{n,T}$ is replaced by $Q^i_n$ for Nash Q-learning). Here we choose $T = 10$, with $N$ chosen for RaNashQL and for Nash Q-learning such that both methods have the same total number of iterations. When $\epsilon$ is extremely small (e.g., $\epsilon = 0.001$), the total numbers of iterations for Nash Q-learning (service provider), Nash Q-learning (router), and RaNashQL (service provider and router) are roughly equal.
Moreover, Figure 1 shows that the total number of iterations for Nash Q-learning decreases dramatically as the precision $\epsilon$ increases, which reveals that RaNashQL is more computationally expensive than Nash Q-learning in terms of achieving the same convergence criterion. Figure 2 presents the Markov perfect equilibria for the risk-neutral and risk-aware cases. It shows how the equilibrium shifts when the risk-awareness of the players is considered. It also shows that both the risk-neutral and risk-aware Markov perfect equilibria are sensitive to perturbations in the service rate, and that the risk-aware strategies of both players fluctuate strongly with the state (the number of packets in the queuing system). We also study how the risk tolerance levels α (see Table 2) affect the risk-aware Markov perfect equilibrium; the equilibrium also fluctuates as the risk tolerance level of CVaR changes. Next, we evaluate the discounted cost under the risk-neutral and risk-aware Markov perfect equilibria in simulation (1000 samples). The risk tolerance levels are selected as $\alpha_1 = \alpha_2 = 0.1$ for the risk-aware (CVaR) method in Table 3. Table 3 shows that considering risk awareness significantly increases the variance of the discounted cost, which is counterintuitive. A possible reason is that risk-aware strategies fluctuate more with the state (the number of packets in the queuing system) than risk-neutral strategies do.
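The CVaR columns reported for simulated discounted costs can be computed directly from the minimization formula for $\mathrm{CVaR}_\alpha$ given earlier in this section. The sketch below is an illustration under assumptions, not the authors' Matlab code: it evaluates the formula on an empirical sample, using the fact that for a finite sample the objective is convex and piecewise linear in $\eta$ with kinks at the sample values, so searching over the sample points is enough. The function name `empirical_cvar` and the synthetic cost sample in the usage part are hypothetical.

```python
import random


def empirical_cvar(samples, alpha):
    """Evaluate min_eta { eta + E[max(X - eta, 0)] / (1 - alpha) } on a finite sample.

    The objective is convex and piecewise linear in eta with kinks at the sample
    values, so its minimum is attained at one of the samples.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(samples)
    if n == 0:
        raise ValueError("samples must be non-empty")

    def objective(eta):
        excess = sum(max(x - eta, 0.0) for x in samples)
        return eta + excess / ((1.0 - alpha) * n)

    return min(objective(eta) for eta in samples)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for 1000 simulated discounted costs (not data from the paper).
    costs = [rng.gauss(50.0, 10.0) for _ in range(1000)]
    mean = sum(costs) / len(costs)
    variance = sum((c - mean) ** 2 for c in costs) / (len(costs) - 1)
    print(mean, variance, empirical_cvar(costs, 0.1))
```

With `alpha = 0` the formula reduces to the sample mean, and as `alpha` approaches 1 it approaches the largest sampled cost, so the parameter interpolates between a risk-neutral and a worst-case evaluation.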

Figure 1: Comparison between NashQL and RaNashQL

Table 3: Simulation for Risk-neutral Strategies and Risk-aware Strategies ($\alpha_1 = \alpha_2 = 0.1$). Columns: Player, Method, Mean, Variance, 5%-CVaR, 10%-CVaR; rows: Service Provider and Router, each under the Risk-neutral and Risk-aware (CVaR) methods.

In this experiment, incorporating risk helps the service provider reduce its mean cost, while increasing the mean cost of the router. Suppose we consider an extreme case where the router is almost a risk-neutral decision maker ($\alpha_1 = 0.1$ and $\alpha_2 = 0.95$), shown in Table 4. Table 4 shows that the mean cost of the service provider (44.31) is lower than that under the risk-neutral Markov perfect equilibrium (22.22), and the mean cost of the router (59.64) is lower than that under the risk-aware Markov perfect equilibrium (37.48). This result shows that, whether or not a decision maker incorporates risk preferences, he can reach a new equilibrium that reduces his mean cost further than in the cases where both players are risk-neutral or both are risk-aware.

Table 4: Simulation for Risk-aware Strategies ($\alpha_1 = 0.1$, $\alpha_2 = 0.95$). Columns: Player, Method, Mean, Variance, 5%-CVaR, 10%-CVaR; rows: Service Provider and Router under the Risk-aware (CVaR) method.

Experiment II (RaNashQL vs. Multilinear System): In this experiment, we consider a special case where the risk only comes from the stochastic state transitions (this setting is basically a risk-aware interpretation of [36], where the ambiguity is over the transition kernel). In this special case, we can compute the risk-aware Markov equilibrium

Figure 2: Comparison of Risk Neutral and Risk-aware Markov Perfect Equilibrium

using a multilinear system as detailed in Appendix E. We evaluate performance in terms of the relative error
\[ \frac{\sum_{s \in S} \big( \mathrm{Nash}^i\,(Q^j_{n,T}(s))_{j \in I} - v^i_*(s) \big)^2}{\sum_{s \in S} v^i_*(s)^2}, \quad n \le N. \]
In this experiment, we take the risk measure as 10%-CVaR. The multilinear system is solved by an interior point algorithm, subject to caps on the number of function evaluations and iterations, and it converges to a local optimal solution. For RaNashQL, we choose $T = 10$ and a fixed number $N$ of outer iterations. Figure 3 validates the almost sure convergence of RaNashQL to the service provider's strategy. For the router, the relative error is large (around 190%). One possible reason is that RaNashQL converges to a different equilibrium from the one obtained by solving the multilinear system. We see that RaNashQL possesses far superior computational performance compared with the interior point algorithm for solving the multilinear system, since the relative error for the service provider is already within 25% after a comparatively short run.

Figure 3: Almost Sure Convergence of RaNashQL

Experiment III (Time Complexity): In terms of time complexity, [31, Theorem 4.7] shows that the single-agent version of RaNashQL has convergence rate
\[ \Omega\Big( \big( |S||A| \ln(|S||A|/(\delta\epsilon)) / \epsilon^2 \big)^{1/\beta} + \big( \ln(|S||A|/\epsilon) \big)^{1/(1-\beta)} \Big), \tag{6.1} \]
with probability $1 - \delta$. In the multi-agent case, we may replace the single-agent action count with the multi-action count $|A|$ in (6.1) to get a rough estimate of the time complexity of RaNashQL (although we cannot claim a theoretical convergence rate). Instead, we validate this conjecture numerically. In this experiment, we choose the precision level $\epsilon \in [0.01, 0.1]$. Figure 4 illustrates that the optimality gaps of the service provider and the router, under the number of iterations computed through (6.1), are bounded by $\epsilon$, which confirms that the order (6.1) is an acceptable estimate of the convergence rate of RaNashQL.

7 Conclusion

In this paper, we propose a model for non-cooperative Markov games with time-consistent risk-aware players. This model characterizes the risk from both the stochastic state transitions and the randomized strategies of the other players.
We first propose an appropriate concept for risk-aware Markov perfect equilibrium and then we demonstrate the existence of such risk-aware equilibria in stationary strategies. The existence of equilibria is derived by the Kakutani fixed point theorem. We then analyze the convergence of a simulation-based Q-learning type algorithm for equilibrium computation, where the risk measures have a special form to


More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

bounds compared to SB and SBB bounds as the former two have an index parameter, while the latter two

bounds compared to SB and SBB bounds as the former two have an index parameter, while the latter two 1 Queung Procee n GPS and PGPS wth LRD Traffc Input Xang Yu, Ian L-Jn Thng, Yumng Jang and Chunmng Qao Department of Computer Scence and Engneerng State Unverty of New York at Buffalo Department of Electrcal

More information

Multiple-objective risk-sensitive control and its small noise limit

Multiple-objective risk-sensitive control and its small noise limit Avalable onlne at www.cencedrect.com Automatca 39 (2003) 533 541 www.elever.com/locate/automatca Bref Paper Multple-objectve rk-entve control and t mall noe lmt Andrew E.B. Lm a, Xun Yu Zhou b; ;1, John

More information

Module 5. Cables and Arches. Version 2 CE IIT, Kharagpur

Module 5. Cables and Arches. Version 2 CE IIT, Kharagpur odule 5 Cable and Arche Veron CE IIT, Kharagpur Leon 33 Two-nged Arch Veron CE IIT, Kharagpur Intructonal Objectve: After readng th chapter the tudent wll be able to 1. Compute horzontal reacton n two-hnged

More information

APPENDIX A Some Linear Algebra

APPENDIX A Some Linear Algebra APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,

More information

APPROXIMATE FUZZY REASONING BASED ON INTERPOLATION IN THE VAGUE ENVIRONMENT OF THE FUZZY RULEBASE AS A PRACTICAL ALTERNATIVE OF THE CLASSICAL CRI

APPROXIMATE FUZZY REASONING BASED ON INTERPOLATION IN THE VAGUE ENVIRONMENT OF THE FUZZY RULEBASE AS A PRACTICAL ALTERNATIVE OF THE CLASSICAL CRI Kovác, Sz., Kóczy, L.T.: Approxmate Fuzzy Reaonng Baed on Interpolaton n the Vague Envronment of the Fuzzy Rulebae a a Practcal Alternatve of the Clacal CRI, Proceedng of the 7 th Internatonal Fuzzy Sytem

More information

OPTIMAL COMPUTING BUDGET ALLOCATION FOR MULTI-OBJECTIVE SIMULATION MODELS. David Goldsman

OPTIMAL COMPUTING BUDGET ALLOCATION FOR MULTI-OBJECTIVE SIMULATION MODELS. David Goldsman Proceedng of the 004 Wnter Smulaton Conference R.G. Ingall, M. D. Roett, J. S. Smth, and B. A. Peter, ed. OPTIMAL COMPUTING BUDGET ALLOCATION FOR MULTI-OBJECTIVE SIMULATION MODELS Loo Hay Lee Ek Peng Chew

More information

Separation Axioms of Fuzzy Bitopological Spaces

Separation Axioms of Fuzzy Bitopological Spaces IJCSNS Internatonal Journal of Computer Scence and Network Securty VOL3 No October 3 Separaton Axom of Fuzzy Btopologcal Space Hong Wang College of Scence Southwet Unverty of Scence and Technology Manyang

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

Lecture 3. Ax x i a i. i i

Lecture 3. Ax x i a i. i i 18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest

More information

Optimal inference of sameness Supporting information

Optimal inference of sameness Supporting information Optmal nference of amene Supportng nformaton Content Decon rule of the optmal oberver.... Unequal relablte.... Equal relablte... 5 Repone probablte of the optmal oberver... 6. Equal relablte... 6. Unequal

More information

STOCHASTIC BEHAVIOUR OF COMMUNICATION SUBSYSTEM OF COMMUNICATION SATELLITE

STOCHASTIC BEHAVIOUR OF COMMUNICATION SUBSYSTEM OF COMMUNICATION SATELLITE IJS 4 () July Sharma & al ehavour of Subytem of ommuncaton Satellte SOHSI HVIOU O OMMUNIION SUSYSM O OMMUNIION SLLI SK Mttal eepankar Sharma & Neelam Sharma 3 S he author n th paper have dcued the tochatc

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

A New Virtual Indexing Method for Measuring Host Connection Degrees

A New Virtual Indexing Method for Measuring Host Connection Degrees A New Vrtual Indexng Method for Meaurng ot Connecton Degree Pnghu Wang, Xaohong Guan,, Webo Gong 3, and Don Towley 4 SKLMS Lab and MOE KLINNS Lab, X an Jaotong Unverty, X an, Chna Department of Automaton

More information

Confidence intervals for the difference and the ratio of Lognormal means with bounded parameters

Confidence intervals for the difference and the ratio of Lognormal means with bounded parameters Songklanakarn J. Sc. Technol. 37 () 3-40 Mar.-Apr. 05 http://www.jt.pu.ac.th Orgnal Artcle Confdence nterval for the dfference and the rato of Lognormal mean wth bounded parameter Sa-aat Nwtpong* Department

More information

Foresighted Resource Reciprocation Strategies in P2P Networks

Foresighted Resource Reciprocation Strategies in P2P Networks Foreghted Reource Recprocaton Stratege n PP Networ Hyunggon Par and Mhaela van der Schaar Electrcal Engneerng Department Unverty of Calforna Lo Angele (UCLA) Emal: {hgpar mhaela@ee.ucla.edu Abtract We

More information

BOUNDARY ELEMENT METHODS FOR VIBRATION PROBLEMS. Ashok D. Belegundu Professor of Mechanical Engineering Penn State University

BOUNDARY ELEMENT METHODS FOR VIBRATION PROBLEMS. Ashok D. Belegundu Professor of Mechanical Engineering Penn State University BOUNDARY ELEMENT METHODS FOR VIBRATION PROBLEMS by Aho D. Belegundu Profeor of Mechancal Engneerng Penn State Unverty ahobelegundu@yahoo.com ASEE Fello, Summer 3 Colleague at NASA Goddard: Danel S. Kaufman

More information

Resource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud

Resource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud Resource Allocaton wth a Budget Constrant for Computng Independent Tasks n the Cloud Wemng Sh and Bo Hong School of Electrcal and Computer Engneerng Georga Insttute of Technology, USA 2nd IEEE Internatonal

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

1 The Mistake Bound Model

1 The Mistake Bound Model 5-850: Advanced Algorthms CMU, Sprng 07 Lecture #: Onlne Learnng and Multplcatve Weghts February 7, 07 Lecturer: Anupam Gupta Scrbe: Bryan Lee,Albert Gu, Eugene Cho he Mstake Bound Model Suppose there

More information

Curve Fitting with the Least Square Method

Curve Fitting with the Least Square Method WIKI Document Number 5 Interpolaton wth Least Squares Curve Fttng wth the Least Square Method Mattheu Bultelle Department of Bo-Engneerng Imperal College, London Context We wsh to model the postve feedback

More information

Lecture 21: Numerical methods for pricing American type derivatives

Lecture 21: Numerical methods for pricing American type derivatives Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Expected Value and Variance

Expected Value and Variance MATH 38 Expected Value and Varance Dr. Neal, WKU We now shall dscuss how to fnd the average and standard devaton of a random varable X. Expected Value Defnton. The expected value (or average value, or

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they

More information

m = 4 n = 9 W 1 N 1 x 1 R D 4 s x i

m = 4 n = 9 W 1 N 1 x 1 R D 4 s x i GREEDY WIRE-SIZING IS LINEAR TIME Chr C. N. Chu D. F. Wong cnchu@c.utexa.edu wong@c.utexa.edu Department of Computer Scence, Unverty of Texa at Autn, Autn, T 787. ABSTRACT In nterconnect optmzaton by wre-zng,

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0 MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector

More information

AP Statistics Ch 3 Examining Relationships

AP Statistics Ch 3 Examining Relationships Introducton To tud relatonhp between varable, we mut meaure the varable on the ame group of ndvdual. If we thnk a varable ma eplan or even caue change n another varable, then the eplanator varable and

More information

Randomness and Computation

Randomness and Computation Randomness and Computaton or, Randomzed Algorthms Mary Cryan School of Informatcs Unversty of Ednburgh RC 208/9) Lecture 0 slde Balls n Bns m balls, n bns, and balls thrown unformly at random nto bns usually

More information

On a direct solver for linear least squares problems

On a direct solver for linear least squares problems ISSN 2066-6594 Ann. Acad. Rom. Sc. Ser. Math. Appl. Vol. 8, No. 2/2016 On a drect solver for lnear least squares problems Constantn Popa Abstract The Null Space (NS) algorthm s a drect solver for lnear

More information

Communication on the Paper A Reference-Dependent Regret Model for. Deterministic Tradeoff Studies

Communication on the Paper A Reference-Dependent Regret Model for. Deterministic Tradeoff Studies Councaton on the Paper A Reference-Dependent Regret Model for Deterntc Tradeoff tude Xaotng Wang, Evangelo Trantaphyllou 2,, and Edouard Kuawk 3 Progra of Engneerng cence College of Engneerng Louana tate

More information

Lecture 14: Bandits with Budget Constraints

Lecture 14: Bandits with Budget Constraints IEOR 8100-001: Learnng and Optmzaton for Sequental Decson Makng 03/07/16 Lecture 14: andts wth udget Constrants Instructor: Shpra Agrawal Scrbed by: Zhpeng Lu 1 Problem defnton In the regular Mult-armed

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

2.3 Least-Square regressions

2.3 Least-Square regressions .3 Leat-Square regreon Eample.10 How do chldren grow? The pattern of growth vare from chld to chld, o we can bet undertandng the general pattern b followng the average heght of a number of chldren. Here

More information

DEADLOCK INDEX ANALYSIS OF MULTI-LEVEL QUEUE SCHEDULING IN OPERATING SYSTEM USING DATA MODEL APPROACH

DEADLOCK INDEX ANALYSIS OF MULTI-LEVEL QUEUE SCHEDULING IN OPERATING SYSTEM USING DATA MODEL APPROACH GESJ: Computer Scence and Telecommuncaton 2 No.(29 ISSN 2-232 DEADLOCK INDEX ANALYSIS OF MULTI-LEVEL QUEUE SCHEDULING IN OPERATING SYSTEM USING DATA MODEL APPROACH D. Shukla, Shweta Ojha 2 Deptt. of Mathematc

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

Modeling of Wave Behavior of Substrate Noise Coupling for Mixed-Signal IC Design

Modeling of Wave Behavior of Substrate Noise Coupling for Mixed-Signal IC Design Modelng of Wave Behavor of Subtrate Noe Couplng for Mxed-Sgnal IC Degn Georgo Veron, Y-Chang Lu, and Robert W. Dutton Center for Integrated Sytem, Stanford Unverty, Stanford, CA 9435 yorgo@gloworm.tanford.edu

More information

CS286r Assign One. Answer Key

CS286r Assign One. Answer Key CS286r Assgn One Answer Key 1 Game theory 1.1 1.1.1 Let off-equlbrum strateges also be that people contnue to play n Nash equlbrum. Devatng from any Nash equlbrum s a weakly domnated strategy. That s,

More information

j=0 s t t+1 + q t are vectors of length equal to the number of assets (c t+1 ) q t +1 + d i t+1 (1) (c t+1 ) R t+1 1= E t β u0 (c t+1 ) R u 0 (c t )

j=0 s t t+1 + q t are vectors of length equal to the number of assets (c t+1 ) q t +1 + d i t+1 (1) (c t+1 ) R t+1 1= E t β u0 (c t+1 ) R u 0 (c t ) 1 Aet Prce: overvew Euler equaton C-CAPM equty premum puzzle and rk free rate puzzle Law of One Prce / No Arbtrage Hanen-Jagannathan bound reoluton of equty premum puzzle Euler equaton agent problem X

More information

A Hybrid Evolution Algorithm with Application Based on Chaos Genetic Algorithm and Particle Swarm Optimization

A Hybrid Evolution Algorithm with Application Based on Chaos Genetic Algorithm and Particle Swarm Optimization Natonal Conference on Informaton Technology and Computer Scence (CITCS ) A Hybrd Evoluton Algorthm wth Applcaton Baed on Chao Genetc Algorthm and Partcle Swarm Optmzaton Fu Yu School of Computer & Informaton

More information

728. Mechanical and electrical elements in reduction of vibrations

728. Mechanical and electrical elements in reduction of vibrations 78. Mechancal and electrcal element n reducton of vbraton Katarzyna BIAŁAS The Slean Unverty of Technology, Faculty of Mechancal Engneerng Inttute of Engneerng Procee Automaton and Integrated Manufacturng

More information

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng

More information

Outline and Reading. Dynamic Programming. Dynamic Programming revealed. Computing Fibonacci. The General Dynamic Programming Technique

Outline and Reading. Dynamic Programming. Dynamic Programming revealed. Computing Fibonacci. The General Dynamic Programming Technique Outlne and Readng Dynamc Programmng The General Technque ( 5.3.2) -1 Knapsac Problem ( 5.3.3) Matrx Chan-Product ( 5.3.1) Dynamc Programmng verson 1.4 1 Dynamc Programmng verson 1.4 2 Dynamc Programmng

More information

A Computational Method for Solving Two Point Boundary Value Problems of Order Four

A Computational Method for Solving Two Point Boundary Value Problems of Order Four Yoge Gupta et al, Int. J. Comp. Tec. Appl., Vol (5), - ISSN:9-09 A Computatonal Metod for Solvng Two Pont Boundary Value Problem of Order Four Yoge Gupta Department of Matematc Unted College of Engg and

More information

MODELLING OF STOCHASTIC PARAMETERS FOR CONTROL OF CITY ELECTRIC TRANSPORT SYSTEMS USING EVOLUTIONARY ALGORITHM

MODELLING OF STOCHASTIC PARAMETERS FOR CONTROL OF CITY ELECTRIC TRANSPORT SYSTEMS USING EVOLUTIONARY ALGORITHM MODELLING OF STOCHASTIC PARAMETERS FOR CONTROL OF CITY ELECTRIC TRANSPORT SYSTEMS USING EVOLUTIONARY ALGORITHM Mkhal Gorobetz, Anatoly Levchenkov Inttute of Indutral Electronc and Electrotechnc, Rga Techncal

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

and decompose in cycles of length two

and decompose in cycles of length two Permutaton of Proceedng of the Natona Conference On Undergraduate Reearch (NCUR) 006 Domncan Unverty of Caforna San Rafae, Caforna Apr - 4, 007 that are gven by bnoma and decompoe n cyce of ength two Yeena

More information

P exp(tx) = 1 + t 2k M 2k. k N

P exp(tx) = 1 + t 2k M 2k. k N 1. Subgaussan tals Defnton. Say that a random varable X has a subgaussan dstrbuton wth scale factor σ< f P exp(tx) exp(σ 2 t 2 /2) for all real t. For example, f X s dstrbuted N(,σ 2 ) then t s subgaussan.

More information

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem.

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem. prnceton u. sp 02 cos 598B: algorthms and complexty Lecture 20: Lft and Project, SDP Dualty Lecturer: Sanjeev Arora Scrbe:Yury Makarychev Today we wll study the Lft and Project method. Then we wll prove

More information

DUE: WEDS FEB 21ST 2018

DUE: WEDS FEB 21ST 2018 HOMEWORK # 1: FINITE DIFFERENCES IN ONE DIMENSION DUE: WEDS FEB 21ST 2018 1. Theory Beam bendng s a classcal engneerng analyss. The tradtonal soluton technque makes smplfyng assumptons such as a constant

More information

TCOM 501: Networking Theory & Fundamentals. Lecture 7 February 25, 2003 Prof. Yannis A. Korilis

TCOM 501: Networking Theory & Fundamentals. Lecture 7 February 25, 2003 Prof. Yannis A. Korilis TCOM 501: Networkng Theory & Fundamentals Lecture 7 February 25, 2003 Prof. Yanns A. Korls 1 7-2 Topcs Open Jackson Networks Network Flows State-Dependent Servce Rates Networks of Transmsson Lnes Klenrock

More information

Adaptive Centering with Random Effects in Studies of Time-Varying Treatments. by Stephen W. Raudenbush University of Chicago.

Adaptive Centering with Random Effects in Studies of Time-Varying Treatments. by Stephen W. Raudenbush University of Chicago. Adaptve Centerng wth Random Effect n Stde of Tme-Varyng Treatment by Stephen W. Radenbh Unverty of Chcago Abtract Of wdepread nteret n ocal cence are obervatonal tde n whch entte (peron chool tate contre

More information

MODELLING OF TRANSIENT HEAT TRANSPORT IN TWO-LAYERED CRYSTALLINE SOLID FILMS USING THE INTERVAL LATTICE BOLTZMANN METHOD

MODELLING OF TRANSIENT HEAT TRANSPORT IN TWO-LAYERED CRYSTALLINE SOLID FILMS USING THE INTERVAL LATTICE BOLTZMANN METHOD Journal o Appled Mathematc and Computatonal Mechanc 7, 6(4), 57-65 www.amcm.pcz.pl p-issn 99-9965 DOI:.75/jamcm.7.4.6 e-issn 353-588 MODELLING OF TRANSIENT HEAT TRANSPORT IN TWO-LAYERED CRYSTALLINE SOLID

More information