Distributed Fictitious Play for Optimal Behavior of Multi-Agent Systems with Incomplete Information

Size: px

Start display at page:

Download "Distributed Fictitious Play for Optimal Behavior of Multi-Agent Systems with Incomplete Information"

Elisabeth Reeves
5 years ago
Views:

1 Disribued Ficiious Play for Opimal Behavior of Muli-Agen Sysems wih Incomplee Informaion Ceyhun Eksin and Alejandro Ribeiro arxiv: v [cs.g] 5 Feb 206 Absrac A muli-agen sysem operaes in an uncerain environmen abou which agens have differen and ime varying beliefs ha, as ime progresses, converge o a common belief. A global uiliy funcion ha depends on he realized sae of he environmen and acions of all he agens deermines he sysem s opimal behavior. We define he asympoically opimal acion profile as an equilibrium of he poenial game defined by considering he expeced uiliy wih respec o he asympoic belief. A finie ime, however, agens have no enirely congruous beliefs abou he sae of he environmen and may selec conflicing acions. his paper proposes a variaion of he ficiious play algorihm which is proven o converge o equilibrium acions if he sae beliefs converge o a common disribuion a a rae ha is a leas linear. In convenional ficiious play, agens build beliefs on ohers fuure behavior by compuing hisograms of pas acions and bes respond o heir expeced payoffs inegraed wih respec o hese hisograms. In he variaions developed here hisograms are buil using knowledge of acions aken by nearby nodes and bes responses are furher inegraed wih respec o he local beliefs on he sae of he environmen. We exemplify he use of he algorihm in coordinaion and arge covering games. I. INRODUCION Our model of a muli-agen auonomous sysem encompasses an underlying environmen, knowledge abou he sae of he environmen ha he agens acquire, and a sae dependen global objecive ha agens affec hrough heir individual acions. he opimal acion profile maximizes his global objecive for he realized environmen s sae wih he opimal acion of an agen given by he corresponding acion in he profile. he problem addressed in his paper is he deerminaion of suiable acions when he probabiliy disribuions ha agens have on he sae of he environmen are possibly differen. hese no enirely congruous beliefs resul in mismaches beween he acion profiles ha differen agens deem o be opimal. As a consequence, when a given agen chooses an acion o execue, i is imporan for i o reason abou wha he beliefs of oher agens may be and wha are he consequen acions ha oher agens may ake. We propose a soluion based on he consrucion of empirical hisograms of pas acions and he use of bes responses o he uiliy expecaion wih respec o hese hisograms and he sae belief. his algorihmic behavior is shown o be asympoically opimal in he sense ha if agens move Work suppored by he NSF award CAREER CCF and he AFOSR MURI FA C. Eksin is wih he School of Elecrical and Compuer Engineering and School of Biology a he Georgia Insiue of echnology, Alana, GA A. Ribeiro is wih he Deparmen of Elecrical and Sysems Engineering, Universiy of Pennsylvania, Philadelphia, PA ceyhuneksin@gaech.edu. owards a common belief, he acions hey selec are opimal wih respec o he corresponding expeced uiliy. While he deerminaion of opimal behavior in muliagen sysems can be considered from differen perspecives, he caegorizaion beween sysems wih complee and incomplee informaion is mos germane o his paper []. In sysems of complee informaion he environmen is eiher perfecly known or all agens have idenical beliefs. In eiher case, agens can compue he opimal acion profile, and deermine and execue heir corresponding opimal acion. his local compuaion of global soluions is neiher scalable nor robus bu i can be used as an absrac definiion of opimal behavior. his absracion renders he problem equivalen o he developmen of disribued mehodologies o solve opimizaion problems [2] [5], or, more generically, o he deerminaion of Nash equilibria of muliplayer games [6] []. When informaion is incomplee, he fac ha agens have differen beliefs implies ha hey may end up choosing compeing acions even if hey are inen on cooperaing. In a sense, agens are compeing agains uncerainy, bu he manifesaion of ha compeiion is in he form of conflicing ineress arising from cooperaing agens. In his inheren compeiion Bayesian Nash equilibria are he inrinsic mahemaical formulaion of opimal behavior [2], [3]. However, deerminaion of hese equilibria is compuaionally inracable excep for games wih simple beliefs and uiliies [4], [5]. If deerminaion of Bayesian equilibria is inracable, he developmen of approximae mehods is necessary. In fac, deermining game equilibria is also challenging in games of complee informaion. his has moivaed he developmen of ieraive mehods o learn regular as opposed o Bayesian equilibrium acions [], [6]. Of relevance o his paper is he ficiious play algorihm in which agens build beliefs on ohers fuure behavior by compuing hisograms of pas acions and bes respond o heir expeced payoffs inegraed wih respec o hese hisograms [7]. When informaion is complee, ficiious play converges o equilibria of zero sum [6], some oher specific games wih wo players [8], and muliplayer games wih aligned ineress as deermined by a poenial funcion [6]. Recen variaions of ficiious play have been developed o expand he class of games for which equilibria can be compued [9], [9] and o guaranee convergence o equilibria wih specific properies [8], [20], [2]. Recenly, a varian of he ficiious play ha operaes in a disribued seing where agens observe relevan informaion from agens ha are adjacen in a nework is shown o converge in poenial games [7].

2 2 In his paper, we consider a nework of agens wih aligned ineress. Agens have differen and ime varying beliefs on he sae of he environmen ha, as ime progresses, converge o a common belief. he asympoic opimal behavior is herefore formulaed as he Nash equilibria of he poenial game defined by he expeced uiliy wih respec o he asympoic belief. he goal is o design a disribued mechanism ha converges o a Nash equilibrium of his asympoic game (Secion II). he soluions ha we propose are variaions of he ficiious play algorihm ha ake ino accoun he disribued naure of he muli-agen sysem and he fac ha he sae of he environmen is no perfecly known. In a game of incomplee informaion, expeced payoff compuaion in ficiious play consiss of inegraing he payoff wih respec o boh he local belief on he sae of he environmen and he local beliefs on he behavior of oher agens. In a neworked seing only local pas hisories can be available and agens need o reason abou he behavior of non-neighboring agens based on pas observaions of is neighbors only. In poenial games wih symmeric payoffs, which are known o admi consensus Nash equilibria, we le agens share heir acions wih heir neighbors, consruc empirical hisograms of he acions aken by neighbors, and bes respond assuming ha all agens follow he average populaion empirical disribuion (Secion III). his mechanism is shown o converge o a consensus Nash equilibrium when he convergence of individual beliefs on he sae of he environmen is a leas of linear order (heorem ). When he poenial game is no necessarily symmeric, agens share heir empirical beliefs on he behavior of oher agens wih neighbors in addiion o heir own acions. Agens keep hisograms on he behavior of ohers by averaging he hisograms of neighbors wih heir own (Secion IV). Convergence o a Nash equilibrium follows under he same linear convergence assumpion for he beliefs on he sae of he environmen (heorem 2). We numerically analyze he ransien and asympoic equilibrium properies of he algorihms in he beauy cones and he arge covering games (Secion V). In he beauy cones game, a eam of robos radeoffs beween moving oward a arge direcion and moving in coordinaion wih each oher. In he arge covering game, a eam of robos coordinaes o cover a given se of arges and receive payoffs from covering a arge ha is inversely proporional o he disance o heir posiions. We observe ha Nash equilibrium sraegies are successfully deermined in boh cases. Noaion: For any finie se X, we use (X) o denoe he space of probabiliy disribuions over X. For a generic vecor x X n, x i denoes he ih elemen and x i denoes he vecor of elemens of x excep he ih elemen, ha is, x i = (x,..., x i, x i+,..., x n ). We use o denoe he Euclidean norm of a space. n denoes a n column vecor of all ones. II. OPIMAL BEHAVIOR OF MULI-AGEN SYSEMS WIH INCOMPLEE INFORMAION We consider a group of n agens i N := {,..., n} ha play a sage game over a discree ime index wih simulaneous moves and incomplee informaion. he imporan feaures of his game are he acions a i ha agens ake a ime, an underlying unknown sae of he world θ, local uiliy funcions u i (a, θ) ha deermine he payoffs ha differen agens receive when he group plays he join acion a := {a,..., a n } and he sae of he world is θ, and ime varying local beliefs µ i ha assign probabiliies o differen realizaions of he sae of he world. We assume ha he sae of he world is chosen by naure from a space Θ and ha acions are chosen from a common, finie, and ime invarian se so ha we have a i A := {,..., m} for all imes and agens i. Local payoffs are hen formally defined as funcions u i : A n Θ R. o emphasize he global dependence of local payoffs we wrie payoff values as u i (a, θ) = u i (a i, a i, θ) where, we recall, a i := {a j } j i collecs he acions of all agens excep i. We assume ha he uiliy values u i (a, θ) are finie for all acions a and sae realizaions θ. he beliefs µ i (Θ) are probabiliy disribuions on he space Θ. In general, we allow agen i o mainain a mixed sraegy σ i (A) defined as a probabiliy disribuion on he acion space A such ha σ i (a i ) is he probabiliy ha i plays acion a i A. he join mixed sraegy profile σ := {σ,..., σ n } n (A) is defined as he produc disribuion of all individual sraegies and he mixed sraegy of all agens excep i is wrien as σ i := {σ j } j i n (A). he uiliy associaed wih he join mixed sraegy profile σ is hen given by u i (σ, θ) = u i (σ i, σ i, θ) = u i (a, θ)σ(a). () a A n We furher assume ha here exiss a global poenial funcion u : A n θ R aking values u(a, θ) such ha for all pairs of acion profiles a = {a i, a i } and a = {a i, a i}, sae realizaions θ Θ, and agens i, he local payoffs saisfy u i (a i, a i, θ) u i (a i, a i, θ) = u(a i, a i, θ) u(a i, a i, θ). (2) he exisence of he poenial funcion u is a saemen of aligned ineress because for a given θ he join acion ha maximizes u is a pure Nash equilibrium sraegy of he game defined by he u i uiliies [6]. he moivaion for considering a poenial game is o model a sysem in which agens play o achieve he game equilibrium in a disribued manner and end up finding he acion a ha would be seleced by a cenral coordinaion agen ha maximizes he global payoff u. We emphasize, however, ha he game may have oher equilibria ha are no opimal. he fundamenal problem addressed in his paper is ha he sae of he world θ is unknown o he agens and ha differen agens have differen beliefs µ i on he sae of he world. As a consequence, payoffs u i (a, θ) canno be evaluaed bu esimaed and, moreover, esimaes of differen

3 3 agens are differen. o explain he implicaions of his laer observaion consider he opposie siuaion in which he agens aggregae heir individual beliefs in a common belief µ. In ha case, he payoff esimae u i (σ; µ) := u i (σ, θ) dµ, (3) θ Θ can be evaluaed by all agens if we assume ha he payoff funcions u i are known globally. Assuming global knowledge of payoffs is no always reasonable and i is desirable o devise mechanisms where agens operae wihou access o he payoff funcions of oher agens; see, e.g., [7]. Sill, an imporan implicaion of considering he payoffs in (3) is ha opimal behavior is characerized by Nash equilibria. Specifically, for he game defined by he uiliies in (3), a Nash equilibrium a ime is a sraegy profile σ such ha no agen has an ineres o deviae unilaerally given he common belief µ. I.e., a sraegy σ = {σi, σ i } such ha for all agens i i holds u i (σ i, σ i; µ) u i (σ i, σ i; µ), (4) for all oher possible sraegies σ i. Given he exisence of he poenial funcion u as saed in (2), he Nash equilibrium in (4) wih he aggregae belief µ is a proxy for he maximizaion of he global payoff u(a; µ) := u(a, θ) dµ. θ Θ In ha sense, i represens he bes acion ha he agens could collecively ake if hey all had access o common informaion. For fuure reference we use Γ(µ) o represen he game wih players N, acion space A and payoffs u i (a; µ), Γ(µ) := {N, A n, u i (a; µ)}. (5) he game Γ(µ) is said o have complee informaion. When agens have differen beliefs, he equilibrium sraegies of (4) canno be used as a arge behavior because agens lack he abiliy o deermine if a sraegy σ i ha hey may choose saisfies (4) or no. Indeed, while he complee informaion game serves as an omniscien reference, agens can only evaluae heir expeced payoffs wih respec o heir local beliefs µ i, u i (σ; µ i ) = u i (σ i, σ i ; µ i ) = u i (σ i, σ i, θ) dµ i. θ Θ (6) Comparing (3) and (6) we see ha he fundamenal problem of having differen beliefs µ i a differen agens is ha i lacks informaion needed o evaluae he expeced payoff u j (σ; µ j ) of agen j and, for ha reason, he game is said o have incomplee informaion. A way o circumven his lack of informaion is for agen i o keep a belief νj i on he sraegy profile of player j. If we group hese beliefs o define he join belief ν i i := {νi j } j i ha agen i has on he acions of ohers, i follows ha agen i can evaluae he payoff he can expec of differen sraegies σ i as u i (σ i, ν i; i µ i ) = u i (σ i, ν i, i θ) dµ i. (7) θ Θ I is hen naural o sugges ha agen i should choose he sraegy σ i ha maximizes he expeced payoff in (7). Such sraegy can always be chosen o be an individual acion ha is ermed he bes response o he beliefs ν i i on he acions of ohers and he belief µ i on he sae of he world, a i argmax u i (a i, ν i; i µ i ). (8) a i A For fuure reference we also define he corresponding bes expeced uiliy ha agen i expecs o obain a ime by playing he bes response acion in (8), v i (ν i i; µ i ) := max a i A u i(a i, ν i i; µ i ). (9) We emphasize ha v i (ν i i ; µ i) is no he uiliy acually aained by agen i a ime. ha uiliy depends on he acual sae of he world and he acions acually aken by ohers and is explicily given by u i (a i, a i, θ). In his paper we consider agens ha selec bes response acions as in (8) and focus on designing decenralized mechanisms o consruc he beliefs νj i on he acions of ohers so ha he acions a i aain desirable properies. In paricular, we assume ha here is an underlying sae learning process so ha he local beliefs µ i on he sae of he world converge o a common belief µ in erms of oal variaion. I.e., we suppose ha V(µ i, µ) = 0 for all i N, (0) where he oal variaion disance V(µ i, µ) := sup B B(Θ) µ i (B) µ(b) beween disribuions µ i and µ is defined as he maximum absolue difference beween he respecive probabiliies assigned o elemens B of he Borel se B(Θ) of he space Θ. he desirable propery ha we ask of he process ha builds he beliefs νj i on he acions of ohers is ha he acions a i approach one of he Nash equilibrium sraegies defined by (4) as he disribuions µ i converge o he common disribuion µ. he learning process ha we propose for he beliefs νj i is based on building empirical hisograms of pas plays as we explain in secions III and IV. We will show in secions III-A and IV-A ha his procedure yields bes responses ha approach a Nash equilibrium as long as he convergence of µ i o µ is sufficienly fas. We pursue his developmens afer a perinen remark. Remark he game of incomplee informaion defined by he payoffs in (6) has equilibria ha do no necessarily coincide wih he equilibria of he complee informaion game Γ(µ). I is easy o hink ha he bes response a i in (8) yields he bes possible uiliy u i for agen i. In fac, i is possible for agen i o do beer by reasoning ha oher agens are also playing bes response o heir beliefs. Sraegies ha yield an equilibrium poin of his sraegic reasoning are defined as he Bayesian Nash equilibria of he incomplee informaion game see [3], [5] for a formal definiion. We uilize he bes responses in (8) because deermining Bayesian Nash equilibria requires global knowledge of payoffs and informaion srucures.

4 4 a 8 8 a 7 a θ 5 a 4 a a 3 a 2 ˆ i, µ i Fig.. Disribued ficiious play wih observaion of neighbors acions. Agens form beliefs on he sraegies of ohers by keeping a hisogram of he average empirical disribuion using he observaions of acions of heir neighbors. E.g., Agen updaes an esimae ˆ i of he average empirical disribuion by observing he acions a 2, a 3, a 4, a 5 played by agens 2, 3, 4, and 5 a ime [cf. (6)]. I hen selecs he bes response in (8) which assumes all oher agens are playing wih respec o he mixed sraegy νj i = ˆ i given is belief µ i on he sae θ. III. DISRIBUED FICIIOUS PLAY IN SYMMERIC POENIAL GAMES We begin by considering he paricular case of symmeric poenial games o illusrae conceps, mehods, and proof echniques. In a symmeric game, agens payoffs are permuaion invarian in ha we have u i (a i, a j, a i\j, θ) = u j (a j, a i, a j\i, θ) for all pairs of agens i and j. I follows from his assumpion ha he game admis a leas one consensus Nash equilibrium sraegy and ha, as a consequence, we can uilize a variaion of ficiious play [6], [7] in which agens form beliefs on he acions of ohers by keeping a hisogram of acions hey have seen aken by oher agens in pas plays. Formally, le f i R m denoe he empirical hisogram of acions aken by i unil ime and define he vecor indicaor funcion Ψ(a i ) = [Ψ (a i ),..., Ψ m (a i )] : A {0, } m such ha he kh componen is Ψ k (a i ) = if and only if a i = k and Ψ k (a i ) = 0 oherwise. Given he definiion of he vecor indicaor funcion Ψ(a i ) i follows ha he empirical disribuion f i of acions aken by i up unil ime > is f i := Ψ(a is ). () s= he expression in () is simply a vecor arrangemen of he average number of imes ha each of he m possible plays k {,..., m} has been chosen by i. Since he game, being symmeric, admis a leas one symmeric Nash equilibrium, he hisogram of he empirical play of he populaion as a whole is also of ineres. Using he definiion of he vecor indicaor Ψ(a i ) his empirical disribuion is wrien as := [ ] Ψ(a is ). (2) n s= In convenional ficiious play, agens play bes responses o he composie empirical disribuion in (2). Here, however, we assume ha agens are par of a conneced nework G wih node se N and edge se E. Agen i can only inerac wih neighboring agens j N i := {j N : (j, i) E}. A ime, he acions of neighboring agens j N i become known o agen i eiher hrough explici communicaion or implici observaion. In his seing, agen i canno keep rack of he empirical disribuion in (2) because i only observes is neighbors acions a Ni := {a j : j N i } see Fig.. Wha is possible for agen i o compue is an esimae of (2) uilizing he informaion i has available. his esimae is buil by averaging he plays of neighbors so ha if we wrie i s esimae of as ˆ i i follows ha for 2 i holds ˆ i = N i [ ] Ψ(a js ). (3) j N i s= In he disribued ficiious play algorihm considered here, agen i has access o he sae belief µ i and he esimae on he average empirical disribuion ˆ i in (3). Agen i proceeds o selec he bes response acion a i ha maximizes is expeced payoff [cf. (8)] assuming ha all oher agens play wih respec o he esimaed average empirical disribuion. I.e., he acion played by agen i is compued as per (8) wih νj i = ˆ i for all j i. Observe ha compuaion of he hisograms in () - (3) does no require keeping he hisory of pas plays a is for s <. Indeed, he empirical disribuion f i in () can be expressed recursively as f i+ = f i + ) (Ψ(a i ) f i, (4) Likewise, we can also wrie he populaion s empirical disribuion in (2) recursively as + = + [ Ψ(a j ) n ], (5) j= and he same rearrangemen permis wriing he esimae ˆ i ha i keeps of he populaion s empirical disribuion as he recursion ˆ i + = ˆ i + [ N i j N i Ψ(a j ) ˆ i ]. (6) In all hree cases he recursions are valid for and we have o make f i = = ˆ i = 0 for he recursions in (4) - (6) o be equivalen o () - (3). However, subsequen convergence analyses rely on (4) - (6) and allow for arbirary iniial disribuions ˆ i. his is imporan because, in general, agen i may have side informaion on he acions a j ha oher agens are o chose a ime =. We can now summarize he behavior of agen i as follows:

5 5 (i) A ime updae he sae s belief o µ i. (ii) Play he bes response acion a i in (8) wih ν i j = ˆ i for all j i, (iii) Learn he acions a i of neighbors, eiher hrough observaion or communicaion, and updae ˆ i + as per (6) wih he empirical hisogram ˆ i iniialized o an arbirary disribuion. We show in he following ha when all agens follow his behavior, heir sraegies converge o a consensus Nash equilibrium of he symmeric poenial game Γ(µ) [cf. (5)]. A. Convergence Game equilibria are no unique in general. Consider hen he sage game Γ(µ) wih a given common belief µ on he sae of he world θ. he he se of Nash equilibria of he game Γ(µ) conains all he sraegies ha saisfy (4), K(µ) = {σ : u i (σ ; µ) u i (σ i, σ i; µ), for all i, σ i }. (7) In he disribued ficiious play process described by (8) and (6) agens assume ha all oher agens selec acions from he same disribuion. herefore, i is reasonable o expec convergence no o an arbirary equilibrium bu o a consensus equilibrium in which all agens play he same sraegy. Define hen he se of consensus equilibria of he game Γ(µ) as he subse of he Nash equilibria se K(µ) defined in (7) for which all agens play according o he same sraegy, C(µ) = {σ = {σ,..., σ n} K(µ) : σ =... = σ n}. (8) We emphasize ha no all poenial games admi consensus Nash equilibria, bu he symmeric poenial games considered in his secion do have a nonempy se of consensus Nash equilibria; see e.g., [7]. We prove here ha he bes response acions in (8) are evenually drawn from a consensus equilibrium sraegy if he local empirical hisograms ˆ i + are updaed according o (6) and he local sae beliefs µ i converge o he common belief µ in he sense saed in (0). In he proof of his resul we make use of he following assumpions on he nework opology and he sae learning process. Assumpion he nework G(N, E) is srongly conneced. Assumpion 2 For all agens i N, he local beliefs µ i converge o a common belief µ a a rae faser han log /, ( ) log V (µ i, µ) = O. (9) Assumpion is a simple conneciviy requiremen o ensure ha he acions aken by any node evenually become known o all oher agens. Assumpion 2 requires ha agens reach he common belief µ fas enough. his assumpion is fundamenal o subsequen proofs bu is no difficul o saisfy see Remark 2. We noe ha he common belief µ is an arbirary belief on he sae θ o which all agens converge bu is no necessarily he opimal Bayesian aggregae of he informaion ha differen agens acquire abou he sae of he world. Validiy of hese wo assumpions guaranees convergence of he bes response acions in (8) as we formally sae nex. heorem Consider a symmeric poenial game Γ(µ) and he disribued ficiious play updaes where a each sage agens bes respond as in (8) wih local beliefs νj i = ˆ i for all j i formed using (6) and belief µ i. If Assumpions and 2 are saisfied and he iniial esimaed average beliefs are he same for all agens, i.e., if ˆ i = ˆ j for all i N and j N, hen he average empirical disribuion (A) converges o a sraegy ha is an elemen σi (A) of a sraegy profile σ n (A) ha belongs o he se of consensus Nash equilibria C(µ) of he symmeric poenial game Γ(µ). I.e., σi = 0 (20) where σi σ, σ C(µ), and denoes he L 2 norm on he Euclidean space. Proof: See Appendix B. Since in a consensus Nash equilibrium all agens play according o he same sraegy, σi = σj for all i and j, heorem also means ha he n-uple of he populaion s average empirical disribuion is a consensus Nash equilibrium sraegy profile σ C(µ). Noice ha his resul is no equivalen o showing ha each agen s empirical frequency f i is a consensus Nash equilibrium sraegy. However, i s model of oher agens ˆ i converges o he average empirical disribuion. In paricular, we have ˆ i = O(log /) by Lemma 5. Hence, agens do learn o bes respond o he consensus equilibrium sraegy. In order for agen i s individual empirical frequency f i o converge o he consensus Nash equilibrium sraegy, he uiliy funcion should be such ha agens are no indifferen beween wo acions when hey bes respond in (8) o he equilibrium sraegies of ohers as we show nex. Corollary In a disribued ficiious play wih acion sharing, if he poenial funcion u( ) is such ha for ν i j = for all j i he maximizing acion is unique asympoically, ha is, here exiss a A such ha u(a, ν i; i µ) u(a, ν i; i µ) ɛ (2) for ɛ > 0 and for all a A \ a hen each agen learns o play according o an empirical frequency ha is in equilibrium wih ohers empirical frequencies for any symmeric poenial game Γ(µ), ha is, Proof: See Appendix C. f i = 0. (22) he condiion in (2) says ha here exiss a single disinc acion a ha sricly maximizes he expeced uiliy asympoically when oher agens follow. We obain he resul

6 6 in (22) by leveraging he fac ha asympoically agens esimaes of he average empirical disribuion ˆ i converge o and here exiss a finie ime afer which acion a will be chosen. he resul above implies ha agens evenually play according o a consensus Nash equilibrium acion. Noe ha he responses of agens during he disribued ficiious play depend on boh he sae learning process and he process of agens forming heir esimaes on he average empirical disribuion. he resuls in his secion reveal ha hese wo processes can be designed independenly as long as boh processes converge a a fas enough rae. We make use of his separaion in he nex secion o design a disribued ficiious play process ha converges o an equilibrium sraegy of any poenial game. Remark 2 he log / convergence rae in Assumpion 2 is saisfied by various disribued learning mehods including averaging, e.g., [22], [23]; decenralized esimaion, e.g., [5], [24] [28]; social learning models, e.g., [29], [30]; and Bayesian learning, e.g., [3] [34]. In he way of illusraion, averaging models have agen i sharing is previous belief on he sae wih is neighbors and updaing is belief by a weighed averaging of observed disribuions ha follow he recursion µ i (θ) = j N i w ij µ j (θ), (23) for some se of doubly sochasic weighs w ij. Convergence o a common disribuion follows an exponenial rae O(c ) if all he informaion available o agens are privae observaions a ime = [23], [35]. In Bayesian learning we can do away wih communicaion alogeher and assume ha agens keep acquiring privae informaion on θ ha hey incorporae in he local beliefs µ i using Bayes law. If he local signals are informaive, all agens converge o an aomic belief wih all probabiliy in he rue sae of he world. Linear meaning O(/) raes of convergence can be achieved wih mild assumpions on he rae of novel informaion [3]. Bayesian updaes uilizing neighbors beliefs are also possible, if compuaionally cumbersome, and also achieves O(/) convergence wih mild assumpions [32] [34]. IV. DISRIBUED FICIIOUS PLAY IN GENERIC POENIAL GAMES For generic poenial games we consider a disribued ficiious play process in which agens communicae he hisograms hey keep on he oher agens wih heir neighbors. I.e., Agen i shares is enire belief ν i i wih is neighbors a each ime sep in addiion o is acion a i. When compared o he disribued ficiious play wih acion observaions, he addiional informaion communicaed allows agens o keep disinc beliefs on oher agens as we explain in he following. Agen i can keep rack of he individual empirical hisograms of is neighbors {f j } j Ni by (4) using observaions of he acions of is neighbors {a j } j Ni, ha is, νj+ i = f j+ for j N i. Oherwise, agen i can keep an esimae of he empirical hisogram of is non-neighbors j / N i by averaging he esimaes of is neighbors on he non-neighboring agen j {ν k j } k N i. I.e. he esimae of agen i on j / N i is given by ν i j+ = k N w i jkν k j (24) where wjk i > 0 if and only if k N i and k N wi jk =. Noe ha in his belief formaion, agen i keeps a separae belief on each individual and has he correc esimae of he empirical frequency of is neighbors. Besides he difference in belief updaes he disribued ficiious play is idenical o he behavior described in Secion III. o summarize, a ime agen i updaes is belief on he sae µ i, plays wih respec o he bes response acion a i in (8) wih beliefs ν i i, observe acions and beliefs of neighbors {a j, ν j j } j N i, and updae νj+ i for j i by (4) if j N i or by (24) if j / N i. In he following, we show he convergence of he empirical frequencies o a Nash equilibrium sraegy for any poenial game when agens follow he behavior described above. A. Convergence Nex, we presen he main resul of he paper ha shows ha he bes responses in he disribued ficiious play wih hisogram sharing converge o a Nash equilibrium sraegy (7) of any poenial game Γ(µ) given he same assumpions on nework conneciviy and on convergence of he sae learning process as in heorem. heorem 2 Consider a poenial game Γ(µ) and he disribued ficiious play updaes where a each sage agens bes respond as in (8) wih local beliefs ν i i formed using (4) if j N i or using (24) if j / N i, and sae belief µ i. If Assumpions and 2 are saisfied hen he empirical frequencies of agens f := {f j } j N n (A) defined in () converge o a Nash equilibrium sraegy of he poenial game wih common sae of he world beliefs µ, ha is, min f σ 0 (25) σ K(µ) where K(µ) is he se of Nash equilibria of he game Γ(µ) (7). Proof: See Appendix D. he above resul implies ha when agens share heir beliefs on ohers hisograms and based on his informaion keep an esimae of he empirical disribuion of each agen, heir responses converge o a Nash equilibrium of he poenial game as long as heir beliefs on he sae reach consensus a a belief µ fas enough. heorem 2 generalizes heorem o he class of poenial games given ha agens in addiion o heir acions communicae heir beliefs on ohers wih heir neighbors. V. SIMULAIONS We explore he effecs of he nework conneciviy, he sae learning process and he payoff srucure on he per-

7 Acion values Acion values (a) Fig. 2. Posiion of robos over ime for he geomeric (a) and small world neworks (b). Iniial posiions and nework is illusraed wih gray lines. Robos acions are bes responses o heir esimaes of he sae and of he cenroid empirical disribuion for he payoff in (27). Robos recursively compue heir esimaes of he sae by sharing heir esimaes of θ wih each oher and averaging heir observaions. heir esimaes on he cenroid empirical disribuion is recursively compued using (6). Agens align heir movemen a he direcion 95 while he arge direcion is θ = 90. (b) ime ime (a) Fig. 3. Disribued ficiious play acions of robos over ime for he geomeric (a) and small world neworks (b). Solid lines correspond o each robos acions over ime. he doed dashed line is equal o value of he sae of he world θ = 90 and he dashed line is he opimal esimae of he sae given all of he signals which is equal o 96.. Agens reach consensus in he movemen direcion 95 faser in he small-world nework han he geomeric nework. (b) formance of he algorihm in wo games named he beauy cones game, and he arge covering game. A. Beauy cones game A nework of n = 50 auonomous robos wan o move in coordinaion and a he same ime follow a arge direcion θ = [0, 80 ] in a wo dimensional opology. Each robo receives an iniial noisy signal relaed o he arge direcion θ, π i (θ) = θ + ɛ i (26) where ɛ i is drawn from a zero mean normal disribuion wih sandard deviaion equal o 20. Acions of robos deermine heir direcion of movemen and belong o he same space as θ bu are discreized in incremens of 5, i.e., A = (0, 5, 0,..., 80 ). he esimaion and coordinaion payoff of robo i is given by he following uiliy funcion ( u i (a, θ) = λ(a i θ) 2 ( λ) a i ) 2 a j (27) n j i where λ (0, ) gauges he relaive imporance of esimaion and coordinaion. he game is a symmeric poenial game and hence admis a consensus equilibrium for any common belief on θ [4]. In he following numerical seup, we choose θ o be equal o 90. We assume ha all robos sar wih a common prior on he cenroid empirical disribuion in which hey believe ha each acion is drawn wih equal probabiliy. hey follow he disribued ficiious play updaes wih acion sharing described in Secion III. Sae learning process is chosen as he averaging model in which robos updae heir beliefs on he sae θ using (23) wih iniial beliefs formed based on he iniial privae signal wih signal generaing funcion in (26). In Figs. 2 and 3, we plo robos posiions and heir acions,

8 Acion values ime (a) (b) Fig. 4. Locaions (a) and acions (b) of robos over ime for he sar nework. here are n = 5 robos and arges. In (a), he iniial posiions of he robos are marked wih squares. he robos final posiions a he end of 00 seps are marked wih a diamond. he posiions of he arges are indicaed by. Robos follow he hisogram sharing disribued ficiious play presened in Secion IV. he sars in (a) represen he posiion of he robos a each sep of he algorihm. he solid lines in (b) correspond o he acions of robos over ime. All arges are covered by a single robo before 00 seps. respecively. In Fig. 2, we assume ha robo i moves wih a displacemen of 0.0 meers in he chosen direcion a i a sage. Figs. 2(a) and 3(a) correspond o he behavior in a geomeric nework when robos are placed on a meer meer square randomly and connecing pairs wih disance less han 0.3 meer beween hem. Figs. 2(b) and 3(b) correspond o he behavior in a small-world nework when he edges of he geomeric nework are rewired wih random nodes wih probabiliy 0.2. he geomeric nework illusraed in Fig. 2(a) has a diameer of g = 5 wih an average lengh among users equal o 2.5. he small world nework illusraed in Fig. 2(b) has a diameer of r = 4 wih an average lengh among users equal o 2. We observe ha he agens reach consensus a he acion 95 in boh neworks bu he convergence is faser in he small-world nework (39 seps) han he geomeric nework (78 seps). We furher invesigae he effec of he nework srucure in convergence ime by considering 50 realizaions of he geomeric nework and 50 small-world neworks generaed from he realized geomeric neworks wih rewire probabiliy of 0.2. he average diameer of he realized geomeric neworks was 5. and he average diameer of he realized small-world neworks was 4.. he mean of he average lengh of he realized geomeric neworks was 2.27 while he same value was.96 for he realized small-world neworks. We considered a maximum of 500 ieraions for each nework. Among 50 realizaions of he geomeric nework, he disribued ficiious play behavior failed o reach consensus in acion wihin 500 seps in 8 realizaions while for smallworld neworks he number of failures was 5. he average ime o convergence among he 50 realizaions was 228 seps for he geomeric nework whereas he convergence ook 00 seps for he small-world nework on average. In addiion, Diameer is he longes shores pah among all pairs of nodes in he nework. he average lengh is he average number of seps along he shores pah for all pairs of nodes in he nework. convergence ime for he small-world nework is observed o be shorer han he corresponding geomeric nework in all of he runs excep one. B. arge covering game n auonomous robos wan o cover n arges. he posiion of a arge k := {,..., n} on he wo dimensional space is denoed by θ k R 2 and are no known by he robos. Robo i sars from an iniial locaion x i R 2 and makes noisy observaions s ik0 of he locaion of arge k coming from normal disribuion wih mean θ k and sandard deviaion equal o σi where I is he 2 2 ideniy marix and σ > 0 for all k. A each sage robos choose one of he arges, ha is, A = and receives a payoff from covering ha arge ha is inversely proporional o is disance from he arge if no oher robo is covering i, ha is, he payoff of robo i from covering arge k (,..., n) a i = k is given by u i (a i = k, a i, θ) = (a j = k) = 0 h(x i, θ k ) j i (28) where ( ) is he indicaor funcion and h( ) is a reward funcion inversely proporional o he disance beween he arge and he robo s iniial posiion x i, e.g., x i θ k 2. he firs erm in he muliplicaion above is one if no one else chooses arge k oherwise i is zero. he second erm in he muliplicaion decreases wih growing disance beween robo i s iniial posiion x i and he arge k s posiion θ k. When all of he robos sar from he same locaion, ha is, x i = x for all i N, he game wih payoffs above can be shown o be a poenial game by using he definiion of poenial games in (2). Furhermore, he game is symmeric. In his seup, we would like each robo o assign iself o

9 global uiliy values 9 a single arge differen from he res of he robos, ha is, we are ineresed in convergence o a pure sraegy Nash equilibrium in which each robo picks a single acion similar o he arge assignmen games considered in [20]. Observe ha he arge covering game can no have a pure consensus equilibrium sraegy. o see his, assume ha all robos cover he same arge hen hey all receive a payoff of zero. Any robo ha deviaes o anoher arge receives a posiive payoff. herefore, here canno be a pure consensus sraegy equilibrium. As a resul, insead of he acion sharing scheme, we consider he hisogram sharing disribued ficiious play by which i is possible bu no guaraneed ha he robos converge o a pure sraegy Nash equilibrium. In he numerical seup, we consider n = 5 robos wih he payoffs in (28) and n arges. he locaions of arges are respecively given as follows θ = (, ), θ 2 = (, ), θ 3 = (, ), θ 4 = (, ), θ 5 = (0, ). We consider he case ha he iniial posiions of robos are differen from each oher wih he reward funcion h(x i, θ k ) = x i θ k 2. Specifically, he iniial posiions of he robos equal o x = ( 0., 0.), x 2 = (0., 0.), x 3 = ( 0., 0.), x 4 = (0., 0.), and x 5 = (0, 0.). Robos make noisy observaions s ik for all k afer each sep. he observaions have he same disribuion as s ik0 wih σ = 0.2 meers. We assume ha he robos updae heir beliefs on he posiions of arges using he Bayes rule based on he observaions. Figs. 4(a)-(b) shows he movemen of robos and acions of robos over ime, respecively, for he sar nework. We assume ha robos move by a disance of 0.02 meers along he esimaed direcion of he arge hey choose a each sep of he disribued ficiious play. he esimaed direcion is a sraigh line from he curren posiion o he esimaed posiion of he chosen arge. I.e., he robos make observaions and decisions in every 0.02 meers. Finally, we assume ha he robo covers he arge if i is 0.05 meers away from a arge and no oher robo covers i. In figs. 4(a)-(b), we observe ha each robo comes o 0.05 meers neighborhood of a arge wihin 00 seps. Furhermore, he robos cover all of he arges, ha is, hey converge o a pure Nash equilibrium. Nex, we compare he disribued ficiious play algorihm o he cenralized (opimal) algorihm. In he cenralized algorihm, a he beginning of each sep agens aggregae heir signals and hen ake he acion o maximize he expeced global objecive defined as he sum of he uiliies of all (28). Since here exiss muliple equilibria in he complee informaion arge coverage game, i is no guaraneed ha he disribued ficiious play algorihm converges o he opimal equilibrium a each run. For his purpose, we considered 50 runs of he algorihm where in each run signals are generaed from differen seeds. We assume ha he algorihm has converged when each arge is covered by a robo wihin 0.05 meers from he arge. In Fig. 5, we plo he evoluion of he global uiliy wih respec o ime for he disribued ficiious play algorihm runs wih he bes and he wors final payoff, and for he cenralized algorihm. he bes final configuraion overlaps wih he final cenralized soluion which is given by Wors Bes Cenral ime Fig. 5. Comparison of he disribued ficiious play algorihm wih he cenralized opimal soluion. Bes and wors correspond o he runs wih he highes and lowes global uiliy in he disribued ficiious play algorihm. he algorihm converges o he global opimal poin in 40 runs ou of a oal of 50 runs. a = [, 2, 3, 4, 5] resuling in a global objecive value of he wors final configuraion is given by a = [, 5, 3, 4, 2] resuling in a global objecive value of VI. CONCLUSION his paper considered he opimal behavior of muli-agen sysems wih uncerainy on he sae of he world. he fundamenal problem of ineres was o have a model of opimal agen behavior given a global objecive ha depends on he sae and acions of all he agens when agens have differen beliefs on he sae. We posed he seup as a poenial game wih he global objecive as he common uiliy of he muli-agen sysem and se he opimal behavior as a Nash equilibrium sraegy of he game when agens have common beliefs on he sae of he environmen. We presened a class of disribued algorihms based on he ficiious play algorihm in which agens reach an agreemen on heir sae beliefs asympoically hrough an exogenous process, build beliefs on he behavior of ohers based on informaion from neighbors and bes respond o heir expeced uiliy given heir beliefs on he sae and ohers. We considered wo informaion exchange scenarios for he algorihm where in he firs scenario agens communicaed heir acions. For his scenario we showed ha when he agens keep rack of he populaion s average empirical frequency of acions as a belief on he behavior of every oher individual, heir behavior converges o a consensus Nash equilibrium of any symmeric poenial game wih common beliefs on he sae. In he second scenario we considered agens exchanging heir enire beliefs on ohers in addiion o heir acions. For his scenario we proposed averaging of he observed hisograms as a model for keeping beliefs on he behavior of ohers and showed ha heir empirical frequency converges o a Nash equilibrium of any poenial game. We exemplified he algorihm in a coordinaion game a symmeric poenial game and a arge covering game a poenial game. In hese examples, we observed ha he diameer of he nework is influenial in convergence rae where he shorer he diameer is, he faser he convergence is.

10 0 APPENDIX A DEFINIIONS AND INERMEDIAE CONVERGENCE RESULS We define noions ha relae closeness of a sraegy o he se of consensus Nash equilibria of he game Γ(µ). he disance of a sraegy σ n (A) from he se of consensus Nash equilibria C(µ) is given by d(σ, C(µ)) = min σ g. (29) g C(µ) he se of consensus sraegies ha are ɛ away from he consensus Nash equilibrium se (8) is he ɛ-consensus Nash equilibrium sraegy se, ha is, C ɛ (µ) = {σ n (A) : u i (σ ; µ) u i (σ i, σ i; µ) ɛ, for all σ i (A), for all i, σ = σ 2 = = σ n } (30) for ɛ > 0. We define he δ-consensus neighborhood of C(µ) as B δ (C(µ)) = { σ n (A) : d(σ, C(µ)) < δ, σ = σ 2 = = σ n }. (3) Noe ha he δ consensus neighborhood is defined as he se of consensus sraegies ha are close o he se C(µ). We can similarly define he ɛ Nash equilibrium se K ɛ (µ) and δ neighborhood of K(µ) in (7) as B δ (K(µ)) by removing he agreemen consrain on he equilibrium sraegies [7]. he following inermediae resuls can be found in Appendix B in [7]. hey are saed here for compleeness. Lemma If he processes g n (A) and h n (A) are such ha g i h i = O(log /) for all i N and he sae learning process for all i N generaes esimae beliefs {{µ i } =0} i N ha saisfy Assumpion 2, hen for a poenial payoff u in (2) he following is rue for he expeced uiliy of bes response behavior v( ) in (9), v(g i ; µ i ) v(h i ; µ) = O( log ). (32) Proof: he proof is deailed in Lemma 4 in [7]. he proof follows by firs making he observaion ha he expeced uiliy defined in () for he poenial funcion is Lipschiz coninuous, and second using he definiion of he Lipschiz coninuiy o bound he difference beween he bes response expeced uiliies in (9) for g i, µ i and h i, µ by he disance beween g i, µ i and h i, µ muliplied by he Lipschiz consan. Lemma 2 If α = O( log < for all > 0, α β = β < as. ) and β + 0 hen = Proof: Refer o he proof of Lemma 5 in [7]. Lemma 3 Denoe he n-uple of he average empirical disribuion wih n := {,..., }. If for any ɛ > 0 he following holds #{ : n / C ɛ (µ)} = 0 (33) hen d( n, C(µ)) = 0 where d(, ) is he disance defined in (29). Proof : By Lemma 7 in [7], (33) implies ha for a given δ > 0 here exiss an ɛ > 0 such ha #{ : n / B δ (C(µ))} = 0 (34) Using above equaion, he resul follows by Lemma in [36]. Lemma 4 For he poenial game wih funcion u( ) in (2) and expeced bes response uiliy v( ) (9), consider a sequence of disribuions f n (A) for =, 2,... and a common belief on he sae µ (Θ). Define he process β := n v(f i; µ) u(f i, f i ; µ) for =, 2,.... If = β = 0 (35) hen d(f, K(µ)) = 0 where f = {f,..., f n }. Proof: By Lemma 6 in [7], he condiion (35) implies ha for all ɛ > 0 #{ : f / K ɛ (µ)} = 0. (36) By Lemma 7 in [7], (36) implies ha for all δ > 0 he following is rue #{ : f / B δ (K(µ))} = 0 (37) he above convergence resul yields desired convergence resul by Lemma in [36]. APPENDIX B PROOF OF HEOREM Before we prove he heorem, we presen an inermediae resul ha shows he convergence rae of he belief of agen i on he populaion s average empirical disribuion ˆ i o he rue average empirical disribuion of he populaion i. Lemma 5 Consider he disribued ficiious play in which he cenroid empirical disribuion of he populaion evolves according o (5) and agens updae heir esimaes on he empirical play of he populaion ˆ i according o (6). If he nework saisfies Assumpion and he iniial beliefs are he same for all agens, i.e., ˆ i = for all i N, hen ˆ i converges in norm o a he rae O(log /), ha is, ˆ i = O( log ) Proof: See Appendix A in [7] for a proof. Observe ha he above resul is rue irrespecive of he game ha he agens are playing and uncerainy in he sae.

11 he proof leverages on he fac ha he change in he cenroid empirical disribuion is a mos / by he recursion in (5). hen by averaging observed acions of neighbors in a srongly conneced nework he beliefs of agen i on he cenroid empirical disribuion evolves faser han he change in he cenroid empirical disribuion. Proof heorem : Given he recursion for he average empirical disribuion in (5), we can wrie he expeced uiliy for he poenial funcion u( ) when all agens follow he cenroid empirical disribuion + and have idenical beliefs µ as follows ( ( ) ) u( +; n µ) = u n + Ψ(a i ) n n ; µ (38) where n := {,..., } n (A) is he n-uple of he average populaion empirical disribuion. Define he average bes response sraegy a ime Ψ(a ) := n n Ψ(a i). By he muli-lineariy of he expeced uiliy, we expand he above expeced uiliy as follows [36] u( +; n µ) = u( n ; µ)+ u( Ψ(a n ), ; µ) u( n, ; µ) + δ 2 (39) where he firs order erms of he expansion are explicily wrien and he remaining higher order erms are colleced o he erm δ/ 2. Consider he oal uiliy erm in (39) where agen i is playing wih respec o he average bes response sraegy a ime + Ψ(a ) and remaining agens use he average n n empirical disribuion, n u( Ψ(a ), ; µ). By he definiion of he average bes response sraegy, we wrie he erm in consideraion as u ( n Ψ(a ), ; µ ) ( ) n = u Ψ(a i ), ; µ. n (40) he following equaliy can be shown by using he mulilineariy of expecaion and permuaion invariance of he uiliy [7], u( Ψ(a ), n ; µ) = u(ψ(a i ), n ; µ). (4) he above equaliy means ha he oal expeced uiliy when agens play wih he average bes response sraegy a ime Ψ(a n ) agains he average empirical disribuion a ime is equal o he oal expeced uiliy when agens bes respond o he average populaion empirical disribuion a ime. We subsiue in he above equaliy (4) for he corresponding erm in (39) o ge he following u( +; n µ) = u( n ; µ)+ n u(ψ(a i ), ; µ) u( n, ; µ) + δ 2. (42) We can upper bound he righ hand side by adding δ / 2 o he lef hand side. u( +; n µ) u( n ; µ) + δ u(ψ(a i ), 2 n ; µ) u( n, ; µ) (43) Define L i := v i (ν i i ; µ n i) u(ψ(a i ), ; µ) where νj i = ˆ i for j i. Noe ha since agens have idenical payoffs, we can drop he subindex of he expeced uiliy of agen i when i bes responds o he sraegy profile of ohers v i ( ) defined in Secion II o wrie i as v( ). Now we add and subrac n L i/ o boh sides of he above equaion o ge he following inequaliy, u( n +; µ) u( n ; µ) + δ + 2 v(ν i; i µ i ) u(ψ(a i ), v(ν i; i µ i ) u(, n ; µ) n ; µ). (44) Summing he inequaliies above from ime = o ime =, we ge u( n +; µ) u( + + n δ ; µ) = = = v(ν i; i µ i ) u(, L i n ; µ). (45) Nex we define he following erm ha corresponds o he inside summaion on he righ hand side of he above inequaliy, α := v(ν i; i µ i ) u(, n ; µ). (46) he erm α capures he oal difference beween expeced uiliy when agens bes respond o heir beliefs on he average populaion empirical disribuion νj i = ˆ i and heir beliefs on θ µ i, and when hey follow he curren cenroid empirical disribuion wih common beliefs on he sae µ. Noe ha by Lemma 5 and Assumpion 2 he condiions of Lemma are saisfied, ha is, v(ν i i ; µ i) n u(, ; µ) = O(log /). By he assumpion ha uiliy value is finie and Lemma, he lef hand side of (45) is

12 2 finie. ha is, here exiss a B > 0 such ha B + = α. (47) for all > 0. Nex, we define he following erm n β := v( ; µ) u( n, ; µ) (48) ha capures he difference in expeced payoffs when agens n bes respond o he cenroid empirical disribuion for ohers given he common asympoic belief µ, and when everyone follows he curren cenroid empirical disribuion wih common beliefs on he sae µ. When we consider he difference beween α and β, he following equaliy is rue by Lemma, α β = v(ν i; i µ i ) v( n ; µ) = O( log ). (49) Furher β 0. Hence, he condiions of Lemma 2 are saisfied which implies ha he following holds = β <. (50) From he above equaion i follows by he Kronecker s Lemma ha [37, hm ] β = 0. (5) = he above convergence resul implies ha by Lemma 6 in [7], for any ɛ > 0, he number of cenroid empirical frequencies away from he ɛ consensus NE is finie for any ime, ha is, #{ : n / C ɛ (µ)} = 0. (52) he relaion above implies ha he disance beween he empirical frequencies and he se of symmeric NE diminishes by Lemma 3, ha is, d( n, C(µ)) = 0. (53) where d(, ) is he disance defined in (29). he resul in (20) follows from above. APPENDIX C PROOF OF COROLLARY Denoe he n uple of he average empirical disribuion n by. By he Lipschiz coninuiy of he n mulilinear uiliy expecaion we have ha u(a i, ; µ) u(a i, ν i i ; µ n i) K (, µ) (ν i i, µ i) for all a i where νj i = ˆ i for all j i and K 0 is he Lipschiz consan. By Lemma 5 and Assumpion 2, we have n (, µ) (ν i i, µ i) = O(log /). hen using (2), we have for all a A \ a u(a, ν i i; µ i ) u(a, ν i i; µ i ) ɛ δ (54) for ν i j = ˆ i for all j i, δ 0 and δ = O(log /). herefore, here exiss a finie ime > 0 such ha ɛ δ > 0 for all >. his means ha a is he bes response acion of i afer ime. hen he empirical frequency of i N f i converges o Ψ(a ) which implies (22). APPENDIX D PROOF OF HEOREM 2 Before we prove he heorem, we firs analyze he convergence rae of he hisogram sharing presened in Secion IV where we defined νj i = f j if j N i or νj i is given by (24) if j / N i. Denoe he lh elemen of νj i by νi j (l). Define he marix ha capures populaion s esimae on j s empirical disribuion, ˆFj := [νj,..., νn j ] R n m. he lh column of ˆF j represens he populaion s esimae on j s lh local acion denoed by ˆF j (l) := [νj (l),..., νn j (l)] R n. Observe ha j s esimae of he frequency of is own acion l is correc, ha is, ν j j (l) = f j(l). Define he vecor x jl R n where is jh elemen is equal o he empirical frequency of agen j aking acion l A, ha is, x jl (j) = f j (l), and is oher elemens are zero. Furher define he weighed adjacency marix for belief updae on he frequency of agen j s lh acion W jl R n n wih W jl (i, k) = wjk i for all i and k. We remind ha wi jk is he weigh ha i uses o mix agen j N i s belief on agen k / N i s empirical disribuion in (24). Also noe ha here are m weigh marices W jl each corresponding o one acion l A. he marix W jl is row sochasic, ha is, he sum of row elemens of W jl is equal o one for each row by k N i wjk i = and we have ha W jl(i, j) = for j N i i. he laer condiion on Wjl is by he fac ha if j N i, j sends is acion o is neighbor i and hence νj i = f j. Given hese definiions we can wrie a linear recursion for populaion s esimae of j s empirical frequency of is lh acion ˆF j+ (l) = W jl ( ˆF j (l) + x jl+ x jl ). (55) Noe ha if he above linear sysem converges o he rue empirical frequency of f j (l) in all of is elemens hen i implies ha all agens learned is rue value. Nex, we analyze he linear updae in (55) and show he convergence of he belief of agen i on he populaion s empirical disribuion ν i i o he rue average empirical disribuion of he res of he populaion f i a rae O(log /). Lemma 6 Consider he disribued ficiious play in which he empirical disribuion of agen j f j evolves according o (4) and agen i updaes is esimae on he empirical play of he populaion ν i i according (4) if j N i or using (24) if j / N i. If he nework saisfies Assumpion and he iniial beliefs are he same for all agens, i.e., ν i j = f j for all i N, hen ν i j converges in norm o f j a he rae O(log /), ha is, ν i j f j = O( log ) for all j N.

Expert Advice for Amateurs

Expert Advice for Amateurs Exper Advice for Amaeurs Ernes K. Lai Online Appendix - Exisence of Equilibria The analysis in his secion is performed under more general payoff funcions. Wihou aking an explici form, he payoffs of he