Indian Buffet Game With Negative Network Externality and Non-Bayesian Social Learning

Size: px

Start display at page:

Download "Indian Buffet Game With Negative Network Externality and Non-Bayesian Social Learning"

Regina Zoe Martin
6 years ago
Views:

1 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 45, NO. 4, APRIL Indan Buffet Game Wth Negatve Network Externalty and Non-Bayesan Socal Learnng Chunxao Jang, Member, IEEE, Yan Chen, Senor Member, IEEE, Yang Gao, Student Member, IEEE, and K. J. Ray Lu, Fellow, IEEE Abstract In a dynamc system, how to perform learnng and make decsons are becomng more and more mportant for users. Although there are some works n socal learnng-related lterature regardng how to construct belef for an uncertan system state, few studes have been conducted on ncorporatng socal learnng wth decson makng. Moreover, users may have multple concurrent optons on dfferent obects/resources and ther decsons usually negatvely nfluence each other s utlty, whch makes the problem even more challengng. In ths paper, we propose an Indan Buffet Game to study how users n a dynamc system learn about the uncertan system state and make multple concurrent decsons by not only consderng the current myopc utlty, but also the nfluence of subsequent users decsons. We analyze the proposed Indan Buffet Game under two dfferent scenaros: 1 on customers requestng multple dshes wthout budget constrant and 2 wth budget constrant. For both cases, we desgn recursve best response algorthms to fnd the subgame perfect Nash equlbrum NE for customers and characterze specal propertes of the NE profle under homogeneous settng. Moreover, we ntroduce a non-bayesan socal learnng algorthm for customers to learn the system state, and theoretcally prove ts convergence. Fnally, we conduct smulatons to valdate the effectveness and effcency of the proposed algorthms. Index Terms Decson makng, game theory, Indan Buffet Game, negatve network externalty, non-bayesan socal learnng. I. INTRODUCTION IN A dynamc system, users are usually confronted wth uncertanty about the system state when makng decsons. For example, n the feld of wreless communcatons, when choosng channels to access, users may not know exactly the channel capacty and qualty. Besdes, users have to consder others decsons snce a large number of users sharng a same Manuscrpt receved March 13, 2014; revsed June 4, 2014; accepted August 2, Date of publcaton December 12, 2014; date of current verson March 13, Ths work was supported n part by Proect , Proect , and Proect , n part by the Natonal Scence Foundaton of Chna, n part by the Natonal Basc Research Program of Chna 973 Program under Grant 2013CB329105, and n part by the Post-Doctoral Scence Foundaton funded proect. Ths work was done durng C. Jang s vst at the Unversty of Maryland. Ths paper was recommended by Assocate Edtor F.-Y. Wang. C. Jang s wth the Department of Electronc Engneerng, Tsnghua Unversty, Beng , Chna e-mal: chx.ang@gmal.com. Y. Chen, Y. Gao, and K. J. R. Lu are wth the Department of Electrcal and Computer Engneerng, Unversty of Maryland, College Park, MD USA. Color versons of one or more of the fgures n ths paper are avalable onlne at Dgtal Obect Identfer /TSMC channel wll nevtably decrease the average data rate and ncrease the end-to-end delay. Such phenomenon s known as negatve network externalty [1], [2],.e., the negatve nfluence of other users behavors on one user s reward, due to whch users tend to avod makng the same decsons wth others to maxmze ther own utltes. Smlar phenomenon can be found n our daly lfe such as selectng onlne cloud storage servce and choosng WF access pont [3]. Therefore, how users n a dynamc system learn the system state and make best decsons by consderng the nfluence of others decsons becomes an mportant research ssue n many felds [4] [7]. Although users n a dynamc system may only have lmted knowledge about the uncertan system state, they can construct a probablstc belef regardng the system state through socal learnng. In the socal learnng lterature [8] [13], dfferent knds of learnng rules were studed where the essental obectve s to learn the true system state eventually. In most of these exstng works, the learnng problem s typcally formulated as a dynamc game wth ncomplete nformaton and the man focus s to study whether users can learn the true system state at the equlbra. However, all of the prevous works assumed that users utlty functons are ndependent of each other,.e., they dd not consder the concept of network externalty, whch s ndeed a common phenomenon n dynamc systems and can nfluence users utltes and decsons to a large extent. To study the socal learnng problem wth negatve network externalty, n [14] [16], we have proposed a general framework called Chnese Restaurant Game. The concept s orgnated from Chnese Restaurant Process [17], whch s used to model unknown dstrbutons n the nonparametrc learnng methods n the feld of machne learnng. In the Chnese Restaurant Game, there are fnte tables wth dfferent szes and fnte customers sequentally requestng tables to be seated. Snce customers do not know the exact sze of each table, whch affects each other s utlty, they have to learn the table szes accordng to some external nformaton. Moreover, when requestng one table, each customer should take nto account the subsequent customers decsons due to the lmted dnng space n each table,.e., the negatve network externalty. Then, the Chnese Restaurant Game s extended to a dynamc populaton settng n [18], where customers arrve at and leave the restaurant wth a Posson process. Wth the general Chnese Restaurant Game theoretc framework, one c 2014 IEEE. Personal use s permtted, but republcaton/redstrbuton requres IEEE permsson. See for more nformaton.

2 610 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 45, NO. 4, APRIL 2015 s able to analyze the socal learnng and strategc decson makng of ratonal users n a network. One underlyng assumpton n the Chnese Restaurant Game s that each customer can only choose one table. However, n many real applcatons, users can have multple concurrent selectons, e.g., moble termnals can access multple channels at once and users can have multple cloud storage servces. To tackle such a challenge, smlar to the Chnese Restaurant Game whch ntroduced the strategc behavors nto the nonstrategc Chnese Restaurant Process, n ths paper, we propose a new game called Indan Buffet Game, whch ntroduces strategc behavors nto the nonstrategc Indan Buffet Process [19]. In the Indan Buffet Process, there exst an nfnte number of dshes and each customer can order a number of dshes followng Posson process. Such a stochastc process defnes a probablty dstrbuton for use as a pror n probablstc models. By ncorporatng customers ratonal behavors, we extend the Indan Buffet Process nto the Indan Buffet Game and study what s the customers optmal decsons when choosng the dshes n a sequental manner. The proposed Indan Buffet game s an deal framework to study the multple dshes selecton problem by ntegratng socal learnng and strategc decson makng wth negatve network externalty. Whle lots of works have been done regardng the smultaneous decson makng problem [7], [20], the sequental decson makng problem has not been well nvestgated n the lterature, whereas a user has to consder both the prevous users decsons and predct the subsequent users decsons. Furthermore, n practce, the sequental decson makng scenaros are even more prevalng, especally n the feld of wreless communcaton where synchronzaton among all users s qute dffcult. Therefore, n ths paper, we focus on sequental decson makng analyss and try to provde some nsghts and results on ths problem. We wll dscuss two cases: Indan Buffet Game wthout budget constrant and wth budget constrant, where, wth budget constrant means the number of dshes each customer can requre s lmted, and, wthout budget constrant means no lmtaton. The man contrbutons of ths paper can be summarzed as follows. 1 We propose a general framework, Indan Buffet Game, to study how users make multple concurrent selectons under uncertan system states. Specfcally, such a framework can reveal how users learn the uncertan system state through socal learnng and make optmal decsons to maxmze ther own expected utltes by consderng negatve network externalty. 2 In the learnng stage of the Indan Buffet Game, we propose a non-bayesan socal learnng algorthm for customers to learn the dsh states. Moreover, we prove theoretcally the convergence of the proposed non-bayesan socal learnng algorthm to the true belef and show wth smulatons the fast convergence rate. 3 For the case wthout budget constrant, we show that the multple concurrent dshes selecton problem can be decoupled nto a seres of ndependent Indan Buffet Games. We then desgn a recursve best response algorthm to fnd the subgame perfect Nash equlbrum NE of the elementary Indan Buffet Game. We show that, under the homogeneous settng, the NE profle exhbts a threshold structure. 4 For the case wth budget constrant, we desgn a recursve best response algorthm to fnd the correspondng subgame perfect NE. We then show that, under the homogeneous settng, the NE profle exhbts an equalsharng property. The rest of ths paper s organzed as follows. The system model s descrbed n Secton II. Whle, the Indan Buffet Game wthout and wth budget constrant are dscussed n detals n Sectons III and IV, respectvely. In Secton V, we gve the theoretcal proof of the convergence of the proposed non-bayesan learnng rule. Fnally, we show smulaton results n Secton VI and draw the concluson n Secton VII. II. SYSTEM MODEL A. Indan Buffet Game Formulaton Let us consder an Indan buffet restaurant whch provdes M dshes denoted by r 1, r 2,...,r M. There are N customers labeled wth 1, 2,...,N sequentally requestng dshes for a meal. Each dsh can be shared among multple customers and each customer can select multple dshes. We assume that all N customers are ratonal n the sense that they wll select dshes whch can maxmze ther own utltes. In such a case, the multple dshes selecton problem can be formulated to be a noncooperatve game, called Indan Buffet Game, as follows. 1 Players: N ratonal customers n the restaurant. 2 Strateges: Snce each customer can request multple dshes, the strategy set can be defned as X = {, {r 1 },...,{r 1, r 2 },...,{r 1, r 2,...,r M }} 1 where each strategy s a combnaton of dshes and means no dsh s requested. Obvously, customers strategy set s fnte wth 2 M elements. We denote the strategy of customer as d = d,1, d,2,...,d,m, where d, = 1 represents customer requests dsh r and otherwse we have d, = 0. The strategy profle of all customers can be denoted by a M N matrx as follows 1 : d 1,1 d 2,1 d N,1 d 1,2 d 2,2 d N,2 D = d 1, d 2,...,d N = d 1,M d 2,M d N,M 3 Utlty Functon: The utlty of each customer s determned by both the qualty of the dsh and the number of customers who share the same dsh due to the negatve network externalty. The qualty of one dsh can be nterpreted as the delcousness or the sze. Let q Q denote the qualty of dsh r where Q s the qualty space, and N denote the total number of customers requestng dsh r. Then, we can defne the utlty functon of customer requestng dsh r as u, q, N = g, q, N c, q, N 3 where g, s the gan functon and c, s the cost functon. Note that the utlty functon s an ncreasng 1 In ths paper, the bold symbols represent vectors, the bold captal symbols represent matrxes, the subscrpt denotes the customer ndex, subscrpt denotes the dsh ndex, and the superscrpt t denotes tme slot ndex.

3 JIANG et al.: INDIAN BUFFET GAME WITH NEGATIVE NETWORK EXTERNALITY AND NON-BAYESIAN SOCIAL LEARNING 611 functon n terms of q, and a decreasng functon n terms of N, whch can be regarded as the characterstc of negatve network externalty snce the more customers request dsh r, the less utlty customer can obtan. Note that f no dsh s requested, the utlty s zero for a customer. We hereby defne the dsh state θ ={θ 1,θ 2,...,θ M }, where θ denotes the state of dsh r. The dsh state can be nterpreted as how good the ngredent or cookng s. s the set of all possble states, whch s assumed to be fnte. The dsh state keeps unchanged along wth tme untl the whole Indan buffet restaurant s remodeled. The aforementoned qualty of dsh r, q, s assumed to be a random varable followng the dstrbuton f θ, whch means that the state of the dsh θ determnes the dstrbuton of the dsh qualty q. Note that the dsh qualty s the same for all users n a certan tme slot, whch follows dstrbuton f θ and there s a realzaton n each tme slot. The dsh state θ M s unknown to all customers, e.g., they do not know exactly whether the dsh s delcous or not before requestng and tastng. Nevertheless, they may have receved some advertsements or gathered some revews about the restaurant. Such nformaton can be regarded as some knds of sgnals related to the true state of the restaurant. In such a case, customers can estmate θ through those avalable nformaton,.e., the nformaton they know n advance and/or gather from other customers. In the Indan Buffet Game model, we dvde the system tme nto tme slots and assume that the dsh qualty q wth = 1, 2,...,M vares ndependently from tme slot to tme slot followng the correspondng condtonal dstrbutons f θ. In each tme slot, customers make sequental decsons on whch dshes to request. There are manly two ssues to be addressed n the Indan Buffet Game. Frst, snce the states are unknown, t s very mportant to desgn an effectve socal learnng rule for customers to learn from others and ther prevous outcomes. Second, gven customers knowledge about the state, we should characterze the equlbrum that ratonal customers wll adopt n each tme slot. In ths paper, to ensure farness among customers, we assume that customers have dfferent orders of selectng dshes at dfferent tme slots. In the sequel, the customer ndex 1, 2,...,N means the dsh request order of them,.e., customer means the th customer. In such a case, t s suffcent for customers to only consder the expected utltes at current tme slot. Moreover, although each customer can request more than one dsh, the total number of requests s subect to the followng budget constrant: M d, L, = 1, 2,...,N. 4 =1 A specal case of 4 s that L M, whch s equvalent to the case wthout budget constrant where customers can request as many dshes as possble. In Sectons III and IV, we wll dscuss the Indan Buffet Game under two scenaros: wthout budget constrant L M and wth budget constrant L < M, respectvely. B. Tme Slot Structure of Indan Buffet Game Snce the dsh state θ M s unknown to customers, we ntroduce the concept of belef to descrbe ther uncertanty Fg. 1. Tme slot structure of the Indan Buffet Game. about the state [22]. Let us denote the belef as P t ={p t, = 1, 2,...,M}, where p t = {p t θ, θ } represents customers estmaton about the probablty dstrbuton regardng the state of dsh r at tme slot t. Snce customers can obtan some pror nformaton about the dsh state, we assume that all customers start wth a pror belef p 0, θ for every state θ. Note that the pror belefs of all customers can be dfferent, however, snce all customers share ther belef nformaton wth each other, they wll have the same belef after the frst belef updatng. In ths secton, we wll dscuss the proposed socal learnng algorthm,.e., how customers update ther belef P t at each tme slot, and leave the convergence and performance analyss n Secton V. In Fg. 1, we llustrate the tme slot structure of the proposed Indan Buffet Game. At each tme slot t {1, 2...}, there are three phases: 1 decson-makng phase; 2 dsh sharng phase; and 3 socal learnng phase. 1 Phase 1 Decson Makng: In ths phase, customers sequentally make decsons on whch dshes to request and broadcast ther decsons to others, or smply put, everyone knows what others are gettng. For customer, hs/her decson s to maxmze hs/her expected utlty at current tme slot, based on the belef at current tme slot P t, the decsons of the prevous 1 customers {d 1, d 2,...,d 1 }, and hs/her predctons of the subsequent N customers decsons. 2 Phase 2 Dsh Sharng: In the second phase, each customer requests hs/her desred dshes, receves a utlty u, q, N accordng to the dsh qualty q and the number of customers N sharng the same dsh as defned n 3. Notce that snce N s known to all customers after the decson makng phase, the customers who requested dsh r at tme slot t can nfer the dsh qualty q from the receved utlty. Let us denote such nferred nformaton as s t, Q, s, f θ, whch serves as the sgnal n the learnng procedure. On the other hand, the customers who have not requested r at tme slot t, cannot nfer the dsh qualty q and thus have no nferred sgnal by themselves. Such an asymmetrc structure,.e., not every customer nfers sgnals, makes the learnng problem dfferent from the tradtonal socal learnng settngs and thus poses more challenges on learnng the true state. 3 Phase 3 Socal Learnng: Accordng to the observed/ nferred sgnals n the second phase, customers can update the belef through the proposed non-bayesan socal learnng rule. As llustrated n Fg. 2, there are manly two steps n the proposed socal learnng rule. In the frst step, each customer updates hs/her local ntermedate belef on θ, μ t,, and then reveals ths ntermedate belef to others. Snce sharng the belef can help each customer to get more nformaton and thus enhance the learnng performance, we assume the customers have the ncentve to do so. In the second step, each customer

4 612 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 45, NO. 4, APRIL 2015 Fg. 2. Non-Bayesan learnng rule for each dsh. combnes hs/her ntermedate belef wth other customers ntermedate belefs n a lnear manner. 2 Based on the Bayes theorem [23], the customer s ntermedate belef on the state of dsh r, μ t θ, θ }, can be calculated by, ={μ t, μ t, θ = f s t, θ f s t, θ p t 1 θ p t 1 θ, θ. 5 From 5, we can see that when customer has requested r at tme slot t, he/she wll ncorporate the correspondng sgnal nto hs/her ntermedate belef μ t,. Otherwse, he/she wll use the prevous belef p t 1. Then, each customer lnearly combnes hs/her ntermedate belef wth others customers ntermedate belefs as follows: p t θ = 1 N N =1 [ d t, μt, θ + 1 d t, ] p t 1 θ θ, and = 1, 2,...,M 6 where d t, s the strategy of customer at tme slot t. One may have already notced that, dfferent from the Chnese Restaurant Game where each customer can only choose one table [16], customers n the Indan Buffet Game can request multple dshes concurrently. Moreover, there are another two sgnfcant dfferences between those two games. One dfference s that each customer n the Chnese Restaurant Game can only make decson once whle customers n the Indan Buffet Game can make decsons repeatedly,.e., oneshot game versus repeated game, due to whch customers n the Indan Buffet Game can learn from ther prevous experences and an effectve learnng rule that can guarantee convergence s requred. The other dfference s the learnng rule. In the Chnese Restaurant Game, customers sequentally make decsons and then reveal ther sgnals to subsequent customers, where the Bayesan socal learnng rule s used for customers to combne the sgnals from prevous customers. However, n the Indan Buffet Game, nstead of revealng sgnals, customers only reveal ther ntermedate belef and the non-bayesan socal learnng rule s appled to generate the fnal belef. 2 Note that the state learnng processes,.e., the belef update, of all dshes are ndependent. III. INDIAN BUFFET GAME WITHOUT BUDGET CONSTRAINT In ths secton, we study the Indan Buffet Game wthout budget constrant, whch s correspondng to the case where L M n 4. When there s no budget constrant, customers should request all dshes that can gve them postve expected utlty to maxmze ther total expected utltes. We wll frst show that wthout budget constrant, the decson of whether to request one dsh or not s ndependent from other dshes,.e., the Indan Buffet Game that selects multple concurrent dshes s decoupled nto a seres of elementary Indan Buffet Games that select a sngle dsh. Although n elementary Indan Buffet Game each customer can only choose one dsh, whch s smlar to the Chnese Restaurant Game model [14], the dfference s that all customers cooperatvely estmate the dsh state n the Indan Buffet Game model, nstead of sequental learnng n the Chnese Restaurant Game model. We present a recursve algorthm that characterzes the subgame perfect equlbrum of the Indan Buffet Game wthout budget constrant. Fnally, we also dscuss the homogeneous case where customers have the same form of utlty functon to gan more nsghts. To show the ndependence among dfferent dshes, we frst defne the best response of a customer gven other customers actons. Let us defne n ={n,1, n,2,...,n,m } wth n, = d k, 7 k = beng the number of customers except customer choosng r. Let P = {p 1, p 2,...,p M }, where p = {p θ, θ } s the customer s belef regardng the state of dsh r at current tme slot. 3 Gven P and n, the best response of customer, d = d,1, d,2,...,d,m, can be wrtten as M d = BR P, n = arg max d, U, 8 d {0,1} M =1 where U, s customer s expected utlty of requestng dsh r gven belef P, whch can be calculated by U, = u, q, n, + d, f q θ p θ 9 Q where Q s the qualty/sgnal set and q Q. Snce there s no constrant for the optmzaton problem n 8 and U, s only related wth d,, we can rewrte 8 by M d = BR P, n = arg max d, U,. 10 =1 d {0,1} M From 10, we can see that the optmal decson on one dsh s rrelevant to the decsons on others, whch leads to the ndependence among dfferent dshes. In such a case, we have d, = arg max d, U,. 11 d, {0,1} The ndependence property enables us to smplfy the analyss by decouplng the orgn Indan Buffet Game nto M elementary Indan Buffet Games. The elementary Indan 3 Snce we dscuss the Indan Buffet Game n one tme slot, the superscrpt t s omtted n Sectons III and IV.

5 JIANG et al.: INDIAN BUFFET GAME WITH NEGATIVE NETWORK EXTERNALITY AND NON-BAYESIAN SOCIAL LEARNING 613 Buffet Game s defned as the case when there s only one dsh for the customers, and thus, the decson for each customer s bnary,.e., to request or not to request. In the remanng of ths secton, we wll focus on the analyss of the elementary Indan Buffet Game and dscard the dsh ndex for notatonal smplfcaton. As a result, we can rewrte the best response of customer based on the latest belef nformaton pθ as follows: d = BR p, n = arg max d U 12 d {0,1} { 1, f U = = Q u q, n + 1f q θpθ>0 0, otherwse. A. Recursve Best Response Algorthm In ths secton, we study how to solve the best response defned n 12 for each customer. From 12, we can see that customer needs to know n to calculate the expected utlty U n order to decde whether to request the dsh or not. However, snce the customers make ther decsons sequentally, customer does not know the decsons of those who are after hm/her and thus needs to predct the subsequent customers decsons based on the belef and known nformaton. Let m denote the number of customers who wll request the dsh after customer, then we can wrte the recursve form of m as m = m +1 + d Let m d =0 and m d =1 represent m under the condton of d = 0 and d = 1, respectvely. Denoted by n = 1 k=1 d k s the number of customers choosng the dsh before customer. Then, the estmated number of customers choosng the dsh excludng customer can be wrtten as follows: ˆn d =0 = n + m d =0 14 ˆn d =1 = n + m d =1. 15 Note that ˆn d =0 and ˆn d =1 are dfferent from n snce the values of d +1, d +2,...,d N are estmated nstead of true observatons. Accordng to 15, we can compute the expected utlty of customer when d = 1as U d =1 = u q, n + m d =1 + 1f q θpθ. 16 Q Snce the utlty of customer s zero when d = 0, the best response of customer can be obtaned as { d 1, f U = d =1 > , otherwse. Wth 17, we can fnd the best response of customer gven belef p, current observaton n and predcted number of subsequent customers choosng the dsh, m d =1. Topredct m d =1, customer needs to predct the decsons of all customers from + 1toN. When t comes to customer N, snce he/she knows exactly the decsons of all the prevous customers, he/she can fnd the best response wthout makng any predcton,.e., m N = 0. Along ths lne, t s ntutve to desgn a recursve algorthm to predct m d =1 by consderng Algorthm 1 BR_EIBGp, n, f Customer == N then //******For customer N******// f U N = Q u Nq, n N + 1f q θpθ > 0 then d N 1 else dn 0 end f m N 0 else //******For customer 1, 2,...,N 1******// //***Predctng***// d +1, m +1 BR_EIBGp, n + 1, + 1 m m +1 + d +1 //***Makng decson***// f U = Q u q, n + m + 1f q θpθ > 0 then d 1 else d+1, m +1 BR_EIBGp, n, + 1 m m +1 + d +1 d 0 end f end f return d, m all possble decsons of customers from + 1toN and sequentally updatng m = m +1 + d +1. In Algorthm 1, we show the recursve algorthm BR_EIBGp, n, that descrbes how to predct m d =1 and fnd the best response d for customer, gven current belef p and observaton n. Moreover, n order to obtan a correct predcton of m n the recurson procedure, we calculate and return m d =0 when the best response of customer s 0. Note that n Algorthm 1, we have assumed that the functonal form of the utltes of all users are known by each other. In case that the utlty functon forms are unknown, users can take expectaton over the types of users based on some emprcal user-type dstrbuton, whch s not the focus of ths paper. Moreover, as for the complexty of Algorthm 1, t enumerates all possble combnatons of the choces made by customers n a dynamc programmng manner. Dfferent from the exhaustve search wth the complexty order of O2 N, Algorthm 1 s wth order ON 2, where N s the total number of customers. In the followng, we wll prove that the acton profle specfed n BR_EIBGp, n, s a subgame perfect NE for the elementary Indan Buffet Game. B. Subgame Perfect NE In ths secton, we wll show that Algorthm 1 leads to a subgame perfect NE for the elementary Indan Buffet Game. In the followng, we frst gve the formal defntons of NE, subgame, and subgame perfect NE. Defnton 1: Gven the belef p ={pθ, θ }, the acton profle d ={d1, d 2,...,d N } s a NE of the N-customer elementary Indan Buffet Game f and only f {1, 2,...,N}, d = BR p, k = d k as gven n 12. Defnton 2: A subgame of the N-customer elementary Indan Buffet Game conssts of the followng three elements: 1 t starts from customer wth = 1, 2,...,N; 2 t has the belef p at the current tme slot; and 3 t has the current observaton, n, whch s the decsons of the prevous customers.

6 614 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 45, NO. 4, APRIL 2015 Defnton 3: A NE s a subgame perfect NE f and only f t s a NE for every subgame. Accordng to the above defntons, we show n the followng theorem that the acton profle derved by Algorthm 1 s a subgame perfect NE of the elementary Indan Buffet Game. Theorem 1: Gven the belef p ={pθ, θ }, the acton profle d ={d1, d 2,...,d N }, wth d beng determned by BR_EIBGp, n, and n = 1 k=1 d k, s a subgame perfect NE for the elementary Indan Buffet Game. Proof: We frst show that dk s the best response of customer k n the subgame startng from customer, 1 k N. If k = N, we can see that BR_EIBGp, n N, N assgns the value of dn drectly as { dn 1, f = UN = Q u Nq, n N + 1f q θpθ>0 18 0, otherwse. Snce n N = n N,wehavedk = BR kp, n k n the case of k = N accordng to 12,.e., dk s the best response of customer k. If k < N, suppose dk s the best response of customer k derved by BR_EIBGp, n k, k. Ifdk = 0, denotng d k = 1as the contradcton, we can see from BR_EIBGp, n k, k that U k d k =1 = u k q, n k + m k + 1 f q θpθ Q < 0 = U k d k =0 19 whch means customer k has no ncentve to devate from dk = 1 gven the predcton of other customers decsons. If dk = 1, denotng d k = 0 as the contradcton, we can see from BR_EIBGp, n k, k that U k d k =0 = 0 < U k d k =1 = u k q, n k + m k + 1f q θpθ 20 Q whch means that customer k has no ncentve to devate from dk = 0 gven the predcton of other customers decsons. Therefore, dk = BR_EIBGp, n k, k s the best response of customer k n the subgame of the elementary Indan Buffet Game startng wth customer. Moreover, snce the statement s true for k satsfyng k N, we know that {d, d +1,...,d N } s the NE for the subgame startng from customer. Therefore, accordng to the defnton of subgame perfect NE, we can conclude that Theorem 1 s true. C. Homogeneous Case From the prevous secton, we know that a recursve procedure s requred to determne the best responses of the elementary Indan Buffet Game. Ths s due to the fact that one customer needs to predct the decsons of all subsequent customers to determne the best response of a certan customer. In ths secton, we smplfy the game wth a homogeneous settng to derve a more concse best response. In the homogeneous case, we assume that all customers have the same form of utlty functon,.e., u q, n = uq, n, for all, q, n. Under such a settng, the equlbrum can be characterzed n a much smpler way as shown n the followng lemma. Lemma 1: In the N-customer elementary Indan Buffet Game under a homogeneous settng, f d ={d1, d 2,...,d N } s the NE acton profle specfed by BR_EIBG, then we have d = 1 f and only f 0 n, where n = N k=1 dk. Proof: Suppose the best response of customer, d = 0. Then, accordng to Algorthm 1, we have U = uq, n + m d =1 + 1f q θpθ Q The predcton of m under the condton of d = 1 reles on the recursve estmatons of all subsequent customers decsons. In partcular, we have m d =1 = d +1 d =1 + m +1 d =1, where the value of d +1 d =1 can be computed as follows: { 1, f U+1 d +1 d =1 = d =1 > , otherwse wth U +1 d =1 = uq, n m +1 d =1 + 1f q θpθ. Q 23 Snce n m +1 d =1 + 1 n + m d =1 + 1 and uq, n s a decreasng functon n terms of n, wehaved +1 d =1 = 0 accordng to 21 and 23. Followng the same argument, we can show that d k d =1 = 0 for all k = + 1, + 2,...,N. Therefore, we have m d =1 = N k=+1 d k d =1 = Then, let us consder the best response of customer + 1, whch can be calculated by { d+1 1, f = U+1 > , otherwse where U +1 = uq, n +1 + m +1 d+1 =1 + 1f q θpθ. 26 Q Snce n +1 = n + d, m d =1 = 0 and m +1 d+1 =1 0, we have n +1 + m +1 d+1 =1 + 1 n + m d = Accordng to 21 and 26, and the decreasng property of utlty functon n terms of the number of customers sharng the same dsh, we have d+1 = 0. Followng the same argument, we can show that f d = 0, then dk = 0 for all k {+1, +2,...,N}. Snce all decsons can take values of ether 0 or 1, we have d = 1 f and only f 0 N k=1 dk. Ths completes the proof. From Lemma 1, we can see that there s a threshold structure n the NE of the elementary Inda Buffet Game wth the homogeneous settng. The threshold structure s emboded n the fact that f d = 0, then dk = 0, k {+1, +2,...,N}, and f d = 1, then dk = 1, k {1, 2,..., 1}. Theresult can be easly extended to the Indan Buffet Game wthout budget constrant under the homogeneous settng as shown n the followng theorem. Theorem 2: In the M-dsh and N-customer Indan Buffet Game wthout budget constrant, f all the customers have the same utlty functons, there s a threshold structure n the NE

7 JIANG et al.: INDIAN BUFFET GAME WITH NEGATIVE NETWORK EXTERNALITY AND NON-BAYESIAN SOCIAL LEARNING 615 matrx D denoted by 2,.e., for any row {1, 2,...,M} of D, there s a T {1, 2,...,N} satsfyng that { d, 1, < = T 27 0, T. Proof: Ths theorem drectly follows by extendng Lemma 1 nto M ndependent dshes case. IV. INDIAN BUFFET GAME WITH BUDGET CONSTRAINT In ths secton, we study the Indan Buffet Game wth budget constrant, whch s correspondng to the case wth L < M n 4. Unlke the prevous case, when there s a budget constrant for each customer, the selecton among dfferent dshes s no longer ndependent but coupled. In the followng, we wll frst dscuss a recursve algorthm that can characterze the subgame perfect NE of the Indan Buffet Game wth budget constrant. Then, we dscuss a smplfed case wth a homogeneous settng to gan more nsghts. A. Recursve Best Response Algorthm In the budget constrant case, we assume that each customer can at most request L dshes at each tme slot wth L < M. In such a case, the best response of customer can be found by the followng optmzaton problem: M d = BR P, n = arg max d, U, d {0,1} M s.t. =1 M d, L < M 28 =1 where U, = u, q, n, + d, f q θ p θ. 29 Q From 28, we can see that customer s decson on dsh r s coupled wth all other dshes, and thus 28 cannot be decomposed nto M subproblems. Nevertheless, we can stll fnd the best response of each customer by comparng all possble combnatons of L dshes. Let = {φ 1, φ 2,...,φ H } denote the set of all combnatons of l 1 l L dshes out of M dshes, where H = L l=1 CM l = L l=1 M!/l!M l! and φ h = φ h,1,φ h,2,...,φ h,m s one possble combnaton wth φ h, representng whether dsh r s requested, that s φ h = 1, 1,...,1, 0, 0,...,0 }{{}}{{} 30 l M l means the customer requests dshes r 1, r 2,...,r l 1 l L. In other words, s the canddate strategy set of each customer wth constrant L. Let us defne customer s observaton of prevous customers decsons as n ={n,1, n,2,...,n,m } 31 where n, = 1 k=1 d k, s the number of customers choosng dsh r before customer. Letm denote the subsequent customers decsons after customer, we have ts recursve form as m = m +1 + d Then, let m d =φ h = { } m,1 d =φ h, m,2 d =φ h,...,m,m d =φ h 33 wth m, d =φ h be the predcted number of subsequent customers who wll request dsh r under the condton that d = φ h, where d = d,1, d,1,...,d,m and φ h. In such a case, the predcted number of customers choosng dfferent dshes excludng customer s ˆn d =φ h = n + m d =φ h. 34 Accordng to the defntons above, we can get customer s expected utlty by obtanng dsh r when d = φ h as U, d =φ h = φ h, u, q, n, + m, d =φ h + φ h, Q f q θ p θ. 35 Then, the total expected utlty customer can obtan wth d = φ h s the sum of U, d =φ h over all M dshes, that s U d =φ h = M U, d =φ h. 36 =1 In such a case, we can fnd the optmal φ h whch maxmzes customer s expected utlty U d =φ h as follows: φ h = arg max{u d =φ h } 37 φ h whch s the best response of customer. To obtan the best response n 37, each customer needs to calculate the expected utltes defned n 35, whch n turn requres to predct m, d =φ h,.e., the number of customers who choose dsh r after customer. When t comes to customer N who has already known all the prevous customers decsons, no predcton s requred. Therefore, smlar to Algorthm 1, gven belef P ={p 1, p 2,...,p M } at current tme slot and the current observaton n ={n,1, n,2,...,n,m }, we can desgn another recursve best response algorthm BR_IBGp, n, for solvng the Indan Buffet Game wth budget constrant n Algorthm 2. As one can see, customer N only needs to compare the expected utltes of requestng all M dshes and choose L or less than L dshes wth hghest postve expected utltes. Note that max L means fndng the hghest L values. For other customers, each one of them needs to frst recursvely predct the followng customers decsons, and then makes hs/her own decson based on the predcton and the current observatons. B. Subgame Perfect NE Smlar to the elementary Indan Buffet Game, we frst gve the formal defntons of NE and subgame of the Indan Buffet Game wth budget constrant. Defnton 4: Gven the belef P = {p 1, p 2,...,p M },the acton profle D ={d 1, d 2,...,d N } s a NE of the M-dsh and N-customer Indan Buffet Game wth budget constrant L, f and only f d = BR P, k = d k as defned n 28 for all. Defnton 5: A subgame of the M-dsh and N-customer Indan Buffet Game wth budget constrant L conssts of the followng three elements: 1 t starts from customer wth

8 616 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 45, NO. 4, APRIL 2015 Algorthm 2 BR_IBGP, n, f Customer == N then //******For customer N******// for = 1toM do U, = Q u N,q, n N, + 1f q θ p θ end for ={ 1, 2,..., L } arg max L {U, } {1,2,...,M} for = 1toM do f U, > 0&& then d N, 1 else dn, 0 end f end for m N = 0 else //******For customer 1, 2,...,N 1******// //***Predctng***// for φ h = φ 1 to φ H do d +1, m +1 BR_IBGP, n + φ h, + 1 m m +1 + d +1 U φ h = M φ h, f q θ p θ end for //***Makng decson***// φ h arg max{u φ h } φ h Q u,q, n, + m, + φ h, d +1, m +1 BR_IBGP, n + φ h, + 1 d φ h m m +1 + d +1 end f return d, m = 1, 2,...,N; 2 t has the belef P at current tme slot; and 3 t has the current observaton, n, whch comes from the decsons of the prevous customers. Based on Defntons 3 5, we show n the followng theorem that the acton profle obtaned by Algorthm 2 s a subgame perfect NE of the Indan Buffet Game wth budget constrant. Theorem 3: Gven the belef P = {p 1, p 2,...,p M }, the acton profle D ={d 1, d 2,...,d N }, where d determned by BR_IBGP, n, and n = 1 k=1 d k, s a subgame perfect NE for the Indan Buffet Game. Proof: The proof of ths theorem s smlar to that of Theorem 1, the detals of whch are omtted due to the page lmtaton. The proof outlne s that frst to show, k such that 1 N and k N, d k s the best response of customer k n the subgame startng from customer by analyzng two cases: k = N and k < N. Then, we can know that {d, d +1,...,d N } s the NE for the subgame startng from customer. Fnally, accordng to the defnton of subgame perfect NE, we can conclude that Theorem 3 s true. C. Homogenous Case In the homogenous case, we assume that all customers utlty functons are the same,.e., u, q, n = uq, n; and all dshes are n the same state,.e., the dsh state θ ={θ,θ,...,θ}. Under such crcumstances, we can fnd some specal propertes n the NE acton profle D of the Indan Buffet Game wth budget constrant. Frst, let us defne a parameter n T whch satsfes uq, nf q θpθ > 0, f n n T Q uq, nf q θpθ 0, f n > n T. Q 38 From 38, we can see that n T s the crtcal value such that the utlty of n T customers sharng a certan dsh s postve but becomes nonpostve wth one extra customer,.e., each dsh can be requested by at most n T customers. In the followng theorem, we wll show that, under the homogeneous settng, all dshes wll be requested by nearly equal number of customers,.e., the equal-sharng s acheved. Theorem 4: In the M-dsh and N-customer Indan Buffet Game wth budget constrant L, fallm dshes are n the same states and all N customers have the same utlty functon, the NE matrx D denoted by 2 satsfes that, for all dshes {r, = 1, 2,...,M} N d, = n T, f n T NL M NL =1 M or NL M, Proof: We prove ths theorem by contradcton as follows. Case 1: n T NL/M. f nt NL M. 39 Suppose that there exsts a NE D that contradcts 39. That s, there s a dsh r such that N=1 d, > n T or N=1 d, < n T.From38, we know that each dsh can be requested by at most n T customers, whch means that only N=1 d, < n T may hold. If N =1 d, < n T NL/M, we have M N=1 =1 d, < NL, whch means that there exsts at least one customer who requests less than L dshes,.e., M=1 d < L. However, accordng to 38, we have Q uq, N =1 d, + 1f q θpθ > 0, whch means that the utlty of customer can ncrease f he/she requests dsh r,.e., hs/her utlty s not maxmzed unless D s not a NE. Ths contradcts our assumpton. Therefore, we have N =1 d, = n T for all dshes when n T NL/M. Case 2: n T NL/M. Smlar to the arguments n Case 1, we can- not have M=1 N=1 d, < NL, whch means that M=1 N=1 d, = NL. Let us assume that there exsts a NE D that contradcts 39. Snce M N=1 =1 d, = NL, there s a dsh r 1 wth N =1 d, 1 < NL M and a dsh r 2 wth N =1 d, 2 > NL/M. In such a case, we have N=1 d, 2 > N =1 d, 1 + 1, whch leads to N u q, d, f q θpθ Q =1 > N u q, d, 2 f q θpθ. 40 Q =1 From 40, we can see that the customer who has requested dsh r 2 can obtan hgher utlty by unlaterally devatng hs/her decson by requestng dsh r 1. Therefore, D s not a NE of the Indan Buffet Game wth budget constrant L, and thus we have N =1 d, = NL/M or NL/M, when n T NL/M. Ths completes the proof of the theorem.

9 JIANG et al.: INDIAN BUFFET GAME WITH NEGATIVE NETWORK EXTERNALITY AND NON-BAYESIAN SOCIAL LEARNING 617 V. NON-BAYESIAN SOCIAL LEARNING In the prevous two sectons, we have analyzed the proposed Indan Buffet Game and characterzed the correspondng equlbrum. From the analyss, we can see that the equlbrum hghly depends on customers belef P ={p, = 1, 2,...,M},.e., the estmated dstrbuton of the dsh state θ = {θ, = 1, 2,...,M}. The more accurate the belef s, the better best response customers can make and thus the better utlty customers can obtan. Therefore, t s very mportant for customers to mprove ther belef by explotng ther nferred sgnals. In ths secton, we wll dscuss the learnng process n the proposed Indan Buffet Game. Specfcally, we propose an effectve non-bayesan socal learnng algorthm that can guarantee customers to learn the true system state. In socal learnng, the users ratonalty and the way they process sgnal nformaton s the essental dfference between non-bayesan learnng and Bayesan learnng, whle the mplementaton dfference between them s that customers exchange ther sgnal nformaton n Bayesan learnng rule nstead of exchangng the ntermedate belef nformaton n our proposed non-bayesan learnng rule. The motvaton of desgnng the non-bayesan learnng rule s that customers can frst dstrbutedly process ther own sgnals and then cooperatvely estmate the belef regardng the dsh state, whch can greatly decrease the computatonal cost of each customer. Note that snce the learnng process of one dsh state θ are ndependent of others, n the rest of ths secton, we omt the dsh ndex for notaton smplfcaton. Suppose the true dsh state s θ, gven customers belef at tme slot t, p t ={p t θ, θ }, ther belef at tme slot t + 1, p t+1 ={p t+1 θ, θ }, can be updated by p t+1 θ = 1 N [ ] d t+1 μ t+1 θ + 1 d t+1 p t θ N =1 41 where d t+1 = 1 or 0 s customer s decson, and μ t+1 θ s the ntermedate belef updated by the Bayesan learnng rule for customers who have requested the dsh and nferred some sgnal s t+1 f θ, that s μ t+1 θ = f st+1 θp t θ f, θ. 42 st+1 θp t θ Defnton 6: A learnng rule has the strong convergence property f and only f the learnng rule can learn the true state n probablty such that { p t θ 1, p t θ = θ as t. 43 0, By reorganzng some terms, we can rewrte the non-bayesan learnng rule n 41 as p t+1 θ = p t θ + 1 N d t+1 f s t+1 θ 1 p t θ N =1 s t+1 44 wth s t+1 = f s t+1 θ p t θ. 45 s the estmaton of the probablty dstrbuton of the sgnal s t+1 at the next tme slot. Wth s t+1, we can defne a weak convergence, compared wth the strong convergence n 43, as follows. Defnton 7: A learnng rule has the weak convergence property f and only f the learnng rule can learn the true state n probablty such that From 45, we can see that s t+1 s = f s θp t θ f s θ, s Q, as t. 46 Notce that the weak convergence s suffcent for the proposed Indan Buffet Game snce the obectve of learnng s to fnd an accurate estmate of the expected utltes of customers and thus derve the true best response. Accordng to 9, we can see that the sgnal dstrbuton f q θ p θ s a suffcent statstc of the expected utlty functon. Therefore, f t can be shown that the proposed socal learnng algorthm has the weak convergence property, then we are able to derve the true best response for customers n the proposed Indan Buffet Game. In the followng theorem, we wll show and prove that the proposed learnng algorthm n 41 ndeed has the weak convergence property. We wll also show wth smulaton that the proposed learnng algorthm n 41 has the strong convergence property. Theorem 5: In the Indan Buffet Game, suppose that the true dsh state s θ, all customers update ther belef p usng 41 and ther pror belef p 0 satsfes p 0 θ >0, then, the belef sequence {p t θ} ensures a weak convergence,.e., for s Q s = f s θp t θ f s θ, as t. 47 Proof: See the Appendx. VI. SIMULATION RESULTS In ths secton, we conduct smulatons to verfy the performance of the proposed non-bayesan socal learnng rule and recursve best response algorthms. We smulate an Indan Buffet restaurant wth fve dshes {r 1, r 2, r 3, r 4, r 5 } and fve possble dsh states θ {1, 2, 3, 4, 5}. Each dsh s randomly assgned wth a state. After requestng a specfc dsh r, customer can nfer the qualty of the dsh and a sgnal s, {1, 2, 3, 4, 5} obeyng the condtonal dstrbuton that f s, θ = { w, f s, = θ 1 w/4, f s, = θ. 48 The parameter w can be nterpreted as the qualty of the sgnal or customers detecton probablty. When the sgnal qualty w s close to 1, the customers nferred sgnal s more lkely to reflect the true dsh state. Note that w must satsfy w 1/5; otherwse, the true state can never be learned correctly. Wth the sgnals, customers can update ther belef P cooperatvely at the next tme slot and then make ther decsons sequentally. Once the th customer makes the dsh selecton, he/she reveals hs/her decsons to other customers. After all customers make ther decsons, they begn to share the correspondng dshes they have requested. The customer s utlty of requestng dsh r s gven by u, = γ s, R N c 49

10 618 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 45, NO. 4, APRIL 2015 TABLE I NE MATRIX D where γ s the utlty coeffcent for customer snce dfferent customers may have dfferent utltes regardng same reward, s, s a realzaton of dsh qualty, as well as the sgnal nferred by customer, R s the bass award of requestng each dsh as R = 10, N s the overall number of customers requestng dsh r and c s the cost of requestng dsh r as {c = 1, }. From 48 and 49, we can see that by requestng dsh wth hgher level of state, e.g., θ = 5, customers can obtan hgher utltes. However, the dsh state s unknown to customers and they have to estmate t through socal learnng. On the other hand, we can also see that the more customers requestng a same dsh, the less utlty each customer can obtan, whch manfests the negatve network externalty. A. Indan Buffet Game Wthout Budget Constrant In ths secton, we evaluate the performance of the proposed best response algorthm for Indan Buffet Game wthout budget constrant. We frst smulate the homogenous case to verfy the threshold property of the NE matrx,.e., Theorem 2, and the mpact of dfferent decson makng orders on customers utltes,.e., makng decsons earler may have advantage. Then, we compare the performance of the proposed best response algorthm,.e., Algorthm 1, wth the performance of other algorthms n heterogeneous settngs. For the homogenous case, we set all customers utlty coeffcents as γ = 1. The customers pror belef regardng the dsh state starts wth a unform dstrbuton,.e., {p 0 θ = 0.2,,θ}. The dsh state s set as = [1, 2, 3, 4, 5],.e., θ =, n order to verfy dfferent threshold structures for dfferent dsh states as llustrated n Theorem 2. At each tme slot, we let customers sequentally make decsons accordng to Algorthm 1 and then update ther belef accordng to the non-bayesan learnng rule. The game s played tme slot by tme slot untl customers belef P t converges. In the frst smulaton, we set the number of customers as N = 10 to specfcally verfy the threshold structure of NE matrx. Table I shows the NE matrx D derved by Algorthm 1 after customers belef P t converges, where each column contans one customer s decsons {d,, } and each row contans all customers decsons on one specfc dsh r,.e., {d,, }. From the table, we can see that once a customer does not request some specfc dsh, all the subsequent customers wll not request that dsh, whch s consstent wth the concluson n Theorem 2. Moreover, snce requestng the dsh wth hgher level of state, e.g., θ 5 = 5, can obtan more utltes, we can see that most customers decded to request dsh r 5. From Table I, we can see that customers who make decsons early have advantage, e.g., customer 1 can request all dshes whle customer 8 can only request one dsh. Therefore, n the Fg. 3. Each customer s utlty n homogenous case wthout budget constrant. second smulaton of the homogenous case, we dynamcally adust the order of decson makng to ensure the farness. In ths smulaton, we assume that there are fve customers wth a common utlty coeffcent γ = 0.4. In Fg. 3, weshow all customers utltes along wth the smulaton, where the order of decson makng changes every 100 tme slots. In the frst 100 tme slots, where the order of decson makng s , we can see that customer 1 obtans the hghest utlty and customer 4 and 5 receve 0 utlty snce they have not requested any dsh. In the second 100 tme slots, we reverse the decson makng order to , whch results customer 1 and 2 recevng 0 utlty. Therefore, by perodcally changng the order of decson makng, we can nfer that the expected utltes of all customers wll be the same after a perod of tme. For the heterogeneous case, we randomze each customer s utlty coeffcent γ between 0 and 1 and set ther pror belef as {p 0 θ = 0.2,,θ}. In ths smulaton, we compare the performance n terms of customers socal welfare, whch s defned as the total utltes of all customers, among dfferent knds of algorthms lsted as follows. 1 Best Response: The proposed recursve best response algorthm n Algorthm 1 wth non-bayesan learnng. 2 Myopc: At each tme slot, customer requests dshes accordng to current observaton n ={n,, } wthout socal learnng. 3 Learnng: At each tme slot, each customer requests dshes purely based on the updated belef P t usng non-bayesan learnng rule wthout consderng the negatve network externalty. 4 Random: Each customer randomly requests dshes. For the myopc and learnng strateges, customer s expected utlty of requestng dsh r can be calculated by U m, = U l, = Q Q u, q, n, + d, f q θ p 0 θ 50 u, q, d, f q θ p t θ. 51

11 JIANG et al.: INDIAN BUFFET GAME WITH NEGATIVE NETWORK EXTERNALITY AND NON-BAYESIAN SOCIAL LEARNING 619 TABLE II NE MATRIX D Fg. 4. Socal welfare comparson wthout budget constrant. Wth these expected utltes, both myopc and learnng algorthms can be derved by 8. We can see that the myopc strategy does not take nto account socal learnng whle the learnng strategy does not nvolve negatve network externalty. In the smulaton, we average these four algorthms over hundreds of realzatons. Fg. 4 shows the performance comparson result, where the x-axs s the sgnal qualty w varyng from 0.5 to 0.95 and y-axs s the socal welfare averaged over hundreds of tme slots. From the fgure, we can see wth the ncrease of sgnal qualty, the socal welfare keeps ncreasng for all algorthms. Moreover, we can also see that our best response algorthm performs the best whle the learnng algorthm performs the worst. Ths s because, wth the learnng algorthm, customers can gradually learn the true dsh states and then request the dsh wthout consderng other customers decsons. In such a case, too many customers may request the same dshes and each customer s utlty s dramatcally decreased due to the negatve network externalty. For the myopc algorthm, although customers can not learn the true dsh states, by consderng other customers decsons, each customer can avod requestng dshes whch have been over-requested. Therefore, we can conclude that our proposed best response algorthm acheves the best performance through consderng the negatve network externalty and usng socal learnng to estmate the dsh state. B. Indan Buffet Game Wth Budget Constrant In ths secton, we evaluate the performance of the proposed best response algorthm for Indan Buffet Game wth budget constrant L = 3. Smlar to the prevous secton, we start from the homogenous case, where all customers utlty coeffcents are set as γ = 1. In the frst smulaton, we set all dsh states as θ = 5 to verfy the property of the NE matrx llustrated n Theorem 4. Table II shows the NE matrx D derved by Algorthm 2. We can see that each dsh has been requested by N L/M = 10 3/5 = 6 customers, whch s consstent wth the concluson n Theorem 4. In the second smulaton, we dynamcally change the order of customers sequental decson makng and llustrate each customer s utlty along wth smulaton n Fg. 5, from whch we can see Fg. 5. Each customer s utlty n homogenous case wth budget constrant. smlar phenomenon to the Indan Buffet Game wthout budget constrant. For the heterogeneous case, we randomze each customer s utlty coeffcent γ wthn [0, 1] and compare the performance of our proposed best response algorthm,.e., Algorthm 2, wth myopc, learnng and random algorthms n terms of customers socal welfare. For the myopc, learnng and random algorthms, same budget constrant s adopted,.e., each customer can at most request 3 dshes. Fg. 6 shows the performance comparson result, from whch we can see that our best response algorthm performs the best whle the learnng algorthm performs the worst. C. Non-Bayesan Socal Learnng Performance In ths secton, we evaluate the performance of the proposed non-bayesan socal learnng rule. At the begnnng of the smulaton, we randomze the states of fve dshes and assgn customers pror belef regardng each dsh state wth unform dstrbuton,.e., {p θ = 0.2,,θ}. After requestng the chosen dshes, each customer can nfer sgnals followng the condtonal dstrbuton defned n 48 wth sgnal qualty w = 0.6. Fg. 7 shows the learnng curve of the Indan Buffet Game wth and wthout budget constrant, respectvely. The y-axs s the dfference between customers belef at each tme slot P t and the true belef P o ={p = p θ = 1, p θ = θ = 0, }, whch can be calculated by P t P o 2. From the fgure, we can see that

12 620 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 45, NO. 4, APRIL 2015 Fg. 6. Socal welfare comparson wth budget constrant. Fg. 8. Socal welfare comparson n the applcaton of relay selecton. Fg. 7. Performance of the non-bayesan socal learnng rule. customers can learn the true dsh states wthn 15 tme slots. Moreover, the convergence rate of the case wthout budget constrant s faster than that of the case wth budget constrant. Ths s because, due to the budget constrant, each customer requests fewer dshes at each tme slot and thus can nfer fewer sgnals regardng the dsh state, whch wll nevtably slow down the customers learnng speed. D. Applcaton n Relay Selecton of Cooperatve Communcaton In ths secton, we dscuss an applcaton of the Indan Buffet Game n the relay selecton of cooperatve communcatons. In the applcaton, we consder a wreless network wth N source nodes or users, whch am at sendng ther messages to the destnaton nodes. There are M potental relay nodes wth dfferent relay capabltes, gven the dfferent channel condtons, transmsson power, and processng speed constrants. Each source node can select at most L relays n each tme slot that help them relay ther messages to the destnaton node. All the source nodes are assumed to be ratonal,.e., each of them selects the relays that can maxmze ts own expected data rate. Frst, the more source nodes select the same relay n one tme slot, the less throughput can be obtaned by each source node,.e., the negatve externalty. Second, snce the source nodes may not exactly know the capacty of each relay node, they need to estmate the relay state by learnng from the hstory and the current sgnals that reflect the relay propertes. Thrd, the source nodes are not necessarly synchronzed, whch means that they may make the relay selecton n a sequental manner. Consderng these three propertes, we can see that the proposed Indan Buffet Game s an deal tool to formulate the relay selecton problem, where the source nodes are the players and the relay nodes are correspondng to the dshes n the restaurant. In the smulaton, we set fve source nodes and fve relay nodes wth fve possble relay state θ {1, 2, 3, 4, 5}. Each source node can receve a sgnal on the capacty of the selected relay nodes after data transmsson, whch obeys the dstrbuton f s, θ as n 48. Assumng that the source nodes share the same relay n a tme-dvson manner, we can defne the utlty of the th source node selectng relay node as u, = s,ḡ, N c 52 where ḡ, s the gan of the th source node by selectng relay whch depends on the channel gan, N s the total number of source nodes sharng relay and c s the cost of selectng relay whch can be consdered as the prce of relay servce. Wth the utlty defnton, we can conduct smulaton to evaluate the performance of relay selecton wth Indan Buffet Game and compare t wth that of myopc strategy, random strategy and learnng strategy. Fg. 8 shows the socal welfare comparson results, ncludng two cases: budget-constrant case where each source node can at most select L = 3 relays and wthout-constrant-case where each source node can at most select L = 5 relays. From the fgure, we can see that, smlar to the prevous comparson results, our best response algorthm has the best performance n both cases. Therefore, our proposed Indan Buffet Game can be well appled n the relay selecton of cooperatve communcatons.

13 JIANG et al.: INDIAN BUFFET GAME WITH NEGATIVE NETWORK EXTERNALITY AND NON-BAYESIAN SOCIAL LEARNING 621 VII. CONCLUSION In ths paper, we proposed a general framework, called Indan Buffet Game, to study how users make multple concurrent decsons under uncertan system state. We studed the Indan Buffet Game under two dfferent scenaros: customers request multple dshes wthout budget constrant and wth budget constrant, respectvely. We desgned best response algorthms for both cases to fnd the subgame perfect NE, and dscussed the smplfed homogeneous cases to better understand the proposed Indan Buffet Game. We also desgned a non-bayesan socal learnng rule for customers to learn about the dsh state and theoretcally prove ts convergence. Smulaton results show that our proposed algorthms acheve much better performance than myopc, learnng and random algorthms. Moreover, the proposed non-bayesan learnng algorthm can help customers learn the true system state wth faster convergence speed. We would lke to emphasze that our focus n ths paper s to study the nteractons and decson makng behavors of agents n an uncertan negatve-externalty envronment wth the concept of socal learnng, whch s a fundamental problem exstng n the sgnal processng, wreless communcaton networks, and socal networks. Meanwhle, our work n ths paper also has lmtatons due to some assumptons n the analyss. One s that all the customers are assumed to share ther belef nformaton wth each other, gven that sharng the belef nformaton can enhance the learnng performance based on socal learnng. We assume that the ratonal customers have the ncentve to do so. In addton, one of our ongong works s to study a more general scenaro where each user does not reveal hs/her belef nformaton and only acton nformaton can be observed. The other man assumpton s that each customer knows all others utlty functon forms n Algorthms 1 and 2. Note that Algorthms 1 and 2 are ust desgned to fnd the NE of the Indan Buffet Game. When each user have no knowledge about other users utlty functon forms, he/she can take an expectaton over the types of users accordng to some emprcal user-type dstrbuton, whch s also one of our ongong works. The proposed Indan Buffet Game can be appled n varous felds. One mportant applcaton s the relay selecton problem n cooperatve communcatons, as shown n Secton VI-D, Indan Buffet Game can well model and solve the problem of such relay selecton n cooperatve communcatons. Moreover, the multchannel sensng and access problem n cogntve rado networks can also be modeled by Indan Buffet Game, where each secondary user needs to learn the prmary channel state and tres to access the channel wth least secondary users,.e., consderng the negatve externalty [24]. For those problems, exstng algorthms or models ether only consder the negatve externalty or only consder the socal learnng, wthout ntegratng the learnng and decson makng wth negatve externalty together. In such a case, the performance of those algorthms perform worse than the proposed Indan Buffet Game model, whch can be seen from the comparson results n Fgs. 4, 6, and 8. Therefore, the research value of Indan Buffet Game model le n that t presents a general tool for agents to make optmal sequental decsons when confronted wth uncertan system state and negatve externalty characterstc. APPENDIX PROOF OF THEOREM 5 Proof of Lemma Let us frst defne a probablty trple, F, P θ for some specfc dsh state θ, where s the space contanng sequences of realzatons of the sgnals s t Q, F s the σ -feld generated by,.e., a set of subsets of, and P θ s the probablty measure nduced over sample paths n,.e., P θ = t=1 f θ. Moreover, we use E θ [ ] to denote the expectaton operator assocated wth measure P θ, and defne F t as the smallest σ -feld generated by the past hstory of all customers observatons up to tme slot t. To prove the weak convergence n 46, we start by showng that the belef sequence {p t θ } converges to a postve number as t by the followng lemmas. Lemma 2: Suppose the true dsh state s θ, all customers update ther belef p accordng to the non-bayesan learnng rule n 41 and ther pror belef p 0 satsfes p 0 θ >0, then, the belef sequence {p t θ } converges to a postve number as t. Proof: From 41 and 42, we can see that f p t θ > 0, then p t+1 θ > 0. Snce the pror belef satsfes p 0 θ >0, accordng to the method of nducton, we have the belef sequence {p t θ } > 0. Accordng to 44, for the true dsh state θ,wehave p t+1 θ = p t θ + 1 N d t+1 f s t+1 θ 1 N =1 s t+1 p t θ. 53 By takng expectaton over F t on both sdes of 53, we have [ ] E θ p t+1 θ F t = p t θ + 1 N E θ N =1 d t+1 f s t+1 θ 1 F t p t θ. 54 s t+1 Accordng to the tme slot structure shown n Fg. 1, each customer s decson at tme slot t + 1 s made accordng to the belef updated at the end of last tme slot t. In such a case, when gven all the hstory nformaton up to tme slot t, F t, the decson d t+1 s ndependent from the sgnal s t+1 whch s receved at the end of tme slot t + 1. In such a case, we can separate the expectaton n the second term of 54 as E θ d t+1 f s t+1 θ 1 s t+1 [ = E θ d t+1 ] F t E θ F t f s t+1 θ 1 F t. 55 s t+1 [ In 55, E θ d t+1 ] Ft 0 snce d t+1 can only be 1 or 0. Moreover, snce gx = 1/x s a convex functon, accordng

14 622 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 45, NO. 4, APRIL 2015 to Jensen s nequalty, we have f s t+1 θ F t E θ s t+1 = Q f E θ f s t+1 s t+1 θ f s t+1 1 s t+1 θ F t 1 θ = s t+1 In such a case, the equaton n 55 s nonnegatve, whch means that n 54 E θ [ p t+1 θ F t ] p t θ. 57 Snce customers belef p t θ s bounded wthn nterval [0, 1], accordng to the martngale convergence theorem [25], we can conclude that the belef sequence {p t θ } converges to a postve number as t. Proof of Theorem 5 Proof: Let N t+1 denote the set of customers who request the dsh at tme slot t+1. In such a case, we can rewrte 53 as p t+1 θ 1 f s t+1 θ = N t+1 p t θ 58 N t+1 s t+1 where means the cardnalty. By takng logarthmc operaton on both sdes of 58 and utlzng the concavty of the logarthm functon, we have log p t+1 θ log p t θ + 1 N t+1 N t+1 log f s t+1 θ. 59 s t+1 Then, by takng expectaton over F t on both sdes of 59, we have [ ] E θ log p t+1 θ F t log p t θ 1 f s t+1 θ log N t+1 F t. 60 N t+1 E θ s t+1 As to the left hand of 60, accordng to Lemma 2, we know that p t θ wll converge as t, and thus E θ E θ [ log p t+1 θ F t ] log p t θ As to the rght hand of 60, t follows: f s t+1 θ s t+1 log s t+1 F t = E θ log f s t+1 θ F t log E θ s t+1 f s t+1 θ F t = In such a case, combnng 61 and 62, as t, we have 1 f s t+1 θ 0 log N t+1 F t N t+1 E θ s t+1 By squeeze theorem [26], we have for N t+1, as t f s t+1 E θ θ log s t+1 F t = f s t+1 θ f s t+1 θ log s t+1 Accordng to Gbbs nequalty [27], the 64 converges to 0 f and only f as t s t+1 f s t+1 θ, s t+1 Q. 65 Ths completes the proof of the theorem. REFERENCES [1] M. L. Katz and C. Shapro, Technology adopton n the presence of network externaltes, J. Polt. Econ, vol. 94, no. 4, pp , [2] W. H. Sandholm, Negatve externaltes and evolutonary mplementaton, Rev. Econ. Stud., vol. 72, no. 3, pp , [3] Y. Yang, Y. Chen, C. Jang, C. Wang, and K. J. R. Lu, Wreless access network selecton game wth negatve network externalty, IEEE Trans. Wreless Commun., vol. 12, no. 10, pp , Oct [4] H. V. Zhao, W. S. Ln, and K. J. R. Lu, Behavor Dynamcs n Meda- Sharng Socal Networks. Cambrdge, U.K.: Cambrdge Unv. Press, [5] Y. Chen and K. J. R. Lu, Understandng mcroeconomc behavors n socal networkng: An engneerng vew, IEEE Sgnal Process. Mag., vol. 29, no. 2, pp , Mar [6] D. Acemoglu, M. A. Dahleh, I. Lobel, and A. Ozdaglar, Bayesan learnng n socal networks, Rev. Econ. Stud., vol. 78, no. 4, pp , [7] V. Krshnamurthy and H. V. Poor, Socal learnng and Bayesan games n multagent sgnal processng: How do local and global decson makers nteract? IEEE Sgnal Process. Mag., vol. 30, no. 3, pp , May [8] V. Bala and S. Goyal, Learnng from neghbours, Rev. Econ. Stud., vol. 65, no. 3, pp , [9] D. Gale and S. Karv, Bayesan learnng n socal networks, Games Econ. Behav., vol. 45, no. 11, pp , [10] D. Acemoglu and A. Ozdaglar, Opnon dynamcs and learnng n socal networks, Dyn. Games Appl., vol. 1, no. 1, pp. 3 49, [11] L. G. Epsten, J. Noor, and A. Sandron, Non-Bayesan learnng, B.E. J. Theor. Econ., vol. 10, no. 1, pp. 1 16, [12] B. Golub and M. O. Jackson, Nave learnng n socal networks and the wsdom of crowds, Amer. Econ. J. Mcroecon., vol. 2, no. 1, pp , [13] A. Jadbabae, P. Molav, A. Sandron, and A. Tahbaz-Saleh, Non-Bayesan socal learnng, Games Econ. Behav., vol. 76, no. 1, pp , [14] C.-Y. Wang, Y. Chen, and K. J. R. Lu, Chnese restaurant game, IEEE Sgnal Process. Lett., vol. 19, no. 12, pp , Dec [15] B. Zhang, Y. Chen, C.-Y. Wang, and K. J. R. Lu, Learnng and decson makng wth negatve externalty for opportunstc spectrum access, n Proc. IEEE Glob. Commun. Conf. Globecom, Anahem, CA, USA, 2012, pp [16] C.-Y. Wang, Y. Chen, and K. J. R. Lu, Sequental Chnese restaurant game, IEEE Trans. Sgnal Process., vol. 61, no. 3, pp , Feb [17] D. Aldous, I. Ibragmov, J. Jacod, and D. Aldous, Exchangeablty and related topcs, Lect. Notes Math., vol. 1117, no. 12, pp , 1985.

JIANG et al.: INDIAN BUFFET GAME WITH NEGATIVE NETWORK EXTERNALITY AND NON-BAYESIAN SOCIAL LEARNING 623 [18] C. Jang, Y. Chen, Y.-H. Yang, C.-Y. Wang, and K. J. R.

Ghahraman, The Indan buffet process: An ntroducton and revew, J. Mach. Learn. Res., vol. 12, pp. 1185 1224, Feb. 2011. [20] A.

Sheremeta, Smultaneous decson-makng n compettve and cooperatve envronments, Econ. Inqury, vol. 51, no. 2, pp. 1311 1323, 2013. [22] T. M. Mtchell, Machne Learnng. New York, NY, USA: McGraw-Hll, 1987.

Wreless Commun., vol. 13, no. 4, pp. 2176 2188, Apr. 2014. [25] R. Isaac, Shorter notes: A proof of the martngale convergence theorem, Proc. Amer. Math. Soc., vol. 16, no. 4, pp. 842 844, 1965.

15 JIANG et al.: INDIAN BUFFET GAME WITH NEGATIVE NETWORK EXTERNALITY AND NON-BAYESIAN SOCIAL LEARNING 623 [18] C. Jang, Y. Chen, Y.-H. Yang, C.-Y. Wang, and K. J. R. Lu, Dynamc Chnese restaurant game: Theory and applcaton n cogntve rado networks, IEEE Trans. Wreless Commun., vol. 13, no. 4, pp , Apr [19] T. L. Grffths and Z. Ghahraman, The Indan buffet process: An ntroducton and revew, J. Mach. Learn. Res., vol. 12, pp , Feb [20] A. Aradllas-Lopez, Semparametrc estmaton of a smultaneous game wth ncomplete nformaton, J. Econ., vol. 157, no. 2, pp , [21] A. Savkhn and R. M. Sheremeta, Smultaneous decson-makng n compettve and cooperatve envronments, Econ. Inqury, vol. 51, no. 2, pp , [22] T. M. Mtchell, Machne Learnng. New York, NY, USA: McGraw-Hll, [23] R. Swnburne, Bayes s Theorem. Oxford, U.K.: Oxford Unv. Press, [24] C. Jang, Y. Chen, and K. J. R. Lu, Mult-channel sensng and access game: Bayesan socal learnng wth negatve network externalty, IEEE Trans. Wreless Commun., vol. 13, no. 4, pp , Apr [25] R. Isaac, Shorter notes: A proof of the martngale convergence theorem, Proc. Amer. Math. Soc., vol. 16, no. 4, pp , [26] J. Stewart, Chapter 15.2 lmts and contnuty, n Multvarable Calculus. Emeryvlle, CA, USA: Key Currculum Press, [27] L. M. Surhone, M. T. Tennoe, and S. F. Henssonow, Gbbs Inequalty. Saarbrücken, Germany: Betascrpt, Yang Gao S 12 receved the B.S. degree n electronc engneerng from Tsnghua Unversty, Beng, Chna, n He s currently pursung the Ph.D. degree n the Department of Electrcal and Computer Engneerng, Unversty of Maryland, College Park, College Park, MD, USA. Hs current research nterests nclude strategc behavor analyss and ncentve mechansm desgn for crowdsourcng and socal computng. Mr. Gao was the recpent of the Slver Medal of the 21st Natonal Chnese Physcs Olympad, the Honor of the Excellent Graduate of Tsnghua Unversty n 2009, the Unversty of Maryland Future Faculty Fellowshp n 2012, and the IEEE Globecom 2013 Best Paper Award. Chunxao Jang S 09 M 13 receved the B.S. Hons. degree n nformaton engneerng from the Beng Unversty of Aeronautcs and Astronautcs Behang Unversty, Beng, Chna, and the Ph.D. Hons. degree from Tsnghua Unversty THU, Beng, n 2008 and 2013, respectvely. From 2011 to 2013, he vsted the Sgnals and Informaton Group, Department of Electrcal and Computer Engneerng, Unversty of Maryland, College Park, College Park, MD, USA, wth Prof. K. J. R. Lu. He s currently a Postdoctoral Fellow wth the Electrcal Engneerng Department, THU, wth Prof. Y. Ren. Hs current research nterests nclude the applcatons of game theory and queung theory n wreless communcaton and networkng and socal networks. Dr. Jang was the recpent of the Best Paper Award from the IEEE GLOBECOM n 2013, the Beng Dstngushed Graduated Student Award, the Chnese Natonal Fellowshp, and the Tsnghua Outstandng Dstngushed Doctoral Dssertaton n Yan Chen S 06 M 11 SM 14 receved the bachelor s degree from the Unversty of Scence and Technology of Chna, Hefe, Chna, the M.Phl. degree from the Hong Kong Unversty of Scence and Technology, Hong Kong, and the Ph.D. degree from the Unversty of Maryland, College Park, College Park, MD, USA, n 2004, 2007, and 2011, respectvely. Hs current research nterests nclude data scence, network scence, game theory, socal learnng and networkng, as well as sgnal processng, and wreless communcatons. Dr. Chen was the recpent of multple honors and awards, ncludng the Best Paper Award from the IEEE GLOBECOM n 2013, Future Faculty Fellowshp, and the Dstngushed Dssertaton Fellowshp Honorable Menton from the Department of Electrcal and Computer Engneerng, n 2010 and 2011, respectvely, a fnalst of Dean s Doctoral Research Award from the A. James Clark School of Engneerng, Unversty of Maryland, n 2011, and the Chnese Government Award for Outstandng Students Abroad n K. J. Ray Lu F 03 was as a Dstngushed Scholar-Teacher at the Unversty of Maryland, College Park, College Park, MD, USA, n 2007, where he s the Chrstne Km Emnent Professor of Informaton Technology. He leads the Maryland Sgnals and Informaton Group conductng research encompassng broad areas of sgnal processng and communcatons wth recent focus on cooperatve and cogntve communcatons, socal learnng and network scence, nformaton forenscs and securty, and green nformaton and communcatons technology. He s an ISI hghly cted author. Dr. Lu was the recpent of the IEEE Sgnal Processng Socety 2009 Techncal Achevement Award and varous best paper awards. He was also the recpent of varous teachng and research recogntons from the Unversty of Maryland ncludng the Unversty-Level Inventon of the Year Award, the Poole and Kent Senor Faculty Teachng Award, the Outstandng Faculty Research Award, and the Outstandng Faculty Servce Award, all from the A. James Clark School of Engneerng. He s a Dstngushed Lecturer of the IEEE Sgnal Processng Socety. He s the Past Presdent of the IEEE Sgnal Processng Socety, where he has served as a Presdent, Vce Presdent Publcatons, and as the Board of Governor. He was the Edtor-n-Chef of the IEEE SIGNAL PROCESSING MAGAZINE and a Foundng Edtor-n-Chef of the EURASIP Journal on Advances n Sgnal Processng. He s a Fellow of the Amercan Assocaton for the Advancement of Scence.

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton