Subjective Equilibria in Interactive POMDPs: Theory and Computational Limitations


Prashant Doshi and Piotr Gmytrasiewicz
Dept. of Computer Science, University of Illinois at Chicago, IL
{pdoshi,piotr}@cs.uic.edu

Abstract

We analyze the asymptotic behavior of agents engaged in an infinite horizon partially observable stochastic game formalized by the interactive POMDP framework. We show that when the agents' initial beliefs satisfy a truth compatibility condition, their behavior converges to a subjective ε-equilibrium in finite time, and to a subjective equilibrium in the limit. Imposing an additional assumption of mutual singularity on the agents' initial beliefs makes their behavior converge to a Nash equilibrium. While theoretically sound, the equilibrating process is difficult to demonstrate computationally because of the difficulty of coming up with initial beliefs that satisfy the truth compatibility condition.

1 Introduction

We analyze the interactions taking place between agents participating in an infinite horizon partially observable stochastic game formalized within the framework of interactive POMDPs (I-POMDPs) [Gmytrasiewicz and Doshi, 2005, Gmytrasiewicz and Doshi, 2004]. I-POMDPs represent and solve a partially observable stochastic game (POSG) from the perspective of an agent playing the game. This approach, also called the decision-theoretic approach to game theory [Kadane and Larkey, 1982], differs from the objective representation of POSGs outlined in [Hansen et al., 2004]. We consider the setting in which an agent may be unaware of the other agents' behavioral strategies, is uncertain about their observations, and may be unable to perfectly observe the other agents' actions. In accordance with Bayesian decision theory, the agent maintains and updates its belief about the physical state as well as the strategies of the other agents, and its decisions are best responses to its beliefs.

Under the assumption that the agents' prior beliefs about future observations are compatible with the true distribution induced by the actual strategies of all agents, we show that for agents modeled within the I-POMDP framework the following properties hold: (i) the agents' beliefs about the future observation paths of the game coincide in the limit with the true distribution over the future, and (ii) the agents' beliefs about the opponents' strategies, and hence their own strategies (which are best responses to their beliefs), do not change in the limit. Strategies with these properties are said to be in subjective equilibrium, which is stable with respect to learning and optimization. In prior work, Kalai and Lehrer [Kalai and Lehrer, 1993a, Kalai and Lehrer, 1993b] have shown that the strategies of agents engaged in infinitely repeated games with discounted payoffs, who are unaware of others' strategies, converge to a subjective equilibrium under the assumptions of perfect observability of others' actions (perfect monitoring) and truth compatibility of prior beliefs.

Hahn [Hahn, 1973] introduced the concept of a conjectural equilibrium in economies, in which the signals generated by the economy neither cause changes in the agents' theories nor induce changes in the agents' policies. Fudenberg and Levine [Fudenberg and Levine, 1993] consider a general model of finitely repeated extensive form games in which the strategies of opponents may be correlated (unlike [Kalai and Lehrer, 1993a], where strategies are assumed independent), and show that the behavior of agents that maintain beliefs, and optimize according to those beliefs, converges to a self-confirming equilibrium. There is a strong link between the subjective equilibrium and its objective counterpart, the Nash equilibrium. Specifically, under the assumption of perfect monitoring, both [Kalai and Lehrer, 1993a] and [Fudenberg and Levine, 1993] show that strategy profiles in subjective and self-confirming equilibrium induce a distribution over the future action paths that coincides with the distribution induced by some set of strategies in Nash equilibrium. Of course, this does not imply that strategies in subjective equilibrium are also in Nash equilibrium; however, the converse is always true. Work in a similar vein is reported in [Jordan, 1995]. It assumes that agents have a common prior over the possible types of agents engaged in a repeated game, and shows that the sequence of Bayesian-Nash equilibrium beliefs of the agents converges to a Nash equilibrium.

The results contained in this paper complement these prior results. Specifically, we show the asymptotic existence of subjective equilibrium in a more general and realistic multiagent setting, one in which the assumptions of perfect observability of the state and of others' actions have been relaxed. Further, we address the research problem posed in [Kalai and Lehrer, 1993a] regarding the existence of subjective equilibrium in POSGs. We also draw a parallel with works in multiagent learning [Hu and Wellman, 1998, Bowling and Veloso, 2002] that show convergence of play to equilibrium. However, our results differ in that we assume that the state and others' actions are only partially observable, and that the plan is computed offline using a given model of the environment. Finally, we comment on the difficulties in achieving subjective equilibria when a computational perspective is adopted.

The rest of this paper is structured in the following manner. In the next section, we briefly review the interactive POMDP framework, focusing on the key steps of belief update and policy computation. In Section 3, we introduce the concept of a subjective equilibrium and prove that the strategy profile of agents playing a POSG modeled using an I-POMDP is, in the limit, in subjective equilibrium. In Section 4, we remark on the computational infeasibility of arriving at this equilibrium. Finally, we conclude with a discussion of our results and open research issues in Section 5.

2 Overview of Interactive POMDPs

Interactive POMDPs [Gmytrasiewicz and Doshi, 2005, Gmytrasiewicz and Doshi, 2004] generalize POMDPs to account for the presence of other agents in the environment. They do this by including models of other agents in the state space. Models of other agents, analogous to types in game theory, encompass all private information influencing their behavior. For simplicity of presentation, let us consider an agent, i, that is interacting with one other agent, j. The formalism easily generalizes to a larger number of agents.

I-POMDP: An interactive POMDP of agent i, I-POMDP_i, is:

I-POMDP_i = ⟨IS_i, A, T_i, Ω_i, O_i, R_i⟩

where:

- IS_i is a set of interactive states defined as IS_i = S × 𝒪_j × M_j, where S is the set of states of the physical environment, and 𝒪_j × M_j is the set of pairs consisting of a possible observation function and a model of agent j. Each model, m_j ∈ M_j, is a pair m_j = ⟨h_j, π_j⟩, where π_j : H_j → Δ(A_j) is j's policy tree (strategy), assumed computable (in the Turing machine sense, i.e., strategies are partial recursive functions), which maps possible histories of j's observations to distributions over its actions, and h_j is an element of H_j. Note that if |A_j| ≥ 2, then the space of policy trees is uncountable; however, by assuming π_j to be computable, we restrict the space to be countable. O_j ∈ 𝒪_j, also computable, specifies the way in which the environment supplies the agent with its input. (In [Gmytrasiewicz and Doshi, 2005, Gmytrasiewicz and Doshi, 2004], we replace 𝒪_j × M_j with a special class of models called intentional models, which ascribe beliefs, preferences, and rationality to other agents. We do not introduce those models here, for the sake of generality.)
- A = A_i × A_j is the set of joint moves of all agents.
- T_i is a transition function, T_i : S × A × S → [0, 1], which describes the results of the agents' actions on the physical state.
- Ω_i is the set of agent i's observations.
- O_i is an observation function, O_i : S × A × Ω_i → [0, 1].
- R_i is defined as R_i : S × A → ℝ; we allow the agent to have preferences over the physical states and the actions of all agents.

The task of computing a solution for an I-POMDP, similar to that for a POMDP, can be decomposed into two steps: (1) belief update, during which the agent updates its belief to reflect newly available information, and (2) policy computation, during which the agent computes the optimal action(s) to perform from each belief state.

2.1 Bayesian Belief Update

Two differences complicate state estimation in multiagent settings, when compared to single agent ones. First, since the state of the physical environment depends on the actions performed by both agents, the prediction of how the physical state changes has to be made based on the predicted actions of the other agent. The probabilities of the other's actions are obtained from their models. Thus, as opposed to the literature on learning in repeated games, we do not assume that actions are fully observable by other agents. Rather, agents can attempt to infer what actions other agents have performed by sensing their results on the environment. Second, changes in the models of other agents have to be included in the update. Specifically, the update of the other agent's models due to its new observation has to be included. In other words, the agent has to update its beliefs based on what it anticipates that the other agent observes and how it updates. Consequently, an agent's beliefs record what it thinks about how the other agent will behave as it learns.

For simplicity we decompose the I-POMDP belief update into two steps:

Prediction: When an agent, say i, with a previous belief, b_i^{t-1}, performs a control action a_i^{t-1} and the other agent performs its action a_j^{t-1}, the predicted belief state is:

\Pr(is^t \mid a_i^{t-1}, a_j^{t-1}, b_i^{t-1}) = \sum_{is^{t-1}:\,(f_j,O_j)^{t-1}=(f_j,O_j)^t} b_i^{t-1}(is^{t-1}) \,\Pr(a_j^{t-1} \mid m_j^{t-1}) \, T(s^{t-1}, a_i^{t-1}, a_j^{t-1}, s^t) \sum_{\omega_j^t} O_j(s^t, a_i^{t-1}, a_j^{t-1}, \omega_j^t) \, \delta(\mathrm{APPEND}(h_j^{t-1}, \omega_j^t) - h_j^t)

where δ is the Kronecker delta function, and APPEND(·, ·) returns a string in which the second argument is appended to the first.

Correction: When agent i perceives an observation, ω_i^t, the intermediate belief state, Pr(· | a_i^{t-1}, a_j^{t-1}, b_i^{t-1}), is corrected according to:

\Pr(is^t \mid \omega_i^t, a_i^{t-1}, b_i^{t-1}) = \alpha \sum_{a_j^{t-1}} O_i(s^t, a_i^{t-1}, a_j^{t-1}, \omega_i^t) \, \Pr(is^t \mid a_i^{t-1}, a_j^{t-1}, b_i^{t-1})

where α is the normalizing constant.

The update extends to more than two agents in a straightforward way. We represent possible correlations between the actions of other agents as dependencies between their models, which are expressed in i's beliefs.

2.2 Policy Computation

Each belief state in an I-POMDP has an associated value reflecting the maximum payoff the agent can expect in this belief state:

V(b_i) = \max_{a_i \in A_i} \Big\{ \sum_{is} ER_i(is, a_i) \, b_i(is) + \gamma \sum_{\omega_i \in \Omega_i} \Pr(\omega_i \mid a_i, b_i) \, V(SE(b_i, a_i, \omega_i)) \Big\}   (1)

where ER_i(is, a_i) = \sum_{a_j} R_i(s, a_i, a_j) \Pr(a_j \mid m_j) (since is = (s, m_j)). Eq. 1 is the basis for value iteration in I-POMDPs. As shown in [Gmytrasiewicz and Doshi, 2005], the value iteration converges in the limit. Agent i's optimal action, a_i, for the case of the infinite horizon criterion with discounting, is an element of the set of optimal actions for the belief state, OPT(b_i), defined as:

OPT(b_i) = \operatorname{argmax}_{a_i \in A_i} \Big\{ \sum_{is} ER_i(is, a_i) \, b_i(is) + \gamma \sum_{\omega_i \in \Omega_i} \Pr(\omega_i \mid a_i, b_i) \, V(SE(b_i, a_i, \omega_i)) \Big\}   (2)

Equation 2 enables the computation of a policy tree, π_i, for each belief b_i. The policy, π_i, gives i's best-response long-term strategy for that belief.
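As a complement to Equations 1 and 2, the sketch below shows one Bellman backup in Python. Again this is a hedged illustration of ours: it assumes a finite, hashable belief representation, a value table V for successor beliefs, and helper callables ER, pr_obs, and SE (the belief update above); none of these names come from the paper.

```python
# One-step backup for Equations 1 and 2 (our illustrative form).
# Assumptions: V is a dict keyed by the (hashable) beliefs returned by SE;
# ER(is, a) is the expected immediate reward; pr_obs(w, a, b) = Pr(w | a, b).

def backup(b, V, A_i, Omega_i, ER, pr_obs, SE, gamma=0.9):
    """Return V(b) and the optimal-action set OPT(b)."""
    q = {}
    for a in A_i:
        immediate = sum(ER(is_, a) * p for is_, p in b.items())
        future = sum(pr_obs(w, a, b) * V[SE(b, a, w)]
                     for w in Omega_i if pr_obs(w, a, b) > 0.0)
        q[a] = immediate + gamma * future          # Eq. 1, inner expression
    best = max(q.values())
    return best, [a for a in A_i if q[a] == best]  # value, OPT(b) (Eq. 2)
```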

3 Subjective Equilibrium in I-POMDPs

In Section 2, we reviewed a framework for a two-agent POSG in which each agent computes the discounted infinite horizon strategy that is its subjective best response to its belief. During each step of game play, the agent, starting with a prior belief, revises it in light of new information using the Bayesian belief update outlined in Section 2.1, and computes the optimal strategy given its beliefs. The latter step is equivalent to using its observation history to index into its policy tree (computed offline using the process given in Section 2.2; in the infinite horizon case, the convergence of value iteration allows us to conveniently represent the policy tree as a finite state machine) to retrieve the best-response future strategy.

3.1 Background: Stochastic Processes, Martingales, and Bayesian Learning

A stochastic process is a sequence of random variables, {X_t}, t = 0, 1, ..., whose values are realized one at a time. Well-known examples of stochastic processes are Markov chains, as well as sequences of beliefs updated using the Bayesian update. Bayesian learning turns out to exhibit an additional property that classifies it as a special type of stochastic process, called a Martingale.

A Martingale is a stochastic process that, for any observation history up to time t, h^t, exhibits the property that for all l ≥ t, E[X_l | h^t] = X_t. Consequently, for all future time points l ≥ t, the expected change is E[X_l − X_t | h^t] = 0. A sequence of an agent's beliefs updated using Bayesian learning is known to be a Martingale. Intuitively, this means that the agent's current estimate of the state is equal to what the agent expects its future estimates of the state will be, based on its current observation history. Because the Martingale property of Bayesian learning is central to our results, we sketch a formal proof below.

Let an agent's initial belief over some state space Ξ be X_0 = Pr(ξ). The agent receives some observation, ω, in the future according to a distribution φ that depends on ξ. Let the future revised belief be X_1 = Pr(ξ | ω). By Bayes' theorem, Pr(ξ | ω) = φ(ω | ξ) Pr(ξ) / Pr(ω). We will show that E[Pr(ξ | ω)] = Pr(ξ):

E[\Pr(\xi \mid \omega)] = \sum_{\omega} \Pr(\xi \mid \omega) \Pr(\omega) = \sum_{\omega} \frac{\varphi(\omega \mid \xi)\,\Pr(\xi)}{\Pr(\omega)} \Pr(\omega) = \Pr(\xi) \sum_{\omega} \varphi(\omega \mid \xi) = \Pr(\xi) = X_0

The above result extends immediately to observation histories of any length t. Formally, E[X_{t+1} | h^t] = X_t, and from the law of iterated conditional expectations, E[X_l | h^t] = X_t for all l ≥ t. Therefore the beliefs satisfy the Martingale property.

All Martingales share the following convergence property:

Theorem 1 (Martingale Convergence Theorem; §4 of Chapter 7 in [Doob, 1953]). If {X_t}, t = 0, 1, ..., is a Martingale with E[|X_t|²] < U < ∞ for some U and all t, then the sequence of random variables {X_t} converges with probability 1 to some random variable X_∞, in mean-square.
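The chain of equalities above can be checked numerically. The snippet below does so for a two-state space resembling the tiger locations, with made-up growl likelihoods; it is purely illustrative.

```python
# Numeric check of the Martingale property (Section 3.1): the expected
# posterior equals the prior. States and likelihoods are invented numbers.

prior = {"TL": 0.5, "TR": 0.5}                 # X_0 = Pr(xi)
phi = {"GL": {"TL": 0.85, "TR": 0.15},         # phi(omega | xi)
       "GR": {"TL": 0.15, "TR": 0.85}}

expected = {x: 0.0 for x in prior}
for w in phi:
    pw = sum(phi[w][x] * prior[x] for x in prior)          # Pr(omega)
    post = {x: phi[w][x] * prior[x] / pw for x in prior}   # Pr(xi | omega)
    for x in prior:
        expected[x] += post[x] * pw            # E[Pr(xi | omega)]

print(expected)   # {'TL': 0.5, 'TR': 0.5} == prior, as the proof predicts
```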

3.2 Subjective Equilibrium

We investigate the asymptotic behavior of agents playing an infinite horizon POSG, in which each agent learns and optimizes. Specifically, each agent starts with a prior belief, which is revised upon performing an action and receiving sensory information, followed by the computation of the strategy that optimizes its belief. In the context of I-POMDPs, each agent uses its prior beliefs to index into its policy (computed offline using Equations 1 and 2), resulting in the policy tree that will form its behavior strategy.

The sequential behavior of agents in a POSG may be represented using their observation histories. For an agent, say i, let ω_i^t be its observation at time step t, and let ω^t = [ω_i^t, ω_j^t]. An observation history of the game is a sequence h = {ω^t}, t = 1, 2, .... The set of all histories is H = ∪_{t=1}^∞ Ω^t, where Ω^t = Π_1^t (Ω_i × Ω_j). The set of observation histories up to time t is H^t = Π_1^t (Ω_i × Ω_j), and the set of future observation paths from time t onwards is H_t = Π_t^∞ (Ω_i × Ω_j).

Example: We use the multiagent tiger problem described in [Gmytrasiewicz and Doshi, 2005] as a running example throughout this paper. Briefly, the game consists of two doors, behind one of which is a tiger and behind the other some gold, and two agents, i and j. The agents are unaware of where the tiger is (TL or TR), and each can either open one of the two doors or listen (OL, OR, or L). The tiger emits a growl periodically, which reveals its position behind a door (GL or GR), but only with some certainty. Additionally, each agent hears a creak with some certainty if the other agent opens a door (CL, CR, or S for silence). We will assume that neither agent can perceive the other's observations nor its actions. The game is non-cooperative, since either i or j may open a door, thereby resetting the location of the tiger and rendering any information collected by the other agent about the tiger's location useless to it. Example histories in the multiagent tiger problem are shown in Fig. 1.

[Figure 1: Joint observation histories in the infinite horizon multiagent tiger problem. The nodes represent the state of the game and the play of the agents (e.g., [TL, L, L]), while the edges are labelled with the possible joint observations (e.g., ⟨GL, S⟩). This example starts with the tiger on the left and each agent listening. Each agent may receive one of six observations (the labels on the arrows), and performs the action that optimizes its resulting belief.]

In the I-POMDP framework, each agent's belief over the physical state and the other's candidate models, together with the agent's perfect information regarding its own model, induces a predictive probability distribution over the future observation paths. These distributions play a critical role in our analysis; we represent them mathematically using a collection of probability measures, {μ_k}, k = 0, i, j, defined over the space M × H, where M = M_i × M_j and H is as defined previously, such that:

1. μ_0 is the objective true distribution over the models of each agent and the histories;
2. proj_{M_k} μ_k = proj_{M_k} μ_0 = δ_{m_k}, k = i, j.

Here, condition 2 states that each agent knows its own model (δ_{m_k} is the Kronecker delta function). Additionally, proj_H μ_0 gives the true distribution over the histories as induced by the initial strategy profile, and proj_H μ_k(· | b_k^0), for k = i, j, gives the predictive probability distribution of each agent over the histories at the start of the game. (Following [Nyarko, 1997, Jordan, 1995], the unconditional measure μ_k may be seen as a prior before an agent knows its own model, and μ_k together with the conditions as an interim prior once an agent knows its own model.)

If the actual sequence of observations in the game does not proceed along a history that is assigned some positive predictive probability by an agent, then the agent's observations would contradict its beliefs and the Bayesian update would not be possible. Clearly, it is desirable for each agent's initial belief to assign nonzero probability to each possible observation history; this is called the truth compatibility condition. To formalize this condition we need the notion of absolute continuity of two probability measures.

Definition 1 (Absolute Continuity). A probability measure p_1 is absolutely continuous with p_2, denoted p_1 ≪ p_2, if p_2(E) = 0 implies p_1(E) = 0 for any measurable set E.

Condition 1 (Absolute Continuity Condition (ACC)). ACC holds for an agent k = i, j if proj_H μ_0 ≪ proj_H μ_k(· | b_k^0).

Condition 1 states that the probability distribution induced by an agent's initial belief over the observation histories should not rule out events that have positive probability according to the real probability distribution over the histories. A sure way to satisfy ACC is for each agent's initial belief to have a "grain of truth": to assign a non-zero probability to the true model of the other agent. Since an agent has no way of knowing the true model of its opponent beforehand, it must assign a non-zero probability to each candidate model of the other agent.
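On finite sets of histories, absolute continuity reduces to a simple support check, which the following hedged snippet illustrates; the histories and probabilities are invented for the example.

```python
# Toy check of ACC (Condition 1) on a finite set of histories: the agent's
# predictive distribution must give positive probability to every history
# that mu_0 makes possible. All numbers below are illustrative.

def absolutely_continuous(p_true, p_agent):
    """True iff p_true << p_agent, i.e., supp(p_true) is inside supp(p_agent)."""
    return all(p_agent.get(h, 0.0) > 0.0
               for h, p in p_true.items() if p > 0.0)

mu0   = {("GL", "S"): 0.7, ("GR", "S"): 0.3}                     # true measure
prior = {("GL", "S"): 0.5, ("GR", "S"): 0.2, ("GR", "CR"): 0.3}  # agent's
print(absolutely_continuous(mu0, prior))   # True: this prior satisfies ACC
```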

Truth compatible beliefs of an agent that performs Bayesian learning tend to converge in the limit to the opponent model(s) that most likely generate the observations of the agent. In the context of the I-POMDP framework, an agent's belief, updated using the process outlined in Section 2.1, will converge in the limit. Formally:

Theorem 2 (Bayesian Learning in I-POMDPs). For an agent in the I-POMDP framework, if its initial belief satisfies the ACC, then its posterior beliefs will converge with probability 1.

Proof. As we mentioned before, Bayesian learning is a Martingale. In Section 3.1, set Ξ = IS and φ = O_i. Noting that the I-POMDP belief update is Bayesian, its Martingale property follows from applying the proof appropriately. In order to apply Theorem 1 to the I-POMDP belief update, set X_t = Pr(is^t | h_i^t), where h_i^t is agent i's observation history up to time t. We must first show that E[|X_t|²] is bounded:

E[\|b_i^t\|^2] = \sum_{k=1}^{(|A||\Omega_i|)^t} \|b_k\|^2 \,\Pr(b_i^t = b_k) = \sum_{k=1}^{(|A||\Omega_i|)^t} \Big( \sum_{is} b_k(is)^2 \Big) \Pr(b_k) \quad (L_2\text{ norm}) \;\leq\; \sum_{k=1}^{(|A||\Omega_i|)^t} 1 \cdot \Pr(b_k) \quad (\text{since } \|p\|_2 \leq 1) \;=\; 1

Theorem 2 now follows from a straightforward application of Theorem 1.

The above result does not imply that an agent's belief converges to the true model of the other agent. This is due to the possible presence of observationally equivalent models of the other agent. For example, for agent i, all models of j that induce identical distributions over all possible future observation paths are said to be observationally equivalent. When a particular observation history obtains, agent i is unable to distinguish between the observationally equivalent models of j. In other words, observationally equivalent models generate distinct behaviors only for histories that are never observed.

Example: For an example of observationally equivalent models, consider a version of the multiagent tiger game in which the tiger persists behind its original door once any door has been opened. Additionally, j has superior observation capabilities compared to i, and each agent is able to perfectly observe the other's actions but observes the growls imperfectly. Let j's utility dictate that it will not open any door until it is 100% certain that the tiger is behind the opposite door. The corresponding strategy for j is to listen for an infinite number of time steps and then open the door. Suppose that, as a best response to its belief, i were to adopt a strategy in which it would also listen for an infinite number of steps, but if j at any time opened a door, i would do the same at the next time step and then continue opening that door. The true distribution assigns probability 1 to the histories {[⟨GL|GR, S⟩, ⟨GL|GR, S⟩]}_1^∞. If, instead of the above strategy, i were to adopt a follow-the-leader strategy, i.e., i performs the action that j performed in the previous time step, then the true distribution would again assign probability 1 to the previously mentioned histories. The two different strategies of i turn out to be observationally equivalent for j.
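A small Bayesian-filtering sketch makes the point computationally: models of j that induce the same observation likelihoods for i keep equal posterior mass no matter how long i observes. The three model names and all likelihoods below are our own illustrative inventions.

```python
# Bayesian learning over candidate models of j (illustrative). Two models
# that induce identical likelihoods for i's observations are observationally
# equivalent: i's posterior can never separate them.

likelihood = {                                  # Pr(omega_i | model of j)
    "listen_forever":    {"S": 0.95, "CL": 0.025, "CR": 0.025},
    "follow_the_leader": {"S": 0.95, "CL": 0.025, "CR": 0.025},  # same dist.
    "eager_door_opener": {"S": 0.20, "CL": 0.40,  "CR": 0.40},
}

belief = {m: 1.0 / len(likelihood) for m in likelihood}  # grain-of-truth prior

for w in ["S", "S", "S", "S"]:                  # i keeps hearing silence
    belief = {m: likelihood[m][w] * p for m, p in belief.items()}
    z = sum(belief.values())
    belief = {m: p / z for m, p in belief.items()}

print(belief)  # mass drains from "eager_door_opener"; the two equivalent
               # models retain exactly equal posterior probability
```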

An immediate consequence of the convergence of Bayesian learning is that the predictive distribution over the future observation paths induced by each agent's belief after a finite sequence of observations h_k^t, namely proj_{H_t} μ_k(· | b_k^0, h_k^t), k = i, j, becomes arbitrarily close to the true distribution, proj_{H_t} μ_0(· | h^t), for a finite t, and converges uniformly in the limit. This is an important result, because it establishes that no matter what the initial beliefs of the agents are, as long as they are truth compatible, the agents' opinions about the future will merge and correctly predict the true future in the limit. This result was first noted in [Blackwell and Dubins, 1962]; we present the theorem below and refer the reader to the paper for its proof.

Theorem 3 ([Blackwell and Dubins, 1962]). Suppose that P is a predictive probability on X^∞, and Q is absolutely continuous w.r.t. P. Then for each conditional distribution P^t(· | x_1, ..., x_t) of the future given the past w.r.t. P, there exists a conditional distribution Q^t(· | x_1, ..., x_t) of the future given the past w.r.t. Q such that

\|P^t(\cdot \mid x_1, \ldots, x_t) - Q^t(\cdot \mid x_1, \ldots, x_t)\| \to 0 \quad \text{as } t \to \infty

with Q-probability 1.

We use Theorem 3 to establish predictive convergence in the context of the I-POMDP framework.

Theorem 4 (ε-Predictive Convergence in I-POMDPs). For all agents in the I-POMDP framework, if their initial beliefs satisfy the ACC, then for every ε > 0 there exists a finite T, which is a function of ε, such that for all t ≥ T and with μ_0-probability 1, for k = i, j:

\|\mathrm{proj}_{H_t}\, \mu_0(\cdot \mid h^t) - \mathrm{proj}_{H_t}\, \mu_k(\cdot \mid b_k^0, h_k^t)\| \leq \varepsilon

Proof. Referring to Theorem 3, let X^∞ = H. We observe that proj_H μ_0 and proj_H μ_k(· | b_k^0), for k = i, j, are predictive as defined in [Blackwell and Dubins, 1962]. Set Q = proj_H μ_0 and P = proj_H μ_k(· | b_k^0). Subsequently, Q^t = proj_{H_t} μ_0(· | h^t) and P^t = proj_{H_t} μ_k(· | b_k^0, h_k^t). Theorem 4 then follows immediately from a straightforward application of Theorem 3.

We have shown that for a POSG modeled using the I-POMDP formalism, the players' beliefs over the opponent's models converge in the limit if they satisfy the ACC property. However, the limit beliefs may be incorrect, owing to the inability of the agents to distinguish between observationally equivalent models of the opponent on the basis of the observation history. Nevertheless, their beliefs over the future paths come arbitrarily close, and remain close, to the true distribution over the future after a finite amount of time. Further observations will only confirm their beliefs about the truth and will not alter them. We capture this notion using the concept of a subjective equilibrium [Kalai and Lehrer, 1993a], defined as follows:

Definition 2 (Subjective ε-equilibrium). Let b_k^t, k = i, j, be the agents' beliefs at some time t. A pair of policy trees, π = [π_i, π_j], is a subjective ε-equilibrium if:

1. π_i ∈ OPT(b_i^t), π_j ∈ OPT(b_j^t);
2. \|\mathrm{proj}_{H_t}\, \mu_0(\cdot \mid h^t) - \mathrm{proj}_{H_t}\, \mu_k(\cdot \mid b_k^0, h_k^t)\| \leq \varepsilon, k = i, j, with μ_0-probability 1.

For ε = 0, a subjective equilibrium obtains. Condition 1 of the subjective ε-equilibrium states that the agents are subjectively rational, i.e., their strategies are best responses to their beliefs. As we mentioned before, these strategies are the policy trees computed using Equations 1 and 2. The second condition states that the agents' beliefs have attained ε-predictive convergence. In other words, a strategy profile is in subjective ε-equilibrium when the strategies are best responses to agents' beliefs that have attained ε-predictive convergence.
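Condition 2 of Definition 2 is directly checkable on truncated futures. The snippet below compares two distributions over length-one future paths; reading the norm as a sup-style per-event distance is our interpretation, and the numbers are invented.

```python
# Toy check of epsilon-predictive convergence (Definition 2, condition 2)
# over a finite truncation of the future paths. The norm is taken here as
# the largest per-event discrepancy (an assumption), and data are made up.

def predictive_distance(p_true, p_agent):
    paths = set(p_true) | set(p_agent)
    return max(abs(p_true.get(h, 0.0) - p_agent.get(h, 0.0)) for h in paths)

mu0_future   = {("GL",): 0.85, ("GR",): 0.15}   # proj of mu_0 over next step
agent_future = {("GL",): 0.83, ("GR",): 0.17}   # agent's prediction
eps = 0.05
print(predictive_distance(mu0_future, agent_future) <= eps)   # True
```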

We now establish the main result of this paper: the behavior strategies of agents playing a POSG modeled using the I-POMDP framework attain subjective ε-equilibrium in finite time, and subjective equilibrium in the limit. The following corollary gives our result.

Corollary 1 (Convergence to Subjective Equilibrium in I-POMDPs). Let π = [π_i, π_j] be the strategies of agents i and j, respectively, playing a POSG modeled using the I-POMDP formalism, and let b_i^0 and b_j^0 be their initial beliefs. If the following conditions are met:

1. π_i ∈ OPT(b_i^0), π_j ∈ OPT(b_j^0);
2. proj_H μ_0 ≪ proj_H μ_k(· | b_k^0), k = i, j (ACC);

then for any ε > 0 and for all μ_0-positive probability histories, there exists some finite time step T, a function of ε, such that for all t ≥ T the strategy profile π^t = [π_i^t, π_j^t], where b_i^t and b_j^t are the agents' beliefs at time t and π_i^t ∈ OPT(b_i^t), π_j^t ∈ OPT(b_j^t), is a subjective ε-equilibrium.

Proof. Corollary 1 follows in part from Theorem 4, and in part from noting that agents in the I-POMDP framework compute strategies that are best responses to their posterior beliefs at each time step, and that these beliefs are updated using their observation histories.

Strategy profiles in subjective ε-equilibrium for arbitrarily small ε ≥ 0 are stable. Specifically, further play will bring the agents' beliefs over the future statistically closer to the truth, and the corresponding strategy profiles will remain in subjective ε-equilibrium. Note that ACC is a sufficient condition, but not a necessary one: an example setting in which subjective ε-equilibrium results even though ACC is violated is given in [Kalai and Lehrer, 1993a].

4 Computational Limitations of Our Results

Recall that in Section 2 we made the assumption that agent strategies are computable, which restricts the space of possible strategies to be countable. However, as observed in [Nachbar and Zame, 1996], there may exist strategies which are exact best responses to computable strategies but are themselves not computable, and even when computable best responses do exist, the decision procedure for computing them may not be computable. The consequences of these negative results lead to a subtle tension between learning and optimization. Specifically, if agents' best-response strategies are not computable, then their beliefs fail to account for such strategies of others, thereby possibly violating ACC and preventing predictive convergence. On the other hand, if we posit that best responses be computable, then the corresponding beliefs may be unnatural; for example, they may not assign non-zero probability to all possible strategies of others. Nachbar [Nachbar, 1997] makes an argument along similar lines in the context of repeated games, using the notion of a conventional set of strategies (analogous to the computable set in our setting) attributed to each agent.

We believe that these implausibility issues are a direct implication of Binmore's claim in [Binmore, 1982] that perfect rationality is an unattainable ideal. Binmore proves that a Turing machine cannot always predict truthfully the behavior of an opponent Turing machine (given its complete description) and optimize simultaneously. His claim rests on a particular construction of a two-agent game in which a supposedly rational Turing machine, when required to compute the best response, is unable to predict truthfully, and when required to predict truthfully, is unable to terminate its computations and optimize.

Though the above mentioned negative results are existential, they serve to show that it may be problematic to fulfill the assumptions laid out in our analysis in practice. Nevertheless, there may be ways to overcome these limitations. One interesting direction is to replace exact optimization with approximate optimization. Specifically, rather than computing the exact best response to its subjective belief, an agent may compute an ε-best response, which is guaranteed to be always computable; one way to compute an ε-best response is to consider finite horizons for the maximization, rather than infinity. However, the effect of ε-optimality on predictive convergence remains an open question.
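To illustrate the finite-horizon route to an ε-best response, here is a hedged sketch: truncating the discounted sum at horizon T loses at most γ^T · R_max / (1 − γ), so T can be chosen from ε. The helper names (ER, pr_obs, SE) reuse the illustrative conventions of the earlier sketches and are not from the paper.

```python
# Illustrative epsilon-best response by finite-horizon lookahead. Truncation
# error of the discounted sum is at most gamma^T * r_max / (1 - gamma), so a
# horizon T derived from epsilon suffices. The exhaustive recursion below is
# for exposition only; it branches over every action and observation.

import math

def horizon_for_epsilon(eps, gamma, r_max):
    """Smallest T with gamma^T * r_max / (1 - gamma) <= eps."""
    return max(0, math.ceil(math.log(eps * (1 - gamma) / r_max)
                            / math.log(gamma)))

def eps_best_response(b, t, A_i, Omega_i, ER, pr_obs, SE, gamma):
    """Finite-horizon value and action for belief b; always terminates."""
    if t == 0:
        return 0.0, None
    best, best_a = -math.inf, None
    for a in A_i:
        q = sum(ER(is_, a) * p for is_, p in b.items())
        q += gamma * sum(
            pr_obs(w, a, b) *
            eps_best_response(SE(b, a, w), t - 1, A_i, Omega_i,
                              ER, pr_obs, SE, gamma)[0]
            for w in Omega_i if pr_obs(w, a, b) > 0.0)
        if q > best:
            best, best_a = q, a
    return best, best_a
```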

5 Discussion

In this paper we analyzed the play of agents engaged in a partially observable stochastic game formalized using the interactive POMDP framework. In particular, we considered subjectively rational agents that may not know the others' strategies; they therefore maintain beliefs over the physical state and the models of other agents, and optimize with respect to those beliefs. We have also shown how such agents update their beliefs upon performing actions and receiving observations, and how they compute best responses to their beliefs. Within this framework, we proved that if the agents' beliefs satisfy a truth compatibility condition, then the strategies of agents that learn and optimize converge to a subjective equilibrium in the limit, and to a subjective ε-equilibrium, for arbitrarily small ε > 0, in finite time. For completeness, we also gave conditions on agents' prior beliefs that allow play to converge to a Nash equilibrium.

Though the concept of subjective equilibrium is not novel, we believe that our results complement the existing literature. Specifically, using the I-POMDP framework for learning and optimizing, we have shown the existence of equilibria in POSGs, which provide a more realistic setting than the repeated games in which the existence of equilibria was already known. We also pointed out that attempts to apply these theoretical results could run into obstacles. One problem is the inherent difficulty of simultaneous perfect optimization and prediction; one may be forced to resort to ε-optimality. Whether any form of equilibrium obtains when the players are boundedly rational is a topic of future work.

References

[Binmore, 1982] Binmore, K. (1982). Essays on Foundations of Game Theory. Pitman.

[Blackwell and Dubins, 1962] Blackwell, D. and Dubins, L. (1962). Merging of opinions with increasing information. Annals of Mathematical Statistics, 33(3):882-886.

[Bowling and Veloso, 2002] Bowling, M. and Veloso, M. (2002). Multiagent learning using a variable learning rate. Artificial Intelligence Journal, 136:215-250.

[Doob, 1953] Doob, J. L. (1953). Stochastic Processes. John Wiley and Sons.

[Fudenberg and Levine, 1993] Fudenberg, D. and Levine, D. (1993). Self-confirming equilibrium. Econometrica, 61:523-545.

[Gmytrasiewicz and Doshi, 2004] Gmytrasiewicz, P. and Doshi, P. (2004). Interactive POMDPs: Properties and preliminary results. In AAMAS, NYC, NY.

[Gmytrasiewicz and Doshi, 2005] Gmytrasiewicz, P. and Doshi, P. (2005). A framework for sequential planning in multiagent settings. Journal of AI Research, 24:49-79.

[Hahn, 1973] Hahn, F. (1973). On the Notion of Equilibrium in Economics: An Inaugural Lecture. Cambridge University Press.

[Hansen et al., 2004] Hansen, E., Bernstein, D., and Zilberstein, S. (2004). Dynamic programming for partially observable stochastic games. In AAAI.

[Hu and Wellman, 1998] Hu, J. and Wellman, M. P. (1998). Multiagent reinforcement learning: Theoretical framework and an algorithm. In 15th International Conference on Machine Learning.

[Jordan, 1995] Jordan, J. S. (1995). Bayesian learning in repeated games. Games and Economic Behavior, 9:8-20.

[Kadane and Larkey, 1982] Kadane, J. and Larkey, P. (1982). Subjective probability and the theory of games. Management Science, 28(2):113-120.

[Kalai and Lehrer, 1993a] Kalai, E. and Lehrer, E. (1993a). Rational learning leads to Nash equilibrium. Econometrica, 61(5):1019-1045.

[Kalai and Lehrer, 1993b] Kalai, E. and Lehrer, E. (1993b). Subjective equilibrium in repeated games. Econometrica, 61(5):1231-1240.

[Nachbar, 1997] Nachbar, J. H. (1997). Prediction, optimization, and rational learning in repeated games. Econometrica, 65:275-309.

[Nachbar and Zame, 1996] Nachbar, J. H. and Zame, W. (1996). Non-computable strategies and discounted repeated games. Economic Theory, 8:103-122.

[Nyarko, 1997] Nyarko, Y. (1997). Convergence in economic models with Bayesian hierarchies of beliefs. Journal of Economic Theory, 74.


More information

A Hybrid Variational Iteration Method for Blasius Equation

A Hybrid Variational Iteration Method for Blasius Equation Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 10, Issue 1 (June 2015), pp. 223-229 Applcatons and Appled Mathematcs: An Internatonal Journal (AAM) A Hybrd Varatonal Iteraton Method

More information

Hidden Markov Models & The Multivariate Gaussian (10/26/04)

Hidden Markov Models & The Multivariate Gaussian (10/26/04) CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models

More information

EPR Paradox and the Physical Meaning of an Experiment in Quantum Mechanics. Vesselin C. Noninski

EPR Paradox and the Physical Meaning of an Experiment in Quantum Mechanics. Vesselin C. Noninski EPR Paradox and the Physcal Meanng of an Experment n Quantum Mechancs Vesseln C Nonnsk vesselnnonnsk@verzonnet Abstract It s shown that there s one purely determnstc outcome when measurement s made on

More information

NP-Completeness : Proofs

NP-Completeness : Proofs NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem

More information

THERE ARE INFINITELY MANY FIBONACCI COMPOSITES WITH PRIME SUBSCRIPTS

THERE ARE INFINITELY MANY FIBONACCI COMPOSITES WITH PRIME SUBSCRIPTS Research and Communcatons n Mathematcs and Mathematcal Scences Vol 10, Issue 2, 2018, Pages 123-140 ISSN 2319-6939 Publshed Onlne on November 19, 2018 2018 Jyot Academc Press http://jyotacademcpressorg

More information

Basically, if you have a dummy dependent variable you will be estimating a probability.

Basically, if you have a dummy dependent variable you will be estimating a probability. ECON 497: Lecture Notes 13 Page 1 of 1 Metropoltan State Unversty ECON 497: Research and Forecastng Lecture Notes 13 Dummy Dependent Varable Technques Studenmund Chapter 13 Bascally, f you have a dummy

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

Games and Economic Behavior

Games and Economic Behavior Games and Economc Behavor 76 2012 210 225 Contents lsts avalable at ScVerse ScenceDrect Games and Economc Behavor www.elsever.com/locate/geb Non-Bayesan socal learnng Al Jadbabae a, Pooya Molav a, Alvaro

More information

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering / Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons

More information

Exercise Solutions to Real Analysis

Exercise Solutions to Real Analysis xercse Solutons to Real Analyss Note: References refer to H. L. Royden, Real Analyss xersze 1. Gven any set A any ɛ > 0, there s an open set O such that A O m O m A + ɛ. Soluton 1. If m A =, then there

More information

Understanding Reasoning Using Utility Proportional Beliefs

Understanding Reasoning Using Utility Proportional Beliefs Understandng Reasonng Usng Utlty Proportonal Belefs Chrstan Nauerz EpCenter, Maastrcht Unversty c.nauerz@maastrchtunversty.nl Abstract. Tradtonally very lttle attenton has been pad to the reasonng process

More information

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0 MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector

More information

Convergence of random processes

Convergence of random processes DS-GA 12 Lecture notes 6 Fall 216 Convergence of random processes 1 Introducton In these notes we study convergence of dscrete random processes. Ths allows to characterze phenomena such as the law of large

More information

Genericity of Critical Types

Genericity of Critical Types Genercty of Crtcal Types Y-Chun Chen Alfredo D Tllo Eduardo Fangold Syang Xong September 2008 Abstract Ely and Pesk 2008 offers an nsghtful characterzaton of crtcal types: a type s crtcal f and only f

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming EEL 6266 Power System Operaton and Control Chapter 3 Economc Dspatch Usng Dynamc Programmng Pecewse Lnear Cost Functons Common practce many utltes prefer to represent ther generator cost functons as sngle-

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information