Calibrated Learning and Correlated Equilibrium


University of Pennsylvania ScholarlyCommons
Statistics Papers, Wharton Faculty Research, 10-1997

Calibrated Learning and Correlated Equilibrium
Dean P. Foster, University of Pennsylvania
Rakesh V. Vohra

Part of the Behavioral Economics Commons, and the Statistics and Probability Commons.

Recommended Citation: Foster, D. P., & Vohra, R. V. (1997). Calibrated Learning and Correlated Equilibrium. Games and Economic Behavior, 21 (1-2).

This paper is posted at ScholarlyCommons. For more information, please contact repository@pobox.upenn.edu.

Abstract

Suppose two players repeatedly meet each other to play a game where:
1. each uses a learning rule with the property that it is a calibrated forecast of the other's plays, and
2. each plays a myopic best response to this forecast distribution.
Then, the limit points of the sequence of plays are correlated equilibria. In fact, for each correlated equilibrium there is some calibrated learning rule that the players can use which results in their playing this correlated equilibrium in the limit. Thus, the statistical concept of calibration is strongly related to the game theoretic concept of correlated equilibrium.

Disciplines: Behavioral Economics; Statistics and Probability

This journal article is available at ScholarlyCommons.

Dean P. Foster, University of Pennsylvania, Philadelphia, PA (email: foster@hellspark.wharton.upenn.edu)
Rakesh V. Vohra, Ohio State University, Columbus, OH (email: vohra.1@osu.edu)

First draft: May 1992. Revised: June 1993. This version: October 17, 1996.

Abstract

Suppose two players meet each other in a repeated game where:
1. each uses a learning rule with the property that it is a calibrated forecast of the other's plays, and
2. each plays a best response to this forecast distribution.
Then, the limit points of the sequence of plays are correlated equilibria. In fact, for each correlated equilibrium there is some calibrated learning rule that the players can use which results in their playing this correlated equilibrium in the limit. Thus, the statistical concept of calibration is strongly related to the game theoretic concept of correlated equilibrium.

1 Introduction

The concept of a Nash equilibrium (NE) is so important to game theory that an extensive literature devoted to its defense and advancement exists. Even so, there are aspects of the Nash equilibrium concept that are puzzling. One is why any player should assume that the other will play their Nash equilibrium strategy. Aumann (1987) says: "This is particularly perplexing when, as often happens, there are multiple equilibria; but it has considerable force even when the equilibrium is unique."

One resolution is to argue that the assumptions about an opponent's plays are the outcome of some learning process (see for example Chapter 6 of Kreps (1991a)). Learning is modeled as recurrent updating. Players choose a best reply on the basis of their forecasts of their opponents' future choices. Forecasts are described as a function of previous plays in the repeated game. Much attention has focused on developing forecast rules by which a Nash equilibrium (or its refinements) may be learned.

Many rules have been proposed and convergence to Nash equilibrium has been established under certain conditions (see Skyrms 1990). For example, Fudenberg and Kreps (1991) introduce the class of rules satisfying a property called "asymptotic myopic Bayes." They prove that if convergence takes place, it does so to a NE. Notice that convergence is not guaranteed. In summarizing other approaches, Kreps (1991b) points out, "in general convergence is not assured." This lack of convergence serves to lessen the importance of NE and its refinements.

On the positive side, Milgrom and Roberts (1991) have shown that under any learning rule that requires the player to make approximately best responses consistent with their expectations, play tends towards the serially undominated set of strategies. They call such learning rules adaptive and prove that if the sequence of plays converges to a NE (or correlated equilibrium) then each player's play is consistent with adaptive learning.

Learning, as we have described it, takes place at the level of the individual. An important class of learning models involves learning at the level of populations (evolutionary models). Here the different strategies are represented by individuals in the population. In particular, a mixed strategy would be represented by assigning an appropriate fraction of the population to each strategy. A pair of individuals is selected at random to play the game. Individuals do not update their strategies but their numbers wax and wane according to their average (suitably defined) payoff. Even in this environment convergence to a NE is not guaranteed. On the positive side, results analogous to those of Milgrom and Roberts have been obtained by Samuelson and Zhang (1992).

A second objection to NE is that it is inconsistent with the Bayesian perspective. A Bayesian player starts with a prior over what their opponent will select and chooses a best response to that. To argue that Bayesians should play the NE of the game is to insist that they each choose a particular prior. Aumann (1987) has gone further and argued that the solution concept consistent with the Bayesian perspective is not NE but correlated equilibrium (CE). Support for such a view can be found in Nau and McCardle (1990), who characterize CE in terms of the no arbitrage condition so beloved by Bayesians. Also, Kalai and Lehrer (1994) show that Bayesian players with uncontradicted beliefs learn a correlated equilibrium.

In this note, we provide a direct link between the Bayesian beliefs of players and the conclusion that they will play a CE. We do this by showing that a CE can be "learned". We do not specify a particular learning rule; rather, we restrict our attention to learning rules that possess an asymptotic property called calibration. The key result is that if players use any forecasting rule with the property of being calibrated, then, in repeated plays of the game, the limit points of the sequence of plays are correlated equilibria.

The game theoretic importance of calibration follows from a theorem of Dawid (1982). Given the Bayesian's prior, look at the forecasts generated by the posterior. The sequences of future events on which this forecast will not be calibrated have measure zero. That is, the Bayesian's prior assigns probability zero to such outcomes. Thus, under the common prior assumption, a Bayesian would expect all the other players to be using their posterior, and hence to be calibrated. Combining our result that calibration implies correlated equilibria with the common prior assumption shows that Bayesians expect that, in the limit, they will be playing a correlated equilibrium. This provides an alternative to Aumann's proof that the common prior assumption and rationality imply a correlated equilibrium. If the common prior assumption holds then it is common knowledge that all players are calibrated. If the players use a Bayesian forecasting scheme that is calibrated, then, by the above, in repeated plays of the game, the limit points of the sequence of plays are correlated equilibria.

In the next section of this paper we introduce notation and provide a rigorous definition of some of the terms used in the introduction. Subsequently we state and prove the main result of our paper. For ease of exposition we consider only the 2-person case. However, our results generalize easily to the n-person case. [1]

[1] See the discussion after Theorem 3.

2 Notation and Definitions

For $i = 1, 2$, denote by $S(i)$ the finite set of pure strategies of player $i$ and by $u_i(x, y) \in \mathbb{R}$ the payoff to player $i$, where $x \in S(1)$ and $y \in S(2)$. Let $m = |S(1)|$ and $n = |S(2)|$. A correlated strategy is a function $h$ from a finite probability space $\Gamma$ into $S(1) \times S(2)$, i.e., $h = (h_1, h_2)$ is a random variable whose values are pairs of strategies, one from $S(1)$ and the other from $S(2)$. Note that if $h$ is a correlated strategy, then $u_i(h_1, h_2)$ is a real valued random variable.

So as to understand the definition of a correlated equilibrium, imagine an umpire who announces to both players what $\Gamma$ and $h$ are. Chance chooses an element $g \in \Gamma$ and hands it to the umpire, who computes $h(g)$. The umpire then reveals $h_i(g)$ to player $i$ only and nothing more.

Definition: A correlated strategy $h$ is called a correlated equilibrium if:
\[ E\big(u_1(h_1, h_2)\big) \ge E\big(u_1(\alpha(h_1), h_2)\big) \quad \text{for all } \alpha : S(1) \to S(1), \]
and
\[ E\big(u_2(h_1, h_2)\big) \ge E\big(u_2(h_1, \alpha(h_2))\big) \quad \text{for all } \alpha : S(2) \to S(2). \]

Thus, a CE is achieved when no player can gain by deviating from the umpire's recommendation, assuming the other player will not deviate either. The deviations $\alpha$ are restricted to be functions of $h_i$ because player $i$ knows only $h_i(g)$. For more on CE see Aumann (1974) and Aumann (1987).

We turn now to the notion of calibration. This is one of a number of criteria used to evaluate the reliability of a probability forecast. It has been argued by a number of writers (see Dawid (1982)) that calibration is an appealing minimal condition that any respectable probability forecast should satisfy. Dawid offers the following intuitive definition:

Suppose that, in a long (conceptually infinite) sequence of weather forecasts, we look at all those days for which the forecast probability of precipitation was, say, close to some given value $p$ and (assuming these form an infinite sequence) determine the long run proportion $\rho$ of such days on which the forecast event (rain) in fact occurred. The plot of $\rho$ against $p$ is termed the forecaster's empirical calibration curve. If the curve is the diagonal $\rho = p$, the forecaster may be termed well calibrated. [2]

To give the notion a formal definition, suppose that player 1 is using a forecasting scheme $f$. The output of $f$ in round $t$ of play is an $n$-tuple $f(t) = \{p_1(t), \ldots, p_n(t)\}$, where $p_j(t)$ is the forecasted probability that player 2 will play strategy $j \in S(2)$ at time $t$. Let $\chi(j, t) = 1$ if player 2 plays their $j$-th strategy in round $t$ and zero otherwise. Denote by $N(p, t)$ the number of rounds up to the $t$-th round in which $f$ generated a vector of forecasts equal to $p$. Let $\rho(p, j, t)$ be the fraction of these rounds for which player 2 plays $j$, i.e.,
\[ \rho(p, j, t) = \frac{\sum_{s=1}^{t} I_{f(s) = p}\, \chi(j, s)}{N(p, t)} \]
if $N(p, t) > 0$ and zero otherwise. The forecast $f$ is said to be calibrated with respect to the sequence of plays made by player 2 if:
\[ \lim_{t \to \infty} \sum_{p} \frac{N(p, t)}{t}\, \big|\rho(p, j, t) - p_j\big| = 0 \]
for all $j \in S(2)$. Notice that taking $0/0 = 0$ is now seen not to matter, since the only time it will occur is if $N(p, t) = 0$, and thus it would be multiplied by zero anyway.

[2] Dawid (1982), page 605. His notation has been changed to match ours.

Roughly, calibration says that the empirical frequencies conditioned on the assessments converge to the assessments. This is to be contrasted with the asymptotic myopic Bayes condition of Fudenberg and Kreps, which says that the empirical frequencies in round $t$ converge together with the assessments in round $t$.

3 Calibration and Correlated Equilibrium

It is clear from the definition of correlated strategies that a CE is simply a joint distribution over $S(1) \times S(2)$ with a particular property. Hence, we focus on $D_t(x, y)$, the fraction of times up to time $t$ that player 1 plays $x$ and player 2 plays $y$. This is the empirical joint distribution. We assume that when players select their best response (for a given forecast) they use a stationary and deterministic tie breaking rule; say, the lowest indexed strategy.

Theorem 1. Let $\Pi$ be the set of all correlated equilibria. If each player uses a forecast that is calibrated against the other's sequence of plays, and then makes a best response to this forecast, then
\[ \min_{D \in \Pi}\ \max_{x \in S(1),\, y \in S(2)} \big|D_t(x, y) - D(x, y)\big| \to 0 \]
as $t$, the number of rounds of play, tends to infinity.

PROOF: Observe first that the $nm$-tuple each of whose components is of the form $D_t(x, y)$ lies in the $nm - 1$ dimensional unit simplex. By the Bolzano-Weierstrass theorem any bounded sequence in it contains a convergent subsequence. Thus, for any subsequence $\{D_{t_i}(x, y)\}$ and $D(x, y)$ such that
\[ \sum_{x \in S(1)} \sum_{y \in S(2)} \big|D_{t_i}(x, y) - D(x, y)\big| \to 0, \]
we need to show that $D$ is a CE.

For each $x \in S(1)$ let $M_b(x)$ be the set of mixtures over $S(2)$ for which $x$ is a best response. $M_b(x)$ is a closed convex subset of the $n - 1$ dimensional simplex. Let $M_p(x)$ be the set of mixtures for which player 1 actually plays $x$, i.e., player 1 plays $x$ exactly when the forecast is in $M_p(x)$. By the assumption that players choose best responses, $M_p(x) \subseteq M_b(x)$. Further, $\{M_p(x) : x \in S(1)\}$ forms a partition of the simplex.

The empirical conditional distribution of $y \in S(2)$ given that player 1 played $x$ is
\[ \frac{D_{t_i}(x, y)}{\sum_{c \in S(2)} D_{t_i}(x, c)}. \]
This converges to $D(x, y) / \sum_{c \in S(2)} D(x, c)$ as long as $\sum_{c \in S(2)} D_{t_i}(x, c)$ does not converge to zero. If it did, it would mean that the proportion of times that $x$ is played tends to zero. Hence, in the limit, player 1 never plays $x$, so it can be ignored. To complete the proof it suffices to show that the $n$-tuple whose $y$-th component is $D(x, y) / \sum_{c \in S(2)} D(x, c)$ is contained in $M_b(x)$. Observe that:
\begin{align*}
D_{t_i}(x, y) &= t_i^{-1} \sum_{r \le t_i :\, f(r) \in M_p(x)} \chi(y, r) \\
&= t_i^{-1} \sum_{p \in M_p(x)}\ \sum_{r \le t_i :\, f(r) = p} \chi(y, r) \\
&= t_i^{-1} \sum_{p \in M_p(x)} \rho(p, y, t_i)\, N(p, t_i) \\
&= t_i^{-1} \sum_{p \in M_p(x)} p_y\, N(p, t_i) + t_i^{-1} \sum_{p \in M_p(x)} \big(\rho(p, y, t_i) - p_y\big)\, N(p, t_i).
\end{align*}
Since the forecasts being used are calibrated, the second term in the last expression goes to zero as $t_i$ tends to infinity. Note that the vector whose $y$-th component is
\[ \frac{\sum_{p \in M_p(x)} N(p, t_i)\, p_y}{\sum_{q \in M_p(x)} N(q, t_i)} \]
lies in $M_b(x)$, because it is a convex combination of vectors in $M_b(x)$ (recall, $M_p \subseteq M_b$), and $M_b(x)$ is convex. Therefore
\[ \frac{D(x, y)}{\sum_{c \in S(2)} D(x, c)} = \lim_{i \to \infty} \frac{\sum_{p \in M_p(x)} N(p, t_i)\, p_y}{\sum_{p \in M_p(x)} N(p, t_i)}, \]
which is then the $y$-th component of a vector in $M_b(x)$ also, since $M_b(x)$ is closed.

We have shown that any sequence $\{D_t(x, y)\}$ contains a convergent subsequence whose limit is a CE. The theorem now follows. □

In some sense the result above is not surprising. We know from Milgrom and Roberts (1991) that if players use best responses they eliminate dominated strategies. Secondly, the calibration requirement forces limit points to satisfy an additional equilibrium requirement. Correlation arises because players are able to condition on previous plays.

It is natural to ask if Theorem 1 would hold with a non-stationary tie-breaking rule. The following version of matching pennies shows that this is not possible.

Matching Pennies
         h         t
H      1, -1    -1, 1
T     -1, 1      1, -1

In each round the row player will forecast that there is a 50% chance that column will play heads and a 50% chance that column will play tails, i.e., (0.5, 0.5) is the forecast. The column player will do likewise. Given these forecasts there is a tie for the best reply. Consider the following tie breaking rule: on even numbered rounds play heads, and play tails on the other rounds. Notice that the resulting sequence of plays will be: Tt, Hh, Tt, Hh, .... Clearly the forecasts of each player are calibrated, but the distribution of plays does not converge to a CE.

Theorem 1 raises the question of how a calibrated forecast is to be produced. Oakes (1985) has shown that there is no deterministic forecast that is calibrated for all possible sequences of outcomes. Our requirements are more modest. Given a game, and a correlated equilibrium of this game, is there a sequence of plays and a deterministic forecasting rule depending only on observed histories that is calibrated? The next theorem provides a positive answer to this question.

Definition: Call a distribution $D(x, y)$ a limit point of calibrated forecasts if there exist deterministic best reply functions $R_i(\cdot)$ and calibrated forecasting rules $p_i$ such that if each player $i$ plays $R_i(p_i)$, then the limiting joint distribution will be $D(x, y)$.

Definition: Let $\Lambda$ be the set of all distributions which are limit points of calibrated forecasts.

Using this notation we can restate Theorem 1 as saying that $\Lambda \subseteq \Pi$. We can represent every game by a vector in $\mathbb{R}^{2mn}$, where each component corresponds to a player's payoff. A set of games is of measure zero if the corresponding set of points in $\mathbb{R}^{2mn}$ has Lebesgue measure zero.
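The matching pennies counterexample above can be checked numerically. The following is a minimal sketch, assuming the L1 calibration score from Section 2 and the alternating tie-breaking rule described in the text; the function name `calibration_score`, the round count, and the variable names are illustrative choices, not part of the paper.

```python
from collections import Counter, defaultdict

def calibration_score(forecasts, plays, strategies):
    """Sum over forecasts p and strategies j of N(p,t)/t * |rho(p,j,t) - p_j|."""
    t = len(plays)
    plays_by_forecast = defaultdict(list)      # p -> opponent plays seen when p was forecast
    for p, y in zip(forecasts, plays):
        plays_by_forecast[tuple(p)].append(y)
    score = 0.0
    for p, ys in plays_by_forecast.items():
        n_p = len(ys)                          # N(p, t)
        for j, strategy in enumerate(strategies):
            rho = ys.count(strategy) / n_p     # rho(p, j, t)
            score += (n_p / t) * abs(rho - p[j])
    return score

rounds = 1000                                  # illustrative horizon
# Tie-breaking rule from the text: heads on even rounds, tails on odd rounds.
row_plays = ["H" if r % 2 == 0 else "T" for r in range(1, rounds + 1)]
col_plays = [a.lower() for a in row_plays]
forecasts = [(0.5, 0.5)] * rounds              # both players always forecast (0.5, 0.5)

row_score = calibration_score(forecasts, col_plays, ["h", "t"])
col_score = calibration_score(forecasts, row_plays, ["H", "T"])

# Empirical joint distribution D_t(x, y).
joint = {xy: c / rounds for xy, c in Counter(zip(row_plays, col_plays)).items()}

def u_col(x, y):
    """Column's payoff: -1 when the pennies match, +1 otherwise."""
    return -1 if x.lower() == y else 1

swap = {"h": "t", "t": "h"}                    # deviation: do the opposite of the recommendation
col_payoff = sum(p * u_col(x, y) for (x, y), p in joint.items())
col_payoff_deviating = sum(p * u_col(x, swap[y]) for (x, y), p in joint.items())
```

Both calibration scores are zero, yet under the empirical joint distribution the column player would gain by always doing the opposite of the recommended action, so the distribution is not a correlated equilibrium.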

Theorem 2. For almost every game, $\Lambda(G) = \Pi(G)$. In other words, for almost every game, the set of distributions which calibrated learning rules can converge to is identical to the set of correlated equilibria.

Proof: Because of Theorem 1 we need only prove that $\Pi \subseteq \Lambda$. Let $(x_j, y_j)$ be a deterministic computable sequence such that the limiting joint distribution is $D(x, y)$. At time $j$, have player 1 forecast
\[ p_{1,j}(\cdot) = \frac{D(x_j, \cdot)}{\sum_{y \in S(2)} D(x_j, y)} \]
and player 2 forecast
\[ p_{2,j}(\cdot) = \frac{D(\cdot, y_j)}{\sum_{x \in S(1)} D(x, y_j)}. \]
By the assumption that the joint distribution converges to $D(x, y)$, it is clear that both of these forecasts are calibrated. Further, because $D$ is a CE, $x_j$ is in fact a best response to the forecast $p_{1,j}(\cdot)$, and $y_j$ to $p_{2,j}(\cdot)$. So, define $R_1(p)$ such that for all $j$, $x_j = R_1(p_{1,j})$, and similarly for $R_2(p)$. These forecasts and these best reply functions are the key idea of the proof. In fact, in the situation where $R_1(\cdot)$ and $R_2(\cdot)$ are both well defined we have completed the proof.

But $R_1(\cdot)$ and $R_2(\cdot)$ might not be well defined. In other words, there might be two different strategies $x'$ and $x''$ such that $x_{j'} = x'$ and $x_{j''} = x''$, yet $p_{1,j'}(\cdot) = p_{1,j''}(\cdot) = p^*$. This is where the "almost every game" condition comes into play.

Almost every game has the property that all the sets $M_b(x)$ have nonempty interior. To see why this is the case, observe that $M_b(x)$ is formed by the intersection of half-spaces. Start with a closed convex set with non-empty interior, $C$ say, and add these half-spaces one at a time. We can choose $C$ to be the simplex of all mixed strategies. Consider a half-space $H$, chosen at random such that the coefficients that define $H$ are continuous with respect to Lebesgue measure. We claim that, with probability one, the intersection of $C$ and $H$ is either the empty set or a set with non-empty interior. Pick a point $p$ in the interior of $C$. Let $q$ be the point in the boundary of $H$ which is closest to $p$. Let $v$ be the ray from $p$ to $q$ and $d$ its length. Both $v$ and $d$ have continuous distributions since they are a continuous transformation of the half-space $H$. Now consider the distribution of $d$ conditional on $v$. Given $v$ there is a unique $d$ such that $H$ will be tangent to $C$ and not contain $C$. The conditional probability of $d$ taking this value is 0. Hence the unconditional probability is zero also.

The interiors of the sets of the form $M_b(x)$ are disjoint. [3] Thus, near the point $p^*$ there are points $p^{x'}$ and $p^{x''}$ such that the unique best response to $p^{x'}$ is $x'$ and the unique best response to $p^{x''}$ is $x''$. Forecasting $p^{x'}$ or $p^{x''}$ instead of $p^*$ makes the reply function well defined. Unfortunately, when the forecast of $p^{x'}$ is made, the actual frequency will turn out to be $p^*$. Thus, the calibration score will be off by at most $|p^{x'} - p^*|$. If we can choose the forecasts $p^{x'}$ to converge to $p^*$, this last problem is solved and our proof is complete.

Define a sequence $p^{x'}_i \to p^*$ by setting, for all $i$, $p^{x'}_i = (1 - 1/i)\, p^* + (1/i)\, p^{x'}$. Then $p^{x'}_i$ converges to $p^*$ and has $x'$ as its unique best reply. Forecast each $p^{x'}_i$ sufficiently many times to ensure that there is a high probability that the empirical distribution is within $1/i$ of $p^*$. With high probability the empirical frequency conditional on forecast $p^{x'}_i$ will be within $2/i$ of $p^{x'}_i$, and hence the calibration score will converge to zero. □

[3] The interiors and the union of the boundaries would form a partition.
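The forecasts used in the proof of Theorem 2 are simply the conditional distributions of the opponent's play given one's own recommendation. The sketch below illustrates this for player 1 on a game of Chicken with a well-known correlated equilibrium; the game, its payoffs, and the helper names `conditional_forecast` and `best_replies` are assumptions chosen for illustration and do not appear in the paper.

```python
from fractions import Fraction as F

def conditional_forecast(D, x, col_strategies):
    """p_1(.) = D(x, .) / sum_y D(x, y): column's play given that row is told to play x."""
    weights = {y: D.get((x, y), F(0)) for y in col_strategies}
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}

def best_replies(forecast, u_row, row_strategies):
    """All row strategies that maximize expected payoff against the forecast."""
    value = {s: sum(q * u_row[s, y] for y, q in forecast.items()) for s in row_strategies}
    top = max(value.values())
    return [s for s in row_strategies if value[s] == top]

# Illustrative game (an assumption, not from the paper): Chicken, row player's payoffs.
strategies = ["C", "D"]
u_row = {("C", "C"): 6, ("C", "D"): 2, ("D", "C"): 7, ("D", "D"): 0}
# A correlated equilibrium of this game: weight 1/3 on each of (C,C), (C,D), (D,C).
D = {("C", "C"): F(1, 3), ("C", "D"): F(1, 3), ("D", "C"): F(1, 3)}

forecast_C = conditional_forecast(D, "C", strategies)   # forecast when row's play is C
forecast_D = conditional_forecast(D, "D", strategies)   # forecast when row's play is D
replies_C = best_replies(forecast_C, u_row, strategies)
replies_D = best_replies(forecast_D, u_row, strategies)
```

Here each recommendation is the unique best reply to its own forecast and the two forecasts differ, so the reply function $R_1$ is well defined without the perturbation argument; that argument is needed only when two different recommendations share the same forecast.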

To see why Theorem 2 only holds for almost every game and not every game, consider the following game:

Example of $\Lambda \ne \Pi$
         1        2        3
A      2, 2     0, 3     0, 1
B      2, 2     0, 1     0, 3
C      2, 0     1, 1     1, 0

If ROW randomizes between A and B (with equal probability) and COL plays 1, then this is a correlated equilibrium with a payoff of (2,2). But the only point in $\Lambda$ is the distribution which puts all its weight on the point (C,2), which yields a payoff of (1,1). This is because $M_b(A) = M_b(B) = \{(1, 0, 0)\}$ and $M_b(C)$ is the entire simplex. So, if $R_{ROW}((1, 0, 0)) = A$, then ROW will never play strategy B, and likewise if $R_{ROW}((1, 0, 0)) = B$, then ROW will never play A. So, a mixture of A and B is impossible and thus the payoff (2,2) is impossible. Thus, $\Lambda \ne \Pi$.

Can Theorem 1 be strengthened such that convergence to Nash equilibrium is assured instead of to a CE? The previous theorem shows that if one assumes only calibration, one gets any CE in $\Pi$. So, without further assumptions on the forecasting rule, convergence to Nash cannot be assured. In particular, adding an assumption that the limit exists does not refine the equilibrium attained (in contrast with Fudenberg and Kreps, who show that if a limit exists, it must be Nash). This is because Theorem 2 does not just find an accumulation point; it finds a direct limit.

Is it easy to construct a forecast that is calibrated? Given the impossibility theorem of Oakes (1985), the existence of a deterministic scheme that is calibrated for all sequences is ruled out. However, a randomized forecasting scheme is possible.

Theorem 3 (Foster and Vohra 1991). There exists a randomized forecast that player 1 can use such that no matter what learning rule player 2 uses, player 1 will be calibrated. That is to say, player 1's calibration score
\[ C_t \equiv \sum_{p} \sum_{j \in S(2)} \frac{N(p, t)}{t}\, \big|\rho(p, j, t) - p_j\big| \tag{1} \]
converges to zero in probability. In other words, for all $\epsilon > 0$ we have that
\[ \lim_{t \to \infty} P(C_t < \epsilon) = 1. \]

Proof: See the appendix.

The important thing to notice about Theorem 3 is that each player can individually choose to be calibrated. The other player cannot foil this choice. Player 1 does not have to assume that player 2 is using an exchangeable sequence, nor that player 2 is rational. Player 1 is still calibrated if player 2 plays any arbitrary sequence. Secondly, the proof is constructive, i.e., there is an explicit algorithm for producing such a forecast. [4] To extend this result to the n-person case the forecasting rules must predict the joint distribution of what everyone else will play.

[4] The most involved step is inverting a matrix.

If in Theorem 1 we require only that the players use a forecasting rule that is close to calibrated in the sense of Theorem 3, we obtain:

Corollary. There exists a randomized forecasting scheme such that if both player 1 and player 2 follow this scheme, then for any normal form matrix game and for all $\epsilon > 0$, there exists a $t_0 > 0$ such that for all $t > t_0$,
\[ P\Big(\min_{D \in \Pi}\ \max_{x, y} \big|D_t(x, y) - D(x, y)\big| < \epsilon\Big) > 1 - \epsilon. \]

In other words, $D_t$ converges in probability to the set $\Pi$ under the Hausdorff topology.

4 The Shapley Game and Fictitious Play

The most famous of learning rules for games is called Fictitious Play (FP), first conceived in 1949 by George Brown. In a two person game it goes as follows:

Definition of Fictitious Play: Row computes the proportion of times up to the present that Column has played each of his/her strategies. Then, Row treats these proportions as the probabilities that Column will select from among his/her strategies. Row then selects the strategy that is his/her best response. Column does likewise.

In 1951 Julia Robinson proved that FP converges to a NE in 2-person zero sum games. After the Robinson paper, interest naturally turned to trying to generalize Robinson's theorem to non-zero sum games. In 1961, K. Miyasawa proved that FP converges to a NE in 2-person non-zero sum games where each player has at most two strategies. [5] However, in 1964 Lloyd Shapley dashed hopes of a generalization by describing a non-zero sum game with three strategies for each player in which FP did not converge to a NE. In this section we show that FP doesn't converge to a correlated equilibrium. We use Shapley's original example:

[5] See Monderer and Shapley (1993) for other situations in which FP converges.

Payoff Matrix for Shapley Game
         1        2        3
1      1, 0     0, 1     0, 0
2      0, 0     1, 0     0, 1
3      0, 1     0, 0     1, 0

As observed by Shapley, FP in this game will oscillate between 6 states: (1,1), then (1,2), then (2,2), (2,3), (3,3), (3,1), then repeat. Fictitious play stays longer and longer in each state, so the periods of oscillation get larger and larger. There is only one correlated equilibrium with support on these six states. [6] It assigns probability 1/6 to each state. Fictitious play is never close to this distribution. [7] Thus, it does not converge to a CE.

[6] Using Nau and McCardle (1990), the following linear program produces all the CE:
\[ p_{11} \ge p_{12} \ge p_{22} \ge p_{23} \ge p_{33} \ge p_{31} \ge p_{11}; \quad p_{13} \le p_{11};\ p_{13} \le p_{23};\ p_{21} \le p_{22};\ p_{21} \le p_{31};\ p_{32} \le p_{33};\ p_{32} \le p_{12}, \]
which is equivalent to the LP
\[ p_{11} = p_{12} = p_{22} = p_{23} = p_{33} = p_{31}; \quad p_{13} \le p_{11};\ p_{21} \le p_{11};\ p_{32} \le p_{11}. \]
Adding the constraint that $p_{13} = p_{21} = p_{32} = 0$, this LP has the unique solution $p_{11} = p_{12} = p_{22} = p_{23} = p_{33} = p_{31} = 1/6$.

[7] This can be seen either by direct calculation, or by the following trick. If fictitious play were ever close to this CE, then the marginals would have to be close to (1/3, 1/3, 1/3). But these marginals correspond to the Nash equilibrium. Shapley created this example precisely to show that the marginals don't converge to the marginals of the Nash equilibrium; in fact, the marginals are bounded away from the (1/3, 1/3, 1/3) point. Thus the Nash equilibrium is not an accumulation point of the sequence of plays. Thus, we know that the marginals are never close to being correct, and thus the joint distribution is also never close.
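The claim that the uniform distribution on the six cycle states is a correlated equilibrium of the Shapley game can be verified directly from the payoff matrix. The sketch below is one way to do so, assuming the payoffs as reconstructed above (row earns 1 on the diagonal, column earns 1 one step ahead in the cycle); the helper names `max_deviation_gain` and `expected_payoffs` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

S = [1, 2, 3]
# Shapley game payoffs: row earns 1 on the diagonal, column earns 1 on (1,2), (2,3), (3,1).
u_row = {(x, y): int(x == y) for x, y in product(S, S)}
u_col = {(x, y): int(y == x % 3 + 1) for x, y in product(S, S)}

# Uniform distribution over the six states visited by fictitious play.
cycle = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 1)]
D = {xy: (F(1, 6) if xy in cycle else F(0)) for xy in product(S, S)}

def max_deviation_gain(D):
    """Largest gain from replacing a recommendation `rec` by `dev`; a CE has gain <= 0."""
    gains = []
    for rec, dev in product(S, S):
        gains.append(sum(D[rec, y] * (u_row[dev, y] - u_row[rec, y]) for y in S))
        gains.append(sum(D[x, rec] * (u_col[x, dev] - u_col[x, rec]) for x in S))
    return max(gains)

def expected_payoffs(D):
    """Expected (row, column) payoffs under the joint distribution D."""
    return (sum(D[xy] * u_row[xy] for xy in D), sum(D[xy] * u_col[xy] for xy in D))

ce_gain = max_deviation_gain(D)
ce_payoffs = expected_payoffs(D)

# Product of the Nash marginals (1/3, 1/3, 1/3) for comparison.
nash = {xy: F(1, 9) for xy in product(S, S)}
nash_payoffs = expected_payoffs(nash)
```

No deviation yields a positive gain, so the distribution satisfies the correlated equilibrium inequalities, and its expected payoff of (1/2, 1/2) exceeds the payoff of (1/3, 1/3) under the product of the Nash marginals.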

The Shapley game is interesting because it has a CE which is not a mixture of Nash equilibria. [8] Theorem 3 tells us that there are calibrated learning rules which will then converge to this CE. The expected payoff is (1/2, 1/2), which Pareto dominates the Nash payoff of (1/3, 1/3).

[8] The unique Nash equilibrium for this game is (1/3, 1/3, 1/3) vs (1/3, 1/3, 1/3). So, any CE which isn't Nash is also not a mixture of Nash equilibria.

Postscript

Earlier versions of this paper, as well as presentations of the results at various conferences, have generated a good deal of follow-on papers on calibration and its connections to game theory. In this section, we give a brief description of some of this work.

Theorem 3, which establishes the existence of a randomized forecasting scheme that is calibrated, has prompted a number of alternative proofs. The first of these was due to Sergiu Hart (personal communication) and is particularly simple and short. It makes use of the min-max theorem. The drawback is that the scheme implied by the method is impractical to implement. Independently, Fudenberg and Levine (1995) also gave a proof using the min-max theorem. The approach is more elaborate than Hart's but produces a forecasting scheme that is practical to implement. In a follow up paper Fudenberg and Levine (1996) consider a refinement of the calibration idea that involves the classification of observations into various categories. For this refinement they derive a procedure that yields almost as high a time-average payoff as could be obtained if the player chose knowing the conditional distribution of actions given categories. If players use such a procedure, in the long run the time average play resembles a correlated equilibrium.

Our own proof of Theorem 3 (which is described in the appendix) is based on establishing the existence of a forecasting scheme that has a property called no-regret. A proof along the lines of Theorem 1 shows that a no-regret procedure would also lead to a correlated equilibrium. Hart and Mas-Colell (1996) have extended this idea in many ways. First, they provided a very elegant proof of no-regret based on Blackwell's (195?) vector min-max theorem. Second, they modify this scheme, which requires a matrix inversion, to one that involves regret-matching. This greatly reduces the computations required to implement the procedure. The simplified procedure no longer has the no-regret property but it will converge to a correlated equilibrium. Their theorem is much harder to prove since they can't simply appeal to a no-regret/calibration property as we have done.

Kalai, Lehrer and Smorodinsky (1996) have recently shown that the notion of calibration is mathematically equivalent to that of merging. This allows one to establish relationships between convergence results based on merging and those based on calibration and so derive some new convergence results.

Appendix

This appendix provides a telegraphic proof of Theorem 3. For more details see Foster and Vohra (1991).

We will first prove a property called "no-regret." Consider $k$ forecasts, each with a loss or penalty at time $t$ of $0 \le L^i_t \le 1$ for $i = 1, \ldots, k$. Now consider a randomized forecast which picks forecast $i$ at time $t$ with probability $w^i_t$. We define the loss from using the combined forecast to be the weighted sum of the losses of each forecast, namely, $\sum_{i=1}^{k} w^i_t L^i_t$.

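Definition: The regret generated by changing all $i$ forecasts to $j$ forecasts is
\[ R^{i \to j}_T \equiv \max\{0, S^{ij}_T\} = S^{ij}_T\, I_{S^{ij}_T > 0}, \]
where $I_{x > 0}$ is the indicator function and
\[ S^{ij}_T \equiv \sum_{t=1}^{T} w^i_t \big(L^i_t - L^j_t\big). \]

We choose the probability vector $w_t$ so that it satisfies the following flow conservation equations:
\[ (\forall i) \qquad w^i_t \sum_{j=1}^{k} R^{i \to j}_{t-1} = \sum_{j=1}^{k} w^j_t\, R^{j \to i}_{t-1}. \]
The duality theorem of linear programming can be used to establish the existence of a non-negative solution $w_t$ to this system such that $\sum_{i=1}^{k} w^i_t = 1$.

Lemma 1 (No-regret). For all $i$ and $j$ the regret grows as the square root of $T$. In particular, $R^{i \to j}_T \le \sqrt{2kT}$.

Proof: Let $G_\delta(x) \equiv \frac{\delta x^2}{2} I_{x > 0}$. Since $x \le \frac{1}{2\delta} + G_\delta(x)$, we see that
\[ R^{i \to j}_T \le \frac{1}{2\delta} + G_\delta\big(S^{ij}_T\big) \le \frac{1}{2\delta} + \sum_{i,j} G_\delta\big(S^{ij}_T\big). \]
Now $G'_\delta(x) = \delta x\, I_{x > 0}$ and so
\begin{align*}
\sum_{i,j} \big(S^{ij}_t - S^{ij}_{t-1}\big)\, G'_\delta\big(S^{ij}_{t-1}\big)
&= \delta \sum_{i,j} w^i_t \big(L^i_t - L^j_t\big) \big(S^{ij}_{t-1} I_{S^{ij}_{t-1} > 0}\big) \\
&= \delta \sum_{i} L^i_t \underbrace{\Big( w^i_t \sum_{j} R^{i \to j}_{t-1} - \sum_{j} w^j_t R^{j \to i}_{t-1} \Big)}_{=\,0 \text{ by flow conservation}} \\
&= 0.
\end{align*}
Expanding $G_\delta(S^{ij}_t)$ as a two term Taylor series around $S^{ij}_{t-1}$ shows
\begin{align*}
\sum_{i,j} G_\delta\big(S^{ij}_t\big)
&\le \sum_{i,j} G_\delta\big(S^{ij}_{t-1}\big) + \sum_{i,j} \big(S^{ij}_t - S^{ij}_{t-1}\big)\, G'_\delta\big(S^{ij}_{t-1}\big) + \delta \sum_{i,j} (w^i_t)^2 \big(L^i_t - L^j_t\big)^2 \\
&\le \sum_{i,j} G_\delta\big(S^{ij}_{t-1}\big) + \delta k \sum_{i} (w^i_t)^2 \\
&\le \sum_{i,j} G_\delta\big(S^{ij}_{t-1}\big) + \delta k.
\end{align*}
Computing the recursive sum we see that $\sum_{i,j} G_\delta(S^{ij}_T) \le \delta T k$ and so $R^{i \to j}_T \le \frac{1}{2\delta} + \delta T k$. Picking $\delta = 1/\sqrt{2kT}$ shows $R^{i \to j}_T \le \sqrt{2Tk}$. □

We will now show that for a suitable loss function, a randomized forecast that has no regret must also be calibrated. First, our forecasting scheme will choose in each round a probability vector from the set $\{p^i \mid i = 0, 1, \ldots, k\}$, which is chosen so that any probability distribution over $S(2)$ (the opponent's strategies) is within $\epsilon$ of one of these points. We denote the move made by player 2 in round $t$ by the vector $\chi_t = [\chi_{t,1}, \chi_{t,2}, \chi_{t,3}, \ldots]$, where $\chi_{t,j} = 1$ if strategy $j \in S(2)$ was chosen and zero otherwise. Notice that $\chi_t$ will be a 0-1 vector with exactly one non-zero component. Next, the loss incurred in round $t$ from forecasting $p^i$ will be
\[ L^i_t = |\chi_t - p^i|^2 = \sum_{j \in S(2)} |\chi_{t,j} - p^i_j|^2. \]
The probability of forecasting $p^i$ at time $t$ will be $w^i_t$. We would like to choose the $w^i_t$'s so that the L-2 calibration $C_2(t)$ goes to zero in probability as $t$ gets large, where
\[ C_2(t) = \sum_{p} \sum_{j \in S(2)} \big(\rho(p, j, t) - p_j\big)^2\, \frac{N(p, t)}{t}. \]
The expected value of $C_2(t)$ is given by:
\[ E(C_2(t)) = \sum_{s=1}^{t} \sum_{i=1}^{k} \sum_{j \in S(2)} w^i_t \big(\rho_t(p^i, j, s) - p^i_j\big)^2 / s. \]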

Simple algebra yields
\[ \max_{i,j} R^{i \to j}_t / t \;\le\; E(C_2(t)) \;\le\; \epsilon + \max_{i,j} R^{i \to j}_t / t. \]
If the probabilities $w^i_t$ are chosen to satisfy the flow conservation equations displayed earlier, we deduce that
\[ E(C_2(t)) \le \epsilon + O\Big(\frac{k}{\sqrt{t}}\Big). \]
Thus if we let $k$ grow slowly and $\epsilon$ go slowly to zero, we see that $C_2(t) \to 0$ in expectation, which implies $C_2(t) \to 0$ in probability by Jensen's inequality. The L-1 calibration definition of equation (1) follows from the fact that it is smaller than the square root of the L-2 calibration. Thus we have proved Theorem 3.

REFERENCES

Aumann, R. J., "Subjectivity and Correlation in Randomized Strategies", Journal of Mathematical Economics, 1, 67-96, 1974.

Aumann, R. J., "Correlated Equilibrium as an Expression of Bayesian Rationality", Econometrica, 55, #1, 1-18, 1987.

Blackwell, D., "A vector valued analog of the min-max theorem," Pacific J. Math. 6, 1-8.

Dawid, A. P., "The Well Calibrated Bayesian", Journal of the American Statistical Association, 77, #379, 1982.

Foster, D. P., and R. Vohra, "Asymptotic Calibration", unpublished manuscript, 1991.

Foster, D. P., and H. P. Young, "Stochastic Evolutionary Game Dynamics," Theoretical Population Biology, 38.

Fudenberg, D. and D. Kreps, "Experimentation, Learning, and Equilibrium in Extensive Form Games", unpublished notes, 1991.

Fudenberg, D. and D. Levine, "An easier way to Calibrate", unpublished manuscript, 1995.

Fudenberg, D. and D. Levine, "Conditional Universal Consistency", unpublished manuscript, 1996.

Hart, S. and A. Mas-Colell, "A Simple Adaptive Procedure Leading to Correlated Equilibrium," unpublished manuscript, 1996.

Kalai, E. and E. Lehrer, "Subjective Games and Equilibria", unpublished manuscript, 1994.

Kalai, E., E. Lehrer and R. Smorodinsky, "Calibrated Forecasting and Merging", manuscript, 1996.

Kreps, D., "Game Theory and Economic Modeling," Oxford University Press, Oxford, 1991a.

Kreps, D., Seminar on Learning in Games, University of Chicago, 1991b.

Milgrom, P. and J. Roberts, "Adaptive and Sophisticated Learning in Normal Form Games", Games and Economic Behavior, 3, 1991.

Monderer, D. and L. Shapley, "Fictitious Play Property for Games with Identical Interests," working paper, Department of Economics, UCLA, 1993.

Miyasawa, K., "On the Convergence of the Learning Process in a 2 x 2 Non-zero-sum Two-person Game," Economic Research Program, Princeton University, Research Memorandum #33 (1961).

Nau, R. F. and K. McCardle, "Coherent Behavior in Non-cooperative Games," Journal of Economic Theory, 50, #2, 1990.

Oakes, D., "Self-Calibrating Priors Do Not Exist", Journal of the American Statistical Association, 80, 339, 1985.

Robinson, J., "An Iterative Method of Solving a Game", Annals of Mathematics, 54, 1951.

Samuelson, L. and J. Zhang, "Evolutionary Stability in Asymmetric Games", Journal of Economic Theory, 57, 1992.

Shapley, L., "Some Topics in Two-Person Games," Advances in Game Theory, Princeton University Press, Princeton, 1964.

Skyrms, B., The Dynamics of Rational Deliberation, Harvard University Press, 1990.
