Perfect Versus Imperfect Monitoring in Repeated Games


Takuo Sugaya and Alexander Wolitzky
Stanford Graduate School of Business and Stanford University

April 5, 2014

* For helpful comments, we thank Michihiro Kandori, Hitoshi Matsushima, Stephen Morris, Tadashi Sekiguchi, and Andy Skrzypacz.

Abstract

We show that perfect monitoring is not necessarily the optimal monitoring technology in repeated games, unless players are both patient and have access to a mediator who can condition her recommendations on the entire history of actions and recommendations. This claim follows from three results. First, players can be better off with unmediated imperfect (public or private) monitoring than with mediated perfect monitoring. Second, the folk theorem holds with mediated perfect monitoring without any full-dimensionality conditions, so imperfect monitoring cannot improve on mediated perfect monitoring when players are patient. Third, if the mediator can condition her recommendations on actions only, then even patient players can benefit from imperfect monitoring.

1 Introduction

Intertemporal incentives can only be provided when actions are monitored. The purpose of this note is to examine whether it is always the case that the strongest possible intertemporal incentives can be provided when monitoring is perfect, or alternatively if individuals can

sometimes benefit from imperfections in monitoring. While our contribution is purely conceptual, this question seems potentially relevant for understanding the role of information-sharing institutions in settings where repeated game modeling is prevalent, such as long-distance trade (Milgrom, North, and Weingast, 1990; Greif, Milgrom, and Weingast, 1994), collusion (Stigler, 1964; Harrington and Skrzypacz, 2011), organizational economics (Baker, Gibbons, and Murphy, 2002; Levin, 2003), and contemporary international trade (Maggi, 1999; Bagwell and Staiger, 2004). In particular, we show that perfect monitoring is not necessarily the optimal monitoring technology in repeated game settings like these, unless players are both patient and have access to a mediator.

An obvious way in which imperfect monitoring can help players in a repeated game is that it can give them a way to correlate their actions. We rule out this effect by assuming that with perfect monitoring the players have access to a mediator, who stands in for the players' ability to observe the outcomes of arbitrarily complicated correlating devices (Forges, 1986; Myerson, 1986). The comparison in this note is thus between perfect monitoring with a mediator and unmediated imperfect monitoring. We remain agnostic as to whether or not the plentiful opportunities for correlation captured by the mediator are realistic; we allow a mediator simply to compare perfect and imperfect monitoring on a level playing field.

We present three results. First, players can be better off with imperfect (public or private) monitoring than with perfect monitoring, even with a general dynamic mediator who can condition recommendations on the full history of actions and recommendations. Second, the folk theorem holds with perfect monitoring and a dynamic mediator without any full-dimensionality conditions, so patient players cannot do (much) better with imperfect monitoring than with perfect monitoring and a dynamic mediator (and the rate of convergence is $1-\delta$, so players need only be moderately patient). Third, even patient players can do better with imperfect monitoring than with perfect monitoring and a Markov mediator who can condition recommendations only on actions (the Markov mediator is so called because it can be replaced each period by a new mediator who knows only the publicly observable history of actions). Taken together, our results show that perfect monitoring is not always the optimal monitoring technology in repeated games, but that it is approximately optimal when players are patient and can access a sufficiently powerful mediator.

An important feature of our results is that we do not restrict attention to equilibria in public strategies. Indeed, Kandori (1992) shows that for such equilibria players can only benefit from more precise public monitoring, so our result that players can benefit from imperfect public monitoring necessarily involves private strategies. In addition, the key reason why even patient players can be better off with imperfect monitoring than with perfect monitoring and a Markov mediator is that in the latter case all strategies are public. The role of private strategies in our results is also quite different from that in Kandori and Obara (2006a), where private strategies are used to improve monitoring.

One way of interpreting our results is in the light of previous attempts to make predictions in repeated games that are robust to the information structure.¹ A common approach in the literature is to derive a recursive upper bound on the equilibrium payoff set in an arbitrary repeated game with private monitoring by adding opportunities for communication or correlation to these games (Renault and Tomala, 2004; Tomala, 2009; Cherry and Smith, 2011). Our first result implies that even the equilibrium payoff set with perfect monitoring and a dynamic mediator does not provide such a bound in general. However, a tighter bound may be possible for patient players or for specific classes of games.²

Finally, we note that examples by Kandori (1991), Sekiguchi (2002), Mailath, Matthews, and Sekiguchi (2002), and Miyahara and Sekiguchi (2013) show that players can benefit from imperfect monitoring in finitely repeated games. However, in their examples this conclusion relies on the absence of a dynamic mediator, and is thus ultimately traceable to the possibilities for correlation opened up by private monitoring. We illustrate this in Section 6 below. The broader point that players can benefit from less monitoring of each other's actions in games has also been made in various settings, including by Myerson (1991, Section 6.7), Bagwell (1995), Kandori and Obara (2006b), and Fuchs (2007); the basic idea that information can be bad for incentives has been known at least since Hirshleifer (1971).

¹ This exercise is similar to that performed by Bergemann and Morris (2012) in the context of mechanism design, and by Lehrer, Rosenberg, and Shmaya (2010, 2013) and Bergemann and Morris (2013a,b) in the context of one-shot games.
² We also show in a separate note (available from the authors) that, in repeated games with perfect monitoring and dynamic mediation, there does exist a recursive characterization of equilibrium payoffs.

2 Model

The stage game is $G = (I, (A_i)_{i \in I}, (u_i)_{i \in I})$, where $I = \{1, \ldots, n\}$ is the set of players, $A_i$ is the finite set of player $i$'s actions, and $u_i : A \to \mathbb{R}$ is player $i$'s payoff function. Players have common discount factor $\delta$. The solution concept is sequential equilibrium. We compare the following information structures.

2.1 Mediated Perfect Monitoring

In period $t = 1, 2, \ldots$, the game proceeds as follows.

1. Given the mediator's history $h_m^t = (m_\tau, a_\tau)_{\tau=1}^{t-1}$, with $h_m^1 = \emptyset$, the mediator sends a private message $m_{i,t} \in M_i$ to each player $i$, where $M_i$ is an arbitrary finite set. Let $H_m^t$ be the set of the mediator's period $t$ histories.

2. Given player $i$'s history $h_i^t = ((m_{i,\tau}, a_\tau)_{\tau=1}^{t-1}, m_{i,t})$, with $h_i^1 = m_{i,1}$, player $i$ takes an action $a_{i,t} \in A_i$. All players and the mediator observe the action profile $a_t \in A$. Let $H_i^t$ be the set of player $i$'s period $t$ histories.

A strategy of the mediator's is a map $\mu_m : \bigcup_{t=1}^{\infty} H_m^t \to \Delta(M)$. The mediator is Markov if $\mu_m(h_m^t)$ depends only on $(a_\tau)_{\tau=1}^{t-1}$ for all $h_m^t$. A strategy of player $i$'s is a map $\sigma_i : \bigcup_{t=1}^{\infty} H_i^t \to \Delta(A_i)$.

Note that the definition of sequential equilibrium depends on whether we treat the mediator as a machine who cannot tremble or as a player who can. In the machine interpretation, sequential rationality is imposed (and beliefs are defined) only at histories consistent with the mediator's strategy. In the player interpretation, sequential rationality is imposed everywhere, and the mediator is treated like any other player in the definition of sequential equilibrium. We adopt the machine interpretation throughout, but all our results also hold with the player interpretation; the only change would be a slight alteration to the proof of Theorem 2.
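To fix ideas, the timing of the mediated perfect-monitoring game can be sketched in code. The following is a minimal simulation of the protocol above; it is our illustration, not part of the paper, and the two-player action sets and the "always obey" strategies are placeholder assumptions.

```python
import random

# A minimal sketch (ours, not the paper's) of one period of the mediated
# perfect-monitoring protocol: the mediator privately messages each player,
# players act, and the realized action profile becomes public.

ACTIONS = {1: ["U", "D"], 2: ["L", "M", "R"]}  # placeholder action sets

def play_period(mediator_history, player_histories, mediator_strategy, player_strategy):
    # 1. The mediator sends a private message m_{i,t} to each player i,
    #    as a function of h_m^t = (m_tau, a_tau)_{tau < t}.
    messages = mediator_strategy(mediator_history)
    # 2. Player i acts given h_i^t = ((m_{i,tau}, a_tau)_{tau < t}, m_{i,t}).
    actions = {i: player_strategy(player_histories[i], messages[i]) for i in ACTIONS}
    # 3. Perfect monitoring: everyone (including the mediator) observes a_t,
    #    but each player ever sees only her own messages.
    mediator_history.append((messages, actions))
    for i in ACTIONS:
        player_histories[i].append((messages[i], actions))
    return actions

uniform_mediator = lambda h_m: {i: random.choice(ACTIONS[i]) for i in ACTIONS}
obedient = lambda h_i, message: message  # obey the recommendation

h_m, h = [], {1: [], 2: []}
for t in range(3):
    print(play_period(h_m, h, uniform_mediator, obedient))
```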

2.2 Private Monitoring

In period $t = 1, 2, \ldots$, the game proceeds as follows. Given player $i$'s history $h_i^t = (a_{i,\tau}, z_{i,\tau})_{\tau=1}^{t-1}$, with $h_i^1 = \emptyset$, player $i$ takes an action $a_{i,t} \in A_i$. A signal $z_t = (z_{i,t})_{i \in I} \in Z$ is drawn from distribution $p(z \mid a)$. Player $i$ observes $z_{i,t}$. Again, $H_i^t$ is the set of player $i$'s period $t$ histories, and a strategy of player $i$'s is a map $\sigma_i : \bigcup_{t=1}^{\infty} H_i^t \to \Delta(A_i)$.

Note that player $i$ may not observe her payoffs each period. The standard interpretation is that $\delta$ represents the probability of the game's continuing, and that payoffs are received when the game ends. This assumption is made purely for ease of exposition. In particular, both of our private monitoring examples may be easily modified so that players receive payoffs each period (i.e., player $i$'s realized payoff is a deterministic function of $a_i$ and $z_i$, and $u_i(a)$ is the ex ante expected payoff).

3 Private Monitoring Versus Perfect Monitoring with Dynamic Mediation

Our first example shows that private monitoring (without mediation) can outperform perfect monitoring with a dynamic mediator. At the end of this section, we point out how essentially the same example shows that imperfect public monitoring (with private strategies) can also outperform mediated perfect monitoring.

Player 1 (row player, "she") and player 2 (column player, "he") play a repeated game with common discount factor $\delta = \frac{1}{6}$. The stage game is as follows.

          L         M         R
   U    2, 2     -1, 0     -1, 0
   D    3, 0      0, 0      0, 0
   T    0, 3      6, -3    -6, -3
   B    0, -3     0, 3      0, 3
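As a quick mechanical check, the following sketch (our encoding, with exact payoffs as reconstructed from the incentive calculations below) verifies the fact about this matrix on which the proof of Theorem 1 turns: only (U, L) has payoffs summing to more than 3.

```python
# (u1, u2) for each action profile of the Section 3 stage game; rows
# (U, D, T, B) are player 1's actions, columns (L, M, R) player 2's.
PAYOFFS = {
    ("U", "L"): (2, 2),  ("U", "M"): (-1, 0), ("U", "R"): (-1, 0),
    ("D", "L"): (3, 0),  ("D", "M"): (0, 0),  ("D", "R"): (0, 0),
    ("T", "L"): (0, 3),  ("T", "M"): (6, -3), ("T", "R"): (-6, -3),
    ("B", "L"): (0, -3), ("B", "M"): (0, 3),  ("B", "R"): (0, 3),
}

# Every profile other than (U, L) has payoffs summing to at most 3,
# while (U, L) itself sums to 4 (used in Section 3.1).
assert max(sum(v) for a, v in PAYOFFS.items() if a != ("U", "L")) == 3
assert sum(PAYOFFS[("U", "L")]) == 4
```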

Our first result is the following. The next two subsections provide the proof.

Theorem 1 In this game, there is no sequential equilibrium where the players' per-period payoffs sum to more than 3 with perfect monitoring and a dynamic mediator, while there is such a sequential equilibrium for some private monitoring technology.

In fact, the same is true for Nash equilibrium. This is a second point of contrast with the examples of Kandori (1991), Sekiguchi (2002), Mailath, Matthews, and Sekiguchi (2002), and Miyahara and Sekiguchi (2013), which work only for sequential equilibrium.

The intuition for Theorem 1 is that player 1 can be induced to play U in response to L only if action profile (U, L) is immediately followed by (T, M) with high probability. With perfect monitoring, player 2 always sees (T, M) coming after (U, L), and will therefore deviate to L. With private monitoring, player 2 may not know whether (U, L) has just occurred, and therefore may be unsure of whether the next action profile will be (T, M) or (B, M), which gives him the necessary incentive to play M.

3.1 Perfect Monitoring with Dynamic Mediation

As the players' stage game payoffs from any profile other than (U, L) sum to at most 3, the players' per-period payoffs may sum to more than 3 only if (U, L) is played in some period $t$ with positive probability. For this to occur in equilibrium, player 1's expected continuation payoff from playing U must exceed her expected continuation payoff from playing D by more than 1, her instantaneous gain from playing D rather than U. In addition, player 1 can guarantee herself a continuation payoff of 0 by always playing D, so her expected continuation payoff from playing U must exceed 1. This is possible only if the probability that (T, M) is played in period $t+1$ when U is played in period $t$ exceeds the number $p$ such that

$$\frac{1}{6}\left[p(6) + (1-p)(3)\right] + \underbrace{\left(\frac{1}{6}\right)^{2}\left(1 + \frac{1}{6} + \left(\frac{1}{6}\right)^{2} + \cdots\right)(6)}_{=\,1/5} = 1,$$

or $p = \frac{3}{5}$. In particular, there must exist a period $t+1$ history $h_2^{t+1}$ of player 2's such that (T, M) is played with probability at least $\frac{3}{5}$ in period $t+1$ conditional on reaching $h_2^{t+1}$.

But, at such a history, player 2's payoff from playing M is at most

$$\frac{3}{5}(-3) + \frac{2}{5}(3) + \frac{1}{5}(3) = 0,$$

while, noting that player 2 can guarantee himself a continuation payoff of 0 by playing $\frac{1}{2}L + \frac{1}{2}M$, player 2's payoff from playing L is at least

$$\frac{3}{5}(3) + \frac{2}{5}(-3) + \frac{1}{5}(0) = \frac{3}{5}.$$

Therefore, player 2 will deviate at such a history, so no such equilibrium can exist.

3.2 Private Monitoring

Consider the following imperfect private monitoring structure. Player 1 always observes an uninformative signal $\emptyset$. There are two possible private signals for player 2, $m$ and $r$. Conditional on any action profile in which player 1 plays U, signal $m$ obtains with probability 1. Conditional on any other action profile, signals $m$ and $r$ obtain with probability $\frac{1}{2}$ each.

We now describe a strategy profile under which the players' payoffs sum to $\frac{23}{7} \approx 3.29$.

Player 1's strategy: In each odd period $t = 2n+1$ with $n = 0, 1, \ldots$, player 1 plays $\frac{1}{3}U + \frac{2}{3}D$. Let $a_1(n)$ be the realization of this mixture. In the even period $t = 2n+2$, if the previous action $a_1(n) = U$, then player 1 plays T; if the previous action $a_1(n) = D$, then player 1 plays B.

Player 2's strategy: In each odd period $t = 2n+1$ with $n = 0, 1, \ldots$, player 2 plays L. Let $y_2(n)$ be the realization of player 2's private signal. In the even period $t = 2n+2$, if the previous private signal $y_2(n) = m$, then player 2 plays M; if the previous signal $y_2(n) = r$, then player 2 plays R.

We check that this strategy profile, together with any consistent belief system, is a sequential equilibrium. In an odd period, player 1's payoff from U is the solution to

$$v = 2 + \frac{1}{6}(6) + \left(\frac{1}{6}\right)^{2} v.$$

On the other hand, her payoff from D is

$$3 + \frac{1}{6}(0) + \left(\frac{1}{6}\right)^{2} v.$$

Hence, player 1 is indifferent between U and D (and clearly prefers either of these actions to T or B). In addition, playing L is a myopic best response for player 2, player 1's continuation play is independent of player 2's action, and the distribution of player 2's signal is independent of player 2's action. Hence, playing L is optimal for player 2.

Next, in an even period, it suffices to check that both players always play myopic best responses, as in even periods continuation play is independent of realized actions and signals. If player 1's last action was $a_1(n) = U$, then she believes that player 2's signal is $y_2(n) = m$ with probability 1 and thus that he will play M. Hence, playing T is optimal. If instead player 1's last action was $a_1(n) = D$, then she believes that player 2's signal is equal to $m$ and $r$ with probability $\frac{1}{2}$ each, and thus that he will play $\frac{1}{2}M + \frac{1}{2}R$. Hence, both T and B are optimal. On the other hand, if player 2 observes signal $y_2(n) = m$, then his posterior belief that player 1's last action $a_1(n) = U$ is

$$\frac{\frac{1}{3}(1)}{\frac{1}{3}(1) + \frac{2}{3}\left(\frac{1}{2}\right)} = \frac{1}{2}.$$

Hence, player 2 is indifferent among all of his actions.³ If player 2 observes $y_2(n) = r$, then his posterior is that $a_1(n) = D$ with probability 1, so that M and R are optimal.

Finally, expected payoffs under this strategy profile in odd periods sum to $\frac{1}{3}(4) + \frac{2}{3}(3) = \frac{10}{3}$, and in even periods sum to 3. Therefore, per-period expected payoffs sum to

$$\left(1 - \frac{1}{6}\right)\left[\frac{10}{3} + \frac{1}{6}(3) + \left(\frac{1}{6}\right)^{2}\frac{10}{3} + \left(\frac{1}{6}\right)^{3}(3) + \cdots\right] = \frac{23}{7}.$$

³ These indifferences result because we have chosen payoffs to make the example as simple as possible. The example is generic.
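The arithmetic behind these indifference and payoff claims is easy to verify mechanically. The following sketch (ours, using exact rational arithmetic) checks the odd-period indifference, player 2's posterior after $m$, and the total per-period payoff of $\frac{23}{7}$.

```python
from fractions import Fraction as F

d = F(1, 6)  # the discount factor delta = 1/6

# Player 1's odd-period value from U solves v = 2 + d*6 + d^2*v, and
# playing D must give the same value: 3 + d*0 + d^2*v.
v = (2 + d * 6) / (1 - d**2)
assert v == 3 + d * 0 + d**2 * v  # indifference between U and D

# Player 2's posterior that a_1(n) = U after observing signal m.
posterior_U = (F(1, 3) * 1) / (F(1, 3) * 1 + F(2, 3) * F(1, 2))
assert posterior_U == F(1, 2)

# Per-period payoff sums alternate 10/3 (odd periods) and 3 (even periods),
# so the normalized total is (10/3 + 3*d) / (1 + d) = 23/7.
total = (F(10, 3) + 3 * d) / (1 + d)
assert total == F(23, 7)
print(float(total))  # about 3.2857
```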

3.3 Imperfect Public Monitoring

The following slight modification of the above example shows that imperfect public monitoring can also outperform perfect monitoring with a dynamic mediator. Modify the example by adding a strategy L′ for player 2, which is an exact duplicate of L as far as payoffs are concerned. Let monitoring be perfect, except that if player 1 plays U or D and player 2 plays L or L′, a public signal $m$ or $r$ is generated as follows:

$$p(m \mid (U, L)) = 1, \quad p(r \mid (U, L)) = 0, \quad p(m \mid (U, L')) = 0, \quad p(r \mid (U, L')) = 1,$$
$$p(m \mid (D, L)) = p(r \mid (D, L)) = p(m \mid (D, L')) = p(r \mid (D, L')) = \frac{1}{2}.$$

Thus, the relevant change in the monitoring technology is that now the signal $m$ or $r$ is public, but player 2 has an additional action L′ that flips the interpretation of the signal. Essentially the same argument as with private monitoring shows that the following strategy profile, together with any consistent belief system, is a sequential equilibrium with total per-period payoffs of $\frac{23}{7}$.

Player 1: Same as with private monitoring.

Player 2: In odd periods, play $\frac{1}{2}L + \frac{1}{2}L'$. In even periods, if he played L in the previous (odd) period, play M after observing signal $m$ and play R after observing signal $r$. If he played L′ in the previous period, play R after observing signal $m$ and play M after observing signal $r$.

4 The Folk Theorem with Perfect Monitoring and Dynamic Mediation

We now show that the folk theorem always holds with perfect monitoring and dynamic mediation, with a rate of convergence of $1-\delta$. Therefore, perfect monitoring is an approximately optimal monitoring technology when players are moderately patient and a dynamic mediator is available.

Fix a mediated perfect monitoring game. Let $\underline{u}_i$ be player $i$'s correlated minmax payoff, given by

$$\underline{u}_i = \min_{\alpha_{-i} \in \Delta(A_{-i})} \max_{a_i \in A_i} u_i(a_i, \alpha_{-i}).$$

Let $d_i$ be the greatest possible difference in player $i$'s payoff resulting from a change in player $i$'s action only, so that

$$d_i = \max_{a \in A,\, a_i' \in A_i} \left[ u_i(a_i', a_{-i}) - u_i(a) \right].$$

Let $\underline{u} = (\underline{u}_1, \ldots, \underline{u}_n)$ and $d = (d_1, \ldots, d_n)$. A payoff vector $v \in \mathbb{R}^n$ is feasible if $v = u(\mu)$ for some $\mu \in \Delta(A)$. Let $F \subseteq \mathbb{R}^n$ denote the set of feasible payoff vectors, and let $F^{\circ}$ denote the relative interior of this set. Say that $v$ is strictly individually rational if $v > \underline{u}$. Note that this is more permissive than the usual notion of strict individual rationality, as it is defined relative to correlated minmax payoffs. Let $E(\delta)$ denote the sequential equilibrium payoff set. Our result is the following.

Theorem 2 If a payoff vector $v \in F^{\circ}$ satisfies

$$v \geq \underline{u} + \frac{1-\delta}{\delta}\, d, \qquad (1)$$

then $v$ is a sequential equilibrium payoff vector. That is,

$$\left\{ v \in F^{\circ} : v \geq \underline{u} + \frac{1-\delta}{\delta}\, d \right\} \subseteq E(\delta).$$

With the caveat that it does not address the attainability of payoff vectors on the boundary of $F$, Theorem 2 is substantially stronger than the folk theorem, as it explicitly gives a discount factor sufficient for each feasible and strictly individually rational payoff vector to be attainable in sequential equilibrium. In particular, the rate of convergence of $E(\delta)$ to $F^{\circ}$ is $1-\delta$. We nonetheless record the folk theorem as a corollary.

Corollary 1 (Folk Theorem) For every strictly individually rational payoff vector $v \in F^{\circ}$, there exists $\bar{\delta} < 1$ such that if $\delta > \bar{\delta}$ then $v \in E(\delta)$.

Proof. Take $\bar{\delta} = \max_{i \in I} \frac{d_i}{v_i - \underline{u}_i + d_i}$ and apply Theorem 2.
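For concreteness, the quantities in Theorem 2 and Corollary 1 are all computable: $\underline{u}_i$ is the value of a small linear program, and $\bar{\delta}$ then follows by arithmetic. Below is a sketch (ours, not the paper's; it assumes scipy is available and represents a game by per-player payoff dictionaries keyed by action profiles). For the Section 3 stage game, for instance, both players' correlated minmax payoffs are 0.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def correlated_minmax(u_i, A, i):
    """u_i: dict profile -> player i's payoff.  Returns
    min over correlated alpha_{-i} of max_{a_i} u_i(a_i, alpha_{-i})."""
    others = [j for j in range(len(A)) if j != i]
    profs_mi = list(itertools.product(*(A[j] for j in others)))
    # Variables: alpha(a_{-i}) for each a_{-i}, plus the bound t.
    # Minimize t s.t. sum_{a_{-i}} alpha(a_{-i}) u_i(a_i, a_{-i}) <= t for all a_i.
    c = np.zeros(len(profs_mi) + 1)
    c[-1] = 1.0
    A_ub = []
    for a_i in A[i]:
        row = [u_i[tuple(p[:i]) + (a_i,) + tuple(p[i:])] for p in profs_mi]
        A_ub.append(row + [-1.0])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(A[i])),
                  A_eq=[[1.0] * len(profs_mi) + [0.0]], b_eq=[1.0],
                  bounds=[(0, None)] * len(profs_mi) + [(None, None)])
    return res.fun

def delta_bar(u, A, v):
    """Corollary 1's sufficient discount factor max_i d_i / (v_i - u_i + d_i)."""
    ratios = []
    for i in range(len(A)):
        d_i = max(u[i][p[:i] + (ai,) + p[i + 1:]] - u[i][p]
                  for p in itertools.product(*A) for ai in A[i])
        ratios.append(d_i / (v[i] - correlated_minmax(u[i], A, i) + d_i))
    return max(ratios)
```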

In light of the proof of Theorem 2 below, Corollary 1 may be extended to cover strictly individually rational payoff vectors $v$ on the Pareto frontier of $F$ as follows: The mediator recommends correlated actions to achieve $v$ with probability 1 in the absence of a deviation. If a player unilaterally deviates, play transitions to a sequential equilibrium yielding a payoff $v' \in F^{\circ}$ that is strictly individually rational and Pareto dominated by $v$ (such a $v'$ must exist as $v > \underline{u}$ and $F$ is convex). Such a punishment equilibrium exists by Theorem 2. Given this punishment, it is optimal to follow the recommendations on-path for sufficiently high $\delta$.

The intuition behind Theorem 2 is that the mediator can recommend correlated actions leading to the target payoff $v$ with high probability, while recommending every action profile with positive probability. Following a deviation by player $i$, the mediator can recommend that $i$'s opponents minmax her forever, without alerting them to the fact that a deviation occurred (in particular, $i$'s opponents believe that the deviation was in fact recommended by the mediator, since the recommendation has full support). The key point is then that $i$'s opponents are willing to minmax $i$ forever, even though this may be very costly for them, because they never realize that a deviation has occurred and always expect to return to getting payoff $v$ in the next period.

Proof of Theorem 2. Fix a payoff vector $v \in F^{\circ}$ satisfying (1). Let $\bar{v} = \frac{1}{|A|}\sum_{a \in A} u(a)$ be the payoff vector that results when the players mix uniformly over all possible pure action profiles. Choose $\varepsilon > 0$ small enough that the payoff vector $v'$ given by

$$v' = \frac{v - \varepsilon \bar{v}}{1 - \varepsilon}$$

is feasible; this is possible by the assumption that $v \in F^{\circ}$. Let $\mu \in \Delta(A)$ be such that $u(\mu) = v'$.

Consider the following mediator's strategy. There are $n+1$ states, a regular state $R$ and a punishment state $P_i$ for each player $i$. Begin in state $R$. In state $R$, recommend $\mu$ with probability $1-\varepsilon$, and recommend each pure action profile with probability $\frac{\varepsilon}{|A|}$ (that is, recommend $(1-\varepsilon)\mu + \sum_{a \in A} \frac{\varepsilon}{|A|}\, a$). If exactly one player $i$ does not follow her recommendation, go to state $P_i$. Otherwise, stay in state $R$.

In state $P_i$, recommend $\hat{\alpha}_{-i}$ to players $-i$, for some

$$\hat{\alpha}_{-i} \in \arg\min_{\alpha_{-i} \in \Delta(A_{-i})} \max_{a_i \in A_i} u_i(a_i, \alpha_{-i}),$$

and recommend $\hat{a}_i$ to player $i$, for some

$$\hat{a}_i \in \arg\max_{a_i \in A_i} u_i(a_i, \hat{\alpha}_{-i}).$$

Stay in state $P_i$.

We check that it is optimal for players to obey the mediator's recommendations. We first claim that after any history of actions and private recommendations, each player $i$ either assigns probability 1 to the state being $R$ or assigns probability 1 to the state being $P_i$. More specifically, we claim that if player $i$ has disobeyed the mediator more recently than she last received a recommendation $a_i \neq \hat{a}_i$, then she is certain that the state is $P_i$; while if player $i$ received a recommendation $a_i \neq \hat{a}_i$ more recently than she has disobeyed the mediator (or if she has never disobeyed the mediator), then she is certain that the state is $R$.

To see this, first note that if player $i$ has never disobeyed the mediator then she is certain the state is $R$, as all possible histories of recommendations and opposing actions are on-path. Starting at a history $h_i^t$ where player $i$ is certain the state is $R$, if player $i$ disobeys the mediator and is recommended $\hat{a}_i$ in every period $\tau = t+1, \ldots, t'$, then player $i$ is certain the state is $P_i$ at history $h_i^{t'}$. Starting from a history $h_i^t$ where player $i$ is certain the state is $P_i$, if player $i$ is recommended $a_i \neq \hat{a}_i$ then player $i$ learns that the state has in fact never been $P_i$ (as once the mediator enters state $P_i$ she recommends $\hat{a}_i$ forever, and the mediator never trembles). This can only be because another player also disobeyed the mediator in every period where player $i$ disobeyed the mediator, and since trembles are independent across information sets this implies that the current state is $R$ with probability 1. Thus, we have shown that player $i$ begins the game certain that the state is $R$, switches from being certain that the state is $R$ to being certain that the state is $P_i$ whenever she disobeys the mediator, and switches back to being certain that the state is $R$ whenever she is recommended an action $a_i \neq \hat{a}_i$. This proves the claim.
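In automaton form, the mediator's strategy is very simple. Here is a compact sketch (our rendering in Python; `draw_mu`, `hat_a`, and `hat_alpha_draw` stand in for the objects $\mu$, $\hat{a}_i$, and $\hat{\alpha}_{-i}$ defined above and are assumptions of the sketch).

```python
import random

def mediator_recommendation(state, last_rec, last_actions, draw_mu, all_profiles,
                            hat_a, hat_alpha_draw, eps):
    # Transition: a unilateral deviation from the last recommendation sends
    # the automaton to that player's punishment state, forever.
    if state == "R" and last_rec is not None:
        deviators = [i for i, (r, a) in enumerate(zip(last_rec, last_actions)) if r != a]
        if len(deviators) == 1:
            state = ("P", deviators[0])
    if state == "R":
        # Full support: mu with prob 1 - eps, each pure profile with prob eps/|A|.
        rec = random.choice(all_profiles) if random.random() < eps else draw_mu()
    else:
        i = state[1]
        # Minmax player i forever: hat a_i to i, and a profile drawn from
        # hat alpha_{-i} to the others (hat_alpha_draw returns a full tuple
        # whose i-th entry is overwritten here).
        rec = hat_alpha_draw(i)
        rec = rec[:i] + (hat_a(i),) + rec[i + 1:]
    return state, rec
```

Because the automaton never leaves $P_i$, the punished player's opponents can always attribute the punishment recommendations to the ordinary full-support draws of state $R$.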

It remains only to check that it is optimal for each player $i$ to obey the mediator when she is certain the state is $R$ and when she is certain the state is $P_i$. If player $i$ is certain the state is $R$, her instantaneous gain from disobeying the current recommendation is at most $(1-\delta)d_i$, while her loss in continuation payoff is

$$\delta\left(\left((1-\varepsilon)v' + \varepsilon\bar{v}\right)_i - \underline{u}_i\right) = \delta\left(v_i - \underline{u}_i\right).$$

It follows from (1) that $(1-\delta)d_i \leq \delta(v_i - \underline{u}_i)$, so obeying the recommendation is optimal. If player $i$ is certain the state is $P_i$, then as we have seen she must have been recommended $\hat{a}_i$ (as at any history, receiving a recommendation $a_i \neq \hat{a}_i$ convinces her that the state is $R$). Thus, obeying the mediator is a myopic best response, and her continuation payoff is independent of her current action, so obeying is optimal.

The reader may wonder whether the mediator can be dispensed with in Theorem 2 and Corollary 1 if players can communicate through cheap talk, as in the literature on implementing correlated equilibria without a mediator (see Forges (2009) for a survey). In the online appendix, we show that this is indeed possible unless there are exactly three players.⁴ A similar but simpler argument shows that this is also possible if there are three players with equivalent utilities in the sense of Abreu, Dutta, and Smith (1994), and Abreu, Dutta, and Smith show that the folk theorem always holds if no pair of players has equivalent utilities. Thus, the folk theorem holds with perfect monitoring and unmediated communication in all cases except when there are three players, of whom exactly two have equivalent utilities.

⁴ The closest result in the literature on implementing correlated equilibria is due to Gerardi (2004), who requires at least five players.

5 Private Monitoring Versus Perfect Monitoring with Markov Mediation

It may sometimes be more natural to assume that the mediator (if available) can condition her recommendations only on actions and not on past recommendations. For example, this

will be the case if there is a different mediator every period, and the period $t$ mediator knows the publicly observable history of actions but not the private messages sent by the period $t-1$ mediator. More abstractly, the key feature of Markov mediation is that it preserves common knowledge of continuation play, and thus preserves the recursive structure of the repeated game.⁵

This section shows that the availability of a Markov mediator is not enough for perfect monitoring to be an approximately optimal monitoring technology with patient players. That is, with Markov mediation, even patient players can benefit from private monitoring.

To see this, consider the following slight modification of a well-known example due to Fudenberg and Maskin (1986). There are four players. Player 1 picks a row, player 2 picks a column, and player 3 picks a matrix. Player 4 is a dummy player. The payoff matrix is as follows.

              l                 r
   T    1, 1, 1, 0       0, 0, 0, 100
   B    0, 0, 0, 100     0, 0, 0, 100
                 (matrix L)

              l                 r
   T    0, 0, 0, 100     0, 0, 0, 100
   B    0, 0, 0, 100     1, 1, 1, 0
                 (matrix R)

Note that player 4's payoff is high when the payoff of players 1, 2, and 3 is low, and vice versa. We show the following.

Theorem 3 In this game, for all $\delta < 1$ there is no sequential equilibrium where the players' per-period payoffs sum to more than 76 with perfect monitoring and a Markov mediator, while there is such a sequential equilibrium for some $\delta < 1$ and some private monitoring technology.

5.1 Perfect Monitoring with Markov Mediation

Fudenberg and Maskin (1986) show that, without mediation, the equilibrium payoff of players 1, 2, and 3 cannot be less than $\frac{1}{4}$. Their proof extends easily to the case with a Markov mediator, as follows.

⁵ With perfect monitoring, sequential equilibrium with a Markov mediator corresponds to Tomala's (2009) perfect communication equilibrium.

For any correlated action profile $\mu \in \Delta(A)$, let $\mu_i(1)$ be the probability attached to the first action for player $i \in \{1, 2, 3\}$ (that is, to T, l, or L; we neglect the dummy player 4). Then there must be a player $i$ who faces $\mu_j(1) \geq \frac{1}{2}$ and $\mu_k(1) \geq \frac{1}{2}$, or $\mu_j(1) \leq \frac{1}{2}$ and $\mu_k(1) \leq \frac{1}{2}$, for $j \neq i$ and $k \neq i, j$. In the former case, playing the first action gives player $i$ a payoff no less than $\frac{1}{4}$, while in the latter case, playing the second action does so.

With a Markov mediator, the distribution of future paths of play depends only on the public history of actions $(a_\tau)_{\tau=1}^{t-1}$. Let $\underline{v} \in \mathbb{R}$ be the infimum over all histories of actions and all sequential equilibria of the (common) continuation payoff of players 1, 2, and 3. At any history, the current correlated action profile is common knowledge, so there is at least one player $i \in \{1, 2, 3\}$ who can guarantee an instantaneous payoff of $\frac{1}{4}$. Hence,

$$\underline{v} \geq (1-\delta)\frac{1}{4} + \delta \underline{v}.$$

That is, $\underline{v} \geq \frac{1}{4}$. It follows that the sum of the four players' equilibrium payoffs is no more than $\frac{1}{4}(1+1+1) + \frac{3}{4}(100) = 75\frac{3}{4}$.

5.2 Private Monitoring

We exhibit an information structure for which the sum of the players' payoffs may be arbitrarily close to 100. Each player $i \in \{1, 2, 3\}$ observes two signals, $y_i \in \{A, B, O\}$ and $z_i \in \{0, 1\}$. Player 4 observes nothing. The distribution of signal $y_i$ is determined by player 3's action, as follows.

1. If $a_3 = L$, then with probability $\varepsilon^2$, all players observe $y_i = A$; with probability $\varepsilon^2$, all players observe $y_i = B$; and with probability $1 - 2\varepsilon^2$, all players observe $y_i = O$.

2. If $a_3 = R$, then with probability $\varepsilon^2$, player 1 observes A, player 2 observes B, and player 3 observes O; with probability $\varepsilon^2$, player 1 observes B, player 2 observes A, and player 3 observes O; and with probability $1 - 2\varepsilon^2$, all players observe $y_i = O$.

Note that if player 3 plays R then players 1 and 2's signals may be miscoordinated.

The distribution of signal $z$ is given by:

1. If $a_1 = T$ and $a_2 = l$, or if $a_1 = B$ and $a_2 = r$, then all players observe $z_i = 1$ with probability $\varepsilon$, and all players observe $z_i = 0$ with probability $1-\varepsilon$.

2. Otherwise, all players observe $z_i = 0$ with probability 1.

Note that players 1 and 2 must coordinate in order to generate signal $z_1 = z_2 = z_3 = 1$.

For each $\varepsilon > 0$, we construct an equilibrium with payoffs converging to $\frac{2\varepsilon}{1+2\varepsilon}$ for players 1, 2, and 3 as $\delta \to 1$. By feasibility, player 4's payoff converges to 100. We ignore player 4 throughout the construction.

Each player $i \in \{1, 2, 3\}$ has a state $s_{i,t} \in \{A, B, O\}$ in period $t$. The initial state is $s_{1,1} = s_{2,1} = s_{3,1} = O$. For each period $t$:

1. If $s_{i,t} = O$, then player $i = 1$ plays B, player $i = 2$ plays r, and player $i = 3$ plays L. If $y_{i,t} = A$, go to state $s_{i,t+1} = A$. If $y_{i,t} = B$, go to state $s_{i,t+1} = B$. If $y_{i,t} = O$, go to state $s_{i,t+1} = O$.

2. If $s_{i,t} = A$, then player $i = 1$ plays T, player $i = 2$ plays l, and player $i = 3$ plays L. If $z_{i,t} = 0$, go to state $s_{i,t+1} = A$. If $z_{i,t} = 1$, go to state $s_{i,t+1} = O$.

3. If $s_{i,t} = B$, then player $i = 1$ plays B, player $i = 2$ plays r, and player $i = 3$ plays R. If $z_{i,t} = 0$, go to state $s_{i,t+1} = B$. If $z_{i,t} = 1$, go to state $s_{i,t+1} = O$.

Intuitively, if player 3 deviates and plays R when $s_{1,t} = s_{2,t} = s_{3,t} = O$, then the next state might be $(s_{1,t+1}, s_{2,t+1}) = (A, B)$ or $(B, A)$. If the state reaches $(s_{1,t+1}, s_{2,t+1}) = (A, B)$ or $(B, A)$, then players 1 and 2 miscoordinate forever, yielding payoff 0 for players 1, 2, and 3. Note that this punishment for player 3's deviation is like the punishments underlying Theorem 2: players 1 and 2 never learn that a deviation has occurred, so they minmax player 3 (and themselves) forever, while always believing that they are still on the equilibrium path.

We check that this strategy profile, together with any consistent belief system, is a sequential equilibrium.

Let $v_i(s_i)$ be player $i$'s equilibrium continuation value in state $s_i$, given by

$$v_i(O) = (1-\delta)(0) + \delta\left[(1-2\varepsilon^2)v_i(O) + \varepsilon^2 v_i(A) + \varepsilon^2 v_i(B)\right],$$
$$v_i(A) = (1-\delta)(1) + \delta\left[(1-\varepsilon)v_i(A) + \varepsilon v_i(O)\right],$$
$$v_i(B) = (1-\delta)(1) + \delta\left[(1-\varepsilon)v_i(B) + \varepsilon v_i(O)\right].$$

Solving this system of equations yields

$$v_i(O) = \frac{2\delta\varepsilon^2}{1-\delta+\delta\varepsilon+2\delta\varepsilon^2}, \qquad v_i(A) = v_i(B) = \frac{1-\delta+2\delta\varepsilon^2}{1-\delta+\delta\varepsilon+2\delta\varepsilon^2}.$$

It is straightforward to check that players 1 and 2 do not have an incentive to deviate: In state $s_{i,t} = O$, player 1 or 2 cannot control the instantaneous utility or the transition probability. In state $s_{i,t} = A$ or $B$, conforming gives payoff $v_i(s_{i,t})$, while deviating gives payoff $\delta v_i(s_{i,t})$.

For player 3, the only problematic state is $s_{3,t} = O$, as in the other states she cannot control the instantaneous utility or the transition probability. When $s_{3,t} = O$, we can compare player 3's continuation payoff from L and R as follows.

1. With probability $1-2\varepsilon^2$, the next state is $s_{1,t} = s_{2,t} = s_{3,t} = O$ regardless of $a_3$, so player 3's continuation payoff is independent of $a_3$.

2. With probability $2\varepsilon^2$:

(a) With L, the next state is $s_{1,t+1} = s_{2,t+1} = s_{3,t+1} = A$ or $s_{1,t+1} = s_{2,t+1} = s_{3,t+1} = B$, so player 3's continuation payoff is $v_3(A) = v_3(B) = \frac{1-\delta+2\delta\varepsilon^2}{1-\delta+\delta\varepsilon+2\delta\varepsilon^2}$.

(b) With R, players 1 and 2 will be permanently miscoordinated: $(s_{1,\tau}, s_{2,\tau}) = (A, B)$ or $(B, A)$ for all $\tau > t$. Player 3's continuation payoff is 0.

Since $(B, r, R)$ gives player 3 a stage payoff of 1 while $(B, r, L)$ gives 0, deviating to R also yields an instantaneous gain of $(1-\delta)(1)$. Hence, player 3 will conform if

$$1-\delta \leq \delta \cdot 2\varepsilon^2 \cdot \frac{1-\delta+2\delta\varepsilon^2}{1-\delta+\delta\varepsilon+2\delta\varepsilon^2}.$$
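These closed forms, and the limits used just below, can be checked symbolically; here is a quick sketch using sympy, under our rendering of the value system above.

```python
import sympy as sp

d, e = sp.symbols("delta varepsilon", positive=True)
vO, vA, vB = sp.symbols("v_O v_A v_B")

# The value system of Section 5.2.
eqs = [
    sp.Eq(vO, (1 - d) * 0 + d * ((1 - 2 * e**2) * vO + e**2 * vA + e**2 * vB)),
    sp.Eq(vA, (1 - d) * 1 + d * ((1 - e) * vA + e * vO)),
    sp.Eq(vB, (1 - d) * 1 + d * ((1 - e) * vB + e * vO)),
]
sol = sp.solve(eqs, [vO, vA, vB], dict=True)[0]

# The closed forms in the text...
assert sp.simplify(sol[vO] - 2 * d * e**2 / (1 - d + d * e + 2 * d * e**2)) == 0
assert sp.simplify(sol[vA] - (1 - d + 2 * d * e**2) / (1 - d + d * e + 2 * d * e**2)) == 0
# ...and the delta -> 1 limits: 2e/(1+2e) for v_O, and 4e^3/(1+2e) for the
# right-hand side of player 3's incentive constraint.
assert sp.simplify(sp.limit(sol[vO], d, 1) - 2 * e / (1 + 2 * e)) == 0
rhs = d * 2 * e**2 * sol[vA]
assert sp.simplify(sp.limit(rhs, d, 1) - 4 * e**3 / (1 + 2 * e)) == 0
```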

As the left-hand side converges to 0 while the right-hand side converges to $\frac{4\varepsilon^3}{1+2\varepsilon}$, player 3 will conform for $\delta$ high enough. Finally, as desired, the players' equilibrium payoff is

$$v_i(O) = \frac{2\delta\varepsilon^2}{1-\delta+\delta\varepsilon+2\delta\varepsilon^2} \;\xrightarrow{\;\delta \to 1\;}\; \frac{2\varepsilon}{1+2\varepsilon}.$$

5.3 Scope of the Example

In the above example, the sum of the players' equilibrium payoffs is higher with private monitoring than with perfect monitoring and Markov mediation. A natural question is whether private monitoring can strictly Pareto dominate perfect monitoring with Markov mediation. That is, letting $E_{\mathrm{Markov}}(\delta)$ be the equilibrium payoff set with perfect monitoring and Markov mediation, and letting $E_{\mathrm{Markov}} = \lim_{\delta \to 1} E_{\mathrm{Markov}}(\delta)$ be the limit equilibrium payoff set, can we find for every $\delta$ a monitoring structure and an equilibrium such that the equilibrium payoff $v(\delta)$ converges to a $v$ such that $v_i > w_i$ for all $w \in E_{\mathrm{Markov}}$ and $i \in I$? The answer is no. Take a feasible payoff vector $v$ that strictly Pareto dominates some $w \in E_{\mathrm{Markov}}(\bar{\delta})$ for some $\bar{\delta} < 1$. Since $v$ is feasible, for $\delta$ sufficiently large there exists a sequence of action profiles $(a_\tau)_{\tau=1}^{T}$ with $T < \infty$ such that the discounted average payoff from the sequence equals $v$:

$$\frac{1-\delta}{1-\delta^T}\sum_{\tau=1}^{T} \delta^{\tau-1} u(a_\tau) = v.^6$$

In addition, $E_{\mathrm{Markov}}(\delta)$ is monotone by standard arguments, so $w \in E_{\mathrm{Markov}}(\delta')$ for all $\delta' \geq \bar{\delta}$. Then, for $\delta$ sufficiently large, the following strategy profile is an equilibrium: play $a_t$ in period $t \ (\mathrm{mod}\ T)$ unless there has been a unilateral deviation; after a unilateral deviation, play the equilibrium whose payoff is $w$.

It is also natural to look for sufficient conditions for private monitoring not to outperform perfect monitoring with Markov mediation. If there are only two players, or if the NEU condition is satisfied as in Abreu, Dutta, and Smith (1994), then the correlated minmax folk theorem holds with perfect monitoring and Markov mediation: $E_{\mathrm{Markov}} = \{v \in F : v \geq \underline{u}\}$. Under these conditions, the limit equilibrium payoff set with private monitoring is a subset of $E_{\mathrm{Markov}}$.

⁶ See Fudenberg and Maskin (1991) for a proof.

6 Comparison with Kandori (1991) and Related Examples

Kandori (1991), Sekiguchi (2002), Mailath, Matthews, and Sekiguchi (2002), and Miyahara and Sekiguchi (2013) present examples in which players benefit from imperfect monitoring. This section shows that in Kandori's and Sekiguchi's examples this conclusion relies on the absence of a dynamic mediator. The same is true of the examples of Mailath, Matthews, and Sekiguchi (2002) and Miyahara and Sekiguchi (2013), for similar reasons.

Let $NE(G)$ denote the set of stage game Nash equilibria of $G$. Condition 1 of Sekiguchi (2002), which generalizes Kandori (1991), is as follows.

Condition 1 There exist $a^* \in A \setminus NE(G)$, $\alpha \in NE(G)$, and $p \in \Delta(A)$ such that:

1. For each $i$, the support of $p_{-i}$ is a subset of the support of $\alpha_{-i}$.

2. For any $i$,

$$u_i(\alpha) - \max_{a_i \in A_i} u_i(a_i, p_{-i}) > \max_{a_i \in A_i} u_i(a_i, a^*_{-i}) - u_i(a^*).$$

Sekiguchi shows that in the undiscounted two-period repetition of $G$, there is a private monitoring structure and a corresponding equilibrium in which $a^*$ is played in period 1 and $\alpha$ is played in period 2. By having the mediator mix over recommendations as in the proof of Theorem 2, we show that the same is true with perfect monitoring and dynamic mediation. In particular, let

$$BR_i^{\varepsilon} = \{a_i \in A_i : u_i(a_i, a^*_{-i}) + \varepsilon \geq u_i(a^*)\}.$$

Consider the following mediator's strategy. In period 1, recommend $a^*$ with probability $1-\eta$, and recommend $(a_i, a^*_{-i})$ with probability $\frac{\eta}{|I|\,|BR_i^{\varepsilon}|}$ for each $a_i \in BR_i^{\varepsilon}$ and $i \in I$. In period 2, if there was a unilateral deviation by player $i$ to $a_i \in BR_i^{\varepsilon}$ in period 1, recommend $(p_i, p_{-i})$. Otherwise, recommend $\alpha$.

Take $\varepsilon > 0$ and $\eta > 0$ sufficiently small so that, for all $i \in I$,

$$u_i(\alpha) - \max_{a_i \in A_i} u_i(a_i, p_{-i}) > \max_{a_i \in A_i} u_i(a_i, a^*_{-i}) - u_i(a^*) + 2\eta \max_a |u_i(a)| + \varepsilon, \qquad (2)$$

$$\varepsilon \geq \frac{2\eta}{1-\eta} \max_a |u_i(a)|. \qquad (3)$$

Each player $i$ has an incentive to follow the recommendation in period 2: if player $i$ deviated to $a_i \in BR_i^{\varepsilon}$ then she is recommended $p_i$ and expects to face $p_{-i}$; while if player $i$ did not deviate (or deviated to $a_i \notin BR_i^{\varepsilon}$) then she believes that the Nash equilibrium $\alpha$ is recommended, regardless of the realized period 1 action profile.

Consider period 1. When $a^*_i$ is recommended, player $i$'s payoff from following the recommendation is at least

$$(1-\eta)u_i(a^*) - \eta \max_a |u_i(a)| + u_i(\alpha).$$

When $a_i \in BR_i^{\varepsilon}$ is recommended, player $i$'s payoff from following the recommendation is at least

$$u_i(a^*) - \varepsilon + u_i(\alpha).$$

In either case, player $i$'s payoff from deviating to $a_i' \in BR_i^{\varepsilon}$ is at most

$$(1-\eta)\max_{a_i \in A_i} u_i(a_i, a^*_{-i}) + \eta \max_a |u_i(a)| + \max_{a_i \in A_i} u_i(a_i, p_{-i}).$$

Hence, in either case, (2) implies that deviating to $a_i' \in BR_i^{\varepsilon}$ is unprofitable. Finally, player $i$'s payoff from deviating to $a_i' \notin BR_i^{\varepsilon}$ when $a^*_i$ is recommended is at most

$$(1-\eta)\left(u_i(a^*) - \varepsilon\right) + \eta \max_a |u_i(a)| + u_i(\alpha),$$

while her payoff from deviating to $a_i' \notin BR_i^{\varepsilon}$ when $a_i \in BR_i^{\varepsilon}$ is recommended is at most

$$u_i(a^*) - \varepsilon + u_i(\alpha).$$

Hence, in the first case (3) implies that deviating to $a_i' \notin BR_i^{\varepsilon}$ is unprofitable, while in the second case such a deviation is clearly unprofitable.
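Condition 1 is straightforward to check mechanically for a given candidate $(a^*, \alpha, p)$. Here is a sketch for two-player games (our helper, not from Sekiguchi's paper; `alpha` and `p` are dicts mapping action profiles to probabilities).

```python
def marginal_on_opponent(dist, i):
    """Marginal of a distribution over profiles on player (1 - i)'s action."""
    out = {}
    for prof, q in dist.items():
        out[prof[1 - i]] = out.get(prof[1 - i], 0) + q
    return out

def check_condition1(u, A, a_star, alpha, p):
    """u[i]: profile -> payoff; a_star a pure profile; alpha a Nash
    distribution over profiles; p the punishment distribution."""
    for i in (0, 1):
        prof = lambda ai, aj: (ai, aj) if i == 0 else (aj, ai)
        p_m = marginal_on_opponent(p, i)
        alpha_m = marginal_on_opponent(alpha, i)
        # 1. supp p_{-i} must lie inside supp alpha_{-i}.
        if any(q > 0 and alpha_m.get(aj, 0) == 0 for aj, q in p_m.items()):
            return False
        # 2. The loss from being punished with p_{-i} must exceed the
        #    one-shot gain from deviating against a*_{-i}.
        u_alpha = sum(q * u[i][pr] for pr, q in alpha.items())
        best_vs_p = max(sum(q * u[i][prof(ai, aj)] for aj, q in p_m.items())
                        for ai in A[i])
        gain = max(u[i][prof(ai, a_star[1 - i])] for ai in A[i]) - u[i][a_star]
        if not (u_alpha - best_vs_p > gain):
            return False
    return True
```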

References

[1] Abreu, D., P.K. Dutta, and L. Smith (1994), "The Folk Theorem for Repeated Games: A NEU Condition," Econometrica, 62, 939-948.

[2] Bagwell, K. (1995), "Commitment and Observability in Games," Games and Economic Behavior, 8, 271-280.

[3] Bagwell, K.W. and R.W. Staiger (2004), The Economics of the World Trading System, MIT Press: Cambridge.

[4] Baker, G., R. Gibbons, and K.J. Murphy (2002), "Relational Contracts and the Theory of the Firm," Quarterly Journal of Economics, 117, 39-84.

[5] Bergemann, D. and S. Morris (2012), Robust Mechanism Design, World Scientific Publishing: Singapore.

[6] Bergemann, D. and S. Morris (2013a), "Robust Predictions in Games with Incomplete Information," Econometrica, 81, 1251-1308.

[7] Bergemann, D. and S. Morris (2013b), "The Comparison of Information Structures in Games: Bayes Correlated Equilibrium and Individual Sufficiency," mimeo.

[8] Cherry, J. and L. Smith (2011), "Unattainable Payoffs for Repeated Games of Private Monitoring," mimeo.

[9] Forges, F. (1986), "An Approach to Communication Equilibria," Econometrica, 54, 1375-1385.

[10] Forges, F. (2009), "Correlated Equilibrium and Communication in Games," in Encyclopedia of Complexity and Systems Science, edited by R. Meyers, Springer: New York.

[11] Fudenberg, D. and E. Maskin (1986), "The Folk Theorem in Repeated Games with Discounting or with Incomplete Information," Econometrica, 54, 533-554.

[12] Fudenberg, D. and E. Maskin (1991), "On the Dispensability of Public Randomization in Discounted Repeated Games," Journal of Economic Theory, 53, 428-438.

[13] Fuchs, W. (2007), "Contracting with Repeated Moral Hazard and Private Evaluations," American Economic Review, 97, 1432-1448.

[14] Gerardi, D. (2004), "Unmediated Communication in Games with Complete and Incomplete Information," Journal of Economic Theory, 114, 104-131.

[15] Greif, A., P. Milgrom, and B.R. Weingast (1994), "Coordination, Commitment, and Enforcement: The Case of the Merchant Guild," Journal of Political Economy, 102, 745-776.

[16] Harrington, J.E. and A. Skrzypacz (2011), "Private Monitoring and Communication in Cartels: Explaining Recent Collusive Practices," American Economic Review, 101, 2425-2449.

[17] Hirshleifer, J. (1971), "The Private and Social Value of Information and the Reward to Inventive Activity," American Economic Review, 61, 561-574.

[18] Kandori, M. (1991), "Cooperation in Finitely Repeated Games with Imperfect Private Information," mimeo.

[19] Kandori, M. (1992), "The Use of Information in Repeated Games with Imperfect Monitoring," Review of Economic Studies, 59, 581-593.

[20] Kandori, M. and I. Obara (2006a), "Efficiency in Repeated Games Revisited: The Role of Private Strategies," Econometrica, 74, 499-519.

[21] Kandori, M. and I. Obara (2006b), "Less is More: An Observability Paradox in Repeated Games," International Journal of Game Theory, 34, 475-493.

[22] Lehrer, E., D. Rosenberg, and E. Shmaya (2010), "Signaling and Mediation in Games with Common Interest," Games and Economic Behavior, 68, 670-682.

[23] Lehrer, E., D. Rosenberg, and E. Shmaya (2013), "Garbling of Signals and Outcome Equivalence," Games and Economic Behavior, 81, 179-191.

[24] Levin, J. (2003), "Relational Incentive Contracts," American Economic Review, 93, 835-857.

[25] Mailath, G.J., S.A. Matthews, and T. Sekiguchi (2002), "Private Strategies in Finitely Repeated Games with Imperfect Public Monitoring," Contributions in Theoretical Economics, 2.

[26] Maggi, G. (1999), "The Role of Multilateral Institutions in International Trade Cooperation," American Economic Review, 89, 190-214.

[27] Milgrom, P.R., D.C. North, and B.R. Weingast (1990), "The Role of Institutions in the Revival of Trade: The Law Merchant, Private Judges, and the Champagne Fairs," Economics & Politics, 2, 1-23.

[28] Miyahara, Y. and T. Sekiguchi (2013), "Finitely Repeated Games with Monitoring Options," Journal of Economic Theory, 148, 1929-1952.

[29] Myerson, R.B. (1986), "Multistage Games with Communication," Econometrica, 54, 323-358.

[30] Myerson, R.B. (1991), Game Theory: Analysis of Conflict, Harvard University Press: Cambridge.

[31] Renault, J. and T. Tomala (2004), "Communication Equilibrium Payoffs in Repeated Games with Imperfect Monitoring," Games and Economic Behavior, 49, 313-344.

[32] Sekiguchi, T. (2002), "Existence of Nontrivial Equilibria in Repeated Games with Imperfect Private Monitoring," Games and Economic Behavior, 40, 299-321.

[33] Stigler, G.J. (1964), "A Theory of Oligopoly," Journal of Political Economy, 72, 44-61.

[34] Tomala, T. (2009), "Perfect Communication Equilibria in Repeated Games with Imperfect Monitoring," Games and Economic Behavior, 67, 682-694.

Online Appendix: Dispensing with the Mediator

This appendix considers repeated games with perfect monitoring without a mediator but with cheap talk communication. The formal model is that in every period a plain cheap talk extension of $G$ is played (Gerardi, 2004). Informally, a plain cheap talk extension consists of a finite number of steps of communication, where at each step a set of senders, receivers, and possible messages is specified.⁷

Theorem 4 Assume that plain cheap talk communication is available but a mediator is not. If $n \neq 3$, then for every strictly individually rational payoff vector $v$ in either the relative interior of $F$ or the Pareto frontier of $F$, there exists $\bar{\delta} < 1$ such that if $\delta > \bar{\delta}$ then $v \in E(\delta)$.

The idea of the proof is to construct an assessment satisfying the following three properties:

1. From each player's perspective, her opponents' strategies are totally mixed.

2. Off-path histories are attributed to trembles in communication rather than in actions.

3. Mixing is determined by random variables observed by three or more players.

Properties 1 and 2 ensure that players never believe that they will be asked to permanently minmax an opponent, as in the proof of Theorem 2. Property 3 ensures that players cannot strategically initiate punishment phases for each other, which might otherwise be profitable. Note that initiating a punishment phase is never profitable if all players have equivalent utilities, which explains why Property 3, and thus the assumption that $n \neq 3$, is not needed in this case.⁸

Proof. If $n = 2$, no communication is necessary; see Fudenberg and Maskin (1986). Hence, suppose that $n \geq 4$. We assume that each subset of players $J \subseteq I$ has access to a randomization device whose outcome is public only among $J$. Let $p^J$ denote such a randomization device among $J$. Aumann and Hart (2003) show that this assumption is without loss of generality when cheap talk communication is available. We ignore integer problems where they can be easily addressed.

Fix a discount factor $\delta < 1$ and a strictly individually rational payoff vector $v \in F^{\circ}$ such that

$$v > \underline{u} + \frac{1-\delta}{\delta}\, d. \qquad (4)$$

We show that $v \in E(\delta)$ (we discuss payoff vectors on the Pareto frontier of $F$ at the end of the proof). As will be seen, on the equilibrium path, the players collaboratively draw an action profile $a$ from some joint distribution $\mu \in \Delta(A)$. Given realization $a$, each player $i$ independently mixes over all actions uniformly with probability $\varepsilon$; that is, she plays

$$(1-\varepsilon)a_i + \varepsilon \sum_{a_i' \in A_i} \frac{1}{|A_i|}\, a_i'.$$

⁷ Gerardi also considers the more general notion of a cheap talk extension. The simpler notion of a plain cheap talk extension suffices here.
⁸ The formal proof for this case is available upon request.

For sufficiently small $\varepsilon > 0$, we can find $\mu \in \Delta(A)$ such that the expected payoff from this procedure equals $v$; this is possible by the assumption that $v \in F^{\circ}$. Let $u(\mu) = v$. Let $T$ be a number such that

$$v > \underline{u} + \frac{1-\delta^{T+1}}{\delta^{T+1}}\, d. \qquad (5)$$

Consider the following strategy profile. There are $n+1$ states, a regular state $R$ and a possible punishment state $P_i$ for each player $i$, with $i = 1, \ldots, n$.⁹ As will become clear, state transitions are coordinated, so that at the beginning of each period either all players are in state $R$ or all players are in state $P_i$ for some $i$. Play begins in state $R$. For notational convenience, we let players draw randomization devices before they choose actions and communicate after they take actions. Thus, each period is broken into a randomization phase, an action phase, and an ex post communication phase. Let us define the action and transition rules in each state.

⁹ This is a slight abuse of notation, as it will become clear that if represented as an automaton the strategy profile will actually have $nT + 1$ states.

Suppose that the players are in state $R$ in period $t$. In the randomization phase, play is as follows.

1. The players select a target action profile $a^R$ for that period with probability $\mu(a^R)$ by public randomization.

2. For each player $i \in I$, we create $n-1$ groups of $n-1$ players. Each group excludes a player other than player $i$. For each $j \neq i$, let $N(i,j)$ be the group excluding $j$; that is, $N(i,j)$ consists of $I \setminus \{j\}$. Each group draws two random variables. In particular, players $I \setminus \{j\}$ in group $N(i,j)$ draw $p^l_{i,j,t} \in \{1, \ldots, |A_i|\}$ and $p^m_{i,j,t} \in \{1, \ldots, \frac{1}{\varepsilon}\}$. They draw these random variables such that $p^l_{i,j,t}$ and $p^m_{i,j,t}$ are uniform and jointly independent. Note that player $j$ cannot observe the random variables of the group from which she is excluded; that is, player $j$ cannot observe $p^l_{i,j,t}$ and $p^m_{i,j,t}$.

Note that player $i$ knows $\{p^l_{i,j,t}\}_{j \neq i}$ and $\{p^m_{i,j,t}\}_{j \neq i}$. She calculates

$$p^l_{i,t} = \sum_{j \neq i} p^l_{i,j,t} \ \mathrm{mod}\ |A_i| \quad \text{and} \quad p^m_{i,t} = \sum_{j \neq i} p^m_{i,j,t} \ \mathrm{mod}\ \tfrac{1}{\varepsilon}.$$

Following this randomization, in the action phase, player $i$ takes $a_{i,t} = a^R_i$ if $p^m_{i,t} \neq 1$, and takes $a_{i,t} = a_i(p^l_{i,t})$ if $p^m_{i,t} = 1$. Here, $a_i(p^l_{i,t})$ is the $p^l_{i,t}$-th element of $A_i$. Note that, for each player $j \neq i$, from player $j$'s perspective, given the variables of the groups in which $j$ is included (that is, $p^m_{i,k,t}$ and $p^l_{i,k,t}$ for each $k \neq i, j$), $p^l_{i,t}$ is uniformly distributed over $\{1, \ldots, |A_i|\}$ and $p^m_{i,t}$ is uniformly distributed over $\{1, \ldots, \frac{1}{\varepsilon}\}$.

Finally, in the ex post communication phase, each group publicizes $p^m_{i,j,t}$ and $p^l_{i,j,t}$. That is, each player $k$ publicly reports $l_{i,j,t}(k) = p^l_{i,j,t}$ and $m_{i,j,t}(k) = p^m_{i,j,t}$ for each $i$ and $j$ such that $I \setminus \{j\}$ includes $k$.

Now let us define the state transition. We say that player $i$ is guilty if the following condition holds:

For all $j \neq i$, there exist $l_{i,j,t}$ and $m_{i,j,t}$ such that $l_{i,j,t}(k) = l_{i,j,t}$ and $m_{i,j,t}(k) = m_{i,j,t}$ for all but at most one player $k \in I \setminus \{j\}$, and either $m_{i,t} \neq 1$ and $a_{i,t} \neq a^R_i$, or $m_{i,t} = 1$ and $a_{i,t} \neq a_i(l_{i,t})$, where $l_{i,t} = \sum_{j \neq i} l_{i,j,t} \ \mathrm{mod}\ |A_i|$ and $m_{i,t} = \sum_{j \neq i} m_{i,j,t} \ \mathrm{mod}\ \frac{1}{\varepsilon}$.¹⁰

If there exists a unique guilty player $j$, then the next state is $P_j$. Otherwise, the players remain in state $R$. Since all communication is public in the ex post communication phase and actions are perfectly monitored, state transitions are coordinated.

Since we have finished defining the strategy in state $R$, let us now turn to state $P_i$. Suppose that the players are in state $P_i$ in period $t$. In the randomization phase, play is as follows.

1. Players $-i$ select a target action profile $a^{-i}_{-i,t}$ with probability $\hat{\alpha}_{-i}(a^{-i}_{-i,t})$ by a randomization that is public among $-i$. Note that player $i$ cannot observe $a^{-i}_{-i,t}$.

2. For each player $\varphi \neq i$, we consider $n-1$ groups $\{N(\varphi, j)\}_{j \neq \varphi}$. As in state $R$, players $I \setminus \{j\}$ in $N(\varphi, j)$ draw $p^l_{\varphi,j,t} \in \{1, \ldots, |A_\varphi|\}$ and $p^m_{\varphi,j,t} \in \{1, \ldots, \frac{1}{\varepsilon}\}$. Again, we assume that the distributions are uniform and jointly independent. Knowing $\{p^l_{\varphi,j,t}\}_{j \neq \varphi}$ and $\{p^m_{\varphi,j,t}\}_{j \neq \varphi}$, player $\varphi$ calculates $p^l_{\varphi,t} = \sum_{j \neq \varphi} p^l_{\varphi,j,t} \ \mathrm{mod}\ |A_\varphi|$ and $p^m_{\varphi,t} = \sum_{j \neq \varphi} p^m_{\varphi,j,t} \ \mathrm{mod}\ \frac{1}{\varepsilon}$.

Following this randomization, in the action phase, player $\varphi \neq i$ takes $a_{\varphi,t} = a^{-i}_{\varphi,t}$ if $p^m_{\varphi,t} \neq 1$, and takes $a_{\varphi,t} = a_\varphi(p^l_{\varphi,t})$ if $p^m_{\varphi,t} = 1$. Here, $a^{-i}_{\varphi,t}$ is player $\varphi$'s action in the target action profile $a^{-i}_{-i,t}$. On the other hand, player $i$ takes a best response to players $-i$'s action distribution. Note that, from player $i$'s perspective, given the variables of the groups in which $i$ is included (that is, $p^m_{\varphi,k,t}$ and $p^l_{\varphi,k,t}$ for each $k \neq \varphi, i$), $p^l_{\varphi,t}$ is uniformly distributed over $\{1, \ldots, |A_\varphi|\}$ and $p^m_{\varphi,t}$ is uniformly distributed over $\{1, \ldots, \frac{1}{\varepsilon}\}$. Hence, player $i$ knows players $-i$'s action distribution, and players $-i$'s action distribution is equal to the minmax one, $\hat{\alpha}_{-i}$, with probability at least $1 - (n-1)\varepsilon$. Hence, player $i$'s expected payoff in state $P_i$ may be made arbitrarily close to $\underline{u}_i$ by taking $\varepsilon$ sufficiently small.

Finally, the ex post communication phase is as follows. In this phase, all communication is public.

1. As in state $R$, each group publicizes $p^m_{\varphi,j,t}$ and $p^l_{\varphi,j,t}$. That is, each player $k$ publicly reports $l_{\varphi,j,t}(k) = p^l_{\varphi,j,t}$ and $m_{\varphi,j,t}(k) = p^m_{\varphi,j,t}$ for each $\varphi \neq i$ and $j$ such that $I \setminus \{j\}$ includes $k$. In addition, players $-i$ publicize the target action $a^{-i}_{-i,t}$. That is, each player $k \neq i$ publicly reports $\hat{a}^{-i}_{-i,t}(k) = a^{-i}_{-i,t}$.

2. At the same time, let $\tau$ be the most recent period in which the state was $s \neq P_i$. The players resend the messages that they are supposed to send in the ex post communication phase in period $\tau$: If the most recent state was $R$, then for each player $i' \in I$, player $k$ publicizes $l_{i',j,\tau}(k) = p^l_{i',j,\tau}$ and $m_{i',j,\tau}(k) = p^m_{i',j,\tau}$ for each $j$ such that $I \setminus \{j\}$ includes $k$. On the other hand, if the most recent state was $P_{i'}$, then for each $\varphi \neq i'$, each player $k$ publicizes $l_{\varphi,j,\tau}(k) = p^l_{\varphi,j,\tau}$, $m_{\varphi,j,\tau}(k) = p^m_{\varphi,j,\tau}$, and $\hat{a}^{-i'}_{-i',\tau}(k)$ for each $j$ such that $I \setminus \{j\}$ includes $k$.

¹⁰ Intuitively, we interpret $m_{i,j,t}$ and $l_{i,j,t}$ as the true realizations of $p^m_{i,j,t}$ and $p^l_{i,j,t}$, using $(n-2)$-majority rule, and define $l_{i,t}$ and $m_{i,t}$ analogously to the definitions of $p^l_{i,t}$ and $p^m_{i,t}$.
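The modular-sum construction at the heart of this proof is worth isolating: summing uniform shares modulo $|A_i|$ produces a draw that is uniform from the viewpoint of any player who misses even one share. A minimal sketch (ours):

```python
import random

def draw_pl(i, players, modulus, rng):
    """Each group N(i, j), j != i, contributes one uniform share p^l_{i,j,t};
    player i sees all shares and computes the modular sum p^l_{i,t}."""
    shares = {j: rng.randrange(modulus) for j in players if j != i}
    return shares, sum(shares.values()) % modulus

rng = random.Random(0)
shares, pl = draw_pl(i=1, players=[1, 2, 3, 4], modulus=3, rng=rng)
print(shares, pl)
# Each opponent j misses exactly the share drawn by the group N(i, j) that
# excludes her, and conditional on all the other shares that one missing
# uniform share makes the sum uniform on {0, ..., modulus - 1}.  This is
# why player i's mixing looks like an ordinary full-support draw to every
# opponent, which is what Properties 1 and 2 of the proof require.
```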