Belief-free Equilibria in Repeated Games

Jeffrey C. Ely    Johannes Hörner    Wojciech Olszewski

November 6, 2003

Abstract

We introduce a class of strategies which generalizes examples constructed in two-player games under imperfect private monitoring. A sequential equilibrium is belief-free if, after every private history, each player's continuation strategy is optimal independently of his belief about his opponent's private histories. We provide a simple and sharp characterization of equilibrium payoffs using those strategies. While such strategies have desirable robustness properties, they are not rich enough to generate a folk theorem in most games besides the prisoner's dilemma, even when noise vanishes.

Acknowledgements and affiliations (first-page footnotes): The present paper develops some ideas from Ely and Välimäki (2002) and Hörner and Olszewski (2002). We are indebted to Juuso Välimäki for his work on the project in its early stages. We are also grateful to Marcin Peski for his help with removing a flaw from Matsushima (2002), to Ichiro Obara for pointing out an important error in an early draft of Hörner and Olszewski (2002), and to Rakesh Vohra for his suggestions on the dynamic programming literature. The paper has been presented at the Canadian Economic Theory Conference in 2003, and in workshops at Northwestern University, the University of Chicago, and Harvard-MIT. We are grateful to those audiences for comments. Jeffrey C. Ely: Department of Economics, Boston University, ely@bu.edu; financial support from NSF grant #998546 is gratefully acknowledged. Johannes Hörner: Department of Managerial Economics and Decision Sciences, Kellogg Graduate School of Management, j-horner@kellogg.northwestern.edu. Wojciech Olszewski: Department of Economics, Northwestern University, wo@northwestern.edu.

1 Introduction

(Infinitely) repeated games have widespread application in economics as simple and tractable models of ongoing strategic relationships between agents. The tractability of the repeated game model is due to its recursive nature: the game that begins at date t is identical to the game that was begun at date 0. A powerful set of analytical techniques has been developed to characterize behavior in repeated games.

These methods exploit a recursive property of equilibrium made possible by the recursive structure of the repeated game. Abreu, Pearce, and Stacchetti (1990), Fudenberg, Levine, and Maskin (1994) and others have applied these techniques to a special class of equilibria, referred to as public equilibria, in which the behavior of all players is contingent only on information that is publicly known. While this restriction rules out some sequential equilibria, there is an important class of economic environments in which the restriction entails little loss of generality. In particular, if all of the information player i obtains about the behavior of his rivals is also public information (these are games with public monitoring), then all pure-strategy sequential equilibria are observationally equivalent to public equilibria; see Abreu, Pearce, and Stacchetti (1990). Moreover, it is an implication of Fudenberg, Levine, and Maskin (1994) that in games with public monitoring, all (pure or mixed) sequential equilibrium payoffs can be obtained by public equilibria, provided information is sufficiently rich and the players are sufficiently patient (this is the folk theorem). (Footnote 1: Moreover, the above results leave open the possibility that for a fixed discount factor, or for information structures that are not sufficiently rich, some mixed-strategy sequential equilibrium may achieve more than any public equilibrium. In fact this possibility has been demonstrated by Fudenberg and Tirole (1991) (exercise 5.10) and Kandori and Obara (2003). See also Mailath, Matthews, and Sekiguchi (2002) for similar examples in finitely repeated games.)

Still, the restriction to games with public monitoring leaves out many potential applications. A well-known example is a repeated oligopoly model in which firms compete in prices and neither these prices nor the firms' sales are public information. This is the secret price-cut model of Stigler (1964). For these repeated games with private monitoring, public strategies accomplish very little, and so to determine the equilibrium possibilities it is necessary to investigate strategies in which continuation play can be contingent on information held privately by the players. The difficulty appears to be that the recursive structure of equilibria is then lost; see Kandori (2002). Recently, however, some advances in the analysis of repeated games with private monitoring have made this obstacle appear less severe than at first glance. In the context of a two-player repeated prisoners' dilemma, Piccione (2002) and Ely and Välimäki (2002) (hereafter VPE) identified a new class of sequential equilibrium strategies that can be analyzed using recursive techniques and showed that this class is sufficiently rich to establish a version of the folk theorem for that game. The key characteristic of these strategies is that when they are used, the optimal continuation strategy for a given player i is independent of that player's own history. This means that the history of the other player is a sufficient statistic for player i's payoffs and can thus be used as a state variable in a dynamic programming formulation of player i's optimization. In this paper, we look at two-player repeated games with private monitoring and consider strategies with exactly this property. We call the property belief-free because it implies that a player's belief about his opponent's history is not needed for computing a best reply. Thus, the daunting complexity of tracking a player's beliefs over time to ensure that his strategy remains a best reply is avoided for equilibria involving belief-free strategies.
Belief-free strategies are attractive from the standpoint of robustness as well: because a player's beliefs are irrelevant, the player need not worry whether he may have mis-interpreted his information about the past. Furthermore, equilibria depend only on the marginal distributions of each player's signal and not on the correlation. We demonstrate that these equilibria can be analyzed using recursive techniques which build on a variation of the concepts of self-generation and factorization due to Abreu, Pearce, and Stacchetti (1990). We say that a payoff vector (v_1, v_2) is strongly generated by a set of vectors W_1 × W_2 if there is a mixed action profile α from the stage game such that, for each player i, continuation values can be selected from W_i so that α_i is a best reply for player i and results in total payoff v_i no matter what action from the support of α_{-i} he expects his opponent to play. It is the last condition that we add to the Abreu, Pearce, and Stacchetti (1990) version of self-generation, and this is the condition that captures the belief-free property of the equilibria we characterize. Let B(W) be the set of all vectors strongly generated by W. We say that a set W for which W ⊆ B(W) is strongly self-generating, and show that all members of a strongly self-generating set are belief-free equilibrium payoffs. Furthermore, the set of all belief-free equilibrium payoffs of a given game is itself a strongly self-generating set, indeed the largest such set. Finally, we characterize the structure of this largest strongly self-generating set W*. In particular, we show that iteratively applying the set-valued mapping B(·), beginning with the feasible set of payoffs, results in a shrinking sequence of sets whose intersection is W*. Furthermore, we show a version of the bang-bang principle for belief-free equilibrium payoffs: any strongly self-generating set W of payoffs can be supported by belief-free strategies that are implementable by state automata whose only continuation payoffs are the extreme points of W, such as the strategies used in Ely and Välimäki (2002).

Next we consider two limiting cases: increasing patience and increasing monitoring accuracy. These are also the limits considered by VPE. For increasing patience (δ → 1), we show that the limiting set of equilibria can be easily characterized by a family of linear programs. This characterization is a version of the methods introduced by Fudenberg and Levine (1994) and adapted by Kandori and Matsushima (1998) to characterize the limiting set of public equilibria. (Footnote 2: Fudenberg and Levine (1994) analyzed public equilibria in games with long-run and short-run players, and Kandori and Matsushima (1998) looked at equilibria of games with private monitoring in which all payoff-relevant behavior was conditioned on public announcements made by the players.) As an example of this method, the maximum payoff for player i can be found by considering an auxiliary contracting problem in which player -i chooses a mixed action and assigns utility penalties to player i as a function of observed signals, in order to induce i to play a mixed action that results in as high as possible a net utility for i. We use this characterization to discover the boundaries of the belief-free equilibrium payoff set under the second limit, as the players monitor one another with increasing accuracy. We find a simple formula that can easily be computed by linear programming methods and apply it to a series of examples.
These examples show that belief-free equilibria can support a large set of payoffs, but the prisoner's dilemma considered by VPE is apparently exceptional: in general this set is not large enough to establish the folk theorem.

As a final application of our techniques, we consider the special case of games with independent monitoring. These are games in which, conditional on the chosen action profile, the players observe statistically independent signals. Whereas the folk theorem of VPE required signals to be nearly perfectly informative, recently Matsushima (2002) demonstrated the folk theorem for the prisoner's dilemma with conditionally independent but only minimally informative signals. This was accomplished by augmenting the strategies used by Ely and Välimäki (2002) with a review phase. We apply our results to provide a sufficient condition for equilibrium payoffs in arbitrary two-action games with independent monitoring. The condition generalizes the result of Matsushima (2002), and our techniques simplify the argument.

The remainder of this paper is organized as follows. Section 2 is a tour through the results of this paper by means of a worked example. In Section 3 we introduce the notation used in the paper, present the definition of belief-free strategies, and establish some preliminary results. Section 4 introduces the concept of strong self-generation and uses it to characterize the set of belief-free equilibrium payoffs. The bang-bang result appears here. In Section 5 we present our characterization results for discount factors near 1 and near-perfect monitoring structures, and demonstrate their use with some examples. We also show here how our results can be applied to games with infinitely many strategies. Finally, Section 6 takes up the case of independent monitoring and Section 7 concludes.

2 Introductory Example: Partnership Game

To introduce the ideas and techniques of this paper, consider the following partnership game, a private-monitoring version of a game studied by Radner, Myerson, and Maskin (1986), Abreu, Milgrom, and Pearce (1991), and Kandori and Obara (2003). There are two players working on a common project. The two players simultaneously decide whether to work (W) or shirk (S). Mixed actions are denoted α ∈ [0, 1] and represent the probability of W. Each player realizes a profit from the project, which can be either high (H) or low (L). Let m(σ_i | a_1, a_2) denote the marginal probability that player i realizes profit level σ_i ∈ {H, L} when the action (i.e. effort) choices are (a_1, a_2) ∈ {W, S} × {W, S}. Effort is costly, but raises the probability of high profits. Specifically, we assume that the cost and profit distributions are such that the players' expected net payoffs, expressed as a function of the two players' efforts, are represented (perhaps after normalization) by the following table, where g and l are positive numbers.

                 W              S
    W         1, 1          -l, 1 + g
    S       1 + g, -l         0, 0

    Figure 1: Normalized Partnership Game

For simplicity, we shall suppose that the profit distributions depend only on the number of players exerting effort and not on their identity. Let p_k denote the marginal probability that a player realizes low profits when k ∈ {0, 1, 2} players choose W. Note that we are assuming that the individual players' effort-conditioned profit distributions are symmetric and anonymous, but we make no assumptions on the correlation in profits across players.

For notational convenience, define the likelihood ratios r̄ = p_1/p_2 and r_0 = p_0/p_1, and the values

    v̄ = 1 - g p_2/(p_1 - p_2)    and    v = l (1 - p_0)/(p_0 - p_1).

We will assume that the likelihood ratios are sufficiently large that v̄ > v. A special case of this model is perfect correlation of the players' realized profits. In this case, the (common) realized profit is always public information, and the game reduces to one of imperfect public monitoring. This is the case analyzed by Abreu, Milgrom, and Pearce (1991), and these authors showed that v̄ is the maximum symmetric perfect public equilibrium payoff. (Footnote 3: Kandori and Obara (2003) also analyzed the public monitoring special case. They showed that for some parameter values, there are equilibria in non-public strategies with payoffs exceeding v̄.) We will now show, using the techniques introduced in this paper, that under our assumption on the likelihood ratios, for discount factors close enough to 1, this value can be achieved in a symmetric belief-free equilibrium for any degree of correlation in the players' profits.

Continuation Strategies and Belief-Free Equilibria

In the repeated partnership game, in each period players choose whether to work or shirk as a function of their private history: their own history of past effort levels and realized profits. The continuation strategy used by player i starting in period t thus depends on the private history h_i observed by player i. For each such continuation strategy, there is a set of best-reply continuation strategies for player -i. In general, because player i cannot infer h_{-i} from his own private history, i's best reply at period t is based on an expected payoff calculation relative to i's conditional beliefs about h_{-i}. Among other difficulties, the need to track these conditional beliefs over time and across histories makes verification of sequential rationality an intractable problem in general. (Footnote 4: See Kandori (2002) for further discussion of the difficulties of characterizing sequential equilibrium in repeated games with private monitoring.) We follow VPE by restricting attention to strategy profiles in which the set of best replies for i is independent of the private history of player -i. We call such strategies belief-free. In a sequential equilibrium involving belief-free strategies (belief-free equilibrium for short), there is a given set of actions A_i^t such that player i is willing to play any action in A_i^t regardless of the private history i has observed.

Simply set A_i^t to be the set of actions that are played by some optimal continuation strategy. Since the belief-free property implies that the set of optimal continuation strategies is history-independent, each element of A_i^t will be optimal at each of i's t-length private histories. We refer to the set A^t = A_1^t × A_2^t as the regime at period t. Belief-free equilibria can be classified according to the sequence of regimes A^1, A^2, .... For the partnership application, we will characterize the payoffs arising from a particularly simple class of belief-free equilibria: those in which the regime is constant over time and equal to the full set A = {W, S} × {W, S} of action profiles. Notice that in such a belief-free equilibrium every repeated game strategy is a best-response continuation strategy after every history.

Continuation Values and Strong Self-Generation

In any sequential equilibrium, for each player there is a set V_i of potential continuation values. These are the payoffs from continuation strategy profiles that could arise after some finite history of play. In general, i's continuation value in period 2 depends on the first-period private histories of both players, since continuation strategies do. In a belief-free equilibrium, the continuation value of player i can be treated as a function w_i(a_{-i}, σ_{-i}) only of the private history of -i. That is because, given the private history of -i, each continuation strategy played by i is a best reply and hence achieves the same value. This enables us to decompose the belief-free equilibrium strategy of player -i into the initial mixed action α_{-i} and the continuation value function w_i mapping -i's private histories into V_i. With this representation of -i's strategy, we can determine the repeated game payoff of player i as a function only of i's first-stage action. Indeed, the first-stage action determines (together with α_{-i}) the first-stage payoff as well as the distribution over (discounted) continuation payoffs. In order for the equilibrium to be belief-free with the constant regime A, it must be that both a_i = W and a_i = S deliver repeated game payoff v_i, i.e.

    v_i = (1 - δ) u_i(a_i, α_{-i}) + δ E(w_i | a_i, α_{-i}),

where for any f : {W, S} × {H, L} → R we write

    E(f | a_i, α_{-i}) = Σ_{a_{-i} ∈ {W,S}} Σ_{σ_{-i} ∈ {H,L}} α_{-i}(a_{-i}) m(σ_{-i} | a_i, a_{-i}) f(a_{-i}, σ_{-i})

for the conditional expectation of f. In this case we say that v_i is strongly generated by V_i. We say that a set of payoffs V_i is strongly self-generating for player i if every v ∈ V_i is strongly generated by V_i. The concept of self-generation was introduced by Abreu, Pearce, and Stacchetti (1990) to characterize equilibria of repeated games with imperfect public monitoring. In general, no such simple characterization is available for repeated games with private monitoring. Our strong version of self-generation adds to theirs the constraint that the player be indifferent among the actions in the given regime. With this constraint added, we show (Proposition 2) that strongly self-generating sets of payoffs are belief-free sequential equilibrium payoffs.

The proof of this result is a straightforward adaptation of the original proof for games with public monitoring. Note that to show that an interval V_i is strongly self-generating, it suffices to show that the endpoints are strongly generated by V_i. To see this, note that the constraints are linear. Thus, if they are satisfied for the pairs (α_{-i}, w_i) and (α'_{-i}, w'_i), then they are satisfied for any convex combination of these. If the endpoints of the interval are strongly generated, then taking convex combinations will deliver all payoffs in the interior of the interval.

Analysis

Just as in the public monitoring case, where self-generation has yielded many simple techniques for further characterizing equilibrium payoffs (see Fudenberg and Levine (1994) and Kandori and Matsushima (1998)), we provide a simple linear-programming recipe for finding strongly self-generating sets. To illustrate the technique in the context of the partnership example, consider the following static moral hazard contracting problem. Player 1 will select a mixed action α_1 and make transfer payments to player 2 depending on both the realization of α_1 and his realized profit σ_1. Denote by x_2(a_1, σ_1) the payment to player 2. Faced with this contract, player 2 will select an action a_2 and obtain utility

    u_2(a_2, α_1) + E(x_2 | a_2, α_1).

The objective is to select α_1 and x_2 in order to provide player 2 a given utility level v_2 and to make player 2 indifferent between W and S. This gives rise to the following constraints:

    v_2 = u_2(a_2, α_1) + E(x_2 | a_2, α_1)    for both a_2 = W, S.

Consider any v̄ that can be provided in this way subject to the additional constraint that x̄_2(·, ·) ≤ 0. Also suppose that some v < v̄ can be provided when we instead impose the constraint that x_2(·, ·) ≥ 0. Let ᾱ_1 and α_1 be the corresponding mixed action choices for player 1. We can show that the interval [v, v̄] is strongly self-generating for player 2 when δ is close enough to 1. To this end, define continuation values

    w_2(a_1, σ_1) = v + ((1 - δ)/δ) x_2(a_1, σ_1).

Note that if x_2(·, ·) ≥ 0, then for δ close enough to 1 each continuation value is within [v, v̄]. Substituting into the constraints for providing v, we have

    v = u_2(a_2, α_1) + (δ/(1 - δ)) [ -v + E(w_2 | a_2, α_1) ].

Adding (δ/(1 - δ)) v to both sides and multiplying by (1 - δ) shows that v is strongly generated by [v, v̄]. A similar manipulation shows that v̄ is strongly generated by [v, v̄].

With these results in hand, we can now show that v̄ is a belief-free symmetric equilibrium payoff. Our assumption on the likelihood ratios guarantees that v < v̄, and thus we need only show that v̄ can be provided using non-positive transfers and v can be provided using non-negative transfers. To implement v̄, set ᾱ_1 = 1 and x̄_2(W, H) = 0. With such a contract, a choice of W earns player 2 utility 1 + p_2 x̄_2(W, L), and S obtains (1 + g) + p_1 x̄_2(W, L). Equating these gives x̄_2(W, L) = -g/(p_1 - p_2), and this provides utility

    1 - g p_2/(p_1 - p_2),

which is another way of writing v̄. Similarly, to implement v, set α_1 = 0, x_2(S, L) = 0 and x_2(S, H) = l/(p_0 - p_1).
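To make the arithmetic above concrete, here is a short Python sketch. The parameter values (g, l, p_0, p_1, p_2, δ) are illustrative assumptions, not values from the paper; the script computes the transfers, the resulting bounds v̄ and v, and the implied continuation values, and checks that the latter stay inside [v, v̄].

    # Partnership example: static contracting transfers and the implied bounds.
    # Illustrative parameters (assumed), chosen so that v_bar > v_low.
    g, l = 0.4, 0.4            # gain from shirking, loss from working alone
    p0, p1, p2 = 0.9, 0.5, 0.1 # P(own profit low | k = 0, 1, 2 players work)
    delta = 0.95               # discount factor close to 1

    # Implementing v_bar: alpha_1 = 1 (work), fines when player 1's profit is low.
    x_bar_WL = -g / (p1 - p2)            # x_bar_2(W, L); x_bar_2(W, H) = 0
    v_bar = 1 + p2 * x_bar_WL            # = 1 - g * p2 / (p1 - p2)

    # Implementing v_low: alpha_1 = 0 (shirk), bonus when player 1's profit is high.
    x_SH = l / (p0 - p1)                 # x_2(S, H); x_2(S, L) = 0
    v_low = (1 - p0) * x_SH              # = l * (1 - p0) / (p0 - p1)

    assert v_bar > v_low, "likelihood ratios too small for the construction"

    # Continuation values w = endpoint + ((1 - delta)/delta) * transfer must
    # stay inside [v_low, v_bar] once delta is close enough to 1.
    w_from_vbar = v_bar + (1 - delta) / delta * x_bar_WL
    w_from_vlow = v_low + (1 - delta) / delta * x_SH
    assert v_low <= w_from_vbar <= v_bar and v_low <= w_from_vlow <= v_bar

    print(f"v_bar = {v_bar:.3f}, v_low = {v_low:.3f}")
    print(f"continuation values: {w_from_vbar:.3f}, {w_from_vlow:.3f}")

With these (assumed) numbers the transfers are plus/minus one, and the bounds come out as v̄ = 0.9 and v = 0.1.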

3 Definitions and Preliminary Results

We analyze two-player repeated games with imperfect monitoring. Each player i = 1, 2 has a finite action set A_i and a finite set of signals Σ_i. An action profile is an element of A := A_1 × A_2. We use ∆W to represent the set of probability distributions over a finite set W, and P(W) to represent the collection of all non-empty subsets of W. Let ∆A_i and ∆A_1 × ∆A_2 represent, respectively, the set of mixed actions for player i and the set of mixed action profiles. For each possible action profile a, the monitoring distribution m(· | a) specifies a joint probability distribution over the set of signal profiles Σ := Σ_1 × Σ_2. When action profile a is played and signal profile σ is realized, player i observes his corresponding signal σ_i. Let m_i(· | a) denote the marginal distribution of i's signal. Letting ũ_i(a_i, σ_i) denote the payoff to player i from action a_i and signal σ_i, we can represent stage payoffs as a function of mixed action profiles alone:

    u_i(α) = Σ_a α(a) Σ_{σ_i ∈ Σ_i} m_i(σ_i | a) ũ_i(a_i, σ_i).

Repeated game payoffs are evaluated using the discounted average criterion. The players share a common discount factor δ < 1. Let U_i(s) denote the expected discounted average payoff to player i in the repeated game when the players use strategy profile s. A t-length (private) history for player i is an element of H_i^t := (A_i × Σ_i)^t. A pair of t-length histories (called simply a history) is denoted h^t. Each player's initial history is the null history, denoted ∅. Let H^t denote the set of all t-length histories, H_i^t the set of i's private t-length histories, H = ∪_t H^t the set of all histories, and H_i = ∪_t H_i^t the set of all private histories for i. A repeated-game (behavior) strategy for player i is a mapping s_i : H_i → ∆A_i. A strategy profile is denoted s. For a history h_i^t, let s|h_i^t denote the continuation strategy derived from s following history h_i^t. Specifically, if h_i ĥ_i denotes the concatenation of the two histories h_i and ĥ_i, then s|h_i is the strategy defined by (s|h_i)(ĥ_i) = s(h_i ĥ_i). Given a strategy profile s, for each t and h_{-i}^t ∈ H_{-i}^t, let B_i(s_{-i}|h_{-i}^t) denote the set of continuation strategies for i that are best replies to s_{-i}|h_{-i}^t.

Definition 1. A strategy profile s is belief-free if, for every h^t, s_i|h_i^t ∈ B_i(s_{-i}|h_{-i}^t) for i = 1, 2.

It is immediate that every belief-free strategy profile is a sequential equilibrium. We will therefore speak directly of belief-free equilibria. Belief-free equilibria are robust to certain perturbations in the monitoring structure. For example, a belief-free equilibrium for a given monitoring structure remains a sequential equilibrium for any monitoring structure with the same marginal distributions. This follows because, for a given continuation strategy of player -i, the payoff to any strategy of player i depends only on the induced distribution over the signals of player i. Thus, the set B_i(s_{-i}|h_{-i}^t) depends only on the marginal distribution over these signals. This observation implies an additional robustness property: if, given s_{-i}, s_i is an optimal strategy which is belief-free, then it is still optimal even if player i's ability to monitor player -i's action is randomly perturbed. Conversely, if player i's strategy is robust in this sense, then it must be belief-free. Observe that public strategies in games with (perfect or imperfect) public monitoring are belief-free (because private histories are trivial). Notice also that in the literature on private monitoring, the strategies used by VPE are belief-free. On the other hand, the strategies used by Sekiguchi (1997), Bhaskar and Obara (2002), Mailath and Morris (2002) and Matsushima (2002) are not. There always exists a belief-free equilibrium, since any history-independent sequence of static equilibrium action profiles is belief-free.

3.1 Regimes

Suppose that s is a belief-free equilibrium strategy profile. It is convenient to describe belief-free equilibria in terms of the sets of optimal actions in a given period t:

    A_i^t = {a_i ∈ A_i : there exists h_i^t ∈ H_i^t such that s_i(h_i^t)[a_i] > 0}.

We refer to A^t = A_1^t × A_2^t as the regime that prevails at date t. Denote the set of all regimes by J := P(A_1) × P(A_2). Every belief-free equilibrium gives rise to a sequence of (non-empty) regimes {A^t}.

3.2 Exchangeability

As we show in this section, belief-free equilibria satisfy an exchangeability property, similar to the exchangeability of Nash equilibria in two-person zero-sum games.

In particular, given two distinct belief-free equilibria, each governed by the same sequence of regimes, we obtain a new belief-free equilibrium by pairing player 1's strategy in the first equilibrium with player 2's strategy in the second. This property will be used to show that for any given sequence of regimes, the set of belief-free equilibrium payoffs has a product structure.

Proposition 1. Let {A^t} be a sequence of regimes and let s, s̃ be two belief-free equilibria with regime sequence {A^t}. Then the profiles (s_1, s̃_2) and (s̃_1, s_2) are also belief-free equilibria with regime sequence {A^t}.

Proof: We will show a stronger result: any strategy z_1 which adheres to the regime sequence {A^t} (i.e. z_1(h_1^t) ∈ ∆A_1^t for every h_1^t ∈ H_1) is a belief-free sequential best reply to s_2, i.e. z_1|h_1^t ∈ B_1(s_2|h_2^t) for all t, h_1^t and h_2^t. It suffices to consider pure strategies z_1. For each t = 0, 1, ... and each history h_1^t, there exists a history h̃_1^t such that the mixed action s_1(h̃_1^t) assigns positive probability to the pure action z_1(h_1^t). This is because z_1(h_1^t) ∈ A_1^t and A^t is the regime governing s in period t. We define a (continuation) strategy ẑ_1|h_1^t to be the strategy which begins by playing z_1(h_1^t) and thereafter reverts to s_1|h̃_1^t. Note that ẑ_1|h_1^t ∈ B_1(s_2|h_2^t) for all t, h_1^t and h_2^t. This is because s_1|h̃_1^t ∈ B_1(s_2|h_2^t), and ẑ_1|h_1^t differs from s_1|h̃_1^t only in the initial period, in which it plays one of the actions assigned positive probability by s_1(h̃_1^t). Now we construct a sequence of strategies for player 1, z_1^t for t = 0, 1, .... First we set z_1^0 = z_1. Next, we inductively define z_1^t by z_1^t(h_1^τ) = z_1(h_1^τ) if τ < t, and z_1^t|h_1^t = ẑ_1|h_1^t for every h_1^t. Observe that z_1^t|h_1^t = ẑ_1|h_1^t ∈ B_1(s_2|h_2^t) for all t, h_1^t and h_2^t. Now, z_1^{t+1}|h_1^t differs from z_1^t|h_1^t only by replacing its continuation strategies with ẑ_1|h_1^{t+1}. Since this cannot reduce the payoff, we have z_1^{t+1}|h_1^t ∈ B_1(s_2|h_2^t), and by induction

    z_1^{t+k}|h_1^t ∈ B_1(s_2|h_2^t)    for all t ≥ 0, k ≥ 0, h^t.    (1)

By construction, for all k ≥ 0, z_1^{t+k}(h_1^t) = z_1(h_1^t), and thus for any fixed h_1^t the sequence of continuation strategies z_1^{t+k}|h_1^t converges, as k → ∞, to z_1|h_1^t history-by-history, i.e. in the product topology. Because discounted payoffs are continuous in the product topology, (1) implies

    z_1|h_1^t ∈ B_1(s_2|h_2^t)    for all t, h_1^t and h_2^t,

which is what we set out to show.

As a corollary, we have that the set of all belief-free equilibrium payoffs for a given sequence of regimes is a product set.

Corollary 1. Let W({A^t}) be the set of all payoffs arising from belief-free equilibria using regime sequence {A^t}. Then W({A^t}) = W_1 × W_2 for some subsets W_1 ⊆ R and W_2 ⊆ R.

4 Strong Self-Generation

In this section we develop a generalization of the Abreu, Pearce, and Stacchetti (1990) concept of self-generation to characterize belief-free equilibrium payoffs.

Definition 2. Let W = W_1 × W_2 ⊆ R^2 and let A = A_1 × A_2 be a regime. A payoff vector v ∈ R^2 is generated by W using A if for each player i there is a mixture α_{-i} ∈ ∆A_{-i} and a continuation payoff function w_i : A_{-i} × Σ_{-i} → W_i such that

    v_i ≥ (1 - δ) u_i(a_i, α_{-i}) + δ Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} α_{-i}(a_{-i}) m_{-i}(σ_{-i} | a_i, a_{-i}) w_i(a_{-i}, σ_{-i})    (2)

for each a_i, with equality for each a_i ∈ A_i.

When there exist α_{-i} and w_i as in the definition, we say that the pair (α_{-i}, w_i) enforces A_i and generates v_i. Let B^A(W) designate the set of payoff vectors generated by W using A. If W ⊆ B^A(W), then we say that W is strongly self-generating using A. Note that B^A(W) is defined only in terms of marginal signal distributions, without any requirement on the joint distributions.

Proposition 2. If W is strongly self-generating using A, then each element of W is a belief-free equilibrium payoff with constant regime A. Conversely, the set of all belief-free equilibrium payoffs using the constant regime A is itself a strongly self-generating set.

Proof: Let v = (v_i, v_{-i}) belong to W. We will show that player -i has a strategy which randomizes over A_{-i} after every history, against which player i's maximum payoff is v_i, and that this payoff is achieved by any strategy which randomizes over A_i after every history. Since the symmetric argument implies the same conclusion with the roles reversed, these strategies form a belief-free profile. Since W ⊆ B^A(W), for each u ∈ W_i there is a mixture α_{-i}^u ∈ ∆A_{-i} and a continuation value function w_i^u which satisfy (2). Construct a Markovian strategy for player -i as follows. The state of the strategy will be the continuation value for player i. In any stage in which the state is u, player -i will play mixed action α_{-i}^u. When in state u and having played action a_{-i} and observed signal σ_{-i}, player -i will transit to state w_i^u(a_{-i}, σ_{-i}). The initial state will be v_i. It now follows from equation (2) and the one-stage deviation property that, when the marginal distribution of i's signal is given by m_i, any strategy for player i which randomizes over A_i after every history is a best reply and achieves payoff v_i.

To prove the converse, consider any belief-free equilibrium s with regime A. Write w_i(h) = U_i(s|h) for the continuation payoff to player i after history h. Let W_i be the set of all possible continuation values for i in the equilibrium s. Formally, W_i = {w_i(h) : h ∈ H}.

Let us first observe that in a belief-free equilibrium, w_i(h) depends only on h_{-i}. To see why, suppose w_i(h_i, h_{-i}) > w_i(h̃_i, h_{-i}). Then s_i|h̃_i is not a best reply to s_{-i}|h_{-i}, implying that s is not belief-free. We can thus write w_i(h) = w_i(h_{-i}). Consider any date t + 1 and history h^t. Because s is governed by regime A, the mixed action α_{-i} := s_{-i}(h_{-i}^t) played by i's opponent after h^t belongs to ∆A_{-i}. Let a_i ∈ A_i. Then there is a best-reply continuation strategy ŝ_i for i which plays a_i after history h_i^t; moreover, a best-reply continuation strategy must induce a best reply after each possible subsequent history h_i^{t+1}, thus ŝ_i is a best reply at each h_i^{t+1}. Because the equilibrium is belief-free, U_i(ŝ_i, s_{-i}|h_{-i}^{t+1}) = w_i(h_{-i}^{t+1}) for every h_{-i}^{t+1}. We can write h_{-i}^{t+1} = (h_{-i}^t; a_{-i}, σ_{-i}). In this way we can view w_i(h_{-i}^{t+1}) as a continuation value function w_i(h_{-i}^t; ·, ·) taking values in W_i. The payoff to i from using ŝ_i against s_{-i}|h_{-i}^t can thus be written

    (1 - δ) u_i(a_i, α_{-i}) + δ Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} α_{-i}(a_{-i}) m_{-i}(σ_{-i} | a_i, a_{-i}) w_i(h_{-i}^t; a_{-i}, σ_{-i}).

Since ŝ_i is a best reply against s_{-i}|h_{-i}^t, this is equal to w_i(h^t). Moreover, since a_i was an arbitrary element of A_i, this equality holds for all a_i ∈ A_i. Finally, since s is belief-free with regime A, player i cannot achieve a greater continuation value with a strategy that begins with some action outside A_i. Thus, w_i(h^t) must be greater than or equal to the above expression when a_i ∉ A_i. This shows that the pair (s_{-i}(h_{-i}^t), w_i(h_{-i}^t; ·, ·)) enforces A_i and generates w_i(h^t). Since h^t was arbitrary, every element of W_i can be so generated. Applying the same argument for player -i shows that the set W = W_1 × W_2 is strongly self-generating. Now let W be the union of all continuation values occurring along histories of all belief-free equilibria with regime A. The set W is the union of strongly self-generating sets and is therefore strongly self-generating.

The previous proposition characterized belief-free equilibria with a single regime prevailing in every period. Now suppose that W is strongly self-generating using some regime A, and consider the set B^{A'}(W) for another regime A'. Each element of this set can be generated by a mixed action profile in A' and continuation payoffs in W. It follows that any such payoff can be sustained by a belief-free equilibrium which begins in regime A' and remains in regime A thereafter. We can construct more and more belief-free equilibrium payoffs by considering arbitrary sequences of regimes in this manner. Rather than pursue this directly, however, we shall proceed in a slightly different way, by allowing the players access to a public randomization device over regimes. At first glance, it may seem that public randomization and private monitoring do not go together, but it will be clear that public randomizations are used as a substitute for sequences of regimes, in a manner similar to the case of perfect monitoring where public randomizations substitute for transitions among mixed action profiles.

4.1 Public Randomization

We will suppose that in each period, the players publicly observe the outcome of a lottery over the set of regimes. The interpretation is that when a certain regime A is realized in a certain period, the players will play action profiles from A. Of course, in equilibrium it must be optimal for a given player i to select his action from A_i. When the players have access to such a public randomization device, a strategy now depends on the private history as well as the sequence of realizations of the public randomization. The definition of belief-free equilibrium must also be appropriately modified. In particular, the set of best replies can depend on the regime, but not on the private history. Formally, a public history of length t is a sequence y^t = (A^1, ..., A^t) ∈ Y^t of regimes representing the outcomes of the public randomization device over the first t periods. A strategy is now a mapping s_i : ∪_t (H_i^t × Y^t) → ∆A_i which specifies the mixed action to play for each public/private history pair. Continuation strategies s_i|(h_i^t, y^t) and best-reply continuation strategies B_i(s_{-i}|(h_{-i}^t, y^t)) are defined as before.

Definition 3. In the presence of a public randomization device, a strategy profile s is belief-free if, for all t, y^t ∈ Y^t, and h_i^t ∈ H_i^t, s_i|(h_i^t, y^t) ∈ B_i(s_{-i}|(h_{-i}^t, y^t)) for every h_{-i}^t ∈ H_{-i}^t.

Strong self-generation is now defined with respect to a fixed public randomization over regimes.

Definition 4. Let p ∈ ∆J be a public randomization over regimes and W = W_1 × W_2 a set of continuation payoffs. Say that v is p-generated by W if for each i and for each regime A ∈ J, there exist a mixture α_{-i}^A ∈ ∆A_{-i} and a continuation value function w_i^A : A_{-i} × Σ_{-i} → W_i such that for each a_i : J → A_i,

    v_i ≥ Σ_{A ∈ J} p(A) [ (1 - δ) u_i(a_i(A), α_{-i}^A) + δ Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} α_{-i}^A(a_{-i}) m_{-i}(σ_{-i} | a_i(A), a_{-i}) w_i^A(a_{-i}, σ_{-i}) ],    (3)

with equality if a_i(A) ∈ A_i for each A. We write v ∈ B_p(W) if v is p-generated by W. We will also abuse terminology and say that the α_{-i}^A and w_i^A generate v_i. The set W is strongly self-generating if W ⊆ B_p(W) for some public randomization p.

The following proposition provides an algorithm which can be used to compute the set of all strongly self-generating payoffs for a given p.

Proposition 3. For each p, there exists a maximal strongly self-generating set W*. W* is compact and convex. Let V be the convex hull of the feasible, individually rational payoffs. Set W^0 = B_p(V), and inductively W^τ = B_p(W^{τ-1}). Then W^τ ⊆ W^{τ-1} for each τ, and W* = ∩_{τ≥0} W^τ.
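To illustrate the algorithm in Proposition 3, the following Python sketch iterates the generation operator for the partnership example of Section 2, specialized, as simplifying assumptions for illustration rather than the paper's general procedure, to the constant full regime, an interval-valued payoff set for player 2, and a pure action for player 1 in each of the max/min problems; the numerical parameters are likewise assumptions. Each application of the operator is a small linear program.

    # Sketch: iterate the generation operator for the partnership game, restricted
    # to the constant regime {W,S} x {W,S}, interval payoff sets for player 2, and
    # a pure opponent action in each program (illustrative simplifications).
    from scipy.optimize import linprog

    g, l = 0.4, 0.4              # stage-game parameters (assumed)
    p0, p1, p2 = 0.9, 0.5, 0.1   # P(a player's profit is low | k partners work), assumed
    delta = 0.95                 # discount factor

    # Player 2's stage payoff and the probability that player 1's profit is low,
    # as functions of (a2, a1).
    u2   = {('W', 'W'): 1.0, ('S', 'W'): 1.0 + g, ('W', 'S'): -l, ('S', 'S'): 0.0}
    plow = {('W', 'W'): p2,  ('S', 'W'): p1,      ('W', 'S'): p1, ('S', 'S'): p0}

    def extreme_generated_value(lo, hi, a1, maximize):
        """Largest (or smallest) v strongly generated by [lo, hi] for player 2 when
        player 1 plays the pure action a1: player 2 must get exactly v from both W
        and S, with continuation values w(H), w(L) chosen from [lo, hi]."""
        A_eq, b_eq = [], []
        for a2 in ('W', 'S'):                                    # indifference constraints
            pL = plow[(a2, a1)]
            A_eq.append([1.0, -delta * (1 - pL), -delta * pL])   # v - d E[w] = (1-d) u
            b_eq.append((1 - delta) * u2[(a2, a1)])
        c = [-1.0, 0.0, 0.0] if maximize else [1.0, 0.0, 0.0]    # variables (v, wH, wL)
        res = linprog(c, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(None, None), (lo, hi), (lo, hi)])
        if not res.success:
            return None
        return -res.fun if maximize else res.fun

    # W^0 = feasible payoffs; W^tau = B(W^{tau-1}); the intersection is the limit.
    lo, hi = -l, 1.0 + g
    for _ in range(2000):
        new_hi = extreme_generated_value(lo, hi, 'W', maximize=True)
        new_lo = extreme_generated_value(lo, hi, 'S', maximize=False)
        if new_hi is None or new_lo is None or new_hi < new_lo:
            break
        lo, hi = new_lo, new_hi

    print(f"approximate belief-free interval for player 2: [{lo:.4f}, {hi:.4f}]")
    print(f"benchmarks from Section 2: v_bar = {1 - g * p2 / (p1 - p2):.4f}, "
          f"v_low = {l * (1 - p0) / (p0 - p1):.4f}")

Under these assumptions the iteration shrinks the interval geometrically (at rate δ) toward approximately [0.1, 0.9], matching the v and v̄ computed in Section 2.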

Proof: The following observation will be used repeatedly in the proof. For any two sets W, W' such that W ⊆ W', if v is generated by W then v is also generated by W'. Thus B_p(·) is monotonic, in the sense that B_p(W) ⊆ B_p(W') for W ⊆ W'. The union of strongly self-generating sets is itself strongly self-generating. This shows the existence of a maximal set W*. We first show that B_p(W*) is convex. Let v, v' be elements of B_p(W*). By the definition of B_p(W*), for each i and each A there are associated mixed actions α_{-i}^A(v), α_{-i}^A(v') and continuation value functions w_i^A(v) and w_i^A(v') used to generate v_i and v'_i respectively. By the linearity of the inequalities in the definition of strong self-generation, any convex combination of v_i and v'_i is generated by the corresponding convex combinations of α_{-i}^A(v) and α_{-i}^A(v'), and of w_i^A(v) and w_i^A(v'). Since W* ⊆ B_p(W*), monotonicity of B_p(·) implies B_p(W*) ⊆ B_p(B_p(W*)), implying that B_p(W*) is also strongly self-generating. But since W* is maximal, B_p(W*) ⊆ W*, so that W* = B_p(W*). Thus, W* is convex. Obviously W^0 ⊆ V. Using monotonicity we conclude that W^1 ⊆ W^0 and, inductively, that W^τ ⊆ W^{τ-1}.

We now show that the operator B_p(·) preserves compactness. Obviously B_p(W) is bounded whenever W is. To show that B_p(W) is closed for compact W, let v^r be a sequence of elements of B_p(W) with lim_r v^r = v. Then for each i and each regime A, there are corresponding sequences (α_{-i}^A)^r and continuation value functions (w_i^A)^r, taking values in co(W_i), used to generate v_i^r. By the compactness of ∆A_{-i} and co(W_i), there is a subsequence of r's such that (α_{-i}^A)^r → α_{-i}^A and (w_i^A)^r(a_{-i}, σ_{-i}) → w_i^A(a_{-i}, σ_{-i}) ∈ co(W_i) for each of the finitely many pairs (a_{-i}, σ_{-i}). By continuity, the α_{-i}^A and w_i^A generate v_i, and therefore v ∈ B_p(W). Since V is compact and B_p(·) preserves compactness, W^0 and, by induction, each W^τ are compact. Thus {W^τ} is a nested sequence of compact sets and the set W^∞ := ∩_{τ≥0} W^τ is compact.

We now show that W^∞ is strongly self-generating. Let v ∈ W^∞. Then for every τ, v ∈ W^{τ+1} = B_p(W^τ), so there exist (α_{-i}^A)^τ and (w_i^A)^τ : A_{-i} × Σ_{-i} → W_i^τ which generate v_i. We claim that there exist limit continuation values w_i^A(a_{-i}, σ_{-i}) ∈ W_i^∞ such that (passing to a subsequence if necessary) (w_i^A)^τ(a_{-i}, σ_{-i}) → w_i^A(a_{-i}, σ_{-i}) for each pair (a_{-i}, σ_{-i}). If not, then there exist a pair (a_{-i}, σ_{-i}) and an open neighborhood U of W^∞ such that (w_i^A)^τ(a_{-i}, σ_{-i}) ∉ U for infinitely many τ. This implies that U^c ∩ W^τ ≠ ∅ for these τ. But {U^c ∩ W^τ} is an infinite family of nested, non-empty, compact sets and hence

    ∩_{τ≥0} [U^c ∩ W^τ] = U^c ∩ (∩_{τ≥0} W^τ) = U^c ∩ W^∞ ≠ ∅,

which is a contradiction.

Again, by continuity, v_i is generated by lim(α_{-i}^A)^τ (passing to a further subsequence if necessary) and w_i^A. Thus v ∈ B_p(W^∞), and we have shown that W^∞ is strongly self-generating. It remains only to show that W^∞ includes W*, which by the maximality of W* would imply that W^∞ = W* and conclude the proof. Since W* ⊆ V, we have W* = B_p(W*) ⊆ B_p(V) = W^0, and by induction W* ⊆ W^τ for every τ. Thus W* ⊆ W^∞.

We can adapt the proof of Proposition 2 to show that W* is exactly the set of belief-free equilibrium payoffs using public randomization p.

Proposition 4. Fix a public randomization p. The corresponding set W* is equal to the set of all belief-free equilibrium payoffs using the public randomization p.

As a corollary we can further characterize the structure of belief-free equilibrium payoff sets.

Corollary 2. For any given public randomization p, the set of all belief-free equilibrium payoffs using p is the product of closed intervals.

Proof: Proposition 1 and Corollary 1 generalize immediately to a public randomization device, and the statement then follows from Corollary 1, Proposition 3, and Proposition 4.

4.2 Bang-Bang

In this section we show that belief-free equilibrium payoffs can always be obtained by simple strategies that can be represented by a two-state automaton. As a corollary we obtain a version of the traditional bang-bang result that a set is strongly self-generating if and only if the set generates its extreme points. A machine strategy for player i is defined as follows. There is a set of states Θ, and for each state θ ∈ Θ there is a behavior rule α_θ : P(A) → ∆A_i and a transition rule φ_θ : P(A) × A_i × Σ_i → ∆Θ. The interpretation is as follows. When in state θ, if the outcome of the public randomization is regime A, the strategy plays α_θ(A). Then, if the action a_i is realized and the signal σ_i is observed, a new state is drawn from the distribution φ_θ(A, a_i, σ_i), and the strategy begins the next period in that state. Imagine that player -i were playing such a machine strategy and player i was informed each period of the state. Then we could compute the value v(θ) to player i of being in state θ. Furthermore, the transition rule would imply, for each state, the continuation value function w_i^A as follows:

    w_i^A(a_{-i}, σ_{-i}) = Σ_{θ' ∈ Θ} v(θ') φ_θ(A, a_{-i}, σ_{-i})[θ'].
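To make the machine-strategy definition concrete, the following Python sketch builds a two-state machine for player 1 in the partnership example of Section 2 (anticipating the two-state construction of the next proposition): one state plays W and is worth v̄ to player 2, the other plays S and is worth v; transitions depend on player 1's own action and own profit signal, randomizing between the two states so that the derived continuation values equal value + ((1 - δ)/δ) × transfer. The parameter values are illustrative assumptions; the script verifies that player 2 is indifferent between W and S in each state.

    # Two-state machine for player 1 in the partnership example: state "high" plays W
    # and is worth v_bar to player 2; state "low" plays S and is worth v_low.
    # Transitions depend on player 1's own action and own profit signal.
    g, l = 0.4, 0.4
    p0, p1, p2 = 0.9, 0.5, 0.1
    delta = 0.95

    v_bar = 1 - g * p2 / (p1 - p2)
    v_low = l * (1 - p0) / (p0 - p1)

    def stay_prob(target_value):
        """Probability of transiting to the high state so that the expected
        continuation value equals the target (bang-bang randomization)."""
        q = (target_value - v_low) / (v_bar - v_low)
        assert 0.0 <= q <= 1.0, "delta not close enough to 1 for these parameters"
        return q

    # Continuation values the machine must deliver (Section 2's transfers):
    #   high state, player 1 worked:  w(H) = v_bar, w(L) = v_bar - (1-d)/d * g/(p1-p2)
    #   low state,  player 1 shirked: w(L) = v_low, w(H) = v_low + (1-d)/d * l/(p0-p1)
    w_high = {'H': v_bar, 'L': v_bar - (1 - delta) / delta * g / (p1 - p2)}
    w_low  = {'H': v_low + (1 - delta) / delta * l / (p0 - p1), 'L': v_low}
    transition = {('high', s): stay_prob(w_high[s]) for s in 'HL'}
    transition.update({('low', s): stay_prob(w_low[s]) for s in 'HL'})
    print("P(move to high state):", {k: round(v, 3) for k, v in transition.items()})

    # Check: in each state, player 2 gets the state's value from BOTH of his actions.
    u2   = {('W', 'W'): 1.0, ('S', 'W'): 1.0 + g, ('W', 'S'): -l, ('S', 'S'): 0.0}
    plow = {('W', 'W'): p2,  ('S', 'W'): p1,      ('W', 'S'): p1, ('S', 'S'): p0}
    for state, a1, w in (('high', 'W', w_high), ('low', 'S', w_low)):
        for a2 in ('W', 'S'):
            pL = plow[(a2, a1)]
            value = (1 - delta) * u2[(a2, a1)] + delta * ((1 - pL) * w['H'] + pL * w['L'])
            print(f"state={state}, a2={a2}: value = {value:.4f}")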

Note that in the proof of Proposition 2 we construct such a machine strategy whose continuation value functions replicate the continuation value functions used in the definition of strong self-generation. In fact, as we now show, it is always possible to implement a belief-free equilibrium payoff using machine strategies which have only two states. (Footnote 5: In a related context, Kandori and Obara (2003) also examine conditions under which two-state automata are sufficient.) The values of the two states in i's machine correspond to the maximum and minimum belief-free equilibrium payoffs of player -i.

Proposition 5. Consider intervals V_i = [v_i, v̄_i] for i = 1, 2. Let p be a public randomization over regimes. Suppose for each i that v_i is p-generated by V_i using the α_{-i}^A and w_i^A, and similarly v̄_i using the ᾱ_{-i}^A and w̄_i^A. Then every element v of V_1 × V_2 is the payoff to a belief-free equilibrium in which each player -i plays a machine strategy with two states {θ, θ̄} whose behavior rules are α_{-i}^A and ᾱ_{-i}^A respectively, and whose derived continuation value functions are w_i^A and w̄_i^A, respectively.

Proof: Adapt the Markovian strategy used in the proof of Proposition 2 as follows. There are only two states, corresponding to the values v_i and v̄_i. Now, when the continuation value should be w ∈ [v_i, v̄_i], player -i will randomly transit to the states corresponding to v_i and v̄_i with probabilities q and 1 - q, where w = q v_i + (1 - q) v̄_i. Finally, player -i determines his initial state by similarly randomizing over the two states with probabilities calculated to provide the initial value v_i.

The equilibria provided by Proposition 5 satisfy a stronger robustness property than belief-freeness alone. To see this, suppose that ᾱ_{-i}^A and α_{-i}^A are pure actions for each regime A. In the initial period, -i determines his state, and hence his initial action, by randomizing. Regardless of the realization of this randomization, player i is indifferent over his actions in the regime. Thus, the equilibrium strategy of player i remains a best reply even if, before play, i could observe the outcome of -i's private randomization.

5 Characterizing Belief-Free Equilibrium Payoffs for δ close to 1

5.1 General Information Structure

The preceding results can be used to determine the set of belief-free equilibrium payoffs for any given discount factor δ. When instead we are interested in the limit of these sets as δ approaches 1, a simpler characterization is available, as we describe in this section. To get a feel for the techniques presented here, consider a fixed regime A, and for the moment a given discount factor δ. Let W(δ) be the set of belief-free equilibrium payoffs given δ, using the constant regime A. From previous results, we know that W(δ) is the product of closed intervals, so W(δ) = W_1(δ) × W_2(δ). Let us write W_1(δ) = [w, w̄]. Because W(δ) is strongly self-generating (by Proposition 4), both w and w̄ are generated by W_1(δ).

Furthermore, w̄ is the maximum value generated by the interval [w, w̄]; for if some w' > w̄ were generated by [w, w̄], then by Proposition 5 the set [w, w'] ⊋ W_1(δ) would be strongly self-generating, contradicting the definition of W_1(δ). Thus, w̄ solves the following optimization problem:

    w̄ = max w  such that, for some α_2 ∈ ∆A_2 and w_1 : A_2 × Σ_2 → R,

        w ≥ (1 - δ) u_1(a_1, α_2) + δ Σ_{a_2 ∈ A_2} Σ_{σ_2 ∈ Σ_2} α_2(a_2) m_2(σ_2 | a_1, a_2) w_1(a_2, σ_2)

    for each a_1, with equality for each a_1 ∈ A_1, and

        w ≤ w_1(a_2, σ_2) ≤ w̄    for all a_2, σ_2.

Similarly, w is the minimum value generated by the interval [w, w̄], and is thus the solution to the corresponding minimization problem. Conversely, the solutions to these problems characterize the boundaries of W_1(δ) whenever it is non-empty. Observe that for δ close to 1, the differences in continuation values required to satisfy the incentive constraints in the above problem can be made arbitrarily small. Thus, when w̄ > w, the last constraint can always be satisfied when δ is close enough to 1. As a further simplification, let us define x_1(a_2, σ_2) = (δ/(1 - δ)) [w_1(a_2, σ_2) - w̄] ≤ 0, substitute out for w_1(·, ·), and rewrite the maximization as follows:

    w̄ = max w  such that, for some α_2 ∈ ∆A_2 and x_1 : A_2 × Σ_2 → R_-,

        w ≥ u_1(a_1, α_2) + Σ_{a_2 ∈ A_2} Σ_{σ_2 ∈ Σ_2} α_2(a_2) m_2(σ_2 | a_1, a_2) x_1(a_2, σ_2)

    for all a_1, with equality if a_1 ∈ A_1.

We can interpret this characterization as follows. Player 2 will select a mixed action α_2 and levy fines x_1(a_2, σ_2) on player 1, depending on 2's realized action and signal. The objective is to induce player 1 to select any action in A_1 and to provide the maximum total utility to player 1 in the process. That maximum will turn out to be player 1's maximum belief-free equilibrium payoff using regime A when δ is close to 1. To find the minimum, we analyze the corresponding minimization problem, with the difference that player 2 will offer bonuses rather than fines. In what follows, we formalize and extend this analysis to the case of an arbitrary public randomization over regimes and use it to characterize the set of all belief-free equilibrium payoffs for δ close to 1. For a given regime A = A_1 × A_2, define M_i^A as follows:

    M_i^A = sup v_i  such that, for some α_{-i} ∈ ∆A_{-i} and x_i : A_{-i} × Σ_{-i} → R_-,    (4)

        v_i ≥ u_i(a_i, α_{-i}) + Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} α_{-i}(a_{-i}) m_{-i}(σ_{-i} | a_i, a_{-i}) x_i(a_{-i}, σ_{-i})

    for all a_i, with equality if a_i ∈ A_i.

Similarly, define m_i^A as follows:

    m_i^A = inf v_i  such that, for some α_{-i} ∈ ∆A_{-i} and x_i : A_{-i} × Σ_{-i} → R_+,    (5)

        v_i ≥ u_i(a_i, α_{-i}) + Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} α_{-i}(a_{-i}) m_{-i}(σ_{-i} | a_i, a_{-i}) x_i(a_{-i}, σ_{-i})

    for all a_i, with equality if a_i ∈ A_i.

Here and in what follows, sup ∅ = -∞ and inf ∅ = +∞. The solutions to these linear programs will be used to provide tight bounds on the sets of belief-free equilibrium payoffs. (Footnote 6: As formulated, these programs are not linear in (α_{-i}, x_i); they are, however, linear in (α_{-i}, y_i), where y_i = α_{-i} x_i : A_{-i} × Σ_{-i} → R_- or R_+.) The set of payoffs inside the bounds will be shown to be strongly self-generating and therefore belief-free. On the other hand, it will be shown that any belief-free equilibrium value lies in the set. The following preliminary result is useful in the sequel.

Lemma 1. Every v_i < M_i^A is feasible for (4), and every v_i > m_i^A is feasible for (5).

Let us order the regimes as J = {A^1, ..., A^J}. For i = 1, 2, let M_i = (M_i^{A^1}, ..., M_i^{A^J}) and m_i = (m_i^{A^1}, ..., m_i^{A^J}), and let ∆_i = M_i - m_i. We distinguish the following three cases:

1. (The positive case) There exists p ≥ 0 such that ∆_i · p > 0 for i = 1, 2.

2. (The negative case) There exists no p ≥ 0 such that ∆_i · p ≥ 0 for i = 1, 2, with at least one strict inequality.

3. (The abnormal case) There exists no p ≥ 0 such that ∆_i · p > 0 for i = 1, 2, but there exists p ≥ 0 such that ∆_i · p ≥ 0 for i = 1, 2, with one strict inequality.
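After the change of variables noted in footnote 6 (y_i = α_{-i} x_i), program (4) becomes an ordinary linear program in (v_i, α_{-i}, y_i). As an illustration, and under the same illustrative parameter assumptions used in the earlier sketches (not values from the paper), the following Python code sets up program (4) for player 2 in the partnership game with the full regime and solves it with scipy; the optimal value can be compared with v̄.

    # Program (4) for player 2 in the partnership game, full regime A_2 = {W, S},
    # after the substitution y(a1, s1) = alpha_1(a1) * x_2(a1, s1) (footnote 6).
    # Variables: [v, alpha_W, alpha_S, y_WH, y_WL, y_SH, y_SL], with y <= 0 (fines).
    from scipy.optimize import linprog

    g, l = 0.4, 0.4
    p0, p1, p2 = 0.9, 0.5, 0.1

    u2   = {('W', 'W'): 1.0, ('S', 'W'): 1.0 + g, ('W', 'S'): -l, ('S', 'S'): 0.0}
    plow = {('W', 'W'): p2,  ('S', 'W'): p1,      ('W', 'S'): p1, ('S', 'S'): p0}

    A_eq, b_eq = [], []
    for a2 in ('W', 'S'):   # indifference: v = sum_a1 alpha(a1) u2 + sum m1 * y
        row = [1.0]
        for a1 in ('W', 'S'):
            row.append(-u2[(a2, a1)])
        for a1 in ('W', 'S'):
            pL = plow[(a2, a1)]
            row.extend([-(1 - pL), -pL])      # coefficients on y(a1, H), y(a1, L)
        A_eq.append(row)
        b_eq.append(0.0)
    A_eq.append([0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])   # alpha is a probability vector
    b_eq.append(1.0)

    bounds = [(None, None), (0, 1), (0, 1)] + [(None, 0)] * 4
    res = linprog(c=[-1.0] + [0.0] * 6, A_eq=A_eq, b_eq=b_eq, bounds=bounds)

    print(f"M_2 for the full regime: {-res.fun:.4f}")
    print(f"v_bar from Section 2:    {1 - g * p2 / (p1 - p2):.4f}")
    print("optimal alpha_1 =", res.x[1:3].round(3), " y =", res.x[3:].round(3))

Program (5) can be handled the same way, with the bounds on y reversed (bonuses, y ≥ 0) and the objective minimized rather than maximized.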

Observe that which case obtains depends both on the stage game payoffs and on the monitoring structure. For a given stage game, let V* be the limit of the set of (belief-free) equilibrium payoffs as δ → 1. We claim that:

Theorem 1. V* is a convex polytope. In

1. the positive case,

    V* = ∪_{p ≥ 0 : ∆_i · p ≥ 0, i = 1, 2, Σ_A p(A) = 1} [m_1 · p, M_1 · p] × [m_2 · p, M_2 · p];    (6)

2. the negative case, V* is the convex hull of the Nash equilibria of the bimatrix game.

Proof: Consider the positive case. We first show that the right-hand side of (6) is included in V*. We pick, for each regime A, payoffs v_i^A and v̄_i^A with m_i^A < v_i^A and v̄_i^A < M_i^A, and a public randomization p over regimes such that

    Σ_{A ∈ J} p(A) (v̄_i^A - v_i^A) > 0,    i = 1, 2,    (7)

where, e.g., v̄_i = (v̄_i^A)_{A ∈ J}. Existence is guaranteed by the positive case. Define z_i = p · v_i and z̄_i = p · v̄_i. We will show that there exists δ̄ < 1 such that for all δ ∈ (δ̄, 1), the set U defined by

    U = ∏_{i=1,2} co{z_i, z̄_i}

is strongly self-generating. Because the right-hand side of (6) is the closure of the union of all such sets U, this will prove its inclusion in V*. By Proposition 5, it is enough to show that each of the extreme values is generated by the convex hull. Consider v̄_i. Since M_i^A > v̄_i^A, by Lemma 1, v̄_i^A is a feasible value for (4), and hence there exist ᾱ_{-i}^A ∈ ∆A_{-i} and x̄_i^A : A_{-i} × Σ_{-i} → R_- such that for every a_i(A),

    v̄_i^A ≥ u_i(a_i(A), ᾱ_{-i}^A) + Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} ᾱ_{-i}^A(a_{-i}) m_{-i}(σ_{-i} | a_i(A), a_{-i}) x̄_i^A(a_{-i}, σ_{-i}),    (8)

    with equality if a_i(A) ∈ A_i.    (9)

For each A we multiply the above inequality by p(A) and then sum across regimes to find that for all strategies {a_i(A)}_{A ∈ J},

    z̄_i ≥ Σ_{A ∈ J} p(A) [ u_i(a_i(A), ᾱ_{-i}^A) + Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} ᾱ_{-i}^A(a_{-i}) m_{-i}(σ_{-i} | a_i(A), a_{-i}) x̄_i^A(a_{-i}, σ_{-i}) ],

    with equality if a_i(A) ∈ A_i for each A.    (10)

Define

    w̄_i^A(a_{-i}, σ_{-i}) = z̄_i + ((1 - δ)/δ) x̄_i^A(a_{-i}, σ_{-i}),    (11)

substitute into (10), and re-arrange to obtain

    z̄_i ≥ Σ_{A ∈ J} p(A) [ (1 - δ) u_i(a_i(A), ᾱ_{-i}^A) + δ Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} ᾱ_{-i}^A(a_{-i}) m_{-i}(σ_{-i} | a_i(A), a_{-i}) w̄_i^A(a_{-i}, σ_{-i}) ],

with equality if a_i(A) ∈ A_i for each A. Because x̄_i^A(·, ·) ≤ 0, it follows from (11) and (7) that for all δ exceeding some δ̄ < 1, w̄_i^A(·, ·) belongs to U_i. We have therefore shown that the extreme point z̄ = (z̄_1, z̄_2) is generated by U. The symmetric derivation shows that z = (z_1, z_2) is generated by U. Let the α_{-i}^A and w_i^A be the corresponding mixtures and continuation value functions. Now it is easily verified that, e.g., the value (z̄_i, z_{-i}) is generated using ᾱ_{-i}^A, w̄_i^A and α_i^A, w_{-i}^A.

Next we show that the right-hand side of (6) includes V*. Observe that (1 - δ) M_i^A + δ V̄ solves

    max over α_{-i} ∈ ∆A_{-i} and V_i : A_{-i} × Σ_{-i} → R of
    (1 - δ) u_i(a_i, α_{-i}) + δ Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} α_{-i}(a_{-i}) m_{-i}(σ_{-i} | a_i, a_{-i}) V_i(a_{-i}, σ_{-i})    (12)

subject to, for all a_i ∈ A_i and all ã_i (with equality if ã_i ∈ A_i),

    (1 - δ) u_i(a_i, α_{-i}) + δ Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} α_{-i}(a_{-i}) m_{-i}(σ_{-i} | a_i, a_{-i}) V_i(a_{-i}, σ_{-i})
        ≥ (1 - δ) u_i(ã_i, α_{-i}) + δ Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} α_{-i}(a_{-i}) m_{-i}(σ_{-i} | ã_i, a_{-i}) V_i(a_{-i}, σ_{-i}),

    V_i(a_{-i}, σ_{-i}) ≤ V̄    for all (a_{-i}, σ_{-i}) ∈ A_{-i} × Σ_{-i}.    (13)

Similarly, (1 - δ) m_i^A + δ V solves

    min over α_{-i} ∈ ∆A_{-i} and V_i : A_{-i} × Σ_{-i} → R of
    (1 - δ) u_i(a_i, α_{-i}) + δ Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} α_{-i}(a_{-i}) m_{-i}(σ_{-i} | a_i, a_{-i}) V_i(a_{-i}, σ_{-i})    (14)

subject to, for all a_i ∈ A_i and all ã_i (with equality if ã_i ∈ A_i),

    (1 - δ) u_i(a_i, α_{-i}) + δ Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} α_{-i}(a_{-i}) m_{-i}(σ_{-i} | a_i, a_{-i}) V_i(a_{-i}, σ_{-i})
        ≥ (1 - δ) u_i(ã_i, α_{-i}) + δ Σ_{a_{-i} ∈ A_{-i}} Σ_{σ_{-i} ∈ Σ_{-i}} α_{-i}(a_{-i}) m_{-i}(σ_{-i} | ã_i, a_{-i}) V_i(a_{-i}, σ_{-i}),

    V_i(a_{-i}, σ_{-i}) ≥ V    for all (a_{-i}, σ_{-i}) ∈ A_{-i} × Σ_{-i}.    (15)