Finite State Equilibria in Dynamic Games


Michihiro Kandori, Faculty of Economics, University of Tokyo (e-mail: kandori@e.u-tokyo.ac.jp)
Ichiro Obara, Department of Economics, UCLA (e-mail: obara@econ.ucla.edu)

June 21, 2007

Abstract

An equilibrium in an infinite horizon game is called a finite state equilibrium if each player's action on the equilibrium path is given by an automaton with a finite state space. We provide a complete characterization of this class of equilibria and provide a recursive computational method to check the equilibrium conditions. This encompasses the majority of existing works on repeated games with private monitoring.

1 Introduction

An equilibrium in an infinite horizon game is called a finite state equilibrium if each player's action on the equilibrium path is given by an automaton with a finite state space. We provide a complete characterization of this class of equilibria and provide a recursive computational method to determine if a given profile of finite automata (one for each player, to determine his action on the path of play) can constitute a sequential (and thus a finite state) equilibrium.

A major area of application concerns repeated games with private monitoring, and the present paper provides a unifying general theory to encompass the majority of the existing work, as most of it is based on some form of finite state equilibria. The belief-based approach of Sekiguchi [9] and Bhaskar and Obara [1] considers trigger strategies on the path of play, hence finite state equilibria. The present paper can be regarded as a generalization of those papers. The belief-free approach by Ely and Välimäki [3] considers an equilibrium which can be implemented by finite state automata on and off

the path of play. Proposition 4 in Ely, Hörner and Olszewski [2] (bang-bang property) shows that most belief-free equilibrium payoffs (indeed all of them if the discount factor is close to 1) can be obtained by a finite state equilibrium with only two states. Matsushima's review strategy equilibria [6] and Hörner and Olszewski [4] also employ finite equilibria. An exception which does not employ a finite equilibrium is Piccione [8], whose equilibrium path requires infinitely (countably) many states. However, the result by Ely, Hörner and Olszewski shows that Piccione's equilibrium payoff can be obtained by a finite state equilibrium.

A more recent paper by Phelan and Skrzypacz [7] also proposes an algorithm to compute a class of stationary finite state equilibria. We discuss the differences between their approach and ours after presenting our theorem.

2 Model

This paper considers infinite horizon stochastic games with private monitoring. The stage game is defined by players, actions, types, private signals, and payoffs. Let $N = \{1, \dots, n\}$ be the set of players. For each player $i$, there is a finite set of possible types $\Theta_i$ and a finite set of actions $A_i$. Actions are not observable. Player $i$ observes a private signal $s_i \in S_i$. Let $\pi(s \mid a)$ be the probability of private signal profile $s$ given action profile $a$. We assume that, for each player $i$, the marginal distribution of $\pi$ on $s_i$ is a full support distribution for every $a \in A$. This implies that the standard imperfect public monitoring games fit into our framework, and we can apply our method to the private strategy equilibria (Kandori and Obara [5]) in such a model.

Player $i$'s payoff is determined by his action, his private type, and his private signal. Denote player $i$'s payoff by $u_i(\theta_i, a_i, s_i)$. Let $g_i(\theta_i, a) = \sum_s u_i(\theta_i, a_i, s_i)\, \pi(s \mid a)$ be player $i$'s expected payoff given $a$ when his type is $\theta_i$.

Players play this stage game for an infinite number of periods. Let $z \in \Delta\Theta$ be the initial joint distribution on private types. Their types $\theta = (\theta_1, \dots, \theta_n) \in \Theta$ follow a Markov process characterized by the transition function $p(\theta' \mid \theta, a) \in [0, 1]$ such that $\sum_{\theta'} p(\theta' \mid \theta, a) = 1$ for all $(\theta, a)$.
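As a concrete reading of the definition of $g_i$ and of the requirement that $p(\cdot \mid \theta, a)$ sums to one, the following Python sketch computes an expected stage payoff by summing over signal profiles and checks that a candidate transition function is a proper kernel. The function names, the callable-based interface, and the two-player signal structure in the usage lines are illustrative assumptions, not part of the paper.

```python
from itertools import product

def expected_stage_payoff(u_i, pi, i, theta_i, a, signal_sets):
    """g_i(theta_i, a): sum over signal profiles s of u_i(theta_i, a_i, s_i) * pi(s | a)."""
    return sum(u_i(theta_i, a[i], s[i]) * pi(s, a) for s in product(*signal_sets))

def is_transition_kernel(p, type_profiles, action_profiles, tol=1e-12):
    """True if the sum over theta' of p(theta' | theta, a) is 1 for every (theta, a)."""
    return all(
        abs(sum(p(t_next, t, a) for t_next in type_profiles) - 1.0) <= tol
        for t in type_profiles
        for a in action_profiles
    )

# Hypothetical two-player example (an assumption for illustration): each signal
# matches the opponent's action with probability 1 - EPS, independently.
EPS = 0.1
SIGNALS = [("c", "d"), ("c", "d")]

def pi(s, a):
    prob = 1.0
    for i in range(2):
        matches = s[i] == a[1 - i].lower()
        prob *= (1 - EPS) if matches else EPS
    return prob

def u(theta_i, a_i, s_i):
    # Illustrative payoff: 1 when the signal is "c", 0 otherwise (type is ignored).
    return 1.0 if s_i == "c" else 0.0

def persistent_types(t_next, t, a):
    # Types drawn once and fixed forever: p(theta | theta, a) = 1.
    return 1.0 if t_next == t else 0.0

g = expected_stage_payoff(u, pi, 0, None, ("C", "C"), SIGNALS)
kernel_ok = is_transition_kernel(persistent_types, ["H", "L"], [("C", "C")])
```

With this toy payoff, `g` is simply the probability that player 0 observes `c` when the opponent plays `C`. Callables are used for `u_i`, `pi`, and `p` so that the same helpers work for any finite signal and type sets.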
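Player $i$'s period $t$ history is $h_i^t = (\theta_i^1, a_i^1, s_i^1, \dots, \theta_i^{t-1}, a_i^{t-1}, s_i^{t-1}) \in H_i^t$. Player $i$'s pure strategy $\sigma_i \in \Sigma_i$ is a mapping from $\bigcup_{t=1}^{\infty} H_i^t$ to $A_i$. A private strategy profile $\sigma$, combined with $z$, generates a probability measure on the set of infinite sequences of private types and actions. Player $i$'s expected discounted average payoff $V_i(\sigma, z) = (1-\delta)\, E\left[\sum_{t=1}^{\infty} \delta^{t-1} g_i(\theta_i^t, a^t) \mid \sigma, z\right]$ is computed with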

respect to this measure. We analyze sequential equilibria of this infinite horizon stochastic game with private monitoring.

This model includes many types of economic models. As already mentioned, this model becomes a repeated game with private monitoring when there is no type. If types are drawn only once in the beginning and fixed throughout the game ($p(\theta \mid \theta, a) = 1$ for all $(\theta, a)$), then it becomes the standard model of reputation with one-sided or two-sided incomplete information. If the type is drawn independently across time and players, then it becomes the standard model of repeated adverse selection. Note that our model also allows the middle range: the case where types are neither perfectly persistent nor perfectly independent over time.

We focus on a class of equilibria where equilibrium behavior can be described by finite state automatons. An automaton consists of three components: states, a transition rule, and a mapping from states to actions for each type. Let $Q_i(\theta_i)$ be the finite set of states for player $i$ whose type is $\theta_i$. Suppose that player $i$'s current type is $\theta_i$ and the next-period type is $\theta_i'$. The probability that player $i$ is in state $q_i' \in Q_i(\theta_i')$ in the next period depends on his private signal $s_i$ and his current type-state pair $(\theta_i, q_i)$. This probability is denoted $m_i^{\theta_i'}(q_i' \mid \theta_i, q_i, s_i)$, where $m_i^{\theta_i'}(\cdot \mid \theta_i, q_i, s_i) \in \Delta Q_i(\theta_i')$. A mapping $f_i : \bigcup_{\theta_i \in \Theta_i} Q_i(\theta_i) \to A_i$ describes player $i$'s play for each type-state pair of player $i$.

Note that we can assume that players choose a pure action in each state without loss of generality. For example, when player $i$ plays actions $a_i'$ and $a_i''$ with equal probability at state $\theta$, we can split $\theta$ into two states $\theta'$ and $\theta''$ and assume that $a_i'$ ($a_i''$) is played in $\theta'$ ($\theta''$), with the mixing probabilities encoded in the state transition probabilities. Thus an automaton for player $i$ is defined as $M_i = \left(\{Q_i(\theta_i), m_i^{\theta_i}\}_{\theta_i \in \Theta_i}, f_i\right)$.

This automaton does not describe player $i$'s behavior completely, for two reasons. First, it does not specify in which state to start. We allow players to use a correlation device, which is independent of $\theta$, to generate the initial joint distribution over states.
The idea is that a mediator sends a private message to each player, which players use to determine their initial state combined with their type realizations. This generates an initial distribution $\mu$ over the space of type-state pair profiles: $Z = \prod_i \bigcup_{\theta_i \in \Theta_i} (\{\theta_i\} \times Q_i(\theta_i))$. Secondly, note that each automaton does not specify which action to choose when the player does not follow the course of action suggested by the automaton. Note that a specification of the actions on the path of play corresponds to a reduced normal form strategy. A reduced normal form strategy for player $i$ specifies the sequence of actions of player $i$ for any strategy profile of the opponents.

This comes from our assumption that the private signal $s_i$ has full support.

2.1 Belief-Based Strategy and Finite State Equilibrium

Fix a profile of automatons for every player but player $i$. Given these automatons, player $i$ can compute his belief about the other players' type-state pair profile at every private history. Denote player $i$'s belief by $\mu_i \in \Delta Z_{-i}$, where $Z_{-i} = \prod_{j \neq i} \bigcup_{\theta_j \in \Theta_j} (\{\theta_j\} \times Q_j(\theta_j))$. Consider a function $\rho_i : \Delta Z_{-i} \to A_i$. This function defines a full-fledged strategy given any initial belief. We call such a strategy a belief-based strategy. If we combine player $i$'s automaton with any belief-based strategy $\rho_i$ by assuming that player $i$'s behavior follows $\rho_i$ once he deviates from $M_i$, then we have a complete strategy for player $i$. Denote this strategy for player $i$ by $\sigma_i(M_i, \rho_i)$.

For a given profile of finite state automata $(M_1, \dots, M_n)$, suppose that we can find a profile of belief-based strategies $(\rho_1, \dots, \rho_n)$ such that $(\sigma_1(M_1, \rho_1), \dots, \sigma_n(M_n, \rho_n))$ is a (correlated) sequential equilibrium with some initial distribution $\mu$ on $Z$. We call such an equilibrium a finite state equilibrium. This equilibrium notion generalizes the mixed trigger strategy equilibria used in Sekiguchi [9] and Bhaskar and Obara [1].

3 Belief-Based Approach to Finite State Equilibrium

We ask the following question: when does a particular profile of finite state automatons generate a finite state equilibrium? When players' behavior is based on finite state automatons, their continuation strategies are completely summarized by their types and states. Therefore what is relevant to player $i$'s decision at each point of time is his belief over $Z_{-i}$. Since $Z_{-i}$ is a finite set, player $i$'s belief is always confined in the finite dimensional simplex. This makes our analysis and computation easier.

Our approach to this question is as follows. Since player $i$ cares only about the other players' types and states, player $i$'s decision problem becomes a dynamic programming problem where the state variable is his belief over $Z_{-i}$. Solving this dynamic programming problem, we can derive the optimal action correspondence from player $i$'s belief to player $i$'s optimal actions.
From this correspondence, we generate a set of sequentially optimal belief-based strategies. Then we check consistency, namely, whether an action assigned by an automaton is indeed optimal at every history on the equilibrium path. To do so, we need

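to make sure that, when player $i$'s state is $q_i$ and $a_i = f_i(q_i)$, player $i$'s belief on $Z_{-i}$ is contained in the set of beliefs where $a_i$ is optimal, whichever private history has brought player $i$ to state $q_i$.

We start with a relatively simple case with no type. That is, the underlying game is a repeated game with private monitoring. In this case, an automaton for player $i$ is simply $M_i = (Q_i, m_i, f_i)$. Fix a set of automata $(M_1, \dots, M_n)$. Suppose that player $i$'s belief is $\mu_i^{h^t} \in \Delta Z_{-i}$ in period $t$. This determines the distribution of $a_{-i}$ in the current period. Player $i$'s belief in the next period, $\mu_i^{h^{t+1}}$, is affected by his action and private signal in the current period. Let $V_i^t(\mu_i^{h^t})$ be player $i$'s optimal payoff given $\mu_i^{h^t}$. Since stage game payoffs are bounded above and there is discounting, $V_i^t$ satisfies the following dynamic programming equation:

$$V_i^t\left(\mu_i^{h^t}\right) = \max_{a_i \in A_i} \; (1-\delta)\, E_{\mu_i^{h^t}}\left[g_i(a_i, f_{-i}(q_{-i}))\right] + \delta\, E_{\mu_i^{h^t}}\left[V_i^{t+1}\left(\mu_i^{h^{t+1}}\right) \mid a_i\right]$$

where $\mu_i^{h^{t+1}}$ is computed by Bayes' rule as follows:

$$\mu_i^{h^{t+1}}(q_{-i}) = \frac{\sum_{q_{-i}'} \sum_{s_{-i}^t} \mu_i^{h^t}(q_{-i}')\, \pi\left(s^t \mid a_i^t, f_{-i}(q_{-i}')\right) \prod_{j \neq i} m_j\left(q_j \mid q_j', s_j^t\right)}{\sum_{q_{-i}'} \sum_{s_{-i}^t} \mu_i^{h^t}(q_{-i}')\, \pi\left(s^t \mid a_i^t, f_{-i}(q_{-i}')\right)}$$

In the stationary form, this becomes

$$V_i(\mu_i) = \max_{a_i \in A_i} \; (1-\delta)\, E_{\mu_i}\left[g_i(a_i, f_{-i}(q_{-i}))\right] + \delta\, E_{\mu_i}\left[V_i(\chi_i[\mu_i, a_i, s_i]) \mid a_i\right]$$

where $\chi_i : \Delta Z_{-i} \times A_i \times S_i \to \Delta Z_{-i}$ is given by

$$\chi_i[\mu_i, a_i, s_i](q_{-i}) = \frac{\sum_{q_{-i}'} \sum_{s_{-i}} \mu_i(q_{-i}')\, \pi\left(s \mid a_i, f_{-i}(q_{-i}')\right) \prod_{j \neq i} m_j\left(q_j \mid q_j', s_j\right)}{\sum_{q_{-i}'} \sum_{s_{-i}} \mu_i(q_{-i}')\, \pi\left(s \mid a_i, f_{-i}(q_{-i}')\right)}$$

We call this function $V_i(\mu_i)$ a belief-based value function. By solving this dynamic programming problem, we can obtain a correspondence from player $i$'s belief to player $i$'s optimal actions. Let $\tau_i : A_i \rightrightarrows \Delta Q_{-i}$ be the correspondence defined by $\mu_i \in \tau_i(a_i)$ if and only if $a_i$ is one of the optimal actions given $\mu_i$. Thus $\tau_i$ is the (lower) inverse of the optimal policy correspondence, defined for each $a_i \in A_i$.

We would like to verify whether player $i$'s automaton $M_i$ assigns optimal actions on the equilibrium path. For $M_i$ to be optimal, player $i$'s belief $\mu_i$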

at each $q_i$ needs to satisfy $\mu_i \in \tau_i(f_i(q_i))$. Note that player $i$'s belief at each state is not unique because different histories can lead player $i$ to the same state. Let $B_i(q_i, z)$ be the set of beliefs which may arise at state $q_i$ when players are playing according to $M$ and the initial joint distribution is $z \in \Delta Z$. Then we have the following theorem.

Theorem 1 A finite-state automaton profile $M$ generates a finite state equilibrium with initial joint distribution $z \in \Delta Q$ if and only if $B_i(q_i, z) \subseteq \tau_i(f_i(q_i))$ for all $q_i \in Q_i$ and $i$.

Proof. Suppose that $M$ generates a finite state equilibrium. If $B_i(q_i, z) \not\subseteq \tau_i(f_i(q_i))$ for some $q_i$ and $i$, then there exists some private history $h_i^t$ after which player $i$'s state is $q_i$ and player $i$'s belief $\mu_i(h_i^t, z)$ is not in $\tau_i(f_i(q_i))$. Then $f_i(q_i) = a_i$ cannot be an action that satisfies sequential rationality at this history, which is a contradiction.

Conversely, suppose that $B_i(q_i, z) \subseteq \tau_i(f_i(q_i))$ for all $q_i \in Q_i$ and $i$. Let $\rho_i : \Delta Q_{-i} \to A_i$ be any selection from the optimal policy correspondence of the above dynamic programming problem for player $i$. Consider a complete strategy profile $\sigma = (\sigma_1(M_1, \rho_1), \dots, \sigma_n(M_n, \rho_n))$. This strategy satisfies the one-shot deviation constraints off the equilibrium path by definition. The one-shot deviation constraints on the equilibrium path are also satisfied because $B_i(q_i, z) \subseteq \tau_i(f_i(q_i))$ for all $q_i \in Q_i$ and $i$. Since all one-shot deviation constraints are satisfied, this strategy profile constitutes a sequential equilibrium.

Remark.

1. Belief-free equilibrium is included in this class of equilibria. In particular, $\tau_i(f_i(q_i)) = \Delta Q_{-i}$ for all $q_i$ holds for belief-free equilibria, i.e. the action assigned to player $i$ (and player $i$'s continuation strategy) on the equilibrium path is sequentially optimal independent of his belief about the other players' states.

2. Phelan and Skrzypacz [7] analyze a similar problem, but the following points distinguish our approach from theirs. They consider the class of sequential equilibria that can be represented by a profile of finite state automatons on and off the equilibrium path.
On the other hand, we only require that some finite state automaton profile describes behavior on the equilibrium path. Continuation strategies off the equilibrium

path are given by belief-based strategies which are not necessarily finite state automatons. This allows us to deal with the belief-based equilibria in Sekiguchi [9] or Bhaskar and Obara [1]. The example of the next section has the property that the mixture of the grim trigger strategy and permanent defection can be played on the path of play of some equilibria, but they cannot specify the optimal action off the path of play (if the method of Phelan and Skrzypacz [7] is applied to this class of strategies, the only equilibrium it can identify is permanent defection).

Theorem 1 generalizes to the case where players have private types. Let $\mu_i$ be player $i$'s current belief over $Z_{-i}$, which determines the distribution of $a_{-i}$ in the current period. Player $i$'s belief in the next period depends on his action, his private signal, and his type in the next period. Denote this belief by $\chi_i[\mu_i, \theta_i', a_i, s_i]$. Then player $i$'s dynamic programming problem for $\theta_i$ becomes

$$V_i^{\theta_i}(\mu_i) = \max_{a_i \in A_i} \; (1-\delta)\, E_{\mu_i}\left[g_i(\theta_i, a_i, a_{-i}(q_{-i}))\right] + \delta\, E_{\mu_i}\left[V_i^{\theta_i'}\left(\chi_i[\mu_i, \theta_i', a_i, s_i]\right) \mid a_i\right]$$

Note that there are as many dynamic programming problems as the number of types because types are payoff relevant. Solving these programming problems, we can obtain $\tau_i^{\theta_i} : A_i \rightrightarrows \Delta Z_{-i}$, $\theta_i \in \Theta_i$. Let $B_i(\theta_i, q_i, z)$ be the set of beliefs which may arise at type-state pair $(\theta_i, q_i) \in \bigcup_{\theta_i \in \Theta_i} \{\theta_i\} \times Q_i(\theta_i)$ when players are playing according to $M$ and the initial joint distribution is $z \in \Delta Z$.

Theorem 2 A finite-state automaton profile $M$ generates a finite state equilibrium with initial joint distribution $z \in \Delta Z$ if and only if $B_i(\theta_i, q_i, z) \subseteq \tau_i^{\theta_i}(f_i(q_i))$ for all $(\theta_i, q_i) \in \bigcup_{\theta_i \in \Theta_i} \{\theta_i\} \times Q_i(\theta_i)$ and $i$.

4 More Analytical Approach to Finite State Equilibrium

Our Theorems 1 and 2 provide numerical recursive methods to find a finite state equilibrium. Given a profile of automata describing the equilibrium path, the belief-based value function $V_i(\mu_i)$ can be numerically computed by the standard recursive method with a finite dimensional state space. Then one can determine the optimal policy function and its inverse $\tau_i$. The dynamics of beliefs can be calculated by the standard Bayes' rule to

compute the set $B$. By checking the consistency of $\tau$ and $B$ as in Theorems 1 and 2, one can determine if the postulated finite automata can indeed be the path of play of an equilibrium (finite state equilibrium).

In this section, we present a special case which allows us to verify analytically (as opposed to numerically) whether a certain finite automaton profile is a finite state equilibrium. Fix a profile of finite automata $(M_1, \dots, M_n)$, where $M_i = (Q_i, m_i, f_i)$. Let $V_i(q)$ be player $i$'s payoff generated by $(M_1, \dots, M_n)$ when the initial state profile is $q = (q_1, \dots, q_n)$. We will find an equilibrium where the continuation path of play is always given by automata $(M_1, \dots, M_n)$ with some initial state $q$.

Let $W_i(q_i, \mu_i) = \sum_{q_{-i}} V_i(q)\, \mu_i(q_{-i})$. Define the upper envelope of those functions, $W_i(\mu_i) = \max_{q_i \in Q_i} W_i(q_i, \mu_i)$. To interpret this, let $M_i(q_i)$ be player $i$'s (contingent) action path generated by automaton $M_i$ with initial state $q_i$. The function $W_i(\mu_i)$ is the (constrained) optimal payoff when player $i$'s path of play is restricted to $M_i(q_i)$, $q_i \in Q_i$. For each $\mu_i$, let $Q_i(\mu_i)$ be the set of states which are constrained optimal: $Q_i(\mu_i) = \{q_i \in Q_i \mid W_i(\mu_i) = W_i(q_i, \mu_i)\}$. Note that $W_i(q_i, \mu_i)$ is linear in $\mu_i$, so that (i) the upper envelope $W_i(\mu_i)$ is a piecewise linear function and (ii) $T_i(q_i)$ is a convex polyhedron.

In what follows we derive the conditions under which the constrained optimization coincides with the true optimum (so that $W_i(\mu_i)$ is equal to the belief-based, truly optimal, value function $V_i(\mu_i)$). Let $TW_i(\mu_i)$ be the value function when the continuation value function is given by $W_i(\mu_i)$. Formally,

$$TW_i(\mu_i) = \max_{a_i \in A_i} V_i(a_i, \mu_i),$$

$$V_i(a_i, \mu_i) = \sum_{q_{-i}} \mu_i(q_{-i}) \left[ (1-\delta)\, g_i(a_i, f_{-i}(q_{-i})) + \delta \sum_{s} W_i(\chi_i[\mu_i, a_i, s_i])\, \pi(s \mid a_i, f_{-i}(q_{-i})) \right].$$

Note that the function $W_i$ is the upper envelope of finitely many linear functions. To show that $W_i(\mu_i) = V_i(\mu_i)$, we just need to show that the constrained optimal path is an unconstrained optimal continuation strategy. For each $\mu_i$, let $Q_i'(\mu_i)$ be the set of states such that $f_i(q_i)$ solves

$\max_{a_i \in A_i} V_i(a_i, \mu_i)$. Suppose that, for every $q_i \in Q_i'(\mu_i)$, the transition probability is consistent with $M_i$, i.e. $q_i' \in T_i(\chi_i[\mu_i, a_i, s_i])$ for every $q_i'$ such that $m_i(q_i' \mid q_i, a_i, s_i) > 0$. Then $\max_{a_i \in A_i} V_i(a_i, \mu_i)$ is achieved by following $M_i(q_i)$, i.e. $V_i(f_i(q_i), \mu_i) = W_i(q_i, \mu_i)$ for any $q_i \in Q_i'(\mu_i)$. Furthermore it is straightforward to show that $M_i(q_i)$ generates the constrained optimal payoff, i.e. $V_i(f_i(q_i), \mu_i) = W_i(\mu_i)$ for any $q_i \in Q_i'(\mu_i)$. This establishes that $W_i(\mu_i)$ is the value function for the original unconstrained problem. This also implies that $Q_i'(\mu_i) = T_i(\mu_i)$.

We say that a subset of beliefs over the opponents' states $\Xi_i \subseteq \Delta Q_{-i}$ is belief-closed if beliefs, once in this set, never exit from this set on and off the equilibrium path: for all $a_i, s_i$, $\mu_i \in \Xi_i \Rightarrow \mu_i' = \chi_i[\mu_i, a_i, s_i] \in \Xi_i$.

Now we are ready to state a set of sufficient conditions for the given automata $(M_1, \dots, M_n)$ to generate a finite state equilibrium.

Proposition 3 A finite state automata profile $(M_1, \dots, M_n)$ generates a finite state equilibrium if and only if

1. there is a belief-closed set $\Xi_i$ for every $i \in N$ such that, for every $\mu_i \in \Xi_i$, there exists $q_i$ that satisfies the following two properties:
(a) $f_i(q_i)$ maximizes $V_i(a_i, \mu_i)$,
(b) $\chi_i[\mu_i, a_i, s_i] \in T_i(q_i')$ for all $(a_i, s_i)$ and for all $q_i' \in \operatorname{supp} m_i(\cdot \mid q_i, a_i, s_i)$, and

2. there is an initial joint distribution $\mu$ on $Q$ such that $q_i \in Q_i(\mu_i(q_i))$ if $\mu(q_i) > 0$.

Proof. Suppose that (1) is satisfied. Then we have $TW_i(\mu_i) = W_i(q_i, \mu_i)$ for this $q_i$. Notice that $TW_i(\mu_i) \geq W_i(\mu_i)$ by definition. Since $W_i(\mu_i)$ is the upper envelope of $W_i(q_i, \mu_i)$, this implies that $W_i(q_i, \mu_i) = W_i(\mu_i)$. Hence $W_i$ is a fixed point of $T$, thus it must coincide with the true value function $V_i$ on $\Xi_i$ (because the value function is unique by Blackwell's theorem). Then clearly this finite state automata profile $(M_1, \dots, M_n)$ assigns the optimal action at every history given the joint distribution in (2).

On the other hand, suppose that $(M_1, \dots, M_n)$ generates a finite state equilibrium with some initial joint distribution $\mu$. Clearly (2) is a necessary condition for this finite state automata profile to play an optimal strategy. Define

$\Xi_i$ to be the set of all possible beliefs of player $i$ on and off the equilibrium path given this automata profile. For any $\mu_i \in \Xi_i$, $M_i$ is optimal with some $q_i$ being the starting point by definition. Therefore 1-(a) and 1-(b) follow.

5 An example: Finite state equilibria in Bhaskar and Obara (2002)

We apply the above technique to provide a complete characterization of finite state equilibria from Bhaskar and Obara [1]. Consider the standard prisoners' dilemma stage game with the following payoff table:

          C            D
C       1, 1        -h, 1+d
D     1+d, -h        0, 0

where $d, h > 0$ ($D$ is dominant) and $d - h < 1$ ($(C, C)$ is efficient). We also assume $d < h$: the gain from playing $D$ is relatively larger when the other player is playing $D$.

Actions are not observable. Player $i$ observes a private signal $s_i \in \{c, d\}$ about player $j \neq i$'s action. We assume that $\Pr(s_1, s_2 \mid a_1, a_2) = \Pr(s_1 \mid a_2) \Pr(s_2 \mid a_1)$ and $\Pr(s_j = a_i \mid a_i) = 1 - \varepsilon$ for all $(a_i, s_j)$ and $i \neq j$. Thus private signals are conditionally independent and each player's signal is affected only by the other player's action. Given these restrictions, it is without loss of generality to assume $\varepsilon \leq \frac{1}{2}$.

Consider a finite state automaton $M_i = (Q_i, m_i, f_i)$, $i = 1, 2$, which satisfies $Q_i = \{x, y\}$, $m_i(x \mid x, c) = 1$, $m_i(x \mid x, d) = m_i(x \mid y, s_i) = 0$, $f_i(x) = C$, and $f_i(y) = D$. Let $\sigma_C$ be a partial strategy which describes player $i$'s play when he starts at state $x$ and $\sigma_D$ be a partial strategy starting at state $y$. The first strategy is realization equivalent to the standard trigger strategy and the second to the strategy which plays $D$ independent of history.

Bhaskar and Obara [1] show that $(M_1, M_2)$ generates a finite state equilibrium when $\delta$ is in some middle range and $\varepsilon$ is small enough. Our goal is to characterize the range of parameters $(\varepsilon, \delta)$ where $(M_1, M_2)$ generates a finite state equilibrium. Since $(\sigma_D, \sigma_D)$ is always a sequential equilibrium, we are interested in equilibria where both players pick $\sigma_C$ with positive probability.

Suppose that the other player is either playing $\sigma_C$ or $\sigma_D$. Then the other player's continuation strategy is also either $\sigma_C$ or $\sigma_D$ at any point of time in the future.
Regarding these (continuation) strategies as states, each player's decision problem becomes the following dynamic programming problem.

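$$V(\mu) = \max\{V_C(\mu), V_D(\mu)\}$$

$$V_C(\mu) = (1-\delta)\, g(C, \mu) + \delta \left[\Pr(c \mid \mu)\, V(\chi[\mu, C, c]) + (1 - \Pr(c \mid \mu))\, V(\chi[\mu, C, d])\right]$$

$$V_D(\mu) = (1-\delta)\, g(D, \mu) + \delta \left[\Pr(c \mid \mu)\, V(\chi[\mu, D, c]) + (1 - \Pr(c \mid \mu))\, V(\chi[\mu, D, d])\right]$$

where $V(\mu)$ is the maximum payoff for player $i$ when he believes that player $j$ is playing $\sigma_C$ with probability $\mu$ and $\sigma_D$ with probability $1 - \mu$. When player $i$ plays $a_i$ and continues to play the optimal continuation strategy, he obtains $V_{a_i}(\mu)$. Player $i$'s stage game payoff given $a_i$ and $\mu$ is denoted $g(a_i, \mu)$. Player $i$'s belief is tracked by $\chi[\mu, a_i, s_i]$, which is player $i$'s belief of the other player's state being $x$ in the next period given that $(a_i, s_i)$ is realized in the current period. These functions have the following forms:

$$\chi[\mu, C, c] = \frac{\mu (1-\varepsilon)^2}{\mu (1-\varepsilon) + (1-\mu)\varepsilon}, \qquad \chi[\mu, C, d] = \frac{\mu \varepsilon (1-\varepsilon)}{\mu \varepsilon + (1-\mu)(1-\varepsilon)},$$

$$\chi[\mu, D, c] = \frac{\mu (1-\varepsilon) \varepsilon}{\mu (1-\varepsilon) + (1-\mu)\varepsilon}, \qquad \chi[\mu, D, d] = \frac{\mu \varepsilon^2}{\mu \varepsilon + (1-\mu)(1-\varepsilon)}.$$

Let us first prove the following lemma.

Lemma

1. $V(\mu)$ is continuous, strictly increasing, and convex in $\mu$.

2. There exists $\mu^* \in (0, 1]$ such that
$V(\mu) = V_C(\mu) > V_D(\mu)$ if $\mu > \mu^*$,
$V(\mu) = V_D(\mu) \geq V_C(\mu)$ if $\mu = \mu^*$,
$V(\mu) = V_D(\mu) > V_C(\mu)$ if $\mu < \mu^*$.

Proof. (Proof of 1). The value function $V(\mu)$ can be computed iteratively by:

$$V^{t+1}(\mu) = \max\left\{V_C^{t+1}(\mu), V_D^{t+1}(\mu)\right\}$$

$$V_C^{t+1}(\mu) = (1-\delta)\, g(C, \mu) + \delta \left[\Pr(c \mid \mu)\, V^t(\chi[\mu, C, c]) + (1 - \Pr(c \mid \mu))\, V^t(\chi[\mu, C, d])\right]$$

$$V_D^{t+1}(\mu) = (1-\delta)\, g(D, \mu) + \delta \left[\Pr(c \mid \mu)\, V^t(\chi[\mu, D, c]) + (1 - \Pr(c \mid \mu))\, V^t(\chi[\mu, D, d])\right]$$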
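The iteration above can be approximated numerically on a grid of beliefs. The following Python sketch is one way to do this under stated assumptions: the grid size, the piecewise-linear interpolation between grid points, the number of iterations, and all function names are choices made here for illustration and do not come from the paper. The stage payoffs $g(C, \mu) = \mu - (1-\mu)h$ and $g(D, \mu) = \mu(1+d)$ and the signal probability $\Pr(c \mid \mu) = \mu(1-\varepsilon) + (1-\mu)\varepsilon$ are read off the payoff table and the monitoring structure, and the iteration starts from the constant $1 + d$, the largest stage payoff.

```python
from bisect import bisect_right

def chi(mu, a, s, eps):
    """Posterior probability that the opponent is in state x next period."""
    p_stay = 1 - eps if a == "C" else eps   # opponent in x observes c about my action
    like_x = 1 - eps if s == "c" else eps   # my signal when the opponent plays C
    like_y = eps if s == "c" else 1 - eps   # my signal when the opponent plays D
    return mu * like_x * p_stay / (mu * like_x + (1 - mu) * like_y)

def interp(grid, values, mu):
    """Piecewise-linear interpolation of values on an increasing grid."""
    j = min(max(bisect_right(grid, mu) - 1, 0), len(grid) - 2)
    w = (mu - grid[j]) / (grid[j + 1] - grid[j])
    return (1 - w) * values[j] + w * values[j + 1]

def value_iteration(eps, delta, d, h, n_grid=201, n_iter=300):
    """Approximate V on a belief grid, starting from the constant 1 + d."""
    grid = [k / (n_grid - 1) for k in range(n_grid)]
    stage = {"C": lambda mu: mu - (1 - mu) * h, "D": lambda mu: mu * (1 + d)}
    values = [1.0 + d] * n_grid
    for _ in range(n_iter):
        new_values = []
        for mu in grid:
            pr_c = mu * (1 - eps) + (1 - mu) * eps
            new_values.append(max(
                (1 - delta) * stage[a](mu)
                + delta * (pr_c * interp(grid, values, chi(mu, a, "c", eps))
                           + (1 - pr_c) * interp(grid, values, chi(mu, a, "d", eps)))
                for a in ("C", "D")
            ))
        values = new_values
    return grid, values

grid, values = value_iteration(eps=0.01, delta=0.6, d=1.0, h=2.0, n_grid=101, n_iter=200)
```

The interpolation introduces an approximation error that shrinks as the grid is refined, so the output should be read as a numerical illustration of the recursion rather than as an exact value function.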

For example, if we set $V^1(\mu) = 1 + d$ (a constant, the largest stage payoff), then $\{V^t(\mu)\}_{t=1}^{\infty}$ is a sequence of decreasing functions, converging to $V(\mu)$ in sup norm (the map $V^t \to V^{t+1}$ is a contraction mapping). Note that if $V^t$ is continuous and increasing, then $V^{t+1}$ is also continuous and increasing. Since continuity and monotonicity are preserved in the limit, $V(\mu)$ is continuous and increasing.

We can show that $V(\mu)$ is indeed strictly increasing. Suppose that $V(\mu) = V_C(\mu)$. For any $\tilde{\mu} > \mu$,

$$V(\tilde{\mu}) - V(\mu) \geq V_C(\tilde{\mu}) - V_C(\mu) = (1-\delta)\{g(C, \tilde{\mu}) - g(C, \mu)\} + \delta \{\Pr(c \mid \tilde{\mu}) - \Pr(c \mid \mu)\}\{V(\chi[\tilde{\mu}, C, c]) - V(\chi[\tilde{\mu}, C, d])\}$$
$$\qquad + \delta \Pr(c \mid \mu)\{V(\chi[\tilde{\mu}, C, c]) - V(\chi[\mu, C, c])\} + \delta (1 - \Pr(c \mid \mu))\{V(\chi[\tilde{\mu}, C, d]) - V(\chi[\mu, C, d])\}$$

The first term (the current period payoff difference) is strictly positive. The other terms are all nonnegative as $V$ is increasing. The same proof applies when $V(\mu) = V_D(\mu)$.

Let us show that $V(\mu)$ is convex. Let $\mu = \lambda \mu_1 + (1-\lambda)\mu_2$ be a convex combination of $\mu_1$ and $\mu_2$, and suppose that $C$ is played and the optimal continuation strategy $\sigma$ follows given any private signal $s$. Let $V(C, \cdot)$ be the expected discounted average payoff from this fixed strategy as a function of the belief. Since $V(C, \cdot)$ is linear in the belief, $V_C(\lambda \mu_1 + (1-\lambda)\mu_2) = \lambda V(C, \mu_1) + (1-\lambda) V(C, \mu_2)$. Since this fixed continuation strategy may not be optimal for different beliefs, we have $V(C, \mu_1) \leq V_C(\mu_1)$ and $V(C, \mu_2) \leq V_C(\mu_2)$. Therefore $V_C(\lambda \mu_1 + (1-\lambda)\mu_2) \leq \lambda V_C(\mu_1) + (1-\lambda) V_C(\mu_2)$. Similarly $V_D$ can be shown to be convex in $\mu$. Hence $V(\mu)$ is convex because it is an upper envelope of two convex functions.

(Proof of 2). We show that $V_a(\mu) = \lim_t V_a^t(\mu)$ is supermodular in $(a, \mu)$: $V_C(\tilde{\mu}) - V_D(\tilde{\mu}) > V_C(\mu) - V_D(\mu)$ for any $\tilde{\mu} > \mu$. This and $V_D(0) > V_C(0)$ guarantee that there exists such $\mu^*$. (Footnote 1: Note that $\mu^*$ can be 1. In that case, $V_D(\mu)$ always dominates $V_C(\mu)$.)

$$V_C(\mu) - V_D(\mu) = (1-\delta)\{g(C, \mu) - g(D, \mu)\} + \delta \Pr(c \mid \mu)\{V(\chi[\mu, C, c]) - V(\chi[\mu, D, c])\} + \delta (1 - \Pr(c \mid \mu))\{V(\chi[\mu, C, d]) - V(\chi[\mu, D, d])\}$$

First, $g(C, \mu) - g(D, \mu)$ is strictly increasing in $\mu$ because $d < h$. It is straightforward to show that $\chi[\mu, C, c] - \chi[\mu, D, c]$ and $\chi[\mu, C, d] - \chi[\mu, D, d]$ are also strictly increasing in $\mu$. Since $V$ is increasing and convex, $V(\chi[\mu, C, c]) - V(\chi[\mu, D, c])$ and $V(\chi[\mu, C, d]) - V(\chi[\mu, D, d])$ are also strictly increasing in $\mu$. Note also that $\Pr(c \mid \mu)$ is affected by the change of $\mu$: there is more weight on $c$ for a larger $\mu$. Since it can be shown that $\chi[\mu, C, c] - \chi[\mu, D, c] > \chi[\mu, C, d] - \chi[\mu, D, d]$ for any $\mu$, and hence $V(\chi[\mu, C, c]) - V(\chi[\mu, D, c]) > V(\chi[\mu, C, d]) - V(\chi[\mu, D, d])$ by the convexity of $V$, more probability on $c$ also has the effect of increasing the difference between the continuation payoffs of $V_C(\mu)$ and $V_D(\mu)$. Therefore it is proved that $V_C(\mu') - V_D(\mu') > V_C(\mu) - V_D(\mu)$ for any $\mu' > \mu$.

1. If $(1-\varepsilon)^2 \leq \varepsilon$, then $\chi[\mu, a_i, s_i] < \mu$ for all $\mu > 0$. This means that every $\mu > 0$ converges to 0 even if $C$ has always been played and $c$ has always been observed. Hence $(1-\varepsilon)^2 > \varepsilon$, i.e. $\varepsilon < (3 - \sqrt{5})/2$, is necessary for $\sigma_C$ to be optimal for any belief.

2. Whatever the initial belief is, when only $(C, c)$ has been observed, the posterior belief converges to the fixed point of $\chi[\mu, C, c]$: $\frac{(1-\varepsilon)^2 - \varepsilon}{1 - 2\varepsilon} \in (0, 1)$. This implies two things: for $\sigma_C$ to be optimal,

$$(1) \quad \frac{(1-\varepsilon)^2 - \varepsilon}{1 - 2\varepsilon} \geq \mu^* \qquad \text{and} \qquad (2) \quad \chi\left[\frac{(1-\varepsilon)^2 - \varepsilon}{1 - 2\varepsilon}, C, d\right] \leq \mu^*$$

are necessary.

3. After a moment of reflection, we can see that (1) and (2) are also sufficient for $\sigma_C$ to be optimal for $\mu \in [\mu^*, \tilde{\mu}]$, where $\chi[\tilde{\mu}, C, d] = \mu^*$ ($\tilde{\mu} < 1$). Note that $\chi[\mu, D, s_i] \leq \mu$ for all $\mu$ and $s_i = c, d$. (Thus $\sigma_D$ is optimal for $\mu \in [0, \mu^*]$.)

In the following, we compute $\mu^*$ to characterize the range of parameters where (1) and (2) are satisfied. When (1) and (2) are satisfied, a player's continuation strategy is $\sigma_C$ for $\mu > \mu^*$ and $\sigma_D$ for $\mu < \mu^*$, and he is indifferent between $\sigma_C$ and $\sigma_D$ when $\mu = \mu^*$. We can calculate $\mu^*$ from this indifference. Let $V_{ZZ'}$ be player 1's discounted average payoff when player 1 plays $\sigma_Z$ and player 2 plays $\sigma_{Z'}$. Then

$$V_{CC} = (1-\delta) + \delta\left\{(1-\varepsilon)^2 V_{CC} + (1-\varepsilon)\varepsilon\,(V_{CD} + V_{DC}) + \varepsilon^2 V_{DD}\right\}$$

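$$V_{CD} = (1-\delta)(-h) + \delta\{\varepsilon V_{CD} + (1-\varepsilon) V_{DD}\}$$

$$V_{DC} = (1-\delta)(1+d) + \delta\{\varepsilon V_{DC} + (1-\varepsilon) V_{DD}\}$$

$$V_{DD} = 0$$

Thus

$$V_{CC} = \frac{(1-\delta) + \delta (1-\varepsilon)\varepsilon\, \dfrac{(1-\delta)(1+d-h)}{1-\delta\varepsilon}}{1 - \delta(1-\varepsilon)^2}, \qquad V_{CD} = \frac{(1-\delta)(-h)}{1-\delta\varepsilon}, \qquad V_{DC} = \frac{(1-\delta)(1+d)}{1-\delta\varepsilon}, \qquad V_{DD} = 0.$$

Then $\mu^*$ is the number that satisfies

$$\mu^* V_{CC} + (1-\mu^*) V_{CD} = \mu^* V_{DC} + (1-\mu^*) V_{DD}.$$

That is,

$$\mu^* = \frac{-V_{CD}}{V_{CC} - V_{CD} - V_{DC}} = \frac{\dfrac{(1-\delta)h}{1-\delta\varepsilon}}{\dfrac{(1-\delta) + \delta(1-\varepsilon)\varepsilon\, \frac{(1-\delta)(1+d-h)}{1-\delta\varepsilon}}{1-\delta(1-\varepsilon)^2} - \dfrac{(1-\delta)(1+d-h)}{1-\delta\varepsilon}} = \dots = \frac{h\left\{1 - \delta(1-\varepsilon)^2\right\}}{\delta(1-2\varepsilon) + \{1 - \delta(1-\varepsilon)\}(h-d)}.$$

Therefore the necessary and sufficient condition boils down to:

$$(1) \quad \frac{(1-\varepsilon)^2 - \varepsilon}{1-2\varepsilon} \geq \frac{h\left\{1 - \delta(1-\varepsilon)^2\right\}}{\delta(1-2\varepsilon) + \{1 - \delta(1-\varepsilon)\}(h-d)}$$

and

$$(2) \quad \frac{(1-\varepsilon)\left\{(1-\varepsilon)^2 - \varepsilon\right\}}{2(1-\varepsilon)^2 - \varepsilon} \leq \frac{h\left\{1 - \delta(1-\varepsilon)^2\right\}}{\delta(1-2\varepsilon) + \{1 - \delta(1-\varepsilon)\}(h-d)}.$$

Suppose that $d = 1$ and $h = 2$. Then the above inequalities become:

$$\delta \geq \frac{1 - \varepsilon - \varepsilon^2}{2 - 9\varepsilon + 13\varepsilon^2 - 5\varepsilon^3}, \qquad \delta \leq \frac{3 - 6\varepsilon + \varepsilon^3}{4 - 17\varepsilon + 26\varepsilon^2 - 16\varepsilon^3 + 3\varepsilon^4}.$$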
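Conditions (1) and (2) can be checked numerically for a given $(\varepsilon, \delta)$ by working directly from the continuation values and the belief update, without relying on any simplified polynomial form. The following Python sketch does this; the function names are hypothetical, and it assumes $\varepsilon < \frac{1}{2}$ and $d < h$ so that the denominators below are positive.

```python
def continuation_values(eps, delta, d, h):
    """Return (V_CC, V_CD, V_DC) from the recursive equations; V_DD is 0."""
    v_cd = (1 - delta) * (-h) / (1 - delta * eps)
    v_dc = (1 - delta) * (1 + d) / (1 - delta * eps)
    v_cc = ((1 - delta) + delta * (1 - eps) * eps * (v_cd + v_dc)) / (
        1 - delta * (1 - eps) ** 2
    )
    return v_cc, v_cd, v_dc

def threshold_from_values(eps, delta, d, h):
    """mu* from the indifference condition between sigma_C and sigma_D."""
    v_cc, v_cd, v_dc = continuation_values(eps, delta, d, h)
    return -v_cd / (v_cc - v_cd - v_dc)

def threshold_closed_form(eps, delta, d, h):
    """mu* from the closed-form expression."""
    return h * (1 - delta * (1 - eps) ** 2) / (
        delta * (1 - 2 * eps) + (1 - delta * (1 - eps)) * (h - d)
    )

def chi_after_C_d(mu, eps):
    """Belief update chi[mu, C, d]."""
    return mu * eps * (1 - eps) / (mu * eps + (1 - mu) * (1 - eps))

def conditions_hold(eps, delta, d, h):
    """Check conditions (1) and (2) at the fixed point of chi[mu, C, c]."""
    mu_star = threshold_closed_form(eps, delta, d, h)
    mu_bar = ((1 - eps) ** 2 - eps) / (1 - 2 * eps)
    return mu_bar >= mu_star and chi_after_C_d(mu_bar, eps) <= mu_star

# Example with d = 1, h = 2 and a small noise level.
in_range = conditions_hold(0.01, 0.6, 1.0, 2.0)
too_impatient = conditions_hold(0.01, 0.4, 1.0, 2.0)
too_patient = conditions_hold(0.01, 0.9, 1.0, 2.0)
```

Computing $\mu^*$ both from the indifference condition and from the closed form gives a quick consistency check on the algebra for any parameter values one wishes to try.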

Below we plot these two inequalities. If $(\varepsilon, \delta)$ falls in the area between the solid line and the dashed line, then there exists some belief $\mu$ such that $\sigma_C$ becomes optimal.

[Figure: the region of $(\varepsilon, \delta)$ bounded by the two inequalities; axes labeled Epsilon and Delta.]

For a large $\varepsilon$, a mixture of $\sigma_C$ and $\sigma_D$ can be an equilibrium even when players are very patient (however this is a very inefficient equilibrium). As $\varepsilon$ gets smaller, the range of $\delta$ where such an equilibrium exists goes down. The efficiency improves when $\varepsilon$ is small, so that punishment is not accidentally triggered, and/or $\delta$ is small, so that the eventual breakdown of cooperation does not affect the current discounted average payoff much. The efficient payoff can be approached as $(\varepsilon, \delta) \to \left(0, \frac{1}{2}\right)$ within this area. This is the efficient equilibrium obtained in Bhaskar and Obara [1].

References

[1] V. Bhaskar and I. Obara. Belief-based equilibria in the prisoners' dilemma with private monitoring. Journal of Economic Theory, 102:16-39.

[2] J. C. Ely, J. Hörner, and W. Olszewski. Belief-free equilibria in repeated games. Econometrica, 73(2).

[3] J. C. Ely and J. Välimäki. A robust folk theorem for the prisoner's dilemma. Journal of Economic Theory, 102(1):84-105.

[4] J. Hörner and W. Olszewski. The folk theorem for games with private almost-perfect monitoring. mimeo.

[5] M. Kandori and I. Obara. Efficiency in repeated games revisited: The role of private strategies. Econometrica, 74(2).

[6] H. Matsushima. Repeated games with private monitoring: Two players. Econometrica, 72.

[7] C. Phelan and A. Skrzypacz. Private monitoring with infinite histories. mimeo.

[8] M. Piccione. The repeated prisoner's dilemma with imperfect private monitoring. Journal of Economic Theory, 102(1):70-83.

[9] T. Sekiguchi. Efficiency in repeated prisoner's dilemma with private monitoring. Journal of Economic Theory, 76.
