arxiv: v2 [q-bio.pe] 23 Nov 2017


Zero-determinant strategies in finitely repeated games

Genki Ichinose (1) and Naoki Masuda (2)

(1) Department of Mathematical and Systems Engineering, Shizuoka University, 3-5- Johoku, Naka-ku, Hamamatsu, Japan
(2) Department of Engineering Mathematics, University of Bristol, Merchant Venturers Building, Woodland Road, Clifton, Bristol BS8 UB, United Kingdom
Corresponding author (naoki.masuda@bristol.ac.uk)

November 27, 2017

Abstract

Direct reciprocity is a mechanism for sustaining mutual cooperation in repeated social dilemma games, where a player would keep cooperating to avoid retaliation by a co-player in the future. So-called zero-determinant (ZD) strategies enable a player to unilaterally set a linear relationship between the player's own payoff and the co-player's payoff regardless of the strategy of the co-player. In the present study, we analytically study zero-determinant strategies in finitely repeated (two-person) prisoner's dilemma games with a general payoff matrix. Our results are as follows. First, we present the forms of solutions that extend the known results for infinitely repeated games (with a discount factor $\delta$ of unity) to the case of finitely repeated games ($0 < \delta < 1$). Second, for the three most prominent ZD strategies, the equalizers, extortioners, and generous strategies, we derive the threshold value of $\delta$ above which the ZD strategies exist. Third, we show that the only strategies that enforce a linear relationship between the two players' payoffs are either the ZD strategies or unconditional strategies, where the latter independently cooperate with a fixed probability in each round of the game, proving a conjecture previously made for infinitely repeated games.
Keywords: Prisoner's dilemma game; Cooperation; Direct reciprocity; Discount factor

1 Introduction

The prisoner's dilemma game models situations in which two individuals are involved in a social dilemma and, in the simplest setting, each individual selects either cooperation (C) or defection (D). Although an individual obtains a larger payoff by selecting D regardless of the choice of the other individual, mutual defection, which is the unique Nash equilibrium of the game, yields a smaller benefit to both players than mutual cooperation does. We now know various mechanisms that enable mutual cooperation in the prisoner's dilemma game and other social dilemma games [ 3], which inform us how cooperation is probably sustained in societies of humans and animals and how to design cooperative organisations and societies. One of the mechanisms enabling mutual cooperation in social dilemma games is direct reciprocity, i.e., repeated interaction, in which the same two individuals play the game multiple times. An individual that defects would face retaliation by the co-player in the succeeding rounds. Therefore, the rational decision for both players in the repeated prisoner's dilemma game is to keep mutual cooperation if the number of iterations is sufficiently large [, 4, 5]. Generous tit-for-tat [6] and win-stay lose-shift (often called Pavlov) [7, 8] strategies are strong competitors in evolutionary dynamics of the repeated prisoner's dilemma game under noise, and a population composed of them realizes a high level of mutual cooperation.

In 2012, when the study of direct reciprocity seemed to have matured, Press and Dyson proposed a novel class of strategies in the repeated prisoner's dilemma game, called zero-determinant (ZD) strategies [9]. ZD strategies impose a linear relationship between the payoff obtained by a focal individual and that obtained by its co-player, regardless of the strategy that the co-player implements. A special case of the ZD strategies is the equalizer, with which

the focal individual unilaterally determines the payoff that the co-player gains regardless of what the co-player does, within a permitted range of the co-player's payoff value (see [2, 0] for previous accounts of this strategy). As a different special case, the focal individual can set an extortionate share of the payoff that the individual gains as compared to the co-player's payoff. The advent of the ZD strategies has spurred new lines of investigation of direct reciprocity. They include the examination and extension of ZD strategies such as their evolution [ 22], multiplayer games [20, 23 26], continuous action spaces [25 28], alternating games [28], human reactions to computerized ZD strategies [29, 30], and human-human experiments [25, 3].

Most of the aforementioned mathematical and computational studies of the ZD strategies have been conducted under the assumption of infinitely repeated games. Although infinitely repeated games are mathematically more elegant and convenient, finitely repeated games are more realistic and are consistent with experimental studies. In the present study, we examine the ZD strategies in the finitely repeated prisoner's dilemma game. There are a few studies that have investigated ZD strategies in finitely repeated games. Hilbe and colleagues defined and mathematically characterized ZD strategies in finitely repeated games [32] (also see [29]). McAvoy and Hauert analyzed ZD strategies in the finitely repeated donation game (i.e., a special case of the prisoner's dilemma game) in a continuous strategy space [27, 28]. Given these studies, our main contributions in the present article are summarized as follows. First, we derive expressions for ZD strategies in finitely repeated games that are straightforward extensions of those previously found for the infinitely repeated game.
Second, for the three most studied ZD strategies, we derive the threshold discount factor (i.e., how likely the next round of the game is to occur in the finitely repeated game) above which the ZD strategy can exist. Third, we prove that imposing a linear relationship between the two individuals' payoffs implies that the focal player takes either the ZD strategy defined for finitely repeated games [32] or an unconditional strategy (e.g., unconditional cooperation and unconditional defection), proving the conjecture in [5] in the case of finitely repeated games.

2 Preliminaries

In this section, we explain the finitely repeated prisoner's dilemma game, the strategies of interest (i.e., memory-one strategies), and the expected payoffs. A more thorough discussion is found in Refs. [2, 32, 33]. We consider the symmetric two-person prisoner's dilemma game whose payoff matrix is given by

$$\begin{array}{c|cc} & C & D \\ \hline C & R & S \\ D & T & P \end{array} \tag{1}$$

The entries represent the payoffs that the focal player, denoted by X, gains in a single round of a repeated game. Each row and column represents the action of the focal player, X, and the co-player (denoted by Y), respectively. We assume that

$$T > R > P > S, \tag{2}$$

which dictates the prisoner's dilemma game. Each player obtains a larger payoff by selecting D than C because $T > R$ and $P > S$. We also assume that

$$2R > T + S, \tag{3}$$

which guarantees that mutual cooperation is more beneficial than the two players alternating C and D in the opposite phase, i.e., CD, DC, CD, DC, ..., where the first and second letters represent the actions selected by X and Y, respectively [5,34]. The two players repeat the game whose payoff matrix in each round is given by Eq. (1). Given the current round, a next round takes place with probability $\delta$ ($0 < \delta < 1$), which is called the discount factor. Consider two players X and Y that adopt memory-one strategies, with which they use only the outcome of the last round to decide the action to be submitted in the current round.
A memory-one strategy is specified by a 5-tuple; X's strategy is given by a combination of

$$\mathbf{p} = (p_{CC}, p_{CD}, p_{DC}, p_{DD}) \tag{4}$$

and $p_0$, where $0 \le p_{CC}, p_{CD}, p_{DC}, p_{DD}, p_0 \le 1$. In Eq. (4), $p_{CC}$ is the conditional probability that X cooperates when both X and Y cooperated in the last round, $p_{CD}$ is the conditional probability that X cooperates when X cooperated and Y defected in the last round, $p_{DC}$ is the conditional probability that X cooperates when X defected and Y cooperated in the last round, and $p_{DD}$ is the conditional probability that X cooperates when both X and Y defected in the last round. Finally, $p_0$ is the probability that X cooperates in the first round. Similarly, Y's strategy is specified by a combination of

$$\mathbf{q} = (q_{CC}, q_{CD}, q_{DC}, q_{DD}) \tag{5}$$

and the probability to cooperate in the first round, $q_0$, where $0 \le q_{CC}, q_{CD}, q_{DC}, q_{DD}, q_0 \le 1$. We refer to the first round of the repeated game as round 0. Because both players have been assumed to use a memory-one strategy, the stochastic state of the two players in round $t$ ($t \ge 0$) is specified by

$$\mathbf{v}(t) = (v_{CC}(t), v_{CD}(t), v_{DC}(t), v_{DD}(t)), \tag{6}$$

where $v_{CC}(t)$ is the probability that both players cooperate in round $t$, $v_{CD}(t)$ is the probability that X cooperates and Y defects in round $t$, and so forth. The normalization is given by $v_{CC}(t) + v_{CD}(t) + v_{DC}(t) + v_{DD}(t) = 1$ ($t = 0, 1, \ldots$). The initial condition is given by

$$\mathbf{v}(0) = (p_0 q_0,\; p_0(1-q_0),\; (1-p_0)q_0,\; (1-p_0)(1-q_0)). \tag{7}$$

Because the expected payoff to player X in round $t$ is given by $\mathbf{v}(t)\mathbf{S}_X$, where

$$\mathbf{S}_X = (R, S, T, P)^{\top}, \tag{8}$$

the expected per-round payoff to player X in the repeated game is given by

$$\pi_X = (1-\delta)\sum_{t=0}^{\infty} \delta^t \mathbf{v}(t)\mathbf{S}_X. \tag{9}$$

The transition-probability matrix for $\mathbf{v}(t)$ is given by

$$M = \begin{pmatrix} p_{CC}q_{CC} & p_{CC}(1-q_{CC}) & (1-p_{CC})q_{CC} & (1-p_{CC})(1-q_{CC}) \\ p_{CD}q_{DC} & p_{CD}(1-q_{DC}) & (1-p_{CD})q_{DC} & (1-p_{CD})(1-q_{DC}) \\ p_{DC}q_{CD} & p_{DC}(1-q_{CD}) & (1-p_{DC})q_{CD} & (1-p_{DC})(1-q_{CD}) \\ p_{DD}q_{DD} & p_{DD}(1-q_{DD}) & (1-p_{DD})q_{DD} & (1-p_{DD})(1-q_{DD}) \end{pmatrix}. \tag{10}$$

By substituting

$$\mathbf{v}(t) = \mathbf{v}(0)M^t \tag{11}$$

in Eq. (9), one obtains

$$\pi_X = (1-\delta)\mathbf{v}(0)\sum_{t=0}^{\infty}(\delta M)^t \mathbf{S}_X = (1-\delta)\mathbf{v}(0)(I-\delta M)^{-1}\mathbf{S}_X, \tag{12}$$

where $I$ is the $4 \times 4$ identity matrix. Similarly, the expected per-round payoff to player Y is given by

$$\pi_Y = (1-\delta)\mathbf{v}(0)(I-\delta M)^{-1}\mathbf{S}_Y, \tag{13}$$

where

$$\mathbf{S}_Y = (R, T, S, P)^{\top}. \tag{14}$$

3 Results

We search for player X's strategies that impose a linear relationship between the two players' payoffs, i.e.,

$$\alpha\pi_X + \beta\pi_Y + \gamma = 0. \tag{15}$$

When $\alpha \neq 0$, we set $\chi = -\beta/\alpha$ and $\kappa = -\gamma/(\alpha+\beta)$ to transform Eq. (15) to

$$\pi_X - \kappa = \chi(\pi_Y - \kappa). \tag{16}$$

3.1 Equalizer

3.1.1 Expression

By definition, the equalizer unilaterally sets the co-player's payoff, $\pi_Y$, to a constant value irrespective of the co-player's strategy [2, 9, 0].
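Before turning to the derivation, the payoff formulas of Eqs. (9) to (13) can be checked numerically. The following is a minimal sketch, not the authors' code: it sums the discounted series of Eq. (9) directly (truncated after a fixed number of rounds) rather than inverting $I - \delta M$ as in Eq. (12). The function names `transition_matrix` and `payoffs` and the `rounds` argument are hypothetical choices made for illustration.

```python
def transition_matrix(p, q):
    """Transition matrix M of Eq. (10); states are ordered CC, CD, DC, DD
    from X's viewpoint, and p, q are (CC, CD, DC, DD) tuples."""
    # Y sees outcomes CD and DC with the roles swapped, so rows CD and DC
    # use q_DC and q_CD, respectively.
    q_seen = (q[0], q[2], q[1], q[3])
    return [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
            for a, b in zip(p, q_seen)]


def payoffs(p, p0, q, q0, delta, R, S, T, P, rounds=3000):
    """Discounted per-round payoffs (pi_X, pi_Y) from Eq. (9), truncated
    after `rounds` rounds (an illustrative assumption; the neglected tail
    is of order delta**rounds)."""
    M = transition_matrix(p, q)
    v = [p0 * q0, p0 * (1 - q0), (1 - p0) * q0, (1 - p0) * (1 - q0)]  # Eq. (7)
    SX, SY = (R, S, T, P), (R, T, S, P)  # Eqs. (8) and (14)
    pi_x = pi_y = 0.0
    weight = 1.0 - delta  # (1 - delta) * delta**t
    for _ in range(rounds):
        pi_x += weight * sum(vi * s for vi, s in zip(v, SX))
        pi_y += weight * sum(vi * s for vi, s in zip(v, SY))
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]  # Eq. (11)
        weight *= delta
    return pi_x, pi_y


if __name__ == "__main__":
    # Tit-for-tat (p = (1, 0, 1, 0), p0 = 1) against an unconditional defector.
    print(payoffs((1, 0, 1, 0), 1, (0, 0, 0, 0), 0, 0.8, 3, 0, 5, 1))
```

In the usage example, the first round is CD and every later round is DD, so the payoffs are $\pi_X = (1-\delta)S + \delta P$ and $\pi_Y = (1-\delta)T + \delta P$.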
To derive an expression for the equalizer strategies in the finitely repeated game, we proceed along the following idea: If a strategy $\mathbf{p}$ ensures that the payoffs of the two players are on a horizontal line in the $\pi_X$-$\pi_Y$ space, irrespective of the co-player's strategy, then the payoffs must be on

that horizontal line if the co-player uses unconditional cooperation or unconditional defection. Substituting the co-player's unconditional cooperation and unconditional defection into the payoff formulas gives necessary conditions imposed on X's strategy. A straightforward computation then shows that these necessary conditions are in fact often sufficient; even if the co-player uses strategies that are not unconditional cooperation or defection, the two payoffs lie on the same line. We will use the same idea in section 3.2 as well.

Because the equalizer is equivalent to $\alpha = 0$ in Eq. (15) and hence not covered by Eq. (16), we start by rewriting Eq. (13) as follows:

$$\pi_Y = (1-\delta)\mathbf{v}(0)\mathbf{u}^{\rm eq} = (1-\delta)\left[p_0 q_0 u_1^{\rm eq} + p_0(1-q_0)u_2^{\rm eq} + (1-p_0)q_0 u_3^{\rm eq} + (1-p_0)(1-q_0)u_4^{\rm eq}\right], \tag{17}$$

where

$$\mathbf{u}^{\rm eq} = (u_1^{\rm eq}, u_2^{\rm eq}, u_3^{\rm eq}, u_4^{\rm eq})^{\top} \equiv (I-\delta M)^{-1}\mathbf{S}_Y. \tag{18}$$

We denote $\mathbf{u}^{\rm eq}$ when Y's strategy is $\mathbf{q} = (0,0,0,0)$ by $\mathbf{u}^{{\rm eq},0000}$. Note that $\mathbf{u}^{{\rm eq},0000}$ is independent of the probability that Y cooperates in the initial round, i.e., $q_0$. We denote by $\pi_{Y,0000}$ the payoff of Y when $\mathbf{q} = (0,0,0,0)$. Similarly, we denote $\mathbf{u}^{\rm eq}$ when Y's strategy is $\mathbf{q} = (1,1,1,1)$ by $\mathbf{u}^{{\rm eq},1111}$, and by $\pi_{Y,1111}$ the payoff of Y when $\mathbf{q} = (1,1,1,1)$. The expressions of $\mathbf{u}^{{\rm eq},0000}$, $\pi_{Y,0000}$, $\mathbf{u}^{{\rm eq},1111}$, and $\pi_{Y,1111}$ are given in Appendix A. If X applies an equalizer strategy, $\pi_{Y,0000} = \pi_{Y,1111}$ must hold true regardless of $q_0$. Therefore, we obtain

$$(1-\delta)\left[p_0 q_0 u_1^{{\rm eq},0000} + p_0(1-q_0)u_2^{{\rm eq},0000} + (1-p_0)q_0 u_3^{{\rm eq},0000} + (1-p_0)(1-q_0)u_4^{{\rm eq},0000}\right] = (1-\delta)\left[p_0 q_0 u_1^{{\rm eq},1111} + p_0(1-q_0)u_2^{{\rm eq},1111} + (1-p_0)q_0 u_3^{{\rm eq},1111} + (1-p_0)(1-q_0)u_4^{{\rm eq},1111}\right], \tag{19}$$

which leads to

$$q_0\left[p_0(u_1^{{\rm eq},0000} - u_1^{{\rm eq},1111}) - p_0(u_2^{{\rm eq},0000} - u_2^{{\rm eq},1111}) + (1-p_0)(u_3^{{\rm eq},0000} - u_3^{{\rm eq},1111}) - (1-p_0)(u_4^{{\rm eq},0000} - u_4^{{\rm eq},1111})\right] + \left[p_0(u_2^{{\rm eq},0000} - u_2^{{\rm eq},1111}) + (1-p_0)(u_4^{{\rm eq},0000} - u_4^{{\rm eq},1111})\right] = 0. \tag{20}$$

Equation (20) must hold true for arbitrary $0 \le q_0 \le 1$. Therefore, we obtain

$$p_0(u_1^{{\rm eq},0000} - u_1^{{\rm eq},1111}) + (1-p_0)(u_3^{{\rm eq},0000} - u_3^{{\rm eq},1111}) = 0, \tag{21}$$

$$p_0(u_2^{{\rm eq},0000} - u_2^{{\rm eq},1111}) + (1-p_0)(u_4^{{\rm eq},0000} - u_4^{{\rm eq},1111}) = 0. \tag{22}$$

Combination of Eqs.
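(18), (21), and (22) leads to the following necessary conditions:

$$p_{CD} = \frac{\delta p_{CC}(T-P) - (1+\delta p_{DD})(T-R)}{\delta(R-P)}, \tag{23}$$

$$p_{DC} = \frac{(1-\delta p_{CC})(P-S) + \delta p_{DD}(R-S)}{\delta(R-P)}, \tag{24}$$

and $p_{CC}$, $p_{DD}$, and $p_0$ are arbitrary under the constraint $0 \le p_{CC}, p_{CD}, p_{DC}, p_{DD}, p_0 \le 1$. Equations (23) and (24) extend the results previously obtained for $\delta = 1$ [9].

Surprisingly, Eqs. (23) and (24) are also sufficient for $\mathbf{p}$ to be an equalizer strategy. In other words, if a strategy of player X satisfies Eqs. (23) and (24), then every strategy of co-player Y, not restricted to unconditional cooperation or unconditional defection, yields the same payoff of Y. To verify this, we substitute

$$\mathbf{p} = \left(p_{CC},\; \frac{\delta p_{CC}(T-P) - (1+\delta p_{DD})(T-R)}{\delta(R-P)},\; \frac{(1-\delta p_{CC})(P-S) + \delta p_{DD}(R-S)}{\delta(R-P)},\; p_{DD}\right) \tag{25}$$

and $\mathbf{q} = (q_{CC}, q_{CD}, q_{DC}, q_{DD})$ in Eq. (18) to obtain

$$\mathbf{u}^{\rm eq} = \frac{1}{(1-\delta)(1-\delta p_{CC}+\delta p_{DD})}\begin{pmatrix} (\delta-\delta p_{CC})P + (1-\delta+\delta p_{DD})R \\ (\delta-\delta p_{CC})P + (1-\delta+\delta p_{DD})R \\ (1-\delta p_{CC})P + \delta p_{DD}R \\ (1-\delta p_{CC})P + \delta p_{DD}R \end{pmatrix}, \tag{26}$$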

which does not contain $\mathbf{q}$. By substituting Eq. (26) in Eq. (17), we obtain

$$\pi_Y = \frac{(1-p_0+\delta p_0-\delta p_{CC})P + (p_0-\delta p_0+\delta p_{DD})R}{1-\delta p_{CC}+\delta p_{DD}}, \tag{27}$$

which is independent of $\mathbf{q}$ and $q_0$. Therefore, the set of the equalizer strategies is given by Eq. (25), where $0 \le p_{CC}, p_{CD}, p_{DC}, p_{DD} \le 1$, combined with any $0 \le p_0 \le 1$.

It should be noted that an equalizer does not require any condition on $p_0$. However, Eq. (27) indicates that the payoff that an equalizer enforces on the co-player, $\pi_Y$, depends on the value of $p_0$. Because Eq. (27) is a weighted average of $P$ and $R$ with non-negative weights, an equalizer can impose any payoff value $\pi_Y$ such that $P \le \pi_Y \le R$. If $P$ is enforced, it holds that $p_0 - \delta p_0 + \delta p_{DD} = 0$, and hence $p_{DD} = p_0 = 0$. Therefore, the equalizer is a cautious strategy (i.e., never the first to cooperate) [32]. If $R$ is enforced, it holds that $1 - p_0 + \delta p_0 - \delta p_{CC} = 0$, and hence $p_{CC} = p_0 = 1$. Therefore, the equalizer is a nice strategy (i.e., never the first to defect) [32].

We remark that the equalizer is a ZD strategy for finitely repeated games as defined in Ref. [32] because it satisfies Eq. (3) of [32] with $\alpha = 0$.

3.1.2 Minimum discount rate

In this section, we identify the condition for $\delta$ under which equalizer strategies exist. Equation (25) indicates that an equalizer strategy exists if and only if

$$0 \le \delta p_{CC}(T-P) - (1+\delta p_{DD})(T-R) \le \delta(R-P) \tag{28}$$

and

$$0 \le (1-\delta p_{CC})(P-S) + \delta p_{DD}(R-S) \le \delta(R-P) \tag{29}$$

for some $0 \le p_{CC}, p_{DD} \le 1$. Note that we used Eq. (2). Independently of $\delta$, any pair of $p_{CC}$ and $p_{DD}$ satisfies the second inequality of Eq. (28) and the first inequality of Eq. (29) because they are satisfied in the most stringent case, i.e., $p_{CC} = 1$ and $p_{DD} = 0$. The first inequality of Eq. (28) and the second inequality of Eq. (29) read

$$p_{DD} \le \frac{T-P}{T-R}\,p_{CC} - \frac{1}{\delta} \tag{30}$$

and

$$p_{DD} \le \frac{P-S}{R-S}\,p_{CC} - \frac{1}{\delta}\,\frac{P-S}{R-S} + \frac{R-P}{R-S}, \tag{31}$$

respectively. Equations (30) and (31) specify a $p_{CC}$-$p_{DD}$ region in the square $0 \le p_{CC}, p_{DD} \le 1$, near the corner $(p_{CC}, p_{DD}) = (1, 0)$ (shaded region in Fig. 1). The feasible set of $(p_{CC}, p_{DD})$ monotonically enlarges as $\delta$ increases.
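The sufficiency claim around Eqs. (25) and (27) can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it builds an equalizer from Eq. (25) for the payoff values of Fig. 1 ($R = 3$, $T = 5$, $S = 0$, $P = 1$, $\delta = 0.8$), evaluates Y's payoff by truncating the series of Eq. (9), and compares it with Eq. (27) for several co-player strategies. The helper names `payoffs`, `equalizer`, and `enforced_payoff`, and the particular choices $p_{CC} = 0.9$, $p_{DD} = 0.1$, $p_0 = 0.5$, are hypothetical.

```python
def payoffs(p, p0, q, q0, delta, R, S, T, P, rounds=3000):
    """(pi_X, pi_Y) from the series of Eq. (9), truncated after `rounds`."""
    q_seen = (q[0], q[2], q[1], q[3])  # Y sees CD and DC with roles swapped
    M = [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
         for a, b in zip(p, q_seen)]
    v = [p0 * q0, p0 * (1 - q0), (1 - p0) * q0, (1 - p0) * (1 - q0)]
    pi_x = pi_y = 0.0
    weight = 1.0 - delta
    for _ in range(rounds):
        pi_x += weight * (v[0] * R + v[1] * S + v[2] * T + v[3] * P)
        pi_y += weight * (v[0] * R + v[1] * T + v[2] * S + v[3] * P)
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
        weight *= delta
    return pi_x, pi_y


def equalizer(p_cc, p_dd, delta, R, S, T, P):
    """Strategy of Eq. (25); returns None if an entry leaves [0, 1]."""
    p_cd = (delta * p_cc * (T - P) - (1 + delta * p_dd) * (T - R)) / (delta * (R - P))
    p_dc = ((1 - delta * p_cc) * (P - S) + delta * p_dd * (R - S)) / (delta * (R - P))
    p = (p_cc, p_cd, p_dc, p_dd)
    return p if all(0 <= x <= 1 for x in p) else None


def enforced_payoff(p_cc, p_dd, p0, delta, R, P):
    """Payoff imposed on the co-player according to Eq. (27)."""
    num = ((1 - p0 + delta * p0 - delta * p_cc) * P
           + (p0 - delta * p0 + delta * p_dd) * R)
    return num / (1 - delta * p_cc + delta * p_dd)


if __name__ == "__main__":
    R, S, T, P, delta = 3, 0, 5, 1, 0.8
    p = equalizer(0.9, 0.1, delta, R, S, T, P)
    target = enforced_payoff(0.9, 0.1, 0.5, delta, R, P)
    for q, q0 in [((0, 0, 0, 0), 0), ((1, 1, 1, 1), 1), ((1, 0, 0, 1), 0.3)]:
        print(q, payoffs(p, 0.5, q, q0, delta, R, S, T, P)[1], target)
```

With a discount factor below the threshold derived next (for example $\delta = 0.4$ for these payoff values), `equalizer` returns `None` for the same $p_{CC}$ and $p_{DD}$ because $p_{CD}$ becomes negative.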
Therefore, we obtain the condition under which an equalizer exists by substituting $p_{CC} = 1$ and $p_{DD} = 0$ in Eqs. (30) and (31), i.e.,

$$\delta \ge \delta_c \equiv \max\left(\frac{T-R}{T-P},\; \frac{P-S}{R-S}\right). \tag{32}$$

When $\delta = \delta_c$, the unique equalizer strategy is given by $p_{CC} = 1$, $p_{DD} = 0$, and either $p_{CD}$ or $p_{DC}$ is equal to zero, depending on whether $(T-R)/(T-P)$ is larger than $(P-S)/(R-S)$ or vice versa. The condition $\delta \ge (T-R)/(T-P)$ in Eq. (32) coincides with that for the GRIM or tit-for-tat strategy to be stable against the unconditional defector [5].

Equation (32) is consistent with the result for the continuous donation game [27]. Their result adapted to the case of two discrete levels of cooperation is $\delta_c = c/b$, where $b$ and $c$ are the usual benefit and cost parameters in the donation game, respectively. We verify that Eq. (32) with $R = b - c$, $T = b$, $S = -c$, and $P = 0$ yields $\delta_c = c/b$.

3.2 General cases

All strategies other than the equalizer that impose a linear relationship between $\pi_X$ and $\pi_Y$ are given in the form of Eq. (16). In this section, we derive expressions of X's strategy that realizes Eq. (16).
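As a numerical aside on the threshold of Eq. (32) in section 3.1.2, the following minimal sketch evaluates $\delta_c$ and the equalizer entries of Eqs. (23) and (24) at the threshold. The function names `delta_c` and `equalizer_middle_entries` and the donation-game values $b = 4$, $c = 1$ are hypothetical choices made for illustration.

```python
def delta_c(R, S, T, P):
    """Threshold discount factor of Eq. (32)."""
    return max((T - R) / (T - P), (P - S) / (R - S))


def equalizer_middle_entries(p_cc, p_dd, delta, R, S, T, P):
    """(p_CD, p_DC) of an equalizer according to Eqs. (23) and (24)."""
    p_cd = (delta * p_cc * (T - P) - (1 + delta * p_dd) * (T - R)) / (delta * (R - P))
    p_dc = ((1 - delta * p_cc) * (P - S) + delta * p_dd * (R - S)) / (delta * (R - P))
    return p_cd, p_dc


if __name__ == "__main__":
    # Payoff values used in Fig. 1.
    d = delta_c(3, 0, 5, 1)
    print(d, equalizer_middle_entries(1, 0, d, 3, 0, 5, 1))
    # Donation game with b = 4 and c = 1: R = b - c, T = b, S = -c, P = 0.
    b, c = 4, 1
    print(delta_c(b - c, -c, b, 0), c / b)
```

For the payoff values of Fig. 1, $(T-R)/(T-P) = 1/2$ exceeds $(P-S)/(R-S) = 1/3$, so $p_{CD}$ is the entry that vanishes at $\delta = \delta_c$.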

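Figure 1: Region in the $p_{CC}$-$p_{DD}$ space where the equalizer strategy exists (shaded region). The border lines of the half planes specified by Eqs. (30) and (31) are shown by the solid and dashed lines, respectively. We set $R = 3$, $T = 5$, $S = 0$, $P = 1$, and $\delta = 0.8$.

By substituting Eqs. (12) and (13) in Eq. (16), we obtain

$$(1-\delta)\mathbf{v}(0)(I-\delta M)^{-1}\mathbf{S}_X - \kappa = \chi\left[(1-\delta)\mathbf{v}(0)(I-\delta M)^{-1}\mathbf{S}_Y - \kappa\right]. \tag{33}$$

Equation (33) yields

$$\mathbf{v}(0)\left\{(1-\delta)(I-\delta M)^{-1}\left[\mathbf{S}_X - \chi\mathbf{S}_Y\right] + (\chi-1)\kappa\mathbf{1}\right\} = 0, \tag{34}$$

where

$$\mathbf{1} = (1, 1, 1, 1)^{\top}. \tag{35}$$

We set

$$\mathbf{u}^{\rm zd} = (u_1^{\rm zd}, u_2^{\rm zd}, u_3^{\rm zd}, u_4^{\rm zd})^{\top} \equiv (1-\delta)(I-\delta M)^{-1}\left[\mathbf{S}_X - \chi\mathbf{S}_Y\right] + (\chi-1)\kappa\mathbf{1}. \tag{36}$$

Then, Eq. (34) is rewritten as

$$\mathbf{v}(0)\mathbf{u}^{\rm zd} = 0, \tag{37}$$

which is equivalent to

$$q_0\left[p_0 u_1^{\rm zd} - p_0 u_2^{\rm zd} + (1-p_0)u_3^{\rm zd} - (1-p_0)u_4^{\rm zd}\right] + \left[p_0 u_2^{\rm zd} + (1-p_0)u_4^{\rm zd}\right] = 0. \tag{38}$$

Because Eq. (38) must hold true irrespective of $q_0$, we require

$$p_0 u_1^{\rm zd} + (1-p_0)u_3^{\rm zd} = 0, \tag{39}$$

$$p_0 u_2^{\rm zd} + (1-p_0)u_4^{\rm zd} = 0. \tag{40}$$

Let us denote by $\mathbf{u}^{{\rm zd},0000}$ and $\mathbf{u}^{{\rm zd},1111}$ the vector $\mathbf{u}^{\rm zd}$ when $\mathbf{q} = (0,0,0,0)$ and $\mathbf{q} = (1,1,1,1)$, respectively. The expressions of $\mathbf{u}^{{\rm zd},0000}$ and $\mathbf{u}^{{\rm zd},1111}$ are given in Appendix B. By substituting $\mathbf{u}^{{\rm zd},0000}$ and $\mathbf{u}^{{\rm zd},1111}$ in Eqs. (39) and (40), we obtain the four necessary conditions, Eqs. (91), (92), (93), and (94), given in Appendix B. If we assume $\kappa - S + \chi(T-\kappa) \neq 0$, we can rewrite Eq. (92) as

$$p_{DD} = \frac{(1-\delta)p_0[(\chi-1)P + S - \chi T] + (1-\delta p_{CD})(\chi-1)(\kappa-P)}{\delta[\kappa - S + \chi(T-\kappa)]}. \tag{41}$$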

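If we assume $T - \kappa + \chi(\kappa - S) \neq 0$, we can rewrite Eq. (93) as

$$p_{CC} = \frac{-(1-\delta)p_0[(\chi-1)R + T - \chi S] + T - \chi S + (1+\delta p_{DC})(\chi-1)\kappa - \delta p_{DC}(\chi-1)R}{\delta[T - \kappa + \chi(\kappa - S)]}. \tag{42}$$

We will deal with the case $\kappa - S + \chi(T-\kappa) = 0$ or $T - \kappa + \chi(\kappa - S) = 0$ later in this section. By substituting Eqs. (41) and (42) in Eq. (91), we obtain an equation containing $p_{CD}$, $p_{DC}$, $p_0$, $\kappa$, and $\chi$ as unknowns. This equation can be factorized. By equating each of the two factors with 0, we obtain two types of solutions. The first type of solution is given by

$$\mathbf{p} = \begin{pmatrix} \dfrac{(1-\delta)p_0[(\chi-1)R + S - \chi T] - (1-\delta p_{CD})(\chi-1)R + \chi T - S - \delta p_{CD}(\chi-1)\kappa}{\delta[\kappa - S + \chi(T-\kappa)]} \\ p_{CD} \\ \dfrac{-(1-\delta)p_0(\chi+1)(T-S) + (1-\delta p_{CD})[(\chi-1)\kappa + T - \chi S]}{\delta[\kappa - S + \chi(T-\kappa)]} \\ \dfrac{(1-\delta)p_0[(\chi-1)P + S - \chi T] + (1-\delta p_{CD})(\chi-1)(\kappa-P)}{\delta[\kappa - S + \chi(T-\kappa)]} \end{pmatrix}. \tag{43}$$

Equation (43) also satisfies Eq. (94). To verify that Eq. (43) is sufficient, we substitute Eq. (43) in Eq. (36) to obtain

$$\mathbf{u}^{\rm zd} = \frac{(1-\delta)[S + (\chi-1)\kappa - \chi T]}{1 - \delta p_{CD} - (1-\delta)p_0}\begin{pmatrix} 1-p_0 \\ 1-p_0 \\ -p_0 \\ -p_0 \end{pmatrix}, \tag{44}$$

which does not contain $\mathbf{q}$. Using Eqs. (7) and (44), we verify Eq. (37). Therefore, Eq. (43) is a set of strategies that impose the linear relationship between the payoffs of the two players, i.e., Eq. (16).

The strategies given by Eq. (43) are ZD strategies for $\delta < 1$ as defined in Ref. [32], which is verified as follows. Assume that $\alpha \neq 0$ in Eq. (3) of [32] because $\alpha = 0$ corresponds to the equalizer. Then, let us substitute $\alpha = \phi$, $\beta = -\phi\chi$, and $\gamma = \phi(\chi-1)\kappa$ in Eq. (3) of [32] without loss of generality. Note that this transformation is a bijection because (i) $\phi > 0$ and (ii) either $\chi > 1$ or $\chi < 0$ is required (in the notation of Ref. [32], $\phi > 0$ and $\chi < 1$ because their $\chi$ is defined as the reciprocal of our $\chi$). Then, we obtain

$$\mathbf{p} = \frac{1}{\delta}\begin{pmatrix} 1 - \phi(\chi-1)(R-\kappa) - (1-\delta)p_0 \\ 1 + \phi[(\chi-1)\kappa - \chi T + S] - (1-\delta)p_0 \\ \phi[(\chi-1)\kappa + T - \chi S] - (1-\delta)p_0 \\ \phi(\chi-1)(\kappa-P) - (1-\delta)p_0 \end{pmatrix}, \tag{45}$$

which is equivalent to Eq. (33) of [32]. Equation (45) combined with

$$\phi = \frac{1 - \delta p_{CD} - p_0(1-\delta)}{\kappa - S + \chi(T-\kappa)} \tag{46}$$

is equivalent to Eq. (43). It should also be noted that Eq. (45) extends Eq. (9) of [6], which has been obtained for $\delta = 1$, to general $\delta$, $R$, and $P$ values.

The other type of solution that we obtain by substituting Eqs. (41) and (42) in Eq.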
(91) is given by

$$p_0[T - R + \chi(R-S)] = T - \kappa + \chi(\kappa - S). \tag{47}$$

Substitution of Eqs. (41) and (42) in Eq. (94) yields either Eq. (43) or

$$p_0[P - S + \chi(T-P)] = (\chi-1)(\kappa-P). \tag{48}$$

The combination of Eqs. (47) and (48) is equivalent to that of

$$\kappa = p_0^2 R + p_0(1-p_0)(T+S) + (1-p_0)^2 P \tag{49}$$

and

$$\chi = -\frac{(1-p_0)(T-P) + p_0(R-S)}{(1-p_0)(P-S) + p_0(T-R)}. \tag{50}$$

However, Eqs. (41), (42), (49), and (50) do not provide a sufficient condition for Eq. (37) to hold true for arbitrary $\mathbf{q}$ and $q_0$. Therefore, we additionally consider the vector $\mathbf{u}^{\rm zd}$ when $\mathbf{q} = (1,0,0,0)$ and $\mathbf{q} = (0,0,0,1)$, which we denote by $\mathbf{u}^{{\rm zd},1000}$ and $\mathbf{u}^{{\rm zd},0001}$, respectively. The calculations shown in Appendix C lead to

$$p_0 = p_{CC} = p_{CD} = p_{DC} = p_{DD} \quad (0 \le p_0 \le 1). \tag{51}$$

To verify that the unconditional strategies given by Eq. (51) are a sufficient condition for Eq. (16) to hold true for arbitrary $\mathbf{q}$ and $q_0$, we substitute Eqs. (49), (50), and (51) in Eq. (36) to obtain

$$\mathbf{u}^{\rm zd} = \frac{(1-\delta)(T-S)}{(1-p_0)(P-S) + p_0(T-R)}\begin{pmatrix} (1-p_0)[-(1-p_0)P + (1+p_0)R - p_0(T+S)] \\ (1-p_0)[-(2-p_0)P + T + S + p_0(R-S-T)] \\ -p_0[-(1-p_0)P + (1+p_0)R - p_0(T+S)] \\ -p_0[-(2-p_0)P + T + S + p_0(R-S-T)] \end{pmatrix}, \tag{52}$$

which does not contain $\mathbf{q}$. Using Eqs. (7) and (52), we verify Eq. (37).

The unconditional strategy given by Eq. (51) is not a ZD strategy in the sense of [32] unless $R + P = T + S$ (Appendix D), which is the same condition as that for the infinitely repeated game [5]. The obtained solution, i.e., Eq. (51) combined with Eqs. (49) and (50), is equivalent to the previously derived solution for $\delta = 1$ [5]. This set of solutions contains the unconditional cooperator and unconditional defector as special cases, and always realizes $\chi < 0$ (Eq. (50)).

When $\kappa - S + \chi(T-\kappa) = 0$ or $T - \kappa + \chi(\kappa - S) = 0$, the calculations shown in Appendices E and F reveal the following three types of solutions: (i) a subset of the ZD strategies given by Eq. (43) (Appendix F.2), (ii) a subset of the strategies given by Eq. (51) (Appendices E.1, E.2, and F.2), and (iii) the set of strategies given by

$$\mathbf{p} = \left(p_{CC},\; 1,\; \frac{\delta p_{CC}(\chi+1)(\kappa-T) - \delta[R - (\chi+1)T + \chi\kappa] - (\kappa-R)}{\delta(\kappa-R)},\; \frac{\delta p_{CC}(\kappa-P) - \delta(R-P) - (\kappa-R)}{\delta(\kappa-R)}\right), \tag{53}$$

where $0 \le p_{CC} \le 1$ and $\kappa \neq R$ (Appendix E.2). Although Eq. (53) is a sufficient condition and the resulting solutions are distinct from those given by Eq. (43), in fact Eq. (53) yields $\chi < 0$ (Appendix E.2).
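The linear relationship enforced by an unconditional strategy, Eqs. (49) to (51), can be illustrated with a short numerical check. This is a sketch under stated assumptions, not the calculation of Appendix C: payoffs are obtained by truncating the series of Eq. (9), and the names `payoffs` and `unconditional_line` as well as the tested co-player strategies are hypothetical.

```python
def payoffs(p, p0, q, q0, delta, R, S, T, P, rounds=3000):
    """(pi_X, pi_Y) from the series of Eq. (9), truncated after `rounds`."""
    q_seen = (q[0], q[2], q[1], q[3])  # Y sees CD and DC with roles swapped
    M = [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
         for a, b in zip(p, q_seen)]
    v = [p0 * q0, p0 * (1 - q0), (1 - p0) * q0, (1 - p0) * (1 - q0)]
    pi_x = pi_y = 0.0
    weight = 1.0 - delta
    for _ in range(rounds):
        pi_x += weight * (v[0] * R + v[1] * S + v[2] * T + v[3] * P)
        pi_y += weight * (v[0] * R + v[1] * T + v[2] * S + v[3] * P)
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
        weight *= delta
    return pi_x, pi_y


def unconditional_line(r, R, S, T, P):
    """kappa and chi of Eqs. (49) and (50) for cooperation probability r = p0."""
    kappa = r * r * R + r * (1 - r) * (T + S) + (1 - r) ** 2 * P
    chi = -((1 - r) * (T - P) + r * (R - S)) / ((1 - r) * (P - S) + r * (T - R))
    return kappa, chi


if __name__ == "__main__":
    R, S, T, P, delta, r = 3, 0, 5, 1, 0.8, 0.5
    kappa, chi = unconditional_line(r, R, S, T, P)
    for q, q0 in [((1, 1, 1, 1), 1), ((1, 0, 1, 0), 1), ((0.3, 0.9, 0.1, 0.6), 0.2)]:
        pi_x, pi_y = payoffs((r, r, r, r), r, q, q0, delta, R, S, T, P)
        # Residual of Eq. (16); it should vanish up to rounding error.
        print(q, (pi_x - kappa) - chi * (pi_y - kappa))
```

For these values, $\kappa = 2.25$ and $\chi = -7/3$, which is negative as stated after Eq. (52).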
To summarize, the set of X's strategies that enforce Eq. (16) is the union of the strategies given by the ZD strategies, Eq. (43), and the non-ZD unconditional strategies, Eq. (51). In the next sections, we examine two special cases, which have been studied in the literature, and derive $\delta_c$ in each case.

3.3 Extortioner

3.3.1 Expression

The extortioner is defined as a strategy that enforces an extortionate share of payoffs larger than $P$ [9]. We obtain the extortioner by setting $\kappa = P$ in Eq. (16). By setting $\kappa = P$ in Eq. (43), we obtain

$$\mathbf{p} = \begin{pmatrix} \dfrac{(1-\delta)p_0[(\chi-1)R + S - \chi T] - (1-\delta p_{CD})(\chi-1)R + \chi T - S - \delta p_{CD}(\chi-1)P}{\delta[P - S + \chi(T-P)]} \\ p_{CD} \\ \dfrac{-(1-\delta)p_0(\chi+1)(T-S) + (1-\delta p_{CD})[(\chi-1)P + T - \chi S]}{\delta[P - S + \chi(T-P)]} \\ -\dfrac{(1-\delta)p_0}{\delta} \end{pmatrix}. \tag{54}$$

Because $p_{DD} = -(1-\delta)p_0/\delta \ge 0$ and $\delta < 1$, we obtain $p_0 = 0$ and $p_{DD} = 0$, which is consistent with the previously obtained result [32]. Therefore, the extortioner is never the first to cooperate and hence a so-called

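cautious strategy [32]. By setting $p_0 = 0$ in Eq. (54), we obtain

$$\mathbf{p} = \begin{pmatrix} \dfrac{-\delta p_{CD}(\chi-1)P - (1-\delta p_{CD})(\chi-1)R - S + \chi T}{\delta[P - S + \chi(T-P)]} \\ p_{CD} \\ \dfrac{(1-\delta p_{CD})[(\chi-1)P + T - \chi S]}{\delta[P - S + \chi(T-P)]} \\ 0 \end{pmatrix}. \tag{55}$$

3.3.2 Minimum discount rate

By setting $\kappa = P$ and $p_0 = 0$ in Eq. (45), we obtain

$$\mathbf{p} = \frac{1}{\delta}\begin{pmatrix} 1 - \phi(\chi-1)(R-P) \\ 1 + \phi[(\chi-1)P - \chi T + S] \\ \phi[(\chi-1)P + T - \chi S] \\ 0 \end{pmatrix}. \tag{56}$$

Because $p_{CC} \le 1$ and $\delta < 1$, Eq. (56) implies that $\phi(\chi-1) > 0$ must hold true. We consider the case $\phi > 0$ and $\chi > 1$ in this section. We can exclude the case $\phi < 0$ and $\chi < 1$ because a strategy with $\chi < 0$ is not considered as an extortionate strategy [9,4 6,22,24,27,28,32,35] and $\chi < 1$ implies $\chi < 0$ (Appendix G.1). When $\phi > 0$, the application of $0 \le p_{CC}, p_{CD}, p_{DC} \le 1$ to Eq. (56) yields

$$\frac{1-\delta}{(\chi-1)(R-P)} \le \phi \le \frac{1}{(\chi-1)(R-P)}, \tag{57}$$

$$\frac{1-\delta}{P - S + \chi(T-P)} \le \phi \le \frac{1}{P - S + \chi(T-P)}, \tag{58}$$

$$0 \le \phi \le \frac{\delta}{T - P + \chi(P-S)}. \tag{59}$$

The condition under which a positive $\phi$ value that satisfies Eqs. (57), (58), and (59) exists is that each lower bound does not exceed each upper bound, i.e.,

$$\frac{1-\delta}{P - S + \chi(T-P)} \le \frac{1}{(\chi-1)(R-P)}, \tag{60}$$

$$\frac{1-\delta}{(\chi-1)(R-P)} \le \frac{1}{P - S + \chi(T-P)}, \tag{61}$$

$$\frac{1-\delta}{P - S + \chi(T-P)} \le \frac{\delta}{T - P + \chi(P-S)}, \tag{62}$$

$$\frac{1-\delta}{(\chi-1)(R-P)} \le \frac{\delta}{T - P + \chi(P-S)}. \tag{63}$$

Equation (60) is always satisfied. Equations (61), (62), and (63) yield

$$\chi[\delta(T-P) - (T-R)] \ge R - S - \delta(P-S), \tag{64}$$

$$\chi[\delta(T-S) - (P-S)] \ge T - P - \delta(T-S), \tag{65}$$

$$\chi[\delta(R-S) - (P-S)] \ge T - P - \delta(T-R), \tag{66}$$

respectively. The coefficient of $\chi$ on the left-hand side of Eq. (65) is always larger than that of Eq. (66), and the right-hand side of Eq. (65) is always smaller than that of Eq. (66). Therefore, Eq. (65) is satisfied if Eq. (66) is satisfied. The right-hand sides of Eqs. (64) and (66) are positive. Therefore, $\delta(T-P) - (T-R) > 0$ and $\delta(R-S) - (P-S) > 0$ are required for $\chi$ to be positive. On the other hand, if $\delta(T-P) - (T-R) > 0$ and $\delta(R-S) - (P-S) > 0$, Eqs. (64) and (66) guarantee that $\chi > 1$ and that a $\chi$ ($> 1$) value exists. Therefore, an extortioner with $\chi > 1$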

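exists if and only if $\delta > \delta_c$, where the $\delta_c$ value coincides with that for the equalizer; it is given by Eq. (32). Under $\delta > \delta_c$, Eqs. (64) and (66) imply

$$\chi \ge \chi_c(\delta) \equiv \max\left(\frac{R - S - \delta(P-S)}{\delta(T-P) - (T-R)},\; \frac{T - P - \delta(T-R)}{\delta(R-S) - (P-S)}\right). \tag{67}$$

Equation (67) gives the range of $\chi$ values for which the extortioner strategy exists. The conditions for the existence of an extortionate strategy are easier to satisfy for large $\delta$ in the sense that $\chi_c(\delta)$ monotonically decreases as $\delta$ increases. In particular, we obtain $\lim_{\delta \to \delta_c + 0}\chi_c(\delta) = \infty$ and $\lim_{\delta \to 1}\chi_c(\delta) = 1$.

For a given $\chi$ value, the substitution of $R = b - c$, $T = b$, $S = -c$, and $P = 0$ in Eq. (67) yields

$$\delta_c = \frac{\chi c + b}{\chi b + c}, \tag{68}$$

which is consistent with Eq. (7) of [27].

3.4 Generous strategy

3.4.1 Expression

The generous strategy, also called compliers, is defined as a strategy that yields a larger shortfall from the mutual cooperation payoff $R$ for the player as compared to that for the co-player [,5,35]. We obtain the generous strategy by setting $\kappa = R$ in Eq. (16). By setting $\kappa = R$ in Eq. (43), we obtain

$$\mathbf{p} = \begin{pmatrix} \dfrac{1 - p_0(1-\delta)}{\delta} \\ p_{CD} \\ \dfrac{-(1-\delta)p_0(\chi+1)(T-S) + (1-\delta p_{CD})[(\chi-1)R + T - \chi S]}{\delta[R - S + \chi(T-R)]} \\ \dfrac{(1-\delta)p_0[(\chi-1)P + S - \chi T] + (1-\delta p_{CD})(\chi-1)(R-P)}{\delta[R - S + \chi(T-R)]} \end{pmatrix}. \tag{69}$$

Because $p_{CC} = [1 - (1-\delta)p_0]/\delta \le 1$, we obtain $p_0 = 1$ and $p_{CC} = 1$, which is consistent with the previously obtained result [32]. Therefore, the generous strategy is never the first to defect and hence a so-called nice strategy [5,32]. By setting $p_0 = 1$ in Eq. (69), we obtain

$$\mathbf{p} = \begin{pmatrix} 1 \\ p_{CD} \\ \dfrac{-(1-\delta)(\chi+1)(T-S) + (1-\delta p_{CD})[(\chi-1)R + T - \chi S]}{\delta[R - S + \chi(T-R)]} \\ \dfrac{(1-\delta)[(\chi-1)P + S - \chi T] + (1-\delta p_{CD})(\chi-1)(R-P)}{\delta[R - S + \chi(T-R)]} \end{pmatrix} = \begin{pmatrix} 1 \\ p_{CD} \\ -\dfrac{1-\delta}{\delta} + \dfrac{(1-p_{CD})[(\chi-1)R + T - \chi S]}{R - S + \chi(T-R)} \\ -\dfrac{1-\delta}{\delta} + \dfrac{(1-p_{CD})(\chi-1)(R-P)}{R - S + \chi(T-R)} \end{pmatrix}. \tag{70}$$

3.4.2 Minimum discount rate

By applying $0 \le p_{DC}, p_{DD} \le 1$ to Eq. (70), we obtain

$$\frac{1-\delta}{\delta} \le (1-p_{CD})g_1 \le \frac{1}{\delta}, \tag{71}$$

$$\frac{1-\delta}{\delta} \le (1-p_{CD})g_2 \le \frac{1}{\delta}, \tag{72}$$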

where

$$g_1 \equiv \frac{(\chi-1)R + T - \chi S}{R - S + \chi(T-R)}, \tag{73}$$

$$g_2 \equiv \frac{(\chi-1)(R-P)}{R - S + \chi(T-R)}. \tag{74}$$

Figure 2: Region in the $g_1$-$g_2$ space where the generous strategy exists (shaded region). If $(g_1, g_2)$ is located in this region (e.g., filled circle labeled $p_{CD} = 0$), the square given by $(1-\delta)/\delta \le g_1, g_2 \le 1/\delta$ intersects the line segment connecting the assumed $(g_1, g_2)$ and the origin. Note that any point on the line segment is realized by the solution by a value of $p_{CD}$ (Eqs. (71) and (72)).

The necessary and sufficient condition for $0 \le p_{CD} \le 1$ that satisfies Eqs. (71) and (72) to exist is given by (Fig. 2)

$$g_1 \ge \frac{1-\delta}{\delta}, \tag{75}$$

$$g_2 \ge \frac{1-\delta}{\delta}, \tag{76}$$

$$\frac{g_2}{g_1} \ge 1-\delta. \tag{77}$$

In the remainder of this section, we assume $\chi \ge 0$, which a generous strategy requires [5,6,27,28,32,35], and examine the conditions given by Eqs. (75), (76), and (77). For mathematical interest, the analysis of the minimum discount rate for $\chi < 0$ is presented in Appendix G.2.

First, because $dg_1/d\chi > 0$, which one can derive using Eq. (3), and $g_1$ is continuous for $\chi \ge 0$, Eq. (75) is equivalent to

$$\chi \ge \frac{R - S - \delta(T-S)}{-(T-R) + \delta(T-S)} \tag{78}$$

and

$$\delta > \frac{T-R}{T-S}. \tag{79}$$

When $\delta \le (T-R)/(T-S)$, a positive $\chi$ value that satisfies Eq. (75) does not exist.

Second, because $dg_2/d\chi > 0$ and $g_2$ is continuous for $\chi \ge 0$, Eq. (76) is equivalent to

$$\chi \ge \frac{R - S - \delta(P-S)}{-(T-R) + \delta(T-P)} \tag{80}$$

and

$$\delta > \frac{T-R}{T-P}. \tag{81}$$

When $\delta \le (T-R)/(T-P)$, a positive $\chi$ value that satisfies Eq. (76) does not exist.

Third, because $d(g_2/g_1)/d\chi > 0$ and $g_2/g_1$ is continuous for $\chi \ge 0$, Eq. (77) is equivalent to

$$\chi \ge \frac{T - P - \delta(T-R)}{-(P-S) + \delta(R-S)} \tag{82}$$

and

$$\delta > \frac{P-S}{R-S}. \tag{83}$$

When $\delta \le (P-S)/(R-S)$, a positive $\chi$ value that satisfies Eq. (77) does not exist.

By combining Eqs. (79), (81), and (83), we find that a generous strategy exists if and only if $\delta > \delta_c$, where $\delta_c$ is given by Eq. (32). Therefore, the threshold $\delta$ value above which a ZD strategy exists is the same for the equalizer, extortioner, and generous strategy. It should be noted that $\delta = \delta_c$ is allowed for the equalizer, but not for the extortioner and the generous strategy. When $\delta > \delta_c$, Eq. (80) implies Eq. (78), and hence one obtains

$$\chi \ge \chi_c(\delta) \equiv \max\left(\frac{R - S - \delta(P-S)}{-(T-R) + \delta(T-P)},\; \frac{T - P - \delta(T-R)}{-(P-S) + \delta(R-S)}\right). \tag{84}$$

Note that $\chi_c(\delta) > 1$ and $\chi_c(\delta)$ decreases as $\delta$ ($> \delta_c$) increases. Equation (84) implies that $\lim_{\delta \to \delta_c + 0}\chi_c(\delta) = \infty$ and $\lim_{\delta \to 1}\chi_c(\delta) = 1$, which are the same asymptotics as in the case of the extortioner.

4 Conclusions

We analyzed ZD strategies in finitely repeated prisoner's dilemma games with general payoff matrices. Apart from the derivation of convenient expressions for ZD strategies, the novel results derived in the present article are two-fold. First, we derived the threshold discount factor value, $\delta_c$, above which the ZD strategies exist for three commonly studied classes of ZD strategies, i.e., equalizer, extortioner, and generous strategies. They all share the same threshold value. Similar to the case of the condition for mutual cooperation in direct reciprocity, ZD strategies can exist only when there are sufficiently many rounds. Second, we showed that the memory-one strategies that impose a linear relationship between the payoffs of the two players are either ZD strategies (Eqs.
(43) and (53)) or an unconditional strategy (Eq. (51)). The latter class includes the unconditional cooperator and unconditional defector as special cases. Therefore, for finitely repeated prisoner's dilemma games (i.e., $\delta < 1$), we answered affirmatively the conjecture posed in [5]. With a continuity argument, our results also cover the infinite case, by consideration of the limit $\delta \to 1$. In other words, if the two payoffs are in a linear relationship for any $\delta = 1 - \epsilon$, where $\epsilon \ll 1$, then the payoffs are also on a line as $\epsilon$ goes to 0. For a similar argument, see Eqs. (5) and (6) in Ref. [32]. The present results also hold true when the co-player employs a longer-memory strategy, because it is straightforward to apply the proof for the infinite case [9] to the finite case.

Our analytical approach is different from the previous approaches. Press and Dyson's derivation is based on the linear algebra of matrices [9]. The proof in Ref. [4] considers certain telescoping sums. The approach considered in the present study is more elementary than theirs, i.e., to derive necessary conditions and show that they are sufficient by straightforward calculations.

We mention possible directions of future research. First, we conjecture that the $\delta_c$ value is the same for all ZD strategies because it takes the same value for the three common ZD strategies. Second, the explicit forms of our solutions (Eqs. (25) and (43)) may be useful for exploring features of ZD strategies in finitely repeated games. For example, investigation of evolutionary dynamics and extensions to multiplayer games, which have been examined for infinitely repeated games (see section 1 for references), in the case of finitely repeated games may benefit from the present results.
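As a small illustration of how the explicit forms might be used in such explorations, the sketch below constructs an extortioner from Eq. (56) and checks the linear relationship of Eq. (16) with $\kappa = P$ against a few co-player strategies. It is a hedged example, not part of the original analysis: payoffs are computed by truncating the series of Eq. (9), and the names `payoffs`, `chi_c`, and `extortioner` as well as the values $\chi = 3$ and $\phi = 0.06$ are hypothetical choices ($\phi$ was picked inside the range allowed by Eqs. (57) to (59) for the payoff values of Fig. 1).

```python
def payoffs(p, p0, q, q0, delta, R, S, T, P, rounds=3000):
    """(pi_X, pi_Y) from the series of Eq. (9), truncated after `rounds`."""
    q_seen = (q[0], q[2], q[1], q[3])  # Y sees CD and DC with roles swapped
    M = [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
         for a, b in zip(p, q_seen)]
    v = [p0 * q0, p0 * (1 - q0), (1 - p0) * q0, (1 - p0) * (1 - q0)]
    pi_x = pi_y = 0.0
    weight = 1.0 - delta
    for _ in range(rounds):
        pi_x += weight * (v[0] * R + v[1] * S + v[2] * T + v[3] * P)
        pi_y += weight * (v[0] * R + v[1] * T + v[2] * S + v[3] * P)
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
        weight *= delta
    return pi_x, pi_y


def chi_c(delta, R, S, T, P):
    """Minimum extortion factor of Eq. (67); meaningful only for delta > delta_c."""
    return max((R - S - delta * (P - S)) / (delta * (T - P) - (T - R)),
               (T - P - delta * (T - R)) / (delta * (R - S) - (P - S)))


def extortioner(chi, phi, delta, R, S, T, P):
    """Strategy of Eq. (56) (with p0 = 0); None if an entry leaves [0, 1]."""
    p = ((1 - phi * (chi - 1) * (R - P)) / delta,
         (1 + phi * ((chi - 1) * P - chi * T + S)) / delta,
         phi * ((chi - 1) * P + T - chi * S) / delta,
         0.0)
    return p if all(0 <= x <= 1 for x in p) else None


if __name__ == "__main__":
    R, S, T, P, delta, chi, phi = 3, 0, 5, 1, 0.8, 3.0, 0.06
    print("chi_c =", chi_c(delta, R, S, T, P))
    p = extortioner(chi, phi, delta, R, S, T, P)
    for q, q0 in [((1, 1, 1, 1), 1), ((1, 0, 1, 0), 1), ((0.4, 0.8, 0.2, 0.5), 0.7)]:
        pi_x, pi_y = payoffs(p, 0.0, q, q0, delta, R, S, T, P)
        # Residual of pi_X - P = chi * (pi_Y - P); should vanish up to rounding.
        print(q, (pi_x - P) - chi * (pi_y - P))
```

The same pattern could be adapted to the generous strategy by using $\kappa = R$ and $p_0 = 1$ in Eq. (45).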

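Appendix A Expression of $\mathbf{u}^{{\rm eq},0000}$, $\pi_{Y,0000}$, $\mathbf{u}^{{\rm eq},1111}$, and $\pi_{Y,1111}$

By substituting $\mathbf{q} = (0,0,0,0)$ in Eq. (10) and then substituting the obtained $M$ in Eq. (18), we obtain

$$\mathbf{u}^{{\rm eq},0000} = \frac{1}{(1-\delta)(1-\delta p_{CD}+\delta p_{DD})}\begin{pmatrix} (1-\delta)(1-\delta p_{CD}+\delta p_{DD})R + \delta(1-p_{CC}+\delta p_{CC}-\delta p_{CD})P + \delta(p_{CC}-\delta p_{CC}+\delta p_{DD})T \\ (\delta-\delta p_{CD})P + (1-\delta+\delta p_{DD})T \\ (1-\delta)(1-\delta p_{CD}+\delta p_{DD})S + \delta(1-p_{DC}-\delta p_{CD}+\delta p_{DC})P + \delta(p_{DC}-\delta p_{DC}+\delta p_{DD})T \\ (1-\delta p_{CD})P + \delta p_{DD}T \end{pmatrix}, \tag{85}$$

which leads to

$$\pi_{Y,0000} = (1-\delta)\mathbf{v}(0)\mathbf{u}^{{\rm eq},0000}. \tag{86}$$

Similarly, by substituting $\mathbf{q} = (1,1,1,1)$ in Eq. (10) and then substituting the obtained $M$ in Eq. (18), we obtain

$$\mathbf{u}^{{\rm eq},1111} = \frac{1}{(1-\delta)(1-\delta p_{CC}+\delta p_{DC})}\begin{pmatrix} (\delta-\delta p_{CC})S + (1-\delta+\delta p_{DC})R \\ (1-\delta)(1-\delta p_{CC}+\delta p_{DC})T + \delta(1-p_{CD}-\delta p_{CC}+\delta p_{CD})S + \delta(p_{CD}-\delta p_{CD}+\delta p_{DC})R \\ (1-\delta p_{CC})S + \delta p_{DC}R \\ (1-\delta)(1-\delta p_{CC}+\delta p_{DC})P + \delta(1-p_{DD}-\delta p_{CC}+\delta p_{DD})S + \delta(p_{DD}+\delta p_{DC}-\delta p_{DD})R \end{pmatrix}, \tag{87}$$

which leads to

$$\pi_{Y,1111} = (1-\delta)\mathbf{v}(0)\mathbf{u}^{{\rm eq},1111}. \tag{88}$$

Appendix B Expression of $\mathbf{u}^{{\rm zd},0000}$ and $\mathbf{u}^{{\rm zd},1111}$, and four necessary conditions in section 3.2

By substituting $\mathbf{q} = (0,0,0,0)$ in Eq. (10) and then substituting the obtained $M$ in Eq. (36), we obtain

$$\mathbf{u}^{{\rm zd},0000} = \begin{pmatrix} \dfrac{-\delta(1-p_{CC}+\delta p_{CC}-\delta p_{CD})(\chi-1)P + \delta(p_{CC}-\delta p_{CC}+\delta p_{DD})(S-\chi T)}{1-\delta p_{CD}+\delta p_{DD}} + (\chi-1)\kappa - (1-\delta)(\chi-1)R \\ \dfrac{-\delta(1-p_{CD})(\chi-1)P + (1-\delta+\delta p_{DD})(S-\chi T)}{1-\delta p_{CD}+\delta p_{DD}} + (\chi-1)\kappa \\ \dfrac{-\delta(1-p_{DC}-\delta p_{CD}+\delta p_{DC})(\chi-1)P + \delta(p_{DC}-\delta p_{DC}+\delta p_{DD})(S-\chi T)}{1-\delta p_{CD}+\delta p_{DD}} + (\chi-1)\kappa + (1-\delta)(T-\chi S) \\ \dfrac{-(1-\delta p_{CD})(\chi-1)P + \delta p_{DD}(S-\chi T)}{1-\delta p_{CD}+\delta p_{DD}} + (\chi-1)\kappa \end{pmatrix}. \tag{89}$$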

14 By substituting q = (,,,) in Eq. (0) and then substituting the obtained M in Eq. (36), e obtain ( p CC )(T χs) ( +p DC )(χ )R +(χ )κ p CC +p DC ( p CD p CC +p CD )(T χs) (p CD p CD +p DC )(χ )R +(χ )κ+( )(S χt) p CC +p DC u zd, =. ( p CC )(T χs) p DC (χ )R +(χ )κ p CC +p DC ( p DD p CC +p DD )(T χs) (p DD +p DC p DD )(χ )R +(χ )κ ( )(χ )P p CC +p DC (90) Note that the denominator on the right-hand side of Eqs. (89) and (90) is positive. By substituting Eq. (89) in Eq. (39), e obtain ( )p 0 {( p CD +p DD )[ (χ )R T +χs]+(p CC p DC )[(χ )P +S χt]} +( p CD +p DD )[(χ )κ+( )(T χs)]+[ ( p DC p CD +p DC )(χ )P By substituting Eq. (89) in Eq. (40), e obtain +(p DC p DC +p DD )(S χt)] = 0. (9) ( )p 0 [(χ )P +S χt] ( p CD )(χ )P +p DD (S χt) By substituting Eq. (90) in Eq. (39), e obtain +( p CD +p DD )(χ )κ = 0. (92) ( )p 0 [ (χ )R T +χs] +( p CC )(T χs) p DC (χ )R+( p CC +p DC )(χ )κ = 0. (93) By substituting Eq. (90) in Eq. (40), e obtain ( )p 0 {( p CC +p DC )[(χ )P +S χt]+(p CD p DD )[ (χ )R T +χs]} +( p CC +p DC )(χ )[κ ( )P]+[( p DD p CC +p DD )(T χs) Appendix C Derivation of Eq. (5) (p DD +p DC p DD )(χ )R] = 0. (94) In this section, e derive Eq. (5) from Eqs. (49) and (50). We obtain ( p CC )( p CD +p DD ) {{ ( )( p CD +p DD )R 2 ( p CC )[ ( )p DC p CD ]P } (χ ) +( p CC ){( )( p CD +p DD )(T χs)+[( )p DC +p DD ](S χt)}}+(χ )κ ( p u zd,000 CD )(χ )P +( +p DD )(S χt) = +(χ )κ p CD +p DD. ( p DC p CD +p DC )(χ )P +(p DC p DC +p DD )(S χt) +(χ )κ+( )(T χs) p CD +p DD ( p CD )(χ )P +p DD (S χt) +(χ )κ p CD +p DD (95) 4

15 Note that the denominator on the right-hand side of Eq. (95) is positive. By substituting Eq. (95) in Eq. (40), e obtain Eq. (92). By substituting Eq. (95) in Eq. (39), e obtain p 0 ( ){{ ( p CD +p DD )R+[ ( )p DC p CD ]P}(χ ) ( )( p CD +p DD )(T χs) [( )p DC +p DD ](S χt)} +( p CC ){ [ ( )p DC p CD ](χ )P +[( )p DC +p DD ](S χt)} +( p CC )( p CD +p DD )(χ )κ+( p CC )( p CD +p DD )( )(T χs) = 0. (96) Substitution of Eqs. (4) and (42) in Eq. (96) yields either the third entry of Eq. (43) or (p 0 p DC )(κ R)( )(χ )[(χ )P +S χt] [T κ+χ(κ S)][κ S +χ(t κ)] = 0. (97) The case in hich the denominator on the right-hand side of Eq. (97) is equal to 0 is covered in Appendices E and F. We note that χ because χ = substituted in Eq. (50) yields T = S, hich contradicts Eq. (2). By combining this observation ith 0 < <, e obtain (p 0 p DC )(κ R)[(χ )P +S χt] = 0. (98) By substituting Eqs. (49) and (50) in Eq. (98), e obtain the folloing four possible cases: p 0 = p DC, p 0 =, p 0 = (R P)/(T +S R P), and p 0 = (T +S 2P)/(T +S R P). First, assume that p 0 = p DC. By substituting p 0 = p DC and Eq. (47) in Eq. (93), e obtain (p CC p DC )[T κ+χ(κ S)] = 0. Because e have excluded the case T κ+χ(κ S) = 0, hich e deal ith in Appendix E, e obtain p CC = p DC. Therefore, e obtain p 0 = p CC = p DC. (99) Second, assume that p 0 =. Substitution of p 0 = in Eq. (49) yields κ = R. Substitution of p 0 = and κ = R in Eq. (42) yields p CC =. Substitution of p 0 = in Eq. (50) yields χ = (R S)/(T R). Substitution of p 0 =, χ = (R S)/(T R), and κ = R in Eq. (92) yields ( p CD )(T S)(R P) = 0, hich implies p CD =. Therefore, p 0 = combined ith Eqs. (49) and (50) results in p 0 = p CC = p CD =. (00) Third, e note that R P p 0 (0) T +S R P because combination of p 0 = (R P)/(T + S R P) and 0 p 0 leads to T + S R P > 0 and 2R T +S, and the latter inequality contradicts Eq. (3). Fourth, assumethat p 0 = (T+S 2P)/(T+S R P). By substituting p 0 = (T+S 2P)/(T+S R P)in Eqs. 
(49)and (50), e obtainχ = (P S)/(T P) and κ = P, respectively. Then, e obtainκ S+χ(T κ) = 0, hich e have decided to deal ith later. To summarize, Eq. (98) leads to either Eq. (99) or (00). We obtain u zd,000 =(χ )κ + (+)( p CD )+ 2 [p DC ( p DD )+p CC p DD )] {[ +pcd + 2 ( p DC )( p DD ) 3 (p CD p DC )( p DD ) ] R +[ +( )p CC +p CD ]P}(χ )+{( p DD )[( p CD ) ( )p CC ](T χs) +[p CC 2 ( p DD )(p CC p DC )](S χt) } ( p CD )[ (P +p DD R)(χ )+( p DD )(T χs)] +[ 2 ( p DC p CC p DD +p DC p DD )](S χt). [ +p CD +( )p DC ](P +p DD R)(χ ) + { 2 p DD [ ( )p CC ] p CD ( 2 p DD ) } (T χs)+[p DC + 2 (p CC p DC )p DD ](S χt) [( p CD )P +p DD ( p CD )R](χ )+{( p DD )( p CD )(T χs) +(p DC +p CC p DD p DC p DD )(S χt)} (02) 5

16 Note that the denominator on the right-hand side of Eq. (02) is positive. By substituting Eq. (02) in Eq. (40), e obtain ( )p 0 {(P +p DD R)(χ ) ( p DD )(T χs)+(+)(s χt)} [( p CD )P +p DD ( p CD )R](χ )+{( p DD )( p CD )(T χs)+[p DC +p DD (p CC p DC )](S χt)} + { +( p CD ) 2 [p CD p DC p DD (p CC p DC )] } (χ )κ = 0. (03) Substitution of Eqs. (4) and (42) in Eq. (03) yields either the third entry of Eq. (43) or [T κ+χ(κ S)][κ S +χ(t κ)] { (χ ) 2 κ 2 [ ( )p 0 p CD ](χ )[T R+χ(R S)]P +{(T χs)+( )p 0 [T R+χ(R S)]}(S χt) { ( p CD )(χ )R [+( p CD χ)]t +[χ ( χ+p CD χ)]s}(χ )κ} = 0. (04) We examine the case in hich the denominator on the right-hand side of Eq. (04) is zero in Appendices E and F. Therefore, e ignore the denominator and substitute Eqs. (49) and (50) in Eq. (04) to obtain p 0 = p CD, p 0 = 0, p 0 = (R P)/(T +S R P), or p 0 = (T +S 2P)/(T +S R P). Among these four possible options, e have excluded p 0 = (R P)/(T +S R P) and p 0 = (T +S 2P)/(T +S R P) in the course of the analysis of u zd,000. First, assume that p 0 = p CD. By substituting p 0 = p CD and Eq. (48) in Eq. (92), e obtain (p CD p DD )[κ S +χ(t κ)] = 0. Because e have excluded the case κ S +χ(t κ) = 0, hich e deal ith in Appendix E, e obtain p DD = p CD. Therefore, e obtain p 0 = p CD = p DD. (05) Second, assume that p 0 = 0. Substitution of p 0 = 0 in Eq. (49) yields κ = P. Substitution of p 0 = 0 and κ = P in Eq. (4) yields p DD = 0. Substitution of p 0 = 0 in Eq. (50) yields χ = (T P)/(P S). Substitution of p 0 = 0, χ = (T P)/(P S), and κ = P in Eq. (93) yields p DC (R P) = 0, hich implies p DC = 0. Therefore, p 0 = 0 combined ith Eqs. (49) and (50) results in p 0 = p DC = p DD = 0. (06) A solution must simultaneously satisfy either Eq. (99) or (00), and either Eq. (05) or (06). The combination of Eqs. (99) and (05) provides the set of unconditional strategies, i.e., Eq. (5). The combination of Eqs. (99) and (06) provides a subset of the strategies given by Eq. (5). The combination of Eqs. 
(100) and (105) also provides a subset of the strategies given by Eq. (51). Equations (100) and (106) are inconsistent with each other. Therefore, the set of solutions is given by Eq. (51).

Appendix D An unconditional strategy is not a ZD strategy unless R + P = T + S

In this section, we show that the unconditional strategy given by Eq. (51) is not a ZD strategy in the sense of [32] if $R+P \neq T+S$. By substituting $p_{DC} = p_{DD}$ in Eq. (45), we obtain

$$\frac{\phi[(\chi-1)\kappa + T - \chi S] - (1-\delta)p_0}{\delta} = \frac{\phi(\chi-1)(\kappa-P) - (1-\delta)p_0}{\delta}, \tag{107}$$

which leads to $\phi[(\chi-1)\kappa + T - \chi S] = \phi(\chi-1)(\kappa-P)$.

If $\phi = 0$, we substitute $\phi = 0$ in the expression of $p_{DD}$ in Eq. (45) to obtain $p_{DD} = -(1-\delta)p_0/\delta$. This equation holds true if and only if $p_0 = p_{DD} = 0$. Next, we substitute $\phi = 0$ in the expression of $p_{CC}$ in Eq. (45) to obtain $p_{CC} = [1-(1-\delta)p_0]/\delta$. This equation holds true if and only if $p_0 = p_{CC} = 1$, which contradicts $p_0 = 0$. Therefore, we obtain $\phi \neq 0$. Given $\phi \neq 0$, Eq. (107) implies

$$\chi = -\frac{T-P}{P-S}. \tag{108}$$

By setting $p_{CC} = p_{CD}$ in Eq. (45) and using $\phi \neq 0$, we obtain

$$\chi = -\frac{R-S}{T-R}. \tag{109}$$

By combining Eqs. (108) and (109), we obtain

$$R+P = T+S. \tag{110}$$

Equation (110) is a sufficient condition for the unconditional strategy to be a ZD strategy because substitution of Eqs. (49), (50), (110), and

$$\phi = -\frac{T-R}{(T-S)(R-P)} \tag{111}$$

in Eq. (45) yields Eq. (51).

Appendix E Case $\kappa - S + \chi(T-\kappa) = 0$

In this section, we assume

$$\kappa - S + \chi(T-\kappa) = 0 \tag{112}$$

and derive the set of strategies that satisfy Eq. (6). By substituting Eq. (112) in Eq. (92), we obtain

$$(\chi-1)(\kappa-P)[(1-\delta)p_0 + \delta p_{CD} - 1] = 0. \tag{113}$$

Equation (112) does not allow $\chi = 1$ because substitution of $\chi = 1$ in Eq. (112) yields $T = S$, which contradicts Eq. (2). Substitution of $\kappa = P$ in Eq. (112) yields $\chi = -(P-S)/(T-P)$. Alternatively, if we set $(1-\delta)p_0 + \delta p_{CD} - 1 = 0$, we obtain $p_0 = p_{CD} = 1$. Therefore, we consider the following two subcases, i.e., subcase (A) specified by

$$\kappa = P \tag{114}$$

and

$$\chi = -\frac{P-S}{T-P}, \tag{115}$$

and subcase (B) specified by

$$\kappa - S + \chi(T-\kappa) = 0 \tag{116}$$

and

$$p_0 = p_{CD} = 1. \tag{117}$$

E.1 Subcase (A): $\kappa = P$ and $\chi = -(P-S)/(T-P)$

By substituting Eqs. (114) and (115) in Eq. (91), we obtain

$$\frac{(1-\delta)[1-\delta(p_{CD}-p_{DD})][-p_0(T+S-R-P) + T+S-2P](T-S)}{T-P} = 0. \tag{118}$$

Because $T > P > S$, $0 < \delta < 1$, and there exists no pair of $p_{CD}$ and $p_{DD}$ ($0 \le p_{CD}, p_{DD} \le 1$) that satisfies $p_{CD} - p_{DD} = 1/\delta$, we obtain

$$p_0(T+S-R-P) = T+S-2P. \tag{119}$$

If we set $T+S-R-P = 0$, we obtain $T+S-2P = R-P > 0$, which contradicts Eq. (119). Therefore, Eq. (119) leads to $T+S-R-P \neq 0$, and hence

$$p_0 = \frac{T+S-2P}{T+S-R-P}. \tag{120}$$

If $T+S-R-P > 0$, the condition $p_0 \le 1$ applied to Eq. (120) yields $R \le P$, which contradicts Eq. (2). Therefore, we obtain $T+S-R-P < 0$ and hence $T+S-2P \le 0$.

By substituting Eqs. (114) and (115) in Eq. (96), we obtain

$$\frac{(1-\delta)[1-\delta(p_{CD}-p_{DD})]\{p_0[R-(1-\delta)(T+S)] + (1-\delta p_{CC})(T+S) - [2-(1-2\delta)p_0 - 2\delta p_{CC}]P\}(T-S)}{T-P} = 0. \tag{121}$$

18 Because (p CD p DD ) > 0, Eq. (2) implies p 0 [R ( )(T +S)]+( p CC )(T +S) [2 ( 2)p 0 2p CC ]P = 0. (22) Substitution of Eq. (20) in Eq. (22) yields [ p CC (T +S R P)+T +S 2P](T +S 2P) T +S R P = 0. (23) We ill deal ith the case T +S 2P = 0 later in this section. Therefore, by assuming T +S 2P < 0, e obtain T +S 2P p CC = T +S R P. (24) By substituting Eqs. (4) and (5) in Eq. (93), e obtain {( )p 0 (R S T)+p DC R p CC (T +S) [2 ( )p 0 2p CC +p DC ]P +T +S}(T S) = 0. T P (25) By substituting p 0 = p CC = (T +S 2P)/(T +S R P) in Eq. (25), e obtain hich leads to [ p DC (T +S R P)+T +S 2P](P R)(T S) (T P)(T +S R P) p DC = By substituting Eqs. (4) and (5) in Eq. (03), e obtain = 0, (26) T +S 2P T +S R P. (27) [ p CD ( )p 0 ][ p DD (T +S R P)+T +S 2P](T S) T P = 0. (28) If p CD ( )p 0 = 0, e obtain p 0 = p CD =, hich contradicts Eq. (20). Therefore, Eq. (28) implies p DD = T +S 2P T +S R P. (29) To derive another condition, e use the vector u hen player Y adopts the tit-for-tat strategy, i.e., q = (,0,,0). This vector, denoted by u zd,00, is given by u zd,00 =(χ )κ + ( p CC )( 2 p DC )+ 2 p CD p DC ( )+(+)p DD ( p CC )+ 3 p CD p DD { [ ( pdd ) 2 p DC ( p CD )+ 3 ( p CD )(p DC p DD )]R 2 ( p CC )( p DC )P }(χ )+( p CC )[ ( p DD )](T χs) + 2 ( p CC )[p DC (p DC p DD )](S χt) { 2 ( p DC )[ ( )p CD p CC ]P p CD [ ( p DD )]R } (χ ) +[ ( )p CD p CC ][ ( p DD )](T χs)+( p CC )[ ( p DD )](S χt). { ( p DC )( p CC )P p CD [( )p DC +p DD ]R}(χ ) +( p CC )[ ( p DD )](T χs)+[( )p DC +p DD ]( p CC )(S χt) { { 2 p DC [ p CD ( )] p CC ( 2 p DC )}P 2 p CD p DD R } (χ ) + 2 p DD [ p CD ( ) p CC ](T χs)+p DD ( p CC )(S χt) (30) 8

19 Note that the denominator on the right-hand side of Eq. (30) is positive. By substituting Eq. (30) in Eq. (40), e obtain ( )p 0 {{ pcd R+[+( p CC ) 2 (p CC p CD )]P } (χ )+[ ( )p CD p CC ](T χs) +( p CC )(S χt)}+ { { + 2 p DC [ ( )p CD ]+p CC ( 2 p DC )}P 2 p CD p DD R } (χ ) + 2 p DD [ ( )p CD p CC ](T χs)+p DD ( p CC )(S χt) +{ 2 p DC +( ) 2 p CD p DC +p DD (++ 2 p CD ) p CC [ 2 p DC +(+)p DD ]}(χ )κ = 0. (3) By substituting κ = P and χ = (P S)/(T P) in Eq. (3), e obtain [( )p 0 +p DD ]{p CD [R ( )(T +S)]+( p CC )(T +S) [2 p CD 2(p CC p CD )]P}(T S) = 0. T P (32) By substituting p 0 = p CC = p DD = (T +S 2P)/(T +S R P) in Eq. (32), e obtain [ p CD (T +S R P)+T +S 2P][ (T +S 2P)+T +S R P](T +S 2P) (T +S R P) 2 = 0. (33) If (T +S 2P)+T +S R P = 0, Eq. (20) implies that = /p 0, i.e., = p 0 =, hich contradicts 0 < <. Because e decided to treat the case T +S 2P = 0 later, Eq. (33), implies p CD = T +S 2P T +S R P. (34) In sum, e obtain p 0 = p CC = p CD = p DC = p DD = (T + S 2P)/(T + S R P) if T + S 2P < 0. Substitution ofp 0 ineqs.(49)and(50) yieldsχ = (P S)/(T P)and κ = P, respectively, coincidingith the condition forsubcase(a). Therefore,the strategyp 0 = p CC = p CD = p DC = p DD = (T+S 2P)/(T+S R P), here T +S 2P < 0, is a special case of Eq. (5). Finally, let us consider the case T + S 2P = 0. By combining this condition ith Eq. (20), e obtain p 0 = 0. By substituting T +S 2P = 0 and p 0 = 0 in Eq. (25), e obtain (R P)p DC = 0, hich implies that p DC = 0. By substituting T +S 2P = 0 and p 0 = 0 in Eq. (28), e obtain ( p CD )(R P)p DD = 0, hich implies that p DD = 0. Because p 0 = p DC = p DD = 0, the focal player X never uses p CC and p CD. Therefore, p 0 = p DC = p DD = 0 specifies a strategy. By substituting p 0 = 0 in Eqs. (49) and (50) and using T+S 2P = 0, e obtain χ = (T P)/(P S) = (P S)/(T P) = and κ = P, respectively, coinciding ith the condition for subcase (A). Therefore, the strategy p 0 = p DC = p DD = 0 is a special case of Eq. (5). 
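The conclusion of subcase (A) can be checked with exact rational arithmetic. The sketch below is an illustration under stated assumptions, not the derivation used in the text: it assumes that the linear relationship has the form $\pi_X - \kappa = \chi(\pi_Y - \kappa)$, and it computes $\chi$ and $\kappa$ for an unconditional strategy directly from one-shot expected payoffs instead of from Eqs. (49) and (50). The function name `unconditional_line` and the payoff values (chosen so that $T+S-2P<0$) are hypothetical.

```python
from fractions import Fraction as F

def unconditional_line(c, R, T, S, P):
    """Slope chi and fixed point kappa of pi_X - kappa = chi * (pi_Y - kappa)
    when player X cooperates with probability c in every round."""
    # If y is the discounted frequency with which Y cooperates, then
    #   pi_X = y * a_x + b_x   and   pi_Y = y * a_y + b_y,
    # because X's move in a round is independent of Y's move in that round.
    a_x = c * (R - S) + (1 - c) * (T - P)
    b_x = c * S + (1 - c) * P
    a_y = c * (R - T) + (1 - c) * (S - P)   # always negative when T > R > P > S
    chi = a_x / a_y
    # The line crosses the diagonal pi_X = pi_Y at y = c.
    kappa = c * a_x + b_x
    return chi, kappa

# Illustrative payoffs: T > R > P > S, 2R > T + S, and T + S - 2P < 0.
R, T, S, P = F(3), F(5), F(-4), F(1)
c = (T + S - 2 * P) / (T + S - R - P)

chi, kappa = unconditional_line(c, R, T, S, P)
print("p0 =", c, " chi =", chi, " kappa =", kappa)
print("kappa equals P:", kappa == P)
print("chi equals -(P-S)/(T-P):", chi == -(P - S) / (T - P))
```

For these payoff values the cooperation probability is 1/3, and both comparisons hold exactly, in agreement with $\kappa = P$ and $\chi = -(P-S)/(T-P)$ for subcase (A).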
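E.2 Subcase (B): $\kappa - S + \chi(T-\kappa) = 0$ and $p_0 = p_{CD} = 1$

By substituting Eqs. (116) and (117) in Eq. (91), we obtain

$$(1-\delta)(\chi-1)[\delta p_{DD}(\kappa-R) - \delta p_{CC}(\kappa-P) + \delta(R-P) + \kappa - R] = 0. \tag{135}$$

Note that $0 < \delta < 1$. Because $\chi = 1$ is inconsistent with $\kappa - S + \chi(T-\kappa) = 0$, Eq. (135) yields

$$p_{DD} = \frac{\delta p_{CC}(\kappa-P) - \delta(R-P) - (\kappa-R)}{\delta(\kappa-R)} \tag{136}$$

provided that $\kappa \neq R$. We will deal with the case $\kappa = R$ later in this section. By substituting Eqs. (116) and (117) in Eq. (93), we obtain

$$(\chi-1)\{\delta p_{DC}(\kappa-R) - \delta p_{CC}(\chi+1)(\kappa-T) + \delta[R-(\chi+1)T+\chi\kappa] + \kappa - R\} = 0, \tag{137}$$

which yields

$$p_{DC} = \frac{\delta p_{CC}(\chi+1)(\kappa-T) - \delta[R-(\chi+1)T+\chi\kappa] - (\kappa-R)}{\delta(\kappa-R)} \tag{138}$$

provided that $\kappa \neq R$. Therefore, we obtain

$$\boldsymbol{p} = \left(p_{CC},\ 1,\ \frac{\delta p_{CC}(\chi+1)(\kappa-T) - \delta[R-(\chi+1)T+\chi\kappa] - (\kappa-R)}{\delta(\kappa-R)},\ \frac{\delta p_{CC}(\kappa-P) - \delta(R-P) - (\kappa-R)}{\delta(\kappa-R)}\right),\quad p_0 = 1, \tag{139}$$

i.e., Eq. (53), as a necessary condition for the linear relationship between the payoffs of the two players, i.e., Eq. (6). To verify that Eq. (53) is sufficient, we substitute Eq. (53) (i.e., Eq. (139)) in Eq. (36) to obtain

$$\boldsymbol{u}_{zd} = \left(0,\ 0,\ -\frac{(1-\delta)(\chi-1)(\kappa-R)}{\delta(1-p_{CC})},\ -\frac{(1-\delta)(\chi-1)(\kappa-R)}{\delta(1-p_{CC})}\right)^{\top}, \tag{140}$$

which is independent of $\boldsymbol{q}$. By combining Eqs. (7), (140), and $p_0 = 1$, we obtain $\boldsymbol{v}(0)\boldsymbol{u}_{zd} = 0$, i.e., Eq. (37). Therefore, Eq. (53) is a solution that satisfies Eq. (6).

The strategy given by Eq. (53) is expressed in the form of Eq. (45) if we set $\phi = -\delta(1-p_{CC})/[(\kappa-R)(\chi-1)]$ (and use $\kappa - S + \chi(T-\kappa) = 0$ and $p_0 = 1$). As an example, we consider the repeated PD game defined by $R = 3$, $T = 5$, $S = -2$, $P = 1$, and $\delta = 0.8$. We set $\kappa = 2$. Because this solution requires $\kappa - S + \chi(T-\kappa) = 0$ (Eq. (116)), we obtain $\chi = -4/3$. If we set $p_{CC} = 0$, we obtain $\boldsymbol{p} = (0, 1, 3/4, 3/4)$ and $p_0 = 1$. This solution cannot be represented in the form of Eq. (43) because Eq. (43) requires $\kappa - S + \chi(T-\kappa) \neq 0$.

Consistent with this example, Eq. (45) combined with $\phi = -\delta(1-p_{CC})/[(\kappa-R)(\chi-1)]$, $\kappa - S + \chi(T-\kappa) = 0$, and $p_0 = 1$ yields $\chi < 0$. This can be shown as follows. By substituting $\kappa - S + \chi(T-\kappa) = 0$ and $p_0 = 1$ in Eq. (45), we obtain

$$\boldsymbol{p} = \left(1 - \frac{\phi(\chi-1)(R-\kappa)}{\delta},\ 1,\ -\frac{1-\delta}{\delta} + \frac{\phi[(\chi-1)\kappa + T - \chi S]}{\delta},\ -\frac{1-\delta}{\delta} + \frac{\phi(\chi-1)(\kappa-P)}{\delta}\right). \tag{141}$$

Because $p_{CC} \le 1$ must hold true in Eq. (141), we obtain

$$\phi(\chi-1)(R-\kappa) \ge 0. \tag{142}$$

Because $p_{DD} \ge 0$ must hold true in Eq. (141), we obtain

$$\phi(\chi-1)(\kappa-P) \ge 0. \tag{143}$$

Given $\phi(\chi-1) > 0$ (section 3.2) and $R > P$, we find that $P \le \kappa \le R$ must hold true for Eqs. (142) and (143) to be simultaneously satisfied. Therefore, using $\kappa - S + \chi(T-\kappa) = 0$, we obtain $\chi = -(\kappa-S)/(T-\kappa) < 0$.

Finally, let us consider the case $\kappa = R$. By substituting $\kappa = R$ in Eq. (135), we obtain $\delta p_{CC}(R-P) = \delta(R-P)$, which implies that $p_{CC} = 1$. By combining this result with Eq. (117), we obtain $p_0 = p_{CC} = p_{CD} = 1$, which implies that player X never uses $p_{DC}$ and $p_{DD}$. Therefore, $p_0 = p_{CC} = p_{CD} = 1$ specifies a strategy. By substituting $p_0 = 1$ in Eqs. (49) and (50), we obtain $\chi = -(R-S)/(T-R)$ and $\kappa = R$, respectively, and the former equality coincides with the condition $\kappa - S + \chi(T-\kappa) = 0$ when $\kappa = R$.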
Therefore, the strategy $p_0 = p_{CC} = p_{CD} = 1$ is a special case of Eq. (51).

Appendix F Case $T - \kappa + \chi(\kappa - S) = 0$

In this section, we assume

$$T - \kappa + \chi(\kappa-S) = 0 \tag{144}$$

and derive the set of strategies that satisfy Eq. (6). By substituting Eq. (144) in Eq. (93), we obtain

$$(\chi-1)(\kappa-R)[(1-\delta)p_0 + \delta p_{DC}] = 0. \tag{145}$$

Equation (144) does not allow $\chi = 1$ because substitution of $\chi = 1$ in Eq. (144) yields $T = S$, which contradicts Eq. (2). Substitution of $\kappa = R$ in Eq. (144) yields $\chi = -(T-R)/(R-S)$. Alternatively, if we set $(1-\delta)p_0 + \delta p_{DC} = 0$, we obtain $p_0 = p_{DC} = 0$. Therefore, we consider the following two subcases, i.e., subcase (C) specified by

$$\kappa = R \tag{146}$$

and

$$\chi = -\frac{T-R}{R-S}, \tag{147}$$

and subcase (D) specified by

$$T - \kappa + \chi(\kappa-S) = 0 \tag{148}$$

and

$$p_0 = p_{DC} = 0. \tag{149}$$

F.1 Subcase (C): $\kappa = R$ and $\chi = -(T-R)/(R-S)$

By substituting Eqs. (146) and (147) in Eq. (94), we obtain

$$\frac{(1-\delta)[1-\delta(p_{CC}-p_{DC})][-p_0(T+S-R-P) + R-P](T-S)}{R-S} = 0. \tag{150}$$

Equation (150) does not hold true because $0 < \delta < 1$, $(R-P) - p_0(T+S-R-P) \neq 0$ due to Eq. (101), and $1 - \delta(p_{CC}-p_{DC}) > 0$. Therefore, there is no solution in this case.

F.2 Subcase (D): $T - \kappa + \chi(\kappa-S) = 0$ and $p_0 = p_{DC} = 0$

By substituting Eqs. (148) and (149) in Eq. (91), we obtain

$$(\chi-1)[\delta p_{DD}(\kappa-S)(\chi+1) + (1-\delta p_{CD})(\kappa-P)] = 0. \tag{151}$$

We obtain $\chi \neq 1$ because $\chi = 1$ substituted in Eq. (148) yields $T = S$, which contradicts Eq. (2). Therefore, Eq. (151) implies

$$p_{CD} = \frac{\delta p_{DD}(\chi+1)(\kappa-S) + \kappa - P}{\delta(\kappa-P)} \tag{152}$$

provided that $\kappa \neq P$. We will deal with the case $\kappa = P$ later in this section. By substituting Eqs. (148) and (149) in Eq. (94), we obtain

$$(\chi-1)(1-\delta)[\delta p_{DD}(\kappa-R) + (1-\delta p_{CC})(\kappa-P)] = 0. \tag{153}$$

Because $0 < \delta < 1$ and $\chi \neq 1$, we obtain

$$p_{CC} = \frac{\delta p_{DD}(\kappa-R) + \kappa - P}{\delta(\kappa-P)} \tag{154}$$

provided that $\kappa \neq P$. Therefore, we obtain

$$\boldsymbol{p} = \left(\frac{\delta p_{DD}(\kappa-R) + \kappa - P}{\delta(\kappa-P)},\ \frac{\delta p_{DD}(\chi+1)(\kappa-S) + \kappa - P}{\delta(\kappa-P)},\ 0,\ p_{DD}\right),\quad p_0 = 0, \tag{155}$$

where $0 \le p_{DD} \le 1$, as a necessary condition for the linear relationship between the payoffs of the two players, i.e., Eq. (6). In fact, we substitute $p_{CD}$ given by Eq. (155) in $p_{CD}$ given by Eq. (43) and use Eqs. (148) and (149) to find that $p_{CC}$, $p_{DC}$, and $p_{DD}$ given by Eq. (43) coincide with those given by Eq. (155). Therefore, Eq. (155) is a special case of the ZD strategies given by Eq. (43).

Finally, let us consider the case $\kappa = P$. By substituting $\kappa = P$ in Eq. (153), we obtain $\delta p_{DD}(R-P) = 0$, which implies that $p_{DD} = 0$. By combining this result with Eq. (149), we obtain $p_0 = p_{DC} = p_{DD} = 0$, which implies that player X never uses $p_{CC}$ and $p_{CD}$. Therefore, $p_0 = p_{DC} = p_{DD} = 0$ specifies a strategy. By substituting $p_0 = 0$ in Eqs. (49) and (50), we obtain $\chi = -(T-P)/(P-S)$ and $\kappa = P$, respectively, and the former equality coincides with the condition $T - \kappa + \chi(\kappa-S) = 0$ when $\kappa = P$. Therefore, the strategy $p_0 = p_{DC} = p_{DD} = 0$ is a special case of Eq. (51).
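The two boundary families found in Appendices E and F can be verified exactly for concrete parameter values. The sketch below is an illustration, not the authors' procedure. It assumes that the linear relationship reads $\pi_X - \kappa = \chi(\pi_Y - \kappa)$ and that the discounted payoffs follow from the stationary-like distribution $\boldsymbol{v} = (1-\delta)\boldsymbol{v}(0)(I - \delta M)^{-1}$. The helper names `solve`, `payoffs`, and `on_line`, the value $\kappa = 5/4$, the choice $p_{DD} = 1/7$, and the list of co-players are hypothetical. The first strategy is the numerical example of subcase (B); the second is one member of the subcase (D) family.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A is square)."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def payoffs(p, p0, q, q0, R, T, S, P, delta):
    """Exact discounted payoffs (pi_X, pi_Y); p and q use own-move-first indexing."""
    q_x = (q[0], q[2], q[1], q[3])          # Y's strategy by X-view states
    M = [[p[s] * q_x[s], p[s] * (1 - q_x[s]),
          (1 - p[s]) * q_x[s], (1 - p[s]) * (1 - q_x[s])] for s in range(4)]
    v0 = [p0 * q0, p0 * (1 - q0), (1 - p0) * q0, (1 - p0) * (1 - q0)]
    # v (I - delta M) = (1 - delta) v0, written as a system for the column v^T.
    A = [[(1 if i == j else 0) - delta * M[j][i] for j in range(4)]
         for i in range(4)]
    v = solve(A, [(1 - delta) * x for x in v0])
    pi_x = v[0] * R + v[1] * S + v[2] * T + v[3] * P
    pi_y = v[0] * R + v[1] * T + v[2] * S + v[3] * P
    return pi_x, pi_y

R, T, S, P, delta = F(3), F(5), F(-2), F(1), F(4, 5)

# A few co-players: ALLC, ALLD, tit-for-tat, and one arbitrary mixed strategy.
co_players = [((1, 1, 1, 1), 1), ((0, 0, 0, 0), 0), ((1, 0, 1, 0), 1),
              ((F(1, 2), F(1, 3), F(3, 4), F(1, 5)), F(2, 3))]

def on_line(p, p0, kappa, chi):
    return all(x - kappa == chi * (y - kappa)
               for x, y in (payoffs(p, p0, q, q0, R, T, S, P, delta)
                            for q, q0 in co_players))

# Subcase (B) example: kappa = 2, chi = -4/3, p = (0, 1, 3/4, 3/4), p0 = 1.
check_B = on_line((0, 1, F(3, 4), F(3, 4)), 1, F(2), F(-4, 3))

# Subcase (D) member: choose kappa, set chi from T - kappa + chi (kappa - S) = 0.
kappa = F(5, 4)
chi = -(T - kappa) / (kappa - S)
p_dd = F(1, 7)
p_cc = (delta * p_dd * (kappa - R) + kappa - P) / (delta * (kappa - P))
p_cd = (delta * p_dd * (chi + 1) * (kappa - S) + kappa - P) / (delta * (kappa - P))
p_D = (p_cc, p_cd, 0, p_dd)
check_D = on_line(p_D, 0, kappa, chi)

print("subcase (B) example on the line:", check_B)
print("subcase (D) strategy:", p_D, "on the line:", check_D)
```

Exact fractions are used so that the equality test is not blurred by rounding. The value of `p_dd` has to be chosen such that the resulting `p_cc` and `p_cd` are valid probabilities, which restricts it to a narrow interval for these payoffs.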

22 Appendix G Minimum discount rate for χ < 0 G. ZD strategies ith κ = P Let us consider Eq. (56) under φ < 0 and χ <. In this case, e obtain Eqs. (57), (58), and (59), but ith all the inequalities flipped (i.e., instead of ). Then, e obtain Equations (56) (59) yield (χ ) R P P S +χ T P P S +χt P P S, (56) (χ )R P P S, (57) χ+ T P (χ )R P P S P S, (58) χ+ T P +χt P P S P S. (59) χ P S +( )(R P) T R+(R P) < 0, (60) [(R P) ( )(T P)]χ R P +( )(P S), (6) [(R P) ( )(P S)]χ (R P)+( )(T P), (62) [ (P S)+(T S)]χ (P S)+( )(T P), (63) respectively. When is sufficiently large, the coefficients of χ on the left-hand sides of Eqs. (6), (62), and (63) are positive. In this situation, Eqs. (60) (63) are satisfied by a sufficiently negative large χ(< 0). This result is consistent ith the previously obtained result [32]. G.2 ZD strategies ith κ = R In this section, e examine Eqs. (75), (76), and (77) under the assumption that χ < 0. First, because dg 2 /dχ > 0, g 2 is discontinuous at χ = (R S)/(T R), and g 2 < 0 for (R S)/(T R) < χ < 0, Eq. (76) is equivalent to χ < R S (64) T R if (T R)/(T P) and R S (P S) (T R)+(T P) < χ < R S T R if < (T R)/(T P). Second, using Eq. (64), dg /dχ > 0, and that g is discontinuous at χ = (R S)/(T R), e find that Eq. (64) implies Eq. (75) if (T R)/(T S) and that Eq. (75) is equivalent to (65) R S (T S) (T R)+(T S) < χ < R S T R (66) if < (T R)/(T S). Third, because d(g 2 /g )/dχ > 0, g 2 /g is discontinuous at χ = (T R)/(R S), and g 2 /g < 0 for (T R)/(R S) < χ < 0, Eq. (77) is equivalent to χ < T R R S (67) if (P S)/(R S) and T P (T R) R χ < T (P S)+(R S) R S if < (P S)/(R S). To summarize these results, if c, generous strategies ith ( χ < min R S ) R, T < (69) T R R S 22 (68)
