1 AUTOCRATIC STRATEGIES


Anne Osborne
5 years ago

1.1 ORIGINAL DISCOVERY

Recall that the transition matrix $M$ for two interacting players $X$ and $Y$ with memory-one strategies $p = (p_R, p_S, p_T, p_P)$ and $q = (q_R, q_S, q_T, q_P)$, respectively, is given by

\[
M = \begin{pmatrix}
p_R q_R & p_R (1 - q_R) & (1 - p_R) q_R & (1 - p_R)(1 - q_R) \\
p_S q_T & p_S (1 - q_T) & (1 - p_S) q_T & (1 - p_S)(1 - q_T) \\
p_T q_S & p_T (1 - q_S) & (1 - p_T) q_S & (1 - p_T)(1 - q_S) \\
p_P q_P & p_P (1 - q_P) & (1 - p_P) q_P & (1 - p_P)(1 - q_P)
\end{pmatrix}. \tag{1.1}
\]

$M$ is a stochastic matrix (all rows sum to $1$) and hence has an eigenvalue of $1$. The stationary vector $v$ of $M$ satisfies $v M = v$ or, equivalently,

\[
v M' = 0 \tag{1.2}
\]

with $M' = M - I$, where $I$ denotes the identity matrix. Using $v$ we can immediately derive the average payoffs per round, $\pi_X$ and $\pi_Y$, in an infinitely iterated game for players $X$ and $Y$. All we need is to rewrite the payoff matrix for each player as a vector, $P_X = (R, S, T, P)$ and $P_Y = (R, T, S, P)$, respectively, to obtain

\[
\pi_X = \frac{v \cdot P_X}{v \cdot \mathbf{1}} \tag{1.3}
\]
\[
\pi_Y = \frac{v \cdot P_Y}{v \cdot \mathbf{1}}. \tag{1.4}
\]

Note that the division by $v \cdot \mathbf{1}$, with $\mathbf{1} = (1, 1, 1, 1)$, is only necessary if $v$ is not normalized, i.e. if its elements $v_i$ do not sum to $1$.

In the following we establish a relationship between determinants derived from $M$ and its stationary distribution $v$, with rather unexpected consequences. For the adjugate of the matrix $M'$, we have

\[
\operatorname{adj}(M') \, M' = \det(M') \, I = 0, \tag{1.5}
\]

where the first equality reflects a property of the adjugate matrix, whereas the second equality follows because $M'$ is singular. Note that each element $a_{ij}$ of $\operatorname{adj}(A)$ is given by $a_{ij} = (-1)^{i+j} m_{ji}$, where $m_{ij}$ is the $(i, j)$-minor of $A$, i.e. the determinant of $A$ after removing row $i$ and column $j$. By Eq. (1.5), every row of $\operatorname{adj}(M')$ is a left null vector of $M'$ and hence, because of Eq. (1.2), must be proportional to $v$ (provided the stationary vector is unique). We can therefore find $v$ based on determinants of $M'$. Since $\det(A)$ is invariant to adding one column of $A$ to another one, we can rewrite $M'$ to obtain

\[
\tilde{M} = \begin{pmatrix}
p_R q_R - 1 & p_R - 1 & q_R - 1 & f_1 \\
p_S q_T & p_S - 1 & q_T & f_2 \\
p_T q_S & p_T & q_S - 1 & f_3 \\
p_P q_P & p_P & q_P & f_4
\end{pmatrix} \tag{1.6}
\]

by adding the first column to the second and third one, as well as replacing the last column by an arbitrary vector $f$, without affecting the derivation of $v$ based on the last row of $\operatorname{adj}(M')$. More specifically, the entries $a_{4j}$ in the last row of $\operatorname{adj}(M')$ are, up to the sign $(-1)^{4+j}$, the determinants of the $3 \times 3$ matrices obtained by deleting the last column and row $j$ from $M'$. Although every row of $\operatorname{adj}(M')$ is proportional to $v$, the last row does not depend on the last column of $M'$ and hence is the same for $\operatorname{adj}(\tilde{M})$ regardless of the choice of $f$. Expanding the determinant of $\tilde{M}$ along its last column, we have thus established a link between the stationary distribution $v$ and the determinant of $\tilde{M}$ via the adjugate:

\[
\det(\tilde{M}) = a_{41} f_1 + a_{42} f_2 + a_{43} f_3 + a_{44} f_4 = v \cdot f. \tag{1.7}
\]

This exercise is worthwhile because of the particular form of $\tilde{M}$. More specifically, note that (i) the second column of $\tilde{M}$ is under the sole control of player $X$; (ii) the third column is similarly under the sole control of player $Y$; and, finally, (iii) the last column is an arbitrary vector $f$ that ends up in the dot product with the stationary vector $v$. For convenience let us define a new function $D(p, q, f) := \det(\tilde{M})$, which also highlights the fact that $\tilde{M}$ is a function of the strategies $p$ and $q$ of the two players $X$ and $Y$, as well as of the arbitrary vector $f$. Since $f$ is arbitrary, we can set it to the payoff vectors $P_X$ and $P_Y$ to obtain the average payoffs for each player:

\[
\pi_X = \frac{v \cdot P_X}{v \cdot \mathbf{1}} = \frac{D(p, q, P_X)}{D(p, q, \mathbf{1})} \tag{1.8}
\]
\[
\pi_Y = \frac{v \cdot P_Y}{v \cdot \mathbf{1}} = \frac{D(p, q, P_Y)}{D(p, q, \mathbf{1})}. \tag{1.9}
\]

Because $\pi_X$ and $\pi_Y$ are linear in $P_X$ and $P_Y$, respectively, we can also form a linear combination of the payoffs:

\[
\alpha \pi_X + \beta \pi_Y + \gamma = \frac{D(p, q, \alpha P_X + \beta P_Y + \gamma \mathbf{1})}{D(p, q, \mathbf{1})} \tag{1.10}
\]

for some $\alpha, \beta, \gamma \in \mathbb{R}$. Even after more than 50 years of research in game theory, and on the prisoner's dilemma in particular, the rather technical Eq. (1.10) caused quite a stir in the scientific community because it enables players to unilaterally exert an unexpected level of control over iterated interactions. This happens because either player can unilaterally set the right-hand side of Eq. (1.10) to zero: if the column under a player's control equals the last column $f = \alpha P_X + \beta P_Y + \gamma \mathbf{1}$, then two columns of $\tilde{M}$ coincide and the determinant vanishes. Player $X$ achieves this by choosing a strategy

\begin{align}
p_R &= \alpha R + \beta R + \gamma + 1 \tag{1.11} \\
p_S &= \alpha S + \beta T + \gamma + 1 \tag{1.12} \\
p_T &= \alpha T + \beta S + \gamma \tag{1.13} \\
p_P &= \alpha P + \beta P + \gamma \tag{1.14}
\end{align}

for suitable $\alpha, \beta, \gamma \in \mathbb{R}$ such that $p_i \in [0, 1]$, and player $Y$ can accomplish the same feat by choosing the strategy $q$ analogously. Either case yields

\[
\alpha \pi_X + \beta \pi_Y + \gamma = 0 \tag{1.15}
\]

and hence either player can unilaterally enforce a linear relationship between his or her own payoff and that of the opponent. Because these strategies set a determinant in Eq. (1.10) to zero, this class of strategies was termed zero-determinant strategies or ZD-strategies, for short. The level of control that these zero-determinant strategies offer was previously thought to be impossible.

The essential feature of zero-determinant strategies is not so much the technical aspect that they render a particular determinant in Eq. (1.10) zero but rather that they unilaterally enforce the linear payoff relation in Eq. (1.15). In order to emphasize the latter, we refer to these strategies as autocratic strategies in the following, because players adopting such strategies gain an unprecedented degree of control over interactions.
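The determinant construction can be checked numerically. The following is a minimal sketch, not taken from the text: it assumes illustrative prisoner's dilemma payoffs $R = 3$, $S = 0$, $T = 5$, $P = 1$ and the arbitrarily chosen coefficients $\alpha = 0.04$, $\beta = -0.1$, $\gamma = 0.1$. The helper names `det`, `D`, `payoffs`, `stationary` and `autocratic` are hypothetical. The sketch evaluates the payoffs as ratios of determinants $D(p, q, f)$, compares them with a power-iteration estimate of the stationary distribution, and reports the residual $\alpha \pi_X + \beta \pi_Y + \gamma$ for several randomly drawn memory-one opponents.

```python
import random

# Illustrative payoffs (an assumption for this sketch, not from the text).
R, S, T, P = 3.0, 0.0, 5.0, 1.0
PX = [R, S, T, P]          # payoff vector of player X
PY = [R, T, S, P]          # payoff vector of player Y
ONE = [1.0] * 4

def det(a):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * x * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j, x in enumerate(a[0]))

def D(p, q, f):
    """Determinant of the modified matrix whose last column is f."""
    pR, pS, pT, pP = p
    qR, qS, qT, qP = q
    return det([
        [pR * qR - 1, pR - 1, qR - 1, f[0]],
        [pS * qT,     pS - 1, qT,     f[1]],
        [pT * qS,     pT,     qS - 1, f[2]],
        [pP * qP,     pP,     qP,     f[3]],
    ])

def payoffs(p, q):
    """Average payoffs (pi_X, pi_Y) as ratios of determinants."""
    d = D(p, q, ONE)
    return D(p, q, PX) / d, D(p, q, PY) / d

def stationary(p, q, steps=5000):
    """Stationary distribution of the transition matrix M by power iteration."""
    pR, pS, pT, pP = p
    qR, qS, qT, qP = q
    rows = [(pR, qR), (pS, qT), (pT, qS), (pP, qP)]
    M = [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)] for a, b in rows]
    v = [0.25] * 4
    for _ in range(steps):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return v

def autocratic(alpha, beta, gamma):
    """Strategy p built from alpha, beta, gamma; must be a valid probability vector."""
    p = [alpha * R + beta * R + gamma + 1,
         alpha * S + beta * T + gamma + 1,
         alpha * T + beta * S + gamma,
         alpha * P + beta * P + gamma]
    assert all(0.0 <= x <= 1.0 for x in p), "coefficients do not give probabilities"
    return p

alpha, beta, gamma = 0.04, -0.1, 0.1   # arbitrary illustrative coefficients
p = autocratic(alpha, beta, gamma)

rng = random.Random(1)
for _ in range(5):
    q = [rng.uniform(0.05, 0.95) for _ in range(4)]   # random memory-one opponent
    piX, piY = payoffs(p, q)
    print(q, piX, piY, alpha * piX + beta * piY + gamma)
```

The last number printed in each row is the residual of the linear relation; it should vanish up to floating-point error whatever $q$ is drawn, whereas $\pi_X$ and $\pi_Y$ individually change with $q$.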

1.1.1 EXAMPLES

Let us now explore particularly interesting scenarios where player $X$ adopts an autocratic strategy.

SET OPPONENT'S PAYOFF

The first interesting case arises when setting $\alpha = 0$, $\beta \neq 0$ and hence $\pi_Y = -\gamma/\beta$. The corresponding strategy enables player $X$ to fix the payoff of player $Y$ completely independently of the strategy of player $Y$. As we will see below, this remains true even if player $Y$ has an arbitrarily long memory and employs more sophisticated strategies than memory-one strategies, which are based on the previous interaction only. However, player $X$ is not completely free in setting her opponent's payoff because $p$ needs to be a probabilistic strategy, i.e. $p_i \in [0, 1]$. As a result, $P \le \pi_Y \le R$ must hold. These strategies are traditionally termed equalizers because they pin the payoff of player $Y$ to the same value regardless of player $Y$'s actions.

Exercise: Show that $P \le \pi_Y \le R$. In order to see this, note that $p_R \le 1$ implies $\beta R + \gamma \le 0$ and, similarly, $p_P \ge 0$ requires $\beta P + \gamma \ge 0$. Since $R > P$ (without loss of generality), subtracting the two inequalities yields $\beta (R - P) \le 0$ and thus, because $\beta \neq 0$, we establish $\beta < 0$. Dividing the same two inequalities by $\beta$, we find $R \ge -\gamma/\beta$ and $P \le -\gamma/\beta$, or $P \le \pi_Y \le R$, as required.

SET OWN PAYOFF

Choosing $\alpha \neq 0$, $\beta = 0$ allows player $X$ to set her own score to $\pi_X = -\gamma/\alpha$, or does it?

Exercise: (1) Find the range of possible $\pi_X$. (2) Does it depend on the game, i.e. on the ranking of $R, S, T, P$? (3) Discuss features of the resulting strategies.

EXTORTIONERS

1.2 MORE TRADITIONAL DERIVATION

The discovery of autocratic strategies by ? is an impressive example of how the unbiased view of outsiders can result in original discoveries and spark novel lines of research. This also explains their ingenious but rather unusual approach to a game theoretical problem. For this reason the more traditional and direct approach to autocratic strategies was developed only afterwards by ?. This approach provides an alternative and more direct way of showing that autocratic strategies enforce a linear relation between the payoffs of players $X$ and $Y$.

Definition 1.1. An autocratic strategy $p$ for player $X$ in infinitely iterated $2 \times 2$ games is given by

\begin{align}
p_R &= \alpha R + \beta R + \gamma + 1 \tag{1.16} \\
p_S &= \alpha S + \beta T + \gamma + 1 \tag{1.17} \\
p_T &= \alpha T + \beta S + \gamma \tag{1.18} \\
p_P &= \alpha P + \beta P + \gamma \tag{1.19}
\end{align}

for some $\alpha, \beta, \gamma \in \mathbb{R}$ such that $p_i \in [0, 1]$.

Theorem 1.1. If player $X$ uses an autocratic strategy then

\[
\alpha \pi_X + \beta \pi_Y + \gamma = 0, \tag{1.20}
\]

where $\pi_X$ and $\pi_Y$ denote the average payoff per round to players $X$ and $Y$, respectively, in the limit of infinitely many rounds.

Proof. For the proof we first need to introduce some notation. Let $\pi_X(n)$ and $\pi_Y(n)$ denote the payoffs to players $X$ and $Y$ in round $n$; $s_i(n)$ the probability that player $X$ experiences outcome $i \in \{R, S, T, P\}$ in round $n$; and $q_i(n)$ the conditional probability that player $Y$ plays $C$ in round $n + 1$ given that player $Y$ experienced outcome $i$ in round $n$ (so that outcome $S$ for player $X$ corresponds to outcome $T$ for player $Y$, and vice versa). With this we can write the probability that player $X$ cooperates in round $n + 1$ as

\[
p_C(n + 1) = s_R(n + 1) + s_S(n + 1) = s(n) \cdot p, \tag{1.21}
\]

with $s(n) = (s_R(n), s_S(n), s_T(n), s_P(n))$ and $p$ the strategy of player $X$. The second equality in Eq. (1.21) follows from

\begin{align}
s_R(n + 1) &= s_R(n) p_R q_R(n) + s_S(n) p_S q_T(n) + s_T(n) p_T q_S(n) + s_P(n) p_P q_P(n) \tag{1.22} \\
s_S(n + 1) &= s_R(n) p_R (1 - q_R(n)) + s_S(n) p_S (1 - q_T(n)) + s_T(n) p_T (1 - q_S(n)) + s_P(n) p_P (1 - q_P(n)). \tag{1.23}
\end{align}

Note that when summing $s_R(n + 1) + s_S(n + 1)$ all terms involving $q_i(n)$ cancel and a simple dot product remains. This is the reason why no assumptions regarding the strategy of player $Y$ are necessary. If player $X$ uses an autocratic strategy, see Eqs. (1.16)-(1.19), we obtain

\begin{align}
p_C(n + 1) &= s(n) \cdot (\alpha R + \beta R + \gamma + 1,\ \alpha S + \beta T + \gamma + 1,\ \alpha T + \beta S + \gamma,\ \alpha P + \beta P + \gamma) \tag{1.24} \\
&= s(n) \cdot (\alpha P_X + \beta P_Y + \gamma \mathbf{1} + g), \tag{1.25}
\end{align}

with $\mathbf{1} = (1, 1, 1, 1)$ and $g = (1, 1, 0, 0)$. Note that $g$ represents a stubborn strategy that ignores the opponent's moves and continues with whatever it started with. Let us now consider

\begin{align}
w(n) := p_C(n + 1) - p_C(n) &= s(n) \cdot p - s_R(n) - s_S(n) \tag{1.26} \\
&= s(n) \cdot (\alpha P_X + \beta P_Y + \gamma \mathbf{1}), \tag{1.27}
\end{align}

where the last step uses $s_R(n) + s_S(n) = s(n) \cdot g$, and determine the average of $w(n)$ in the limit of infinitely many rounds. The left-hand side is straightforward because the sum telescopes, and we obtain

\[
\text{LHS:} \quad \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} w(n) = \lim_{N \to \infty} \frac{1}{N} \left( p_C(N) - p_C(0) \right) = 0 \tag{1.28}
\]

because $p_C(n)$ is bounded. For the right-hand side we get

\[
\text{RHS:} \quad \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} s(n) \cdot (\alpha P_X + \beta P_Y + \gamma \mathbf{1}) = (\alpha P_X + \beta P_Y + \gamma \mathbf{1}) \cdot \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} s(n), \tag{1.29}
\]

but the limit is just the (normalized) stationary probability distribution $v$ for the four outcomes $R, S, T, P$ with $v \cdot \mathbf{1} = 1$. Thus, we find that

\[
(\alpha P_X + \beta P_Y + \gamma \mathbf{1}) \cdot v = \alpha \pi_X + \beta \pi_Y + \gamma, \tag{1.30}
\]

which equals zero to match the left-hand side. Note that if player $Y$ also adopts a memory-one strategy then $v$ is simply the stationary state of the Markov chain given by the transition matrix, Eq. (1.1).
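The argument of the proof can be traced numerically for an opponent who is not restricted to a fixed memory-one strategy. The following is a minimal sketch under stated assumptions, not part of the original derivation. It uses illustrative payoffs $R = 3$, $S = 0$, $T = 5$, $P = 1$ and an equalizer with $\alpha = 0$, $\beta = -0.2$, $\gamma = 0.4$, i.e. $p = (0.8, 0.4, 0.4, 0.2)$, which should pin $\pi_Y$ to $-\gamma/\beta = 2$. The function `average_payoff_y` and the two opponents `erratic` and `always_defect` are hypothetical names. The opponent supplies fresh response probabilities $q_i(n)$ in every round, and the outcome distribution $s(n)$ is propagated exactly rather than sampled.

```python
import random

# Illustrative payoffs (an assumption for this sketch, not from the text).
R, S, T, P = 3.0, 0.0, 5.0, 1.0
PY = [R, T, S, P]   # payoff to Y for X's outcomes R, S, T, P

# Equalizer: alpha = 0, so player X aims to pin pi_Y to -gamma / beta.
beta, gamma = -0.2, 0.4
p = [beta * R + gamma + 1, beta * T + gamma + 1, beta * S + gamma, beta * P + gamma]

def average_payoff_y(p, opponent, rounds=20000):
    """Average payoff to Y over `rounds` rounds against a time-varying opponent.

    `opponent(n)` returns (qR, qS, qT, qP): Y's cooperation probabilities for
    round n + 1, indexed by the outcome Y experienced in round n.
    """
    s = [0.25] * 4                      # arbitrary initial outcome distribution s(0)
    total = 0.0
    for n in range(rounds):
        total += sum(si * yi for si, yi in zip(s, PY))
        qR, qS, qT, qP = opponent(n)
        # X's outcome S is Y's outcome T and vice versa.
        responses = [qR, qT, qS, qP]
        nxt = [0.0] * 4
        for si, pi, qi in zip(s, p, responses):
            nxt[0] += si * pi * qi                  # both cooperate: R
            nxt[1] += si * pi * (1 - qi)            # X cooperates, Y defects: S
            nxt[2] += si * (1 - pi) * qi            # X defects, Y cooperates: T
            nxt[3] += si * (1 - pi) * (1 - qi)      # both defect: P
        s = nxt
    return total / rounds

_rng = random.Random(0)

def erratic(n):
    """Opponent that redraws all response probabilities in every round."""
    return [_rng.random() for _ in range(4)]

def always_defect(n):
    """Opponent that never cooperates."""
    return [0.0, 0.0, 0.0, 0.0]

print(average_payoff_y(p, erratic))
print(average_payoff_y(p, always_defect))
```

By the telescoping sum in the proof, the finite-round average deviates from $-\gamma/\beta$ by at most $1/(N |\beta|)$ after $N$ rounds, so both printed values should be close to $2$ for the chosen number of rounds.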

1.3 GENERALIZATIONS

1.3.1 DISCOUNTED INTERACTIONS

1.3.2 ARBITRARY NUMBER OF PLAYERS

1.3.3 ARBITRARY STRATEGIES
