Applications of Game Theory to Social Norm Establishment. Michael Andrews. A Thesis Presented to The University of Guelph


Applications of Game Theory to Social Norm Establishment

by Michael Andrews

A Thesis Presented to The University of Guelph in partial fulfilment of requirements for the degree of Master of Science in Mathematics and Statistics.

Guelph, Ontario, Canada. © Michael Andrews, December, 2012

ABSTRACT

APPLICATIONS OF GAME THEORY TO SOCIAL NORM ESTABLISHMENT

Michael Andrews, University of Guelph, 2012. Advisors: Dr. Monica Cojocaru, Dr. Edward Thommes.

We create pure strategy versions of Robert Axelrod's well known norms and metanorms games. To analyze the evolutionary behaviour of these games, we utilize replicator dynamics complemented with agent based model simulations. Our findings show that the only evolutionarily stable strategy in the norms game is one in which a player defects and is lenient. The metanorms game, however, has two evolutionarily stable strategies. The first is a repeat from the norms game, that is, a player defects and is always lenient. The other is one in which a player follows the norm and punishes those who are lenient and those who defect. We also introduce the concept of providing an incentive for players to play a certain strategy in our controlled norms game. This particular game has two evolutionarily stable strategies. In the first, a player follows the norm, while in the second, a player does not. We wish to transition the population of players from a state in which the majority of players initially do not follow the norm to one in which the majority of players do. During this transition, we look to minimize the total use of our incentive. We also utilize agent based model simulations to explore the effect of imposing simple network connections and heterogeneity onto a population of agents playing these games.

Acknowledgments

Foremost, I would like to thank my supervisors, Monica Cojocaru and Edward Thommes. Their advice and guidance have been instrumental to my work at the University of Guelph. I would also like to thank my family, who have always been supportive of my academic endeavours.

Contents

List of Figures

1 Introduction and Literature Review
  1.1 Evolutionary Game Theory, Agent Based Models, and Social Norms
  1.2 Overview

2 Preliminaries
  2.1 Evolutionary Game Dynamics
    2.1.1 Evolutionary Dynamics and Game Equilibria
    2.1.2 Replicator Dynamics
  2.2 Dynamical Systems and Equilibrium Points
    2.2.1 Lyapunov Functions and Stability
    2.2.2 Linearization
  2.3 Positive Definite Matrices and Monotonicity
  2.4 Control Theory

3 Evolutionary Norms Game
  3.1 Social Norms Evolutionary Game
  3.2 Replicator Dynamics of the Norms Game
    3.2.1 Stability of the Norms Game Equilibria
  3.3 ABM Simulations of the Norms Game
    3.3.1 Axelrod's Simulation of the Norms Game
    3.3.2 Pure Strategy Simulation of the Norms Game
  3.4 Chapter Summary

4 Evolutionary Metanorms Game
  4.1 Replicator Dynamics of the Metanorms Game
    4.1.1 Stability of the Metanorms Game Equilibria
  4.2 Simulation of the Metanorms Game
    4.2.1 Axelrod's Simulation of the Metanorms Game
    4.2.2 Pure Strategy Simulation of the Metanorms Game
  4.3 Chapter Summary

5 Utilizing Controls to Establish a Norm
  5.1 Controlled Norms Game
    5.1.1 Replicator Dynamics and Stability for the Controlled Norms Game
    5.1.2 Optimal Control for the Controlled Norms Game
  5.2 Agent Based Models for the Controlled Norms Game

6 Conclusion

Bibliography

List of Figures

3.1 The 4 × 4 norms game
3.2 Vector field of our system (3.2.3) within the feasible region (3.2.4). The system will evolve to a state where we see no hypocrites, that is, players using strategy DP.
3.3 Vector field of our system with S_DP = 0 and N = 20
3.4 Replicating Axelrod's norms game simulation with a population size of 20
3.5 Replicating Axelrod's norms game simulation with a population size of 50
3.6 Pure strategies simulation of the norms game with a population size of 20
3.7 Pure strategies simulation of the norms game with a population size of 50
3.8 Axelrod type simulation of the norms game with N = 20. This example shows the possibility of a population evolving to much different states over the same generation span (500) when starting with an average vengefulness and boldness of 1. We note that all simulations will still collapse to B = 1, V = 0, given enough time.
3.9 Pure strategies simulation of the norms game with N = 20. This shows an example of a population transitioning from norm followers to lenient defectors.
4.1 The 4 × 4 metanorms game

4.2 Vector field of our system (4.1.2) within the feasible region (3.2.4). We see that the system again evolves to a state where there are no hypocrites. Also, there is a strong attraction to the point V = 1, B = 0, S_DP = 0.
4.3 Vector field of our metanorms system with S_DP = 0 and N = 20
4.4 Replicating Axelrod's metanorms simulation with a population size of 20
4.5 Replicating Axelrod's metanorms simulation with a population size of 25
4.6 Pure strategies simulation of the metanorms game with a population size of 20
4.7 Pure strategies simulation of the metanorms game with a population size of 25
5.1 The 2 × 2 controlled norms game. A player can choose to follow the norm (strategy F), or be apathetic towards the norm (strategy A).
5.2 Optimal control problem for two different values of I
5.3 Optimal control problem for two different values of u_max
5.4 Cumulative usage of u for given combinations of u_max and cutoff fractions
5.5 Visualization of agents' strategies over time for our optimal control problem with 8 interaction partners
5.6 Total payout for our optimal control problem with 8 random interaction partners
5.7 Total rounds for our optimal control problem with 8 random interaction partners
5.8 Total payout for our optimal control problem for 8 neighbour interaction partners
5.9 Total rounds for our optimal control problem for 8 neighbour interaction partners

5.10 Total payout for our optimal control problem with 8 random interaction partners with heterogeneous gratification
5.11 Total rounds for our optimal control problem with 8 random interaction partners with heterogeneous gratification

Chapter 1

Introduction and Literature Review

Game theory is a branch of mathematics that studies the strategic interaction of competing agents taking part in a contest. There are many different applications of game theory, and we proceed to discuss the three main approaches used in this thesis.

1.1 Evolutionary Game Theory, Agent Based Models, and Social Norms

The goal of evolutionary game theory is to model the behaviour of an evolving population of players from generation to generation. Much like classical game theory, an evolutionary approach involves a player employing a chosen strategy in some contest against one or more adversaries. The rules of this contest, or game, dictate the payoffs each player will receive when these strategies are played against one another.

In classical game theory, players are thought of as rational decision makers. This means that they will choose their strategy based on an evaluation of possible outcomes. For example, in a two-player game, each player will analyze their opponent's possible strategy choices before selecting their own most suitable strategy.

In contrast, evolutionary games disregard this rationality requirement. In fact, players do not choose a strategy themselves, but are simply born with one. Also, with an evolutionary approach, we now consider the payoff of a game to be associated with a player's fitness. A player's fitness is related to their ability to reproduce, and consequently to have their strategy carried on into the next generation.

In an evolutionary game, we are concerned with a player's strategy and fitness. The players with the best fitness are selected to have offspring with identical strategies to their own, whereas the players with lower fitness have either fewer offspring or are removed from the population altogether. In addition, mutation is introduced into the population each generation, where players are stochastically chosen to have their strategies altered from what they were originally given. With an evolutionary model, a strategy is considered successful if it performs well against both other strategies and itself.

Consider a population of players and their respective strategies. If no player will benefit from changing their strategy while all other players' strategies remain the same, then this set of strategies adopted by the population is said to be a Nash equilibrium [3, 31]. More specifically, the population is in a Nash equilibrium if each player's strategy is a best response to the strategies of every other player. It is important to note that a Nash equilibrium is not necessarily the set of strategies that obtains the maximum fitness for each player.

Another important concept of an evolutionary game is that of an evolutionarily stable strategy (ESS), introduced by Maynard-Smith and Price [36, 37]. An ESS is a Nash equilibrium that survives through evolution in the population. That is, the mutation introduced each generation will not successfully invade a population that is playing this particular strategy.

In the case of smaller populations, Fogel et al. argue that an ESS may not be a particularly robust concept [7]. In fact, population states may not evolve to stable states at all.

Agent based models (ABMs) have become an increasingly popular modelling method among various scientific disciplines in recent years [14, 33]. Some such disciplines include economics and finance [39], ecology [16], and, as seen in this thesis, sociology and political science [3, 15]. Unlike differential equation models, ABMs are able to readily introduce heterogeneity into individual attributes and populations [33]. In a typical ABM, a population consists of multiple agents who behave individually. In addition, each agent may have unique characteristics that set their behaviour apart from the rest of the population.

In a social setting, a norm can be defined as an established set of rules or behaviours that individuals are expected to follow, and be punished for not following [2]. Recently, the concept of social norms has become an increasingly popular topic, given that they are an essential part of group living [4, 9, 1, 35]. Thus, studying these norms may allow us to understand group behaviour on a more fundamental level [2]. Axelrod first introduced a game-theoretic approach to the social sciences [1], and in his well-cited paper [2], he constructs two n-player evolutionary games that seek to mimic the establishment of norms. More recently, these games have been subject to rigorous testing using ABM simulations; for examples, see Mahmoud et al. [27] and Galan and Izquierdo [14]. In our work, we recreate Axelrod's norm establishment games using a pure strategy analytical approach. This allows us to more clearly illuminate the population dynamics associated with these two evolutionary games. In addition, we propose a method of norm establishment from a game-theoretic perspective, and complement our analysis of these topics with ABM simulations.

1.2 Overview

We begin in Chapter 2 with a review of the background mathematical theory used in the analysis found in Chapters 3, 4, and 5; thus, this thesis is mostly self-contained. In Chapters 3 and 4, we discuss Axelrod's well known article, "An Evolutionary Approach to Norms" [2]. Here, we provide an analytical approach to study his norms and metanorms games. We also recreate Axelrod's original ABM simulations, as well as present alternate versions based on our construction of the games. Chapter 5 introduces the concept of using an incentive to entice players into playing a certain strategy for a given game. This is done by adding a control function to the game's payoffs. We present a simpler norms game and seek to establish a dominant strategy by utilizing a control, while minimizing the use of this control, or incentive, needed to persuade the population into playing the desired strategy. Finally, we construct simulations that include control functions in a network setting and compare the results with those obtained in our non-spatial versions. We conclude with a brief summary of results and a discussion of possible future work.

Chapter 2

Preliminaries

Our analysis of games in this thesis relies strongly on systems of differential equations. In this chapter, we review the relevant and well established background pertaining to evolutionary games and dynamical systems.

2.1 Evolutionary Game Dynamics

2.1.1 Evolutionary Dynamics and Game Equilibria

Games are often represented in normal form. A game presented this way conveys each player's utility for every strategy that can be employed [23]. In a normal form game with $m$ players, each player $i$ is associated with a pure strategy space $S_i$ and a payoff function $\pi_i : \prod_{j=1}^{m} S_j \to \mathbb{R}$, where the product space $\prod_{j=1}^{m} S_j$ contains all possible combinations of the players' strategies [5, 34]. In a game with two players, the payoffs for each player are given by the $n \times m$ matrices $A$ and $B$, with $n$ and $m$ the cardinalities of the two players' sets of pure strategies.

Our interests lie in discussing a population of players competing against each other.

In this population setting, we wish to ensure that all players are interchangeable. That is, all players have the same strategy set and payoffs. This is known as a symmetric game [11].

Definition (Symmetric Game) In a game with $n$ players, each player $i$ has a set of strategies $S_i$ ($s_i \in S_i$) and a payoff function $\pi_i(s_i, s_{-i})$, where $s_{-i}$ denotes all the strategies in $s$ except for $s_i$. This normal form game is symmetric if the players have identical strategy spaces, $S_1 = S_2 = \dots = S_n$, and $\pi_i(s_i, s_{-i}) = \pi_j(s_j, s_{-j})$ whenever $s_i = s_j$ and $s_{-i} = s_{-j}$, for all $i, j \in \{1, \dots, n\}$.

Now every player has an identical $n \times n$ payoff matrix $A$; that is, $A = B^T$. Players may also choose to play mixed strategies by playing each pure strategy $s_i \in S$ with some probability. A mixed strategy is denoted by the vector $x = (x_1, x_2, \dots, x_n)^T$, which is an element of the simplex
\[
S_n = \Big\{ x \in \mathbb{R}^n \;\Big|\; \sum_{i=1}^{n} x_i = 1,\ x_i \ge 0\ \forall i \Big\} \subset \mathbb{R}^n.
\]
Here, $S_n$ is spanned by the unit vectors (pure strategies) $\{e_1, \dots, e_n\}$. The expected payoff of a player using a mixed strategy $x$ against a player using mixed strategy $y$ is given by
\[
\sum_{i=1}^{n} \sum_{j=1}^{n} \Pr(s_i)\Pr(s_j)\,\pi(s_i, s_j) = x^T A y.
\]
When two strategies are played against one another, one strategy is said to be a best response to another if it produces the highest payoff against it compared to any other strategy. That is,

Definition (Best Response) The strategy $x \in S_n$ is said to be a best response to $y \in S_n$ if
\[
z^T A y \le x^T A y
\]
for all $z \in S_n$.

Our previous discussion of a Nash equilibrium in a symmetric game can now be made formal [5, 18].

Definition (Nash Equilibrium) For strategies $x \in S_n$ and $z \in S_n$, $x$ is a Nash equilibrium if
\[
z^T A x \le x^T A x
\]
for all $z$. It is called a strict Nash equilibrium if equality holds only for $z = x$. We see that the strategy $x$ is a Nash equilibrium if it is a best response to itself.

2.1.2 Replicator Dynamics

The selection process of the strategies in a population can be described by replicator equations [38]. Replicator equations assume that the rate of growth of individuals in a population playing a certain strategy is proportional to the total payoff received by these individuals. We write this in the following way:
\[
\dot{y}_i = y_i \pi_i(x),
\]
where $y_i$ is the number of players employing strategy $e_i$, $x \in S_n$ is the strategy state of the population, and $\pi_i(x)$ is the payoff an individual will receive when playing strategy $e_i$. We now consider the change in the proportion of the population playing a certain strategy.

That is, we let $x_i = y_i / \sum_{j=1}^{n} y_j$. Then
\[
\dot{x}_i = \frac{\dot{y}_i \sum_j y_j - y_i \sum_j \dot{y}_j}{\big(\sum_j y_j\big)^2}
= x_i \pi_i(x) - x_i \sum_{j=1}^{n} x_j \pi_j(x)
= x_i\big(\pi_i(x) - \bar{\pi}(x)\big), \tag{2.1.1}
\]
where $i = 1, \dots, n$ and $\bar{\pi}(x) = \sum_{j=1}^{n} x_j \pi_j(x)$ is the average payoff of the population in state $x$. The differential equation system (2.1.1) is known as the aforementioned replicator equations, introduced by Taylor and Jonker [38]. If we have a linear $\pi$, then there exists an $n \times n$ payoff matrix $A$ such that $\pi_i(x) = (Ax)_i$. Now, (2.1.1) can take the form
\[
\dot{x}_i = x_i\big((Ax)_i - x^T A x\big). \tag{2.1.2}
\]
Here, $(Ax)_i$ is the expected payoff of an individual of type $i$, and $x^T A x$ is the average payoff of the population in state $x$ [18].

The dynamical system (2.1.1) can be analyzed to gain insight into the behaviour of the population's evolution of strategy choices for a given game. However, this system can be difficult to solve analytically. Nevertheless, we can analyze it in terms of stability using methods from differential equations. Next, we wish to find all equilibrium points of the system (2.1.2). Let us denote one such point as $\hat{x}$. The folk theorem of evolutionary game theory relates these points with Nash equilibria in the game. For this theorem, we follow Hofbauer and Sigmund [17].

Theorem (Folk Theorem of Evolutionary Game Theory)
(a) If $\hat{x} \in S_n$ is a Nash equilibrium of the game described by payoff matrix $A$, then $\hat{x}$ is an equilibrium point of (2.1.2).
(b) If $\hat{x}$ is the $\omega$-limit of an orbit $x(t)$ in $\mathrm{int}(S_n)$, then $\hat{x}$ is a Nash equilibrium.
(c) If $\hat{x}$ is Lyapunov stable, then it is a Nash equilibrium.

Proof. (a) If $\hat{x}$ is a Nash equilibrium, then there exists a constant $c$ such that $(A\hat{x})_i = c$ for all $i$ with $\hat{x}_i > 0$. Hence $\hat{x}$ satisfies the equilibrium criteria for a point in the face spanned by the $e_i$ with $i \in \mathrm{supp}(\hat{x})$.
(b) Let us assume that $x(t) \in S_n$ converges to $\hat{x}$, but that $\hat{x}$ is not a Nash equilibrium. Then there exist an $i$ and an $\epsilon > 0$ such that $e_i^T A \hat{x} - \hat{x}^T A \hat{x} > \epsilon$, and hence such that $\dot{x}_i / x_i > \epsilon$ for $t$ sufficiently large, which is impossible.
(c) Suppose that $\hat{x}$ is not a Nash equilibrium. Then there exist an $i$ and an $\epsilon > 0$ such that $(Ax)_i - x^T A x > \epsilon$ for all $x$ in a neighbourhood of $\hat{x}$. For such $x$, the component $x_i$ increases exponentially, which contradicts the Lyapunov stability of $\hat{x}$.

Every interior equilibrium point is a Nash equilibrium [18]. At a boundary equilibrium point $\hat{x}$, $(A\hat{x})_i - \hat{x}^T A \hat{x}$ is an eigenvalue of the Jacobian of the replicator equation, and hence an equilibrium point $\hat{x}$ is a Nash equilibrium if and only if all its transversal eigenvalues are non-positive [18].

We wish to determine which Nash equilibrium strategies are ESS. In terms of a game's payoff matrix, an ESS is defined in the following way [38, 4].

Definition (ESS) If $x$ and $y$ are strategies and $S_n$ is the set of all strategies in an evolutionary game, then a strategy $x \in S_n$ is an ESS if for every $y \in S_n$, $y \ne x$:
\[
\pi(x, x) \ge \pi(y, x);
\]
and if $\pi(x, x) = \pi(y, x)$, then
\[
\pi(x, y) > \pi(y, y).
\]

When we relate ESS to the continuous replicator equations (2.1.1), the following theorem proves useful [41].

Theorem In any continuous time evolutionary game, every ESS is asymptotically stable.

The proof of this theorem is given in Taylor and Jonker [38].

2.2 Dynamical Systems and Equilibrium Points

In the next three sections, we look at some general definitions surrounding dynamical systems (see Perko and Wiggins [32, 42], which we closely follow). Consider a system of differential equations
\[
\dot{x} = f(x), \tag{2.2.1}
\]
where $x \in \mathbb{R}^n$ is a state vector, $\dot{\ } = \frac{d}{dt}$, and $t$ is time. An equilibrium point of this system is a vector $\hat{x}$ that satisfies $f(\hat{x}) = 0$. An equilibrium point may also be referred to as a fixed point, critical point, or zero.

2.2.1 Lyapunov Functions and Stability

Let us consider the following dynamical system in two dimensions:
\[
\dot{x} = f(x, y), \qquad \dot{y} = g(x, y), \qquad (x, y) \in \mathbb{R}^2, \tag{2.2.2}
\]
which has a stable equilibrium point $(\hat{x}, \hat{y})$. We define a $C^1$ continuous function $V(x, y)$, where $V : \mathbb{R}^2 \to \mathbb{R}$ and $V(\hat{x}, \hat{y}) = 0$. The following theorem describes the stability properties of an equilibrium point $\hat{x}$ with respect to a function $V$.

Theorem Consider the differential equation $\dot{x} = f(x)$, $x \in \mathbb{R}^n$. Let $\hat{x}$ be an equilibrium point of this equation and let $V : U \to \mathbb{R}$ be a $C^1$ function defined on some neighbourhood $U$ of $\hat{x}$. Then $\hat{x}$ is stable if conditions (i) and (ii) hold, and asymptotically stable if conditions (i) and (iii) hold:
i) $V(\hat{x}) = 0$ and $V(x) > 0$ if $x \ne \hat{x}$;
ii) $\dot{V}(x) \le 0$ in $U - \{\hat{x}\}$;
iii) $\dot{V}(x) < 0$ in $U - \{\hat{x}\}$.
Here, $V$ is called a Lyapunov function.

2.2.2 Linearization

Generally, we wish to determine the behaviour of a system near its equilibrium points. For example, let us consider $x = \hat{x} + y$. Substituting this into our differential equation system (2.2.1) and performing a Taylor expansion around $\hat{x}$ gives
\[
\dot{x} = \dot{\hat{x}} + \dot{y} = f(\hat{x}) + J(f(\hat{x}))\,y + O(\|y\|^2).
\]

Here, $J(f)$ is the Jacobian matrix associated with $f$ and $\|\cdot\|$ is a norm on $\mathbb{R}^n$. We now use the relation $\dot{\hat{x}} = f(\hat{x})$ to obtain
\[
\dot{y} = J(f(\hat{x}))\,y + O(\|y\|^2).
\]
We wish to understand the behaviour around a hyperbolic equilibrium point $\hat{x}$. This can be determined by the behaviour of the associated linear system $\dot{y} = J(f(\hat{x}))\,y$. First, we define a hyperbolic equilibrium point.

Definition (Hyperbolic Equilibrium Point) Let $x = \hat{x}$ be an equilibrium point of $\dot{x} = f(x)$, $x \in \mathbb{R}^n$. Then $\hat{x}$ is called a hyperbolic equilibrium point if none of the eigenvalues of $J(f(\hat{x}))$ have zero real part.

We can now classify the stability of equilibrium points based on the signs of the real parts of the eigenvalues of $J(f)$.

Definition (Stability) An equilibrium point $\hat{x}$ is called a sink if all of the eigenvalues of the matrix $J(f(\hat{x}))$ have negative real part; it is called a source if all of the eigenvalues of $J(f(\hat{x}))$ have positive real part; and it is called a saddle if $J(f(\hat{x}))$ has at least one eigenvalue with positive real part and one with negative real part.

Furthermore, we are interested in a sink's type of stability.

Theorem Suppose all the eigenvalues of $J(f(\hat{x}))$ have negative real part. Then the equilibrium solution $x = \hat{x}$ of the differential equation system (2.2.1) is locally asymptotically stable.
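To make this classification procedure concrete, here is a short numerical sketch (our own illustration in Python; the 2 × 2 payoff matrix below is a hypothetical example, not one of the games studied in this thesis). It builds the Jacobian of a two-strategy replicator system (2.1.2) by finite differences and classifies an interior equilibrium by the signs of the eigenvalues.

```python
import numpy as np

# Hypothetical 2x2 payoff matrix (illustrative values only).
A = np.array([[0.0, 3.0],
              [1.0, 2.0]])

def replicator(x):
    """Replicator vector field x_i((Ax)_i - x^T A x) from (2.1.2)."""
    fitness = A @ x
    return x * (fitness - x @ fitness)

def jacobian(f, x, h=1e-6):
    """Forward-difference approximation of the Jacobian J(f)(x)."""
    n = len(x)
    J = np.zeros((n, n))
    fx = f(x)
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (f(x + e) - fx) / h
    return J

# For this A, both payoffs are equal at x = (0.5, 0.5), an interior equilibrium.
x_hat = np.array([0.5, 0.5])
print(np.linalg.eigvals(jacobian(replicator, x_hat)))
```

Both eigenvalues come out negative (approximately −0.5 and −1.5), so this mixed equilibrium is a sink and, by the theorem above, locally asymptotically stable.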

2.3 Positive Definite Matrices and Monotonicity

Some stability analysis in this thesis will make use of positive definite matrices and monotonicity.

Definition (Positive Definite Matrix) An $n \times n$ matrix $M(x)$, whose elements $m_{ij}(x)$, $i = 1, \dots, n$, $j = 1, \dots, n$, are functions defined on the set $S \subseteq \mathbb{R}^N$, is said to be positive semidefinite on $S$ if
\[
v^T M(x)v \ge 0, \qquad \forall v \in \mathbb{R}^N,\ x \in S.
\]
It is said to be positive definite on $S$ if
\[
v^T M(x)v > 0, \qquad \forall v \ne 0,\ v \in \mathbb{R}^N,\ x \in S.
\]
Positive definite matrices are closely related to inner products on vector spaces, as will be observed within this section. We now look at the concept of monotonicity of a vector function $F$ [6, 21, 28].

Definition (Monotonicity) $F(x)$ is said to be locally monotone at $\hat{x}$ if there is a neighbourhood $N(\hat{x})$ of $\hat{x}$ such that
\[
\langle F(x) - F(\hat{x}),\ x - \hat{x} \rangle \ge 0, \qquad \forall x \in N(\hat{x}).
\]
$F(x)$ is monotone at $\hat{x}$ if the above inequality holds for all $x \in K$, where $K$ is a given set, and $F(x)$ is said to be monotone if the above inequality holds for all $x, \hat{x} \in K$.

Definition (Strict Monotonicity) $F(x)$ is said to be locally strictly monotone at $\hat{x}$ if there is a neighbourhood $N(\hat{x})$ of $\hat{x}$ such that
\[
\langle F(x) - F(\hat{x}),\ x - \hat{x} \rangle > 0, \qquad \forall x \in N(\hat{x}),\ x \ne \hat{x}.
\]

$F(x)$ is strictly monotone at $\hat{x}$ if the above inequality holds for all $x \in K$, and $F(x)$ is said to be strictly monotone if the above inequality holds for all $x, \hat{x} \in K$, $x \ne \hat{x}$.

We now relate monotonicity to positive definiteness using the following theorem. The proof utilizes the Mean Value Theorem and is given by Nagurney [28].

Theorem Suppose $F(x)$ is continuously differentiable on $K$ and the Jacobian matrix
\[
J(F(x)) = \begin{pmatrix} \frac{\partial F_1}{\partial x_1} & \cdots & \frac{\partial F_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial F_n}{\partial x_1} & \cdots & \frac{\partial F_n}{\partial x_n} \end{pmatrix}
\]
is positive semidefinite (positive definite). Then $F(x)$ is monotone (strictly monotone).

Proof. For all $x^1, x^2 \in K$, let
\[
\phi(t) = \langle F(x^2 + t(x^1 - x^2)),\ x^1 - x^2 \rangle, \qquad 0 \le t \le 1. \tag{2.3.1}
\]
Then $\phi(t)$ is continuously differentiable on $[0, 1]$ and
\[
\phi(1) - \phi(0) = \langle F(x^1),\ x^1 - x^2 \rangle - \langle F(x^2),\ x^1 - x^2 \rangle = \langle F(x^1) - F(x^2),\ x^1 - x^2 \rangle. \tag{2.3.2}
\]
By the Mean Value Theorem, there exists some $\theta \in [0, 1]$ such that
\[
\phi(1) - \phi(0) = \phi'(\theta)(1 - 0) = (x^1 - x^2)^T J\big(F(x^2 + \theta(x^1 - x^2))\big)(x^1 - x^2) = (x^1 - x^2)^T J(F(x))(x^1 - x^2), \tag{2.3.3}
\]
where $x = x^2 + \theta(x^1 - x^2) \in K$.

Letting $v = x^1 - x^2$, since $J(F(x))$ is positive semidefinite, the expression in (2.3.3) must be $\ge 0$. Hence $\langle F(x^1) - F(x^2),\ x^1 - x^2 \rangle \ge 0$; that is, $F(x)$ is monotone.

2.4 Control Theory

At times we wish to influence the behaviour of a dynamical system. This can be done by introducing an input called a control, which is used to obtain a desired output of a given system. A branch of control theory, called optimal control, seeks to achieve a desired output of a system while also minimizing (or maximizing) a given cost functional. We first briefly review the calculus of variations and Hamiltonian dynamics before introducing a fundamental result of optimal control, Pontryagin's maximum principle. We closely follow the definitions given by Fleming, Knowles, and Macki [12, 22, 26].

Let $L : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, $L = L(x, v)$, be called the Lagrangian, and introduce $T > 0$ and $x_0, x_1 \in \mathbb{R}^n$. We generally seek to solve the problem of finding an $x^*(t) : [0, T] \to \mathbb{R}^n$ that minimizes
\[
I[x(t)] = \int_0^T L(x(t), \dot{x}(t))\,dt,
\]
where $x$ satisfies $x(0) = x_0$ and $x(T) = x_1$.

Theorem (Euler-Lagrange Equations) Let $x^*(t)$ solve the calculus of variations problem above. Then $x^*(t)$ solves the Euler-Lagrange differential equations
\[
\frac{d}{dt}\big[\nabla_v L(x^*(t), \dot{x}^*(t))\big] = \nabla_x L(x^*(t), \dot{x}^*(t)).
\]
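As a quick worked example of this theorem (ours, not from the thesis), take the classical mechanics Lagrangian for a particle of mass $m$ in a potential $U$:
\[
L(x, v) = \tfrac{1}{2}mv^2 - U(x), \qquad \nabla_v L = mv, \qquad \nabla_x L = -U'(x),
\]
so the Euler-Lagrange equation $\frac{d}{dt}[\nabla_v L] = \nabla_x L$ reduces to Newton's second law, $m\ddot{x}(t) = -U'(x(t))$.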

We now convert the Euler-Lagrange equations into Hamilton's equations. First, we define
\[
p(t) = \nabla_v L(x(t), \dot{x}(t)) \qquad (0 \le t \le T)
\]
for a given curve $x(t)$. We assume we can solve $p = \nabla_v L(x, v)$, $x, p \in \mathbb{R}^n$, for $v = v(x, p)$, that is, $v$ in terms of $x$ and $p$. We now introduce the dynamical systems Hamiltonian.

Definition (Dynamical Systems Hamiltonian) The dynamical systems Hamiltonian $H : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is defined by the formula
\[
H(x, p) = p \cdot v(x, p) - L(x, v(x, p)).
\]

Theorem (Hamiltonian Dynamics) Let $x(t)$ solve the Euler-Lagrange equations and define $p(t)$ as above. Then the pair $(x(t), p(t))$ solves Hamilton's equations
\[
\dot{x}(t) = \nabla_p H(x(t), p(t)), \qquad \dot{p}(t) = -\nabla_x H(x(t), p(t)).
\]

Proof. Recall that $H(x, p) = p \cdot v(x, p) - L(x, v(x, p))$, where $v = v(x, p)$, or equivalently, $p = \nabla_v L(x, v)$. Then
\[
\nabla_x H(x, p) = p\,\nabla_x v - \nabla_v L(x, v(x, p))\,\nabla_x v - \nabla_x L(x, v(x, p)) = -\nabla_x L(x, v(x, p)),
\]
using $p = \nabla_v L$. Now, $p(t) = \nabla_v L(x(t), \dot{x}(t))$ if and only if $\dot{x}(t) = v(x(t), p(t))$.

Therefore, the Euler-Lagrange equations imply
\[
\dot{p}(t) = \nabla_x L(x(t), \dot{x}(t)) = \nabla_x L(x(t), v(x(t), p(t))) = -\nabla_x H(x(t), p(t)).
\]
Also,
\[
\nabla_p H(x, p) = v(x, p) + p\,\nabla_p v - \nabla_v L\,\nabla_p v = v(x, p),
\]
using $p = \nabla_v L$. This implies $\nabla_p H(x(t), p(t)) = v(x(t), p(t))$. However, we note $p(t) = \nabla_v L(x(t), \dot{x}(t))$ and $\dot{x}(t) = v(x(t), p(t))$. Therefore, $\dot{x}(t) = \nabla_p H(x(t), p(t))$. Finally, we note additionally that
\[
\frac{d}{dt}H(x(t), p(t)) = \nabla_x H \cdot \dot{x}(t) + \nabla_p H \cdot \dot{p}(t) = \nabla_x H \cdot \nabla_p H + \nabla_p H \cdot (-\nabla_x H) = 0.
\]

Let us now state the basic problem arising in optimal control. We are given $A \subseteq \mathbb{R}^m$ and also $f : \mathbb{R}^n \times A \to \mathbb{R}^n$, $x_0 \in \mathbb{R}^n$. We denote the set of admissible controls by
\[
\mathcal{A} = \{ u(t) : [0, \infty) \to A \mid u(t) \text{ is measurable} \}.
\]

We wish to maximize (or minimize) the cost functional
\[
C[u(t)] = \Psi(x(T)) + \int_0^T r(x(t), u(t))\,dt
\]
subject to the evolution of the system
\[
\dot{x}(t) = f(x(t), u(t)), \qquad x(0) = x_0,
\]
where the terminal time $T > 0$, $r : \mathbb{R}^n \times A \to \mathbb{R}$, and the terminal payoff $\Psi : \mathbb{R}^n \to \mathbb{R}$ are given. That is, we wish to find a control $u^*(t)$ such that
\[
C[u^*(t)] = \max_{u \in \mathcal{A}} C[u(t)].
\]
Pontryagin's maximum principle states that if $u^*(t)$ is an optimal control, then there exists a function $p^*(t)$, called the costate, that satisfies a maximization principle.

Definition (Control Theory Hamiltonian) The control theory Hamiltonian is the function
\[
H(x, p, a) = f(x, a) \cdot p + r(x, a), \qquad x, p \in \mathbb{R}^n,\ a \in A.
\]
We now formally state Pontryagin's maximum principle.

Theorem (Pontryagin's Maximum Principle) Assume $u^*(t)$ is optimal for our control problem above, and $x^*(t)$ is the corresponding trajectory. Then there exists a function $p^* : [0, T] \to \mathbb{R}^n$ such that
\[
\dot{x}^*(t) = \nabla_p H(x^*(t), p^*(t), u^*(t)), \qquad \dot{p}^*(t) = -\nabla_x H(x^*(t), p^*(t), u^*(t)),
\]
and
\[
H(x^*(t), p^*(t), u^*(t)) = \max_{a \in A} H(x^*(t), p^*(t), a) \qquad (0 \le t \le T),
\]
with the terminal condition $p^*(T) = \nabla_x \Psi(x^*(T))$.

We note the similarities of $\dot{x}^*(t)$ and $\dot{p}^*(t)$ to Hamilton's equations found in the theorem on Hamiltonian dynamics above. The proof of Pontryagin's maximum principle can be found in [12, 22, 26].
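To illustrate how these conditions are used computationally, the following is a minimal forward-backward sweep (a standard numerical scheme for the Pontryagin system; this sketch and its quadratic test problem are our own illustration, not the solver used later in this thesis).

```python
import numpy as np

# Illustrative problem (ours): maximize C[u] = -∫₀¹ (x² + u²) dt subject to
# ẋ = u, x(0) = 1, with Ψ = 0. The Hamiltonian is H = u p - x² - u², so the
# pointwise maximizer is u = p/2, and ṗ = -∂H/∂x = 2x with p(T) = 0.
T, n = 1.0, 1000
dt = T / n
x, p, u = np.zeros(n + 1), np.zeros(n + 1), np.zeros(n + 1)
x[0] = 1.0

for sweep in range(200):
    for k in range(n):                    # forward: ẋ = ∂H/∂p = u
        x[k + 1] = x[k] + dt * u[k]
    p[n] = 0.0                            # transversality: p(T) = Ψ'(x(T)) = 0
    for k in range(n, 0, -1):             # backward: ṗ = 2x
        p[k - 1] = p[k] - dt * 2.0 * x[k]
    u_new = p / 2.0                       # maximize H over the control
    if np.max(np.abs(u_new - u)) < 1e-9:
        break
    u = 0.5 * (u + u_new)                 # damped update for convergence

print(u[0])  # analytic optimum: u(0) = -tanh(1) ≈ -0.7616
```

The damped iteration alternates between integrating the state forward and the costate backward until the control that maximizes the Hamiltonian stops changing.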

Chapter 3

Evolutionary Norms Game

In this chapter, we provide our analysis of Axelrod's evolutionary norms game. In Section 3.1, we review the basic layout of this game as well as introduce a pure strategy construction of it. Section 3.2 offers a dynamical systems approach to study the behaviour and stability of this game, and our analysis is finalized in Section 3.3 with ABM simulations.

3.1 Social Norms Evolutionary Game

In Axelrod's norms game, players in a population (size N) can choose to defect, and also choose to punish those they have seen defecting. Players who defect receive a temptation payoff of 3 (T = 3), but also have a chance of being seen (S). S is chosen uniformly between 0 and 1, i.e. S ~ U(0, 1), and thus has expected value ½(0 + 1) = 0.5. The players who are seen have a chance of being punished, receiving a payoff of −9 (P = −9) from each of those who see them. However, each player that chooses to punish must pay an enforcement cost of −2 (E = −2). Players that do not defect (that is, follow the norm) are hurt by all those that do, receiving a payoff of −1 (H = −1) each time.

In this game, players choose to defect or punish based on their boldness (B) and vengefulness (V), respectively. A high boldness corresponds to a high probability of a player defecting, and a high vengefulness corresponds to a high probability of a player punishing another player they have seen defecting.

In our social setting, the evolution in an evolutionary game will not be thought of as players dying and producing offspring with certain strategies. Instead, we will interpret a generation as a period of time during which players within the population play this social game with their counterparts. After this time period, players may decide to change their strategies based on the fitness other strategies have obtained. Finally, mutation will be introduced into the population each generation by allowing a 1% chance that each player's boldness and/or vengefulness level will be changed by 1/7 [2]. As Axelrod states, the evolutionary aspect of this game can simply be thought of as players who perform poorly deciding to mimic the strategies of those who have superior fitness.

We can view this as a game with four possible pure strategies. These are to follow the norm and punish (NP), follow the norm and be lenient (NL), defect and punish (DP), and defect and be lenient (DL). The payoff matrix corresponding to this game can be written as follows:

            NP              NL        DP                       DL
  NP        0               0         SE + H                   SE + H
  NL        0               0         H                        H
  DP        T/N + SP        T/N       T/N + SP + SE + H        T/N + SE + H
  DL        T/N + SP        T/N       T/N + SP + H             T/N + H

Figure 3.1: The 4 × 4 norms game

This game is in fact symmetric. We also note that defectors receive a payoff of T/N against each opponent. This reflects Axelrod's construction of his game, where instead of two players competing against each other, one player plays against the entire population. For example, in his simulations, it is possible for a defector to be punished by all other players at once. If this happens, the payoffs of this game dictate that the defector will obtain the full temptation payoff T, but also receive punishment from every other player, (N − 1)P. We choose to replace N − 1 with N, which becomes a better approximation when N is large.

3.2 Replicator Dynamics of the Norms Game

Let us model our game using the continuous replicator equations (2.1.1).

The payoffs $\pi$ corresponding to each of the four strategies are
\[
\begin{aligned}
\pi_{NP} &= H(S_{DP} + S_{DL})N + SE(S_{DP} + S_{DL})N \\
\pi_{NL} &= H(S_{DP} + S_{DL})N \\
\pi_{DP} &= T + SP(S_{NP} + S_{DP})N + SE(S_{DP} + S_{DL})N + H(S_{DP} + S_{DL})N \\
\pi_{DL} &= T + SP(S_{NP} + S_{DP})N + H(S_{DP} + S_{DL})N,
\end{aligned} \tag{3.2.1}
\]
where $S_{NP}$ is the fraction of the population that plays strategy NP, and so forth. Our differential equation system then looks as follows:
\[
\dot{S}_{NP} = S_{NP}[\pi_{NP} - \bar{\pi}], \quad
\dot{S}_{NL} = S_{NL}[\pi_{NL} - \bar{\pi}], \quad
\dot{S}_{DP} = S_{DP}[\pi_{DP} - \bar{\pi}], \quad
\dot{S}_{DL} = S_{DL}[\pi_{DL} - \bar{\pi}], \tag{3.2.2}
\]
with $\bar{\pi}$ given in (2.1.1). Also, we note that $S_{NP} + S_{NL} + S_{DP} + S_{DL} = 1$, with each fraction taking values between 0 and 1. In order to remain consistent with Axelrod [2], we wish to convert these variables into terms of vengefulness and boldness. Thus, we will say $S_{NP} + S_{DP} = V$ and $S_{DP} + S_{DL} = B$. Moreover, we apply the fact that all fractions sum to 1 and obtain the relation $S_{NL} = 1 - V - B + S_{DP}$. Our differential equation system (3.2.2) transforms into
\[
\begin{aligned}
\dot{B} &= S_{DP}BSEN + BT + BVSPN - B^2VSEN - B^2T - B^2VSPN \\
\dot{V} &= VBSEN - V^2BSEN - VBT - V^2BSPN + S_{DP}T + S_{DP}VSPN \\
\dot{S}_{DP} &= S_{DP}\big(T + SPNV + SENB - VBSEN - BT - BVSPN\big)
\end{aligned} \tag{3.2.3}
\]

and our constraints now become
\[
B + V - S_{DP} \le 1, \qquad 0 \le S_{DP} \le B \le 1, \qquad 0 \le S_{DP} \le V \le 1, \tag{3.2.4}
\]
so that all four strategy fractions remain between 0 and 1.

3.2.1 Stability of the Norms Game Equilibria

We will analyze (3.2.3) in terms of stability, as reviewed in Chapter 2. The equilibrium points associated with this system that lie in the feasible region (3.2.4) are

i) $B = 0$, $V = V$, $S_{DP} = 0$

ii) $B = 1$, $V = 0$, $S_{DP} = 0$

iii) $B = 1$, $V = 1$, $S_{DP} = 1$

iv) $B = \dfrac{SPN + T}{SN(E + P)}$, $V = \dfrac{SEN - T}{SN(E + P)}$, $S_{DP} = 0$

v) $B = V = S_{DP} = \dfrac{-T}{SN(E + P)}$

Figure 3.2: Vector field of our system (3.2.3) within the feasible region (3.2.4). The system will evolve to a state where we see no hypocrites, that is, players using strategy DP.

To compute a linearization around these points, we calculate the Jacobian with respect to (3.2.3),
\[
J = \begin{pmatrix}
\frac{\partial \dot{B}}{\partial B} & \frac{\partial \dot{B}}{\partial V} & \frac{\partial \dot{B}}{\partial S_{DP}} \\[4pt]
\frac{\partial \dot{V}}{\partial B} & \frac{\partial \dot{V}}{\partial V} & \frac{\partial \dot{V}}{\partial S_{DP}} \\[4pt]
\frac{\partial \dot{S}_{DP}}{\partial B} & \frac{\partial \dot{S}_{DP}}{\partial V} & \frac{\partial \dot{S}_{DP}}{\partial S_{DP}}
\end{pmatrix}
\]

where
\[
\begin{aligned}
\frac{\partial \dot{B}}{\partial B} &= S_{DP}SEN + T + SPNV - 2VSENB - 2BT - 2BSPNV \\
\frac{\partial \dot{B}}{\partial V} &= BSPN - SENB^2 - B^2SPN \\
\frac{\partial \dot{B}}{\partial S_{DP}} &= SENB \\
\frac{\partial \dot{V}}{\partial B} &= VSEN - V^2SEN - VT - SPNV^2 \\
\frac{\partial \dot{V}}{\partial V} &= SENB - 2VSENB - BT - 2BSPNV + S_{DP}SPN \\
\frac{\partial \dot{V}}{\partial S_{DP}} &= T + SPNV \\
\frac{\partial \dot{S}_{DP}}{\partial B} &= S_{DP}(SEN - VSEN - T - SPNV) \\
\frac{\partial \dot{S}_{DP}}{\partial V} &= S_{DP}(SPN - SENB - BSPN) \\
\frac{\partial \dot{S}_{DP}}{\partial S_{DP}} &= T + SPNV + SENB - VSENB - BT - BSPNV.
\end{aligned}
\]
Given this $J$, a linearization about point (i) gives an eigenvalue equal to zero, which means we cannot directly determine the stability of this point using this method. Instead, we will choose a Lyapunov function candidate,
\[
V_L(X) = \tfrac{1}{2}\|X - \hat{X}\|^2,
\]
where $X(t) = (B(t), V(t), S_{DP}(t))^T$ and $\hat{X}$ is our equilibrium point (i). We clearly see that if $X(t) = (0, V, 0)^T$ then $X - \hat{X} = 0$ and thus $V_L(\hat{X}) = 0$. Moreover, $V_L(X) > 0$ for $X \ne \hat{X}$, because the norm assigns a positive length to the vector $X - \hat{X}$, and this length equals zero only when $X = \hat{X}$. Thus, $V_L$ is a Lyapunov function.

We now wish to find $\frac{d}{dt}V_L = \dot{V}_L$. That is,
\[
\frac{d}{dt}\Big(\tfrac{1}{2}\|X - \hat{X}\|^2\Big) = \Big\langle X - \hat{X},\ \tfrac{d}{dt}(X - \hat{X}) \Big\rangle.
\]
We see that $\frac{d}{dt}(X - \hat{X})$ is equal to the right-hand side of (3.2.3), which we denote $F(X)$, and thus obtain $\langle X - \hat{X},\ F(X) \rangle$. Recall that for point (i) to be stable, this must be $\le 0$. Assume $-F$ is strictly monotone on a neighbourhood $N(\hat{X})$ around $\hat{X}$. Then
\[
\langle -F(X) + F(\hat{X}),\ X - \hat{X} \rangle > 0, \qquad \forall X \ne \hat{X}.
\]
That is, $\langle -F(X),\ X - \hat{X} \rangle > \langle -F(\hat{X}),\ X - \hat{X} \rangle$, or equivalently, $\langle F(X),\ X - \hat{X} \rangle < \langle F(\hat{X}),\ X - \hat{X} \rangle$. Note that if $\langle F(\hat{X}),\ X - \hat{X} \rangle \le 0$, then $\langle F(X),\ X - \hat{X} \rangle < 0$. We see that $F(\hat{X}) = 0$, and so
\[
\langle F(X),\ X - \hat{X} \rangle < 0, \qquad \forall X \in N(\hat{X}).
\]
If $F$ is $C^1$, then $-F$ is strictly monotone at $\hat{X}$ if $J(-F)$ is positive definite on the neighbourhood around $\hat{X}$; that is,
\[
X^T J(-F(\hat{X}))X > 0. \tag{3.2.5}
\]

We first compute the Jacobian matrix $J(-F(\hat{X}))$ at point (i), using Axelrod's parameter values ($T = 3$, $P = -9$, $E = -2$, $S = 0.5$):
\[
J(-F(\hat{X})) = \begin{pmatrix}
4.5NV - 3 & 0 & 0 \\
NV - 5.5V^2N + 3V & 0 & 4.5NV - 3 \\
0 & 0 & 4.5NV - 3
\end{pmatrix}.
\]
This matrix is positive definite for $V \in [0, 1]$ and $N > 0$ if the eigenvalues of its symmetric part are greater than 0. The symmetric part of this matrix is
\[
J(-F(\hat{X}))_S = \frac{1}{2}\Big(J(-F(\hat{X})) + J(-F(\hat{X}))^T\Big) = \begin{pmatrix}
4.5NV - 3 & \tfrac{1}{2}(NV - 5.5V^2N + 3V) & 0 \\
\tfrac{1}{2}(NV - 5.5V^2N + 3V) & 0 & \tfrac{1}{2}(4.5NV - 3) \\
0 & \tfrac{1}{2}(4.5NV - 3) & 4.5NV - 3
\end{pmatrix}.
\]
The eigenvalues of this matrix are $\lambda_1 = 4.5NV - 3$, together with a pair $\lambda_2$, $\lambda_3$ given by lengthy radical expressions in $N$ and $V$. Maple 15 indicates that there is no $V \in [0, 1]$ and $N > 0$ that allows $\lambda_1, \lambda_2, \lambda_3 \ge 0$. Thus, $J(-F)$ is not positive definite or positive semi-definite on the neighbourhood around $\hat{X}$. Therefore, $-F$ is not strictly monotone at $\hat{X}$, and we conclude that equilibrium point (i) is not stable.
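As a numerical cross-check of these conclusions, we can integrate the four-strategy replicator system built directly from the payoffs (3.2.1). The sketch below is our own illustration in Python (the thesis's simulations themselves were written in Java); it uses Axelrod's parameter values, N = 20, the expected seeing probability S = 0.5, and an illustrative initial mix that is bold but weakly vengeful.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Axelrod's payoff parameters; S = E[U(0,1)] = 0.5 is the expected seeing chance.
T, H, P, E, S, N = 3.0, -1.0, -9.0, -2.0, 0.5, 20

def rhs(t, x):
    """Replicator dynamics for the norms game, x = (NP, NL, DP, DL)."""
    B = x[2] + x[3]   # boldness: fraction of defectors (DP + DL)
    V = x[0] + x[2]   # vengefulness: fraction of punishers (NP + DP)
    pi = np.array([
        H*B*N + S*E*B*N,                  # NP: hurt by defectors, pays enforcement
        H*B*N,                            # NL: hurt by defectors only
        T + S*P*V*N + S*E*B*N + H*B*N,    # DP: tempted, punished, pays enforcement
        T + S*P*V*N + H*B*N,              # DL: tempted and punished
    ])
    return x * (pi - x @ pi)

x0 = [0.05, 0.05, 0.05, 0.85]             # bold, weakly vengeful start (illustrative)
sol = solve_ivp(rhs, (0.0, 20.0), x0, rtol=1e-8)
print(np.round(sol.y[:, -1], 3))          # -> [0. 0. 0. 1.]
```

Consistent with the analysis above, the hypocrites (DP) die out and the population converges to the all-DL state; that is, the norm collapses.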

We find that the only locally asymptotically stable equilibrium point (all eigenvalues $< 0$) is (ii). This directly corresponds to the population playing strategy DL. By the folk theorem, we say that this state is a Nash equilibrium. Given the theorem relating ESSs to asymptotic stability and the fact that (ii) is a Nash equilibrium, a population playing the strategy DL is a potential ESS. We will check this using our payoff matrix and the definition of an ESS. In this case, $x^* = (0, 0, 0, 1)$. Moreover,
\[
\pi(x^*, x^*) = (x^*)^T A x^* = \frac{T}{N} + H.
\]
We consider a strategy $y$, with $y \in S_4$ and $y \ne x^*$. That is, we say $y = (y_1, y_2, y_3, y_4)^T$ with $y_1 + y_2 + y_3 + y_4 = 1$. Now,
\[
\pi(y, x^*) = y^T A x^* = (y_1, y_2, y_3, y_4)\begin{pmatrix} SE + H \\ H \\ \frac{T}{N} + SE + H \\ \frac{T}{N} + H \end{pmatrix}
= y_1(SE + H) + y_2 H + y_3\Big(\frac{T}{N} + SE + H\Big) + y_4\Big(\frac{T}{N} + H\Big),
\]
which simplifies to $H + SE(y_1 + y_3) + \frac{T}{N}(y_3 + y_4)$. We now compare $\pi(x^*, x^*)$ with $\pi(y, x^*)$ by inspecting the following inequality:
\[
\frac{T}{N} + H \ \ge\ H + SE(y_1 + y_3) + \frac{T}{N}(y_3 + y_4).
\]

This is stated equivalently as
\[
\frac{T}{N}\big(1 - (y_3 + y_4)\big) \ \ge\ SE(y_1 + y_3). \tag{3.2.6}
\]
We note that $1 - y_3 - y_4 = y_1 + y_2$; since the left side is then non-negative while $SE < 0$, we see that (3.2.6) always holds. If $y_1 = y_2 = y_3 = 0$ and $y_4 = 1$, then $\frac{T}{N}(1 - y_3 - y_4) = SE(y_1 + y_3) = 0$; however, this $y$ is equivalent to $x^*$. Therefore, (3.2.6) is a strict inequality except when $y = x^*$. Thus, we conclude that the population playing strategy $x^*$, or DL, is evolutionarily stable.

Figure 3.3: Vector field of our system with $S_{DP} = 0$ and $N = 20$.

3.3 ABM Simulations of the Norms Game

3.3.1 Axelrod's Simulation of the Norms Game

Axelrod [2] originally constructed the simulation of his norms game to determine how the players' strategies evolve over time. In his simulation, each player is assigned a vengefulness and boldness level, which can take on the values 0/7, 1/7, ..., 7/7.

Intuitively, a boldness level of 0/7 means a player will never defect; conversely, a boldness level of 7/7 implies that a player will always defect. The same is true for the vengefulness level, except a player will be either punishing or not punishing accordingly.

Axelrod initialized his simulation with twenty players, with their strategies (boldness and vengefulness levels) chosen at random. In one generation, a player is given four opportunities to defect. A player will choose to defect if their boldness level (B) is higher than a number chosen at random between 0 and 1. If a player does indeed choose to defect, every other player in the population has a chance to see and punish them for their defection. The chance of being seen (S) is also a number chosen at random between 0 and 1. All other players in the population subsequently draw their own random number in an identical way, and if it is smaller than the S chosen earlier, that player has seen the original defector. If a certain player does indeed see a defection, they draw yet another number between 0 and 1; if this number is smaller than their vengefulness (V), this player chooses to punish the defector.

Once a generation is completed, successful strategies are replicated and unsuccessful ones are eliminated from the population. Strategies that have a fitness one standard deviation above the population average are duplicated, while strategies with fitness one standard deviation below the average are removed. Axelrod states that the population size is kept constant, but he does not state how he does this. Galan and Izquierdo [14] chose to resolve this by randomly duplicating or removing individuals in the population until they reached their original population size. This is also the method used in this thesis. Finally, after the strategies have either been duplicated or removed, mutation is introduced into the population.

This is done by allowing a 1% chance that each player's vengefulness and each player's boldness is altered by 1/7. Axelrod's original simulations proceeded with these steps for 100 generations. After the simulation is complete, the population's final average vengefulness and boldness is plotted.

This thesis recreates Axelrod's simulations using Java. We alter his original simulation figures by instead plotting the population state after each generation. By doing this, we are able to more clearly see the evolution of the players' strategies. We also allow the simulation to run for more than 100 generations when necessary in order to observe the population state move towards an equilibrium (Figures 3.4 and 3.5). In our figures of the Axelrod type simulations, each colour represents a separate simulation run.

An interesting result from Axelrod's simulations can be observed in Figure 3.5. When using a larger population size (N > 25), the system tends to evolve towards a state where the average boldness is less than 1. This behaviour contradicts that of the pure strategies analytical and simulation results from Sections 3.2.1 and 3.3.2, respectively. These varying results are due to the selection process Axelrod uses: the population will remain in certain states when each agent's fitness falls within ±1 standard deviation of the average, and during these times only mutation can alter the agents' strategies.

3.3.2 Pure Strategy Simulation of the Norms Game

We will now run simulations that are based on this thesis's construction of the norms game. That is, a player can choose one of four possible pure strategies to play. In these simulations, players are replicated and removed in the same way as in the previous Axelrod type simulations, but mutation is handled differently. With this construction of the game, we allow a 1% chance that a player will have their strategy changed to one of the three strategies not currently being used by that player.
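The following is a compressed sketch of one generation of this pure-strategy procedure (our own Python rendering; the thesis's actual simulations were written in Java, and details such as the handling of ties in the selection step are our guesses).

```python
import random

T, H, P, E = 3, -1, -9, -2
STRATS = ["NP", "NL", "DP", "DL"]   # (follow/defect) x (punish/lenient)

def generation(pop):
    fit = [0.0] * len(pop)
    for i, s in enumerate(pop):
        if s.startswith("D"):                       # pure defectors always defect
            for _ in range(4):                      # four defection opportunities
                fit[i] += T
                seen = random.random()              # S ~ U(0,1) for this defection
                for j, t in enumerate(pop):
                    if j == i:
                        continue
                    fit[j] += H                     # every other player is hurt
                    if t.endswith("P") and random.random() < seen:
                        fit[i] += P                 # punished by each observer
                        fit[j] += E                 # observer pays enforcement cost
    # selection: duplicate strategies above mean+std, drop those below mean-std
    mean = sum(fit) / len(fit)
    std = (sum((f - mean) ** 2 for f in fit) / len(fit)) ** 0.5
    new = [s for s, f in zip(pop, fit) if f >= mean - std]
    new += [s for s, f in zip(pop, fit) if f > mean + std]
    while len(new) < len(pop):                      # restore constant population size
        new.append(random.choice(new))
    while len(new) > len(pop):
        new.pop(random.randrange(len(new)))
    # mutation: 1% chance of switching to one of the other three strategies
    return [random.choice([t for t in STRATS if t != s])
            if random.random() < 0.01 else s for s in new]

pop = [random.choice(STRATS) for _ in range(20)]
for g in range(100):
    pop = generation(pop)
print({s: pop.count(s) for s in STRATS})
```

A typical run ends with the population concentrated on DL, mirroring the norm collapse seen in the figures below.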

Figure 3.4: Replicating Axelrod's norms game simulation with a population size of 20. Looking down on the V-B plane, each of the five simulation runs is tracked by plotting the population's average vengefulness and boldness at the end of each generation for 100 generations. The runs began with initial average vengefulness and boldness of 0.1 and 0.9, 0.1 and 0.5, 0.2 and 0.2, 0.65 and 0.65, and 1.0 and 1.0. The simulation starting with an average vengefulness and boldness of 1.0 and 1.0 was allowed to run for 500 generations in order to observe the full behaviour towards an equilibrium.

Aside from the behaviour mentioned at the end of Section 3.3.1, the pure strategy simulations (Figures 3.6 and 3.7) and the Axelrod type simulations produce similar results. An interesting difference, however, is the fact that the pure strategies simulation did not require the extra generations in simulation 5 to reach the equilibrium of the population playing either strategy NL or NP. This could be because the pure strategy version of selection and mutation allows the system to evolve at a faster rate.

There are two noteworthy aspects produced from the simulations that have not been addressed thus far. First is the possibility of vastly different end behaviours after 500 generations stemming from an initial state of full average vengefulness and boldness (the entire population playing DP in the pure strategies version). By referring to Figure 3.2, this is explained by noting how the system's long term behaviour appears to be dependent on mutations and other stochastic elements when the population starts with larger fractions of players playing strategy DP.

Figure 3.5: Replicating Axelrod's norms game simulation with a population size of 50. Each of the five runs has the same initial starting values as Figure 3.4. All simulations ran for 100 generations.

We illustrate this in Figure 3.8 by using Axelrod's method, as it is much more suited to displaying multiple simulation runs in one plot.

Second is the evolution of a population from a state with a mix of players playing strategies NP and NL to the population solely adopting strategy DL. We recall that in the norms game, a state in which the population plays any strategy except DL is not stable. Thus, after a sufficient amount of time, the norm will always collapse. Figure 3.9 illustrates the evolution of the strategies in the pure strategies version of the norms game, starting with an initial mix of strategies being played. We see that after being present in early generations, strategy NP diminishes from the population, and eventually strategy DL is fully adopted by the players. We note that this equilibrium shift does not always occur after the same amount of time.

Figure 3.6: Pure strategies simulation of the norms game. Initial fractions of the population playing strategies NL, NP, DL, DP are 0.0, 0.1, 0.9, 0.0; 0.4, 0.1, 0.5, 0.0; 0.6, 0.2, 0.2, 0.0; 0.0, 0.35, 0.35, 0.3; and 0.0, 0.0, 0.0, 1.0, read left to right, top to bottom. All simulations ran for 100 generations with a population size of 20.

3.4 Chapter Summary

We have re-analyzed Axelrod's norms game [2] using a pure strategy approach and compared our analytical results with ABM simulations. We have found from both forms of analysis that one ESS exists in this game, corresponding to a complete norm collapse (all players choosing to defect). We also note that an equilibrium in which the boldness of the population is 0, or in which there is a mix of strategies NL and NP being played, is unstable and will ultimately lead to a norm collapse over time. We finally observe that an Axelrod type simulation with a larger population shows behaviour near the ESS that does not agree with our analytical results or our pure strategy ABMs. For example, Figure 3.5 shows persistent states that correspond with mixtures of strategies DL and NL being played in the pure strategies form of the norms game. We find that this is due to the evolutionary selection process used in these ABMs.

Figure 3.7: Pure strategies simulation of the norms game. Initial fractions of the population playing particular strategies correspond to the values used in Figure 3.6. All simulations ran for 100 generations with a population size of 50.

Figure 3.8: Axelrod type simulation of the norms game with N = 20. This example shows the possibility of a population evolving to much different states over the same generation span (500) when starting with an average vengefulness and boldness of 1. We note that all simulations will still collapse to B = 1, V = 0, given enough time.

Figure 3.9: Pure strategies simulation of the norms game with N = 20. This shows an example of a population transitioning from norm followers to lenient defectors. We also show the same simulation transformed into the vengefulness and boldness coordinates on the right.

Chapter 4

Evolutionary Metanorms Game

We now consider Axelrod's second game, his so-called metanorms game. In this version, players have similar strategies and payoffs as in the norms game, but now also have the opportunity to punish a non-punisher, provided that they are seen not punishing. Axelrod makes the assumption that a player who will punish a defector will also punish a non-punisher. Similarly, a player who is lenient towards defectors will also be lenient towards non-punishers.

Section 4.1 gives our pure strategy construction of Axelrod's metanorms game. Here, we again discuss the dynamics and stability of this game using methods from differential equations. We conclude the chapter's analysis in Section 4.2 by complementing our results with ABM simulations.

4.1 Replicator Dynamics of the Metanorms Game

The payoffs for the metanorms game are as follows. The punishment a player receives for being lenient towards defectors is denoted $P'$, and the enforcement cost associated with punishing non-punishers is $E'$. In our case, we will use $P' = P$ and $E' = E$.

The symmetric game matrix looks as follows (Figure 4.1):

            NP                   NL              DP                              DL
  NP        0                    SE'             H + SE                          H + SE + SE'
  NL        SP'                  0               H + SP'                         H
  DP        T/N + SP             T/N + SE'       T/N + SP + SE + H               T/N + SE + SE' + H
  DL        T/N + SP + SP'       T/N             T/N + SP + SP' + H              T/N + H

Figure 4.1: The 4 × 4 metanorms game

The modified payoffs for this game are then
\[
\begin{aligned}
\pi_{NP} &= H(S_{DP} + S_{DL})N + SE(S_{DP} + S_{DL})N + SE'(S_{NL} + S_{DL})N \\
\pi_{NL} &= H(S_{DP} + S_{DL})N + SP'(S_{NP} + S_{DP})N \\
\pi_{DP} &= T + SP(S_{NP} + S_{DP})N + SE'(S_{NL} + S_{DL})N + SE(S_{DP} + S_{DL})N + H(S_{DP} + S_{DL})N \\
\pi_{DL} &= T + SP(S_{NP} + S_{DP})N + SP'(S_{NP} + S_{DP})N + H(S_{DP} + S_{DL})N.
\end{aligned} \tag{4.1.1}
\]
Using the same process and change of variables as in the regular norms game, we obtain the same feasible region (3.2.4), and our differential equation system in terms of $V$, $B$, and $S_{DP}$ becomes

\[
\begin{aligned}
\dot{B} ={}& S_{DP}SEN - VBSEN + S_{DP}BSEN - S_{DP}VSEN + BVSPN - S_{DP}VSPN + V^2BSPN \\
&+ V^2BSEN - B^2VSEN - B^2VSPN + BT - B^2T \\
\dot{V} ={}& -SPNV^2 + SPNV^3 + SENV^3 + SENV - 2SENV^2 - VBT + VBSEN + S_{DP}VSPN \\
&- V^2BSPN - V^2BSEN + S_{DP}T \\
\dot{S}_{DP} ={}& S_{DP}\big(SEN + T + SPNV^2 + SENB - 2SENV + SENV^2 - VBSEN - BVSPN - BT\big).
\end{aligned} \tag{4.1.2}
\]

4.1.1 Stability of the Metanorms Game Equilibria

The equilibrium points associated with our metanorms system that also fall within the feasible region (3.2.4) are

i) $B = 0$, $V = 0$, $S_{DP} = 0$

ii) $B = 1$, $V = 0$, $S_{DP} = 0$

iii) $B = 0$, $V = 1$, $S_{DP} = 0$

iv) $B = 1$, $V = 1$, $S_{DP} = 1$

v) $B = 0$, $V = \dfrac{E}{P + E}$, $S_{DP} = 0$

vi) $B = \dfrac{2SPN + T}{2SN(P + E)}$, $V = \dfrac{2SEN - T}{2SN(P + E)}$, $S_{DP} = 0$

Figure 4.2: Vector field of our system (4.1.2) within the feasible region (3.2.4). We see that the system again evolves to a state where there are no hypocrites. Also, there is a strong attraction to the point $V = 1$, $B = 0$, $S_{DP} = 0$.

Similarly to the norms game, we calculate the Jacobian with respect to (4.1.2):
\[
\frac{\partial \dot{B}}{\partial B} = -VSEN + S_{DP}SEN + SPNV + SPNV^2 + SENV^2 - 2VSENB - 2BSPNV + T - 2BT
\]

\[
\begin{aligned}
\frac{\partial \dot{B}}{\partial V} &= -BSEN - S_{DP}SEN + BSPN - S_{DP}SPN + 2BSPNV + 2VSENB - SENB^2 - B^2SPN \\
\frac{\partial \dot{B}}{\partial S_{DP}} &= SEN + BSEN - VSEN - SPNV \\
\frac{\partial \dot{V}}{\partial B} &= -VT + VSEN - SPNV^2 - SENV^2 \\
\frac{\partial \dot{V}}{\partial V} &= -2SPNV + 3SPNV^2 + 3SENV^2 + SEN - 4VSEN - BT + BSEN + S_{DP}SPN - 2BSPNV - 2VSENB \\
\frac{\partial \dot{V}}{\partial S_{DP}} &= T + SPNV \\
\frac{\partial \dot{S}_{DP}}{\partial B} &= S_{DP}(SEN - VSEN - SPNV - T) \\
\frac{\partial \dot{S}_{DP}}{\partial V} &= S_{DP}(2SPNV - 2SEN + 2VSEN - BSEN - BSPN) \\
\frac{\partial \dot{S}_{DP}}{\partial S_{DP}} &= SEN + T + SPNV^2 + BSEN - 2VSEN + SENV^2 - VSENB - BSPNV - BT.
\end{aligned}
\]
In this game, the system shows bistability, with two locally asymptotically stable equilibrium points. These correspond to points (ii) and (iii). By the folk theorem, both of these strategies are Nash equilibria, and we will use the game's payoff matrix to determine whether they are also ESS.

We first inspect point (ii), which is a repeat from the original norms game. Again, $x^* = (0, 0, 0, 1)$ and
\[
\pi(x^*, x^*) = (x^*)^T B x^* = \frac{T}{N} + H.
\]

The strategy $y = (y_1, y_2, y_3, y_4)^T$ results in the expected payoff
\[
\pi(y, x^*) = y^T B x^* = (y_1, y_2, y_3, y_4)\begin{pmatrix} 2SE + H \\ H \\ \frac{T}{N} + 2SE + H \\ \frac{T}{N} + H \end{pmatrix}
= y_1(2SE + H) + y_2 H + y_3\Big(\frac{T}{N} + 2SE + H\Big) + y_4\Big(\frac{T}{N} + H\Big).
\]
This simplifies to $H + 2SE(y_1 + y_3) + \frac{T}{N}(y_3 + y_4)$. Next, we compare $\pi(x^*, x^*)$ with $\pi(y, x^*)$ as before by inspecting the inequality
\[
\frac{T}{N} + H \ \ge\ H + 2SE(y_1 + y_3) + \frac{T}{N}(y_3 + y_4),
\]
which is equivalent to
\[
\frac{T}{N}\big(1 - (y_3 + y_4)\big) \ \ge\ 2SE(y_1 + y_3). \tag{4.1.3}
\]
As with the norms game, we see that $1 - y_3 - y_4 = y_1 + y_2$. Similarly, the inequality (4.1.3) always holds; moreover, it is a strict inequality except when $y = x^*$. Therefore $x^*$, or DL, is an ESS in the metanorms game as well.

Next, we consider point (iii). This corresponds to each player in the population playing strategy NP. Here, $x^* = (1, 0, 0, 0)$ and
\[
\pi(x^*, x^*) = (x^*)^T B x^* = 0.
\]

With strategy $y$, we now have
\[
\pi(y, x^*) = y^T B x^* = (y_1, y_2, y_3, y_4)\begin{pmatrix} 0 \\ SP \\ \frac{T}{N} + SP \\ \frac{T}{N} + 2SP \end{pmatrix}
= y_2(SP) + y_3\Big(\frac{T}{N} + SP\Big) + y_4\Big(\frac{T}{N} + 2SP\Big).
\]
We simplify this and obtain $SP(y_2 + y_3 + 2y_4) + \frac{T}{N}(y_3 + y_4)$. Again, our final step is to compare $\pi(x^*, x^*)$ with $\pi(y, x^*)$ using the inequality
\[
0 \ \ge\ SP(y_2 + y_3 + 2y_4) + \frac{T}{N}(y_3 + y_4).
\]
Rearranging, we have
\[
\frac{T}{N}(y_3 + y_4) \ \le\ -SP(y_2 + y_3 + 2y_4). \tag{4.1.4}
\]
Given Axelrod's parameter values, we observe that $\frac{T}{N}$ is in fact smaller than $-SP$. Also, noting the fact that $y_2, y_3, y_4 \ge 0$, we conclude that inequality (4.1.4) is always strict unless $y = (1, 0, 0, 0) = x^*$. Therefore, the population playing strategy NP is also an ESS in the metanorms game.
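The bistability can be checked numerically by integrating the four-strategy replicator system built from the payoffs (4.1.1) with $P' = P$ and $E' = E$ (a sketch of our own in Python, using Axelrod's parameter values and N = 20; the two initial mixes are illustrative).

```python
import numpy as np
from scipy.integrate import solve_ivp

T, H, P, E, S, N = 3.0, -1.0, -9.0, -2.0, 0.5, 20   # with P' = P and E' = E

def rhs(t, x):
    """Replicator dynamics for the metanorms game, x = (NP, NL, DP, DL)."""
    B = x[2] + x[3]    # defectors
    V = x[0] + x[2]    # punishers
    L = 1.0 - V        # lenient players (NL + DL)
    pi = np.array([
        H*B*N + S*E*B*N + S*E*L*N,                  # NP: also pays meta-enforcement
        H*B*N + S*P*V*N,                            # NL: meta-punished for leniency
        T + S*P*V*N + S*E*L*N + S*E*B*N + H*B*N,    # DP
        T + 2*S*P*V*N + H*B*N,                      # DL: punished on both counts
    ])
    return x * (pi - x @ pi)

for x0 in ([0.5, 0.2, 0.0, 0.3],      # vengeful start -> norm established
            [0.05, 0.75, 0.0, 0.2]):  # timid start    -> norm collapse
    sol = solve_ivp(rhs, (0.0, 20.0), x0, rtol=1e-8)
    print(np.round(sol.y[:, -1], 3))  # -> [1. 0. 0. 0.] and [0. 0. 0. 1.]
```

The first initial condition converges to the all-NP state (point (iii)) and the second to the all-DL state (point (ii)), matching the two ESSs found above.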

Figure 4.3: Vector field of our metanorms system with $S_{DP} = 0$ and $N = 20$.

4.2 Simulation of the Metanorms Game

4.2.1 Axelrod's Simulation of the Metanorms Game

Axelrod's potential solution to prevent norm collapse was to introduce the metanorms game. Figures 4.4 and 4.5 show the results of his method of simulation with population sizes of 20 and 25, respectively. We see that three of the five runs with the same initial conditions used in our simulations of the norms game result in the norm being enforced. The most notable difference is the initial condition of a population average vengefulness and boldness of 0.2 for N = 20, and average vengefulness of 0.18 for N = 25. In the metanorms simulation, similar initial population states, depending on N, can in fact evolve to either a norm following or a defecting player base. We can find such initial states from the replicator dynamics (4.1.2), and we can also deduce from the vector field shown in Figure 4.3 (the N = 20 case) that both possibilities are likely, given a favourable mutation in an early generation.

We also observe that the norm will still collapse when the population's initial vengefulness is not high enough, which agrees with Axelrod's claim.

Figure 4.4: Replicating Axelrod's metanorms simulation with a population size of 20. Looking down on the V-B plane, each of the five simulation runs is tracked by plotting the population's average vengefulness and boldness at the end of each generation for 100 generations. The runs began with initial average vengefulness and boldness of 0.1 and 0.9, 0.1 and 0.5, 0.2 and 0.2, 0.65 and 0.65, and 1.0 and 1.0. The simulation starting with an average vengefulness and boldness of 1.0 and 1.0 was allowed to run for 300 generations in order to observe it reaching an equilibrium.

4.2.2 Pure Strategy Simulation of the Metanorms Game

The pure strategies simulation approach shows results similar to the Axelrod type simulations (Figures 4.6 and 4.7). We now also note, as an example, that using (4.1.2) reveals that the critical value of a population's average vengefulness (players playing strategy NP), in the absence of hypocrites (players playing strategy DP), for a norm to be enforced when N = 20 ranges from 0.181 to 0.195, depending on the initial boldness of the population.
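These two endpoints agree (in our reading) with the $V$-coordinates of the boundary equilibrium (v) and the interior equilibrium (vi) found in Section 4.1.1; with Axelrod's parameter values they can be evaluated directly:

```python
# V-coordinates of metanorms equilibria (v) and (vi) at Axelrod's parameters.
T, P, E, S, N = 3.0, -9.0, -2.0, 0.5, 20
v_low = E / (P + E)                           # equilibrium (v): B = 0 endpoint
v_high = (2*S*E*N - T) / (2*S*N*(P + E))      # equilibrium (vi)
print(round(v_low, 4), round(v_high, 4))      # -> 0.1818 0.1955
```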

Figure 4.5: Replicating Axelrod's metanorms simulation with a population size of 25. Each of the five runs has the same initial starting values as Figure 4.4, with the exception of an average vengefulness and boldness of 0.18 and 0.2. All simulations ran for 100 generations.

4.3 Chapter Summary

Axelrod's metanorms game has two ESSs. We see that, given certain initial conditions, the population of players can evolve to a state in which the norm collapses, or to a state in which the norm is established. Our vector field based on the replicator dynamics of the system has proven to be an accurate predictor of the selection process in the population. This is evidenced by our ABM simulations evolving to different states stemming from the same initial condition, as seen in Figures 4.4 and 4.5.

Figure 4.6: Pure strategies simulation of the metanorms game. Initial fractions of the population playing particular strategies correspond to the values used in Figures 4.4 and 4.5, read left to right, top to bottom. All simulations ran for 100 generations with a population size of 20.

Figure 4.7: Pure strategies simulation of the metanorms game. Initial fractions of the population playing particular strategies correspond to the values used in Figure 4.6. All simulations ran for 100 generations with a population size of 25.

Chapter 5

Utilizing Controls to Establish a Norm

Let us now consider the problem of establishing a norm when the majority of a population does not initially follow it. We noted earlier that, in his work, Axelrod states that a norm in the metanorms game will collapse if the population is not initialized with a sufficiently high vengefulness [2]. Thus, to establish a norm in this case, we must transition the population's strategy distribution from one equilibrium to another.

In this chapter, we explore the effect of introducing a control in a game to act as an incentive for players to play a desired strategy. We begin in Section 5.1 by introducing our controlled norms game. First, we identify all evolutionarily stable strategies associated with this game. Second, we propose an optimal control function that transitions the population from one evolutionarily stable state to another. Section 5.2 contains our spatial and non-spatial ABM simulations for this game.

5.1 Controlled Norms Game

We construct a relatively simple game in which players can choose whether or not to follow some norm (Figure 5.1). Unlike Axelrod's games, we only allow for two possible strategies to be played. We will, however, maintain the similarity that each player will play against every other player in the population each round. The central idea of this game is that if a player decides to follow the norm, they will receive some negative payoff, H, for going out of their way to do so. However, if they meet another player who has also chosen to follow the norm, they will receive an additional positive payoff, G. The reasoning behind this is that a player will feel a certain amount of contentment, or gratification, when they see others taking the same initiative as themselves [43]. If a player instead chooses to be apathetic towards the norm, they will simply receive no payoff. One can see that as the number of norm followers rises, the cost of following the norm is consequently offset. Finally, we wish to introduce a control to this game. This is done by rewarding players who choose to follow the norm with some positive payoff dictated by the value of a control, u.

                          Player 2
                     F              A
  Player 1   F   G + H + u        H + u
             A       0              0

Figure 5.1: 2 × 2 controlled norms game. A player can choose to follow the norm (strategy F), or be apathetic towards the norm (strategy A).
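To make the payoff structure concrete, the following minimal Python sketch encodes a single pairwise interaction of Figure 5.1; the function name is ours, and the example values G = 0.1, H = −1 anticipate the parameters used later in Section 5.1.2.

```python
# Minimal sketch of one pairwise interaction in the controlled norms game
# (Figure 5.1). The function name and example values are illustrative.

def payoff(own: str, other: str, G: float, H: float, u: float) -> float:
    """Payoff to a player choosing `own` against an opponent choosing `other`.

    F (follow): pay the cost H (< 0), collect the incentive u, and collect
    the gratification G only if the opponent also follows.
    A (apathetic): no payoff, regardless of the opponent.
    """
    if own == "A":
        return 0.0
    return H + u + (G if other == "F" else 0.0)

# With G = 0.1, H = -1 and no incentive, following the norm yields -0.9
# against a fellow follower and -1.0 against an apathetic opponent.
print(payoff("F", "F", G=0.1, H=-1.0, u=0.0))  # -0.9
print(payoff("F", "A", G=0.1, H=-1.0, u=0.0))  # -1.0
print(payoff("A", "F", G=0.1, H=-1.0, u=0.0))  #  0.0
```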

5.1.1 Replicator Dynamics and Stability for the Controlled Norms Game

The expected payoffs for the controlled norms game are as follows:

$$\pi_F = GN x_F + H + u(t), \qquad \pi_A = 0, \tag{5.1.1}$$

where N is the population size, u(t) is the value of the control at time t, and x_F and x_A are the fractions of the population that play strategy F and strategy A, respectively. We note that x_A is simply equal to 1 − x_F. As before, the replicator equation for this game is

$$\dot{x}_F = x_F[\pi_F - \bar{\pi}] = x_F(1 - x_F)(GN x_F + H + u(t)), \tag{5.1.2}$$

and ẋ_A is simply −ẋ_F. If we let u(t) = 0, we can easily see that the equilibrium points associated with this system are

i) x_F = 0,
ii) x_F = 1,
iii) x_F = −H/(GN).

Calculating

$$\frac{d\dot{x}_F}{dx_F} = (1 - x_F)(GN x_F + H) - x_F(GN x_F + H) + x_F(1 - x_F)GN,$$

we can determine the stability of these points. For the following analysis, we will choose GN > −H. Substituting point (i) gives dẋ_F/dx_F = H < 0, point (ii) gives dẋ_F/dx_F = −(GN + H) < 0, and point (iii) gives dẋ_F/dx_F = −H(GN + H)/(GN) > 0.
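A quick numerical check of these signs is below; it is our addition, using the parameter values N = 50, G = 0.1, H = −1 that appear later in Section 5.1.2 (so that GN > −H holds).

```python
# Evaluate the replicator right-hand side (5.1.2) with u = 0 and its
# derivative at the three equilibrium points; our own sanity check.
N, G, H = 50, 0.1, -1.0

f = lambda x: x * (1 - x) * (G * N * x + H)              # RHS of (5.1.2), u = 0
df = lambda x: ((1 - x) * (G * N * x + H)
                - x * (G * N * x + H)
                + x * (1 - x) * G * N)                    # d(xdot)/dx

for xe in (0.0, 1.0, -H / (G * N)):
    print(f"x_F = {xe:.2f}:  f = {f(xe):+.3f},  f' = {df(xe):+.3f}")
# Output signs: f' < 0 at x_F = 0 and x_F = 1 (stable), f' > 0 at x_F = 0.2.
```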

Thus, points (i) and (ii) are asymptotically stable equilibria, while point (iii) is not. By the folk theorem, (i) and (ii), corresponding to scenarios in which the entire population plays strategy A or F, respectively, are Nash equilibria, and therefore potential ESSs.

Let us first check the potential ESS x* = (0, 1), that is, playing strategy A with 100% probability. We see the expected payoff is π(x*, x*) = 0. The strategy y = (y_1, y_2)^T results in expected payoff π(y, x*) = H y_1. We see that the inequality 0 ≥ H y_1 always holds, since H < 0 and y_1 ≥ 0. Moreover, it is strict unless y_1 = 0, which implies y = x*. Thus, the population playing strategy A is an ESS.

We now inspect the second potential ESS x* = (1, 0), that is, playing strategy F with 100% probability. If the entire population is playing F, the expected payoff is π(x*, x*) = GN + H, and the strategy y = (y_1, y_2)^T has the expected payoff against x* of π(y, x*) = (GN + H) y_1. The inequality GN + H ≥ (GN + H) y_1 always holds, as GN > −H and y_1 ≤ 1. For the case y_1 = 1, we have y = x*, and so the population playing strategy F is also an ESS.
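The two inequalities can also be spot-checked numerically over a grid of mutant strategies; this snippet is our addition, again with N = 50, G = 0.1, H = −1.

```python
# Spot-check the ESS inequalities for x* = (0, 1) and x* = (1, 0); ours.
import numpy as np

N, G, H = 50, 0.1, -1.0
y1 = np.linspace(0.0, 1.0, 101)          # mutant probability of playing F

# All A: pi(x*, x*) = 0 versus pi(y, x*) = H * y1.
assert np.all(0.0 >= H * y1)

# All F: pi(x*, x*) = GN + H versus pi(y, x*) = (GN + H) * y1.
assert np.all(G * N + H >= (G * N + H) * y1)
print("both ESS inequalities hold on the sampled grid")
```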

5.1.2 Optimal Control for the Controlled Norms Game

We wish to utilize the control u(t) to transition the population from an initial state in which the majority of players play strategy A, to one in which players favour strategy F. Over this time, we seek to minimize the use of our control, which can be interpreted as minimizing the total payout given to the population during the transition period. Quantitatively, this is stated as the integral

$$\int_0^T N x(t) u(t)\, dt.$$

We formulate this as a fixed time, free endpoint problem, and thus the time period extends over [0, T]. We also write x_F = x for notational simplicity. Additionally, we introduce Ψ to act as an incentive for x to approach 1 by the terminal time T:

$$\Psi = I(1 - x(T)).$$

Here, the choice of the parameter I dictates the relative importance of x being near 1 at time T. With this addition, our problem statement is to

$$\text{Minimize: } \quad I(1 - x(T)) + \int_0^T N x(t) u(t)\, dt \tag{5.1.3}$$
$$\text{Subject to: } \quad \dot{x}(t) = x(t)(1 - x(t))(GN x(t) + H + u(t)).$$

The Hamiltonian of this problem is

$$\mathcal{H}(x(t), p(t), u(t)) = p(t)x(t)(1 - x(t))(GN x(t) + H + u(t)) + N x(t) u(t),$$

with

$$\dot{p}^*(t) = -\frac{\partial \mathcal{H}}{\partial x} = -2p^*(t)GN x^*(t) - p^*(t)H - p^*(t)u^*(t) + 3p^*(t)GN x^*(t)^2 + 2p^*(t)x^*(t)H + 2p^*(t)x^*(t)u^*(t) - N u^*(t)$$

and the terminal condition

$$p^*(T) = \frac{\partial \Psi}{\partial x} = -I.$$
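As a consistency check, the following SymPy snippet (our addition) confirms that the costate expression above equals −∂H/∂x for the stated Hamiltonian.

```python
# SymPy consistency check (our addition): the expanded costate equation
# matches -dH/dx for the Hamiltonian of problem (5.1.3).
import sympy as sp

x, p, u, G, N, H = sp.symbols("x p u G N H")

Ham = p * x * (1 - x) * (G * N * x + H + u) + N * x * u
pdot = sp.expand(-sp.diff(Ham, x))
stated = sp.expand(-2*p*G*N*x - p*H - p*u
                   + 3*p*G*N*x**2 + 2*p*x*H + 2*p*x*u - N*u)

print(sp.simplify(pdot - stated))  # prints 0: the expressions agree
```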

The minimality condition declares

$$\mathcal{H}(x^*(t), p^*(t), u^*(t)) = \min_{0 \le u \le u_{\max}} \left\{ p^*(t)x^*(t)(1 - x^*(t))(GN x^*(t) + H + u(t)) + N x^*(t) u(t) \right\},$$

which leads to

$$\min_{0 \le u \le u_{\max}} \left\{ p^*(t)x^*(t)(1 - x^*(t))u(t) + N x^*(t) u(t) \right\}.$$

Thus, the control u*(t) must be chosen to minimize p*(t)x*(t)(1 − x*(t))u(t) + Nx*(t)u(t) for 0 ≤ u ≤ u_max. We see that

$$u^*(t) = \begin{cases} u_{\max} & \text{if } p^*(t) \le \dfrac{N}{x^*(t) - 1} \\[2mm] 0 & \text{if } p^*(t) > \dfrac{N}{x^*(t) - 1} \end{cases} \tag{5.1.4}$$

Intuitively, u*(t) = u_max for some interval [0, t*], t* ≤ T. After this time, we switch off u*(t) and allow x to continue evolving in the absence of a control. We will call this t* the optimal switching time. In order to find t*, we first need a temporary solution for x*(t) on [0, T]. To solve for this function, we inspect ẋ(t) and notice that for any initial x > −H/(GN), x will grow for u(t) = 0. Thus, we solve for x(t) numerically using x(0) = −H/(GN) + ε, where ε > 0 is chosen small. This initial choice of x gives us our desired solution for x*(t). We note that for a given t, this x*(t) will provide a minimal value of x that is guaranteed to grow to x ≈ 1 by time T. Using this x*(t) and the condition p*(T) = −I, we can now solve for the optimal switching time. For this problem, T is always chosen to be the first time step for which x*(T) ≥ 0.99. We numerically solve ṗ*(t) backwards in time, starting at time T, until we reach the time step that satisfies the condition in (5.1.4). As mentioned earlier, this is the

optimal switching time t*. Finally, we solve ẋ(t) backwards using u(t) = u_max, starting from the initial time t*. This curve gives us our final solution for x*(t) on [0, t*], and the x*(t) we solved for earlier provides the remaining optimal trajectory on (t*, T]. Our choices of I and u_max will dictate the optimal switching time t* and initial condition x*(0). In our case, the initial condition means that given some I and u_max, a certain fraction of the population must initially be playing strategy F to achieve our desired end conditions.

An example that illustrates the solution process is shown in Figure 5.2. Here, we use the parameter values N = 50, H = −1, G = 0.1, and use MATLAB's ode45 function to numerically solve the differential equations. In both plots, the blue curve indicates x*(t) with u*(t) = u_max = 1, and the green curve indicates x(t) with x(0) = 0.2 + ε. The two solutions meet at t*, and the merged curves indicate the remaining x*(t) with u*(t) = 0. We note that the plot with I = 10 utilizes the control for a shorter time than when I = 1. However, it requires a larger x(0), 0.226 as compared to 0.115 for the I = 1 case.

Figure 5.2: Optimal control problem for two different values of I: (a) I = 1; (b) I = 10.

A similar result can be seen when we vary our choices of u_max. Figure 5.3 shows that in order to maintain the same t*, an x(0) of 0.11 is required for u_max = 0.5, as opposed to an x(0) of 0.139 for u_max = 1.
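A minimal Python sketch of this three-step procedure is given below, using SciPy's solve_ivp where the thesis uses MATLAB's ode45. The parameter values N = 50, H = −1, G = 0.1, u_max = 1 follow the text; I = 10, ε = 10⁻³, and the time grid are our own illustrative choices.

```python
# Sketch of the solution procedure above, with SciPy's solve_ivp standing in
# for MATLAB's ode45. N, H, G, u_max follow the text; I, eps, and the time
# grid are illustrative choices of ours.
import numpy as np
from scipy.integrate import solve_ivp

N, H, G = 50, -1.0, 0.1
I, u_max, eps = 10.0, 1.0, 1e-3
t_grid = np.linspace(0.0, 60.0, 6001)

def xdot(t, x, u):
    # Replicator dynamics (5.1.2) with a constant control value u.
    return x * (1 - x) * (G * N * x + H + u)

# Step 1: forward solution x*(t) with u = 0, started just above -H/(GN).
x_star = solve_ivp(xdot, (t_grid[0], t_grid[-1]), [-H / (G * N) + eps],
                   t_eval=t_grid, args=(0.0,)).y[0]
T_idx = int(np.argmax(x_star >= 0.99))   # T: first time step with x* >= 0.99
T = t_grid[T_idx]

# Step 2: integrate the costate backwards from p(T) = -I along x*(t) with
# u = 0, and find the first (largest) time where (5.1.4) selects u = u_max.
def pdot(t, p, u):
    x = np.interp(t, t_grid, x_star)
    return (-2 * p * G * N * x - p * H - p * u
            + 3 * p * G * N * x ** 2 + 2 * p * x * H + 2 * p * x * u - N * u)

back = solve_ivp(pdot, (T, 0.0), [-I], t_eval=t_grid[T_idx::-1], args=(0.0,))
x_back = np.interp(back.t, t_grid, x_star)
switch = back.y[0] <= N / (x_back - 1.0)   # condition (5.1.4) for u* = u_max
assert switch.any(), "no switching time found on [0, T]"
t_star = back.t[int(np.argmax(switch))]    # optimal switching time t*

# Step 3: integrate x backwards from (t*, x*(t*)) with u = u_max to get x(0).
x_ts = np.interp(t_star, t_grid, x_star)
x0 = solve_ivp(xdot, (t_star, 0.0), [x_ts], args=(u_max,)).y[0][-1]
print(f"T = {T:.2f},  t* = {t_star:.2f},  required x(0) = {x0:.3f}")
```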

Figure 5.3: Optimal control problem for two different values of u_max: (a) u_max = 1; (b) u_max = 0.5.

We have presented the option of providing players with an incentive to play a desired strategy by introducing a control to a game. While doing this, our goal is to minimize the total incentive used over some specified time. The game we have chosen is relatively simple, but we see that it is certainly possible to shift a strategy distribution from one ESS to another by utilizing this technique.

5.2 Agent Based Models for the Controlled Norms Game

In this section we discuss our ABM simulations for the controlled norms game. In our first approach to these simulations, agents play the controlled norms game in a similar manner to our models in Chapters 3 and 4. That is, each agent will play against every other agent in the population, and strategies are replicated or removed in a similar way. The difference between the two methods is that the number of agents changing their strategies per round is capped at duplicating no more than two strategies and removing no more than two. For our simulations, we choose the parameter values N = 50, G = 0.1, and H = −1. Our choices of u_max and cutoff fraction (the value of x at which the control is switched off) range

from 0.8 to 1.5 and from 0.15 to 0.85, respectively. Figure 5.4 shows the total payout given to a population of agents when using certain values of u_max and cutoff fractions, with an initial 10% of the population following the norm. For each pair of points, the simulation ran until either the population reached 90% norm followers, or 500 rounds passed. The total payout displayed for a given pair is simply the average payout of five consecutive runs. If one of the runs produced a fraction of the population following the norm of less than 0.9 after 500 rounds, then the corresponding payout for that pair is shown as 0. In reality, however, a payout was used, but did not achieve the desired outcome.

From Figure 5.4, we see that the combination that produces the lowest total usage of u is one in which the cutoff fraction is set to 0.2 and u_max ≥ 1. We note that at least one of the five simulation runs where u_max = 1.3 and cutoff = 0.2 failed to achieve at least 90% norm followers in the population, evidenced by the total payout of 0. Thus, to more safely ensure favourable end conditions, a cutoff fraction of 0.25 and u_max ≥ 1 could, for example, be used instead. Revisiting our differential equation analysis from Section 5.1.2, our model suggests a u_max of 1.16 with a cutoff fraction of 0.21 when starting with 10% norm followers. For the same initial condition and a cutoff fraction of 0.25, the differential equation model shows a u_max = 1.2, which is in close agreement with our ABM.

We also take a different approach that is similar to the voter model, which is described well by Liggett [24]. In this scenario, agents interact with their neighbours and choose an opinion on some issue based on these interactions. In a majority voter model, agents adopt the view that is held by the majority of their neighbours. Interactions take place on a two-dimensional lattice with periodic boundary conditions, so that everyone has the same number of neighbours [8, 19, 25, 29]. In our simulations, we choose to design the interaction between agents in two ways. First, each agent is assigned eight random interaction partners chosen throughout the population. Second, each agent will

interact with their eight neighbours. For an example of these methods being used in a similar capacity, see Nakamaru and Levin [29], and Nowak et al. [25]. We note that the set of interaction partners each agent is initialized with remains constant over the entire simulation.

Figure 5.4: Cumulative usage of u for given combinations of u_max and cutoff fractions.

The rules governing the strategy choice of each agent are as follows. Every agent plays our controlled norms game with their eight interaction partners, and is assigned an

appropriate fitness corresponding directly to the payoffs they received. If a particular agent is already following the norm, then the hurt they receive from doing so must exceed their fitness in order for them to change strategies. On the other hand, if an agent is not following the norm, then we simply calculate their fitness as if they were. If this hypothetical fitness exceeds the agent's assigned hurt value, they change their opinion and begin to follow the norm. We use a population size of 1,000,000, and allow a random 1% of the population to consider changing strategies each round. In our first simulations, each agent has identical hurt (H = −2) and gratification (G = 0.5) values. We note that half of an agent's interaction partners will need to be following the norm to equal their hurt value.

Figure 5.5 shows a visualization of our two interaction scenarios using the arbitrary values of u_max = 1.7, a cutoff fraction of 0.6, and 23 initial norm followers. It is clear that the random mixing of interaction partners allows the system to evolve into a state where a larger fraction of the population is following the norm when compared to the nearest neighbours approach after the same time span of 500 rounds. We also see from the nearest neighbours approach that after 500 rounds we are left with a coexistence of strategies being played. This result is mentioned by Nowak et al., where strategies can coexist with this type of interaction scenario; however, this is not the case when interactions are random [25].

Figure 5.6 shows the total usage of u for given values of u_max and cutoff fractions in our 8 random interaction partner simulations. In addition, Figure 5.7 shows the number of rounds each pair took to reach 90% norm followers in the population. These simulations were initialized with 1% of the population following the norm, and ran for up to 1,000 rounds. If a given combination of u_max and cutoff fraction produced less than 90% of the population following the norm after 1,000 rounds, the corresponding bar is shown as 0 in both figures. We see that the minimum u_max and cutoff fraction for a norm to consistently become

established are 1.55 and 0.55, respectively. However, using this cutoff fraction means the population will take a considerably larger amount of time to reach a state of 90% norm followers than any other fraction ≥ 0.6. We also note the decrease in total rounds needed

Figure 5.5: Optimal control problem for 8 interaction partners. Grey indicates agents who are following the norm, black indicates those who are not. (a) and (c) show our model using an agent's 8 nearest neighbours at time steps 250 and 500. (b) and (d) show our model using 8 random interaction partners throughout the population at time steps 250 and 500. The 8 nearest neighbours scenario concludes with 73% of the population following the norm, while 8 random neighbours concludes with 96%.
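For reference, here is a minimal sketch (our reconstruction, not the thesis code) of the lattice variant of this update rule: agents on a periodic grid follow the norm when the gratification from norm-following neighbours, plus the incentive u, meets or exceeds their hurt. G = 0.5 and |H| = 2 follow the text; the grid size, seed, and initial fraction of followers are illustrative.

```python
# Sketch of the nearest-neighbour voter-model ABM; grid size, seed, and the
# initial fraction of followers are illustrative choices of ours.
import numpy as np

rng = np.random.default_rng(0)
L = 100                                  # illustrative grid side length
G, hurt = 0.5, 2.0                       # gratification per follower; |H|
follow = rng.random((L, L)) < 0.10       # illustrative initial followers

def following_neighbours(follow):
    """Count each site's norm-following neighbours among its 8 nearest
    neighbours, with periodic boundary conditions."""
    count = np.zeros(follow.shape, dtype=int)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr or dc:
                count += np.roll(np.roll(follow, dr, axis=0), dc, axis=1)
    return count

def step(follow, u, frac=0.01):
    """One round: a random 1% of agents reconsider, following the norm
    exactly when gratification plus incentive meets or exceeds the hurt."""
    fitness = G * following_neighbours(follow) + u
    reconsider = rng.random(follow.shape) < frac
    out = follow.copy()
    out[reconsider] = fitness[reconsider] >= hurt
    return out

u, cutoff = 1.7, 0.6                     # values used for Figure 5.5
for _ in range(500):
    if follow.mean() >= cutoff:          # switch off the incentive at cutoff
        u = 0.0
    follow = step(follow, u)
print(f"fraction following after 500 rounds: {follow.mean():.2f}")
```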
