The Dynamics of Generalized Reinforcement Learning


Ratul Lahkar and Robert M. Seymour

August 21, 2012

Abstract

We consider reinforcement learning in games with both positive and negative payoffs. The Cross rule is the prototypical reinforcement learning rule in games that have only positive payoffs. We extend this rule to incorporate negative payoffs to obtain the generalized reinforcement learning rule. Applying this rule to a population game, we obtain the generalized reinforcement dynamic which describes the evolution of mixed strategies of agents in the population. We show that pure strategy Nash equilibria in negative payoffs are not stationary points of this dynamic. Therefore, in simple two strategy games like the stag hunt and the prisoner's dilemma, the population moves away from a Pareto inferior Nash equilibrium in negative payoffs towards a more cooperative state. Finally, simulations reveal convergence of the dynamic to interior stationary points in all monocyclic games including the bad RSP game.

Keywords: Reinforcement learning; Negative reinforcement; Generalized Reinforcement Dynamic.

JEL classification: C72; C73.

IFMR, 24, Kothari Road, Nungambakkam, Chennai, India. r.lahkar@ifmr.ac.in. My coauthor passed away on July 24. He was fully involved in the initial stages of this paper when we were discussing the idea and preparing the preliminary draft. Unfortunately, he expired before we could finish the final version of the paper. I dedicate this paper to his memory.

Deceased. Formerly, Professor Emeritus of Mathematics, University College London.

1 Introduction

Learning and evolutionary game theory seek to provide foundations for social equilibrium behavior based on more realistic norms of human behavior. Reinforcement learning models form a significant part of the literature in these fields of research. Such models are based on the general psychological principle that the higher the benefit from using an action in the past, the greater is its likelihood in the present (Estes, 1950; Estes and Burke, 1953; Bush and Mosteller, 1951a, 1951b). More formally, in reinforcement models, an agent carries an internal mixed strategy, construed as the agent's behavioral disposition. If an action has yielded a high payoff in the past, then the probability assigned to it increases in the present; or the behavior associated with the action gets reinforced. Young (2004) provides a review of the several variants of strategic models of reinforcement learning that have been developed around this general principle. Experimental tests on some of these models (for example, Roth and Erev, 1995; Erev and Roth, 1998) have also yielded significant support for the predictions of reinforcement learning.

In reinforcement learning models, payoffs in a game are not interpreted as von Neumann Morgenstern utilities. Instead, as emphasized in Börgers and Sarin (1997), they represent reinforcement stimuli. The payoffs parameterize the direction and magnitude by which an agent's behavioral disposition changes in light of his experience. The agent's experience consists of the action he has chosen and the payoff he consequently receives. A common feature of reinforcement models has been that they allow agents to only experience positive payoffs. Hence, in such models, actions are always positively reinforced in proportion to the magnitude of the payoff obtained. For example, Börgers and Sarin (1997, 2000) consider a model in which all payoffs exceed an exogenously fixed aspiration parameter common to all agents. Whichever action an agent may play, his aspiration is always satisfied and therefore, there is no stimulus to reduce the propensity of that action. Instead, the probability of that action always increases. By normalizing the aspiration parameter to zero, Börgers and Sarin (1997, 2000) arrive at the canonical learning model of Cross (1973) in which positive payoffs signify fulfilment of aspiration. In this model, the current probability of an action increases by a fraction, equal to the payoff obtained, of the residual probability that had been assigned to the other actions.

In this paper, we generalize the principle of reinforcement to incorporate the idea of negative reinforcement. We consider games which may contain both positive and negative payoffs, with all payoffs lying between −1 and 1.¹ Positive payoffs represent, as usual, positive reinforcement stimuli. However, if an action yields a negative payoff, then we allow its probability to decrease. Negative payoffs, therefore, serve as negative reinforcement stimuli which reduce an agent's propensity to use that action in his next round of play. We formalize negative reinforcement with a rule analogous to the Cross (1973) rule of positive reinforcement.

1 This is a technical assumption required to ensure that strategy revision under generalized reinforcement generates a sensible probability distribution. It is a generalization of the assumption in the Cross rule of positive reinforcement that all payoffs are between 0 and 1.

Whereas the Cross rule transfers a fraction of the residual probability to the current action, our rule of negative reinforcement diverts a part of the current action's probability to other actions. Therefore, the probability of the current action declines in proportion to the negative payoff it generates. Following Börgers and Sarin (1997, 2000), we interpret negative reinforcement as arising from payoffs falling short of a common aspiration parameter of zero. A negative payoff, therefore, represents failure to meet aspiration, which acts as a stimulus to reduce the propensity of the action that generates that payoff. We then combine the Cross rule with the negative reinforcement rule to obtain our rule of generalized reinforcement. In the generalized reinforcement rule, an action that yields a positive payoff gets positively reinforced as per the Cross rule. On the other hand, if the action yields a negative payoff, it gets negatively reinforced in a manner analogous to the Cross rule.

We analyze our model of generalized reinforcement in the setting of a population game. In this setting, members of a large population are repeatedly randomly matched in pairs to play a two player symmetric normal form game. Every agent in the population revises strategy using the generalized reinforcement rule. Our objective is to assess the impact of such individual behavior on aggregate population behavior. To make our analysis tractable, we start with the critical assumption that all agents use the same mixed strategy in the first round of matching in the population game. We also follow the standard procedure, as in Börgers and Sarin (1997), of formulating our reinforcement rule in a way such that the extent of strategy revision declines as the duration of each matching declines. With these assumptions, we establish that as the duration of each matching becomes vanishingly small, the evolution of the mixed strategy of each agent in the population is approximated arbitrarily well by the solution trajectories of an ordinary differential equation system we call the generalized reinforcement (GR) dynamic. Due to our large population setting, we may also interpret the solution of this dynamic as representing the distribution of agents in the population across the different actions. This interpretation corresponds to the conventional definition of a population state in evolutionary game theory models.

For the special case in which all payoffs in the game are between 0 and 1, the GR dynamic is identical to the replicator dynamic. This recovers the result in Börgers and Sarin (1997) that the expected change in mixed strategy generated by the Cross rule of positive reinforcement is given by the replicator dynamic.² However, if some payoffs are negative, the GR dynamic differs in certain crucial aspects from the replicator dynamic. The most striking difference is that not all Nash equilibria of the game are stationary points of the GR dynamic. For example, in a pure Nash equilibrium in negative payoffs, negative reinforcement reduces the propensity of the equilibrium action. Therefore, such a Nash equilibrium is not a stationary point of the dynamic. Generalized reinforcement, therefore, has the important implication that it enables a society to move away from an equilibrium that fails to meet the aspiration levels of the members of the society.

2 We note that in the Börgers and Sarin (1997) model, the same two players are repeatedly matched to play the game, with each player updating strategies using the Cross rule. Our model, on the other hand, is a population game model in which, in each new matching, a player encounters a new opponent. However, due to our assumption that all players start by using the same strategy, we are able to follow the same technical procedure as in Börgers and Sarin (1997) to generate the GR dynamic from the generalized reinforcement strategy revision process.

We also show that interior stationary points of the dynamic do not typically correspond to the mixed equilibria of the game. However, monomorphic states, including pure equilibria, in positive payoffs do represent stationary points of the dynamic.

We apply the GR dynamic to 2 × 2 normal form games. The application is not straightforward because the dynamic depends intricately on the distribution of positive and negative signs among the payoff parameters of the game. We use this analysis to discuss the operation of the dynamic in certain interesting 2 × 2 games, for example, the stag hunt game and the prisoner's dilemma game. These games are used as prototypical models of the problem of establishing cooperation in society. The stag hunt has two pure equilibria, one of them payoff dominated. Under a conventional evolutionary dynamic like the replicator dynamic, society may get trapped in the inferior equilibrium if it starts near that equilibrium. In the prisoner's dilemma, the unique Nash equilibrium is payoff dominated but attracts the replicator dynamic. However, under the generalized reinforcement paradigm, if the inferior equilibrium is in negative payoffs, it cannot be an attracting state. Instead, society is able to move away from that state towards one where the cooperative action may have a significant presence. This suggests that if a status quo equilibrium fails to meet aspiration in society, generalized reinforcement is able to facilitate greater cooperation. But we need to modify this conclusion if the inferior equilibrium is in positive payoffs, in which case it may remain attracting even under generalized reinforcement.

Our final application of the GR dynamic is to a class of n-strategy games known as monocyclic games (Hofbauer, 1995). This class of games is characterized by a unique mixed strategy equilibrium and no pure equilibria. An example of such games is a Rock-Scissors-Paper (RSP) game. We focus on the bad RSP game where the loss from losing is greater than the gain from winning. The bad RSP game is interesting because all standard evolutionary dynamics, including the replicator dynamic, display non-convergence to the interior equilibrium. Instead, such dynamics cycle away from the equilibrium. However, our simulations suggest that the GR dynamic globally converges to its interior stationary point, which in this case happens to coincide with the Nash equilibrium. Indeed, these simulations show convergence in all RSP games. This gives rise to the conjecture that the GR dynamic converges in all monocyclic games, including such monocyclic games in which other dynamics display cyclical behavior. It is important to note that such convergence typically would not imply convergence to mixed equilibrium. Unfortunately, we have not been able to rigorously prove this conjecture. However, simulations on another such game where other dynamics are known to cycle lend support to this conjecture.

The rest of the paper is organized as follows. Section 2 introduces generalized reinforcement. Section 3 derives the GR dynamic and Section 4 analyzes some of its properties, including the stationary points of the dynamic. After the general analysis of 2 × 2 games in Section 5, we consider four categories of such games: games with two pure equilibria, the prisoner's dilemma, games with payoff dominant equilibria in dominant strategies, and the hawk-dove game, in Sections 6-9. The stag hunt game belongs to the first of these categories. Section 10 is on the application to monocyclic games. Section 11 concludes.

2 Generalized Reinforcement Learning

Let U be an n × n symmetric two player normal form game. The game has the set of pure actions A = {A_1, A_2, ..., A_n}. We denote by u_{ij} the payoff to the row player when the row player plays action A_i and the column player plays A_j.³ We assume that u_{ij} ∈ [−1, 1] for all A_i, A_j ∈ A. We consider a population consisting of a continuum of agents who are randomly matched to play this game. We refer to the game U, the population and the random matching framework as a population game. Each agent in the population carries a mixed strategy, interpreted as their behavioral disposition, which they use to choose their pure action when called upon to do so. We denote the set of mixed strategies of an agent by

Δ = { x ∈ R^n : x_i ≥ 0 for each A_i ∈ A, with Σ_{i=1}^n x_i = 1 }.   (1)

At time t, all agents are randomly matched in pairs and each pair plays the game. Matchings last for a period τ, 0 < τ ≤ 1, after which they are rearranged. Once matched, an agent adopts a mixed strategy, which he then (possibly) revises for use during the next matching. In revising his strategy, the agent uses some method of heuristic learning by recalling his personal experience in his previous matching. The agent can observe the actions used by his opponents in the past, or at least infer them from the payoffs received, but has no knowledge of the strategy used by his opponents.

We may describe the process of heuristic learning in the population game as follows. Suppose that in a current matching, a player using strategy x plays A_i and his opponent uses A_j. The row player updates his strategy to x' given by

x' = x + τ f_{ij}(x).   (2)

We call f_{ij}(x) the potential strategy revision function. With this interpretation, (2) implies that the proportion of the total strategy revision potential that is realized depends upon τ, the time difference between two successive pairwise matchings in the population game. In particular, we are assuming that as τ → 0, the difference between x' and x also becomes infinitesimally small. This assumption becomes crucial when we derive the continuous time limit of the population dynamics generated by a strategy revision rule of the form (2). We assume that the strategy revision function f_{ij}(x) satisfies the following properties:

1. f_{ij}(x) extends to a differentiable function f_{ij} : R^n → R^n.

2. Σ_{r=1}^n f_{ij,r}(x) = 0 for x ∈ Δ.

3. x_r + f_{ij,r}(x) ≥ 0 for all 1 ≤ r ≤ n, x ∈ Δ.

Condition 1 is a technical property. The other two conditions ensure that the new strategy x' ∈ Δ. The strategy revision function can take any form subject to the three conditions.

3 We confine ourselves to two-player symmetric games merely for notational convenience. All the ideas involved can be extended easily to multi-player symmetric as well as asymmetric games at the cost of more cumbersome notation.

Our main interest is in the case when agents use generalized reinforcement learning to update their strategies. In order to introduce generalized reinforcement, we consider three cases depending upon the range of values the payoffs can take. The first is the well known case of positive reinforcement where all payoffs are positive. In the second case, we introduce negative reinforcement to accommodate the case where all payoffs are negative. The third case combines positive and negative reinforcement to yield generalized reinforcement, where payoffs can be both positive and negative.

2.1 Positive Reinforcement

Let 0 < u_{ij} ≤ 1 for all A_i, A_j ∈ A. Conventional models of reinforcement learning have only considered the case where all payoffs are positive. In such models, payoffs are interpreted as positive stimuli which serve to increase the likelihood of an action the agent uses. The most well known rule of positive reinforcement is the Cross (1973) rule. To specify this rule, we assume an agent with strategy x plays A_i and encounters an opponent using A_j. Under the Cross rule, the agent then revises his strategy to x' given by

x'_i = x_i + τ u_{ij}(1 − x_i),   (3)
x'_k = x_k − τ u_{ij} x_k,   k ≠ i.   (4)

We may also describe this rule by specifying the strategy revision function as

f_{ij}(x) = (e_i − x) u_{ij},   (5)

where e_i ∈ R^n is the i-th standard basis vector. The Cross rule has been extensively analyzed, for example, in Börgers and Sarin (1997, 2000) and Börgers et al. (2004). Under the Cross rule, the likelihood of the action the agent currently uses always increases. The rule transfers a fraction τu_{ij} of the probability (1 − x_i) assigned to other actions to A_i. Therefore, the higher the payoff obtained from using A_i, the greater is the increase in its likelihood in the next opportunity the agent gets to play the game.

Börgers and Sarin (2000) interpret the Cross rule in terms of the aspiration level of an agent. Suppose that an agent aspires to a payoff of s ∈ [0, 1]. The probability of playing an action A_k ≠ A_i is then x'_k = x_k + (s − τu_{ij}) x_k. Hence, if u_{ij} > s, then A_i gets reinforced. Setting s = 0 for all agents, we obtain the Cross rule.
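As a concrete illustration (our own sketch, not code from the paper; the function name and example numbers are ours), the Cross rule (3)-(4) can be written in a few lines of Python: a fraction τu_{ij} of the mass on every other action is transferred to the action just played.

```python
import numpy as np

def cross_update(x, i, u_ij, tau):
    """Cross (1973) rule: positively reinforce action i after receiving payoff u_ij.

    x     : current mixed strategy (array summing to 1)
    i     : index of the action just played
    u_ij  : realized payoff, assumed to lie in (0, 1]
    tau   : duration of a matching, 0 < tau <= 1
    """
    x_new = x - tau * u_ij * x                  # rule (4): every action loses the fraction tau*u_ij
    x_new[i] = x[i] + tau * u_ij * (1 - x[i])   # rule (3): that mass is transferred to action i
    return x_new

# Example: three actions, action 0 played, payoff 0.5, tau = 0.1.
x = np.array([0.2, 0.3, 0.5])
print(cross_update(x, 0, 0.5, 0.1))             # -> [0.24, 0.285, 0.475], still on the simplex
```

Since 0 < τu_{ij} ≤ 1, the update never leaves the simplex, in line with conditions 2 and 3 above.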

2.2 Negative Reinforcement

We now extend the fundamental idea behind the Cross rule to develop the notion of negative reinforcement when all payoffs are strictly negative. Let −1 ≤ u_{ij} < 0 for all A_i, A_j ∈ A. The payoffs now represent negative stimuli which, by a logical extension of the notion of positive reinforcement, should act to reduce the likelihood of the current action. Positive reinforcement rewards a current action by transferring a fraction of the remaining probability to the current action. In contrast, negative reinforcement should penalize the current action by shifting some of its probability to other actions. Of course, x_i cannot decrease below zero, so it is reasonable to expect that x_i decreases by an amount proportional to x_i. The proportion, in turn, is determined by the payoff obtained from the action.

To formalize this notion, we assume as before that an agent with strategy x plays A_i and encounters an opponent using A_j. The updated probability of A_i is then x'_i given by

x'_i = x_i + τ u_{ij} x_i.   (6)

Since −1 ≤ u_{ij} < 0, x'_i < x_i. Therefore, f_{ij,i}(x) = u_{ij} x_i < 0. While x_i falls, we expect x_k for k ≠ i to compensate by increasing. Clearly x'_k cannot increase to more than 1. So it is reasonable to assume that x_k increases by an amount proportional to 1 − x_k. That is, we assume that

x'_k = x_k − τ α u_{ij}(1 − x_k),   k ≠ i.   (7)

Hence, f_{ij,k}(x) = −α u_{ij}(1 − x_k) for all A_k ≠ A_i. The value of α is determined by the requirement that Σ_{l=1}^n f_{ij,l}(x) = 0, or Σ_{r=1}^n x'_r = 1. From (6) and (7), this condition gives

α {(n − 1) − (1 − x_i)} = x_i,

from which we obtain

α = α(x_i) = x_i / (n − 2 + x_i).   (8)

From (6), (7) and (8), we obtain the reinforcement learning rule for negative payoffs:

x'_i = x_i + τ u_{ij} x_i,
x'_k = x_k − τ u_{ij} [x_i / (n − 2 + x_i)](1 − x_k),   k ≠ i.   (9)

We note that we can also write (9) as x'_k = x_k − τ u_{ij} x_i (1 − x_k) / Σ_{l≠i}(1 − x_l). Therefore, of the total mass −τ u_{ij} x_i available for redistribution among the other actions, A_k receives the fraction (1 − x_k) / Σ_{l≠i}(1 − x_l). The lower is x_k, the higher is the increase in its mass following the redistribution.

In positive reinforcement, we note that x_i = 1 implies x'_i = 1. Once an action is played with certainty, it simply gets repeated on all future occasions. However, this is not so under negative reinforcement. From (6), even if x_i = 1, u_{ij} < 0 implies x'_i < 1. This, in turn, means that all other strategies x_k are reinforced to the positive value −τ u_{ij}(1/(n − 1)). Thus, actions that were not previously available become so. Therefore, negative reinforcement allows an agent to revise his strategy in the future even when he currently plays an action with complete certainty. As we show later, this has significant implications on the population dynamics under generalized reinforcement. In particular, it allows a population to escape from a monomorphic state, which may be a Nash equilibrium, if the payoff in that state is negative.
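To see the escape mechanism in numbers (an illustration with values of our own choosing, not an example from the paper), take n = 3 actions and suppose an agent plays A_1 with certainty, x = (1, 0, 0), receives the payoff u_{1j} = −0.5, and revises with τ = 0.2. Then α(x_1) = 1/(3 − 2 + 1) = 1/2, and (9) gives

x'_1 = 1 + (0.2)(−0.5)(1) = 0.9,
x'_2 = x'_3 = 0 − (0.2)(−0.5)(1/2)(1 − 0) = 0.05.

Each previously unused action thus picks up probability 0.05 = −τ u_{1j}/(n − 1), and the agent can experiment away from A_1 in subsequent matchings.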

8 that state has negative payoffs. In the standard interpretation of positive reinforcement, positive parameters represent the amount by which the payoffs exceed the aspiration level of zero. Since all payoffs are positive, agents always obtain more than the aspiration level irrespective of whichever action they use. Therefore, the reinforcement stimulus is always in the direction of increasing the likelihood of the current action. Furthermore, the greater the payoff, the higher is the positive reinforcement. We may provide an analogous interpretation of negative reinforcement. If we set an agent s aspiration level at zero, then negative parameters measure the degree to which the agent s payoffs fall short of the aspired level. Therefore, reinforcement stimulus acts towards reducing the likelihood of the current action with the fall in likelihood greater the lower the payoff. This interpretation also provides the intuition why the likelihood of A i can decline from x i = Generalized Reinforcement We now combine positive and negative reinforcement to define generalized reinforcement. This case applies when when a game has both positive or negative payoffs. We assume that for all actions A i, A j A, u ij [ 1, 1]. We now define the following parameters from the payoff matrix U. u + ij = 1 2 (u ij + u ij ), (10) u ij = 1 2 (u ij u ij ), (11) so that u ij = u + ij + u ij ; u ij = 0 if u ij > 0, and u + ij = 0 if u ij < 0. Therefore, if u ij > 0, then u + ij = u ij and u ij = 0. If, instead, u ij < 0, then u + ij = 0 and u ij = u ij. As in Cases 1 and 2, we assume an agent with strategy x plays A i and encounters an opponent using A j. Combining Cases 1 and 2, we then obtain the agent s updated strategy x i = = x i + τu + ij (1 x i) + τu ij x i, (12) x i x k = x k τu + ij x k τu ij (1 x k ), k i. (13) n 2 + x i In terms of the strategy revision function f, we have f ij,i (x) = u + ij (1 x i) + u ij x i, (14) x i f ij,r (x) = u + ij x r u ij (1 x r ), r i. (15) n 2 + x i Equations (12)-(13) define the rule of generalized reinforcement learning. The updated probability of an action, therefore, depends upon whether the payoff u ij the agent obtains is positive or negative. If u ij > 0 so that u ij = 0, then generalized reinforcement is equivalent to positive reinforcement. On the other hand, if u ij < 0 so that u + ij = 0, then generalized reinforcement reduces to negative reinforcement. We may extend the interpretation of reinforcement learning in terms of an aspiration level to 7

We may extend the interpretation of reinforcement learning in terms of an aspiration level to generalized reinforcement. We set the aspiration parameter at zero as usual. Then a game with both positive and negative payoffs implies that sometimes an agent obtains a higher than aspired payoff, whereas on other occasions the payoff is less than the aspiration level. If the payoff is more than zero, then positive reinforcement holds and the likelihood of the current action increases in the future. On the other hand, in the case of negative payoffs, negative reinforcement makes it less likely that the agent uses the current action in the future. This also implies that x_i may decline from 1 if the payoff obtained from using A_i is negative.

3 Generalized Reinforcement Dynamic

We identify a population state with a probability measure over Δ. Formally, a probability measure P is a population state such that P(A), A ⊆ Δ, denotes the proportion of the population using strategies in A. Our objective is to analyze the way the population state changes as agents revise their strategies using the generalized reinforcement rule (12)-(13) in successive matchings of the population game. A general solution to this problem can be quite complex since it would involve the analysis of a partial differential equation system in an abstract space of probability measures.⁴ Since the primary aim of this paper is to introduce generalized reinforcement, we adopt a simpler approach. In particular, we assume that at the initial time t = 0, all agents start with the same behavioral disposition x(0) = x_0. We may describe this as the population state δ_{x_0}, the Dirac distribution on x_0. As we argue below, with this simplifying assumption, we can analyze population dynamics using a much simpler ordinary differential equation system.

We first provide an intuitive explanation of this approach. Consider an agent during his first matching as he uses strategy x_0. Since the population state is δ_{x_0}, the opponent he encounters also uses x_0. As the agent then revises his strategy to x'_0, the expected change in his strategy is given by τL(x_0) ∈ R^n, where

L_k(x) = Σ_{i,j} x_i f_{ij,k}(x) x_j,   (16)

with f_{ij,k}(x) being the strategy revision function (12)-(13) under generalized reinforcement. The expected value of x'_0 is, therefore, x_0 + τL(x_0). Typically, this value will be different from x_0. However, the agent's adjustment of his strategy, τf_{ij}(x_0), slows down as the duration of a matching, τ, declines. Hence, as τ → 0, x_0 + τL(x_0) becomes an increasingly close approximation of x'_0. The argument applies to every agent. Since every agent starts with the same strategy, each agent's strategy at time τ is close to x_τ = x_0 + τL(x_0) if τ is sufficiently small. But then, a repetition of the earlier argument implies that in the next round of matching at time 2τ, every agent uses a strategy close to x_{2τ} = x_τ + τL(x_τ). We can continue this chain of reasoning for any finite number of matching rounds.

4 See Lahkar and Seymour (2012) for an application of this approach to the Cross learning rule of positive reinforcement in population games.
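The expected change in (16) can be computed directly by summing the revision function (14)-(15) over all action pairs with weights x_i x_j. The sketch below (our own code; the helper names and the 2 × 2 payoff values are ours) does exactly that and then takes the Euler step x + τL(x) used in the argument above.

```python
import numpy as np

def f_ij(x, i, j, U):
    """Potential strategy revision function (14)-(15) for the action pair (i, j)."""
    n = len(x)
    u_pos, u_neg = max(U[i, j], 0.0), min(U[i, j], 0.0)
    denom = n - 2 + x[i]
    alpha = x[i] / denom if denom > 0 else 0.0   # guard only matters when x[i] = 0 and n = 2
    f = -u_pos * x - u_neg * alpha * (1 - x)     # component (15) for r != i
    f[i] = u_pos * (1 - x[i]) + u_neg * x[i]     # component (14)
    return f

def L(x, U):
    """Expected change L(x) of equation (16)."""
    n = len(x)
    return sum(x[i] * x[j] * f_ij(x, i, j, U) for i in range(n) for j in range(n))

# One step of the intuitive argument: x at time tau is approximately x_0 + tau * L(x_0).
U = np.array([[ 0.5, -0.4],
              [-0.2,  0.3]])
x0 = np.array([0.6, 0.4])
tau = 0.01
print(x0 + tau * L(x0, U))
```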

Hence, if we have N rounds of matching and Nτ = T, then up to time T, the change in the mixed strategy x of an agent between any two matchings is well approximated by τL(x), where L(x) is given by (16). Formally, for τ small enough,

x'(mτ) − x(mτ) ≈ E(x'(mτ) − x(mτ)) = τ L(x(mτ)),   for m < N,

or x'(mτ) − x(mτ) ≈ τ L(x(mτ)). To make the approximation increasingly accurate, we take the continuous time limit of the strategy adjustment process as τ → 0. But τ is simply the time differential between two matchings. We, therefore, conclude that for t ∈ [0, T], the continuous time mixed strategy trajectory x(t) for any agent is given by the solution to the differential equation system

dx/dt = L(x),

with initial condition x(0).

To make this argument rigorous, we can follow the approach adopted by Börgers and Sarin (1997) to derive the replicator dynamic as the continuous time limit of the Cross rule of positive reinforcement learning. Their analysis is in the context of a learning model in which the same two players are repeatedly matched to play the game. But their proof can be easily adapted to the large population random matching context in our model. We, therefore, provide the formal statement of our result in Proposition 3.1 below while referring the reader to Proposition 1 in Börgers and Sarin (1997) for details of the proof.

Let x^τ(m) be the strategy of an agent in his m-th matching, with the superscript τ denoting the duration of a matching. We rewrite the generalized reinforcement rule (12)-(13) as

x^τ_i(m + 1) = x^τ_i(m) + τ u^+_{ij}(1 − x^τ_i(m)) + τ u^-_{ij} x^τ_i(m),   (17)
x^τ_k(m + 1) = x^τ_k(m) − τ u^+_{ij} x^τ_k(m) − τ u^-_{ij} [x^τ_i(m) / (n − 2 + x^τ_i(m))](1 − x^τ_k(m)),   k ≠ i.   (18)

With a slight abuse of notation, we treat x^τ as a random variable. We, therefore, obtain a Markov process {x^τ(m)}_{m ∈ N} in discrete time if we specify the initial value of the random variable x^τ(0) = x(0). This Markov process describes the process of strategy change of an agent in Δ. If we now assume that every agent starts with the strategy x(0) at time t = 0, then the same Markov process holds for every agent. Since each matching lasts for duration τ, the variable x^τ(m) describes the agent's strategy at time mτ. We are interested in the continuous time limiting behavior of the process as τ → 0. In the following proposition, we characterize this limit for some finite time T ≥ 0, under the conditions that τ → 0 and mτ → T.

Proposition 3.1 Suppose all agents revise strategies according to the generalized reinforcement learning rule (17)-(18). Further suppose that for all agents, x^τ(0) = x(0). Let T ∈ [0, ∞) and assume τ → 0 and mτ → T. Let x(t) be the solution to the differential equation

ẋ = L(x),   (19)

where L_k(x) = Σ_{i,j} x_i f_{ij,k}(x) x_j, with f_{ij}(x) being the potential strategy revision function (14)-(15) under generalized reinforcement. Then, x^τ(m) converges in probability to x(t) for every agent.

For the formal details of the proof, we refer the reader to Börgers and Sarin (1997). Their proof is a straightforward application of results from Norman (1972) on the continuous time limit of discrete time Markov processes with infinite state spaces. The proof requires that the function L_k(x) be polynomial, which, as we calculate below, is obviously the case.

We call (19) the generalized reinforcement dynamic or the GR dynamic. In order to obtain the precise form of the GR dynamic, we need to calculate the vector field L(x) corresponding to generalized reinforcement. For this purpose, we define two n × n matrices, both derived from the payoff matrix U. The first is U^+, whose components are the u^+_{ij} defined in (10). The second is U^-, consisting of the elements u^-_{ij} defined in (11). Therefore,

U^+ = [u^+_{ij}] such that u^+_{ij} = u_{ij} if u_{ij} > 0 and u^+_{ij} = 0 if u_{ij} ≤ 0,   (20)
U^- = [u^-_{ij}] such that u^-_{ij} = u_{ij} if u_{ij} < 0 and u^-_{ij} = 0 if u_{ij} ≥ 0.   (21)

Along with these matrices, we also use a property derived from the function α(x_i) defined in (8). Given a scalar h, we have from (8), α(h) = h/(n − 2 + h). Clearly, 0 ≤ α(h) ≤ 1/(n − 1) for all 0 ≤ h ≤ 1, and

h + (1 − h)α(h) = (n − 1)α(h).   (22)

With these preliminaries, we now compute L_k(x) = Σ_{i,j} x_i f_{ij,k}(x) x_j, with f_{ij}(x) being the potential strategy revision function under generalized reinforcement defined in (12)-(13).

L_k(x) = Σ_{i,j} x_i f_{ij,k}(x) x_j
       = Σ_j x_k f_{kj,k}(x) x_j + Σ_j Σ_{i≠k} x_i f_{ij,k}(x) x_j
       = Σ_j { x_k(1 − x_k) u^+_{kj} + x_k^2 u^-_{kj} − x_k Σ_{i≠k} x_i u^+_{ij} − (1 − x_k) Σ_{i≠k} α(x_i) x_i u^-_{ij} } x_j
       = Σ_j { x_k u^+_{kj} − x_k Σ_i x_i u^+_{ij} + x_k^2 u^-_{kj} + x_k(1 − x_k) α(x_k) u^-_{kj} − (1 − x_k) Σ_i α(x_i) x_i u^-_{ij} } x_j
       = x_k { Σ_j u^+_{kj} x_j − Σ_{i,j} x_i u^+_{ij} x_j } + Σ_j { x_k [x_k + (1 − x_k) α(x_k)] u^-_{kj} − (1 − x_k) Σ_i α(x_i) x_i u^-_{ij} } x_j
       = x_k {e_k − x} · U^+ x + Σ_j { (n − 1) α(x_k) x_k u^-_{kj} − (1 − x_k) Σ_i α(x_i) x_i u^-_{ij} } x_j   (from (22))
       = x_k {e_k − x} · U^+ x + { (n − 1)[α(x_k) x_k] e_k − (1 − x_k)[α(x)x] } · U^- x,

where α(x)x ∈ R^n is the vector with components α(x_i) x_i.
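The closed form just derived (restated as (23) below) is easy to check numerically against the direct definition (16). The following self-contained sketch (our own code and naming) evaluates both for a random game and confirms they coincide.

```python
import numpy as np

def L_closed(x, U):
    """GR vector field in the closed form derived above (equation (23))."""
    n = len(x)
    Up, Um = np.maximum(U, 0.0), np.minimum(U, 0.0)        # U^+ and U^- from (20)-(21)
    ax = (x / (n - 2 + x)) * x                             # vector alpha(x)x
    rep = x * (Up @ x - x @ Up @ x)                        # x_k {e_k - x}.U^+ x
    neg = (n - 1) * ax * (Um @ x) - (1 - x) * (ax @ Um @ x)
    return rep + neg

def L_direct(x, U):
    """Brute-force evaluation of (16) from the revision rule (14)-(15)."""
    n = len(x)
    out = np.zeros(n)
    for i in range(n):
        for j in range(n):
            up, um = max(U[i, j], 0.0), min(U[i, j], 0.0)
            a = x[i] / (n - 2 + x[i])
            f = -up * x - um * a * (1 - x)
            f[i] = up * (1 - x[i]) + um * x[i]
            out += x[i] * x[j] * f
    return out

rng = np.random.default_rng(0)
U = rng.uniform(-1, 1, size=(4, 4))
x = rng.dirichlet(np.ones(4))
print(np.allclose(L_closed(x, U), L_direct(x, U)))         # -> True
```

For a payoff matrix with no negative entries, the U^- term vanishes and L_closed is exactly the replicator vector field, as noted in the text below.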

We, therefore, obtain the vector field L(x) on Δ whose i-th component is

L_i(x) = x_i {e_i − x} · U^+ x + { (n − 1)[α(x_i) x_i] e_i − (1 − x_i)[α(x)x] } · U^- x,   (23)

for 1 ≤ i ≤ n. From (23), we obtain the GR dynamic ẋ = L(x), with L_i(x) being the rate at which the probability associated with action A_i, x_i, changes. Clearly, the first term in (23), involving U^+, is the standard replicator dynamics component associated with the positive payoffs. The second term is the component associated with the negative payoffs U^-.

By Proposition 3.1, we may interpret solutions to the GR dynamic in two equivalent ways. The initial common strategy implies that at all future times t, all agents play the same strategy x(t) given by the solution of the GR dynamic with initial point x(0). This is, of course, a limiting result obtained when the duration of each matching, τ, becomes vanishingly small. For τ that is small but remains positive, agents play different strategies at time t > 0, but all those strategies are extremely close to x(t). Equivalently, if the initial population state is δ_{x_0}, then at time t > 0, the population state is δ_{x(t)}, where x(t) is the solution to the GR dynamic from initial condition x(0). This also means x_i(t) is the proportion of agents who play action A_i at time t. Therefore, to simplify notation, in the following sections, we identify the population state δ_{x(t)} with the strategy x(t).

3.1 The case n = 2

We now derive the GR dynamic for the special case of a two strategy symmetric game. When the number of strategies n = 2, α(h) = 1 (see (22)), and hence α(x)x = x. Therefore, (23) reduces to

L_i(x) = x_i {e_i − x} · U^+ x + {x_i e_i − (1 − x_i) x} · U^- x,   i = 1, 2.   (24)

However, since x_1 + x_2 = 1, only one of these two equations is independent. Writing x = (x_1, x_2) = (x, 1 − x), we may then specify the GR dynamic completely by ẋ = L(x), the rate of change in the probability of action A_1. The relevant state space is therefore the interval [0, 1]. To specify L(x), we can expand the two components of (24) more explicitly. The first component, which is the replicator component associated with U^+, has the explicit form

x_1 {e_1 − x} · U^+ x = x(1 − x)(u^+_{11} x + u^+_{12}(1 − x) − u^+_{21} x − u^+_{22}(1 − x)).   (25)

The second component, associated with U^-, is

{x e_1 − (1 − x) x} · U^- x = x {u^-_{11} x + u^-_{12}(1 − x)} − (1 − x) {u^-_{11} x^2 + (u^-_{12} + u^-_{21}) x(1 − x) + u^-_{22}(1 − x)^2}
                            = u^-_{11} x^3 + u^-_{12} x^2 (1 − x) − u^-_{21} x(1 − x)^2 − u^-_{22}(1 − x)^3.   (26)

Adding (25) and (26), we obtain the GR dynamic ẋ = L(x), where

L(x) = (u^+_{11} x^2 (1 − x) + u^+_{12} x(1 − x)^2 − u^+_{21} x^2 (1 − x) − u^+_{22} x(1 − x)^2)
     + (u^-_{11} x^3 + u^-_{12} x^2 (1 − x) − u^-_{21} x(1 − x)^2 − u^-_{22}(1 − x)^3).   (27)

We may also rewrite L(x) as

L(x) = (u^+_{11} x^2 (1 − x) + u^+_{12} x(1 − x)^2 − u^-_{21} x(1 − x)^2 − u^-_{22}(1 − x)^3)
     + (u^-_{11} x^3 + u^-_{12} x^2 (1 − x) − u^+_{21} x^2 (1 − x) − u^+_{22} x(1 − x)^2).   (28)

The first component on the right hand side of (28) represents the inflow of probability mass into action A_1. Note that this component is positive. The value of x increases if playing A_1 generates a positive payoff or playing A_2 generates a negative payoff. For example, u^+_{11} x^2 (1 − x) is the product of the increase in x for a player, u^+_{11}(1 − x), multiplied with the probability x^2 of both players in the matching playing A_1. The second component denotes the outflow of mass from A_1. This component is negative. The probability of A_1 declines if it generates a negative payoff or if the other action, A_2, generates a positive payoff. We note that a similar inflow-outflow interpretation holds for the more general n-strategy version of the GR dynamic (23).
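The scalar form (27) is convenient for quick numerical experiments. The sketch below (our own code, with illustrative payoff values) evaluates L(x) for a 2 × 2 game and integrates ẋ = L(x) by a crude Euler scheme; the chosen game is a stag-hunt-like game whose inferior equilibrium (A_2, A_2) lies in negative payoffs.

```python
import numpy as np

def L_2x2(x, U):
    """Scalar GR dynamic (27); x is the probability of action A_1."""
    Up, Um = np.maximum(U, 0.0), np.minimum(U, 0.0)
    pos = (Up[0, 0] * x**2 * (1 - x) + Up[0, 1] * x * (1 - x)**2
           - Up[1, 0] * x**2 * (1 - x) - Up[1, 1] * x * (1 - x)**2)
    neg = (Um[0, 0] * x**3 + Um[0, 1] * x**2 * (1 - x)
           - Um[1, 0] * x * (1 - x)**2 - Um[1, 1] * (1 - x)**3)
    return pos + neg

def euler_path(x0, U, dt=0.01, steps=5000):
    """Crude Euler integration of xdot = L(x), clamped to [0, 1]."""
    x = x0
    for _ in range(steps):
        x = min(max(x + dt * L_2x2(x, U), 0.0), 1.0)
    return x

U = np.array([[ 0.8, -0.5],
              [ 0.2, -0.1]])          # both (A_1, A_1) and (A_2, A_2) are Nash equilibria
print(euler_path(0.05, U))            # prints a value above 0.9: the population leaves x = 0
```

Starting next to the inferior equilibrium at x = 0, the trajectory moves away from it and heads toward x = 1, in line with the discussion of negative-payoff equilibria in the following sections.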

4 Some Properties of the GR Dynamic

The strategy revision rule f under generalized reinforcement defined in (12)-(13) is clearly differentiable. Therefore, from any initial point x(0), the GR dynamic (19) admits a unique solution which is continuous with respect to the initial condition. However, for this dynamic to be relevant in a game theoretic context, we also require that it satisfy forward invariance, i.e. from initial condition x(0) ∈ Δ, the solution trajectory x(t) ∈ Δ for all t > 0. This condition ensures that x(t) remains a meaningful description of strategy for all times t. To establish forward invariance, it is sufficient to show that at any point on the boundary of Δ, the GR dynamic never points outward from Δ. To state and prove this property formally, we first define the (n − k)-dimensional faces of Δ.

Definition 4.1 Let A = {A_1, A_2, ..., A_n} be the set of pure strategies. Then a proper subset A^i = {A_{i_1}, ..., A_{i_k}} ⊂ A defines an (n − k)-dimensional face of Δ,

Δ(i) = {x ∈ Δ : x_j = 0 for all A_j ∈ A^i}.   (29)

In an (n − k)-dimensional face, x_{i_j} = 0 for the k actions {A_{i_1}, ..., A_{i_k}}. Therefore, any such face represents a part of the boundary of Δ. The following proposition establishes that from any such face of Δ, the GR dynamic either remains on the face or points inwards into Δ. This is equivalent to showing that for any x ∈ Δ(i), ẋ_{i_j} ≥ 0 for any A_{i_j} ∈ A^i. Clearly, this implies that the dynamic never points outward from the boundary of Δ.

Proposition 4.2 The (n − k)-dimensional face Δ(i) is invariant under the GR dynamic ẋ = L(x) if and only if u_{ij} ≥ 0 for all A_i, A_j ∉ A^i. If there exists A_i, A_j ∉ A^i for which u_{ij} < 0, then L(x) points into the interior of Δ for all x ∈ int Δ(i).

Proof. For x ∈ int Δ(i), we have, for 1 ≤ r ≤ k,

L_{i_r}(x) = −[α(x)x] · U^- x = −Σ_{A_i, A_j ∉ A^i} u^-_{ij} α(x_i) x_i x_j ≥ 0.

Since x_i, x_j > 0 for x ∈ int Δ(i) and A_i, A_j ∉ A^i, it follows that L_{i_r}(x) = 0 if and only if u^-_{ij} = 0 for all A_i, A_j ∉ A^i. That is, if and only if u_{ij} ≥ 0 for all A_i, A_j ∉ A^i. If there exists A_i, A_j ∉ A^i for which u_{ij} < 0, then L_{i_r}(x) ≥ −u^-_{ij} α(x_i) x_i x_j > 0 for all 1 ≤ r ≤ k. Hence, the vector field L(x) points into the interior of Δ at x ∈ int Δ(i).

The calculation in the proof involves points in the interior of Δ(i). However, any point in the boundary of Δ(i) is in the interior of a lower dimensional face. Therefore, the proposition covers every point in the boundary of Δ. The calculation of L_{i_r}(x) in the proof of the proposition then suffices to establish forward invariance.

However, Proposition 4.2 goes beyond establishing forward invariance. It also establishes a basic distinction between the replicator dynamic and the GR dynamic. It is well known that the boundary faces of Δ are invariant under the replicator dynamic. Hence, any solution trajectory of the replicator dynamic that starts in a particular face remains in that face at all times in the future. However, under the GR dynamic, this is true only in the case covered by the first part of Proposition 4.2. A face is invariant if and only if actions present in that face always yield positive payoffs. If all payoffs are positive, then all faces are invariant. But this is also the case in which the GR dynamic is equivalent to the replicator dynamic. In the more general case in which some action leads to a negative payoff, the GR dynamic diverts a part of that action's probability into an unused action. This pushes the state variable into the interior of Δ. This also implies that any pure strategy that does not yield a positive payoff is not a stationary point of the GR dynamic. We can establish this result formally as a corollary of Proposition 4.2.

Corollary 4.3 The pure strategy e_i is a stationary point of the reinforcement dynamics if and only if u_{ii} ≥ 0.

Proof. Take A^i = A \ {A_i} in Proposition 4.2.

In a large population context, a monomorphic state e_i is a state in which all agents play the same action A_i. Corollary 4.3 implies that from any monomorphic state in which the payoff is negative, the GR dynamic moves away from that state.

In particular, a pure strategy Nash equilibrium in negative payoffs is not a stationary point of the GR dynamic. This behavior is in contrast to the replicator dynamic in which all monomorphic states are stationary points.

Corollary 4.3 establishes that if all monomorphic states of the population game are in negative payoffs, then the only possible stationary points of the GR dynamic are in the interior of Δ. The following proposition establishes the existence of at least one such interior stationary point. To prove the proposition, we define the forward flow of the GR dynamic. The forward flow of the GR dynamic is the function φ_t(ξ) = x_t, where {x_t}_{t ∈ [0, ∞)} is the solution to the dynamic with initial condition x_0 = ξ. The existence of a stationary point follows from an application of Brouwer's fixed point theorem to φ_t(ξ).

Proposition 4.4 If u_{ii} < 0 for all A_i ∈ A, then the GR dynamic ẋ = L(x) is inward pointing everywhere on the boundary, and hence the dynamic has only interior stationary points.

Proof. If u_{ii} < 0 for all A_i ∈ A, then Corollary 4.3 implies the GR dynamic is inward pointing. Hence, no pure strategy can be a stationary point of the dynamic. To establish the existence of an interior rest point, note that by forward invariance, the forward flow φ_t is a function from Δ to itself. Furthermore, through the standard results on existence, uniqueness and continuity of solutions, φ_t : Δ → Δ is a continuous function for any t ∈ [0, ∞). Since Δ is a compact set, Brouwer's fixed point theorem then implies the existence of a fixed point φ_t(x) = x for any t ∈ [0, ∞). Clearly, such a fixed point is a stationary point of the GR dynamic. Since u_{ii} < 0 for all A_i ∈ A, such a stationary point can only be in the interior of Δ.

Any mixed Nash equilibrium of U is a stationary point of the replicator dynamic. Therefore, a fully mixed equilibrium of U is an interior stationary point of the replicator dynamic. For the GR dynamic, however, it is not readily apparent how to arrive at such an intuitive characterization of an interior stationary point. We can establish that a mixed Nash equilibrium is not generally a stationary point of the GR dynamic using a simple 2 strategy game. Consider the 2 strategy game U in which u_{11} > 0 but the other payoffs are negative. Therefore, u_{12}, u_{21}, u_{22} < 0. We assume that u_{22} > u_{12} or, equivalently, |u_{22}| < |u_{12}|. So the game has three Nash equilibria, x = 0, x = 1 and the mixed equilibrium x* = (u_{22} − u_{12}) / (u_{22} − u_{12} + u_{11} − u_{21}). If we apply the replicator dynamic to the game, then the three Nash equilibria constitute the set of stationary points of the dynamic. To obtain the stationary points of the GR dynamic, let us apply (28) to U to obtain

L(x) = u^+_{11} x^2 (1 − x) − u^-_{21} x(1 − x)^2 − u^-_{22}(1 − x)^3 + u^-_{12} x^2 (1 − x)
     = (1 − x)( u_{11} x^2 + u_{12} x^2 − u_{21} x(1 − x) − u_{22}(1 − x)^2 ).   (30)

From (30), it is readily apparent that x = 1 is a rest point of the GR dynamic. But at x = 0, L(x) = −u_{22} > 0. At x = 0, all agents play action A_2 with certainty. Due to the negative payoff, all agents negatively reinforce A_2. Hence, there is an inflow of mass into A_1 and so, x rises from 0. The interior stationary point is given by the solution to u_{11} x^2 + u_{12} x^2 − u_{21} x(1 − x) − u_{22}(1 − x)^2 = 0. At that solution, the inflow into A_1 is matched by the outflow from that action.
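A numerical illustration, with payoff values of our own choosing, makes the gap between the two notions concrete. Take u_{11} = 0.5, u_{12} = −0.6, u_{21} = −0.4 and u_{22} = −0.2, so that u_{22} > u_{12} as assumed. The mixed Nash equilibrium is

x* = (u_{22} − u_{12}) / (u_{22} − u_{12} + u_{11} − u_{21}) = 0.4 / (0.4 + 0.9) ≈ 0.31,

whereas the interior stationary point of the GR dynamic solves

0.5 x^2 − 0.6 x^2 + 0.4 x(1 − x) + 0.2(1 − x)^2 = 0,   i.e.   −0.3 x^2 + 0.2 = 0,

giving x = √(2/3) ≈ 0.82. The two points are far apart: the GR dynamic settles at a state with much more weight on A_1 than the mixed equilibrium prescribes.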

However, it is evident that this solution bears no relation to the mixed Nash equilibrium x*, unless the payoffs are such that x* = 1/2. In the special case in which all payoffs are positive, so that the GR dynamic reduces to the replicator dynamic, mixed equilibria of a game are stationary points of the GR dynamic. Except for this case, the above example in (30) verifies that typically, such Nash equilibria do not constitute stationary points of the GR dynamic.

5 Application: Two Strategy Games

We now apply the GR dynamic to two strategy symmetric games. The GR dynamic depends intricately on the signs of the payoffs in the game U. We, therefore, expect the behavior of the dynamic, particularly its long-run asymptotic state, to also depend upon the signs of the payoffs. This is totally unlike the case with conventional evolutionary dynamics like the replicator dynamic. For example, in a prisoner's dilemma game, the replicator dynamic always converges to the unique Nash equilibrium irrespective of the signs of the payoffs. It is readily apparent that such a general conclusion does not hold for the GR dynamic if we consider a prisoner's dilemma where the Nash equilibrium is in negative payoffs. In that case, the Nash equilibrium is not even a stationary point of the GR dynamic. We, therefore, need to distinguish between various distributions of positive and negative signs for a complete analysis of the GR dynamic in two strategy games. While there are various ways to make the distinction, the most convenient one is to classify such a game U according to the signs of u_{11} and u_{22}. These are the payoffs corresponding to the monomorphic states x = 1 and x = 0, where x is the probability of action A_1 or the proportion of agents using A_1. This gives us four cases to analyze.

5.1 Case 1: u_{11} ≥ 0 and u_{22} < 0

In this case, u^-_{11} = u^+_{22} = 0. Therefore, from (27), the GR dynamic takes the form

L(x) = (1 − x)( u^+_{11} x^2 + u^+_{12} x(1 − x) − u^-_{21} x(1 − x) − u^-_{22}(1 − x)^2 + u^-_{12} x^2 − u^+_{21} x^2 )
     = (1 − x)( u^+_{11} x^2 + u^+_{12} x(1 − x) + |u^-_{21}| x(1 − x) + |u^-_{22}|(1 − x)^2 − |u^-_{12}| x^2 − u^+_{21} x^2 ).   (31)

We write this equation as L(x) = (1 − x)q(x), where q(x) is quadratic in x. We note that q(0) = |u^-_{22}| > 0 and q(1) = u^+_{11} − |u^-_{12}| − u^+_{21}. It is obvious that x = 1 is a stationary point of the GR dynamic in this case. The only other possible stationary points of the dynamic are such roots of q(x) that belong to the interval [0, 1]. We now show that if q(1) > 0, then x = 1 is the globally asymptotic stationary point of the GR dynamic.

Let us apply the change of variable ξ = (1 − x)/x and note that x ∈ (0, 1] corresponds to ξ ≥ 0. With this transformation, we can write q(x) as p(ξ), where

p(ξ) = [1/(1 + ξ)^2] ( (u^+_{11} − |u^-_{12}| − u^+_{21}) + (u^+_{12} + |u^-_{21}|)ξ + |u^-_{22}|ξ^2 ).

This transforms the condition q(x) = 0 into the equivalent condition p(ξ) = 0 in terms of the positive variable ξ. If p(ξ*) = 0 for some ξ* ≥ 0, then x = 1/(1 + ξ*) is a root of q(x). However, if q(1) = u^+_{11} − |u^-_{12}| − u^+_{21} > 0, then p(ξ) > 0 for all ξ ≥ 0. Therefore, in this case, q(x) has no root in [0, 1]. But with q(0) > 0, this further implies that q(x) > 0 for all x ∈ [0, 1]. Hence, the only stationary point of L(x) is x = 1. For any other x ∈ [0, 1], L(x) > 0, which implies that x = 1 is globally asymptotically stable. However, if q(1) < 0, then p(ξ) has the root

ξ* = (1/(2|u^-_{22}|)) { −(u^+_{12} + |u^-_{21}|) + √( (u^+_{12} + |u^-_{21}|)^2 − 4|u^-_{22}|(u^+_{11} − |u^-_{12}| − u^+_{21}) ) }.   (32)

Therefore, q(x) has the unique root x* = 1/(1 + ξ*) ∈ (0, 1). In this case, x* is another stationary point of the GR dynamic along with x = 1. Furthermore, for ξ > ξ*, p(ξ) > 0 and for ξ < ξ*, p(ξ) < 0. Given the inverse relationship between x and ξ, this implies that for x < x*, q(x) > 0 and for x > x*, q(x) < 0. This implies that x* is the globally asymptotic stationary point of the GR dynamic. For the borderline case q(1) = 0, ξ* = 0. Therefore, x = 1 is the root of q(x). Hence, x = 1 is the only stationary point of the GR dynamic which, furthermore, is globally asymptotically stable. We summarize this discussion in the following proposition.

Proposition 5.1 Consider a 2 × 2 symmetric normal form game with u_{11} ≥ 0 and u_{22} < 0. Let ẋ = L(x) be the GR dynamic for this game, where L(x) is given by (31). Then,

1. If u^+_{11} − |u^-_{12}| − u^+_{21} ≥ 0, then x = 1 is the unique stationary point and is globally asymptotically stable.

2. If u^+_{11} − |u^-_{12}| − u^+_{21} < 0, then there is a unique interior stationary point x* = 1/(1 + ξ*), where ξ* is given by (32). Further, x* is globally asymptotically stable on 0 ≤ x < 1. The other stationary point x = 1 is unstable.

5.2 Case 2: u_{11} < 0 and u_{22} ≥ 0

Here, u^+_{11} = u^-_{22} = 0. The GR dynamic, therefore, takes the form

L(x) = x ( u^+_{12}(1 − x)^2 + |u^-_{21}|(1 − x)^2 − |u^-_{11}| x^2 − |u^-_{12}| x(1 − x) − u^+_{21} x(1 − x) − u^+_{22}(1 − x)^2 ).   (33)

Instead of going through the formal analysis of the dynamic, we can solve this case directly from Case 1 in Section 5.1 by interchanging the strategies 1 ↔ 2 and the roles of x and (1 − x). We, therefore, obtain the following proposition.

Proposition 5.2 Consider a 2 × 2 symmetric normal form game with u_{11} < 0 and u_{22} ≥ 0. Let ẋ = L(x) be the GR dynamic for this game, where L(x) is given by (33). Then,

1. If u^+_{22} − |u^-_{21}| − u^+_{12} ≥ 0, then x = 0 is the unique stationary point and is globally asymptotically stable.

2. If u^+_{22} − |u^-_{21}| − u^+_{12} < 0, then there is a unique interior stationary point x* = ξ*/(1 + ξ*), where ξ* is given by

ξ* = (1/(2|u^-_{11}|)) { −(u^+_{21} + |u^-_{12}|) + √( (u^+_{21} + |u^-_{12}|)^2 − 4|u^-_{11}|(u^+_{22} − |u^-_{21}| − u^+_{12}) ) }.

Further, x* is globally asymptotically stable on 0 < x ≤ 1. The other stationary point x = 0 is unstable.

5.3 Case 3: u_{11} ≥ 0 and u_{22} ≥ 0

In this case, u^-_{11} = u^-_{22} = 0. Hence, the dynamic ẋ = L(x), with L(x) given by (27), reduces to ẋ = x(1 − x)l(x), where

l(x) = (|u^-_{21}| + u^+_{12} − u^+_{22}) − { (|u^-_{21}| + u^+_{12} − u^+_{22}) + (|u^-_{12}| + u^+_{21} − u^+_{11}) } x.   (34)

Note that l(0) = (|u^-_{21}| + u^+_{12} − u^+_{22}), and l(1) = −(|u^-_{12}| + u^+_{21} − u^+_{11}). Thus, if (|u^-_{21}| + u^+_{12} − u^+_{22}) and (|u^-_{12}| + u^+_{21} − u^+_{11}) have opposite signs, then l(x) has no zero in the range 0 ≤ x ≤ 1. Hence, x = 0 and x = 1 are the only stationary points in this case. Further, if l(x) > 0, then x = 1 is globally asymptotically stable, and if l(x) < 0, then x = 0 is globally asymptotically stable. On the other hand, if (|u^-_{21}| + u^+_{12} − u^+_{22}) and (|u^-_{12}| + u^+_{21} − u^+_{11}) have the same sign, then there is an interior equilibrium at

x* = (|u^-_{21}| + u^+_{12} − u^+_{22}) / [ (|u^-_{21}| + u^+_{12} − u^+_{22}) + (|u^-_{12}| + u^+_{21} − u^+_{11}) ].   (35)

In this case, we can write the dynamic in the form

ẋ = γ x(1 − x)(x* − x),   (36)

where

γ = (|u^-_{21}| + u^+_{12} − u^+_{22}) + (|u^-_{12}| + u^+_{21} − u^+_{11}).   (37)

If γ > 0, then ẋ > 0 for 0 < x < x*, and ẋ < 0 for x* < x < 1. In this case, x* is globally asymptotically stable on 0 < x < 1. On the other hand, if γ < 0, then ẋ < 0 for 0 < x < x* and ẋ > 0 for x* < x < 1. In this case, x = 0 is locally asymptotically stable, with basin of attraction 0 ≤ x < x*, and x = 1 is locally asymptotically stable with basin of attraction x* < x ≤ 1. We summarize the analysis in the following proposition.

Proposition 5.3 Consider a 2 × 2 symmetric normal form game with u_{11} ≥ 0 and u_{22} ≥ 0. Let ẋ = L(x) be the GR dynamic, with L(x) = x(1 − x)l(x) and l(x) given by (34). Then,

1. If (|u^-_{21}| + u^+_{12} − u^+_{22}) and (|u^-_{12}| + u^+_{21} − u^+_{11}) have opposite signs, then x = 0 and x = 1 are the only stationary points of the GR dynamic. Further, if l(x) > 0, then x = 1 is globally asymptotically stable, and if l(x) < 0, then x = 0 is globally asymptotically stable.

2. If (|u^-_{21}| + u^+_{12} − u^+_{22}) and (|u^-_{12}| + u^+_{21} − u^+_{11}) have the same sign, then there are three stationary points: x = 0, x = 1 and the interior point x* defined in (35). Given γ defined in (37), if γ > 0, then x* is globally asymptotically stable on 0 < x < 1. If γ < 0, x = 0 and x = 1 are locally asymptotically stable with respective basins of attraction 0 ≤ x < x* and x* < x ≤ 1.

We leave it to the reader to explicate the special cases in which one or both of (|u^-_{21}| + u^+_{12} − u^+_{22}) and (|u^-_{12}| + u^+_{21} − u^+_{11}) are zero.

5.4 Case 4: u_{11} < 0 and u_{22} < 0

In this final case, u^+_{11} = u^+_{22} = 0. Therefore, the GR dynamic (27) reduces to

L(x) = u^-_{11} x^3 + (u^-_{12} − u^+_{21} − u^+_{12}) x^2 (1 − x) − u^-_{21} x(1 − x)^2 − u^-_{22}(1 − x)^3 + u^+_{12} x(1 − x).   (38)

We note that L(0) = −u^-_{22} > 0 and L(1) = u^-_{11} < 0. Therefore, this case provides an example of Proposition 4.4 in which the GR dynamic has only interior stationary points when u_{ii} < 0 for all actions A_i ∈ A. However, the problem of finding an interior stationary point in this case cannot be reduced to solving for the roots of a quadratic equation. Therefore, we cannot obtain an explicit expression for such a rest point x* as a function of the payoffs u_{ij}. Nevertheless, we can establish that there exists exactly one stationary point of the GR dynamic in this case.

To prove this claim, apply the change of variable ξ = x/(1 − x) and note that x ∈ [0, 1) corresponds to ξ ≥ 0. We can then express the stationarity condition L(x) = 0 in terms of ξ as P(ξ) = 0, where P(ξ) = L(x)/(1 − x)^3 is the cubic

P(ξ) = u^-_{11} ξ^3 + (u^-_{12} − u^+_{21} − u^+_{12}) ξ^2 − u^-_{21} ξ − u^-_{22} + u^+_{12} ξ(1 + ξ)
     = −|u^-_{11}| ξ^3 − (|u^-_{12}| + u^+_{21}) ξ^2 + (u^+_{12} + |u^-_{21}|) ξ + |u^-_{22}|.

Positive solutions of P(ξ) = 0 are equivalent to solutions of k(ξ) = m(ξ), where k(ξ) = |u^-_{11}| ξ^3 + (|u^-_{12}| + u^+_{21}) ξ^2 and m(ξ) = (|u^-_{21}| + u^+_{12}) ξ + |u^-_{22}|. Now m(ξ) is a straight line with non-negative slope and m(0) > 0. On the other hand, k(0) = 0, and k(ξ) is a positive, strictly convex, increasing function for ξ > 0, with k(ξ) → ∞ as ξ → ∞. It follows immediately that there is exactly one solution ξ* > 0 of k(ξ) = m(ξ). The unique stationary point x* of L(x) is then determined by ξ* = x*/(1 − x*). Moreover, L(x) > 0 for 0 ≤ x < x*, and L(x) < 0 for x* < x ≤ 1. Hence, x* is globally asymptotically stable on 0 ≤ x ≤ 1.
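As a final numerical check (our own sketch, with payoffs of our own choosing), the unique interior rest point in Case 4 can be located by a sign-change search on (38), and Euler iteration of the dynamic converges to the same value from an arbitrary starting point.

```python
import numpy as np

def L_case4(x, U):
    """Scalar GR dynamic (38) for a 2x2 game with u11 < 0 and u22 < 0."""
    Up, Um = np.maximum(U, 0.0), np.minimum(U, 0.0)
    return (Um[0, 0] * x**3
            + (Um[0, 1] - Up[1, 0] - Up[0, 1]) * x**2 * (1 - x)
            - Um[1, 0] * x * (1 - x)**2
            - Um[1, 1] * (1 - x)**3
            + Up[0, 1] * x * (1 - x))

U = np.array([[-0.3,  0.4],
              [-0.7, -0.2]])                  # u11 < 0 and u22 < 0, as in Case 4

# Bisection: L(0) > 0 and L(1) < 0, and L changes sign exactly once on [0, 1].
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if L_case4(mid, U) > 0 else (lo, mid)
x_star = 0.5 * (lo + hi)

# Euler iteration from near x = 0 converges to the same interior point.
x = 0.05
for _ in range(200000):
    x += 0.001 * L_case4(x, U)
print(round(x_star, 4), round(x, 4))          # both roughly 0.67
```

With these payoffs A_1 is strictly dominant, so the unique Nash equilibrium is x = 1; yet because u_{11} < 0, the dynamic ignores it and settles at the interior rest point instead.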


More information

A Generic Bound on Cycles in Two-Player Games

A Generic Bound on Cycles in Two-Player Games A Generic Bound on Cycles in Two-Player Games David S. Ahn February 006 Abstract We provide a bound on the size of simultaneous best response cycles for generic finite two-player games. The bound shows

More information

C31: Game Theory, Lecture 1

C31: Game Theory, Lecture 1 C31: Game Theory, Lecture 1 V. Bhaskar University College London 5 October 2006 C31 Lecture 1: Games in strategic form & Pure strategy equilibrium Osborne: ch 2,3, 12.2, 12.3 A game is a situation where:

More information

DETERMINISTIC AND STOCHASTIC SELECTION DYNAMICS

DETERMINISTIC AND STOCHASTIC SELECTION DYNAMICS DETERMINISTIC AND STOCHASTIC SELECTION DYNAMICS Jörgen Weibull March 23, 2010 1 The multi-population replicator dynamic Domain of analysis: finite games in normal form, G =(N, S, π), with mixed-strategy

More information

Payoff Continuity in Incomplete Information Games

Payoff Continuity in Incomplete Information Games journal of economic theory 82, 267276 (1998) article no. ET982418 Payoff Continuity in Incomplete Information Games Atsushi Kajii* Institute of Policy and Planning Sciences, University of Tsukuba, 1-1-1

More information

Chapter III. Stability of Linear Systems

Chapter III. Stability of Linear Systems 1 Chapter III Stability of Linear Systems 1. Stability and state transition matrix 2. Time-varying (non-autonomous) systems 3. Time-invariant systems 1 STABILITY AND STATE TRANSITION MATRIX 2 In this chapter,

More information

Computing Minmax; Dominance

Computing Minmax; Dominance Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination

More information

6.891 Games, Decision, and Computation February 5, Lecture 2

6.891 Games, Decision, and Computation February 5, Lecture 2 6.891 Games, Decision, and Computation February 5, 2015 Lecture 2 Lecturer: Constantinos Daskalakis Scribe: Constantinos Daskalakis We formally define games and the solution concepts overviewed in Lecture

More information

Reinforcement Learning

Reinforcement Learning 5 / 28 Reinforcement Learning Based on a simple principle: More likely to repeat an action, if it had to a positive outcome. 6 / 28 Reinforcement Learning Idea of reinforcement learning first formulated

More information

Pairwise Comparison Dynamics for Games with Continuous Strategy Space

Pairwise Comparison Dynamics for Games with Continuous Strategy Space Pairwise Comparison Dynamics for Games with Continuous Strategy Space Man-Wah Cheung https://sites.google.com/site/jennymwcheung University of Wisconsin Madison Department of Economics Nov 5, 2013 Evolutionary

More information

Iterated Strict Dominance in Pure Strategies

Iterated Strict Dominance in Pure Strategies Iterated Strict Dominance in Pure Strategies We know that no rational player ever plays strictly dominated strategies. As each player knows that each player is rational, each player knows that his opponents

More information

Evolutionary Dynamics and Extensive Form Games by Ross Cressman. Reviewed by William H. Sandholm *

Evolutionary Dynamics and Extensive Form Games by Ross Cressman. Reviewed by William H. Sandholm * Evolutionary Dynamics and Extensive Form Games by Ross Cressman Reviewed by William H. Sandholm * Noncooperative game theory is one of a handful of fundamental frameworks used for economic modeling. It

More information

Evolutionary Game Theory: Overview and Recent Results

Evolutionary Game Theory: Overview and Recent Results Overviews: Evolutionary Game Theory: Overview and Recent Results William H. Sandholm University of Wisconsin nontechnical survey: Evolutionary Game Theory (in Encyclopedia of Complexity and System Science,

More information

Belief-based Learning

Belief-based Learning Belief-based Learning Algorithmic Game Theory Marcello Restelli Lecture Outline Introdutcion to multi-agent learning Belief-based learning Cournot adjustment Fictitious play Bayesian learning Equilibrium

More information

An Introduction to Evolutionary Game Theory

An Introduction to Evolutionary Game Theory An Introduction to Evolutionary Game Theory Lectures delivered at the Graduate School on Nonlinear and Stochastic Systems in Biology held in the Department of Applied Mathematics, School of Mathematics

More information

Equilibria in Games with Weak Payoff Externalities

Equilibria in Games with Weak Payoff Externalities NUPRI Working Paper 2016-03 Equilibria in Games with Weak Payoff Externalities Takuya Iimura, Toshimasa Maruta, and Takahiro Watanabe October, 2016 Nihon University Population Research Institute http://www.nihon-u.ac.jp/research/institute/population/nupri/en/publications.html

More information

A (Brief) Introduction to Game Theory

A (Brief) Introduction to Game Theory A (Brief) Introduction to Game Theory Johanne Cohen PRiSM/CNRS, Versailles, France. Goal Goal is a Nash equilibrium. Today The game of Chicken Definitions Nash Equilibrium Rock-paper-scissors Game Mixed

More information

A Note on the Existence of Ratifiable Acts

A Note on the Existence of Ratifiable Acts A Note on the Existence of Ratifiable Acts Joseph Y. Halpern Cornell University Computer Science Department Ithaca, NY 14853 halpern@cs.cornell.edu http://www.cs.cornell.edu/home/halpern August 15, 2018

More information

Stochastic Evolutionary Game Dynamics: Foundations, Deterministic Approximation, and Equilibrium Selection

Stochastic Evolutionary Game Dynamics: Foundations, Deterministic Approximation, and Equilibrium Selection Stochastic Evolutionary Game Dynamics: Foundations, Deterministic Approximation, and Equilibrium Selection William H. Sandholm October 31, 2010 Abstract We present a general model of stochastic evolution

More information

NEGOTIATION-PROOF CORRELATED EQUILIBRIUM

NEGOTIATION-PROOF CORRELATED EQUILIBRIUM DEPARTMENT OF ECONOMICS UNIVERSITY OF CYPRUS NEGOTIATION-PROOF CORRELATED EQUILIBRIUM Nicholas Ziros Discussion Paper 14-2011 P.O. Box 20537, 1678 Nicosia, CYPRUS Tel.: +357-22893700, Fax: +357-22895028

More information

Game Theory, Population Dynamics, Social Aggregation. Daniele Vilone (CSDC - Firenze) Namur

Game Theory, Population Dynamics, Social Aggregation. Daniele Vilone (CSDC - Firenze) Namur Game Theory, Population Dynamics, Social Aggregation Daniele Vilone (CSDC - Firenze) Namur - 18.12.2008 Summary Introduction ( GT ) General concepts of Game Theory Game Theory and Social Dynamics Application:

More information

Linear Programming in Matrix Form

Linear Programming in Matrix Form Linear Programming in Matrix Form Appendix B We first introduce matrix concepts in linear programming by developing a variation of the simplex method called the revised simplex method. This algorithm,

More information

DIMACS Technical Report March Game Seki 1

DIMACS Technical Report March Game Seki 1 DIMACS Technical Report 2007-05 March 2007 Game Seki 1 by Diogo V. Andrade RUTCOR, Rutgers University 640 Bartholomew Road Piscataway, NJ 08854-8003 dandrade@rutcor.rutgers.edu Vladimir A. Gurvich RUTCOR,

More information

Population Games and Evolutionary Dynamics

Population Games and Evolutionary Dynamics Population Games and Evolutionary Dynamics William H. Sandholm The MIT Press Cambridge, Massachusetts London, England in Brief Series Foreword Preface xvii xix 1 Introduction 1 1 Population Games 2 Population

More information

BELIEFS & EVOLUTIONARY GAME THEORY

BELIEFS & EVOLUTIONARY GAME THEORY 1 / 32 BELIEFS & EVOLUTIONARY GAME THEORY Heinrich H. Nax hnax@ethz.ch & Bary S. R. Pradelski bpradelski@ethz.ch May 15, 217: Lecture 1 2 / 32 Plan Normal form games Equilibrium invariance Equilibrium

More information

6.254 : Game Theory with Engineering Applications Lecture 8: Supermodular and Potential Games

6.254 : Game Theory with Engineering Applications Lecture 8: Supermodular and Potential Games 6.254 : Game Theory with Engineering Applications Lecture 8: Supermodular and Asu Ozdaglar MIT March 2, 2010 1 Introduction Outline Review of Supermodular Games Reading: Fudenberg and Tirole, Section 12.3.

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

SONDERFORSCHUNGSBEREICH 504

SONDERFORSCHUNGSBEREICH 504 SONDERFORSCHUNGSBEREICH 504 Rationalitätskonzepte, Entscheidungsverhalten und ökonomische Modellierung No. 02-03 Two-Speed Evolution of Strategies and Preferences in Symmetric Games Possajennikov, Alex

More information

Tijmen Daniëls Universiteit van Amsterdam. Abstract

Tijmen Daniëls Universiteit van Amsterdam. Abstract Pure strategy dominance with quasiconcave utility functions Tijmen Daniëls Universiteit van Amsterdam Abstract By a result of Pearce (1984), in a finite strategic form game, the set of a player's serially

More information

Bargaining Efficiency and the Repeated Prisoners Dilemma. Bhaskar Chakravorti* and John Conley**

Bargaining Efficiency and the Repeated Prisoners Dilemma. Bhaskar Chakravorti* and John Conley** Bargaining Efficiency and the Repeated Prisoners Dilemma Bhaskar Chakravorti* and John Conley** Published as: Bhaskar Chakravorti and John P. Conley (2004) Bargaining Efficiency and the repeated Prisoners

More information

Prisoner s Dilemma. Veronica Ciocanel. February 25, 2013

Prisoner s Dilemma. Veronica Ciocanel. February 25, 2013 n-person February 25, 2013 n-person Table of contents 1 Equations 5.4, 5.6 2 3 Types of dilemmas 4 n-person n-person GRIM, GRIM, ALLD Useful to think of equations 5.4 and 5.6 in terms of cooperation and

More information

Game Theory Fall 2003

Game Theory Fall 2003 Game Theory Fall 2003 Problem Set 1 [1] In this problem (see FT Ex. 1.1) you are asked to play with arbitrary 2 2 games just to get used to the idea of equilibrium computation. Specifically, consider the

More information

Game Theory and Algorithms Lecture 2: Nash Equilibria and Examples

Game Theory and Algorithms Lecture 2: Nash Equilibria and Examples Game Theory and Algorithms Lecture 2: Nash Equilibria and Examples February 24, 2011 Summary: We introduce the Nash Equilibrium: an outcome (action profile) which is stable in the sense that no player

More information

Ex Post Cheap Talk : Value of Information and Value of Signals

Ex Post Cheap Talk : Value of Information and Value of Signals Ex Post Cheap Talk : Value of Information and Value of Signals Liping Tang Carnegie Mellon University, Pittsburgh PA 15213, USA Abstract. Crawford and Sobel s Cheap Talk model [1] describes an information

More information

University of Zurich. Best-reply matching in games. Zurich Open Repository and Archive. Droste, E; Kosfeld, M; Voorneveld, M.

University of Zurich. Best-reply matching in games. Zurich Open Repository and Archive. Droste, E; Kosfeld, M; Voorneveld, M. University of Zurich Zurich Open Repository and Archive Winterthurerstr. 190 CH-8057 Zurich http://www.zora.unizh.ch Year: 2003 Best-reply matching in games Droste, E; Kosfeld, M; Voorneveld, M Droste,

More information

Distributed Learning based on Entropy-Driven Game Dynamics

Distributed Learning based on Entropy-Driven Game Dynamics Distributed Learning based on Entropy-Driven Game Dynamics Bruno Gaujal joint work with Pierre Coucheney and Panayotis Mertikopoulos Inria Aug., 2014 Model Shared resource systems (network, processors)

More information

Selfishness vs Altruism vs Balance

Selfishness vs Altruism vs Balance Selfishness vs Altruism vs Balance Pradeep Dubey and Yair Tauman 18 April 2017 Abstract We give examples of strategic interaction which are beneficial for players who follow a "middle path" of balance

More information

Fast Convergence in Evolutionary Equilibrium Selection 1

Fast Convergence in Evolutionary Equilibrium Selection 1 Fast Convergence in Evolutionary Equilibrium Selection 1 Gabriel E Kreindler H Peyton Young January 19, 2012 Abstract Stochastic selection models provide sharp predictions about equilibrium selection when

More information

Games and Their Equilibria

Games and Their Equilibria Chapter 1 Games and Their Equilibria The central notion of game theory that captures many aspects of strategic decision making is that of a strategic game Definition 11 (Strategic Game) An n-player strategic

More information

STOCHASTIC PROCESSES Basic notions

STOCHASTIC PROCESSES Basic notions J. Virtamo 38.3143 Queueing Theory / Stochastic processes 1 STOCHASTIC PROCESSES Basic notions Often the systems we consider evolve in time and we are interested in their dynamic behaviour, usually involving

More information

Computing Minmax; Dominance

Computing Minmax; Dominance Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination

More information

Self-stabilizing uncoupled dynamics

Self-stabilizing uncoupled dynamics Self-stabilizing uncoupled dynamics Aaron D. Jaggard 1, Neil Lutz 2, Michael Schapira 3, and Rebecca N. Wright 4 1 U.S. Naval Research Laboratory, Washington, DC 20375, USA. aaron.jaggard@nrl.navy.mil

More information

Game interactions and dynamics on networked populations

Game interactions and dynamics on networked populations Game interactions and dynamics on networked populations Chiara Mocenni & Dario Madeo Department of Information Engineering and Mathematics University of Siena (Italy) ({mocenni, madeo}@dii.unisi.it) Siena,

More information

Statistics 992 Continuous-time Markov Chains Spring 2004

Statistics 992 Continuous-time Markov Chains Spring 2004 Summary Continuous-time finite-state-space Markov chains are stochastic processes that are widely used to model the process of nucleotide substitution. This chapter aims to present much of the mathematics

More information

Interval values for strategic games in which players cooperate

Interval values for strategic games in which players cooperate Interval values for strategic games in which players cooperate Luisa Carpente 1 Balbina Casas-Méndez 2 Ignacio García-Jurado 2 Anne van den Nouweland 3 September 22, 2005 Abstract In this paper we propose

More information

6.207/14.15: Networks Lecture 16: Cooperation and Trust in Networks

6.207/14.15: Networks Lecture 16: Cooperation and Trust in Networks 6.207/14.15: Networks Lecture 16: Cooperation and Trust in Networks Daron Acemoglu and Asu Ozdaglar MIT November 4, 2009 1 Introduction Outline The role of networks in cooperation A model of social norms

More information

Evolutionary Game Theory Notes

Evolutionary Game Theory Notes Evolutionary Game Theory Notes James Massey These notes are intended to be a largely self contained guide to everything you need to know for the evolutionary game theory part of the EC341 module. During

More information

An Introduction to Evolutionary Game Theory: Lecture 2

An Introduction to Evolutionary Game Theory: Lecture 2 An Introduction to Evolutionary Game Theory: Lecture 2 Mauro Mobilia Lectures delivered at the Graduate School on Nonlinear and Stochastic Systems in Biology held in the Department of Applied Mathematics,

More information

Evolution Through Imitation in a. Single Population 1

Evolution Through Imitation in a. Single Population 1 Evolution Through Imitation in a Single Population 1 David K. Levine and Wolfgang Pesendorfer 2 First version: September 29, 1999 This version: May 10, 2000 Abstract: Kandori, Mailath and Rob [1993] and

More information

TWO COMPETING MODELSOF HOW PEOPLE LEARN IN GAMES. By Ed Hopkins 1

TWO COMPETING MODELSOF HOW PEOPLE LEARN IN GAMES. By Ed Hopkins 1 Econometrica, Vol. 70, No. 6 (November, 2002), 2141 2166 TWO COMPETING MODELSOF HOW PEOPLE LEARN IN GAMES By Ed Hopkins 1 Reinforcement learning and stochastic fictitious play are apparent rivals as models

More information

1 AUTOCRATIC STRATEGIES

1 AUTOCRATIC STRATEGIES AUTOCRATIC STRATEGIES. ORIGINAL DISCOVERY Recall that the transition matrix M for two interacting players X and Y with memory-one strategies p and q, respectively, is given by p R q R p R ( q R ) ( p R

More information

arxiv: v1 [cs.sy] 13 Sep 2017

arxiv: v1 [cs.sy] 13 Sep 2017 On imitation dynamics in potential population games Lorenzo Zino, Giacomo Como, and Fabio Fagnani arxiv:1709.04748v1 [cs.sy] 13 Sep 017 Abstract Imitation dynamics for population games are studied and

More information

Population Dynamics Approach for Resource Allocation Problems. Ashkan Pashaie

Population Dynamics Approach for Resource Allocation Problems. Ashkan Pashaie Population Dynamics Approach for Resource Allocation Problems by Ashkan Pashaie A thesis submitted in conformity with the requirements for the degree of Master of Applied Science Graduate Department of

More information

Bargaining, Contracts, and Theories of the Firm. Dr. Margaret Meyer Nuffield College

Bargaining, Contracts, and Theories of the Firm. Dr. Margaret Meyer Nuffield College Bargaining, Contracts, and Theories of the Firm Dr. Margaret Meyer Nuffield College 2015 Course Overview 1. Bargaining 2. Hidden information and self-selection Optimal contracting with hidden information

More information

Quantitative Techniques (Finance) 203. Polynomial Functions

Quantitative Techniques (Finance) 203. Polynomial Functions Quantitative Techniques (Finance) 03 Polynomial Functions Felix Chan October 006 Introduction This topic discusses the properties and the applications of polynomial functions, specifically, linear and

More information

Product differences and prices

Product differences and prices Product differences and prices Claude d Aspremont, Jean Jaskold Gabszewicz and Jacques-François Thisse Abstract Under assumptions following Hotelling s 1929 paper Stability in Competition, the possibility

More information

LEARNING IN CONCAVE GAMES

LEARNING IN CONCAVE GAMES LEARNING IN CONCAVE GAMES P. Mertikopoulos French National Center for Scientific Research (CNRS) Laboratoire d Informatique de Grenoble GSBE ETBC seminar Maastricht, October 22, 2015 Motivation and Preliminaries

More information

Inertial Game Dynamics

Inertial Game Dynamics ... Inertial Game Dynamics R. Laraki P. Mertikopoulos CNRS LAMSADE laboratory CNRS LIG laboratory ADGO'13 Playa Blanca, October 15, 2013 ... Motivation Main Idea: use second order tools to derive efficient

More information

6 Evolution of Networks

6 Evolution of Networks last revised: March 2008 WARNING for Soc 376 students: This draft adopts the demography convention for transition matrices (i.e., transitions from column to row). 6 Evolution of Networks 6. Strategic network

More information

Problems on Evolutionary dynamics

Problems on Evolutionary dynamics Problems on Evolutionary dynamics Doctoral Programme in Physics José A. Cuesta Lausanne, June 10 13, 2014 Replication 1. Consider the Galton-Watson process defined by the offspring distribution p 0 =

More information

(x k ) sequence in F, lim x k = x x F. If F : R n R is a function, level sets and sublevel sets of F are any sets of the form (respectively);

(x k ) sequence in F, lim x k = x x F. If F : R n R is a function, level sets and sublevel sets of F are any sets of the form (respectively); STABILITY OF EQUILIBRIA AND LIAPUNOV FUNCTIONS. By topological properties in general we mean qualitative geometric properties (of subsets of R n or of functions in R n ), that is, those that don t depend

More information

1 Basic Game Modelling

1 Basic Game Modelling Max-Planck-Institut für Informatik, Winter 2017 Advanced Topic Course Algorithmic Game Theory, Mechanism Design & Computational Economics Lecturer: CHEUNG, Yun Kuen (Marco) Lecture 1: Basic Game Modelling,

More information

CS364A: Algorithmic Game Theory Lecture #13: Potential Games; A Hierarchy of Equilibria

CS364A: Algorithmic Game Theory Lecture #13: Potential Games; A Hierarchy of Equilibria CS364A: Algorithmic Game Theory Lecture #13: Potential Games; A Hierarchy of Equilibria Tim Roughgarden November 4, 2013 Last lecture we proved that every pure Nash equilibrium of an atomic selfish routing

More information

Lecture Notes on Bargaining

Lecture Notes on Bargaining Lecture Notes on Bargaining Levent Koçkesen 1 Axiomatic Bargaining and Nash Solution 1.1 Preliminaries The axiomatic theory of bargaining originated in a fundamental paper by Nash (1950, Econometrica).

More information

Game theory Lecture 19. Dynamic games. Game theory

Game theory Lecture 19. Dynamic games. Game theory Lecture 9. Dynamic games . Introduction Definition. A dynamic game is a game Γ =< N, x, {U i } n i=, {H i } n i= >, where N = {, 2,..., n} denotes the set of players, x (t) = f (x, u,..., u n, t), x(0)

More information

Question 1. (p p) (x(p, w ) x(p, w)) 0. with strict inequality if x(p, w) x(p, w ).

Question 1. (p p) (x(p, w ) x(p, w)) 0. with strict inequality if x(p, w) x(p, w ). University of California, Davis Date: August 24, 2017 Department of Economics Time: 5 hours Microeconomics Reading Time: 20 minutes PRELIMINARY EXAMINATION FOR THE Ph.D. DEGREE Please answer any three

More information

Weak Dominance and Never Best Responses

Weak Dominance and Never Best Responses Chapter 4 Weak Dominance and Never Best Responses Let us return now to our analysis of an arbitrary strategic game G := (S 1,...,S n, p 1,...,p n ). Let s i, s i be strategies of player i. We say that

More information

SEQUENTIAL EQUILIBRIA IN BAYESIAN GAMES WITH COMMUNICATION. Dino Gerardi and Roger B. Myerson. December 2005

SEQUENTIAL EQUILIBRIA IN BAYESIAN GAMES WITH COMMUNICATION. Dino Gerardi and Roger B. Myerson. December 2005 SEQUENTIAL EQUILIBRIA IN BAYESIAN GAMES WITH COMMUNICATION By Dino Gerardi and Roger B. Myerson December 2005 COWLES FOUNDATION DISCUSSION AER NO. 1542 COWLES FOUNDATION FOR RESEARCH IN ECONOMICS YALE

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Non-zero-sum Game and Nash Equilibarium

Non-zero-sum Game and Nash Equilibarium Non-zero-sum Game and Nash Equilibarium Team nogg December 21, 2016 Overview Prisoner s Dilemma Prisoner s Dilemma: Alice Deny Alice Confess Bob Deny (-1,-1) (-9,0) Bob Confess (0,-9) (-6,-6) Prisoner

More information

Symmetries and the Complexity of Pure Nash Equilibrium

Symmetries and the Complexity of Pure Nash Equilibrium Symmetries and the Complexity of Pure Nash Equilibrium Felix Brandt a Felix Fischer a, Markus Holzer b a Institut für Informatik, Universität München, Oettingenstr. 67, 80538 München, Germany b Institut

More information

The Game of Normal Numbers

The Game of Normal Numbers The Game of Normal Numbers Ehud Lehrer September 4, 2003 Abstract We introduce a two-player game where at each period one player, say, Player 2, chooses a distribution and the other player, Player 1, a

More information

Walras-Bowley Lecture 2003

Walras-Bowley Lecture 2003 Walras-Bowley Lecture 2003 Sergiu Hart This version: September 2004 SERGIU HART c 2004 p. 1 ADAPTIVE HEURISTICS A Little Rationality Goes a Long Way Sergiu Hart Center for Rationality, Dept. of Economics,

More information

Economics 201B Economic Theory (Spring 2017) Bargaining. Topics: the axiomatic approach (OR 15) and the strategic approach (OR 7).

Economics 201B Economic Theory (Spring 2017) Bargaining. Topics: the axiomatic approach (OR 15) and the strategic approach (OR 7). Economics 201B Economic Theory (Spring 2017) Bargaining Topics: the axiomatic approach (OR 15) and the strategic approach (OR 7). The axiomatic approach (OR 15) Nash s (1950) work is the starting point

More information

WHEN ORDER MATTERS FOR ITERATED STRICT DOMINANCE *

WHEN ORDER MATTERS FOR ITERATED STRICT DOMINANCE * WHEN ORDER MATTERS FOR ITERATED STRICT DOMINANCE * Martin Dufwenberg ** & Mark Stegeman *** February 1999 Abstract: We demonstrate that iterated elimination of strictly dominated strategies is an order

More information