Selecting Efficient Correlated Equilibria Through Distributed Learning
Jason R. Marden


Abstract—A learning rule is completely uncoupled if each player's behavior is conditioned only on his own realized payoffs, and does not need to know the actions or payoffs of anyone else. We demonstrate a simple, completely uncoupled learning rule such that, in any finite normal form game with generic payoffs, the players' realized strategies implement a Pareto optimal coarse correlated (Hannan) equilibrium a very high proportion of the time. A variant of the rule implements correlated equilibrium a very high proportion of the time.

This research was supported by AFOSR grants #FA and #FA and by ONR grant #N. J. R. Marden is with the Department of Electrical, Computer, and Energy Engineering, University of Colorado, Boulder, CO 80309; jason.marden@colorado.edu.

I. INTRODUCTION

This paper builds on a recent literature that seeks to identify learning rules that lead to equilibrium without the usual assumptions of perfect rationality and common knowledge. Of particular interest are learning rules that are simple to implement and require a minimal degree of information about what others in the population are doing. Such rules can be viewed as models of behavior in games with many dispersed agents and very limited observability. They also have practical application to the design of distributed control systems, where the agents can be designed to respond to their environment in ways that lead to desirable system-wide outcomes.

One can distinguish between various classes of learning rules depending on the amount of information they require. A rule is uncoupled if it does not require any knowledge of the payoffs of the other players [1]. A rule is completely uncoupled if it does not require any knowledge of the actions or payoffs of the other players [2]. The latter paper identifies a family of completely uncoupled learning rules that come close to Nash equilibrium (pure or mixed) with high probability in two-person normal form games with generic payoffs. Subsequently, [3] showed that similar results hold for n-person normal form games with generic payoffs. Lastly, [4] exhibited a much simpler class of completely uncoupled rules that lead to Nash equilibrium in weakly acyclic games. These learning algorithms all share the feature that agents occasionally experiment with new strategies, which they adopt if they lead to higher realized payoffs. In [5], this approach was further developed by making an agent's search behavior dependent on his mood (an internal state variable). Changes in mood are triggered by changes in realized payoffs relative to the agent's current aspiration level. Rules of this nature can be designed to select pure Nash equilibria in any normal form game with generic payoffs that has at least one pure Nash equilibrium. Moreover, the rule can be designed so that it selects a Pareto optimal pure Nash equilibrium [6], or even a Pareto optimal action profile irrespective of whether this action profile is a pure Nash equilibrium [7].

There is a quite different class of learning dynamics that leads to coarse correlated equilibrium (alternatively, correlated equilibrium). These rules are based on the concept of no-regret. They can be formulated so that they depend only on a player's own realized payoffs, that is, they are completely uncoupled [8]–[10]. However, while the resulting dynamics converge almost surely to the set of correlated equilibria, they do not necessarily converge to, or even approximate, correlated equilibrium behavior at any given point in time.

The contribution of this paper is to demonstrate a class of completely uncoupled learning rules that bridges these two approaches. In overall structure the rules are similar to the learning dynamics introduced in [5]–[7]. Like the no-regret rules, our approach selects (coarse) correlated equilibria instead of Nash equilibria. Unlike no-regret learning, however, our rule leads to equilibrium in the sense that the players' strategies actually constitute a coarse correlated equilibrium a high proportion of the time. In fact, as a bonus, they constitute a Pareto optimal coarse correlated equilibrium a high proportion of the time.

It is important to highlight that there have been great strides in developing polynomial-time algorithms for computing (coarse) correlated equilibria, e.g., [11]–[14]. The starting point of these algorithms is a complete representation of the game. Unfortunately, the applicability of these algorithms to the design of distributed control systems is limited, as such representations are typically not available. Hence, the focus of this paper is on identifying distributed algorithms by which agents can learn to play an efficient (coarse) correlated equilibrium under less stringent informational demands.

II. PRELIMINARIES

Let $G$ be a finite strategic-form game with $n$ agents. The set of agents is denoted by $N := \{1, \dots, n\}$. Each agent $i \in N$ has a finite action set $A_i$ and a utility function $U_i : A \to \mathbb{R}$, where $A = A_1 \times \cdots \times A_n$ denotes the joint action set. We shall henceforth refer to a finite strategic-form game simply as a game.

For any joint distribution $q = \{q_a\}_{a \in A} \in \Delta(A)$, where $\Delta(A)$ denotes the simplex over the joint action set $A$, we extend the definition of an agent's utility function in the usual fashion:
$$U_i(q) = \sum_{a \in A} U_i(a)\, q_a.$$
The set of coarse correlated equilibria can then be characterized as the set of joint distributions
$$\mathrm{CCE} = \Big\{ q \in \Delta(A) \,:\, \sum_{a \in A} U_i(a)\, q_a \;\ge\; \sum_{a \in A} U_i(a_i', a_{-i})\, q_a, \ \ \forall\, i \in N,\ \forall\, a_i' \in A_i \Big\},$$
which is by definition non-empty. In this paper we focus on the derivation of learning rules that provide convergence to an efficient coarse correlated equilibrium of the form
$$q^* \in \arg\max_{q \in \mathrm{CCE}} \sum_{i \in N} U_i(q).$$
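The CCE condition above is a finite system of linear inequalities, so membership of a given joint distribution can be checked directly. The following minimal sketch is our own illustration (not part of the paper); the data layout — a dict of joint-action probabilities and a list of per-agent utility functions — is an assumption made for the example.

```python
def is_coarse_correlated_eq(q, utilities, action_sets, tol=1e-9):
    """Check the CCE inequalities for a joint distribution q over A = A_1 x ... x A_n.

    q           : dict mapping joint-action tuples a to probabilities q_a
    utilities   : list of functions, utilities[i](a) -> payoff U_i(a)
    action_sets : list of per-agent action sets A_i
    """
    for i, A_i in enumerate(action_sets):
        # Expected payoff to agent i under the joint distribution q.
        u_q = sum(p * utilities[i](a) for a, p in q.items())
        # No fixed unilateral deviation a_i' may improve on u_q.
        for a_dev in A_i:
            u_dev = sum(p * utilities[i](a[:i] + (a_dev,) + a[i + 1:])
                        for a, p in q.items())
            if u_dev > u_q + tol:
                return False
    return True

# Usage example: in Matching Pennies the uniform joint distribution is a CCE.
A = [(0, 1), (0, 1)]
U = [lambda a: 1.0 if a[0] == a[1] else 0.0,   # row player wants to match
     lambda a: 0.0 if a[0] == a[1] else 1.0]   # column player wants to mismatch
uniform = {(r, c): 0.25 for r in A[0] for c in A[1]}
assert is_coarse_correlated_eq(uniform, U, A)
```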

To that end, we consider the framework of a repeated one-shot game, where the game $G$ is repeated once each period $t \in \{0, 1, 2, \dots\}$. In period $t$, the agents simultaneously choose actions $a(t) = (a_1(t), \dots, a_n(t))$ and receive payoffs $U_i(a(t))$. Agent $i \in N$ chooses the action $a_i(t)$ according to a probability distribution $p_i(t) \in \Delta(A_i)$, which we refer to as the strategy of agent $i$ at time $t$. We adopt the convention that $p_i^{a_i}(t)$ is the probability that agent $i$ selects action $a_i$ at time $t$ according to the strategy $p_i(t)$. An agent's strategy at time $t$ can rely only on observations from the one-shot games played at times $\{0, 1, 2, \dots, t-1\}$. Different learning algorithms are specified by the agents' available information and the mechanism by which their strategies are updated as information is gathered. Here, we focus on one of the most informationally restrictive classes of learning rules, termed completely uncoupled or payoff-based, where each agent only has access to (i) the action he played and (ii) the payoff he received. More formally, the strategy adjustment mechanism of agent $i$ takes the form
$$p_i(t) = F_i\big( \{a_i(\tau), U_i(a(\tau))\}_{\tau = 0, \dots, t-1} \big). \tag{1}$$

Recent work has shown that for finite games with generic payoffs, there exist completely uncoupled learning rules that lead to Pareto optimal Nash equilibria [6], and also to Pareto optimal action profiles irrespective of whether or not they are pure Nash equilibria [7]; see also [5], [15], [16]. Here, we exhibit a different class of learning procedures that lead to efficient coarse correlated equilibria.

III. ALGORITHM DESCRIPTION

We now introduce a payoff-based learning algorithm which ensures that the agents' collective behavior constitutes, with high probability, a coarse correlated equilibrium that maximizes the sum of the players' average payoffs. In the forthcoming algorithm, each agent commits to playing a sequence of actions, as opposed to just a single action, when faced with a decision. More specifically, the set of possible action sequences for agent $i$ is represented by the set $\bar{\mathcal{A}}_i = \bigcup_{k=1,\dots,w} A_i^k$, where $w$ represents the maximum length of a sequence of actions that any agent will play and $A_i^k$ denotes the set of all action sequences of length $k$ for agent $i$. Accordingly, if agent $i$ commits to playing a sequence of actions $\bar a_i \in \bar{\mathcal{A}}_i$ of length $l_i = |\bar a_i| \le w$ at time $t$, then the resulting sequence of play for agent $i$ is
$$a_i(t) = \bar a_i(1),\quad a_i(t+1) = \bar a_i(2),\quad \dots,\quad a_i(t + l_i - 1) = \bar a_i(l_i).$$

The following algorithm follows the theme of [5], where an agent's search behavior depends on his mood (an internal state variable). Changes in mood are triggered by changes in realized payoffs relative to the agent's current aspiration level.
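For concreteness, here is a minimal sketch (our own illustration, not from the paper) of how the sequence set $\bar{\mathcal{A}}_i$ could be enumerated and how a committed sequence unrolls into play; the function names are ours.

```python
from itertools import product

def sequence_set(A_i, w):
    """Enumerate the action-sequence set of agent i: all sequences over A_i of length 1..w."""
    sequences = []
    for k in range(1, w + 1):
        sequences.extend(product(A_i, repeat=k))  # A_i^k
    return sequences

def rollout(sequence, t):
    """Actions played at times t, t+1, ..., t+len(sequence)-1 when committing to `sequence` at time t."""
    return {t + offset: action for offset, action in enumerate(sequence)}

# Example: with A_i = {0, 1} and w = 2, the sequence set has 2 + 4 = 6 elements.
assert len(sequence_set([0, 1], 2)) == 6
```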

In the following, we provide an informal description of the forthcoming algorithm. We divide the algorithm into two parts, agent dynamics and state dynamics, for a more fluid presentation.

Agent Dynamics: At any given time $t > 0$, the specific action that agent $i$ plays, i.e., $a_i(t)$, is determined solely by the agent's local state variable, which we represent by $x_i(t)$. The details of this state variable are described in the ensuing section. Since the agents commit to playing action sequences, most of the time the specific action played is merely the next component of the committed action sequence. At the end of an action sequence, each agent has the opportunity to revise his strategy and select a new action sequence. Here, each agent has an internal mood (content, discontent, hopeful, or watchful) which governs this process in the following way. First, each agent has a baseline action sequence and a baseline utility. Roughly speaking, each agent presumes that the average utility attained by playing the baseline action sequence will be the baseline utility; when this is true, we say that the baseline action sequence and baseline utility are aligned. When an agent is content, the agent selects his baseline action sequence with high probability, and occasionally experiments with a constant action sequence of the same length as his baseline action sequence. When an agent is discontent, the agent selects an action sequence of arbitrary length at random. When an agent is hopeful or watchful, the agent repeats his baseline action sequence with certainty. Hopeful and watchful are intermediate states that are triggered when the realized average utility does not match the baseline utility; the agent enters an intermediate mode where he waits for a better observation before overreacting.

State Dynamics: At any given time, the state of each agent $i$, i.e., $x_i(t+1)$, is updated using only the previous state $x_i(t)$, the decision of agent $i$ at time $t$, i.e., $a_i(t)$, and the utility of agent $i$ at time $t$, i.e., $U_i(a(t))$. As with the agent dynamics, the key state components change only when an agent has completed a given action sequence. The key component of the state dynamics is how each agent's mood changes as a function of (i) the baseline utility and (ii) the average utility received over the previously played action sequence. Roughly speaking, the process can be described as follows. A player switches from content to discontent for sure if his average utility is below his baseline utility for several periods in a row and he was not experimenting; he may also switch spontaneously from content to discontent with a very small probability even if this is not the case. A player switches from discontent to content with a probability that is an increasing function of his current average payoff, in which case he takes the previous action sequence and its realized average payoff as his new baseline. The details associated with the intermediate states hopeful and watchful are spelled out later; their role will become clear when we give the learning rule in detail.
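As a rough illustration of the content/discontent switching just described, the following sketch (our own; the function names are not from the paper) mirrors the exploration-rate exponents used in the formal rules below: a content agent whose baseline is confirmed becomes discontent spontaneously with probability $\epsilon^{2c}$, while a discontent agent accepts its current sequence and average payoff with probability $\epsilon^{1-u}$, which is increasing in the payoff $u$.

```python
import random

def spontaneous_discontent(epsilon, c):
    """Content agent with confirmed baseline: become discontent with (small) probability epsilon**(2*c)."""
    return random.random() < epsilon ** (2 * c)

def accept_from_discontent(epsilon, avg_payoff):
    """Discontent agent: become content with probability epsilon**(1 - avg_payoff), payoffs in [0, 1)."""
    return random.random() < epsilon ** (1.0 - avg_payoff)
```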

A. Notation

At each point in time, the action of agent $i \in N$ can be represented by the tuple $[\bar a_i, a_i]$, where agent $i$'s action sequence is $\bar a_i \in \bar{\mathcal{A}}_i$ and agent $i$'s current action is $a_i \in \bar a_i$. At each point in time, an agent's state can be represented by the tuple
$$x_i = \big[\, \bar a_i,\ u_i,\ k_i,\ \bar a_i^b,\ u_i^b,\ m_i,\ c_i^{H/W},\ L_i^{H/W} \,\big],$$
where
- $\bar a_i$: trial sequence of actions;
- $u_i$: payoff over the trial sequence of actions;
- $k_i$: element of the trial sequence of actions currently being played;
- $\bar a_i^b$: baseline sequence of actions;
- $u_i^b$: payoff over the baseline sequence of actions;
- $m_i$: mood (content, discontent, hopeful, or watchful);
- $c_i^{H/W}$: counter for the number of times the hopeful/watchful period has been repeated;
- $L_i^{H/W}$: number of times the hopeful/watchful period will be repeated.

The first three components of the state, $\{\bar a_i, u_i, k_i\}$, correspond to the action sequence that is currently being played by agent $i$. The action sequence is represented by $\bar a_i \in \bar{\mathcal{A}}_i$. The counter $k_i \in \{1, \dots, |\bar a_i|\}$ keeps track of which component of $\bar a_i$ the agent should play next. Lastly, the payoff $u_i$ represents the average utility received over the first $(k_i - 1)$ iterations of the action sequence $\bar a_i$.

The fourth and fifth components of the state, $\{\bar a_i^b, u_i^b\}$, correspond to the baseline action sequence and baseline payoff. The baseline action sequence is represented by $\bar a_i^b \in \bar{\mathcal{A}}_i$, and the baseline payoff $u_i^b$ captures the average utility received for the baseline sequence of actions. The baseline payoff is used as a gauge to determine whether experimentation with alternative action sequences is advantageous.

The sixth component of the state is the mood $m_i$, which can take on four values: content (C), discontent (D), hopeful (H), and watchful (W). Each mood leads to a different type of behavior, as will be discussed in detail.

The seventh and eighth components of the state, $\{c_i^{H/W}, L_i^{H/W}\}$, are counters on the number of times that either a hopeful or watchful mood has been repeated. The number $L_i^{H/W} \in \{0\} \cup \{w+1, \dots, w^n + w\}$ prescribes the number of times that the intermediate state (hopeful or watchful) should be repeated, and the number $c_i^{H/W} \in \{0, 1, 2, \dots, w^n + w\}$ records the number of times that the intermediate state has already been repeated. Accordingly, $c_i^{H/W} \le L_i^{H/W}$. In the case when the mood is not hopeful or watchful, we adopt the convention that $c_i^{H/W} = L_i^{H/W} = 0$.
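The state tuple above translates directly into a record type. The following sketch (our own; the field names are ours) is only meant to fix the bookkeeping used in the formal description that follows.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class AgentState:
    trial_seq: Tuple[int, ...]     # current trial sequence of actions (bar a_i)
    trial_payoff: float            # average payoff over the first k_i - 1 steps of the trial sequence (u_i)
    k: int                         # index of the component of the trial sequence to play next (k_i)
    baseline_seq: Tuple[int, ...]  # baseline sequence of actions (bar a_i^b)
    baseline_payoff: float         # baseline payoff (u_i^b)
    mood: str                      # one of "C", "D", "H", "W"
    hw_counter: int = 0            # c_i^{H/W}: repetitions of the hopeful/watchful phase so far
    hw_length: int = 0             # L_i^{H/W}: prescribed number of repetitions of the phase
```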

B. Formal Algorithm Description

We divide the dynamics into two parts: the agent dynamics and the state dynamics. Without loss of generality, we focus on the case where agent utility functions are strictly bounded between 0 and 1, i.e., for any agent $i \in N$ and action profile $a \in A$ we have $0 \le U_i(a) < 1$. Lastly, we define a constant $c > n$ which will be utilized in the following algorithm.

Agent Dynamics: Fix an experimentation rate $\epsilon > 0$. The dynamics for agent $i$ rely only on the state of agent $i$ at the given time. Let $x_i(t) = [\bar a_i, u_i, k_i, \bar a_i^b, u_i^b, m_i, c_i^{H/W}, L_i^{H/W}]$ be the state of agent $i$ at time $t$. Each agent only has the opportunity to change strategies at the beginning of a planning window. Accordingly, if $k_i > 1$ then
$$\bar a_i(t) = \bar a_i, \tag{2}$$
$$a_i(t) = \bar a_i(k_i), \tag{3}$$
where $\bar a_i(k_i)$ denotes the $k_i$-th component of the vector $\bar a_i$. If $k_i = 1$, then the player makes a decision based on his underlying mood:

Content ($m_i = C$): In this state, the agent chooses a sequence of actions $\bar a_i' \in \bar{\mathcal{A}}_i$ according to the probability distribution
$$\Pr\big[\bar a_i(t) = \bar a_i'\big] = \begin{cases} 1 - \epsilon^c & \text{for } \bar a_i' = \bar a_i^b,\\ \epsilon^c / |A_i| & \text{for any } \bar a_i' = (a_i, \dots, a_i) \in A_i^{|\bar a_i^b|} \text{ with } a_i \in A_i, \end{cases} \tag{4}$$
where $|A_i|$ represents the cardinality of the set $A_i$. The action is then chosen as $a_i(t) = \bar a_i(1; t)$, where $\bar a_i(1; t)$ denotes the first component of the vector $\bar a_i(t)$.¹

¹ We could consider variations of the deviations in order to stabilize alternative equilibria, e.g., correlated equilibria. In particular, if (4) focused on conditional deviations as opposed to unconditional deviations, then the forthcoming dynamics would stabilize efficient correlated equilibria as opposed to efficient coarse correlated equilibria.
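A minimal sketch of the content-state selection rule (4) (our own code, under the assumption that the baseline sequence is stored as a Python tuple): with probability $1 - \epsilon^c$ the baseline sequence is replayed, and with probability $\epsilon^c$ a uniformly chosen constant sequence of the same length is tried.

```python
import random

def choose_sequence_when_content(baseline_seq, A_i, epsilon, c):
    """Content-state selection: replay the baseline sequence w.p. 1 - epsilon**c,
    otherwise experiment with a constant sequence (a_i, ..., a_i) of the same length."""
    if random.random() < 1.0 - epsilon ** c:
        return tuple(baseline_seq)
    a = random.choice(list(A_i))  # each constant sequence has probability epsilon**c / |A_i|
    return tuple(a for _ in range(len(baseline_seq)))
```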

Discontent ($m_i = D$): In this state, the agent chooses a sequence of actions $\bar a_i'$ according to the probability distribution
$$\Pr\big[\bar a_i(t) = \bar a_i'\big] = \frac{1}{|\bar{\mathcal{A}}_i|} \quad \text{for every } \bar a_i' \in \bar{\mathcal{A}}_i. \tag{5}$$
Note that the baseline action sequence and baseline utility play no role in the agent dynamics when the agent is discontent. The action is then chosen as $a_i(t) = \bar a_i(1; t)$.

Hopeful ($m_i = H$) or Watchful ($m_i = W$): In either of these states, the agent selects his trial action sequence, i.e.,
$$\bar a_i(t) = \bar a_i, \tag{6}$$
$$a_i(t) = \bar a_i(1; t). \tag{7}$$

Note that the first component of the state vector corresponds to the current trial action sequence; hence, the agent dynamics update only this component of the state vector.

State Dynamics: The majority of the state components change only at the end of a sequence of actions. Let $x_i(t) = [\bar a_i, u_i, k_i, \bar a_i^b, u_i^b, m_i, c_i^{H/W}, L_i^{H/W}]$ be the state of agent $i$ at time $t$, let $\bar a_i = \bar a_i(t)$ be the action sequence played at time $t$, let $a_i(t) = \bar a_i(k_i)$ be the action that agent $i$ played at time $t$, and let $U_i(a(t))$ be the utility player $i$ received at time $t$. If $k_i < |\bar a_i|$, then
$$x_i(t+1) = \Big[\, \bar a_i,\ \tfrac{k_i - 1}{k_i}\, u_i + \tfrac{1}{k_i}\, U_i(a(t)),\ k_i + 1,\ \bar a_i^b,\ u_i^b,\ m_i,\ c_i^{H/W},\ L_i^{H/W} \,\Big].$$
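The mid-sequence update above is simply a running average plus a pointer increment; the following self-contained sketch (our own, with illustrative variable names) shows the computation.

```python
def mid_sequence_update(trial_payoff, k, realized_payoff):
    """Mid-sequence update (k_i < |bar a_i|): fold U_i(a(t)) into the running average and advance k_i."""
    new_trial_payoff = (k - 1) / k * trial_payoff + realized_payoff / k
    return new_trial_payoff, k + 1

# Example: after two steps with payoffs 0.4 and 0.8 the running average is 0.6.
u, k = mid_sequence_update(0.0, 1, 0.4)
u, k = mid_sequence_update(u, k, 0.8)
assert abs(u - 0.6) < 1e-12 and k == 3
```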

Otherwise, if $k_i = |\bar a_i|$, then the state is updated according to the underlying mood as follows. For shorthand notation, we define the running average of the payoff over the trial action sequence as
$$u_i(t) = \frac{k_i - 1}{k_i}\, u_i + \frac{1}{k_i}\, U_i(a(t)).$$

Content ($m_i = C$): If $[\bar a_i, u_i(t)] = [\bar a_i^b, u_i^b]$, the state of agent $i$ is updated as
$$x_i(t+1) = \begin{cases} \big[\bar a_i^b, u_i^b, 1, \bar a_i^b, u_i^b, C, 0, 0\big] & \text{with probability } 1 - \epsilon^{2c},\\ \big[\bar a_i^b, u_i^b, 1, \bar a_i^b, u_i^b, D, 0, 0\big] & \text{with probability } \epsilon^{2c}. \end{cases} \tag{8}$$
If $\bar a_i \neq \bar a_i^b$, the state of agent $i$ is updated as
$$x_i(t+1) = \begin{cases} \big[\bar a_i, u_i(t), 1, \bar a_i, u_i(t), C, 0, 0\big] & \text{if } u_i(t) > u_i^b,\\ \big[\bar a_i^b, u_i^b, 1, \bar a_i^b, u_i^b, C, 0, 0\big] & \text{if } u_i(t) \le u_i^b. \end{cases}$$
If $\bar a_i = \bar a_i^b$ but $u_i(t) \neq u_i^b$, the state of agent $i$ is updated as
$$x_i(t+1) = \begin{cases} \big[\bar a_i^b, u_i^b, 1, \bar a_i^b, u_i^b, H, 1, L_i^H\big] & \text{if } u_i(t) > u_i^b,\\ \big[\bar a_i^b, u_i^b, 1, \bar a_i^b, u_i^b, W, 1, L_i^W\big] & \text{if } u_i(t) < u_i^b, \end{cases}$$
where $L_i^H$ (or $L_i^W$) is selected uniformly at random from the set $\{w+1, \dots, w^n + w\}$.²

² The need for this repetition arises from the fact that the agents could be playing action sequences of distinct lengths. The purpose of this repetition will become clearer during the proof.

Discontent ($m_i = D$): The new state is determined by the transition
$$x_i(t+1) = \begin{cases} \big[\bar a_i, u_i(t), 1, \bar a_i, u_i(t), C, 0, 0\big] & \text{with probability } \epsilon^{1 - u_i(t)},\\ \big[\bar a_i, u_i(t), 1, \bar a_i, u_i(t), D, 0, 0\big] & \text{with probability } 1 - \epsilon^{1 - u_i(t)}. \end{cases}$$

Hopeful ($m_i = H$): First, it is important to highlight that if the mood of any player $i \in N$ is hopeful then $\bar a_i = \bar a_i^b$. The new state is determined as follows. If $c_i^H < L_i^H$, then
$$x_i(t+1) = \big[\bar a_i, u_i(t), 1, \bar a_i, u_i^b, H, c_i^H + 1, L_i^H\big].$$
If $c_i^H = L_i^H$ and $u_i(t) \ge u_i^b$, then
$$x_i(t+1) = \big[\bar a_i, u_i(t), 1, \bar a_i, u_i(t), C, 0, 0\big].$$
If $c_i^H = L_i^H$ and $u_i(t) < u_i^b$, then
$$x_i(t+1) = \big[\bar a_i, u_i(t), 1, \bar a_i, u_i^b, W, 1, L_i^W\big],$$
where $L_i^W$ is selected uniformly at random from the set $\{w+1, \dots, w^n + w\}$.

Watchful ($m_i = W$): First, it is important to highlight that if the mood of any player $i \in N$ is watchful then $\bar a_i = \bar a_i^b$. The new state is determined as follows. If $c_i^W < L_i^W$, then
$$x_i(t+1) = \big[\bar a_i^b, u_i(t), 1, \bar a_i^b, u_i^b, W, c_i^W + 1, L_i^W\big].$$
If $c_i^W = L_i^W$ and $u_i(t) < u_i^b$, then
$$x_i(t+1) = \big[\bar a_i^b, u_i(t), 1, \bar a_i^b, u_i(t), D, 0, 0\big].$$
If $c_i^W = L_i^W$ and $u_i(t) \ge u_i^b$, then
$$x_i(t+1) = \big[\bar a_i^b, u_i(t), 1, \bar a_i^b, u_i^b, H, 1, L_i^H\big],$$
where $L_i^H$ is selected uniformly at random from the set $\{w+1, \dots, w^n + w\}$.

IV. MAIN RESULT

Before stating the main result we introduce a bit of notation. Let $X = \prod_{i \in N} X_i$ denote the full set of states of the players, where $X_i$ is the set of possible states for player $i$. For a given state $x = (x_1, \dots, x_n)$ with $x_i = [\bar a_i, u_i, k_i, \bar a_i^b, u_i^b, m_i, c_i^{H/W}, L_i^{H/W}]$, define the ensuing sequence of baseline actions as follows: for every $k \in \{0, 1, 2, \dots\}$ and agent $i \in N$ we have
$$a_i(k \mid x_i) = \bar a_i^b(k + k_i),$$
where we write $\bar a_i^b(k + k_i)$ even in the case when $k + k_i > |\bar a_i^b|$, with the understanding that this refers to the component $\big((k + k_i - 1) \bmod |\bar a_i^b|\big) + 1$. We express the sequence of joint action profiles by $a(k \mid x) = (a_1(k \mid x_1), \dots, a_n(k \mid x_n))$. Define the average payoff over the forthcoming periods (provided that all players play according to their baseline actions) for any player $i \in N$ and period $l \in \{1, 2, \dots\}$ as
$$u_i(0 \mid x) = \frac{k_i - 1}{|\bar a_i|}\, u_i + \frac{1}{|\bar a_i|} \sum_{k=0}^{|\bar a_i| - k_i} U_i(a(k \mid x)), \tag{9}$$
$$u_i(l \mid x) = \frac{1}{|\bar a_i|} \sum_{k = l|\bar a_i| - k_i + 1}^{(l+1)|\bar a_i| - k_i} U_i(a(k \mid x)). \tag{10}$$

We will characterize the above dynamics by analyzing the empirical distribution of the joint actions. To that end, define the empirical distribution of the joint actions associated with the baseline sequences of actions for a given state $x$ by $q(x) = \{q_a(x)\}_{a \in A} \in \Delta(A)$, where
$$q_a(x) = \lim_{t \to \infty} \frac{\sum_{\tau=0}^{t} \mathbb{I}\{a = a(\tau \mid x)\}}{t + 1} \tag{11}$$
$$= \frac{\sum_{\tau=1}^{\prod_{i \in N} |\bar a_i|} \mathbb{I}\{a = a(\tau \mid x)\}}{\prod_{i \in N} |\bar a_i|}, \tag{12}$$
where $\mathbb{I}\{\cdot\}$ represents the usual indicator function and the equality derives from the fact that players are repeating finite sequences of actions, which ensures that for any $k \in \{0, 1, \dots\}$ we have
$$a(k \mid x) = a\Big(k + \prod_{i \in N} |\bar a_i| \;\Big|\; x\Big). \tag{13}$$
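Because every agent repeats its baseline sequence, the limit in (11) reduces to a single pass over $\prod_{i \in N} |\bar a_i|$ joint steps, as in (12). The following sketch is our own illustration, assuming each baseline sequence is given as a tuple and that the pointers start at $k_i = 1$.

```python
from collections import Counter
from math import prod

def empirical_distribution(baseline_seqs):
    """Empirical distribution q(x) of joint actions under cyclic repetition of the baseline sequences.

    baseline_seqs: list of per-agent action sequences (tuples), played cyclically and in lockstep.
    """
    cycle = prod(len(seq) for seq in baseline_seqs)  # joint play repeats after this many steps
    counts = Counter(
        tuple(seq[t % len(seq)] for seq in baseline_seqs) for t in range(cycle)
    )
    return {a: c / cycle for a, c in counts.items()}

# Example: two agents with baseline sequences (0, 1) and (0, 0, 1).
q = empirical_distribution([(0, 1), (0, 0, 1)])
assert abs(sum(q.values()) - 1.0) < 1e-12
```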

Define the set of states which induce coarse correlated equilibria through repeated play of the baseline sequences of actions as
$$X_{\mathrm{CCE}} := \{x \in X : q(x) \in \mathrm{CCE}\}.$$
Lastly, define the set of states $X^*$ which induce coarse correlated equilibria and are aligned, i.e.,
$$X^* = \big\{x \in X : x \in X_{\mathrm{CCE}},\ u_i(0 \mid x) = u_i(k \mid x)\ \ \forall\, i \in N,\ k \in \{1, 2, \dots\}\big\}.$$
Note that in general the set $X^*$ may be empty. In fact, a sufficient condition for $X^*$ to be non-empty is
$$\Big\{ q \in \Delta(A) : q_a \in \bigcup_{k=1,\dots,w} \big\{0, \tfrac{1}{k}, \dots, \tfrac{k-1}{k}, 1\big\} \ \text{for all } a \in A \Big\} \cap \mathrm{CCE} \neq \emptyset.$$

The process described above can be characterized as a finite Markov chain parameterized by an exploration rate $\epsilon > 0$. The following theorem characterizes the support of the limiting stationary distribution, whose elements are referred to as the stochastically stable states [17]. More precisely, a state $x \in X$ is stochastically stable if and only if $\lim_{\epsilon \to 0^+} \mu(x, \epsilon) > 0$, where $\mu(x, \epsilon)$ is a stationary distribution of the process $P^\epsilon$ for a fixed $\epsilon > 0$. Our characterization requires a mild degree of genericity in the agents' utility functions, which is summarized by the following notion of interdependence, introduced in [5].

Definition 1 (Interdependence). An $n$-person game $G$ on the finite action space $A$ is interdependent if, for every $a \in A$ and every proper subset of agents $J \subset N$, there exists an agent $i \notin J$ and a choice of actions $a_J' \in \prod_{j \in J} A_j$ such that $U_i(a_J', a_{-J}) \neq U_i(a_J, a_{-J})$.
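Definition 1 can be verified by brute force on small games. The following sketch is our own illustration (exponential in the number of agents, intended only for toy examples); the data layout is the same as in the earlier CCE sketch.

```python
from itertools import combinations, product

def is_interdependent(action_sets, utilities, tol=1e-12):
    """Brute-force check of Definition 1 (interdependence). For toy games only.

    action_sets: list of per-agent action sets A_i
    utilities  : list of functions, utilities[i](a) -> U_i(a)
    """
    n = len(action_sets)
    agents = list(range(n))
    proper_subsets = [J for r in range(1, n) for J in combinations(agents, r)]
    for a in product(*action_sets):
        for J in proper_subsets:
            # Look for an agent i outside J whose payoff changes for some choice of actions by J.
            affected = False
            for i in set(agents) - set(J):
                for a_J in product(*(action_sets[j] for j in J)):
                    b = list(a)
                    for j, aj in zip(J, a_J):
                        b[j] = aj
                    if abs(utilities[i](tuple(b)) - utilities[i](a)) > tol:
                        affected = True
                        break
                if affected:
                    break
            if not affected:
                return False  # no agent outside J is influenced by J at profile a
    return True
```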

Theorem 1. Let $G$ be a finite interdependent game and suppose all players follow the above dynamics. If $X^* \neq \emptyset$, then a state $x \in X$ is stochastically stable if and only if $x \in X^*$ and
$$\sum_{i \in N} u_i(0 \mid x) = \max_{x' \in X^*} \sum_{i \in N} u_i(0 \mid x').$$
If $X^* = \emptyset$, then a state $x \in X$ is stochastically stable if and only if
$$\sum_{i \in N} u_i(0 \mid x) = \max_{a \in A} \sum_{i \in N} U_i(a).$$

This theorem demonstrates that, as the exploration rate $\epsilon \to 0^+$, the process spends most of the time at the efficient coarse correlated equilibrium provided that the (discretized) set of coarse correlated equilibria is nonempty. If this set is empty, then the process spends most of the time at the action profile which maximizes the sum of the agents' payoffs. We prove this theorem using the theory of resistance trees for regular perturbed processes developed in [18]. We provide a brief review of the theory of resistance trees in the Appendix; for a detailed review, we direct the reader to [18].

V. PROOF OF THEOREM 1

Let $X_i$ denote the set of admissible states for agent $i$. The above dynamics induce a Markov process over the finite state space $X = \prod_{i \in N} X_i$. We shall denote the transition probability matrix by $P^\epsilon$ for each $\epsilon > 0$. Computing the stationary distribution of this process is challenging because of the large number of states and the fact that the underlying process is not reversible. Accordingly, we focus on characterizing the support of the limiting stationary distribution, whose elements are referred to as the stochastically stable states [17]. More precisely, a state $x \in X$ is stochastically stable if and only if $\lim_{\epsilon \to 0^+} \mu(x, \epsilon) > 0$, where $\mu(x, \epsilon)$ is a stationary distribution of the process $P^\epsilon$ for a fixed $\epsilon > 0$.

The proof of the above theorem encompasses two major parts. The first part involves characterizing the recurrence classes of the unperturbed process, i.e., the process induced by $\epsilon = 0$. The importance of this first part centers on the fact that the stochastically stable states are contained in the recurrence classes of the unperturbed process. The second part involves characterizing the limiting behavior of the process using the theory of resistance trees for regular perturbed processes [18]. In particular, the theory of resistance trees provides a tool for evaluating which of the recurrence classes are stochastically stable.

A. Part #1: Characterizing the recurrence classes of the unperturbed process

The following lemma characterizes the recurrence classes of the unperturbed process. We prove this lemma through a series of claims, presented after the lemma.

Lemma 2. A state $x = (x_1, \dots, x_n)$ is in a recurrence class of the unperturbed process $P^0$ if and only if the state $x$ is in one of the following two forms:

Form #1: The state of every agent $i \in N$ is of the form
$$x_i = \big[\bar a_i, u_i, k_i, \bar a_i, u_i^b, C, 0, 0\big],$$
where $\bar a_i \in \bar{\mathcal{A}}_i$, $k_i \in \{1, \dots, |\bar a_i|\}$, and $u_i^b = u_i(l \mid x)$ for every $l \in \{0, 1, 2, \dots\}$.

Form #2: The state of every agent $i \in N$ is of the form
$$x_i = \big[\bar a_i, u_i, k_i, \bar a_i, u_i^b, D, 0, 0\big],$$
where $\bar a_i \in \bar{\mathcal{A}}_i$ and $k_i \in \{1, \dots, |\bar a_i|\}$.

We begin by showing that any state of the above forms is in fact contained in a recurrence class of the unperturbed process. With that goal in mind, let $C^0$ represent all states of Form #1 and $D^0$ represent all states of Form #2. First, the set of states $D^0$ constitutes a single recurrence class of the unperturbed process, since the probability of transitioning between any two states $x^1, x^2 \in D^0$ is $O(1)$ and, when $\epsilon = 0$, there is no possibility of exiting from $D^0$.³

³ We use the notation $O(1)$ to denote probabilities that are on the order of 1, i.e., probabilities that are bounded away from 0.

Second, for any state $x \in C^0$, all components of the state remain constant for all future times except for the counters $\{k_i\}_{i \in N}$. This is a result of the third condition of Form #1, which ensures that the payoff associated with all future periods (where we use the term period to describe the entire sequence of actions) is identical to the baseline payoff. Since players are repeating action sequences of finite length, the process returns to the same counters $\{k_i\}_{i \in N}$ in exactly $\prod_{i \in N} |\bar a_i|$ iterations. Hence, $x$ belongs to a recurrence class of the unperturbed process.

We now show, through a series of claims, that any state not of the above forms is not in a recurrence class of the unperturbed process. The first claim shows that in any recurrence class there must be an equivalence between the baseline action sequence and the trial action sequence.

Claim 3. If a state $x = (x_1, \dots, x_n)$ is in a recurrence class of the unperturbed process $P^0$, then for every player $i \in N$ the baseline action sequence and the trial action sequence must be identical, i.e., the state $x_i$ is of the form
$$x_i = \big[\bar a_i, u_i, k_i, \bar a_i, u_i^b, m_i, c_i^{H/W}, n_i^{H/W}\big],$$
where $m_i \in \{C, D, H, W\}$.

Proof: According to the specified dynamics, we know that if $\bar a_i \neq \bar a_i^b$ then the agent must be content, i.e., the state is of the form $x_i = [\bar a_i, u_i, k_i, \bar a_i^b, u_i^b, C, 0, 0]$, where $k_i \le |\bar a_i|$. For notational simplicity, let $l_i = |\bar a_i|$ be the number of actions in the trial sequence of agent $i$. Given this state, the actions of player $i$ over the next $l_i - k_i + 1$ iterations will be
$$a_i(1) = \bar a_i(k_i),\quad a_i(2) = \bar a_i(k_i + 1),\quad \dots,\quad a_i(l_i - k_i + 1) = \bar a_i(l_i)$$
with probability 1. Let $a_{-i}(1), a_{-i}(2), \dots$ denote the ensuing sequence of actions chosen by the other players $j \neq i$ according to the unperturbed process. Define the running average payoff of player $i$ over the next $l_i - k_i + 1$ iterations as
$$u_i(1) = \frac{k_i - 1}{k_i}\, u_i + \frac{1}{k_i}\, U_i(a(1)),$$
$$u_i(2) = \frac{k_i}{k_i + 1}\, u_i(1) + \frac{1}{k_i + 1}\, U_i(a(2)),$$
$$\vdots$$
$$u_i(l_i - k_i + 1) = \Big(\frac{l_i - 1}{l_i}\Big) u_i(l_i - k_i) + \Big(\frac{1}{l_i}\Big) U_i(a(l_i - k_i + 1)).$$
The state of player $i$ evolves over the next $l_i - k_i$ iterations according to
$$x_i \to x_i(1) \to x_i(2) \to \dots \to x_i(l_i - k_i),$$
where for every $k \in \{1, \dots, l_i - k_i\}$ we have
$$x_i(k) = [\bar a_i, u_i(k), k_i + k, \bar a_i^b, u_i^b, C, 0, 0].$$
The ensuing state resulting from the transition $x_i(l_i - k_i) \to x_i(l_i - k_i + 1)$ is then of the form
$$x_i(l_i - k_i + 1) = \begin{cases} \big[\bar a_i, u_i(l_i - k_i + 1), 1, \bar a_i, u_i(l_i - k_i + 1), C, 0, 0\big] & \text{if } u_i(l_i - k_i + 1) > u_i^b,\\ \big[\bar a_i^b, u_i^b, 1, \bar a_i^b, u_i^b, C, 0, 0\big] & \text{if } u_i(l_i - k_i + 1) \le u_i^b. \end{cases}$$

Hence, irrespective of the play of the other players $j \neq i$, player $i$ returns to a content state with $\bar a_i = \bar a_i^b$ within $l_i$ periods with probability 1. Furthermore, when $\epsilon = 0$ a player never experiments in a content state; hence, $\bar a_i = \bar a_i^b$ for all future time periods. This completes the proof.

The following claim shows that the average payoff received over all subsequent periods must be the same for any player in any recurrence class of the unperturbed process.

Claim 4. If a state $x = (x_1, \dots, x_n)$ is in a recurrence class of the unperturbed process, then for every agent $i \in N$ with $m_i \in \{C, H, W\}$ we have $u_i(0 \mid x) = u_i(l \mid x)$ for every $l \in \{1, 2, \dots\}$.

Proof: Suppose the state $x$ is of the form depicted in Claim 3. We focus on the case where every player continues to play according to his baseline action sequence (regardless of his mood), which occurs with probability $O(1)$. Let $x'_i$ be the state of agent $i$ after $w$ time steps of the players playing according to their baseline actions. The state of agent $i$ after $w$ time steps is of one of the following four forms:
$$x'_i = \begin{cases} \big[\bar a_i, u_i, k_i, \bar a_i, u_i^b, C, 0, 0\big],\\ \big[\bar a_i, u_i, k_i, \bar a_i, u_i^b, H, c_i^H, n_i^H\big],\\ \big[\bar a_i, u_i, k_i, \bar a_i, u_i^b, W, c_i^W, n_i^W\big],\\ \big[\bar a_i, u_i, k_i, \bar a_i, u_i^b, D, 0, 0\big]. \end{cases}$$
We refer to $x'$ as the current state and refer to these four forms as content, hopeful, watchful, and discontent, respectively. Because the players are repeating action sequences, combining (10) and (13) gives
$$u_i(l \mid x') = u_i\Big(\Big(l + \prod_{j \in N} |\bar a_j|\Big) \,\Big|\, x'\Big)$$
for any $l \in \{0, 1, 2, \dots\}$. Note that waiting the initial $w$ periods was essential for ensuring the above equality for $l = 0$. For the forthcoming proof, we focus on the state at the end of each period, and we represent the state of agent $i$ at the end of the $l$-th period by $x_i(l)$.

We prove the claim by contradiction. In particular, if the utilities are not of the form specified by the claim, then we specify a sequence of transitions, all of which occur with probability $O(1)$ in the unperturbed process, that leads to agent $i$ becoming discontent. Once an agent becomes discontent, the agent remains discontent for all future times in the unperturbed process, which completes the proof.

First, if agent $i$ is discontent in the state $x'$ then we are done. Accordingly, we analyze each of the remaining possible moods of agent $i$ separately below.

Case #1: Content. We start with the case where agent $i$ is content in the state $x'$, and suppose $u_i(l_0 \mid x') \neq u_i(l_1 \mid x')$ for some $l_0, l_1 \in \{0, 1, 2, \dots\}$. Let $l_0^* \in \{0, 1, 2, \dots\}$ be the first period where $u_i(l_0^* \mid x') \neq u_i(l_0^* - 1 \mid x')$, where we set $u_i(-1 \mid x') = u_i^b$. Player $i$ remains content to the end of the $l_0^*$-th period, at which time the player transitions to hopeful if $u_i(l_0^* \mid x') > u_i^b$ or watchful if $u_i(l_0^* \mid x') < u_i^b$.

Suppose that $u_i(l_0^* \mid x') < u_i^b$. In this case, there is a probability $O(1)$ that the state of agent $i$ at the end of the $l_0^*$-th period will be
$$x_i(l_0^*) = \big[\bar a_i, u_i(l_0^* \mid x'), 1, \bar a_i, u_i^b, W, 1, n_i^W\big],$$
where $n_i^W \in \{n_i \in \{w+1, \dots, w^n + w\} : n_i \bmod \prod_{j \neq i} |\bar a_j| = 0\}$. Note that this set is not empty since $\prod_{j \neq i} |\bar a_j| < w^n$. Furthermore, note that $u_i(l_0^* + n_i^W \mid x') = u_i(l_0^* \mid x')$. Conditioned on this event, the state of agent $i$ remains watchful until the end of the $n_i^W$-th period, at which point it transitions to
$$x_i(l_0^* + n_i^W) = \big[\bar a_i, u_i(l_0^* \mid x'), 1, \bar a_i, u_i(l_0^* \mid x'), D, 0, 0\big],$$
which completes the proof.

Now suppose that $u_i(l_0^* \mid x') > u_i^b$. In this case, there is a probability $O(1)$ that the state of agent $i$ at the end of the $l_0^*$-th period will be
$$x_i(l_0^*) = \big[\bar a_i, u_i(l_0^* \mid x'), 1, \bar a_i, u_i^b, H, 1, n_i^H\big],$$
where $n_i^H \in \{n_i \in \{w+1, \dots, w^n + w\} : n_i \bmod \prod_{j \neq i} |\bar a_j| = 0\}$. Conditioned on this event, the state of agent $i$ remains hopeful until the end of the $n_i^H$-th period, at which point it transitions to
$$x_i(l_0^* + n_i^H) = \big[\bar a_i, u_i(l_0^* \mid x'), 1, \bar a_i, u_i(l_0^* \mid x'), C, 0, 0\big].$$
Conditioned on this event, let $l_1^* \in \{1, 2, \dots\}$ be the first period where $u_i(l_0^* + l_1^* \mid x') \neq u_i(l_0^* + l_1^* - 1 \mid x')$. Player $i$ remains content to the end of the $(l_0^* + l_1^*)$-th period, at which time he transitions to hopeful if $u_i(l_0^* + l_1^* \mid x') > u_i(l_0^* \mid x')$ or watchful if $u_i(l_0^* + l_1^* \mid x') < u_i(l_0^* \mid x')$. If $u_i(l_0^* + l_1^* \mid x') < u_i(l_0^* \mid x')$, then we can follow the first process depicted above, which results in the agent becoming discontent, and we are done. Otherwise, if $u_i(l_0^* + l_1^* \mid x') > u_i(l_0^* \mid x')$, then we can follow the second process depicted above, which results in the agent becoming content with a baseline payoff $u_i(l_0^* + l_1^* \mid x') > u_i(l_0^* \mid x') > u_i^b$.

Repeat the process depicted above. Note that an agent can only transition to hopeful a finite number of times, fewer than $\prod_{j \neq i} |\bar a_j| \le w^n$, before the agent transitions to watchful. Since this process happens with probability $O(1)$, the mood of agent $i$ will eventually transition to $D$. This completes the proof of this case.

Case #2: Hopeful. Next, we focus on the case where agent $i$ is hopeful in the state $x'$, and suppose $u_i(l_0 \mid x') \neq u_i(l_1 \mid x')$ for some $l_0, l_1 \in \{0, 1, 2, \dots\}$. In this case, agent $i$ remains hopeful to the end of the $(n_i^H - c_i^H)$-th period, at which point agent $i$ transitions to content or watchful depending on how $u_i(n_i^H - c_i^H \mid x')$ compares to $u_i^b$. If $u_i(n_i^H - c_i^H \mid x') \ge u_i^b$, then the state of agent $i$ at the end of the $(n_i^H - c_i^H)$-th period is
$$x_i(n_i^H - c_i^H) = \big[\bar a_i, u_i(n_i^H - c_i^H \mid x'), 1, \bar a_i, u_i(n_i^H - c_i^H \mid x'), C, 0, 0\big].$$
However, note that if the agent is in a state of this form, then it matches the form analyzed in Case #1; hence, with probability $O(1)$ this agent transitions to discontent and we are done. Alternatively, if $u_i(n_i^H - c_i^H \mid x') < u_i^b$, then the state of agent $i$ at the end of the $(n_i^H - c_i^H)$-th period will be
$$x_i(n_i^H - c_i^H) = \big[\bar a_i, u_i(n_i^H - c_i^H \mid x'), 1, \bar a_i, u_i^b, W, 1, n_i^W\big],$$
where $n_i^W \in \{n_i \in \{w+1, \dots, w^n + w\} : n_i \bmod \prod_{j \neq i} |\bar a_j| = 0\}$ with probability $O(1)$. Conditioned on this event, the agent remains watchful for an additional $n_i^W$ periods, at which point the state of the agent will be
$$x_i(n_i^W + n_i^H - c_i^H) = \big[\bar a_i, u_i(n_i^W + n_i^H - c_i^H \mid x'), 1, \bar a_i, u_i(n_i^W + n_i^H - c_i^H \mid x'), D, 0, 0\big]$$
and we are done. This results from the fact that $u_i(n_i^W + n_i^H - c_i^H \mid x') = u_i(n_i^H - c_i^H \mid x') < u_i^b$.

Case #3: Watchful. Lastly, we focus on the case where agent $i$ is watchful in the state $x'$. In this case, agent $i$ remains watchful to the end of the $(n_i^W - c_i^W)$-th period, at which point agent $i$ transitions to hopeful or discontent depending on how $u_i(n_i^W - c_i^W \mid x')$ compares to $u_i^b$. If $u_i(n_i^W - c_i^W \mid x') < u_i^b$, then the state of agent $i$ at the end of the $(n_i^W - c_i^W)$-th period will be
$$x_i(n_i^W - c_i^W) = \big[\bar a_i, u_i(n_i^W - c_i^W \mid x'), 1, \bar a_i, u_i(n_i^W - c_i^W \mid x'), D, 0, 0\big]$$
and we are done. Alternatively, suppose $u_i(n_i^W - c_i^W \mid x') \ge u_i^b$. In this case, the state of agent $i$ at the end of the $(n_i^W - c_i^W)$-th period will be
$$x_i(n_i^W - c_i^W) = \big[\bar a_i, u_i(n_i^W - c_i^W \mid x'), 1, \bar a_i, u_i^b, H, 1, n_i^H\big],$$

where $n_i^H \in \{n_i \in \{w+1, \dots, w^n + w\} : n_i \bmod \prod_{j \neq i} |\bar a_j| = 0\}$ with probability $O(1)$. Conditioned on this event, the agent remains hopeful for an additional $n_i^H$ periods, at which point the state of the agent will be
$$x_i(n_i^H + n_i^W - c_i^W) = \big[\bar a_i, u_i(n_i^H + n_i^W - c_i^W \mid x'), 1, \bar a_i, u_i(n_i^H + n_i^W - c_i^W \mid x'), C, 0, 0\big].$$
However, the agent in this state now matches the form analyzed in Case #1; hence, with probability $O(1)$ this agent transitions to discontent and we are done. This completes the proof.

The next claim shows that in any recurrence class, if one agent is discontent, then all agents must be discontent.

Claim 5. If a state $x = (x_1, \dots, x_n)$ is in a recurrence class of the unperturbed process $P^0$ and $m_i = D$ for some agent $i \in N$, then $m_j = D$ for every agent $j \in N$.

Proof: Suppose the state $x$ is of the form depicted in Claims 3 and 4. We focus on the case where every player plays according to his baseline action sequence, which occurs with probability $O(1)$. As in the proof of Claim 4, let $x'$ be the state after $w$ time steps of the players playing according to their baseline actions. The state of each agent $i \in N$ after $w$ time steps is of one of the following four forms:
$$x'_i = \begin{cases} \big[\bar a_i, u_i, k_i, \bar a_i, u_i^b, C, 0, 0\big],\\ \big[\bar a_i, u_i, k_i, \bar a_i, u_i^b, H, c_i^H, n_i^H\big],\\ \big[\bar a_i, u_i, k_i, \bar a_i, u_i^b, W, c_i^W, n_i^W\big],\\ \big[\bar a_i, u_i, k_i, \bar a_i, u_i^b, D, 0, 0\big]. \end{cases}$$
We refer to $x'$ as the current state, i.e., the state at time 0, and refer to these four forms as content, hopeful, watchful, and discontent, respectively.

Let $S \subset N$ denote the subset of players that are discontent given the state $x'$, i.e., $m_i = D$ for all agents $i \in S$ and $m_j \neq D$ for all agents $j \notin S$. If $S = N$ then we are done. Otherwise, let $\tilde a \in \{a(0 \mid x'), a(1 \mid x'), \dots, a(w^n + w \mid x')\}$ be any ensuing action profile. By our interdependence condition, there exists a player $j \notin S$ such that $U_j(\tilde a) \neq U_j(a'_S, \tilde a_{-S})$ for some action profile $a'_S \in \prod_{i \in S} A_i$, where $\tilde a_{-S} = \{\tilde a_j : j \notin S\}$. Suppose all players play according to their baseline actions, which happens with probability $O(1)$.

As in the previous claims, we focus on analyzing the state of agent $j$ at the end of each period. We denote the state of agent $j$ at the end of the $l$-th period by $x_j(l)$.

We begin by showing that if agent $j$ is either hopeful or watchful, the agent transitions to being either content or discontent within $2w^n$ periods with probability $O(1)$. We start with the case where agent $j$ is hopeful, i.e., the state of agent $j$ is of the form
$$x'_j = \big[\bar a_j, u_j, k_j, \bar a_j, u_j^b, H, c_j^H, n_j^H\big],$$
where $0 < c_j^H \le n_j^H$. The mood of agent $j$ continues to be hopeful until the end of the $n_j^H$-th period, which yields a payoff of $u_j^H = u_j(n_j^H - c_j^H \mid x')$. If $u_j^H \ge u_j^b$, then at the end of the $n_j^H$-th period the state of agent $j$ transitions to
$$x_j(n_j^H - c_j^H) = \big[\bar a_j, u_j^H, 1, \bar a_j, u_j^H, C, 0, 0\big]$$
and we are done. Otherwise, if $u_j^H < u_j^b$, then at the end of the $n_j^H$-th period the state of agent $j$ transitions to
$$x_j(n_j^H - c_j^H) = \big[\bar a_j, u_j^H, 1, \bar a_j, u_j^b, W, 1, n_j^W\big]$$
with probability $O(1)$, where $n_j^W \in \{l \in \{w+1, \dots, w^n + w\} : l \bmod \prod_{i \neq j} |\bar a_i| = 0\}$. Note that this set is nonempty since $\prod_{i \neq j} |\bar a_i| \le w^n$. Conditioned on this event, we know that $u_j(n_j^W + n_j^H - c_j^H \mid x') = u_j^H < u_j^b$; hence, at the end of the $(n_j^W + n_j^H)$-th period the state of agent $j$ transitions to
$$x_j(n_j^W + n_j^H - c_j^H) = \big[\bar a_j, u_j^H, 1, \bar a_j, u_j^H, D, 0, 0\big].$$
Similar arguments show that if agent $j$ was initially watchful, then the agent transitions to either content or discontent within the same number of periods (at most $2w^n$) with probability $O(1)$.

We complete the proof by focusing on the case where agent $j$ is content or discontent, i.e., the state of agent $j$ is of the form
$$x'_j = \big[\bar a_j, u_j, 1, \bar a_j, u_j^b, C, 0, 0\big] \quad \text{or} \quad x'_j = \big[\bar a_j, u_j, 1, \bar a_j, u_j^b, D, 0, 0\big].$$
If agent $j$ is discontent, then we can repeat the argument above for a new agent $j'$ which satisfies the interdependence condition. Otherwise, suppose agent $j$ is content.

If agent $j$'s payoffs are not aligned, i.e., $u_j(l \mid x') \neq u_j(0 \mid x')$ for some $l \in \{0, 1, \dots\}$, then we can follow the arguments in the proof of Claim 4, which show that agent $j$ becomes discontent with probability $O(1)$. Now, suppose that agent $j$'s payoffs are aligned, i.e., $u_j(l \mid x') = u_j(0 \mid x')$ for every $l \in \{0, 1, \dots\}$. Consider the ensuing sequence of actions where for each $k \in \{0, 1, \dots\}$ we have
$$\tilde a(k \mid x') = \begin{cases} a(k \mid x') & \text{if } a(k \mid x') \neq \tilde a,\\ (a'_S, \tilde a_{-S}) & \text{if } a(k \mid x') = \tilde a. \end{cases}$$
Note that such a sequence of actions will be played with probability $O(1)$. Define $\tilde u_j(\cdot)$ in the same fashion as $u_j(\cdot)$, with the sole exception of using $\tilde a(\cdot \mid x')$ as opposed to $a(\cdot \mid x')$.

Suppose $U_j(a'_S, \tilde a_{-S}) < U_j(\tilde a)$, which in turn guarantees that $\tilde u_j(l \mid x') \le u_j(l \mid x') = u_j^b$ for every $l \in \{0, 1, \dots\}$. Let $l^* \in \{0, 1, \dots, w^n - 1\}$ denote the first time at which $\tilde u_j(l^* \mid x') < u_j(l^* \mid x')$. Player $j$ remains content to the end of the $l^*$-th period, at which time the player transitions to
$$x_j(l^*) = \big[\bar a_j, \tilde u_j(l^* \mid x'), 1, \bar a_j, u_j^b, W, 1, n_j^W\big]$$
with probability $O(1)$, where $n_j^W \in \{l \in \{w+1, \dots, w^n + w\} : l \bmod \prod_{i \neq j} |\bar a_i| = 0\}$. Conditioned on this event, the state of agent $j$ after $n_j^W$ additional periods will be
$$x_j(l^* + n_j^W) = \big[\bar a_j, \tilde u_j(l^* \mid x'), 1, \bar a_j, \tilde u_j(l^* \mid x'), D, 0, 0\big]$$
and we are done.

Alternatively, suppose $U_j(a'_S, \tilde a_{-S}) > U_j(\tilde a)$, which in turn guarantees that $\tilde u_j(l \mid x') \ge u_j^b$ for every $l \in \{0, 1, \dots\}$. Player $j$ remains content to the end of the $l^*$-th period (defined analogously as the first period at which $\tilde u_j(l^* \mid x') > u_j(l^* \mid x')$), at which time the player transitions to
$$x_j(l^*) = \big[\bar a_j, \tilde u_j(l^* \mid x'), 1, \bar a_j, u_j^b, H, 1, n_j^H\big]$$
with probability $O(1)$, where $n_j^H \in \{l \in \{w+1, \dots, w^n + w\} : l \bmod \prod_{i \neq j} |\bar a_i| = 0\}$. Conditioned on this event, the state of agent $j$ after $n_j^H$ additional periods will be
$$x_j(l^* + n_j^H) = \big[\bar a_j, \tilde u_j', 1, \bar a_j, \tilde u_j', C, 0, 0\big],$$
where $\tilde u_j' = \tilde u_j(l^* + n_j^H \mid x')$. Conditioned on this event, consider the case where the agents play according to $a(\cdot \mid x')$ as opposed to $\tilde a(\cdot \mid x')$ for all future times, and let $\tilde u_j(\cdot)$ reflect this change. Such a sequence will be played with probability $O(1)$. Note that this situation is precisely the situation highlighted above; hence, the highlighted procedure demonstrates that agent $j$ will transition to discontent with probability $O(1)$. This completes the proof.

The following claim proves that either all agents must be content or all agents must be discontent in any recurrence class of the unperturbed process.

Claim 6. If a state $x = (x_1, \dots, x_n)$ is in a recurrence class of the unperturbed process $P^0$, then (i) $m_i = C$ for every agent $i \in N$, or (ii) $m_i = D$ for every agent $i \in N$.

Proof: Suppose the state $x$ is of the form depicted in Claims 3 and 4. We focus on the case where every player plays according to his baseline action sequence, which occurs with probability $O(1)$. As in the proof of Claim 5, let $x' = (x'_1, \dots, x'_n)$ be the state after $w$ time steps of the players playing according to their baseline actions. First note that if $m_i = D$ for any agent $i \in N$, then by Claim 5 we know that $m_j = D$ for every agent $j \in N$, and we are done. Alternatively, suppose that $m_i \in \{C, H, W\}$ for every agent $i \in N$. By Claim 4, we know that since $x$ and $x'$ are both in a recurrence class of the unperturbed process, for every agent $i \in N$ and every pair of periods $l_1, l_2 \in \{0, 1, 2, \dots\}$ we have $u_i' := u_i(l_1 \mid x') = u_i(l_2 \mid x')$.

Suppose $m_i = H$ for some agent $i \in N$. Accordingly, the state of agent $i$ is of the form
$$x'_i = \big[\bar a_i, u_i, k_i, \bar a_i, u_i^b, H, c_i^H, n_i^H\big].$$
The mood of agent $i$ continues to be hopeful until the end of the $n_i^H$-th period, which yields a payoff of $u_i'$. If $u_i' \ge u_i^b$, then at the end of the $n_i^H$-th period the state of agent $i$ transitions to
$$x_i(n_i^H - c_i^H) = \big[\bar a_i, u_i', 1, \bar a_i, u_i', C, 0, 0\big].$$
Note that in this case the agent remains content for all future times, and we are done. Otherwise, if $u_i' < u_i^b$, then at the end of the $n_i^H$-th period the state of agent $i$ transitions to
$$x_i(n_i^H - c_i^H) = \big[\bar a_i, u_i', 1, \bar a_i, u_i^b, W, 1, n_i^W\big],$$
where $n_i^W \in \{w+1, \dots, w^n + w\}$. Since $u_i' = u_i(l \mid x')$ for any $l \in \{0, 1, \dots\}$, at the end of the $(n_i^W + n_i^H)$-th period the state of agent $i$ transitions to
$$x_i(n_i^H + n_i^W - c_i^H) = \big[\bar a_i, u_i', 1, \bar a_i, u_i', D, 0, 0\big].$$
Hence, an agent cannot be hopeful in a recurrence class of the unperturbed process.

Suppose $m_i = W$ for some agent $i \in N$. Accordingly, the state of agent $i$ is of the form
$$x'_i = \big[\bar a_i, u_i, k_i, \bar a_i, u_i^b, W, c_i^W, n_i^W\big].$$

The mood of agent $i$ continues to be watchful until the end of the $n_i^W$-th period, which yields a payoff of $u_i'$. If $u_i' < u_i^b$, then at the end of the $n_i^W$-th period the state of agent $i$ transitions to
$$x_i(n_i^W - c_i^W) = \big[\bar a_i, u_i', 1, \bar a_i, u_i', D, 0, 0\big]$$
and we are done. Otherwise, if $u_i' \ge u_i^b$, then at the end of the $n_i^W$-th period the state of agent $i$ transitions to
$$x_i(n_i^W - c_i^W) = \big[\bar a_i, u_i', 1, \bar a_i, u_i^b, H, 1, n_i^H\big],$$
where $n_i^H \in \{w+1, \dots, w^n + w\}$. Since $u_i' = u_i(l \mid x')$ for any $l \in \{0, 1, \dots\}$, at the end of the $(n_i^H + n_i^W)$-th period the state of agent $i$ transitions to
$$x_i(n_i^H + n_i^W - c_i^W) = \big[\bar a_i, u_i', 1, \bar a_i, u_i', C, 0, 0\big].$$
Note that in this case the agent remains content for all future times, and we are done. Hence, an agent cannot be watchful in a recurrence class of the unperturbed process. This completes the proof.

The following claim finishes the proof of Lemma 2 by showing that in any recurrence class of the unperturbed process, if the agents are all content, then their baseline utilities must be aligned with their baseline action sequences.

Claim 7. If a state $x = (x_1, \dots, x_n)$ is in a recurrence class of the unperturbed process $P^0$ and $m_i = C$ for every agent $i \in N$, then $u_i^b = u_i(l \mid x)$ for every $l \in \{0, 1, 2, \dots\}$.

Proof: By Claim 4, we know that if $x$ is in a recurrence class of the unperturbed process, then $u_i(l \mid x) = u_i(l' \mid x)$ for every $l, l' \in \{0, 1, \dots\}$. Hence, we complete the proof by showing that $u_i^b = u_i(0 \mid x)$. As in the previous claims, we argue by contradiction.

Suppose $u_i(0 \mid x) > u_i^b$ for some agent $i \in N$. Then the state of player $i$ at the end of the 0-th period will be
$$x_i(0) = \big[\bar a_i, u_i(0 \mid x), 1, \bar a_i, u_i^b, H, 1, n_i^H\big],$$
where $n_i^H \in \{w+1, \dots, w^n + w\}$. Furthermore, after an additional $n_i^H$ periods, the state of player $i$ will be
$$x_i(n_i^H) = \big[\bar a_i, u_i(0 \mid x), 1, \bar a_i, u_i(0 \mid x), C, 0, 0\big].$$
The state of agent $i$ then stays fixed for all future times, so the process never returns to the state $x$ (whose baseline utility is $u_i^b$), and we are done.

Alternatively, suppose $u_i(0 \mid x) < u_i^b$ for some agent $i \in N$. Then the state of player $i$ at the end of the 0-th period will be
$$x_i(0) = \big[\bar a_i, u_i(0 \mid x), 1, \bar a_i, u_i^b, W, 1, n_i^W\big],$$
where $n_i^W \in \{w+1, \dots, w^n + w\}$. Furthermore, after an additional $n_i^W$ periods, the state of player $i$ will be
$$x_i(n_i^W) = \big[\bar a_i, u_i(0 \mid x), 1, \bar a_i, u_i(0 \mid x), D, 0, 0\big].$$
The mood of agent $i$ then remains discontent for all future times, so the process never returns to the state $x$, and we are done. This completes the proof.

B. Part #2: Derivation of the stochastically stable states

We know from [18] that the computation of the stochastically stable states can be reduced to an analysis of rooted trees on the vertex set consisting solely of the recurrence classes. To that end, we classify the recurrence classes as follows:

- We denote the collection of states of Form #2, i.e., $m_i = D$ for all agents $i \in N$, by the single variable $D^0$.
- For each state $x = (x_1, \dots, x_n)$ of Form #1, consider the collection of states $x(1), x(2), \dots$, where for any agent $i \in N$ and $l \in \{0, 1, \dots\}$ the state is of the form
$$x_i(l) = \big[\bar a_i, u_i(l), k_i(l), \bar a_i, u_i, C, 0, 0\big],$$
where
$$k_i(l) = \big((k_i + l - 1) \bmod |\bar a_i^b|\big) + 1, \qquad u_i(0) = u_i,$$
$$u_i(k) = \frac{1}{k_i + k}\, U_i(a(k \mid x)) + \frac{k_i + k - 1}{k_i + k}\, u_i(k-1)$$
for any $k \in \{1, 2, \dots\}$. Note that this collection of states represents a single recurrence class of the unperturbed process $P^0$. Consequently, we represent this collection of states compactly by the tuple $[\bar a_i, u_i, k_i]$. We denote the collection of these recurrence classes by $C^0$.

The set of recurrence classes of the unperturbed process is thus characterized by the set $D^0 \cup C^0$. The theory of resistance trees for regular perturbed processes provides an analytical technique for evaluating the stochastically stable states using graph-theoretic arguments constructed over the vertex set $D^0 \cup C^0$.

Before proceeding with this derivation, we define the set of states $C^* \subseteq C^0$ as follows: a state $x \in C^*$ if for every player $i \in N$, every action $a_i' \in A_i$, and every $l \in \{0, 1, \dots, w^n\}$ we have
$$\frac{1}{|\bar a_i|} \sum_{k = l|\bar a_i| - k_i + 1}^{(l+1)|\bar a_i| - k_i} U_i(a_i', a_{-i}(k \mid x)) \;\le\; \frac{1}{|\bar a_i|} \sum_{k = l|\bar a_i| - k_i + 1}^{(l+1)|\bar a_i| - k_i} U_i(a(k \mid x)).$$
Note that if $x \in C^*$, then $q(x)$ is a coarse correlated equilibrium.

Definition 2 (Edge resistance). For every pair of distinct recurrence classes $w$ and $z$, let $r(w \to z)$ denote the total resistance of the least-resistance path that starts in $w$ and ends in $z$. We call $w \to z$ an edge and $r(w \to z)$ the resistance of the edge.

The following lemma highlights five properties of the edge resistances.

Lemma 8. The edge resistances defined over the recurrence classes $C^0 \cup D^0$ satisfy the following five properties, where $u_i$ and $u_i'$ denote the baseline payoffs of agent $i$ associated with the recurrence classes $x$ and $x'$, respectively:

(i) For any state $x \in C^0$, the resistance associated with the transition $D^0 \to x$ satisfies
$$r(D^0 \to x) = \sum_{i \in N} (1 - u_i).$$

(ii) For any states $x \in C^0 \setminus C^*$ and $x' \in C^0$, the resistance associated with the transition $x \to x'$ satisfies
$$r(x \to x') \ge c + \sum_{i \in N : u_i' < u_i} (1 - u_i').$$

(iii) For any sequence of transitions of the form $D^0 \to x^0 \to x^1 \to \dots \to x^m = x$, where $x^k \in C^0 \setminus C^*$ for every $k \in \{0, 1, \dots, m-1\}$ and $x^m \in C^0$, the resistance associated with this sequence of transitions satisfies
$$r(D^0 \to x^0) + \sum_{k=0}^{m-1} r(x^k \to x^{k+1}) \ge mc + \sum_{i \in N} (1 - u_i).$$

(iv) For any states $x \in C^*$ and $x' \in C^0 \cup D^0$, the resistance associated with the transition $x \to x'$ satisfies
$$r(x \to x') \ge 2c.$$

(v) For any state $x \in C^*$, the resistance associated with the transition $x \to D^0$ is
$$r(x \to D^0) = 2c.$$

Proof: The first three properties are relatively straightforward. Property (i) results from the fact that each agent needs to accept the baseline utility, which has a resistance of $1 - u_i$. Property (ii) results from two facts: first, at least one player needs to experiment in order to transition out of a content state, which occurs with a resistance of $c$; second, if a player transitions from a content state to an alternative content state with a lower baseline utility $u_i' < u_i^b$, then the agent must become content with this baseline payoff, which occurs with a resistance of $1 - u_i'$. Lastly, Property (iii) follows immediately from Properties (i) and (ii).

We prove Property (iv) by demonstrating that at least two experimentations are required in order to leave a state $x \in C^*$. To that end, let $x_i = [\bar a_i, u_i, k_i, \bar a_i, u_i^b, C, 0, 0]$ denote the state of agent $i \in N$. Suppose agent $i$ experiments with a constant block of actions $(a_i, \dots, a_i) \in A_i^{|\bar a_i|}$ at the beginning of the $l_i$-th period of agent $i$. Likewise, assume this experimentation occurred during the periods $\{l_j, l_j + 1, \dots, l_j'\}$ of each agent $j$. Since $|\bar a_i| \le w$ and $|\bar a_j| \ge 1$, we know that $l_j' - l_j \le w$. Since $x \in C^*$, we know that
$$\frac{1}{|\bar a_i|} \sum_{k = l_i|\bar a_i| - k_i + 1}^{(l_i+1)|\bar a_i| - k_i} U_i(a_i, a_{-i}(k \mid x)) \le u_i^b. \tag{14}$$
Hence, the state of agent $i$ at the end of this period remains content and of the form
$$x_i(l_i) = \big[\bar a_i, u_i', 1, \bar a_i, u_i^b, C, 0, 0\big],$$
where $u_i'$ represents the average expressed in (14).

It is important to note that the utility of any other agent $j \neq i$ could have changed during the periods $\{l_j, l_j + 1, \dots, l_j'\}$. Let $l_j^*$ represent the first period for which $u_j(l_j^*) \neq u_j^b$. If $u_j(l_j^*) > u_j^b$, then at the end of the $l_j^*$-th period the state of agent $j$ transitions to
$$x_j(l_j^*) = \big[\bar a_j, u_j(l_j^*), 1, \bar a_j, u_j^b, H, 1, n_j^H\big],$$
where $n_j^H \in \{w+1, \dots, w^n + w\}$. However, note that for all $k \ge w + 1 > l_j' - l_j$ we have $u_j(k) = u_j^b$. Therefore, after $n_j^H$ additional periods, the state of agent $j$ transitions to
$$x_j(l_j^* + n_j^H) = \big[\bar a_j, u_j^b, 1, \bar a_j, u_j^b, C, 0, 0\big]$$
and we are done.


More information

Computing Solution Concepts of Normal-Form Games. Song Chong EE, KAIST

Computing Solution Concepts of Normal-Form Games. Song Chong EE, KAIST Computing Solution Concepts of Normal-Form Games Song Chong EE, KAIST songchong@kaist.edu Computing Nash Equilibria of Two-Player, Zero-Sum Games Can be expressed as a linear program (LP), which means

More information

Games and Economic Behavior

Games and Economic Behavior Games and Economic Behavior 75 (2012) 882 897 Contents lists available at SciVerse ScienceDirect Games and Economic Behavior www.elsevier.com/locate/geb Learning efficient Nash equilibria in distributed

More information

Lecture Notes on Game Theory

Lecture Notes on Game Theory Lecture Notes on Game Theory Levent Koçkesen Strategic Form Games In this part we will analyze games in which the players choose their actions simultaneously (or without the knowledge of other players

More information

On Equilibria of Distributed Message-Passing Games

On Equilibria of Distributed Message-Passing Games On Equilibria of Distributed Message-Passing Games Concetta Pilotto and K. Mani Chandy California Institute of Technology, Computer Science Department 1200 E. California Blvd. MC 256-80 Pasadena, US {pilotto,mani}@cs.caltech.edu

More information

Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games

Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games Gabriel Y. Weintraub, Lanier Benkard, and Benjamin Van Roy Stanford University {gweintra,lanierb,bvr}@stanford.edu Abstract

More information

Ergodicity and Non-Ergodicity in Economics

Ergodicity and Non-Ergodicity in Economics Abstract An stochastic system is called ergodic if it tends in probability to a limiting form that is independent of the initial conditions. Breakdown of ergodicity gives rise to path dependence. We illustrate

More information

Area I: Contract Theory Question (Econ 206)

Area I: Contract Theory Question (Econ 206) Theory Field Exam Summer 2011 Instructions You must complete two of the four areas (the areas being (I) contract theory, (II) game theory A, (III) game theory B, and (IV) psychology & economics). Be sure

More information

Industrial Organization Lecture 3: Game Theory

Industrial Organization Lecture 3: Game Theory Industrial Organization Lecture 3: Game Theory Nicolas Schutz Nicolas Schutz Game Theory 1 / 43 Introduction Why game theory? In the introductory lecture, we defined Industrial Organization as the economics

More information

Lecture notes for Analysis of Algorithms : Markov decision processes

Lecture notes for Analysis of Algorithms : Markov decision processes Lecture notes for Analysis of Algorithms : Markov decision processes Lecturer: Thomas Dueholm Hansen June 6, 013 Abstract We give an introduction to infinite-horizon Markov decision processes (MDPs) with

More information

CS364A: Algorithmic Game Theory Lecture #13: Potential Games; A Hierarchy of Equilibria

CS364A: Algorithmic Game Theory Lecture #13: Potential Games; A Hierarchy of Equilibria CS364A: Algorithmic Game Theory Lecture #13: Potential Games; A Hierarchy of Equilibria Tim Roughgarden November 4, 2013 Last lecture we proved that every pure Nash equilibrium of an atomic selfish routing

More information

Synthesis weakness of standard approach. Rational Synthesis

Synthesis weakness of standard approach. Rational Synthesis 1 Synthesis weakness of standard approach Rational Synthesis 3 Overview Introduction to formal verification Reactive systems Verification Synthesis Introduction to Formal Verification of Reactive Systems

More information

6.207/14.15: Networks Lecture 11: Introduction to Game Theory 3

6.207/14.15: Networks Lecture 11: Introduction to Game Theory 3 6.207/14.15: Networks Lecture 11: Introduction to Game Theory 3 Daron Acemoglu and Asu Ozdaglar MIT October 19, 2009 1 Introduction Outline Existence of Nash Equilibrium in Infinite Games Extensive Form

More information

STOCHASTIC STABILITY OF GROUP FORMATION IN COLLECTIVE ACTION GAMES. Toshimasa Maruta 1 and Akira Okada 2

STOCHASTIC STABILITY OF GROUP FORMATION IN COLLECTIVE ACTION GAMES. Toshimasa Maruta 1 and Akira Okada 2 STOCHASTIC STABILITY OF GROUP FORMATION IN COLLECTIVE ACTION GAMES Toshimasa Maruta 1 and Akira Okada 2 December 20, 2001 We present a game theoretic model of voluntary group formation in collective action

More information

Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms

Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Mostafa D. Awheda Department of Systems and Computer Engineering Carleton University Ottawa, Canada KS 5B6 Email: mawheda@sce.carleton.ca

More information

Computational Game Theory Spring Semester, 2005/6. Lecturer: Yishay Mansour Scribe: Ilan Cohen, Natan Rubin, Ophir Bleiberg*

Computational Game Theory Spring Semester, 2005/6. Lecturer: Yishay Mansour Scribe: Ilan Cohen, Natan Rubin, Ophir Bleiberg* Computational Game Theory Spring Semester, 2005/6 Lecture 5: 2-Player Zero Sum Games Lecturer: Yishay Mansour Scribe: Ilan Cohen, Natan Rubin, Ophir Bleiberg* 1 5.1 2-Player Zero Sum Games In this lecture

More information

We set up the basic model of two-sided, one-to-one matching

We set up the basic model of two-sided, one-to-one matching Econ 805 Advanced Micro Theory I Dan Quint Fall 2009 Lecture 18 To recap Tuesday: We set up the basic model of two-sided, one-to-one matching Two finite populations, call them Men and Women, who want to

More information

האוניברסיטה העברית בירושלים

האוניברסיטה העברית בירושלים האוניברסיטה העברית בירושלים THE HEBREW UNIVERSITY OF JERUSALEM TOWARDS A CHARACTERIZATION OF RATIONAL EXPECTATIONS by ITAI ARIELI Discussion Paper # 475 February 2008 מרכז לחקר הרציונליות CENTER FOR THE

More information

A Note on the Existence of Ratifiable Acts

A Note on the Existence of Ratifiable Acts A Note on the Existence of Ratifiable Acts Joseph Y. Halpern Cornell University Computer Science Department Ithaca, NY 14853 halpern@cs.cornell.edu http://www.cs.cornell.edu/home/halpern August 15, 2018

More information

Tijmen Daniëls Universiteit van Amsterdam. Abstract

Tijmen Daniëls Universiteit van Amsterdam. Abstract Pure strategy dominance with quasiconcave utility functions Tijmen Daniëls Universiteit van Amsterdam Abstract By a result of Pearce (1984), in a finite strategic form game, the set of a player's serially

More information

6.891 Games, Decision, and Computation February 5, Lecture 2

6.891 Games, Decision, and Computation February 5, Lecture 2 6.891 Games, Decision, and Computation February 5, 2015 Lecture 2 Lecturer: Constantinos Daskalakis Scribe: Constantinos Daskalakis We formally define games and the solution concepts overviewed in Lecture

More information

Payoff Continuity in Incomplete Information Games

Payoff Continuity in Incomplete Information Games journal of economic theory 82, 267276 (1998) article no. ET982418 Payoff Continuity in Incomplete Information Games Atsushi Kajii* Institute of Policy and Planning Sciences, University of Tsukuba, 1-1-1

More information

Definitions and Proofs

Definitions and Proofs Giving Advice vs. Making Decisions: Transparency, Information, and Delegation Online Appendix A Definitions and Proofs A. The Informational Environment The set of states of nature is denoted by = [, ],

More information

WEAKLY DOMINATED STRATEGIES: A MYSTERY CRACKED

WEAKLY DOMINATED STRATEGIES: A MYSTERY CRACKED WEAKLY DOMINATED STRATEGIES: A MYSTERY CRACKED DOV SAMET Abstract. An informal argument shows that common knowledge of rationality implies the iterative elimination of strongly dominated strategies. Rationality

More information

NEGOTIATION-PROOF CORRELATED EQUILIBRIUM

NEGOTIATION-PROOF CORRELATED EQUILIBRIUM DEPARTMENT OF ECONOMICS UNIVERSITY OF CYPRUS NEGOTIATION-PROOF CORRELATED EQUILIBRIUM Nicholas Ziros Discussion Paper 14-2011 P.O. Box 20537, 1678 Nicosia, CYPRUS Tel.: +357-22893700, Fax: +357-22895028

More information

Computing Minmax; Dominance

Computing Minmax; Dominance Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination

More information

Known Unknowns: Power Shifts, Uncertainty, and War.

Known Unknowns: Power Shifts, Uncertainty, and War. Known Unknowns: Power Shifts, Uncertainty, and War. Online Appendix Alexandre Debs and Nuno P. Monteiro May 10, 2016 he Appendix is structured as follows. Section 1 offers proofs of the formal results

More information

SF2972 Game Theory Exam with Solutions March 15, 2013

SF2972 Game Theory Exam with Solutions March 15, 2013 SF2972 Game Theory Exam with s March 5, 203 Part A Classical Game Theory Jörgen Weibull and Mark Voorneveld. (a) What are N, S and u in the definition of a finite normal-form (or, equivalently, strategic-form)

More information

12. LOCAL SEARCH. gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria

12. LOCAL SEARCH. gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria 12. LOCAL SEARCH gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley h ttp://www.cs.princeton.edu/~wayne/kleinberg-tardos

More information

Learning Approaches to the Witsenhausen Counterexample From a View of Potential Games

Learning Approaches to the Witsenhausen Counterexample From a View of Potential Games Learning Approaches to the Witsenhausen Counterexample From a View of Potential Games Na Li, Jason R. Marden and Jeff S. Shamma Abstract Since Witsenhausen put forward his remarkable counterexample in

More information

Learning ε-pareto Efficient Solutions With Minimal Knowledge Requirements Using Satisficing

Learning ε-pareto Efficient Solutions With Minimal Knowledge Requirements Using Satisficing Learning ε-pareto Efficient Solutions With Minimal Knowledge Requirements Using Satisficing Jacob W. Crandall and Michael A. Goodrich Computer Science Department Brigham Young University Provo, UT 84602

More information

A Modified Q-Learning Algorithm for Potential Games

A Modified Q-Learning Algorithm for Potential Games Preprints of the 19th World Congress The International Federation of Automatic Control A Modified Q-Learning Algorithm for Potential Games Yatao Wang Lacra Pavel Edward S. Rogers Department of Electrical

More information

Efficient Sensor Network Planning Method. Using Approximate Potential Game

Efficient Sensor Network Planning Method. Using Approximate Potential Game Efficient Sensor Network Planning Method 1 Using Approximate Potential Game Su-Jin Lee, Young-Jin Park, and Han-Lim Choi, Member, IEEE arxiv:1707.00796v1 [cs.gt] 4 Jul 2017 Abstract This paper addresses

More information

Area I: Contract Theory Question (Econ 206)

Area I: Contract Theory Question (Econ 206) Theory Field Exam Winter 2011 Instructions You must complete two of the three areas (the areas being (I) contract theory, (II) game theory, and (III) psychology & economics). Be sure to indicate clearly

More information

Preliminary Results on Social Learning with Partial Observations

Preliminary Results on Social Learning with Partial Observations Preliminary Results on Social Learning with Partial Observations Ilan Lobel, Daron Acemoglu, Munther Dahleh and Asuman Ozdaglar ABSTRACT We study a model of social learning with partial observations from

More information

CS364A: Algorithmic Game Theory Lecture #16: Best-Response Dynamics

CS364A: Algorithmic Game Theory Lecture #16: Best-Response Dynamics CS364A: Algorithmic Game Theory Lecture #16: Best-Response Dynamics Tim Roughgarden November 13, 2013 1 Do Players Learn Equilibria? In this lecture we segue into the third part of the course, which studies

More information

Realization Plans for Extensive Form Games without Perfect Recall

Realization Plans for Extensive Form Games without Perfect Recall Realization Plans for Extensive Form Games without Perfect Recall Richard E. Stearns Department of Computer Science University at Albany - SUNY Albany, NY 12222 April 13, 2015 Abstract Given a game in

More information

Monotonic ɛ-equilibria in strongly symmetric games

Monotonic ɛ-equilibria in strongly symmetric games Monotonic ɛ-equilibria in strongly symmetric games Shiran Rachmilevitch April 22, 2016 Abstract ɛ-equilibrium allows for worse actions to be played with higher probability than better actions. I introduce

More information

Equilibria in Games with Weak Payoff Externalities

Equilibria in Games with Weak Payoff Externalities NUPRI Working Paper 2016-03 Equilibria in Games with Weak Payoff Externalities Takuya Iimura, Toshimasa Maruta, and Takahiro Watanabe October, 2016 Nihon University Population Research Institute http://www.nihon-u.ac.jp/research/institute/population/nupri/en/publications.html

More information

Chapter 9. Mixed Extensions. 9.1 Mixed strategies

Chapter 9. Mixed Extensions. 9.1 Mixed strategies Chapter 9 Mixed Extensions We now study a special case of infinite strategic games that are obtained in a canonic way from the finite games, by allowing mixed strategies. Below [0, 1] stands for the real

More information

COORDINATION AND EQUILIBRIUM SELECTION IN GAMES WITH POSITIVE NETWORK EFFECTS

COORDINATION AND EQUILIBRIUM SELECTION IN GAMES WITH POSITIVE NETWORK EFFECTS COORDINATION AND EQUILIBRIUM SELECTION IN GAMES WITH POSITIVE NETWORK EFFECTS Alexander M. Jakobsen B. Curtis Eaton David Krause August 31, 2009 Abstract When agents make their choices simultaneously,

More information

The No-Regret Framework for Online Learning

The No-Regret Framework for Online Learning The No-Regret Framework for Online Learning A Tutorial Introduction Nahum Shimkin Technion Israel Institute of Technology Haifa, Israel Stochastic Processes in Engineering IIT Mumbai, March 2013 N. Shimkin,

More information

Mixed Nash Equilibria

Mixed Nash Equilibria lgorithmic Game Theory, Summer 2017 Mixed Nash Equilibria Lecture 2 (5 pages) Instructor: Thomas Kesselheim In this lecture, we introduce the general framework of games. Congestion games, as introduced

More information

Game Theory and Algorithms Lecture 7: PPAD and Fixed-Point Theorems

Game Theory and Algorithms Lecture 7: PPAD and Fixed-Point Theorems Game Theory and Algorithms Lecture 7: PPAD and Fixed-Point Theorems March 17, 2011 Summary: The ultimate goal of this lecture is to finally prove Nash s theorem. First, we introduce and prove Sperner s

More information

Notes on Blackwell s Comparison of Experiments Tilman Börgers, June 29, 2009

Notes on Blackwell s Comparison of Experiments Tilman Börgers, June 29, 2009 Notes on Blackwell s Comparison of Experiments Tilman Börgers, June 29, 2009 These notes are based on Chapter 12 of David Blackwell and M. A.Girshick, Theory of Games and Statistical Decisions, John Wiley

More information

Microeconomics. 2. Game Theory

Microeconomics. 2. Game Theory Microeconomics 2. Game Theory Alex Gershkov http://www.econ2.uni-bonn.de/gershkov/gershkov.htm 18. November 2008 1 / 36 Dynamic games Time permitting we will cover 2.a Describing a game in extensive form

More information

Security Against Impersonation Attacks in Distributed Systems

Security Against Impersonation Attacks in Distributed Systems 1 Security Against Impersonation Attacks in Distributed Systems Philip N. Brown, Holly P. Borowski, and Jason R. Marden Abstract In a multi-agent system, transitioning from a centralized to a distributed

More information

Political Economy of Institutions and Development: Problem Set 1. Due Date: Thursday, February 23, in class.

Political Economy of Institutions and Development: Problem Set 1. Due Date: Thursday, February 23, in class. Political Economy of Institutions and Development: 14.773 Problem Set 1 Due Date: Thursday, February 23, in class. Answer Questions 1-3. handed in. The other two questions are for practice and are not

More information

Interacting Vehicles: Rules of the Game

Interacting Vehicles: Rules of the Game Chapter 7 Interacting Vehicles: Rules of the Game In previous chapters, we introduced an intelligent control method for autonomous navigation and path planning. The decision system mainly uses local information,

More information

Performance Analysis of Trial and Error Algorithms

Performance Analysis of Trial and Error Algorithms 1 Performance Analysis of Trial and Error Algorithms Jérôme Gaveau, Student Member, IEEE, Christophe J. Le Martret, Senior Member, IEEE and Mohamad Assaad, Senior Member, IEEE, arxiv:1711.01788v1 [cs.gt]

More information

Appendix of Homophily in Peer Groups The Costly Information Case

Appendix of Homophily in Peer Groups The Costly Information Case Appendix of Homophily in Peer Groups The Costly Information Case Mariagiovanna Baccara Leeat Yariv August 19, 2012 1 Introduction In this Appendix we study the information sharing application analyzed

More information

Correlated Equilibria: Rationality and Dynamics

Correlated Equilibria: Rationality and Dynamics Correlated Equilibria: Rationality and Dynamics Sergiu Hart June 2010 AUMANN 80 SERGIU HART c 2010 p. 1 CORRELATED EQUILIBRIA: RATIONALITY AND DYNAMICS Sergiu Hart Center for the Study of Rationality Dept

More information

Refinements - change set of equilibria to find "better" set of equilibria by eliminating some that are less plausible

Refinements - change set of equilibria to find better set of equilibria by eliminating some that are less plausible efinements efinements - change set of equilibria to find "better" set of equilibria by eliminating some that are less plausible Strategic Form Eliminate Weakly Dominated Strategies - Purpose - throwing

More information

Learning by (limited) forward looking players

Learning by (limited) forward looking players Learning by (limited) forward looking players Friederike Mengel Maastricht University April 2009 Abstract We present a model of adaptive economic agents who are k periods forward looking. Agents in our

More information

Notes on Coursera s Game Theory

Notes on Coursera s Game Theory Notes on Coursera s Game Theory Manoel Horta Ribeiro Week 01: Introduction and Overview Game theory is about self interested agents interacting within a specific set of rules. Self-Interested Agents have

More information

Lecture December 2009 Fall 2009 Scribe: R. Ring In this lecture we will talk about

Lecture December 2009 Fall 2009 Scribe: R. Ring In this lecture we will talk about 0368.4170: Cryptography and Game Theory Ran Canetti and Alon Rosen Lecture 7 02 December 2009 Fall 2009 Scribe: R. Ring In this lecture we will talk about Two-Player zero-sum games (min-max theorem) Mixed

More information

Computing Equilibria of Repeated And Dynamic Games

Computing Equilibria of Repeated And Dynamic Games Computing Equilibria of Repeated And Dynamic Games Şevin Yeltekin Carnegie Mellon University ICE 2012 July 2012 1 / 44 Introduction Repeated and dynamic games have been used to model dynamic interactions

More information

: Cryptography and Game Theory Ran Canetti and Alon Rosen. Lecture 8

: Cryptography and Game Theory Ran Canetti and Alon Rosen. Lecture 8 0368.4170: Cryptography and Game Theory Ran Canetti and Alon Rosen Lecture 8 December 9, 2009 Scribe: Naama Ben-Aroya Last Week 2 player zero-sum games (min-max) Mixed NE (existence, complexity) ɛ-ne Correlated

More information

Game Theory for Linguists

Game Theory for Linguists Fritz Hamm, Roland Mühlenbernd 4. Mai 2016 Overview Overview 1. Exercises 2. Contribution to a Public Good 3. Dominated Actions Exercises Exercise I Exercise Find the player s best response functions in

More information

On the errors introduced by the naive Bayes independence assumption

On the errors introduced by the naive Bayes independence assumption On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of

More information

BELIEFS & EVOLUTIONARY GAME THEORY

BELIEFS & EVOLUTIONARY GAME THEORY 1 / 32 BELIEFS & EVOLUTIONARY GAME THEORY Heinrich H. Nax hnax@ethz.ch & Bary S. R. Pradelski bpradelski@ethz.ch May 15, 217: Lecture 1 2 / 32 Plan Normal form games Equilibrium invariance Equilibrium

More information

Mechanism Design: Basic Concepts

Mechanism Design: Basic Concepts Advanced Microeconomic Theory: Economics 521b Spring 2011 Juuso Välimäki Mechanism Design: Basic Concepts The setup is similar to that of a Bayesian game. The ingredients are: 1. Set of players, i {1,

More information

Prediction and Playing Games

Prediction and Playing Games Prediction and Playing Games Vineel Pratap vineel@eng.ucsd.edu February 20, 204 Chapter 7 : Prediction, Learning and Games - Cesa Binachi & Lugosi K-Person Normal Form Games Each player k (k =,..., K)

More information

1 Lattices and Tarski s Theorem

1 Lattices and Tarski s Theorem MS&E 336 Lecture 8: Supermodular games Ramesh Johari April 30, 2007 In this lecture, we develop the theory of supermodular games; key references are the papers of Topkis [7], Vives [8], and Milgrom and

More information

Lecture Notes on Bargaining

Lecture Notes on Bargaining Lecture Notes on Bargaining Levent Koçkesen 1 Axiomatic Bargaining and Nash Solution 1.1 Preliminaries The axiomatic theory of bargaining originated in a fundamental paper by Nash (1950, Econometrica).

More information

Wars of Attrition with Budget Constraints

Wars of Attrition with Budget Constraints Wars of Attrition with Budget Constraints Gagan Ghosh Bingchao Huangfu Heng Liu October 19, 2017 (PRELIMINARY AND INCOMPLETE: COMMENTS WELCOME) Abstract We study wars of attrition between two bidders who

More information

Lecture 9 Classification of States

Lecture 9 Classification of States Lecture 9: Classification of States of 27 Course: M32K Intro to Stochastic Processes Term: Fall 204 Instructor: Gordan Zitkovic Lecture 9 Classification of States There will be a lot of definitions and

More information

Computation of Efficient Nash Equilibria for experimental economic games

Computation of Efficient Nash Equilibria for experimental economic games International Journal of Mathematics and Soft Computing Vol.5, No.2 (2015), 197-212. ISSN Print : 2249-3328 ISSN Online: 2319-5215 Computation of Efficient Nash Equilibria for experimental economic games

More information

Reinforcement Learning

Reinforcement Learning 5 / 28 Reinforcement Learning Based on a simple principle: More likely to repeat an action, if it had to a positive outcome. 6 / 28 Reinforcement Learning Idea of reinforcement learning first formulated

More information

A Game-Theoretic Analysis of Games with a Purpose

A Game-Theoretic Analysis of Games with a Purpose A Game-Theoretic Analysis of Games with a Purpose The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. Citation Published Version

More information

Possibility and Impossibility of Learning with Limited Behavior Rules

Possibility and Impossibility of Learning with Limited Behavior Rules Possibility and Impossibility of Learning with Limited Behavior Rules Takako Fujiwara-Greve Dept. of Economics Keio University, Tokyo, Japan and Norwegian School of Management BI, Sandvika, Norway and

More information

A Generic Bound on Cycles in Two-Player Games

A Generic Bound on Cycles in Two-Player Games A Generic Bound on Cycles in Two-Player Games David S. Ahn February 006 Abstract We provide a bound on the size of simultaneous best response cycles for generic finite two-player games. The bound shows

More information

Distributed Optimization. Song Chong EE, KAIST

Distributed Optimization. Song Chong EE, KAIST Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links

More information

Markov Processes Hamid R. Rabiee

Markov Processes Hamid R. Rabiee Markov Processes Hamid R. Rabiee Overview Markov Property Markov Chains Definition Stationary Property Paths in Markov Chains Classification of States Steady States in MCs. 2 Markov Property A discrete

More information

Designing Games for Distributed Optimization

Designing Games for Distributed Optimization Designing Games for Distributed Optimization Na Li and Jason R. Marden Abstract The central goal in multiagent systems is to design local control laws for the individual agents to ensure that the emergent

More information