A Polynomial-time Nash Equilibrium Algorithm for Repeated Games


1 A Polynomial-time Nash Equilibrium Algorithm for Repeated Games
Michael L. Littman, Rutgers University
Peter Stone, The University of Texas at Austin

2 Main Result
Present a polynomial-time algorithm for computing a Nash equilibrium for a 2-player, average-payoff repeated game.
Not: a polynomial-time Nash equilibrium algorithm for one-shot games. That is a well-known open problem, possibly unnecessarily hard.
7/22/04 Polytime Repeated Nash

3 Example: Grid Game 3
Actions: U, D, R, L, X.
No move on collision.
Semiwalls (50%).
Players A and B (Hu & Wellman 01).
Rewards: -1 for step, -10 for collision, +100 for goal, 0 if back to initial config.
Both can get goal.

4 Choices in Grid Game
[Figure: grid showing A, XX, B]
See: Hawks/Doves, Traffic, Chicken.
Average reward (A, B) for each pair of choices (A's choice, B's choice):
(32.3, 16.0): C, S
(16.0, 32.3): S, C
(-1.0, -1.0): C, C
(15.8, 15.8): S, S
(15.9, 15.9): mix
(25.7, 25.8): L, F
(25.8, 25.7): F, L

5 Grid Game 3: Matrix
[Payoff matrix for A over choices C, S, L, F; entries not preserved.]
B's matrix is the transpose of this.

6 One-Shot Strategy
We play 1 round of (bimatrix game) GG3.
A strategy is a probability distribution over choices.
How do we choose?

7 Security Level Solution
A doesn't know what B will do.
Maximize reward in the worst case.
If A plays C (prob. 0.01) and S (prob. 0.99), A's worst cases are C and F (15.85). (Defense)
If B plays C (prob. 0.49) and F (prob. 0.51), A's best choices are C and S (15.85). (Attack)
Computed efficiently via linear programming.
Too pessimistic/paranoid?
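The worst-case maximization on this slide can be illustrated with a small sketch. The slide computes security levels by linear programming; the code below swaps in a simpler technique that only works when the player has exactly two actions: it checks the endpoints and the crossing points of the opponent's payoff lines. The function name and the example matrix (matching pennies) are assumptions for illustration, not taken from the talk.

```python
from itertools import combinations

def security_level_two_actions(payoffs):
    """Maximin mixed strategy for a player with exactly two actions.

    payoffs[i][j] is the player's payoff for own action i (0 or 1)
    against opponent action j. Returns (p, value): p is the probability
    of action 0, value is the guaranteed expected payoff.
    """
    top, bottom = payoffs
    n = len(top)

    def worst_case(p):
        return min(p * top[j] + (1 - p) * bottom[j] for j in range(n))

    # The worst-case payoff is concave and piecewise linear in p, so its
    # maximum lies at an endpoint or where two opponent lines cross.
    candidates = [0.0, 1.0]
    for j, k in combinations(range(n), 2):
        denom = (top[j] - bottom[j]) - (top[k] - bottom[k])
        if denom != 0:
            p = (bottom[k] - bottom[j]) / denom
            if 0.0 <= p <= 1.0:
                candidates.append(p)
    best_p = max(candidates, key=worst_case)
    return best_p, worst_case(best_p)

# Hypothetical example (matching pennies), not the Grid Game 3 matrix.
p, value = security_level_two_actions([[1, -1], [-1, 1]])
```

For games with more than two actions per player, a linear program as named on the slide is the general tool.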

8 Nash Equilibrium
Pair of strategies such that neither player has incentive to deviate unilaterally.
Always exists (Nash 51).
Sometimes mixed.
[Payoff matrices for A and B over choices C, S, L, F; entries not preserved.]
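As a minimal sketch of the "no incentive to deviate unilaterally" test, the code below enumerates pure-strategy equilibria of a bimatrix game. It uses the average rewards listed on slide 4, restricted to the choices C and S. That restriction is an assumption made here because the full C/S/L/F matrix is not available; the function name is also hypothetical. Mixed equilibria are not covered.

```python
def pure_nash_equilibria(r_a, r_b, actions):
    """Return pure action pairs (a, b) where neither player gains by deviating alone."""
    equilibria = []
    for a in actions:
        for b in actions:
            # A holds b fixed and considers every alternative of its own.
            a_ok = all(r_a[(alt, b)] <= r_a[(a, b)] for alt in actions)
            # B holds a fixed and considers every alternative of its own.
            b_ok = all(r_b[(a, alt)] <= r_b[(a, b)] for alt in actions)
            if a_ok and b_ok:
                equilibria.append((a, b))
    return equilibria

# Slide 4 averages, keyed by (A's choice, B's choice); C/S sub-game only (assumption).
R_A = {("C", "C"): -1.0, ("C", "S"): 32.3, ("S", "C"): 16.0, ("S", "S"): 15.8}
R_B = {("C", "C"): -1.0, ("C", "S"): 16.0, ("S", "C"): 32.3, ("S", "S"): 15.8}
result = pure_nash_equilibria(R_A, R_B, ["C", "S"])
```

On this sub-game the check returns the two asymmetric pairs (C, S) and (S, C).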

9 Nash Values
For GG3:
(C, S) = (32.3, 16.0): very imbalanced, 24.2 each
(S, C) = (16.0, 32.3): very imbalanced, 24.2 each
~1/2 mix (C/S, C/S) = (15.9, 15.9): balanced, 15.9 each
(L, F) = (25.7, 25.8): not Nash, nearly balanced, 25.8 each
Computationally difficult to find in general.

10 Repeated Games
What if we face each other multiple times?
Strategies:
- can be a function of history
- can be randomized
Nash equilibrium still exists, of course.
Philosophical claim: Equilibrium assumes games are repeated; players choose best response.
Computational observation: Easier to find.

11 Equilibrium in Repeated GG3
A: [...] B: [...]
B faces L or C. Achieves max via F. Average: [...]
A faces F or C. L gets [...]. C gets [...]. But best vs. C gets 16.0, bringing avg to [...].

12 Observations
Can balance payoff by alternating roles.
Like tit-for-tat from PD (Axelrod 84).
Related to the folk theorem.

13 Repeated Games are Special
Folk Theorem (Osborne & Rubinstein 94, e.g.): For any repeated game under the average-reward criterion, any achievable payoff profile that dominates the security-level payoffs is the payoff profile of a Nash equilibrium pair.
Proof: Achievable payoff stabilized by each player threatening to reduce the other to its security level.

14 Algorithmic Application
Algorithmic Result (Littman & Stone 03): For any two-player repeated game under the average-reward criterion, a Nash equilibrium pair of controllers can be synthesized in polynomial time.
Builds on the structural Folk Theorem.
Computational and representational result.
Proof: Two tricks.

15 Two-Player Plot
Mark payoff for each action combination.
Mark security level.
Subtract security level (advantage game).

17 Mutual Advantage: Two Cases
Mutual advantage: there is one action combination, or a pair of action combinations that can be averaged, giving a point that dominates the security level.
Otherwise: there isn't.

18 Noticing Mutual Advantage
Easy-to-state way: compute the convex hull.
Easy-to-compute way: check all pairs of action combinations.
Advantage payoffs: $x = (x_1, x_2)$, $y = (y_1, y_2)$.
Compute $w_x = \frac{-y_2 (x_1 - y_1) - y_1 (x_2 - y_2)}{2 (x_2 - y_2)(x_1 - y_1)}$.
If $0 \le w_x \le 1$, $z = w_x x + (1 - w_x) y$ dominates security iff any combination does.
Natural choice: Nash bargaining solution (Nash 50).
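The pairwise check can be sketched as follows. The weight is the slide's $w_x$, which maximizes the product $z_1 z_2$ along the segment between $x$ and $y$. Function names and the example payoffs are hypothetical, and two choices here are assumptions: "dominates" is read as both coordinates strictly positive, and the segment endpoints are tested alongside $w_x$ so that a single action combination is also caught.

```python
from itertools import combinations

def best_mix(x, y):
    """Best point on the segment between advantage payoffs x and y.

    Returns (w, z) with z = w*x + (1-w)*y, chosen to maximize z[0]*z[1]
    among candidates whose coordinates are both positive, or None.
    """
    d1, d2 = x[0] - y[0], x[1] - y[1]
    candidates = [0.0, 1.0]
    if d1 * d2 != 0:
        w = (-y[1] * d1 - y[0] * d2) / (2 * d2 * d1)  # w_x from the slide
        if 0.0 <= w <= 1.0:
            candidates.append(w)
    best = None
    for w in candidates:
        z = (w * x[0] + (1 - w) * y[0], w * x[1] + (1 - w) * y[1])
        if z[0] > 0 and z[1] > 0:
            if best is None or z[0] * z[1] > best[1][0] * best[1][1]:
                best = (w, z)
    return best

def find_mutual_advantage(advantage_payoffs):
    """Check all pairs of action combinations; return (w, z, x, y) or None."""
    best = None
    for x, y in combinations(advantage_payoffs, 2):
        mix = best_mix(x, y)
        if mix and (best is None or mix[1][0] * mix[1][1] > best[1][0] * best[1][1]):
            best = (mix[0], mix[1], x, y)
    return best

# Hypothetical advantage payoffs (security level already subtracted).
found = find_mutual_advantage([(3.0, -1.0), (-1.0, 3.0), (-2.0, -2.0)])
```

With k action combinations this examines k(k-1)/2 pairs, which keeps the step polynomial.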

19 Counting Node Representation
Nodes: probability distributions on actions.
Edges: opponent actions.
Counting nodes: repeat count, escape. (trick 1)
[Diagram: a counting node with repeat count c shown as equivalent to a chain of repeated nodes; labels not preserved.]

20 Alternation
Repeat one, then the other. Repeat.
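A minimal sketch of an alternation schedule, assuming a mixing weight w on the first action combination is already known. The repeat counts below come from a rational approximation of w; that is an assumption for illustration, not the formula from the paper. The function names are hypothetical, and the payoffs are the slide 4 averages for (C, S) and (S, C).

```python
from fractions import Fraction
from itertools import cycle, islice

def alternation_counts(w, max_period=100):
    """Repeat counts (r_x, r_y) whose ratio approximates weight w on x."""
    frac = Fraction(w).limit_denominator(max_period)
    return frac.numerator, frac.denominator - frac.numerator

def alternation_schedule(x, y, r_x, r_y):
    """Endless play sequence: x repeated r_x times, then y repeated r_y times."""
    return cycle([x] * r_x + [y] * r_y)

# Slide 4 averages for A, keyed by (A's choice, B's choice).
PAYOFF_A = {("C", "S"): 32.3, ("S", "C"): 16.0}

r_x, r_y = alternation_counts(0.5)
plays = list(islice(alternation_schedule(("C", "S"), ("S", "C"), r_x, r_y), 4))
average_a = sum(PAYOFF_A[p] for p in plays) / len(plays)
```

With w = 0.5 the two combinations simply take turns, and A's average comes to about 24.15, in line with the "24.2 each" figure on slide 9.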

21 Mutual Advantage Strategies
Punish via attack strategy ($\alpha$).
Formulae for alternation $(r_i, r_j)$ and punishment $(a_1, a_2)$ counts are in the paper.

22 Otherwise...
Check defense against defense.
If Nash, done.
If not, at most one player can be improved unilaterally (since not mutual advantage).
Defense against improved is Nash. (trick 2)
All steps polytime. Finds equilibrium.

23 Conclusion
Threats can help.
Find repeated Nash in polynomial time.
Very simple structure for symmetric games.
Some ideas work for sequential games.

24 Future Work
Discounted reward: as hard as one shot?
More than two players: Feasible. Need uncoordinated punishment.
Graphical games: Factored representation.
Learning: Sizing up the opponent?
Generalize to stochastic games.

25 Examples
From the paper:
- PD
- battle of the sexes
- unbalanced game
- exponential game

26 Symmetric Case
$R_1(a, a') = R_2(a', a)$
Value of game is just the maximum average!
Alternate or accept security-level.
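One way to read "maximum average" is sketched below: in a symmetric game, alternating between (a, a') and (a', a) gives each player the mean of the two payoffs, so the best such mean is compared with the security level. The function names are hypothetical. The data are the slide 4 averages restricted to choices C and S (an assumption, since the full matrix is not available), with the security level 15.85 taken from slide 7.

```python
def best_alternation_value(r1, actions):
    """Largest per-player average from alternating (a, b) with (b, a).

    r1[(a, b)] is player 1's payoff. By symmetry player 2 receives
    r1[(b, a)], so swapping roles each round yields the mean of the two.
    """
    best = None
    for a in actions:
        for b in actions:
            avg = (r1[(a, b)] + r1[(b, a)]) / 2
            if best is None or avg > best[0]:
                best = (avg, (a, b))
    return best

def symmetric_plan(r1, actions, security_level):
    """Alternate if that beats the security level, otherwise accept it."""
    avg, pair = best_alternation_value(r1, actions)
    if avg > security_level:
        return ("alternate", pair, avg)
    return ("security", None, security_level)

# Slide 4 averages for player A, C/S sub-game only (assumption).
R1 = {("C", "C"): -1.0, ("C", "S"): 32.3, ("S", "C"): 16.0, ("S", "S"): 15.8}
plan = symmetric_plan(R1, ["C", "S"], security_level=15.85)
```

On the full C/S/L/F game the L/F pair would also be a candidate; it is left out here only because its matrix entries beyond slide 4 are unavailable.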

27 Symmetric Markov Game
[Figure: grid configurations AB and BA]
Episodic.
Roles chosen randomly.
Algorithm:
- Maximize sum (MDP)
- Security-level (0-sum)
- Choose max if better
Converges to Nash.

28 Discussion
Objectives in game theory for agents? Desiderata?
How to learn the state space when repeated?
Multiobjective negotiation?
Learning: combine leading and following?
Different, unknown discount rates?
Incomplete rationality?
Incomplete information of rewards?

29 Markov Game
$S$: finite set of states
$A_1$, $A_2$: finite sets of action choices
$R_1(s, a_1, a_2)$: payoff to first player
$R_2(s, a_1, a_2)$: payoff to second player
$P(s' \mid s, a_1, a_2)$: transition function
$G$: goal (terminal) states (subset of $S$)
Objective: maximize expected total reward

30 Markov Games: Overview
Combines Markov chain & matrix game:
- Players jointly set transitions and rewards
- One player: Markov decision processes
- Two-player zero sum best studied
- Also: sequential or stochastic games
In general, equilibrium strategy is probabilistic (unlike MDPs and games of alternation).

31 Zero-sum Markov Games
How do we compute an equilibrium?
Value iteration: Markov chain, except solve a mini zero-sum game at each stage.
Work through example: Soccer showdown (two effective states).

32 Complexity Results
One player controls each state, alternating:
- In NP ∩ co-NP; in P?
Otherwise:
- Optimal values can be irrational
- Even if transitions deterministic
- Can approximate iteratively

33 Collaborative Solution
[Figure: grid positions of A and B]
Average total: (96, 96) (not Nash).
A won't wait.
B changes incentives.
Can we formalize collaboration like this?
Simpler setting: matrix games.

34 Repeated Matrix Game
$R_1$ = [matrix not preserved], $R_2$ = [matrix not preserved]
One-state Markov game.
$A_1 = A_2 = \{\text{cooperate}, \text{defect}\}$: PD
One (single-step) Nash.

35 Two Special Cases
Saddle-point equilibrium:
- Deviation helps other player.
- Value is unique solution to zero-sum game.
Coordination equilibrium:
- Both players get maximum reward possible.
- Value is unique max value.
$R_1$ = [matrix not preserved], $R_2$ = [matrix not preserved]
Question: Can we check these properties efficiently?

36 Tit-for-Tat
$R_1$ = [matrix not preserved], $R_2$ = [matrix not preserved]
Saddle point, not coordination.
Consider: cooperate, defect iff defected on.
Better (3) than with defect-defect (1).
In fact, Pareto-optimal, although it requires a sequence of decisions.

37 Tit-For-Tat is Nash
Cooperation (TFT) is best response:
C: C, D: D = 3
C: C, D: C = 3
C: D, D: D = 1
C: D, D: C = 2.5
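The four averages above can be reproduced with a short sketch. It reads each line as a response policy keyed by the action TFT is about to play, and it assumes the conventional Prisoner's Dilemma payoffs (3 mutual cooperation, 1 mutual defection, 5 temptation, 0 sucker). Both the reading and the payoffs are assumptions, since the matrix on the slides is not available; they are chosen because they match the listed values.

```python
# Assumed payoffs to the responder, keyed by (responder action, TFT action).
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}

def average_vs_tft(policy):
    """Long-run average payoff of a response policy against tit-for-tat.

    policy maps the action TFT is about to play ("C" or "D") to the
    responder's action. TFT opens with "C" and then copies the
    responder's previous action, so play settles into a cycle.
    """
    state = "C"        # action TFT plays this round
    first_seen = {}    # state -> round index where it first occurred
    rewards = []
    while state not in first_seen:
        first_seen[state] = len(rewards)
        mine = policy[state]
        rewards.append(PAYOFF[(mine, state)])
        state = mine   # TFT copies the responder's last action
    recurring = rewards[first_seen[state]:]  # rewards on the recurring cycle
    return sum(recurring) / len(recurring)

averages = {
    "C: C, D: D": average_vs_tft({"C": "C", "D": "D"}),
    "C: C, D: C": average_vs_tft({"C": "C", "D": "C"}),
    "C: D, D: D": average_vs_tft({"C": "D", "D": "D"}),
    "C: D, D: C": average_vs_tft({"C": "D", "D": "C"}),
}
```

No policy in this family beats 3, which is what cooperating with TFT earns, so under these assumptions cooperation is a best response.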

38 Generalized TFT
TFT stabilizes mutually beneficial outcome.
General class of policies:
- Play beneficial action
- Punish deviation to suppress temptation
Need to generalize both components.


More information

Decision Theory: Q-Learning

Decision Theory: Q-Learning Decision Theory: Q-Learning CPSC 322 Decision Theory 5 Textbook 12.5 Decision Theory: Q-Learning CPSC 322 Decision Theory 5, Slide 1 Lecture Overview 1 Recap 2 Asynchronous Value Iteration 3 Q-Learning

More information

Fictitious Self-Play in Extensive-Form Games

Fictitious Self-Play in Extensive-Form Games Johannes Heinrich, Marc Lanctot, David Silver University College London, Google DeepMind July 9, 05 Problem Learn from self-play in games with imperfect information. Games: Multi-agent decision making

More information

QUICR-learning for Multi-Agent Coordination

QUICR-learning for Multi-Agent Coordination QUICR-learning for Multi-Agent Coordination Adrian K. Agogino UCSC, NASA Ames Research Center Mailstop 269-3 Moffett Field, CA 94035 adrian@email.arc.nasa.gov Kagan Tumer NASA Ames Research Center Mailstop

More information

Convergence and No-Regret in Multiagent Learning

Convergence and No-Regret in Multiagent Learning Convergence and No-Regret in Multiagent Learning Michael Bowling Department of Computing Science University of Alberta Edmonton, Alberta Canada T6G 2E8 bowling@cs.ualberta.ca Abstract Learning in a multiagent

More information

Iterated Strict Dominance in Pure Strategies

Iterated Strict Dominance in Pure Strategies Iterated Strict Dominance in Pure Strategies We know that no rational player ever plays strictly dominated strategies. As each player knows that each player is rational, each player knows that his opponents

More information

arxiv: v1 [cs.gt] 18 Dec 2017

arxiv: v1 [cs.gt] 18 Dec 2017 Invincible Strategies of Iterated Prisoner s Dilemma Shiheng Wang and Fangzhen Lin Department of Computer Science The Hong Kong University of Science and Technology Clear Water Bay,Kowloon,Hong Kong arxiv:1712.06488v1

More information

Evolutionary Bargaining Strategies

Evolutionary Bargaining Strategies Evolutionary Bargaining Strategies Nanlin Jin http://cswww.essex.ac.uk/csp/bargain Evolutionary Bargaining Two players alternative offering game x A =?? Player A Rubinstein 1982, 1985: Subgame perfect

More information

Learning To Cooperate in a Social Dilemma: A Satisficing Approach to Bargaining

Learning To Cooperate in a Social Dilemma: A Satisficing Approach to Bargaining Learning To Cooperate in a Social Dilemma: A Satisficing Approach to Bargaining Jeffrey L. Stimpson & ichael A. Goodrich Computer Science Department, Brigham Young University, Provo, UT 84602 jstim,mike@cs.byu.edu

More information

Computation of Efficient Nash Equilibria for experimental economic games

Computation of Efficient Nash Equilibria for experimental economic games International Journal of Mathematics and Soft Computing Vol.5, No.2 (2015), 197-212. ISSN Print : 2249-3328 ISSN Online: 2319-5215 Computation of Efficient Nash Equilibria for experimental economic games

More information

CS343 Artificial Intelligence

CS343 Artificial Intelligence CS343 Artificial Intelligence Prof: Department of Computer Science The University of Texas at Austin Good Afternoon, Colleagues Good Afternoon, Colleagues Are there any questions? Logistics Problems with

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Formal models of interaction Daniel Hennes 27.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Taxonomy of domains Models of

More information

The Evolution of Cooperation under Cheap Pseudonyms

The Evolution of Cooperation under Cheap Pseudonyms The Evolution of Cooperation under Cheap Pseudonyms Michal Feldman John Chuang School of Information Management and Systems University of California, Berkeley Abstract A wide variety of interactions on

More information

Introduction to Game Theory. Outline. Proposal feedback. Proposal feedback: written feedback. Notes. Notes. Notes. Notes. Tyler Moore.

Introduction to Game Theory. Outline. Proposal feedback. Proposal feedback: written feedback. Notes. Notes. Notes. Notes. Tyler Moore. Introduction to Game Theory Tyler Moore CSE 7338 Computer Science & Engineering Department, SMU, Dallas, TX Lectures 7 8 Outline Proposal feedback 2 3 4 5 2 / 6 Proposal feedback Proposal feedback Each

More information

Economics 703 Advanced Microeconomics. Professor Peter Cramton Fall 2017

Economics 703 Advanced Microeconomics. Professor Peter Cramton Fall 2017 Economics 703 Advanced Microeconomics Professor Peter Cramton Fall 2017 1 Outline Introduction Syllabus Web demonstration Examples 2 About Me: Peter Cramton B.S. Engineering, Cornell University Ph.D. Business

More information

This question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer.

This question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer. This question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer. 1. Suppose you have a policy and its action-value function, q, then you

More information

CMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro

CMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro CMU 15-781 Lecture 12: Reinforcement Learning Teacher: Gianni A. Di Caro REINFORCEMENT LEARNING Transition Model? State Action Reward model? Agent Goal: Maximize expected sum of future rewards 2 MDP PLANNING

More information

Microeconomics. 2. Game Theory

Microeconomics. 2. Game Theory Microeconomics 2. Game Theory Alex Gershkov http://www.econ2.uni-bonn.de/gershkov/gershkov.htm 18. November 2008 1 / 36 Dynamic games Time permitting we will cover 2.a Describing a game in extensive form

More information

CMU Noncooperative games 4: Stackelberg games. Teacher: Ariel Procaccia

CMU Noncooperative games 4: Stackelberg games. Teacher: Ariel Procaccia CMU 15-896 Noncooperative games 4: Stackelberg games Teacher: Ariel Procaccia A curious game Playing up is a dominant strategy for row player So column player would play left Therefore, (1,1) is the only

More information

Markov decision processes

Markov decision processes CS 2740 Knowledge representation Lecture 24 Markov decision processes Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Administrative announcements Final exam: Monday, December 8, 2008 In-class Only

More information

Patience and Ultimatum in Bargaining

Patience and Ultimatum in Bargaining Patience and Ultimatum in Bargaining Björn Segendorff Department of Economics Stockholm School of Economics PO Box 6501 SE-113 83STOCKHOLM SWEDEN SSE/EFI Working Paper Series in Economics and Finance No

More information

Graph topology and the evolution of cooperation

Graph topology and the evolution of cooperation Provided by the author(s) and NUI Galway in accordance with publisher policies. Please cite the published version when available. Title Graph topology and the evolution of cooperation Author(s) Li, Menglin

More information

Normal-form games. Vincent Conitzer

Normal-form games. Vincent Conitzer Normal-form games Vincent Conitzer conitzer@cs.duke.edu 2/3 of the average game Everyone writes down a number between 0 and 100 Person closest to 2/3 of the average wins Example: A says 50 B says 10 C

More information

Markov Decision Processes Chapter 17. Mausam

Markov Decision Processes Chapter 17. Mausam Markov Decision Processes Chapter 17 Mausam Planning Agent Static vs. Dynamic Fully vs. Partially Observable Environment What action next? Deterministic vs. Stochastic Perfect vs. Noisy Instantaneous vs.

More information

Reputation and Conflict

Reputation and Conflict Reputation and Conflict Sandeep Baliga Northwestern University Tomas Sjöström Rutgers University July 2011 Abstract We study reputation in conflict games. The players can use their first round actions

More information

Reinforcement Learning and Control

Reinforcement Learning and Control CS9 Lecture notes Andrew Ng Part XIII Reinforcement Learning and Control We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make

More information

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides

More information

Game Theory and its Applications to Networks - Part I: Strict Competition

Game Theory and its Applications to Networks - Part I: Strict Competition Game Theory and its Applications to Networks - Part I: Strict Competition Corinne Touati Master ENS Lyon, Fall 200 What is Game Theory and what is it for? Definition (Roger Myerson, Game Theory, Analysis

More information

Reinforcement Learning. Introduction

Reinforcement Learning. Introduction Reinforcement Learning Introduction Reinforcement Learning Agent interacts and learns from a stochastic environment Science of sequential decision making Many faces of reinforcement learning Optimal control

More information