A Polynomial-time Nash Equilibrium Algorithm for Repeated Games
1 A Polynomial-time Nash Equilibrium Algorithm for Repeated Games. Michael L. Littman, Rutgers University; Peter Stone, The University of Texas at Austin.
2 Main Result. Present a polynomial-time algorithm for computing a Nash equilibrium for a 2-player, average-payoff repeated game. Not: a polynomial-time Nash equilibrium algorithm for one-shot games; that is a well-known open problem, possibly unnecessarily hard. 7/22/04 Polytime Repeated Nash 2
3 Example: Grid Game 3 (Hu & Wellman 01). Actions: U, D, R, L, X. No move on collision; semiwalls are passable with probability 50%. Rewards: -1 per step, -10 for a collision, +100 for reaching the goal, 0 if back to the initial configuration. Both players can reach the goal. [Grid figure with players A and B not reproduced.]
4 Choices in Grid Game. (Compare Hawks/Doves, Traffic, Chicken.) Average rewards: (C, S): (32.3, 16.0); (S, C): (16.0, 32.3); (C, C): (-1.0, -1.0); (S, S): (15.8, 15.8); mix: (15.9, 15.9); (L, F): (25.7, 25.8); (F, L): (25.8, 25.7). [Grid figure not reproduced.]
5 Grid Game 3: Matrix. A's payoff matrix over strategies {C, S, L, F} (numeric entries lost in transcription). B's matrix is the transpose of this.
6 One-Shot Strategy. We play one round of the bimatrix game GG3. A strategy is a probability distribution over choices. How do we choose?
7 Security Level Solution. A doesn't know what B will do, so maximize reward in the worst case. If A plays C (prob. 0.01) and S (prob. 0.99), A's worst cases are C and F (15.85). (Defense) If B plays C (prob. 0.49) and F (prob. 0.51), A's best choices are C and S (15.85). (Attack) Computed efficiently via linear programming. Too pessimistic/paranoid?
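As a sketch of the defense computation: when a player has only two actions, the security level can be found exactly without an LP solver by enumerating the breakpoints of the piecewise-linear worst-case payoff. The matching-pennies matrix below is an illustrative stand-in, since GG3's numeric entries did not survive transcription:

```python
from fractions import Fraction

def security_level_2xN(A):
    """Security (maximin) level for the row player of a 2-by-N payoff matrix,
    playing row 0 with probability p and row 1 with 1-p. The worst case
    min_j [p*A[0][j] + (1-p)*A[1][j]] is concave and piecewise linear in p,
    so its maximum lies at p = 0, p = 1, or where two column lines cross."""
    A = [[Fraction(v) for v in row] for row in A]
    ncols = len(A[0])
    candidates = [Fraction(0), Fraction(1)]
    for j in range(ncols):
        for k in range(j + 1, ncols):
            # Solve p*A[0][j] + (1-p)*A[1][j] = p*A[0][k] + (1-p)*A[1][k].
            denom = (A[0][j] - A[1][j]) - (A[0][k] - A[1][k])
            if denom != 0:
                p = (A[1][k] - A[1][j]) / denom
                if 0 <= p <= 1:
                    candidates.append(p)
    def worst(p):
        return min(p * A[0][j] + (1 - p) * A[1][j] for j in range(ncols))
    return max((worst(p), p) for p in candidates)

# Matching pennies (stand-in example): security level 0 at p = 1/2.
val, p = security_level_2xN([[1, -1], [-1, 1]])
```

For more than two rows the same idea needs a full LP, which is the slide's "computed efficiently via linear programming".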
8 Nash Equilibrium. A pair of strategies such that neither player has an incentive to deviate unilaterally. Always exists (Nash 51). Sometimes mixed. [Best-response diagram over strategies C, S, L, F not reproduced.]
9 Nash Values. For GG3: (C, S) = (32.3, 16.0), very imbalanced, averaging 24.2 each; (S, C) = (16.0, 32.3), very imbalanced, averaging 24.2 each; ~1/2 mix (C/S, C/S) = (15.9, 15.9), balanced but only 15.9 each. Computationally difficult to find in general. Compare (L, F) = (25.7, 25.8): nearly balanced, 25.8 each, but not Nash.
10 Repeated Games. What if we face each other multiple times? Strategies can be a function of history and can be randomized. A Nash equilibrium still exists, of course. Philosophical claim: equilibrium assumes games are repeated and players choose best responses. Computational observation: easier to find.
11 Equilibrium in Repeated GG3. Strategies for A and B shown as automata [figure lost in transcription]. B faces L or C and achieves its max via F. A faces F or C; the payoffs for L and C are lost in transcription, but the best response vs. C gets 16.0, bringing the average to [value lost in transcription].
12 Observations. Payoff can be balanced by alternating roles. Like tit-for-tat from the PD (Axelrod 84). Related to the folk theorem.
13 Repeated Games are Special. Folk Theorem (Osborne & Rubinstein 94, e.g.): for any repeated game under the average-reward criterion, any achievable payoff profile that dominates the security-level payoffs is the payoff profile of a Nash equilibrium pair. Proof: the achievable payoff is stabilized by each player threatening to reduce the other to its security level.
14 Algorithmic Application. Algorithmic Result (Littman & Stone 03): for any two-player repeated game under the average-reward criterion, a Nash equilibrium pair of controllers can be synthesized in polynomial time. Builds on the structural Folk Theorem. A computational and representational result. Proof: two tricks.
15 Two-Player Plot. Mark the payoff for each action combination. Mark the security level. Subtract the security level (the advantage game).
17 Mutual Advantage: Two Cases. Either there is one action combination, or a pair of action combinations that can be averaged, yielding a point that dominates the security level. Otherwise: there isn't.
18 Noticing Mutual Advantage. Easy-to-state way: compute the convex hull. Easy-to-compute way: check all pairs of action combinations. Advantage payoffs: x = (x1, x2), y = (y1, y2). Compute w_x = (-y2(x1 - y1) - y1(x2 - y2)) / (2(x2 - y2)(x1 - y1)). If 0 <= w_x <= 1, then z = w_x x + (1 - w_x) y dominates the security level iff any combination does. Natural choice: the Nash bargaining solution (Nash 50).
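As an illustration of the pairwise check, a small helper can locate the Nash bargaining point on the segment between two advantage payoffs using the slide's weight formula; the example points are made up, not taken from GG3:

```python
def bargaining_point(x, y):
    """Given advantage payoffs x = (x1, x2) and y = (y1, y2) (security level
    already subtracted, so 'dominates security' means both coordinates > 0),
    return the point z on the segment between x and y maximizing the Nash
    bargaining product z1 * z2. The interior candidate w comes from setting
    the derivative of the product along the segment to zero."""
    (x1, x2), (y1, y2) = x, y
    denom = 2 * (x2 - y2) * (x1 - y1)
    if denom == 0:
        # Degenerate (axis-aligned) segment: fall back to a few candidates.
        cands = [0.0, 0.5, 1.0]
    else:
        w = (-y2 * (x1 - y1) - y1 * (x2 - y2)) / denom
        cands = [0.0, 1.0] + ([w] if 0 <= w <= 1 else [])
    def z(w):
        return (w * x1 + (1 - w) * y1, w * x2 + (1 - w) * y2)
    return max((z(w) for w in cands), key=lambda p: p[0] * p[1])

# Made-up symmetric example: averaging (2, 0) and (0, 2) gives (1, 1),
# which dominates the (zero) security level of the advantage game.
```

Checking every pair of action combinations this way takes time quadratic in the number of joint actions, which is why the whole test stays polynomial.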
19 Counting Node Representation (trick 1). Nodes: probability distributions over actions. Edges: opponent actions. Counting nodes: a repeat count plus an escape transition. [Automaton figure with mixed-strategy nodes not reproduced.]
20 Alternation. Repeat one, then the other. Repeat.
21 Mutual Advantage Strategies. Punish via the attack strategy (alpha). Formulae for the alternation counts (r_i, r_j) and punishment counts (a_1, a_2) are in the paper.
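A minimal sketch of such a controller, with hypothetical alternation counts and punishment action (the paper's formulae for (r_i, r_j) and (a_1, a_2) are not reproduced here):

```python
class FolkController:
    """Finite-state controller: alternate r1 rounds of joint profile `first`
    and r2 rounds of `second`; if the opponent ever deviates from the
    expected joint play, switch permanently to the punishment action that
    holds the opponent to its security level. Counts and punishment action
    are illustrative placeholders, not the paper's formulae."""

    def __init__(self, first, second, r1, r2, punish_action):
        self.first, self.second = first, second  # (my_action, their_action)
        self.r1, self.r2 = r1, r2
        self.punish_action = punish_action
        self.t = 0               # position within the alternation cycle
        self.punishing = False

    def expected_profile(self):
        phase = self.t % (self.r1 + self.r2)
        return self.first if phase < self.r1 else self.second

    def act(self):
        if self.punishing:
            return self.punish_action
        return self.expected_profile()[0]

    def observe(self, their_action):
        # Trigger the folk-theorem threat on any deviation.
        if not self.punishing and their_action != self.expected_profile()[1]:
            self.punishing = True
        self.t += 1

# Alternate (C, S) and (S, C) one round each; punish with "D" forever.
ctrl = FolkController(("C", "S"), ("S", "C"), 1, 1, "D")
```

The counting-node trick is what keeps this representation polynomial: a count and an escape edge stand in for long chains of identical nodes.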
22 Otherwise... (trick 2). Check defense against defense. If it is Nash, done. If not, at most one player can be improved unilaterally (since there is no mutual advantage), and defense against the improved player is Nash. All steps are polytime. Finds an equilibrium.
23 Conclusion. Threats can help. Finds a repeated-game Nash equilibrium in polynomial time. Very simple structure for symmetric games. Some ideas carry over to sequential games.
24 Future Work. Discounted reward: as hard as one-shot? More than two players: feasible, but needs uncoordinated punishment. Graphical games: factored representation. Learning: sizing up the opponent? Generalize to stochastic games.
25 Examples. From the paper: PD, battle of the sexes, an unbalanced game, an exponential game.
26 Symmetric Case. R_1(a, a') = R_2(a', a). The value of the game is just the maximum average! Alternate, or accept the security level.
27 Symmetric Markov Game. Episodic; roles chosen randomly. Algorithm: maximize the sum (an MDP); compute the security level (zero-sum); choose the max if it is better. Converges to Nash. [Grid figures AB/BA not reproduced.]
28 Discussion. Objectives in game theory for agents? Desiderata? How to learn the state space when repeated? Multiobjective negotiation? Learning: combine leading and following? Different, unknown discount rates? Incomplete rationality? Incomplete information about rewards?
29 Markov Game. S: finite set of states. A_1, A_2: finite sets of action choices. R_1(s, a_1, a_2): payoff to the first player. R_2(s, a_1, a_2): payoff to the second player. P(s' | s, a_1, a_2): transition function. G: goal (terminal) states (a subset of S). Objective: maximize expected total reward.
30 Markov Games: Overview. Combines a Markov chain and a matrix game: players jointly determine transitions and rewards. One player: Markov decision processes. Two-player zero-sum is the best studied. Also called sequential or stochastic games. In general, equilibrium strategies are probabilistic (unlike MDPs and games of alternation).
31 Zero-sum Markov Games. How do we compute an equilibrium? Value iteration: as for a Markov chain, except solve a mini zero-sum game at each stage. Work through an example: soccer showdown, with two effective states.
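The per-stage solve can be sketched in a few lines. This sketch simplifies in three ways, all assumptions rather than the talk's method: it uses discounted rather than average reward, deterministic transitions, and stage games that happen to have pure-strategy saddle points (in general each stage game needs a linear program). The example game is made up:

```python
def matrix_game_value(M):
    """Value of a zero-sum matrix game, restricted for brevity to games
    with a pure-strategy saddle point; the general case needs an LP."""
    maximin = max(min(row) for row in M)
    minimax = min(max(row[j] for row in M) for j in range(len(M[0])))
    assert maximin == minimax, "no pure saddle point; LP required"
    return maximin

def value_iteration(states, actions, R, P, gamma=0.9, iters=200):
    """Zero-sum Markov game value iteration: each sweep solves, per state,
    the stage game whose entries are immediate payoff plus discounted value
    of the successor. R[s][a1][a2] is player 1's payoff; P[s][a1][a2] is the
    (deterministic, for simplicity) next state."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: matrix_game_value(
                 [[R[s][a1][a2] + gamma * V[P[s][a1][a2]]
                   for a2 in actions] for a1 in actions])
             for s in states}
    return V

# Single self-looping state with stage value 1: V = 1 / (1 - 0.5) = 2.
V = value_iteration(['s'], [0, 1],
                    {'s': [[1, 0], [2, 1]]},
                    {'s': [['s', 's'], ['s', 's']]}, gamma=0.5)
```

Replacing `matrix_game_value` with an LP-based minimax solver recovers the standard algorithm for the mixed-strategy case.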
32 Complexity Results. If one player controls each state, alternating: in NP ∩ co-NP; in P? Otherwise: optimal values can be irrational, even if transitions are deterministic, but can be approximated iteratively.
33 Collaborative Solution. Average total: (96, 96), but not Nash: A won't wait. B changes incentives. Can we formalize collaboration like this? Simpler setting: matrix games. [Grid figures not reproduced.]
34 Repeated Matrix Game. A one-state Markov game with A_1 = A_2 = {cooperate, defect}: the Prisoner's Dilemma. Payoff matrices R_1, R_2 (entries lost in transcription). One (single-step) Nash equilibrium.
35 Two Special Cases. Saddle-point equilibrium: deviation helps the other player; the value is the unique solution to the zero-sum game. Coordination equilibrium: both players get the maximum reward possible; the value is the unique max value. Question: can we check these properties efficiently? (Payoff matrices R_1, R_2 lost in transcription.)
36 Tit-for-Tat. A saddle point, not coordination. Consider: cooperate, and defect iff defected on. Better (3) than defect-defect (1). In fact Pareto-optimal, although it requires a sequence of decisions. (Payoff matrices lost in transcription.)
37 Tit-For-Tat is Nash. Cooperation (TFT) is a best response. Writing a reactive reply strategy as (response to C, response to D), average payoffs against TFT: (C, D), i.e. TFT itself: 3; (C, C): 3; (D, D): 1; (D, C): 2.5.
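A quick simulation makes the best-response claim concrete. The payoff values (3 mutual cooperation, 1 mutual defection, 5 temptation, 0 sucker) and the first-move convention are assumptions here, since the slide's actual matrices were lost in transcription:

```python
def play_vs_tft(reply, first_move, rounds=10000):
    """Average payoff of a reactive strategy against tit-for-tat.
    `reply` maps the opponent's previous action to our next action.
    Standard PD payoffs (T=5, R=3, P=1, S=0) are assumed, since the
    talk's matrix entries did not survive transcription."""
    payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
    mine, tft = first_move, "C"    # tit-for-tat opens with cooperation
    total = 0
    for _ in range(rounds):
        total += payoff[(mine, tft)]
        mine, tft = reply[tft], mine   # TFT echoes our last move
    return total / rounds

tft_self = play_vs_tft({"C": "C", "D": "D"}, "C")  # TFT vs TFT
all_d = play_vs_tft({"C": "D", "D": "D"}, "D")     # always defect
```

Cooperating with TFT sustains the mutual-cooperation payoff, while always defecting collapses to the defect-defect payoff after the first round, matching the 3-versus-1 comparison on the slide.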
38 Generalized TFT. TFT stabilizes a mutually beneficial outcome. General class of policies: play the beneficial action; punish deviation to suppress temptation. Need to generalize both components.
More informationCMU Noncooperative games 4: Stackelberg games. Teacher: Ariel Procaccia
CMU 15-896 Noncooperative games 4: Stackelberg games Teacher: Ariel Procaccia A curious game Playing up is a dominant strategy for row player So column player would play left Therefore, (1,1) is the only
More informationMarkov decision processes
CS 2740 Knowledge representation Lecture 24 Markov decision processes Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Administrative announcements Final exam: Monday, December 8, 2008 In-class Only
More informationPatience and Ultimatum in Bargaining
Patience and Ultimatum in Bargaining Björn Segendorff Department of Economics Stockholm School of Economics PO Box 6501 SE-113 83STOCKHOLM SWEDEN SSE/EFI Working Paper Series in Economics and Finance No
More informationGraph topology and the evolution of cooperation
Provided by the author(s) and NUI Galway in accordance with publisher policies. Please cite the published version when available. Title Graph topology and the evolution of cooperation Author(s) Li, Menglin
More informationNormal-form games. Vincent Conitzer
Normal-form games Vincent Conitzer conitzer@cs.duke.edu 2/3 of the average game Everyone writes down a number between 0 and 100 Person closest to 2/3 of the average wins Example: A says 50 B says 10 C
More informationMarkov Decision Processes Chapter 17. Mausam
Markov Decision Processes Chapter 17 Mausam Planning Agent Static vs. Dynamic Fully vs. Partially Observable Environment What action next? Deterministic vs. Stochastic Perfect vs. Noisy Instantaneous vs.
More informationReputation and Conflict
Reputation and Conflict Sandeep Baliga Northwestern University Tomas Sjöström Rutgers University July 2011 Abstract We study reputation in conflict games. The players can use their first round actions
More informationReinforcement Learning and Control
CS9 Lecture notes Andrew Ng Part XIII Reinforcement Learning and Control We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make
More informationToday s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning
CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides
More informationGame Theory and its Applications to Networks - Part I: Strict Competition
Game Theory and its Applications to Networks - Part I: Strict Competition Corinne Touati Master ENS Lyon, Fall 200 What is Game Theory and what is it for? Definition (Roger Myerson, Game Theory, Analysis
More informationReinforcement Learning. Introduction
Reinforcement Learning Introduction Reinforcement Learning Agent interacts and learns from a stochastic environment Science of sequential decision making Many faces of reinforcement learning Optimal control
More information